ENTRIES TAGGED "Big Data"

Four short links: 22 May 2014

Local Clusters, Pancoopticon, Indie Oversupply, and Open Source PDF

  1. Ferry helps you create big data clusters on your local machine. Define your big data stack using YAML and share your application with Dockerfiles. Ferry supports Hadoop, Cassandra, Spark, GlusterFS, and Open MPI.
  2. What Google Told SEC: “For example, a few years from now, we and other companies could be serving ads and other content on refrigerators, car dashboards, thermostats, glasses, and watches, to name just a few possibilities.” The only thing they make that people want to buy is the ad space around what you’re actually trying to do.
  3. The Indie Bubble is Popping (Jeff Vogel) — gamers’ budgets and the number of hours in the day to play games are not increasing at the rate at which the number of games on the market is increasing.
  4. pdfium — Chrome’s PDF engine, open source.
Four short links: 1 May 2014

Cloud Jurisdiction, Driverless Cars, Robotics IPOs, and Fitting a Catalytic Convertor to Your Data Exhaust

  1. US Providers Must Divulge from Offshore Servers (Gigaom) — A U.S. magistrate judge ruled that U.S. cloud vendors must fork over customer data even if that data resides in data centers outside the country. (via Alistair Croll)
  2. Inside Google’s Self-Driving Car (Atlantic Cities) — Urmson says the value of maps is one of the key insights that emerged from the DARPA challenges. They give the car a baseline expectation of its environment; they’re the difference between the car opening its eyes in a completely new place and having some prior idea what’s going on around it. This is a long and interesting piece on the experience and the creator’s concerns around the self-driving cars. Still looking for the comprehensive piece on the subject.
  3. Recent Robotics-Related IPOs — not all the exits are to Google.
  4. How One Woman Hid Her Pregnancy From Big Data (Mashable) — “I really couldn’t have done it without Tor, because Tor was really the only way to manage totally untraceable browsing. I know it’s gotten a bad reputation for Bitcoin trading and buying drugs online, but I used it for BabyCenter.com.”
Four short links: 23 April 2014

Mobile UX, Ideation Tools, Causal Consistency, and Intellectual Ventures Patent Fail

  1. Samsung UX (Scribd) — little shop of self-catalogued UX horrors, courtesy discovery in a lawsuit. Dated (Android G1 as competition) but rewarding to see there are signs of self-awareness in the companies that inflict unusability on the world.
  2. Tools for Ideation and Problem Solving (Dan Lockton) — comprehensive and analytical take on different systems for ideas and solutions.
  3. Don’t Settle for Eventual Consistency (ACM) — proposes “causal consistency”, prototyped in COPS and Eiger from Princeton.
  4. Intellectual Ventures Loses Patent Case (Ars Technica) — The Capital One case ended last Wednesday, when a Virginia federal judge threw out the two IV patents that remained in the case. It’s the first IV patent case seen through to a judgment, and it ended in a total loss for the patent-holding giant: both patents were invalidated, one on multiple grounds.
Four short links: 22 April 2014

In-Browser Data Filtering, Alternative to OpenSSL, Game Mechanics, and Selling Private Data

  1. PourOver — NYT open source Javascript for very fast in-browser filtering and sorting of large collections.
  2. LibreSSL — OpenBSD take on OpenSSL. Unclear how sustainable this effort is, or how well adopted it will be. Competing with OpenSSL is obviously an alternative to tackling the OpenSSL sustainability question by funding and supporting the existing OpenSSL team.
  3. Game Mechanic Explorer — helps learners by turning what they see in games into the simple code and math that makes it happen.
  4. HMRC to Sell Taxpayers’ Data (The Guardian) — between this and the UK govt’s plans to sell patient healthcare data, it’s clear that the new government question isn’t whether data have value, but rather whether the collective has the right to retail the individual’s privacy.
Four short links: 18 April 2014

Interview Tips, Data of Any Size, Science Writing, and Instrumented Javascript

  1. 16 Interviewing Tips for User Studies — these apply to many situations beyond user interviews, too.
  2. The Backlash Against Big Data contd. (Mike Loukides) — Learn to be a data skeptic. That doesn’t mean becoming skeptical about the value of data; it means asking the hard questions that anyone claiming to be a data scientist should ask. Think carefully about the questions you’re asking, the data you have to work with, and the results that you’re getting. And learn that data is about enabling intelligent discussions, not about turning a crank and having the right answer pop out.
  3. The Science of Science Writing (American Scientist) — also applicable beyond the specific field for which it was written.
  4. earhorn: Earhorn instruments your JavaScript and shows you a detailed, reversible, line-by-line log of JavaScript execution, sort of like console.log’s crazy uncle.

Four short links: 10 April 2014

Rise of the Patent Troll, Farm Data, The Block Chain, and Better Writing

  1. Rise of the Patent Troll: Everything is a Remix (YouTube) — primer on patent trolls, in language anyone can follow. Part of the fixpatents.org campaign. (via BoingBoing)
  2. Petabytes of Field Data (GigaOm) — Farm Intelligence using sensors and computer vision to generate data for better farm decision making.
  3. Bullish on Blockchain (Fred Wilson) — our 2014 fund will be built during the blockchain cycle. “The blockchain” is bitcoin’s distributed consensus system, interesting because it’s the return of p2p from the Chasm of Ridicule or whatever the Gartner Trite Cycle calls the time between first investment bubble and second investment bubble under another name.
  4. Hemingway — online writing tool to help you make your writing clear and direct. (via Nina Simon)

The backlash against big data, continued

Ignore the hype. Learn to be a data skeptic.

Yawn. Yet another article trashing “big data,” this time an op-ed in the Times. This one is better than most, and ends with the truism that data isn’t a silver bullet. It certainly isn’t.

I’ll spare you all the links (most of which are much less insightful than the Times piece), but the backlash against “big data” is clearly in full swing. I wrote about this more than a year ago, in my piece on data skepticism: data is heading into the trough of a hype curve, driven by overly aggressive marketing, promises that can’t be kept, and spurious claims that, if you have enough data, correlation is as good as causation. It isn’t; it never was; it never will be. The paradox of data is that the more data you have, the more spurious correlations will show up. Good data scientists understand that. Poor ones don’t.
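The point about spurious correlations is easy to check for yourself. Here is a minimal sketch, not taken from the original piece, that treats "more data" as "more variables measured" (an assumption on my part). It generates columns of independent random noise and counts how many pairs look correlated anyway. The function names, the sample size of 30, and the 0.4 threshold are all illustrative choices.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_spurious_pairs(n_variables, n_samples=30, threshold=0.4, seed=0):
    """Count pairs of independent noise columns whose |r| exceeds threshold.

    Every column is pure Gaussian noise, so any pair that clears the
    threshold is a spurious correlation by construction.
    """
    rng = random.Random(seed)
    columns = [
        [rng.gauss(0, 1) for _ in range(n_samples)]
        for _ in range(n_variables)
    ]
    return sum(
        1
        for a, b in itertools.combinations(columns, 2)
        if abs(pearson(a, b)) > threshold
    )


if __name__ == "__main__":
    # The number of pairs grows roughly with the square of the number of
    # variables, and so does the number of chance "findings".
    for k in (10, 50, 200):
        print(k, "variables:", count_spurious_pairs(k))
```

Nothing in those columns is related to anything else, yet the count of apparently correlated pairs climbs as variables are added, simply because there are more pairs to test.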

It’s very easy to say that “big data is dead” while you’re using Google Maps to navigate downtown Boston. It’s easy to say that “big data is dead” while Google Now or Siri is telling you that you need to leave 20 minutes early for an appointment because of traffic. And it’s easy to say that “big data is dead” while you’re using Google, or Bing, or DuckDuckGo to find material to help you write an article claiming that big data is dead.

5 Fun Facts about HBase that you didn’t know

HBase has made inroads in companies across many industries and countries

With HBaseCon right around the corner, I wanted to take stock of one of the more popular components in the Hadoop ecosystem. Over the last few years, many more companies have come to rely on HBase to run key products and services. The conference will showcase a wide variety of such examples, and highlight some of the new features that HBase developers have added over the past year. In the meantime, here are some things you may not have known about HBase:

Many companies have had HBase in production for 3+ years: Large technology companies including Trend Micro, eBay, Yahoo! and Facebook, and analytics companies RocketFuel and Flurry depend on HBase for many mission-critical services.

There are many use cases beyond advertising: Examples include communications (Facebook messages, Xiaomi), security (Trend Micro), measurement (Nielsen), enterprise collaboration (Jive Software), digital media (OCLC), DNA matching (Ancestry.com), and machine data analysis (Box.com). In particular, Nielsen uses HBase to track media consumption patterns and trends, mobile handset company Xiaomi uses HBase for messaging and other consumer mobile services, and OCLC runs the world’s largest online database of library resources on HBase.

Flurry has the largest contiguous HBase cluster: Mobile analytics company Flurry has an HBase cluster with 1,200 nodes (replicating into another 1,200-node cluster). Flurry is planning to significantly expand their large HBase cluster in the near future.

Four short links: 2 April 2014

Fault-Tolerant Resilient Yadda Yadda, Tour Tips, Punch Cards, and Public Credit

  1. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing (PDF) — Berkeley research paper behind Apache Spark. (via Nelson Minar)
  2. Angular Tour — trivially add tour tips (“This is the widget basket, drag and drop for widget goodness!” type of thing) to your Angular app.
  3. Punchcard — generate Github-style punch card charts “with ease”.
  4. Where Credit Belongs for Hack (Bryan O’Sullivan) — public credit for individual contributors in a piece of corporate open source is a sign of confidence in your team, that building their public reputation isn’t going to result in them leaving for one of the many job offers they’ll receive. And, of course, of caring for your individual contributors. Kudos Facebook.