ENTRIES TAGGED "Big Data"

Four short links: 28 March 2014

Four short links: 28 March 2014

Javascript on Glass, Smart Lights, Hardware Startups, MySQL at Scale

  1. WearScript — open source project putting Javascript on Glass. See story on it. (via Slashdot)
  2. Mining the World’s Data by Selling Street Lights and Farm Drones (Quartz) — Depending on what kinds of sensors the light’s owners choose to install, Sensity’s fixtures can track everything from how much power the lights themselves are consuming to movement under the post, ambient light, and temperature. More sophisticated sensors can measure pollution levels, radiation, and particulate matter (for air quality levels). The fixtures can also support sound or video recording. Bring these lights onto city streets and you could isolate the precise location of a gunshot within seconds.
  3. An Investor’s Guide to Hardware Startups — good to know if you’re thinking of joining one, too.
  4. WebScaleSQL — a MySQL downstream patchset built for “large scale” (aka Google, Facebook type loads).
Comment
Four short links: 24 March 2014

Four short links: 24 March 2014

Google Flu, Embeddable JS, Data Analysis, and Belief in the Browser

  1. The Parable of Google Flu (PDF) — We explore two
    issues that contributed to [Google Flu Trends]’s mistakes—big data hubris and algorithm dynamics—and offer lessons for moving forward in the big data age.
    Overtrained and underfed?
  2. Duktape — a lightweight embeddable Javascript engine. Because an app without an API is like a lightbulb without an IP address: retro but not cool.
  3. Principles of Good Data Analysis (Greg Reda) — Once you’ve settled on your approach and data sources, you need to make sure you understand how the data was generated or captured, especially if you are using your own company’s data. Treble so if you are using data you snaffled off the net, riddled with collection bias and untold omissions. (via Stijn Debrouwere)
  4. Deep Belief Networks in Javascript — just object recognition in the browser. The code relies on GPU shaders to perform calculations on over 60 million neural connections in real time. From the ever-more-awesome Pete Warden.
Comment

Podcast: thinking with data

Data tools are less important than the way you frame your questions.

Max Shron and Jake Porway spoke with me at Strata a few weeks ago about frameworks for making reasoned arguments with data. Max’s recent O’Reilly book, Thinking with Data, outlines the crucial process of developing good questions and creating a plan to answer them. Jake’s nonprofit, DataKind, connects data scientists with worthy causes where they can apply their skills.

A few of the things we talked about:

  • The importance of publishing negative scientific results
  • Give Directly, an organization that facilitates donations directly to households in Kenya and Uganda. Give Directly was able to model income using satellite data to distinguish thatched roofs from metal roofs.
  • Moritz Stefaner calling for a “macroscope”
  • Project Cybersyn, Salvador Allende’s plan for encompassing the entire Chilean economy in a single real-time computer system
  • Seeing Like a State: How Certain Schemes to Improve the Human Condition Have Failed by James C. Scott

After we recorded this podcast episode at Strata Santa Clara, Max presided over a webcast on his book that’s archived here.

Comment
Four short links: 18 March 2014

Four short links: 18 March 2014

On Managers, Human Data, Driverless Cars, and Bad Business

  1. On Managers (Mike Migurski) — Managers might be difficult, hostile, or useless, but because they are parts of an explicit power structure they can be evaluated explicitly.
  2. Big Data: Humans Required (Sherri Hammons) — the heart of the problem with data: interpretation. Data by itself is of little value. It is only when it is interpreted and understood that it begins to become information. GovTech recently wrote an article outlining why search engines will not likely replace actual people in the near future. If it were merely a question of pointing technology at the problem, we could all go home and wait for the Answer to Everything. But, data doesn’t happen that way. Data is very much like a computer: it will do just as it’s told. No more, no less. A human is required to really understand what data makes sense and what doesn’t. (via Anne Zelenka)
  3. Morgan Stanley on the Economic Benefits of Driverless CarsThe total savings of over $5.6 trillion annually are not envisioned until a couple of decades as Morgan Stanley see four phases of adoption of self-driving vehicles. Phase 1 is already underway, Phase 2 will be semi-autonomous, Phase 3 will be within 5 to 10 years, by which time we will see fully self-driving vehicles on the roads – but not widespread usage. The authors say Phase 4, which will have the biggest impact, is when 100% of all vehicles on the roads will be fully autonomous, they say this may take a couple of decades.
  4. Worse (Marco Arment) — I’ve been sitting on this but can’t fault it. In the last few years, Google, Apple, Amazon, Facebook, and Twitter have all made huge attempts to move into major parts of each others’ businesses, usually at the detriment of their customers or users.
Comment

The dangers of data-driven list-making

Such lists might mean we miss the truly great breakthroughs, inspirations, and leaps of faith necessary to evolve.

Editor’s note: this post originally appeared on Tilt the Windmill; it is republished here with permission.

Startupfest’s Pamela Perotti asked for my thoughts on this great Forbes piece by Lightspeed’s Barry Eggers about using big data to build top ten lists that actually matter.

First: it’s an excellent post. You should read it. I’ll wait.

Every enterprise decision-maker will soon be running their business according to the lists Barry envisions, as the power of big data and analytics finds its way into every boardroom and dashboard. Society will soon demand them, too. But while such analysis is tremendously valuable, it carries two dangers: the politics of setting criteria, and the trap of relying on data for inspiration.

The harsh light of data

Barry is right: rather than using our precious time and resources to make yet another linkbait list of the 50 cutest kittens, or the seven people I’ll try to avoid at SXSW, we should use abundant data and a connected world to build lists that matter: lying politicians, bad cars, lousy doctors. Then we can use these lists to change policy and behaviour because we’ll make things transparent. Shining the harsh light of data on something can improve it.

Unfortunately, expecting big data to be a panacea that cures all our ills is overreaching and can lead to the kind of hype that scuttles otherwise ascendant technologies. Read more…

Comments: 4
Four short links: 13 March 2014

Four short links: 13 March 2014

Parallel Programming, Malignant Computation, Politicised GDS, and Data Stream Toolkit

  1. Is Parallel Programming Hard? And, If So, What Can You Do About It? — book by Paul E. McKenney, on single-machine multi-CPU parallel programming.
  2. Malignant ComputationThe bitcoin mining network would work just as well if it had far less computation devoted to it. Bitcoins would be mined at exactly the same rate if 1/2 or 1/4 of the computational resources were devoted. This means that bitcoin has incentivized a tremendous amount of computational busy work.
  3. GDS Becomes Political (Computer Weekly) — She [Opposition MP] said that digital should not be about imposing a way of working on the public sector – Labour is not fond of the “digital by default” mantra – but about supporting public service delivery. [...] “When this government decided upon the digitalisation of this [online job search] service they apparently did not take into account those with poor literacy skills, mental health issues or learning difficulties – who, as most people would have predicted, make up a higher-than-average proportion of the unemployed.”
  4. streamtools (Github) — a graphical toolkit for dealing with streams of data. Streamtools makes it easy to explore, analyse, modify and learn from streams of data. (via OpenNews)
Comment
Four short links: 11 March 2014

Four short links: 11 March 2014

Game Analysis, Brave New (Disney)World, Internet of Deadly Things, and Engagement vs Sharing

  1. In-Game Graph Analysis (The Economist) — one MLB team has bought a Cray Ulrika graph-processing appliance for in-game analysis of data. Please hold, boggling. (via Courtney Nash)
  2. Disney Bets $1B on Technology (BusinessWeek) — MyMagic+ promises far more radical change. It’s a sweeping reservation and ride planning system that allows for bookings months in advance on a website or smartphone app. Bracelets called MagicBands, which link electronically to an encrypted database of visitor information, serve as admission tickets, hotel keys, and credit or debit cards; a tap against a sensor pays for food or trinkets. The bands have radio frequency identification (RFID) chips—which critics derisively call spychips because of their ability to monitor people and things. (via Jim Stogdill)
  3. Stupid Smart Stuff (Don Norman) — In the airplane, the pilots are not attending, but when trouble does arise, the extremely well-trained pilots have several minutes to respond. In the automobile, when trouble arises, the ill-trained drivers will have one or two seconds to respond. Automobile designers – and law makers – have ignored this information.
  4. What You Think You Know About the Web Is WrongChartbeat looked at deep user behavior across 2 billion visits across the web over the course of a month and found that most people who click don’t read. In fact, a stunning 55% spent fewer than 15 seconds actively on a page. The stats get a little better if you filter purely for article pages, but even then one in every three visitors spend less than 15 seconds reading articles they land on. The entire article makes some powerful points about the difference between what’s engaged with and what’s shared. Articles that were clicked on and engaged with tended to be actual news. In August, the best performers were Obamacare, Edward Snowden, Syria and George Zimmerman, while in January the debates around Woody Allen and Richard Sherman dominated. The most clicked on but least deeply engaged-with articles had topics that were more generic. In August, the worst performers included Top, Best, Biggest, Fictional etc while in January the worst performers included Hairstyles, Positions, Nude and, for some reason, Virginia. That’s data for you.
Comment

Big data and privacy: an uneasy face-off for government to face

MIT workshop kicks off Obama campaign on privacy

Thrust into controversy by Edward Snowden’s first revelations last year, President Obama belatedly welcomed a “conversation” about privacy. As cynical as you may feel about US spying, that conversation with the federal government has now begun. In particular, the first of three public workshops took place Monday at MIT.

Given the locale, a focus on the technical aspects of privacy was appropriate for this discussion. Speakers cheered about the value of data (invoking the “big data” buzzword often), delineated the trade-offs between accumulating useful data and preserving privacy, and introduced technologies that could analyze encrypted data without revealing facts about individuals. Two more workshops will be held in other cities, one focusing on ethics and the other on law.

Read more…

Comment

The technical aspects of privacy

The first of three public workshops kicked off a conversation with the federal government on data privacy in the US.

Thrust into controversy by Edward Snowden’s first revelations last year, President Obama belatedly welcomed a “conversation” about privacy. As cynical as you may feel about US spying, that conversation with the federal government has now begun. In particular, the first of three public workshops took place Monday at MIT.

Given the locale, a focus on the technical aspects of privacy was appropriate for this discussion. Speakers cheered about the value of data (invoking the “big data” buzzword often), delineated the trade-offs between accumulating useful data and preserving privacy, and introduced technologies that could analyze encrypted data without revealing facts about individuals. Two more workshops will be held in other cities, one focusing on ethics and the other on law. Read more…

Comments: 7

Healthcare Lessons from the Data Sages at Strata

Other industries can show health care the way

This article was written with Ellen M. Martin.

Most healthcare clinicians don’t often think about donating or sharing data. Yet, after hearing Stephen Friend of Sage Bionetworks talk about involving citizens and patients in the field of genetic research at StrataRx 2012, I was curious to learn more.

McKinsey points out the 300 billion dollars in potential savings from using open data in healthcare, while a recent IBM Institute of Business Value study showed the need for corporate data collaboration.

Also, during my own research for Big Data in Healthcare: Hype and Hope, the resounding request from all the participants I interviewed was to “find more data streams to analyze.”

Read more…

Comment