"analytics" entries

Five big data predictions for 2013

Diversity and manageability are big data watchwords for the next 12 months.

Here are some of the key big data themes I expect to dominate 2013, and which we will of course be covering at Strata.

Emergence of a big data architecture

Photo: Leadenhall Building skyscraper Under Construction by Martin Pettitt, on Flickr

The coming year will mark the graduation of many big data pilot projects, as they are put into production. With that comes an understanding of the practical architectures that work. These architectures will identify:

  • best of breed tools for different purposes, for instance, Storm for streaming data acquisition
  • appropriate roles for relational databases, Hadoop, NoSQL stores and in-memory databases
  • how to combine existing data warehouses and analytical databases with Hadoop

Of course, these architectures will be in constant evolution as big data tooling matures and experience is gained.

In parallel, I expect to see increasing understanding of where big data responsibility sits within a company’s org chart. Big data is fundamentally a business problem, and some of the biggest challenges in taking advantage of it lie in the changes required to cross organizational silos and reform decision making.

One to watch: it’s hard to move data, so look for a starring architectural role for HDFS for the foreseeable future. Read more…

Four short links: 14 November 2012

Win95 Tips, Obama's Big Data, Aggregate Statistics, and Foxconn Robots

  1. Windows 95 Tips — hilarious tumblr showing the dark side of life through Windows 95 UI tips. (via Juha Saarinen)
  2. Everything We Know About Obama’s Big Data Operation (Pro Publica) — “White suburban women? They’re not all the same. The Latino community is very diverse with very different interests,” Dan Wagner, the campaign’s chief analytics officer, told The Los Angeles Times. “What the data permits you to do is figure out that diversity.”
  3. cube (GitHub) — time-series data collection and analysis. Cube lets you compute aggregate statistics post hoc. It also enables richer analysis, such as quantiles and histograms of arbitrary event sets. Cube is built on MongoDB and available under the Apache License on GitHub.
  4. 1M Robots to Replace 1M Human Jobs at Foxconn (Singularity Hub) — Foxconn plant opening, making manufacturing robots, and they appear to be dogfooding by using them in other plants. $25k each, 10k+ made, and fits into the pattern: the number of operational robots in China increased by 42 percent from 2010 to 2011.
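
To make the "aggregate statistics post hoc" idea from the Cube item concrete, here is a minimal sketch in plain Python. It does not use Cube or MongoDB, and it is not Cube's API: the event fields (`time`, `type`, `duration_ms`), the sample values, and the helper names are all invented for illustration. The point is only that if raw events are kept, a median or histogram over an arbitrary subset can be computed after the fact.

```python
from collections import Counter
from statistics import median

# Hypothetical raw events, stored as-is at collection time (invented data).
events = [
    {"time": 0, "type": "request", "duration_ms": 120},
    {"time": 5, "type": "request", "duration_ms": 80},
    {"time": 7, "type": "error", "duration_ms": 0},
    {"time": 12, "type": "request", "duration_ms": 310},
    {"time": 18, "type": "request", "duration_ms": 95},
    {"time": 25, "type": "request", "duration_ms": 150},
]


def select(events, event_type, start, stop):
    """Pick an arbitrary event set: one type, within the window [start, stop)."""
    return [e for e in events if e["type"] == event_type and start <= e["time"] < stop]


def histogram(values, bin_width):
    """Count values per bin, keyed by the lower edge of each bin."""
    return Counter((v // bin_width) * bin_width for v in values)


# Decide what to measure only now, long after the events were recorded.
durations = [e["duration_ms"] for e in select(events, "request", 0, 20)]
median_ms = median(durations)
bins = histogram(durations, 100)
```

Because nothing is pre-aggregated, changing the window or the event type is just a different call to `select`.
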
Four short links: 8 November 2012

Local Competitive Intelligence, Journalism Doesn't Scale, Winning With Big Data, Predicting the Future

  1. Closely — new startup by Perry Evans (founder of MapQuest), giving businesses a simple app to track competitors’ online deals and social media activity. Seems a genius move to me: so many businesses flounder online, “I don’t know what to do!”, so giving them a bird’s-eye view of their competition turns the problem into “do better than them!”.
  2. The FT in Play (Reuters) — very interesting point in this analysis of the Financial Times being up for sale: [Traditional] journalism doesn’t have economies of scale. The bigger that journalistic organizations become, the less efficient they get. (via Bernard Hickey)
  3. Big Data Behind Obama’s Win (Time) — huge analytics operation, very secretive, providing insights and updates on everything.
  4. How to Predict the Future — This is the story of a spreadsheet I’ve been keeping for almost twenty years. Thesis: hardware trends more useful for predicting advances than software trends. (via Kenton Kivestu)
Four short links: 30 October 2012

Sandy's Latency, Better Buttons, Inside Chargers, and Hidden Warranties

  1. Fastly’s S3 Latency Monitor — The graph represents real-time response latency for Amazon S3 as seen by Fastly’s Ashburn, VA edge server. I’ve been watching #sandy’s effect on the Internet in real-time, while listening to its effect on people in real-time. Amazing.
  2. Button Upgrade (Gizmodo) — elegant piece of button design, for sale on Shapeways.
  3. Inside a Dozen USB Chargers — amazing differences in such seemingly identical products. I love the comparison between genuine and counterfeit Apple chargers. (via Hacker News)
  4. Why Products Fail (Wired) — researcher scours the stock market filings of publicly-listed companies to extract information about warranties. Before, even information like the size of the market—how much gets paid out each year in warranty claims—was a mystery. Nobody, not analysts, not the government, not the companies themselves, knew what it was. Now Arnum can tell you. In 2011, for example, basic warranties cost US manufacturers $24.7 billion. Because of the slow economy, this is actually down, Arnum says; in 2007 it was around $28 billion. Extended warranties—warranties that customers purchase from a manufacturer or a retailer like Best Buy—account for an estimated $30.2 billion in additional claims payments. Before Arnum, this $60 billion-a-year industry was virtually invisible. Another hidden economy revealed. (via BoingBoing)

How to open an industry: data points from Strata Rx

O'Reilly conference brings together health care and data

O’Reilly’s first conference devoted to health care, Strata Rx, wrapped up earlier this week. Despite competing with at least three other conferences held around the country in the same week on various aspects of health care and technology, we drew a crowd that filled the ballroom during keynotes and spent the breaks networking more hungrily than they attacked the (healthy) food provided throughout.

Springing from O’Reilly’s Strata series about the use of data to change business and society, Strata Rx explored many other directions in health care, as a peek at the schedule will show. The keynotes were filmed and will soon appear online. The unique perspectives offered by expert speakers are evident, but what’s hard is making sense of the two days as a whole.

In this article I’ll try to show the underlying threads that tied together the many sessions about data analytics, electronic records, disruption in the health care industry, 21st-century genetics research, patient empowerment, and other themes. The essential message from the leading practitioners at Strata Rx is ultimately that no one in health care (doctors, administrators, researchers, regulators, patients) can practice their discipline in isolation any more. We are all going to have to work together.

We can’t wait for insights from others, expecting researchers to hand us ideal treatment plans or doctors to make oracular judgments. The systems are all interconnected now. And if we want healthy people, not to mention sustainable health care costs, we will have to play our roles in these systems with nuance and sophistication.

But I’ll get to this insight by steps. Let’s look at some major themes of Strata Rx. Read more…

Four short links: 15 October 2012

DIY Thermal Camera, Watching Trolls Wither, Discovering Dark Social, and Student Mobile Phone Use

  1. Cheap Thermocam — cheap thermal imaging camera, takes about a minute to capture an image. (via IEEE Spectrum)
  2. Observations on What’s Getting Downvoted (Ars Technica) — fascinating piece of social work, showing how the community polices (or reacts to) trolls. (via Hacker News)
  3. Dark Social (The Atlantic) — Just look at that graph. On the one hand, you have all the social networks that you know. They’re about 43.5 percent of our social traffic. On the other, you have this previously unmeasured darknet that’s delivering 56.5 percent of people to individual stories. This is not a niche phenomenon! It’s more than 2.5x Facebook’s impact on the site.
  4. A Tethered World — All students, across all 56 represented countries, are doing generally the same few things. Facebook and Twitter, above all else, are the predominant tools for all information use among the participants. The predominance of these few tools are creating a homogenizing influence around the world.

Advanced analytics for all in the health care system

Arijit Sengupta on the benefits of making health care analytics widely accessible within an organization.

Arijit Sengupta presents a summary of his work as the CEO of BeyondCore in the presentation “Advanced Analytics for All: Enabling business users to act on length of stay patterns at a leading hospital system.” This presentation was part of the Strata Rx Online Conference: Personalized Medicine, a preview of O’Reilly’s conference Strata Rx, highlighting the use of data in medical research and delivery.

Sengupta’s vision is to bring analytics to people throughout an organization who can use them in their work. He hopes to bring analytics that have traditionally been available only to those at the top of a large organization down to those making everyday decisions. Users of analytics should not need to know statistics or computer science. In this presentation, he shows how hospital employees can correlate the length of a hospital stay with other variables.

Key points from Sengupta’s session include: Read more…

Four short links: 16 July 2012

Open Access, Emergency Social Media, A/B Testing Traps, and Post-Moore Sequencing Costs

  1. Britain To Provide Free Access to Scientific Publications (Guardian) — the Finch report is being implemented! British universities now pay around £200m a year in subscription fees to journal publishers, but under the new scheme, authors will pay “article processing charges” (APCs) to have their papers peer reviewed, edited and made freely available online. The typical APC is around £2,000 per article.
  2. Social Media in an Emergency: A Best Practice Guide — from the Wellington City Council in New Zealand, who have been learning from Christchurch earthquakes and Tauranga’s oil spill.
  3. Trustworthy Online Controlled Experiments: Five Puzzling Outcomes Explained (PDF) — Microsoft Research dug into A/B tests done on Bing and reveals some subtle truths. The statistical theory of controlled experiments is well understood, but the devil is in the details and the difference between theory and practice is greater in practice than in theory […] Generating numbers is easy; generating numbers you should trust is hard! (via Greg Linden)
  4. Data Sequencing Costs (National Human Genome Research Institute) — Cost-per-megabase and cost-per-genome are dropping faster than Moore’s Law now that they’ve introduced “second generation techniques” for sequencing, aka “high-throughput sequencing” or a parallelization of the process. (via JP Rangaswami)
Four short links: 4 July 2012

Inside Anonymous, Kanban Board, Extending Objective C, and Football Graphs

  1. How Anonymous Works (Wired) — Quinn Norton explains how the decentralized Anonymous operates, and how the transition to political activism happened. Required reading to understand post-state post-structure organisations, and to make sense of this chaotic unpredictable entity.
  2. Kanban For 1 — very nice progress board for tasks, for the lifehackers who want to apply agile software tools to the rest of their life.
  3. libextobj (GitHub) — library of extensions to Objective C to support patterns from other languages. (via Ian Kallen)
  4. Graph Theory to Understand Football (Tech Review) — players are nodes, passes build edges, and you can see strengths and strategies of teams in the resulting graphs.
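
The "players are nodes, passes build edges" idea from the football item can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the method from the linked article: the pass log, the player labels, and the "involvement" measure (passes made plus passes received) are all invented here.

```python
from collections import Counter, defaultdict

# Hypothetical pass log: (passer, receiver) pairs with invented player labels.
passes = [
    ("A", "B"), ("B", "C"), ("A", "B"),
    ("C", "A"), ("B", "C"), ("C", "D"),
]


def build_pass_graph(passes):
    """Directed, weighted graph: each edge (passer, receiver) maps to a pass count."""
    return Counter(passes)


def involvement(edges):
    """Weighted degree per player: passes made plus passes received."""
    totals = defaultdict(int)
    for (passer, receiver), count in edges.items():
        totals[passer] += count
        totals[receiver] += count
    return dict(totals)


edges = build_pass_graph(passes)
degree = involvement(edges)

# Heaviest edges hint at a team's preferred passing routes.
for (passer, receiver), count in edges.most_common(2):
    print(f"{passer} -> {receiver}: {count} passes")
```

Weighted degree is only one of many possible measures; it is used here because it needs nothing beyond the standard library.
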
Four short links: 2 July 2012

Predictive Policing, Public Sector Tech Benefits, Wireless Joystick on a Ring, and Recruiter Honeypot

  1. Predicting Crime Before It Occurs (SFGate) — The new program used by LAPD and police in the Northern California city of Santa Cruz is more timely and precise, proponents said. Built on the same model for predicting aftershocks following an earthquake, the software promises to show officers what might be coming based on simple, constantly calibrated data — location, time and type of crime. The software generates prediction boxes — as small as 500 square feet — on a patrol map. When officers have spare time, they are told to “go in the box.”
  2. Realising Benefits From Six Public Sector Technology Projects (PDF) — New Zealand report from the Auditor-General. Conclusion specifically calls out agile development, open source, and open data as technology tools that helped deliver success.
  3. Ringbow (Kickstarter) — a D-pad style joystick controller, built into a ring and designed for use with touchscreen games.
  4. The Recruiter Honeypot (Elaine Wherry) — Brilliant! Trying to ramp up Meebo’s staff, Elaine created a fake employee profile to see where recruiters hunted and to identify the best. Her lessons are great advice for anyone also trying to hire up fast in the Bay Area. Worth reading if only for the squicky stories of sleazy recruiters.