- Welcome to the Malware-Industrial Complex (MIT) — brilliant phrase, sound analysis.
- Stupid Stupid xBox — The hardcore/soft-tv transition and any lead they feel they have is simply not defensible by licensing other industries’ generic video or music content because those industries will gladly sell and license the same content to all other players. A single custom studio of 150 employees also can not generate enough content to defensibly satisfy 76M+ customers. Only with quality primary software content from thousands of independent developers can you defend the brand and the product. Only by making the user experience simple, quick, and seamless can you defend the brand and the product. I’ve never seen a better-put statement of why an ecosystem of indies is essential.
- Data Feedback Loops for TV (Salon) — Netflix’s data indicated that the same subscribers who loved the original BBC production also gobbled down movies starring Kevin Spacey or directed by David Fincher. Therefore, concluded Netflix executives, a remake of the BBC drama with Spacey and Fincher attached was a no-brainer, to the point that the company committed $100 million for two 13-episode seasons.
- wrk — a modern HTTP benchmarking tool capable of generating significant load when run on a single multi-core CPU. It combines a multithreaded design with scalable event notification systems such as epoll and kqueue.
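The design the wrk item describes (several threads, each multiplexing many connections through an event notification API) can be sketched in a few lines of Python. This is only an illustration of the pattern, not how wrk itself is implemented: `worker`, `run_load`, and `REQUEST` are hypothetical names, the connect and send steps are left blocking for brevity, and each request uses HTTP/1.0 so that the server closing the socket marks the end of a response. Python's `selectors.DefaultSelector` picks epoll on Linux and kqueue on BSD and macOS.

```python
import selectors
import socket
import threading
import time

# Hypothetical fixed request; HTTP/1.0 makes the server close after replying.
REQUEST = b"GET / HTTP/1.0\r\nHost: localhost\r\n\r\n"


def worker(host, port, connections, requests, latencies):
    """One thread, one event loop: multiplex up to `connections` sockets."""
    sel = selectors.DefaultSelector()  # epoll or kqueue where available
    started = 0
    done = 0

    def open_connection():
        nonlocal started
        sock = socket.create_connection((host, port))  # blocking connect, for brevity
        sock.sendall(REQUEST)                          # tiny request, blocking send
        sock.setblocking(False)
        # Remember when the request was sent so latency can be computed later.
        sel.register(sock, selectors.EVENT_READ, data=time.perf_counter())
        started += 1

    for _ in range(min(connections, requests)):
        open_connection()

    while done < requests:
        events = sel.select(timeout=5)
        if not events:
            break  # nothing became readable in time; give up on the rest
        for key, _mask in events:
            try:
                chunk = key.fileobj.recv(65536)
            except BlockingIOError:
                continue      # spurious wakeup; wait for the next event
            except ConnectionError:
                chunk = b""   # treat a reset as a finished request, for simplicity
            if chunk:
                continue      # response still arriving
            # Empty read: the server closed the socket, so the response is complete.
            latencies.append(time.perf_counter() - key.data)
            sel.unregister(key.fileobj)
            key.fileobj.close()
            done += 1
            if started < requests:
                open_connection()

    for key in list(sel.get_map().values()):
        key.fileobj.close()   # clean up anything left after a timeout
    sel.close()


def run_load(host, port, threads=2, connections=4, requests=20):
    """Run `threads` workers, each completing `requests` requests."""
    latencies = []
    pool = [
        threading.Thread(target=worker,
                         args=(host, port, connections, requests, latencies))
        for _ in range(threads)
    ]
    start = time.perf_counter()
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return len(latencies), time.perf_counter() - start
```

Assuming a server is listening locally, `completed, elapsed = run_load("127.0.0.1", 8080)` returns the number of finished requests and the wall-clock time, from which a rough requests-per-second figure follows.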
Tips for interacting with analytics colleagues
To quote Pride and Prejudice, businesses have for many years “labored under the misapprehension” that their analytics talent was made up of misanthropes with neither the will nor the ability to communicate or work with others on strategic or creative business problems. These employees were meant to be kept in the basement out of sight, fed bad pizza, and pumped for spreadsheets to be interpreted in the sunny offices aboveground.
This perception is changing in industry as the big data phenomenon has elevated data science to a C-level priority. Suddenly folks once stereotyped by characters like Milton in Office Space are now “sexy.” The truth is there have always been well-rounded, articulate, friendly analytics professionals (they may just like Battlestar more than you), and now that analytics is an essential business function, personalities of all types are being attracted to practice the discipline.
Malware Industrial Complex, Indies Needed, TV Analytics, and HTTP Benchmarking
Handmade Hardware, Tab Silencer, Surprise and Models, and Sciencey GIFs
- Your USB Sticks Are Made With Chopsticks (Bunnie Huang) — behind-the-scenes on how USB sticks are made.
- mutetab — find and kill the Chrome tab making all the damn noise! (via Nelson Minar)
- Visualization, Modeling, and Surprises (John D Cook) — paraphrases Hadley Wickham: Visualization can surprise you, but it doesn’t scale well. Modelling scales well, but it can’t surprise you.
- Head Like an Orange — science animated GIFs, assembled from nature documentaries. (via Ed Yong)
A deconstructed web analytics report shows what the dashboard missed.
We can all agree that in 2013 web analytics is still a nightmare, right?
The last few years have brought about an enormous expansion in the top of the web analytics information overload funnel, and today I can discover just about any aspect of my web traffic that piques my curiosity.
I know how much traffic I’m getting, who told them to come here, how they got here, how long they’re staying, what they’re looking at, what they’re using to look at it, where they’re from, and just about anything else I want to know about them. If I don’t like what I’m looking at, I can customize everything from my dashboard to reports to parameters within those reports.
What none of this tells me is how I can be more successful at turning the words I put on the Internet into dollars in my pocket.
Now, I know what you’re thinking: “It’s all there! More information than you could ever figure out what to do with.”
The problem with that is that it’s all there. It’s more information than I could ever figure out what to do with.
Free Books, Analytics Goofs, Book Boilerplate, and Learn CS with the Raspberry Pi
- Free Book Sifter — lists all the free books on Amazon, has RSS feeds and newsletters. (via BoingBoing)
- Whom the Gods Would Destroy, They First Give Realtime Analytics — a few key reasons why truly real-time analytics can open the door to a new type of (realtime!) bad decision making. [U]ser demographics could be different day over day. Or very likely, you could see a major difference in user behavior immediately upon releasing a change, only to watch it evaporate as users learn to use new functionality. Given all of these concerns, the conservative and reasonable stance is to only consider tests that last a few days or more.
- Web Book Boilerplate (GitHub) — uses plain old markdown and generates a well structured HTML version of your written words. Since it’s sitting on top of Pandoc and Grunt, you can easily make your books available for every platform. MIT-style license.
- Raspberry Pi Education Manual (PDF) — from Scratch to Python and HCI all via the Raspberry Pi. Intended to be informative and a series of lessons for teachers and students learning coding with the Raspberry Pi as their first device.
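The realtime analytics item above warns that a difference seen right after a release can evaporate as users adapt. A minimal sketch of that effect, using a standard two-proportion z-test: the daily counts below are invented purely for illustration (they are not from the linked article), and `two_proportion_z` is a hypothetical helper name.

```python
from math import sqrt


def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z statistic for the difference in conversion rate, B minus A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se


# Invented data: 1,000 users per arm per day for one week.
USERS_PER_DAY = 1000
control = [100, 100, 100, 100, 100, 100, 100]  # conversions per day
variant = [140, 120, 105, 100, 98, 101, 100]   # novelty bump fades after launch

# Looking only at launch day, the variant appears to be a clear winner.
z_day_one = two_proportion_z(control[0], USERS_PER_DAY,
                             variant[0], USERS_PER_DAY)

# Over the whole week, the same comparison is much weaker.
total_users = USERS_PER_DAY * len(control)
z_week = two_proportion_z(sum(control), total_users,
                          sum(variant), total_users)

print(f"day one: z = {z_day_one:.2f}")
print(f"full week: z = {z_week:.2f}")
```

With these made-up numbers the launch-day statistic clears the conventional 1.96 threshold while the full-week statistic does not, which is the kind of reversal the item cites as a reason to let tests run for a few days or more.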
Diversity and manageability are big data watchwords for the next 12 months.
Here are some of the key big data themes I expect to dominate 2013, and which will, of course, be covered at Strata.
Emergence of a big data architecture
The coming year will mark the graduation of many big data pilot projects as they are put into production. With that comes an understanding of the practical architectures that work. These architectures will identify:
- best-of-breed tools for different purposes, for instance, Storm for streaming data acquisition
- appropriate roles for relational databases, Hadoop, NoSQL stores and in-memory databases
- how to combine existing data warehouses and analytical databases with Hadoop
Of course, these architectures will be in constant evolution as big data tooling matures and experience is gained.
In parallel, I expect to see increasing understanding of where big data responsibility sits within a company’s org chart. Big data is fundamentally a business problem, and some of the biggest challenges in taking advantage of it lie in the changes required to cross organizational silos and reform decision making.
One to watch: it’s hard to move data, so look for a starring architectural role for HDFS for the foreseeable future.
Win95 Tips, Obama's Big Data, Aggregate Statistics, and Foxconn Robots
- Windows 95 Tips — hilarious Tumblr showing the dark side of life through Windows 95 UI tips. (via Juha Saarinen)
- Everything We Know About Obama’s Big Data Operation (Pro Publica) — “White suburban women? They’re not all the same. The Latino community is very diverse with very different interests,” Dan Wagner, the campaign’s chief analytics officer, told The Los Angeles Times. “What the data permits you to do is figure out that diversity.”
- cube (GitHub) — time-series data collection and analysis. Cube lets you compute aggregate statistics post hoc. It also enables richer analysis, such as quantiles and histograms of arbitrary event sets. Cube is built on MongoDB and available under the Apache License on GitHub.
- 1M Robots to Replace 1M Human Jobs at Foxconn (Singularity Hub) — Foxconn is opening a plant that makes manufacturing robots, and it appears to be dogfooding them in its other plants. $25k each, 10k+ made, and it fits into the pattern: the number of operational robots in China increased by 42 percent from 2010 to 2011.
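The cube item above turns on one idea: keep raw events, then compute aggregates post hoc over whatever subset turns out to matter. A rough sketch of that idea follows. It does not use Cube's actual API or MongoDB; the in-memory `EVENTS` list, its field names, and the `select` and `histogram` helpers are all assumptions made for illustration.

```python
from collections import Counter
from statistics import quantiles

# Hypothetical raw events, stored unaggregated so new questions can be asked later.
EVENTS = [
    {"type": "request", "path": "/", "duration_ms": 12},
    {"type": "request", "path": "/", "duration_ms": 18},
    {"type": "request", "path": "/search", "duration_ms": 140},
    {"type": "request", "path": "/", "duration_ms": 25},
    {"type": "request", "path": "/", "duration_ms": 31},
    {"type": "error", "path": "/search", "duration_ms": 900},
    {"type": "request", "path": "/", "duration_ms": 47},
]


def select(events, **criteria):
    """Pick an arbitrary event set by exact match on any fields."""
    return [e for e in events
            if all(e.get(k) == v for k, v in criteria.items())]


def histogram(values, bucket_width):
    """Count values per fixed-width bucket, keyed by each bucket's lower bound."""
    counts = Counter((v // bucket_width) * bucket_width for v in values)
    return dict(sorted(counts.items()))


# Decide after the fact to study request latency on the home page only.
home = [e["duration_ms"] for e in select(EVENTS, type="request", path="/")]
quartiles = quantiles(home, n=4)          # three cut points: Q1, median, Q3
buckets = histogram(home, bucket_width=10)

print("quartiles:", quartiles)
print("histogram:", buckets)
```

Because nothing is pre-aggregated, swapping the `select` criteria (say, errors on `/search`) needs no change to how events were collected, at the cost of scanning raw events for every query.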
Sandy's Latency, Better Buttons, Inside Chargers, and Hidden Warranties
- Fastly’s S3 Latency Monitor — The graph represents real-time response latency for Amazon S3 as seen by Fastly’s Ashburn, VA edge server. I’ve been watching #sandy’s effect on the Internet in real-time, while listening to its effect on people in real-time. Amazing.
- Button Upgrade (Gizmodo) — elegant piece of button design, for sale on Shapeways.
- Inside a Dozen USB Chargers — amazing differences in such seemingly identical products. I love the comparison between genuine and counterfeit Apple chargers. (via Hacker News)
- Why Products Fail (Wired) — researcher scours the stock market filings of publicly-listed companies to extract information about warranties. Before, even information like the size of the market—how much gets paid out each year in warranty claims—was a mystery. Nobody, not analysts, not the government, not the companies themselves, knew what it was. Now Arnum can tell you. In 2011, for example, basic warranties cost US manufacturers $24.7 billion. Because of the slow economy, this is actually down, Arnum says; in 2007 it was around $28 billion. Extended warranties—warranties that customers purchase from a manufacturer or a retailer like Best Buy—account for an estimated $30.2 billion in additional claims payments. Before Arnum, this $60 billion-a-year industry was virtually invisible. Another hidden economy revealed. (via BoingBoing)
O'Reilly conference brings together health care and data
O’Reilly’s first conference devoted to health care, Strata Rx, wrapped up earlier this week. Despite competing with at least three other conferences held the same week around the country on various aspects of health care and technology, we drew a crowd that filled the ballroom during keynotes and spent the breaks networking more hungrily than they attacked the (healthy) food provided throughout.
Springing from O’Reilly’s Strata series about the use of data to change business and society, Strata Rx explored many other directions in health care, as a peek at the schedule will show. The keynotes were filmed and will soon appear online. The unique perspectives offered by expert speakers are evident, but what’s hard is making sense of the two days as a whole.
In this article I’ll try to show the underlying threads that tied together the many sessions about data analytics, electronic records, disruption in the health care industry, 21st-century genetics research, patient empowerment, and other themes. The essential message from the leading practitioners at Strata Rx is ultimately that no one in health care (doctors, administrators, researchers, regulators, patients) can practice their discipline in isolation any more. We are all going to have to work together.
We can’t wait for insights from others, expecting researchers to hand us ideal treatment plans or doctors to make oracular judgments. The systems are all interconnected now. And if we want healthy people, not to mention sustainable health care costs, we will have to play our roles in these systems with nuance and sophistication.
But I’ll get to this insight by steps. Let’s look at some major themes of Strata Rx.