ENTRIES TAGGED "Big Data"
Fault-Tolerant Resilient Yadda Yadda, Tour Tips, Punch Cards, and Public Credit
- Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing (PDF) — Berkeley research paper behind Apache Spark. (via Nelson Minar)
- Angular Tour — trivially add tour tips (“This is the widget basket, drag and drop for widget goodness!” type of thing) to your Angular app.
- Punchcard — generate Github-style punch card charts “with ease”.
- Where Credit Belongs for Hack (Bryan O’Sullivan) — public credit for individual contributors in a piece of corporate open source is a sign of confidence in your team, that building their public reputation isn’t going to result in them leaving for one of the many job offers they’ll receive. And, of course, of caring for your individual contributors. Kudos Facebook.
Establishing protocols to socialize wearable devices.
The age of ubiquitous computing is accelerating, and it’s creating some interesting social turbulence, particularly where wearable hardware is concerned. Intelligent devices other than phones and screens — smart headsets, glasses, watches, bracelets — are insinuating themselves into our daily lives. The technology for even less intrusive mechanisms, such as jewelry, buttons, and implants, exists and will ultimately find commercial applications.
And as sensor-and-software-augmented devices and wireless connections proliferate through the environment, it will be increasingly difficult to determine who is connected — and how deeply — and how the data each of us generates is disseminated, captured and employed. We’re already seeing some early signs of wearable angst: recent confrontations in bars and restaurants between those wearing Google Glass and others worried they were being recorded.
This is nothing new, of course. Many major technological developments experienced their share of turbulent transitions. Ultimately, though, the benefits of wearable computers and a connected environment are likely to prove too seductive to resist. People will participate and tolerate because the upside outweighs the downside.
Game Patterns, What Next, GPU vs CPU, and Privacy with Sensors
- Game Programming Patterns — a book in progress.
- Search for the Next Platform (Fred Wilson) — Mobile is now the last thing. And all of these big tech companies are looking for the next thing to make sure they don’t miss it. And they will pay real money (to you and me) for a call option on the next thing.
- Debunking the 100X GPU vs. CPU Myth — in Pete Warden’s words, “in a lot of real applications any speed gains on the computation side are swamped by the time it takes to transfer data to and from the graphics card.”
- Privacy in Sensor-Driven Human Data Collection (PDF) — see especially the section “Attacks Against Privacy”. More generally, it is often the case that the data released by researchers is not the source of privacy issues, but the unexpected inferences that can be drawn from it. (via Pete Warden)
- Mining the World’s Data by Selling Street Lights and Farm Drones (Quartz) — Depending on what kinds of sensors the light’s owners choose to install, Sensity’s fixtures can track everything from how much power the lights themselves are consuming to movement under the post, ambient light, and temperature. More sophisticated sensors can measure pollution levels, radiation, and particulate matter (for air quality levels). The fixtures can also support sound or video recording. Bring these lights onto city streets and you could isolate the precise location of a gunshot within seconds.
- An Investor’s Guide to Hardware Startups — good to know if you’re thinking of joining one, too.
- WebScaleSQL — a MySQL downstream patchset built for “large scale” (aka Google, Facebook type loads).
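Pete Warden's point in the GPU vs. CPU item above can be put into rough numbers. The sketch below is not taken from the linked article. It assumes a deliberately simple model in which end-to-end GPU time is the accelerated compute time plus a fixed data-transfer cost, and the function name and every figure in it are hypothetical.

```python
def effective_speedup(cpu_seconds, kernel_speedup, transfer_seconds):
    """End-to-end speedup once host-to-card transfer time is included.

    Assumed model (not from the linked article): the GPU runs the
    computation kernel_speedup times faster than the CPU, but pays
    transfer_seconds to move data to and from the graphics card.
    """
    gpu_seconds = cpu_seconds / kernel_speedup + transfer_seconds
    return cpu_seconds / gpu_seconds


# Hypothetical workload: 1 second on the CPU, a 100x faster kernel,
# and half a second spent copying data over the bus.
with_transfer = effective_speedup(1.0, 100.0, 0.5)
without_transfer = effective_speedup(1.0, 100.0, 0.0)
print(f"ignoring transfers: {without_transfer:.1f}x")
print(f"including transfers: {with_transfer:.2f}x")
```

Under these made-up numbers the headline 100x shrinks to a little under 2x, because the transfer term dominates the denominator.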
Google Flu, Embeddable JS, Data Analysis, and Belief in the Browser
- The Parable of Google Flu (PDF) — We explore two issues that contributed to [Google Flu Trends]’s mistakes—big data hubris and algorithm dynamics—and offer lessons for moving forward in the big data age. Overtrained and underfed?
- Principles of Good Data Analysis (Greg Reda) — Once you’ve settled on your approach and data sources, you need to make sure you understand how the data was generated or captured, especially if you are using your own company’s data. Treble so if you are using data you snaffled off the net, riddled with collection bias and untold omissions. (via Stijn Debrouwere)
Data tools are less important than the way you frame your questions.
Max Shron and Jake Porway spoke with me at Strata a few weeks ago about frameworks for making reasoned arguments with data. Max’s recent O’Reilly book, Thinking with Data, outlines the crucial process of developing good questions and creating a plan to answer them. Jake’s nonprofit, DataKind, connects data scientists with worthy causes where they can apply their skills.
A few of the things we talked about:
- The importance of publishing negative scientific results
- GiveDirectly, an organization that facilitates donations directly to households in Kenya and Uganda. GiveDirectly was able to model income using satellite data to distinguish thatched roofs from metal roofs.
- Moritz Stefaner calling for a “macroscope”
- Project Cybersyn, Salvador Allende’s plan for encompassing the entire Chilean economy in a single real-time computer system
- Seeing Like a State: How Certain Schemes to Improve the Human Condition Have Failed by James C. Scott
After we recorded this podcast episode at Strata Santa Clara, Max presided over a webcast on his book that’s archived here.
On Managers, Human Data, Driverless Cars, and Bad Business
- On Managers (Mike Migurski) — Managers might be difficult, hostile, or useless, but because they are parts of an explicit power structure they can be evaluated explicitly.
- Big Data: Humans Required (Sherri Hammons) — the heart of the problem with data: interpretation. Data by itself is of little value. It is only when it is interpreted and understood that it begins to become information. GovTech recently wrote an article outlining why search engines will not likely replace actual people in the near future. If it were merely a question of pointing technology at the problem, we could all go home and wait for the Answer to Everything. But, data doesn’t happen that way. Data is very much like a computer: it will do just as it’s told. No more, no less. A human is required to really understand what data makes sense and what doesn’t. (via Anne Zelenka)
- Morgan Stanley on the Economic Benefits of Driverless Cars — The total savings of over $5.6 trillion annually are not envisioned for a couple of decades, as Morgan Stanley see four phases of adoption of self-driving vehicles. Phase 1 is already underway, Phase 2 will be semi-autonomous, Phase 3 will be within 5 to 10 years, by which time we will see fully self-driving vehicles on the roads – but not widespread usage. The authors say Phase 4, which will have the biggest impact, is when 100% of all vehicles on the roads will be fully autonomous; they say this may take a couple of decades.
- Worse (Marco Arment) — I’ve been sitting on this but can’t fault it. In the last few years, Google, Apple, Amazon, Facebook, and Twitter have all made huge attempts to move into major parts of each others’ businesses, usually at the detriment of their customers or users.
Such lists might mean we miss the truly great breakthroughs, inspirations, and leaps of faith necessary to evolve.
Editor’s note: this post originally appeared on Tilt the Windmill; it is republished here with permission.
First: it’s an excellent post. You should read it. I’ll wait.
Every enterprise decision-maker will soon be running their business according to the lists Barry envisions, as the power of big data and analytics finds its way into every boardroom and dashboard. Society will soon demand them, too. But while such analysis is tremendously valuable, it carries two dangers: the politics of setting criteria, and the trap of relying on data for inspiration.
The harsh light of data
Barry is right: rather than using our precious time and resources to make yet another linkbait list of the 50 cutest kittens, or the seven people I’ll try to avoid at SXSW, we should use abundant data and a connected world to build lists that matter: lying politicians, bad cars, lousy doctors. Then we can use these lists to change policy and behaviour because we’ll make things transparent. Shining the harsh light of data on something can improve it.
Parallel Programming, Malignant Computation, Politicised GDS, and Data Stream Toolkit
- Is Parallel Programming Hard? And, If So, What Can You Do About It? — book by Paul E. McKenney, on single-machine multi-CPU parallel programming.
- Malignant Computation — The bitcoin mining network would work just as well if it had far less computation devoted to it. Bitcoins would be mined at exactly the same rate if 1/2 or 1/4 of the computational resources were devoted. This means that bitcoin has incentivized a tremendous amount of computational busy work.
- GDS Becomes Political (Computer Weekly) — She [Opposition MP] said that digital should not be about imposing a way of working on the public sector – Labour is not fond of the “digital by default” mantra – but about supporting public service delivery. [...] “When this government decided upon the digitalisation of this [online job search] service they apparently did not take into account those with poor literacy skills, mental health issues or learning difficulties – who, as most people would have predicted, make up a higher-than-average proportion of the unemployed.”
- streamtools (Github) — a graphical toolkit for dealing with streams of data. Streamtools makes it easy to explore, analyse, modify and learn from streams of data. (via OpenNews)
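The claim in the Malignant Computation item above rests on Bitcoin's difficulty adjustment, which rescales the proof-of-work target so that blocks keep arriving about every ten minutes whatever the network's total hash rate. The following is a minimal sketch of that arithmetic, not code from the linked post. It works with expected values only, assumes roughly 2^32 expected hashes per unit of difficulty, and uses hypothetical hash-rate figures and function names. The real network retargets once every 2016 blocks from block timestamps.

```python
TARGET_INTERVAL_S = 600.0        # Bitcoin aims for one block every ten minutes
HASHES_PER_DIFFICULTY = 2 ** 32  # approximate expected hashes per block at difficulty 1


def expected_interval(hashrate, difficulty):
    """Expected seconds between blocks at a network hash rate in hashes/s."""
    return difficulty * HASHES_PER_DIFFICULTY / hashrate


def retarget(difficulty, observed_interval):
    """Rescale difficulty to pull the block interval back to the target.

    A single adjustment is limited to a factor of four in either direction.
    """
    ratio = TARGET_INTERVAL_S / observed_interval
    ratio = max(0.25, min(4.0, ratio))
    return difficulty * ratio


def settled_interval(hashrate, difficulty, rounds=5):
    """Block interval after a few retargeting rounds at a fixed hash rate."""
    for _ in range(rounds):
        difficulty = retarget(difficulty, expected_interval(hashrate, difficulty))
    return expected_interval(hashrate, difficulty)


# Hypothetical network hash rate, with difficulty tuned for ten-minute blocks.
full_hashrate = 1e18
start_difficulty = full_hashrate * TARGET_INTERVAL_S / HASHES_PER_DIFFICULTY

for fraction in (1.0, 0.5, 0.25):
    interval = settled_interval(full_hashrate * fraction, start_difficulty)
    print(f"{fraction:.2f} of the hash rate: one block every {interval:.0f} s")
```

In this model the interval settles back to the ten-minute target for each fraction, which is the sense in which the extra computation buys no additional coins. There is a transient: blocks arrive more slowly until the next adjustment catches up.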
Game Analysis, Brave New (Disney)World, Internet of Deadly Things, and Engagement vs Sharing
- In-Game Graph Analysis (The Economist) — one MLB team has bought a Cray Urika graph-processing appliance for in-game analysis of data. Please hold, boggling. (via Courtney Nash)
- Disney Bets $1B on Technology (BusinessWeek) — MyMagic+ promises far more radical change. It’s a sweeping reservation and ride planning system that allows for bookings months in advance on a website or smartphone app. Bracelets called MagicBands, which link electronically to an encrypted database of visitor information, serve as admission tickets, hotel keys, and credit or debit cards; a tap against a sensor pays for food or trinkets. The bands have radio frequency identification (RFID) chips—which critics derisively call spychips because of their ability to monitor people and things. (via Jim Stogdill)
- Stupid Smart Stuff (Don Norman) — In the airplane, the pilots are not attending, but when trouble does arise, the extremely well-trained pilots have several minutes to respond. In the automobile, when trouble arises, the ill-trained drivers will have one or two seconds to respond. Automobile designers – and law makers – have ignored this information.
- What You Think You Know About the Web Is Wrong — Chartbeat looked at deep user behavior across 2 billion visits across the web over the course of a month and found that most people who click don’t read. In fact, a stunning 55% spent fewer than 15 seconds actively on a page. The stats get a little better if you filter purely for article pages, but even then one in every three visitors spend less than 15 seconds reading articles they land on. The entire article makes some powerful points about the difference between what’s engaged with and what’s shared. Articles that were clicked on and engaged with tended to be actual news. In August, the best performers were Obamacare, Edward Snowden, Syria and George Zimmerman, while in January the debates around Woody Allen and Richard Sherman dominated. The most clicked on but least deeply engaged-with articles had topics that were more generic. In August, the worst performers included Top, Best, Biggest, Fictional etc while in January the worst performers included Hairstyles, Positions, Nude and, for some reason, Virginia. That’s data for you.