"data" entries

Four short links: 4 November 2014

3D Shares, Autonomous Golf Carts, Competitive Solar, and Interesting Data Problems

  1. Cooper-Hewitt Shows How to Share 3D Scan Data Right (Makezine) — important as we move to a web of physical models, maps, and designs.
  2. Singapore Tests Autonomous Golfcarts (Robohub) — a reminder that the future may not necessarily look like someone used the clone tool to paint Silicon Valley over the world.
  3. Solar Hits Parity in 10 States, 47 by 2016 (Bloomberg) — The reason solar-power generation will increasingly dominate: it’s a technology, not a fuel. As such, efficiency increases and prices fall as time goes on. The price of Earth’s limited fossil fuels tends to go the other direction.
  4. Facebook’s Top Open Data Problems (Facebook Research) — even if you’re not interested in Facebook’s Very First World Problems, this is full of factoids like this one: Facebook’s social graph store TAO, for example, provides access to tens of petabytes of data, but answers most queries by checking a single page in a single machine. (via Greg Linden)
Four short links: 29 October 2014

Tweet Parsing, Focus and Money, Challenging Open Data Beliefs, and Exploring ISP Data

  1. TweetNLP — CMU open source natural language parsing tools for making sense of Tweets.
  2. Interview with Google X Life Science’s Head (Medium) — I will have been here two years this March. In nineteen months we have been able to hire more than a hundred scientists to work on this. We’ve been able to build customized labs and get the equipment to make nanoparticles and decorate them and functionalize them. We’ve been able to strike up collaborations with MIT and Stanford and Duke. We’ve been able to initiate protocols and partnerships with companies like Novartis. We’ve been able to initiate trials like the baseline trial. This would be a good decade somewhere else. The power of focus and money.
  3. Schooloscope Open Data Post-Mortem: The case of Schooloscope and the wider question of public access to school data challenges the belief that sunlight is the best disinfectant, that government transparency would always lead to better government, better results. It challenges the sentiments that see data as value-neutral and its representation as devoid of politics. In fact, access to school data exposes a sharp contrast between the private interest of the family (best education for my child) and the public interest of the government (best education for all citizens).
  4. M-Lab Observatory — explorable data on the data experience (RTT, upload speed, etc) across different ISPs in different geographies over time.

Signals from Strata + Hadoop World New York 2014

From unique data applications to factories of the future, here are key insights from Strata + Hadoop World New York 2014.

Experts from across the data world came together in New York City for Strata + Hadoop World New York 2014. Below we’ve assembled notable keynotes, interviews, and insights from the event.

Unusual data applications and the correct way to say “Hadoop”

Hadoop creator and Cloudera chief architect Doug Cutting discusses surprising data applications — from dating sites to premature babies — and he reveals the proper (but in no way required) pronunciation of “Hadoop.”

What happens when fashion meets data: The O’Reilly Radar Podcast

Liza Kindred on the evolving role of data in fashion and the growing relationship between tech and fashion companies.

Editor’s note: you can subscribe to the O’Reilly Radar Podcast through iTunes, SoundCloud, or directly through our podcast’s RSS feed.

In this podcast episode, I talk with Liza Kindred, founder of Third Wave Fashion and author of the new free report “Fashioning Data: How fashion industry leaders innovate with data and what you can learn from what they know.” Kindred addresses the evolving role data and analytics are playing in the fashion industry, and the emerging connections between technology and fashion companies. “One of the things that fashion is doing better than maybe any other industry,” Kindred says, “is facilitating conversations with users.”

Gathering and analyzing user data creates opportunities for the fashion and tech industries alike. One example of this is the trend toward customization.

Four short links: 15 September 2014

Weird Machines, Libraries May Scan, Causal Effects, and Crappy Dashboards

  1. The Care and Feeding of Weird Machines Found in Executable Metadata (YouTube) — talk from 29th Chaos Communication Congress, on tricking the ELF linker/loader into arbitrary computation from the metadata supplied. Yes, there’s a brainfuck compiler that turns code into metadata which is then, through a supernatural mix of pixies, steam engines, and binary, executed. This will make your brain leak. Weird machines are everywhere.
  2. European Libraries May Digitise Books Without Permission: “The right of libraries to communicate, by dedicated terminals, the works they hold in their collections would risk being rendered largely meaningless, or indeed ineffective, if they did not have an ancillary right to digitize the works in question,” the court said. Even if the rights holder offers a library the possibility of licensing his works on appropriate terms, the library can use the exception to publish works on electronic terminals, the court ruled. “Otherwise, the library could not realize its core mission or promote the public interest in promoting research and private study,” it said.
  3. CausalImpact (GitHub) — Google’s R package for estimating the causal effect of a designed intervention on a time series. (via Google Open Source Blog)
  4. Laws of Crappy Dashboards — (caution, NSFW language … “crappy” is my paraphrase) so true. Not talking to users will result in a [crappy] dashboard. You don’t know if the dashboard is going to be useful. But you don’t talk to the users to figure it out. Or you just show it to them for a minute (with someone else’s data), never giving them a chance to figure out what the hell they could do with it if you gave it to them.
Four short links: 3 September 2014

Distributed Systems Theory, Chinese Manufacturing, Quantified Infant, and Celebrity Data Theft

  1. Distributed Systems Theory for the Distributed Systems Engineer: I tried to come up with a list of what I consider the basic concepts that are applicable to my every-day job as a distributed systems engineer; what I consider ‘table stakes’ for distributed systems engineers competent enough to design a new system.
  2. Shenzhen Trip Report (Joi Ito) — full of fascinating observations about how the balance of manufacturing strength has shifted in surprising ways. The retail price of the cheapest full featured phone is about $9. Yes. $9. This could not be designed in the US – this could only be designed by engineers with tooling grease under their fingernails who knew the manufacturing equipment inside and out, as well as the state of the art of high-end mobile phones.
  3. Sproutling: The world’s first sensing, learning, predicting baby monitor. A wearable band for your baby, a smart charger and a mobile app work together to not only monitor more effectively but learn and predict your baby’s sleep habits and optimal sleep conditions. (via Wired)
  4. Notes on the Celebrity Data Theft — wonderfully detailed analysis of how photos were lifted, and the underground industry built around them. This was one of the most unsettling aspects of these networks to me – knowing there are people out there who are turning over data on friends in their social networks in exchange for getting a dump of their private data.

How Flash changes the design of database storage engines

High-performing memory throws many traditional decisions overboard


Over the past decade, solid-state drives (SSDs, popularly known as Flash) have radically changed computing at both the consumer level, where USB sticks have effectively replaced CDs for transporting files, and the server level, where Flash offers a price/performance ratio radically different from both RAM and disk drives. But databases have only started to catch up during the past few years. Most still depend on internal data structures and storage management fine-tuned for spinning disks.

Citing price and performance, one author advised a wide range of database vendors to move to Flash. Certainly, a database administrator can speed up old databases just by swapping out disk drives and inserting Flash, but doing so captures just a sliver of the potential performance improvement promised by Flash. For this article, I asked several database experts — including representatives of Aerospike, Cassandra, FoundationDB, RethinkDB, and Tokutek — how Flash changes the design of storage engines for databases. The various ways these companies have responded to its promise in their database designs are instructive to readers designing applications and looking for the best storage solutions.

Four short links: 5 August 2014

Discussion Graph Tool, Superlinear Productivity, Go Concurrency, and R Map/Reduce Tools

  1. Discussion Graph Tool (Microsoft Research) — simplifies social media analysis by making it easy to extract high-level features and co-occurrence relationships from raw data.
  2. Superlinear Productivity in Collective Group Actions (PLoS ONE) — study of open source projects shows small groups exhibit non-linear productivity increases by size, which drop off at larger sizes. We document a size effect in the strength and variability of the superlinear effect, with smaller groups exhibiting widely distributed superlinear exponents, some of them characterizing highly productive teams. In contrast, large groups tend to have a smaller superlinearity and less variability.
  3. coop — cheat sheet of the most common concurrency program flows in Go.
  4. Tessera — set of open source tools around Hadoop, R, and visualization.
Four short links: 4 August 2014

Web Spreadsheet, Correlated Novelty, A/B Ethics, and Replicated Data Structures

  1. EtherCalc: open source web-based spreadsheet.
  2. Dynamics of Correlated Novelties (Nature) — paper on “the adjacent possible”. Here we propose a simple mathematical model that mimics the process of exploring a physical, biological, or conceptual space that enlarges whenever a novelty occurs. The model, a generalization of Polya’s urn, predicts statistical laws for the rate at which novelties happen (Heaps’ law) and for the probability distribution on the space explored (Zipf’s law), as well as signatures of the process by which one novelty sets the stage for another. (via Steven Strogatz)
  3. On The Media Interview with OKCupid CEO — relevant to the debate on ethics of A/B tests. I preferred this to Tim Carmody’s rant.
  4. CRDTs as Alternative to APIs: when using CRDTs to tie your system together, you don’t need to resort to using impoverished representations that simply never come anywhere near the representational power of the data structures you use in your programs at runtime. See also this paper on Convergent and Commutative Replicated Data Types.
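
For readers new to the replicated data types in the last item, here is a minimal sketch of one of the simplest, a grow-only counter (often called a G-Counter). The class name, method names, and replica IDs are illustrative assumptions, not taken from the linked post or paper. Each replica increments only its own slot, and merging takes the per-replica maximum, so replicas converge regardless of the order in which they exchange state.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GCounter:
    """Grow-only counter: a state-based CRDT sketch (hypothetical names)."""

    # Maps a replica ID to the number of increments that replica has made.
    counts: Dict[str, int] = field(default_factory=dict)

    def increment(self, replica_id: str, amount: int = 1) -> None:
        """Record increments made locally by the given replica."""
        if amount < 0:
            raise ValueError("a grow-only counter cannot be decremented")
        self.counts[replica_id] = self.counts.get(replica_id, 0) + amount

    def value(self) -> int:
        """The counter's value is the sum over all replicas."""
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> "GCounter":
        """Combine two states by taking the per-replica maximum.

        The maximum is commutative, associative, and idempotent, which is
        what lets replicas merge in any order and still agree.
        """
        merged = dict(self.counts)
        for replica_id, count in other.counts.items():
            merged[replica_id] = max(merged.get(replica_id, 0), count)
        return GCounter(merged)


if __name__ == "__main__":
    a, b = GCounter(), GCounter()
    a.increment("replica-a", 2)
    b.increment("replica-b", 3)
    print(a.merge(b).value(), b.merge(a).value())
```

Merging the same state twice changes nothing, which is why this style of CRDT tolerates duplicated or reordered messages between replicas.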

Health games platforms mature in preparation for mainstream adoption

Business models and sustainability will drive success in the health games space.


SPARX, a behavioral therapy game for youths, combines a fantasy setting with skills for life.

For the past several years, researchers have strived to create compelling games that improve behavior, reduce stress, or teach healthy responses to difficult life situations. Such healthy games tend to arise in research settings because of the need to demonstrate clinically that the games are effective. I have covered such efforts in my postings from the Games for Health conference in 2012 and 2013.

These efforts have borne fruit, and clinical trials have shown the value of many such games. Ben Sawyer, who founded the Games for Health conference more than 10 years ago, is watching all the pieces fall into place for the widespread adoption of games. Business plans, platforms, and the general environment for the acceptance of games (and other health-related apps) are coming together.
