- Discussion Graph Tool (Microsoft Research) — simplifies social media analysis by making it easy to extract high-level features and co-occurrence relationships from raw data.
- Superlinear Productivity in Collective Group Actions (PLoS ONE) — study of open source projects shows small groups exhibit non-linear productivity increases by size, which drop off at larger sizes. we document a size effect in the strength and variability of the superlinear effect, with smaller groups exhibiting widely distributed superlinear exponents, some of them characterizing highly productive teams. In contrast, large groups tend to have a smaller superlinearity and less variability.
- coop — cheat sheet of the most common concurrency program flows in Go.
- Tessera — set of open source tools around Hadoop, R, and visualization.
ENTRIES TAGGED "data"
High-performing memory throws many traditional decisions overboard
Over the past decade, SSD drives (popularly known as Flash) have radically changed computing at both the consumer level — where USB sticks have effectively replaced CDs for transporting files — and the server level, where it offers a price/performance ratio radically different from both RAM and disk drives. But databases have just started to catch up during the past few years. Most still depend on internal data structures and storage management fine-tuned for spinning disks.
Citing price and performance, one author advised a wide range of database vendors to move to Flash. Certainly, a database administrator can speed up old databases just by swapping out disk drives and inserting Flash, but doing so captures just a sliver of the potential performance improvement promised by Flash. For this article, I asked several database experts — including representatives of Aerospike, Cassandra, FoundationDB, RethinkDB, and Tokutek — how Flash changes the design of storage engines for databases. The various ways these companies have responded to its promise in their database designs are instructive to readers designing applications and looking for the best storage solutions.
Business models and sustainability will drive success in the health games space.
These efforts have born fruit, and clinical trials have shown the value of many such games. Ben Sawyer, who founded the Games for Health conference more than 10 years ago, is watching all the pieces fall into place for the widespread adoption of games. Business plans, platforms, and the general environment for the acceptance of games (and other health-related apps) are coming together.
Making the case for blended architectures in the rapidly evolving universe of advanced analytics.
Two years ago, most of the conversations around big data had a futuristic, theoretical vibe. That vibe has been replaced with a gritty sense of practically. Today, when big data or some surrogate term arises in conversation, the talk is likely to focus not on “what if,” but on “how do we get it done?” and “what will it cost?”
Real-time big data analytics and the increasing need for applications capable of handling mixed read/write workloads — as well as transactions and analytics on “hot” data — are putting new pressures on traditional data management architectures.
What’s driving the need for change? There are several factors, including a new class of apps for personalizing the Internet, serving dynamic content, and creating rich user experiences. These apps are data driven, which means they essentially feed on deep data analytics. You’ll need a steady supply of activity history, insights, and transactions, plus the ability to combine historical analytics with hot analytics and read/write transactions. Read more…
Data from the Internet of Things makes an integrated data strategy vital.
The Internet of Things (IoT) is more than a network of smart toasters, refrigerators, and thermostats. For the moment, though, domestic appliances are the most visible aspect of the IoT. But they represent merely the tip of a very large and mostly invisible iceberg.
IDC predicts by the end of 2020, the IoT will encompass 212 billion “things,” including hardware we tend not to think about: compressors, pumps, generators, turbines, blowers, rotary kilns, oil-drilling equipment, conveyer belts, diesel locomotives, and medical imaging scanners, to name a few. Sensors embedded in such machines and devices use the IoT to transmit data on such metrics as vibration, temperature, humidity, wind speed, location, fuel consumption, radiation levels, and hundreds of other variables. Read more…
More visible at Health Privacy Summit than Health Datapalooza.
On the first morning of the biggest conference on data in health care–the Health Datapalooza in Washington, DC–newspapers reported a bill allowing the Department of Veterans Affairs to outsource more of its care, sending veterans to private health care providers to relieve its burdensome shortage of doctors.
There has been extensive talk about the scandals at the VA and remedies for them, including the political and financial ramifications of partial privatization. Republicans have suggested it for some time, but for the solution to be picked up by socialist Independent Senator Bernie Sanders clinches the matter. What no one has pointed out yet, however–and what makes this development relevant to the Datapalooza–is that such a reform will make the free flow of patient information between providers more crucial than ever.