- Inside Data Brokers — very readable explanation of data brokers and how their information is used to track advertising effectiveness.
- Elon, I Want My Data! — Tesla doesn’t give you access to the data that your car collects. Bodes poorly for the Internet of Sealed Boxes. (via BoingBoing)
- Pattern Classification (Github) — collection of tutorials and examples for solving and understanding machine learning and pattern classification tasks.
- HOGWILD! (PDF) — the algorithm that Microsoft credits with the success of its Adam deep learning system.
Making the case for blended architectures in the rapidly evolving universe of advanced analytics.
Two years ago, most of the conversations around big data had a futuristic, theoretical vibe. That vibe has been replaced with a gritty sense of practicality. Today, when big data or some surrogate term arises in conversation, the talk is likely to focus not on “what if,” but on “how do we get it done?” and “what will it cost?”
Real-time big data analytics and the increasing need for applications capable of handling mixed read/write workloads — as well as transactions and analytics on “hot” data — are putting new pressures on traditional data management architectures.
What’s driving the need for change? There are several factors, including a new class of apps for personalizing the Internet, serving dynamic content, and creating rich user experiences. These apps are data driven, which means they essentially feed on deep data analytics. You’ll need a steady supply of activity history, insights, and transactions, plus the ability to combine historical analytics with hot analytics and read/write transactions. Read more…
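The idea of combining historical analytics with “hot” analytics and read/write transactions can be sketched in a few lines. This is a hypothetical illustration, not any particular vendor’s architecture: the `BlendedStore` class, its window size, and the flush policy are all assumptions made for the example.

```python
from collections import deque, defaultdict

class BlendedStore:
    """Hypothetical sketch of a blended architecture: pre-aggregated
    historical data plus a 'hot' window of recent raw events that
    serves mixed read/write workloads."""

    def __init__(self, hot_window=1000):
        self.historical = defaultdict(int)   # pre-aggregated per-user totals
        self.hot = deque(maxlen=hot_window)  # recent raw events, not yet folded in

    def write(self, user, value):
        # Writes land in the hot window immediately.
        self.hot.append((user, value))

    def flush(self):
        # Periodically fold hot events into the historical aggregates.
        while self.hot:
            user, value = self.hot.popleft()
            self.historical[user] += value

    def read(self, user):
        # A read blends the historical aggregate with unflushed hot events,
        # so analytics always see the latest transactions.
        hot_sum = sum(v for u, v in self.hot if u == user)
        return self.historical[user] + hot_sum

store = BlendedStore()
store.write("alice", 5)
store.write("alice", 3)
print(store.read("alice"))  # 8, before any flush
store.flush()
print(store.read("alice"))  # still 8, now served from historical
```

The point of the sketch is only that reads consult both tiers, so a query never has to wait for a batch load before seeing fresh activity.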
Data from the Internet of Things makes an integrated data strategy vital.
The Internet of Things (IoT) is more than a network of smart toasters, refrigerators, and thermostats. For the moment, domestic appliances are its most visible aspect, but they represent merely the tip of a very large and mostly invisible iceberg.
IDC predicts that by the end of 2020, the IoT will encompass 212 billion “things,” including hardware we tend not to think about: compressors, pumps, generators, turbines, blowers, rotary kilns, oil-drilling equipment, conveyor belts, diesel locomotives, and medical imaging scanners, to name a few. Sensors embedded in such machines and devices use the IoT to transmit data on such metrics as vibration, temperature, humidity, wind speed, location, fuel consumption, radiation levels, and hundreds of other variables. Read more…
More visible at Health Privacy Summit than Health Datapalooza.
On the first morning of the biggest conference on data in health care, the Health Datapalooza in Washington, DC, newspapers reported a bill allowing the Department of Veterans Affairs to outsource more of its care, sending veterans to private health care providers to relieve its burdensome shortage of doctors.
There has been extensive talk about the scandals at the VA and remedies for them, including the political and financial ramifications of partial privatization. Republicans have suggested it for some time, but the solution’s endorsement by socialist Independent Senator Bernie Sanders clinches the matter. What no one has pointed out yet, however (and what makes this development relevant to the Datapalooza), is that such a reform will make the free flow of patient information between providers more crucial than ever.
Bio-IT World shows what is possible and what is being accomplished.
If your data consists of one million samples, but only 100 have the characteristics you’re looking for, and if each of the million samples contains 250,000 attributes, each of which is built of thousands of basic elements, you have a big data problem. This is the kind of challenge faced by the 2,700 Bio-IT World attendees, who discover genetic interactions and create drugs for the rest of us.
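The needle-in-a-haystack shape of that problem, finding a tiny cohort that shares a rare combination of attributes in a very large sample set, can be sketched at toy scale. The variant names, rates, and sample count here are invented for illustration; real genomic data would carry hundreds of thousands of attributes per sample rather than two.

```python
import random

random.seed(0)

# Hypothetical stand-in data: each sample keeps just two boolean
# attributes; a real data set would have ~250,000 per sample.
samples = [
    {"variant_a": random.random() < 0.01,   # rare variant, ~1% of samples
     "variant_b": random.random() < 0.05}   # uncommon variant, ~5%
    for _ in range(100_000)
]

# Find the rare cohort carrying BOTH variants: expected
# ~0.05% of samples, i.e. a few dozen out of 100,000.
cohort = [i for i, s in enumerate(samples)
          if s["variant_a"] and s["variant_b"]]
print(len(cohort))
```

At real scale the same filter has to run over far wider records, which is why these workloads push researchers toward columnar storage and distributed scans rather than row-by-row loops like this one.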
Often they are looking for rare (orphan) diseases, or for cohorts who share a rare combination of genetic factors that require a unique treatment. The data sets get huge, particularly when the researchers start studying proteomics (the proteins active in the patients’ bodies).
So last week I took the subway downtown and crossed the two wind- and rain-whipped bridges that the city of Boston built to connect to the World Trade Center. I mingled for a day with attendees and exhibitors to find what data-related challenges they’re facing and what the latest solutions are. Here are some of the major themes I turned up.
A Knowledge Currency Exchange for health and wellness
This article was written together with Mike Kellen, Director of Technology at Sage Bionetworks, and Christine Suver, Senior Scientist at Sage Bionetworks.
The current push towards patient engagement, when clinical researchers trace the outcomes of using pharmaceuticals or other treatments, is a crucial first step towards rewiring the medical-industrial complex with the citizen at the center. For far too long, clinicians, investigators, the government, and private funders have been the key decision makers. The citizen has been at best a research “subject,” and far too often simply a resource from which data and samples can be extracted. The average participant in a clinical study never receives the outcomes of the study, never has contact with those analyzing the data, never knows where her samples flow over time (witness the famous story of Henrietta Lacks), and until the past year didn’t even have access to the published research without paying a hefty rental fee.
This is changing. The recent grants by the Patient-Centered Outcomes Research Institute (PCORI) are the most visible evidence of change, but throughout the medical system one finds green shoots of direct patient engagement. Read more…