- Librarybox 2.0 — fork of PirateBox for the TP-Link MR 3020, customized for educational, library, and other needs. Wifi hotspot with free and anonymous file sharing. v2 adds mesh networking and more. (via BoingBoing)
- Chicago PD’s Using Big Data to Justify Racial Profiling (Cory Doctorow) — The CPD refuses to share the names of the people on its secret watchlist, nor will it disclose the algorithm that put it there. […] Asserting that you’re doing science but you can’t explain how you’re doing it is a nonsense on its face. Spot on.
- Cloudwash (BERG) — very good mockup of how and why your washing machine might be connected to the net and bound to your mobile phone. No face on it, though. They’re losing their touch.
- What’s Left of Nokia to Bet on Internet of Things (MIT Technology Review) — With the devices division gone, the Advanced Technologies business will cut licensing deals and perform advanced R&D with partners, with around 600 people around the globe, mainly in Silicon Valley and Finland. Hopefully will not devolve into being a patent troll. […] “We are now talking about the idea of a programmable world. […] If you believe in such a vision, as I do, then a lot of our technological assets will help in the future evolution of this world: global connectivity, our expertise in radio connectivity, materials, imaging and sensing technologies.”
- SAMOA — Yahoo!’s distributed streaming machine learning (ML) framework that contains a programming abstraction for distributed streaming ML algorithms. (via Introducing SAMOA)
- madlib — an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical and machine-learning methods for structured and unstructured data.
- Data Portraits: Connecting People of Opposing Views — Yahoo! Labs research to break the filter bubble. Connect people who disagree on issue X (e.g., abortion) but who agree on issue Y (e.g., Latin American interventionism), and present the differences and similarities visually (they used wordclouds). Our results suggest that organic visualisation may revert the negative effects of providing potentially sensitive content. (via MIT Technology Review)
- Disguise Detection — using Raspberry Pi, Arduino, and Python.
Tutorials for designers, data scientists, data engineers, and managers
As the Program Development Director for Strata Santa Clara 2014, I am pleased to announce that the tutorial session descriptions are now live. We’re pleased to offer several day-long immersions including the popular Data Driven Business Day and Hardcore Data Science tracks. We curated these topics as we wanted to appeal to a broad range of attendees including business users and managers, designers, data analysts/scientists, and data engineers. In the coming months we’ll have a series of guest posts from many of the instructors and communities behind the tutorials.
Analytics for Business Users
We’re offering a series of data-intensive tutorials for non-programmers. John Foreman will use spreadsheets to demonstrate how data science techniques work step-by-step – a topic that should appeal to those tasked with advanced business analysis. Grammar of Graphics author, SYSTAT creator, and noted statistician Leland Wilkinson will teach an introductory course on analytics using an innovative expert system he helped build.
Data Science essentials
Scalding – a Scala API for Cascading – is one of the most popular open source projects in the Hadoop ecosystem. Vitaly Gordon will lead a hands-on tutorial on how to use Scalding to put together effective data processing workflows. Data analysts have long lamented the amount of time they spend on data wrangling. But what if you had access to tools and best practices that would make data wrangling less tedious? That’s exactly the tutorial that distinguished professors and Trifacta co-founders Joe Hellerstein and Jeff Heer are offering.
The co-founders of Datascope Analytics are offering a glimpse into how they help clients identify the appropriate problem or opportunity to focus on by using design thinking (see the recent Datascope/IDEO post on Design Thinking and Data Science). We’re also happy to reprise the popular (Strata Santa Clara 2013) d3.js tutorial by Scott Murray.
Archimedes advances evidence-based medicine to foster model-based medicine
This posting is by guest author Tuan Dinh, who will speak about this topic at the Strata Rx conference.
Legendary Silicon Valley investor Vinod Khosla caused quite a stir last year when he predicted at Strata Rx that “Dr. Algorithm”–artificial intelligence driven by large data sets and computational power–would replace doctors in the not-too-distant future. At that point, he said, technology will be cheaper, more accurate and objective, and will ultimately do a better job than the average human doctor at delivering routine diagnoses with standard treatments.
I not only support Khosla’s provocative prophecy, I’ll add one of my own: that Dr. Algorithm (aka Dr. A) will “come to life” in three to five years, by the time today’s first-year med school students are pulling 30-hour shifts as new interns. But what will it take to build the brain of Dr. A? And how can we teach Dr. A to account for increasingly complex medical inputs, such as laboratory test results, genomic/genetic information, family and personal history, co-morbidities, and patient preferences, so he can make optimal clinical decisions for living, breathing patients?
Evolution from a research tool to a platform for patient engagement
Bruce Springer of OneHealth will speak about this topic at the Strata Rx conference. This article was written by Patrick Bane of OneHealth in coordination with Bruce Springer.
According to a recent study performed by the Jesse Brown VA Medical Center and University of Illinois at Chicago, patient-centered care has demonstrated positive effects on patients’ health and patients’ self-reported health, as well as reduced healthcare utilization. The study’s results are consistent with previous research showing that the patient-centered care model improves the quality of care while simultaneously lowering the cost of care.
OneHealth’s behavior change platform extends the patient-centered model by connecting members anytime, anywhere through mobile and web applications. Members generate data in their daily lives, outside of a clinical setting, which creates a much richer dataset of the behaviors needed to understand patients’ condition(s) and their readiness to change. Members freely choose what to do and their choices actively generate data in five classes of information:
A video interview with Colin Hill
Last month, Strata Rx Program Chair Colin Hill, of GNS Healthcare, sat down with Dr. Dennis Ausiello, Jackson Professor of Clinical Medicine at the Harvard Medical School, Co-Director at CATCH, Pfizer Board of Directors Member, and Former Chief of Medicine at the Massachusetts General Hospital (MGH), for a fireside chat at a private reception hosted by GNS. Their insightful conversation covered a range of topics that all touched on or intersected with the need to create smaller and more precise cohorts, as well as the need to focus on phenotypic data as much as we do on genotypic data.
The full video appears below.
A tool for outreach to patients produces unexpected benefits
The traditional, office-based model for health care is episodic. The provider-patient relationship exists almost completely within the walls of the exam room, with little or no follow-up between visits. Data is primarily episodic as well, based on blood pressure readings taken at a specific time or surveys administered there and then, with little collected out of the office. And even the existing data collection tools (paper diaries or clunky meters) are focused more on storing data than on connecting the patient and provider through that data in real time.
There is no way to get in touch when, for instance, a patient’s blood sugar starts varying wildly or pain levels change. The provider often depends on the patient reaching out to them. And even when a provider does put into place an outreach protocol, it is usually very crude, based on a general approach to managing a population as opposed to an understanding of a patient. The end result is a system that, while doing its best within a difficult setting, is by default reactive instead of proactive.
Business analytics projects: Using decisions as a basis to prioritize and identify requirements
Most normal people don’t look at data sets just for fun. They study views of the data to make decisions about what to do, be it a decision to take some specific action or a decision to do nothing at all. The main purpose of business analytics projects is to develop systems that turn large and often highly complex data sets into meaningful information from which decisions can be made.
The decisions that people make using business analytics systems can be strategic, operational, or tactical. For example, an executive might look at his sales team’s global performance dashboard to decide who to promote (tactical), which products need different marketing strategies (operational), or which products to target by markets (strategic). Generally speaking, all software systems that include an analytics component should enable users to make decisions that improve organizational performance in some dimension.