- Let’s Pool Our Medical Data (TED) — John Wilbanks (of Science Commons fame) gives a strong talk for creating an open, massive, mine-able database of data about health and genomics from many sources. Money quote: Facebook would never make a change to something as important as an advertising algorithm with a sample size as small as a Phase 3 clinical trial.
- Verizon Sells App Use, Browsing Habits, Location (CNet) — Verizon Wireless has begun selling information about its customers’ geographical locations, app usage, and Web browsing activities, a move that raises privacy questions and could brush up against federal wiretapping law. To Verizon, even when you do pay for it, you’re still the product. Carriers: they’re like graverobbing organ harvesters but without the strict ethical standards.
- IBM Watson About to Launch in Medicine (Fast Company) — This fall, after six months of teaching their treatment guidelines to Watson, the doctors at Sloan-Kettering will begin testing the IBM machine on real patients. [...] On the screen, a colorful globe spins. In a few seconds, Watson offers three possible courses of chemotherapy, charted as bars with varying levels of confidence–one choice above 90% and two above 80%. “Watson doesn’t give you the answer,” Kris says. “It gives you a range of answers.” Then it’s up to [the doctor] to make the call. (via Reddit)
- Robot Kills Weeds With 98% Accuracy — During tests, this automated system gathered over a million images as it moved through the fields. Its Computer Vision System was able to detect and segment individual plants – even those that were touching each other – with 98% accuracy.
Celebrating Data Privacy Day, how data fits into Bill Gates' education plan, and why "long data" deserves our attention.
Data Privacy Day and the fight against “digital feudalism”
Data Privacy Day was celebrated this week. Led by the National Cyber Security Alliance, the day is meant to increase awareness of personal data protection and “to empower people to protect their privacy and control their digital footprint and escalate the protection of privacy and data as everyone’s priority,” according to the website.
Many companies used the day as an opportunity to issue transparency reports, re-informing users and customers about how their data is used and how it’s protected. Google added a new section to its transparency report, a Q&A on how the company handles personal user data requests from government agencies and courts.
How Amazon Web Services and Rackspace measure up; IBM's Watson goes to school; Google researches data; and what will we call really, really big data?
Here are a few stories from the data space that caught my attention this week.
Rackspace vs Amazon
As Rackspace continues to ramp up its services to compete with Amazon Web Services (AWS) — this week, announcing a partnership with Hortonworks to develop a cloud-based enterprise-ready Hadoop platform to compete against Amazon’s Elastic MapReduce — Derrick Harris at GigaOm compared apples to apples.
John Engates, CTO of Rackspace, told Harris the most fundamental difference between the two services is the level of control given to the customer. Harris writes that Rackspace’s new Hadoop service aims to give the customer “granular control over how their systems are configured and how their jobs run,” providing “the experience of owning a Hadoop cluster without actually owning any of the hardware.” Engates pointed out, “It’s not MapReduce as a service; it’s more Hadoop as a service.”
Harris also points out that Rackspace is considering making moves into NoSQL and looks at AWS’ DynamoDB service. He notes that Amazon and Rackspace aren’t the only players on any of these fields, pointing to the likes of Microsoft’s HDInsight, IBM’s BigInsights, Qubole, Infochimps, MongoDB, Cassandra and CouchDB-based services.
In related news, Rackspace announced its new Cloud Networks feature this week that allows customers to design their own networks on Rackspace’s Cloud Servers. In an interview with Jack McCarthy at CRN, Engates explained the background:
“When we went from dedicated physical networks to our public cloud, we lost the ability to segment these networks. We used to have a vLAN. As we moved to OpenStack, we wanted to give our customers the ability to enable segmented networks in the cloud. Cloud Networks gives customers a degree of control over how they build networks in the cloud, whether it’s building networks for application servers or for Web servers or databases.”
Engates also points out the networks are software-defined, “so customers can program their network on the fly.” You can read more about the new feature on the Rackspace blog.
Medical Data Commons, Verizon Sell You, Doctor Watson, and Weedkilling Drones
Alasdair Allan on how machine learning is taking over the mainstream.
From Goodreads to Google to Orbitz, machine learning is slowly becoming part of everyday life. Alasdair Allan discusses current uses and how machine learning factors into his own robotic telescope network.
Watson opens the door to conversations, not just answers.
Now that we can build machines that can answer tough and ambiguous questions, the next step is to realize that the answer to a question isn't the end of the process.
Jeopardy was fun, but Watson's practical applications are what's really interesting.
Aside from whipping the pants off two Jeopardy geniuses, the Watson computer is opening the door to new monetization possibilities for search.
The real value of the Watson supercomputer will come from what it inspires.
While IBM's Watson supercomputer / Jeopardy contestant is a masterpiece of natural language processing, it's important to remember that it's just a learning tool that will help us solve more interesting problems.