ENTRIES TAGGED "data"
Data Brokers, Car Data, Pattern Classification, and Hogwild Deep Learning
- Inside Data Brokers — very readable explanation of data brokers and how their information is used to track advertising effectiveness.
- Elon, I Want My Data! — Tesla doesn’t give you access to the data that your car collects. Bodes poorly for the Internet of Sealed Boxes. (via BoingBoing)
- Pattern Classification (GitHub) — collection of tutorials and examples for solving and understanding machine learning and pattern classification tasks.
- HOGWILD! (PDF) — the algorithm that Microsoft credits with the success of their Adam deep learning system.
Curated Code, Hackable Browser, IoT Should Be Open, and Better Treemaps
- Awesome Awesomeness — list of curated collections of frameworks and libraries in various languages that do not suck. They solve the problem of “so, I’m new to (language) and don’t want to kiss a lot of frogs before I find the right tool for a particular task”.
- Breach — a hackable, modular web browser.
- The CompuServe of Things (Phil Windley) — How we build the Internet of Things has far-reaching consequences for the humans who will use—or be used by—it. Will we push forward, connecting things using forests of silos that are reminiscent of the online services of the 1980s, or will we learn the lessons of the Internet and build a true Internet of Things? (via Cory Doctorow)
Developer Inequality, Weak Signals, Geek Feminism Wiki, and Reidentification Risks
- Developer Inequality (Jonathan Edwards) — The bigger injustice is that programming has become an elite: a vocation requiring rare talents, grueling training, and total dedication. The way things are today if you want to be a programmer you had best be someone like me on the autism spectrum who has spent their entire life mastering vast realms of arcane knowledge — and enjoys it. Normal humans are effectively excluded from developing software. (via Slashdot)
- Signals From Foo Camp (O’Reilly Radar) — useful for me (aka “the stuff I didn’t get to see”), hopefully useful to you too. Companies outside of Silicon Valley badly want to understand it and want to find ways to truly collaborate with it, but they’re worried that conversations can turn into competition. “Old industry” has incredible expertise and operates in very complex environments, and it has much to teach tech, if tech will listen. Silicon Valley isn’t an IT department for the world, it’s the competition.
- Feminist Point of View: Lessons from Running the Geek Feminism Wiki — deck from Alex’s OS Bridge session. Today’s awareness and actions around sexism in tech resulted from their actions, sometimes directly, sometimes indirectly.
- Big Data Should Not Be a Faith-Based Initiative (Cory Doctorow) — Re-identification is part of the Big Data revolution: among the new meanings we are learning to extract from huge corpuses of data is the identity of the people in that dataset. And since we’re commodifying and sharing these huge datasets, they will still be around in ten, twenty and fifty years, when those same Big Data advancements open up new ways of re-identifying — and harming — their subjects.
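The re-identification risk in that last item can be made concrete with a small uniqueness check. This is a minimal sketch, not anything taken from the linked article: the records, the field names (`zip`, `birth_year`, `sex`), and the helper `reidentification_risk` are all hypothetical, chosen only to show how a few harmless-looking fields can single a person out of a "de-identified" dataset.

```python
from collections import Counter


def reidentification_risk(records, quasi_identifiers):
    """Return (k, unique_fraction) for the given quasi-identifier fields.

    k is the size of the smallest group of records that share the same
    quasi-identifier values (the "k" of k-anonymity). unique_fraction is
    the share of records whose combination of values appears only once.
    """
    keys = [tuple(record[field] for field in quasi_identifiers) for record in records]
    counts = Counter(keys)
    k = min(counts.values())
    unique = sum(1 for key in keys if counts[key] == 1)
    return k, unique / len(records)


# Hypothetical records with names removed (illustrative values only).
records = [
    {"zip": "02139", "birth_year": 1975, "sex": "F", "diagnosis": "A"},
    {"zip": "02139", "birth_year": 1975, "sex": "F", "diagnosis": "B"},
    {"zip": "02139", "birth_year": 1982, "sex": "M", "diagnosis": "C"},
    {"zip": "94103", "birth_year": 1990, "sex": "F", "diagnosis": "A"},
]

k, unique_fraction = reidentification_risk(records, ["zip", "birth_year", "sex"])
print(k, unique_fraction)  # prints: 1 0.5
```

Here k is 1 and half of the records are unique on just three fields, so anyone holding a second dataset with the same fields plus names could link those rows back to individuals. The more columns a shared dataset keeps, the more rows tend to become unique in this sense.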
Making the case for blended architectures in the rapidly evolving universe of advanced analytics.
Two years ago, most of the conversations around big data had a futuristic, theoretical vibe. That vibe has been replaced with a gritty sense of practicality. Today, when big data or some surrogate term arises in conversation, the talk is likely to focus not on “what if,” but on “how do we get it done?” and “what will it cost?”
Real-time big data analytics and the increasing need for applications capable of handling mixed read/write workloads — as well as transactions and analytics on “hot” data — are putting new pressures on traditional data management architectures.
What’s driving the need for change? There are several factors, including a new class of apps for personalizing the Internet, serving dynamic content, and creating rich user experiences. These apps are data driven, which means they essentially feed on deep data analytics. You’ll need a steady supply of activity history, insights, and transactions, plus the ability to combine historical analytics with hot analytics and read/write transactions.
Data from the Internet of Things makes an integrated data strategy vital.
The Internet of Things (IoT) is more than a network of smart toasters, refrigerators, and thermostats. For the moment, though, domestic appliances are the most visible aspect of the IoT. But they represent merely the tip of a very large and mostly invisible iceberg.
IDC predicts that by the end of 2020, the IoT will encompass 212 billion “things,” including hardware we tend not to think about: compressors, pumps, generators, turbines, blowers, rotary kilns, oil-drilling equipment, conveyor belts, diesel locomotives, and medical imaging scanners, to name a few. Sensors embedded in such machines and devices use the IoT to transmit data on such metrics as vibration, temperature, humidity, wind speed, location, fuel consumption, radiation levels, and hundreds of other variables.
More visible at Health Privacy Summit than Health Datapalooza.
On the first morning of the biggest conference on data in health care (the Health Datapalooza in Washington, DC), newspapers reported a bill allowing the Department of Veterans Affairs to outsource more of its care, sending veterans to private health care providers to relieve its burdensome shortage of doctors.
There has been extensive talk about the scandals at the VA and remedies for them, including the political and financial ramifications of partial privatization. Republicans have suggested it for some time, but for the solution to be picked up by socialist Independent Senator Bernie Sanders clinches the matter. What no one has pointed out yet, however, and what makes this development relevant to the Datapalooza, is that such a reform will make the free flow of patient information between providers more crucial than ever.
Modern Software Development, Internet Trends, Software Ethics, and Open Government Data
- Beyond the Stack (Mike Loukides) — tools and processes to support software developers who are as massively distributed as the code they build.
- Mary Meeker’s Internet Trends 2014 (PDF) — the changes on slide 34 are interesting: usage moving away from G+/Facebook-style omniblather creepware and towards phonebook-based chat apps.
- Introduction to Software Engineering Ethics (PDF) — amazing set of provocative questions and scenarios for software engineers about the decisions they made and consequences of their actions. From a course in ethics from SCU.
- Open Government Data Online: Impenetrable (Guardian) — Too much knowledge gets trapped in multi-page pdf files that are slow to download (especially in low-bandwidth areas), costly to print, and unavailable for computer analysis until someone manually or automatically extracts the raw data.
Bio-IT World shows what is possible and what is being accomplished
If your data consists of one million samples, but only 100 have the characteristics you’re looking for, and if each of the million samples contains 250,000 attributes, each of which is built of thousands of basic elements, you have a big data problem. This is the kind of challenge faced by the 2,700 Bio-IT World attendees, who discover genetic interactions and create drugs for the rest of us.
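To get a feel for the scale in that example, the numbers can be multiplied out. This is a back-of-the-envelope sketch using only the figures quoted above; the `elements_per_attribute` value of 1,000 is an assumption standing in for "thousands," so the last figure is a lower bound rather than a measurement.

```python
samples = 1_000_000              # samples in the data set
matching = 100                   # samples with the characteristics of interest
attributes_per_sample = 250_000  # attributes recorded for each sample
elements_per_attribute = 1_000   # assumption: lower bound for "thousands"

# Total attribute values across the whole data set.
total_attribute_values = samples * attributes_per_sample

# How rare the interesting samples are.
prevalence = matching / samples
samples_per_match = samples // matching

# Lower bound on basic elements, under the assumption above.
min_basic_elements = total_attribute_values * elements_per_attribute

print(f"{total_attribute_values:,} attribute values")
print(f"{prevalence:.2%} of samples match (1 in {samples_per_match:,})")
print(f"at least {min_basic_elements:,} basic elements")
```

That works out to 250 billion attribute values, with only one sample in 10,000 being of interest, which is why both storage and search become hard problems at the same time.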
Often they are looking for rare (orphan) diseases, or for cohorts who share a rare combination of genetic factors that require a unique treatment. The data sets get huge, particularly when the researchers start studying proteomics (the proteins active in the patients’ bodies).
So last week I took the subway downtown and crossed the two wind- and rain-whipped bridges that the city of Boston built to connect to the World Trade Center. I mingled for a day with attendees and exhibitors to find what data-related challenges they’re facing and what the latest solutions are. Here are some of the major themes I turned up.
A Knowledge Currency Exchange for health and wellness
This article was written together with Mike Kellen, Director of Technology at Sage Bionetworks, and Christine Suver, Senior Scientist at Sage Bionetworks.
The current push towards patient engagement, when clinical researchers trace the outcomes of using pharmaceuticals or other treatments, is a crucial first step towards rewiring the medical-industrial complex with the citizen at the center. For far too long, clinicians, investigators, the government, and private funders have been the key decision makers. The citizen has been at best a research “subject,” and far too often simply a resource from which data and samples can be extracted. The average participant in a clinical study never receives the outcomes of the study, never has contact with those analyzing the data, never knows where her samples flow over time (witness the famous story of Henrietta Lacks), and until the past year didn’t even have access to the published research without paying a hefty rental fee.
This is changing. The recent grants by the Patient-Centered Outcomes Research Institute (PCORI) are the most visible evidence of change, but throughout the medical system one finds green shoots of direct patient engagement.
Retail Student Data, Hacking Hospitals, Testing APIs, and Becoming Superhuman
- UK Government to Sell Its Students’ Data (Wired UK) — The National Pupil Database (NPD) contains detailed information about pupils in schools and colleges in England, including test and exam results, progression at each key stage, gender, ethnicity, pupil absence and exclusions, special educational needs, first language. The UK is becoming patient zero for national data self-harm.
- It’s Insanely Easy to Hack Hospital Equipment (Wired) — Erven won’t identify specific product brands that are vulnerable because he’s still trying to get some of the problems fixed. But he said a wide cross-section of devices shared a handful of common security holes, including lack of authentication to access or manipulate the equipment; weak passwords or default and hardcoded vendor passwords like “admin” or “1234”; and embedded web servers and administrative interfaces that make it easy to identify and manipulate devices once an attacker finds them on a network.
- Postman — API testing tool.
- App Controlled Hearing Aid Improves Even Normal Hearing (NYTimes) — It’s only a slight exaggeration to say that the latest crop of advanced hearing aids are better than the ears most of us were born with. Human augmentation with software and hardware.