"open data" entries

Four short links: 29 October 2014

Tweet Parsing, Focus and Money, Challenging Open Data Beliefs, and Exploring ISP Data

by Nat Torkington | @gnat | +Nat Torkington | October 29, 2014

TweetNLP — CMU open source natural language parsing tools for making sense of Tweets.
Interview with Google X Life Sciences’ Head (Medium) — I will have been here two years this March. In nineteen months we have been able to hire more than a hundred scientists to work on this. We’ve been able to build customized labs and get the equipment to make nanoparticles and decorate them and functionalize them. We’ve been able to strike up collaborations with MIT and Stanford and Duke. We’ve been able to initiate protocols and partnerships with companies like Novartis. We’ve been able to initiate trials like the baseline trial. This would be a good decade somewhere else. The power of focus and money.
Schooloscope Open Data Post-Mortem — The case of Schooloscope and the wider question of public access to school data challenges the belief that sunlight is the best disinfectant, that government transparency would always lead to better government, better results. It challenges the sentiments that see data as value-neutral and its representation as devoid of politics. In fact, access to school data exposes a sharp contrast between the private interest of the family (best education for my child) and the public interest of the government (best education for all citizens).
M-Lab Observatory — explorable data on network performance (round-trip time, upload speed, etc.) across different ISPs in different geographies over time.
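Tweets break ordinary NLP tooling at the very first step, tokenization. TweetNLP ships its own tokenizer and tagger, and the Python below is not its API. It is a minimal sketch, built around a hypothetical `tokenize_tweet` helper, of one tweet-specific problem: splitting text without tearing @mentions, #hashtags, URLs, or emoticons apart.

```python
import re

# Alternatives are tried left to right, so URLs, mentions, and hashtags are
# matched before the generic word pattern can split them into pieces.
TOKEN_RE = re.compile(
    r"""
    https?://\S+            # URLs
    | @\w+                  # @mentions
    | \#\w+                 # #hashtags ('#' is escaped because VERBOSE treats it as a comment)
    | [:;=][-o]?[)(DPp]     # a few common ASCII emoticons such as :) ;-) :D
    | \w+(?:'\w+)?          # words, with an optional contraction (don't)
    | [^\w\s]               # any other single punctuation character
    """,
    re.VERBOSE,
)


def tokenize_tweet(text: str) -> list[str]:
    """Split a tweet into tokens, keeping mentions, hashtags, and URLs whole."""
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    print(tokenize_tweet("@gnat loving #opendata :) http://t.co/x don't"))
```

A general-purpose tokenizer would typically split `@gnat` into `@` and `gnat` and chop the URL at every punctuation mark; keeping these intact is what lets later stages treat them as single units.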

Four short links: 24 October 2014

Parallel Algorithm, Open Source Bio, 3D Printed Peptides, and Open London Data

by Nat Torkington | @gnat | +Nat Torkington | October 24, 2014

PaGMO — Parallel Global Multiobjective Optimizer […] a generalization of the island model paradigm working for global and local optimization algorithms. Its main parallelization approach makes use of multiple threads, but MPI is also implemented and can be mixed in with multithreading. PaGMO can be used to solve in a parallel fashion, global optimization tasks.
Avoiding the Tragedy of the Anticommons — Many people talk about “open source biology.” Mike Loukides pulls apart open source and biology to see what the relationship might be. I’m still chewing on what devops for bio would be. Modern software systems throw off gigabytes of data, and we have built tools to monitor those systems, archive their data, and automate much of the analysis. There are free and commercial packages for logging and monitoring, and it continues to be a very active area of software development, as anyone who’s attended O’Reilly’s Velocity conference knows.
peppytides (Makezine) — 3D-printed super accurate, scaled 3D-model of a polypeptide chain that can be folded into all the basic protein structures, like α-helices, β-sheets, and β-turns. (via Lenore Edman)
London Data Store — dashboard and open data catalogue for London’s data release efforts, run by the Greater London Authority.
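The island model that PaGMO generalizes works like this: several subpopulations ("islands") evolve independently and periodically swap their best solutions. The sketch below shows that idea in its plainest form. It is not PaGMO's API. The `island_model` and `evolve` functions, the (μ+μ) evolution strategy on each island, the sphere test objective, and the ring migration topology are all illustrative assumptions, and the islands run one after another rather than in parallel threads or MPI processes.

```python
import random


def sphere(x):
    """Toy objective to minimise: sum of squares, with its optimum of 0 at the origin."""
    return sum(v * v for v in x)


def evolve(pop, f, rng, sigma=0.3):
    """One generation of a (mu + mu) evolution strategy on a single island:
    every parent makes one Gaussian-mutated child, and the best half survives."""
    children = [[v + rng.gauss(0, sigma) for v in ind] for ind in pop]
    merged = pop + children
    merged.sort(key=f)            # lower objective value is better
    return merged[:len(pop)]


def island_model(f, dim=3, n_islands=4, pop_size=10, generations=60,
                 migrate_every=10, seed=0):
    rng = random.Random(seed)
    islands = [[[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
               for _ in range(n_islands)]
    for gen in range(1, generations + 1):
        # PaGMO would evolve the islands concurrently; here they simply run in turn.
        islands = [evolve(pop, f, rng) for pop in islands]
        if gen % migrate_every == 0:
            # Ring migration: island i's worst member is replaced by
            # a copy of the best member of island i - 1.
            bests = [min(pop, key=f) for pop in islands]
            for i, pop in enumerate(islands):
                pop.sort(key=f)
                pop[-1] = list(bests[i - 1])
    return min((ind for pop in islands for ind in pop), key=f)


if __name__ == "__main__":
    best = island_model(sphere)
    print(best, sphere(best))
```

Each island keeps its best individuals, and migration only overwrites an island's worst member. As a result, the best objective value found never gets worse from one generation to the next. Migration is what stops the islands from converging in isolation.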

Open data for open lands

Recreation.gov should be a platform, not a silo.

by Tim O'Reilly | @timoreilly | +Tim O'Reilly | October 20, 2014

President Obama’s well-publicized national open data policy (PDF) makes it clear that government data is a valuable public resource for which the government should be making efforts to maximize access and use. This policy was based on lessons from previous government open data success stories, such as weather data and GPS, which form the basis for countless commercial services that we take for granted today and that deliver enormous value to society. (You can see an impressive list of companies reliant on open government data via GovLab’s Open Data 500 project.)

Based on this open data policy, I’ve been encouraging entrepreneurs to invest their time and ingenuity to explore entrepreneurial opportunities based on government data. I’ve even invested (through O’Reilly AlphaTech Ventures) in one such start-up, Hipcamp, which provides user-friendly interfaces for making reservations at national and state parks.

A better system is sorely needed. The current reservation system is clunky and difficult to use. Hipcamp changes all that, making it a breeze to reserve camping spots.

Four short links: 21 August 2014

Open Data Glue, Smithsonian Crowdsourcing, MIT Family Creativity, and Hardware Owie

by Nat Torkington | @gnat | +Nat Torkington | August 21, 2014

Dat — an open source project that provides a streaming interface between every file format and data storage backend. See the Wired piece on it.
Smithsonian Crowdsourcing Transcription (Smithsonian) — 49 volunteers transcribed 200 pages of correspondence between the Monuments Men in a week. Soon it’ll be mathematics test questions: “if 49 people transcribe 200 pages in 7 days, how many weeks will it take …”
MIT Guide to Family CompSci Sessions — This guide is for educators, community center staff, and volunteers interested in engaging their young people and their families to become designers and inventors in their community.
What to Do When You Screw up 2,000 Orders (SparkFun) — even hardware companies need to do retrospectives.
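A "streaming interface between file formats" means records flow through a converter one at a time instead of being loaded whole. The sketch below shows this by converting CSV to newline-delimited JSON row by row, so memory stays flat however large the input is. It does not use Dat. `csv_to_ndjson` is a hypothetical helper built only on Python's standard `csv` and `json` modules.

```python
import csv
import io
import json


def csv_to_ndjson(src, dst):
    """Copy rows from a CSV text stream to a newline-delimited JSON stream.

    Rows are read, converted, and written one at a time, so memory use does not
    grow with the size of the input. Returns the number of rows written."""
    count = 0
    for row in csv.DictReader(src):   # each row is a dict keyed by the header line
        dst.write(json.dumps(row) + "\n")
        count += 1
    return count


if __name__ == "__main__":
    # Any text file-like objects work here: open files or in-memory buffers.
    src = io.StringIO("name,count\nalpha,1\nbeta,2\n")
    dst = io.StringIO()
    csv_to_ndjson(src, dst)
    print(dst.getvalue(), end="")
```

Because both ends are plain text streams, the same function works whether the source is a local file or any other readable text stream. That decoupling of format from storage is the kind of glue the item describes.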

Four short links: 3 June 2014

Machine Learning Mistakes, Recommendation Bandits, Droplet Robots, and Plain English

by Nat Torkington | @gnat | +Nat Torkington | June 3, 2014

Machine Learning Done Wrong — [M]ost practitioners pick the modeling algorithm they are most familiar with rather than pick the one which best suits the data. In this post, I would like to share some common mistakes (the don’t-s).
Bandits for Recommendations — A common problem for internet-based companies is: which piece of content should we display? Google has this problem (which ad to show), Facebook has this problem (which friend’s post to show), and RichRelevance has this problem (which product recommendation to show). Many of the promising solutions come from the study of the multi-armed bandit problem.
Droplets — the Droplet is almost spherical, can self-right after being poured out of a bucket, and has the hardware capabilities to organize into complex shapes with its neighbors due to accurate range and bearing. Droplets are available open-source and use cheap vibration motors and a 3D printed shell. (via Robohub)
Apple’s App Store Approval Guidelines — some of the plainest English I’ve seen, especially the Introduction. I can only aspire to that clarity. If your App looks like it was cobbled together in a few days, or you’re trying to get your first practice App into the store to impress your friends, please brace yourself for rejection. We have lots of serious developers who don’t want their quality Apps to be surrounded by amateur hour.
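To show what a bandit algorithm does with the "which piece of content should we display?" question, here is epsilon-greedy, one of the simplest strategies. UCB and Thompson sampling are common alternatives, and this is not necessarily what the linked post recommends. It is a sketch under stated assumptions: the `EpsilonGreedy` class is hypothetical, and the click-through rates in the simulation are made-up inputs.

```python
import random


class EpsilonGreedy:
    """Epsilon-greedy bandit: with probability epsilon show a random item (explore),
    otherwise show the item with the best average reward so far (exploit)."""

    def __init__(self, n_arms, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.counts = [0] * n_arms      # how often each item has been shown
        self.values = [0.0] * n_arms    # running mean reward (e.g. click rate) per item
        self.rng = random.Random(seed)

    def select(self):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        # Exploit: ties go to the lowest-numbered arm.
        return max(range(len(self.counts)), key=lambda a: self.values[a])

    def update(self, arm, reward):
        self.counts[arm] += 1
        # Incremental mean, so past rewards need not be stored.
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


if __name__ == "__main__":
    # Hypothetical click-through rates, hidden from the bandit.
    true_ctr = [0.02, 0.05, 0.10]
    bandit = EpsilonGreedy(len(true_ctr), epsilon=0.1, seed=42)
    clicks = random.Random(7)
    for _ in range(10_000):
        arm = bandit.select()
        reward = 1.0 if clicks.random() < true_ctr[arm] else 0.0
        bandit.update(arm, reward)
    print("times shown:", bandit.counts)
    print("estimated CTRs:", [round(v, 3) for v in bandit.values])
```

Epsilon is the exploration budget. Set it too low and the bandit is slow to discover an item that started out unlucky. Set it too high and known-weak items keep getting impressions.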

Four short links: 30 May 2014

Video Transparency, Software Traffic, Distributed Database, and Open Source Sustainability

by Nat Torkington | @gnat | +Nat Torkington | May 30, 2014

Video Quality Report — transparency is a great way to indirectly exert leverage.
Control Your Traffic Flows with Software — using BGP to balance traffic. Will be interesting to see how the more extreme traffic managers deploy SDN in the data center.
Cockroach — a distributed key/value datastore which supports ACID transactional semantics and versioned values as first-class features. The primary design goal is global consistency and survivability, hence the name. Cockroach aims to tolerate disk, machine, rack, and even datacenter failures with minimal latency disruption and no manual intervention. Cockroach nodes are symmetric; a design goal is one binary with minimal configuration and no required auxiliary services.
Linux Foundation Providing for Core Infrastructure Projects — press release, but I’m interested in how they’re tackling sustainability: they’re taking on identifying worthies (glad I’m not the one who says “you’re not worthy” to a project) and being the non-profit conduit for the dosh. Interesting: it implies they think the reason companies weren’t supporting necessary open source projects was some combination of being unsure who to support (projects you use, surely?) and how to get them money (ask?). (Sustainability of open source projects is a pet interest of mine.)
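Cockroach lists "versioned values as first-class features." This means a write does not overwrite in place. Each value is stored with the timestamp it was written at, so a reader can ask what a key held at any moment. The toy, single-process `VersionedStore` below is a hypothetical class, not CockroachDB's code or API. It sketches only that idea and leaves out distribution, transactions, and garbage collection of old versions.

```python
import bisect
from collections import defaultdict


class VersionedStore:
    """Toy multi-version key/value map: every write is kept, ordered by timestamp,
    so readers can ask for a key's value as of any point in time."""

    def __init__(self):
        self._ts = defaultdict(list)     # key -> sorted list of write timestamps
        self._vals = defaultdict(list)   # key -> values, parallel to self._ts[key]

    def put(self, key, value, ts):
        # Insert in timestamp order; a later write at an equal timestamp wins on read.
        i = bisect.bisect_right(self._ts[key], ts)
        self._ts[key].insert(i, ts)
        self._vals[key].insert(i, value)

    def get(self, key, ts=None):
        """Return the newest value written at or before ts (latest if ts is None)."""
        times = self._ts.get(key)        # .get() does not create a default entry
        if not times:
            return None
        i = len(times) if ts is None else bisect.bisect_right(times, ts)
        return self._vals[key][i - 1] if i > 0 else None


if __name__ == "__main__":
    s = VersionedStore()
    s.put("user:1", "alice", ts=10)
    s.put("user:1", "alicia", ts=20)
    print(s.get("user:1", ts=15), s.get("user:1"))
```

Reads at a fixed timestamp see a stable snapshot even while newer writes arrive. That property is what makes multi-version designs attractive for consistent reads.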

Four short links: 29 May 2014

Modern Software Development, Internet Trends, Software Ethics, and Open Government Data

by Nat Torkington | @gnat | +Nat Torkington | May 29, 2014

Beyond the Stack (Mike Loukides) — tools and processes to support software developers who are as massively distributed as the code they build.
Mary Meeker’s Internet Trends 2014 (PDF) — the changes on slide 34 are interesting: usage moving away from G+/Facebook-style omniblather creepware and towards phonebook-based chat apps.
Introduction to Software Engineering Ethics (PDF) — amazing set of provocative questions and scenarios for software engineers about the decisions they make and the consequences of their actions. From an ethics course at Santa Clara University (SCU).
Open Government Data Online: Impenetrable (Guardian) — Too much knowledge gets trapped in multi-page pdf files that are slow to download (especially in low-bandwidth areas), costly to print, and unavailable for computer analysis until someone manually or automatically extracts the raw data.

Four short links: 3 April 2014

GitHub for Data, Open Laptop, Crowdsourced Analysis, and Open Source Scraping

by Nat Torkington | @gnat | +Nat Torkington | April 3, 2014

dat — a GitHub-like tool for data, still very early. It’s overdue. (via Nelson Minar)
Novena Open Laptop — Bunnie Huang’s laptop goes on sale.
Crowd Forecasting (NPR) — How is it possible that a group of average citizens doing Google searches in their suburban town homes can outpredict members of the United States intelligence community with access to classified information?
Portia — open source visual web scraping tool.

Four short links: 1 April 2014

Unimaginative Vehicular Connectivity, Data Journalism, VR and Gender, and Open Data Justice

by Nat Torkington | @gnat | +Nat Torkington | April 1, 2014

Connected for a Purpose (Jim Stogdill) — At a recent conference, an executive at a major auto manufacturer described his company’s efforts to digitize their line-up like this: “We’re basically wrapping a two-ton car around an iPad.” Eloquent critique of the Internet of Shallow Things.
Why Nate Silver Can’t Explain It All — Data extrapolation is a very impressive trick when performed with skill and grace, like ice sculpting or analytical philosophy, but it doesn’t come equipped with the humility we should demand from our writers. Would be a shame for Nate Silver to become Malcolm Gladwell: nice stories but they don’t really hold up.
Gender and VR (danah boyd) — Although there was variability across the board, biological men were significantly more likely to prioritize motion parallax. Biological women relied more heavily on shape-from-shading. In other words, men are more likely to use the cues that 3D virtual reality systems relied on. Great article, especially notable for this: there are more sex hormones on the retina than in anywhere else in the body except for the gonads.
Even The Innocent Should Worry About Sex Offender Apps (Quartz) — And when data becomes compressed by third parties, when it gets flattened out into one single data stream, your present and your past collide with potentially huge ramifications for your future. When it comes to personal data—of any kind—we not only need to consider what it will be used for but how that data will be represented, and what such representation might mean for us and others. Data policies are like justice systems: either you suffer a few innocent people being wrongly condemned (bad uses of open data), or your system permits some wrongdoers to escape (mould grows in the dark).

Four short links: 27 March 2014

Understanding Image Processing, Sharing Data, Fixing Bad Science, and Delightful Dashboard

by Nat Torkington | @gnat | +Nat Torkington | March 27, 2014

2D Image Post-Processing Techniques and Algorithms (DIY Drones) — understanding how automated image matching and processing tools work means you can also get a better understanding how to shoot your images and what to prevent to get good matches.
Scientists Need to Learn to Share — despite science’s reputation for rigor, sloppiness is a substantial problem in some fields. You’re much more likely to check your work and follow best data-handling practices when you know someone is going to run your code and parse your data.
METRICS — Meta-Research Innovation Center at Stanford. John Ioannidis has a posse: connecting researchers who study weak science, running conferences, creating a “journal watch”, and engaging policy makers (says The Economist).
Grafana — elegant dashboard for Graphite (the real-time data graphing engine).
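In this setup Grafana draws what Graphite stores. The simplest way to get data into Graphite is Carbon's plaintext protocol: one line per datapoint in the form `<metric path> <value> <unix timestamp>`, usually sent over TCP to port 2003. The sketch below assumes a Carbon listener on `localhost:2003`. The helper names and the metric path in the usage comment are hypothetical.

```python
import socket
import time


def graphite_line(path, value, timestamp=None):
    """Format one datapoint in Carbon's plaintext protocol:
    '<metric path> <metric value> <unix timestamp>' followed by a newline."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(path, value, host="localhost", port=2003, timestamp=None):
    """Send a single datapoint to a Carbon plaintext listener over TCP."""
    line = graphite_line(path, value, timestamp)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


# Usage (requires a running Carbon daemon):
# send_metric("example.app.requests", 17)
```

Once datapoints arrive under a dotted path like `example.app.requests`, Grafana can graph that path directly from the Graphite data source.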