TweetNLP — CMU open source natural language parsing tools for making sense of Tweets.
Interview with Google X Life Science’s Head (Medium) — I will have been here two years this March. In nineteen months we have been able to hire more than a hundred scientists to work on this. We’ve been able to build customized labs and get the equipment to make nanoparticles and decorate them and functionalize them. We’ve been able to strike up collaborations with MIT and Stanford and Duke. We’ve been able to initiate protocols and partnerships with companies like Novartis. We’ve been able to initiate trials like the baseline trial. This would be a good decade somewhere else. The power of focus and money.
Schooloscope Open Data Post-Mortem — The case of Schooloscope and the wider question of public access to school data challenges the belief that sunlight is the best disinfectant, that government transparency would always lead to better government, better results. It challenges the sentiments that see data as value-neutral and its representation as devoid of politics. In fact, access to school data exposes a sharp contrast between the private interest of the family (best education for my child) and the public interest of the government (best education for all citizens).
M-Lab Observatory — explorable data on the data experience (RTT, upload speed, etc) across different ISPs in different geographies over time.
Gearpump — Intel’s “actor-driven streaming framework”, initial benchmarks shows that we can process 2 million messages/second (100 bytes per message) with latency around 30ms on a cluster of 4 nodes.
Foundations of Data Science (PDF) — These notes are a first draft of a book being written by Hopcroft and Kannan [of Microsoft Research] and in many places are incomplete. However, the notes are in good enough shape to prepare lectures for a modern theoretical course in computer science.
Project Naptha — automatically applies state-of-the-art computer vision algorithms on every image you see while browsing the web. The result is a seamless and intuitive experience, where you can highlight as well as copy and paste and even edit and translate the text formerly trapped within an image. Chrome extension. (via Anil Dash)
Garbage Trucks and FedEx Vans (IEEE) — Foo alum, Ian Wright, found traction for his electric car biz by selling powertrains for garbage trucks and Fedex vans. Trucks have 20-30y lifetime, but powertrains are replaced several times; the trucks for fleets are custom; and “The average garbage truck in the U.S. spends $55,000 a year on fuel, and up to $30,000 a year on maintenance, mostly brake replacements.”
Microsoft’s Quantum Mechanics (MIT TR) — the race for the “topological qubit”, involving newly-discovered fundamental particles and large technology companies racing to be the first to make something that works.
Guidance Note on Uncertainty (PDF) –expert advice to IPCC scientists on identifying, quantifying, and communicating uncertainty. Everyone deals with uncertainty, but none are quite so ruthless in their pursuit of honesty about it as scientists. (via Peter Gluckman)
SparkFun Rapid Prototyping Lab — with links to some other expert advice on creative spaces. Some very obvious software parallels, too. E.g., this from Adam Savage’s advice: The right tool for the job – Despite his oft-cited declaration that ‘every tool is a hammer,’ Adam can usually be relied on to geek-out about purpose-built tools. If you’re having trouble learning a new skill, check that you’re using the right tools. The right tool is the one that does the hard work for you. There’s no point in dropping big bucks on tools you’re almost certainly not going to use, but don’t be afraid to buy the cheap version of the snap-setter, or leather punch, or tamper bit before trying to jerry-rig something that will end up making your life harder.
Dudes with Drones (The Atlantic) — ghastly title (“Bros with Bots”, “Bangers with Clangers”, and “Fratboys with Phat Toys” were presumably already taken), interesting article. San Diego is the Palo Alto of drones. Interesting to compare software startups with the hardware crews’ stance on the FAA. “We want them to regulate us,” Maloney says. “We want nothing more than a framework to allow us to continue to operate safely and legally.”
VCs Return to Backing Science Startups (NY Times) — industry and energy investment doubled this year, biotech up 26% in first half, but a lot of the investments are comically small and the risk remains acutely high.
dronecode — Linux Foundation common, shared open source platform for Unmanned Aerial Vehicles (UAVs). The platform has been adopted by many of the organizations on the forefront of drone technology, including 3DRobotics, DroneDeploy, HobbyKing, Horizon Ag, PrecisionHawk, Agribotics, and Walkera, among other.
TiVo Mega — 24TB of RAID storage, six tuners for capturing broadcasts. Which is rather like building the International Space Station and then hitching it to six horses for launch. Who at this point would make a $5k bet that everything you want to see on a TV will be broadcast by a cable company?
runswift — an in-browser client for compiling and running basic Swift functionality.
Maximum Happy Imagination (Matt Jones) — questioning the true vision of Marc Andreessen’s recent Twitter discourse on the great future that awaits us. His analogies run out in the 20th century when it comes to the political, social and economic implications of his maximum happy imagination.
The Mirrortocracy — It’s astonishing how many of the people conducting interviews and passing judgement on the careers of candidates have had no training at all on how to do it well. Aside from their own interviews, they may not have ever seen one. I’m all for learning on your own but at least when you write a program wrong it breaks. Without a natural feedback loop, interviewing mostly runs on myth and survivor bias.
Longitude Prize — six prize areas, Grand Challenge style, in clean flight, antibiotic resistance, dementia, food, water, and overcoming paralysis. Mysteriously none for library system that avoids DLL hell.
UK Copyright Law Permits Researchers to Data Mine — changes mean Copyright holders can require researchers to pay to access their content but cannot then restrict text or data mining for non-commercial purposes thereafter, under the new rules. However, researchers that use the text or data they have mined for anything other than a non-commercial purpose will be said to have infringed copyright, unless the activity has the consent of rights holders. In addition, the sale of the text or data mined by researchers is prohibited. The derivative works will be very interesting: if university mines the journals, finds new possibility for a Thing, is verified experimentally, is that Thing the university’s to license commercially for profit?
Efficient Online Summary of Microblogging Streams (PDF) — research paper. The algorithm we propose uses a word graph, along with optimization techniques such as decaying windows and pruning. It outperforms the baseline in terms of summary quality, as well as time and memory efficiency.
Statistical Shortcomings in Standard Math Libraries — or “Why C Derivatives Are Not Popular With Statistical Scientists”. The following mathematical functions are necessary for implementing any rudimentary statistics application; and yet they are general enough to have many applications beyond statistics. I hereby propose adding them to the standard C math library and to the libraries which inherit from it. For purposes of future discussion, I will refer to these functions as the Elusive Eight.
fail2ban — open source tool that scans logfiles for signs of malice, and triggers actions (e.g., iptables updates).
The Backlash Against Big Data contd. (Mike Loukides) — Learn to be a data skeptic. That doesn’t mean becoming skeptical about the value of data; it means asking the hard questions that anyone claiming to be a data scientist should ask. Think carefully about the questions you’re asking, the data you have to work with, and the results that you’re getting. And learn that data is about enabling intelligent discussions, not about turning a crank and having the right answer pop out.