- Suro (GitHub) — Netflix data pipeline service for large volumes of event data. (via Ben Lorica)
- NIPS Workshop on Data Driven Education — lots of research papers around machine learning, MOOC data, etc.
- Proofist — crowdsourced proofreading game.
- 3D-Printed Shoes (YouTube) — LeWeb talk from the founder of the company (Continuum Fashion). (via Brady Forrest)
AI Book, Science Superstars, Engineering Ethics, and Crowdsourced Science
- Society of Mind — Marvin Minsky’s book now Creative-Commons licensed.
- Collaboration, Stars, and the Changing Organization of Science: Evidence from Evolutionary Biology — The concentration of research output is declining at the department level but increasing at the individual level. […] We speculate that this may be due to changing patterns of collaboration, perhaps caused by the rising burden of knowledge and the falling cost of communication, both of which increase the returns to collaboration. Indeed, we report evidence that the propensity to collaborate is rising over time. (via Sciblogs)
- As Engineers, We Must Consider the Ethical Implications of our Work (The Guardian) — applies to coders and designers as well.
- Eyewire — a game to crowdsource the mapping of 3D structure of neurons.
Zombie Drones, Algebra Through Code, Data Toolkit, and Crowdsourcing Antibiotic Discovery
- Skyjack — drone that takes over other drones. Welcome to the Malware of Things.
- Bootstrap World — a curricular module for students ages 12-16, which teaches algebraic and geometric concepts through computer programming. (via Esther Wojcicki)
- Harvest — open source BSD-licensed toolkit for building web applications for integrating, discovering, and reporting data. Designed for biomedical data first. (via Mozilla Science Lab)
- Project ILIAD — crowdsourced antibiotic discovery.
As companies continue to use crowdsourcing, demand for people who know how to manage projects remains steady
A little over four years ago, I attended the first Crowdsourcing meetup at the offices of CrowdFlower (then called Dolores Labs). The crowdsourcing community has grown explosively since that initial gathering, and there are now conference tracks and conferences devoted to this important industry. At the recent CrowdConf, I found a community of professionals who specialize in managing a wide array of crowdsourcing projects.
Data scientists were early users of crowdsourcing services. I personally am most familiar with a common use case – the use of crowdsourcing to create labeled data sets for training machine-learning models. But as straightforward as it sounds, using crowdsourcing to generate training sets can be tricky – fortunately there are excellent papers and talks on this topic. At the most basic level, before embarking on a crowdsourcing project you should go through a simple checklist (among other things, make sure you have enough scale to justify engaging with a provider).
Beyond building training sets for machine learning, crowdsourcing has more recently been used to enhance the results of machine-learning models: in active learning, humans take care of the uncertain cases while models handle the routine ones. The use of ReCAPTCHA to digitize books is an example of this approach. On the flip side, analytics are being used to predict the outcome of crowd-based initiatives: researchers developed models to predict the success of Kickstarter campaigns 4 hours after their launch.
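The division of labor described above, where models handle routine cases and people take the uncertain ones, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Prediction` type, the `route_predictions` and `majority_vote` helpers, and the 0.8 confidence threshold are all hypothetical and do not come from any particular crowdsourcing provider or library.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Prediction:
    """Hypothetical record of one model output (not a real library type)."""
    item_id: str
    label: str
    confidence: float  # model's probability for its predicted label, 0..1


def route_predictions(predictions, threshold=0.8):
    """Split predictions into auto-accepted and crowd-reviewed lists.

    The threshold is an assumed cutoff; in practice it would be tuned
    against labeling cost and the error rate you can tolerate.
    """
    auto_accepted, needs_crowd = [], []
    for p in predictions:
        if p.confidence >= threshold:
            auto_accepted.append(p)   # routine case: trust the model
        else:
            needs_crowd.append(p)     # uncertain case: ask people
    return auto_accepted, needs_crowd


def majority_vote(labels):
    """Combine several workers' labels for one item by simple majority."""
    return Counter(labels).most_common(1)[0][0]


if __name__ == "__main__":
    preds = [
        Prediction("a", "spam", 0.95),
        Prediction("b", "ham", 0.55),
        Prediction("c", "spam", 0.80),
    ]
    auto, crowd = route_predictions(preds)
    print("auto-accepted:", [p.item_id for p in auto])
    print("sent to crowd:", [p.item_id for p in crowd])
    print("crowd label for b:", majority_vote(["ham", "spam", "ham"]))
```

Labels that come back from the crowd can then be added to the training set, so the model gradually needs human help on fewer items. Simple majority voting is the most basic way to combine redundant worker answers; weighting workers by their track record is a common refinement.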
Better Tutorials, Self-Talk, Better AI, and Visualised Mechanics
- pineapple.io — attempt to crowdsource rankings for tutorials for important products, so you’re not picking your way through Google search results littered with tutorials written by incompetent illiterates for past versions of the software.
- BBC Forum — American social psychologist Aleks Krotoski has been looking at how the internet affects the way we talk to ourselves. Podcast (available for next 30 days) from BBC. (via Vaughan Bell)
- Why Can’t My Computer Understand Me (New Yorker) — using anaphora as the basis of an intelligence test, as example of what AI should be striving for. It’s not just that contemporary A.I. hasn’t solved these kinds of problems yet; it’s that contemporary A.I. has largely forgotten about them. In Levesque’s view, the field of artificial intelligence has fallen into a trap of “serial silver bulletism,” always looking to the next big thing, whether it’s expert systems or Big Data, but never painstakingly analyzing all of the subtle and deep knowledge that ordinary human beings possess. That’s a gargantuan task— “more like scaling a mountain than shoveling a driveway,” as Levesque writes. But it’s what the field needs to do.
- 507 Mechanical Movements — an old basic engineering textbook, animated. Me gusta.
Good Dev, User-Hostile Patterns, Patent Victories, and Drone History
- What to Look For in Software Dev (Pamela Fox) — It’s important to find a job where you get to work on a product you love or problems that challenge you, but it’s also important to find a job where you will be happy inside their codebase – where you won’t be afraid to make changes and where there’s a clear process for those changes.
- The Slippery Slope to Dark Patterns — demonstrates and deconstructs determinedly user-hostile pieces of software which deliberately break Nielsen’s usability heuristics to make users agree to things they rationally wouldn’t.
- Victory Lap for Ask Patents (Joel Spolsky) — story of how a StackExchange board on patents helped bust a bogus patent. It’s crowdsourcing the prior art, and Joel shows how easy it is.
- The World as Fire-Free Zone (MIT Technology Review) — data analysis to identify “signature” of terrorist behaviour, civilian deaths from strikes in territories the US has not declared war on, empty restrictions on use. Again, it’s a test that, by design, cannot be failed. Good history of UAVs in warfare and the blowback from their lax use. Quoting retired General Stanley McChrystal: The resentment caused by American use of unmanned strikes … is much greater than the average American appreciates. They are hated on a visceral level, even by people who’ve never seen one or seen the effects of one.
In-Browser p2p, Thinking About The Future, Disruptive Tech, and Crowdsourcing Transcription
- ShareFest — peer-to-peer file sharing in the browser. Source on GitHub. (via Andy Baio)
- Media for Thinking the Unthinkable (Bret Victor) — “Right now, today, we can’t see the thing, at all, that’s going to be the most important 100 years from now.” We cannot see the thing. At all. But whatever that thing is — people will have to think it. And we can, right now, today, prepare powerful ways of thinking for these people. We can build the tools that make it possible to think that thing. (via Matt Jones)
- McKinsey Report on Disruptive Technologies (McKinsey) — the list: Mobile Internet; Automation of knowledge work; Internet of Things; Cloud technology; Advanced Robotics; Autonomous and near-autonomous vehicles; Next-generation genomics; Energy storage; 3D Printing; Advanced Materials; Advanced Oil and Gas exploration and recovery; Renewable energy.
- The Only Public Transcript of the Bradley Manning Trial Will be Produced on a Crowd-Funded Typewriter — [t]he fact that a volunteer stenographer is providing the only comprehensive source of information about such a monumental event is pretty absurd.
Quality and security drive adoption, but community is rising fast
I recently talked to two managers of Black Duck, the first company formed to help organizations deal with the licensing issues involved in adopting open source software. With Tim Yeaton, President and CEO, and Peter Vescuso, Executive Vice President of Marketing and Business Development, I discussed the seventh Future of Open Source survey, from which I’ll post a few interesting insights later. But you can look at the slides for yourself, so this article will focus instead on some of the topics we talked about in our interview. While I cite some ideas from Yeaton and Vescuso, many of the observations below are purely my own.
The spur to collaboration
One theme in the slides is the formation of consortia that develop software for entire industries. One recent example everybody knows about is OpenStack, but many industries have their own impressive collaboration projects, such as GENIVI in the auto industry.
What brings competitors together to collaborate? In the case of GENIVI, it’s the impossibility of any single company meeting consumer demand through its own efforts. Car companies typically take five years to put a design out to market, but customers are used to product releases more like those of cell phones, where you can find something enticingly new every six months. In addition, the range of useful technologies (Bluetooth, etc.) is so big that a company has to become expert at everything at once. Meanwhile, according to Vescuso, the average high-end car contains more than 100 million lines of code. So the pace and complexity of progress are driving the auto industry to work together.
All too often, the main force uniting competitors is the fear of another vendor and the realization that they can never beat a dominant vendor on its own turf. Open source becomes a way of changing the rules out from under the dominant player. OpenStack, for instance, took on VMware in the virtualization space and Amazon.com in the IaaS space. Android attracted phone manufacturers and telephone companies as a reaction to the iPhone.
A valuable lesson can be learned from the history of the Open Software Foundation, which was formed in reaction to an agreement between Sun and AT&T. In the late 1980s, Sun had become the dominant vendor of Unix, which was still being maintained by AT&T. Their combination panicked vendors such as Digital Equipment Corporation and Apollo Computer (you can already get a sense of how much good OSF did them), who promised to create a single, unified standard that would give customers increased functionality and more competition.
The name Open Software Foundation was deceptive, because it was never open. Instead, it was a shared repository into which various companies dumped bad code so they could cynically claim to be interoperable while continuing to compete against each other in the usual way. It soon ceased to exist in its planned form, but did survive in a fashion by merging with X/Open to become the Open Group, an organization of some significance because it maintains the X Window System. Various flavors of BSD failed to dislodge the proprietary Unix vendors, probably because each BSD team did its work in a fairly traditional, closed fashion. It remained up to Linux, a truly open project, to unify the Unix community and ultimately replace the closed Sun/AT&T partnership.
Collaboration can be driven by many things, therefore, but it usually takes place in one of two fashions. In the first, somebody throws out into the field some open source code that everybody likes, as Rackspace and NASA did to launch OpenStack, or IBM did to launch Eclipse. Less common is the GENIVI model, in which companies realize they need to collaborate to compete and then start a project.
A bigger pie for all
The first thing on most companies’ minds when they adopt open source is to improve interoperability and defend themselves against lock-in by vendors. The Future of Open Source survey indicates that the top reasons for choosing open source are its quality (slide 13) and security (slide 15). This is excellent news because it shows that the misconceptions about open source are shattering, and the arguments by proprietary vendors that they can ensure better quality and security will increasingly be seen as hollow.
Know Your HTTP, Digital Exploitation, Insecure Webcams, and CS Courses
- Know Your HTTP Posters (GitHub) — A0-posters about the HTTP protocol.
- Crowdserfing — when a large corp uses crowd-sourced volunteering for its own financial gain, without giving back. It offends my sense of reciprocity as well, but nobody is coerced into using Google Maps or contributing data to it. How do we decide what is “right”?
- Exposed Webcam Viewer — hotels in Russia, lobbies in California, and blinking lights in the darkness from all around the world. (via Hacker News)
- Beauty and Joy of Computing — an introductory computer science curriculum developed at the University of California, Berkeley, intended for non-CS majors at the high school junior through undergraduate freshman level. Uses Snap, a web-based implementation of Scratch.