"crowdsourcing" entries

Yancey Strickler on Kickstarter and public benefit corporations

The O’Reilly Solid Podcast: Kickstarter’s CEO on different models for viewing a company’s success.

Subscribe to the O’Reilly Solid Podcast for insight and analysis about the Internet of Things and the worlds of hardware, software, and manufacturing.

Kickstarter is one of just a handful of large companies that have become public benefit corporations — committing themselves legally to social as well as financial goals.

In making the transformation, Kickstarter’s leaders have taken a pragmatic, active position in promoting social good — neither purely philanthropic nor purely profit driven.

In this episode of the Solid Podcast, David Cranor and I talk with Kickstarter’s co-founder and CEO, Yancey Strickler, about his decision to take the company through the public benefit process and his promise not to go through an IPO.

Strickler will be among the speakers at the Next:Economy summit, November 12-13, 2015, in San Francisco.

Discussion points:

  • Kickstarter’s reasoning behind its decision not to go public. Why not just sell the company and devote the proceeds to charity?
  • The difference between a B corp and a public benefit corporation
  • The “public good” principles in Kickstarter’s Benefit Corporation charter
  • Determining metrics that can quantify public benefit goals
  • Strickler’s thoughts on how Kickstarter’s PBC designation might influence a corporate model “different than hyper-growth, hyper-capitalist models that aren’t good for anyone other than people investing money”

Four short links: 2 June 2015

Toyota Code, Sapir-Whorf-Emoji, Crowdsourcing Formal Proof, and Safety-Critical Code

  1. Toyota’s Spaghetti Code — Toyota had more than 10,000 global variables. And he was critical of Toyota watchdog supervisor — software to detect the death of a task — design. He testified that Toyota’s watchdog supervisor ‘is incapable of ever detecting the death of a major task. That’s its whole job. It doesn’t do it. It’s not designed to do it.’ (via @qrush)
  2. Google’s Design Icons (Kevin Marks) — Google’s design icons distinguish eight kinds of airline seats but have none for trains or buses.
  3. Verigames — DARPA-funded game to crowdsource elements of formal proofs. (via Network World)
  4. 10 Rules for Writing Safety-Critical Code — which I can loosely summarize as “simple = safer, use the built-in checks, don’t play with fire.”

Exploring methods in active learning

Tips on how to build effective human-machine hybrids, from crowdsourcing expert Adam Marcus.

In a recent O’Reilly webcast, “Crowdsourcing at GoDaddy: How I Learned to Stop Worrying and Love the Crowd,” Adam Marcus explains how to mitigate common challenges of managing crowd workers, how to make the most of human-in-the-loop machine learning, and how to establish effective and mutually rewarding relationships with workers. Marcus is the director of data on the Locu team at GoDaddy, where the “Get Found” service provides businesses with a central platform for managing their online presence and content.

In the webcast, Marcus uses practical examples from his experience at GoDaddy to reveal helpful methods for how to:

  • Offset the inevitability of wrong answers from the crowd
  • Develop and train workers through a peer-review system
  • Build a hierarchy of trusted workers (see the sketch after this list)
  • Make crowd work inspiring and enable upward mobility
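
A minimal sketch of the peer-review and trust-hierarchy ideas above, with invented thresholds and field names (this is not Locu’s actual system): track how often a worker’s answers survive peer review, and promote consistently accurate workers into reviewing roles.

```python
def update_trust(worker, agreed):
    """Record one peer review and apply an illustrative promotion rule."""
    worker["reviews"] += 1
    worker["agreements"] += int(agreed)
    accuracy = worker["agreements"] / worker["reviews"]
    # Hypothetical rule: enough reviews plus high agreement turns a
    # labeler into a reviewer who checks other workers' answers.
    if worker["reviews"] >= 20 and accuracy >= 0.9:
        worker["role"] = "reviewer"
    return accuracy

worker = {"reviews": 0, "agreements": 0, "role": "labeler"}
for _ in range(20):
    update_trust(worker, agreed=True)
print(worker["role"])  # reviewer
```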

What to do when humans get it wrong

It turns out there is a simple way to offset human error: redundantly ask people the same questions. Marcus explains that when you ask five different people the same question, there are some creative ways to combine their responses; the simplest is a majority vote.
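
A minimal sketch of that redundancy idea, with made-up labels: gather several answers to the same question, keep the majority answer, and treat the level of agreement as a rough confidence score.

```python
from collections import Counter

def majority_vote(answers):
    """Return the most common answer and the fraction of workers who gave it."""
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / len(answers)

# Five workers answer the same question, as in Marcus's example.
answers = ["pizzeria", "pizzeria", "restaurant", "pizzeria", "cafe"]
label, agreement = majority_vote(answers)
print(label, agreement)  # pizzeria 0.6
```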

Four short links: 11 February 2015

Crowdsourcing Working, etcd DKVS, Psychology Progress, and Inferring Logfile Rules

  1. Crowdsourcing Isn’t Broken — great rundown of ways to keep crowdsourcing on track. As with open sourcing something, just throwing open the doors and hoping for the best has a low probability of success.
  2. etcd Hits 2.0 — first major stable release of an open source, distributed, consistent key-value store for shared configuration, service discovery, and scheduler coordination.
  3. You Can’t Play 20 Questions With Nature and Win (PDF) — There is, I submit, a view of the scientific endeavor that is implicit (and sometimes explicit) in the picture I have presented above. Science advances by playing 20 questions with nature. The proper tactic is to frame a general question, hopefully binary, that can be attacked experimentally. Having settled that bits-worth, one can proceed to the next. The policy appears optimal – one never risks much, there is feedback from nature at every step, and progress is inevitable. Unfortunately, the questions never seem to be really answered, the strategy does not seem to work. An old paper, but still resonant today. (via Mind Hacks)
  4. Sequence: Automated Analyzer for Reducing 100k Messages to 10s of Patterns — induces patterns from the examples in log files.

Four short links: 11 December 2014

Crowdsourcing Framework, Data Team Culture, Everybody Scrolls, and Honeypot Data

  1. Hive — open source crowdsourcing framework from NYT Labs.
  2. Prezi Data Team Culture — good docs on logging, metrics, etc. The vision is a great place to start.
  3. Scroll Behaviour Across the Web (Chartbeat) — nobody reads above the fold; they immediately scroll.
  4. threat_research (github) — shared raw data and stats from honeypots.

Four short links: 21 August 2014

Open Data Glue, Smithsonian Crowdsourcing, MIT Family Creativity, and Hardware Owie

  1. Dat — an open source project that provides a streaming interface between every file format and data storage backend. See the Wired piece on it.
  2. Smithsonian Crowdsourcing Transcription (Smithsonian) — 49 volunteers transcribed 200 pages of correspondence between the Monuments Men in a week. Soon it’ll be mathematics test questions: “if 49 people transcribe 200 pages in 7 days, how many weeks will it take …”
  3. MIT Guide to Family CompSci Sessions — This guide is for educators, community center staff, and volunteers interested in engaging their young people and their families to become designers and inventors in their community.
  4. What to Do When You Screw up 2,000 Orders (SparkFun) — even hardware companies need to do retrospectives.

Four short links: 1 August 2014

Data Storytelling Tools, Massive Dataset Mining, Failed Crowdsourcing, and IoT Networking

  1. Miso — Dataset, a JavaScript client-side data management and transformation library; Storyboard, a state and flow-control management library; and d3.chart, a framework for creating reusable charts with d3.js. All open source, designed to expedite the creation of high-quality interactive storytelling and data visualisation content.
  2. Mining of Massive Datasets (PDF) — book by Stanford profs, focuses on data mining of very large amounts of data, that is, data so large it does not fit in main memory. Because of the emphasis on size, many of our examples are about the Web or data derived from the Web. Further, the book takes an algorithmic point of view: data mining is about applying algorithms to data, rather than using data to “train” a machine-learning engine of some sort.
  3. Lessons from Iceland’s Failed Crowdsourced Constitution (Slate) — Though the crowdsourcing moment could have led to a virtuous deliberative feedback loop between the crowd and the Constitutional Council, the latter did not seem to have the time, tools, or training necessary to process carefully the crowd’s input, explain its use of it, let alone return consistent feedback on it to the public.
  4. Thread a ZigBee Killer? — Thread is Nest’s home automation networking stack, which can use the same hardware components as ZigBee but is not compatible with it, and is not open source. The Novell NetWare of Things. Nick Hunn makes the argument that Google (via Nest) are taking aim at ZigBee: it’s Google and Nest saying “ZigBee doesn’t work.”

Four short links: 8 July 2014

Virtual Economies, Resource UAVs, Smarter Smaller Crowds, and Scaling Business

  1. Virtual Economies — new book from MIT Press on economics in games. The book will enable developers and designers to create and maintain successful virtual economies, introduce social scientists and policy makers to the power of virtual economies, and provide a useful guide to economic fundamentals for students in other disciplines.
  2. Resource Industry UAV Conference Presentations — collection of presentations from a recent resources industry conference. Includes UaaS: UAVs as a Service. (via DIY Drones)
  3. The Wisdom of Smaller, Smarter Crowds — in domains in which some crowd members have demonstrably more skill than others, smart sub-crowds could possibly outperform the whole. The central question this work addresses is whether such smart subsets of a crowd can be identified a priori in a large-scale prediction contest that has substantial skill and luck components. (via David Pennock) A toy version of the idea is sketched after this list.
  4. Larry and Sergey with Vinod (YouTube) — see transcription. I really liked Page’s point about scaling the number of things that companies do, and the constraints on such scaling.
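
A toy illustration of the sub-crowd idea from link 3, with all numbers invented: rank forecasters by error on practice questions, keep the best few as the “smart” sub-crowd, and compare its average forecast with the full crowd’s.

```python
import random

random.seed(0)

# 50 forecasters with different skill levels; the true answer is 10.0.
truth = 10.0
noise = [random.uniform(0.5, 5.0) for _ in range(50)]

def forecast(i):
    """One forecaster's noisy estimate of the true value."""
    return truth + random.gauss(0, noise[i])

# Rank forecasters by average error on 20 practice questions,
# then keep the 10 best as the "smart" sub-crowd.
practice_error = [
    sum(abs(forecast(i) - truth) for _ in range(20)) / 20
    for i in range(50)
]
smart = sorted(range(50), key=lambda i: practice_error[i])[:10]

# On a new question, the skilled sub-crowd's average usually lands
# closer to the truth than the full crowd's.
full_avg = sum(forecast(i) for i in range(50)) / 50
smart_avg = sum(forecast(i) for i in smart) / 10
print(abs(full_avg - truth), abs(smart_avg - truth))
```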

Four short links: 3 April 2014

Github for Data, Open Laptop, Crowdsourced Analysis, and Open Source Scraping

  1. dat — github-like tool for data, still v. early. It’s overdue. (via Nelson Minar)
  2. Novena Open Laptop — Bunnie Huang’s laptop goes on sale.
  3. Crowd Forecasting (NPR) — How is it possible that a group of average citizens doing Google searches in their suburban town homes can outpredict members of the United States intelligence community with access to classified information?
  4. Portia — open source visual web scraping tool.

Crowdsourcing feature discovery

Companies gain access to more than algorithms: models that incorporate ideas generated by teams of data scientists.

Data scientists were among the earliest and most enthusiastic users of crowdsourcing services. Lukas Biewald noted in a recent talk that one of the reasons he started CrowdFlower was that, as a data scientist, he got frustrated with having to create training sets for many of the problems he faced. More recently, companies have been experimenting with active learning (humans take care of uncertain cases, models handle the routine ones). Along those lines, Adam Marcus described in detail how Locu uses crowdsourcing services to perform structured extraction (converting semi-structured and unstructured data into structured data).
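
A minimal sketch of that human-in-the-loop split, using scikit-learn, toy data, and an arbitrary confidence threshold (all my choices, not anything the companies above describe): the model keeps the predictions it is confident about and routes the rest to people.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Train a toy classifier standing in for whatever production model you run.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
model = LogisticRegression().fit(X, y)

# Route new examples: confident predictions stay automated,
# uncertain ones go to crowd workers (threshold is illustrative).
X_new = rng.normal(size=(10, 2))
confidence = model.predict_proba(X_new).max(axis=1)
to_crowd = confidence < 0.8
print(f"{(~to_crowd).sum()} handled by the model, {to_crowd.sum()} sent to people")
```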

Another area where crowdsourcing is popping up is feature engineering and feature discovery. Experienced data scientists will attest that generating features is as important as (if not more important than) the choice of algorithm. The startup CrowdAnalytix uses public/open data sets to help companies enhance their analytic models. The company has access to several thousand data scientists spread across 50 countries and counts a major social network among its customers. Its current focus is on providing “enterprise risk quantification services to Fortune 1000 companies”.

CrowdAnalytix breaks projects into two phases: feature engineering and modeling. During the feature engineering phase, data scientists are presented with a problem (the dependent variable(s)) and are asked to propose features (predictors), along with brief explanations of why they might prove useful. A panel of judges evaluates features based on the accompanying evidence and explanations. Typically 100+ teams enter this phase of the project, and 30+ teams propose reasonable features.
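
One way to picture the judging step (a generic sketch on toy data, not CrowdAnalytix’s actual process) is to score each proposed feature by the cross-validated lift it adds to a baseline model:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Toy data where the target depends on an interaction the baseline misses.
rng = np.random.default_rng(1)
n = 300
baseline = rng.normal(size=(n, 2))
y = (baseline[:, 0] * baseline[:, 1] > 0).astype(int)

# Candidate features "proposed by the crowd" (names are made up).
candidates = {
    "product": (baseline[:, 0] * baseline[:, 1]).reshape(-1, 1),
    "noise": rng.normal(size=(n, 1)),  # a proposal that shouldn't help
}

base = cross_val_score(LogisticRegression(), baseline, y, cv=5).mean()
for name, col in candidates.items():
    X = np.hstack([baseline, col])
    score = cross_val_score(LogisticRegression(), X, y, cv=5).mean()
    print(f"{name}: lift = {score - base:+.3f}")  # judge by added accuracy
```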
