- Who Writes Wikipedia — reported widely as “bots make most of the contributions to Wikipedia”, when it really should have been “edits are a lousy measure of contributions”. The top bots are doing things like ensuring correctly formatted ISBN references and changing the names of navboxes: things which could be done by humans, but it would be a scandalous waste of human effort if they were. We analyse edits because it’s easy to get data on edits; analysis of value is a different matter.
- How I Failed and Finally Succeeded at Learning How to Code (The Atlantic) — great piece on teaching and learning programming, focusing on Project Euler. Kids are naturally curious. They love blank slates: a sandbox, a bag of LEGOs. Once you show them a little of what the machine can do they’ll clamor for more. They’ll want to know how to make that circle a little smaller or how to make that song go a little faster. They’ll imagine a game in their head and then relentlessly fight to build it. Along the way, of course, they’ll start to pick up all the concepts you wanted to teach them in the first place. And those concepts will stick because they learned them not in a vacuum, but in the service of a problem they were itching to solve.
- The Believing Brain — Belief comes quickly and naturally, skepticism is slow and unnatural, and most people have a low tolerance for ambiguity.
- 3D Printed Rocket — stainless steel rocket engine.
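To make the edits-versus-contributions point concrete, here is a minimal Python sketch that compares a group's share of edits with its share of added text. The `Revision` record, its `bytes_added` field, and all the numbers are hypothetical toy values, not real Wikipedia statistics, and bytes added is itself only a crude proxy for value.

```python
from dataclasses import dataclass


@dataclass
class Revision:
    editor: str
    is_bot: bool
    bytes_added: int  # hypothetical: net characters added by this revision


def bot_shares(revisions):
    """Return (bot share of edits, bot share of bytes added)."""
    total_edits = len(revisions)
    bot_edits = sum(1 for r in revisions if r.is_bot)
    total_bytes = sum(r.bytes_added for r in revisions)
    bot_bytes = sum(r.bytes_added for r in revisions if r.is_bot)
    return bot_edits / total_edits, bot_bytes / total_bytes


# Hypothetical toy data: many tiny bot fixes (ISBN formatting, navbox renames)
# and a couple of large human edits.
revisions = [Revision("ISBNBot", True, 4) for _ in range(8)] + [
    Revision("alice", False, 1200),
    Revision("bob", False, 800),
]

edit_share, byte_share = bot_shares(revisions)
print(f"bots: {edit_share:.0%} of edits, {byte_share:.1%} of bytes added")
```

The same revisions can tell opposite stories depending on which column you sum, which is the whole problem with counting edits.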
Internet Cafe Culture, Image Processing, Library Mining, and MediaWiki Parsing
- Chinese Internet Cafes (Bryce Roberts) — a good quick read. My note: people value the same things in Internet cafes that they value in public libraries, and the uses are very similar. They pose a similar threat to the already-successful, which is why public libraries are threatened in many Western countries.
- SIFT — the Scale Invariant Feature Transform library, built on OpenCV, is a method to detect distinctive, invariant image feature points, which can easily be matched between images to perform tasks such as object detection and recognition, or to compute geometrical transformations between images. The licensing seems dodgy: MIT code, but lots of “this isn’t a license to use the patent!” warnings in the LICENSE file. (via Joshua Schachter)
- The Secret Life of Libraries (Guardian) — I like the idea of the most-stolen books revealing something about a region; it’s an aspect of data revealing truth. For a while, Terry Pratchett was the most-shoplifted author in England, but newspapers rarely carried articles about him or mentioned his books (because they were genre fiction, not “real” literature). (via Brian Flaherty)
- Sweble — MediaWiki parser library. Until today, Wikitext had been poorly defined. There was no grammar, no defined processing rules, and no defined output like a DOM tree based on a well defined document object model. This is to say, the content of Wikipedia is stored in a format that is not an open standard. The format is defined by 5000 lines of PHP code (the parse function of MediaWiki). That code may be open source, but it is incomprehensible to most. That’s why there are 30+ failed attempts at writing alternative parsers. (via Dirk Riehle)
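What “a grammar with a defined tree output” buys you shows up even at toy scale. The sketch below handles exactly one Wikitext construct, internal links of the form `[[Target]]` or `[[Target|label]]`, and turns a string into a list of typed nodes instead of HTML. The `Text` and `InternalLink` node types and the `parse` function are hypothetical, and this does not use or mirror Sweble's API; a real parser has to cover templates, nesting, tables, and much more.

```python
import re
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Text:
    value: str


@dataclass
class InternalLink:
    target: str
    label: str


Node = Union[Text, InternalLink]

# Toy grammar for one construct only:  "[[" target ( "|" label )? "]]"
LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")


def parse(wikitext: str) -> List[Node]:
    """Split wikitext into plain-text nodes and internal-link nodes."""
    nodes: List[Node] = []
    pos = 0
    for m in LINK.finditer(wikitext):
        if m.start() > pos:  # plain text before this link
            nodes.append(Text(wikitext[pos:m.start()]))
        target = m.group(1).strip()
        # With no "|label" part, the link text defaults to the target.
        label = m.group(2) if m.group(2) is not None else target
        nodes.append(InternalLink(target, label))
        pos = m.end()
    if pos < len(wikitext):  # trailing plain text
        nodes.append(Text(wikitext[pos:]))
    return nodes


print(parse("See [[Terry Pratchett|Pratchett]] and [[Discworld]]."))
```

Because the output is a tree of typed nodes rather than rendered markup, other tools can walk it, transform it, or serialize it, which is the property the Sweble authors say Wikitext lacked.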
The online encyclopedia is a great resource for data scientists
Wikipedia is an essential tool in the data scientist's armory. Today's Strata Gem shows how it can be used to help computers distinguish between different senses of common words.
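One simple way to use Wikipedia for this, and not necessarily the approach the Strata Gem takes, is a simplified Lesk-style overlap: pick the candidate article whose text shares the most words with the sentence around the ambiguous word. The two article snippets, the stopword list, and the `disambiguate` helper below are hypothetical stand-ins for real article text.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "of", "in", "is", "by", "to", "its", "was", "for"}


def tokens(text):
    """Lowercase word tokens with a few stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def disambiguate(context, senses):
    """Return the sense title whose article text shares the most words with context."""
    ctx = Counter(tokens(context))

    def overlap(article_text):
        art = Counter(tokens(article_text))
        return sum((ctx & art).values())  # multiset intersection size

    return max(senses, key=lambda title: overlap(senses[title]))


# Hypothetical snippets standing in for the opening lines of two Wikipedia articles.
senses = {
    "Jaguar (animal)": "The jaguar is a large cat species native to the Americas, "
                       "a predator of the rainforest.",
    "Jaguar Cars": "Jaguar is a British manufacturer of luxury cars, "
                   "famous for its sports car engine designs.",
}

print(disambiguate("The jaguar stalked its prey through the rainforest", senses))
print(disambiguate("He bought a jaguar with a V12 engine and leather seats", senses))
```

Raw word overlap is crude; weighting words by rarity or using full article text and link structure would do much better, but the shape of the idea is the same.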
- A Room to Let in Old Aldgate — a lovely collection of photographs of lost buildings from The Society for Photographing Relics of Old London. Think of them as the Wayback Machine of their day. (via Fiona Rigby on Twitter)
- Wikipedia Fundraising A/B Tests — get a glimpse into the science that resulted in Jimmy Wales’s hollow haunted gaze staring at you with the eerie intensity of a creepy hobo talking about how tasty human liver is.
- It Takes A Lot of Money to Stay in Business (Ponoko) — guest blog posts by Chris Anderson on the lessons and rules of maker businesses. Most Maker businesses that I’ve talked to have to hold parts inventory closer to 25% of their annual sales.
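A common way to read a banner test like the Wikipedia fundraising ones is a two-proportion z-test on donation rates. This is a minimal sketch under that assumption: the counts are invented, and nothing here claims it is the statistic the fundraising team actually used.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: both variants convert at the same rate."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # rate under the null hypothesis
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical numbers: banner A (plain text) vs banner B (founder photo),
# each shown to 100,000 readers.
z, p = two_proportion_z_test(conv_a=200, n_a=100_000, conv_b=260, n_b=100_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With donation rates this low, you need very large audiences before a fraction-of-a-percent difference between banners stands out from noise, which is one reason a site with Wikipedia's traffic can afford to test so aggressively.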
Long Tail, Copyright vs Preservation, Diminished Reality, and Augmented Data
- Mechanical Turk Requester Activity: The Insignificance of the Long Tail — For Wikipedia we have the 1% rule, where 1% of the contributors (this is 0.003% of the users) contribute two thirds of the content. In the Causes application on Facebook, there are 25 million users, but only 1% of them contribute a donation. [...] The lognormal distribution of activity also shows that requesters increase their participation exponentially over time: They post a few tasks, they get the results. If the results are good, they increase by a percentage the size of the tasks that they post next time. This multiplicative behavior is the basic process that generates the lognormal distribution of activity.
- Copyright Destroying Historic Audio — so says the Library of Congress. Were copyright law followed to the letter, little audio preservation would be undertaken. Were the law strictly enforced, it would brand virtually all audio preservation as illegal. Copyright laws related to preservation are neither strictly followed nor strictly enforced. Consequently, some audio preservation is conducted.
- Diminished Reality (Ray Kurzweil) — removes objects from video in real time. Great name, “diminished reality”. (via Andy Baio)
- Data Enrichment Service — using linked government data to augment text with annotations and links. (via Jo Walsh on Twitter)
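The multiplicative mechanism in the Mechanical Turk quote can be checked with a small simulation: every requester rescales their next batch by a random percentage, so the log of the final size is a sum of independent steps and comes out roughly normal, which makes the size itself roughly lognormal. The starting size of 10 tasks, the 0.8 to 1.3 growth range, the round count, and the `simulate_requesters` helper are all made-up assumptions.

```python
import math
import random
import statistics


def simulate_requesters(n_requesters=10_000, rounds=50, seed=42):
    """Each hypothetical requester starts with 10 tasks and, every round, rescales
    the next batch by a random factor (grow after good results, shrink after bad)."""
    rng = random.Random(seed)
    sizes = []
    for _ in range(n_requesters):
        size = 10.0
        for _ in range(rounds):
            size *= rng.uniform(0.8, 1.3)  # multiplicative update, not additive
        sizes.append(size)
    return sizes


sizes = simulate_requesters()
logs = [math.log(s) for s in sizes]

# The log of a product is a sum of independent terms, so by the central limit
# theorem log(size) is roughly normal, i.e. size is roughly lognormal.
print("mean of sizes:  ", round(statistics.mean(sizes)))
print("median of sizes:", round(statistics.median(sizes)))
print("mean and median of log(size):",
      round(statistics.mean(logs), 2), round(statistics.median(logs), 2))
```

The raw sizes should show a mean well above the median (the long right tail), while the logged sizes stay close to symmetric. Swapping the multiplication for an addition of a random amount would instead give a roughly normal distribution of sizes with no heavy tail.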
BBC Machine Learning, Wikipedia for History, Nuggets from Websites, and Lawbreaking Robots
- BBC Jobs — looking for someone to devise advanced machine intelligence techniques to infer high level classification metadata of audio and video content from low-level features extracted from it. (via mattb on Delicious)
- A History of the Iraq War Through Wikipedia Changelogs — printed and bound volumes of the Wikipedia changelogs during the Iraq war. This is historiography. This is what culture actually looks like: a process of argument, of dissenting and accreting opinion, of gradual and not always correct codification. And for the first time in history, we’re building a system that, perhaps only for a brief time but certainly for the moment, is capable of recording every single one of those infinitely valuable pieces of information. Everything should have a history button. We need to talk about historiography, to surface this process, to challenge absolutist narratives of the past, and thus, those of the present and our future. (via Flowing Data)
- Nuggetize — pulls highlights out of a page before you visit it. (via titine on Delicious)
- Antimov — SparkFun is running a contest where a robot violates one of Asimov’s three laws (not the one about hurting people, though). I am in LOVE with the logo; check it out.
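“Everything should have a history button” is easy to sketch: store every revision instead of overwriting, and compute diffs between neighbouring revisions on demand. This minimal Python version uses the standard library's `difflib.unified_diff`; the `Page` class and the sample revisions are hypothetical.

```python
import difflib
from dataclasses import dataclass, field
from typing import List


@dataclass
class Page:
    """A page that never overwrites: every save appends a revision."""
    revisions: List[str] = field(default_factory=list)

    def save(self, text: str) -> None:
        self.revisions.append(text)

    def history(self) -> List[str]:
        """Unified diffs between consecutive revisions: the 'history button'."""
        diffs = []
        for i in range(1, len(self.revisions)):
            old = self.revisions[i - 1].splitlines()
            new = self.revisions[i].splitlines()
            diff = difflib.unified_diff(old, new, f"rev{i - 1}", f"rev{i}", lineterm="")
            diffs.append("\n".join(diff))
        return diffs


page = Page()
page.save("The war began in 2003.")
page.save("The war began in March 2003.")
page.save("The invasion began on 20 March 2003.")
for d in page.history():
    print(d, end="\n\n")
```

Storing full snapshots keeps the sketch simple; a real wiki would also record who made each change and why, which is where most of the historiographical value lives.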