ENTRIES TAGGED "Wikipedia"

Strata Week: Investors circle big data

Big funding news for data startups, a new verification tool for Wikipedia, and Angry Birds takes down the economy.

This week's data news includes funding announcements from a number of data startups, a new real-time research tool for Ushahidi and Wikipedia, and calculations about the amount of work time Americans waste on Angry Birds.

Four short links: 8 June 2011

education, wikipedia, metrics, brain, science, 3d, fabbing, @fourshort

  1. Who Writes Wikipedia — reported widely as “bots make most of the contributions to Wikipedia”, but which really should have been “edits are a lousy measure of contributions”. The top bots are doing things like ensuring correctly formatted ISBN references and changing the names of navboxes–things which could be done by humans but which it would be a scandalous waste of human effort if they were. We analyse edits because it’s easy to get data on edits; analysis of value is a different matter.
  2. How I Failed and Finally Succeeded at Learning How to Code (The Atlantic) — great piece on teaching and learning programming, focusing on Project Euler. Kids are naturally curious. They love blank slates: a sandbox, a bag of LEGOs. Once you show them a little of what the machine can do they’ll clamor for more. They’ll want to know how to make that circle a little smaller or how to make that song go a little faster. They’ll imagine a game in their head and then relentlessly fight to build it. Along the way, of course, they’ll start to pick up all the concepts you wanted to teach them in the first place. And those concepts will stick because they learned them not in a vacuum, but in the service of a problem they were itching to solve.
  3. The Believing Brain — Belief comes quickly and naturally, skepticism is slow and unnatural, and most people have a low tolerance for ambiguity.
  4. 3D Printed Rocket — stainless steel rocket engine.
Strata Week: The mortality rate of URLs

Parsing link rot, visualizing Wikipedia edits, and deconstructing autocorrect

In the latest Strata Week: How quickly do URLs die? Where in the world are Wikipedia editors? How does the iPhone autocorrect work (or not)?

Four short links: 2 May 2011

Internet Cafe Culture, Image Processing, Library Mining, and MediaWiki Parsing

  1. Chinese Internet Cafes (Bryce Roberts) — a good quick read. My note: people valued the same things in Internet cafes that they value in public libraries, and the uses are very similar. They pose a similar threat to the already-successful, which is why public libraries are threatened in many Western countries.
  2. SIFT — the Scale Invariant Feature Transform library, built on OpenCV, is a method to detect distinctive, invariant image feature points, which easily can be matched between images to perform tasks such as object detection and recognition, or to compute geometrical transformations between images. The licensing seems dodgy–MIT code but lots of “this isn’t a license to use the patent!” warnings in the LICENSE file. (via Joshua Schachter)
  3. The Secret Life of Libraries (Guardian) — I like the idea of the most-stolen-books revealing something about a region; it’s an aspect of data revealing truth. For a while, Terry Pratchett was the most-shoplifted author in England but newspapers rarely carried articles about him or mentioned his books (because they were genre fiction not “real” literature). (via Brian Flaherty)
  4. Sweble — MediaWiki parser library. Until today, Wikitext had been poorly defined. There was no grammar, no defined processing rules, and no defined output like a DOM tree based on a well defined document object model. This is to say, the content of Wikipedia is stored in a format that is not an open standard. The format is defined by 5000 lines of php code (the parse function of MediaWiki). That code may be open source, but it is incomprehensible to most. That’s why there are 30+ failed attempts at writing alternative parsers. (via Dirk Riehle)
Four short links: 25 March 2011

Passionate Virtuosity, Developer Relations, Beautiful Wikipedia, and Paper Recommendations

  1. Bruce Sterling at SxSW (YouTube) — call to arms for “passionate virtuosity”. (via Mike Brown)
  2. Developer Support Handbook — Pamela Fox’s collected wisdom from years of doing devrel at Google.
  3. Wikipedia Beautifier — Chrome plugin that makes Wikipedia easier on the eyes.
  4. science.io — an open science community. Comment on, recommend and submit papers. Get up-to-date on a research topic. Follow a journal or an author. science.I/O is in beta and is currently focused on Computer Science.

Strata Gems: Use Wikipedia as training data

The online encyclopedia is a great resource for data scientists

Wikipedia is an essential tool in the data scientist's armory. Today's Strata Gem shows how it can be used to help computers distinguish between different senses of common words.

Four short links: 16 November 2010

Preserving History, Jimmy's Thousand Edit Stare, Maker Businesses, and Mobile Javascript

  1. A Room to Let in Old Aldgate — a lovely collection of photographs of lost buildings from The Society for Photographing Relics of Old London. Think of them as the Wayback Machine of their day. (via Fiona Rigby on Twitter)
  2. Wikipedia Fundraising A/B Tests — get a glimpse into the science that resulted in Jimmy Wales’s hollow haunted gaze staring at you with the eerie intensity of a creepy hobo talking about how tasty human liver is.
  3. It Takes A Lot of Money to Stay in Business (Ponoko) — guest blogs by Chris Anderson on the lessons and rules of maker businesses. Most Maker businesses that I’ve talked to have to hold parts inventory closer to 25% of their annual sales.
  4. Sencha Touch — mobile multitouch Javascript toolkit, now fully GPLed. (via Simon St Laurent)
Four short links: 15 October 2010

Long Tail, Copyright vs Preservation, Diminished Reality, and Augmented Data

  1. Mechanical Turk Requester Activity: The Insignificance of the Long Tail — For Wikipedia we have the 1% rule, where 1% of the contributors (this is 0.003% of the users) contribute two thirds of the content. In the Causes application on Facebook, there are 25 million users, but only 1% of them contribute a donation. [...] The lognormal distribution of activity, also shows that requesters increase their participation exponentially over time: They post a few tasks, they get the results. If the results are good, they increase by a percentage the size of the tasks that they post next time. This multiplicative behavior is the basic process that generates the lognormal distribution of activity.
  2. Copyright Destroying Historic Audio — so says the Library of Congress. Were copyright law followed to the letter, little audio preservation would be undertaken. Were the law strictly enforced, it would brand virtually all audio preservation as illegal. Copyright laws related to preservation are neither strictly followed nor strictly enforced. Consequently, some audio preservation is conducted.
  3. Diminished Reality (Ray Kurzweil) — removes objects from video in real time. Great name, “diminished reality”. (via Andy Baio)
  4. Data Enrichment Service — using linked government data to augment text with annotations and links. (via Jo Walsh on Twitter)
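
The multiplicative process quoted in the first link is easy to try out. The sketch below is an illustration, not the original analysis: the population size, number of rounds, starting task size, and the uniform range for the percentage change are all made-up assumptions. Each simulated requester repeatedly scales their task size by a random percentage, so the log of the final size is a sum of many small independent terms and ends up roughly normal, which is what makes the sizes themselves roughly lognormal.

```python
import math
import random
import statistics


def simulate_requester(rounds, rng, initial_size=10.0, low=-0.2, high=0.3):
    """Final task size after `rounds` percentage-based updates.

    All parameter values are hypothetical; only the multiplicative
    update rule comes from the quoted description.
    """
    size = initial_size
    for _ in range(rounds):
        # Grow or shrink by a random percentage of the current size.
        size *= 1.0 + rng.uniform(low, high)
    return size


def simulate_population(n_requesters, rounds, rng):
    """Final task sizes for a population of independent requesters."""
    return [simulate_requester(rounds, rng) for _ in range(n_requesters)]


def skewness(values):
    """Population skewness: near 0 for symmetric data, positive for a long right tail."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so repeated runs match
    sizes = simulate_population(5000, 50, rng)
    logs = [math.log(s) for s in sizes]
    print("mean size:          ", round(statistics.fmean(sizes), 1))
    print("median size:        ", round(statistics.median(sizes), 1))
    print("skewness of sizes:  ", round(skewness(sizes), 2))
    print("skewness of log(sz):", round(skewness(logs), 2))
```

The raw sizes should show a long right tail (mean well above the median, strongly positive skewness), while the logged sizes should look close to symmetric. That pattern is consistent with a lognormal distribution, though a skewness check alone does not prove it.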
Four short links: 17 September 2010

BBC Machine Learning, Wikipedia for History, Nuggets from Websites, and Lawbreaking Robots

  1. BBC Jobs — looking for someone to devise advanced machine intelligence techniques to infer high level classification metadata of audio and video content from low-level features extracted from it. (via mattb on Delicious)
  2. A History of the Iraq War Through Wikipedia Changelogs — printed and bound volumes of the Wikipedia changelogs during the Iraq war. This is historiography. This is what culture actually looks like: a process of argument, of dissenting and accreting opinion, of gradual and not always correct codification. And for the first time in history, we’re building a system that, perhaps only for a brief time but certainly for the moment, is capable of recording every single one of those infinitely valuable pieces of information. Everything should have a history button. We need to talk about historiography, to surface this process, to challenge absolutist narratives of the past, and thus, those of the present and our future. (via Flowing Data)
  3. Nuggetize — pulls highlights out of a page before you visit it. (via titine on Delicious)
  4. Antimov — SparkFun running contest where a robot violates one of Asimov’s three laws (not the one about hurting people though). I am in LOVE with the logo, check it out.