- Chinese Internet Cafes (Bryce Roberts) — a good quick read. My note: people valued the same things in Internet cafes that they value in public libraries, and the uses are very similar. They pose a similar threat to the already-successful, which is why public libraries are threatened in many Western countries.
- SIFT — the Scale Invariant Feature Transform library, built on OpenCV, is a method to detect distinctive, invariant image feature points, which can easily be matched between images to perform tasks such as object detection and recognition, or to compute geometrical transformations between images. The licensing seems dodgy: MIT code, but lots of “this isn’t a license to use the patent!” warnings in the LICENSE file. (via Joshua Schachter)
- The Secret Life of Libraries (Guardian) — I like the idea of the most-stolen-books revealing something about a region; it’s an aspect of data revealing truth. For a while, Terry Pratchett was the most-shoplifted author in England but newspapers rarely carried articles about him or mentioned his books (because they were genre fiction not “real” literature). (via Brian Flaherty)
- Sweble — MediaWiki parser library. Until today, Wikitext had been poorly defined. There was no grammar, no defined processing rules, and no defined output like a DOM tree based on a well-defined document object model. This is to say, the content of Wikipedia is stored in a format that is not an open standard. The format is defined by 5000 lines of PHP code (the parse function of MediaWiki). That code may be open source, but it is incomprehensible to most. That’s why there are 30+ failed attempts at writing alternative parsers. (via Dirk Riehle)
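The SIFT item above says feature points can be matched between images. As a rough illustration of how that matching is commonly done, here is a minimal sketch of nearest-neighbour matching with a ratio test. It assumes descriptors are plain tuples of numbers; the toy 3-dimensional values are made up (real SIFT descriptors have 128 dimensions), and `match_descriptors` is a hypothetical helper, not a function from the linked library or from OpenCV.

```python
from math import dist  # Euclidean distance, Python 3.8+


def match_descriptors(query, train, ratio=0.75):
    """Return (query_index, train_index) pairs that pass the ratio test."""
    matches = []
    for qi, q in enumerate(query):
        # Rank every candidate descriptor by distance to this query descriptor.
        ranked = sorted((dist(q, t), ti) for ti, t in enumerate(train))
        if len(ranked) < 2:
            continue  # the ratio test needs a best and a second-best candidate
        (best, best_idx), (second, _) = ranked[0], ranked[1]
        # Keep the match only if it is clearly better than the runner-up;
        # ambiguous matches are dropped.
        if best < ratio * second:
            matches.append((qi, best_idx))
    return matches


# Toy descriptors, illustrative values only.
image_a = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.5, 0.5, 0.5)]
image_b = [(1.0, 0.1, 0.0), (0.0, 0.1, 1.0), (0.5, 0.5, 0.45), (0.5, 0.5, 0.55)]

matches = match_descriptors(image_a, image_b)
print(matches)
```

The third descriptor in `image_a` sits almost exactly between two candidates in `image_b`, so the ratio test rejects it rather than guessing.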
Internet Cafe Culture, Image Processing, Library Mining, and MediaWiki Parsing
PDP-11 Emulated, Crowdsourcing Culture, Deep Knowing, and Scientific Method
- 2010: The Year of Crowdsourcing Transcription — hasn’t finished yet, as NY Public Library shows. Cultural institutions are huge data sets that need human sensors to process, so we’ll be seeing a lot more of this in years to come as we light up thousands of years of written culture. (via Liza Daley)
- Programming the Commodore 64 — the loss of the total control that we had over our computers back when they were small enough that everything you needed to know would fit inside your head. It’s left me with a taste for grokking systems deeply and intimately, and that tendency is probably not a good fit for most modern programming, where you really don’t have time to go in and learn, say, Hibernate or Rails in detail: you just have to have the knack of skimming through a tutorial or two and picking up enough to get the current job done, more or less. I don’t mean to denigrate that: it’s an important and valuable skill. But it’s not one that moves my soul as Deep Knowing does. This is the kind of deep knowledge of TCP/IP and OS that devops is all about.
- Kids do Science — scientists let kids invent an experiment and write it up, and it’s published in Biology Letters. Teaching the method of science, not the facts currently in vogue, will give us a generation capable of making data-based decisions.
Tweets as Ads, Do Not Track, OnePage Site, and Lessons Learned
(the author apologizes for the late publication of this item)
- Twitter’s Biggest Problem: Tweets are Ads — having just been to my first social media marketing conference, I see what the author’s talking about. Would you want to pay for advertising in the middle of a sea of free ads? (via Hacker News)
- Safari and Do Not Track Support — now that there’s a technical mechanism for consumers to opt out, the next step is to mandate that publishers respect it. Problem: compliance with do-not-track is largely invisible, so there’s nothing like the feedback loop you get with Do Not Call lists where ANY telemarketer is instantly identifiable as a lawbreaker. Instead, you’ll only know Do Not Track is not working if you see useful advertisements. What the–?
- OnePager — a one-page website for libraries, attempting to focus the library on providing useful information rather than a lot of it. There’s a lesson here for almost every institution with a website. (via Nina Simon)
- Max Levchin’s Lessons Learned — some resonant ones: You can have successful teams where people hate but deeply respect each other; the opposite (love but not respect among team members) is a recipe for disaster.
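The Do Not Track item above turns on the point that compliance is invisible. A minimal sketch of what honouring the opt-out could look like on the publisher side, assuming the browser signals it with a `DNT: 1` request header (the header name comes from the Do Not Track proposal, not from the linked post). The tracker snippet and the function names are made up for illustration.

```python
# Hypothetical tracking script tag; the path is invented for this example.
TRACKER_SNIPPET = '<script src="/hypothetical-tracker.js"></script>'


def tracking_allowed(headers):
    """Return False when the request carries the Do Not Track opt-out."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    normalised = {name.lower(): value.strip() for name, value in headers.items()}
    return normalised.get("dnt") != "1"


def render_page(headers, body):
    """Append the tracking snippet only for visitors who have not opted out."""
    if tracking_allowed(headers):
        return body + TRACKER_SNIPPET
    return body


opted_out = render_page({"DNT": "1"}, "<p>news</p>")
default = render_page({"User-Agent": "ExampleBrowser"}, "<p>news</p>")
print(opted_out)
print(default)
```

Nothing in the opted-out page tells the visitor whether the header was honoured or ignored, which is exactly the missing feedback loop.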
Future Retrospective, Political Entrepreneurs, Library DRM, and In-Database Analytics
- A History of the Future in 100 Objects (Kickstarter) — blog+podcast+video+book project, to have future historians tell the story of our century in 100 objects. The BBC show that inspired it was brilliant, and I rather suspect this will be too. It’s a clever way to tell a story of the future (his hardest problem will be creating a single coherent narrative for the 21st century). What are the 100 objects that future historians will use to sum up our century? ‘Smart drugs’ that change the way we think? A fragment from suitcase nuke detonated in Shanghai? A wedding ring between a human and an AI? The world’s most expensive glass of water, returned from a private mission to an asteroid? (via RIG London weekly notes)
- Entrepreneurs Who Create Value vs Entrepreneurs Who Lock Up Value (Andy Kessler) — distinguishes between “political entrepreneurs” who leverage their political power to own something and then overcharge or tax the crap out of the rest of us to use it vs “market entrepreneurs” who recognize the price-to-value gap and jump in. Ignoring legislation, they innovate, disintermediate, compete, stay up all night coding, and offer something better and cheaper until the market starts to shift. My attention was particularly caught by this line: “for every stroke of the pen, for every piece of legislation, for every paid-off congressman, there now exists a price umbrella that overvalues what he or any political entrepreneur is doing.” (via Bryce Roberts)
- Harper-Collins Caps eBook Loans — The publisher wants to sell libraries DRMed ebooks that will self-destruct after 26 loans. Public libraries have always served and continue to serve those people who can’t access information on the purchase market. Jackass moves like these prevent libraries from serving those people in the future that we hope will come soon: the future where digital is default and print is premium. That premium may well be “the tentacles of soulless bottom-dwelling coprocephalic publishers can’t digitally destroy your purchase”. It’s worth noting that O’Reilly offers DRM-free PDFs of the books they publish, including mine. Own what you buy lest it own you. (via BoingBoing and many astonished library sources)
- MAD Lib — BSD-licensed open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical and machine learning methods for structured and unstructured data. (via Ted Leung)
Heritage Games, Unpredictable Publishing, Timezones, and Map Tiles
- DigitalKoot — Playing games in Digitalkoot fixes mistakes in our index of old Finnish newspapers. This greatly increases the accuracy of text-based searches of the newspaper archives. (via Springwise and Imran Ali on Twitter)
- Some Things That Need To Be Said (Amanda Hocking) — A.H. is selling a lot of copies of her ebooks, and she cautions against thinking hers is an easily reproduced model. First, “I am continuously overwhelmed by the amount of work I have to do that isn’t writing a book.” Middlemen give you time in exchange for money. Second, “By all accounts, he has done the same things I did, even writing in the same genre and pricing the books low. And he’s even a better writer than I am. So why am I selling more books than he is? I don’t know.” I’m reminded of Duncan Watts’s MusicLab work, which showed that “hits” aren’t predictable. It’s entirely possible to duplicate Amanda’s efforts and not replicate her success.
- A Literary Appreciation of the Olson Timezone Database — timezones are fickle political creations, and this is a wonderful tribute to the one database which ruled them all for 25 years.
- TileMill — a tool for cartographers to quickly and easily design maps for the web using custom data. Open source, built on Mapnik.
Twitter Mapped, Bibliographic Data Released, Babies Engadgeted, and Nat's Christmas Present Sorted
- A Day in the Life of Twitter (Chris McDowall) — all geo-tagged tweets from 24h of the Twitter firehose, displayed. Interesting things can be seen, such as Jakarta glowing as brightly as San Francisco. (via Chris’s sciblogs post)
- British Library Release 3M Open Bibliographic Records (OKFN) — This dataset consists of the entire British National Bibliography, describing new books published in the UK since 1950; this represents about 20% of the total BL catalogue, and we are working to add further releases.
- Gadgets for Babies (NY Times) — cry decoders, algorithmically enhanced rocking chairs, and (my favourite) “voice-activated crib light with womb sounds”. I can’t wait until babies can make womb sound playlists and share them on Twitter.
- GP2X Caanoo MAME/Console Emulator (ThinkGeek) — perfect Christmas present for, well, me. Emulates classic arcade machines and microcomputers, including my nostalgia fetish object, the Commodore 64. (via BoingBoing’s Gift Guide)
Long Tail, Copyright vs Preservation, Diminished Reality, and Augmented Data
- Mechanical Turk Requester Activity: The Insignificance of the Long Tail — For Wikipedia we have the 1% rule, where 1% of the contributors (this is 0.003% of the users) contribute two thirds of the content. In the Causes application on Facebook, there are 25 million users, but only 1% of them contribute a donation. […] The lognormal distribution of activity also shows that requesters increase their participation exponentially over time: They post a few tasks, they get the results. If the results are good, they increase by a percentage the size of the tasks that they post next time. This multiplicative behavior is the basic process that generates the lognormal distribution of activity.
- Copyright Destroying Historic Audio — so says the Library of Congress. Were copyright law followed to the letter, little audio preservation would be undertaken. Were the law strictly enforced, it would brand virtually all audio preservation as illegal. Copyright laws related to preservation are neither strictly followed nor strictly enforced. Consequently, some audio preservation is conducted.
- Diminished Reality (Ray Kurzweil) — removes objects from video in real time. Great name, “diminished reality”. (via Andy Baio)
- Data Enrichment Service — using linked government data to augment text with annotations and links. (via Jo Walsh on Twitter)
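The Mechanical Turk item above describes a multiplicative process: each round, a requester grows their batch by a percentage. A small simulation can show why that produces a lognormal-looking distribution. This is only a sketch under invented assumptions: every requester starts at 10 tasks, and each round's growth is a random percentage between 0% and 50%. None of these numbers come from the linked post.

```python
import math
import random
from statistics import mean, median, pstdev


def simulate_batch_sizes(requesters=2000, rounds=30, seed=1):
    """Grow each requester's batch size by a random percentage every round."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sizes = []
    for _ in range(requesters):
        size = 10.0  # assumed starting batch size
        for _ in range(rounds):
            # Multiplicative step: increase by a percentage of the current size.
            size *= 1.0 + rng.uniform(0.0, 0.5)
        sizes.append(size)
    return sizes


sizes = simulate_batch_sizes()
logs = [math.log(s) for s in sizes]

# Raw sizes are right-skewed: a few huge requesters pull the mean above the median.
print("mean/median of sizes:", mean(sizes) / median(sizes))
# Log sizes are sums of many independent terms, so they look roughly normal:
# mean and median nearly coincide relative to the spread.
print("(mean - median)/stdev of logs:", (mean(logs) - median(logs)) / pstdev(logs))
```

Taking logs turns the repeated multiplication into a sum of independent terms, and sums of many such terms tend towards a normal distribution; that is the link between percentage growth and a lognormal shape.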
Sexy HTTP Parser, 9/11 Pager Leaks, Open Source Science, GLAM and Newspapers
- http-parser — This is a parser for HTTP messages written in C. It parses both requests and responses. The parser is designed to be used in performance HTTP applications. It does not make any allocations, it does not buffer data, and it can be interrupted at any time. It only requires about 128 bytes of data per message stream (in a web server that is per connection). Extremely sexy piece of coding. (via sungo on Twitter)
- Wikileaks to Release 9/11 Pager Intercepts — they’re trickling the half-million messages out in simulated real time. The archive is a completely objective record of the defining moment of our time. We hope that its revelation will lead to a more nuanced understanding of the event and its tragic consequences. (via cshirky on Twitter)
- Promoting Open Source Science — interesting interview with an open science practitioner, but also notable for what it is: he was interviewed and released the text of the interview himself because his responses had been abridged in the printed version. (via suze on Twitter)
- Copyright, Findability, and Other Ideas from NDF (Julie Starr) — a newspaper industry guru attended the National Digital Forum, where Galleries, Libraries, Archives, and Museums talk about their digital issues, and discovered that newspapers and GLAMs have a lot in common. We can build beautiful, rich websites till the cows come home but they’re no good to anyone if people can’t easily find all that lovely content lurking beneath the homepage. That’s as true for news websites as it is for cultural archives and exhibitions, and it’s a topic that arose often in conversation at the NDF conference. I’ve been cooling on destination websites for a while. You need to have a destination website, of course, but you need even more to have your content out where your audience is so they can trip over it often and usefully.
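The http-parser item above describes a parser that buffers nothing and can be interrupted at any point. To make that idea concrete, here is a toy sketch of the general technique: a state machine that is fed arbitrary chunks and hands token fragments to callbacks as they arrive. It is written in Python rather than C, handles only the request line, and every name in it (`RequestLineParser`, `feed`, the callbacks) is hypothetical; it is not http-parser's API.

```python
METHOD, URL, VERSION, LF, DONE = range(5)
SPACE, CR, NEWLINE = 32, 13, 10  # byte values of the delimiters


class RequestLineParser:
    """Parse 'METHOD SP URL SP VERSION CR LF' one chunk at a time."""

    def __init__(self, on_method, on_url, on_version):
        self.state = METHOD  # the only thing remembered between chunks
        self._callbacks = {METHOD: on_method, URL: on_url, VERSION: on_version}

    def _emit(self, fragment):
        # Hand over a slice of the caller's chunk; nothing is accumulated here.
        if fragment:
            self._callbacks[self.state](fragment)

    def feed(self, chunk):
        """Consume bytes from chunk and return how many were used."""
        start = 0  # where the current token fragment begins in this chunk
        for i, byte in enumerate(chunk):
            if self.state == DONE:
                return i  # the rest belongs to whatever parses the headers
            if self.state == LF:
                if byte != NEWLINE:
                    raise ValueError("expected LF after CR")
                self.state = DONE
                continue
            delimiter = CR if self.state == VERSION else SPACE
            if byte == delimiter:
                self._emit(chunk[start:i])
                self.state += 1
                start = i + 1
        if self.state in self._callbacks:
            self._emit(chunk[start:])  # token continues in the next chunk
        return len(chunk)


parts = {"method": [], "url": [], "version": []}
parser = RequestLineParser(parts["method"].append, parts["url"].append,
                           parts["version"].append)

# Chunk boundaries fall in awkward places on purpose.
consumed = [parser.feed(c) for c in
            (b"GE", b"T /ind", b"ex.html HT", b"TP/1.1\r", b"\nHost:")]

method = b"".join(parts["method"])
url = b"".join(parts["url"])
version = b"".join(parts["version"])
print(method, url, version, consumed)
```

Because a token can straddle chunks, a callback may fire several times for the same token; joining the fragments is left to the caller, which is what lets the parser itself keep only a single integer of state.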
More Twitter Clients, GLAM Tech, Retro Homebrew Audio Hardware, Emerging Open Source
- Echofon — novel take on Twitter apps: sync your unread list between phone, browser, and (ultimately, they promise) desktop Twitter app. (via auchmill on Twitter)
- GLAM Tech (MP3) — Radio New Zealand new technology slot about the use of technology in the Galleries, Libraries, Archives, and Museums (GLAM) sector. For links, see the programme page.
- Man With Miniature Radio — 1950s DIY proto-iPod amusement.
- Open Source in Emerging Markets — the emerging markets — which include India, China, and Brazil — have more FOSS adoption and a higher concentration of effort in open source. Three quarters (74%) of developers in emerging markets use open source software for at least part of their work, compared to 65% of developers worldwide. In this context, “use” means personal use or corporate use, and could include both developer tools and desktop or server applications. (via glynmoody on Twitter)