"libraries" entries

Four short links: 2 May 2011

Internet Cafe Culture, Image Processing, Library Mining, and MediaWiki Parsing

  1. Chinese Internet Cafes (Bryce Roberts) — a good quick read. My note: people valued the same things in Internet cafes that they value in public libraries, and the uses are very similar. They pose a similar threat to the already-successful, which is why public libraries are threatened in many Western countries.
  2. SIFT — the Scale Invariant Feature Transform library, built on OpenCV, is a method to detect distinctive, invariant image feature points, which easily can be matched between images to perform tasks such as object detection and recognition, or to compute geometrical transformations between images. The licensing seems dodgy: the code is MIT-licensed, but the LICENSE file carries lots of “this isn’t a license to use the patent!” warnings. (via Joshua Schachter)
  3. The Secret Life of Libraries (Guardian) — I like the idea of the most-stolen-books revealing something about a region; it’s an aspect of data revealing truth. For a while, Terry Pratchett was the most-shoplifted author in England but newspapers rarely carried articles about him or mentioned his books (because they were genre fiction not “real” literature). (via Brian Flaherty)
  4. Sweble — MediaWiki parser library. Until today, Wikitext had been poorly defined. There was no grammar, no defined processing rules, and no defined output like a DOM tree based on a well defined document object model. This is to say, the content of Wikipedia is stored in a format that is not an open standard. The format is defined by 5000 lines of php code (the parse function of MediaWiki). That code may be open source, but it is incomprehensible to most. That’s why there are 30+ failed attempts at writing alternative parsers. (via Dirk Riehle)
Four short links: 20 April 2011

PDP-11 Emulated, Crowdsourcing Culture, Deep Knowing, and Scientific Method

  1. PDP-11 Emulator in Javascript, Running V6 UNIX — blast from the past, and quite a readable emulator (heads up: cd was chdir back then). See also the 1st edition UNIX source on github. (via Hacker News)
  2. 2010: The Year of Crowdsourcing Transcription — hasn’t finished yet, as NY Public Library shows. Cultural institutions are huge data sets that need human sensors to process, so we’ll be seeing a lot more of this in years to come as we light up thousands of years of written culture. (via Liza Daly)
  3. Programming the Commodore 64: the loss of the total control that we had over our computers back when they were small enough that everything you needed to know would fit inside your head. It’s left me with a taste for grokking systems deeply and intimately, and that tendency is probably not a good fit for most modern programming, where you really don’t have time to go in and learn, say, Hibernate or Rails in detail: you just have to have the knack of skimming through a tutorial or two and picking up enough to get the current job done, more or less. I don’t mean to denigrate that: it’s an important and valuable skill. But it’s not one that moves my soul as Deep Knowing does. This is the kind of deep knowledge of TCP/IP and OS that devops is all about.
  4. Kids do Science — scientists let kids invent an experiment and write it up, and it’s published in Biology Letters. Teaching the method of science, not the facts currently in vogue, will give us a generation capable of making data-based decisions.
Four short links: 15 April 2011

Tweets as Ads, Do Not Track, OnePage Site, and Lessons Learned

(the author apologizes for the late publication of this item)

  1. Twitter’s Biggest Problem: Tweets are Ads — having just been to my first social media marketing conference, I see what the author’s talking about. Would you want to pay for advertising in the middle of a sea of free ads? (via Hacker News)
  2. Safari and Do Not Track Support — now that there’s a technical mechanism for consumers to opt out, the next step is to mandate that publishers respect it. Problem: compliance with do-not-track is largely invisible, so there’s nothing like the feedback loop you get with Do Not Call lists where ANY telemarketer is instantly identifiable as a lawbreaker. Instead, you’ll only know Do Not Track is not working if you see useful advertisements. What the–?
  3. OnePager — a one-page website for libraries, attempting to focus the library on providing useful information rather than a lot of it. There’s a lesson here for almost every institution with a website. (via Nina Simon)
  4. Max Levchin’s Lessons Learned — some resonant ones: You can have successful teams where people hate but deeply respect each other; the opposite (love but not respect among team members) is a recipe for disaster.
Four short links: 14 March 2011

Future Retrospective, Political Entrepreneurs, Library DRM, and In-Database Analytics

  1. A History of the Future in 100 Objects (Kickstarter) — blog+podcast+video+book project, to have future historians tell the story of our century in 100 objects. The BBC show that inspired it was brilliant, and I rather suspect this will be too. It’s a clever way to tell a story of the future (his hardest problem will be creating a single coherent narrative for the 21st century). What are the 100 objects that future historians will use to sum up our century? ‘Smart drugs’ that change the way we think? A fragment from suitcase nuke detonated in Shanghai? A wedding ring between a human and an AI? The world’s most expensive glass of water, returned from a private mission to an asteroid? (via RIG London weekly notes)
  2. Entrepreneurs Who Create Value vs Entrepreneurs Who Lock Up Value (Andy Kessler) — distinguishes between “political entrepreneurs” who leverage their political power to own something and then overcharge or tax the crap out of the rest of us to use it vs “market entrepreneurs” who recognize the price-to-value gap and jump in. Ignoring legislation, they innovate, disintermediate, compete, stay up all night coding, and offer something better and cheaper until the market starts to shift. My attention was particularly caught by this: for every stroke of the pen, for every piece of legislation, for every paid-off congressman, there now exists a price umbrella that overvalues what he or any political entrepreneur is doing. (via Bryce Roberts)
  3. Harper-Collins Caps eBook Loans — The publisher wants to sell libraries DRMed ebooks that will self-destruct after 26 loans. Public libraries have always served and continue to serve those people who can’t access information on the purchase market. Jackass moves like these prevent libraries from serving those people in the future that we hope will come soon: the future where digital is default and print is premium. That premium may well be “the tentacles of soulless bottom-dwelling coprocephalic publishers can’t digitally destroy your purchase”. It’s worth noting that O’Reilly offers DRM-free PDFs of the books they publish, including mine. Own what you buy lest it own you. (via BoingBoing and many astonished library sources)
  4. MAD Lib — BSD-licensed open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical and machine learning methods for structured and unstructured data. (via Ted Leung)
Four short links: 7 March 2011

Heritage Games, Unpredictable Publishing, Timezones, and Map Tiles

  1. DigitalKoot: Playing games in Digitalkoot fixes mistakes in our index of old Finnish newspapers. This greatly increases the accuracy of text-based searches of the newspaper archives. (via Springwise and Imran Ali on Twitter)
  2. Some Things That Need To Be Said (Amanda Hocking) — A.H. is selling a lot of copies of her ebooks, and she cautions against thinking hers is an easily reproduced model. First, I am continuously overwhelmed by the amount of work I have to do that isn’t writing a book. Middlemen give you time in exchange for money. Second, By all accounts, he has done the same things I did, even writing in the same genre and pricing the books low. And he’s even a better writer than I am. So why am I selling more books than he is? I don’t know. I’m reminded of Duncan Watts’s work MusicLab which showed that “hits” aren’t predictable. It’s entirely possible to duplicate Amanda’s efforts and not replicate her success.
  3. A Literary Appreciation of the Olson Timezone Database — timezones are fickle political creations, and this is a wonderful tribute to the one database which ruled them all for 25 years.
  4. TileMill: a tool for cartographers to quickly and easily design maps for the web using custom data. Open source, built on Mapnik.
Four short links: 25 November 2010

Twitter Mapped, Bibliographic Data Released, Babies Engadgeted, and Nat's Christmas Present Sorted

  1. A Day in the Life of Twitter (Chris McDowall) — all geo-tagged tweets from 24h of the Twitter firehose, displayed. Interesting things can be seen, such as Jakarta glowing as brightly as San Francisco. (via Chris’s sciblogs post)
  2. British Library Release 3M Open Bibliographic Records (OKFN) — This dataset consists of the entire British National Bibliography, describing new books published in the UK since 1950; this represents about 20% of the total BL catalogue, and we are working to add further releases.
  3. Gadgets for Babies (NY Times) — cry decoders, algorithmically enhanced rocking chairs, and (my favourite) “voice-activated crib light with womb sounds”. I can’t wait until babies can make womb sound playlists and share them on Twitter.
  4. GP2X Caanoo MAME/Console Emulator (ThinkGeek) — perfect Christmas present for, well, me. Emulates classic arcade machines and microcomputers, including my nostalgia fetish object, the Commodore 64. (via BoingBoing’s Gift Guide)
Four short links: 15 October 2010

Long Tail, Copyright vs Preservation, Diminished Reality, and Augmented Data

  1. Mechanical Turk Requester Activity: The Insignificance of the Long Tail: For Wikipedia we have the 1% rule, where 1% of the contributors (this is 0.003% of the users) contribute two thirds of the content. In the Causes application on Facebook, there are 25 million users, but only 1% of them contribute a donation. […] The lognormal distribution of activity, also shows that requesters increase their participation exponentially over time: They post a few tasks, they get the results. If the results are good, they increase by a percentage the size of the tasks that they post next time. This multiplicative behavior is the basic process that generates the lognormal distribution of activity.
  2. Copyright Destroying Historic Audio — so says the Library of Congress. Were copyright law followed to the letter, little audio preservation would be undertaken. Were the law strictly enforced, it would brand virtually all audio preservation as illegal. Copyright laws related to preservation are neither strictly followed nor strictly enforced. Consequently, some audio preservation is conducted.
  3. Diminished Reality (Ray Kurzweil) — removes objects from video in real time. Great name, “diminished reality”. (via Andy Baio)
  4. Data Enrichment Service — using linked government data to augment text with annotations and links. (via Jo Walsh on Twitter)
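
The multiplicative behavior described in the Mechanical Turk item above can be illustrated with a small simulation. This is only a sketch under assumed numbers, not the original analysis: the function names, the requester and round counts, and the growth range of -10% to +30% per round are all hypothetical. It shows that repeatedly scaling a quantity by a random percentage yields sizes that are strongly right-skewed, while their logarithms are roughly symmetric, which is the signature of a lognormal distribution.

```python
import math
import random
import statistics


def simulate_task_sizes(n_requesters=5000, n_rounds=50, seed=0):
    """Grow each requester's task size by a random percentage per round."""
    rng = random.Random(seed)
    sizes = []
    for _ in range(n_requesters):
        size = 1.0
        for _ in range(n_rounds):
            # Hypothetical growth rule: change by -10% to +30% each round.
            size *= 1.0 + rng.uniform(-0.10, 0.30)
        sizes.append(size)
    return sizes


def skewness(values):
    """Population skewness: near 0 for a symmetric distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


sizes = simulate_task_sizes()
log_sizes = [math.log(s) for s in sizes]

# Raw sizes have a long right tail; their logs are close to symmetric.
raw_skew = skewness(sizes)
log_skew = skewness(log_sizes)
print(f"skewness of sizes:      {raw_skew:.2f}")
print(f"skewness of log(sizes): {log_skew:.2f}")
```

The logarithm of each final size is a sum of many independent terms, so by the central limit theorem it is approximately normal, and the size itself is therefore approximately lognormal.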
Four short links: 15 March 2010

Digital Libraries, Story Analysis, Scriptable Google Apps, Forensic Rooting

  1. A German Library for the 21st Century (Der Spiegel) — But browsing in Europeana is just not very pleasurable. The results are displayed in thumbnail images the size of postage stamps. And if you click through for a closer look, you’re taken to the corresponding institute. Soon you’re wandering helplessly around a dozen different museum and library Web sites — and you end up lost somewhere between the “Vlaamse Kunstcollectie” and the “Wielkopolska Biblioteka Cyfrowa.” Would it not be preferable to incorporate all the exhibits within the familiar scope of Europeana? “We would have preferred that,” says Gradmann. “But then the museums would not have participated.” They insist on presenting their own treasures. This is a problem encountered everywhere around the world: users hate silos but institutions hate the thought of letting go of their content. We’re going to have to let go to win. (via Penny Carnaby)
  2. StoryGarden: a web-based tool for gathering and analyzing a large number of stories contributed by the public. The content of the stories, along with some associated survey questions, are processed in an automated semantic computing process for an immediate, interactive display for the lay public, and in a more thorough manual process for expert analysis.
  3. Google Apps Script — VBA for the 2010s. Currently mainly for spreadsheets, but some hooks into Gmail and Google Calendar.
  4. There’s a Rootkit in the Closet — lovely explanation of finding and isolating a rootkit, reconstructing how it got there and deconstructing the rootkit to figure out what it did. It’s a detective story, no less exciting than when Cliff Stoll wrote The Cuckoo’s Egg.
Four short links: 25 November 2009

Sexy HTTP Parser, 9/11 Pager Leaks, Open Source Science, GLAM and Newspapers

  1. http-parser: This is a parser for HTTP messages written in C. It parses both requests and responses. The parser is designed to be used in performance HTTP applications. It does not make any allocations, it does not buffer data, and it can be interrupted at anytime. It only requires about 128 bytes of data per message stream (in a web server that is per connection). Extremely sexy piece of coding. (via sungo on Twitter)
  2. Wikileaks to Release 9/11 Pager Intercepts — they’re trickling the half-million messages out in simulated real time. The archive is a completely objective record of the defining moment of our time. We hope that its revelation will lead to a more nuanced understanding of the event and its tragic consequences. (via cshirky on Twitter)
  3. Promoting Open Source Science — interesting interview with an open science practitioner, but also notable for what it is: he was interviewed and released the text of the interview himself because his responses had been abridged in the printed version. (via suze on Twitter)
  4. Copyright, Findability, and Other Ideas from NDF (Julie Starr) — a newspaper industry guru attended the National Digital Forum where Galleries, Libraries, Archives, and Museums talk about their digital issues, where she discovered that newspapers and GLAMs have a lot in common. We can build beautiful, rich websites till the cows come home but they’re no good to anyone if people can’t easily find all that lovely content lurking beneath the homepage. That’s as true for news websites as it is for cultural archives and exhibitions, and it’s a topic that arose often in conversation at the NDF conference. I’ve been cooling on destination websites for a while. You need to have a destination website, of course, but you need even more to have your content out where your audience is so they can trip over it often and usefully.
Four short links: 18 September 2009

More Twitter Clients, GLAM Tech, Retro Homebrew Audio Hardware, Emerging Open Source

  1. Echofon — novel take on Twitter apps: sync your unread list between phone, browser, and (ultimately, they promise) desktop Twitter app. (via auchmill on Twitter)
  2. GLAM Tech (MP3) — Radio New Zealand new technology slot about the use of technology in the Galleries, Libraries, Archives, and Museums (GLAM) sector. For links, see the programme page.
  3. Man With Miniature Radio — 1950s DIY proto-iPod amusement.
  4. Open Source in Emerging Markets: the emerging markets — which include India, China, and Brazil — have more FOSS adoption and a higher concentration of effort in open source. Three quarters (74%) of developers in emerging markets use open source software for at least part of their work, compared to 65% of developers worldwide. In this context, “use” means personal use or corporate use, and could include both developer tools and desktop or server applications. (via glynmoody on Twitter)