Four short links: 7 August 2009

  1. Defragging the Stimulus — "each [recovery] site has its own silo of data, and no site is complete. What we need is a unified point of access to all sources of information: firsthand reports from […] and state portals, commentary from StimulusWatch and MetaCarta, and more." Suggests that […] should be the hub for this presently-decentralised pile of recovery data.
  2. Memetracker — site accompanying the research written up by the New York Times: "Researchers at Cornell, using powerful computers and clever algorithms, studied the news cycle by looking for repeated phrases and tracking their appearances on 1.6 million mainstream media sites and blogs […] For the most part, the traditional news outlets lead and the blogs follow, typically by 2.5 hours […] a relative handful of blog sites are the quickest to pick up on things that later gain wide attention on the Web." This confirms that blogs and traditional media have a symbiotic relationship, not a parasitic one. (via Stats article in NY Times)
  3. Feds at DefCon Alarmed After RFIDs Scanned (Wired) — RFID badges make for convenient security, and for convenient attack. Black hats can read your security cards from 2 or 3 feet away, and few in government are aware of the attack vector. "To help prevent surreptitious readers from siphoning RFID data, a company named DIFRWear was doing brisk business at DefCon selling leather Faraday-shielded wallets and passport holders lined with material that prevents readers from sniffing RFID chips in proximity cards."
  4. A Comparison of Open Source Search Engines and Indexing Twitter — Detailed write-up of the open source search options and how they stack up on a pile of Tweets. "While researching for the Software section, I was quite surprised by the number of open source vertical search solutions I found: Lucene (Nutch, Solr, Hounder), Sphinx, zettair, Terrier, Galago, Minnion, MG4J, Wumpus, RDBMS (mysql, sqlite), Indri, Xapian, grep … And I was even more surprised by the lack of comparisons between these solutions. Many of these platforms advertise their performance benchmarks, but they are in isolation, use different data sets, and seem to be more focused on speed as opposed to say relevance." (via joshua on Delicious)
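
The complaint in the last link, that published benchmarks measure speed but not relevance, is easy to picture with a small harness that scores both on the same data set. What follows is only a rough sketch and not the method used in the linked write-up: `precision_at_k`, `evaluate`, `toy_search`, and the three-tweet `TWEETS` corpus are all hypothetical names invented here, and the substring matcher stands in for whichever real engine you would plug in.

```python
import time

# Hypothetical three-tweet corpus, invented purely for illustration.
TWEETS = {
    1: "lucene indexing tips",
    2: "sphinx vs lucene speed",
    3: "lunch was great",
}


def toy_search(query):
    """Stand-in engine: ids of tweets containing the query, in id order."""
    return [doc_id for doc_id, text in sorted(TWEETS.items()) if query in text]


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k slots filled by results judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


def evaluate(search, queries, judgments, k=10):
    """Return (mean precision@k, mean seconds per query) for one engine."""
    total_precision = 0.0
    total_seconds = 0.0
    for query in queries:
        start = time.perf_counter()
        ranked = search(query)
        total_seconds += time.perf_counter() - start
        total_precision += precision_at_k(ranked, judgments[query], k)
    return total_precision / len(queries), total_seconds / len(queries)


if __name__ == "__main__":
    # Assumed human judgment: only tweet 1 is relevant to "lucene".
    judgments = {"lucene": {1}}
    mean_precision, mean_seconds = evaluate(toy_search, ["lucene"], judgments, k=2)
    print(f"precision@2={mean_precision:.2f}, {mean_seconds:.6f}s per query")
```

Running each candidate engine through the same `evaluate` call, with the same queries and judgments, gives numbers that can be compared side by side, which is what isolated vendor benchmarks do not offer.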
