- Terrier IR — open source (Mozilla) text search engine, now with Hadoop support.
- s3ql — open source (GPLv3) Linux filesystem which stores its data on Google Storage, Amazon S3, or OpenStack. (via Adam Shand)
ENTRIES TAGGED "search"
Spatial Search, Exposing Your Phone's Perfidy, School Unconference, and Wikipedia Viz
- VP Trees — a data structure for fast spatial searching. A form of nearest neighbour, useful for melodies (PDF) and image retrieval (PDF) and poetry. (via Reddit)
- iYou — iTunes plugin to show you all the stuff your phone collects about you.
- Bar Camps in Primary Schools — NZ teacher deploys bar camps among students. Great things happen.
- Realtime Wikipedia Edits — fascinating and hypnotic and inspirational and appalling and irrelevant all at once.
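For the curious, here is a minimal sketch of the vantage-point tree idea mentioned above, written in Python. It is not taken from the linked page or papers: the names `VPNode`, `build`, and `nearest` are made up for illustration, the vantage point is picked at random, and the split radius is the median distance. Any distance function that obeys the triangle inequality should work, which is what lets the same structure serve melodies, images, or plain 2D points.

```python
import math
import random


class VPNode:
    """One node of a vantage-point tree (hypothetical illustrative class)."""

    def __init__(self, point, radius, inside, outside):
        self.point = point      # the vantage point stored at this node
        self.radius = radius    # median distance from the vantage point
        self.inside = inside    # subtree: points with distance <= radius
        self.outside = outside  # subtree: points with distance > radius


def build(points, dist, rng=random):
    """Recursively build a VP tree from a list of points."""
    points = list(points)
    if not points:
        return None
    # Assumption: a random vantage point is good enough for a sketch.
    vp = points.pop(rng.randrange(len(points)))
    if not points:
        return VPNode(vp, 0.0, None, None)
    dists = sorted(dist(vp, p) for p in points)
    radius = dists[len(dists) // 2]
    inside = [p for p in points if dist(vp, p) <= radius]
    outside = [p for p in points if dist(vp, p) > radius]
    return VPNode(vp, radius, build(inside, dist, rng), build(outside, dist, rng))


def nearest(node, query, dist, best=None):
    """Return (distance, point) for the stored point closest to query."""
    if node is None:
        return best
    d = dist(query, node.point)
    if best is None or d < best[0]:
        best = (d, node.point)
    # Search the side that contains the query first.
    if d <= node.radius:
        first, second = node.inside, node.outside
    else:
        first, second = node.outside, node.inside
    best = nearest(first, query, dist, best)
    # By the triangle inequality, the other side can only hold a closer
    # point if the current best ball around the query reaches the boundary.
    if abs(d - node.radius) <= best[0]:
        best = nearest(second, query, dist, best)
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    pts = [(rng.random(), rng.random()) for _ in range(200)]
    tree = build(pts, math.dist, rng)
    print(nearest(tree, (0.5, 0.5), math.dist))
```

The pruning test in `nearest` is where the speed comes from: whole subtrees are skipped whenever they cannot contain anything closer than the best match found so far.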
DRM Good for Amazon, Arduino Updated, Open Source Foundations, Distributed Search
- Cutting Their Own Throats (Charlie Stross) — DRM on ebooks gives Amazon a great tool for locking ebook customers into the Kindle platform. This essay is gold and so very true. Read, believe.
- v1.0 of Arduino Out — this is the dev environment, with language additions and lots of features in the libraries. Glad to see the 1.0 stamp put on this important piece of the homebrew hardware world.
- Koha and Why We Need Foundations — Simon Phipps looks into the Koha trademark dispute and says that it shows why open source needs foundations (collective IP ownership).
- Majestic-12 — a World Wide Web search engine that distributes its workload in a fashion similar to successful projects such as SETI@home and distributed.net.
Cell Operating System, Search Savvy, Smiling Sliders, and Recommendation Tools
- Attempts to Make a Cell Operating System (Science Daily) — finally we will be able to have the guaranteed quality of software and the safety of biological organisms.
- Why Kids Can’t Search (Clive Thompson) — kids need to be taught critical thinking skills about what they find on the web. Librarians are our national leaders in this fight; they’re the main ones trying to teach search skills to kids today.
- Smiley Slider — cute little way to get feedback. (via Jyri Tuulos)
- LensKit — an open source toolkit for building, researching, and studying recommender systems.
Search Education, Classic Source, Analyzing Encrypted VoIP, and SQL Injection
- Web Search Education (Google) — lesson plans and materials for teaching people how to use search, from operators to critically evaluating sites. This latter area is the weakest: when I teach innocents about the web, I show them organic vs paid results, discuss why people advertise, how people pay for their sites, noticing domain names and organizations, etc. I wonder how much of the weakness of Google’s materials is due to their business model.
- Metroid Source Code — reverse-engineered source code from the original classic Metroid. (via Hacker News)
- Speaker Recognition From Encrypted VoIP Communications (PDF) — speaker identification, even on encrypted VoIP communications, is 70-75% accurate among a pool of 10 candidates. Impressive. (via Bruce Schneier)
- SQL Injection Cheat Sheet — rundown of the different techniques for doing SQL injection. (via Gaëtan De Brucker)
- Blending Machines and Humans to Get Very High Accuracy (Greg Linden) — use experts to train the models, provide tools for experts to correct mistakes in the classifiers, and constantly evaluate all aspects of the system. This augmentation of human ability with computers lets us tackle problems that can’t be solved by computers alone.
- Electrical Efficiency of Computation (The Atlantic) — If a MacBook Air were as efficient as a 1991 computer, the battery would last 2.5 seconds. Cites research concluding that computations per kWh have doubled every 1.6 years since the 1940s. (via Hacker News)
- recoll — open source tool to make searchable the text buried in your computer (whether in zip files, mail attachments, whatever). (via One Thing Well)
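The doubling figure in the electrical efficiency item above is easy to sanity-check. A rough sketch, assuming a 20-year gap between the 1991 computer and the laptop it is compared with (the gap is my assumption, not a number from the article), and using a made-up helper name `efficiency_gain`:

```python
def efficiency_gain(years, doubling_period=1.6):
    """Factor by which computations per kWh grow over `years`,
    if they double every `doubling_period` years."""
    return 2 ** (years / doubling_period)


if __name__ == "__main__":
    gain = efficiency_gain(20)  # assumed 20-year gap from 1991
    print(f"efficiency gain: about {gain:,.0f}x")
    # Scale the quoted 2.5 seconds back up by the same factor.
    print(f"implied battery life: about {2.5 * gain / 3600:.1f} hours")
```

Twenty years at one doubling every 1.6 years is 12.5 doublings, a gain of a few thousand fold, which turns 2.5 seconds into a handful of hours. That is at least the right order of magnitude for a laptop battery, so the two numbers in the item hang together.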
An index in an ebook offers a level of discovery search can't touch.
Why should digital publishers invest in index creation? Because ebooks that give readers efficient ways to access what they need are ebooks that will sell.
Lou Rosenfeld on the benefits of parsing and refining site search.
A gold mine is hiding in the data generated by website search engines, yet many site owners pay little attention to the analytics those engines yield. Author Lou Rosenfeld explains why site search is worth your time.