ENTRIES TAGGED "search"
Four short links: 28 December 2011
Text Search, Cloud Filesystem, Javascript Parser, and Twitter Templates
- Terrier IR — open source (Mozilla) text search engine, now with Hadoop support.
- s3ql — open source (GPLv3) Linux filesystem which stores its data on Google Storage, Amazon S3, or OpenStack. (via Adam Shand)
- Esprima — open source (BSD) fast Javascript parser in Javascript. (via Javascript Weekly)
- Hogan.js — open source (Apache) Javascript templating engine from Twitter. If it proves anywhere near as good as Bootstrap, it’ll be heavily used.
Four short links: 23 December 2011
Preview Colourblindness, Commandline Datamining, Open Source Indexing, and Javascript Time Series
- See the World as a Colour-Blind Person Would — filters that let you see images as protanopes, deuteranopes, and even tritanopes would see them. I am protanoptic (if that’s a word) and I can vouch that the “after” pix look the same as “before” to me. Care, because about 8% of men have some form of colourblindness and hate you and your “red is bad, green is good” visual cues. (via Flowing Data)
- Waffles — seeks to be the world’s most comprehensive collection of command-line tools for machine learning and data mining.
- LinkedIn Open Sources Index and Query Services — full-text index and retrieval engine, APIs, and a framework to manage indexes on infrastructure-as-a-service.
- Rickshaw — a JavaScript toolkit for creating interactive time series graphs.
Four short links: 5 December 2011
Spatial Search, Exposing Your Phone's Perfidy, School Unconference, and Wikipedia Viz
- VP Trees — a data structure for fast spatial searching. A form of nearest neighbour, useful for melodies (PDF) and image retrieval (PDF) and poetry. (via Reddit)
- iYou — iTunes plugin to show you all the stuff your phone collects about you.
- Bar Camps in Primary Schools — NZ teacher deploys bar camps among students. Great things happen.
- Realtime Wikipedia Edits — fascinating and hypnotic and inspirational and appalling and irrelevant all at once.
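
For readers curious how the VP trees in the first link above work, here is a minimal Python sketch, not taken from the linked article. The names `VPNode`, `build_vp_tree`, and `nearest` are made up for illustration, and it assumes points are coordinate tuples compared by Euclidean distance (any true metric would do). Each node stores a vantage point and a radius that splits the remaining points into a near half and a far half; the triangle inequality then lets a search skip a whole half when it cannot contain anything closer than the best match so far.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two coordinate tuples."""
    return math.dist(a, b)


class VPNode:
    """One vantage point plus the subtrees nearer and farther than `radius`."""

    def __init__(self, point, radius, inside, outside):
        self.point = point
        self.radius = radius
        self.inside = inside    # points with distance <= radius from `point`
        self.outside = outside  # points with distance >= radius from `point`


def build_vp_tree(points, dist=euclidean):
    """Recursively build a tree; the first point of each list is the vantage point."""
    if not points:
        return None
    vantage, rest = points[0], points[1:]
    if not rest:
        return VPNode(vantage, 0.0, None, None)
    rest = sorted(rest, key=lambda p: dist(vantage, p))
    mid = len(rest) // 2
    radius = dist(vantage, rest[mid])  # median distance splits the set in two
    return VPNode(
        vantage,
        radius,
        build_vp_tree(rest[:mid], dist),
        build_vp_tree(rest[mid:], dist),
    )


def nearest(node, query, dist=euclidean, best=None):
    """Return (distance, point) for the stored point closest to `query`."""
    if node is None:
        return best
    d = dist(query, node.point)
    if best is None or d < best[0]:
        best = (d, node.point)
    if d <= node.radius:
        # Query lies in the near half: search it first, then the far half
        # only if the current best ball reaches across the boundary.
        best = nearest(node.inside, query, dist, best)
        if d + best[0] >= node.radius:
            best = nearest(node.outside, query, dist, best)
    else:
        best = nearest(node.outside, query, dist, best)
        if d - best[0] <= node.radius:
            best = nearest(node.inside, query, dist, best)
    return best


tree = build_vp_tree([(0, 0), (1, 1), (5, 5), (2, 3)])
print(nearest(tree, (2, 2)))  # (1.0, (2, 3))
```

Real implementations usually pick the vantage point at random or by sampling rather than taking the first element; that choice is simplified here to keep the sketch short.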
Four short links: 1 December 2011
DRM Good for Amazon, Arduino Updated, Open Source Foundations, Distributed Search
- Cutting Their Own Throats (Charlie Stross) — DRM on ebooks gives Amazon a great tool for locking ebook customers into the Kindle platform. This essay is gold and so very true. Read, believe.
- v1.0 of Arduino Out — this is the dev environment, with language additions and lots of features in the libraries. Glad to see the 1.0 stamp put on this important piece of the homebrew hardware world.
- Koha and Why We Need Foundations — Simon Phipps looks into the Koha trademark dispute and says that it shows why open source needs foundations (collective IP ownership).
- Majestic-12 — a World Wide Web search engine based on distributing the workload in a fashion similar to that of successful projects such as SETI@home and distributed.net.
Four short links: 8 November 2011
Cell Operating System, Search Savvy, Smiling Sliders, and Recommendation Tools
- Attempts to Make a Cell Operating System (Science Daily) — finally we will be able to have the guaranteed quality of software and the safety of biological organisms.
- Why Kids Can’t Search (Clive Thompson) — kids need to be taught critical thinking skills about what they find on the web. Librarians are our national leaders in this fight; they’re the main ones trying to teach search skills to kids today.
- Smiley Slider — cute little way to get feedback. (via Jyri Tuulos)
- LensKit — an open source toolkit for building, researching, and studying recommender systems.
Four short links: 25 October 2011
Smart Thermostat, Lamer News, Expensive Meaning, and Hardware Kits
- Nest Learning Thermostat — learns how long it takes your house to adjust temperature, so can tell you not just “it’s 55 now” but “it’ll be 65 in 16 minutes”. Looks gorgeous as well as being a good example of embedded intelligence. Data really does make everything better.
- lamernews (Github) — an implementation of a Reddit / Hacker News style news web site written using Ruby, Sinatra, Redis and jQuery.
- Information is Cheap, Meaning is Expensive — interview with George Dyson. That quote is a wonderful summary of why data is important. But George also says: “The danger is not that machines are advancing. The danger is that we are losing our intelligence if we rely on computers instead of our own minds. On a fundamental level, we have to ask ourselves: Do we need human intelligence? And what happens if we fail to exercise it?” (via Mathew Ingram)
- Cubelets, Littlebits, and Others (Russell Davies) — he’s been playing with some sweet hardware kits. “It’s not new and surprising behaviour in a toy and it’s not unbuildable with Lego or Meccano. But there’s something different and good about being able to do it so quickly, roughly and spontaneously – throwing bits together and getting behaviour out. Not following instructions or typing laboriously. That ease makes it magical and educational – you start to understand the functions of things as a builder not a thinker. (Slightly, you know, slightly – at a Lego level, not at a 5-year engineering degree level, but it’s a start.)”
Four short links: 18 October 2011
Search Education, Classic Source, Analyzing Encrypted VoIP, and SQL Injection
- Web Search Education (Google) — lesson plans and materials for teaching people how to use search, from operators to critically evaluating sites. This latter area is the weakest: when I teach innocents about the web, I show them organic vs paid results, discuss why people advertise, how people pay for their sites, noticing domain names and organizations, etc. I wonder how much of the weakness of Google’s materials is due to their business model.
- Metroid Source Code — reverse-engineered source code from the original classic Metroid. (via Hacker News)
- Speaker Recognition From Encrypted VoIP Communications (PDF) — speaker identification, even on encrypted VoIP communications, is 70-75% accurate among a pool of 10 candidates. Impressive. (via Bruce Schneier)
- SQL Injection Cheat Sheet — rundown of the different techniques for doing SQL injection. (via Gaëtan De Brucker)
Four short links: 15 September 2011
DOSBox in Javascript, Augmenting Humans, Energy-efficient Computation, and Searchable Text
- Javascript DOSBox — first cut at a DOS emulator in Javascript, capable of running Doom. As the author said in email to me, “The ability to run arbitrary x86 code across platforms without a plugin is kinda cool.”
- Blending Machines and Humans to Get Very High Accuracy (Greg Linden) — use experts to train the models, provide tools for experts to correct mistakes in the classifiers, and constantly evaluate all aspects of the system. This augmentation of human ability with computers lets us tackle problems that can’t be solved by computers alone.
- Electrical Efficiency of Computation (The Atlantic) — If a MacBook Air were as efficient as a 1991 computer, the battery would last 2.5 seconds. Cites research concluding that computations per kWh have doubled every 1.6 years since the 1940s. (via Hacker News)
- recoll — open source tool to make searchable the text buried in your computer (whether in zip files, mail attachments, whatever). (via One Thing Well)
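
As a rough sanity check on the Electrical Efficiency of Computation item above, the short Python sketch below works through the arithmetic. It assumes a 20-year gap (1991 to 2011, the year of this post) and takes the 1.6-year doubling period and the 2.5-second figure at face value; the function name `efficiency_gain` is made up for illustration.

```python
def efficiency_gain(years, doubling_period=1.6):
    """Factor by which computations per kWh grow over `years`."""
    return 2 ** (years / doubling_period)


# Assumption: compare a 1991 machine with a 2011 one (20 years apart).
gain = efficiency_gain(2011 - 1991)

# If the 1991-efficiency battery lasts 2.5 seconds, scale it by the gain.
battery_hours = 2.5 * gain / 3600

print(f"{gain:.0f}x more efficient, roughly {battery_hours:.1f} hours of battery")
```

Twenty years at one doubling every 1.6 years is 12.5 doublings, a factor of about 5,800, which turns 2.5 seconds into about four hours. That is the right order of magnitude for a laptop battery, so the two figures in the item are at least consistent with each other.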
Why an ebook still needs an index
An index in an ebook offers a level of discovery search can't touch.
Why should digital publishers invest in index creation? Because ebooks that give readers efficient ways to access what they need are ebooks that will sell.
When was the last time you mined your site's search data?
Lou Rosenfeld on the benefits of parsing and refining site search.
A gold mine is hiding in the data generated by website search engines, yet many site owners pay little attention to the analytics those engines yield. Author Lou Rosenfeld explains why site search is worth your time.