- Web Search Education (Google) — lesson plans and materials for teaching people how to use search, from operators to critically evaluating sites. This latter area is the weakest: when I teach innocents about the web, I show them organic vs paid results, discuss why people advertise, how people pay for their sites, noticing domain names and organizations, etc. I wonder how much of the weakness of Google’s materials is due to their business model.
- Metroid Source Code — reverse-engineered source code from the original classic Metroid. (via Hacker News)
- Speaker Recognition From Encrypted VoIP Communications (PDF) — speaker identification, even on encrypted VoIP communications, is 70-75% accurate among a pool of 10 candidates. Impressive. (via Bruce Schneier)
- SQL Injection Cheat Sheet — rundown of the different techniques for doing SQL injection. (via Gaëtan De Brucker)
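To make the SQL injection item concrete, here is a minimal sketch of the most basic technique, using Python's built-in `sqlite3` module. The `users` table, its contents, and the helper names are hypothetical and chosen only for illustration; they are not taken from the cheat sheet.

```python
import sqlite3


def make_db():
    """Build a throwaway in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a1"), ("bob", "b2")],
    )
    return conn


def lookup_unsafe(conn, name):
    # Vulnerable: user input is pasted straight into the SQL string,
    # so a quote character in `name` can change the query's structure.
    query = "SELECT secret FROM users WHERE name = '" + name + "'"
    return [row[0] for row in conn.execute(query)]


def lookup_safe(conn, name):
    # Parameterized: the driver passes `name` as data, never as SQL.
    query = "SELECT secret FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]


if __name__ == "__main__":
    conn = make_db()
    payload = "x' OR '1'='1"
    print(lookup_unsafe(conn, payload))  # leaks every row's secret
    print(lookup_safe(conn, payload))    # matches no user
```

The unsafe version turns the payload into `WHERE name = 'x' OR '1'='1'`, which is true for every row. The parameterized version treats the whole payload as a name that nobody has.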
Search Education, Classic Source, Analyzing Encrypted VoIP, and SQL Injection
- Blending Machines and Humans to Get Very High Accuracy (Greg Linden) — use experts to train the models, provide tools for experts to correct mistakes in the classifiers, and constantly evaluate all aspects of the system. This augmentation of human ability with computers lets us tackle problems that can’t be solved by computers alone.
- Electrical Efficiency of Computation (The Atlantic) — If a MacBook Air were as efficient as a 1991 computer, the battery would last 2.5 seconds. Cites research concluding that computations per kWh have doubled every 1.6 years since the 1940s. (via Hacker News)
- recoll — open source tool to make searchable the text buried in your computer (whether in zip files, mail attachments, whatever). (via One Thing Well)
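The doubling claim in the electrical efficiency item is easy to sanity-check with a little arithmetic. The sketch below assumes a 20-year gap since 1991 and a hypothetical 5-hour battery life; neither number comes from the article, so treat the result as an order-of-magnitude check only.

```python
def efficiency_gain(years, doubling_period=1.6):
    """Factor by which computations per kWh grow over `years`,
    given one doubling every `doubling_period` years."""
    return 2 ** (years / doubling_period)


def battery_seconds_at_old_efficiency(battery_hours, years):
    """Battery life in seconds if the same work were done at the
    efficiency of hardware from `years` ago."""
    return battery_hours * 3600 / efficiency_gain(years)


if __name__ == "__main__":
    years = 20          # assumption: 1991 to roughly 2011
    battery_hours = 5   # assumption: hypothetical laptop battery life
    print(f"gain: {efficiency_gain(years):.0f}x")
    print(f"battery: {battery_seconds_at_old_efficiency(battery_hours, years):.1f} s")
```

Under those assumptions the gain is several thousand-fold and the battery lasts a few seconds, which is in the same range as the article's 2.5-second figure.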
An index in an ebook offers a level of discovery search can't touch.
Why should digital publishers invest in index creation? Because ebooks that give readers efficient ways to access what they need are ebooks that will sell.
Lou Rosenfeld on the benefits of parsing and refining site search.
A gold mine is hiding in the data generated by website search engines, yet many site owners pay little attention to the analytics those engines yield. Author Lou Rosenfeld explains why site search is worth your time.
Ereader search tools need to limit disruption and incorporate web search best practices.
The current crop of ereaders handles ebook searching in a variety of ways — some are useful and creative, some aren’t. Here, Pete Meyers looks at the state of ebook search and how it can be improved.
- Google Keyword Advertising — interesting infographic about the most lucrative advertising categories for Google. #20 is an eye-opener!
- Etsy AB (GitHub) — Etsy’s framework for A/B testing, feature ramp up, and more. (via Randy J. Hunt)
Fair Use, Equation UI, Startup Numbers, and Data Search Engine
- Putting Fair Use Forward (Chronicle of Higher Education) — lawyer and academic collaborating on guidelines for academic fair use, intended to remove the chilling effect of the fear of being sued. Great quotes: People deal with fuzzy laws all the time, she argues. “Obscenity is impossible to define, and yet people have some idea of when they’re committing an obscenity or not. You could walk through your life being haunted by the specter of litigation in every aspect of it. But people don’t usually do this in their other free-speech rights.” (via David Adler)
- Scrubbing Calculator — clever UI for solving equations without needing to know how to solve equations. Imminent death of mathematics skill in the US predicted, film at 11. (via Dan Meyer)
- Startup Genome — a report, written from research into 650 startups. Investors who provide hands-on help have little or no effect on the company’s operational performance. But the right mentors significantly influence a company’s performance and ability to raise money. (However, this does not mean that investors don’t have a significant effect on valuations and M&A.) Balanced teams with one technical founder and one business founder raise 30% more money, have 2.9x more user growth and are 19% less likely to scale prematurely than technical or business-heavy founding teams.
- Zanran — search engine for graphs, charts, and data. (via Pia Waugh)
A Princeton search algorithm uses language indicators to measure importance.
A search algorithm being developed by Princeton University researchers parses language to determine relevance. Academic application is one possibility, but this type of algorithm could also extend to news recommendations.
Banshee Bucks, Log Mining, Visualization Secrets, and Repression Tools
- Canonical’s New Plan for Banshee — Canonical prepare the Linux distribution Ubuntu. They will distribute the popular iTunes-alike Banshee, but instead of the standard Amazon store plugin (which generates much $ in affiliate revenue for the GNOME Foundation) they will have Canonical’s own Amazon store plugin and keep 75% of the revenue (25% going to the GNOME Foundation). They’re legally within their rights, and it underscores for me how the goal of providing freedom from control is incompatible with a goal of making money. Free and open source software gives self-determination with software, and that includes the right to replace your money pump with theirs.
- Oluolu — an open source query log mining tool which works on Hadoop. This tool provides resources to add new features to search engines. Concretely Oluolu supports automatic dictionary creation such as spelling correction, context queries or frequent query n-grams from query log data. The dictionaries are applied to search engines to add features such as ‘did you mean’ or ‘related keyword suggestion’ service in search engines. (via Matt Biddulph on Delicious)
- Information is Beautiful Process (David McCandless) — David’s process for creating his beautiful and moving visualizations.
- Facebook for Repressive Regimes — The purpose of this blog post is not to help repressive regimes use Facebook better, but rather to warn activists about the risks they face when using Facebook. (via Justine Sanderson on Delicious)
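As a rough illustration of the "frequent query n-grams" idea in the Oluolu item, here is a single-machine sketch in plain Python. It is not Oluolu's API and does not use Hadoop; the function name, the sample log, and the `min_count` threshold are all hypothetical.

```python
from collections import Counter


def frequent_query_ngrams(queries, n=2, min_count=2):
    """Count word n-grams across a query log and keep the frequent ones.

    Returns (ngram, count) pairs, most frequent first.
    """
    counts = Counter()
    for query in queries:
        tokens = query.lower().split()
        # Slide a window of n tokens across each query.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return [(gram, c) for gram, c in counts.most_common() if c >= min_count]


if __name__ == "__main__":
    # Hypothetical query log, one query per entry.
    log = ["cheap flights paris", "cheap flights rome", "Flights Paris"]
    for gram, count in frequent_query_ngrams(log):
        print(count, gram)
```

A dictionary built this way could feed a "related keyword suggestion" feature. A production tool would spread the counting across a cluster and handle tokenization far more carefully.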
Jeopardy was fun, but Watson's practical applications are what's really interesting.
Aside from whipping the pants off two Jeopardy geniuses, the Watson computer is opening the door to new monetization possibilities for search.