Mon, Nov 13, 2006

Tim O'Reilly

Thoughts on the State of Search

Sarah Milstein, co-author and editor of Google: The Missing Manual, sent some thought-provoking comments to the O'Reilly editors' list last week. I thought they were very much worth sharing with a wider audience.

A couple of things I've noticed since writing/editing the second edition of the Google Missing Manual earlier this year. Nothing ground-breaking here; more that in aggregate, the observations may spark some interesting conversations.
  • As the Web gets bigger, search results contain more irrelevant stuff. In many cases, it's getting harder to find what you want. Appreciably harder.
  • Assuming search winds up lasting 100+ years, it's still in its infancy. Even so, it surprises me that the presentation of Google's main search results pages barely changed in the two years from one edition of the book to the next. The main difference is that onebox results with specialized information now appear more frequently (though seemingly at random) at the top of the results listing. At this point, I'm ready for a better results interface.
  • Other search companies are doing some cool stuff with their main results. For many searches, Microsoft Live presents a super-useful list of related searches; also, somebody hit them with the clean-interface stick. Ask.com does a nice job with simple natural-language questions. Clusty has been offering very handy clustered search results since at least 2004. Daylife (still in alpha) does what is essentially clustering with a thoughtful interface.
  • Vertical search is a hot trendlet; for the most part, it's about clustering and improving the results interface. (Those aren't trivial things; they're really important aspects of helping you find stuff.)
  • As search results get more unwieldy, recommendation engines like Amazon's or iTunes' could become more important tools. (Bonus: they're an iteration of the architecture of participation, so we can claim some kind of credit. ;) Presumably, implicit relevance (based on search-result clickstreams) is going to be a big part of this if it's not already; there's a rough sketch of the idea just after this list.
  • I wonder when/if search is going to be real-time (i.e. live Web) rather than index-based. And I wonder if the main barrier to it now is hardware or software (to the degree you can separate them). At Web 2.0, I met a woman from Intel R&D who's working on a continuous-refresh data system that would allow real-time searching, but for which you need multi-core processors that aren't yet ready for primetime. Still, an interesting glimpse of the possible future. [Sidebar: I brought up the idea of real-time Web search with Tony Stubblebine, and he thought it was hilarious. Totally unrealistic vis-à-vis computing capacity. I thought it was laughable that he thought we wouldn't eventually have the bandwidth and cycles to do it. Tune in in 10 years for an update.]
  • When do the implicit conclusions of hardcore data-mining and analysis become part of search results? For example, Marc Smith's newsgroup analysis can point to potential experts on a topic. If I want to find an SEO guru, when will search results from a major engine contain implicitly derived info from Marc's project?
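The "implicit relevance" point lends itself to a concrete sketch. Below is a minimal, hypothetical example (the session data, URLs, and approach are invented for illustration, not anything Google or Amazon actually does): pages that get clicked in the same search sessions are recommended alongside one another, the simplest form of clickstream-based, Amazon-style co-occurrence.

```python
# A toy sketch of implicit relevance from clickstreams: recommend pages that
# tend to be clicked in the same search sessions as the page at hand.
from collections import defaultdict
from itertools import combinations

def build_cooccurrence(sessions):
    """sessions: iterable of lists of clicked page URLs, one list per session."""
    counts = defaultdict(lambda: defaultdict(int))
    for clicks in sessions:
        for a, b in combinations(set(clicks), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts

def recommend(counts, page, top_n=5):
    """Pages most often clicked alongside `page`."""
    related = counts.get(page, {})
    return sorted(related, key=related.get, reverse=True)[:top_n]

# Made-up sessions for illustration:
sessions = [
    ["example.com/seo-basics", "example.com/link-building", "example.com/robots-txt"],
    ["example.com/seo-basics", "example.com/link-building"],
    ["example.com/robots-txt", "example.com/sitemaps"],
]
print(recommend(build_cooccurrence(sessions), "example.com/seo-basics"))
```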
A couple of thoughts on Google as a company:
  • During the two-year period from one edition of Google TMM to the next, Google began adding features and services at such a quick clip that, while the second edition was at the printer, we missed out on eight new tools. That was in a 10-day period. (Some of those new tools are search-related; many are not.)
  • The other big change from the Google customer POV is that now, a lot of Google services require an account. It's not yet clear, though, how/whether having an account will lead to improved search results.
  • Moreover, it's no longer clear that Google is a search company. They're certainly an ad-brokering network (I'm sure everyone saw the announcement last week that they're reaching into newspaper ads now, with plans for basically all major media). And they're a provider of (mostly) Web-based productivity tools of all kinds. But a lot of those activities seem to have little to do with their mission of "organizing the world's information and making it universally accessible and useful." They do seem to have organized the world's top search experts. But as a customer, I'm not sure I'm feeling the benefit of that in my everyday searching.
A final note: I use a slew of Google tools every day, my life is richer for them, and I'm not looking to bash the company. But I do wonder where the next search innovations are going to come from, and I'm surprised to find that Google isn't the obvious answer to that question anymore.


2 TrackBacks

» Instead of Google from Alpha-Geek.com

I've been using del.icio.us as a Google-replacement now for 20-30% of my searches on certain hard-to-search-for terms.

» Top 17 Search Innovations outside of Google from The Software Abstractions Blog

There is an abundance of new search engines (100 at last count), each pioneering some innovation in search technology. Here is a list of the to...

Comments: 12

  Mike Lewis [11.13.06 10:05 AM]

It is interesting to see how far search has come, and also to think about how far it can go. As the founder of a vertical search company (qloud.com), I think a lot about how to improve search results. Our philosophy is that Google is great for objective items like web pages, but when you need to search for subjective tastes - like music or videos or jokes - then the social (demographic) aspects are very helpful.

Now that Google is getting more and more registered users searching, it will be interesting to see how (or if) they incorporate that into their applications.

  John [11.13.06 01:34 PM]

Google needs to create a fairer search engine for unknown sites.

  David Wolber [11.13.06 01:38 PM]

One answer might be double-filtered results. Users filter search results all the time but generally throw their work away. del.icio.us and its ilk provide a way to record that filtering, but there hasn't been much innovation in terms of a feedback loop, i.e., using such data to improve future search. I guess the question is 'what can a search-result ranker learn from user links that it doesn't know from the links of page creators (what search engines use now)?' And how does time factor in, what is feasible, etc.
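One way to picture that feedback loop is to fold a reader-side signal (say, del.icio.us-style bookmark counts) into a creator-side link score. A toy sketch, with invented numbers and an arbitrary mixing weight, not any engine's actual formula:

```python
import math

def blended_score(link_score, bookmark_count, alpha=0.7):
    """Mix a creator-side link-graph score with a reader-side bookmark signal.
    log1p damps runaway popularity; alpha is an arbitrary weighting."""
    return alpha * link_score + (1 - alpha) * math.log1p(bookmark_count)

# Hypothetical pages: (link-graph score, number of user bookmarks)
pages = {
    "bigretailer.example.com/books": (9.0, 120),
    "nichebookblog.example.net/reviews": (3.5, 4000),
}
ranked = sorted(pages, key=lambda p: blended_score(*pages[p]), reverse=True)
print(ranked)
```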

  Gary [11.13.06 02:59 PM]

As one of the founders of www.tallstreet.com, I'm biased, but...

Google and the like are useful for finding specific content. But any algorithm that only takes into account the link structure of the web is going to be biased, because it's not taking into account all the factors that determine whether a website is the best website under a keyword at a particular time. Sure, Amazon has the most sites linking to it, but does that mean it has the best deals, best customer service, best shipping times and the range of books I'm looking for?

Tall Street is trying to create fairer search results by allowing users to determine the rankings.

On TallStreet.com, the websites at the top of the results are those best able to inspire their users to invest in them; this tips the balance in favour of the passionate little guys.

  theDA [11.13.06 06:54 PM]

re: "[Sidebar: I brought up the idea of real-time Web search with Tony Stubblebine, and he thought it was hilarious. Totally unrealistic vis-à-vis computing capacity. I thought it was laughable that he thought we wouldn't eventually have the bandwidth and cycles to do it. Tune in in 10 years for an update.]"

It's not a matter of bandwidth and cycles. It's a matter of temporal physics. In order to find something, it must have already existed and you must process it, both of which will yield some delta-t. Therefore, no "real-time" searching, if that even makes any sort of semantic sense.
In fact, let's stop saying "real-time" altogether, as nobody uses the term correctly.

  Nilofer Merchant [11.13.06 07:35 PM]

You raise a really good point. And it's an issue many "Incumbents" have. Once you create a category, you worry about changing too much because you could lose your customer base and disrupt ... yourself. That's why new folks can see new approaches. One thing that came up loud and clear at your summit was the notion of vertical / context-based search, so it wasn't just search as its own discipline but looking at where you already are to make the search relevant. That seems like a valid direction.

  Search Engines WEB [11.13.06 07:48 PM]

re: "Google's main search results pages barely changed in the two years from one edition of the book to the next"

Google may have intentionally avoided all the AJAX extras on their default homepage because Marissa has recently gone on record as suggesting that research shows Google users prioritize SPEED...

battellemedia.com/archives/003076.php

The Labs projects test potential bells and whistles; if they were assessed to be very popular, they would graduate from Labs and potentially become a default homepage feature.

Google, as a whole, may have a relatively more experienced search base that is less likely to need those 'guidance' features other search engines are adding.

  Kempton [11.13.06 11:57 PM]

re: "As the Web gets bigger, search results contain more irrelevant stuff. In many cases, it's getting harder to find what you want. Appreciably harder."

I wonder whether it is because Google's algorithm is not scaling up well (which I kinda doubt) or, more likely, because Google is not handling the SEO tricks being thrown at it that well.

By the way, I agree with the above comment by theDA re: Sarah's potential misuse of the term "real time". And I love the short and concise explanation given. Incidentally, Wikipedia has a pretty good entry on what "Real Time" means.

  Diarmad [11.14.06 01:43 AM]

Although Google revolutionised search with their minimalist interface and best-in-category search algorithms, I'm finding myself increasingly looking for ways to influence the results. We know from usability testing that users are unlikely to go beyond the second page of results, which creates a bias in which sites get visited, depending on the algorithms used by the search engine. Although one can use the advanced search for more complex queries, it may also be useful to support other search behaviours, such as allowing the user to sort the results by last edit date, as with more traditional database searches. Similarly, the ability to browse relevant categories rather than rely on literal string matching is a useful information-seeking method which many search interfaces now discourage.

  Omar Khan [11.14.06 10:29 AM]

A lot of good points, and it does seem that Google is becoming like an aircraft carrier that can only change course slowly. Still, I am not sure what they have up their sleeve. Google Co-op search, which was just launched and lets you put their search engine on your own site and customize it, is pretty incredible and is likely to entrench them even further and allow verticals to build their own search engines on top of Google's platform.

  Tony Stubblebine [11.14.06 03:40 PM]

re: real time search and temporal physics

I fought Sarah pretty hard on that term and originally dismissed the idea altogether. I think that's the main problem with the term: for people who take it literally, the concept of real-time search is impossible.

But there's a nugget in the idea. There's more time-dependent content on the web. The Sci-Fi channel ran a BSG promo recently that was only available right after the show aired. Video feeds, especially sports feeds, are time dependent.

You might not need search results that are real time, updated to the nanosecond. But there's a growing class of content that you want updated every few minutes or so. Is that going to change search?

What about in 50 years, when bandwidth and server power are 10,000x greater? Will you be able to ping major sites with every potentially relevant search request?
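The "every few minutes" framing points at an incremental crawl schedule rather than a per-query live fetch. Here's a rough sketch of a freshness-prioritized recrawl loop; the sources, intervals, and the fetch hook are invented for illustration:

```python
import heapq
import time

# Hypothetical sources and how often their content is worth re-fetching (seconds).
refresh_interval = {
    "sportsfeed.example.com": 120,       # scores change constantly
    "newspaper.example.com": 900,
    "staticdocs.example.com": 86400,     # rarely changes
}

def crawl_loop(fetch, now=time.time):
    """Re-fetch each source as it comes due, most urgent first."""
    due = [(now(), src) for src in refresh_interval]
    heapq.heapify(due)
    while due:
        when, src = heapq.heappop(due)
        if when > now():
            time.sleep(when - now())
        fetch(src)  # re-crawl and update the index for this source
        heapq.heappush(due, (now() + refresh_interval[src], src))

# Example (runs forever): crawl_loop(lambda src: print("fetching", src))
```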

  michaelholloway [11.17.06 08:13 AM]

The potential of LiveWeb boggles the mind. Many times I've wanted to go back in time to see who said what, when - re-create a day's events in some subject area - as I missed the passing of that particular parade. Can we get there more quickly than 10 years?
I hear the new PlayStation 3 has 7 RAM processors! LiveWeb could be processed inside the PC. I'm guessing it would require 3 RAMs: one to file info from the server, one to search for date and time references in the text of those pages, and the third to process the process. That would increase speed by huge multipliers and allow fast results for longer search lists found by the search engine.
The interface would be an option on the search engine, 'Set Day of Search'; in other words, the user asks the engine to 'pretend' it's a day in the past: 'please do a boolean search and present an index that would have appeared on a day in ...'. It would be based on a search for dates in the text, not an actual record of when items were originally posted. This initial LiveWeb would be open to mistakes in the text and, more importantly, if the Web is going to become the legal record, to intentional misinformation.
In the future, servers would compile date and time logs for all pages when they're routed, so a search engine could perform this kind of search request by adding a field that takes into account the real-time list. With enough processing power, LiveWeb could be accurate to the second. (Or within an hour, or a half day, if volume remains a problem.)
I'm ignorant: do pages have code on them for the time and date when they're routed?
To me this seems doable in the near term.
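On that closing question: pages themselves don't reliably carry a routing timestamp, but HTTP responses do include date metadata a crawler could record. A small sketch using Python's standard library (the URL is just a placeholder, and many dynamic pages omit Last-Modified):

```python
from urllib.request import urlopen
from email.utils import parsedate_to_datetime

def page_timestamps(url):
    """Return the server's Date header and, if present, Last-Modified."""
    with urlopen(url) as resp:
        served = resp.headers.get("Date")
        modified = resp.headers.get("Last-Modified")
    return (parsedate_to_datetime(served) if served else None,
            parsedate_to_datetime(modified) if modified else None)

print(page_timestamps("http://www.example.com/"))
```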
