A couple of weeks ago Bing had a small search summit for analysts, bloggers, SEO experts, entrepreneurs and advertisers. It was held in Bellevue; they put us up in a hotel and fed us. While there we received demos from Bing project teams. I was able to snag an interview with Sanaz Ahari, Lead PM on Bing. She led the team that developed the categories you see on a Bing web search. The interview was based on the slides from her presentation at the event. I have posted the significant images from her slides. The first portion of the interview focuses on how the Bing team handles query-level categorization and some of the problems they face. The second portion (up shortly) focuses on the systems used to generate the categorization.
Disclosure: I was on the MSN Search team (now the Bing team) from 2004 to March 2006. I knew Sanaz at that time.
Brady Forrest: Hi, this is Brady Forrest with O’Reilly Radar, and I’m here with Sanaz Ahari, Lead PM on Bing Search. And she’s going to lead us through the categorization process that you see on every page. Hey, Sanaz.
Sanaz Ahari: Hey, Brady. So I’m going to walk you through basically kind of just the journey that we went through for coming up with our categorized experience. And so the categorized experience is basically the left rail experience that you see on Bing today. It doesn’t show up for every single query today, but when it does show up, it’s really about helping the users complete their task essentially. So just to take a step back, when we started on the project, we had done a lot of analysis on queries just in a vacuum. And queries are always a part of users completing a task. And in a lot of the analysis we did, we noticed that a lot of the tasks are common. And it’s really just common sense. When you’re looking for a car, you’re either researching it, you already own it, or you want to buy one. When you’re looking for a musician, you want to see if they’re on tour; you want lyrics, songs, albums, et cetera.
And so our challenge was: can we apply some of that essentially structured aspect to queries? And this is really similar to what you see on sites like Amazon, IMDB, et cetera. They do just a really kick ass job of categorizing their content. The challenge is that A, those sites are really about one domain. And then B, those sites are really operating on top of already structured data. And so the challenge that we have with search is that A, we are a general purpose search engine, and then B, the data that we have is not structured. So the goal that we started out with was we wanted to start very simple. And categorization and clustering, et cetera, are nothing really new in the search space. People in research and computer science have been working around this space for years.
So what we started out with was two key principles. One of them was A, can we achieve aspects and categories that were really, really user intuitive. And B, can we achieve this across a query class. One of the things that we really wanted was, in order for us to build a habit for our users, we needed to deliver a predictable, consistent experience across a query class. So if I went and told my dad, “Hey, Dad, try any car,” I really want him to get a categorized experience for any car. So those are the two kind of constraints that we really set for ourselves. We said, “Unless we meet these two criteria, it’s not really successful.” And so we started out with a lot of prototyping around, “Hey, can we actually extract intent from queries?” So we started from the intent aspect. And I’ll walk you through an example just to show you a simplistic view and how it gets very easily complicated.
So in the example that you see here, we started out with musicians. So with musicians as a whole, the categories and the tasks essentially that the users do generically are fairly straightforward, you know, people want lyrics, songs, tabs, tour dates, ring tones, et cetera, and the list goes on.
Brady Forrest: And are musicians judged as a category?
Sanaz Ahari: Yes, so musicians here is, for example, a category. Yes. Now this is fairly — what I would say, it’s a fairly meaty high-level category though, because as you dig in deep, there are a lot of different attributes about musicians. So the three different examples I have here are — well, two of them are my favorite bands, but not J Lo exactly. And they kind of cover a wide range. So you’ve got Jennifer Lopez (Bing search) and she’s a pop musician, but she’s one of those people that does a whole bunch of other things as well. You’ve got Gotan Project (Bing search), a little bit more tail. And they’re a trip-hop band. And then third, you’ve got Rodrigo y Gabriela (Bing search), who are more rhythmic guitarists. And you can think about all different sorts of attributes. You’ve got musicians that may not be alive anymore, et cetera. So there are all sorts of different attributes that fall out of even just a single musicians class. And so in this example, ideally, you should nail the right categories that apply to these three different examples.
So in one case, you’ve got the guitarists; ideally for this case, you know, tabs are pretty relevant. Lyrics definitely don’t make a whole lot of sense. And then you’ve got J Lo, and she is multifaceted, and we should really try and capture most of her facets. She’s a fashion designer. She’s an actress, and she’s a musician, et cetera. So this shows you kind of the types of problems that we have to solve. A is that a query might fall under different classes. B is that even if a query is under a single class, the intent within that class may not be the same. And then there’s the problem of head queries and tail queries, ones we have a lot of data for and ones where we don’t. So from here on, we go on to basically our approach for solving this problem. I should say that this is an area where we had a brilliant set of folks working on it. We collaborated pretty closely with research. We had a brilliant set of engineers working on it. And the model that we converged on is one where we basically do category-level inference as well as query-level inference. So in this case, at the category level, we want to figure out: given a class of queries that are all similar, what are the top things that users are interested in?
In this case, our algorithms basically used a whole bunch of different features, everything from query clustering, query clicks, session analysis, document extraction, contextual analysis, et cetera. And the features that we added were based on a lot of quick iteration to figure out what is good, what is bad, and where we fall short, so we could figure out what extra things we really needed to add to our algorithm. So measurement was a very key part of our process because we really, really wanted to achieve categories that the users could make a lot of sense out of.
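Session analysis is just one of the features Sanaz mentions. As a rough illustration only (this is my own toy sketch, not Bing's system; every name, data structure, and threshold here is invented), here's how session refinements could surface candidate categories for a query class like musicians:

```python
from collections import Counter

def candidate_categories(sessions, class_queries):
    """Toy category-level inference: for queries in one class (e.g. musicians),
    count the refinement terms users append in the same search session. The
    most common refinements ("lyrics", "tour dates", ...) become candidate
    categories for the whole class. This ignores the other signals mentioned
    in the interview (query clustering, clicks, document extraction)."""
    counts = Counter()
    for session in sessions:
        # Walk consecutive query pairs within one session.
        for query, refinement in zip(session, session[1:]):
            if query in class_queries and refinement.startswith(query + " "):
                counts[refinement[len(query) + 1:]] += 1
    return [term for term, _ in counts.most_common(5)]

sessions = [
    ["jennifer lopez", "jennifer lopez lyrics"],
    ["gotan project", "gotan project tour dates"],
    ["jennifer lopez", "jennifer lopez lyrics"],
]
print(candidate_categories(sessions, {"jennifer lopez", "gotan project"}))
```

Aggregating over the class rather than per query is what makes the experience consistent across, say, every car or every musician.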
So algorithms don’t often give you things that users really understand. So we really, really wanted to deliver things that made sense to the users. And then on the second level, we really wanted to understand everything about just a query standalone as much as possible. And this is to balance the whole, “Okay. What are the top things people care about in a whole category?” If I’ve got this bag of categories that users care about, now how do I pick the right ones that apply only to this query? And that is why we had an approach at the category level and also at the query level. Lastly, we did a lot of work around determining, if we know that a query is in a category, is that actually the primary intent for that query. So, I don’t know, like traffic may be a movie, but a lot of users, when they type in traffic, are actually just looking for how bad the traffic is right now. And that’s an example of a query where, even though it belongs to a category, that category may be an obscure intent.
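To make the primary-intent idea concrete, here's a toy check of my own (not Bing's method; the click data and threshold are invented) that treats a category as primary only when it captures a large enough share of a query's clicks:

```python
def is_primary_intent(category_clicks, category, min_share=0.5):
    """Toy check: a category counts as the primary intent for a query only
    if it receives at least min_share of the query's clicks. The 0.5
    threshold is invented for illustration."""
    total = sum(category_clicks.values())
    return total > 0 and category_clicks.get(category, 0) / total >= min_share

# The "traffic" example: the query belongs to the movies category, but most
# clicks go to traffic reports, so the movie is an obscure intent.
clicks = {"traffic reports": 80, "movies": 15, "band": 5}
print(is_primary_intent(clicks, "movies"))
```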
Lastly, we have our ranking model. And our ranking model basically takes all of the different inputs at the category level and at the query level in order to do some modeling around what are the top intents that apply to the query. And, of course, we have a very tight feedback loop from what users engage with to feed back into the ranking of the categories as well as discovering new ones.
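As a hypothetical sketch of what blending those inputs could look like (the weights and scores are entirely made up; Bing's actual model is not described at this level of detail), imagine a linear combination of a category-level prior, query-level evidence, and engagement feedback:

```python
def rank_categories(category_prior, query_evidence, engagement, top_k=4):
    """Toy ranking sketch: blend a class-wide prior (how often users of this
    query class want each category), query-specific evidence, and
    click-through feedback. The 0.5/0.3/0.2 weights are invented."""
    scores = {}
    for cat in category_prior:
        scores[cat] = (0.5 * category_prior[cat]
                       + 0.3 * query_evidence.get(cat, 0.0)
                       + 0.2 * engagement.get(cat, 0.0))
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# A guitarists example: the musicians-class prior favors lyrics, but
# query-level evidence and engagement push tabs to the top.
prior = {"lyrics": 0.9, "songs": 0.8, "tour dates": 0.6, "tabs": 0.3}
evidence = {"tabs": 0.9}
engagement = {"tabs": 0.8}
print(rank_categories(prior, evidence, engagement))
```

The feedback loop Sanaz describes would correspond to continually updating the engagement scores from what users actually click.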
Brady Forrest: And how fast do you have to make this calculation for each query?
Sanaz Ahari: I mean it’s all pretty fast because we are scaling through millions of queries. So there’s a combination of things: for performance optimizations, we do some things offline and we do some things online. For things that don’t change a lot, where it makes sense for us to do it offline, we try to optimize that way. But it’s definitely a combination of the two. And our view is that with users, performance is just an expectation. So that’s something that we can’t compromise on. So everything happens in a matter of milliseconds basically for all of our computations.
Brady Forrest: And how much are you able to cache in case suddenly a query starts to trend up?
Sanaz Ahari: Right. For a lot of our head queries, we definitely do a lot of caching, et cetera. And for real-time spiky things, we have invested in an entirely different system where we’re constantly monitoring for spiky trends. So the two systems are basically optimized individually, so that we are always aware of what things are all of a sudden spiking a lot, while being smart about the things that are head queries, that people are re-querying for.
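As a toy illustration of the two-system idea (all structure and thresholds invented by me, not taken from Bing), a precomputed head-query cache paired with a simple sliding-window spike detector might look like:

```python
import time
from collections import deque

class CategoryCache:
    """Toy two-tier serving sketch: head queries are answered from an
    offline-computed cache; a sliding-window counter flags queries that
    suddenly spike so their categories can be recomputed online. The
    threshold and window are invented for illustration."""

    def __init__(self, offline_results, spike_threshold=3, window_seconds=60):
        self.offline = offline_results      # head query -> precomputed categories
        self.recent = {}                    # query -> deque of request timestamps
        self.spike_threshold = spike_threshold
        self.window = window_seconds

    def is_spiking(self, query, now):
        """Record this request and report whether the query exceeded the
        threshold within the sliding window."""
        times = self.recent.setdefault(query, deque())
        times.append(now)
        while times and now - times[0] > self.window:
            times.popleft()
        return len(times) >= self.spike_threshold

    def lookup(self, query, now=None):
        now = time.time() if now is None else now
        if self.is_spiking(query, now):
            return ("online", None)         # spiking: recompute fresh categories
        if query in self.offline:
            return ("cache", self.offline[query])
        return ("online", None)             # tail query: compute online
```

For example, the first couple of requests for a head query are served from the cache, but a burst of requests trips the spike detector and falls through to online computation.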
The second portion of this interview will be posted shortly.