Bing's Sanaz Ahari on System Feedback (2 of 2)

A couple of weeks ago Bing hosted a small search summit for analysts, bloggers, SEO experts, entrepreneurs and advertisers. It was held in Bellevue; they put us up in a hotel and fed us. While there we received demos from Bing project teams. I was able to snag an interview with Sanaz Ahari, Lead PM on Bing. She led the team that developed the categories you see on a Bing web search. The interview was based on the slides from her presentation at the event; I have posted the significant images from them. The first portion of the interview focuses on how the Bing team handles query-level categorization and some of the problems they face. The second portion focuses on the systems used to generate the categorization.

Disclosure: I was on the MSN Search team (now the Bing team) from 2004 to March 2006. I knew Sanaz at that time.

[Slide: Bing query categorization]



Brady Forrest: Now this image shows the ranking model, and then it shows engagement and measurement.

Sanaz Ahari: Yes.

Brady Forrest: How do engagement and analytics factor into tweaking the ranking, measurement and engagement algorithms?

Sanaz Ahari: So the key thing about engagement is really there are two things: A, how often do people click on the different categories, and then B, once they click on it, what do they do after that? So we basically feed that back into figuring out, “Okay, did we actually put up the right thing? If something lower down is getting clicked on more, does it deserve to be higher? If something is not getting enough engagement, does it need to be bumped down?” And as we really expand the system, I’d have to say for us as a team, this is really the first step towards what we want to do. And, ideally, we want to get to the point where we have enough understanding about every single query that we can really help you refine your tasks and your categories. So the engagement model can also help us in the future as we go deeper into queries for helping people. We shouldn’t just say, “Seattle, I’m going to Seattle restaurants.” You should be able to go to Seattle restaurants and go in really deep and say, “I want restaurants in this neighborhood. I want ones in this price range, et cetera.” So all of the engagement metrics can actually help us figure out what the follow-on tasks are that users engage in the most as well.
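To make that feedback loop concrete, here is a minimal sketch of the kind of engagement-driven re-ranking she describes. The class, the 0-to-1 engagement signal, and the even weighting are illustrative assumptions, not Bing's actual model:

```python
from dataclasses import dataclass

@dataclass
class CategoryStats:
    name: str
    impressions: int
    clicks: int
    post_click_engagement: float  # assumed: a normalized 0..1 signal (e.g. dwell time)

def rerank_categories(stats, engagement_weight=0.5):
    """Order categories by a blend of click-through rate and post-click
    engagement, so a category earning more engagement than its position
    suggests gets bumped up, and an under-performing one gets bumped down."""
    def score(s):
        ctr = s.clicks / s.impressions if s.impressions else 0.0
        return (1 - engagement_weight) * ctr + engagement_weight * s.post_click_engagement
    return sorted(stats, key=score, reverse=True)
```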

Brady Forrest: And so what is the second flow chart?

[Slide: Bing measurement]

Sanaz Ahari: So the second area: once we felt that we could deliver intent understanding at a level of quality that we felt comfortable with, then we tackled the second area of problems, which is equally difficult, which is really around, okay, how do we know that J Lo is a musician in the first place? And this is really around the query understanding aspect of things. And this is an area where we, again, explored multiple different approaches. We could’ve done a kind of clustering on the entire corpus of our queries. Or we could’ve said, “We’re going to start a little bit more targeted and only go after the domains that we really want to go after.” Like we said, “Let’s just go after health and see if we can solve a small problem before trying to take on the entire corpus of the web.”

For the Bing release, we focused — and this was just a principle that we had as a team — we really wanted to start small and see if we could get the level of quality that we wanted before trying to take on a lot more different challenges. And so, in this case, we went after the types of domains that we knew were strategic for us. So, all of a sudden, the corpus of queries that we were interested in was a lot smaller. And we already have the ability to classify queries into domains and understand, okay, this query is a music query or this query is a health query, et cetera, et cetera. And the other problem that falls out of that is, okay, when people do health queries, what are the categories that fall out of that? Like how do we know that people are going to care about diseases and symptoms, et cetera, et cetera? And then the next problem after that is how do we know that we have a comprehensive understanding of all diseases? So we may be able to understand that there are N different diseases, but how do we know that that’s actually a comprehensive list?
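As a rough illustration of that classification step, a toy keyword-overlap classifier shows the shape of the problem. The domain vocabularies below are made up for illustration; Bing's production classifiers are of course far more sophisticated:

```python
# Toy sketch: classify a query into a domain by vocabulary overlap.
# The domain term lists are invented for illustration.
DOMAIN_TERMS = {
    "health": {"symptoms", "disease", "treatment", "diagnosis", "cure"},
    "music": {"lyrics", "album", "concert", "discography", "tour"},
}

def classify_domain(query):
    tokens = set(query.lower().split())
    best_domain, best_overlap = None, 0
    for domain, terms in DOMAIN_TERMS.items():
        overlap = len(tokens & terms)
        if overlap > best_overlap:
            best_domain, best_overlap = domain, overlap
    return best_domain  # None when no domain vocabulary matches

print(classify_domain("als disease symptoms"))  # -> health
```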

And then lastly, there’s a problem — and this is one of the fascinating search problems — which is that users query for the same thing in many, many different ways. So an example that I had was health, which is actually a very complicated one, where the disease ALS is also known as Lou Gehrig’s disease. And it’s also known as one other thing which sounds kind of complicated. I don’t even know how to say it. But there are lots of different ways that people basically query for the same thing. And so those were the three different problems that we really had to tackle in the query understanding space. So the two areas that we basically looked at were: A, if we are able to identify a seed set of queries in a category, how can we actually expand that out and know that we have a comprehensive list? Like if we start with N items, are we able to expand it out and get a more comprehensive list of items that are very similar to the existing seed set that we started out with? And that’s really the query expansion problem.
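The seed-set expansion she describes might look roughly like the sketch below, which uses co-occurrence in user sessions as the similarity signal. The session format, thresholds, and example data are assumptions for illustration, not Bing's method:

```python
from collections import Counter

def expand_seed_set(seed, sessions, min_overlap=2, top_k=20):
    """Sketch of seed-set expansion: queries that co-occur in user
    sessions alongside several known seed items become candidates
    for membership in the same category."""
    candidates = Counter()
    for session in sessions:  # each session is a list of query strings
        hits = seed & set(session)
        if len(hits) >= min_overlap:
            for q in session:
                if q not in seed:
                    candidates[q] += 1
    return [q for q, _ in candidates.most_common(top_k)]

seed = {"als", "lou gehrig's disease"}
sessions = [["als", "lou gehrig's disease", "amyotrophic lateral sclerosis"]]
print(expand_seed_set(seed, sessions))  # -> ['amyotrophic lateral sclerosis']
```

The same co-occurrence signal could also help with the alias problem she mentions, since "ALS", "Lou Gehrig's disease", and the full medical name tend to show up in the same sessions.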

Brady Forrest: And what type of numbers are you talking about? Is it 100 or 1,000 or 100,000?

Sanaz Ahari: Oh, for the seed sets? It completely varies. There are some categories that are small and some that are large. Like if you try to tackle musicians as a whole, that’s huge. Whereas if you try to tackle, like, sports teams, that’s pretty small. So it varies.

Brady Forrest: And where are you pulling category names from? Like are you pulling from Wikipedia, like proper nouns in the case of musicians, or are you also pulling raw queries from the logs?

Sanaz Ahari: There’s definitely both. We use a whole bunch of different features. We do a lot of work from logs. We do a lot of work on document extraction as well. What’s very interesting is that logs can give you a lot of great information where we have enough data, so they don’t necessarily help you address the tail with precision. And document extraction can potentially help you with comprehensiveness. And one of the things I would say is we also realized that the good thing about the approach as a whole, both on the intent extraction side and on the query understanding side, has been that it was an amazing learning experience for the team to tackle the problems one at a time, because we realized there were so many intricacies. There are some things where we can build a generic system and it can help every category. But there were also cases where we would find a lot of intricacies in some categories where we had to do —
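Her point about combining the two sources (logs for precision on head queries, document extraction for coverage of the tail) can be pictured as a simple score blend. The weights and score dictionaries here are hypothetical:

```python
def merge_candidates(log_scores, doc_scores, log_weight=0.7):
    """Blend evidence from query logs (reliable where traffic is high)
    with document-extraction evidence (broader coverage of the tail).
    The 70/30 weighting is purely illustrative."""
    merged = {}
    for item in set(log_scores) | set(doc_scores):
        merged[item] = (log_weight * log_scores.get(item, 0.0)
                        + (1 - log_weight) * doc_scores.get(item, 0.0))
    return sorted(merged, key=merged.get, reverse=True)
```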

Brady Forrest: So what’s a query that you’re proud of, one that was really hard — like an example of a query that really came a long way?

Sanaz Ahari: I actually don’t have one at the tip of my — I do like the experience for Jennifer Lopez because she has a lot of different attributes.

Brady Forrest: What’s one that you really want to improve but you didn’t want to tweak by hand?

[Slide: Bing “jaguar” results]

Sanaz Ahari: Actually, the jaguar one (Bing search) was one — the one this morning that we talked about. That was a great query. And in some ways, I actually think we do a lot of positive things with that query. Like in one sense, I would say that we definitely deliver a diversified experience, and we at least capture the different intents. Whereas without the left rail altogether, most users don’t really go past the third algorithmic result, and that in and of itself doesn’t really give users enough diversification to say, “Okay, this is really my intent, and this is what I really want to dig down into.” So on one hand, I like what we have done. But in the ideal scenario, I envision us being able to enumerate all of the different intents and all of the different tasks that actually fall under every single intent. So ideally, we should be able to call out animal, team, car, et cetera, and then call out the individual tasks that the users want to do beneath every single one of them. The two areas that I really, really want us to improve are, one, around that — I think that disambiguation is a pretty hard problem where we’ve barely scratched the surface. And then the second area is the depth of our coverage. You know, I really want us to have a much deeper experience where if I type in Indian restaurants in Fremont (Bing search), I should be able to still get a categorized experience where I can still dig in deeper.
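The ideal she sketches for “jaguar” (enumerate every intent, then the tasks under each) suggests a nested structure along these lines. The intents and tasks below are hypothetical placeholders, not Bing's data:

```python
# Hypothetical shape for fully disambiguated intents: each intent of an
# ambiguous query carries its own follow-on tasks.
INTENTS = {
    "jaguar": {
        "animal": ["pictures", "habitat", "facts"],
        "car": ["dealers", "models", "reviews"],
        "sports team": ["schedule", "roster", "scores"],
    },
}

def intents_for(query):
    return INTENTS.get(query.lower().strip(), {})

for intent, tasks in intents_for("Jaguar").items():
    print(intent, "->", tasks)
```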

Brady Forrest: And what percentage of queries get the categorized experience?

Sanaz Ahari: So today, 20 percent of our queries have a categorized experience. And the team is actively working on our next release where we are working on increasing both the quality and the coverage and specifically going more into longer queries.

Brady Forrest: Okay. Well, thank you very much, Sanaz.

Sanaz Ahari: Thank you.
