Database War Stories #5: craigslist

Eric Scheide of craigslist offered me a stream-of-consciousness summary of the craigslist database setup. At a conference last year, Craig showed a slide (which helped inspire my postings about asymmetric competition [1, 2, 3]) that listed the number of employees at the top ten web sites. Most of them have thousands of employees. Some have tens of thousands. Craigslist, at #7 on the list, has 19.

Eric’s email has that embattled “news from the front” feel that you might expect from a site handling that much traffic with only 19 employees!

First, in response to my question about the craigslist database architecture, Eric wrote:

“all database machines are on 64 bit linux boxen/ 14 local drives with 16gig of ram.

craigslist runs clusters of dbs for each of our various services:

forums: 1 master and 1 slave (mostly for backup). MyISAM tables everywhere. DataDir size including indexes 17G. Largest table is approaching 42 million rows.

classifieds: 1 master and 12 slaves. We have various flavors of slave databases: we have teamreader, longreader, thrashbox, for backups and very long ad hoc queries, and a few extra boxen. At times we have an offsite slave in case the colo goes dark. Currently this is on hold until we get a bigger pipe to our office location. Current footprint including indexes 114G, 56 million rows in the largest table (it’s time to archive some of those oh yes it is); yesterday we wrote 330000 new rows to this table. MyISAM everywhere, mostly because it works.

ArchiveDB: 1 master 1 slave. holds all craigslist postings older than about 3 months. Looks very similar to classifieds except bigger. 238Gigs, 96 million rows. Oh yeah, we use merge tables all over the archive, splitting data into more manageable chunks. We may do this in production soon.

searchdbs: 16 of these in 4 clusters. We take live postings and split them by area/category type (sfbay/housing) and then use MyISAM full text indexing. each cluster only contains a subset of all postings. We find the right host/table in software. This runs good, but do not think this solution will scale for much more than a year. Indexing is expensive and we have a lot of churn.

Authdb: 1 master and 1 slave. smallish.

a few smaller “junk” db’s that have transient data.”
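
To make the "find the right host/table in software" step concrete, here is a minimal Python sketch of what such routing might look like. It is not craigslist's code: the host names, the table naming scheme, and the choice of a CRC32 hash are all assumptions made for illustration. Only the shape (16 search hosts in 4 clusters, postings split by area/category) comes from Eric's description.

```python
import zlib
from typing import List, Tuple

# Hypothetical layout: 4 clusters of 4 hosts each ("16 of these in 4 clusters").
# The host names are invented for this example.
CLUSTERS: List[List[str]] = [
    [f"search{c}-{h}" for h in range(4)] for c in range(4)
]


def route(area: str, category: str) -> Tuple[str, str]:
    """Map an (area, category) pair to a (host, table) pair deterministically."""
    key = f"{area}/{category}".lower()
    # crc32 is stable across processes, unlike Python's built-in hash().
    digest = zlib.crc32(key.encode("utf-8"))
    # Each cluster holds only a subset of postings, so pick the cluster first.
    cluster = CLUSTERS[digest % len(CLUSTERS)]
    # Assumption: any host in the cluster could serve the key; we pick one
    # deterministically so that repeated lookups land on the same machine.
    host = cluster[(digest // len(CLUSTERS)) % len(cluster)]
    # Hypothetical naming scheme: one full-text table per area/category.
    table = f"postings_{area}_{category}".lower()
    return host, table


if __name__ == "__main__":
    print(route("sfbay", "housing"))
```

The point of keeping the mapping in application code is that the per-category full-text tables stay small; the cost is that adding a cluster changes where existing keys land unless the mapping is versioned or made consistent in some other way.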

In response to my question about lessons learned in managing the data store, Eric wrote:

“databases are good at doing some of the heavy lifting, go sort this, give me some of that, but if your database gets hot you are in a world of trouble so make sure you can cache stuff up front. Protect your db!

you can only go so deep with master -> slave configuration; at some point you’re gonna need to break your data over several clusters. Craigslist will do this with our classified data sometime this year.

Do Not expect FullText indexing to work on a very large table. It’s just not fast enough for what users expect on the web, and updating rows will make bad things happen. We want forward facing queries to be measured in a few 100ths of a second.

There appears to be such a thing as a keybuffer that is too large even if you aren’t swapping. Performance blows. So be careful when you bump up the key buffer. But do find the sweet spot.

mysql seems to really love 64 bit boxen. we recently switched to 64 bit, added a few drives and basically took all the mountains out of our load charts.”
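
Eric's "cache stuff up front" advice is commonly implemented as a cache-aside layer in front of the database. The sketch below is one minimal way to do that in Python, not a description of craigslist's actual caching. The `TTLCache` class, its `get_or_load` method, and the 30-second lifetime are hypothetical names and values chosen for the example.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Tiny in-process cache-aside helper with a fixed time-to-live."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        """Return the cached value, calling loader() only on a miss or expiry."""
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # fresh entry: the database is not touched
        value = loader()  # miss or stale: this is the only path that hits the db
        self._store[key] = (now + self.ttl, value)
        return value


if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=30)

    def load_listing_page() -> str:
        # Stand-in for a real database query.
        return "rows for sfbay/housing"

    print(cache.get_or_load("sfbay/housing", load_listing_page))
    print(cache.get_or_load("sfbay/housing", load_listing_page))  # served from cache
```

With a read-heavy workload, even a short lifetime like this means a popular page costs one query per interval rather than one per request, which is the "protect your db" effect described above.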

As to war stories and lessons learned, he wrote:

“mysql upgrades can be the best thing ever [but can also] make you hate yourself.

We upgraded our search clusters to 4.1x a while back and got a huge performance boost from 4.0. there were no notes in the change log that fulltext indexing had been touched but it surely rocked.

We once rolled out a minor revision (4.0.x -> 4.0.x++) and query optimization flipped over on its head; it seemed fine in testing. It suddenly was choosing completely different indexes than the prior version. But only in some cases. So it hit the live site and bad things happened.”
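
One way a regression like the index flip might be caught before it reaches the live site is to run `EXPLAIN` for a set of representative queries on both versions and compare which index each one picks. The Python sketch below shows only the comparison step; it is an illustration, not something the email says craigslist did. It assumes the `EXPLAIN` rows have already been fetched as dictionaries with the standard `table` and `key` columns, and the function names and sample query are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

ExplainRow = Dict[str, Optional[str]]
KeyChoice = Tuple[Optional[str], Optional[str]]


def chosen_keys(rows: List[ExplainRow]) -> List[KeyChoice]:
    """Reduce EXPLAIN output to (table, key) pairs, in plan order."""
    return [(row.get("table"), row.get("key")) for row in rows]


def plan_changes(
    old_plans: Dict[str, List[ExplainRow]],
    new_plans: Dict[str, List[ExplainRow]],
) -> Dict[str, Tuple[List[KeyChoice], List[KeyChoice]]]:
    """Return queries whose chosen indexes differ between two server versions."""
    changed = {}
    for query, old_rows in old_plans.items():
        new_rows = new_plans.get(query)
        if new_rows is None:
            continue  # query was not explained on the new version; nothing to compare
        old_keys, new_keys = chosen_keys(old_rows), chosen_keys(new_rows)
        if old_keys != new_keys:
            changed[query] = (old_keys, new_keys)
    return changed


if __name__ == "__main__":
    # Invented sample data: the table, column, and index names are placeholders.
    q = "SELECT id FROM postings WHERE area = 'sfbay' ORDER BY posted DESC"
    old = {q: [{"table": "postings", "key": "idx_area_posted"}]}
    new = {q: [{"table": "postings", "key": "idx_posted"}]}
    for query, (before, after) in plan_changes(old, new).items():
        print(query, before, "->", after)
```

A check like this only covers the queries it is given, and the optimizer's choice also depends on table statistics, so it would need to run against production-sized data to be meaningful.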

In response to my question about any information on the scale and type of data they manage and its growth rates, he wrote:

“some numbers in the data section above. craigslist tends to have 200% growth yearly both in posting and reading.”
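
To get a feel for what that growth rate implies, here is a small back-of-the-envelope calculation in Python. It starts from the 330,000 rows written in one day mentioned above and compounds it yearly. Taken literally, "200% growth" means tripling each year; if doubling was meant, the figure to use would be 100. The function name `project` is just a label for this example, and treating one day's writes as a steady baseline is an assumption.

```python
def project(value: float, growth_pct: float, years: int) -> float:
    """Compound a starting value by growth_pct percent per year."""
    return value * (1 + growth_pct / 100) ** years


if __name__ == "__main__":
    daily_rows = 330_000  # new classifieds rows written in one day, per the email
    for years in range(4):
        tripling = project(daily_rows, 200, years)  # literal reading of "200% growth"
        doubling = project(daily_rows, 100, years)  # looser reading: volume doubles
        print(f"year {years}: {tripling:,.0f} rows/day (x3) or {doubling:,.0f} rows/day (x2)")
```

Under either reading the write volume outgrows a single master within a few years, which fits Eric's remark that the classified data will have to be broken over several clusters.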

More entries in the database war stories series: Second Life, Bloglines and Memeorandum, Flickr, NASA World Wind, O’Reilly Research, Google File System and BigTable, Findory and Amazon, Brian Aker of MySQL Responds.

  • ted

    The Free Ads phenomenon.

    Online free ads accounted for only 1% of the overall classifieds market in 1999,
    that number grew to over 6% by 2002 and is now estimated to be close to 10%.

    Advantages of the Free Ads markets

    Convenience and ease of use
    Powerful search capabilities
    More personalized
    More timely and up-to-date listings
    photos, video, and sound clips in online ads

    Craigslist
    Traffic Rank for craigslist.org: 27
    (Alexa)

    FreeAds.net
    Traffic Rank for freeads.net: 14,320
    (Alexa)

    Kijiji
    Traffic Rank for kijiji.com: 41,575
    (Alexa)

  • http://www.sprinj.com heri

    woaw, 19 employees. when you count managers, business people and non-technical people, there must be only 5 or 10 engineers in craigslist? these guys should write a book or teach at universities, whatever. i am in computer science (MSc) and there seems to be a long way to go before understanding what he does.

  • Frank Tanner

    I do believe that Craigslist only has 19 employees. In actuality they behave like a small schoolyard clique more than a business. If for some reason you rub one of them the wrong way your ads are banned from their site. They may tout high ideals about an internet community and posting forum rules, but it’s all a crock of bull. I have had my ads for a legitimate business banned from Craigslist simply because someone there doesn’t like my advertisements. Maybe they don’t like me because I’m hispanic, maybe anything. I’m not saying this to be funny, I’m saying this because to date I have not been able to get a straight answer from Craigslist staff as to why my ads are removed. I have emailed Craig himself and asked him why a legitimate business that has nothing to do with anything illegal, is not involved in spam or any bad business practices, is being targeted, and the response has been nothing. I have emailed everyone on the staff with the same questions and have received no answers back. When a company refuses to respond or give any reasons whatsoever for their actions that only tells me that they are a group of children who simply act on a whim. Otherwise they would respond with a legitimate reason and give the person an opportunity to correct the problem. Craigslist may have started as a great idea, and has gained popularity, but like everything, power has gone to their heads. Once people start finding out about how arbitrary and biased the staff can be, they will start to lose readers.

  • perry ruiz

    I agree with Frank Tanner! Craigslist is a total joke! I am surprised they have 19 employees, I would have thought more like 4 or 5 young business wannabees. I have owned and operated a very successful travel agency for the last 3 years, and have also had my ads pulled from craigslist because someone didn’t like the way I advertised. For all I know it was a competitor that just flagged it, so it would be removed. In any case this is a very poor way to run a classified website. Just a month ago I did see that craigslist was featured on the local news for allowing prostitutes to advertise on craigslist, and if you look on craigslist under erotic, you will see that even to this day they allow this type of advertising, yet they will remove a legitimate business ad. Guess that tells all we need to know about what kind of person the owner of craigslist is ( A SICK AND TWISTED PERVERT ).

  • http://www.kameir.com kameir

    Overall the system obviously works. It would be an interesting experiment to take out the personals (and related) for a while, to see how much traffic would be left.

    But programmer rule number one is still:
    ‘never touch a running system’.

  • jim

    craigslist is a step back in the right direction for a free and open internet, which is what made the internet to start with. thanks and keep up the good work. can you put waller texas on the craigs list, thanks jim

  • http://www.spunk.ws Mike

    Craigslist works for the same reason Google does. It’s simple, easy to use, and it gets the job done fast and cheap. Yes, the people who run it can get a little testy, but as a mostly free service, there are legions of people who want to co-opt it. That goes the same for the small business guy or for eBay which owns a minority stake in it.

    Say what you will, they have millions of fans.

  • http://www.stumblehere.com lindsaey

    Even though craigslist is “clique-ish” they are going to be hard to top. They blow away even their closest competition in traffic.
    If you do a Google search there are hundreds of “craigslist clones”. The classified ad space is going to get huge and I am wondering how we will look at this a year from now.

    You have newer sites in the free classifieds space like http://www.stumblehere.com that are trying to do something different. And you have eBay’s new classified site Kijiji, a lot of money behind this one. Then there is always Google Base, and who knows how far Google will take that.

    I am wondering if craigslist will be the clear-cut leader a year from now. Will there even be a front runner? I guess let’s wait and see what happens. Will the “clique” still be sooo dominating?

  • http://www.craigslistclones.com gene

    with the internet being as big as it is and growing bigger every day, the competition may be growing, but so are the options: you can now have a classified ad script on your site just like craigslist (http://www.craigslistclones.com). I purchased this script and will start a website, so add me to the list of competitors

    http://www.craigslistclones.com

  • Robert

    I can’t understand all the crying about ads being pulled. Were they paid-for advertising or ‘free’ advertisements?

    If they were ‘free’ then you have no right to complain about an ad being published or canceled.

    If they were paid for, demand your money back and advertise elsewhere.

  • http://www.onlineclassifieds.com Scott

    Craigslist is the current giant. It’s simple to use and has major traffic. Sites like http://www.onlineclassifieds.com are trying to be different by combining other sites’ feeds and APIs for better search, but still have a long way to go to compete. Kijiji looks to be putting a lot of money into promotion and has some nice options, but the next major player will have to stand out by offering something that’s probably not even thought of yet.

  • http://index-classifieds.com Josh

    The two major free classifieds will remain craigslist and kijiji. I’ve seen a few new ones like http://www.backpage.com and another new one, http://www.index-classifieds.com, that are taking a new approach and are trying to give the user more functionality.

  • http://www.squidoo.com/mnroofing Steve

    Online Classifieds are doing better than ever. Craigslist is my favorite site. I do see how sites like OnlineClassifieds.com and Stumblehere.com could catch up over time.