Publishing

Technology is transforming publishing. From the way ideas are generated to the packaging of information to the delivery of products, the industry is in the midst of a sea change. We've always considered O'Reilly as much of a technology company as a publisher, a belief that's led us to develop information products such as GNN (the first commercial website), Safari Books Online, and the Tools of Change for Publishing conference. As publishers seek a new equilibrium in our networked world, we aim to be both a catalyst and chronicler of what has inevitably been called Publishing 2.0.


 

Recent Posts from TOC

Wed

Jul 1
2009

Nat Torkington

Four short links: 1 July 2009

Web Awards, Speed Thrills, Magazines in the Cloud, Augmented Reality

by Nat Torkington (@gnat) | comments: 0

  1. The Onyas -- New Zealand web design awards launch, from the people behind Webstock and Full Code Press. The name comes from "good on ya", the highest praise that traditionally taciturn New Zealanders are allowed by law to give.
  2. The Year of Business Metrics: Don't make your users run away! -- wrapup of the Velocity conference. AOL: Users who had a slower experience view far fewer pages. Some interesting notes on performance from a Google-Bing study: Notice that as the delays get longer the Time To Click increases at a more extreme rate (1000ms increases by 1900ms). The theory is that the user gets distracted and unengaged in the page. In other words, they've lost the user's full attention and have to get it back. [...] As much as five weeks later, some users, especially those who saw delays greater than 400MS, were still searching less than before. (via timoreilly on Twitter)
  3. Printcasting -- very simple content management system for print magazines that lets anyone start a magazine, add content, sign up contributors, sell ads, and go. Clever!
  4. Pachube Augmented Reality Hack -- sexy hack that pushes all my buttons: computer vision, Arduino, sensor network, ubiquitous computing, pervasive alternate reality cyborg villains with chalk designs hellbent on world domination and the enslavement of the human race to use as meatsack AA batteries for their sex toys. Okay, four out of five ain't bad. (via bruces on Twitter)

Pachube Augmented Reality Demo

 

Wed

Jun 24
2009

Tim O'Reilly

My 140conf Talk: Twitter as Publishing

by Tim O'Reilly (@timoreilly) | comments: 6

I spoke at Jeff Pulver's 140conf a few weeks ago. My subject was the continuity of what I do, from publishing through conferences through my presence on Twitter. I tried to draw the connections, and to explain how "social media" means drawing from, curating, and amplifying the voices of a community. I suggest that the role of an editor and publisher is analogous to the role of a point guard in basketball, handing out "assists" and improving the performance of his or her teammates. After all, I point out, I couldn't possibly tweet enough to cover all the topics I am interested in. But by using my retweets to build the visibility of others, I can create and foster a community that cares about the ideas, trends, and people that I care about.

My talk starts about 1:40 into the video, after a few comments from Jeff Pulver, the conference organizer. I've provided a lightly edited and linkified transcript below, for those of you who don't have time to watch the entire 15 minute video. If you do have the time, you can watch the video from the entire two-day conference at http://www.140conf.com/watchit.

What I learned from Twitter

Hi. I want to talk to you a little bit about Twitter and media. I'm a publisher. I'm a publisher in print. And it turns out I'm also a publisher on Twitter. I want to explain the roots of media and how that connects with what we're doing in this newest form of media.

When you think about the original use case of Twitter, which @Leisa described so wonderfully as “ambient intimacy,” it's really news from your close friends. But it's news nonetheless. And sometimes the news from individuals becomes news that matters to a whole lot more people. When someone in Tehran today is reporting their personal news, it's news that matters to all of us. And so you can see the continuum between the personal and the international in those moments.

But that continuum exists all the time, and it's existed always in media.

(continue reading)

tags: 140conf, publishing, twitter

 

Sun

May 17
2009

Andrew Savikas

Scribd Store a Welcome Addition to Ebook Market (and 650 O'Reilly Titles Included)

by Andrew Savikas | comments: 7

The document-sharing site Scribd has launched a new "Scribd Store" selling view and download access to documents and books. As part of the launch, there are now more than 650 O'Reilly ebooks available for preview and sale in the Scribd store, and all include DRM-free PDF downloads with purchase. (Scribd will soon be adding EPUB as a format, and we'll make that available as soon as possible.)



Many publishers (including O'Reilly) have kept Scribd at arm's length because the service was often used by people posting copyrighted material without permission. Though Scribd was reasonably responsive to takedown requests, that puts the onus for monitoring on the publisher, a whack-a-mole scenario that will consume as many resources as you throw at it if you let it. But Scribd has implemented a new system that uses the ebooks provided for sale to identify (and remove) any other unauthorized versions of that material, as well as prevent future unauthorized uploads. Like any technology it's far from perfect (for example, I suspect scanned images are more difficult to test than standard PDFs), but it's good enough for us to be comfortable participating, and is as good an example as any of turning lemons into lemonade.

For a publisher (and I use the term loosely) the terms for the Scribd store are impressive -- publishers set the sale price directly, and keep 80% of the revenue (compare that to Amazon's DTP program, where the standard terms are that Amazon gets to set the actual price, and the publisher only gets 35% of their "suggested" price). There's also an interesting "automated pricing" option in Scribd, which uses an (unspecified) algorithm to set the sale price. But the pieces of the Scribd store I'm most excited about are the real-time reporting (compared with a lag of a month or more with most ebook resellers, including Amazon), the option to easily provide free updates to existing content, and the variety of adjustable display options -- like preview amount, refreshingly optional DRM, and purchase-link images. Administering and understanding your sales in Scribd is downright delightful compared with the same for Kindle.
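To put rough numbers on those terms, here is a minimal sketch of the per-copy arithmetic. The 80% and 35% shares come from the paragraph above; the $20 price and the function names are made up for illustration, and real contracts may well include fees or conditions this ignores.

```python
# Per-copy publisher revenue under the two sets of terms described above.
# The share percentages come from the post; everything else is hypothetical.

SCRIBD_SHARE = 0.80      # publisher keeps 80% of the sale price it sets
KINDLE_DTP_SHARE = 0.35  # publisher gets 35% of its "suggested" price


def scribd_revenue(sale_price: float) -> float:
    """Publisher's cut when the publisher sets the sale price directly."""
    return round(sale_price * SCRIBD_SHARE, 2)


def kindle_dtp_revenue(suggested_price: float) -> float:
    """Publisher's cut based on the suggested price, whatever Amazon charges."""
    return round(suggested_price * KINDLE_DTP_SHARE, 2)


price = 20.00  # hypothetical list price for one ebook
print(f"Scribd:     ${scribd_revenue(price):.2f} per copy")
print(f"Kindle DTP: ${kindle_dtp_revenue(price):.2f} per copy")
```

At the same hypothetical price, the Scribd terms return a bit more than twice as much per copy, before considering sales volume on either platform.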

A service like Scribd further reduces the barriers to content creators interested in self publishing digital material (and again offers much better terms than Amazon's DTP program for Kindle), so in some ways it is absolutely a threat to existing publishers. But we also view it as an opportunity to get our books in front of interested readers, and a promising sign that the market for ebooks is large enough to continue attracting startups like Scribd who bring needed diversity and competition among resellers.

 

Fri

May 15
2009

Nat Torkington

Four short links: 15 May 2009

Life After socket(), Imminent Death of Web 2.0, Breathalyzer Lameness, and Open Source Science Publishing

by Nat Torkington (@gnat) | comments: 4

  1. Whither Sockets? -- ACM Queue article on how sockets as a model for network programming have become an obstacle to where networking is going. All of these calls have one thing in common: the calling program must repeatedly ask for data to be delivered. In the world of client/server computing these constant requests make perfect sense, because the server cannot do anything without a request from the client. It makes little sense for a print server to call a client unless the client has something it wishes to print. What, however, if the service being provided is music or video distribution? In a media distribution service there may be one or more sources of data and many listeners. For as long as the user is listening to or viewing the media, the most likely case is that the application will want whatever data has arrived. Specifically requesting new data is a waste of time and resources for the application. The sockets API does not provide the programmer a way in which to say, "Whenever there is data for me, call me to process it directly." (via Slashdot)
  2. Game Web 2.Over? (Meg Pickard) -- update of the classic "wall o' Web 2.0 logos" showing which have folded or been bought. I'm glad to see how many have folded; many were the inevitable "me too"ing of initial successes, and many were simply bad ideas. Death is a natural part of the Darwinian marketplace, painful as it is to those who are naturally selected out of the meme pool. I'm glad to see how many were acquired, showing they had something someone wanted. The diagram's incomplete now, of course: it doesn't show the companies launched after the wall o'logos was made. (via Waxy)
  3. Breathalyzer Source Code Sucks -- 2. Readings are Not Averaged Correctly: When the software takes a series of readings, it first averages the first two readings. Then, it averages the third reading with the average just computed. Then the fourth reading is averaged with the new average, and so on. There is no comment or note detailing a reason for this calculation, which would cause the first reading to have more weight than successive readings. Nonetheless, the comments say that the values should be averaged, and they are not... I periodically worry that I've been so long out of hardcore coding that my skills are rusty and I'd never survive at the coal face again. Then I see something like this and I punch the air and wheeze "I still got it!" as I reach for my cane. (via BoingBoing)
  4. Bloomsbury Science Free Online -- Sir John Sulston, Nobel prize winner and one of the architects of the Human Genome Project, has teamed up with Bloomsbury to edit a new series of books that will look at topics including the ethics of genetics and the cyber enhancement of humans. The series will be the first from Bloomsbury's new venture, Bloomsbury Academic, launched late last year as part of the publisher's post-Harry Potter reinvention. Using Creative Commons licences, the intention is for titles in the imprint to be available for free online for non-commercial use, with revenue to be generated from the hard copies that will be printed via print-on-demand and short-run printing technologies. (via Glyn Moody)
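
The averaging scheme quoted in item 3 is easy to reproduce. The sketch below is not the breathalyzer's actual source; the function names and the readings are invented for illustration. It compares the "average the running result with the next reading" approach against a plain arithmetic mean, and works out the weight each reading ends up with.

```python
from statistics import mean


def sequential_average(readings):
    """Average the first two readings, then average each later reading
    with the running result, as the quoted report describes."""
    if not readings:
        raise ValueError("need at least one reading")
    result = readings[0]
    for reading in readings[1:]:
        result = (result + reading) / 2
    return result


def implied_weights(n):
    """Weight each of n readings receives under sequential_average."""
    # The first reading is halved n-1 times; reading k (k >= 2) is halved n-k+1 times.
    return [0.5 ** (n - 1)] + [0.5 ** (n - k + 1) for k in range(2, n + 1)]


readings = [8.0, 10.0, 12.0, 16.0]  # hypothetical values, arbitrary units
print(sequential_average(readings))  # running pairwise scheme
print(mean(readings))                # plain arithmetic mean
print(implied_weights(len(readings)))
```

Worked through this way, the weights are unequal and the most recent reading carries half of the total, which does not match the quoted report's remark about the first reading. Either way, the result is not the average the code comments promise.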

 

Fri

Apr 17
2009

Pamela Samuelson

Legally Speaking: The Dead Souls of the Google Booksearch Settlement

by Pamela Samuelson | comments: 59

Guest blogger Pamela Samuelson is the Richard M. Sherman Distinguished Professor of Law and Information at the University of California, Berkeley, as well as a Director of the Berkeley Center for Law & Technology and an advisor to the Samuelson High Technology Law & Public Policy Clinic at Boalt Hall. She has written and spoken extensively about the challenges that new information technologies pose for traditional legal regimes, especially for intellectual property law.

This piece will appear in the July 2009 issue of Communications of the ACM. Readers may also be interested in the slides from Pam's recent presentation, "Reflections on the Google Book Search Settlement."

Google has scanned the texts of more than seven million books from major university research libraries for its Book Search initiative and processed the digitized copies to index their contents. Google allows users to download the entirety of these books if they are in the public domain (about 1 million of them are), but at this point makes available only “snippets” of relevant texts when the books are still in copyright unless the copyright owner has agreed to allow more to be displayed.

In the fall of 2005, the Authors Guild, which then had about 8000 members, and five publishers sued Google for copyright infringement. Google argued that its scanning, indexing, and snippet-providing was a fair and non-infringing use because it promoted wider public access to books and because Google would take out of the Book Search corpus any digitized books whose rights holders objected to their inclusion. Many copyright professionals expected the Authors Guild v. Google case to be the most important fair use case of the 21st century.

This column argues that the proposed settlement of this lawsuit is a privately negotiated compulsory license primarily designed to monetize millions of orphan works. It will benefit Google and certain authors and publishers, but it is questionable whether the authors of most books in the corpus (the “dead souls” to which the title refers) would agree that the settling authors and publishers will truly represent their interests when setting terms for access to the Book Search corpus.

Orphan Works

An estimated 70 per cent of the books in the Book Search repository are in-copyright, but out of print. Most of them are, for all practical purposes, “orphan works,” that is, works for which it is virtually impossible to locate the appropriate rights holders to ask for permission to digitize them.

A broad consensus exists about the desirability of making orphan works more widely available. Yet, without a safe harbor against possible infringement lawsuits, digitization projects pose significant copyright risks. Congress is considering legislation to lessen the risks of using orphan works, but it has yet to pass.

The proposed Book Search settlement agreement will solve the orphan works problem for books—at least for Google. Under this agreement, which must be approved by a federal court judge to become final, Google would get, among other things, a license to display up to 20 per cent of the contents of in-copyright out-of-print books, to run ads alongside these displays, and to sell access to the full texts of these books to institutional subscribers and to individual purchasers.

The Book Rights Registry

Approval of this settlement would establish a new collecting society, the Book Rights Registry (BRR), initially funded by Google with $34.5 million. The BRR will be responsible for allocating $45 million in settlement funds that Google is providing to compensate copyright owners for past uses of their books.

More important is Google’s commitment to pay the BRR 63 per cent of the revenues it makes from Book Search that are subject to sharing provisions. The revenue streams will come from ads appearing next to displays of in-copyright books in response to user queries and from individual purchases of and institutional subscriptions to some or all of the books in the corpus. Google and the BRR may also develop new business models over time that will be subject to similar sharing.

One of the main jobs of the BRR will be to distribute the settlement revenues. The money will go, less BRR’s costs, to authors and publishers who have registered their copyright claims with BRR. Although the settlement agreement extends only to books published prior to January 5, 2009, BRR is expected to attract authors and publishers of later-published books to participate in the revenue sharing arrangement that Google has negotiated with BRR.

(continue reading)

 

Tue

Mar 31
2009

Tim O'Reilly

What Publishers Need to Learn from Software Developers

by Tim O'Reilly (@timoreilly) | comments: 29

There was a great exchange on the O'Reilly editors' backchannel the other day, so illuminating that I thought I should share it with the rest of you. We've been discussing the fast-track development we're using to produce The Twitter Book. (We're basically authoring the book as a presentation, after I realized how much more quickly I am able to put together a slide deck to make my points than I am a normal book. Twitter is also such a fast-moving topic that we need to be able to update the book every time we reprint it.)

Sarah Milstein wrote:

Apropos of everything, the NYT on publishers' speeding up the production process, especially with eBooks:

“If this book had gone through the normal publishing procedures,” Mr. Kiyosaki said, “it wouldn’t be worth writing.”

Andrew Savikas replied:

The more I think about it the more obvious it's becoming to me that the next generation of authoring/production tools will have much more in common with today's software development tools than with today's word processors.

Software developers spend enormous amounts of time creatively writing with text, editing, revising, refining multiple interconnected textual works -- and often doing so in a highly distributed way with many collaborators. Few writers or editors spend as much time as developers with text, and it only makes sense to apply the lessons developers have learned about managing collaborative writing and editing projects at scale.

'Nuff said. I await said next generation of authoring/production tools.

tags: publishing, tools, twitter

 

Mon

Feb 23
2009

Mike Shatzkin

Managing monopolies and dominance in the Net age

by Mike Shatzkin (@MikeShatzkin) | comments: 11

Guest blogger Mike Shatzkin is Founder and CEO of The Idea Logical Company, where he has focused on supply chain and digital change issues since 1979. Mike has spoken at and organized publishing industry conferences all over the world. He recently launched The Shatzkin Files blog. One of Mike's several books, The Ballplayers, forms the core of BaseballLibrary.com.

Our thinking about "monopoly" may need to be recast in the Internet age. This is a complicated question to consider and we need to start gathering some good minds around it.

Network effects were noticed before there was an Internet. Both the phone company and the electric company were networks, and it became clear about a century ago that everything worked better for everybody if they WERE monopolies and everybody was hooked up to the same network, not competing ones. So phones and electricity became regulated monopolies, with prices and other behavior, including mandated service levels, controlled. Whether because of a changing ethos or because things became more complicated, or both, "competition" has been introduced in both spheres over the past two or three decades. With debatable results.

Amazon's dominance -- which is not a monopoly but which certainly looks like unassailable hegemony in the world of online bookselling -- can be largely attributed to brilliant execution and maintaining a tight focus on serving the customer. But part of their success at eliminating meaningful competition for online book sales has to do with the nature of the Internet. Online likes one winner in many spaces because it serves the users better NOT to fragment aggregations. If Amazon's reader reviews were spread over 1000 web sites, they wouldn't be as useful to the consumers. And their recommendation engine thrives on data; fewer customers would mean less helpful recommendations for those customers remaining, and the concentration at Amazon means less useful recommendations come from all their retailing competitors. This is an edge that may not stay with the retailer forever, though, because the playing field for information about books is being leveled by social networking sites. That's why Amazon is investing in them.

This tendency to concentration makes it urgent for publishers to get into niches and start trying to own them while they have legacy advantages. If the history of the Net so far is any guide, each information and interest niche will end up being owned by a very small number of players; often it will boil down to one. We seem to have been pretty fortunate with the dominant players (perhaps we should call them "monopoly threats") that have emerged so far, among them: Amazon, Google, eBay, Craigslist, Wikipedia, and a now-emerging Facebook. They've executed well and kept their eye on the stakeholders they serve. They, so far, have been more benign dominators than were Microsoft and AOL, two big winners on the previous go-round.

(continue reading)

 

Mon

Feb 16
2009

Joshua-Michéle Ross

Radar Interview with Clay Shirky

by Joshua-Michéle Ross (@jmichele) | comments: 3

Clay Shirky is one of the most incisive thinkers on technology and its effects on business and society. I had the pleasure to sit down with him after his keynote at the FASTForward '09 conference last week in Las Vegas.
In this interview Clay talks about:

  • The effects of low-cost coordination and group action
  • Where to find the next layer of value when many professions are being disrupted by the Internet
  • The necessary role of low-cost experimentation in finding new business models


A big thanks to the FASTForward Blog team for hosting me there.

 

Mon

Feb 9
2009

Jim Stogdill

The Kindle and the End of the End of History

by Jim Stogdill (@jstogdill) | comments: 24

This morning I was absentmindedly checking out the New York Times' bits blog coverage of the Kindle 2 launch and saw this:

“Our vision is every book, ever printed, in any language, all available in less than 60 seconds.”

It wasn't the main story for sure. It was buried in the piece like an afterthought, but it was the big news to me. It certainly falls into the category of big hairy audacious goal, and I think it's a lot more interesting than the device Bezos was there to launch (which still can't flatten a colorful maple leaf). I mean, he didn't say "every book in our inventory" or "every book in the catalogues of the major publishers that we work with." Or even, "every book that has already been digitized." He said "every book ever printed."

When I'm working I tend to write random notes to myself on 3x5 cards. Sometimes they get transcribed into Evernote, but all too often they just end up in piles. I read that quote and immediately started digging into the closest pile looking for a card I had just scribbled about an hour earlier.

I had been doing some research this morning and was reading a book published in 1915. It's long out of print, and may have only had one printing, but I know from contemporary news clippings found tucked in its pages that the author had been well known and somewhat controversial back in his day. Yet, Google had barely a hint that he ever existed. I fared even worse looking for other people referenced in the text. Frustrated, I grabbed a 3x5 card and scribbled:

"Google and the end of history... History is no longer a continuum. The pre-digital past doesn't exist, at least not unless I walk away from this computer, get all old school, and find an actual library."

My house is filled with books, it's ridiculous really. They are piled up everywhere. I buy a lot of old used books because I like to see how people lived and how they thought in other eras, and I guess I figure someday I'll find time to read them all. For me, it's often less about the facts they contain and more about peeking into alternative world views. Which is how I originally came upon the book I mentioned a moment ago.

The problem is that old books reference people and other stuff that a contemporary reader would have known immediately, but that are a mystery to me today - a mystery that needs solving if I want to understand what the author is trying to say, and to get that sense of how they saw the world. If you want to see what I mean, try reading Winston Churchill's Second World War series.

Churchill speaks conversationally about people, events, and publications that a London resident in 1950 would have been familiar with. However, without a ready reference to all that minutiae you'll have no idea what he's talking about. Unfortunately, a lot of the stuff he references is really obscure today and today's search engines are hit and miss with it - they only know what a modern Wikipedia editor or some other recent writer thinks is relevant today. Google is brilliant for things that have been invented or written about in the digital age, or that made enough of a splash in their day to still get digital now, but the rest of it just doesn't exist. It's B.G. (before Google) or P.D. (pre digital) or something like that.

To cut to the chase, if you read old books you get a sense for how thin the searchable veneer of the web is on our world. The web's view of our world is temporally compressed, biased toward the recent, and even when it does look back through time to events memorable enough to have been digitally remembered, it sees them through our digital-age lens. They are being digitally remembered with our world view overlaid on top.

I posted some of these thoughts to the Radar backchannel list and Nat responded with his usual insight. He pointed out that cultural artifacts have always been divided into popular culture (on the tips of our tongues), cached culture (readily available in an encyclopedia or at the local library) and archived culture (gotta put on your researcher hat and dig, but you can find it in a research library somewhere). The implication is that it's no worse now because of the web.

I like that trichotomy, and of course Nat's right. It's not like the web is burying the archive any deeper. It's right there in the research library where it has always been. Besides, history never really operates as a continuum anyway. It's always been lumpy for a bunch of reasons. But as habit and convenience make us more and more reliant on the web, the off-the-web archive doesn't just seem hard to find, it becomes effectively invisible. In the A.G. era, the deep archive is looking more and more like those charts used by early explorers, with whole blank regions labeled "there be dragons".

So, back to Bezos's big goal... I'd love it to come true, because a comprehensive archive that is accessible in 60 seconds is an archive that is still part of history.

 

Sat

Feb 7
2009

Michael Jon Jensen

For-Profit, Non-Profit, and Scary Humor

by Michael Jon Jensen | comments: 6

Guest blogger Michael Jon Jensen, Director of Strategic Web Communications for the Office of Communications of the National Academies and National Academies Press, has been at the interface between digital technologies and scholarly/academic publishing since the late 1980s.

Tim was kind enough to suggest that I expand on a longish comment I made on his recent post Stuff That Matters: Non-profit to For-profit.

Two threads wove my argument: first, I pushed back at his conventional framing of the non-profit vs. for-profit sectors. But what I think caught his attention most was my description of a project that's trying to "find the funny" in the grinding, slo-motion collapse of our natural world.

An easy knee-slapper, eh?

I'll get back to that second theme after some musings on non-profit vs. for-profit:

Tim: The heart of my message is that work on stuff that matters is a great hedge in down times: even if there isn't a huge monetary payoff, you've done something that needs doing. And it's certainly true that non-profit enterprises are often a good way to tackle hard problems that the marketplace doesn't seem to be addressing.

But I want to make clear that I'm not just talking about charity work. I'm talking about the creation of real economic value. There are huge opportunities for entrepreneurs in solving hard problems, and in so doing creating new markets that can be exploited not just by themselves but by those that follow in their footsteps.

I certainly can't disagree with most of that statement -- but we need to do better at clarifying the roles and mission-driven goals underlying the nonprofit and the for-profit worlds, especially on "stuff that matters."


Non-profits vs. For-profits

Tim comes to his benign perspective on the for-profit sector honestly: O'Reilly has historically been a responsible for-profit, building immense social value at the same time that it profits from its actions. But O'Reilly Media is a somewhat exceptional company.

In the main, the for-profit world has a different "maturation goal" than the non-profit world has, and it affects nearly every decision made in either kind of enterprise.

I heard my favorite summation of the distinction from Peter Likins at an Online Computer Library Center conference years ago. He was then President of the University of Arizona; I first used this quote more than a decade ago, in a presentation I gave entitled "Entrepreneurs of Social Value":

"A for-profit's mission is to create as much value for its stockholders as possible, within the constraints of society. The non-profit's mission is to create as much value for society as possible, within the constraints of its money."

Of course there are, as Tim mentions, great overlaps betwixt the two, and the more that the for-profit world addresses the "stuff that matters," the better. But quite frequently -- at least in publishing, and online, and in the "public good" sector -- when a for-profit takes advantage of that overlap, the pattern has been to decrease the public good.

Take a look at, for example, scientific publishing: in the post-WWII economy, most non-profit scientific journals were bought up by a handful of smart for-profit publishers who, over the following decades, began to ratchet up the prices far beyond what university libraries could afford, producing a dramatic shift in library resource use: an increasing share of nonprofit money went to for-profit scholarly publishing. One could argue that $50,000 a year is a fair price for a really important specialty journal, but it's not an argument that fits into the "stuff that matters" or "social value" meme.

In that instance, smart, rapacious for-profit cherry-picking decreased the means that nonprofit publishers had to fund their other, less profitable work in the humanities, the social sciences, or even the sciences themselves.

A for-profit takeover of formerly nonprofit work could also describe what has happened with Blackwater, and the privatization of the military in general -- higher costs, less accountability, and unintended consequences.

I've worked in nonprofit publishing for more than 20 years, and while I recognize the need for a risk-reward economy, some care needs to be taken to acknowledge that the "public good" rarely is profit-making. It can be sustainable, but is rarely super-profitable.


That said, over those 20+ years, I've always had side projects of some kind -- "stuff that matters" projects that I hoped would end up being profitable, or potentially commercial ones that might be fabulously so.

My hoped-for goals for those projects have changed over time, and recently shifted drastically. For the last 18 months my side project has been with my oldest, bestest friend -- a project which has changed my entire thinking on "what *really* matters," and what "breakthroughs" we need in the next decade -- from the Web 2.0 community, from myself, and from the world at large.

Yeah, it's time for phase II of this guest blog: about trying to turn the onrushing apocalypses into laughter -- or at least a knowing grin.

(continue reading)

tags: publishing, web 2.0

 

Tue

Jan 27
2009

Nat Torkington

Four short links: 27 Jan 2009

by Nat Torkington (@gnat) | comments: 0

Fantasy, feedback, facts, and flies, all will be revealed in today's links of loops and life:

  1. Blueful - a story told in text, but delivered through the medium of web sites. It's like an xkcd cartoon embodied in the web. Interesting, artistic, and makes you look at web sites in a new way. From Aaron A. Reed.
  2. The Case Against Candy Land - Steven Johnson talks about how dull the children's games of our youth are. "What’s irritating about the games is that they are exercises in sheer randomness. It’s not that they fail to sharpen any useful skills; it’s that they make it literally impossible for a player to acquire any skills at all." Every process in life should have a feedback loop that lets you get better at it.
  3. Journo Data - a Guardian journalist publishes data resources about the US economy as Google spreadsheets. This is the start of something interesting, where the raw data is available from journalists, not just the (textual or programmatic) interpretation. As mentioned in the fantastic presentation Tim just linked to, access to the data behind our world view is essential if we are to critically assess that world view.
  4. Userfly - a usability tool that records and then recreates your users' sessions on your web site, so you can see where and when they type, click on, backtrack, etc. (via
 

Thu

Jan 22
2009

Vanessa Fox

Making Site Architecture Search-Friendly: Lessons From whitehouse.gov

by Vanessa Fox | comments: 10

Guest blogger Vanessa Fox is co-chair of the new O'Reilly conference Found: Search Acquisition and Architecture. Find more from Vanessa at ninebyblue.com and janeandrobot.com. Vanessa is also entrepreneur in residence at Ignition Partners, and Features Editor at Search Engine Land.

Yesterday, as President-elect Obama became President Obama, we geeky types filled the web with chatter about change. The change of change.gov becoming whitehouse.gov, that is. The new whitehouse.gov robots.txt file opens everything up to search engines, while the previous one had 2400 lines! The site has a blog! The fonts are Mac-friendly! That Obama administration sure is online-savvy.
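For a sense of what "opens everything up" means in practice, here is a minimal sketch using Python's standard-library robots.txt parser. Both robots.txt bodies below are hypothetical stand-ins (the actual whitehouse.gov files are not reproduced in this post), and the helper name `can_crawl` is mine.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical "open" file: an empty Disallow rule blocks nothing.
OPEN_ROBOTS = """\
User-agent: *
Disallow:
"""

# Hypothetical restrictive file, for contrast (paths are invented).
CLOSED_ROBOTS = """\
User-agent: *
Disallow: /includes/
Disallow: /search/
"""


def can_crawl(robots_txt, url, agent="*"):
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "http://www.whitehouse.gov/agenda/iraq/"
    print(can_crawl(OPEN_ROBOTS, page))
    print(can_crawl(CLOSED_ROBOTS, "http://www.whitehouse.gov/search/"))
```

A short file is not automatically a better one; what matters is whether the pages you want found are crawlable.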

Or is it?

An amazing amount of customer acquisition can come from search (a 2007 Jupiter research study found that 92% of online Americans search monthly and over half search daily). Whitehouse.gov likely doesn't need the kind of visibility that most sites need in search, but when people search for information about today's issues, such as the economy, the Obama administration surely wants the whitehouse.gov pages that explain their position to show up.

The site has a blog, which is awesome, but the title tag, the most important tag on the page, contains only the text "blog". Nothing else. That might help the page rank well for people searching for "blog", but that's probably not what they're going for. This doesn't just hurt them in search, of course. The title is also what shows up in the browser tab and in bookmarks.
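A title check like this is easy to automate. The sketch below pulls the title out of a page with Python's built-in HTML parser and flags it when it is missing or too generic. The list of "generic" titles and the function name `title_problem` are assumptions for illustration, not any search engine's rule.

```python
from html.parser import HTMLParser

# Assumed list of titles too generic to describe a page on their own.
GENERIC_TITLES = {"blog", "home", "index", "untitled"}


class TitleExtractor(HTMLParser):
    """Collects the text found inside the <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def title_problem(html):
    """Return 'missing', 'generic', or 'ok' for the page's title tag."""
    parser = TitleExtractor()
    parser.feed(html)
    title = parser.title.strip()
    if not title:
        return "missing"
    if title.lower() in GENERIC_TITLES:
        return "generic"
    return "ok"


if __name__ == "__main__":
    print(title_problem("<html><head><title>blog</title></head></html>"))
```

Running a check like this across every template on a site catches the pages where the title was left at a placeholder value.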

The site runs on IIS 6.0. Does the site developer know about the tricky configuration needed to make redirects search engine-friendly?

Search engines are text-based, so they can't read text hidden in images. Some whitehouse.gov pages get around this issue well, by making the text look image-like, but leaving it as text, such as below.

whitehouse.gov text example

However, other pages have text in images and don't use ALT text to describe them. (This, of course, is an accessibility issue as well, as it keeps screen readers from being able to access the text in the images.) An example of this is the home page, which may be part of why whitehouse.gov doesn't show up on the first page in a search for President Obama.
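Finding images that lack ALT text is another check that can be scripted. This is a minimal sketch with Python's built-in HTML parser; the sample markup and file names in the usage example are invented, and it treats an empty alt attribute as missing, which is an assumption that fits images containing text but would over-report purely decorative images.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Records the src of every <img> with no usable alt text."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        alt = attributes.get("alt")
        # Assumption: an absent or blank alt counts as a problem.
        if alt is None or not alt.strip():
            self.missing.append(attributes.get("src", ""))


def images_missing_alt(html):
    """Return the src values of images that have no alt text."""
    finder = MissingAltFinder()
    finder.feed(html)
    return finder.missing


if __name__ == "__main__":
    sample = (
        '<img src="/images/banner.jpg">'
        '<img src="/images/seal.png" alt="Presidential seal">'
    )
    print(images_missing_alt(sample))
```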

whitehouse.gov image example

There are all kinds of technical issues, big and small, that impact whether your site can be found in search results for what you want to be found for. (whitehouse.gov uses underscores rather than dashes in URLs, the meta descriptions are the same on every page...) Probably the biggest issue in this case is the lack of 301 redirects between the old site and the new site. When you change domains and move content to the new domain, you don't want to have to rebuild the audience and links all over again. (Not that Obama or whitehouse.gov will have a problem with attracting an audience, but we can't all be president!) When you use a 301 redirect, both visitors and search engines know to replace the old page with the new one.
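To make the 301 idea concrete, here is a minimal sketch of the lookup a redirecting server performs: map each old path to its new home and answer with status 301 plus a Location header. The one mapping shown uses the change.gov and whitehouse.gov Iraq agenda URLs that appear in this post; the function name, the return shape, and the choice to answer 404 for unmapped paths are assumptions, and this is not how either site is actually configured.

```python
from urllib.parse import urlsplit

# Old path on change.gov -> permanent new URL on whitehouse.gov.
# Only the pair mentioned in this post is listed; a real move would
# need one entry (or a pattern) for every page that changed address.
REDIRECTS = {
    "/agenda/iraq_agenda/": "http://www.whitehouse.gov/agenda/iraq/",
}


def redirect_response(old_url):
    """Return (status, headers) for a request to the old site."""
    path = urlsplit(old_url).path
    target = REDIRECTS.get(path)
    if target is None:
        # Assumption: unmapped pages are reported as not found.
        return 404, {}
    # 301 means "moved permanently", so browsers and search engines
    # can swap the old address for the new one.
    return 301, {"Location": target}


if __name__ == "__main__":
    print(redirect_response("http://change.gov/agenda/iraq_agenda/"))
```

The page-to-page mapping is the important part: sending every old URL to the new home page keeps visitors on the site but loses the connection between each old page and its replacement.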

In the case of change.gov, it's unclear if they intend to maintain the old site. The home page asks people to join them at whitehouse.gov, but all the old pages still exist (even the old home page at http://change.gov/content/home).

change.gov example

And in many cases, the same content exists at both change.gov and whitehouse.gov (see, for instance, http://change.gov/agenda/iraq_agenda/ and http://www.whitehouse.gov/agenda/iraq/).

As Matt Cutts, Googler extraordinaire, pointed out, give them a few days to relax before worrying so much about SEO. And I certainly think the site is an excellent step towards better communication between the president and the American people. But not everyone has the luxury of having one of the most well-known names and sites in the world, so the technical details are more important for the rest of us.

If you want to know more about technical issues that can keep your site from being found in search and tips for making sure that you don't lose visibility in a site move, join us for the O'Reilly Found conference June 9-11 in Burlingame. And if you're in Mountain View tomorrow night (Thursday, January 22nd), stop by Ooyala from 6pm to 9pm for our webdev/seo meetup, and get all your search questions answered. Hope to see you there! (Macon Phillips and the whitehouse.gov webmasters are welcome, but my guess is that they're a little busy.)

 
