Metadata, Not E-Books, Can Save Publishing…

Metadata is king. I will repeat this because it is important. Metadata is king.

I need not go through the barrage of articles and statistics showing that publishing is in a minor state of panic. Revenues are down, and until recently (the past one to two years) many publishers were unsure how to approach e-books (many have still not completely settled on an e-strategy). E-books will not save publishing because they are simply another format. They will not revolutionize reading, nor will they change the content. I’ve seen some social reading projects (Copia), but they are in beta, and I cannot predict whether readers are willing to accept a completely new reading experience.

Some statistics:
* There are roughly 230MM adults in the US.
* The US literacy rate is 99% – leaving 227.7MM adults in the US who can read.
* 28% of US adults are avid (5+ hours/week) readers [Verso] – 64MM avid readers.
* 20% of book purchasing happened online [PW 2007]. This share has grown but is still under 30% (I have heard this by word of mouth from some book-statistics people but cannot quote them).
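The list above is a simple funnel: adults, then literate adults, then avid readers. A small sketch that reproduces the arithmetic, using only the figures quoted in the post. One assumption to flag: the 28% avid share is applied to literate adults, following the order of the list; applying it to all 230MM adults also rounds to 64MM.

```python
# Reader funnel using the post's own figures, in millions of US adults.
US_ADULTS = 230.0      # "roughly 230MM adults in the US"
LITERACY_RATE = 0.99   # "US literacy rate is 99%"
AVID_SHARE = 0.28      # "28% of US adults are avid readers" [Verso]


def reader_funnel(adults: float, literacy: float, avid_share: float) -> dict[str, float]:
    """Return the size of each funnel stage, in millions.

    Assumption: the avid share is applied to literate adults,
    following the order of the statistics in the post.
    """
    literate = adults * literacy
    avid = literate * avid_share
    return {
        "adults": adults,
        "literate": literate,
        "avid": avid,
        "non_avid": literate - avid,  # literate adults who are not avid readers
    }


for stage, size in reader_funnel(US_ADULTS, LITERACY_RATE, AVID_SHARE).items():
    print(f"{stage:>9}: {size:6.1f}MM")
```

This prints about 227.7MM literate adults, 63.8MM avid readers, and 163.9MM literate adults who are not avid readers; the post works with the rounded figures of 64 million and 163 million.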

Why won’t e-books save publishing?
E-books represent a format, just like hardcovers and paperbacks. Because they are a different format, they require different pricing. Formats that are consumed and priced differently do open up a new market, but unless the new consumption method is revolutionary, the growth in new readers cannot be large. Non-readers will not buy e-readers in the hope of becoming readers, at least not until the devices reach an extremely low price point. The iPad, however, is one device that could create new readers. It’s conceivable that someone who buys an iPad but does not buy books will buy one simply because they can do so from their La-Z-Boy. If they like that book, they may even buy another. OK. Now reread that last statement: “If they like that book, they may even buy another.” If they don’t like the book, their sentiment of “this is why I don’t buy books” will be solidified, and another non-book-buyer remains a non-book-buyer.

According to Wikipedia’s chart of best-selling books, Dan Brown’s The Da Vinci Code sold over 80 million copies, and Harry Potter and the Deathly Hallows sold at least 44 million copies. Does that mean that nearly every avid reader bought the last Harry Potter book? Does that mean that every avid reader bought 1.25 copies of The Da Vinci Code? No. It means that people who normally don’t read books opened their wallets to buy and read a book, which means that the 163 million Americans who are not avid readers are potential readers.
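The "1.25 copies" figure is just copies sold divided by the 64 million avid readers from the statistics earlier. A short sketch of that division, using the sales figures quoted in the paragraph above (the function name is illustrative, not from the post):

```python
# Blockbuster sales relative to the avid-reader pool, in millions.
AVID_READERS = 64.0  # avid US readers, from the statistics earlier in the post

BESTSELLER_COPIES = {
    "The Da Vinci Code": 80.0,                     # "over 80 Million copies"
    "Harry Potter and the Deathly Hallows": 44.0,  # "at least 44 Million copies"
}


def copies_per_avid_reader(copies: float, avid_readers: float = AVID_READERS) -> float:
    """Copies sold divided by the number of avid readers."""
    return copies / avid_readers


for title, copies in BESTSELLER_COPIES.items():
    print(f"{title}: {copies_per_avid_reader(copies):.2f} copies per avid reader")
```

The Da Vinci Code comes out at 1.25 copies per avid reader and Deathly Hallows at about 0.69, which is why the post concludes that much of that buying must have come from outside the avid-reader pool.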

How to capture some of those 163 million and get avid readers to buy more.
Simple: give them what they want, and more of it. How do we do this? Metadata. It’s that simple. Tech people love metadata: we eat it up, beg for more, and build amazing utilities around it. In fact, Pandora is an amazing example of what metadata can do for music. But a limiting factor for Pandora is its selection and its metadata-gathering technique (the tagging has to be done manually). How does metadata sell books? Let me start with an anecdote.

The book The Paradox of Choice describes how people tend to shut down when shown too many options. If you’re a seasoned book buyer, when you walk into a bookstore (or browse Amazon or another online retailer) you know exactly where to go for bargain-bin books, where your favorite genre is located, and where the new-releases section is. If you’re new to reading, a bookstore is extremely intimidating. Don’t believe me? Wander into an electronics expo, the car audio section of a Best Buy, or a sporting goods store (assuming you’re not a tech geek, car tinkerer, or sportsperson). You’ll soon see 14 different types of cables or gloves, all at different prices. How do you make your decision? Thankfully, those stores have salespeople who are trained to spot customers like you and offer their help. Brick-and-mortar bookstores offer an information booth at best. Online, you’re left to your own devices…

Imagine you just finished reading a book; we’ll take The Da Vinci Code. After putting it down, you fill out a short survey asking which qualities you liked about the book (let’s call them tags). I liked that it was a suspense novel, that it was a religious mystery, and that it took place in the present day. Now assume that this tag data was available for all books, and that I could walk into a bookstore, hand over my little survey, and be shown 6 books. That would be much easier to choose from. In fact, if you showed me only 3 books, I might even buy all three. In the current environment, the best I could do is buy more of Dan Brown’s books (the author is the #1 reason people buy books) and hope that he has written more than one book, or try the recommendation engines provided by online retailers. Recommendation engines are OK, but they are based on purchasing habits or, in rare cases, “those who liked this also liked,” which is fairly arbitrary and not nearly as good a predictor as metadata.
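The survey idea can be sketched as a tag-overlap ranking: count how many of the reader's liked tags each book shares and keep the top few. This is a minimal illustration, not BookSwim's or any retailer's actual engine; the catalog, titles, and tags are hypothetical placeholders, and simple overlap counting is just one possible scoring choice.

```python
# Tag-overlap recommender: rank books by how many of the reader's
# "liked" tags they share. Titles and tags are hypothetical placeholders.
CATALOG: dict[str, set[str]] = {
    "Book A": {"suspense", "religious mystery", "present day"},
    "Book B": {"suspense", "present day", "espionage"},
    "Book C": {"romance", "1870s"},
    "Book D": {"religious mystery", "1300s"},
    "Book E": {"suspense", "religious mystery", "future"},
}


def recommend(liked: set[str], catalog: dict[str, set[str]], k: int = 3) -> list[str]:
    """Return up to k titles sharing the most tags with `liked`."""
    scored = [
        (len(liked & tags), title)
        for title, tags in catalog.items()
        if liked & tags  # skip books with no tag in common
    ]
    # Highest overlap first; ties broken alphabetically so the output is stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:k]]


# Survey answers after finishing The Da Vinci Code, as described above.
survey = {"suspense", "religious mystery", "present day"}
print(recommend(survey, CATALOG))
```

With k=3 this returns Book A, Book B, and Book E: the books are chosen for what they contain, not for who else bought them, which is the distinction the post draws against purchase-based recommendations.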

That is giving a user more of what they want. They read a book, extract from it what they liked and you give them books with similar qualities. Next is giving them what they want in the first place.

Giving a user what they want.
The best metadata we have at scale is category data. This data isn’t exactly easy to wade through, but if you like romance, you can click on the “romance” category and see a list of books that are considered romance. For new readers, the number of books in the romance category is daunting; plus, what is “paranormal,” and how do I know if I’ll like it? Categories are also boxes with connotations. Yes, books can live in multiple genres, but can a book have a vampire in it and not be a book about vampires? Can a book be a love story with an intimate scene without being a romance? Tags help narrow down the specific traits of a book. Some tags are already gathered, but below is a list of tags I feel are important to gather:

  • Page count (or word count). It’s important for readers to know whether this is a short read or a long one.
  • Time Period. (1990s, 1870s, future). Some people love historical fiction. Some people hate a specific time period.
  • Categories. A category is just a specific kind of tag, but categories don’t live in hierarchies. A book can have the category tags “romance” and “vampires” independently. For non-fiction, themes make great categories, for example: “war,” “history,” “1900s,” “War of 1812.”
  • Writing Style. Is this a 3-act play? Is it written in 3rd person or 1st person? Is it dialogue-heavy?
  • Series. Is this part of a series? What number in the series? Is this an ordered series or just a collection of fiction built around a specific world?
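One way to picture the core tags above is as a single flat record per book, where categories are a set of independent tags rather than a position in a hierarchy. The sketch below is an assumption about how such a record might look; the class and field names are hypothetical and do not come from any existing standard.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BookMetadata:
    """Hypothetical record holding the core tags listed above."""
    title: str
    page_count: Optional[int] = None                         # or word count: short vs. long read
    time_periods: set[str] = field(default_factory=set)      # e.g. {"1870s"}, {"future"}
    categories: set[str] = field(default_factory=set)        # flat tags, no hierarchy
    writing_style: set[str] = field(default_factory=set)     # e.g. {"first person", "dialogue-heavy"}
    series: Optional[str] = None                             # series name, if any
    series_number: Optional[int] = None                      # None for standalone or unordered series

    def all_tags(self) -> set[str]:
        """Flatten every tag-like field into one set for matching."""
        return self.time_periods | self.categories | self.writing_style


# A book can carry "romance" and "vampires" side by side, with neither
# category containing the other.
book = BookMetadata(
    title="Hypothetical Title",
    page_count=320,
    time_periods={"present day"},
    categories={"romance", "vampires"},
    writing_style={"first person", "dialogue-heavy"},
    series="Hypothetical Series",
    series_number=2,
)
print(sorted(book.all_tags()))
```

Keeping categories flat is what lets a recommender ask "does this book have a vampire in it?" without also implying "this is a vampire book."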

I could go on and on with more data, but these are what I believe form the core. If every book had this data, you could essentially have an eHarmony for books: you fill out a small profile of your likes and dislikes and are then shown a much smaller set of books to choose from. The best part of these selections is that there is a very good chance you’ll like them. And if you like a book, you’re more likely to buy more books, turning new readers into avid readers and avid readers into, well, hyper-avid readers.
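An "eHarmony for books" profile adds something the survey alone does not: dislikes and limits that rule books out entirely. A minimal sketch, assuming a hypothetical profile with liked tags, disliked tags, and a maximum page count; any book with a disliked tag or over the length limit is dropped, and the rest are ranked by liked-tag overlap. The catalog and the default of six results are illustrative choices.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ReaderProfile:
    """Hypothetical reader profile: liked tags, disliked tags, length limit."""
    likes: set[str] = field(default_factory=set)
    dislikes: set[str] = field(default_factory=set)
    max_pages: Optional[int] = None


# Hypothetical catalog: title -> (page count, tags).
CATALOG: dict[str, tuple[int, set[str]]] = {
    "Book A": (280, {"historical", "1870s", "romance"}),
    "Book B": (650, {"historical", "war", "1900s"}),
    "Book C": (210, {"historical", "vampires"}),
    "Book D": (300, {"future", "romance"}),
}


def match(profile: ReaderProfile, catalog: dict[str, tuple[int, set[str]]],
          k: int = 6) -> list[str]:
    """Filter out disliked or too-long books, then rank the rest by liked tags."""
    candidates = []
    for title, (pages, tags) in catalog.items():
        if tags & profile.dislikes:
            continue  # a single disliked trait rules the book out
        if profile.max_pages is not None and pages > profile.max_pages:
            continue  # longer than the reader wants
        score = len(tags & profile.likes)
        if score:
            candidates.append((score, title))
    # Most liked tags first; ties broken alphabetically.
    candidates.sort(key=lambda c: (-c[0], c[1]))
    return [title for _, title in candidates[:k]]


profile = ReaderProfile(likes={"historical", "romance"}, dislikes={"vampires"}, max_pages=400)
print(match(profile, CATALOG))
```

Here Book B is dropped for length and Book C for its vampire tag, leaving Book A and Book D: a much smaller set, which is the point of the profile.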

To bring it all together, if you want to grow the market, you must do things better and in a new way. E-books aren’t a new way to sell, just a new format to sell the same old books you’ve been selling for years. Make readers happy by providing them with the books they want.

About Nick Ruffilo: In 1998, Nick Ruffilo helped found an online video game cheats website that used early forms of metadata for recommendations. Afterward, from 2001 to 2008, he helped define metadata standards for the financial industry.

In 2008, he joined BookSwim, where he has worked to aggregate multiple sources of data and gather internal data to redefine BookSwim’s recommendation engine.