Thu, Sep 21, 2006

Dale Dougherty

Deconstructing Databases

At EuroOSCON, Greg Stein of Google gave a talk about the open source software development tools offered for developers at Google Code and I came away with an unexpected insight into Web 2.0.

In describing the new bug tracking system, he said that while he liked many existing bug systems, he realized there was an opportunity to design a new, much simpler one for Google Code. The key, he said, was understanding that they had great full-text search tools available. That made them think differently about how to collect and organize the information in the bug "database." He believed that existing systems spent too much time deciding how to structure data entry and presenting a detailed form for users to fill out. They then also lock down the display of the information. He decided to keep structured data entry to a minimum and rely on text entry. A lot happens with labels/tags/keywords, which are used, for instance, to assign priority. The new bug submission form consisted of a text area with a few questions already inside it.
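As a rough illustration of the design described above (not Google Code's actual implementation, which the talk summary does not detail), here is a minimal Python sketch of a tracker that stores each report as free text plus a set of labels and finds issues through a simple full-text index. The template questions, the label names, and the `IssueStore` class are all assumptions made up for the example.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical template: a text area with a few questions already inside it.
TEMPLATE = (
    "What steps reproduce the problem?\n\n"
    "What did you expect to happen?\n\n"
    "What happened instead?\n\n"
)


@dataclass
class Issue:
    issue_id: int
    text: str                                  # free-form report, template included
    labels: set = field(default_factory=set)   # e.g. {"priority-high"}


class IssueStore:
    """Toy store: a full-text inverted index plus label filtering."""

    def __init__(self):
        self.issues = {}
        self.index = defaultdict(set)   # word -> set of issue ids

    def add(self, text, labels=()):
        issue = Issue(len(self.issues) + 1, text, {label.lower() for label in labels})
        self.issues[issue.issue_id] = issue
        # Index every word of the report, including the template questions.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            self.index[word].add(issue.issue_id)
        return issue

    def search(self, query, label=None):
        """Return issues containing every query word, optionally restricted to one label."""
        words = re.findall(r"[a-z0-9]+", query.lower())
        if not words:
            return []
        ids = set.intersection(*(self.index.get(word, set()) for word in words))
        hits = [self.issues[i] for i in sorted(ids)]
        if label is not None:
            hits = [hit for hit in hits if label.lower() in hit.labels]
        return hits
```

Priority here is just a label such as "Priority-High" attached to the report, so adding a new way to classify issues means inventing a new label rather than changing a schema.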

Greg makes a terrific point that could be applied more broadly to business applications, and might even be a design approach for Web 2.0 applications. A whole lot of effort goes into defining and refining the database structure behind most business apps. What is carefully placed in one bucket (or category or grouping) is not found when you look in another bucket. What if powerful full-text search tools change all that? What if, instead of describing in detail the many specific fields of a record that might be important, and then having to train users on what they actually mean and when to use them, you sidestep those tedious tasks and encourage users to write text? And write freely. The more the better. My hunch is that unstructured data can be richer and easier to collect than highly structured data, and therefore more valuable.

It would be an interesting exercise to look at overdesigned business applications and consider how they might be designed to look less like a database and more like a conversation.


tags: web 2.0

Comments: 17

  Reilly Haes [09.21.06 04:29 PM]


There is a lot of merit to this approach.

The trick is to figure out a good general way of handling data that is required to be present or even required to be structured. For example, there is data you MUST capture when you take a customer's order to buy or sell stock. Simply dividing the world into user-structured and program-structured halves is going to create cognitive dissonance. And telling the user that you couldn't parse out some required value will create angry users.

  Tom Brown [09.21.06 06:30 PM]

To play devil's advocate, consider a main difference between eBay and Amazon. With eBay, you find items through search rather than through a product ID as on Amazon. Because of this, eBay cannot do product reviews like Amazon can.

  Dick Weisinger [09.21.06 07:54 PM]

Full-text has dramatically improved the search and management of unstructured data, and I think it is great. Full-text provides an excellent supplement to structured metadata and a good way to browse data. But for many applications, traditional structured metadata will be hard to totally replace with full-text alone.

I think I need to see the Google bug database app to appreciate it. To manage a project you want to be able to track things like who reported a bug, who fixed the bug, when it was reported, when it was fixed, when it was verified, the platform the bug relates to, the software version, etc. Full-text will have a hard time extracting all this data reliably from a single text entry field. It seems natural to store this information as structured data.

Even if the user sees some questions pre-populated in the text form to fill out, it would be difficult to get users to enter information in a way so that it will be consistently recognized. Data validation would be difficult too.
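To make the difficulty concrete, here is a small Python sketch of what pulling answers out of a pre-populated text form could look like. The prompt names ("Version", "Platform", "Date seen"), the required list, and the date format are all hypothetical; the sketch only recognizes lines that begin with one of the known prompts, which is exactly the kind of brittle convention users may not follow.

```python
import re
from datetime import datetime

# Hypothetical prompts pre-populated in the text area; the names are illustrative only.
KNOWN = ("version", "platform", "date seen")
REQUIRED = ("version", "platform")
LINE = re.compile(r"^\s*([A-Za-z ]+?)\s*:\s*(.*?)\s*$")


def parse_report(text):
    """Split a report into answers to known prompts and the remaining free prose."""
    fields, prose = {}, []
    for line in text.splitlines():
        match = LINE.match(line)
        key = match.group(1).lower() if match else None
        if key in KNOWN:
            if match.group(2):            # an unanswered prompt is simply dropped
                fields[key] = match.group(2)
        else:
            prose.append(line)            # anything else stays unstructured
    return fields, "\n".join(prose).strip()


def validate(fields):
    """Return a list of human-readable problems; an empty list means nothing was flagged."""
    problems = [f"missing answer for '{name}'" for name in REQUIRED if name not in fields]
    if "date seen" in fields:
        try:
            datetime.strptime(fields["date seen"], "%Y-%m-%d")
        except ValueError:
            problems.append("'date seen' should look like 2006-09-21")
    return problems
```

A report that leaves "Platform:" blank or writes the date as 21/09/2006 comes back with a list of problems, and the application then has to decide whether to reject the report or accept it with gaps.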

  Andy Todd [09.21.06 10:02 PM]

And don't forget most people would rather chew their own foot off than write, especially in a 'business application'.

The quality of written communication in business has never been that high; one of the advantages of structured forms of data entry and management is that they don't force people to think. Remember, the majority of people in office jobs are just there to pick up a pay check. It's a completely different audience from people who will use the bug tracker on Google Code.

  Al [09.22.06 04:34 AM]

Hi Dale

"It would be an interesting exercise to look at overdesigned business applications and consider how they might be designed to look less like a database and more like a conversation."

Wow, that is almost exactly what we are working on for a business application right now. It is not as general as you state but more specifically aimed at certain key operations that we believe lend themselves to such representation.

I can certainly talk about it offline if you are interested, let me know.

regards
Al

  Danny [09.22.06 07:38 AM]

Hmm, having "great full-text search tools available" certainly changes things, but doesn't necessarily mean the solution is better in the general case. For example, the use of a flexible data model like RDF avoids the problem of being bound to a predetermined database structure.

But in terms of knowledge acquisition, a free-text entry box can be a lot easier than having to fit within the confines of a form.

When it comes to finding things, there's a danger of throwing the baby out with the bathwater. Having explicit relationships in the data offers a lot that text search can't.

I reckon it's possible to get the best of both worlds; I saw a glimpse not long ago. I was experimenting with Longwell and made my blog data available to the install. For the fun of it I added a few arbitrary RDF data files. One of these was on famous people. Longwell offers a text search (covers all literal fields) and one search I tried was "Beethoven". This produced two results: one from the famous people data giving some biography of the man, another a blog post I'd forgotten about linking to some audio files.

Although the two sets of data had little in common, they could be integrated using RDF, in a fairly formal logical structure. But there was some (unexpected) crossover, which was discovered using simple text indexing techniques. A powerful combination.

  anjan bacchu [09.22.06 10:55 AM]

Hi Dale,

I have been thinking along similar lines for a new version of our Business Software.

Good full-text search is important to improve the user experience for a bug/issue tracker, BUT... it will be interesting to make it work in business applications at the typical office.

I know of a banking software company that had to launch a new product code-named the "McTeller" since most of its customers were competing with McDonald's for the teller positions. They wanted to make the teller application as dumb as the McDonald's cashier screens (have you seen those?).

Since business applications cater to people who like dumb interfaces and don't want to enter a lot of text, it will be interesting to see how little text needs to be entered in such an app. Also, as far as the developer is concerned, how about validation? For example, date validation is going to be interesting and challenging.

BR,

~A

  Joe Hunkins [09.22.06 07:54 PM]

Brilliant as usual Tim. Databases really *should* not be considered an end product in most cases - rather a means to get to a rich conversational and analytical immersion in a topic where associations are made in both structured/linear ways with the data and unstructured ways with the greater number of associations one can make with free flowing information.

  Bill Christian [09.22.06 09:16 PM]

I am failing to make the connection between the increased plausibility of unstructured text searching and Web 2.0 (buzz). The evolution of full-text search is not much different from what was experienced in the migration from fixed-format flat files to relational databases. At first, RDBMSs were too slow and their design advantages were little understood. Now, as you've pointed out, it's the default design choice. What is so special about Web 2.0 applications to warrant an alternative concept?

Architects and designers should focus on finding the appropriate tool for the business need and the culture. I, for one, would not trust my QC team to freely describe the problem. However, this may be appropriate for the team at Google. Either way I hope the decision was not based on the availability of an alternative. Unstructured text searching is but one up and coming tool.

  Richard Dyce [09.23.06 02:44 AM]

I've been using a rather simplistic (MAMP) approach for some time that I think manages to bridge the gap a little: using templates to build themed nodes which have small header details - id, mod time, type, and a serialized text field containing an array structure given by the template and filled in by the end user. Depending how you look at it, there's only one type of node (database view) or as many types as there are templates (end user view). Thanks to the way PHP serializes arrays, all node types can be searched with a full text search, or you can narrow it down by node type. There are inefficiencies, but I'm sure it could be easily extended to bug tracking. ;-)
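A rough Python sketch of the node idea described above, with the caveat that it is a reconstruction rather than the commenter's code: JSON stands in for PHP's array serialization, the nodes live in memory rather than in MySQL, and the template names and fields are invented for the example.

```python
import json
import time
from dataclasses import dataclass

# Hypothetical templates: each node type is just a list of field names.
TEMPLATES = {
    "bug": ["summary", "steps"],
    "contact": ["name", "email"],
}


@dataclass
class Node:
    node_id: int      # small header: id, type, modification time
    node_type: str
    modified: float
    body: str         # serialized field values; JSON stands in for PHP's serialize()


class NodeStore:
    def __init__(self):
        self.nodes = []

    def add(self, node_type, **values):
        # Keep only the fields the template names, filling gaps with empty strings.
        data = {name: values.get(name, "") for name in TEMPLATES[node_type]}
        body = json.dumps(data, ensure_ascii=False)
        node = Node(len(self.nodes) + 1, node_type, time.time(), body)
        self.nodes.append(node)
        return node

    def search(self, text, node_type=None):
        """Case-insensitive substring match on the serialized body, optionally by type."""
        needle = text.lower()
        return [
            node for node in self.nodes
            if needle in node.body.lower()
            and (node_type is None or node.node_type == node_type)
        ]
```

Because the search runs over the serialized text, it also matches field names such as "summary", which is one of the inefficiencies a scheme like this accepts in exchange for having a single kind of record to store.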

  Tim O'Reilly [09.23.06 09:56 AM]

Joe -- just a reminder that there are multiple radar bloggers, all speaking for O'Reilly, but not me (Tim O'Reilly.) This particular post was written by Dale Dougherty.

I do however subscribe to Dale's observation. But like a lot of commenters, I don't know that full text is a substitute for structured approaches. But just as folksonomy has emerged as an alternative to taxonomy, I think increased computing power is making some previously unsuccessful approaches work better.

Just as a for instance, how many of you still put your email in folders, vs. just saving by month or year, and using search to find what you want? How many of you find it easier to tag a bookmark and put it on del.icio.us than to save it in a folder?

What I find particularly interesting is the combination of computing power and harvesting human input. Google's search isn't just text search, it's text search informed by implicit metadata from human activities (i.e. linking). I'd love to see other forms of free form text informed in the same way. For example, why doesn't my address book remember every email address or phone call, but remember those that I reply to right away better than those that I delete right away? I think that there's a whole new dimension to search that becomes possible when we have enough power and storage to remember and analyze human activity in the database.

  Alex Marxer [09.23.06 11:32 AM]

Dale,
Your post reflects a key insight we recently gained for successfully implementing solutions supporting highly collaborative processes - particularly in the area of CRM.

The traditional approach - as in your example of the QA application - was (and in many cases still IS) to use a large number of structured data elements on forms - all based on very thorough "process analysis" to arrive at a data model.
The fundamental problem with this approach is that it will inevitably lead to an overly rigid solution conflicting with the actual fluidity of human interaction. The inevitable result: many of the fields don't get filled out, roll out is facing massive adoption challenges or - on the other side of the spectrum - overly simplistic forms that do not capture the richness of our human insight. In almost all cases these deficiencies lead to the use of email as a surrogate mechanism to provide the needed flexibility. For all its benefits however, email is a horrible process management tool re-introducing much of the opacity into the processes the structured database solution was to make transparent in the first place.

So in our experience it comes down to the right mixture of structured vs. unstructured components. In our case, we created a minimum amount of data elements mostly tracking the key process metrics for performance reporting. The core structure of the processes was left in simple text fields that were then populated with "Standard Text" that can be easily changed by the end users.

The results were quite astounding: we cut down internal email traffic by a good 30%, adoption was a breeze, and due to the flexibility of the solution, the line staff and supervisors took ownership of the tool and started moving all of their key processes into the tool themselves (instead of it "coming down" as a directive from management).

Now, we achieved the above without tagging, full text search and any other so called Web 2.0 tools - just simply breaking with the tradition of overly structured process engineering and control yielded success far beyond our initial expectations.
That is in essence also the insight from the Google bug tracking solution in my view.
I could not agree more with Tim's comment post: Google (+Google Desktop) and tagging practically made the hierarchical approach to unstructured information obsolete. Web 2.0 technologies have the same potential for "structured information" up to now mapped into rigid database applications.

Ultimately, Web 2.0 toolsets now allow us to support our biological, complex reality much more closely than the mechanistic, overengineered approaches of software of the past. That is a fundamental shift with incredible potential as we are able to set free our creative potential...

  Mike Mudd [09.24.06 08:16 PM]

Full-text search is a rather blunt tool. What matters most for the kind of corporate info. systems I deal with are relationships (structure) - the linking of name and address for a trivial example. Unwittingly undo those kinds of critical relations and text search can become dangerously meaningful. Buying patterns without demographic structure spring easily to mind.

  Jason Kester [09.29.06 09:24 AM]

"To manage a project you want to be able track things like who reported a bug, who fixed the bug, when it was reported, when it was fixed, when it was verified, the platform the bug relates to, the software version, etc. Full-Text will have a hard time extracting all this data reliably from a single text entry field."


I think the article glossed over an important fact, that Google's bug database probably does track things like Username and dates in traditional database-style fields. It's only the bug description, steps to reproduce, etc. that can be entered into a single field.


Having worked with plenty of bug databases in the real world, that seems like a great idea. As much as QA loves to see 20 detailed fields for each bug report, developers hate filling them in. The result is that dev is less likely to bother reporting small things if it means spending 5 minutes twiddling with the bug tracker.


  Doug Pederson [10.22.06 06:07 AM]

My full-text search moves along at 20,000,000 cps.

Most any data can be exported to txt files, making it easier to search.

I merge the text files into one large file. The merge option puts a text line with the name of the file being merged before the data.

When searching, the file that the match was found in is displayed in the border. No directory lookups, just one huge file.

To make full text useful you need notepad capabilities that are fast and append to the end of your notes.txt file (with an optional date/time stamp).

Another option quickly displays the last page (most recent notes) by reading to the end of the file, counting the lines, then reading to that number less 32 lines and displaying from there.

Search is about finding your results in full context, not the gibberish most indexing search engines deliver.
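The workflow described above can be sketched in a few lines of Python. This is an approximation under stated assumptions, not the commenter's tool: the marker line format (`=== FILE: `), the function names, and the timestamp layout are invented for the example, and no claim is made about matching the quoted search speed.

```python
from datetime import datetime
from pathlib import Path

MARKER = "=== FILE: "   # hypothetical marker format; the comment does not specify one


def merge(paths, merged_path):
    """Concatenate text files, writing a marker line with each file name before its data."""
    with open(merged_path, "w", encoding="utf-8") as out:
        for path in paths:
            out.write(f"{MARKER}{Path(path).name}\n")
            text = Path(path).read_text(encoding="utf-8")
            out.write(text if text.endswith("\n") else text + "\n")


def search(merged_path, needle):
    """Yield (source file, line) for each matching line, tracking the latest marker seen."""
    current = None
    with open(merged_path, encoding="utf-8") as merged:
        for line in merged:
            if line.startswith(MARKER):
                current = line[len(MARKER):].rstrip("\n")
            elif needle.lower() in line.lower():
                yield current, line.rstrip("\n")


def append_note(notes_path, text, stamp=True):
    """Append a note to the end of the notes file, optionally prefixed with a timestamp."""
    prefix = datetime.now().strftime("[%Y-%m-%d %H:%M] ") if stamp else ""
    with open(notes_path, "a", encoding="utf-8") as notes:
        notes.write(f"{prefix}{text}\n")


def last_page(path, lines=32):
    """Return the final `lines` lines: count the lines, then read again from count - lines."""
    with open(path, encoding="utf-8") as handle:
        total = sum(1 for _ in handle)
    start = max(total - lines, 0)
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for index, line in enumerate(handle) if index >= start]
```

Since the search is a single sequential pass, the source file of a match is simply the most recent marker line seen, which is why no directory lookup is needed.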

  Bob K [10.26.06 08:47 PM]

Hi Dale,

I had to scan through this thread to see if there was anything really new. There are some interesting ideas, but I think the Google guys are the typical "when all you've got is a hammer, everything looks like a nail" breed.

You've said in the past that "software represents a lot of decisions that have already been made." I think similarly, structured, field-based data represents a lot of business processes that have already been decided. Context is there, and imposed through field values, because that is what the organization has agreed are the rules. The trick is to effectively obtain the structured data with the unstructured in an unencumbered way.

I still struggle with this trade-off after twenty-plus years in the industry. At Media Net Link, we have a mantra, "you can't change behavior": designing to the needs of the user population, in a way that enables them, is all that matters. Every instance of a problem is different, but many common design patterns in business exist.

[[off soap box]]

(Aside)

At the government entity where I am presently consulting, we are still designing a content management system intended to enable a scientific community to share very detailed project information.

The goal was to have it very field specific. I argued in the beginning of the design process that you can't change behavior. Researchers write reports. They don't like filling in database fields. It wasn't until serious user testing at the end of the development process that they truly realized what I was saying. Now we've moved to creating dynamic MS Word report templates and then parsing them when they are uploaded.

---bottom line---

Effective UI / DB design is intermingled, and the end result will more often than not be a balance between free form and field driven input.

...

Hmm, let's see, to enter this post, I need to add some text, fill in a name, a URL, an e-mail address, and click "remember me". (That's a pretty good balance :))

  Chris Lu [03.28.07 05:21 PM]

Please take a look at this

http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes

You can create a full-text database search service that returns results as HTML/XML/JSON. It uses Lucene directly in Java, but can easily be used with Ruby, PHP, or any existing database web applications.

You can easily index, re-index, incremental-index. It's also highly scalable and easily customizable.

The best thing is, it's super easy. You can create a production-level search in 3 minutes.

To create a full-text search for any bug database, you can select all the fields you need, and the bug priorities, categories, etc. Then easily navigate through all the bugs, narrow down by priority, category, owner, etc.
