Thu, Jan 18, 2007

Tim O'Reilly

Tag Cloud of "What is Web 2.0?"

Andrew Odewahn, the director of the O'Reilly Network, wrote:

Tim Allwine and I have been experimenting with various services to make tag clouds of arbitrary data (i.e., services to do stemming, remove stop words, calculate stats, and render a styled cloud using an AJAX widget John Allwine put together). Just for fun, I pointed these tools at Tim's "What is Web 2.0" article and got the attached file back. I thought it was pretty amazing how quickly the broad themes popped out.

[Image: tag cloud of the "What is Web 2.0" article (what_is_web20.jpg)]

Also interesting to me is how tag clouds are morphing into a general analytical metaphor. The term seems to have become shorthand for "a visual display that conveys the broad themes that emerge from textual analysis." For example, here's something Richard MacManus pointed to on Read/WriteWeb: Bill Gates' CES speech as a tag cloud. Here's another example by Chirag Mehta, a visualization of the State of the Union address. It's even animated, so you can see themes rise and fade over time."
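The pipeline Andrew describes (stemming, stop-word removal, counting, and styled rendering) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the O'Reilly tools or John Allwine's AJAX widget: the stop-word list is a tiny hypothetical sample, `naive_stem` is a crude plural-stripper standing in for a real stemmer such as Porter's, and the 10px to 36px font range is an arbitrary choice.

```python
import re
from collections import Counter
from html import escape

# Hypothetical, deliberately tiny stop-word list; real tools use hundreds of words.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "this", "to", "we", "with",
}


def naive_stem(word: str) -> str:
    """Crude stand-in for a real stemmer: strip a plural 's' (but not 'ss')."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def term_counts(text: str) -> Counter:
    """Tokenize, drop stop words, stem, and count the remaining terms."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(naive_stem(w) for w in words if w not in STOP_WORDS)


def font_size(count: int, lo: int, hi: int, min_px: int = 10, max_px: int = 36) -> int:
    """Scale a count linearly from [lo, hi] onto [min_px, max_px]."""
    if hi == lo:
        return (min_px + max_px) // 2
    return round(min_px + (count - lo) * (max_px - min_px) / (hi - lo))


def render_cloud(counts: Counter, top_n: int = 50) -> str:
    """Render the top_n terms as alphabetically ordered, size-styled HTML spans."""
    top = counts.most_common(top_n)
    if not top:
        return ""
    lo = min(c for _, c in top)
    hi = max(c for _, c in top)
    return " ".join(
        f'<span style="font-size:{font_size(c, lo, hi)}px">{escape(term)}</span>'
        for term, c in sorted(top)
    )
```

Calling `render_cloud(term_counts(text))` on any text returns a string of `<span>` elements in alphabetical order (the usual tag-cloud convention), with the most frequent terms in the largest type.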

Now, on the one hand, this is very cool, and the tag cloud for the article is a true reflection of many of the concepts in the article. For example, the size of the word "data" emphasizes a point that is central to my thinking on Web 2.0, but one that I still don't think many people in the software industry have entirely grasped. But the tag cloud also shows the limits of tag clouds. The single most important phrase that describes Web 2.0 is "collective intelligence", and you'll never see that in the cloud. Maybe a version that looks for phrases and not just words would help. In addition to collective intelligence, words and phrases that I think of as central to Web 2.0 include "data", "users", "platform", "network effects", and "internet."
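A version that looks for phrases rather than single words can be approximated by counting adjacent word pairs (bigrams). The sketch below is one simple way to do it, not what Andrew's tools do: it splits on sentence punctuation so pairs don't straddle sentences, and it discards any pair containing a stop word, so "collective intelligence" survives while "of the" does not. The stop-word list and the `min_count` threshold are illustrative assumptions.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"a", "an", "and", "are", "is", "it", "of", "on", "the", "to", "with"}


def phrase_counts(text: str, min_count: int = 2) -> Counter:
    """Count two-word phrases, keeping those seen at least min_count times."""
    counts = Counter()
    # Split into sentences first so a phrase never spans a sentence boundary.
    for sentence in re.split(r"[.!?;:]+", text.lower()):
        words = re.findall(r"[a-z0-9']+", sentence)
        for first, second in zip(words, words[1:]):
            # Skip pairs like "of the" that contain a stop word.
            if first not in STOP_WORDS and second not in STOP_WORDS:
                counts[f"{first} {second}"] += 1
    return Counter({p: c for p, c in counts.items() if c >= min_count})
```

Longer phrases follow the same pattern with a wider sliding window, at the cost of sparser counts.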

All that being said, I'm a big fan of tag clouds. We used them a bunch to analyze the topics, companies and people at the last FOO Camp, and they were the most useful of the visualizations we did. They helped us see where we were under- and over-represented in terms of companies and particular technologies we wanted to explore. Ryan Grimm and Andy Bruno also built an awesome tag cloud/search engine of the content of all O'Reilly books. So they have many uses beyond just showing what we normally think of as tags. Thanks, Andrew!



Comments: 11

  michael holloway [01.18.07 06:10 AM]

What about a cloud built from the comment sections of all the Web 2.0 articles? What themes come forward? Are they similar? Or, in other words, are we getting you?

  Mark Woodman [01.18.07 07:36 AM]

Tim,

I've been noodling around with the idea of a "tag constellation" that associates tags (and their content) based on overlapping use among bloggers. I'd be curious to hear your thoughts on it. I have a working example here that generates a daily tag constellation based on CNET's 100 Top Blogs:

http://labs.techbrew.net/jetstream

Regards,

Mark Woodman

  Walter [01.18.07 07:44 AM]

You should definitely take a look at Moritz Stefaner's relational tagclouds:
http://well-formed-data.net/archives/31/tag-clouds

I work on some text-mining demos too:
http://www.metaportaldermedienpolemik.net/blog/Wahlprogramme+im+Vergleich
There, I analyzed how words correlate with each other in the election platforms of political parties here in Austria.

Regards,

Walter

  steve [01.19.07 02:06 AM]

I think you have a conflict of interest in getting your message across. On one hand, you want to convey the concepts of infoware and help get across a broader direction.

On the other hand, you have a commercial interest in the term "Web 2.0", and your string of posts makes a constant effort to associate your broader concepts with the phrase.

The problem is that "Web 2.0" is meaningless, has a built-in time limit due to the "2.0" number, and the web doesn't have anything to do with the concepts you're attempting to communicate.

So I don't think your post-facto legitimisation of the term is really making any headway. It's just more of the same, more of the "look at that, it's what I mean by this"...

Unfortunately, I think this year is when Web 2.0 will die... as people have an expectation that the number will increment every year.

2007 is the year of Web 3.0 or a different term, and you now have the problem of schlepping the baggage from one term to another. I hope you can make the jump, as the ideas are worth preserving instead of attaching to a sinking ship.

  Ted Shelton [01.19.07 07:48 AM]

The idea of "tag" clouds is interesting, but the flaw shows up in great big type in this very cloud: "DATA". Sure, data is important to Web 2.0, but Web 2.0 isn't very important to data. It would be like searching for information about the White House using the word "home."


More interesting to us (we've been working on it for two years) are phrase clouds, in which phrases are extracted from the articles themselves. In our patent-pending process, we analyze these phrases against a historical analysis of relevant material, selecting and promoting phrases that are representative of emergent news trends within the topic.
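Ted's process is patent-pending and isn't described here, so the following is not his method. It is a generic, commonly used way to surface emerging phrases: compare each phrase's share of the current text with its share of a historical corpus, using add-one smoothing so phrases never seen before don't cause a division by zero. The function name, the `min_count` threshold, and the smoothing choice are all assumptions for illustration.

```python
from collections import Counter


def emerging_phrases(current: Counter, history: Counter,
                     top_n: int = 5, min_count: int = 2) -> list:
    """Rank phrases by how much more frequent they are now than historically."""
    cur_total = sum(current.values()) or 1
    hist_total = sum(history.values())
    vocab = len(set(current) | set(history))
    scores = {}
    for phrase, count in current.items():
        if count < min_count:
            continue  # ignore one-off mentions
        cur_rate = count / cur_total
        # Add-one (Laplace) smoothing keeps unseen phrases from dividing by zero.
        hist_rate = (history[phrase] + 1) / (hist_total + vocab)
        scores[phrase] = cur_rate / hist_rate
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Phrases that are common both now and historically score near 1 and sink, while newly popular ones rise to the top.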

Compare these two phrase clouds for example:


In the first case it's "Joost" and "wikiseek", because these are interesting Web 2.0 phrases RIGHT NOW.


In the case of Headline news, it is the death of Art Buchwald and the declining rate of cancer deaths (must be a slow news day...)


Another way we use this raw data is to analyze the phrase cloud to find clusters of phrases that all point to a specific topic. These topic maps are even more useful. Explore how we use topics and the associated stories on our "Top 5 Stories" website:


http://www.top5stories.com

  Tim O'Reilly [01.19.07 08:17 AM]

Steve --

If I had my druthers, Web 2.0 wouldn't be the term that stuck, but it did. And so I continue to explain what I think it means. If another term catches on, I'll be happy to switch.

Language lawyers always say what words *should* be used, but language itself doesn't listen. "The hot blood leaps over the cold decree," and people use whatever words make sense to them.

Web 2.0, for all its flaws, has indeed caught on, and for now, we just have to live with it.

As for conflict of interest, I wonder what your interest in repeated debunking of the term might be :-)

  Tim O'Reilly [01.19.07 08:18 AM]

Mark, Walter, Ted --

Thanks for all the awesome tag cloud links!

  steve [01.20.07 09:33 AM]

Hi Tim... no, that's about it from me. It's been good (and surprising) that you had an open-door policy, so thanks for listening.

I'm pleased that you're happy to switch to new terms as they come along, so the meaning behind the phrase can survive and continue providing direction.

Regardless of any criticism of terms, the message that you are conveying is important and please don't stop. :)

  Kitten Lulu [01.21.07 08:49 AM]

Tag clouds are a nice visualization tool, but I think one big problem when applying them to bodies of text is synonyms and the other ways of conveying the same meaning.

Have you thought about using WordNet lookups (or other similar databases) instead of pure lexical stemming? Instead of making a cloud of tags, you'd be making a semantic cloud.

We'll also need better visualizations if we use semantic relationships among words. We need some sort of hierarchical+size visualization, similar to treemaps but which can be easily rendered in XHTML+AJAX.
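A semantic cloud can be sketched as grouping words by concept before sizing them; the nested result (concept, total, member counts) is also the shape a hierarchical, treemap-style view would need. In this sketch the `CONCEPTS` table is a small, hypothetical hand-built synonym map standing in for a WordNet lookup; a real system would query a lexical database and would have to choose among a word's several senses.

```python
from collections import Counter, defaultdict

# Hypothetical synonym table standing in for a WordNet-style lookup.
CONCEPTS = {
    "users": "people", "user": "people", "people": "people", "customers": "people",
    "data": "data", "information": "data", "database": "data",
}


def semantic_cloud(words, concepts=CONCEPTS):
    """Group words under concepts; return (concept, total, members), largest first."""
    groups = defaultdict(Counter)
    for word in words:
        # Words missing from the table form their own single-word concept.
        groups[concepts.get(word, word)][word] += 1
    return sorted(
        ((concept, sum(members.values()), dict(members))
         for concept, members in groups.items()),
        key=lambda item: (-item[1], item[0]),
    )
```

The concept total would set the size of the outer box and the member counts the inner boxes, which is the hierarchical+size layout the comment asks for.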

  Michael Fassnacht [01.21.07 07:31 PM]

I think the concept of tagclouds as shown in your Bill Gates' speech example is helpful to better understand any kind of text. But I think there are several limitations in your example that should and can be addressed:
1. The alphabetical order of words is not really useful. The words should be arranged according to how close together they appear in the text, so one can see connections between words and concepts.
2. In publicly available texts, one should include the dimension of "impressions" per word in the tag cloud. That way one can see which words have been seen by more eyes/individuals. I have used this kind of analysis for brand communications as well as for blog entries across a certain category (overlaying how often certain words have been used across a multitude of blogs with how popular those blogs are). Visually, one can simply show the more "viewed" words in a darker color than less "viewed" words.
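Both of these suggestions are straightforward to compute. The sketch below, under illustrative assumptions, counts how often two words share a sentence, uses those counts to chain words so associated terms sit next to each other instead of in alphabetical order, and weights each word by the readership of the posts it appears in. The greedy chaining and the per-post readership figures are hypothetical choices, not a method from the comment or from SPSS or SAS.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(sentences):
    """Count how often each unordered pair of words appears in the same sentence."""
    pairs = Counter()
    for words in sentences:
        for a, b in combinations(sorted(set(words)), 2):
            pairs[(a, b)] += 1
    return pairs


def order_by_association(words, pairs):
    """Greedy chain: start with the first word, then keep appending the unplaced
    word that co-occurs most often with the word placed last."""
    remaining = list(words)
    if not remaining:
        return []
    order = [remaining.pop(0)]
    while remaining:
        last = order[-1]
        nxt = max(remaining, key=lambda w: pairs[tuple(sorted((last, w)))])
        remaining.remove(nxt)
        order.append(nxt)
    return order


def impression_weights(posts):
    """posts: iterable of (words, readers). Credit each word once per post
    with that post's readership, approximating how many eyes saw it."""
    weights = Counter()
    for words, readers in posts:
        for word in set(words):
            weights[word] += readers
    return weights
```

The impression weights could drive color (darker for more-viewed words) while raw frequency still drives size, as the comment proposes.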

Interestingly enough, big analytical software players (e.g., SPSS, SAS) have done quite a bit of innovative work on these more advanced text-mining techniques.

Regards,

Michael Fassnacht

  Moritz Stefaner [01.22.07 02:00 PM]

Hi all,

thanks Walter for posting my experiment; this is a great discussion. I just wanted to let you all know that I posted an update along with some explanations at: http://well-formed-data.net/archives/42/tag-maps-update

Regards,
mo
