by Nat Torkington | @gnat | Comments: 4 | 2 March 2008
- Visual Complexity: a proof sheet and index of visualization projects on the web. Cf. InfoVis.
- TextRank Paper (PDF): "In this paper, we introduce TextRank--a graph-based ranking model for text processing, and show how this model can be successfully used in natural language applications." Some detective work suggests it might be (at least part of) the algorithm Google uses for book search results, raising the ghastly idea of print books being SEOed.
- SNA, a social network tools package for the R project. "A range of tools for social network analysis, including node and graph-level indices, structural distance and covariance methods, structural equivalence detection, p* modeling, random graph generation, and 2D/3D network visualization."
- Catching a Poker Cheat with Data Mining (Simple Complexity): how did the stats-savvy top players on a gambling website figure out that they were losing to cheats rather than just better players? They gathered and graphed the stats, and the bad guy stood out like the proverbial sore thumb. That's a link to a summary of the story; the full details are also online.
- The Meaning of Confidence (hunch.net): the word "confidence" is used several ways in machine learning papers. This short post teases out the different senses.
- Data Park Search Engine: full text indexer for web sites. Love the feature list! Multiple languages, IDNA, accent-insensitive search, and more. Disclaimer: only just found it, haven't used it. It might delete all your cat captions and send your money to the moon for all I know.
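
For the curious, here is a rough idea of what "graph-based ranking for text" means in the TextRank item above. This is a minimal sketch, not the paper's code and certainly not whatever Google runs: sentences are nodes, edges are weighted by word overlap, and scores come from a PageRank-style iteration. The function names, the tokenizer, the damping value of 0.85 and the fixed iteration count are all my assumptions for illustration.

```python
import math
import re


def similarity(a, b):
    """Word overlap between two token lists, normalized by log lengths."""
    if len(a) < 2 or len(b) < 2:
        return 0.0  # log(1) == 0 would make the denominator degenerate
    return len(set(a) & set(b)) / (math.log(len(a)) + math.log(len(b)))


def textrank(sentences, damping=0.85, iterations=50):
    """Return one score per sentence (hypothetical helper, not a library API)."""
    # Crude tokenizer: lowercase runs of word characters (an assumption).
    tokens = [re.findall(r"\w+", s.lower()) for s in sentences]
    n = len(tokens)
    # Symmetric weight matrix: edge weight is the similarity of two sentences.
    w = [[similarity(tokens[i], tokens[j]) if i != j else 0.0
          for j in range(n)]
         for i in range(n)]
    out_sum = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(iterations):
        # Each sentence collects score from its neighbours, in proportion
        # to the share of the neighbour's total edge weight that points at it.
        scores = [
            (1 - damping) + damping * sum(
                w[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores
```

Call `textrank(list_of_sentences)` and sort by score to pick summary sentences. A sentence that shares no words with any other ends up with the floor score of `1 - damping`.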
Comments: 4
Noah Iliinsky [ 2 March 2008 11:54 AM]
People interested in complex visualizations may enjoy reading my master's thesis:
Generation of Complex Diagrams: How to Make Lasagna Instead of Spaghetti.
My focus in that paper is on visualization of qualitative information, but the methods apply to quantitative data as well. I'm always interested in feedback, questions, and interesting data sets to play with.
Cheers, Noah
Noah Iliinsky [ 2 March 2008 12:00 PM]
The thesis mentioned above can be found at http://ComplexDiagrams.com
Thanks, Noah
Maxime [ 2 March 2008 12:42 PM]
That's nice. DataparkSearch uses technology similar to TextRank for automatic summarization of indexed documents; see http://www.dataparksearch.org/dpsearch-rel.en.html#sea
Tom Carden [ 4 March 2008 01:41 AM]
Andrew Vande Moere's http://infosthetics.com is essential in this area too.
I also enjoy Matthew Hurst's "Data Mining" blog: http://datamining.typepad.com/