Radar Roundup: Data Mining and Visualization

  • Visual Complexity: a proof sheet and index for visualization projects on the web. Cf. InfoVis.
  • TextRank Paper (PDF): “In this paper, we introduce TextRank–a graph-based ranking model for text processing, and show how this model can be successfully used in natural language applications.” Some detective work suggests it might be (at least part of) the algorithm Google uses for book search results, raising the ghastly idea of print books being SEOed.
  • SNA, a social network tools package for the R project. “A range of tools for social network analysis, including node and graph-level indices, structural distance and covariance methods, structural equivalence detection, p* modeling, random graph generation, and 2D/3D network visualization.”
  • Catching a Poker Cheat with Data Mining (Simple Complexity): how did the stats-savvy top players on a gambling website figure out that they were losing to cheats rather than just better players? They gathered and graphed the stats, and the bad guy stood out like the proverbial sore thumb. That’s a link to a summary of the story; the full details are also online.
  • The Meaning of Confidence (hunch.net): the word “confidence” is used in several ways in machine learning papers. This short post teases out the different senses.
  • Data Park Search Engine: full text indexer for web sites. Love the feature list! Multiple languages, IDNA, accent-insensitive search, and more. Disclaimer: only just found it, haven’t used it. It might delete all your cat captions and send your money to the moon for all I know.
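
For the curious, here is roughly what “graph-based ranking model for text” means in the TextRank item above. This is a minimal sketch of the keyword-ranking idea only, not the paper's full method and certainly not whatever Google runs: it assumes the text is already tokenized, links words that appear near each other, and runs an unweighted PageRank-style iteration over that graph. The function name, the `window` size, and the iteration count are my own illustrative choices; the real paper also filters by part of speech, which is skipped here.

```python
from collections import defaultdict


def textrank_keywords(words, window=2, damping=0.85, iterations=50):
    """Rank words with a PageRank-style score over a co-occurrence graph."""
    # Link every pair of distinct words that appear within `window` positions
    # of each other (undirected, unweighted edges).
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1 : i + window + 1]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Power iteration: each word passes its score, split evenly, to its neighbors.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }

    # Highest-scoring words first.
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    text = "graph ranking model for text processing uses a graph of text units"
    print(textrank_keywords(text.split())[:3])
```

Words that co-occur with many other well-connected words float to the top, which is the same intuition as pages earning rank from the pages that link to them.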