ENTRIES TAGGED "nltk"

Four short links: 22 December 2011

Fuzzy Text, Big Data Crime, Map Visualization, and Attacking Server-Side JavaScript

  1. Fuzzy String Matching in Python (Streamhacker) — useful if you’re to have a hope against the swelling dark forces powered by illiteracy and touchscreen keyboards.
  2. The Business of Illegal Data (Strata Conference) — fascinating presentation on criminal use of big data. “The more data you produce, the happier criminals are to receive and use it. Big data is big business for organized crime, which represents 15% of GDP.”
  3. Isarithmic Maps — an alternative to choropleths for geodata visualization.
  4. Server-Side JavaScript Injection (PDF) — a Black Hat talk about exploiting backend vulnerabilities with techniques learned from attacking JavaScript frontends. “Both this paper and the accompanying talk will discuss security vulnerabilities that can arise when software developers create applications or modules for use with JavaScript-based server applications such as NoSQL database engines or Node.js web servers. In the worst-case scenario, an attacker can exploit these vulnerabilities to upload and execute arbitrary binary files on the server machine, effectively granting him full control over the server.”
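
To make the fuzzy string matching in item 1 concrete, here is a minimal sketch using only Python's standard-library difflib module. It is not taken from the linked Streamhacker post, which may use a different technique; the helper names similarity and best_match, the sample city list, and the 0.6 cutoff are all hypothetical choices for illustration.

```python
from difflib import SequenceMatcher, get_close_matches


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1], ignoring case (hypothetical helper)."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def best_match(query: str, candidates: list[str], cutoff: float = 0.6):
    """Return the candidate closest to query, or None if nothing reaches cutoff.

    The cutoff of 0.6 is an assumed default, not a recommended value.
    """
    # Map case-folded forms back to the original spellings.
    by_folded = {c.casefold(): c for c in candidates}
    hits = get_close_matches(query.casefold(), list(by_folded), n=1, cutoff=cutoff)
    return by_folded[hits[0]] if hits else None


if __name__ == "__main__":
    cities = ["Wellington", "Auckland", "Christchurch"]  # sample data
    print(best_match("welingtn", cities))   # a touchscreen-style typo
    print(round(similarity("Auckland", "aukland"), 2))
```

SequenceMatcher scores strings by their matching blocks, so it tolerates dropped or swapped letters reasonably well; for large vocabularies a dedicated edit-distance or phonetic approach may be a better fit.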
Four short links: 17 August 2010

Stemming Demo, Mapping Service, Value of Data, and The Magic of the Valley

  1. Demo of Stemming Algorithms — type in text and see what it looks like when stemmed with different algorithms provided by NLTK. (via zelandiya on Twitter)
  2. Crowdmap — hosted Ushahidi. (via dvansickle on Twitter)
  3. Opinions vs Data — talks about the usability of a new Gmail UI element, but notable for this quote from Jakob Nielsen: “In my two examples, the probability of making the right design decision was vastly improved when given the tiniest amount of empirical data.” (via mcannonbrookes on Twitter)
  4. The Next Silicon Valley — long and detailed list of the many forces contributing to Silicon Valley’s success as a tech hub, arguing that the valley’s position is path-dependent and cannot simply be grown ab initio in some aspiring nation’s co-prosperity zone of policy whim. (via imran and timoreilly on Twitter)
Four short links: 17 June 2010

Statistical Jeopardy Wins, Mobile Taxonomy, Geodata Mystery, and Machine Learning Blog

  1. What is IBM’s Watson? (NY Times) — IBM joining the big data machine learning race, and hatching a Blue Gene system that can answer Jeopardy questions. Does good, not great, and is getting better.
  2. Google Lays Out its Mobile Strategy (InformationWeek) — notable to me for this: Rechis said that Google breaks down mobile users into three behavior groups: A. “Repetitive now” B. “Bored now” C. “Urgent now”. A useful way to look at it. (via Tim)
  3. BP GIS and the Mysteriously Vanishing Letter — intrigue in the geodata world. This post makes it sound as though cleanup data is going into a box behind BP’s firewall, and the folks who said “um, the government should be the depot, because it needs to know it has a guaranteed-untampered and guaranteed-able-to-access copy of this data” were fired. For more info, including on the data that is available, see the geowanking thread.
  4. Streamhacker — a blog talking about text mining and other good things, with nltk code you can run. (via heraldxchaos on Delicious)