ENTRIES TAGGED "nltk"
- Fuzzy String Matching in Python (Streamhacker) — useful if you’re to have a hope against the swelling dark forces powered by illiteracy and touchscreen keyboards.
- The Business of Illegal Data (Strata Conference) — fascinating presentation on criminal use of big data. “The more data you produce, the happier criminals are to receive and use it. Big data is big business for organized crime, which represents 15% of GDP.”
- Isarithmic Maps — an alternative to choropleths for geodata visualization.
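Fuzzy string matching comes down to scoring how similar two strings are, so that a typo like "recieve" can still be matched to "receive". The sketch below is a minimal stdlib-only illustration, not the code from the Streamhacker post: a classic Levenshtein edit distance (single-character insertions, deletions, substitutions), plus Python's `difflib.get_close_matches`, which ranks candidates by `SequenceMatcher.ratio()` rather than by edit distance. The function name `levenshtein` and the sample vocabulary are hypothetical, chosen for illustration.

```python
import difflib


def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    # Dynamic programming, keeping only the previous row of the table.
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (free if chars match)
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Hypothetical vocabulary of correctly spelled words.
    vocabulary = ["receive", "believe", "separate", "definitely"]
    typo = "recieve"
    print(levenshtein(typo, "receive"))
    # difflib scores with a matching-blocks ratio, not edit distance.
    print(difflib.get_close_matches(typo, vocabulary, n=1, cutoff=0.6))
```

Note that plain Levenshtein counts the swapped "ie" as two substitutions; variants such as Damerau-Levenshtein treat an adjacent transposition as a single edit, which often suits keyboard typos better.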
Stemming Demo, Mapping Service, Value of Data, and The Magic of the Valley
- Demo of Stemming Algorithms — type in text and see what it looks like when stemmed with different algorithms provided by NLTK. (via zelandiya on Twitter)
- Crowdmap — hosted Ushahidi. (via dvansickle on Twitter)
- Opinions vs Data — talks about the usability of a new Gmail UI element, but is notable for this quote from Jakob Nielsen: “In my two examples, the probability of making the right design decision was vastly improved when given the tiniest amount of empirical data.” (via mcannonbrookes on Twitter)
- The Next Silicon Valley — long and detailed list of the many forces contributing to Silicon Valley’s success as a tech hub, arguing that the valley’s position is path-dependent and can’t simply be grown ab initio in some aspiring nation’s co-prosperity zone of policy whim. (via imran and timoreilly on Twitter)
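Stemming reduces inflected words to a shared stem so that "connected", "connecting", and "connects" all match. The demo compares NLTK's real algorithms; the sketch below is only a hypothetical, deliberately crude suffix stripper in plain Python, not Porter, Lancaster, or any other NLTK stemmer. The rule list `SUFFIX_RULES` and the minimum stem length `MIN_STEM` are illustrative assumptions.

```python
# Hypothetical rules: (suffix, replacement), ordered longest suffix first
# so that, e.g., "ational" is tried before "s".
SUFFIX_RULES = [
    ("ational", "ate"),
    ("ingly", ""),
    ("ies", "y"),
    ("ing", ""),
    ("ed", ""),
    ("ly", ""),
    ("s", ""),
]

MIN_STEM = 3  # never strip a word down to fewer than 3 characters


def toy_stem(word: str) -> str:
    """Apply the first matching suffix rule, keeping a minimum stem length."""
    word = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)] + replacement
    return word  # no rule applied


if __name__ == "__main__":
    for w in ["connected", "connecting", "connects", "ponies", "relational", "sing"]:
        print(w, "->", toy_stem(w))
```

Real stemmers are far more careful: Porter's algorithm, for instance, runs several rule steps and conditions each rule on a "measure" of the stem's vowel-consonant sequences instead of a fixed length cutoff, which is why different algorithms give visibly different output in a side-by-side demo.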
Statistical Jeopardy Wins, Mobile Taxonomy, Geodata Mystery, and Machine Learning Blog
- What is IBM’s Watson? (NY Times) — IBM joining the big data machine learning race, and hatching a Blue Gene system that can answer Jeopardy questions. Its performance is good, not great, and improving.
- Google Lays Out its Mobile Strategy (InformationWeek) — notable to me for this line: “Rechis said that Google breaks down mobile users into three behavior groups: A. ‘Repetitive now’ B. ‘Bored now’ C. ‘Urgent now’.” It’s a useful way to look at it. (via Tim)
- BP GIS and the Mysteriously Vanishing Letter — intrigue in the geodata world. This post makes it sound as though cleanup data is going into a box behind BP’s firewall, and the folks who said “um, the government should be the depot, because it needs to know it has a guaranteed-untampered and guaranteed-able-to-access copy of this data” were fired. For more info, including on the data that is available, see the geowanking thread.
- Streamhacker — a blog about text mining and other good things, with NLTK code you can run. (via heraldxchaos on Delicious)