ENTRIES TAGGED "analytics"
Text Analysis Bundle, Scala Probabilistic Modeling, Game Analytics, and Encouraging Writing
- Pattern — a BSD-licensed bundle of Python tools for data retrieval, text analysis, and data visualization. If you were going to get started with accessible data (Twitter, Google), the fundamentals of analysis (entity extraction, clustering), and some basic visualizations of graph relationships, you could do a lot worse than to start here.
- Factorie (Google Code) — Apache-licensed Scala library for a probabilistic modeling technique successfully applied to [...] named entity recognition, entity resolution, relation extraction, parsing, schema matching, ontology alignment, latent-variable generative models, including latent Dirichlet allocation. The state-of-the-art big data analysis tools are increasingly open source, presumably because the value lies in their application not in their existence. This is good news for everyone with a new application.
- Playtomic — analytics as a service for gaming companies to learn what players actually do in their games. There aren’t many fields untouched by analytics.
- Write or Die — iPad app for writers where, if you don’t keep writing, it begins to delete what you wrote earlier. Good for production to deadlines; reflective editing and deep thought not included.
Cryptography Illustrated, Hollywood Futures, Machine Learning Mastery, and Analytics Assumptions
- An Illustrated Guide to Cryptographic Hashes — exactly what it says: learn how hashing works and how you’d use it for passwords, digital signatures, etc.
- The Age of Fanfiction — We live in a time where copyright means very little to younger people, and it’s not just because they want free movies or free music. More than that, they want to be able to play with the amazing toys that they’ve been given by filmmakers and comic book writers and TV creators, and they want to do so without the constraints that copyright creates. Eloquent and thoughtful piece on what this means for Hollywood and how “the Age of Fanfiction” is reflected in what Hollywood’s making. (via Sacha Judd)
- How Khan Academy is Using Machine Learning to Assess Student Mastery — it is bloody hard to know when a student has mastered a subject, both for real live teachers and for roboteachers like Khan Academy. This is a detailed discussion of a change in assessment within Khan Academy: “if we define proficiency as your chance of getting the next problem correct being above a certain threshold, then the streak becomes a poor binary classifier. Experiments conducted on our data showed a significant difference between students who take, say, 30 problems to get a streak vs. 10 problems right off the bat — the former group was much more likely to miss the next problem after a break than the latter.”
- In Which I Declare Four Things My Probability Class is Not About — a reminder of the assumptions we make when we use numerical analysis to understand a problem.
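To make the Khan Academy item above concrete, here is a minimal Python sketch of why a streak rule can disagree with a threshold on the estimated chance of a correct answer. Everything in it is an assumption for illustration: the function names, the streak length of 10, the 0.85 threshold, and the Laplace-smoothed accuracy estimate, which is only a crude stand-in and not the model Khan Academy describes.

```python
def has_streak(answers, length=10):
    """Return True if `answers` contains `length` correct answers in a row."""
    run = 0
    for correct in answers:
        run = run + 1 if correct else 0
        if run >= length:
            return True
    return False


def estimated_accuracy(answers):
    """Laplace-smoothed fraction of correct answers.

    Hypothetical stand-in for a real predictive model of the
    chance of getting the next problem right.
    """
    return (sum(answers) + 1) / (len(answers) + 2)


def is_proficient(answers, threshold=0.85):
    """Threshold rule: proficient if the estimated accuracy clears `threshold`."""
    return estimated_accuracy(answers) >= threshold


# Invented histories: one student gets 10 right off the bat,
# the other needs 30 problems before finishing on a streak of 10.
quick = [True] * 10
slow = [True, False] * 10 + [True] * 10

for name, history in (("quick", quick), ("slow", slow)):
    print(name, has_streak(history), round(estimated_accuracy(history), 3),
          is_proficient(history))
```

Both invented students satisfy the streak rule, but only the first clears the threshold, which is the sense in which a streak is a coarse binary classifier.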
Panagiotis Ipeirotis on the vagaries of semantic analysis and Mechanical Turk's quirks.
In a recent interview, NYU Professor Panagiotis Ipeirotis explained why a "good" online review is often perceived negatively. He also discussed Mechanical Turk's growing pains.
Flurry's Sean Byrnes on mobile metrics and tablet apps vs phone apps.
Flurry's CTO Sean Byrnes discusses app life cycles, the specifics of user engagement, and the difference between smartphone apps and tablet apps.
Unlike traditional TV analytics, social data tracks both viewership and sentiment.
TV shows broke out of the television years ago, but traditional analytics still focus on limited metrics. PeopleBrowsr CEO Jodee Rich says social data offers a better way to see what audiences watch and what they care about.
Open data products with purpose, why finance should care about big data, and the untapped value of site search.
This week on O'Reilly: Tom Steinberg from mySociety offered practical advice for building useful and long-lasting open data products, we examined the intersection of big data and finance, and we learned why neglected site search engines deserve far more attention.
Hilary Mason on how Bitly applies the Internet's real-time data.
In this interview, Bitly chief scientist and Strata speaker Hilary Mason discusses the application of real-time data and the difference between analytics and data science.
Lou Rosenfeld on the benefits of parsing and refining site search.
A gold mine is hiding in the data generated by website search engines, yet many site owners pay little attention to the analytics those engines yield. Author Lou Rosenfeld explains why site search is worth your time.
Flurry's Sean Byrnes on the challenges of mobile analytics.
Flurry's Sean Byrnes talks about the intricacies of mobile analytics, the metrics app developers care about most, and the problems that stem from Android fragmentation.
Opera Solutions' Arnab Gupta says human plus machine always trumps human vs machine.
Managing data and extracting meaning require new approaches, new education, and even a new language. Opera Solutions CEO Arnab Gupta discusses each of these areas in the following interview.