ENTRIES TAGGED "analytics"
Inside Personalized Advertising, Printing Presses Were Good For The Economy, Digital Access, and Ebooks in Libraries
- Web-Scale User Modeling for Targeting (Yahoo! Research, PDF) — research paper that shows how online advertisers build profiles of us and what matters (e.g., ads we buy from are more important than those we simply click on). Our recent surfing patterns are more relevant than historical ones, which is another indication that the value of data analytics increases the closer to real-time it happens. (via Greg Linden)
- Information Technology and Economic Change — research showing that cities which adopted the printing press had no prior growth advantage, but subsequently grew far faster than similar cities without printing presses. [...] The second factor behind the localisation of spillovers is intriguing given contemporary questions about the impact of information technology. The printing press made it cheaper to transmit ideas over distance, but it also fostered important face-to-face interactions. The printer’s workshop brought scholars, merchants, craftsmen, and mechanics together for the first time in a commercial environment, eroding a pre-existing “town and gown” divide.
- They Just Don’t Get It (Cameron Neylon) — curating access to a digital collection does not scale.
- Should Libraries Get Out of the Ebook Business? — provocative thought: the ebook industry is nascent, a small number of patrons have ereaders, the technical pain of DRM and incompatible formats makes for disproportionate support costs, and there are already plenty of worthy things libraries should be doing. I only wonder how quickly the dynamics change: a minority may have dedicated ereaders but a large number have smartphones and are reading on them already.
The work of data journalists and a comparison of four data markets.
This week's data news includes a look at the work of various data journalists, Edd Dumbill's survey of four data marketplaces, and the impressive growth of the MIT Sloan Sports Analytics Conference.
Stuff That Matters, Web Waste, Learning Analytics, and Thoughtful Quotes
- SoupHub — NZ project putting a computer with Internet access (and instruction and help) into a soup kitchen. I can’t take any credit for it, but I’m delighted beyond measure that the idea for this was hatched at Kiwi Foo Camp. I love that my peeps are doing stuff that matters. (See also the newspaper writeup)
- Bandwidth of Pages — view a 140 character tweet on the web and you’re loading 2MB of, well, let’s call it crap.
- On The Reductionism of Analytics in Education (Anne Zelenka) — Learning analytics, as practiced today, is reductionist to an extreme. We are reducing too many dimensions into too few. More than that, we are describing and analyzing only those things that we can describe and analyze, when what matters exists at a totally different level and complexity. We are missing emergent properties of educational and learning processes by focusing on the few things we can measure and by trying to automate what decisions and actions might be automated. A fantastic post, which coins the phrase “the math is not the territory”.
- Quotes Worth Spreading (Karl Fisch) — collection of thought-provoking quotes from recent TED talks. “Be generous by graciously accepting compliments. It’s a gift you give the complimenter” (John Bates) is something I’m particularly working on.
Analytics in Excel, HTTP Debugger, Analytics for Personalized Healthcare, and EFF To The Rescue
- Excel Cloud Data Analytics (Microsoft Research) — clever: a cloud analytics backend with Excel as the frontend. Almost every business and finance person I’ve known has been way more comfortable with Excel than any other tool. (via Dr Data)
- HTTP Client — Mac OS X app for inspecting and automating a lot of HTTP. cf. the lovely Charles proxy for debugging. (via Nelson Minar)
- The Creative Destruction of Medicine — using big data, gadgets, and sweet tech in general to personalize and improve healthcare. (via New York Times)
- EFF Wins Protection of Time Zone Database (EFF) — I posted about the silliness before (maintainers of the only comprehensive database of time zones were being threatened by astrologers). The EFF stepped in, beat back the buffoons, and now we’re back to being responsible when we screw up timezones for phone calls.
Text Analysis Bundle, Scala Probabilistic Modeling, Game Analytics, and Encouraging Writing
- Pattern — a BSD-licensed bundle of Python tools for data retrieval, text analysis, and data visualization. If you were going to get started with accessible data (Twitter, Google), the fundamentals of analysis (entity extraction, clustering), and some basic visualizations of graph relationships, you could do a lot worse than to start here.
- Factorie (Google Code) — Apache-licensed Scala library for a probabilistic modeling technique successfully applied to [...] named entity recognition, entity resolution, relation extraction, parsing, schema matching, ontology alignment, latent-variable generative models, including latent Dirichlet allocation. The state-of-the-art big data analysis tools are increasingly open source, presumably because the value lies in their application not in their existence. This is good news for everyone with a new application.
- Playtomic — analytics as a service for gaming companies to learn what players actually do in their games. There aren’t many fields untouched by analytics.
- Write or Die — iPad app for writers where, if you don’t keep writing, it begins to delete what you wrote earlier. Good for production to deadlines; reflective editing and deep thought not included.
Cryptography Illustrated, Hollywood Futures, Machine Learning Mastery, and Analytics Assumptions
- An Illustrated Guide to Cryptographic Hashes — exactly what it says: learn how hashing works and how you’d use it for passwords, digital signatures, etc.
- The Age of Fanfiction — We live in a time where copyright means very little to younger people, and it’s not just because they want free movies or free music. More than that, they want to be able to play with the amazing toys that they’ve been given by filmmakers and comic book writers and TV creators, and they want to do so without the constraints that copyright creates. Eloquent and thoughtful piece on what this means for Hollywood and how “the Age of Fanfiction” is reflected in what Hollywood’s making. (via Sacha Judd)
- How Khan Academy is Using Machine Learning to Assess Student Mastery — it is bloody hard to know when a student has mastered a subject, both for real live teachers and for roboteachers like Khan Academy. This is a detailed discussion of a change in assessment within Khan Academy. “If we define proficiency as your chance of getting the next problem correct being above a certain threshold, then the streak becomes a poor binary classifier. Experiments conducted on our data showed a significant difference between students who take, say, 30 problems to get a streak vs. 10 problems right off the bat — the former group was much more likely to miss the next problem after a break than the latter.”
- In Which I Declare Four Things My Probability Class is Not About — a reminder of the assumptions we make when we use numerical analysis to understand a problem.
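The streak-versus-threshold point in the Khan Academy item can be made concrete with a rough sketch. Everything here is an assumption for illustration: the function names, the 0.85 threshold, the made-up answer histories, and the Laplace-smoothed accuracy used as a stand-in for "chance of getting the next problem correct". It is not Khan Academy's model, which the linked post describes in detail.

```python
def has_streak(answers, length=10):
    """Streak rule: True once `length` consecutive correct answers occur."""
    run = 0
    for correct in answers:
        run = run + 1 if correct else 0
        if run >= length:
            return True
    return False


def estimated_accuracy(answers):
    """Laplace-smoothed share of correct answers.

    A crude, hypothetical stand-in for a real next-problem probability model.
    """
    return (sum(answers) + 1) / (len(answers) + 2)


def is_proficient(answers, threshold=0.85):
    """Threshold rule: proficient if the estimate clears `threshold` (assumed value)."""
    return estimated_accuracy(answers) >= threshold


# Two invented histories: 10 right off the bat vs. 30 problems to reach a streak.
quick = [True] * 10
slow = [True, False] * 10 + [True] * 10

for name, answers in (("quick", quick), ("slow", slow)):
    print(name, has_streak(answers), round(estimated_accuracy(answers), 3),
          is_proficient(answers))
```

Both histories satisfy the streak rule, but only the first clears the threshold on the estimate, which is the sense in which a streak is a coarse yes/no classifier.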
Panagiotis Ipeirotis on the vagaries of semantic analysis and Mechanical Turk's quirks.
In a recent interview, NYU Professor Panagiotis Ipeirotis explained why a "good" online review is often perceived negatively. He also discussed Mechanical Turk's growing pains.