"analytics" entries

Top Stories: June 11-15, 2012

The future of desktops, ethics and big data, narrative vs spreadsheets.

This week on O'Reilly: Josh Marinacci predicted that 90% of computer users will rely on mobile, but 10% will still need desktops; the authors of "Ethics of Big Data" explored data's trickiest issues; and Narrative Science CTO Kris Hammond discussed narrative's role in data analytics.

O'Reilly Radar Show 3/12/12: Best data interviews from Strata California 2012

Doug Cutting on Hadoop, Max Gadney on video data graphics, Jeremy Howard on big data and analytics.

Hadoop creator Doug Cutting discusses the similarities between Linux and the big data world, Max Gadney from After the Flood explains the benefits of video data graphics, and Kaggle's Jeremy Howard looks at the difference between big data and analytics.

Four short links: 12 March 2012

Inside Personalized Advertising, Printing Presses Were Good For The Economy, Digital Access, and Ebooks in Libraries

  1. Web-Scale User Modeling for Targeting (Yahoo! Research, PDF) — research paper that shows how online advertisers build profiles of us and what matters (e.g., ads we buy from are more important than those we simply click on). Our recent surfing patterns are more relevant than historical ones, which is another indication that the value of data analytics increases the closer to real-time it happens. (via Greg Linden)
  2. Information Technology and Economic Change — research showing that cities which adopted the printing press had no prior growth advantage, but subsequently grew far faster than similar cities without printing presses. […] The second factor behind the localisation of spillovers is intriguing given contemporary questions about the impact of information technology. The printing press made it cheaper to transmit ideas over distance, but it also fostered important face-to-face interactions. The printer’s workshop brought scholars, merchants, craftsmen, and mechanics together for the first time in a commercial environment, eroding a pre-existing “town and gown” divide.
  3. They Just Don’t Get It (Cameron Neylon) — curating access to a digital collection does not scale.
  4. Should Libraries Get Out of the Ebook Business? — provocative thought: the ebook industry is nascent, a small number of patrons have ereaders, the technical pain of DRM and incompatible formats makes for disproportionate support costs, and there are already plenty of worthy things libraries should be doing. I only wonder how quickly the dynamics change: a minority may have dedicated ereaders but a large number have smartphones and are reading on them already.
Four short links: 9 March 2012

Real World User Experience, Biovis your Social Network, Analytics for Phone Sales, and Classy OpenStreetMap

  1. Why The Symphony Needs A Progress Bar (Elaine Wherry) — an excellent interaction designer tackles the real world.
  2. Biologic — view your social network as though looking at cells through a microscope. Gorgeous and different.
  3. The Cost of Cracking — analysis of used phone listings to see what improves and decreases price yields some really interesting results. Phones described as “decent” are typically priced 23% below the median. Who would describe something they’re selling as “decent” and price it below market value unless something fishy was going on? […] On average, cracking your phone destroys 30-50% of its value instantly. Particularly interesting to me since Ms 10 just brought home her phone with *cough* a new starburst screensaver.
  4. OpenStreetMap Welcomes Apple — this is the classy way to deal with the world’s richest company quietly and badly using your work without acknowledgement.

Strata Week: Profiling data journalists

The work of data journalists and a comparison of four data markets.

This week's data news: a look at the work of various data journalists, Edd Dumbill's survey of four data marketplaces, and impressive growth at the MIT Sloan Sports Analytics Conference.

Four short links: 6 March 2012

Stuff That Matters, Web Waste, Learning Analytics, and Thoughtful Quotes

  1. SoupHub — NZ project putting a computer with Internet access (and instruction and help) into a soup kitchen. I can’t take any credit for it, but I’m delighted beyond measure that the idea for this was hatched at Kiwi Foo Camp. I love that my peeps are doing stuff that matters. (See also the newspaper writeup)
  2. Bandwidth of Pages — view a 140 character tweet on the web and you’re loading 2MB of, well, let’s call it crap.
  3. On The Reductionism of Analytics in Education (Anne Zelenka) — Learning analytics, as practiced today, is reductionist to an extreme. We are reducing too many dimensions into too few. More than that, we are describing and analyzing only those things that we can describe and analyze, when what matters exists at a totally different level and complexity. We are missing emergent properties of educational and learning processes by focusing on the few things we can measure and by trying to automate what decisions and actions might be automated. A fantastic post, which coins the phrase “the math is not the territory”.
  4. Quotes Worth Spreading (Karl Fisch) — collection of thought-provoking quotes from recent TED talks. Be generous by graciously accepting compliments. It’s a gift you give the complimenter (John Bates) is something I’m particularly working on.
Four short links: 24 February 2012

Analytics in Excel, HTTP Debugger, Analytics for Personalized Healthcare, and EFF To The Rescue

  1. Excel Cloud Data Analytics (Microsoft Research) — clever: a cloud analytics backend with Excel as the frontend. Almost every business and finance person I’ve known has been way more comfortable with Excel than any other tool. (via Dr Data)
  2. HTTP Client — Mac OS X app for inspecting and automating a lot of HTTP. cf the lovely Charles proxy for debugging. (via Nelson Minar)
  3. The Creative Destruction of Medicine — using big data, gadgets, and sweet tech in general to personalize and improve healthcare. (via New York Times)
  4. EFF Wins Protection of Time Zone Database (EFF) — I posted about the silliness before (maintainers of the only comprehensive database of time zones were being threatened by astrologers). The EFF stepped in, beat back the buffoons, and now we’re back to being responsible when we screw up timezones for phone calls.
Four short links: 6 February 2012

E-Commerce Analytics, Text Mining on Hadoop, Bozonics, and It's Safe To Write With a Mac Again

  1. Jirafe — open source e-commerce analytics for Magento platform.
  2. iModela — a $1000 3D milling machine. (via BoingBoing)
  3. It’s Too Late to Save The Common Web (Robert Scoble) — paraphrased: “Four years ago, I told you all that Google and Facebook were evil. You did nothing, which is why I must now use Google and Facebook.” His list of reasons that Facebook beats the Open Web gives new shallows to the phrase “vanity metrics”. Yes, the open web does not go out of its way to give you an inflated sense of popularity and importance. On the other hand, the things you do put there are in your control and will stay as long as you want them to. But that’s obviously not a killer feature compared to a bottle of Astroglide and an autorefreshing page showing your Klout score and the number of Google+ circles you’re in.
  4. iBooks Author EULA Clarified (MacObserver) — important to note that it doesn’t say you can’t use the content you’ve written, only that you can’t sell .ibook files through anyone but Apple. Less obnoxious than the “we own all your stuff, dude” interpretation, but still a bit crap. I wonder how anticompetitive this will be seen as. Apple’s vertical integration is ripe for Justice Department investigation.
Four short links: 26 December 2011

Text Analysis Bundle, Scala Probabilistic Modeling, Game Analytics, and Encouraging Writing

  1. Pattern — a BSD-licensed bundle of Python tools for data retrieval, text analysis, and data visualization. If you were going to get started with accessible data (Twitter, Google), the fundamentals of analysis (entity extraction, clustering), and some basic visualizations of graph relationships, you could do a lot worse than to start here.
  2. Factorie (Google Code) — Apache-licensed Scala library for a probabilistic modeling technique successfully applied to […] named entity recognition, entity resolution, relation extraction, parsing, schema matching, ontology alignment, latent-variable generative models, including latent Dirichlet allocation. The state-of-the-art big data analysis tools are increasingly open source, presumably because the value lies in their application not in their existence. This is good news for everyone with a new application.
  3. Playtomic — analytics as a service for gaming companies to learn what players actually do in their games. There aren’t many fields untouched by analytics.
  4. Write or Die — iPad app for writers where, if you don’t keep writing, it begins to delete what you wrote earlier. Good for production to deadlines; reflective editing and deep thought not included.
Four short links: 30 November 2011

Cryptography Illustrated, Hollywood Futures, Machine Learning Mastery, and Analytics Assumptions

  1. An Illustrated Guide to Cryptographic Hashes — exactly what it says: learn how hashing works and how you’d use it for passwords, digital signatures, etc.
  2. The Age of Fanfiction — We live in a time where copyright means very little to younger people, and it’s not just because they want free movies or free music. More than that, they want to be able to play with the amazing toys that they’ve been given by filmmakers and comic book writers and TV creators, and they want to do so without the constraints that copyright creates. Eloquent and thoughtful piece on what this means for Hollywood and how “the Age of Fanfiction” is reflected in what Hollywood’s making. (via Sacha Judd)
  3. How Khan Academy is Using Machine Learning to Assess Student Mastery — it is bloody hard to know when a student has mastered a subject, both for real live teachers and for roboteachers like Khan Academy. This is a detailed discussion of a change in assessment within Khan Academy. if we define proficiency as your chance of getting the next problem correct being above a certain threshold, then the streak becomes a poor binary classifier. Experiments conducted on our data showed a significant difference between students who take, say, 30 problems to get a streak vs. 10 problems right off the bat — the former group was much more likely to miss the next problem after a break than the latter.
  4. In Which I Declare Four Things My Probability Class is Not About — a reminder of the assumptions we make when we use numerical analysis to understand a problem.
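
The Khan Academy item above contrasts a streak rule with a probability threshold. As a rough illustration only, the sketch below compares the two on a student who gets 10 right immediately and one who needs 30 problems to reach a streak. The streak length, the 0.85 cutoff, and the smoothed-accuracy estimate are all assumptions made for this example; the estimate is a simple stand-in, not the model described in the linked post.

```python
from typing import Sequence

STREAK_LENGTH = 10  # hypothetical: correct answers in a row required by the streak rule
THRESHOLD = 0.85    # hypothetical cutoff for "likely to get the next problem right"


def has_streak(answers: Sequence[bool], length: int = STREAK_LENGTH) -> bool:
    """Streak rule: proficient once `length` consecutive answers are correct."""
    run = 0
    for correct in answers:
        run = run + 1 if correct else 0
        if run >= length:
            return True
    return False


def estimated_accuracy(answers: Sequence[bool]) -> float:
    """Stand-in estimate of P(next answer correct): Laplace-smoothed accuracy."""
    return (sum(answers) + 1) / (len(answers) + 2)


def is_proficient(answers: Sequence[bool], threshold: float = THRESHOLD) -> bool:
    """Threshold rule: proficient when the estimate reaches the cutoff."""
    return estimated_accuracy(answers) >= threshold


# Made-up answer histories, for illustration only.
quick = [True] * 10                       # streak right off the bat
slow = [False, True] * 10 + [True] * 10   # 30 problems before the streak arrives

for name, history in (("quick", quick), ("slow", slow)):
    print(name, has_streak(history), is_proficient(history))
```

Both histories satisfy the streak rule, but only the first clears the threshold, which is the sense in which a streak is a coarse binary classifier.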