mbox — more technical information than you ever thought you’d need, to be saved for the time when you have to parse mailbox files. It’s a nightmare. (via Hacker News)
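If you do end up parsing mailbox files, Python's standard-library `mailbox` module handles the common variants. Here is a minimal sketch, assuming a plain mbox file; the helper names `build_sample` and `list_subjects` and the example addresses are made up for illustration and are not from the linked page.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def build_sample(path):
    """Write a two-message mbox file to `path` (illustrative data only)."""
    box = mailbox.mbox(path)  # creates the file if it does not exist
    try:
        samples = [
            ("first", "hello\n"),
            # A body line starting with "From " is the classic mbox trap:
            # it looks like a message separator unless the writer escapes it.
            ("second", "From here on, trouble\n"),
        ]
        for subject, body in samples:
            msg = EmailMessage()
            msg["From"] = "alice@example.com"
            msg["To"] = "bob@example.com"
            msg["Subject"] = subject
            msg.set_content(body)
            box.add(msg)
        box.flush()
    finally:
        box.close()


def list_subjects(path):
    """Return the Subject header of every message in an existing mbox file."""
    box = mailbox.mbox(path, create=False)
    try:
        return [msg["subject"] for msg in box]
    finally:
        box.close()


if __name__ == "__main__":
    sample_path = os.path.join(tempfile.mkdtemp(), "sample.mbox")
    build_sample(sample_path)
    print(list_subjects(sample_path))
```

Files written by other tools may follow a different mbox variant, so treat this as a starting point rather than a parser for every mailbox you will meet.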
Maui (Google Code) — Maui automatically identifies main topics in text documents. Depending on the task, topics are tags, keywords, keyphrases, vocabulary terms, descriptors, index terms or titles of Wikipedia articles. GPLv3.
Pattern — a BSD-licensed bundle of Python tools for data retrieval, text analysis, and data visualization. If you were going to get started with accessible data (Twitter, Google), the fundamentals of analysis (entity extraction, clustering), and some basic visualizations of graph relationships, you could do a lot worse than to start here.
Factorie (Google Code) — Apache-licensed Scala library for a probabilistic modeling technique successfully applied to [...] named entity recognition, entity resolution, relation extraction, parsing, schema matching, ontology alignment, latent-variable generative models, including latent Dirichlet allocation. The state-of-the-art big data analysis tools are increasingly open source, presumably because the value lies in their application not in their existence. This is good news for everyone with a new application.
Playtomic — analytics as a service for gaming companies to learn what players actually do in their games. There aren’t many fields untouched by analytics.
Write or Die — iPad app for writers where, if you don’t keep writing, it begins to delete what you wrote earlier. Good for production to deadlines; reflective editing and deep thought not included.
Fuzzy String Matching in Python (Streamhacker) — useful if you’re to have a hope against the swelling dark forces powered by illiteracy and touchscreen keyboards.
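For a quick taste of fuzzy matching without installing anything, the standard-library `difflib` module is enough. This is a minimal sketch and not necessarily the technique the Streamhacker post uses; the helper names, the 0.6 cutoff, and the sample vocabulary are assumptions for illustration.

```python
import difflib


def similarity(a, b):
    """Case-insensitive similarity between two strings, from 0.0 to 1.0."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(word, vocabulary, cutoff=0.6):
    """Return the closest vocabulary entry to `word`, or None if nothing clears the cutoff."""
    matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    vocabulary = ["receive", "recipe", "retrieve"]
    print(best_match("recieve", vocabulary))
    print(similarity("Touchscreen", "touchscrean"))
```

`difflib` scores by matching subsequences rather than by edit distance, so rankings can differ from a Levenshtein-based matcher on some inputs.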
The Business of Illegal Data (Strata Conference) — fascinating presentation on criminal use of big data. “The more data you produce, the happier criminals are to receive and use it. Big data is big business for organized crime, which represents 15% of GDP.”
Isarithmic Maps — an alternative to choropleths for geodata visualization.
Critically Making the Internet of Things (Anne Galloway) — session notes from a conference, see also part two. Good thoughts, hastily captured. For example, this from Bruce Sterling: RFID + Superglue + Object ≠ IoT and the talk I want to see: “A study of how broken, hacked and malfunctioning digital road signs subvert the physical space of roadways.”
Conquering the CHAOS of Online Community at StackExchange — StackExchange is doing some thoughtful work analysing conversations and channeling dissent into a healthy construction to guide future productive discussion. “We taught the users that it was alright to disagree, and gave them a set of arguments they could reference without every thread degenerating into a fight.”
Learning With Quantified Self — this CS grad student broke Jeopardy records using an app he built himself to quantify and improve his ability to answer Jeopardy questions in different categories. This is an impressive short talk and well worth watching.
Evaluating Text Extraction Algorithms — The gold standard of both datasets was produced by human annotators. 14 different algorithms were evaluated in terms of precision, recall and F1 score. The results have shown that the best open source solution is the boilerpipe library. (via Hacker News)
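As a reminder of what those three scores measure, here is a minimal sketch that compares an extractor's output to a gold-standard text as bags of tokens. Scoring by whitespace-separated tokens is an assumption for illustration; the evaluation linked above may count matches differently, and the function name is hypothetical.

```python
from collections import Counter


def precision_recall_f1(extracted_tokens, gold_tokens):
    """Score extracted tokens against gold tokens, treating both as multisets."""
    extracted = Counter(extracted_tokens)
    gold = Counter(gold_tokens)
    # Multiset intersection: each token counts at most as often as it appears in both.
    overlap = sum((extracted & gold).values())
    n_extracted = sum(extracted.values())
    n_gold = sum(gold.values())
    precision = overlap / n_extracted if n_extracted else 0.0
    recall = overlap / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    gold = "the cat sat on the mat".split()
    # The extractor kept the article text but also let two boilerplate tokens through.
    extracted = "the cat sat on the mat buy now".split()
    print(precision_recall_f1(extracted, gold))
```

Precision drops when boilerplate leaks into the output, recall drops when real content is cut, and F1 is the harmonic mean of the two.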
Quneo Multitouch Open Source MIDI and USB Pad (Kickstarter) — interesting to see companies using Kickstarter to seed interest in a product. This one looks a doozie: pads, sliders, rotary sensors, with LEDs underneath and open source drivers and SDK. Looks almost sophisticated enough to drive emacs :-)
Microsoft’s Patent Claims Against Android (Groklaw) — behold, citizen, the formidable might of Microsoft’s patents and how they justify a royalty from every Android device equal to that which you would owe if you built a Windows Mobile device: These Microsoft patents can be divided into several basic categories: (1) the ’372 and ’780 patents relate to web browsers; (2) the ’551 and ’233 patents relate to electronic document annotation and highlighting; (3) the ’522 patent relates to resources provided by operating systems; (4) the ’517 and ’352 patents deal with compatibility with file names once employed by old, unused, and outmoded operating systems; (5) the ’536 and ’853 patents relate to simulating mouse inputs using non-mouse devices; and (6) the ’913 patent relates to storing input/output access factors in a shared data structure. A shabby display of patent menacing.
Science Hack Day SF Videos (justin.tv) — the demos from Science Hack Day SF. The journey of a thousand miles starts with a Hack Day.
A Cross-Sectional Study of Canine Tail-Chasing and Human Responses to It, Using a Free Video-Sharing Website (PLoS ONE) — Approximately one third of tail-chasing dogs showed clinical signs, including habitual (daily or “all the time”) or perseverative (difficult to distract) performance of the behaviour. These signs were observed across diverse breeds. Clinical signs appeared virtually unrecognised by the video owners and commenting viewers; laughter was recorded in 55% of videos, encouragement in 43%, and the commonest viewer descriptors were that the behaviour was “funny” (46%) or “cute” (42%).
RSS Died For Your Sins (Danny O’Brien) — if you have seven thousand people following you, a good six thousand of those are going to be people you don’t particularly like. The problem, as ever, is—how do you pick out the other thousand? Especially when they keep changing? I firmly believe that one of the pressing unsolved technological problems of the modern age is getting safely away from people you don’t like, without actually throttling them to death beforehand, nor somehow coming to the conclusion that they don’t exist, nor ending up turning yourself into a hateful monster.
Generating Text from Functional Brain Images (Frontiers in Human Neuroscience) — We built a model of the mental semantic representation of concrete concepts from text data and learned to map aspects of such representation to patterns of activation in the corresponding brain image. Turns out that the clustering of concepts in Wikipedia is similar to how they’re clustered in the brain. They found clusters in Wikipedia, mapped to the brain activity for known words, and then used that mapping to find words for new images of brain activity. (via The Economist)
What Mozilla is Up To (Luke Wroblewski) — notes from a talk that Brendan Eich gave at Web 2.0 Summit. The new browser war is between the Web and new walled gardens of native networked apps. Interesting to see the effort Mozilla’s putting into native-alike Web apps.
Ultrasound for iPhone (Geekwire) — this personal sensor is $8000 today, but bound to drop. I want personal ultrasound at least once a month. How long until it’s in the $200-500 range? (via BERG London)