ENTRIES TAGGED "text analysis"
Dispel Your Illusions, Simple Mac OS X Apps, Assisted Translation, and AutoTagging
- How to Dispel Your Illusions (NY Review of Books) — Freeman Dyson writing about Daniel Kahneman’s latest book. Only by understanding our cognitive illusions can we hope to transcend them.
- Appify-UI (github) — Create the simplest possible Mac OS X apps. Uses HTML5 for the UI. Supports scripting with anything and everything. (via Hacker News)
- Translation Memory (Etsy) — using Lucene/SOLR to help automate the translation of their UI. (via Twitter)
- Automatically Tagging Entities with Descriptive Phrases (PDF) — Microsoft Research paper on automated tagging. Under the hood it uses Map/Reduce and the Microsoft Dryad framework. (via Ben Lorica)
Quantified Learner, Text Extraction, Backup Flickr, and Multitouch UI Awesomeness
- Learning With Quantified Self — this CS grad student broke Jeopardy records using an app he built himself to quantify and improve his ability to answer Jeopardy questions in different categories. This is an impressive short talk and well worth watching.
- Evaluating Text Extraction Algorithms — The gold standard of both datasets was produced by human annotators. 14 different algorithms were evaluated in terms of precision, recall and F1 score. The results have shown that the best open source solution is the boilerpipe library. (via Hacker News)
- Parallel Flickr — tool for backing up your Flickr account. (Compare to one day of Flickr photos printed out)
- Quneo Multitouch Open Source MIDI and USB Pad (Kickstarter) — interesting to see companies using Kickstarter to seed interest in a product. This one looks a doozie: pads, sliders, rotary sensors, with LEDs underneath and open source drivers and SDK. Looks almost sophisticated enough to drive emacs :-)
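For the text extraction evaluation above, here is a minimal sketch of how precision, recall and F1 can be scored for one document. It assumes a token-level bag-of-words comparison between an extractor's output and the human-annotated gold text; the linked evaluation may well score things differently, and the function name `prf1` is made up for illustration.

```python
from collections import Counter


def prf1(extracted: str, gold: str) -> tuple[float, float, float]:
    """Token-level precision, recall and F1 of extracted text against gold text.

    Assumption: whitespace tokens compared as multisets (bags of words).
    """
    ext_tokens = Counter(extracted.split())
    gold_tokens = Counter(gold.split())

    # Multiset intersection: tokens the extractor kept that are also in the gold text.
    overlap = sum((ext_tokens & gold_tokens).values())

    n_ext = sum(ext_tokens.values())
    n_gold = sum(gold_tokens.values())

    precision = overlap / n_ext if n_ext else 0.0
    recall = overlap / n_gold if n_gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    # The extractor kept the article text plus two words of navigation junk.
    print(prf1("Home Login the quick brown fox", "the quick brown fox"))
```

An extractor that keeps too much boilerplate loses precision; one that trims real content loses recall. F1 is the harmonic mean of the two, so it punishes either failure.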
Internet Asthma Care, C Fulltext, Citizen Science, and Mozilla
- Cost-Effectiveness of Internet-Based Self-Management Compared with Usual Care in Asthma (PLoS ONE) — Internet-based self-management of asthma can be as effective as current asthma care and costs are similar.
- Apache Lucy — full-text search engine library written in C and targeted at dynamic languages. It is a “loose C” port of Apache Lucene™, a search engine library for Java.
- The Near Future of Citizen Science (Fiona Romeo) — near future of science is all about honing the division of labour between professionals, amateurs and bots. See Bryce’s bionic software riff. (via Matt Jones)
- Microsoft’s Patent Claims Against Android (Groklaw) — behold, citizen, the formidable might of Microsoft’s patents and how they justify a royalty from every Android device equal to that which you would owe if you built a Windows Mobile device: These Microsoft patents can be divided into several basic categories: (1) the ’372 and ’780 patents relate to web browsers; (2) the ’551 and ’233 patents relate to electronic document annotation and highlighting; (3) the ’522 patent relates to resources provided by operating systems; (4) the ’517 and ’352 patents deal with compatibility with file names once employed by old, unused, and outmoded operating systems; (5) the ’536 and ’853 patents relate to simulating mouse inputs using non-mouse devices; and (6) the ’913 patent relates to storing input/output access factors in a shared data structure. A shabby display of patent menacing.
Mozilla's Projects, YouTube Insults, iPhone Ultrasound, RoR Intro
- What Mozilla is Up To (Luke Wroblewski) — notes from a talk that Brendan Eich gave at Web 2.0 Summit. The new browser war is between the Web and new walled gardens of native networked apps. Interesting to see the effort Mozilla’s putting into native-alike Web apps.
- YouTube Insult Generator (Adrian Holovaty) — mines YouTube for insults of a particular form.
- Ultrasound for iPhone (Geekwire) — this personal sensor is $8000 today, but bound to drop. I want personal ultrasound at least once a month. How long until it’s in the $200-500 range? (via BERG London)
- Web Applications Class at Stanford OpenClassroom — a Ruby on Rails class taught by John Ousterhout, creator of Tcl/Tk and log-structured filesystems.
Earth's Birthday, Messy Data, Evil iOS Apps, and Cooking Chemistry
- Earth Turns 6015 — my plan to celebrate on Saturday the amazing thing that is our universe. Scientists know humility, curiosity, and awe. All the scientists I know speak of their awe at the natural world. I’d like to see data scientists take a moment to soak in the complexity of a problem, appreciating it in all its tangled majesty, separate from attempts to unravel it.
- Data Jujitsu — Luke Wroblewski took notes at DJ Patil’s Web 2.0 Expo talk, and this caught my eye: Unstructured data is harder to work with. Open text fields in forms can cause issues. There are between 4 and 8 thousand variations of IBM and “Software Engineer” in LinkedIn’s database.
- Secret iOS Business — the dirty innards of iOS apps: phoning home, crap security, and bloated lazy design. My horror grew with every example.
- Culinary Reactions: Everyday Chemistry of Cooking — Simon Quellen Field’s new book on the chemistry of cooking. Simon’s the man behind scitoys and his passion for understanding is a force of nature.
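To make the messy-data point from the Data Jujitsu notes concrete, here is a rough sketch of collapsing free-text variants of a company name onto one canonical key. Everything in it is an assumption for illustration: the alias table, the suffix list and the function names are invented, and nothing here describes how LinkedIn actually cleans its data.

```python
import re
from collections import defaultdict

# Hypothetical alias table: hand-maintained mappings that rules alone won't catch.
ALIASES = {
    "international business machines": "ibm",
    "i b m": "ibm",
}

# Hypothetical list of corporate suffixes to ignore when comparing names.
SUFFIXES = re.compile(r"\b(inc|corp|corporation|co|ltd)\b")


def canonical_key(raw: str) -> str:
    """Reduce a free-text entry to a normalised key for grouping."""
    text = raw.lower()
    text = re.sub(r"[^a-z0-9]+", " ", text)   # punctuation and symbols -> spaces
    text = SUFFIXES.sub(" ", text)            # drop "Corp", "Inc", and friends
    text = re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace
    return ALIASES.get(text, text)


def group_variants(entries: list[str]) -> dict[str, list[str]]:
    """Group raw entries by their canonical key."""
    groups: dict[str, list[str]] = defaultdict(list)
    for entry in entries:
        groups[canonical_key(entry)].append(entry)
    return dict(groups)


if __name__ == "__main__":
    raw = ["IBM", "I.B.M.", "ibm corp.", "International Business Machines", "Intel"]
    for key, variants in group_variants(raw).items():
        print(key, "<-", variants)
```

Rules like these catch the easy cases. With thousands of variations in play, the long tail of misspellings and abbreviations is where the real work lies.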
Sentiment analysis sheds new light on an old book.
OpenBible.info found a novel way to examine one of the world's most analyzed texts: Create a visualization showing the rise and fall of sentiment across the Bible.
- Backup Bouncer — software to test how effective your backup tools are: you copy files to a test area by whatever means you like, then run this tool to see whether permissions, flags, owners, contents, timestamps, etc. are preserved. (via Joshua Schachter)
- reVerb — open source (GPLv3) toolkit for learning triples from text. See the paper for more details.
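The kind of check Backup Bouncer performs can be sketched in a few lines: copy a file with the tool under test, then compare what survived. This is only an illustration under my own assumptions, not Backup Bouncer's implementation; it covers contents, permission bits, owner and modification time for a single file, and the names `file_digest` and `compare_copy` are made up.

```python
import hashlib
import os
import stat


def file_digest(path: str, chunk_size: int = 65536) -> str:
    """SHA-256 of a file's contents, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_copy(original: str, copy: str) -> dict[str, bool]:
    """Report which attributes of `original` were preserved in `copy`."""
    a, b = os.stat(original), os.stat(copy)
    return {
        "contents": file_digest(original) == file_digest(copy),
        "permissions": stat.S_IMODE(a.st_mode) == stat.S_IMODE(b.st_mode),
        "owner": (a.st_uid, a.st_gid) == (b.st_uid, b.st_gid),
        "mtime": a.st_mtime_ns == b.st_mtime_ns,
    }


if __name__ == "__main__":
    import shutil
    import tempfile

    with tempfile.TemporaryDirectory() as workdir:
        source = os.path.join(workdir, "source.txt")
        with open(source, "w") as handle:
            handle.write("precious data")
        # Give the source an old timestamp so a careless copy is easy to spot.
        os.utime(source, (1_000_000_000, 1_000_000_000))

        careful = os.path.join(workdir, "careful.txt")
        careless = os.path.join(workdir, "careless.txt")
        shutil.copy2(source, careful)      # copies data and metadata
        shutil.copyfile(source, careless)  # copies data only

        print("copy2:   ", compare_copy(source, careful))
        print("copyfile:", compare_copy(source, careless))
```

A real test would also need to cover flags, extended attributes, links and so on, which is where many backup tools fall down.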
Organisational Warfare, RTFM, Timezone Shapefile, Microsoft Adventure
- Organisational Warfare (Simon Wardley) — notes on the commoditisation of software, with interesting analyses of the positions of some large players. On closer inspection, Salesforce seems to be doing more than just commoditisation with an ILC pattern, as can be clearly seen from the Radian6 acquisition. They also seem to be operating a tower and moat strategy, i.e. creating a tower of revenue (the service) around which is built a moat devoid of differential value with high barriers to entry. When their competitors finally wake up and realise that the future world of CRM is in this service space, they’ll discover a new player dominating this space who has not only removed many of the opportunities to differentiate (e.g. social CRM, mobile CRM) but built a large ecosystem that creates high rates of new innovation. This should be a fairly fatal combination.
- Learning to Win by Reading Manuals in a Monte-Carlo Framework (MIT) — starting with no prior knowledge of the game or its UI, the system learns how to play and to win by experimenting, and from parsed manual text. They used FreeCiv, and assessed the influence of parsing the manual shallowly and deeply. Trust MIT to turn RTFM into a paper. For a human-readable explanation, see the press release.
- A Shapefile of the TZ Timezones of the World — I have nothing but sympathy for the poor gentleman who compiled this. Political boundaries are notoriously arbitrary, and timezones are even worse because they don’t need a war to change. (via Matt Biddulph)
- Microsoft Adventure — 1979 Microsoft game for the TRS-80 has fascinating threads into the past and into what would become Microsoft’s future.