"text analysis" entries

Four short links: 15 November 2011

Internet Asthma Care, C Fulltext, Citizen Science, and Mozilla

  1. Cost-Effectiveness of Internet-Based Self-Management Compared with Usual Care in Asthma (PLoSone) — Internet-based self-management of asthma can be as effective as current asthma care and costs are similar.
  2. Apache Lucy — full-text search engine library written in C and targeted at dynamic languages. It is a “loose C” port of Apache Lucene™, a search engine library for Java.
  3. The Near Future of Citizen Science (Fiona Romeo) — near future of science is all about honing the division of labour between professionals, amateurs and bots. See Bryce’s bionic software riff. (via Matt Jones)
  4. Microsoft’s Patent Claims Against Android (Groklaw) — behold, citizen, the formidable might of Microsoft’s patents and how they justify a royalty from every Android device equal to that which you would owe if you built a Windows Mobile device: These Microsoft patents can be divided into several basic categories: (1) the ‘372 and ‘780 patents relate to web browsers; (2) the ‘551 and ‘233 patents relate to electronic document annotation and highlighting; (3) the ‘522 patent relates to resources provided by operating systems; (4) the ‘517 and ‘352 patents deal with compatibility with file names once employed by old, unused, and outmoded operating systems; (5) the ‘536 and ‘853 patents relate to simulating mouse inputs using non-mouse devices; and (6) the ‘913 patent relates to storing input/output access factors in a shared data structure. A shabby display of patent menacing.
Four short links: 14 November 2011

Science Hack Days, YouTube Doggy Science, Antisocial Software, and Mind Reading with Wikipedia

  1. Science Hack Day SF Videos (justin.tv) — the demos from Science Hack Day SF. The journey of a thousand miles starts with a Hack Day.
  2. A Cross-Sectional Study of Canine Tail-Chasing and Human Responses to It, Using a Free Video-Sharing Website (PLoSone) — Approximately one third of tail-chasing dogs showed clinical signs, including habitual (daily or “all the time”) or perseverative (difficult to distract) performance of the behaviour. These signs were observed across diverse breeds. Clinical signs appeared virtually unrecognised by the video owners and commenting viewers; laughter was recorded in 55% of videos, encouragement in 43%, and the commonest viewer descriptors were that the behaviour was “funny” (46%) or “cute” (42%).
  3. RSS Died For Your Sins (Danny O’Brien) — if you have seven thousand people following you, a good six thousand of those are going to be people you don’t particularly like. The problem, as ever, is—how do you pick out the other thousand? Especially when they keep changing? I firmly believe that one of the pressing unsolved technological problems of the modern age is getting safely away from people you don’t like, without actually throttling them to death beforehand, nor somehow coming to the conclusion that they don’t exist, nor ending up turning yourself into a hateful monster.
  4. Generating Text from Functional Brain Images (Frontiers in Human Neuroscience) — We built a model of the mental semantic representation of concrete concepts from text data and learned to map aspects of such representation to patterns of activation in the corresponding brain image. Turns out that the clustering of concepts in Wikipedia is similar to how they’re clustered in the brain. They found clusters in Wikipedia, mapped to the brain activity for known words, and then used that mapping to find words for new images of brain activity. (via The Economist)
Four short links: 21 October 2011

Mozilla's Projects, YouTube Insults, iPhone Ultrasound, RoR Intro

  1. What Mozilla is Up To (Luke Wroblewski) — notes from a talk that Brendan Eich gave at Web 2.0 Summit. The new browser war is between the Web and new walled gardens of native networked apps. Interesting to see the effort Mozilla’s putting into native-alike Web apps.
  2. YouTube Insult Generator (Adrian Holovaty) — mines YouTube for insults of a particular form.
  3. Ultrasound for iPhone (Geekwire) — this personal sensor is $8000 today, but bound to drop. I want personal ultrasound at least once a month. How long until it’s in the $200-500 range? (via BERG London)
  4. Web Applications Class at Stanford OpenClassroom — a Ruby on Rails class taught by John Ousterhout, creator of TCL/Tk and log-structured filesystems.
Four short links: 20 October 2011

Earth's Birthday, Messy Data, Evil iOS Apps, and Cooking Chemistry

  1. Earth Turns 6015 — my plan to celebrate on Saturday the amazing thing that is our universe. Scientists know humility, curiosity, and awe. All the scientists I know speak of their awe at the natural world. I’d like to see data scientists take a moment to soak in the complexity of a problem, appreciating it in all its tangled majesty, separate from attempts to unravel it.
  2. Data Jujitsu — Luke Wroblewski took notes at DJ Patil’s Web 2.0 Expo talk, and this caught my eye: Unstructured data is harder to work with. Open text fields in forms can cause issues. There are between 4 and 8 thousand variations of IBM and “Software Engineer” in LinkedIn’s database.
  3. Secret iOS Business — the dirty innards of iOS apps: phoning home, crap security, and bloated lazy design. My horror grew with every example.
  4. Culinary Reactions: Everyday Chemistry of Cooking — Simon Quellen Field’s new book on the chemistry of cooking. Simon’s the man behind scitoys and his passion for understanding is a force of nature.
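
The Data Jujitsu point about thousands of spellings of "IBM" in free-text fields can be illustrated with a minimal normalization sketch. The alias table, the suffix list, and the function name `normalize_company` are all hypothetical choices made for illustration; nothing here describes how LinkedIn actually handles the problem.

```python
import re
from collections import Counter

# Hypothetical alias table; a real one would be curated or learned from data.
ALIASES = {
    "ibm": "IBM",
    "i b m": "IBM",
    "international business machines": "IBM",
}

# Common legal suffixes dropped before lookup (illustrative, not exhaustive).
SUFFIXES = {"inc", "corp", "corporation", "ltd", "llc", "co"}


def normalize_company(raw: str) -> str:
    """Map a free-text company name to a canonical label when one is known."""
    # Lowercase, turn punctuation into spaces, and trim the ends.
    cleaned = re.sub(r"[^a-z0-9]+", " ", raw.lower()).strip()
    tokens = cleaned.split()
    # Strip trailing legal suffixes such as "Corp" or "Inc".
    while tokens and tokens[-1] in SUFFIXES:
        tokens.pop()
    key = " ".join(tokens)
    # Fall back to the cleaned-up input when no alias is known.
    return ALIASES.get(key, key)


# Made-up form entries standing in for messy user input.
entries = [
    "IBM",
    "I.B.M.",
    "ibm corp",
    "International Business Machines Corp.",
    "IBM, Inc.",
]
counts = Counter(normalize_company(e) for e in entries)
print(counts)
```

A lookup table like this only catches variants someone has already seen; misspellings and abbreviations outside the table fall through unchanged, which is part of why open text fields are costly to clean.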

Visualization of the Week: Sentiment in the Bible

Sentiment analysis sheds new light on an old book.

OpenBible.info found a novel way to examine one of the world's most analyzed texts: Create a visualization showing the rise and fall of sentiment across the Bible.
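
One way to chart the rise and fall of sentiment across a long text is to score each passage and then smooth the scores with a moving average. The sketch below assumes a tiny hand-made word list and made-up passages; it is not OpenBible.info's method, and the names `score` and `moving_average` are illustrative.

```python
# Tiny hypothetical lexicon; real sentiment tools use much larger word lists
# or trained models.
POSITIVE = {"good", "joy", "love", "blessed", "peace"}
NEGATIVE = {"evil", "wrath", "death", "curse", "war"}


def score(text):
    """Count positive words minus negative words in one passage."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def moving_average(values, window):
    """Trailing moving average; early points average over what is available."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Made-up passages in reading order (not quotations from any text).
passages = [
    "And there was joy and peace in the land.",
    "Then came war and death.",
    "The people spoke of love.",
    "A curse fell, and wrath followed.",
    "It was good.",
]

scores = [score(p) for p in passages]
smoothed = moving_average(scores, window=2)
print(scores)
print(smoothed)
```

Plotting `smoothed` against passage position gives the kind of rise-and-fall curve the visualization describes; a wider window trades detail for a cleaner overall shape.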

Four short links: 20 September 2011

Android Plan 9, Virtualization OS, Rogue Games, and Wikipedia Semantics

  1. Plan 9 on Android — replacing the Java stack on Android phones with Inferno. Inferno is the Plan 9-derived operating system originally from Bell Labs.
  2. SmartOS — Joyent-created open source operating system built for virtualization. (via Nelson Minar)
  3. libtcod — open source library for creating Rogue-like games. (via Nelson Minar)
  4. Wikipedia Miner — toolkit for working with semantics in Wikipedia pages, e.g. find the connective topics that link two chosen topics. (via Alyona Medelyan)
Four short links: 6 September 2011

Javascript Primitives, Test Backups, Learn Triples, and Scale Javascript

  1. The Secret Life of Javascript Primitives — good writing and clever headlines can make even the dullest topic seem interesting. This is interesting, I hasten to add.
  2. Backup Bouncer — software to test how effective your backup tools are: you copy files to a test area by whatever means you like, then run this tool to see whether permissions, flags, owners, contents, timestamps, etc. are preserved. (via Joshua Schachter)
  3. reVerb — open source (GPLv3) toolkit for learning triples from text. See the paper for more details.
  4. Patterns for Large-Scale Javascript Architecture — enterprise (aka “scalable”) architectures for Javascript apps.
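
The Backup Bouncer idea (copy files by some means, then check what metadata survived) can be sketched in a few lines. This is not Backup Bouncer's code: the function `compare_backup` and the set of checks are assumptions for illustration, covering only permissions, owner, modification time, and contents.

```python
import hashlib
import os
import shutil
import stat
import tempfile


def file_digest(path):
    """SHA-256 of a file's contents."""
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()


def compare_backup(original, copy):
    """Return the names of attributes that differ between original and copy."""
    a, b = os.stat(original), os.stat(copy)
    checks = {
        "permissions": stat.S_IMODE(a.st_mode) == stat.S_IMODE(b.st_mode),
        "owner": (a.st_uid, a.st_gid) == (b.st_uid, b.st_gid),
        # Compare whole seconds to sidestep filesystem timestamp resolution.
        "mtime": int(a.st_mtime) == int(b.st_mtime),
        "contents": file_digest(original) == file_digest(copy),
    }
    return [name for name, same in checks.items() if not same]


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src.txt")
    with open(src, "w") as fh:
        fh.write("hello")
    # Give the source an old modification time so a lost timestamp is visible.
    os.utime(src, (1_000_000_000, 1_000_000_000))

    plain = os.path.join(tmp, "plain.txt")
    shutil.copyfile(src, plain)   # copies contents only

    full = os.path.join(tmp, "full.txt")
    shutil.copy2(src, full)       # also attempts to copy metadata

    plain_diff = compare_backup(src, plain)
    full_diff = compare_backup(src, full)

print("copyfile lost:", plain_diff)
print("copy2 lost:", full_diff)
```

The same comparison could be pointed at the output of any backup tool; the real tool checks more than this sketch does (flags and other attributes mentioned above).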
Four short links: 18 July 2011

Organisational Warfare, RTFM, Timezone Shapefile, Microsoft Adventure

  1. Organisational Warfare (Simon Wardley) — notes on the commoditisation of software, with interesting analyses of the positions of some large players. On closer inspection, Salesforce seems to be doing more than just commoditisation with an ILC pattern, as can be clearly seen from the Radian6 acquisition. They also seem to be operating a tower and moat strategy, i.e. creating a tower of revenue (the service) around which is built a moat devoid of differential value with high barriers to entry. When their competitors finally wake up and realise that the future world of CRM is in this service space, they’ll discover a new player dominating this space who has not only removed many of the opportunities to differentiate (e.g. social CRM, mobile CRM) but built a large ecosystem that creates high rates of new innovation. This should be a fairly fatal combination.
  2. Learning to Win by Reading Manuals in a Monte-Carlo Framework (MIT) — starting with no prior knowledge of the game or its UI, the system learns how to play and to win by experimenting, and from parsed manual text. They used FreeCiv, and assessed the influence of parsing the manual shallowly and deeply. Trust MIT to turn RTFM into a paper. For human-readable explanation, see the press release.
  3. A Shapefile of the TZ Timezones of the World — I have nothing but sympathy for the poor gentleman who compiled this. Political boundaries are notoriously arbitrary, and timezones are even worse because they don’t need a war to change. (via Matt Biddulph)
  4. Microsoft Adventure — 1979 Microsoft game for the TRS-80 has fascinating threads into the past and into what would become Microsoft’s future.
Four short links: 20 June 2011

Recording Glasses, Food Hacks, Visualizing Documents, Human Computation

  1. HD Video Recording Glasses (Kickstarter) — as Bryce says, “wearable computing is on the rise. As the price for enabling components drops, always on connectivity in our pockets and purses increases, and access to low cost manufacturing resources and know-how rises we’ll see innovation continue to push into these most personal forms of computing.” (via Bryce Roberts)
  2. Sketching in Food (Chris Heathcote) — a set of taste tests to demonstrate that we’ve been food hacking for a very long time. We started with two chemical coated strips – sodium benzoate, a preservative used in lots of food that a significant percentage of people can taste (interestingly in different ways, sweet, sour and bitter). Secondly was a chemical known as PTC that about 70% of people perceive as bitter, and a smaller number perceiving as really really horribly bitter. This was to show that taste is genetic, and different people perceive the same food differently. He includes pointers to sources for the materials in the taste test.
  3. Investigating Millions of Documents by Visualizing Clusters — recording of talk about our recent work at the AP with the Iraq and Afghanistan war logs.
  4. Managing Crowdsourced Human Computation (Slideshare) — half a six-hour tutorial at WWW2011 on crowdsourcing and human computation. See also the author’s comments. (via Matt Biddulph)
Four short links: 24 May 2011

Kindle List, Insider Knowledge, Google News Archive Archived, and Work Week in Video

  1. Delivereads — genius idea, a mailing list for Kindles. Yes, if you can send email then you can be a Kindle publisher. (via Sacha Judd)
  2. Abnormal Returns From the Common Stock Investments of Members of the U.S. House of Representatives — We measure abnormal returns for more than 16,000 common stock transactions made by approximately 300 House delegates from 1985 to 2001. Consistent with the study of Senatorial trading activity, we find stocks purchased by Representatives also earn significant positive abnormal returns (albeit considerably smaller returns). A portfolio that mimics the purchases of House Members beats the market by 55 basis points per month (approximately 6% annually). (via Ellen Miller)
  3. Google News Archive Ends — hypothesizes that old material was “too hard” to make sense of, but that seems unlikely to me. More likely is that it wasn’t useful enough to their machine learning efforts. Newspapers can have their scanned/OCRed content for free now the program is being closed.
  4. Week Report 310 — BERG’s first (that I’ve seen) video report of the week, and it’s a cracker. No newsreel, just some really clever evocation of the mood of the place and the nature of the projects. I continue to be impressed by the BERG crew’s conscious creation of culture.
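
For readers unused to basis points, the House trading figure above (55 basis points per month) converts to an annual rate as in this small sketch. The function name `annualize` is illustrative, and the two conventions shown (simple multiplication and monthly compounding) are assumptions about how one might annualize; the quoted abstract does not say which it used.

```python
def annualize(monthly_bps, compound=True):
    """Convert a monthly excess return in basis points to an annual fraction."""
    monthly = monthly_bps / 10_000   # 1 basis point = 0.01% = 0.0001
    if compound:
        return (1 + monthly) ** 12 - 1
    return monthly * 12


simple = annualize(55, compound=False)
compounded = annualize(55)
print(f"simple: {simple:.2%}, compounded: {compounded:.2%}")
```

Both conventions land between 6% and 7% a year, in line with the "approximately 6%" quoted in the abstract.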