"text analysis" entries

Four short links: 6 December 2012

What You Do, Wordnik Branches, 5 Whys, and Hardware Hackathon

  1. You’re Saving Time — can you explain what you do, as well as this? Love the clarity of thought, as well as elegance of expression.
  2. Related Content, by Wordnik — branching out by offering a widget for websites which recommends other content on your site which is related to the current page. I’ve been keen to see what Wordnik do with their text knowledge.
  3. How to Run a 5 Whys with Humans, Not Robots (Slideshare) — gold Gold GOLD! (via Hacker News)
  4. Open Computer Project Hackathon — have never heard of a hardware hackathon before, keen to see how it works out. (via Jim Stogdill)
Four short links: 19 October 2012

3D Printed Drones, When Pacemakers Attack, N-Gram Updated, and Deanonymizing Datasets

  1. Home-made 3D-Printed Drones — if only they used computer-vision to sequence DNA, they’d be the perfect storm of O’Reilly memes :-)
  2. Hacking Pacemakers For Death — IOActive researcher Barnaby Jack has reverse-engineered a pacemaker transmitter to make it possible to deliver deadly electric shocks to pacemakers within 30 feet and rewrite their firmware.
  3. Google N-Gram Viewer Updated — now with more books, better OCR, parts of speech, and complex queries. e.g., the declining ratio of sex to drugs. Awesome work by Friend of O’Reilly, Jon Orwant.
  4. Deanonymizing Mobility Traces: Using Social Networks as a Side-Channel — a set of location traces can be deanonymized given an easily obtained social network graph. […] Our experiments [on standard datasets] show that 80% of users are identified precisely, while only 8% are identified incorrectly, with the remainder mapped to a small set of users. (via Network World)
Four short links: 14 August 2012

Search Fail, Recruiter Data, Ed Web, and Enterprise IT Yuks

  1. WTF — when keyword matching fails.
  2. The Best Recruiters, Pt II (Elaine Wherry) — almost all these tips are relevant to the cold-call “hey, you don’t know me but …” email messages you’ll have to send at some point in your life. Read, learn, obey.
  3. Best Websites for Teaching And Learning — as decided by the American Association of School Librarians. Lots of these I didn’t know existed but can see being used in class, e.g. Gamestar Mechanic which walks kids through the process of creating a game, teaching them how to think about games even as they produce one.
  4. Enterprise IT Adoption Curve — so very very true.
Four short links: 24 May 2012

Maker Tribe, Concept Mapping, Magic Wand, and Site Performance Matters

  1. Last Saturday My Son Found His People at the Maker Faire — aww to the power of INFINITY.
  2. Dictionaries Linking Words to Concepts (Google Research) — Wikipedia entries for concepts, text strings from searches and the oppressed workers down the Text Mines, and a count indicating how often the two were related.
  3. Magic Wand (Kickstarter) — I don’t want the game, I want a Bluetooth magic wand. I don’t want to click the OK button, I want to wave a wand and make it so! (via Pete Warden)
  4. E-Commerce Performance (Luke Wroblewski) — If a page load takes more than two seconds, 40% are likely to abandon that site. This is why you should follow Steve Souders like a hawk: if your site is slower than it could be, you’re leaving money on the table.
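
The dictionary in item 2 pairs text strings with Wikipedia concepts and a count of how often the two were related. A minimal sketch of how such counts might be turned into a ranked lookup follows. The rows, counts, and function names are hypothetical illustrations, not taken from the Google Research data, whose actual format may differ.

```python
from collections import defaultdict

# Hypothetical (string, concept, count) rows, invented for illustration only.
ROWS = [
    ("jaguar", "Jaguar_Cars", 60),
    ("jaguar", "Jaguar", 30),
    ("jaguar", "Jacksonville_Jaguars", 10),
    ("big apple", "New_York_City", 95),
    ("big apple", "Some_Other_Page", 5),
]

def build_index(rows):
    """Group counts by lower-cased string: {string: {concept: count}}."""
    index = defaultdict(dict)
    for text, concept, count in rows:
        key = text.lower()
        index[key][concept] = index[key].get(concept, 0) + count
    return index

def concepts_for(index, text):
    """Return (concept, share) pairs for a string, most likely concept first."""
    counts = index.get(text.lower(), {})
    total = sum(counts.values())
    if total == 0:
        return []
    pairs = ((concept, n / total) for concept, n in counts.items())
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)

INDEX = build_index(ROWS)
```

Calling `concepts_for(INDEX, "Jaguar")` ranks the three hypothetical concepts by their share of the total count, which is one simple way to estimate which concept a string most often refers to.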
Four short links: 19 April 2012

Text Similarity, Designing Engagement, Clustering Stories, and Prince of Persia

  1. Superfastmatch — open source text comparison tool, used to locate plagiarism/churnalism in online news sites. You can pull out the text engine and use it for your own “find where this text is used elsewhere” applications (e.g., what’s being forwarded out in email, how much of this RFP is copy and paste, what’s NOT boilerplate in this contract, etc.). (via Pete Warden)
  2. Ten Design Principles for Engaging Math Tasks (Dan Meyer) — education gold, engagement gold, and some serious ideas you can use in your own apps.
  3. Clustering Related Stories (Jenny Finkel) — description of how to cluster related stories, talks about some of the tricks. Interesting without being too scary.
  4. Prince of Persia (GitHub) — I have waited to see if the novelty wore off, but I still find this cool: 1980s source code on GitHub.
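
The "find where this text is used elsewhere" idea from item 1 can be illustrated with overlapping word n-grams (shingles). This is a generic sketch of one common approach, not Superfastmatch's actual algorithm; the function names and the default window of three words are assumptions made for the example.

```python
import re

def shingles(text, n=3):
    """Return the set of n-word sequences in the text, ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def containment(source, candidate, n=3):
    """Fraction of the source's shingles that also appear in the candidate."""
    src = shingles(source, n)
    if not src:
        return 0.0  # source is shorter than n words, so nothing to compare
    return len(src & shingles(candidate, n)) / len(src)

# Invented sample texts.
PRESS_RELEASE = "The new widget doubles battery life and ships in May"
ARTICLE = ("Sources say the new widget doubles battery life "
           "and ships in May, according to the company")
UNRELATED = "Completely different words appear in this sentence here"
```

A score near 1 from `containment(PRESS_RELEASE, ARTICLE)` suggests the article reuses most of the release, while unrelated text scores near 0. Containment is used rather than symmetric similarity because a short source can be copied wholesale into a much longer document.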

Visualization of the Week: Anachronistic language in “Mad Men”

A look at the historical accuracy of "Mad Men's" dialogue.

"Mad Men" is praised for its precise attention to historical visuals, but how does its dialogue stack up against text from the 1960s? Ben Schmidt's new visualization explores that question.

Four short links: 23 March 2012

Caching Pages, Node NLP, Digital Natives are Clueless, and Wal-Mart Loves Your Calendar

  1. Cache Them If You Can (Steve Souders) — the percentage of resources that are cacheable has increased 4% during the past year. Over that same time the number of requests per page has increased 12% and total transfer size has increased 24%.
  2. Natural — MIT-licensed general natural language facility for nodejs. Tokenizing, stemming, classification, phonetics, tf-idf, WordNet, string similarity, and some inflection are currently supported. (via JavaScript Weekly)
  3. How Millennials Search — Statistically significant findings suggest that millennial generation Web searchers proceed erratically through an information search process, make only a limited attempt to evaluate the quality or validity of information gathered, and may perform some level of ‘backfilling’ or adding sources to a research project before final submission of the work. Never let old people tell you that “digital natives” actually know what they’re doing.
  4. Walmart Buys A Facebook App for Calendar Access (Ars Technica) — The Social Calendar app and its file of 110 million birthdays and other events, acquired from Newput Corp., will give Walmart the ability to expand its efforts to dig deeper into the lives of customers. Interesting to think that by buying a well-loved app, a company could get access to your Facebook details whether you Like them or not.
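
Of the features listed for Natural in item 2, tf-idf is easy to show in a few lines. The sketch below is plain Python rather than Natural's Node.js API, and it uses one common weighting variant (raw term count multiplied by log(N / document frequency)); the sample documents and function names are invented for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def tf_idf(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights

# Invented sample corpus.
DOCS = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]
WEIGHTS = tf_idf(DOCS)
```

In this corpus "the" appears in every document, so its weight is zero, while "mat" appears in only one and receives the highest weight in the first document. That is the behaviour that makes tf-idf useful for picking out the terms that distinguish one text from the rest.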
Four short links: 16 February 2012

Wikipedia Fail, DIY Text Adventures, Antisocial Software, and Formats Matter

  1. The Undue Weight of Truth (Chronicle of Higher Education) — Wikipedia has become fossilized fiction because the mechanism of self-improvement is broken.
  2. Playfic — Andy Baio’s new site that lets you write text adventures in the browser. Great introduction to programming for language-loving kids and adults.
  3. Review of Alone Together (Chris McDowall) — I loved this review, its sentiments, and its presentation. Work on stuff that matters.
  4. Why ESRI As-Is Can’t Be Part of the Open Government Movement — data formats without broad support in open source tools are an unnecessary barrier to entry. You’re effectively letting the vendor charge for your data, which is just stupid.
Four short links: 8 February 2012

Text Mining, Unstoppable Sociality, Unicode Fun, and Scholarly Publishing

  1. Mavuno — an open source, modular, scalable text mining toolkit built upon Hadoop. (Apache-licensed)
  2. Cow Clicker — Wired profile of Cow Clicker creator Ian Bogost. I was impressed by Cow Clickers […] have turned what was intended to be a vapid experience into a source of camaraderie and creativity. People create communities around social activities, even when they are antisocial. (via BoingBoing)
  3. Unicode Has a Pile of Poo Character (BoingBoing) — this is perfect.
  4. The Research Works Act and the Breakdown of Mutual Incomprehension (Cameron Neylon) — an excellent summary of how researchers and publishers view each other and their place in the world.