ENTRIES TAGGED "text analysis"

Four short links: 16 February 2012

Wikipedia Fail, DIY Text Adventures, Antisocial Software, and Formats Matter

  1. The Undue Weight of Truth (Chronicle of Higher Education) — Wikipedia has become fossilized fiction because the mechanism of self-improvement is broken.
  2. Playfic — Andy Baio’s new site that lets you write text adventures in the browser. Great introduction to programming for language-loving kids and adults.
  3. Review of Alone Together (Chris McDowall) — I loved this review, its sentiments, and its presentation. Work on stuff that matters.
  4. Why ESRI As-Is Can’t Be Part of the Open Government Movement — data formats without broad support in open source tools are an unnecessary barrier to entry. You’re effectively letting the vendor charge for your data, which is just stupid.
Four short links: 8 February 2012

Text Mining, Unstoppable Sociality, Unicode Fun, and Scholarly Publishing

  1. Mavuno — an open source, modular, scalable text mining toolkit built upon Hadoop. (Apache-licensed)
  2. Cow Clicker — Wired profile of Cow Clicker creator Ian Bogost. I was impressed by Cow Clickers [...] have turned what was intended to be a vapid experience into a source of camaraderie and creativity. People create communities around social activities, even when they are antisocial. (via BoingBoing)
  3. Unicode Has a Pile of Poo Character (BoingBoing) — this is perfect.
  4. The Research Works Act and the Breakdown of Mutual Incomprehension (Cameron Neylon) — an excellent summary of how researchers and publishers view each other and their place in the world.
Four short links: 13 January 2012

Internet in Culture, Flash Security Tool, Haptic E-Books, and Facebook Mining Private Updates

  1. How The Internet Gets Inside Us (The New Yorker) — at any given moment, our most complicated machine will be taken as a model of human intelligence, and whatever media kids favor will be identified as the cause of our stupidity. When there were automatic looms, the mind was like an automatic loom; and, since young people in the loom period liked novels, it was the cheap novel that was degrading our minds. When there were telephone exchanges, the mind was like a telephone exchange, and, in the same period, since the nickelodeon reigned, moving pictures were making us dumb. When mainframe computers arrived and television was what kids liked, the mind was like a mainframe and television was the engine of our idiocy. Some machine is always showing us Mind; some entertainment derived from the machine is always showing us Non-Mind. (via Tom Armitage)
  2. SWFScan — Windows-only Flash decompiler to find hardcoded credentials, keys, and URLs. (via Mauricio Freitas)
  3. Paranga — haptic interface for flipping through an ebook. (via Ben Bashford)
  4. Facebook Gives Politico Deep Access to Users’ Political Sentiments (All Things D) — Facebook will analyse all public and private updates that mention candidates and an exclusive partner will “use” the results. Remember, if you’re not paying for it then you’re the product and not the customer.
Four short links: 12 January 2012

Smart Meter Snitches, Company Culture, Text Classification, and Live Face Substitution

  1. Smart Hacking for Privacy — you can mine smart power meter data (or even snoop it) to learn what’s on the TV. Wow. (You can also watch the talk.) (via Rob Inskeep)
  2. Conditioning Company Culture (Bryce Roberts) — a short read but thought-provoking. It’s easy to create mindless mantras, but I’ve seen the technique that Bryce describes and (when done well) it’s highly effective.
  3. hydrat (Google Code) — a declarative framework for text classification tasks.
  4. Dynamic Face Substitution (FlowingData) — Kyle McDonald and Arturo Castro play around with a face tracker and color interpolation to replace their own faces, in real time, with those of celebrities such as Brad Pitt and Paris Hilton. Awesome. And creepy. Amen.
The hidden language and "wonderful experience" of product reviews

Panagiotis Ipeirotis on the phrases and formatting of effective product reviews.

How much is an Amazon review — good or bad — worth? Computer scientist and NYU professor Panagiotis Ipeirotis analyzed the text in thousands of Amazon reviews to find out.

Four short links: 9 January 2012

Apple Factories, Open Source Spy Drones, Mail Files, and Text Topic Extraction

  1. Mr. Daisey and the Apple Factory (This American Life) — episode looking at the claims of human rights problems in Apple’s Chinese factories.
  2. OpenPilot — open source UAVs with cameras. Yes, a DIY spy drone on autopilot. (via Jim Stogdill)
  3. mbox — more technical information than you ever thought you’d need, to be saved for the time when you have to parse mailbox files. It’s a nightmare. (via Hacker News)
  4. Maui (Google Code) — Maui automatically identifies main topics in text documents. Depending on the task, topics are tags, keywords, keyphrases, vocabulary terms, descriptors, index terms or titles of Wikipedia articles. GPLv3.
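
A small illustration of why item 3 calls mbox parsing a nightmare: messages in an mbox file are separated by lines that start with "From ", so a body line that begins the same way has to be escaped by whoever wrote the file (conventionally as ">From "). The sketch below uses Python's standard-library mailbox module on a made-up two-message sample. The addresses, subjects, and helper names (write_sample, read_messages) are hypothetical and are not taken from the linked page.

```python
import mailbox
import os
import tempfile

# Hypothetical sample: two messages. The ">From " body line shows the
# escaping that stops a body line from being mistaken for a separator.
SAMPLE = (
    "From alice@example.com Mon Jan  9 10:00:00 2012\n"
    "From: alice@example.com\n"
    "Subject: first\n"
    "\n"
    "Hello there.\n"
    ">From here on, this line was escaped by the writer.\n"
    "\n"
    "From bob@example.com Mon Jan  9 11:00:00 2012\n"
    "From: bob@example.com\n"
    "Subject: second\n"
    "\n"
    "Second body.\n"
)


def write_sample(path):
    """Write the sample mailbox to path as text."""
    with open(path, "w") as handle:
        handle.write(SAMPLE)


def read_messages(path):
    """Return a list of (subject, body) pairs for each message in the mbox."""
    box = mailbox.mbox(path, create=False)
    try:
        return [(msg["Subject"], msg.get_payload()) for msg in box]
    finally:
        box.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample_path = os.path.join(tmp, "sample.mbox")
        write_sample(sample_path)
        for subject, body in read_messages(sample_path):
            print(subject, "|", body.splitlines()[0])
```

Real mailboxes are messier than this sample. Writers disagree on how (and whether) to escape "From " lines, which is a large part of what the linked page documents.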
Four short links: 28 December 2011

Text Search, Cloud Filesystem, Javascript Parser, and Twitter Templates

  1. Terrier IR — open source (Mozilla) text search engine, now with Hadoop support.
  2. s3ql — open source (GPLv3) Linux filesystem which stores its data on Google Storage, Amazon S3, or OpenStack. (via Adam Shand)
  3. Esprima — open source (BSD) fast Javascript parser in Javascript. (via Javascript Weekly)
  4. Hogan.js — open source (Apache) Javascript templating engine from Twitter. If it proves anywhere near as good as Bootstrap, it’ll be heavily used.
Four short links: 26 December 2011

Text Analysis Bundle, Scala Probabilistic Modeling, Game Analytics, and Encouraging Writing

  1. Pattern — a BSD-licensed bundle of Python tools for data retrieval, text analysis, and data visualization. If you were going to get started with accessible data (Twitter, Google), the fundamentals of analysis (entity extraction, clustering), and some basic visualizations of graph relationships, you could do a lot worse than to start here.
  2. Factorie (Google Code) — Apache-licensed Scala library for a probabilistic modeling technique successfully applied to [...] named entity recognition, entity resolution, relation extraction, parsing, schema matching, ontology alignment, latent-variable generative models, including latent Dirichlet allocation. The state-of-the-art big data analysis tools are increasingly open source, presumably because the value lies in their application not in their existence. This is good news for everyone with a new application.
  3. Playtomic — analytics as a service for gaming companies to learn what players actually do in their games. There aren’t many fields untouched by analytics.
  4. Write or Die — iPad app for writers where, if you don’t keep writing, it begins to delete what you wrote earlier. Good for production to deadlines; reflective editing and deep thought not included.
Four short links: 22 December 2011

Fuzzy Text, Big Data Crime, Map Visualization, and Attacking Server-Side Javascript

  1. Fuzzy String Matching in Python (Streamhacker) — useful if you’re to have a hope against the swelling dark forces powered by illiteracy and touchscreen keyboards.
  2. The Business of Illegal Data (Strata Conference) — fascinating presentation on criminal use of big data. “The more data you produce, the happier criminals are to receive and use it. Big data is big business for organized crime, which represents 15% of GDP.”
  3. Isarithmic Maps — an alternative to choropleths for geodata visualization.
  4. Server-Side Javascript Injection (PDF) — a Blackhat talk about exploiting backend vulnerabilities with techniques learned from attacking Javascript frontends. Both this paper and the accompanying talk will discuss security vulnerabilities that can arise when software developers create applications or modules for use with JavaScript-based server applications such as NoSQL database engines or Node.js web servers. In the worst-case scenario, an attacker can exploit these vulnerabilities to upload and execute arbitrary binary files on the server machine, effectively granting him full control over the server.
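
For a feel of what the fuzzy string matching in item 1 involves, here is a minimal sketch that uses only Python's standard-library difflib. It is not the code from the Streamhacker post. The helper names (similarity, best_match) and the 0.6 cutoff are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher, get_close_matches


def similarity(a, b):
    """Return a 0..1 similarity ratio, ignoring case and surrounding whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def best_match(query, candidates, cutoff=0.6):
    """Return the candidate closest to query, or None if nothing clears cutoff."""
    # Compare in lowercase, but hand back the candidate in its original casing.
    lowered = {c.lower(): c for c in candidates}
    hits = get_close_matches(query.strip().lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[hits[0]] if hits else None


if __name__ == "__main__":
    words = ["separate", "definitely", "necessary"]
    for typo in ["seperate", "definately", "xyzzy"]:
        print(typo, "->", best_match(typo, words))
```

The cutoff is the knob to tune. Set it too low and unrelated words match; set it too high and ordinary typos stop being recognized.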
Four short links: 9 December 2011

Designing Ubicomp, Online Community, Design Examples, and Ranking Discussions

  1. Critically Making the Internet of Things (Anne Galloway) — session notes from a conference, see also part two. Good thoughts, hastily captured. For example, this from Bruce Sterling: RFID + Superglue + Object ≠ IoT and the talk I want to see: “A study of how broken, hacked and malfunctioning digital road signs subvert the physical space of roadways.”
  2. Conquering the CHAOS of Online Community at StackExchange — StackExchange is doing some thoughtful work analysing conversations and channeling dissent into a healthy construction to guide future productive discussion. “We taught the users that it was alright to disagree, and gave them a set of arguments they could reference without every thread degenerating into a fight.”
  3. Little Big Details — one small detail done right, every day.
  4. Ranking Live Streams of Data (LinkedIn) — behind the “interesting discussions” report.