"NLP" entries

Four short links: 9 September 2014

Go Text, Science Consensus, Broadcast Fallacy, and In-Browser Swift

  1. bleve — a modern text indexing library for Go.
  2. Scientific Consensus Has A Bad Reputation—And Doesn’t Deserve It (Ars Technica) — a lovely explanation of how informal consensus works in science. NB for anyone building social software which attempts to formalise and automate consensus.
  3. TiVo Mega — 24TB of RAID storage, six tuners for capturing broadcasts. Which is rather like building the International Space Station and then hitching it to six horses for launch. Who at this point would make a $5k bet that everything you want to see on a TV will be broadcast by a cable company?
  4. runswift — an in-browser client for compiling and running basic Swift functionality.
Four short links: 2 September 2014

Antilogs, Waitbots, Interactive Interactive Visualisation Book, IoTbot

  1. Antilogs — There are companies before you who have done something like you want to do that you can copy from, and others who have also done something similar, but that you choose not to copy from. These are your analogs and antilogs respectively.
  2. Korean Meal-Transport Robot (RoboHub) — the hyphen is important. It transports all meals, not just Korean ones. Interesting not only grammatically, but for the gradual arrival of the service robot.
  3. Interactive Data Visualisation for the Web — online (interactive, even) version of the O’Reilly book by Scott Murray.
  4. wit.ai — natural language processing for the Internet of Things. A startup racing to build strategic value beyond “have brought voice recognition to IRC bots and aimed it at Internet of Things investors.”
Four short links: 6 December 2012

What You Do, Wordnik Branches, 5 Whys, and Hardware Hackathon

  1. You’re Saving Time — can you explain what you do as well as this does? Love the clarity of thought as well as the elegance of expression.
  2. Related Content, by Wordnik — branching out by offering a widget for websites which recommends other content on your site which is related to the current page. I’ve been keen to see what Wordnik do with their text knowledge.
  3. How to Run a 5 Whys with Humans, Not Robots (Slideshare) — gold Gold GOLD! (via Hacker News)
  4. Open Compute Project Hackathon — I’ve never heard of a hardware hackathon before and am keen to see how it works out. (via Jim Stogdill)
Four short links: 19 October 2012

3D Printed Drones, When Pacemakers Attack, N-Gram Updated, and Deanonymizing Datasets

  1. Home-made 3D-Printed Drones — if only they used computer vision to sequence DNA, they’d be the perfect storm of O’Reilly memes :-)
  2. Hacking Pacemakers For Death — IOActive researcher Barnaby Jack has reverse-engineered a pacemaker transmitter to make it possible to deliver deadly electric shocks to pacemakers within 30 feet and rewrite their firmware.
  3. Google N-Gram Viewer Updated — now with more books, better OCR, parts of speech, and complex queries. e.g., the declining ratio of sex to drugs. Awesome work by Friend of O’Reilly, Jon Orwant.
  4. Deanonymizing Mobility Traces: Using Social Networks as a Side-Channel — a set of location traces can be deanonymized given an easily obtained social network graph. […] Our experiments [on standard datasets] show that 80% of users are identified precisely, while only 8% are identified incorrectly, with the remainder mapped to a small set of users. (via Network World)
Four short links: 14 August 2012

Search Fail, Recruiter Data, Ed Web, and Enterprise IT Yuks

  1. WTF — when keyword matching fails.
  2. The Best Recruiters, Pt II (Elaine Wherry) — almost all these tips are relevant to the cold-call “hey, you don’t know me but …” email messages you’ll have to send at some point in your life. Read, learn, obey.
  3. Best Websites for Teaching And Learning — as decided by the American Association of School Librarians. Lots of these I didn’t know existed but can see being used in class, e.g., Gamestar Mechanic, which walks kids through the process of creating a game, teaching them how to think about games even as they produce one.
  4. Enterprise IT Adoption Curve — so very very true.
Four short links: 13 August 2012

Mobile Money, Quantified Server, Mobile Chatbot, and YouTube's Content Detection

  1. Mobile Numbers (Luke Wroblewski) — eBay’s mobile shoppers and mobile payers are 3 to 4 times more valuable than Web only […] Yelp runs ads on the mobile web, and those ads see a higher clickthrough rate than their desktop counterparts.
  2. Data-Driven Restaurants (Washingtonian) — Did Elizabeth bring your Pinot Gris within three minutes of the time you ordered it? Were your appetizers delivered within seven minutes, entrées within ten, desserts within seven? Were these plates described at the table before they were set in front of you? Were napkins refolded when you went to the restroom? Was non-bottled water referred to as “ice water” (correct) or “water” (incorrect)? (via Daniel Bachhuber)
  3. Rei Toei (Jesse Vincent) — Writing a plugin to give Rei a new superpower is a few lines of JavaScript. Very early-stage project, but one to watch. Siri + ircbots + NLP = awesome. (Open source on GitHub)
  4. Content Detection Fail (Ars Technica) — five other media organizations (mostly television stations, including some from overseas) had claimed the content of his video through YouTube’s Content ID system. That video? A Google+ hangout where he played NASA videos of the Mars landing. Shonky rights verification is a problem, as Google pays ad royalties to those who claim the rights, creating incentives to lie. And as Google doesn’t pay any royalties while material is disputed and the dispute is unresolved, it’s not really in Google’s interest to make this work either. (via Andy Baio)
Four short links: 23 March 2012

Caching Pages, Node NLP, Digital Natives are Clueless, and Wal-Mart Loves Your Calendar

  1. Cache Them If You Can (Steve Souders) — the percentage of resources that are cacheable has increased 4% during the past year. Over that same time the number of requests per page has increased 12% and total transfer size has increased 24%.
  2. Natural — MIT-licensed general natural language facility for nodejs. Tokenizing, stemming, classification, phonetics, tf-idf, WordNet, string similarity, and some inflection are currently supported. (via Javascript Weekly)
  3. How Millennials Search — Statistically significant findings suggest that millennial generation Web searchers proceed erratically through an information search process, make only a limited attempt to evaluate the quality or validity of information gathered, and may perform some level of ‘backfilling’ or adding sources to a research project before final submission of the work. Never let old people tell you that “digital natives” actually know what they’re doing.
  4. Walmart Buys A Facebook App for Calendar Access (Ars Technica) — The Social Calendar app and its file of 110 million birthdays and other events, acquired from Newput Corp., will give Walmart the ability to expand its efforts to dig deeper into the lives of customers. Interesting to think that by buying a well-loved app, a company could get access to your Facebook details whether you Like them or not.
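Natural’s feature list above includes tf-idf. As a rough illustration of what that weighting means, here is a minimal Python sketch of the textbook formulation: term frequency in one document times the log of (number of documents / number of documents containing the term). It is not Natural’s API and not necessarily the exact weighting Natural uses; `tokenize` and `tf_idf` are hypothetical helpers, and the three-sentence corpus is made up for illustration.

```python
import math
import re


def tokenize(text: str) -> list[str]:
    # Hypothetical naive tokenizer (not Natural's): lowercase, keep runs of letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf(term: str, doc: list[str], corpus: list[list[str]]) -> float:
    # Term frequency: share of this document's tokens that are the term.
    tf = doc.count(term) / len(doc) if doc else 0.0
    # Document frequency: how many documents contain the term at least once.
    df = sum(1 for d in corpus if term in d)
    if df == 0:
        return 0.0
    # Inverse document frequency: a term found in every document gets log(1) = 0.
    idf = math.log(len(corpus) / df)
    return tf * idf


if __name__ == "__main__":
    docs = [tokenize(t) for t in ["The cat sat.", "The dog sat.", "The cat ran."]]
    for term in ("the", "cat", "sat"):
        print(term, round(tf_idf(term, docs[0], docs), 4))
```

The point of the idf factor is visible in the example: “the” appears in every document and so scores zero, while rarer words like “cat” carry weight.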
Four short links: 6 September 2011

Javascript Primitives, Test Backups, Learn Triples, and Scale Javascript

  1. The Secret Life of Javascript Primitives — good writing and clever headlines can make even the dullest topic seem interesting. This is interesting, I hasten to add.
  2. Backup Bouncer — software to test how effective your backup tools are: you copy files to a test area by whatever means you like, then run this tool to see whether permissions, flags, owners, contents, timestamps, etc. are preserved. (via Joshua Schachter)
  3. reVerb — open source (GPLv3) toolkit for learning triples from text. See the paper for more details.
  4. Patterns for Large-Scale Javascript Architecture — enterprise (aka “scalable”) architectures for Javascript apps.
Four short links: 21 March 2011

Javascript Master Class, Stats for Pythonistas, CAM Floor, and HTML Extraction

  1. Javascript Trie Performance Analysis (John Resig) — if you program in Javascript and you’re not up to John’s skill level (*cough*) then you should read this and follow along. It’s a ride-along in the brain of a master.
  2. Think Stats — an introduction to statistics for Python programmers. (via Edd Dumbill)
  3. Bolefloor — they build curvy wooden floors. Instead of straightening naturally curvy wood (which is wasteful), they use CV and CAD/CAM to figure the smallest cuts to slot strips of wood together. It’s gorgeous, green, and geeky. (via BoingBoing)
  4. Extracting Article Text from HTML Documents — everyone’s doing it, now you know how. It’s the theory behind the lovingly hand-crafted magic of readability. (via Hacker News)
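John Resig’s post digs into how tries perform for word lookups in JavaScript. For readers who haven’t met the data structure, this is a minimal Python sketch of a plain, uncompressed trie: one node per character, with shared prefixes stored once. The class and method names are hypothetical, and it makes no attempt at the performance tuning his analysis is concerned with.

```python
class TrieNode:
    """One node per character; children are keyed by the next character."""

    __slots__ = ("children", "is_word")

    def __init__(self) -> None:
        self.children: dict[str, "TrieNode"] = {}
        self.is_word = False  # True if a stored word ends at this node


class Trie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            # Create the child on first use; words sharing a prefix reuse existing nodes.
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def _walk(self, s: str):
        # Follow s character by character; None means no stored word has this prefix.
        node = self.root
        for ch in s:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word: str) -> bool:
        node = self._walk(word)
        return node is not None and node.is_word

    def has_prefix(self, prefix: str) -> bool:
        return self._walk(prefix) is not None


if __name__ == "__main__":
    t = Trie()
    for w in ("car", "cart", "cat"):
        t.insert(w)
    print(t.contains("car"), t.contains("ca"), t.has_prefix("ca"))
```

Prefix queries like `has_prefix` are the usual reason to reach for a trie in word games; a plain hash set answers exact-match lookups just as well.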
Four short links: 4 January 2011

100 Trends, Mobile to Web, Geometry Fun, and C# NLP Tools

  1. 100 Things to Watch in 2011 — people who consider tech trends without considering social trends are betting on the atom bomb without considering the Summer of Love. (via Fred Wilson)
  2. Mobile Economics will Trend Towards Web Economics (Fred Wilson) — A central issue with the Internet, no matter what device and presentation layer you use to access it, is that there is an unlimited amount of content available. Evan Williams calls it “a web of infinite information” in this chat with Om Malik. What is valuable is filtering and curation. Restricting access to content doesn’t work. Someone else’s content will get filtered and curated instead of yours. Scarcity is not a viable business model on the Internet.
  3. Magic Tile — geometric and topological analogues of Rubik’s Cube. Mindblowing fun with math.
  4. SharpNLP — open source C# NLP tools.