"NLP" entries

Four short links: 22 April 2016

Unicorn Hazards Ahead, Brainprinting for Identity, Generating News Headlines, and Anthropic Capitalism

by Nat Torkington | @gnat | +Nat Torkington | April 22, 2016

Why The Unicorn Financing Market Just Became Dangerous to Everyone — read with Fortune’s take on the Tech IPO Market. “They profess to take a long-term view, but the data shows post-IPO stocks are very volatile in the case of tech IPOs, and that is not a problem the underwriters try to address.” Damning breakdown of the current state. As Bryce said, “Single-horned, majestic, Weapons of Mass Extraction.”
Brainprints (Kurzweil) — 50 subjects, 500 images, EEG headset, 100% accuracy identifying person from their brain’s response to the images. We’ll need much larger studies, but this is promising.
Generating News Headlines with Recurrent Neural Networks — We find that the model is quite effective at concisely paraphrasing news articles.
Anthropic Capitalism And The New Gimmick Economy — market capitalism struggles with “public goods” (those which are inexhaustible and non-excludable, like infinitely copyable bits that any number of people can have copies of at once), yet much of the world is being recast as an activity where software manipulates information, thus becoming a public good. Capitalism and Communism, which briefly resembled victor and vanquished, increasingly look more like Thelma and Louise; a tragic couple sent over the edge by forces beyond their control. What comes next is anyone’s guess and the world hangs in the balance.

Four short links: 30 March 2016

Deep Babbage, Supervisors in Go, Brittle Code, and Quantum NLP

by Nat Torkington | @gnat | +Nat Torkington | March 30, 2016

Deep Learning for Analytical Engine — This repository contains an implementation of a convolutional neural network as a program for Charles Babbage’s Analytical Engine, capable of recognizing handwritten digits to a high degree of accuracy (98.5% if provided with a sufficient amount of training data and left running sufficiently long).
Supervisor Trees in Go — A well-structured Erlang program is broken into multiple independent pieces that communicate via messages, and when a piece crashes, the supervisor of that piece automatically restarts it. […] Even as I have been writing suture, I have on occasion been astonished to flip my screen over to the console of a Go program I’ve written with suture, and been surprised to discover that it’s actually been merrily crashing away during my manual testing, but soldiering on so well I didn’t even know.
How to Avoid Brittle Code — If it hurts, do it more often.
Developing Quantum Annealer Driven Data Discovery (Joseph Dulny III, Michael Kim) — In this paper, we gain novel insights into the application of quantum annealing (QA) to machine learning (ML) through experiments in natural language processing (NLP), seizure prediction, and linear separability testing.

Four short links: 28 March 2016

Holoportation, Filter Your Bot, Curriculum for the Future, and Randomized Control Trials for Policy

by Nat Torkington | @gnat | +Nat Torkington | March 28, 2016

Holoportation (YouTube) — video of teleconferencing with the Hololens. I hope my avatar wears more pants than I do.
Wordfilter — package to filter out slurs and the kinds of things you don’t want your bot saying on Twitter. (via How Not to Make a Racist Bot)
Curriculum For the Future (iTunes) — in game form, you get to figure out how to sell your preferred curriculum (“maker!”) to the parents and politicians who care about different things. Similar game mechanic to Win the White House from Sandra Day O’Connor’s iCivics.
Test, Learn, Adapt: Developing Public Policy with Randomized Controlled Trials (PDF) — 2012 paper from the UK Cabinet Office talking about running real randomized controlled trials of policy. (I’d like to be part of one that looks at better health care!)
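
The Wordfilter item above is about stopping a bot from posting text that contains unwanted terms. A minimal sketch of that idea, assuming a case-insensitive substring match against a blocklist; `is_blocked`, `safe_to_post`, and the placeholder term are hypothetical names for illustration, not the package's actual API or word list:

```python
def is_blocked(text, blocklist):
    """Return True if any blocklisted term occurs anywhere in the text.

    Matching is a case-insensitive substring check (an assumption made
    for this sketch), so a term also matches inside longer words.
    """
    lowered = text.lower()
    return any(term.lower() in lowered for term in blocklist)


def safe_to_post(candidates, blocklist):
    """Keep only the candidate messages that contain no blocklisted term."""
    return [text for text in candidates if not is_blocked(text, blocklist)]


if __name__ == "__main__":
    # Placeholder term; a real list would hold the actual words to avoid.
    blocklist = ["badword"]
    drafts = ["hello world", "this one has a BADWORD in it"]
    print(safe_to_post(drafts, blocklist))
```

Substring matching errs toward blocking too much, which is usually the safer failure mode for an unattended bot.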

Four short links: 21 March 2016

Legacy Tech, Gender Prediction, Text Generation, and Human Performance

by Nat Torkington | @gnat | +Nat Torkington | March 21, 2016

Ten More Years! — my brand spanking new chip card from a UK issuer not only arrived with a 2000s app of a 1990s implementation of a 1980s product (debit) on a 1970s chip, it also came with a 1960s magnetic stripe on it and a 1950s PAN with a 1940s signature panel on the back. It’s no wonder it seems a little out of place in the modern world.
Age and Gender Classification Using Convolutional Neural Nets — oh, this will end well.
The Uncanny Valley of Words (Ross Goodwin) — lessons learned from an NYU ITP neural networker making poetry and surprises from text.
The Paradox of Human Performance (YouTube) — Human dexterity and agility vastly exceed that of contemporary robots. Yet, humans have vastly slower “hardware” (e.g. muscles) and “wetware” (e.g. neurons). How can this paradox be resolved? Slow actuators and long communication delays require predictive control based on some form of internal model—but what form? (via Robohub)

Four short links: 17 March 2016

Boozy Tweets, Quantum Games, Viva Vectors, and Being Fired

by Nat Torkington | @gnat | +Nat Torkington | March 17, 2016

Algorithm Identifies Tweets Sent Under the Influence of Alcohol (MIT TR) — notable for how they determined whether a Tweet was sent from home. They made a list of phrases like “home at last!” and had MTurkers confirm the Tweets were about being home, then used those as training data for an algorithm to identify other Tweets talking about home.
Puzzle Game to Help Program Quantum Computers (New Scientist) — Devitt has turned the problem of programming a quantum computer into a game called meQuanics. His team has developed a prototype to test the game, which you can play now, and today launched a Kickstarter campaign to fund a fully fledged version for iOS and Android phones.
Deep or Shallow, NLP is Breaking Out (ACM) — readable roundup of how NLP changed in the last five years, with a useful list for further reading and watching.
Firing and Being Fired (Zach Holman) — advice for the fired, the firing, and the coworkers. All solid.

Four short links: 10 March 2016

Cognitivist and Behaviourist AI, Math and Social Computing, A/B Testing Stats, and Rat Cyborgs are Smarter

by Nat Torkington | @gnat | +Nat Torkington | March 10, 2016

Crossword-Solving Neural Networks — Hill describes recent progress in learning-based AI systems in terms of behaviourism and cognitivism: two movements in psychology that affect how one views learning and education. Behaviourism, as the name implies, looks at behaviour without looking at what the brain and neurons are doing, while cognitivism looks at the mental processes that underlie behaviour. Deep learning systems like the one built by Hill and his colleagues reflect a cognitivist approach, but for a system to have something approaching human intelligence, it would have to have a little of both. “Our system can’t go too far beyond the dictionary data on which it was trained, but the ways in which it can are interesting, and make it a surprisingly robust question and answer system – and quite good at solving crossword puzzles,” said Hill. While it was not built with the purpose of solving crossword puzzles, the researchers found that it actually performed better than commercially-available products that are specifically engineered for the task.
Mathematical Foundations for Social Computing (PDF) — collection of pointers to existing research in social computing and some open challenges for work to be done. Consider situations where a highly structured decision must be made. Some examples are making budgets, assigning water resources, and setting tax rates. […] One promising candidate is “Knapsack Voting.” […] This captures most budgeting processes — the set of chosen budget items must fit under a spending limit, while maximizing societal value. Goel et al. prove that asking users to compare projects in terms of “value for money” or asking them to choose an entire budget results in provably better properties than using the more traditional approaches of approval or rank-choice voting.
Power, Minimal Detectable Effect, and Bucket Size Estimation in A/B Tests (Twitter) — This post describes how Twitter’s A/B testing framework, DDG, addresses one of the most common questions we hear from experimenters, product managers, and engineers: how many users do we need to sample in order to run an informative experiment?
Intelligence-Augmented Rat Cyborgs in Maze Solving (PLoS) — We compare the performance of maze solving by computer, by individual rats, and by computer-aided rats (i.e. rat cyborgs). They were asked to find their way from a constant entrance to a constant exit in 14 diverse mazes. Performance of maze solving was measured by steps, coverage rates, and time spent. The experimental results with six rats and their intelligence-augmented rat cyborgs show that rat cyborgs have the best performance in escaping from mazes. These results provide a proof-of-principle demonstration for cyborg intelligence. In addition, our novel cyborg intelligent system (rat cyborg) has great potential in various applications, such as search and rescue in complex terrains.
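
The A/B testing item above asks how many users an experiment needs. A rough sketch of one textbook answer, assuming a two-sided test on conversion rates with the normal approximation and equal bucket sizes. This is not a description of how Twitter's DDG computes it, and the function name and defaults are illustrative assumptions:

```python
import math
from statistics import NormalDist


def users_per_bucket(baseline_rate, min_detectable_effect, alpha=0.05, power=0.8):
    """Approximate users needed in each bucket of a two-bucket A/B test.

    baseline_rate: conversion rate in the control bucket (e.g. 0.10).
    min_detectable_effect: smallest absolute lift worth detecting (e.g. 0.02).
    alpha: two-sided false-positive rate; power: chance of detecting the lift.
    """
    p1 = baseline_rate
    p2 = baseline_rate + min_detectable_effect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)       # summed Bernoulli variances
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Halving the detectable effect roughly quadruples the required bucket size.
    print(users_per_bucket(0.10, 0.02))
    print(users_per_bucket(0.10, 0.01))
```

The inverse-square dependence on the effect size is the main practical point: small effects need very large buckets.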

Four short links: 7 March 2016

Trajectory Data Mining, Manipulating Search Rankings, Open Source Data Exploration, and a Linter for Prose

by Nat Torkington | @gnat | +Nat Torkington | March 7, 2016

Trajectory Data Mining: An Overview (Paper a Day) — This is the data created by a moving object, as a sequence of locations, often with uncertainty around the exact location at each point. This could be GPS trajectories created by people or vehicles, spatial trajectories obtained via cell phone tower IDs and corresponding transmission times, the moving trajectories of animals (e.g. birds) fitted with trackers, or even data concerning natural phenomena such as hurricanes and ocean currents. It turns out, there’s a lot to learn about working with such data!
Search Engine Manipulation Effect (PNAS) — Internet search rankings have a significant impact on consumer choices, mainly because users trust and choose higher-ranked results more than lower-ranked results. Given the apparent power of search rankings, we asked whether they could be manipulated to alter the preferences of undecided voters in democratic elections. They could. Read the article for their methodology. (via Aeon)
Keshif — open source interactive data explorer.
proselint — analyse text for sins of usage and abusage.

Four short links: 24 February 2016

UX Metrics, Page Scraping, IoT Pain, and NLP + Deep Learning

by Nat Torkington | @gnat | +Nat Torkington | February 24, 2016

Critical Metric: Critical Responses (Steve Souders) — new UX-focused metrics […] Start Render and Speed Index.
Automatically Scrape and Import a Table in Google Spreadsheets (Zach Klein) — =ImportHtml("URL", "table", num) where the second argument is the kind of element to import (“table” or “list”), and num is the number of the element in case there are multiple on the page. Bam!
Getting Visibility on the iBeacon Problem (Brooklyn Museum) — the Internet of Things is great, but I wouldn’t want to have to update its firmware. As we started to troubleshoot beacon issues, we wanted a clean slate. This meant updating the firmware on all the beacons, checking the battery life, and turning off the advanced power settings that Estimote provides. This was a painstakingly manual process where I’d have to go and update each unit one-by-one. In some cases, I’d use Estimote’s cloud tool to pre-select certain actions, but I’d still have to walk to each unit to execute the changes and use of the tool hardly made things faster. Perhaps when every inch of the world is filled with sensors, Google Street View cars will also beam out firmware updates.
NLP Meets Deep Learning — easy to follow slide deck talking about how deep learning is tackling NLP problems.
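
For readers outside a spreadsheet, the ImportHtml item above can be approximated in a few lines. A minimal sketch using only the Python standard library, assuming the page has no nested tables and that tables are numbered from 1 as in the spreadsheet formula; `TableExtractor` and `import_html_table` are hypothetical names, and fetching the page is left out:

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the cell text of every <table> in a page (nested tables not handled)."""

    def __init__(self):
        super().__init__()
        self.tables = []   # each table is a list of rows; each row a list of strings
        self._row = None   # row currently being filled, or None outside a <tr>
        self._cell = None  # text fragments of the current cell, or None outside one

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def import_html_table(html, num):
    """Return the num-th table on the page (1-based) as a list of rows."""
    parser = TableExtractor()
    parser.feed(html)
    return parser.tables[num - 1]


if __name__ == "__main__":
    page = "<table><tr><th>Name</th><th>Qty</th></tr><tr><td>Tea</td><td>2</td></tr></table>"
    for row in import_html_table(page, 1):
        print(row)
```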

Four short links: 30 December 2015

Bitcoin Patents, Wall-Climbing Robot, English 2 Code, and Decoding USB

by Nat Torkington | @gnat | +Nat Torkington | December 30, 2015

Bank of America Loading up on Bitcoin Patents — The wide-ranging patents cover everything from a “cryptocurrency transaction payment system” which would let users make transactions using cryptocurrency, to risk detection, storing cryptocurrencies offline, and using the blockchain to measure fraudulent activity.
Vertigo: A Wall-Climbing Robot (Disney Research) — watch the video. YOW! (via David Pescovitz)
Synthesizing What I Mean — In this paper, we describe SWIM, a tool which suggests code snippets given API-related natural language queries.
serialusb — this is how you decode USB protocols.

Four short links: 29 October 2014

Tweet Parsing, Focus and Money, Challenging Open Data Beliefs, and Exploring ISP Data

by Nat Torkington | @gnat | +Nat Torkington | October 29, 2014

TweetNLP — CMU open source natural language parsing tools for making sense of Tweets.
Interview with Google X Life Science’s Head (Medium) — I will have been here two years this March. In nineteen months we have been able to hire more than a hundred scientists to work on this. We’ve been able to build customized labs and get the equipment to make nanoparticles and decorate them and functionalize them. We’ve been able to strike up collaborations with MIT and Stanford and Duke. We’ve been able to initiate protocols and partnerships with companies like Novartis. We’ve been able to initiate trials like the baseline trial. This would be a good decade somewhere else. The power of focus and money.
Schooloscope Open Data Post-Mortem — The case of Schooloscope and the wider question of public access to school data challenges the belief that sunlight is the best disinfectant, that government transparency would always lead to better government, better results. It challenges the sentiments that see data as value-neutral and its representation as devoid of politics. In fact, access to school data exposes a sharp contrast between the private interest of the family (best education for my child) and the public interest of the government (best education for all citizens).
M-Lab Observatory — explorable data on the data experience (RTT, upload speed, etc) across different ISPs in different geographies over time.