"speech recognition" entries

Four short links: 16 February 2016

Full-on Maker, Robot Recap, Decoding Mandarin, and Sequencing Birds

  1. Washers and Screws (YouTube) — this chap is making his own clock from scratch, and here he is making his own washers and screws. Sometimes another person’s obsession can be calming. (via Greg Sadetsky)
  2. ROScon 2015 Recap with Videos (Robohub) — Shuttleworth suggests that robotics developers really need two things at this point: a robust Internet of Things infrastructure, followed by the addition of dynamic mobility that robots represent. However, software is a much more realistic business proposition for a robotics startup, especially if you leverage open source to create a developer community around your product and let others innovate through what you’ve built.
  3. Getting Deep Speech to Work in Mandarin (Baidu SVAIL) — TIL that some of the preprocessing traditionally used in speech-to-text systems throws away pitch information necessary to decode tonal languages like Mandarin. Deep Speech doesn’t use specialized features like MFCCs. We train directly from the spectrogram of the input audio signal. The spectrogram is a fairly general representation of an audio signal. The neural network is able to learn directly which information is relevant from the input, so we didn’t need to change anything about the features to move from English speech recognition to Mandarin speech recognition. Their model works better than humans at decoding short text such as queries.
  4. Sequencing Genomes of All Known Kakapo — TIL there’s a project to sequence genomes of 10,000 bird species and that there’s this crowdfunded science project to sequence the kakapo genome. There are only 125 left, and conservationists expect to use the sequenced genomes to ensure rare genes are preserved. Every genome in this species could be sequenced … I’m boggling. (via Duke)
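The spectrogram point in the Deep Speech item can be made concrete with a small sketch. This is not Baidu's pipeline; it is a minimal, standard-library-only illustration, assuming a Hann window, a naive DFT, and hypothetical helper names (`spectrogram`, `peak_frequency`). It shows that a plain spectrogram keeps pitch movement visible from frame to frame, which is the kind of information a tonal language depends on.

```python
import cmath
import math


def spectrogram(samples, frame_len=64, hop=32):
    """Return a list of frames; each frame holds one magnitude per frequency bin."""
    # Hann window reduces spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        # Naive DFT, keeping only the non-negative frequency bins.
        mags = []
        for k in range(frame_len // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


def peak_frequency(mags, sample_rate, frame_len):
    """Frequency in Hz of the strongest bin in one frame."""
    k = max(range(len(mags)), key=mags.__getitem__)
    return k * sample_rate / frame_len


# Synthetic signal (an assumption for illustration): a 500 Hz tone
# followed by a 1000 Hz tone, sampled at 8 kHz.
sample_rate = 8000
low = [math.sin(2 * math.pi * 500 * t / sample_rate) for t in range(256)]
high = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]

spec = spectrogram(low + high)
first_peak = peak_frequency(spec[0], sample_rate, 64)
last_peak = peak_frequency(spec[-1], sample_rate, 64)
print(first_peak, last_peak)
```

The first and last frames peak at different frequencies, so the change in pitch survives in the representation that a network would be trained on. A production system would use an FFT library rather than the quadratic-time DFT used here for self-containment.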
Four short links: 3 September 2015

Lock Patterns, Peer-to-Peer Markets, Community Products, and Speech Recognition

  1. The Surprising Predictability of Android Lock Patterns (Ars Technica) — people use the same type of strategy for remembering a pattern as a password
  2. Peer to Peer Markets (PDF) — We discuss elements of market design that make this possible, including search and matching algorithms, pricing, and reputation systems. We then develop a simple model of how these markets enable entry by small or flexible suppliers, and the resulting impact on existing firms. Finally, we consider the regulation of peer-to-peer markets, and the economic arguments for different approaches to licensing and certification, data, and employment regulation.
  3. 16 Product Things I learned at Imgur — You can A/B test individuals, but it’s nearly impossible to A/B test communities because they work based on a mutually reinforcing self-conception. Use a combination of intuition (which comes from experience), talking to other community managers and 1:1 contact with a sample of your community. But you’ll still be wrong a lot.
  4. kaldi — a toolkit for speech recognition written in C++ and licensed under the Apache License v2.0
Four short links: 10 February 2015

Speech Recognition, Predictive Analytic Queries, Video Chat, and Javascript UI Library

  1. The Uncanny Valley of Speech Recognition (Zach Holman) — I’m reminded of driving up US-280 in 2003 or so with @raelity, a Kiwi and a South African trying every permutation of American accent from Kentucky to Yosemite Sam in order to get TellMe to stop giving us the weather for zipcode 10000. It didn’t recognise the swearing either. (Caution: features similarly strong language.)
  2. TuPAQ: An Efficient Planner for Large-scale Predictive Analytic Queries (PDF) — an integrated PAQ [Predictive Analytic Queries] planning architecture that combines advanced model search techniques, bandit resource allocation via runtime algorithm introspection, and physical optimization via batching. The resulting system, TUPAQ, solves the PAQ planning problem with comparable accuracy to exhaustive strategies but an order of magnitude faster, and can scale to models trained on terabytes of data across hundreds of machines.
  3. p2pvc — point-to-point video chat. In an 80×25 terminal window.
  4. Sortable — nifty UI library.

In the future we'll be talking, not typing

Stephan Spencer on how autonomous intelligence and language processing will transform search.

Stephan Spencer, co-author of "The Art of SEO," says searching the Internet of the future will be like talking to a human being.

Big Data shakes up the Speech Industry

I spent a few hours at the Mobile Voice conference and left with an appreciation of Google's impact on the speech industry. Google's speech offerings loomed over the few sessions I attended. Some of that was probably due to Michael Cohen's keynote describing Google's philosophy and approach, but clearly Google has the attention of all the speech vendors. Tim's recent…

The State of the Internet Operating System

Ask yourself for a moment, what is the operating system of a Google or Bing search? What is the operating system of a mobile phone call? What is the operating system of maps and directions on your phone? What is the operating system of a tweet? I’ve been talking for years about “the internet operating system”, but I realized I’ve never written an extended post to define what I think it is, where it is going, and the choices we face. This is that missing post.