"machine learning" entries

Four short links: 2 March 2015

Onboarding UX, Productivity Vision, Bad ML, and Lifelong Learning

  1. User Onboarding Teardowns — the UX of new users. (via Andy Baio)
  2. Microsoft’s Productivity Vision — always-on thinged-up Internet everywhere, with predictions and magic by the dozen.
  3. Machine Learning Done Wrong — When dealing with small amounts of data, it’s reasonable to try as many algorithms as possible and to pick the best one since the cost of experimentation is low. But as we hit “big data,” it pays off to analyze the data upfront and then design the modeling pipeline (pre-processing, modeling, optimization algorithm, evaluation, productionization) accordingly.
  4. Ten Simple Rules for Lifelong Learning According to Richard Hamming (PLoScompBio) — Exponential growth of the amount of knowledge is a central feature of the modern era. As Hamming points out, since the time of Isaac Newton (1642/3-1726/7), the total amount of knowledge (including but not limited to technical fields) has doubled about every 17 years. At the same time, the half-life of technical knowledge has been estimated to be about 15 years. If the total amount of knowledge available today is x, then in 15 years the total amount of knowledge can be expected to be nearly 2x, while the amount of knowledge that has become obsolete will be about 0.5x. This means that the total amount of knowledge thought to be valid has increased from x to nearly 1.5x. Taken together, this means that if your daughter or son was born when you were 34 years old, the amount of knowledge she or he will be faced with on entering university at age 17 will be more than twice the amount you faced when you started college.
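Hamming’s arithmetic in the item above can be sketched in a few lines. The doubling period (17 years) and half-life (15 years) are the estimates quoted in the article; the function names are mine:

```python
def total_knowledge(x, years, doubling_period=17):
    """Total knowledge over time, doubling every ~17 years (Hamming's estimate)."""
    return x * 2 ** (years / doubling_period)

def original_still_valid(x, years, half_life=15):
    """How much of today's knowledge is still valid, with a ~15-year half-life."""
    return x * 0.5 ** (years / half_life)

# After 15 years: total knowledge is "nearly 2x"...
print(round(total_knowledge(1.0, 15), 2))       # ~1.84
# ...while half of today's knowledge has become obsolete
print(round(original_still_valid(1.0, 15), 2))  # 0.5
```

Subtracting the obsolete 0.5x from the nearly-2x total gives the “nearly 1.5x” of still-valid knowledge the article describes.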
Four short links: 25 February 2015

Bricking Cars, Mapping Epigenome, Machine Learning from Encrypted Data, and Phone Privacy

  1. Remotely Bricking Cars (BoingBoing) — story from 2010 where an intruder illegally accessed Texas Auto Center’s Web-based remote vehicle immobilization system and one by one began turning off their customers’ cars throughout the city.
  2. Beginning to Map the Human Epigenome (MIT) — Kellis and his colleagues report 111 reference human epigenomes and study their regulatory circuitry, in a bid to understand their role in human traits and diseases. (The paper itself.)
  3. Machine Learning Classification over Encrypted Data (PDF) — It is worth mentioning that our work on privacy-preserving classification is complementary to work on differential privacy in the machine learning community. Our work aims to hide each user’s input data to the classification phase, whereas differential privacy seeks to construct classifiers/models from sensitive user training data that leak a bounded amount of information about each individual in the training data set. See also The Morning Paper’s unpacking of it.
  4. Privacy of Phone Audio (Reddit) — unconfirmed report from a Redditor: I started a new job today with Walk N’Talk Technologies. I get to listen to sound bites and rate how the text matches up with what is said in an audio clip and give feedback on what should be improved. At first, I thought these sound bites were completely random. Then I began to notice a pattern. Soon, I realized that I was hearing people’s commands given to their mobile devices. Guys, I’m telling you, if you’ve said it to your phone, it’s been recorded…and there’s a damn good chance a 3rd party is going to hear it.
Four short links: 24 February 2015

Open Data, Packet Dumping, GPU Deep Learning, and Genetic Approval

  1. Wiki New Zealand — open data site, and check out the chart builder behind the scenes for importing the data. It’s magic.
  2. stenographer (Google) — open source packet dumper for capturing data during intrusions.
  3. Which GPU for Deep Learning? — a lot of numbers. Overall, I think memory size is overrated. You can nicely gain some speedups if you have very large memory, but these speedups are rather small. I would say that GPU clusters are nice to have, but that they cause more overhead than they accelerate progress; a single 12GB GPU will last you for 3-6 years; a 6GB GPU is plenty for now; a 4GB GPU is good but might be limiting on some problems; and a 3GB GPU will be fine for most research that looks into new architectures.
  4. 23andMe Wins FDA Approval for First Genetic Test — as they re-enter the market after FDA power play around approval (yes, I know: one company’s power play is another company’s flouting of safeguards designed to protect a vulnerable public).
Four short links: 20 February 2015

Robotic Garden, Kids Toys, MSFT ML, and Twitter Scale

  1. The Distributed Robotic Garden (MIT) — We consider plants, pots, and robots to be systems with different levels of mobility, sensing, actuation, and autonomy. (via Robohub)
  2. CogniToys Leverages Watson’s Brain to Befriend, Teach Your Kids (IEEE) — Through the dino, Watson’s algorithms can get to know each child that it interacts with, tailoring those interactions to the child’s age and interests.
  3. How Machine Learning Ate Microsoft (Infoworld) — Azure ML didn’t merely take the machine learning algorithms MSR had already handed over to product teams and stick them into a drag-and-drop visual designer. Microsoft has made the functionality available to developers who know the R statistical programming language and Python, which together are widely used in academic machine learning. Microsoft plans to integrate Azure ML closely with Revolution Analytics, the R startup it recently acquired.
  4. Handling Five Billion Sessions a Day in Real Time (Twitter) — infrastructure porn.

Exploring methods in active learning

Tips on how to build effective human-machine hybrids, from crowdsourcing expert Adam Marcus.

In a recent O’Reilly webcast, “Crowdsourcing at GoDaddy: How I Learned to Stop Worrying and Love the Crowd,” Adam Marcus explains how to mitigate common challenges of managing crowd workers, how to make the most of human-in-the-loop machine learning, and how to establish effective and mutually rewarding relationships with workers. Marcus is the director of data on the Locu team at GoDaddy, where the “Get Found” service provides businesses with a central platform for managing their online presence and content.

In the webcast, Marcus uses practical examples from his experience at GoDaddy to reveal helpful methods for how to:

  • Offset the inevitability of wrong answers from the crowd
  • Develop and train workers through a peer-review system
  • Build a hierarchy of trusted workers
  • Make crowd work inspiring and enable upward mobility

What to do when humans get it wrong

It turns out there is a simple way to offset human error: redundantly ask several people the same question. Marcus explains that when you ask five different people the same question, there are creative ways to combine their responses — the simplest being a majority vote. Read more…
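The majority-vote idea can be sketched in a few lines. This is a minimal illustration, not Marcus’s implementation; the tie-breaking rule and the example labels are mine:

```python
from collections import Counter

def majority_vote(answers):
    """Combine redundant crowd answers into one; ties go to the
    answer seen first (Counter.most_common preserves insertion order)."""
    return Counter(answers).most_common(1)[0][0]

# Five workers label the same item; one disagrees with the rest
labels = ["cafe", "cafe", "café", "cafe", "cafe"]
print(majority_vote(labels))  # cafe
```

In practice, a plain vote is often replaced by a weighted vote, where answers from workers with better track records count for more.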

Four short links: 10 February 2015

Speech Recognition, Predictive Analytic Queries, Video Chat, and Javascript UI Library

  1. The Uncanny Valley of Speech Recognition (Zach Holman) — I’m reminded of driving up US-280 in 2003 or so with @raelity, a Kiwi and a South African trying every permutation of American accent from Kentucky to Yosemite Sam in order to get TellMe to stop giving us the weather for zipcode 10000. It didn’t recognise the swearing either. (Caution: features similarly strong language.)
  2. TuPAQ: An Efficient Planner for Large-scale Predictive Analytic Queries (PDF) — an integrated PAQ [Predictive Analytic Queries] planning architecture that combines advanced model search techniques, bandit resource allocation via runtime algorithm introspection, and physical optimization via batching. The resulting system, TUPAQ, solves the PAQ planning problem with comparable accuracy to exhaustive strategies but an order of magnitude faster, and can scale to models trained on terabytes of data across hundreds of machines.
  3. p2pvc — point-to-point video chat. In an 80×25 terminal window.
  4. Sortable — nifty UI library.
Four short links: 6 February 2015

Active Learning, Tongue Sensors, Cybernetic Management, and HTML5 Game Publishing

  1. Real World Active Learning — the point at which algorithms fail is precisely where there’s an opportunity to insert human judgment to actively improve the algorithm’s performance. An O’Reilly report with CrowdFlower.
  2. Hearing With Your Tongue (BoingBoing) — The tongue contains thousands of nerves, and the region of the brain that interprets touch sensations from the tongue is capable of decoding complicated information. “What we are trying to do is another form of sensory substitution,” Williams said.
  3. The Art of Management — cybernetics and management.
  4. kiwi.js — a mobile & desktop browser-based HTML5 game framework. It uses CocoonJS for publishing to the AppStore.

Human-in-the-loop machine learning

Practical machine-learning applications and strategies from experts in active learning.

What do you call a practice that most data scientists have heard of, few have tried, and even fewer know how to do well? It turns out, no one is quite certain what to call it. In our latest free report Real-World Active Learning: Applications and Strategies for Human-in-the-Loop Machine Learning, we examine the relatively new field of “active learning” — also referred to as “human computation,” “human-machine hybrid systems,” and “human-in-the-loop machine learning.” Whatever you call it, the field is exploding with practical applications that are proving the efficiency of combining human and machine intelligence.

Learn from the experts

Through in-depth interviews with experts in the field of active learning and crowdsource management, industry analyst Ted Cuzzillo reveals top tips and strategies for using short-term human intervention to actively improve machine models. As you’ll discover, the point at which a machine model fails is precisely where there’s an opportunity to insert — and benefit from — human judgment.
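A common way to find “the point at which a machine model fails” is uncertainty sampling: route the predictions the model is least sure about to human labelers. A minimal sketch — the margin-based scoring and the toy probabilities below are illustrative, not taken from the report:

```python
import numpy as np

def most_uncertain(probs, k=3):
    """Return indices of the k items whose top two class probabilities
    are closest together -- the items to route to human labelers."""
    margin = probs.max(axis=1) - np.sort(probs, axis=1)[:, -2]
    return np.argsort(margin)[:k]

# Model confidence for five unlabeled items over two classes
probs = np.array([[0.95, 0.05],
                  [0.51, 0.49],   # nearly a coin flip -> ask a human
                  [0.80, 0.20],
                  [0.50, 0.50],   # pure coin flip -> ask a human
                  [0.99, 0.01]])
print(most_uncertain(probs, k=2))  # [3 1]
```

Labels gathered this way are fed back into training, so each round of human judgment is spent where the model needs it most.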

Find out:

  • When active learning works best
  • How to manage crowdsource contributors (including expert-level contributors)
  • Basic principles of labeling data
  • Best practice methods for assessing labels
  • When to skip the crowd and mine your own data

Explore real-world examples

This report gives you a behind-the-scenes look at how human-in-the-loop machine learning has helped improve the accuracy of Google Maps, match business listings at GoDaddy, rank top search results at Yahoo!, refer relevant job postings to people on LinkedIn, identify expert-level contributors using the Quizz recruitment method, and recommend women’s clothing based on customer and product data at Stitch Fix. Read more…

Four short links: 2 February 2015

Weather Forecasting, Better Topic Modelling, Cyberdefense, and Facebook Warriors

  1. Global Forecast System — National Weather Service open sources its weather forecasting software. Hope you have a supercomputer and all the data to make use of it …
  2. High-reproducibility and high-accuracy method for automated topic classificationLatent Dirichlet allocation (LDA) is the state of the art in topic modeling. Here, we perform a systematic theoretical and numerical analysis that demonstrates that current optimization techniques for LDA often yield results that are not accurate in inferring the most suitable model parameters. Adapting approaches from community detection in networks, we propose a new algorithm that displays high reproducibility and high accuracy and also has high computational efficiency. We apply it to a large set of documents in the English Wikipedia and reveal its hierarchical structure.
  3. Army Open Sources Cyberdefense Codegit push is the new “for immediate release”.
  4. British Army Creates Team of Facebook Warriors (The Guardian) — no matter how much I know the arguments for it, it still feels vile.
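For reference, the standard LDA baseline that the topic-classification paper (item 2) measures itself against can be run in a few lines with scikit-learn; the toy corpus and topic count here are illustrative:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "the cat sat on the mat",
    "dogs and cats make good pets",
    "stock markets fell sharply today",
    "investors sold shares amid market fears",
]

# Bag-of-words counts, then standard variational LDA with two topics
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)  # shape (4, 2): per-document topic mixture
```

The paper’s point is that this kind of optimization can land in poor local optima, so reproducibility across runs is worth checking (e.g., by varying `random_state`).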
Four short links: 19 January 2015

Going Offline, AI Ethics, Human Risks, and Deep Learning

  1. Reset (Rowan Simpson) — It was a bit chilling to go back over a whole year’s worth of tweets and discover how many of them were just junk. Visiting the water cooler is fine, but somebody who spends all day there has no right to talk of being full.
  2. Google’s AI Brain — on the subject of Google’s AI ethics committee … Q: Will you eventually release the names? A: Potentially. That’s something also to be discussed. Q: Transparency is important in this too. A: Sure, sure. Such reassuring.
  3. AVA is now Open Source (Laura Bell) — Assessment, Visualization and Analysis of human organisational information security risk. AVA maps the realities of your organisation, its structures and behaviors. This map of people and interconnected entities can then be tested using a unique suite of customisable, on-demand, and scheduled information security awareness tests.
  4. Deep Learning for Torch (Facebook) — Facebook AI Research open sources faster deep learning modules for Torch, a scientific computing framework with wide support for machine learning algorithms.