"machine learning" entries

Four short links: 8 April 2015

Four short links: 8 April 2015

Learning Poses, Kafkaesque Things, Hiring Research, and Robotic Movement

  1. Apple Patent on Learning-based Estimation of Hand and Finger Pose — machine learning to identify gestures (hand poses) that works even when partially occluded. See writeup in Apple Insider.
  2. The Internet of Kafkaesque Things (ACLU) — As computers are deployed in more regulatory roles, and therefore make more judgments about us, we may be afflicted with many more of the rigid, unjust rulings for which bureaucracies are so notorious.
  3. Schmidt and Hunter (1998): Validity and Utility of Selection Methods in Personnel (PDF) — On the basis of meta-analytic findings, this article examines and summarizes what 85 years of research in personnel psychology has revealed about the validity of measures of 19 different selection methods that can be used in making decisions about hiring, training, and developmental assignments. (via Wired)
  4. Complete Force Control in Constrained Under-actuated Mechanical Systems (Robohub) — Nori focuses on finding ways to advance the dynamic system of a robot – the forces that interact and make the system move. Key to developing dynamic movements in a robot is control, accompanied by the way the robot interacts with the environment. Nori talks us through the latest developments, designs, and formulas for floating-base/constrained mechanical systems, whole-body motion control of humanoid systems, whole-body dynamics computation on the iCub humanoid, and finishes with a video on recent implementations of whole-body motion control on the iCub. Video and download of presentation.
Comment
Four short links: 3 April 2015

Four short links: 3 April 2015

Augmenting Humans, Body-Powered CPUs, Predicting the Future, and Hermit Life

  1. Unpowered Ankle Exoskeleton“As we understand human biomechanics better, we’ve begun to see wearable robotic devices that can restore or enhance human motor performance,” says Collins. “This bodes well for a future with devices that are lightweight, energy-efficient, and relatively inexpensive, yet enhance human mobility.”
  2. Body-Powered Processing (Ars Technica) — The new SAM L21 32-bit ARM family of microcontroller (MCUs) consume less than 35 microamps of power per megahertz of processing speed while active, and less than 200 nanoamps of power overall when in deep sleep mode—with varying states in between. The chip is so low power that it can be powered off energy capture from the body. (via Greg Linden)
  3. Temporal Effects in Trend Prediction: Identifying the Most Popular Nodes in the Future (PLOSone) — We find that TBP have high general accuracy in predicting the future most popular nodes. More importantly, it can identify many potential objects with low popularity in the past but high popularity in the future.
  4. The Shut-In EconomyIn 1998, Carnegie Mellon researchers warned that the Internet could make us into hermits. They released a study monitoring the social behavior of 169 people making their first forays online. The Web-surfers started talking less with family and friends, and grew more isolated and depressed. “We were surprised to find that what is a social technology has such anti-social consequences,” said one of the researchers at the time. “And these are the same people who, when asked, describe the Internet as a positive thing.”
Comment
Four short links: 19 March 2015

Four short links: 19 March 2015

Changing Behaviour, Building Filters, Public Access, and Working Capital

  1. Using Monitoring Dashboards to Change Behaviour[After years of neglect] One day we wrote some brittle Ruby scripts that polled various services. They collated the metrics into a simple database and we automated some email reports and built a dashboard showing key service metrics. We pinpointed issues that we wanted to show people. Things like the login times, how long it would take to search for certain keywords in the app, and how many users were actually using the service, along with costs and other interesting facts. We sent out the link to the dashboard at 9am on Monday morning, before the weekly management call. Within 2 weeks most problems were addressed. It is very difficult to combat data, especially when it is laid out in an easy to understand way.
  2. Quiet Mitsubishi Cars — noise-cancelling on phone calls by using machine learning to build the filters.
  3. NSF Requiring Public AccessNSF will require that articles in peer-reviewed scholarly journals and papers in juried conference proceedings or transactions be deposited in a public access compliant repository and be available for download, reading, and analysis within one year of publication.
  4. Filtered for Capital (Matt Webb) — It’s important to get a credit line [for hardware startups] because growing organically isn’t possible — even if half your sell-in price is margin, you can only afford to grow your batch size at 50% per cycle… and whether it’s credit or re-investing the margin, all that growth incurs risk, because the items aren’t pre-sold. There are double binds all over the place here.
Comment
Four short links: 9 March 2015

Four short links: 9 March 2015

Shareable Audio, Designing Robot Relationships, Machine Learning for Programming, and Geospatial Databases

  1. Four Types of Audio That People Share (Nieman Lab) — Audio Explainers, Whoa! Sounds, Storytellers, and Snappy Reviews, the results of experiments with NPR stations.
  2. Designing the Human-Robot Relationship (O’Reilly) — We can use those same principles [Jakob Nielsen’s usability heuristics] and look for implications of robots serving our higher ordered needs, as we move from serving needs related to convenience or performance to actually supporting our decision making to emerging technologies, moving from being able to do anything or be magic in terms of the user interface to being more human in the user interface.
  3. Machine Learning for General Programming — Peter Norvig talk. What more do you need to know?
  4. Why Are Geospatial Databases So Hard To Build?Algorithms in computer science, with rare exception, leverage properties unique to one-dimensional scalar data models. In other words, data types you can abstractly represent as an integer. Even when scalar data types are multidimensional, they can often be mapped to one dimension. This works well, as the majority of [what] data people care about can be represented with scalar types. If your data model is inherently non-scalar, you enter an algorithm wasteland in the computer science literature.
Comment
Four short links: 5 March 2015

Four short links: 5 March 2015

Web Grain, Cognition and Computation, New Smart Watch, and Assessing Accuracy

  1. The Web’s Grain (Frank Chimero) — What would happen if we stopped treating the web like a blank canvas to paint on, and instead like a material to build with?
  2. Bruce Sterling on Convergence of Humans and MachinesI like to use the terms “cognition” and “computation”. Cognition is something that happens in brains, physical, biological brains. Computation is a thing that happens with software strings on electronic tracks that are inscribed out of silicon and put on fibre board. They are not the same thing, and saying that makes the same mistake as in earlier times, when people said that human thought was like a steam engine.
  3. Smart Pocket Watch — I love to see people trying different design experiences. This is beautiful. And built on Firefox OS!
  4. Knowledge-Based Trust (PDF) — Google research paper on how to assess factual accuracy of web page content. It was bad enough when Google incentivised people to make content-free pages. Next there’ll be a reward for scamming bogus facts into Google’s facts database.
Comment
Four short links: 2 March 2015

Four short links: 2 March 2015

Onboarding UX, Productivity Vision, Bad ML, and Lifelong Learning

  1. User Onboarding Teardowns — the UX of new users. (via Andy Baio)
  2. Microsoft’s Productivity Vision — always-on thinged-up Internet everywhere, with predictions and magic by the dozen.
  3. Machine Learning Done WrongWhen dealing with small amounts of data, it’s reasonable to try as many algorithms as possible and to pick the best one since the cost of experimentation is low. But as we hit “big data,” it pays off to analyze the data upfront and then design the modeling pipeline (pre-processing, modeling, optimization algorithm, evaluation, productionization) accordingly.
  4. Ten Simple Rules for Lifelong Learning According to Richard Hamming (PLoScompBio) — Exponential growth of the amount of knowledge is a central feature of the modern era. As Hamming points out, since the time of Isaac Newton (1642/3-1726/7), the total amount of knowledge (including but not limited to technical fields) has doubled about every 17 years. At the same time, the half-life of technical knowledge has been estimated to be about 15 years. If the total amount of knowledge available today is x, then in 15 years the total amount of knowledge can be expected to be nearly 2x, while the amount of knowledge that has become obsolete will be about 0.5x. This means that the total amount of knowledge thought to be valid has increased from x to nearly 1.5x. Taken together, this means that if your daughter or son was born when you were 34 years old, the amount of knowledge she or he will be faced with on entering university at age 17 will be more than twice the amount you faced when you started college.
Comment
Four short links: 25 February 2015

Four short links: 25 February 2015

Bricking Cars, Mapping Epigenome, Machine Learning from Encrypted Data, and Phone Privacy

  1. Remotely Bricking Cars (BoingBoing) — story from 2010 where an intruder illegally accessed Texas Auto Center’s Web-based remote vehicle immobilization system and one by one began turning off their customers’ cars throughout the city.
  2. Beginning to Map the Human Epigenome (MIT) — Kellis and his colleagues report 111 reference human epigenomes and study their regulatory circuitry, in a bid to understand their role in human traits and diseases. (The paper itself.)
  3. Machine Learning Classification over Encrypted Data (PDF) — It is worth mentioning that our work on privacy-preserving classification is complementary to work on differential privacy in the machine learning community. Our work aims to hide each user’s input data to the classification phase, whereas differential privacy seeks to construct classifiers/models from sensitive user training data that leak a bounded amount of information about each individual in the training data set. See also The Morning Paper’s unpacking of it.
  4. Privacy of Phone Audio (Reddit) — unconfirmed report from Redditor I started a new job today with Walk N’Talk Technologies. I get to listen to sound bites and rate how the text matches up with what is said in an audio clip and give feed back on what should be improved. At first, I though these sound bites were completely random. Then I began to notice a pattern. Soon, I realized that I was hearing peoples commands given to their mobile devices. Guys, I’m telling you, if you’ve said it to your phone, it’s been recorded…and there’s a damn good chance a 3rd party is going to hear it.
Comment
Four short links: 24 February 2015

Four short links: 24 February 2015

Open Data, Packet Dumping, GPU Deep Learning, and Genetic Approval

  1. Wiki New Zealand — open data site, and check out the chart builder behind the scenes for importing the data. It’s magic.
  2. stenographer (Google) — open source packet dumper for capturing data during intrusions.
  3. Which GPU for Deep Learning?a lot of numbers. Overall, I think memory size is overrated. You can nicely gain some speedups if you have very large memory, but these speedups are rather small. I would say that GPU clusters are nice to have, but that they cause more overhead than the accelerate progress; a single 12GB GPU will last you for 3-6 years; a 6GB GPU is plenty for now; a 4GB GPU is good but might be limiting on some problems; and a 3GB GPU will be fine for most research that looks into new architectures.
  4. 23andMe Wins FDA Approval for First Genetic Test — as they re-enter the market after FDA power play around approval (yes, I know: one company’s power play is another company’s flouting of safeguards designed to protect a vulnerable public).
Comment
Four short links: 20 February 2015

Four short links: 20 February 2015

Robotic Garden, Kids Toys, MSFT ML, and Twitter Scale

  1. The Distributed Robotic Garden (MIT) — We consider plants, pots, and robots to be systems with different levels of mobility, sensing, actuation, and autonomy. (via Robohub)
  2. CogniToys Leverages Watson’s Brain to Befriend, Teach Your Kids (IEEE) — Through the dino, Watson’s algorithms can get to know each child that it interacts with, tailoring those interactions to the child’s age and interests.
  3. How Machine Learning Ate Microsoft (Infoworld) — Azure ML didn’t merely take the machine learning algorithms MSR had already handed over to product teams and stick them into a drag-and-drop visual designer. Microsoft has made the functionality available to developers who know the R statistical programming language and Python, which together are widely used in academic machine learning. Microsoft plans to integrate Azure ML closely with Revolution Analytics, the R startup it recently acquired.
  4. Handling Five Billion Sessions a Day in Real Time (Twitter) — infrastructure porn.
Comments: 2

Exploring methods in active learning

Tips on how to build effective human-machine hybrids, from crowdsourcing expert Adam Marcus.

15146_ORM_Webcast_ad(archived)In a recent O’Reilly webcast, “Crowdsourcing at GoDaddy: How I Learned to Stop Worrying and Love the Crowd,” Adam Marcus explains how to mitigate common challenges of managing crowd workers, how to make the most of human-in-the-loop machine learning, and how to establish effective and mutually rewarding relationships with workers. Marcus is the director of data on the Locu team at GoDaddy, where the “Get Found” service provides businesses with a central platform for managing their online presence and content.

In the webcast, Marcus uses practical examples from his experience at GoDaddy to reveal helpful methods for how to:

  • Offset the inevitability of wrong answers from the crowd
  • Develop and train workers through a peer-review system
  • Build a hierarchy of trusted workers
  • Make crowd work inspiring and enable upward mobility

What to do when humans get it wrong

It turns out there is a simple way to offset human error: redundantly ask people the same questions. Marcus explains that when you ask five different people the same question, there are some creative ways to combine their responses, and use a majority vote. Read more…

Comment