"machine learning" entries

Four short links: 20 May 2014

Machine Learning, Deep Learning, Sewing Machines & 3D Printers, and Smart Spoons

  1. Basics of Machine Learning Course Notes — slides and audio from a university course. Watch along on YouTube.
  2. A Primer on Deep Learning — a very quick catch-up on WTF this is all about.
  3. 3D Printers Have a Lot to Learn from Sewing Machines — Sewing does not create more waste but, potentially, less, and the process of sewing is filled with opportunities for increasing one’s skills and doing it over as well as doing it yourself. What are quilts, after all, but a clever way to use every last scrap of precious fabric? (via Jenn Webb)
  4. Liftware — Parkinson’s-correcting spoons.
Four short links: 14 May 2014

Problem Solving, Fashion Mining, Surprising Recommendations, and Migrating Engines

  1. Data Jujitsu — a new O’Reilly Radar report by the wonderful DJ Patil about the exploration and problem-solving part of data science. I like it.
  2. Style Finder: Fine-Grained Clothing Style Recognition and Retrieval (PDF) — eBay labs machine learning, featuring the wonderful phrase “Women’s Fashion: Coat dataset”.
  3. Amazon’s Drug Dealer Shopping List — reinforcing recommendations surface unexpected patterns …
  4. Migrating Virtual Machines from Amazon EC2 to Google Compute Engine — if you want the big players fighting for your business, you should ensure you have portability.
Four short links: 13 May 2014

Reverse Engineering, Incident Response, 3D Museum, and Social Prediction

  1. Reverse Engineering for Beginners (GitHub) — from assembly language through stack overflows, dongles, and more.
  2. Incident Response at Heroku — the difference between good and bad shops is that good shops have a routine for exceptions.
  3. 3D Petrie Museum — The Petrie Museum of Egyptian Archaeology has one of the largest ancient Egyptian and Sudanese collections in the world and they’ve put 3D models of their goods online. Not (yet) available for download, only viewing, which seems like a bug.
  4. Sandy Pentland on Wearables (The Verge) — Pentland was also Nathan Eagle’s graduate advisor, and behind the Reality Mining work at MIT. Check out his sociometer: One study revealed that the sociometer helps discern when someone is bluffing at poker roughly 70 percent of the time; another found that a wearer can determine who will win a negotiation within the first five minutes with 87 percent accuracy; yet another concluded that one can accurately predict the success of a speed date before the participants do.
Four short links: 9 May 2014

Hardening Android, Samsung Connivery, Scalable WebSockets, and Hardware Machine Learning

  1. Hardening Android for Security and Privacy — a brilliant project! A prototype of a secure, full-featured Android telecommunications device with full Tor support, individual application firewalling, true cell network baseband isolation, and optional ZRTP encrypted voice and video support. ZRTP does run over UDP, which is not yet possible to send over Tor, but we are able to send SIP account login and call setup over Tor independently.
  2. The Great Smartphone War (Vanity Fair) — “I represented [the Swedish telecommunications company] Ericsson, and they couldn’t lie if their lives depended on it, and I represented Samsung and they couldn’t tell the truth if their lives depended on it.” That’s the most striking quote, but it’s interesting to see Samsung’s patent strategy described as copying others, delaying the lawsuits, settling before judgement, and in the meanwhile ramping up their own innovation. Perhaps the other glorious part is the description of Samsung employees shredding and eating incriminating documents while stalling the lawyers out front. An excellent read.
  3. socketcluster — highly scalable realtime WebSockets based on Engine.io. They have screenshots of 100k messages/second on an 8-core EC2 m3.2xlarge instance.
  4. Machine Learning on a Board — everything good becomes hardware, whether in GPUs or specialist CPUs. This one has a “Machine Learning Co-Processor”. Interesting idea, to package up inputs and outputs with specialist CPU, but I wonder whether it’s a solution in search of a problem. (via Pete Warden)
Four short links: 9 April 2014

Internet of Listeners, Mobile Deep Belief, Crowdsourced Spectrum Data, and Quantum Minecraft

  1. Jasper Project — an open source platform for developing always-on, voice-controlled applications. Shouting is the new swiping—I eagerly await Gartner touting the Internet-of-things-that-misunderstand-you.
  2. DeepBeliefSDK — deep neural network library for iOS. (via Pete Warden)
  3. Microsoft Spectrum Observatory — crowdsourcing spectrum utilisation information. Just open sourced their code.
  4. qcraft — beginner’s guide to quantum physics in Minecraft. (via Nelson Minar)
Four short links: 24 March 2014

Google Flu, Embeddable JS, Data Analysis, and Belief in the Browser

  1. The Parable of Google Flu (PDF) — We explore two issues that contributed to [Google Flu Trends]’s mistakes—big data hubris and algorithm dynamics—and offer lessons for moving forward in the big data age. Overtrained and underfed?
  2. Duktape — a lightweight embeddable Javascript engine. Because an app without an API is like a lightbulb without an IP address: retro but not cool.
  3. Principles of Good Data Analysis (Greg Reda) — Once you’ve settled on your approach and data sources, you need to make sure you understand how the data was generated or captured, especially if you are using your own company’s data. Treble so if you are using data you snaffled off the net, riddled with collection bias and untold omissions. (via Stijn Debrouwere)
  4. Deep Belief Networks in Javascript — just object recognition in the browser. The code relies on GPU shaders to perform calculations on over 60 million neural connections in real time. From the ever-more-awesome Pete Warden.

Crowdsourcing Feature Discovery

More than algorithms, companies gain access to models that incorporate ideas generated by teams of data scientists

Data scientists were among the earliest and most enthusiastic users of crowdsourcing services. Lukas Biewald noted in a recent talk that one of the reasons he started CrowdFlower was that as a data scientist he got frustrated with having to create training sets for many of the problems he faced. More recently, companies have been experimenting with active learning (humans take care of uncertain cases, models handle the routine ones). Along those lines, Adam Marcus described in detail how Locu uses crowdsourcing services to perform structured extraction (converting semi/unstructured data into structured data).
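As a rough sketch of that division of labor (the classifier, confidence threshold, and synthetic data below are assumptions for illustration, not anything from the article), a model can keep the predictions it is confident about and route low-confidence items to human annotators:

    # Illustrative sketch only: uncertainty-based triage for an active-learning loop.
    # The classifier, threshold, and synthetic data are assumptions, not from the article.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def triage(model, X_unlabeled, threshold=0.75):
        """Return indices the model handles itself vs. indices routed to humans."""
        confidence = model.predict_proba(X_unlabeled).max(axis=1)  # top class probability
        auto_idx = np.where(confidence >= threshold)[0]            # routine cases
        human_idx = np.where(confidence < threshold)[0]            # uncertain cases
        return auto_idx, human_idx

    # Toy usage: train on a small labeled seed set, then triage an unlabeled pool.
    rng = np.random.RandomState(0)
    X_seed, y_seed = rng.randn(40, 5), rng.randint(0, 2, 40)
    X_pool = rng.randn(200, 5)
    model = LogisticRegression().fit(X_seed, y_seed)
    auto_idx, human_idx = triage(model, X_pool)
    print(f"{len(auto_idx)} handled by the model, {len(human_idx)} sent to human annotators")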

Another area where crowdsourcing is popping up is feature engineering and feature discovery. Experienced data scientists will attest that generating features is as important as (if not more important than) the choice of algorithm. Startup CrowdAnalytix uses public/open data sets to help companies enhance their analytic models. The company has access to several thousand data scientists spread across 50 countries and counts a major social network among its customers. Its current focus is on providing “enterprise risk quantification services to Fortune 1000 companies”.

CrowdAnalytix breaks up projects into two phases: feature engineering and modeling. During the feature engineering phase, data scientists are presented with a problem (independent variable(s)) and are asked to propose features (predictors) and brief explanations for why they might prove useful. A panel of judges evaluates features based on the accompanying evidence and explanations. Typically 100+ teams enter this phase of the project, and 30+ teams propose reasonable features.
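A hedged sketch of how such a judging step could work in code (the column names, model choice, and data are invented for illustration and are not CrowdAnalytix's actual pipeline): score each proposed feature by the cross-validated lift it adds over a baseline model.

    # Hypothetical illustration: judge each proposed feature by its cross-validated
    # lift over a baseline model. Column names, model, and data are invented.
    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    def propose_features(df):
        """Phase 1: candidate predictors, each with a one-line rationale."""
        return {
            "orders_per_month": df["orders"] / df["tenure_months"],  # activity rate
            "log_spend": np.log1p(df["total_spend"]),                # dampen heavy tails
        }

    def judge_feature(df, target, baseline_cols, candidate):
        """Phase 2: cross-validated accuracy gain from adding one candidate feature."""
        clf = LogisticRegression(max_iter=1000)
        base = cross_val_score(clf, df[baseline_cols], target, cv=3).mean()
        augmented = df[baseline_cols].assign(candidate=candidate)
        return cross_val_score(clf, augmented, target, cv=3).mean() - base

    # Toy usage with synthetic data standing in for a real problem.
    rng = np.random.RandomState(1)
    df = pd.DataFrame({
        "orders": rng.poisson(5, 300),
        "tenure_months": rng.randint(1, 24, 300),
        "total_spend": rng.gamma(2.0, 100.0, 300),
        "age": rng.randint(18, 70, 300),
    })
    target = pd.Series(rng.randint(0, 2, 300))
    for name, values in propose_features(df).items():
        print(name, round(judge_feature(df, target, ["age"], values), 3))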

Read more…

Four short links: 14 March 2014

Facebook Criticism, New Games, Face Recognition, and Public Uber

  1. The Facebook experiment has failed. Let’s go back — Facebook gets worse the more you use it. The innovation within Facebook happens within a framework that’s taken as given. This essay questions that frame well.
  2. Meet the People Making New Games for Old Hardware — “We’re all fighting for the same goal,” Cobb says. “There’s something artistic, and disciplined, about creating games for machines with limited hardware. You can’t pass off bloat as content, and you can’t drop in a licensed album in place of a hand-crafted digital soundtrack. To make something great you have to work hard, and straight from the heart. That’s what a lot of gamers still wish to see. And we’re happy to provide it for them.”
  3. DeepFace: Closing the Gap to Human-Level Performance in Face Verification — Facebook research into using deep neural networks for face recognition. Our method reaches an accuracy of 97.25% on the Labeled Faces in the Wild (LFW) dataset, reducing the error of the current state of the art by more than 25%, closely approaching human-level performance. “The best minds of my generation are thinking about how to make people click ads.” —Jeff Hammerbacher.
  4. Helsinki Does Uber for Buses — Helsinki’s Kutsuplus lets you select your pick-up and drop-off locations and times, using a phone app, and then sends out a bus to take you exactly where you need to go.

An Invitation to Practical Machine Learning

Does it make sense for me to have a car? If so, which one is the best choice for my needs: a gasoline, hybrid, or electric? And should I buy or lease?

In order to make an effective decision, I need to understand key issues about the design, performance, and cost of cars, regardless of whether or not I actually know how to build one myself. The same is true for people deciding if machine learning is a good choice for their business goals or project.  Will the payoff be worth the effort?  What machine learning approach is most likely to produce valuable results for your particular situation? What size team with what expertise is necessary to be able to develop, deploy, and maintain your machine learning system?

Given the complex and previously esoteric nature of machine learning as a field – the sometimes daunting array of learning algorithms and the math needed to understand and employ them – many people feel the topic is one best left only to the few.

Read more…


Bridging the gap between research and implementation

Hardcore Data Science speakers provided many practical suggestions and tips

One of the most popular offerings at Strata Santa Clara was Hardcore Data Science day. Over the next few weeks we hope to profile some of the speakers who presented, and make the video of the talks available as a bundle. In the meantime here are some notes and highlights from a day packed with great talks.

Data Structures
We’ve come to think of analytics as being composed primarily of data and algorithms. Once data has been collected, “wrangled”, and stored, algorithms are unleashed to unlock its value. Longtime machine-learning researcher Alice Zheng of GraphLab reminded attendees that data structures are critical to scaling machine-learning algorithms. Unfortunately there is a disconnect between machine-learning research and implementation (so much so that some recent advances in large-scale ML are “rediscoveries” of known data structures):

Data and Algorithms: The Disconnect

While there are many data structures that arise in computer science, Alice devoted her talk to two data structures that are widely used in machine learning:
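As a generic illustration of the kind of structure involved (not necessarily one of the two from the talk), the feature-hashing trick maps arbitrary tokens into a fixed-width sparse vector, so a learner operating at scale never needs a growing feature dictionary:

    # Generic illustration (not necessarily one of the structures from the talk):
    # the hashing trick yields a fixed-size sparse feature vector with no vocabulary.
    def hash_features(tokens, n_buckets=2 ** 18):
        """Map a bag of tokens to a sparse {bucket_index: count} representation."""
        counts = {}
        for tok in tokens:
            idx = hash(tok) % n_buckets  # built-in hash; good enough for a demo
            counts[idx] = counts.get(idx, 0) + 1
        return counts

    print(hash_features(["women's", "fashion", "coat", "coat"]))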

Read more…
