Pete Warden

Pete is the CTO of Jetpac Inc, a start-up focused on analyzing billions of public photos. He has been a recipient of an NSF grant for his computer vision work, worked on image processing at Apple for five years, and has released a number of popular open source data analysis projects and published several O'Reilly books. He is writing a book on deep learning and blogs at https://petewarden.com. He is @petewarden on Twitter.

How to build and run your first deep learning network

Step-by-step instruction on training your own neural network.


When I first became interested in using deep learning for computer vision, I found it hard to get started. Only a couple of open source projects were available; they had little documentation, were highly experimental, and relied on a lot of tricky-to-install dependencies. A lot of new projects have appeared since, but they're still aimed at vision researchers, so you'll hit many of the same obstacles if you're approaching them from outside the field.

In this article — and the accompanying webcast — I’m going to show you how to run a pre-built network, and then take you through the steps of training your own. I’ve listed the steps I followed to set up everything toward the end of the article, but because the process is so involved, I recommend you download a Vagrant virtual machine that I’ve pre-loaded with everything you need. This VM lets us skip over all the installation headaches and focus on building and running the neural networks.


What is deep learning, and why should you care?

Announcing a new series delving into deep learning and the inner workings of neural networks.


Editor’s note: this post is part of our Intelligence Matters investigation.

When I first ran across the results in the Kaggle image-recognition competitions, I didn’t believe them. I’ve spent years working with machine vision, and the reported accuracy on tricky tasks like distinguishing dogs from cats was beyond anything I’d seen, or imagined I’d see anytime soon. To understand more, I reached out to one of the competitors, Daniel Nouri, and he demonstrated how he used the Decaf open-source project to do so well. Even better, he showed me how he was quickly able to apply it to a whole bunch of other image-recognition problems we had at Jetpac, and produce much better results than my conventional methods.

I’d never encountered such a big improvement from a technique that was largely unheard of just a couple of years before, so I became obsessed with understanding more. To be able to use it commercially across hundreds of millions of photos, I built my own specialized library to efficiently run prediction on clusters of low-end machines and embedded devices, and I also spent months learning the dark arts of training neural networks. Now I’m keen to share some of what I’ve found, so if you’re curious about what on earth deep learning is, and how it might help you, I’ll be covering the basics in a series of blog posts here on Radar, and in a short upcoming ebook.
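At its core, "running prediction" with a trained network is just a sequence of matrix multiplies and nonlinearities applied to the input features. The toy sketch below shows that flow for a two-layer classifier; the layer sizes, random weights, and dog/cat labels are all invented for illustration and bear no relation to the actual Decaf models or Jetpac's library.

```python
import numpy as np

# Hypothetical two-layer network: the sizes and weights here are
# invented for illustration; a real image network is far larger and
# its weights come from training, not a random generator.
rng = np.random.default_rng(42)
W1 = rng.standard_normal((4096, 256)) * 0.01   # input features -> hidden units
W2 = rng.standard_normal((256, 2)) * 0.01      # hidden units -> dog/cat scores

def predict(features):
    hidden = np.maximum(0, features @ W1)      # ReLU nonlinearity
    scores = hidden @ W2
    exp = np.exp(scores - scores.max())        # softmax -> probabilities
    return exp / exp.sum()

probs = predict(rng.standard_normal(4096))     # probabilities for the 2 classes
```

Because prediction is only this forward pass (no training), it maps naturally onto cheap hardware: each image is independent, so a cluster of low-end machines can each run the same small computation in parallel.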


Why deep belief matters so much

We’re entering a new world where computers can see.

If you’re a programmer who reads the Internet, you’ll have heard of deep belief networks. Google loves them, Facebook just hired one of the pioneers to lead a new group, and they win Kaggle competitions. I’ve been using deep belief networks extensively for image recognition at Jetpac across hundreds of millions of Instagram photos, and the reports are true: the results really are great.

If you’re like me, you probably want to see it for yourself, to understand by experimenting. There are several great open-source solutions out there, but they’re largely aimed at academics and back-end engineers, with a lot of dependencies and a steep learning curve. The current state of software makes it look like the technology is destined to remain in data centers running on high-end machines.

The good news is that the code for running deep belief doesn’t have to be complex, and by expanding beyond data centers, the networks are going to add some spookily effective AI capabilities to all sorts of devices. You can check out a demo of a full deep belief network running in JavaScript and WebGL in real time on Jetpac’s website, and we have a tiny-footprint C version optimized for mobile ARM devices that runs the same full 60-million-connection network on everything from smartphones to Raspberry Pis, completing in under 300ms on an iPhone 5S.
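To get a feel for what "60 million connections" means, note that a fully connected layer from m units to n units contributes m × n connections, so the total climbs fast. The layer sizes below are hypothetical, chosen only to show the arithmetic; they are not Jetpac's actual architecture.

```python
# Hypothetical layer widths for a stack of fully connected layers.
# Each adjacent pair (m, n) contributes m * n connections (weights).
layers = [4096, 8192, 2048, 1000]

connections = sum(a * b for a, b in zip(layers, layers[1:]))
# With these invented sizes the total is already over 50 million,
# in the same ballpark as the network described above.
```

Each connection is one multiply-add per prediction, which is why a sub-300ms inference on a phone is a meaningful engineering result: it implies tens of millions of arithmetic operations per image on mobile hardware.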


How to analyze 100 million images for $624

There's a lot of new ground to be explored in large-scale image processing.

Jetpac is building a modern version of Yelp, using big data rather than user reviews. People are taking more than a billion photos every single day, and many of these are shared publicly on social networks. We analyze these pictures to discover what they can tell us about bars, restaurants, hotels, and other venues around the world — spotting hipster favorites by the number of mustaches, for example.

Treating large numbers of photos as data, rather than just content to display to the user, is a pretty new idea. Traditionally it’s been prohibitively expensive to store and process image data, and not many developers are familiar with both modern big data techniques and computer vision. That meant we had to cut a path through some thick underbrush to get a system working, but the good news is that the free-falling price of commodity servers makes running it incredibly cheap.
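The headline figure alone gives a useful back-of-the-envelope unit cost. The arithmetic below uses only the totals from the title ($624 for 100 million images); the per-hour server rates and throughput behind that number are covered in the full article.

```python
# Back-of-the-envelope unit cost from the headline totals.
total_cost_usd = 624
total_images = 100_000_000

cost_per_million = total_cost_usd / (total_images / 1_000_000)  # dollars per million images
cost_per_image = total_cost_usd / total_images                  # dollars per single image
```

At roughly $6.24 per million images, image analysis lands in the same cost regime as ordinary big data batch jobs, which is the article's point: the economics no longer rule this work out.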

How to create a visualization

Pete Warden walks through the steps behind his latest Facebook visualization.

Creating a visualization requires more than just data and imagery. Pete Warden outlines the process and actions that drove his new Facebook visualization project.

3 ideas you should steal from HubSpot

HubSpot has found the sweet spot between data, education and customer loyalty.

HubSpot's location (near Boston) and its target market (small businesses) may keep it under the radar of Silicon Valley, but the company's approach to data products and customer empowerment is worthy of attention.

Lessons of the Victorian data revolution

Transaction costs, crowdsourcing, and the persuasiveness of data were all in play long ago.

Examples from the Victorian era show that if we're going to improve the world with data, it's absolutely essential we stay grounded in reality.


Why you can't really anonymize your data

It's time to accept and work within the limits of data anonymization.

Because we now have so much data at our disposal, any dataset with a decent amount of information can be matched against identifiable public records. To keep datasets available, we must acknowledge that foolproof anonymization is an illusion.
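The matching the paragraph describes is just a join on quasi-identifiers. The sketch below uses invented toy records: an "anonymized" release that keeps ZIP code, birth date, and sex, and a public record (a voter roll, say) that carries the same fields plus a name. All names and values are hypothetical.

```python
# Toy "anonymized" release: direct identifiers removed, but
# quasi-identifiers (zip, dob, sex) retained.
anonymized = [
    {"zip": "02139", "dob": "1965-07-31", "sex": "F", "diagnosis": "X"},
    {"zip": "94103", "dob": "1980-01-02", "sex": "M", "diagnosis": "Y"},
]

# Toy public record carrying the same quasi-identifiers plus a name.
public_records = [
    {"zip": "02139", "dob": "1965-07-31", "sex": "F", "name": "J. Doe"},
]

def key(r):
    return (r["zip"], r["dob"], r["sex"])

names = {key(r): r["name"] for r in public_records}

# Joining on the shared fields re-attaches a name to a "de-identified" row.
reidentified = [
    {**row, "name": names[key(row)]}
    for row in anonymized
    if key(row) in names
]
```

When the combination of retained fields is unique, or nearly so, within the population, a single join like this undoes the anonymization, which is why stripping names alone is not enough.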


Why the term "data science" is flawed but useful

Counterpoints to four common data science criticisms.

While formal boundaries and professional criteria for "data science" remain undefined, here's why we should keep using the term.
