How to analyze 100 million images for $624

There's a lot of new ground to be explored in large-scale image processing.

Jetpac is building a modern version of Yelp, using big data rather than user reviews. People are taking more than a billion photos every single day, and many of these are shared publicly on social networks. We analyze these pictures to discover what they can tell us about bars, restaurants, hotels, and other venues around the world —…
Four short links: 18 June 2013

Backbone Stack, Automating Card Games, Ozzie on PRISM, and Stuff that Matters

  1. Our Backbone Stack (Pamela Fox) — fascinating glimpse into the tech used and why.
  2. Automating Card Games Using OpenCV and Python — My vision for an automated version of the game was simple. Players sit across a table on which the cards are laid out. My program would take a picture of the cards and recognize them. It would then generate valid expression that yielded 24, and then project the answer on to the table.
  3. Ray Ozzie on PRISM — posted on Hacker News (!). In particular, in this world where “SaaS” and “software eats everything” and “cloud computing” and “big data” are inevitable and already pervasive, it pains me to see how 3rd Party Doctrine may now already be being leveraged to effectively gut the intent of U.S. citizens’ Fourth Amendment rights. Don’t we need a common-sense refresh to the wording of our laws and potentially our constitution as it pertains to how we now rely upon 3rd parties? It makes zero sense in a “services age” where granting third parties limited rights to our private information is so basic and fundamental to how we think, work, conduct and enjoy life. (via Alex Dong)
  4. Larry Brilliant’s Commencement Speech (HuffPo) — speaking to med grads, he’s full of purpose and vision and meaning for their lives. His story is amazing. I wish more CS grads were inspired to work on stuff that matters, and cautioned about adding their great minds to the legion trying to solve the problem of connecting you with brands you love.
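
The "generate valid expression that yielded 24" step in item 2 is the one part of that pipeline that fits in a few lines. What follows is a minimal brute-force sketch, not the linked author's code: the function names `solve24` and `_search` are hypothetical, it assumes the cards have already been recognized as integers, and it leaves out the OpenCV recognition and projection steps entirely. It repeatedly combines two numbers with +, -, * or / until one value remains, using exact fractions so that division does not introduce rounding errors.

```python
from fractions import Fraction
from itertools import combinations


def solve24(cards, target=24):
    """Return one expression string that evaluates to target, or None."""
    # Pair each exact value with the expression text that produced it.
    items = [(Fraction(c), str(c)) for c in cards]
    return _search(items, Fraction(target))


def _search(items, target):
    # Base case: a single value is left; it either hits the target or not.
    if len(items) == 1:
        value, text = items[0]
        return text if value == target else None

    # Pick any two values, combine them, and recurse on the shorter list.
    for i, j in combinations(range(len(items)), 2):
        (a, ta), (b, tb) = items[i], items[j]
        rest = [items[k] for k in range(len(items)) if k not in (i, j)]

        candidates = [
            (a + b, f"({ta} + {tb})"),
            (a * b, f"({ta} * {tb})"),
            (a - b, f"({ta} - {tb})"),
            (b - a, f"({tb} - {ta})"),
        ]
        # Skip division by zero rather than raising.
        if b != 0:
            candidates.append((a / b, f"({ta} / {tb})"))
        if a != 0:
            candidates.append((b / a, f"({tb} / {ta})"))

        for value, text in candidates:
            found = _search(rest + [(value, text)], target)
            if found is not None:
                return found
    return None
```

Calling `solve24([4, 7, 8, 8])` returns a fully parenthesized expression string for that hand, and a hand with no solution, such as four aces, returns `None`. With four cards the search space is small enough that no pruning is needed.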
Four short links: 2 May 2011

Internet Cafe Culture, Image Processing, Library Mining, and MediaWiki Parsing

  1. Chinese Internet Cafes (Bryce Roberts) — a good quick read. My note: people valued the same things in Internet cafes that they value in public libraries, and the uses are very similar. They pose a similar threat to the already-successful, which is why public libraries are threatened in many Western countries.
  2. SIFT — the Scale Invariant Feature Transform library, built on OpenCV, is a method to detect distinctive, invariant image feature points, which easily can be matched between images to perform tasks such as object detection and recognition, or to compute geometrical transformations between images. The licensing seems dodgy: MIT code but lots of “this isn’t a license to use the patent!” warnings in the LICENSE file. (via Joshua Schachter)
  3. The Secret Life of Libraries (Guardian) — I like the idea of the most-stolen-books revealing something about a region; it’s an aspect of data revealing truth. For a while, Terry Pratchett was the most-shoplifted author in England but newspapers rarely carried articles about him or mentioned his books (because they were genre fiction not “real” literature). (via Brian Flaherty)
  4. Sweble — MediaWiki parser library. Until today, Wikitext had been poorly defined. There was no grammar, no defined processing rules, and no defined output like a DOM tree based on a well defined document object model. This is to say, the content of Wikipedia is stored in a format that is not an open standard. The format is defined by 5000 lines of php code (the parse function of MediaWiki). That code may be open source, but it is incomprehensible to most. That’s why there are 30+ failed attempts at writing alternative parsers. (via Dirk Riehle)
