ENTRIES TAGGED "machine learning"

Four short links: 10 June 2014

Trusting Code, Deep Pi, Docker DevOps, and Secure Database

  1. Trusting Browser Code (Tim Bray) — on the fundamental weakness of the ‘net as manifest in the browser.
  2. Deep Learning in the Raspberry Pi (Pete Warden) — $30 now gets you a computer you can run deep learning algorithms on. Awesome.
  3. Announcing Docker Hub and Official Repositories — as Docker went 1.0 and people rave about how they use it, comes this. They’re thinking hard about “integrating into the build ship run loop”, which aligns well with DevOps-enabling tool use.
  4. Apple’s Secure Database for Users (Ian Waring) — excellent breakdown of how Apple have gone out of their way to make their cloud database product safe and robust. They may be slow to “the cloud” but they have decades of experience having users as customers instead of products.

Four short links: 26 May 2014

Statistical Sensitivity, Scientific Mining, Data Mining Books, and Two-Sided Smartphones

  1. Car Alarms and Smoke Alarms (Slideshare) — how to think about and draw the line between sensitivity and specificity.
  2. 101 Uses for Content Mining — between the list in the post and the comments from readers, it’s a good introduction to some of the value to be obtained from full-text structured and unstructured access to scientific research publications.
  3. 12 Free-as-in-beer Data Mining Books — for your next flight.
  4. Dual-Touch Smartphone Concept — brilliant design sketches for interactivity using the back of the phone as a touch-sensitive input device.
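
The sensitivity/specificity trade-off from the first link can be made concrete with a small sketch. This is not taken from the slides: the function names, the alarm scores, and the thresholds below are all invented for illustration. Sensitivity is the share of real events that trigger the alarm; specificity is the share of non-events that stay quiet.

```python
def sensitivity_specificity(labels, predictions):
    """Return (sensitivity, specificity) for paired boolean labels and predictions."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)          # real event, alarm fired
    fn = sum(1 for y, p in pairs if y and not p)      # real event, alarm silent
    tn = sum(1 for y, p in pairs if not y and not p)  # no event, alarm silent
    fp = sum(1 for y, p in pairs if not y and p)      # no event, alarm fired
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


def classify(scores, threshold):
    """Fire the alarm whenever the score reaches the threshold."""
    return [score >= threshold for score in scores]


# Hypothetical data: True means a real fire, scores are made-up sensor readings.
labels = [True, True, True, False, False, False, False, False]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]

for threshold in (0.25, 0.5, 0.8):
    sens, spec = sensitivity_specificity(labels, classify(scores, threshold))
    print(f"threshold={threshold:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

A low threshold behaves like a car alarm (it catches everything and cries wolf often); a high threshold misses real events but rarely false-alarms. Where to draw the line depends on what each kind of mistake costs.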

Four short links: 20 May 2014

Machine Learning, Deep Learning, Sewing Machines & 3D Printers, and Smart Spoons

  1. Basics of Machine Learning Course Notes — slides and audio from university course. Watch along on YouTube.
  2. A Primer on Deep Learning — a very quick catch-up on WTF this is all about.
  3. 3D Printers Have a Lot to Learn from Sewing Machines — Sewing does not create more waste but, potentially, less, and the process of sewing is filled with opportunities for increasing one’s skills and doing it over as well as doing it yourself. What are quilts, after all, but a clever way to use every last scrap of precious fabric? (via Jenn Webb)
  4. Liftware — Parkinson’s-correcting spoons.

Four short links: 14 May 2014

Problem Solving, Fashion Mining, Surprising Recommendations, and Migrating Engines

  1. Data Jujitsu — new O’Reilly Radar report by the wonderful DJ Patil about the exploration and problem solving part of data science. Me gusta.
  2. Style Finder: Fine-Grained Clothing Style Recognition and Retrieval (PDF) — eBay labs machine learning, featuring the wonderful phrase “Women’s Fashion: Coat dataset”.
  3. Amazon’s Drug Dealer Shopping List — reinforcing recommendations surface unexpected patterns …
  4. Migrating Virtual Machines from Amazon EC2 to Google Compute Engine — if you want the big players fighting for your business, you should ensure you have portability.

Four short links: 13 May 2014

Reverse Engineering, Incident Response, 3D Museum, and Social Prediction

  1. Reverse Engineering for Beginners (GitHub) — from assembly language through stack overflows, dongles, and more.
  2. Incident Response at Heroku — the difference between good and bad shops is that good shops have a routine for exceptions.
  3. 3D Petrie Museum — The Petrie Museum of Egyptian Archaeology has one of the largest ancient Egyptian and Sudanese collections in the world and they’ve put 3D models of their goods online. Not (yet) available for download, only for viewing, which seems a bug.
  4. Sandy Pentland on Wearables (The Verge) — Pentland was also Nathan Eagle’s graduate advisor, and behind the Reality Mining work at MIT. Check out his sociometer: One study revealed that the sociometer helps discern when someone is bluffing at poker roughly 70 percent of the time; another found that a wearer can determine who will win a negotiation within the first five minutes with 87 percent accuracy; yet another concluded that one can accurately predict the success of a speed date before the participants do.

Four short links: 9 May 2014

Hardening Android, Samsung Connivery, Scalable WebSockets, and Hardware Machine Learning

  1. Hardening Android for Security and Privacy — a brilliant project! prototype of a secure, full-featured, Android telecommunications device with full Tor support, individual application firewalling, true cell network baseband isolation, and optional ZRTP encrypted voice and video support. ZRTP does run over UDP which is not yet possible to send over Tor, but we are able to send SIP account login and call setup over Tor independently.
  2. The Great Smartphone War (Vanity Fair) — “I represented [the Swedish telecommunications company] Ericsson, and they couldn’t lie if their lives depended on it, and I represented Samsung and they couldn’t tell the truth if their lives depended on it.” That’s the most catching quote, but interesting to see Samsung’s patent strategy described as copying others, delaying the lawsuits, settling before judgement, and in the meanwhile ramping up their own innovation. Perhaps the other glory part is the description of Samsung employee shredding and eating incriminating documents while stalling lawyers out front. An excellent read.
  3. socketcluster — highly scalable realtime WebSockets based on Engine.io. They have screenshots of 100k messages/second on an 8-core EC2 m3.2xlarge instance.
  4. Machine Learning on a Board — everything good becomes hardware, whether in GPUs or specialist CPUs. This one has a “Machine Learning Co-Processor”. Interesting idea, to package up inputs and outputs with specialist CPU, but I wonder whether it’s a solution in search of a problem. (via Pete Warden)

Four short links: 9 April 2014

Internet of Listeners, Mobile Deep Belief, Crowdsourced Spectrum Data, and Quantum Minecraft

  1. Jasper Project — an open source platform for developing always-on, voice-controlled applications. Shouting is the new swiping—I eagerly await Gartner touting the Internet-of-things-that-misunderstand-you.
  2. DeepBeliefSDK — deep neural network library for iOS. (via Pete Warden)
  3. Microsoft Spectrum Observatory — crowdsourcing spectrum utilisation information. Just open sourced their code.
  4. qcraft — beginner’s guide to quantum physics in Minecraft. (via Nelson Minar)

Four short links: 24 March 2014

Google Flu, Embeddable JS, Data Analysis, and Belief in the Browser

  1. The Parable of Google Flu (PDF) — We explore two issues that contributed to [Google Flu Trends]’s mistakes—big data hubris and algorithm dynamics—and offer lessons for moving forward in the big data age. Overtrained and underfed?
  2. Duktape — a lightweight embeddable Javascript engine. Because an app without an API is like a lightbulb without an IP address: retro but not cool.
  3. Principles of Good Data Analysis (Greg Reda) — Once you’ve settled on your approach and data sources, you need to make sure you understand how the data was generated or captured, especially if you are using your own company’s data. Treble so if you are using data you snaffled off the net, riddled with collection bias and untold omissions. (via Stijn Debrouwere)
  4. Deep Belief Networks in Javascript — just object recognition in the browser. The code relies on GPU shaders to perform calculations on over 60 million neural connections in real time. From the ever-more-awesome Pete Warden.

Crowdsourcing Feature Discovery

More than algorithms, companies gain access to models that incorporate ideas generated by teams of data scientists

Data scientists were among the earliest and most enthusiastic users of crowdsourcing services. Lukas Biewald noted in a recent talk that one of the reasons he started CrowdFlower was that, as a data scientist, he got frustrated with having to create training sets for many of the problems he faced. More recently, companies have been experimenting with active learning (humans take care of uncertain cases, models handle the routine ones). Along those lines, Adam Marcus described in detail how Locu uses crowdsourcing services to perform structured extraction (converting semi-structured or unstructured data into structured data).
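
The human/model split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how CrowdFlower or Locu actually implement it: `route_by_confidence`, `toy_model`, the labels, and the 0.9 threshold are all hypothetical. The idea is that the model keeps any prediction it is confident about and queues the rest for crowd workers.

```python
def route_by_confidence(items, predict, confidence_threshold=0.9):
    """Split items into model-labeled pairs and a queue for human review.

    `predict` is assumed to return a (label, confidence) tuple for one item.
    """
    auto_labeled = []  # (item, label) pairs the model handles on its own
    needs_human = []   # uncertain items to send to a crowdsourcing service
    for item in items:
        label, confidence = predict(item)
        if confidence >= confidence_threshold:
            auto_labeled.append((item, label))
        else:
            needs_human.append(item)
    return auto_labeled, needs_human


def toy_model(text):
    """Hypothetical stand-in for a trained classifier over snippets of text."""
    lowered = text.lower()
    if "menu" in lowered:
        return "menu", 0.97
    if "open" in lowered:
        return "hours", 0.93
    return "other", 0.55


snippets = ["Lunch menu: soup $4", "Open 9am-5pm", "Ask about our specials"]
auto_labeled, needs_human = route_by_confidence(snippets, toy_model)
print("model handled:", auto_labeled)
print("sent to the crowd:", needs_human)
```

In a fuller active learning loop, the labels that come back from the crowd would be added to the training set so the model grows more confident on similar cases over time; that retraining step is left out here.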

Another area where crowdsourcing is popping up is feature engineering and feature discovery. Experienced data scientists will attest that generating features is as important as (if not more important than) the choice of algorithm. Startup CrowdAnalytix uses public/open data sets to help companies enhance their analytic models. The company has access to several thousand data scientists spread across 50 countries and counts a major social network among its customers. Its current focus is on providing “enterprise risk quantification services to Fortune 1000 companies”.

CrowdAnalytix breaks up projects into two phases: feature engineering and modeling. During the feature engineering phase, data scientists are presented with a problem (the dependent variable(s) to be predicted) and are asked to propose features (predictors) and brief explanations for why they might prove useful. A panel of judges evaluates features based on the accompanying evidence and explanations. Typically 100+ teams enter this phase of the project, and 30+ teams propose reasonable features.

Read more…


Four short links: 14 March 2014

Facebook Criticism, New Games, Face Recognition, and Public Uber

  1. The Facebook experiment has failed. Let’s go back — Facebook gets worse the more you use it. The innovation within Facebook happens within a framework that’s taken as given. This essay questions that frame, well.
  2. Meet the People Making New Games for Old Hardware — “We’re all fighting for the same goal,” Cobb says. “There’s something artistic, and disciplined, about creating games for machines with limited hardware. You can’t pass off bloat as content, and you can’t drop in a licensed album in place of a hand-crafted digital soundtrack. To make something great you have to work hard, and straight from the heart. That’s what a lot of gamers still wish to see. And we’re happy to provide it for them.”
  3. DeepFace: Closing the Gap to Human-Level Performance in Face Verification — Facebook research into using deep neural networks for face recognition. Our method reaches an accuracy of 97.25% on the Labeled Faces in the Wild (LFW) dataset, reducing the error of the current state of the art by more than 25%, closely approaching human-level performance. “The best minds of my generation are thinking about how to make people click ads.” —Jeff Hammerbacher.
  4. Helsinki Does Uber for Buses — Helsinki’s Kutsuplus lets you select your pick-up and drop-off locations and times, using a phone app, and then sends out a bus to take you exactly where you need to go.