"machine learning" entries

Four short links: 6 November 2014

JavaScript Testing, Dark Data, Webapp Design, and Design Trumps Data

  1. Karma — kick-ass open source JavaScript test environment.
  2. The Dark Market for Personal Data (NYTimes) — you can buy lists of victims of sexual assault, of impulse buyers, of people with sexually transmitted diseases, etc. The cost of a false positive when those lists are used for marketing is less than the cost of a false positive when banks use the lists to decide whether you’re a credit risk. The lists fall between the cracks in privacy legislation; essentially, the compilation and use of lists of people are unregulated territory.
  3. 7 Principles of Rich Web Applications — “rich web applications” sounds like 2007 wants its ideas back, but the content is modern and useful. Predict behaviour for negative latency.
  4. Collaborative Filtering at LinkedIn (PDF) — This paper presents LinkedIn’s horizontal collaborative filtering infrastructure, known as browsemaps. Great lessons learned, including context and presentation of browsemaps or any recommendation is paramount for a truly relevant user experience. That is, design and presentation represents the largest ROI, with data engineering being a second, and algorithms last. (via Greg Linden)
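For readers new to the idea behind item 4, here is a minimal Python sketch of item-to-item co-occurrence counting, one common form of collaborative filtering. It is not LinkedIn's implementation: the session data and the `build_browsemap` and `also_viewed` helpers are hypothetical names chosen for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_browsemap(sessions):
    """Count how often each pair of items is viewed in the same session."""
    co_views = defaultdict(Counter)
    for items in sessions:
        # Deduplicate within a session so repeat views count once.
        for a, b in combinations(sorted(set(items)), 2):
            co_views[a][b] += 1
            co_views[b][a] += 1
    return co_views


def also_viewed(co_views, item, k=3):
    """Return up to k items most often co-viewed with `item`."""
    return [other for other, _ in co_views[item].most_common(k)]


# Hypothetical browsing sessions: each list is one visitor's viewed profiles.
sessions = [
    ["alice", "bob", "carol"],
    ["alice", "bob"],
    ["bob", "dave"],
]
browsemap = build_browsemap(sessions)
print(also_viewed(browsemap, "alice"))
```

As the paper's lesson suggests, a ranking like this is only the last step; where and how the list is shown to the user matters more than the counting itself.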

Challenges facing predictive APIs

Solutions to a number of problems must be found to unlock PAPI value.

In November, the first International Conference on Predictive APIs and Apps will take place in Barcelona, just ahead of Strata Barcelona. This event will bring together those who are building intelligent web services (sometimes called Machine Learning as a Service) with those who would like to use these services to build predictive apps, which, as defined by Forrester, deliver “the right functionality and content at the right time, for the right person, by continuously learning about them and predicting what they’ll need.”

This is a very exciting area. Machine learning of various sorts is revolutionizing many areas of business, and predictive services like the ones at the center of predictive APIs (PAPIs) have the potential to bring these capabilities to an even wider range of applications. I co-founded one of the first companies in this space (acquired by Salesforce in 2012), and I remain optimistic about the future of these efforts. But the field as a whole faces a number of challenges, for which the answers are neither easy nor obvious, that must be addressed before this value can be unlocked.

In the remainder of this post, I’ll enumerate what I see as the most pressing issues. I hope that the speakers and attendees at PAPIs will keep these in mind as they map out the road ahead. Read more…

Four short links: 30 September 2014

Continuous Testing, Programmable Bees, Deep Learning on GPUs, and Silk Road Numbers

  1. Continuously Testing Infrastructure — “infrastructure as code”. I can’t figure out whether what I feel are thrills or chills.
  2. Engineer Sees Big Possibilities in Micro-robots, Including Programmable Bees (National Geographic) — He and fellow researchers devised novel techniques to fabricate, assemble, and manufacture the miniature machines, each with a housefly-size thorax, three-centimeter (1.2-inch) wingspan, and weight of just 80 milligrams (.0028 ounces). The latest prototype rises on a thread-thin tether, flaps its wings 120 times a second, hovers, and flies along preprogrammed paths. (via BoingBoing)
  3. cuDNN — NVIDIA’s library of primitives for deep neural networks (on GPUs, natch). Not open source (registerware).
  4. Analysing Trends in Silk Road 2.0: If, indeed every sale can map to a transaction, some vendors are doing huge amounts of business through mail order drugs. While the number is small, if we sum up all the product reviews x product prices, we get a huge number of USD $20,668,330.05. REMEMBER! This is on Silk Road 2.0 with a very small subset of their entire inventory. A peek into a largely invisible economy.
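The estimate quoted in item 4 is a sum of review counts multiplied by listed prices. A minimal Python sketch of that arithmetic follows, assuming each review corresponds to one sale. The listings, field names, and the `estimated_revenue` helper are hypothetical and are not taken from the original analysis.

```python
from decimal import Decimal


def estimated_revenue(listings):
    """Sum price x review count, treating each review as one sale (an assumption)."""
    return sum(
        (Decimal(item["price_usd"]) * item["reviews"] for item in listings),
        Decimal("0"),
    )


# Made-up listings for illustration only.
listings = [
    {"price_usd": "19.99", "reviews": 3},
    {"price_usd": "250.00", "reviews": 2},
]
print(estimated_revenue(listings))
```

Prices are held as `Decimal` rather than `float` so that currency totals do not pick up binary rounding error.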

Four short links: 29 September 2014

Feedback Surprises, Ownership Changes, Teaching Lessons, and 3D Retail

  1. How Community Feedback Shapes Behaviour (PDF) — Not only do authors of negatively-evaluated content contribute more, but also their future posts are of lower quality, and are perceived by the community as such. Moreover, these authors are more likely to subsequently evaluate their fellow users negatively, percolating these effects through the community. In contrast, positive feedback does not carry similar effects, and neither encourages rewarded authors to write more, nor improves the quality of their posts. Interestingly, the authors that receive no feedback are most likely to leave a community. Furthermore, a structural analysis of the voter network reveals that evaluations polarize the community the most when positive and negative votes are equally split.
  2. When Everything Works Like Your Cell Phone (The Atlantic) — our relationship to ownership is about to undergo a wild transformation.
  3. Teaching Me Softly — article of anecdotes drawing parallels between case studies in machine learning and things we know about human learning.
  4. SuperAwesome Me (3D Print) — Walmart to install 3D scanning booths and 3D printers so you can put your own head on a Hasbro action figure. Hasbro have the religion: they also paired with Shapeways for superfanart.com. (via John Battelle)

Four short links: 26 September 2014

Good Communities, AI Games, Design Process, and Web Server Library

  1. 15 Lessons from 15 Years of Blogging (Anil Dash) — If your comments are full of assholes, it’s your fault. Good communities don’t just happen by accident.
  2. Replicating DeepMind — open source attempt to build a deep learning network that can play Atari games. (via RoboHub)
  3. ToyTalk — fantastic iterative design process for the product (see the heading “A Bit of Trickery”)
  4. h2o: an optimized HTTP server implementation that can be used either as a standalone server or a library.

Four short links: 19 September 2014

Deep Learning Bibliography, Go Playground, Tweet-a-Program, and Memory Management

  1. Deep Learning Bibliography: an annotated bibliography of recent publications (2014-) related to Deep Learning.
  2. Inside the Go Playground — on safely offering a REPL over the web to strangers.
  3. Wolfram Tweet-a-Program — clever marketing trick, and reminiscent of Perl Golf-style “how much can you fit into how little” contests.
  4. Memory Management Reference — almost all you ever wanted to know about memory management.

Four short links: 15 September 2014

Weird Machines, Libraries May Scan, Causal Effects, and Crappy Dashboards

  1. The Care and Feeding of Weird Machines Found in Executable Metadata (YouTube) — talk from 29th Chaos Communication Congress, on tricking the ELF linker/loader into arbitrary computation from the metadata supplied. Yes, there’s a brainfuck compiler that turns code into metadata which is then, through a supernatural mix of pixies, steam engines, and binary, executed. This will make your brain leak. Weird machines are everywhere.
  2. European Libraries May Digitise Books Without Permission: “The right of libraries to communicate, by dedicated terminals, the works they hold in their collections would risk being rendered largely meaningless, or indeed ineffective, if they did not have an ancillary right to digitize the works in question,” the court said. Even if the rights holder offers a library the possibility of licensing his works on appropriate terms, the library can use the exception to publish works on electronic terminals, the court ruled. “Otherwise, the library could not realize its core mission or promote the public interest in promoting research and private study,” it said.
  3. CausalImpact (GitHub) — Google’s R package for estimating the causal effect of a designed intervention on a time series. (via Google Open Source Blog)
  4. Laws of Crappy Dashboards — (caution, NSFW language … “crappy” is my paraphrase) so true. Not talking to users will result in a [crappy] dashboard. You don’t know if the dashboard is going to be useful. But you don’t talk to the users to figure it out. Or you just show it to them for a minute (with someone else’s data), never giving them a chance to figure out what the hell they could do with it if you gave it to them.

Four short links: 12 September 2014

Knowledge Graphs, Multi-Language Declarations, Monitoring, and More Monitoring

  1. Google Knowledge Vault and Topic Modeling — recap of talks by Google and Facebook staff about how they use their knowledge graphs. I found this super-interesting.
  2. djinni: A tool for generating cross-language type declarations and interface bindings.
  3. monit: a small Open Source utility for managing and monitoring Unix systems. Monit conducts automatic maintenance and repair and can execute meaningful causal actions in error situations.
  4. perf-tooling: List of performance analysis, monitoring and optimization tools.

Four short links: 1 September 2014

Sibyl, Bitrot, Estimation, and ssh

  1. Sibyl: Google’s System for Large Scale Machine Learning (YouTube) — keynote at DSN2014 acting as an intro to Sibyl. (via KD Nuggets)
  2. Bitrot from 1997: That’s 205 failures, an actual link rot figure of 91%, not 57%. That leaves only 21 URLs as 200 OK and containing effectively the same content.
  3. What We Do And Don’t Know About Software Effort Estimation — nice rundown of research in the field.
  4. fabric — simple yet powerful ssh library for Python.
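The 91% figure in item 2 can be checked by hand, assuming the sample is exactly the 205 failures plus the 21 URLs that still return effectively the same content (the total of 226 is inferred here, not stated in the quote). The `link_rot_rate` helper is a hypothetical name used only for this check.

```python
def link_rot_rate(failures, still_ok):
    """Fraction of sampled URLs that no longer serve the original content."""
    total = failures + still_ok  # assumption: the two groups cover the whole sample
    return failures / total


rate = link_rot_rate(failures=205, still_ok=21)
print(f"{rate:.0%}")  # 205 / 226, rounded to the nearest percent
```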

Four short links: 27 August 2014

Discourse 1.0, Programmable Matter, Versioned Databases, and What Humans Learned About Machine Learning

  1. Discourse turns 1.0 — community/forum software that doesn’t suck.
  2. Programmable Matter (IEEE Spectrum) — recap of where research is going in this area.
  3. Liquibase: source control for your database. Apache 2.0 licensed.
  4. A Few Useful Things to Know About Machine Learning (PDF) — This article summarizes twelve key lessons that machine learning researchers and practitioners have learned. These include pitfalls to avoid, important issues to focus on, and answers to common questions. My fave: First-timers are often surprised by how little time in a machine learning project is spent actually doing machine learning. But it makes sense if you consider how time-consuming it is to gather data, integrate it, clean it and pre-process it, and how much trial and error can go into feature design.