"machine learning" entries

Four short links: 15 December 2014

Transferable Learning, At-Scale Telemetry, Ugly DRM, and Fast Packet Processing

  1. How Transferable Are Features in Deep Neural Networks? — (answer: “very”). A final surprising result is that initializing a network with transferred features from almost any number of layers can produce a boost to generalization that lingers even after fine-tuning to the target dataset. (via Pete Warden)
  2. Introducing Atlas: Netflix’s Primary Telemetry Platform — nice solution to the problems that many have, at a scale that few have.
  3. The Many Facades of DRM (PDF) — Modular software systems are designed to be broken into independent pieces. Each piece has a clear boundary and well-defined interface for ‘hooking’ into other pieces. Progress in most technologies accelerates once systems have achieved this state. But clear boundaries and well-defined interfaces also make a technology easier to attack, break, and reverse-engineer. Well-designed DRMs have very fuzzy boundaries and are designed to have very non-standard interfaces. The examples of the uglified DRM code are inspiring.
  4. DPDK — a set of libraries and drivers for fast packet processing […] to: receive and send packets within the minimum number of CPU cycles (usually less than 80 cycles); develop fast packet capture algorithms (tcpdump-like); run third-party fast path stacks.
Four short links: 8 December 2014

Systemic Improvement, Chinese Trends, Deep Learning, and Technical Debt

  1. Reith Lectures — this year’s lectures are by Atul Gawande, talking about preventable failure and systemic improvement — topics of particular relevance to devops cultural devotees. (via BoingBoing)
  2. Chinese Mobile App UI Trends — interesting differences between US and China. Phone number authentication interested me: You key in your number and receive a confirmation code via SMS. Here, all apps offer this type of phone number registration/login (if not prefer it). This also applies to websites, even those without apps. (via Matt Webb)
  3. Large Scale Deep Learning (PDF) — Jeff Dean from Google. Starts easy! Starts.
  4. Machine Learning: The High-Interest Credit Card of Technical Debt (PDF) — Google research paper on the ways in which machine learning can create problems rather than solve them.
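The phone number login described in item 2 (key in a number, receive a confirmation code by SMS, type it back) can be sketched roughly as follows. This is a minimal illustration, not any particular app's implementation: the in-memory `_pending` store, the five-minute lifetime, the six-digit length, and the function names `issue_code` and `verify_code` are all assumptions, and actual SMS delivery is left out.

```python
import hmac
import secrets
import time

CODE_TTL_SECONDS = 300  # assumed validity window, not taken from the article

# Hypothetical in-memory store: phone number -> (code, expiry timestamp)
_pending = {}


def issue_code(phone, now=None):
    """Create a 6-digit one-time code for `phone` and record when it expires."""
    now = time.time() if now is None else now
    code = f"{secrets.randbelow(10**6):06d}"
    _pending[phone] = (code, now + CODE_TTL_SECONDS)
    return code  # a real service would pass this to an SMS gateway instead


def verify_code(phone, submitted, now=None):
    """Check a submitted code; any attempt consumes the pending code."""
    now = time.time() if now is None else now
    entry = _pending.pop(phone, None)
    if entry is None:
        return False
    code, expires_at = entry
    if now > expires_at:
        return False
    # Constant-time comparison avoids leaking how many digits matched.
    return hmac.compare_digest(code.encode(), submitted.encode())
```

Consuming the code on every attempt is a deliberate simplification here: a wrong guess forces a fresh code, which limits brute-force attempts at the cost of some user friction.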
Four short links: 4 December 2014

Click to Captcha, Managing Hackers, Easy Ordering, and Inside Ad Auctions

  1. One Click Captcha (Wired) — Google’s new Captcha tech is just a checkbox: “I am not a robot”. Instead of depending upon the traditional distorted word test, Google’s “reCaptcha” examines cues every user unwittingly provides: IP addresses and cookies provide evidence that the user is the same friendly human Google remembers from elsewhere on the Web. And Shet says even the tiny movements a user’s mouse makes as it hovers and approaches a checkbox can help reveal an automated bot.
  2. The Responsive Enterprise: Embracing the Hacker Way (ACM) — Letting developers wander around without clear goals in the vastness of the software universe of all computable functions is one of the major reasons why projects fail, not because of lack of process or planning. I like all of this, although at times it can be a little like what I imagine it would be like if Cory Doctorow wrote a management textbook. (via Greg Linden)
  3. Pizza Hut Tests Ordering via Eye-Tracking — The digital menu shows diners a canvas of 20 toppings and builds their pizza, from one of 4,896 combinations, based on which toppings they looked at longest.
  4. How Browsers Get to Know You in Milliseconds (Andy Oram) — breaks down info exchange, data exchange, timing, even business relationships for ad auctions. Augment understanding of the user from third-party data (10 milliseconds). These third parties are the companies that accumulate information about our purchasing habits. The time allowed for them to return data is so short that they often can’t spare time for network transmission, and instead co-locate at the AppNexus server site. In fact, according to Magnusson, the founders of AppNexus created a cloud server before opening their exchange.
Four short links: 6 November 2014

Javascript Testing, Dark Data, Webapp Design, and Design Trumps Data

  1. Karma — kick-ass open source Javascript test environment.
  2. The Dark Market for Personal Data (NYTimes) — can buy lists of victims of sexual assault, of impulse buyers, of people with sexually transmitted disease, etc. The cost of a false positive when those lists are used for marketing is less than the cost of a false positive when banks use the lists to decide whether you’re a credit risk. The lists fall between the cracks in privacy legislation; essentially, the compilation and use of lists of people are unregulated territory.
  3. 7 Principles of Rich Web Applications — “rich web applications” sounds like 2007 wants its ideas back, but the content is modern and useful. Predict behaviour for negative latency.
  4. Collaborative Filtering at LinkedIn (PDF) — This paper presents LinkedIn’s horizontal collaborative filtering infrastructure, known as browsemaps. Great lessons learned, including context and presentation of browsemaps or any recommendation is paramount for a truly relevant user experience. That is, design and presentation represents the largest ROI, with data engineering being a second, and algorithms last. (via Greg Linden)
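For readers unfamiliar with the collaborative filtering mentioned in item 4, here is a toy sketch of the common "people who viewed this also viewed" idea, built from co-occurrence counts. It is an assumption-laden illustration, not LinkedIn's browsemap implementation: the session data, the function names `build_coview_counts` and `also_viewed`, and the plain count-based ranking are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_coview_counts(sessions):
    """Count how often each pair of items is viewed in the same session."""
    counts = defaultdict(Counter)
    for viewed in sessions:
        # De-duplicate within a session so repeat views do not inflate counts.
        for a, b in combinations(sorted(set(viewed)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def also_viewed(counts, item, k=3):
    """Return up to k items most often co-viewed with `item` (ties by name)."""
    ranked = sorted(counts.get(item, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:k]]


# Hypothetical browsing sessions, each a list of viewed profile names.
sessions = [
    ["alice", "bob", "carol"],
    ["alice", "bob"],
    ["bob", "dave"],
    ["alice", "carol"],
]
counts = build_coview_counts(sessions)
print(also_viewed(counts, "alice"))
```

Raw counts favour popular items; a production system would typically normalise them, and, as the paper's lesson suggests, how the list is presented may matter more than the ranking function.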

Challenges facing predictive APIs

Solutions to a number of problems must be found to unlock PAPI value.

In November, the first International Conference on Predictive APIs and Apps will take place in Barcelona, just ahead of Strata Barcelona. This event will bring together those who are building intelligent web services (sometimes called Machine Learning as a Service) with those who would like to use these services to build predictive apps, which, as defined by Forrester, deliver “the right functionality and content at the right time, for the right person, by continuously learning about them and predicting what they’ll need.”

This is a very exciting area. Machine learning of various sorts is revolutionizing many areas of business, and predictive services like the ones at the center of predictive APIs (PAPIs) have the potential to bring these capabilities to an even wider range of applications. I co-founded one of the first companies in this space (acquired by Salesforce in 2012), and I remain optimistic about the future of these efforts. But the field as a whole faces a number of challenges, for which the answers are neither easy nor obvious, that must be addressed before this value can be unlocked.

In the remainder of this post, I’ll enumerate what I see as the most pressing issues. I hope that the speakers and attendees at PAPIs will keep these in mind as they map out the road ahead. Read more…

Four short links: 30 September 2014

Continuous Testing, Programmable Bees, Deep Learning on GPUs, and Silk Road Numbers

  1. Continuously Testing Infrastructure — “infrastructure as code”. I can’t figure out whether what I feel are thrills or chills.
  2. Engineer Sees Big Possibilities in Micro-robots, Including Programmable Bees (National Geographic) — He and fellow researchers devised novel techniques to fabricate, assemble, and manufacture the miniature machines, each with a housefly-size thorax, three-centimeter (1.2-inch) wingspan, and weight of just 80 milligrams (.0028 ounces). The latest prototype rises on a thread-thin tether, flaps its wings 120 times a second, hovers, and flies along preprogrammed paths. (via BoingBoing)
  3. cuDNN — NVIDIA’s library of primitives for deep neural networks (on GPUs, natch). Not open source (registerware).
  4. Analysing Trends in Silk Road 2.0 — If, indeed every sale can map to a transaction, some vendors are doing huge amounts of business through mail order drugs. While the number is small, if we sum up all the product reviews x product prices, we get a huge number of USD $20,668,330.05. REMEMBER! This is on Silk Road 2.0 with a very small subset of their entire inventory. A peek into a largely invisible economy.
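The estimate quoted in item 4 (sum of product reviews x product prices) is simple arithmetic under the stated assumption that every review maps to one sale. A minimal sketch of that calculation follows; the listings are made-up placeholder values, not data from the analysis, and `estimate_revenue` is a hypothetical helper name.

```python
from decimal import Decimal


def estimate_revenue(listings):
    """Sum price * review_count, treating each review as exactly one sale."""
    return sum(
        (Decimal(price) * reviews for price, reviews in listings),
        Decimal("0"),
    )


# Hypothetical (price in USD, number of reviews) pairs, for illustration only.
listings = [("19.99", 3), ("250.00", 2)]
print(estimate_revenue(listings))
```

Because not every buyer leaves a review and a review may cover more than one unit, a figure produced this way is a rough estimate rather than a measured total.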
Four short links: 29 September 2014

Feedback Surprises, Ownership Changes, Teaching Lessons, and 3D Retail

  1. How Community Feedback Shapes Behaviour (PDF) — Not only do authors of negatively-evaluated content contribute more, but also their future posts are of lower quality, and are perceived by the community as such. Moreover, these authors are more likely to subsequently evaluate their fellow users negatively, percolating these effects through the community. In contrast, positive feedback does not carry similar effects, and neither encourages rewarded authors to write more, nor improves the quality of their posts. Interestingly, the authors that receive no feedback are most likely to leave a community. Furthermore, a structural analysis of the voter network reveals that evaluations polarize the community the most when positive and negative votes are equally split.
  2. When Everything Works Like Your Cell Phone (The Atlantic) — our relationship to ownership is about to undergo a wild transformation.
  3. Teaching Me Softly — article of anecdotes drawing parallels between case studies in machine learning and things we know about human learning.
  4. SuperAwesome Me (3D Print) — Walmart to install 3d scanning booths and 3d printers so you can put your own head on a Hasbro action figure. Hasbro have the religion: they also paired with Shapeways for superfanart.com. (via John Battelle)
Four short links: 26 September 2014

Good Communities, AI Games, Design Process, and Web Server Library

  1. 15 Lessons from 15 Years of Blogging (Anil Dash) — If your comments are full of assholes, it’s your fault. Good communities don’t just happen by accident.
  2. Replicating DeepMind — open source attempt to build deep learning network that can play Atari games. (via RoboHub)
  3. ToyTalk — fantastic iterative design process for the product (see the heading “A Bit of Trickery”)
  4. h2o — an optimized HTTP server implementation that can be used either as a standalone server or a library.
Four short links: 19 September 2014

Deep Learning Bibliography, Go Playground, Tweet-a-Program, and Memory Management

  1. Deep Learning Bibliography — an annotated bibliography of recent publications (2014-) related to Deep Learning.
  2. Inside the Go Playground — on safely offering a REPL over the web to strangers.
  3. Wolfram Tweet-a-Program — clever marketing trick, and reminiscent of Perl Golf-style “how much can you fit into how little” contests.
  4. Memory Management Reference — almost all you ever wanted to know about memory management.
Four short links: 15 September 2014

Weird Machines, Libraries May Scan, Causal Effects, and Crappy Dashboards

  1. The Care and Feeding of Weird Machines Found in Executable Metadata (YouTube) — talk from the 29th Chaos Communication Congress, on tricking the ELF linker/loader into arbitrary computation from the metadata supplied. Yes, there’s a brainfuck compiler that turns code into metadata which is then, through a supernatural mix of pixies, steam engines, and binary, executed. This will make your brain leak. Weird machines are everywhere.
  2. European Libraries May Digitise Books Without Permission — “The right of libraries to communicate, by dedicated terminals, the works they hold in their collections would risk being rendered largely meaningless, or indeed ineffective, if they did not have an ancillary right to digitize the works in question,” the court said. Even if the rights holder offers a library the possibility of licensing his works on appropriate terms, the library can use the exception to publish works on electronic terminals, the court ruled. “Otherwise, the library could not realize its core mission or promote the public interest in promoting research and private study,” it said.
  3. CausalImpact (GitHub) — Google’s R package for estimating the causal effect of a designed intervention on a time series. (via Google Open Source Blog)
  4. Laws of Crappy Dashboards — (caution, NSFW language … “crappy” is my paraphrase) so true. Not talking to users will result in a [crappy] dashboard. You don’t know if the dashboard is going to be useful. But you don’t talk to the users to figure it out. Or you just show it to them for a minute (with someone else’s data), never giving them a chance to figure out what the hell they could do with it if you gave it to them.