"machine learning" entries

Cheap sensors, fast networks, and distributed computing

The history of computing has been a constant pendulum — that pendulum is now swinging back toward distribution.

Editor’s note: this is an excerpt from our new report Data: Emerging Trends and Technologies, by Alistair Croll. You can download the free report here.

The trifecta of cheap sensors, fast networks, and distributed computing is changing how we work with data. But making sense of all that data takes help, which is arriving in the form of machine learning. Here’s one view of how that might play out.

Clouds, edges, fog, and the pendulum of distributed computing

The history of computing has been a constant pendulum, swinging between centralization and distribution.

The first computers filled rooms, and operators were physically within them, switching toggles and turning wheels. Then came mainframes, which were centralized, with dumb terminals.

As the cost of computing dropped and the applications became more democratized, user interfaces mattered more. The smarter clients at the edge became the first personal computers; many broke free of the network entirely. The client got the glory; the server merely handled queries.

Once the web arrived, we centralized again. The LAMP stack (Linux, Apache, MySQL, PHP) sat buried deep inside data centers, with the computer at the other end of the connection relegated to little more than a smart terminal rendering HTML. Load balancers sprayed traffic across thousands of cheap machines. Eventually, the web turned from static sites to complex software-as-a-service (SaaS) applications.

Then the pendulum swung back to the edge, and the clients got smart again. First with AJAX, Java, and Flash; then in the form of mobile apps, where the smartphone or tablet did most of the hard work and the back end was a communications channel for reporting the results of local action.

Four short links: 16 December 2014

Memory Management, Stream Processing, Robot's Google, and Emotive Words

  1. Effectively Managing Memory at Gmail Scale — how they gathered data, how JavaScript memory management works, and what they did to nail down leaks.
  2. tigon — an open-source, real-time, low-latency, high-throughput stream processing framework.
  3. Robo Brain — machine knowledge of the real world for robots. (via MIT Technology Review)
  4. The Structure and Interpretation of the Computer Science Curriculum — convincing argument for teaching intro to programming with Scheme, but not using the classic text SICP.

Update: the original fourth link to Depeche Mood led only to a README on GitHub; we’ve replaced it with a new link.

Four short links: 15 December 2014

Transferable Learning, At-Scale Telemetry, Ugly DRM, and Fast Packet Processing

  1. How Transferable Are Features in Deep Neural Networks? — (answer: “very”). A final surprising result is that initializing a network with transferred features from almost any number of layers can produce a boost to generalization that lingers even after fine-tuning to the target dataset. (via Pete Warden)
  2. Introducing Atlas: Netflix’s Primary Telemetry Platform — nice solution to the problems that many have, at a scale that few have.
  3. The Many Facades of DRM (PDF) — Modular software systems are designed to be broken into independent pieces. Each piece has a clear boundary and well-defined interface for ‘hooking’ into other pieces. Progress in most technologies accelerates once systems have achieved this state. But clear boundaries and well-defined interfaces also make a technology easier to attack, break, and reverse-engineer. Well-designed DRMs have very fuzzy boundaries and are designed to have very non-standard interfaces. The examples of the uglified DRM code are inspiring.
  4. DPDK — a set of libraries and drivers for fast packet processing […] to: receive and send packets within the minimum number of CPU cycles (usually less than 80 cycles); develop fast packet capture algorithms (tcpdump-like); run third-party fast path stacks.

Four short links: 8 December 2014

Systemic Improvement, Chinese Trends, Deep Learning, and Technical Debt

  1. Reith Lectures — this year’s lectures are by Atul Gawande, talking about preventable failure and systemic improvement — topics of particular relevance to devops cultural devotees. (via BoingBoing)
  2. Chinese Mobile App UI Trends — interesting differences between US and China. Phone number authentication interested me: You key in your number and receive a confirmation code via SMS. Here, all apps offer this type of phone number registration/login (if not prefer it). This also applies to websites, even those without apps. (via Matt Webb)
  3. Large Scale Deep Learning (PDF) — Jeff Dean from Google. Starts easy! Starts.
  4. Machine Learning: The High-Interest Credit Card of Technical Debt (PDF) — Google research paper on the ways in which machine learning can create problems rather than solve them.

Four short links: 4 December 2014

Click to Captcha, Managing Hackers, Easy Ordering, and Inside Ad Auctions

  1. One Click Captcha (Wired) — Google’s new Captcha tech is just a checkbox: “I am not a robot”. Instead of depending upon the traditional distorted word test, Google’s “reCaptcha” examines cues every user unwittingly provides: IP addresses and cookies provide evidence that the user is the same friendly human Google remembers from elsewhere on the Web. And Shet says even the tiny movements a user’s mouse makes as it hovers and approaches a checkbox can help reveal an automated bot.
  2. The Responsive Enterprise: Embracing the Hacker Way (ACM) — Letting developers wander around without clear goals in the vastness of the software universe of all computable functions is one of the major reasons why projects fail, not because of lack of process or planning. I like all of this, although at times it can be a little like what I imagine it would be like if Cory Doctorow wrote a management textbook. (via Greg Linden)
  3. Pizza Hut Tests Ordering via Eye-Tracking — The digital menu shows diners a canvas of 20 toppings and builds their pizza, from one of 4,896 combinations, based on which toppings they looked at longest.
  4. How Browsers Get to Know You in Milliseconds (Andy Oram) — breaks down info exchange, data exchange, timing, even business relationships for ad auctions. Augment understanding of the user from third-party data (10 milliseconds). These third parties are the companies that accumulate information about our purchasing habits. The time allowed for them to return data is so short that they often can’t spare time for network transmission, and instead co-locate at the AppNexus server site. In fact, according to Magnusson, the founders of AppNexus created a cloud server before opening their exchange.

Four short links: 6 November 2014

JavaScript Testing, Dark Data, Webapp Design, and Design Trumps Data

  1. Karma — kick-ass open source JavaScript test environment.
  2. The Dark Market for Personal Data (NYTimes) — can buy lists of victims of sexual assault, of impulse buyers, of people with sexually transmitted diseases, etc. The cost of a false positive when those lists are used for marketing is less than the cost of a false positive when banks use the lists to decide whether you’re a credit risk. The lists fall between the cracks in privacy legislation; essentially, the compilation and use of lists of people are unregulated territory.
  3. 7 Principles of Rich Web Applications — “rich web applications” sounds like 2007 wants its ideas back, but the content is modern and useful. Predict behaviour for negative latency.
  4. Collaborative Filtering at LinkedIn (PDF) — This paper presents LinkedIn’s horizontal collaborative filtering infrastructure, known as browsemaps. Great lessons learned, including context and presentation of browsemaps or any recommendation is paramount for a truly relevant user experience. That is, design and presentation represents the largest ROI, with data engineering being a second, and algorithms last. (via Greg Linden)
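
For readers unfamiliar with the idea behind that last link, here is a minimal sketch of one common form of item-to-item collaborative filtering: counting how often two items are viewed by the same member. It is an illustration only, not LinkedIn's browsemaps implementation; the member IDs, item names, and the functions `co_view_counts` and `also_viewed` are all made up for this example.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical browsing histories (member -> items viewed); not real data.
histories = {
    "m1": {"a", "b", "c"},
    "m2": {"a", "b"},
    "m3": {"b", "c"},
    "m4": {"a", "d"},
}


def co_view_counts(histories):
    """Count, for every pair of items, how many members viewed both."""
    counts = defaultdict(Counter)
    for items in histories.values():
        for x, y in combinations(sorted(items), 2):
            counts[x][y] += 1
            counts[y][x] += 1
    return counts


def also_viewed(counts, item, k=2):
    """Return up to k items most often co-viewed with `item`.

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(counts[item].items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:k]]


counts = co_view_counts(histories)
print(also_viewed(counts, "a"))
```

Raw co-occurrence counts like these are only the algorithmic core; the paper's own lesson, as quoted above, is that context and presentation matter more than the algorithm.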

Challenges facing predictive APIs

Solutions to a number of problems must be found to unlock PAPI value.

In November, the first International Conference on Predictive APIs and Apps will take place in Barcelona, just ahead of Strata Barcelona. This event will bring together those who are building intelligent web services (sometimes called Machine Learning as a Service) with those who would like to use these services to build predictive apps, which, as defined by Forrester, deliver “the right functionality and content at the right time, for the right person, by continuously learning about them and predicting what they’ll need.”

This is a very exciting area. Machine learning of various sorts is revolutionizing many areas of business, and predictive services like the ones at the center of predictive APIs (PAPIs) have the potential to bring these capabilities to an even wider range of applications. I co-founded one of the first companies in this space (acquired by Salesforce in 2012), and I remain optimistic about the future of these efforts. But the field as a whole faces a number of challenges, for which the answers are neither easy nor obvious, that must be addressed before this value can be unlocked.

In the remainder of this post, I’ll enumerate what I see as the most pressing issues. I hope that the speakers and attendees at PAPIs will keep these in mind as they map out the road ahead.

Four short links: 30 September 2014

Continuous Testing, Programmable Bees, Deep Learning on GPUs, and Silk Road Numbers

  1. Continuously Testing Infrastructure — “infrastructure as code”. I can’t figure out whether what I feel are thrills or chills.
  2. Engineer Sees Big Possibilities in Micro-robots, Including Programmable Bees (National Geographic) — He and fellow researchers devised novel techniques to fabricate, assemble, and manufacture the miniature machines, each with a housefly-size thorax, three-centimeter (1.2-inch) wingspan, and weight of just 80 milligrams (.0028 ounces). The latest prototype rises on a thread-thin tether, flaps its wings 120 times a second, hovers, and flies along preprogrammed paths. (via BoingBoing)
  3. cuDNN — NVIDIA’s library of primitives for deep neural networks (on GPUs, natch). Not open source (registerware).
  4. Analysing Trends in Silk Road 2.0 — If, indeed every sale can map to a transaction, some vendors are doing huge amounts of business through mail order drugs. While the number is small, if we sum up all the product reviews x product prices, we get a huge number of USD $20,668,330.05. REMEMBER! This is on Silk Road 2.0 with a very small subset of their entire inventory. A peek into a largely invisible economy.
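
The estimate quoted in the Silk Road item (product reviews x product prices, summed over listings) can be sketched in a few lines. This is a rough illustration under one stated assumption, that each review stands for exactly one sale; the listings below and the function name `estimated_revenue` are invented for the example and are not taken from the original analysis.

```python
from decimal import Decimal

# Hypothetical listings as (price in USD, number of reviews); not real data.
listings = [
    (Decimal("49.99"), 12),
    (Decimal("120.00"), 3),
    (Decimal("15.50"), 40),
]


def estimated_revenue(listings):
    """Sum price x review count, assuming one review equals one sale."""
    return sum((price * reviews for price, reviews in listings), Decimal("0"))


total = estimated_revenue(listings)
print(f"USD ${total:,.2f}")  # prints "USD $1,579.88"
```

Decimal is used instead of float so that currency sums stay exact to the cent. The one-review-per-sale assumption cuts both ways: buyers who never leave a review push the true figure higher, while price changes after the sale can push it either way.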

Four short links: 29 September 2014

Feedback Surprises, Ownership Changes, Teaching Lessons, and 3D Retail

  1. How Community Feedback Shapes Behaviour (PDF) — Not only do authors of negatively-evaluated content contribute more, but also their future posts are of lower quality, and are perceived by the community as such. Moreover, these authors are more likely to subsequently evaluate their fellow users negatively, percolating these effects through the community. In contrast, positive feedback does not carry similar effects, and neither encourages rewarded authors to write more, nor improves the quality of their posts. Interestingly, the authors that receive no feedback are most likely to leave a community. Furthermore, a structural analysis of the voter network reveals that evaluations polarize the community the most when positive and negative votes are equally split.
  2. When Everything Works Like Your Cell Phone (The Atlantic) — our relationship to ownership is about to undergo a wild transformation.
  3. Teaching Me Softly — article of anecdotes drawing parallels between case studies in machine learning and things we know about human learning.
  4. SuperAwesome Me (3D Print) — Walmart to install 3D scanning booths and 3D printers so you can put your own head on a Hasbro action figure. Hasbro have the religion: they also paired with Shapeways for superfanart.com. (via John Battelle)

Four short links: 26 September 2014

Good Communities, AI Games, Design Process, and Web Server Library

  1. 15 Lessons from 15 Years of Blogging (Anil Dash) — If your comments are full of assholes, it’s your fault. Good communities don’t just happen by accident.
  2. Replicating DeepMind — open source attempt to build a deep learning network that can play Atari games. (via RoboHub)
  3. ToyTalk — fantastic iterative design process for the product (see the heading “A Bit of Trickery”)
  4. h2o — an optimized HTTP server implementation that can be used either as a standalone server or a library.