ENTRIES TAGGED "machine learning"

Four short links: 9 January 2014

Artificial Labour, Flexible Circuits, Vanishing Business Sexts, and Thermal Imaging

  1. Artificial Labour and Ubiquitous Interactive Machine Learning (Greg Borenstein) — in which design fiction, actual machine learning, legal discovery, and comics meet. One of the major themes to emerge in the 2H2K project is something we’ve taken to calling “artificial labor”. While we’re skeptical of the claims of artificial intelligence, we do imagine ever-more sophisticated forms of automation transforming the landscape of work and economics. Or, as John puts it, robots are Marxist.
  2. Clear Flexible Circuit on a Contact Lens (Smithsonian) — ends up about 1/60th as thick as a human hair, and is as flexible.
  3. Confide (GigaOm) — Enterprise SnapChat. A Sarbanes-Oxley Litigation Printer. It’s the Internet of Undiscoverable Things. Looking forward to Enterprise Omegle.
  4. FLIR One — thermal imaging in phone form factor, another sensor for your panopticon. (via DIY Drones)
Four short links: 3 January 2014

Mesh Networks, Collaborative LaTeX, Distributed Systems Book, and Reverse-Engineering Netflix Metadata

  1. Commotion — open source mesh networks.
  2. WriteLaTeX — online collaborative LaTeX editor. No, really. This exists. In 2014.
  3. Distributed Systems — free book for download, goal is to bring together the ideas behind many of the more recent distributed systems – systems such as Amazon’s Dynamo, Google’s BigTable and MapReduce, Apache’s Hadoop etc.
  4. How Netflix Reverse-Engineered Hollywood (The Atlantic) — Using large teams of people specially trained to watch movies, Netflix deconstructed Hollywood. They paid people to watch films and tag them with all kinds of metadata. This process is so sophisticated and precise that taggers receive a 36-page training document that teaches them how to rate movies on their sexually suggestive content, goriness, romance levels, and even narrative elements like plot conclusiveness.
Four short links: 30 December 2013

Pattern Recognition, MicroSD Vulnerability, Security Talks, and IoT List

  1. tooldiag — a collection of methods for statistical pattern recognition. Implemented in C.
  2. Hacking MicroSD Cards (Bunnie Huang) — In my explorations of the electronics markets in China, I’ve seen shop keepers burning firmware on cards that “expand” the capacity of the card — in other words, they load a firmware that reports the capacity of a card is much larger than the actual available storage. The fact that this is possible at the point of sale means that most likely, the update mechanism is not secured. MicroSD cards come with embedded microcontrollers whose firmware can be exploited.
  3. 30c3 — recordings from the 30th Chaos Communication Congress.
  4. IOT Companies, Products, Devices, and Software by Sector (Mike Nicholls) — astonishing amount of work in the space, especially given this list is inevitably incomplete.

Six reasons why I recommend scikit-learn

It's an extensive, well-documented, and accessible curated library of machine-learning models

I use a variety of tools for advanced analytics; most recently I’ve been using Spark (and MLlib), R, scikit-learn, and GraphLab. When I need to get something done quickly, I’ve been turning to scikit-learn for my first-pass analysis. For access to high-quality, easy-to-use implementations of popular algorithms, scikit-learn is a great place to start. So much so that I often encourage new and seasoned data scientists to try it whenever they’re faced with analytics projects that have short deadlines.

I recently spent a few hours with one of scikit-learn’s core contributors, Olivier Grisel. We had a free-flowing discussion where we talked about machine learning, data science, programming languages, big data, Paris, and … scikit-learn! Along the way, I was reminded why I’ve come to use (and admire) the scikit-learn project.

Commitment to documentation and usability
One of the reasons I started using scikit-learn was its nice documentation (which I hold up as an example for other communities and projects to emulate). Contributions to scikit-learn are required to include narrative examples along with sample scripts that run on small data sets. Besides good documentation, there are other core tenets that guide the community’s overall commitment to quality and usability: the global API is safeguarded, all public APIs are well documented, and, when appropriate, contributors are encouraged to expand the coverage of unit tests.

Models are chosen and implemented by a dedicated team of experts
scikit-learn’s stable of contributors includes experts in machine-learning and software development. A few of them (including Olivier) are able to devote a portion of their professional working hours to the project.

Covers most machine-learning tasks
Scan the list of things available in scikit-learn and you quickly realize that it includes tools for many of the standard machine-learning tasks (such as clustering, classification, and regression). And since scikit-learn is developed by a large community of developers and machine-learning experts, promising new techniques tend to be included in fairly short order.
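To make the "first pass" idea concrete, here is a minimal sketch of a classification task and a clustering task run through the same fit/predict pattern. It is not taken from the discussion with Olivier; the choice of the bundled iris data set, the two estimators, and the parameter values are assumptions made for illustration, and the import paths assume a current scikit-learn release.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Small bundled data set (assumed for illustration): 150 flowers, 4 features, 3 species.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Classification: fit on the training split, score on the held-out split.
classifier = LogisticRegression(max_iter=1000)
classifier.fit(X_train, y_train)
accuracy = classifier.score(X_test, y_test)

# Clustering: the same fit/predict pattern, but no labels are passed in.
clusterer = KMeans(n_clusters=3, n_init=10, random_state=0)
clusterer.fit(X)
cluster_ids = clusterer.predict(X)

print(f"held-out accuracy: {accuracy:.2f}")
print(f"distinct clusters: {len(set(cluster_ids.tolist()))}")
```

Swapping in a different model usually means changing only the line that constructs the estimator, which is a large part of why the library suits quick experiments.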

As a curated library, users don’t have to choose from multiple competing implementations of the same algorithm (a problem that R users often face). In order to assist users who struggle to choose between different models, Andreas Muller created a simple flowchart for users:

Read more…

Four short links: 27 December 2013

Dinosaur Tries to Suckle, Dashboard Design, Massive Visualizations, Massive Machine Learning

  1. Intel XDK — If you can write code in HTML5, CSS3 and JavaScript*, you can use the Intel® XDK to build an HTML5 web app or a hybrid app for all of the major app stores. It’s a .exe. What more do I need to say? FFS.
  2. Behind the Scenes of a Dashboard Design — the design decisions that go into displaying complex info.
  3. Superconductor — a web framework for creating data visualizations that scale to real-time interactions with up to 1,000,000 data points. It compiles to WebCL, WebGL, and web workers. (via Ben Lorica)
  4. BIDMach: Large-scale Learning with Zero Memory Allocation (PDF) — GPU-accelerated machine learning. In this paper we describe a caching approach that allows code with complex matrix (graph) expressions at massive scale, i.e. multi-terabyte data, with zero memory allocation after the initial setup. (via Siah)
Four short links: 26 December 2013

Inside the Nest Protect, Log Structures, Predictions, and In-Memory Data Cubes

  1. Nest Protect Teardown (Sparkfun) — initial teardown of another piece of domestic industrial Internet.
  2. Logs — The distributed log can be seen as the data structure which models the problem of consensus. Not kidding when he calls it “real-time data’s unifying abstraction”.
  3. Mining the Web to Predict Future Events (PDF) — Mining 22 years of news stories to predict future events. (via Ben Lorica)
  4. Nanocubes — a fast data structure for in-memory data cubes developed at the Information Visualization department at AT&T Labs – Research. Nanocubes can be used to explore datasets with billions of elements at interactive rates in a web browser, and in some cases it uses sufficiently little memory that you can run a nanocube in a modern-day laptop. (via Ben Lorica)
Four short links: 6 December 2013

AI Book, Science Superstars, Engineering Ethics, and Crowdsourced Science

  1. Society of Mind — Marvin Minsky’s book now Creative-Commons licensed.
  2. Collaboration, Stars, and the Changing Organization of Science: Evidence from Evolutionary Biology — The concentration of research output is declining at the department level but increasing at the individual level. [...] We speculate that this may be due to changing patterns of collaboration, perhaps caused by the rising burden of knowledge and the falling cost of communication, both of which increase the returns to collaboration. Indeed, we report evidence that the propensity to collaborate is rising over time. (via Sciblogs)
  3. As Engineers, We Must Consider the Ethical Implications of our Work (The Guardian) — applies to coders and designers as well.
  4. Eyewire — a game to crowdsource the mapping of 3D structure of neurons.
Four short links: 3 December 2013

  1. SAMOA — Yahoo!’s distributed streaming machine learning (ML) framework that contains a programming abstraction for distributed streaming ML algorithms. (via Introducing SAMOA)
  2. madlib — an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical and machine-learning methods for structured and unstructured data.
  3. Data Portraits: Connecting People of Opposing Views — Yahoo! Labs research to break the filter bubble. Connect people who disagree on issue X (e.g., abortion) but who agree on issue Y (e.g., Latin American interventionism), and present the differences and similarities visually (they used wordclouds). Our results suggest that organic visualisation may revert the negative effects of providing potentially sensitive content. (via MIT Technology Review)
  4. Disguise Detection — using Raspberry Pi, Arduino, and Python.
Four short links: 2 December 2013

Learning Machine Learning, Pokemon Coding, Drone Coverage, and Optimization Guide

  1. CalTech Machine Learning Video Library — a pile of video introductions to different machine learning concepts.
  2. Awesome Pokemon Hack — each inventory item has a number associated with it, they are kept at a particular memory location, and there’s a glitch in the game that executes code at that location so … you can program by assembling items and then triggering the glitch. SO COOL.
  3. Drone Footage of Bangkok Protests — including water cannons.
  4. The Mature Optimization Handbook — free, well thought out, and well written. My favourite line: In exchange for that saved space, you have created a hidden dependency on clairvoyance.
Four short links: 26 November 2013

Internet Cities, Defying Google Glass, Deep Learning Book, and Open Paleoanthropology

  1. The Death and Life of Great Internet Cities — “The sense that you were given some space on the Internet, and allowed to do anything you wanted to in that space, it’s completely gone from these new social sites,” said Scott. “Like prisoners, or livestock, or anybody locked in institution, I am sure the residents of these new places don’t even notice the walls anymore.”
  2. What You’re Not Supposed To Do With Google Glass (Esquire) — Maybe I can put these interruptions to good use. I once read that in ancient Rome, when a general came home victorious, they’d throw him a triumphal parade. But there was always a slave who walked behind the general, whispering in his ear to keep him humble. “You are mortal,” the slave would say. I’ve always wanted a modern nonslave version of this — a way to remind myself to keep perspective. And Glass seemed the first gadget that would allow me to do that. In the morning, I schedule a series of messages to e-mail myself throughout the day. “You are mortal.” “You are going to die someday.” “Stop being a selfish bastard and think about others.” (via BoingBoing)
  3. Neural Networks and Deep Learning — Chapter 1 up and free, and there’s an IndieGogo campaign to fund the rest.
  4. What We Know and Don’t Know — That highly controlled approach creates the misconception that fossils come out of the ground with labels attached. Or worse, that discovery comes from cloaked geniuses instead of open discussion. We’re hoping to combat these misconceptions by pursuing an open approach. This is today’s evolutionary science, not the science of fifty years ago. We’re here sharing science. [...] Science isn’t the answers, science is the process. Open science in paleoanthropology.