"machine learning" entries

Four short links: 4 February 2016

Shmoocon Video, Smart Watchstrap, Generalizing Learning, and Dataflow vs Spark

  1. Shmoocon 2016 Videos (Internet Archive) — videos of the talks from an astonishingly good security conference.
  2. TipTalk — Samsung watchstrap that is the smart device … put your finger in your ear to hear the call. You had me at put my finger in my ear. (via WaPo)
  3. Ecorithms — Leslie Valiant at Harvard broadened the concept of an algorithm into an “ecorithm,” which is a learning algorithm that “runs” on any system capable of interacting with its physical environment. Algorithms apply to computational systems, but ecorithms can apply to biological organisms or entire species. The concept draws a computational equivalence between the way that individuals learn and the way that entire ecosystems evolve. In both cases, ecorithms describe adaptive behavior in a mechanistic way.
  4. Dataflow/Beam vs Spark (Google Cloud) — To highlight the distinguishing features of the Dataflow model, we’ll be comparing code side-by-side with Spark code snippets. Spark has had a huge and positive impact on the industry thanks to doing a number of things much better than other systems had done before. But Dataflow holds distinct advantages in programming model flexibility, power, and expressiveness, particularly in the out-of-order processing and real-time session management arenas.
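
To make "out-of-order processing" and "session management" from item 4 concrete, here is a minimal batch sketch of session windowing in plain Python. It uses neither the Dataflow/Beam nor the Spark API; the `sessionize` function, the `gap` parameter, and the sample events are all hypothetical. Events are grouped by event time rather than arrival order, so a late-arriving record still lands in the right session. A real streaming system also needs watermarks and triggers, which this toy ignores.

```python
from collections import defaultdict


def sessionize(events, gap):
    """Group (key, event_time) pairs into per-key sessions.

    A session is a run of events whose consecutive event times are
    less than `gap` apart. Returns {key: [(start, end, count), ...]}.
    """
    by_key = defaultdict(list)
    for key, ts in events:
        by_key[key].append(ts)

    sessions = {}
    for key, stamps in by_key.items():
        stamps.sort()  # order by event time, not by arrival order
        runs = [[stamps[0]]]
        for ts in stamps[1:]:
            if ts - runs[-1][-1] < gap:
                runs[-1].append(ts)  # still inside the current session
            else:
                runs.append([ts])  # gap exceeded: start a new session
        sessions[key] = [(run[0], run[-1], len(run)) for run in runs]
    return sessions


# alice's event at t=14 arrives after her event at t=50 (out of order)
events = [("alice", 10), ("bob", 12), ("alice", 50), ("alice", 14), ("bob", 11)]
print(sessionize(events, gap=30))
```

With a gap of 30, alice's events at 10 and 14 form one session and the event at 50 starts another, even though 14 arrived last.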
Four short links: 3 February 2016

Security Forecast, Machine Learning for Defence, Retro PC Fonts, and Cognitive Psych Research

  1. Software Security Ideas Ahead of Their Time — astonishing email exchange from 1995 presaged a hell of a lot of security work.
  2. Doxxing Sherlock — Cory Doctorow’s ruminations on surveillance, Sherlock, and what he found in the Snowden papers. What he found included an outline of intelligence use of machine learning.
  3. Old-School PC Fonts — definitive collection of ripped-from-the-BIOS fonts from the various types of PCs. Your eyes will ache with nostalgia. (Or, if you’re a young gun, wondering how anybody wrote code with fonts like that) (my terminal font is VT220 because it makes me happy and productive)
  4. Cognitive Load: Brain Gems — We distill the latest behavioural economics & consumer psychology research down into helpful little brain gems.
Four short links: 2 February 2016

Fourth Industrial Revolution, Agent System, Evidence-Based Programming, and Deep Learning Service

  1. This is Not the Fourth Industrial Revolution (Slate) — the phrase “the fourth Industrial Revolution” has been around for more than 75 years. It first came into popular use in 1940.
  2. Huginn — MIT-licensed system for building agents that perform automated tasks for you online. They can read the Web, watch for events, and take actions on your behalf. Huginn’s Agents create and consume events, propagating them along a directed graph. Think of it as a hackable Yahoo! Pipes plus IFTTT on your own server.
  3. Evidence-Oriented Programming — design programming language syntax and features based on what research shows works. They tested Perl and Java and found them “apparently not detectably easier to use for novices than a language that my student at the time, Susanna Kiwala (formerly Siebert), created by essentially rolling dice and picking (ridiculous) symbols at random.”
  4. Deep Detect — open source deep learning service.
Four short links: 28 January 2016

Augmented Intelligence, Social Network Limits, Microsoft Research, and Google's Go

  1. Chimera (Paper a Day) — the authors summarise six main lessons learned while building Chimera: (1) Things break down at large scale; (2) Both learning and hand-crafted rules are critical; (3) Crowdsourcing is critical, but must be closely monitored; (4) Crowdsourcing must be coupled with in-house analysts and developers; (5) Outsourcing does not work at a very large scale; (6) Hybrid human-machine systems are here to stay.
  2. Do Online Social Media Remove Constraints That Limit the Size of Offline Social Networks? (Royal Society) — paper by Robin Dunbar. Answer: The data show that the size and range of online egocentric social networks, indexed as the number of Facebook friends, is similar to that of offline face-to-face networks.
  3. Microsoft Embedding Research — To break down the walls between its research group and the rest of the company, Microsoft reassigned about half of its more than 1,000 research staff in September 2014 to a new group called MSR NExT. Its focus is on projects with greater impact to the company rather than pure research. Meanwhile, the other half of Microsoft Research is getting pushed to find more significant ways it can contribute to the company’s products. The challenge is how to avoid short-term thinking from your research team. For instance, Facebook assigns some staff to focus on long-term research, and Google’s DeepMind group in London conducts pure AI research without immediate commercial considerations.
  4. Google’s Go-Playing AI — The key to AlphaGo is reducing the enormous search space to something more manageable. To do this, it combines a state-of-the-art tree search with two deep neural networks, each of which contains many layers with millions of neuron-like connections. One neural network, the “policy network,” predicts the next move, and is used to narrow the search to consider only the moves most likely to lead to a win. The other neural network, the “value network,” is then used to reduce the depth of the search tree — estimating the winner in each position in place of searching all the way to the end of the game.
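
The division of labour in item 4 (one function narrows the breadth of the search, another cuts its depth) can be sketched on a toy game. This is not AlphaGo's algorithm: it swaps Monte Carlo tree search for a plain depth-limited negamax, the game is a hypothetical Nim variant (take 1 to 3 stones, whoever takes the last stone wins), and the hand-written `policy` and `value` functions are stand-ins for trained networks.

```python
def legal_moves(stones):
    """Moves available in the toy game: take 1, 2, or 3 stones."""
    return [m for m in (1, 2, 3) if m <= stones]


def policy(stones):
    """Stand-in for a policy network: score each legal move.

    Hypothetical hand-written prior that prefers moves leaving the
    opponent a multiple of four stones.
    """
    return {m: (1.0 if (stones - m) % 4 == 0 else 0.1) for m in legal_moves(stones)}


def value(stones):
    """Stand-in for a value network: estimated outcome in [-1, 1]
    for the player about to move."""
    return -1.0 if stones % 4 == 0 else 1.0


def search(stones, depth, top_k=2):
    """Return (score, best_move) for the player to move."""
    if stones == 0:
        return -1.0, None  # opponent took the last stone: we lost
    if depth == 0:
        return value(stones), None  # value estimate replaces deeper search
    priors = policy(stones)
    # Breadth reduction: only expand the top_k moves the policy likes best.
    candidates = sorted(priors, key=priors.get, reverse=True)[:top_k]
    best_score, best_move = -float("inf"), None
    for move in candidates:
        child_score, _ = search(stones - move, depth - 1, top_k)
        score = -child_score  # the opponent's gain is our loss
        if score > best_score:
            best_score, best_move = score, move
    return best_score, best_move


print(search(10, depth=2))
```

From 10 stones the search picks taking 2, leaving the opponent on a multiple of four, after looking only two plies ahead and at only two moves per position.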
Four short links: 20 January 2016

Rules-Based Distributed Code, Open Source Face Recognition, Simulation w/Emoji, and Berkeley's AI Materials

  1. Experience with Rules-Based Programming for Distributed Concurrent Fault-Tolerant Code (A Paper a Day) — To demonstrate applicability outside of the RAMCloud system, the team also re-wrote the Hadoop Map-Reduce job scheduler (which uses a traditional event-based state machine approach) using rules. The original code has three state machines containing 34 states with 163 different transitions, about 2,250 lines of code in total. The rules-based re-implementation required 19 rules in 3 tasks with a total of 117 lines of code and comments. Rules-based systems are powerful and underused.
  2. OpenFace — open source face recognition software using deep neural networks.
  3. Simulating the World in Emoji — fun simulation environment in the browser.
  4. Berkeley’s Intro-to-AI Materials — We designed these projects with three goals in mind. The projects allow students to visualize the results of the techniques they implement. They also contain code examples and clear directions, but do not force students to wade through undue amounts of scaffolding. Finally, Pac-Man provides a challenging problem environment that demands creative solutions; real-world AI problems are challenging, and Pac-Man is, too.
Four short links: 18 January 2016

Machine Learning Technical Debt, Audio Matching, Self-Tracking Research, and Baidu's Open Source Deep Learning Code

  1. Hidden Technical Debt in Machine Learning Systems (PDF) — We explore several ML-specific risk factors to account for in system design. These include boundary erosion, entanglement, hidden feedback loops, undeclared consumers, data dependencies, configuration issues, changes in the external world, and a variety of system-level anti-patterns.
  2. Large-Scale Content-Based Matching of MIDI and Audio Files — We present a system that can efficiently match and align MIDI files to entries in a large corpus of audio content based solely on content, i.e., without using any metadata.
  3. Critical Social Research on Self-Tracking — I am currently working on an article that is a comprehensive review of both literatures, in the attempt to outline what each can contribute to understanding self-tracking as an ethos and a practice, and its wider sociocultural implications. Here is a reading list of the work from critical social researchers that I am aware of. Trigger warning: phrases like “The discursive construction of student subjectivities.”
  4. Warp-CTC — Baidu’s open source deep learning code. Connectionist Temporal Classification is a loss function useful for performing supervised learning on sequence data, without needing an alignment between input data and labels.
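
For readers wondering how a loss can work "without needing an alignment" (item 4): CTC sums the probability of every frame-by-frame path that collapses to the target labels once repeats are merged and blanks are dropped. The sketch below is a plain-Python version of that forward recursion. It is not Warp-CTC's API, and the function names and the tiny input are hypothetical; it also works in raw probabilities rather than log space, so it is only suitable for short toy inputs.

```python
import math


def ctc_probability(probs, labels, blank=0):
    """Probability of `labels` given per-frame symbol probabilities.

    probs[t][k] is the probability of symbol k at frame t.
    """
    # Interleave blanks: [a, b] -> [blank, a, blank, b, blank]
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    S, T = len(ext), len(probs)

    # alpha[s]: total probability of all paths ending at ext[s] so far
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]  # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]  # advance by one position
            # Skip the blank between two *different* labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # Valid paths end on the last label or on the trailing blank.
    result = alpha[S - 1]
    if S > 1:
        result += alpha[S - 2]
    return result


def ctc_loss(probs, labels, blank=0):
    """Negative log-likelihood of `labels` under the CTC model."""
    return -math.log(ctc_probability(probs, labels, blank))


# Two frames, two symbols (0 = blank, 1 = "a"), uniform probabilities.
frames = [[0.5, 0.5], [0.5, 0.5]]
print(ctc_probability(frames, [1]))
```

In the toy input, three of the four possible two-frame paths ("a a", "a blank", "blank a") collapse to the single label "a", so the probability is 3 × 0.25.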
Four short links: 5 January 2016

Inference with Privacy, RethinkDB Reliability, T-Mobile Choking Video, and Real-Time Streams

  1. Privacy-Preserving Inference of Social Relationships from Location Data (PDF) — utilizes an untrusted server and computes the building blocks to support various social relationship studies, without disclosing location information to the server and other untrusted parties. (via CCC Blog)
  2. Jepsen takes on Rethink — the glowingest review I’ve seen from Aphyr. As far as I can ascertain, RethinkDB’s safety claims are accurate.
  3. T-Mobile’s BingeOn ‘Optimization’ Is Just Throttling (EFF) — T-Mobile has claimed that this practice isn’t really “throttling,” but we disagree. It’s clearly not “optimization,” since T-Mobile doesn’t alter the actual content of the video streams in any way.
  4. qminer — BSD-licensed data analytics platform for processing large-scale, real-time streams containing structured and unstructured data.
Four short links: 29 December 2015

Security Talks, Multi-Truth Discovery, Math Books, and Geek Cultures

  1. 2015 CCC Videos — collected talks from the 32nd Chaos Communication Congress.
  2. An Integrated Bayesian Approach for Effective Multi-Truth Discovery (PDF) — Integrating data from multiple sources has been increasingly becoming commonplace in both Web and the emerging Internet of Things (IoT) applications to support collective intelligence and collaborative decision-making. Unfortunately, it is not unusual that the information about a single item comes from different sources, which might be noisy, out-of-date, or even erroneous. It is therefore of paramount importance to resolve such conflicts among the data and to find out which piece of information is more reliable.
  3. Direct Links to Free Springer Books — Springer released a lot of math books.
  4. A Psychological Exploration of Engagement in Geek Culture — Seven studies (N = 2354) develop the Geek Culture Engagement Scale (GCES) to quantify geek engagement and assess its relationships to theoretically relevant personality and individual differences variables. These studies present evidence that individuals may engage in geek culture in order to maintain narcissistic self-views (the great fantasy migration hypothesis), to fulfill belongingness needs (the belongingness hypothesis), and to satisfy needs for creative expression (the need for engagement hypothesis). Geek engagement is found to be associated with elevated grandiose narcissism, extraversion, openness to experience, depression, and subjective well-being across multiple samples.
Four short links: 23 December 2015

Software Leaders, Hadoop Ecosystem, GPS Spoofing, and Explaining Models

  1. Things Software Leaders Should Know (Ben Gracewood) — If you have people things and tech things on your to-do list, put the people things first on the list.
  2. The Hadoop Ecosystem — table of the different projects across the Hadoop ecosystem.
  3. Narcos GPS-Spoofing Border Drones — not only are the border drones expensive and ineffective, now they’re being tricked. Basic trade-off: more reliability or longer flight times?
  4. A Model Explanation System (PDF) — you can explain any machine-learned decision, though not necessarily the way the model came to the decision. Confused? This summary might help. Explainability is not a property of the model.
Four short links: 22 December 2015

Machine Poetry, Robo Script Kiddies, Big Data of Love, and Virtual Currency and the Nation State

  1. How Machines Write Poetry — Harmon would love to have writers or other experts judge FIGURE8’s work, too. Her online subjects tended to rate the similes better if they were obvious. “The snow continued like a heavy rain” got high scores, for example, even though Harmon thought this was quite a bad effort on FIGURE8’s part. She preferred “the snow falls like a dead cat,” which got only middling ratings from humans. “They might have been cat lovers,” she says. The FIGURE8 (PDF) system generates figurative language.
  2. The Decisions the Pentagon Wants to Leave to Robots — “You cannot have a human operator operating at human speed fighting back at determined cyber tech,” Work said. “You are going to need to have a learning machine that does that.” I for one welcome our new robot script kiddie overlords.
  3. Love in the Age of Big Data — Over decades, John has observed more than 3,000 couples longitudinally, discovering patterns of argument and subtle behaviors that can predict whether a couple would be happily partnered years later or unhappy or divorced. Turns out, “don’t be a jerk” is good advice for marriages, too. (via Cory Doctorow)
  4. National Security Implications of Virtual Currency (PDF) — Rand research report examining the potential for non-state actor deployment.