ShmooCon 2016 Videos (Internet Archive) — videos of the talks from an astonishingly good security conference.
TipTalk — Samsung watchstrap that is the smart device … put your finger in your ear to hear the call. You had me at put my finger in my ear. (via WaPo)
Ecorithms — Leslie Valiant at Harvard broadened the concept of an algorithm into an “ecorithm,” which is a learning algorithm that “runs” on any system capable of interacting with its physical environment. Algorithms apply to computational systems, but ecorithms can apply to biological organisms or entire species. The concept draws a computational equivalence between the way that individuals learn and the way that entire ecosystems evolve. In both cases, ecorithms describe adaptive behavior in a mechanistic way.
Dataflow/Beam vs Spark (Google Cloud) — To highlight the distinguishing features of the Dataflow model, we’ll be comparing code side-by-side with Spark code snippets. Spark has had a huge and positive impact on the industry thanks to doing a number of things much better than other systems had done before. But Dataflow holds distinct advantages in programming model flexibility, power, and expressiveness, particularly in the out-of-order processing and real-time session management arenas.
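To make the out-of-order and session points concrete, here is a minimal plain-Python sketch of session windowing by event time. It is not the Dataflow or Spark API: the `sessionize` function, the integer timestamps, and the 30-unit gap are all assumptions chosen for illustration.

```python
from collections import defaultdict

def sessionize(events, gap):
    """Group (user, event_time) pairs into per-user sessions.

    Events may arrive in any order; they are sorted by event time first.
    A new session starts when the time since the previous event exceeds `gap`.
    """
    by_user = defaultdict(list)
    for user, ts in events:
        by_user[user].append(ts)

    sessions = {}
    for user, stamps in by_user.items():
        stamps.sort()
        user_sessions = [[stamps[0]]]
        for ts in stamps[1:]:
            if ts - user_sessions[-1][-1] > gap:
                user_sessions.append([ts])      # gap exceeded: open a new session
            else:
                user_sessions[-1].append(ts)    # still within the current session
        sessions[user] = user_sessions
    return sessions

# Hypothetical events, deliberately out of order.
events = [("ann", 100), ("bob", 5), ("ann", 10), ("ann", 20), ("ann", 95)]
result = sessionize(events, gap=30)
```

A batch sort like this only works once all the data is in hand. A streaming system has to decide when a session is complete while late events may still arrive, which is the harder problem the linked post discusses.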
Old-School PC Fonts — definitive collection of ripped-from-the-BIOS fonts from the various types of PCs. Your eyes will ache with nostalgia. (Or, if you’re a young gun, you’ll wonder how anybody wrote code with fonts like that.) (My terminal font is VT220 because it makes me happy and productive.)
Cognitive Load: Brain Gems — We distill the latest behavioural economics & consumer psychology research down into helpful little brain gems.
Huginn — MIT-licensed system for building agents that perform automated tasks for you online. They can read the Web, watch for events, and take actions on your behalf. Huginn’s Agents create and consume events, propagating them along a directed graph. Think of it as a hackable Yahoo! Pipes plus IFTTT on your own server.
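The idea of agents that create and consume events along a directed graph can be sketched in a few lines. This is a toy in Python, not Huginn's own code or API (Huginn is a Ruby application). The `Agent` class, the `run` function, and the three example agents are hypothetical, and the sketch assumes the graph has no cycles.

```python
from collections import deque

class Agent:
    """A node in the graph: turns one incoming event into zero or more outgoing events."""
    def __init__(self, name, handler):
        self.name = name
        self.handler = handler      # function: event dict -> list of event dicts
        self.receivers = []         # downstream agents

    def connect(self, other):
        self.receivers.append(other)
        return other                # returning `other` allows chaining

def run(source, event):
    """Deliver `event` to `source` and propagate everything it triggers."""
    log = []
    queue = deque([(source, event)])
    while queue:
        agent, incoming = queue.popleft()
        for outgoing in agent.handler(incoming):
            log.append((agent.name, outgoing))
            for receiver in agent.receivers:
                queue.append((receiver, outgoing))
    return log

# Hypothetical pipeline: read temperatures, keep the hot ones, build a message.
watcher = Agent("watcher", lambda e: [{"temp": t} for t in e["readings"]])
hot_filter = Agent("hot_filter", lambda e: [e] if e["temp"] > 30 else [])
notifier = Agent("notifier", lambda e: [{"message": f"Hot: {e['temp']}"}])
watcher.connect(hot_filter).connect(notifier)

log = run(watcher, {"readings": [21, 35]})
```

Only the 35-degree reading makes it through the filter, so the notifier emits a single message.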
Evidence-Oriented Programming — design programming language syntax and features based on what research shows works. They tested Perl and Java and found them “apparently not detectably easier to use for novices than a language that my student at the time, Susanna Kiwala (formerly Siebert), created by essentially rolling dice and picking (ridiculous) symbols at random.”
Chimera (Paper a Day) — the authors summarise six main lessons learned while building Chimera: (1) Things break down at large scale; (2) Both learning and hand-crafted rules are critical; (3) Crowdsourcing is critical, but must be closely monitored; (4) Crowdsourcing must be coupled with in-house analysts and developers; (5) Outsourcing does not work at a very large scale; (6) Hybrid human-machine systems are here to stay.
Microsoft Embedding Research — To break down the walls between its research group and the rest of the company, Microsoft reassigned about half of its more than 1,000 research staff in September 2014 to a new group called MSR NExT. Its focus is on projects with greater impact to the company rather than pure research. Meanwhile, the other half of Microsoft Research is getting pushed to find more significant ways it can contribute to the company’s products. The challenge is how to avoid short-term thinking from your research team. For instance, Facebook assigns some staff to focus on long-term research, and Google’s DeepMind group in London conducts pure AI research without immediate commercial considerations.
Google’s Go-Playing AI — The key to AlphaGo is reducing the enormous search space to something more manageable. To do this, it combines a state-of-the-art tree search with two deep neural networks, each of which contains many layers with millions of neuron-like connections. One neural network, the “policy network,” predicts the next move, and is used to narrow the search to consider only the moves most likely to lead to a win. The other neural network, the “value network,” is then used to reduce the depth of the search tree — estimating the winner in each position in place of searching all the way to the end of the game.
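The two ways of shrinking the search described above can be illustrated on a much smaller game. The sketch below is not AlphaGo: it swaps in a depth-limited negamax for the real tree search, plays a take-1-to-3 stone game in which whoever takes the last stone wins, and uses hand-written `policy` and `value` functions as stand-ins for the two neural networks. All names and parameters are assumptions.

```python
def legal_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]

def policy(stones):
    """Stand-in for the policy network: rank moves, most promising first."""
    # Hand-written prior: prefer moves that leave the opponent a multiple of 4.
    return sorted(legal_moves(stones), key=lambda m: (stones - m) % 4)

def value(stones):
    """Stand-in for the value network: estimated result for the player to move."""
    return -1.0 if stones % 4 == 0 else 1.0

def search(stones, depth, top_k=2):
    """Negamax that expands only the policy's top_k moves and stops at `depth`."""
    if stones == 0:
        return -1.0, None               # opponent took the last stone: a loss
    if depth == 0:
        return value(stones), None      # estimate instead of playing to the end
    best_score, best_move = float("-inf"), None
    for move in policy(stones)[:top_k]:             # narrow the breadth
        score = -search(stones - move, depth - 1, top_k)[0]
        if score > best_score:
            best_score, best_move = score, move
    return best_score, best_move

score, move = search(stones=10, depth=3)
```

With 10 stones, the search picks the move that leaves 8, a multiple of 4, without ever expanding the third candidate move or playing any line to the end.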
Experience with Rules-Based Programming for Distributed Concurrent Fault-Tolerant Code (A Paper a Day) — To demonstrate applicability outside of the RAMCloud system, the team also re-wrote the Hadoop Map-Reduce job scheduler (which uses a traditional event-based state machine approach) using rules. The original code has three state machines containing 34 states with 163 different transitions, about 2,250 lines of code in total. The rules-based re-implementation required 19 rules in 3 tasks with a total of 117 lines of code and comments. Rules-based systems are powerful and underused.
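For a feel of the style, here is a minimal sketch of a rules-based task in Python. It is not code from the paper, RAMCloud, or Hadoop. Each rule is a hypothetical (name, condition, action) triple over a shared state dictionary, and the engine fires applicable rules until none apply.

```python
def run_rules(state, rules, max_steps=100):
    """Repeatedly fire the first rule whose condition holds, until none do."""
    fired = []
    for _ in range(max_steps):
        for name, condition, action in rules:
            if condition(state):
                action(state)
                fired.append(name)
                break
        else:
            return fired    # no rule applies: the task has reached its goal
    raise RuntimeError("rules did not converge")

# Hypothetical task: send a request, retry after a timeout, then finish.
rules = [
    ("send",
     lambda s: not s["sent"],
     lambda s: s.update(sent=True)),
    ("retry",
     lambda s: s["sent"] and s["timed_out"],
     lambda s: s.update(timed_out=False, retries=s["retries"] + 1)),
    ("finish",
     lambda s: s["sent"] and not s["timed_out"] and not s["done"],
     lambda s: s.update(done=True)),
]

state = {"sent": False, "timed_out": True, "retries": 0, "done": False}
fired = run_rules(state, rules)
```

There is no explicit state machine here: each rule looks only at the current state, so an unexpected event such as the timeout is handled by whichever rule's condition it makes true.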
OpenFace — open source face recognition software using deep neural networks.
Berkeley’s Intro-to-AI Materials — We designed these projects with three goals in mind. The projects allow students to visualize the results of the techniques they implement. They also contain code examples and clear directions, but do not force students to wade through undue amounts of scaffolding. Finally, Pac-Man provides a challenging problem environment that demands creative solutions; real-world AI problems are challenging, and Pac-Man is, too.
Hidden Technical Debt in Machine Learning Systems (PDF) — We explore several ML-specific risk factors to account for in system design. These include boundary erosion, entanglement, hidden feedback loops, undeclared consumers, data dependencies, configuration issues, changes in the external world, and a variety of system-level anti-patterns.
Critical Social Research on Self-Tracking — I am currently working on an article that is a comprehensive review of both literatures, in the attempt to outline what each can contribute to understanding self-tracking as an ethos and a practice, and its wider sociocultural implications. Here is a reading list of the work from critical social researchers that I am aware of. Trigger warning: phrases like “The discursive construction of student subjectivities.”
Warp-CTC — Baidu’s open source deep learning code. Connectionist Temporal Classification is a loss function useful for performing supervised learning on sequence data, without needing an alignment between input data and labels.
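To show what "no alignment needed" means, here is a small sketch of the CTC forward recursion in plain Python. It sums the probability of every frame-by-frame alignment that collapses to the target labels. This is not Warp-CTC's code or API: the function name is made up, and it works directly with probabilities for readability, whereas production implementations work in log space for numerical stability.

```python
import math

def ctc_neg_log_likelihood(probs, labels, blank=0):
    """probs[t][k] is the probability of symbol k at time step t.
    labels is the target sequence, without blanks."""
    # Interleave blanks: [a, b] -> [blank, a, blank, b, blank]
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    T, S = len(probs), len(ext)

    # alpha[s]: total probability of all partial alignments ending at ext[s].
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, T):
        prev = alpha
        alpha = [0.0] * S
        for s in range(S):
            total = prev[s]                     # stay on the same symbol
            if s >= 1:
                total += prev[s - 1]            # advance by one position
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += prev[s - 2]
            alpha[s] = total * probs[t][ext[s]]

    # Valid alignments end on the last label or the trailing blank.
    likelihood = alpha[-1] + (alpha[-2] if S > 1 else 0.0)
    return -math.log(likelihood)

# Two time steps, two symbols (0 = blank, 1 = "a"), uniform probabilities.
probs = [[0.5, 0.5], [0.5, 0.5]]
loss = ctc_neg_log_likelihood(probs, labels=[1])
```

In this example three alignments collapse to "a" (a-blank, blank-a, and a-a), each with probability 0.25, so the likelihood is 0.75 and the loss is its negative log.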
2015 CCC Videos — collected talks from the 32nd Chaos Communication Congress.
An Integrated Bayesian Approach for Effective Multi-Truth Discovery (PDF) — Integrating data from multiple sources has been increasingly becoming commonplace in both Web and the emerging Internet of Things (IoT) applications to support collective intelligence and collaborative decision-making. Unfortunately, it is not unusual that the information about a single item comes from different sources, which might be noisy, out-of-date, or even erroneous. It is therefore of paramount importance to resolve such conflicts among the data and to find out which piece of information is more reliable.
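The conflict-resolution problem can be sketched with a much simpler baseline than the paper's Bayesian multi-truth model: iterative weighted voting that assumes one true value per item. Everything below (the function name, the starting trust of 0.5, the example claims) is an assumption for illustration.

```python
from collections import defaultdict

def discover_truth(claims, rounds=10):
    """claims: list of (source, item, value). Returns (truths, source trust)."""
    sources = {source for source, _, _ in claims}
    trust = {source: 0.5 for source in sources}     # start with no opinion
    truths = {}
    for _ in range(rounds):
        # Step 1: each item takes the value with the highest trust-weighted vote.
        votes = defaultdict(lambda: defaultdict(float))
        for source, item, value in claims:
            votes[item][value] += trust[source]
        truths = {item: max(vals, key=vals.get) for item, vals in votes.items()}

        # Step 2: a source's trust is the share of its claims matching the truths.
        hits, totals = defaultdict(int), defaultdict(int)
        for source, item, value in claims:
            totals[source] += 1
            hits[source] += truths[item] == value
        trust = {source: hits[source] / totals[source] for source in sources}
    return truths, trust

# Hypothetical sources A, B, C making claims about capital cities.
claims = [
    ("A", "fr", "Paris"), ("B", "fr", "Paris"), ("C", "fr", "Lyon"),
    ("A", "au", "Canberra"), ("B", "au", "Canberra"), ("C", "au", "Sydney"),
    ("A", "nz", "Wellington"), ("C", "nz", "Auckland"),
]
truths, trust = discover_truth(claims)
```

The "nz" item starts as a one-against-one tie. It is settled because source C has already lost credibility on the other two items.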
A Psychological Exploration of Engagement in Geek Culture — Seven studies (N = 2354) develop the Geek Culture Engagement Scale (GCES) to quantify geek engagement and assess its relationships to theoretically relevant personality and individual differences variables. These studies present evidence that individuals may engage in geek culture in order to maintain narcissistic self-views (the great fantasy migration hypothesis), to fulfill belongingness needs (the belongingness hypothesis), and to satisfy needs for creative expression (the need for engagement hypothesis). Geek engagement is found to be associated with elevated grandiose narcissism, extraversion, openness to experience, depression, and subjective well-being across multiple samples.
Narcos GPS-Spoofing Border Drones — not only are the border drones expensive and ineffective, now they’re being tricked. Basic trade-off: more reliability or longer flight times?
A Model Explanation System (PDF) — you can explain any machine-learned decision, though not necessarily the way the model came to the decision. Confused? This summary might help. Explainability is not a property of the model.
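One way to see the distinction: treat the model as a black box and probe it from outside. The sketch below is not the paper's method. It is a simple perturbation test that reports which features, when replaced by a baseline value, flip a decision. The scoring function, feature names, and baseline are all hypothetical.

```python
def black_box(x):
    """Stand-in for any trained model: we only call it, never look inside."""
    score = 2 * x["age"] + 30 * x["late_payments"] - x["income"] // 100
    return int(score > 60)      # 1 = flagged, 0 = not flagged

def explain(model, instance, baseline):
    """Return the decision and the features whose replacement flips it."""
    decision = model(instance)
    influential = []
    for feature in instance:
        probe = dict(instance, **{feature: baseline[feature]})
        if model(probe) != decision:
            influential.append(feature)
    return decision, influential

applicant = {"age": 40, "late_payments": 2, "income": 3000}
baseline = {"age": 35, "late_payments": 0, "income": 4000}
decision, reasons = explain(black_box, applicant, baseline)
```

The explanation ("flagged because of late payments") comes from how the outputs change under probing. It says nothing about the arithmetic inside `black_box`, which could be swapped for any other model with the same behavior.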