Nat Torkington

Nat has chaired the O'Reilly Open Source Convention and other O'Reilly conferences for over a decade. He ran the first web server in New Zealand, co-wrote the best-selling Perl Cookbook, and was one of the founding Radar bloggers. He lives in New Zealand and consults in the Asia-Pacific region.

Four short links: 17 April 2015

Distributed SQLite, Communicating Scientists, Learning from Failure, and Cat Convergence

  1. Replicating SQLite using Raft Consensus — clever, he used a consensus algorithm to build a distributed (replicated) SQLite.
  2. When Open Access is the Norm, How do Scientists Communicate? (PLOS) — From interviews I’ve conducted with researchers and software developers who are modeling aspects of modern online collaboration, I’ve highlighted the most useful and reproducible practices. (via Jon Udell)
  3. Meet DJ Patil — “It was this kind of moment when you realize: ‘Oh, my gosh, I am that stupid,’” he said.
  4. Interview with Bruce Sterling on the Convergence of Humans and Machines — If you are a human being, and you are doing computation, you are trying to multiply 17 times five in your head. It feels like thinking. Machines can multiply, too. They must be thinking. They can do math and you can do math. But the math you are doing is not really what cognition is about. Cognition is about stuff like seeing, maneuvering, having wants, desires. Your cat has cognition. Cats cannot multiply 17 times five. They have got their own umwelt (environment). But they are mammalian, you are a mammalian. They are actually a class that includes you. You are much more like your house cat than you are ever going to be like Siri. You and Siri converging, you and your house cat can converge a lot more easily. You can take the imaginary technologies that many post-human enthusiasts have talked about, and you could afflict all of them on a cat. Every one of them would work on a cat. The cat is an ideal laboratory animal for all these transitions and convergences that we want to make for human beings. (via Vaughan Bell)
Four short links: 16 April 2015

Relationships and Inference, Mother of All Demos, Kafka at Scale, and Real World Hardware

  1. DeepDive — DeepDive is targeted to help users extract relations between entities from data and make inferences about facts involving the entities. DeepDive can process structured, unstructured, clean, or noisy data and outputs the results into a database.
  2. From the Vault: Watching (and re-watching) “The Mother of All Demos” — “I wish there was more about the social vision for computing—I worked with him for a long time, and Doug was always thinking ‘how can we collectively collaborate,’ like a sort of rock band.”
  3. Running Kafka at Scale (LinkedIn Engineering) — This tiered infrastructure solves many problems, but it greatly complicates monitoring Kafka and assuring its health. While a single Kafka cluster, when running normally, will not lose messages, the introduction of additional tiers, along with additional components such as mirror makers, creates myriad points of failure where messages can disappear. In addition to monitoring the Kafka clusters and their health, we needed to create a means to assure that all messages produced are present in each of the tiers, and make it to the critical consumers of that data.
  4. 3D Printing Titanium, and the Bin of Broken Dreams — you will learn HUGE amounts on the challenges of real-world manufacturing by reading this.
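
The auditing idea in the Kafka item above (checking that every message produced also shows up in each tier) can be sketched by counting messages per time window at each tier and comparing the counts. This is only an illustration of the idea, not LinkedIn's tooling: the function names, the ten-minute window, and the use of plain integer timestamps are all assumptions made for the example.

```python
from collections import Counter


def count_by_window(timestamps, window_seconds=600):
    """Count messages per time bucket (bucket = start of the window, in seconds)."""
    return Counter(ts // window_seconds * window_seconds for ts in timestamps)


def find_missing(producer_counts, tier_counts):
    """Return {window_start: shortfall} for windows where a tier saw fewer
    messages than the producers reported sending."""
    return {
        window: produced - tier_counts.get(window, 0)
        for window, produced in producer_counts.items()
        if tier_counts.get(window, 0) < produced
    }


# Hypothetical data: one message from the first window never reached the aggregate tier.
produced = [0, 10, 599, 600, 700]
aggregate_tier = [0, 10, 600, 700]

shortfalls = find_missing(count_by_window(produced), count_by_window(aggregate_tier))
print(shortfalls)
```

Comparing counts rather than message contents keeps the audit cheap, at the cost of only telling you how many messages went missing in a window, not which ones.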
Four short links: 15 April 2015

Facebook as Biometrics, Time Series Sequences, Programming Languages, and Oceanic Robots

  1. Facebook Biometrics Cache (Business Insider) — Facebook has been accused of violating the privacy of its users by collecting their facial data, according to a class-action lawsuit filed last week. This data-collection program led to its well-known automatic face-tagging service. But it also helped Facebook create “the largest privately held stash of biometric face-recognition data in the world,” the Courthouse News Service reports.
  2. The Clustering of Time Series Sequences is Meaningless (PDF) — Clustering of time series subsequences is meaningless. More concretely, clusters extracted from these time series are forced to obey a certain constraint that is pathologically unlikely to be satisfied by any data set, and because of this, the clusters extracted by any clustering algorithm are essentially random. While this constraint can be intuitively demonstrated with a simple illustration and is simple to prove, it has never appeared in the literature. We can justify calling our claim surprising since it invalidates the contribution of dozens of previously published papers. We will justify our claim with a theorem, illustrative examples, and a comprehensive set of experiments on reimplementations of previous work. From 2003, warning against sliding window techniques.
  3. Toolkits for the Mind (MIT TR) — Programming-language designer Guido van Rossum, who spent seven years at Google and now works at Dropbox, says that once a software company gets to be a certain size, the only way to stave off chaos is to use a language that requires more from the programmer up front. “It feels like it’s slowing you down because you have to say everything three times,” van Rossum says. Amen!
  4. Robots Roam Earth’s Imperiled Oceans (Wired) — It’s six feet long and shaped like an airliner, with two wings and a tail fin, and bears the message, “OCEANOGRAPHIC INSTRUMENT PLEASE DO NOT DISTURB.” All caps considered, though, it’s a more innocuous epigram than the one on a drone I saw back at the dock: “Not a weapon — Science Instrument.”
Four short links: 14 April 2015

Technical Debt, A/A Testing, NSA's Latest, and John von Neumann

  1. PyCon 2015: Technical Debt, The Monster in Your Closet (YouTube) — excellent talk from PyCon. See also slides.
  2. A/A Testing — In an A/A test, you run a test using the exact same options for both “variants” in your test. That’s right, there’s no difference between “A” and “B” in an A/A test. It sounds stupid, until you see the “results.” (via Nelson Minar)
  3. NSA Declares War on General-Purpose Computing (BoingBoing) — NSA director Michael S Rogers says his agency wants “front doors” to all cryptography used in the USA, so that no one can have secrets it can’t spy on — but what he really means is that he wants to be in charge of which software can run on any general purpose computer.
  4. John von Neumann Documentary (YouTube) — 1966 documentary from the Mathematical Association of America on the father of digital computing, who also is hailed as the father of game theory and much much more. (via Paul Walker)
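
To see why the “results” of the A/A test in item 2 can be surprising, here is a small simulation sketch. It assumes a pooled two-proportion z-test and made-up traffic numbers (500 visitors per arm, a 10% conversion rate); neither comes from the linked post. Both arms draw from the same rate, so every “significant” result is a false positive.

```python
import math
import random


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (conv_a / n_a - conv_b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_aa_tests(runs=1000, visitors=500, rate=0.1, alpha=0.05, seed=0):
    """Run many A/A tests and return the fraction flagged as 'significant'."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(runs):
        # Both "variants" share the same true conversion rate.
        conv_a = sum(rng.random() < rate for _ in range(visitors))
        conv_b = sum(rng.random() < rate for _ in range(visitors))
        if two_proportion_p_value(conv_a, visitors, conv_b, visitors) < alpha:
            false_positives += 1
    return false_positives / runs


if __name__ == "__main__":
    print(f"A/A tests declared significant: {simulate_aa_tests():.1%}")
```

With a 0.05 threshold, roughly one identical-versus-identical test in twenty should still come out “significant,” which is the point of running A/A tests before trusting an A/B result.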
Four short links: 13 April 2015

Occupation Changes, Country Data, Cultural Analytics, and Dysfunctional Software Engineering Organisations

  1. The Great Reversal in the Demand for Skill and Cognitive Tasks (PDF) — The only difference with more conventional models of skill-biased technological change is our modelling of the fruits of cognitive employment as creating a stock instead of a pure flow. This slight change causes technological change to generate a boom and bust cycle, as is common in most investment models. We also incorporated into this model a standard selection process whereby individuals sort into occupations based on their comparative advantage. The selection process is the key mechanism that explains why a reduction in the demand for cognitive tasks, which are predominantly filled by higher educated workers, can result in a loss of employment concentrated among lower educated workers. While we do not claim that our model is the only structure that can explain the observations we present, we believe it gives a very simple and intuitive explanation to the changes pre- and post-2000.
  2. provinces — state and province lists for (some) countries.
  3. Cultural Analytics — the use of computational and visualization methods for the analysis of massive cultural data sets and flows. Interesting visualisations as well as automated understandings.
  4. The Code is Just the Symptom — The engineering culture was a three-layer cake of dysfunction, where everyone down the chain had to execute what they knew to be an impossible task, at impossible speeds, perfectly. It was like the games of Simon Says and Telephone combined to bad effect. Most engineers will have flashbacks at these descriptions. Trigger warning: candid descriptions of real immature software organisations.
Four short links: 10 April 2015

Graph Algorithm, Touchy Robots, Python Bolt-Ons, and Building Data Products

  1. Exact Maximum Clique for Large or Massive Real Graphs — explanation of how BBMCSP works.
  2. Giving Robots and Prostheses the Human Touch — the team, led by mechanical engineer Veronica J. Santos, is constructing a language of touch that both a computer and a human can understand. The researchers are quantifying this with mechanical touch sensors that interact with objects of various shapes, sizes, and textures. Using an array of instrumentation, Santos’ team is able to translate that interaction into data a computer can understand. The data is used to create a formula or algorithm that gives the computer the ability to identify patterns among the items it has in its library of experiences and something it has never felt before. This research will help the team develop artificial haptic intelligence, which is, essentially, giving robots, as well as prostheses, the “human touch.”
  3. boltons — things in Python that should have been builtins.
  4. Everything We Wish We’d Known About Building Data Products (DJ Patil and Ruslan Belkin) — Data is super messy, and data cleanup will always be literally 80% of the work. In other words, data is the problem. […] “If you’re not thinking about how to keep your data clean from the very beginning, you’re fucked. I guarantee it.” […] “Every single company I’ve worked at and talked to has the same problem without a single exception so far — poor data quality, especially tracking data,” he says. “Either there’s incomplete data, missing tracking data, duplicative tracking data.” To solve this problem, you must invest a ton of time and energy monitoring data quality. You need to monitor and alert as carefully as you monitor site SLAs. You need to treat data quality bugs as more than a first priority. Don’t be afraid to fail a deploy if you detect data quality issues.
Four short links: 9 April 2015

Robot Personalities, Programmer Competency, Docker Dependencies, and Large Files in Git

  1. Google’s Patent on Virtual People Personalities — via IEEE Spectrum, who are not bullish, a method for downloadable personalities. Prior art? Don’t talk to me about prior art. The only thing more depressing than this patent is the tech commentary that fails to cite Hitchhiker’s Guide to the Galaxy.
  2. Programmer Competency Matrix — a rubric for developer development.
  3. Aviator — Clever’s open source service dependency management tool, described here.
  4. Announcing Git’s Large File Storage — an improved way to integrate large binary files such as audio samples, data sets, graphics, and videos into your Git workflow.
Four short links: 8 April 2015

Learning Poses, Kafkaesque Things, Hiring Research, and Robotic Movement

  1. Apple Patent on Learning-based Estimation of Hand and Finger Pose — machine learning to identify gestures (hand poses) that works even when partially occluded. See writeup in Apple Insider.
  2. The Internet of Kafkaesque Things (ACLU) — As computers are deployed in more regulatory roles, and therefore make more judgments about us, we may be afflicted with many more of the rigid, unjust rulings for which bureaucracies are so notorious.
  3. Schmidt and Hunter (1998): Validity and Utility of Selection Methods in Personnel (PDF) — On the basis of meta-analytic findings, this article examines and summarizes what 85 years of research in personnel psychology has revealed about the validity of measures of 19 different selection methods that can be used in making decisions about hiring, training, and developmental assignments. (via Wired)
  4. Complete Force Control in Constrained Under-actuated Mechanical Systems (Robohub) — Nori focuses on finding ways to advance the dynamic system of a robot – the forces that interact and make the system move. Key to developing dynamic movements in a robot is control, accompanied by the way the robot interacts with the environment. Nori talks us through the latest developments, designs, and formulas for floating-base/constrained mechanical systems, whole-body motion control of humanoid systems, whole-body dynamics computation on the iCub humanoid, and finishes with a video on recent implementations of whole-body motion control on the iCub. Video and download of presentation.
Four short links: 7 April 2015

JavaScript Numeric Methods, Misunderstood Statistics, Web Speed, and Sentiment Analysis

  1. NumericJS — numerical methods in JavaScript.
  2. P Values are not Error Probabilities (PDF) — In particular, we illustrate how this mixing of statistical testing methodologies has resulted in widespread confusion over the interpretation of p values (evidential measures) and α levels (measures of error). We demonstrate that this confusion was a problem between the Fisherian and Neyman–Pearson camps, is not uncommon among statisticians, is prevalent in statistics textbooks, and is well nigh universal in the pages of leading (marketing) journals. This mass confusion, in turn, has rendered applications of classical statistical testing all but meaningless among applied researchers.
  3. Breaking the 1000ms Time to Glass Mobile Barrier (YouTube) — See also slides. Stay under 250 ms to feel “fast.” Stay under 1000 ms to keep users’ attention.
  4. Modern Methods for Sentiment Analysis — Recently, Google developed a method called Word2Vec that captures the context of words, while at the same time reducing the size of the data. Gentle introduction, with code.
Four short links: 6 April 2015

Disruption, Copyright Investment, Max Headroom, and Right to Tinker

  1. The Difference Between Direct Competition and Disruption — As the ships grow, their engines have become vastly more efficient and sophisticated, the fuel mix has changed, and complex IT infrastructure has been put in place to coordinate the movement of the containers and ships. But fundamentally, the underlying cost structure of the business has not changed from 1950, when the first container ships carried a mere 500 to 800 containers across the world. (via Salim Virani)
  2. The Impact of Copyright Policy Changes on Venture Capital Investment in Cloud Computing Companies (PDF) — Our findings suggest that decisions around the scope of copyrights can have significant impacts on investment and innovation. We find that VC investment in cloud computing firms increased significantly in the U.S. relative to the EU after the Cablevision decision. Our results suggest that the Cablevision decision led to additional incremental investment in U.S. cloud computing firms that ranged from $728 million to approximately $1.3 billion over the two-and-a-half years after the decision. When paired with the findings of the enhanced effects of VC investment relative to corporate investment, this may be the equivalent of $2 to $5 billion in traditional R&D investment.
  3. Max Headroom Oral History — “Anybody under the age of 25 just loved it. And anybody above that age was just completely confused.”
  4. Auto Makers Say You Don’t Own Your Car (EFF) — Most of the automakers operating in the U.S. filed opposition comments through trade associations, along with a couple of other vehicle manufacturers. They warn that owners with the freedom to inspect and modify code will be capable of violating a wide range of laws and harming themselves and others. They say you shouldn’t be allowed to repair your own car because you might not do it right. They say you shouldn’t be allowed to modify the code in your car because you might defraud a used car purchaser by changing the mileage. They say no one should be allowed to even look at the code without the manufacturer’s permission because letting the public learn how cars work could help malicious hackers, “third-party software developers” (the horror!), and competitors.