"analytics" entries

Four short links: 16 March 2016

Analytic Monitoring, Commenter Demographics, Math and Empathy, and How We Read

by Nat Torkington | @gnat | +Nat Torkington | March 16, 2016

MacroBase — Analytic monitoring for the Internet of Things. The code behind a research paper, written up in the morning paper where Adrian Colyer says, there is another story that also unfolds in the paper – one of careful system design based on analysis of properties of the problem space, of thinking deeply and taking the time to understand the prior art (aka “the literature”), and then building on those discoveries to advance and adapt them to the new situation. “That’s what research is all about!” you may say, but it’s also what we’d (I’d?) love to see more of in practitioner settings, too. The result of all this hard work is a system that comprises just 7,000 lines of code, and I’m sure, many, many hours of thinking!
Survey of Commenters and Comment Readers — Americans who leave news comments, who read news comments, and who do neither are demographically distinct. News commenters are more male, have lower levels of education, and have lower incomes compared to those who read news comments. (via Marginal Revolution)
The Empathizing-Systemizing Theory, Social Abilities, and Mathematical Achievement in Children (Nature) — systematic thinking doesn’t predict math ability in children, but being empathetic predicts being worse at math. The effect is stronger with girls. The authors propose the mechanism is that empathetic children pick up a teacher’s own dislike of math, and any teacher biases like “girls aren’t good at math.”
Moneyball for Book Publishers: A Detailed Look at How We Read (NYT) — On average, fewer than half of the books tested were finished by a majority of readers. Most readers typically give up on a book in the early chapters. Women tend to quit after 50 to 100 pages, men after 30 to 50. Only 5% of the books Jellybooks tested were completed by more than 75% of readers. Sixty percent of books fell into a range where 25% to 50% of test readers finished them. Business books have surprisingly low completion rates. Not surprisingly low to anyone who has ever read a business book. They’re always a 20-page idea stretched to 150 pages because that’s how wide a book’s spine has to be to be visible on the airport bookshelf. Fat paper stock and 14-point text with wide margins and 1.5 line spacing help, too. Don’t forget to leave pages after each chapter for the reader’s notes. And summary checklists. And … sorry, I need to take a moment.

Four short links: 27 January 2016

Generative Text, Open Source Agriculture, Becoming Better, and GA Slackbot

by Nat Torkington | @gnat | +Nat Torkington | January 27, 2016

Improv — a javascript library for generative text.
The Food Computer (MIT) — open source controlled-environment agriculture technology platform that uses robotic systems to control and monitor climate, energy, and plant growth inside of a specialized growing chamber. Climate variables such as carbon dioxide, air temperature, humidity, dissolved oxygen, potential hydrogen, electrical conductivity, and root-zone temperature are among the many conditions that can be controlled and monitored within the growing chamber. Operational energy, water, and mineral consumption are monitored (and adjusted) through electrical meters, flow sensors, and controllable chemical dosers throughout the growth period. (via IEEE Spectrum)
10 Golden Rules for Becoming a Better Programmer — what are your 10 rules for being better in your field? If you haven’t built a list, then you aren’t thinking hard enough about what you do.
Statsbot — Google Analytics bot for Slack from NewRelic.

Four short links: 1 October 2015

Robot is Meaningless, Building Analytics, Real World Challenges, and Reclaiming Conversation

by Nat Torkington | @gnat | +Nat Torkington | October 1, 2015

The Word Robot is Meaningless and We Need to Stop Saying It — As more and more household tasks become automated, the number of robots in our lives is growing rapidly. And the rise of connected devices raises a thorny semantic question: namely, where does “automated process” stop and “robot” begin? Why is a factory machine that moves car parts considered a “robot,” but a Volkswagen with a much more sophisticated code base is just a Jetta?
Building Analytics at 500px — An ETL script that has to turn messy production data into clean data warehouse data will naturally be extremely messy. Use a framework like Luigi or a tool like Informatica. These have well-known coding styles and constructs, and are also widely used. It will still be messy. But it will be comparable to known ways of doing ETL.
Systems Computing Challenges in the Internet of Things — I love that while there are countless old institutions engaging consultants and writing strategies about “digital,” now there’s a white paper from a computer group fretting about the problems that the Real World will cause them. Maybe they should just choose the newspaper solution: the real world is just a fad, don’t worry, it’ll pass soon.
Reclaiming Conversation (Review) (NY Times) — review of Sherry Turkle’s new book. When we replace human caregivers with robots, or talking with texting, we begin by arguing that the replacements are “better than nothing” but end up considering them “better than anything” — cleaner, less risky, less demanding.

Four short links: 25 September 2015

Predicting Policing, Assaulting Advertising, Compliance Ratings, and $9 Computer

by Nat Torkington | @gnat | +Nat Torkington | September 25, 2015

Police Program Aims to Pinpoint Those Most Likely to Commit Crimes (NYT) — John S. Hollywood, a senior operations researcher at the RAND Corporation, said that in the limited number of studies undertaken to measure the efficacy of predictive policing, the improvement in forecasting crimes had been only 5% or 10% better than regular policing methods.
Apple’s Assault on Advertising and Google (Calacanis) — Google wants to be proud of their legacy, and tricking people into clicking ads and selling our profiles to advertisers is an awesome business – but a horrible legacy for Larry and Sergey. Read beside the Bloomberg piece on click fraud and the future isn’t too rosy for advertising. If the ad bubble bursts, how much of the Web will it take with it?
China Is Building The Mother Of All Reputation Systems To Monitor Citizen Behavior — The document talks about the “construction of credibility” — the ability to give and take away credits — across more than 30 areas of life, from energy saving to advertising.
$9 Computer Hardware (Makezine) — open hardware project, with open source software. The board’s spec is a 1GHz R8 ARM processor with 512MB of RAM, 4GB of NAND storage, and Wi-Fi and Bluetooth built in.

Four short links: 23 September 2015

Sentence Generator, Deep Neural Networks Explainer, Sports Analytics, and System Hell

by Nat Torkington | @gnat | +Nat Torkington | September 23, 2015

Skip Thought Vectors — research (with code) that produces surrounding sentences, given a sentence.
A Beginner’s Guide to Deep Neural Networks (Google) — Googlers’ 20% project to explain things to people tackles machine learning.
Data Analytics in Sports — O’Reilly research report (free). When it comes to processing stats, competing companies Opta and ProZone use a combination of recording technology and human analysts who tag “events” within the game (much like Vantage Sports). Opta calculates that it tags between 1,600 and 2,000 events per football game — all delivered live.
On Go, Portability, and System Interfaces — No point mentioning Perl’s Configure.sh, I thought. The poor bastard will invent it soon enough.

Four short links: 11 August 2015

Real-time Sports Analytics, UI Regression Testing, AI vs. Charity, and Google's Data Pipeline Model

by Nat Torkington | @gnat | +Nat Torkington | August 11, 2015

Denver Broncos Testing In-Game Analytics — their newly hired director of analytics working with the coach. With Tanney nearby, Kubiak can receive a quick report on the statistical probabilities of almost any situation. Say that you have fourth-and-3 from the opponent’s 45-yard-line with four minutes to go. Do the large-sample-size percentages make the risk-reward ratio acceptable enough to go for it? Tanney’s analytics can provide insight to aid Kubiak’s decision-making. (via Flowing Data)
Visual Review (GitHub) — Apache-licensed productive and human-friendly workflow for testing and reviewing your Web application’s layout for any regressions.
Effective Altruism / Global AI (Vox) — fear of AI-run-amok (“existential risks”) contaminating a charity movement.
The Dataflow Model (PDF) — Google Research paper presenting a model aimed at ease of use in building practical, massive-scale data processing pipelines.

Four short links: 12 June 2015

OLAP Datastores, Timely Dataflow, Paul Ford is God, and Static Analysis

by Nat Torkington | @gnat | +Nat Torkington | June 12, 2015

pinot — a realtime distributed OLAP datastore, which is used at LinkedIn to deliver scalable real time analytics with low latency. It can ingest data from offline data sources (such as Hadoop and flat files) as well as online sources (such as Kafka). Pinot is designed to scale horizontally.
Naiad: A Timely Dataflow System — in Timely Dataflow, the first two features are needed to execute iterative and incremental computations with low latency. The third feature makes it possible to produce consistent results, at both outputs and intermediate stages of computations, in the presence of streaming or iteration.
What is Code (Paul Ford) — What the coders aren’t seeing, you have come to believe, is that the staid enterprise world that they fear isn’t the consequence of dead-eyed apathy but rather détente. Words and feels.
Facebook Infer Opensourced — the static analyzer I linked to yesterday, released as open source today.

Four short links: 3 June 2015

Filter Design, Real-Time Analytics, Neural Turing Machines, and Evaluating Subjective Opinions

by Nat Torkington | @gnat | +Nat Torkington | June 3, 2015

How to Design Applied Filters — The most frequently observed issue during usability testing was filtering values changing placement when the user applied them – either to another position in the list of filtering values (typically the top) or to an “Applied filters” summary overview. During testing, the subjects were often confounded as they noticed that the filtering value they just clicked was suddenly “no longer there.”
Twitter Heron — a real-time analytics platform that is fully API-compatible with Storm […] At Twitter, Heron is used as our primary streaming system, running hundreds of development and production topologies. Since Heron is efficient in terms of resource usage, after migrating all Twitter’s topologies to it we’ve seen an overall 3x reduction in hardware, causing a significant improvement in our infrastructure efficiency.
ntm — an implementation of neural Turing machines. (via @fastml_extra)
Bayesian Truth Serum — a scoring system for eliciting and evaluating subjective opinions from a group of respondents, in situations where the user of the method has no independent means of evaluating respondents’ honesty or their ability. It leverages respondents’ predictions about how other respondents will answer the same questions. Through these predictions, respondents reveal their meta-knowledge, which is knowledge of what other people know.
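
The Bayesian Truth Serum scoring described above can be sketched in a few lines. This is a minimal illustration, assuming the commonly cited formulation in which each respondent's score is an information score (how much more common their answer is than the group predicted) plus a prediction score (how closely their forecast matches the actual answer distribution). The function name `bts_scores`, its parameters, and the sample data are hypothetical and not taken from the linked material.

```python
import math


def bts_scores(answers, predictions, alpha=1.0, eps=1e-9):
    """Sketch of Bayesian Truth Serum scoring (hypothetical helper).

    answers:     one chosen option index per respondent.
    predictions: one list per respondent giving the predicted fraction
                 of respondents choosing each option.
    alpha:       weight on the prediction score.
    eps:         floor applied to predictions to avoid log(0).
    """
    n = len(answers)
    m = len(predictions[0])

    # Actual fraction of respondents endorsing each option.
    x_bar = [sum(1 for a in answers if a == k) / n for k in range(m)]

    # Log of the geometric mean of the predicted fractions per option.
    log_y_bar = [
        sum(math.log(max(p[k], eps)) for p in predictions) / n
        for k in range(m)
    ]

    scores = []
    for a, p in zip(answers, predictions):
        # Information score: positive when the chosen answer is
        # "surprisingly common", i.e. more common than predicted.
        info = math.log(x_bar[a]) - log_y_bar[a]

        # Prediction score: penalty for forecasts that diverge from the
        # actual distribution (options nobody chose contribute zero).
        pred = sum(
            x_bar[k] * math.log(max(p[k], eps) / x_bar[k])
            for k in range(m)
            if x_bar[k] > 0
        )
        scores.append(info + alpha * pred)
    return scores


# Hypothetical data: four respondents, two options, everyone predicts 50/50.
example = bts_scores([0, 0, 0, 1], [[0.5, 0.5]] * 4)
```

In this made-up example, option 0 turns out to be more common than anyone predicted, so the three respondents who chose it score higher than the one who chose option 1. With `alpha=1.0` the scores sum to zero across respondents, so the method only ranks respondents relative to one another.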

The two-sided coin of Web performance

Hacking performance across your organization.

by Alois Reitbauer | @AloisReitbauer | May 5, 2015

I’ve given Web performance talks where I get to show one of my favorite slides with the impact of third-party dependencies on load time. It’s the perfect use case for “those marketing people,” who overload pages with the tracking pixels and tags that make page load time go south. This, of course, would fuel the late-night pub discussion with fellow engineers about how much faster the Web would be if those marketing people would attend a basic Web performance 101 course.

I’ve also found myself discussing exactly this topic in a meeting. This time, however, I was the guy arguing to keep the tracking code, although I was well aware of the performance impact. So what happened?

Four short links: 13 April 2015

Occupation Changes, Country Data, Cultural Analytics, and Dysfunctional Software Engineering Organisations

by Nat Torkington | @gnat | +Nat Torkington | April 13, 2015

The Great Reversal in the Demand for Skill and Cognitive Tasks (PDF) — The only difference with more conventional models of skill-biased technological change is our modelling of the fruits of cognitive employment as creating a stock instead of a pure flow. This slight change causes technological change to generate a boom and bust cycle, as is common in most investment models. We also incorporated into this model a standard selection process whereby individuals sort into occupations based on their comparative advantage. The selection process is the key mechanism that explains why a reduction in the demand for cognitive tasks, which are predominantly filled by higher educated workers, can result in a loss of employment concentrated among lower educated workers. While we do not claim that our model is the only structure that can explain the observations we present, we believe it gives a very simple and intuitive explanation to the changes pre- and post-2000.
provinces — state and province lists for (some) countries.
Cultural Analytics — the use of computational and visualization methods for the analysis of massive cultural data sets and flows. Interesting visualisations as well as automated understandings.
The Code is Just the Symptom — The engineering culture was a three-layer cake of dysfunction, where everyone down the chain had to execute what they knew to be an impossible task, at impossible speeds, perfectly. It was like the games of Simon Says and Telephone combined to bad effect. Most engineers will have flashbacks at these descriptions. Trigger warning: candid descriptions of real immature software organisations.