a/b testing Archives - O'Reilly Radar

Four short links: 10 March 2016

Cognitivist and Behaviourist AI, Math and Social Computing, A/B Testing Stats, and Rat Cyborgs are Smarter

by Nat Torkington | @gnat | +Nat Torkington | March 10, 2016

Crossword-Solving Neural Networks — Hill describes recent progress in learning-based AI systems in terms of behaviourism and cognitivism: two movements in psychology that affect how one views learning and education. Behaviourism, as the name implies, looks at behaviour without looking at what the brain and neurons are doing, while cognitivism looks at the mental processes that underlie behaviour. Deep learning systems like the one built by Hill and his colleagues reflect a cognitivist approach, but for a system to have something approaching human intelligence, it would have to have a little of both. “Our system can’t go too far beyond the dictionary data on which it was trained, but the ways in which it can are interesting, and make it a surprisingly robust question and answer system – and quite good at solving crossword puzzles,” said Hill. While it was not built with the purpose of solving crossword puzzles, the researchers found that it actually performed better than commercially-available products that are specifically engineered for the task.
Mathematical Foundations for Social Computing (PDF) — collection of pointers to existing research in social computing and some open challenges for work to be done. Consider situations where a highly structured decision must be made. Some examples are making budgets, assigning water resources, and setting tax rates. […] One promising candidate is “Knapsack Voting.” […] This captures most budgeting processes — the set of chosen budget items must fit under a spending limit, while maximizing societal value. Goel et al. prove that asking users to compare projects in terms of “value for money” or asking them to choose an entire budget results in provably better properties than using the more traditional approaches of approval or rank-choice voting.
Power, Minimal Detectable Effect, and Bucket Size Estimation in A/B Tests (Twitter) — This post describes how Twitter’s A/B testing framework, DDG, addresses one of the most common questions we hear from experimenters, product managers, and engineers: how many users do we need to sample in order to run an informative experiment?
Intelligence-Augmented Rat Cyborgs in Maze Solving (PLoS) — We compare the performance of maze solving by computer, by individual rats, and by computer-aided rats (i.e. rat cyborgs). They were asked to find their way from a constant entrance to a constant exit in 14 diverse mazes. Performance of maze solving was measured by steps, coverage rates, and time spent. The experimental results with six rats and their intelligence-augmented rat cyborgs show that rat cyborgs have the best performance in escaping from mazes. These results provide a proof-of-principle demonstration for cyborg intelligence. In addition, our novel cyborg intelligent system (rat cyborg) has great potential in various applications, such as search and rescue in complex terrains.
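The bucket-size question raised in the Twitter link above can be made concrete with the textbook normal-approximation formula for comparing two conversion rates. The sketch below is that generic formula, not DDG's actual method; the function name `users_per_bucket` and its parameters are hypothetical and chosen only for illustration.

```python
from math import ceil
from statistics import NormalDist


def users_per_bucket(baseline_rate, min_detectable_effect, alpha=0.05, power=0.8):
    """Approximate users needed in EACH bucket for a two-sided two-proportion z-test.

    baseline_rate: conversion rate expected in the control bucket (e.g. 0.10)
    min_detectable_effect: smallest absolute lift worth detecting (e.g. 0.01)
    alpha: accepted false-positive rate; power: chance of detecting a real lift
    """
    p1 = baseline_rate
    p2 = baseline_rate + min_detectable_effect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)       # summed Bernoulli variances
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


if __name__ == "__main__":
    # Detecting a one-point lift on a 10% baseline needs on the order of
    # fifteen thousand users per bucket under these assumptions.
    print(users_per_bucket(0.10, 0.01))
```

Because the effect size is squared in the denominator, halving the minimal detectable effect roughly quadruples the number of users required, which is why small effects are expensive to measure.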

Validating data models with Kafka-based pipelines

A case for back-end A/B testing.

by Gwen Shapira | @gwenshap | +Gwen Shapira | May 28, 2015

Start the O’Reilly “Introduction to Apache Kafka” training video for free. In this video, Gwen Shapira shows developers and administrators how to integrate Kafka into a data processing pipeline.

A/B testing is a popular method of using business intelligence data to assess possible changes to websites. In the past, when a business wanted to update its website in an attempt to drive more sales, decisions on the specific changes to make were driven by guesses, intuition, focus groups, and ultimately, which executive yelled louder. These days, the data-driven solution is to set up multiple copies of the website, direct users randomly to the different variations, and measure which design improves sales the most. There are a lot of details to get right, but this is the gist of things.
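The gist described above (send each user to a random variation, then compare outcomes) can be sketched in a few lines. This is a minimal illustration, not any particular framework's implementation: the names `assign_variant` and `conversion_rates` are hypothetical, and hashing the user ID is one common way to keep a returning user in the same variation.

```python
import hashlib


def assign_variant(user_id, experiment, variants=("A", "B")):
    """Map a user to a variant pseudo-randomly but repeatably.

    Hashing the experiment name together with the user ID means the same
    user always lands in the same variant of a given experiment.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]


def conversion_rates(events):
    """events: iterable of (variant, converted) pairs -> {variant: conversion rate}."""
    totals, wins = {}, {}
    for variant, converted in events:
        totals[variant] = totals.get(variant, 0) + 1
        wins[variant] = wins.get(variant, 0) + int(converted)
    return {variant: wins[variant] / totals[variant] for variant in totals}


if __name__ == "__main__":
    print(assign_variant("user-42", "new-checkout-page"))
    print(conversion_rates([("A", True), ("A", False), ("B", True)]))
```

Comparing the raw rates is only the first step; whether a difference is real or noise is a separate statistical question.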

When it comes to back-end systems, however, we are still living in the stone age. Suppose your business has grown significantly and you notice that your existing MySQL database is becoming less responsive as the load increases. If you consider moving to a NoSQL system, you need to decide which solution to pick, and there are a lot of options: Cassandra, MongoDB, Couchbase, or even Hadoop. There are also many possible data models: normalized, wide tables, narrow tables, nested data structures, etc.

A/B testing multiple data stores and data models in parallel

It is surprising how often a company will pick a solution based on intuition or even which architect yelled louder. Rather than making a decision based on facts and numbers regarding capacity, scale, throughput, and data-processing patterns, the back-end architecture decisions are made with fuzzy reasoning. In that scenario, what usually happens is that a data store and a data model are somehow chosen, and the entire development team will dive into a six-month project to move their entire back-end system to the new thing. This project will inevitably take 12 months, and about 9 months in, everyone will suspect that this was a bad idea, but it’s way too late to do anything about it.

Four short links: 14 April 2015

Technical Debt, A/A Testing, NSA's Latest, and John von Neumann

by Nat Torkington | @gnat | +Nat Torkington | April 14, 2015

PyCon 2015: Technical Debt, The Monster in Your Closet (YouTube) — excellent talk from PyCon. See also slides.
A/A Testing — In an A/A test, you run a test using the exact same options for both “variants” in your test. That’s right, there’s no difference between “A” and “B” in an A/A test. It sounds stupid, until you see the “results.” (via Nelson Minar)
NSA Declares War on General-Purpose Computing (BoingBoing) — NSA director Michael S Rogers says his agency wants “front doors” to all cryptography used in the USA, so that no one can have secrets it can’t spy on — but what he really means is that he wants to be in charge of which software can run on any general purpose computer.
John von Neumann Documentary (YouTube) — 1966 documentary from the Mathematical Association of America on the father of digital computing, who also is hailed as the father of game theory and much, much more. (via Paul Walker)
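The surprise in the A/A testing link above is easy to reproduce by simulation: give both "variants" the same true conversion rate and count how often an ordinary significance test still declares a winner. The sketch below is an illustration under assumed numbers (500 users per arm, a 10% conversion rate, a pooled two-proportion z-test); none of these come from the linked post, and the function names are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist


def p_value(conv_a, conv_b, n):
    """Two-sided p-value of a pooled two-proportion z-test with n users per arm."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = sqrt(2 * pooled * (1 - pooled) / n)
    if se == 0:
        return 1.0  # no conversions (or all conversions): nothing to compare
    z = (conv_a - conv_b) / n / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(trials=1000, n=500, rate=0.10, alpha=0.05, seed=0):
    """Fraction of simulated A/A tests that look 'significant' at level alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Both arms draw from the SAME conversion rate: there is no real effect.
        conv_a = sum(rng.random() < rate for _ in range(n))
        conv_b = sum(rng.random() < rate for _ in range(n))
        if p_value(conv_a, conv_b, n) < alpha:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(false_positive_rate())
```

With a 5% significance threshold, roughly one A/A test in twenty should report a "winner" even though the two arms are identical, and checking results repeatedly while a test runs tends to make that worse.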

Four short links: 4 August 2014

Web Spreadsheet, Correlated Novelty, A/B Ethics, and Replicated Data Structures

by Nat Torkington | @gnat | +Nat Torkington | August 4, 2014

EtherCalc — open source web-based spreadsheet.
Dynamics of Correlated Novelties (Nature) — paper on “the adjacent possible”. Here we propose a simple mathematical model that mimics the process of exploring a physical, biological, or conceptual space that enlarges whenever a novelty occurs. The model, a generalization of Polya’s urn, predicts statistical laws for the rate at which novelties happen (Heaps’ law) and for the probability distribution on the space explored (Zipf’s law), as well as signatures of the process by which one novelty sets the stage for another. (via Steven Strogatz)
On The Media Interview with OKCupid CEO — relevant to the debate on ethics of A/B tests. I preferred this to Tim Carmody’s rant.
CRDTs as Alternative to APIs — when using CRDTs to tie your system together, you don’t need to resort to using impoverished representations that simply never come anywhere near the representational power of the data structures you use in your programs at runtime. See also this paper on Convergent and Commutative Replicated Data Types.
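For readers new to the CRDTs in the last link, a grow-only counter (G-Counter) is one of the simplest state-based examples. The sketch below is a generic illustration rather than code from the linked post; the class and method names are hypothetical. Each replica increments only its own slot, and merging takes the per-replica maximum, so replicas converge regardless of the order in which states are exchanged.

```python
class GCounter:
    """Grow-only counter: a minimal state-based CRDT sketch."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> increments observed from that replica

    def increment(self, amount=1):
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Element-wise max is commutative, associative, and idempotent,
        # which is what lets replicas merge in any order, any number of times.
        for replica_id, count in other.counts.items():
            self.counts[replica_id] = max(self.counts.get(replica_id, 0), count)


if __name__ == "__main__":
    a, b = GCounter("a"), GCounter("b")
    a.increment(2)
    b.increment(3)
    a.merge(b)
    b.merge(a)
    print(a.value(), b.value())  # both replicas agree on 5
```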

"a/b testing" entries