"stats" entries

Four short links: 10 March 2016

Cognitivist and Behaviourist AI, Math and Social Computing, A/B Testing Stats, and Rat Cyborgs are Smarter

  1. Crossword-Solving Neural Networks — Hill describes recent progress in learning-based AI systems in terms of behaviourism and cognitivism: two movements in psychology that affect how one views learning and education. Behaviourism, as the name implies, looks at behaviour without looking at what the brain and neurons are doing, while cognitivism looks at the mental processes that underlie behaviour. Deep learning systems like the one built by Hill and his colleagues reflect a cognitivist approach, but for a system to have something approaching human intelligence, it would have to have a little of both. “Our system can’t go too far beyond the dictionary data on which it was trained, but the ways in which it can are interesting, and make it a surprisingly robust question and answer system – and quite good at solving crossword puzzles,” said Hill. While it was not built with the purpose of solving crossword puzzles, the researchers found that it actually performed better than commercially-available products that are specifically engineered for the task.
  2. Mathematical Foundations for Social Computing (PDF) — collection of pointers to existing research in social computing and some open challenges for work to be done. Consider situations where a highly structured decision must be made. Some examples are making budgets, assigning water resources, and setting tax rates. […] One promising candidate is “Knapsack Voting.” […] This captures most budgeting processes — the set of chosen budget items must fit under a spending limit, while maximizing societal value. Goel et al. prove that asking users to compare projects in terms of “value for money” or asking them to choose an entire budget results in provably better properties than using the more traditional approaches of approval or rank-choice voting.
  3. Power, Minimal Detectable Effect, and Bucket Size Estimation in A/B Tests (Twitter) — This post describes how Twitter’s A/B testing framework, DDG, addresses one of the most common questions we hear from experimenters, product managers, and engineers: how many users do we need to sample in order to run an informative experiment?
  4. Intelligence-Augmented Rat Cyborgs in Maze Solving (PLoS) — We compare the performance of maze solving by computer, by individual rats, and by computer-aided rats (i.e. rat cyborgs). They were asked to find their way from a constant entrance to a constant exit in 14 diverse mazes. Performance of maze solving was measured by steps, coverage rates, and time spent. The experimental results with six rats and their intelligence-augmented rat cyborgs show that rat cyborgs have the best performance in escaping from mazes. These results provide a proof-of-principle demonstration for cyborg intelligence. In addition, our novel cyborg intelligent system (rat cyborg) has great potential in various applications, such as search and rescue in complex terrains.
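
For the A/B testing question in link 3 (how many users an informative experiment needs), here is a rough sketch of the textbook normal-approximation formula for comparing two conversion rates. It is not Twitter's DDG implementation; the function name, parameters, and default values (5% significance, 80% power) are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def users_per_bucket(baseline_rate, min_detectable_effect, alpha=0.05, power=0.8):
    """Approximate users needed in EACH bucket of a two-bucket test.

    Assumes a two-sided z-test on two proportions; all names and
    defaults here are illustrative, not taken from the linked post.
    """
    p1 = baseline_rate
    p2 = baseline_rate + min_detectable_effect  # absolute lift we want to detect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Halving the effect you want to detect roughly quadruples the bucket size.
    for effect in (0.02, 0.01):
        print(effect, users_per_bucket(0.10, effect))
```

The takeaway matches the post's framing: the smaller the minimal detectable effect, the larger each bucket has to be, and the growth is roughly quadratic.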
Four short links: 30 November 2011

Cryptography Illustrated, Hollywood Futures, Machine Learning Mastery, and Analytics Assumptions

  1. An Illustrated Guide to Cryptographic Hashes — exactly what it says: learn how hashing works and how you’d use it for passwords, digital signatures, etc.
  2. The Age of Fanfiction — We live in a time where copyright means very little to younger people, and it’s not just because they want free movies or free music. More than that, they want to be able to play with the amazing toys that they’ve been given by filmmakers and comic book writers and TV creators, and they want to do so without the constraints that copyright creates. Eloquent and thoughtful piece on what this means for Hollywood and how “the Age of Fanfiction” is reflected in what Hollywood’s making. (via Sacha Judd)
  3. How Khan Academy is Using Machine Learning to Assess Student Mastery — it is bloody hard to know when a student has mastered a subject, both for real live teachers and for roboteachers like Khan Academy. This is a detailed discussion of a change in assessment within Khan Academy. if we define proficiency as your chance of getting the next problem correct being above a certain threshold, then the streak becomes a poor binary classifier. Experiments conducted on our data showed a significant difference between students who take, say, 30 problems to get a streak vs. 10 problems right off the bat — the former group was much more likely to miss the next problem after a break than the latter.
  4. In Which I Declare Four Things My Probability Class is Not About — a reminder of the assumptions we make when we use numerical analysis to understand a problem.
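
To make link 1 concrete, here is a minimal sketch of two of the uses it names: fingerprinting data and storing passwords. It uses only Python's standard library. The function names, the 16-byte salt, and the iteration count are assumptions for illustration and do not come from the linked guide.

```python
import hashlib
import hmac
import os


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def hash_password(password: str, salt: bytes = None, iterations: int = 200_000):
    """Derive a salted, deliberately slow hash of a password with PBKDF2.

    The salt size and iteration count are illustrative assumptions.
    """
    if salt is None:
        salt = os.urandom(16)  # a fresh random salt for each password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes,
                    iterations: int = 200_000) -> bool:
    """Re-derive the hash and compare it in constant time."""
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected)


if __name__ == "__main__":
    # A one-character change produces a completely different fingerprint.
    print(fingerprint(b"four short links"))
    print(fingerprint(b"four short link"))

    salt, stored = hash_password("correct horse")
    print(verify_password("correct horse", salt, stored))
    print(verify_password("wrong horse", salt, stored))
```

A plain fast hash such as SHA-256 is fine for fingerprints, while passwords get a salt and a slow key-derivation function so that guessing is expensive.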
Four short links: 28 November 2011

Ubicomp Project, Data Volumes, Yahoo! Cocktails, and Fighting Cybercrime

  1. Twine (Kickstarter) — modular sensors with connectivity, programmable in If This Then That style. (via TechCrunch)
  2. Small Sample Sizes Lead to High Margins of Error — a reminder that all the stats in the world won’t help you when you don’t have enough data to meaningfully analyse.
  3. Yahoo! Cocktails — somehow I missed this announcement of a Javascript front-and-back-end dev environment from Yahoo!, which they say will be open sourced 1Q2012. Until then it’s PRware, but I like that people are continuing to find new ways to improve the experience of building web applications. A Jobsian sense of elegance, ease, and perfection does not underlie the current web development experience.
  4. UK Govt To Help Businesses Fight Cybercrime (Guardian) — I view this as a good thing, even though the conspiracy nut in me says that it’s a step along the path that ends with the spy agency committing cybercrime to assist businesses.
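
The point of link 2 is easy to see in numbers. This is a small sketch of the standard margin-of-error formula for a sampled proportion, assuming a simple random sample and the normal approximation; the function name and defaults are illustrative and not taken from the linked article.

```python
import math
from statistics import NormalDist


def margin_of_error(sample_size, proportion=0.5, confidence=0.95):
    """Half-width of the confidence interval for a sampled proportion.

    Assumes a simple random sample; proportion=0.5 is the worst case.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


if __name__ == "__main__":
    # Error shrinks with the square root of the sample size,
    # so small samples carry wide margins.
    for n in (25, 100, 1000, 10000):
        print(n, round(margin_of_error(n) * 100, 1), "percentage points")
```

Because the margin shrinks only with the square root of the sample size, cutting it in half takes four times as much data.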
Four short links: 26 October 2011

CPAN's Sweet 0x10, Social Reading, Questioning Polls, and 3D Manufacturing

  1. CPAN Turns 0x10 — sixteenth anniversary of the creation of the Comprehensive Perl Archive Network. Now holds 480k objects.
  2. Subtext — social bookreading by adding chat, links, etc. to a book. I haven’t tried the implementation yet but I’ve wanted this for years. (Just haven’t wanted to jump into the cesspool of rights negotiations enough to actually build it :-) (via David Eagleman)
  3. Questions to Ask about Election Polls — information to help you critically consume data analysis. (via Rachel Cunliffe)
  4. Technologies, Potential, and Implications of Additive Manufacturing (PDF) — AM is a group of emerging technologies that create objects from the bottom-up by adding material one cross-sectional layer at a time. […] Ultimately, AM has the potential to be as disruptive as the personal computer and the internet. The digitization of physical artifacts allows for global sharing and distribution of designed solutions. It enables crowd-sourced design (and individual fabrication) of physical hardware. It lowers the barriers to manufacturing, and allows everyone to become an entrepreneur. (via Bruce Sterling)
Four short links: 7 September 2011

Waning Interest, Infrastructure Changes, eBook Stats, and Retro Chic Peripherals

  1. Comparing Link Attention (Bitly) — Twitter, Facebook, and direct (email/IM/etc) have remarkably similar patterns of decay of interest. (via Hilary Mason)
  2. Three Ages of Google — from batch, to scaling through datacenters, and finally now to techniques for real-time scaling. Of interest to everyone interested in low-latency high-throughput transactions. Datacenters have the diameter of a microsecond, yet we are still using entire stacks designed for WANs. Real-time requires low and bounded latencies and our stacks can’t provide low latency at scale. We need to fix this problem and towards this end Luiz sets out a research agenda, targeting problems that need to be solved. (via Tim O’Reilly)
  3. eReaders and eBooks (Luke Wroblewski) — many eye-opening facts. In 2010 Amazon sold 115 Kindle books for every 100 paperback books. 65% of eReader owners use them in bed; in fact, 37% of device usage is in bed.
  4. VT220 on a Mac — dead sexy look. Impressive how many adapters you need to be able to hook a dingy old serial cable up to your shiny new computer.
Four short links: 11 August 2011

Bad Web Sites, Gold Farming for Evil, Sensing Bicyclists, and Javascript Statistics

  1. Why Restaurant Web Sites Are So Bad — The rest of the Web long ago did away with auto-playing music, Flash buttons and menus, and elaborate intro pages, but restaurant sites seem stuck in 1999.
  2. North Korean Government Partly Funded by Gold Farming (Gamasutra) — alleges a special group of hackers built automation software for MMOs and sent part of their profits back home.
  3. Pleasanton Protects Bicyclists with Microwave (Mercury News) — no, not by pre-emptive cooking. The device monitors the intersection and can differentiate between vehicles and bicyclists crossing the road and either extends or triggers the light if a cyclist is detected.
  4. jStat — a Javascript statistical library.