"statistics" entries

Four short links: 6 April 2016

Hi-Techtiles, Recreating 3D, Mobile Deep Learning, and Correlation Games

  1. U.S. Textile Industry Turns to Tech as Gateway to Revival — Warwick Mills is joining the Defense Department, universities including the Massachusetts Institute of Technology, and nearly 50 other companies in an ambitious $320 million project to push the American textile industry into the digital age. Key to the plan is a technical ingredient: embedding a variety of tiny semiconductors and sensors into fabrics that can see, hear, communicate, store energy, warm or cool a person, or monitor the wearer’s health.
  2. 2D to 3D With Deep CNNs (PDF) — source code on github.
  3. Squeezing AI into Mobile Systems (IEEE Spectrum) — Sze, working with Joel Emer, also an MIT computer science professor and senior distinguished research scientist at Nvidia, developed Eyeriss, the first custom chip designed to run a state-of-the-art convolutional neural network. They showed they could run AlexNet, a particularly demanding algorithm, using less than one-tenth the energy of a typical mobile GPU: instead of consuming 5 to 10 watts, Eyeriss used 0.3 W.
  4. The 8-Bit Game That Makes Statistics Addictive (The Atlantic) — that game is Guess The Correlation. “As a researcher, you read papers and a lot of the time, you eyeball the figures without even reading the text,” he says. “You see a plot—it could even be your own plot—and make a judgment based on it. Contrary to what people believe, they’re not very good at this. And I have the data to prove that.”
Four short links: 28 March 2016

Holoportation, Filter Your Bot, Curriculum for the Future, and Randomized Control Trials for Policy

  1. Holoportation (YouTube) — video of teleconferencing with the Hololens. I hope my avatar wears more pants than I do.
  2. Wordfilter — package to filter out slurs and the kinds of things you don’t want your bot saying on Twitter. (via How Not to Make a Racist Bot)
  3. Curriculum For the Future (iTunes) — in game form, you get to figure out how to sell your preferred curriculum (“maker!”) to the parents and politicians who care about different things. Similar game mechanic to Win the White House from Sandra Day O’Connor’s iCivics.
  4. Test, Learn, Adapt: Developing Public Policy with Randomized Controlled Trials (PDF) — 2012 paper from the UK Cabinet Office talking about running real randomized control trials of policy. (I’d like to be part of one that looks at better health care!)
Four short links: 25 March 2016

Intro to Statistics, Automatic Lip Reading, Outdoor Range Finding for $10, and Wrongful Takedowns

  1. Intro Statistics with Randomization and Simulation — free PDF download as well as book for purchase. (via Flowing Data)
  2. Automated Lip Reading Invented — press release, but interesting topic. The research will be presented at the International Conference on Acoustics, Speech, and Signal Processing (ICASSP) in Shanghai.
  3. A Smartphone-based Laser Distance Sensor for Outdoor Environments (PDF) — We present a low-cost, smartphone-based planar laser distance sensor design for outdoor use with 6 cm accuracy at 5 meters, 30 Hz scan rate, and 0.1 degree resolution over the field of view. The cost of the hardware additions to the off-the-shelf smartphone used in our prototype is under $50.
  4. Internet Archive Seeks to Defend Against Wrongful Takedowns — In its submission, the Archive goes to some lengths to highlight differences between those engaging in commercial piracy and those who seek to preserve and share cultural heritage. As a result, the context in which a user posts content online should be considered before attempting to determine whether an infringement has taken place. This, the organization says, poses problems for the “staydown” demands gaining momentum with copyright holders.
Four short links: 4 March 2016

Snapchat's Business, Tracking Voters, Testing for Discriminatory Associations, and Assessing Impact

  1. How Snapchat Built a Business by Confusing Olds (Bloomberg) — Advertisers don’t have a lot of good options to reach under-30s. The audiences of CBS, NBC, and ABC are, on average, in their 50s. Cable networks such as CNN and Fox News have it worse, with median viewerships near or past Social Security age. MTV’s median viewers are in their early 20s, but ratings have dropped in recent years. Marketers are understandably anxious, and Spiegel and his deputies have capitalized on those anxieties brilliantly by charging hundreds of thousands of dollars when Snapchat introduces an ad product.
  2. Tracking Voters — On the night of the Iowa caucus, Dstillery flagged all the [ad network-mediated ad] auctions that took place on phones in latitudes and longitudes near caucus locations. It wound up spotting 16,000 devices on caucus night, as those people had granted location privileges to the apps or devices that served them ads. It captured those mobile ID’s and then looked up the characteristics associated with those IDs in order to make observations about the kind of people that went to Republican caucus locations (young parents) versus Democrat caucus locations. It drilled down further (e.g., ‘people who like NASCAR voted for Trump and Clinton’) by looking at which candidate won at a particular caucus location.
  3. Discovering Unwarranted Associations in Data-Driven Applications with the FairTest Testing Toolkit (arXiv) — We describe FairTest, a testing toolkit that detects unwarranted associations between an algorithm’s outputs (e.g., prices or labels) and user subpopulations, including sensitive groups (e.g., defined by race or gender). FairTest reports statistically significant associations to programmers as association bugs, ranked by their strength and likelihood of being unintentional, rather than necessary effects. See also slides from PrivacyCon. Source code not yet released.
  4. Inferring Causal Impact Using Bayesian Structural Time-Series Models (Adrian Colyer) — understanding the impact of an intervention by building a predictive model of what would have happened without the intervention, then diffing reality to that model.
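
The "predict what would have happened, then diff against reality" idea in item 4 can be sketched in a few lines. This is only a minimal illustration: it swaps the paper's Bayesian structural time-series model for a plain least-squares fit against one control series, and the names `fit_line`, `estimate_impact`, `control`, and `observed` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys ~ intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_impact(control, observed, start):
    """Estimate the effect of an intervention that begins at index `start`.

    The model is fitted on the pre-intervention period only, then used to
    predict the counterfactual for the post-intervention period.
    """
    slope, intercept = fit_line(control[:start], observed[:start])
    counterfactual = [intercept + slope * c for c in control[start:]]
    pointwise = [obs - pred for obs, pred in zip(observed[start:], counterfactual)]
    return counterfactual, pointwise, sum(pointwise)


if __name__ == "__main__":
    # Made-up data: the observed series tracks the control, then jumps by 5.
    control = [10, 12, 11, 13, 12, 14, 13, 15]
    observed = [2 * c + 1 for c in control[:5]] + [2 * c + 6 for c in control[5:]]
    _, pointwise, total = estimate_impact(control, observed, start=5)
    print(pointwise, total)
```

A real analysis would also want uncertainty intervals around the counterfactual, which is what the Bayesian model in the paper provides and this sketch does not.
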
Four short links: 9 February 2016

Collaborative Mario Agents, ElasticSearch at Scale, Anomaly Detection, Robotics Experiment

  1. Social Intelligence in Mario Bros (YouTube) — collaborative agents built by cognitive AI researchers … they have drives, communicate, learn from each other, and solve problems. Oh, and the agents are Mario, Luigi, Yoshi, and Toad within a Super Mario Brothers clone. No code or papers about it on the research group’s website yet, just a YouTube video and a press release on the university’s website, so appropriately adjust your priors for imminent world destruction at the hands of a rampaging super-AI. (via gizmag)
  2. How we Monitor and Run ElasticSearch at Scale (SignalFx) — sweet detail on metrics, dashboards, and alerting.
  3. Simple Anomaly Detection for Weekly Patterns — Rule-based heuristics do not scale and do not adapt easily, especially if we have thousands of alarms to set up. Some statistical approach is needed that is generic enough to handle many different metric behaviours.
  4. How to Design a Robotics Experiment (Robohub) — although there are many good experimental scientists in the robotic community, there has not been uniformly good experimental work and reporting within the community as a whole. This has advice such as “the five components of a well-designed experiment.”
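
One generic statistical approach of the kind item 3 asks for is to compare each point with earlier values from the same slot of the week. The sketch below is an assumption on my part, not the method from the linked post: it uses a median and MAD (median absolute deviation) per slot, and `weekly_anomalies`, `threshold`, and `min_scale` are hypothetical names.

```python
from statistics import median


def weekly_anomalies(values, period, threshold=3.5, min_scale=1.0):
    """Return indices whose value is far from earlier values in the same slot.

    `period` is the number of samples per week (7 for daily data).
    `min_scale` keeps the score finite when the history is perfectly flat.
    """
    flagged = []
    for i, value in enumerate(values):
        # Earlier observations at the same position in the weekly cycle.
        history = values[i % period:i:period]
        if len(history) < 3:
            continue  # not enough weeks to judge yet
        centre = median(history)
        mad = median([abs(h - centre) for h in history])
        scale = max(1.4826 * mad, min_scale)  # 1.4826 rescales MAD to ~sigma
        if abs(value - centre) / scale > threshold:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    week = [10, 12, 11, 13, 12, 30, 28]      # made-up daily metric
    odd_week = [10, 12, 40, 13, 12, 30, 28]  # third day is unusual
    print(weekly_anomalies(week * 4 + odd_week, period=7))
```

The weekend values of 30 and 28 are not flagged because they are normal for their own slot, which a single global threshold would get wrong.
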
Four short links: 1 September 2015

People Detection, Ratings Patterns, Inspection Bias, and Cloud Filesystem

  1. End-to-End People Detection in Crowded Scenes — research paper and code. When parsing the title, bind “end-to-end” to “scenes” not “people”.
  2. Statistical Patterns in Movie Ratings (PLOSone) — We find that the distribution of votes presents scale-free behavior over several orders of magnitude, with an exponent very close to 3/2, with exponential cutoff. It is remarkable that this pattern emerges independently of movie attributes such as average rating, age and genre, with the exception of a few genres and of high-budget films.
  3. The Inspection Bias is Everywhere — In 1991, Scott Feld presented the “friendship paradox”: the observation that most people have fewer friends than their friends have. He studied real-life friends, but the same effect appears in online networks: if you choose a random Facebook user, and then choose one of their friends at random, the chance is about 80% that the friend has more friends. The friendship paradox is a form of the inspection paradox. When you choose a random user, every user is equally likely. But when you choose one of their friends, you are more likely to choose someone with a lot of friends. Specifically, someone with x friends is overrepresented by a factor of x.
  4. s3ql — a file system that stores all its data online using storage services like Google Storage, Amazon S3, or OpenStack. S3QL effectively provides a hard disk of dynamic, infinite capacity that can be accessed from any computer with internet access running Linux, FreeBSD or OS-X. (GPLv3)
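
The "overrepresented by a factor of x" claim in item 3 is easy to check on a toy network. The sketch below is illustrative only: the graph is a made-up preferential-attachment network, not Facebook data, and `toy_network`, `friend_has_more`, and `size_biased_mean` are hypothetical names.

```python
import random


def toy_network(n_users, rng):
    """Build a small made-up network where popular users attract more friends."""
    friends = {0: {1}, 1: {0}}
    endpoints = [0, 1]  # each user appears once per friendship they have
    for user in range(2, n_users):
        # Picking from `endpoints` favours users who already have many friends.
        targets = {rng.choice(endpoints) for _ in range(3)}
        friends[user] = set()
        for other in targets:
            friends[user].add(other)
            friends[other].add(user)
            endpoints.extend([user, other])
    return friends


def friend_has_more(friends, trials, rng):
    """Fraction of (random user, random friend) picks where the friend has more friends."""
    users = sorted(friends)
    wins = 0
    for _ in range(trials):
        user = rng.choice(users)
        friend = rng.choice(sorted(friends[user]))
        wins += len(friends[friend]) > len(friends[user])
    return wins / trials


def size_biased_mean(degrees):
    """Mean friend count when a user with x friends is x times as likely to be picked."""
    return sum(d * d for d in degrees) / sum(degrees)


if __name__ == "__main__":
    rng = random.Random(0)
    net = toy_network(2000, rng)
    degrees = [len(f) for f in net.values()]
    print("mean friends per user:   ", sum(degrees) / len(degrees))
    print("mean friends of a friend:", size_biased_mean(degrees))
    print("P(friend has more):      ", friend_has_more(net, 10000, rng))
```

The size-biased mean can never be smaller than the plain mean, which is the inspection paradox in one line of arithmetic.
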
Four short links: 15 July 2015

OpeNSAurce, Multimaterial Printing, Functional Javascript, and Outlier Detection

  1. System Integrity Management Platform (Github) — NSA releases security compliance tool for government departments.
  2. 3D-Printed Explosive Jumping Robot Combines Firm and Squishy Parts (IEEE Spectrum) — Different parts of the robot grade over three orders of magnitude from stiff like plastic to squishy like rubber, through the use of nine different layers of 3D printed materials.
  3. Professor Frisby’s Mostly Adequate Guide to Functional Programming — a book on functional programming, using Javascript as the programming language.
  4. Tracking Down Villains — the software and algorithms that Netflix uses to detect outliers in their infrastructure monitoring.
Four short links: 23 June 2015

Irregular Periodicity, Facebook Beacons, Industry 4.0, and Universal Container

  1. Fast Lomb-Scargle Periodograms in Python — a classic method for finding periodicity in irregularly-sampled data.
  2. Facebook Bluetooth Beacons — free for you to use and help people see more information about your business whenever they use Facebook during their visit.
  3. Industry 4.0 — stop gagging at the term. Interesting examples of connectivity and data improving manufacturing. Human-machine interfaces: Logistics company Knapp AG developed a picking technology using augmented reality. Pickers wear a headset that presents vital information on a see-through display, helping them locate items more quickly and precisely. And with both hands free, they can build stronger and more efficient pallets, with fragile items safeguarded. An integrated camera captures serial and lot ID numbers for real-time stock tracking. Error rates are down by 40%, among many other benefits. Digital-to-physical transfer: Local Motors builds cars almost entirely through 3-D printing, with a design crowdsourced from an online community. It can build a new model from scratch in a year, far less than the industry average of six. Vauxhall and GM, among others, still bend a lot of metal, but also use 3-D printing and rapid prototyping to minimize their time to market. (via Quartz)
  4. runC — a lightweight universal runtime container, by the Open Container Project. (OCP = multi-vendor initiative in hands of Linux Foundation)
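
For a feel of what the Lomb-Scargle periodogram in item 1 computes, here is a direct, slow version in plain Python. It is a sketch of the classic unnormalized formula, costing one pass over the data per frequency. It is not the fast algorithm the linked post covers, and `lomb_scargle` is a hypothetical name.

```python
import math


def lomb_scargle(times, values, freqs):
    """Classic Lomb-Scargle power at each frequency in `freqs` (all must be > 0).

    `times` may be unevenly spaced; `values` are the samples at those times.
    """
    mean = sum(values) / len(values)
    y = [v - mean for v in values]
    power = []
    for f in freqs:
        w = 2.0 * math.pi * f
        # The offset tau makes the sine and cosine terms orthogonal.
        tau = math.atan2(sum(math.sin(2 * w * t) for t in times),
                         sum(math.cos(2 * w * t) for t in times)) / (2 * w)
        cos_t = [math.cos(w * (t - tau)) for t in times]
        sin_t = [math.sin(w * (t - tau)) for t in times]
        yc = sum(a * b for a, b in zip(y, cos_t))
        ys = sum(a * b for a, b in zip(y, sin_t))
        cc = sum(c * c for c in cos_t)
        ss = sum(s * s for s in sin_t)
        power.append(0.5 * (yc * yc / cc + ys * ys / ss))
    return power


if __name__ == "__main__":
    # Made-up irregular sample times and a 0.2 Hz sine wave.
    times = [0.25 * i + 0.1 * math.sin(i * i) for i in range(200)]
    values = [math.sin(2 * math.pi * 0.2 * t) for t in times]
    freqs = [k / 100 for k in range(1, 101)]
    power = lomb_scargle(times, values, freqs)
    print("strongest frequency:", freqs[power.index(max(power))])
```
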
Four short links: 22 June 2015

Power Analysis, Data at Scale, Open Source Fail, and Closing the Virtuous Loop

  1. Power Analysis of a Typical Psychology Experiment (Tom Stafford) — What this means is that if you don’t have a large effect, studies with between groups analysis and an n of less than 60 aren’t worth running. Even if you are studying a real phenomenon you aren’t using a statistical lens with enough sensitivity to be able to tell. You’ll get to the end and won’t know if the phenomenon you are looking for isn’t real or if you just got unlucky with who you tested.
  2. The Future of Data at Scale — Data curation, on the other hand, is “the 800-pound gorilla in the corner,” says Stonebraker. “You can solve your volume problem with money. You can solve your velocity problem with money. Curation is just plain hard.” The traditional solution of extract, transform, and load (ETL) works for 10, 20, or 30 data sources, he says, but it doesn’t work for 500. To curate data at scale, you need automation and a human domain expert.
  3. Why Are We Still Explaining? (Stephen Walli) — Within 24 hours we received our first righteous patch. A simple 15-line change that provided a 10% boost in Just-in-Time compiler performance. And we politely thanked the contributor and explained we weren’t accepting changes yet. Another 24 hours and we received the first solid bug fix. It was golden. It included additional tests for the test suite to prove it was fixed. And we politely thanked the contributor and explained we weren’t accepting changes yet. And that was the last thing that was ever contributed.
  4. Blood Donors in Sweden Get a Text Message When Their Blood Helps Someone (Independent) — great idea to close the feedback loop. If you want to get more virtuous behaviour, make it a relationship and not a transaction. And if a warm feeling is all you have to offer in return, then offer it!
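
The power-analysis point in item 1 can be explored with a quick calculation. This is a rough sketch using the normal approximation to a two-sided, two-sample test. It is not the calculation from the linked post, and the effect size of 0.5 and the per-group sample sizes are assumptions chosen for illustration.

```python
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison.

    `effect_size` is Cohen's d; the t distribution is replaced by a normal,
    so results are slightly optimistic for small samples.
    """
    normal = NormalDist()
    z_crit = normal.inv_cdf(1 - alpha / 2)
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability of landing in either rejection region under the alternative.
    return normal.cdf(shift - z_crit) + normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    for n in (10, 20, 30, 64, 100):
        print(f"n per group = {n:3d}  power = {approx_power(0.5, n):.2f}")
```

Under these assumptions, small groups leave you well short of the conventional 80% power target for a medium-sized effect.
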
Four short links: 3 June 2015

Filter Design, Real-Time Analytics, Neural Turing Machines, and Evaluating Subjective Opinions

  1. How to Design Applied Filters — The most frequently observed issue during usability testing were filtering values changing placement when the user applied them – either to another position in the list of filtering values (typically the top) or to an “Applied filters” summary overview. During testing, the subjects were often confounded as they noticed that the filtering value they just clicked was suddenly “no longer there.”
  2. Twitter Heron — a real-time analytics platform that is fully API-compatible with Storm […] At Twitter, Heron is used as our primary streaming system, running hundreds of development and production topologies. Since Heron is efficient in terms of resource usage, after migrating all Twitter’s topologies to it we’ve seen an overall 3x reduction in hardware, causing a significant improvement in our infrastructure efficiency.
  3. ntm — an implementation of neural Turing machines. (via @fastml_extra)
  4. Bayesian Truth Serum — a scoring system for eliciting and evaluating subjective opinions from a group of respondents, in situations where the user of the method has no independent means of evaluating respondents’ honesty or their ability. It leverages respondents’ predictions about how other respondents will answer the same questions. Through these predictions, respondents reveal their meta-knowledge, which is knowledge of what other people know.
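
To make the scoring idea in item 4 concrete, here is a small sketch of a Bayesian Truth Serum score as I understand the commonly cited formulation: an information score that rewards answers more common than the group predicted, plus a prediction score for forecasting the group well. It is a simplified finite-sample version, so treat the details as assumptions. `bts_scores` and `alpha` are hypothetical names.

```python
import math


def bts_scores(answers, predictions, alpha=1.0):
    """Score each respondent.

    answers[r]     -- index of the option respondent r chose
    predictions[r] -- r's predicted share for every option (all > 0, sums to 1)
    """
    n = len(answers)
    n_options = len(predictions[0])
    # Actual share of respondents choosing each option.
    freq = [sum(1 for a in answers if a == k) / n for k in range(n_options)]
    # Log of the geometric mean of the predicted shares for each option.
    log_geo = [sum(math.log(p[k]) for p in predictions) / n
               for k in range(n_options)]
    scores = []
    for answer, pred in zip(answers, predictions):
        # Positive when the chosen answer is "surprisingly common".
        information = math.log(freq[answer]) - log_geo[answer]
        # At most zero; closest to zero when the prediction matches reality.
        prediction = sum(freq[k] * math.log(pred[k] / freq[k])
                         for k in range(n_options) if freq[k] > 0)
        scores.append(information + alpha * prediction)
    return scores


if __name__ == "__main__":
    # Made-up survey: two respondents pick option 0, one picks option 1,
    # and everyone predicts a 50/50 split.
    print(bts_scores([0, 0, 1], [[0.5, 0.5]] * 3))
```

In the made-up survey, option 0 turns out more popular than anyone predicted, so the respondents who chose it score higher.
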