"statistics" entries

Four short links: 4 January 2013

SSH/L Multiplexer, GitHub Bots, Test Your Assumptions, and Tech Trends

  1. sslh — ssh/ssl multiplexer.
  2. Github Says No to Bots (Wired) — what’s interesting is that bots augmenting photos is awesome in Flickr: take a photo of the sky and you’ll find your photo annotated with stars and whatnot. What can GitHub learn from Flickr?
  3. Four Assumptions of Multiple Regression That Researchers Should Always Test — “but I found the answer I wanted! What do you mean, it might be wrong?!”
  4. Tenth Grade Tech Trends (Medium) — if you want to know what will have mass success, talk to early adopters in the mass market. We alpha geeks aren’t that any more.
Four short links: 27 November 2012

Faking with Stats, Praising Coworkers, Medium Explained, and SIGGraph Trailer

  1. Statistical Misdirection Master Class — examples from Fox News. The further through the list you go, the more horrifying^Wedifying they are. Some are clearly classics from the literature, but some are (as far as I can tell) newly developed graphical “persuasion” techniques.
  2. Wall of Awesome — give your coworkers some love.
  3. Dave Winer on Medium — Dave hits some interesting points: Users can create new buckets or collections and call them anything they want. A bucket is analogous to a blog post. Then other people can post to it. That’s like a comment. But it doesn’t look like a comment. It’s got a place for a big image at the top. It looks much prettier than a comment, and much bigger. Looks are important here.
  4. SIGGraph Asia Trailer (YouTube) — resuiting Sims and rotating city blocks, at the end, were my favourite. (via Andy Baio)
Four short links: 10 October 2012

Intuitive Linear Algebra, Bayes Intro, State of Javascript, and Web App Builders

  1. An Intuitive Guide to Linear Algebra — “Here’s the linear algebra introduction I wish I had.” I wish I’d had it, too. (via Hacker News)
  2. Think Bayes — an introduction to Bayesian statistics using computational methods.
  3. The State of Javascript 2012 (Brendan Eich) — Javascript continues its march up and down the stack, simultaneously becoming an application language while becoming the bytecode for the world.
  4. Divshot — a startup turning mockups into web apps, built on top of the Bootstrap front-end framework. I feel momentum and a tipping point approaching, where building things on the web is about to get easier again (the way it did with Ruby on Rails). cf Jetstrap.
Four short links: 9 October 2012

ID-based Democracy, Web Documentation, American Telco Gouging, and Stats Cookbook

  1. Finland Crowdsourcing New Laws (GigaOm) — online referenda. The Finnish government enabled something called a “citizens’ initiative”, through which registered voters can come up with new laws – if they can get 50,000 of their fellow citizens to back them up within six months, then the Eduskunta (the Finnish parliament) is forced to vote on the proposal. Now this crowdsourced law-making system is about to go online through a platform called the Open Ministry. Petitions and online voting are notoriously prone to fraud, so it will be interesting to see how well the online identity system behind this holds up.
  2. WebPlatform — wiki of information about developing for the open web. Joint production of many of the $BIGCOs of the web and the W3C, so will be interesting to see, as it develops, whether it has the best aspects of each or the worst.
  3. Why Your Phone, Cable, Internet Bills Cost So Much (Yahoo) — “The companies essentially have a business model that is antithetical to economic growth,” he says. “Profits go up if they can provide slow Internet at super high prices.” Excellent piece!
  4. Probability and Statistics Cookbook (Matthias Vallentin) — The cookbook contains a succinct representation of various topics in probability theory and statistics. It provides a comprehensive reference reduced to the mathematical essence, rather than aiming for elaborate explanations. CC-BY-NC-SA licensed, LaTeX source on github.

Statwing simplifies data analysis

Quickly perform and interpret the results of routine Small Data analysis

With so much focus on Big Data, the needs of many analysts who work with Small Data tend to get ignored. The default tools for many of these users remain spreadsheets (1) and/or statistical packages, which come with a lot of features and options. However, many analysts need only a very small subset of what these tools have to offer.

Enter Statwing, a software-as-a-service provider for routine statistical analysis. While the tool is still in the early stages, it can already do many basic “data analysis” tasks.

Consider the following example of a pivot table constructed in Excel: this required 8 mouse-clicks, if you do everything perfectly, and about 5 decisions (what variables to include, what metric to use, …)

The same task in Statwing required 4 mouse-clicks and 0 decisions! Plus it comes with visuals:

The lack of clutter and the addition of a simple “headline” (“Female tends to have much higher values for satisfaction than Male”) make the result much easier to interpret. The advanced tab contains detailed statistical analysis (in this case the p-value, counts, values). Many users get confused by the output produced by traditional statistical software. Let’s face it, many analysts have had little training in statistics. I welcome a tool that produces readily interpretable results.
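
For readers who want to see what sits underneath a pivot table like the one described above, here is a minimal Python sketch of the core calculation: the mean of one variable grouped by another. The survey rows are invented for illustration, the name `mean_by_group` is hypothetical, and nothing here is a description of how Statwing or Excel works internally.

```python
from collections import defaultdict
from statistics import mean

# Invented survey rows for illustration only: (gender, satisfaction on a 1-5 scale).
responses = [
    ("Female", 5), ("Female", 4), ("Female", 5),
    ("Male", 3), ("Male", 2), ("Male", 4),
]


def mean_by_group(rows):
    """Group (label, value) pairs by label and return the mean value per label."""
    groups = defaultdict(list)
    for label, value in rows:
        groups[label].append(value)
    return {label: mean(values) for label, values in groups.items()}


for label, average in mean_by_group(responses).items():
    print(f"{label}: mean satisfaction {average:.2f}")
```

A table of group means is only the starting point. The counts and the p-value mentioned above are what tell you whether a difference between the groups deserves a headline.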

The company hopes to replicate the above example across a wide variety of routine data analysis tasks. Their initial focus is on tools for (consumer) survey analysis, a potentially huge market given that online companies have made surveys so much easier to conduct. Users of Statwing pay a small monthly subscription, making it cheaper than most (2) statistical packages, and its intuitive UI lets analysts get their tasks done quickly. More importantly, Statwing may nurture aspiring data scientists in your organization.

(1) As this recent Strata presentation points out: Spreadsheets are the glue that keeps many organizations together.

(2) Open source tools like OpenOffice, R and Octave are free. So is the use of Google spreadsheets.


Digging into the UDID data

The UDID story has conflicting theories, so the only real thing we have to work with is the data.

Over the weekend the hacker group Antisec released one million UDID records that they claim to have obtained from an FBI laptop using a Java vulnerability. In reply the FBI stated:

The FBI is aware of published reports alleging that an FBI laptop was compromised and private data regarding Apple UDIDs was exposed. At this time there is no evidence indicating that an FBI laptop was compromised or that the FBI either sought or obtained this data.

Of course that statement leaves a lot of leeway. It could be the agent’s personal laptop, and the data may well have been the “property” of another agency. The wording doesn’t even explicitly rule out the possibility that this was an agency laptop; they just say that right now they don’t have any evidence to suggest that it was.

This limited data release doesn’t have much impact, but the possible release of the full dataset, which is claimed to include names, addresses, phone numbers and other identifying information, is far more worrying.

While there are some almost dismissing the issue out of hand, the real issues here are: Where did the data originate? Which devices did it come from and what kind of users does this data represent? Is this data from a cross-section of the population, or a specifically targeted demographic? Does it originate within the law enforcement community, or from an external developer? What was the purpose of the data, and why was it collected?

With conflicting stories from all sides, the only thing we can believe is the data itself. The 40-character strings in the release at least look like UDID numbers, and anecdotally we have a third-party confirmation that this really is valid UDID data. We therefore have to proceed at this point as if this is real data. While there is a possibility that some, most, or all of the data is falsified, that’s looking unlikely from where we’re standing at the moment.
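
As a first sanity check on whether strings "look like" UDIDs, here is a minimal Python sketch. It assumes a UDID is written as 40 hexadecimal characters, and the names `looks_like_udid` and `shape_summary` are hypothetical. Passing this check says nothing about whether a record belongs to a real device; it only catches malformed or duplicated entries.

```python
import re

# Assumption: a UDID is written as exactly 40 hexadecimal characters.
UDID_PATTERN = re.compile(r"[0-9a-fA-F]{40}")


def looks_like_udid(value: str) -> bool:
    """Return True if the value has the shape of a UDID (40 hex characters)."""
    return UDID_PATTERN.fullmatch(value.strip()) is not None


def shape_summary(records):
    """Count total, well-formed, and distinct well-formed records."""
    cleaned = [record.strip().lower() for record in records]
    well_formed = [record for record in cleaned if looks_like_udid(record)]
    return {
        "total": len(cleaned),
        "well_formed": len(well_formed),
        "distinct": len(set(well_formed)),
    }
```

Run over a release, `shape_summary` gives a quick sense of how clean the file is before any deeper analysis of where the devices might have come from.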

Four short links: 8 August 2012

Reading Minds, Satellites in the Cloud, Units for Risk, and Valuing Autism

  1. Reconstructing Visual Experiences (PDF) — early visual areas represent the information in movies. To demonstrate the power of our approach, we also constructed a Bayesian decoder by combining estimated encoding models with a sampled natural movie prior. The decoder provides remarkable reconstructions of the viewed movies. These results demonstrate that dynamic brain activity measured under naturalistic conditions can be decoded using current fMRI technology.
  2. Earth Engine — satellite imagery and API for coding against it, to do things like detecting deforestation, classifying land cover, estimating forest biomass and carbon, and mapping the world’s roadless areas.
  3. Microlives — 30 minutes of your life expectancy. Here are some things that would, on average, cost a 30-year-old man 1 microlife: Smoking 2 cigarettes; Drinking 7 units of alcohol (eg 2 pints of strong beer); Each day of being 5 kg overweight. A chest X-ray will set a middle-aged person back around 2 microlives, while a whole body CT-scan would weigh in at around 180 microlives.
  4. Autistics Need Opportunities More Than Treatment — Laurent gave a powerful talk at Sci Foo: if the autistic brain is better at pattern matching, find jobs where that’s useful. Like, say, science. The autistic woman who was delivering mail became a research assistant in his lab, now has papers galore to her name for original research.
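
The microlife figures in item 3 above are simple arithmetic, and a short Python sketch makes the scale easier to feel. It uses only the numbers quoted there (one microlife is 30 minutes of life expectancy; two cigarettes cost about one microlife for a 30-year-old man on average). The function names are hypothetical.

```python
MINUTES_PER_MICROLIFE = 30       # definition quoted in item 3
CIGARETTES_PER_MICROLIFE = 2     # average figure quoted for a 30-year-old man


def microlives_to_hours(microlives: float) -> float:
    """Convert a cost in microlives to hours of life expectancy."""
    return microlives * MINUTES_PER_MICROLIFE / 60


def cigarettes_to_microlives(cigarettes: float) -> float:
    """Convert a number of cigarettes to microlives, using the quoted average."""
    return cigarettes / CIGARETTES_PER_MICROLIFE


for label, cost in [("chest X-ray", 2), ("whole-body CT scan", 180)]:
    print(f"{label}: {cost} microlives = {microlives_to_hours(cost):g} hours")
```

On the same figures, a pack of 20 cigarettes comes to 10 microlives, or 5 hours of life expectancy.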
Four short links: 2 August 2012

Creative Business, News Design, Google Earth Glitches, and Data Distortion

  1. Patton Oswalt’s Letters to Both Sides — “You guys need to stop thinking like gatekeepers. You need to do it for the sake of your own survival. Because all of us comedians after watching Louis CK revolutionize sitcoms and comedy recordings and live tours. And listening to “WTF With Marc Maron” and “Comedy Bang! Bang!” and watching the growth of the UCB Theatre on two coasts and seeing careers being made on Twitter and Youtube. Our careers don’t hinge on somebody in a plush office deciding to aim a little luck in our direction.” (via Jim Stogdill)
  2. Headliner — interesting Guardian experiment with headlines and presentation. As always, reading the BERG designers’ notes is just as interesting as the product itself. E.g., how they used computer vision to find faces and zoom in on them to make articles more attractive to browsing readers.
  3. Google Earth Glitches — where 3d maps and aerial imagery don’t match up. (via Beta Knowledge)
  4. Campbell’s Law — “The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.” (via New York Times)
Four short links: 16 July 2012

Open Access, Emergency Social Media, A/B Testing Traps, and Post-Moore Sequencing Costs

  1. Britain To Provide Free Access to Scientific Publications (Guardian) — the Finch report is being implemented! British universities now pay around £200m a year in subscription fees to journal publishers, but under the new scheme, authors will pay “article processing charges” (APCs) to have their papers peer reviewed, edited and made freely available online. The typical APC is around £2,000 per article.
  2. Social Media in an Emergency: A Best Practice Guide — from the Wellington City Council in New Zealand, who have been learning from Christchurch earthquakes and Tauranga’s oil spill.
  3. Trustworthy Online Controlled Experiments: Five Puzzling Outcomes Explained (PDF) — Microsoft Research dug into A/B tests done on Bing and revealed some subtle truths. “The statistical theory of controlled experiments is well understood, but the devil is in the details and the difference between theory and practice is greater in practice than in theory […] Generating numbers is easy; generating numbers you should trust is hard!” (via Greg Linden)
  4. DNA Sequencing Costs (National Human Genome Research Institute) — Cost-per-megabase and cost-per-genome are dropping faster than Moore’s Law now that they’ve introduced “second generation techniques” for sequencing, aka “high-throughput sequencing” or a parallelization of the process. (via JP Rangaswami)
Four short links: 11 May 2012

Flipping the Medical Classroom, Inclusion Haters, Information Leveling, and Ars Longa Vita Brevis

  1. Stanford Med School Contemplates Flipped Classroom — the real challenge isn’t sending kids home with videos to watch, it’s using tools like OceanBrowser to keep on top of what they’re doing. Few profs at universities have cared whether students learned or not.
  2. Inclusive Tech Companies Win The Talent War (Gina Trapani) — she speaks the truth, and gently. The original CNN story flushed out an incredible number of vitriolic commenters apparently lacking the gene for irony.
  3. Buyers and Sellers Guide to Web Design and Development Firms (Lance Wiggs) — great idea, particularly “how to be a good client”. There are plenty of dodgy web shops, but more projects fail because of the clients than many would like to admit.
  4. What Does It Mean to Say That Something Causes 16% of Cancers? (Discover Magazine) — hey, all you infographic jockeys with your aspirations to add Data Scientist to your business card: read this and realize how hard it is to make sense of a lot of numbers and then communicate that sense. Data Science isn’t about Hadoop any more than Accounting is about columns. Both try to tell a story (the original meaning of your company’s “accounts”) and what counts is the informed, disciplined, honest effort of knowing that your story is honest.