ENTRIES TAGGED "statistics"

Four short links: 9 October 2012

ID-based Democracy, Web Documentation, American Telco Gouging, and Stats Cookbook

  1. Finland Crowdsourcing New Laws (GigaOm) — online referenda. The Finnish government enabled something called a “citizens’ initiative”, through which registered voters can come up with new laws – if they can get 50,000 of their fellow citizens to back them up within six months, then the Eduskunta (the Finnish parliament) is forced to vote on the proposal. Now this crowdsourced law-making system is about to go online through a platform called the Open Ministry. Petitions and online voting are notoriously prone to fraud, so it will be interesting to see how well the online identity system behind this holds up.
  2. WebPlatform — wiki of information about developing for the open web. Joint production of many of the $BIGCOs of the web and the W3C, so will be interesting to see, as it develops, whether it has the best aspects of each or the worst.
  3. Why Your Phone, Cable, Internet Bills Cost So Much (Yahoo) — “The companies essentially have a business model that is antithetical to economic growth,” he says. “Profits go up if they can provide slow Internet at super high prices.” Excellent piece!
  4. Probability and Statistics Cookbook (Matthias Vallentin) — The cookbook contains a succinct representation of various topics in probability theory and statistics. It provides a comprehensive reference reduced to the mathematical essence, rather than aiming for elaborate explanations. CC-BY-NC-SA licensed, LaTeX source on github.
Comment: 1

Statwing simplifies data analysis

Quickly perform and interpret the results of routine Small Data analysis

With so much focus on Big Data, the needs of many analysts who work with Small Data tend to get ignored. The default tools for many of these users remain spreadsheets (1) and/or statistical packages that come with a lot of features and options. However, many analysts need only a very small subset of what these tools have to offer.

Enter Statwing, a software-as-a-service provider for routine statistical analysis. While the tool is still in the early stages, it can already do many basic “data analysis” tasks.

Consider the following example of a pivot table constructed in Excel: it required 8 mouse-clicks, assuming you do everything perfectly, and about 5 decisions (what variables to include, what metric to use, …).

The same task in Statwing required 4 mouse-clicks and 0 decisions! Plus it comes with visuals:

The lack of clutter and the addition of a simple “headline” (“Female tends to have much higher values for satisfaction than Male”) make the result much easier to interpret. The advanced tab contains detailed statistical analysis (in this case the p-value, counts, values). Many users get confused by the output produced by traditional statistical software. Let’s face it, many analysts have had little training in statistics. I welcome a tool that produces readily interpretable results.
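To make the "headline plus advanced tab" idea concrete, here is a minimal Python sketch of how a plain-language headline could be derived from a two-group comparison. It is not Statwing's method: the survey rows are made-up toy data, the helper names (`group_values`, `welch_t`, `headline`) are hypothetical, and Welch's t statistic is simply one reasonable choice for comparing two group means.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical toy survey rows (group, satisfaction on a 1-5 scale).
# These are illustrative values, not data from the article.
responses = [
    ("Female", 5), ("Female", 4), ("Female", 5), ("Female", 4), ("Female", 5),
    ("Male", 3), ("Male", 2), ("Male", 4), ("Male", 3), ("Male", 2),
]

def group_values(rows):
    """Collect the metric values for each group label."""
    groups = {}
    for label, value in rows:
        groups.setdefault(label, []).append(value)
    return groups

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    var_a, var_b = stdev(a) ** 2, stdev(b) ** 2
    return (mean(a) - mean(b)) / sqrt(var_a / len(a) + var_b / len(b))

def headline(rows, metric="satisfaction"):
    """Build a one-sentence summary; assumes exactly two groups."""
    groups = group_values(rows)
    (hi_label, hi_vals), (lo_label, lo_vals) = sorted(
        groups.items(), key=lambda kv: mean(kv[1]), reverse=True
    )
    t = welch_t(hi_vals, lo_vals)
    return (
        f"{hi_label} tends to have higher values for {metric} "
        f"than {lo_label} (Welch t = {t:.2f})"
    )

summary = headline(responses)
print(summary)
```

The sentence is what a casual reader sees first; the statistic in parentheses stands in for the detail an "advanced" view would expose.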

The company hopes to replicate the above example across a wide variety of routine data analysis tasks. Their initial focus is on tools for (consumer) survey analysis, a potentially huge market given that online companies have made surveys so much easier to conduct. Users of Statwing pay a small monthly subscription, making it cheaper than most (2) statistical packages, and its intuitive UI lets analysts get their tasks done quickly. More importantly, Statwing may nurture aspiring data scientists in your organization.


(1) As this recent Strata presentation points out: Spreadsheets are the glue that keeps many organizations together.

(2) Open source tools like OpenOffice, R and Octave are free. So is the use of Google spreadsheets.

Comment: 1

Digging into the UDID data

The UDID story has conflicting theories, so the only real thing we have to work with is the data.

Over the weekend the hacker group Antisec released one million UDID records that they claim to have obtained from an FBI laptop using a Java vulnerability. In reply the FBI stated:

The FBI is aware of published reports alleging that an FBI laptop was compromised and private data regarding Apple UDIDs was exposed. At this time there is no evidence indicating that an FBI laptop was compromised or that the FBI either sought or obtained this data.

Of course that statement leaves a lot of leeway. It could be the agent’s personal laptop, and the data may well have been “property” of another agency. The wording doesn’t even explicitly rule out the possibility that this was an agency laptop; they just say that right now they don’t have any evidence to suggest that it was.

This limited data release doesn’t have much impact, but the possible release of the full dataset, which is claimed to include names, addresses, phone numbers and other identifying information, is far more worrying.

While some are dismissing the issue almost out of hand, the real issues here are: Where did the data originate? Which devices did it come from and what kind of users does this data represent? Is this data from a cross-section of the population, or a specifically targeted demographic? Does it originate within the law enforcement community, or from an external developer? What was the purpose of the data, and why was it collected?

With conflicting stories from all sides, the only thing we can believe is the data itself. The 40-character strings in the release at least look like UDID numbers, and anecdotally we have third-party confirmation that this really is valid UDID data. We therefore have to proceed at this point as if this is real data. While there is a possibility that some, most, or all of the data is falsified, that’s looking unlikely from where we’re standing at the moment.
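A first sanity check on a release like this is simply whether each record has the right shape. The Python sketch below assumes a UDID is written as exactly 40 hexadecimal characters; the sample values are invented placeholders, not records from the leak, and `looks_like_udid` and `summarize` are hypothetical helper names. Passing the check says nothing about whether a record is genuine, only that it is plausible.

```python
import re

# Assumption: a UDID is written as exactly 40 hexadecimal characters.
UDID_PATTERN = re.compile(r"[0-9a-fA-F]{40}")

def looks_like_udid(value: str) -> bool:
    """Return True if the string has the shape of a UDID."""
    return UDID_PATTERN.fullmatch(value.strip()) is not None

def summarize(values):
    """Count total, plausible, and duplicated plausible identifiers."""
    seen = set()
    plausible = duplicates = 0
    for value in values:
        if not looks_like_udid(value):
            continue
        plausible += 1
        key = value.strip().lower()
        if key in seen:
            duplicates += 1
        seen.add(key)
    return {"total": len(values), "plausible": plausible, "duplicates": duplicates}

# Invented placeholder strings, not real device identifiers.
sample = [
    "a" * 40,
    "0123456789abcdef0123456789abcdef01234567",
    "A" * 40,          # same identifier as the first, different case
    "not-a-udid",
    "abc123",
]

report = summarize(sample)
print(report)
```

Lower-casing before the duplicate check treats identifiers that differ only in letter case as the same device, which is a design choice worth revisiting against the real file.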

Comments: 10
Four short links: 8 August 2012

Reading Minds, Satellites in the Cloud, Units for Risk, and Valuing Autism

  1. Reconstructing Visual Experiences (PDF) — early visual areas represent the information in movies. To demonstrate the power of our approach, we also constructed a Bayesian decoder by combining estimated encoding models with a sampled natural movie prior. The decoder provides remarkable reconstructions of the viewed movies. These results demonstrate that dynamic brain activity measured under naturalistic conditions can be decoded using current fMRI technology.
  2. Earth Engine — satellite imagery and API for coding against it, to do things like detecting deforestation, classifying land cover, estimating forest biomass and carbon, and mapping the world’s roadless areas.
  3. Microlives — a microlife is 30 minutes of your life expectancy. Here are some things that would, on average, cost a 30-year-old man 1 microlife: Smoking 2 cigarettes; Drinking 7 units of alcohol (eg 2 pints of strong beer); Each day of being 5 Kg overweight. A chest X-ray will set a middle-aged person back around 2 microlives, while a whole body CT-scan would weigh in at around 180 microlives.
  4. Autistics Need Opportunities More Than Treatment — Laurent gave a powerful talk at Sci Foo: if the autistic brain is better at pattern matching, find jobs where that’s useful. Like, say, science. The autistic woman who was delivering mail became a research assistant in his lab, now has papers galore to her name for original research.
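The microlife figures in item 3 are easier to feel once converted back into clock time. This small Python sketch does only that arithmetic, using the 30-minutes-per-microlife definition and the two medical examples quoted in the item; the function name `microlives_to_hours` is a hypothetical helper.

```python
# One microlife is 30 minutes of life expectancy (definition quoted in item 3).
MINUTES_PER_MICROLIFE = 30

def microlives_to_hours(microlives: float) -> float:
    """Convert a number of microlives into hours of life expectancy."""
    return microlives * MINUTES_PER_MICROLIFE / 60

# Figures quoted in the item above.
costs = {"chest X-ray": 2, "whole-body CT scan": 180}

for name, microlives in costs.items():
    hours = microlives_to_hours(microlives)
    print(f"{name}: {microlives} microlives = {hours:g} hours")
```

By this conversion the chest X-ray costs about an hour and the CT scan about 90 hours, a little under four days.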
Comment
Four short links: 2 August 2012

Creative Business, News Design, Google Earth Glitches, and Data Distortion

  1. Patton Oswalt’s Letters to Both Sides: You guys need to stop thinking like gatekeepers. You need to do it for the sake of your own survival. Because all of us comedians after watching Louis CK revolutionize sitcoms and comedy recordings and live tours. And listening to “WTF With Marc Maron” and “Comedy Bang! Bang!” and watching the growth of the UCB Theatre on two coasts and seeing careers being made on Twitter and Youtube. Our careers don’t hinge on somebody in a plush office deciding to aim a little luck in our direction. (via Jim Stogdill)
  2. Headliner — interesting Guardian experiment with headlines and presentation. As always, reading the BERG designers’ notes is just as interesting as the product itself. E.g., how they used computer vision to find faces and zoom in on them to make articles more attractive to browsing readers.
  3. Google Earth Glitches — where 3d maps and aerial imagery don’t match up. (via Beta Knowledge)
  4. Campbell’s Law: The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor. (via New York Times)
Comment
Four short links: 16 July 2012

Open Access, Emergency Social Media, A/B Testing Traps, and Post-Moore Sequencing Costs

  1. Britain To Provide Free Access to Scientific Publications (Guardian) — the Finch report is being implemented! British universities now pay around £200m a year in subscription fees to journal publishers, but under the new scheme, authors will pay “article processing charges” (APCs) to have their papers peer reviewed, edited and made freely available online. The typical APC is around £2,000 per article.
  2. Social Media in an Emergency: A Best Practice Guide — from the Wellington City Council in New Zealand, who have been learning from Christchurch earthquakes and Tauranga’s oil spill.
  3. Trustworthy Online Controlled Experiments: Five Puzzling Outcomes Explained (PDF) — Microsoft Research dug into A/B tests done on Bing and reveal some subtle truths. The statistical theory of controlled experiments is well understood, but the devil is in the details and the difference between theory and practice is greater in practice than in theory [...] Generating numbers is easy; generating numbers you should trust is hard! (via Greg Linden)
  4. Data Sequencing Costs (National Human Genome Research Institute) — Cost-per-megabase and cost-per-genome are dropping faster than Moore’s Law now that they’ve introduced “second generation techniques” for sequencing, aka “high-throughput sequencing” or a parallelization of the process. (via JP Rangaswami)
Comment: 1
Four short links: 11 May 2012

Flipping the Medical Classroom, Inclusion Haters, Information Leveling, and Ars Longa Vita Brevis

  1. Stanford Med School Contemplates Flipped Classroom — the real challenge isn’t sending kids home with videos to watch, it’s using tools like OceanBrowser to keep on top of what they’re doing. Few profs at universities have cared whether students learned or not.
  2. Inclusive Tech Companies Win The Talent War (Gina Trapani) — she speaks the truth, and gently. The original CNN story flushed out an incredible number of vitriolic commenters apparently lacking the gene for irony.
  3. Buyers and Sellers Guide to Web Design and Development Firms (Lance Wiggs) — great idea, particularly “how to be a good client”. There are plenty of dodgy web shops, but more projects fail because of the clients than many would like to admit.
  4. What Does It Mean to Say That Something Causes 16% of Cancers? (Discover Magazine) — hey, all you infographic jockeys with your aspirations to add Data Scientist to your business card: read this and realize how hard it is to make sense of a lot of numbers and then communicate that sense. Data Science isn’t about Hadoop any more than Accounting is about columns. Both try to tell a story (the original meaning of your company’s “accounts”) and what counts is the informed, disciplined, honest effort of knowing that your story is honest.
Comment
Four short links: 4 May 2012

Statistical Fallacies, Sensors via Microphone, Peak Plastic, and Go Web Framework

  1. Common Statistical Fallacies (Flowing Data) — once you know to look for them, you see them everywhere. Or is that confirmation bias?
  2. Project Hijack: Hijacking power and bandwidth from the mobile phone’s audio interface. Creating a cubic-inch peripheral sensor ecosystem for the mobile phone.
  3. Peak Plastic — Deb Chachra points out that if we’re running out of oil, that also means that we’re running out of plastic. Compared to fuel and agriculture, plastic is small potatoes. Even though plastics are made on a massive industrial scale, they still account for less than 10% of the world’s oil consumption. So recycling plastic saves plastic and reduces its impact on the environment, but it certainly isn’t going to save us from the end of oil. Peak oil means peak plastic. And that means that much of the physical world around us will have to change. I hadn’t pondered plastics in medicine before. (via BoingBoing)
  4. web.go (GitHub) — web framework for the Go programming language.
Comment: 1
Understanding randomness is a double-edged sword

A review of "The Drunkard's Walk: How Randomness Rules Our Lives."

While Leonard Mlodinow's book offers a good introduction to probabilistic thinking, it carries two problems: First, it doesn't uniformly account for skill. Second, when we're talking probability and statistics, we're talking about interchangeable events.

Comments: 4
Four short links: 13 September 2011

Lie with Research, Learning as You Teach, 3D Printing, and Future of Javascript

  1. Dan Saffer: How To Lie with Design Research (Google Video) — Experience shows that, especially with qualitative research like the type designers often do, two researchers can look at the same set of data and draw dramatically different findings from them. As William Blake said, “Both read the Bible day and night, But thou read’st black where I read white.” (via Keith Bolland)
  2. Teaching What You Don’t Know (Sci Blogs) — As that lecturer said, learning new things—while challenging—is also stimulating & fun. If that sense of excitement and enjoyment carries through to your actual classes, then you’ll speak with passion and enthusiasm—how better to in turn enthuse your students? Ties in with the Maori concept of Ako, that teacher and student learn from each other.
  3. Bored of 3D Printers (Tom Armitage) — made me wonder how long it would be before we drop the “3D” prefix and expect a “printer” to emit objects. That said, I love Tom’s neologism artefactory.
  4. Future of Javascript from Google’s Internal Summit: Javascript has fundamental flaws that cannot be fixed merely by evolving the language. Their two-pronged strategy is to work with ECMA (the standards body responsible for the language) and simultaneously develop Yet Another New Language. I still don’t know which box to file this in: techowank fantasy (“I will build the ultimate language and all will fall in line before me!” — btdt, have the broken coffee mug), arrogant corporate forkwits, genuine frustration with the path of progress, evil play for ownership. Read Alex Russell’s commentary on this (Alex is the creator of Dojo, now an employee of Google) for some context. I have to say, We Will Build A Better Javascript doesn’t fill me with confidence when it comes from folks producing Chrome-specific demos (causing involuntary shudders as we all flash back to “this site best experienced in Microsoft Internet Explorer” days). Trust makes Google possible: Microsoft wanted to roll an identity solution out to the public but was beaten to pieces for it; Google was begged to provide an API for gmail account authentication. The difference was trust: Google had it and Microsoft had lost it. When Google loses our trust, whether by hostile self-interested forking, by promoting antifeature proprietary or effectively-proprietary integrated technologies over the open web, or by traditional trust-losing techniques such as security failures or over-exploitative use of data, they’re fucked. I use a lot of Google services and love them to pieces, but they must be ever-vigilant for hubris. Everyone at Google should look humbly at Yahoo!, which once served customers and worked well with others but whose death was ensured around 2000 when they rolled out popups and began eating the sheep instead of shearing them.
Comments: 3