- An Intuitive Guide to Linear Algebra — Here’s the linear algebra introduction I wish I had. I wish I’d had it, too. (via Hacker News)
- Think Bayes — an introduction to Bayesian statistics using computational methods.
- Divshot — a startup turning mockups into web apps, built on top of the Bootstrap front-end framework. I feel momentum and a tipping point approaching, where building things on the web is about to get easier again (the way it did with Ruby on Rails). cf Jetstrap.
ID-based Democracy, Web Documentation, American Telco Gouging, and Stats Cookbook
- Finland Crowdsourcing New Laws (GigaOm) — online referenda. The Finnish government enabled something called a “citizens’ initiative”, through which registered voters can come up with new laws – if they can get 50,000 of their fellow citizens to back them up within six months, then the Eduskunta (the Finnish parliament) is forced to vote on the proposal. Now this crowdsourced law-making system is about to go online through a platform called the Open Ministry. Petitions and online voting are notoriously prone to fraud, so it will be interesting to see how well the online identity system behind this holds up.
- WebPlatform — wiki of information about developing for the open web. Joint production of many of the $BIGCOs of the web and the W3C, so will be interesting to see, as it develops, whether it has the best aspects of each or the worst.
- Why Your Phone, Cable, Internet Bills Cost So Much (Yahoo) — “The companies essentially have a business model that is antithetical to economic growth,” he says. “Profits go up if they can provide slow Internet at super high prices.” Excellent piece!
- Probability and Statistics Cookbook (Matthias Vallentin) — The cookbook contains a succinct representation of various topics in probability theory and statistics. It provides a comprehensive reference reduced to the mathematical essence, rather than aiming for elaborate explanations. CC-BY-NC-SA licensed, LaTeX source on github.
Quickly perform and interpret the results of routine Small Data analysis
With so much focus on Big Data, the needs of many analysts who work with Small Data tend to get ignored. The default tool for many of these users remains spreadsheets1 and/or statistical packages which come with a lot of features and options. However many analysts need a very small subset of what these tools have to offer.
Enter Statwing, a software-as-a-service provider for routine statistical analysis. While the tool is still in the early stages, it can already do many basic “data analysis” tasks.
Consider the following example of a pivot table constructed in Excel: this required 8 mouse-clicks, if you do everything perfectly, and about 5 decisions (what variables to include, what metric to use, …)
The same task in Statwing required 4 mouse-clicks and 0 decisions! Plus it comes with visuals:
The lack of clutter and the addition of a simple “headline” (“Female tends to have much higher values for satisfaction than Male“), makes the result much easier to interpret. The advanced tab contains detailed statistical analysis (in this case the p-value, counts, values). Many users get confused by the output/results produced by traditional statistical software. Let’s face it, many analysts have had little training in statistics. I welcome a tool that produces readily interpretable results.
The company hopes to replicate the above example across a wide variety of routine data analysis tasks. Their initial focus is on tools for (consumer) survey analysis, a potentially huge market given that online companies have made surveys so much easier to conduct. Users of Statwing pay a small monthly subscription, making it cheaper than most2 statistical packages. For a small monthly fee, their intuitive UI lets analysts get their tasks done quickly. More importantly Statwing may nurture aspiring data scientists in your organization.
(1) As this recent Strata presentation points out: Spreadsheets are the glue that keeps many organizations together.
(2) Open source tools like OpenOffice, R and Octave are free. So is the use of Google spreadsheets.
The UDID story has conflicting theories, so the only real thing we have to work with is the data.
The FBI is aware of published reports alleging that an FBI laptop was compromised and private data regarding Apple UDIDs was exposed. At this time there is no evidence indicating that an FBI laptop was compromised or that the FBI either sought or obtained this data.
Of course that statement leaves a lot of leeway. It could be the agent’s personal laptop, and the data may well have been “property” of an another agency. The wording doesn’t even explicitly rule out the possibility that this was an agency laptop, they just say that right now they don’t have any evidence to suggest that it was.
This limited data release doesn’t have much impact, but the possible release of the full dataset, which is claimed to include names, addresses, phone numbers and other identifying information, is far more worrying.
While there are some almost dismissing the issue out of hand, the real issues here are: Where did the data originate? Which devices did it come from and what kind of users does this data represent? Is this data from a cross-section of the population, or a specifically targeted demographic? Does it originate within the law enforcement community, or from an external developer? What was the purpose of the data, and why was it collected?
With conflicting stories from all sides, the only thing we can believe is the data itself. The 40-character strings in the release at least look like UDID numbers, and anecdotally at least we have a third-party confirmation that this really is valid UDID data. We therefore have to proceed at this point as if this is real data. While there is a possibility that some, most, or all of the data is falsified, that’s looking unlikely from where we’re standing standing at the moment.
Reading Minds, Satellites in the Cloud, Units for Risk, and Valuing Autism
- Reconstructing Visual Experiences (PDF) — early visual areas represent the information in movies. To demonstrate the power of our approach, we also constructed a Bayesian decoder by combining estimated encoding models with a sampled natural movie prior. The decoder provides remarkable reconstructions of the viewed movies. These results demonstrate that dynamic brain activity measured under naturalistic conditions can be decoded using current fMRI technology.
- Earth Engine — satellite imagery and API for coding against it, to do things like detecting deforestation, classifying land cover, estimating forest biomass and carbon, and mapping the world’s roadless areas.
- Microlives — 30m of your life expectancy. Here are some things that would, on average, cost a 30-year-old man 1 microlife: Smoking 2 cigarettes; Drinking 7 units of alcohol (eg 2 pints of strong beer); Each day of being 5 Kg overweight. A chest X-ray will set a middle-aged person back around 2 microlives, while a whole body CT-scan would weigh in at around 180 microlives.
- Autistics Need Opportunities More Than Treatment — Laurent gave a powerful talk at Sci Foo: if the autistic brain is better at pattern matching, find jobs where that’s useful. Like, say, science. The autistic woman who was delivering mail became a research assistant in his lab, now has papers galore to her name for original research.
Creative Business, News Design, Google Earth Glitches, and Data Distortion
- Patton Oswalt’s Letters to Both Sides — You guys need to stop thinking like gatekeepers. You need to do it for the sake of your own survival. Because all of us comedians after watching Louis CK revolutionize sitcoms and comedy recordings and live tours. And listening to “WTF With Marc Maron” and “Comedy Bang! Bang!” and watching the growth of the UCB Theatre on two coasts and seeing careers being made on Twitter and Youtube. Our careers don’t hinge on somebody in a plush office deciding to aim a little luck in our direction. (via Jim Stogdill)
- Headliner — interesting Guardian experiment with headlines and presentation. As always, reading the BERG designers’ notes are just as interesting as the product itself. E.g., how they used computer vision to find faces and zoom in on them to make articles more attractive to browsing readers.
- Google Earth Glitches — where 3d maps and aerial imagery don’t match up. (via Beta Knowledge)
- Campbell’s Law — The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor. (via New York Times)
Flipping the Medical Classroom, Inclusion Haters, Information Leveling, and Ars Longa Vita Brevis
- Stanford Med School Contemplates Flipped Classroom — the real challenge isn’t sending kids home with videos to watch, it’s using tools like OceanBrowser to keep on top of what they’re doing. Few profs at universities have cared whether students learned or not.
- Inclusive Tech Companies Win The Talent War (Gina Trapani) — she speaks the truth, and gently. The original CNN story flushed out an incredible number of vitriolic commenters apparently lacking the gene for irony.
- Buyers and Sellers Guide to Web Design and Development Firms (Lance Wiggs) — great idea, particularly “how to be a good client”. There are plenty of dodgy web shops, but more projects fail because of the clients than many would like to admit.
- What Does It Mean to Say That Something Causes 16% of Cancers? (Discover Magazine) — hey, all you infographic jockeys with your aspirations to add Data Scientist to your business card: read this and realize how hard it is to make sense of a lot of numbers and then communicate that sense. Data Science isn’t about Hadoop any more than Accounting is about columns. Both try to tell a story (the original meaning of your company’s “accounts”) and what counts is the informed, disciplined, honest effort of knowing that your story is honest.
Statistical Fallacies, Sensors via Microphone, Peak Plastic, and Go Web Framework
- Common Statistical Fallacies (Flowing Data) — once you know to look for them, you see them everywhere. Or is that confirmation bias?
- Project Hijack — Hijacking power and bandwidth from the mobile phone’s audio interface.
Creating a cubic-inch peripheral sensor ecosystem for the mobile phone.
- Peak Plastic — Deb Chachra points out that if we’re running out of oil, that also means that we’re running out of plastic. Compared to fuel and agriculture, plastic is small potatoes. Even though plastics are made on a massive industrial scale, they still account for less than 10% of the world’s oil consumption. So recycling plastic saves plastic and reduces its impact on the environment, but it certainly isn’t going to save us from the end of oil. Peak oil means peak plastic. And that means that much of the physical world around us will have to change. I hadn’t pondered plastics in medicine before. (via BoingBoing)
- web.go (GitHub) — web framework for the Go programming language.
A review of "The Drunkard's Walk: How Randomness Rules Our Lives."
While Leonard Mlodinow's book offers a good introduction to probabilistic thinking, it carries two problems: First, it doesn't uniformly account for skill. Second, when we're talking probability and statistics, we're talking about interchangeable events.