"statistics" entries

Four short links: 2 August 2012

Creative Business, News Design, Google Earth Glitches, and Data Distortion

  1. Patton Oswalt’s Letters to Both Sides: You guys need to stop thinking like gatekeepers. You need to do it for the sake of your own survival. Because all of us comedians after watching Louis CK revolutionize sitcoms and comedy recordings and live tours. And listening to “WTF With Marc Maron” and “Comedy Bang! Bang!” and watching the growth of the UCB Theatre on two coasts and seeing careers being made on Twitter and Youtube. Our careers don’t hinge on somebody in a plush office deciding to aim a little luck in our direction. (via Jim Stogdill)
  2. Headliner — interesting Guardian experiment with headlines and presentation. As always, reading the BERG designers’ notes is just as interesting as the product itself. E.g., how they used computer vision to find faces and zoom in on them to make articles more attractive to browsing readers.
  3. Google Earth Glitches — where 3d maps and aerial imagery don’t match up. (via Beta Knowledge)
  4. Campbell’s Law: The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor. (via New York Times)
Four short links: 16 July 2012

Open Access, Emergency Social Media, A/B Testing Traps, and Post-Moore Sequencing Costs

  1. Britain To Provide Free Access to Scientific Publications (Guardian) — the Finch report is being implemented! British universities now pay around £200m a year in subscription fees to journal publishers, but under the new scheme, authors will pay “article processing charges” (APCs) to have their papers peer reviewed, edited and made freely available online. The typical APC is around £2,000 per article.
  2. Social Media in an Emergency: A Best Practice Guide — from the Wellington City Council in New Zealand, who have been learning from Christchurch earthquakes and Tauranga’s oil spill.
  3. Trustworthy Online Controlled Experiments: Five Puzzling Outcomes Explained (PDF) — Microsoft Research dug into A/B tests done on Bing and reveal some subtle truths. The statistical theory of controlled experiments is well understood, but the devil is in the details and the difference between theory and practice is greater in practice than in theory […] Generating numbers is easy; generating numbers you should trust is hard! (via Greg Linden)
  4. Data Sequencing Costs (National Human Genome Research Institute) — Cost-per-megabase and cost-per-genome are dropping faster than Moore’s Law now they’ve introduced “second generation techniques” for sequencing, aka “high-throughput sequencing” or a parallelization of the process. (via JP Rangaswami)
Four short links: 11 May 2012

Flipping the Medical Classroom, Inclusion Haters, Information Leveling, and Ars Longa Vita Brevis

  1. Stanford Med School Contemplates Flipped Classroom — the real challenge isn’t sending kids home with videos to watch, it’s using tools like OceanBrowser to keep on top of what they’re doing. Few profs at universities have cared whether students learned or not.
  2. Inclusive Tech Companies Win The Talent War (Gina Trapani) — she speaks the truth, and gently. The original CNN story flushed out an incredible number of vitriolic commenters apparently lacking the gene for irony.
  3. Buyers and Sellers Guide to Web Design and Development Firms (Lance Wiggs) — great idea, particularly “how to be a good client”. There are plenty of dodgy web shops, but more projects fail because of the clients than many would like to admit.
  4. What Does It Mean to Say That Something Causes 16% of Cancers? (Discover Magazine) — hey, all you infographic jockeys with your aspirations to add Data Scientist to your business card: read this and realize how hard it is to make sense of a lot of numbers and then communicate that sense. Data Science isn’t about Hadoop any more than Accounting is about columns. Both try to tell a story (the original meaning of your company’s “accounts”) and what counts is the informed, disciplined, honest effort of knowing that your story is honest.
Four short links: 4 May 2012

Statistical Fallacies, Sensors via Microphone, Peak Plastic, and Go Web Framework

  1. Common Statistical Fallacies (Flowing Data) — once you know to look for them, you see them everywhere. Or is that confirmation bias?
  2. Project Hijack: Hijacking power and bandwidth from the mobile phone’s audio interface. Creating a cubic-inch peripheral sensor ecosystem for the mobile phone.
  3. Peak Plastic — Deb Chachra points out that if we’re running out of oil, that also means that we’re running out of plastic. Compared to fuel and agriculture, plastic is small potatoes. Even though plastics are made on a massive industrial scale, they still account for less than 10% of the world’s oil consumption. So recycling plastic saves plastic and reduces its impact on the environment, but it certainly isn’t going to save us from the end of oil. Peak oil means peak plastic. And that means that much of the physical world around us will have to change. I hadn’t pondered plastics in medicine before. (via BoingBoing)
  4. web.go (GitHub) — web framework for the Go programming language.

Understanding randomness is a double-edged sword

A review of "The Drunkard's Walk: How Randomness Rules Our Lives."

While Leonard Mlodinow's book offers a good introduction to probabilistic thinking, it carries two problems: First, it doesn't uniformly account for skill. Second, when we're talking probability and statistics, we're talking about interchangeable events.

Four short links: 13 September 2011

Lie with Research, Learning as You Teach, 3D Printing, and Future of Javascript

  1. Dan Saffer: How To Lie with Design Research (Google Video) — Experience shows that, especially with qualitative research like the type designers often do, two researchers can look at the same set of data and draw dramatically different findings from them. As William Blake said, “Both read the Bible day and night, But thou read’st black where I read white.” (via Keith Bolland)
  2. Teaching What You Don’t Know (Sci Blogs) — As that lecturer said, learning new things—while challenging—is also stimulating & fun. If that sense of excitement and enjoyment carries through to your actual classes, then you’ll speak with passion and enthusiasm—how better to in turn enthuse your students? Ties in with the Maori concept of Ako, that teacher and student learn from each other.
  3. Bored of 3D Printers (Tom Armitage) — made me wonder how long it would be before we drop the “3D” prefix and expect a “printer” to emit objects. That said, I love Tom’s neologism artefactory.
  4. Future of Javascript from Google’s Internal Summit: Javascript has fundamental flaws that cannot be fixed merely by evolving the language. Their two-pronged strategy is to work with ECMA (the standards body responsible for the language) and simultaneously develop Yet Another New Language. I still don’t know which box to file this in: techowank fantasy (“I will build the ultimate language and all will fall in line before me!” — btdt, have the broken coffee mug), arrogant corporate forkwits, genuine frustration with the path of progress, evil play for ownership. Read Alex Russell’s commentary on this (Alex is the creator of Dojo, now an employee of Google) for some context. I have to say, We Will Build A Better Javascript doesn’t fill me with confidence when it comes from folks producing Chrome-specific demos (causing involuntary shudders as we all flash back to “this site best experienced in Microsoft Internet Explorer” days). Trust makes Google possible: Microsoft wanted to roll an identity solution out to the public but was beaten to pieces for it; Google was begged to provide an API for gmail account authentication. The difference was trust: Google had it and Microsoft had lost it. When Google loses our trust, whether by hostile self-interested forking, by promoting antifeature proprietary or effectively-proprietary integrated technologies over the open web, or by traditional trust-losing techniques such as security failures or over-exploitative use of data, they’re fucked. I use a lot of Google services and love them to pieces, but they must be ever-vigilant for hubris. Everyone at Google should look humbly at Yahoo!, which once served customers and worked well with others but whose death was ensured around 2000 when they rolled out popups and began eating the sheep instead of shearing them.
Four short links: 2 September 2011

AutoUpdater, Extrapolation Apocalypse, C Compilers, and Authentication

  1. Invisible Autoupdater: An App’s Best Feature — Gina Trapani quotes Ben Goodger on Chrome: The idea was to give people a blank window with an autoupdater. If they installed that, over time the blank window would grow into a browser.
  2. Crackpot Apocalypse — analyzing various historical pronouncements of the value of pi, the paper’s author reaches a startling conclusion. “When πt is 1, the circumference of a circle will coincide with its diameter,” Dudley writes, “and thus all circles will collapse, as will all spheres (since they have circular cross-sections), in particular the earth and the sun. It will be, in fact, the end of the world, and … it will occur in 4646 A.D., on August 9, at 4 minutes and 27 seconds before 9 p.m.” Clever commentary and a good example when you need to show people the folly of inappropriate curve-fitting and extrapolation.
  3. clang — C language family front-ends to LLVM. Development sponsored by Apple, as used in Snow Leopard. (via Nelson Minar)
  4. OmniAuth — authenticate against Twitter, GitHub, Facebook, Foursquare, and many many more. OmniAuth is built from the ground up on the philosophy that authentication is not the same as identity. (via Tony Stubblebine)
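
The extrapolation folly in item 2 is easy to reproduce. What follows is only a rough sketch of the general idea, not Dudley's actual method or data: the three (year, value) pairs are invented for illustration, and `fit_line` and `year_reaching` are hypothetical helper names. It fits a straight line by least squares and then asks when that line reaches 1.

```python
from typing import List, Tuple


def fit_line(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def year_reaching(target: float, slope: float, intercept: float) -> float:
    """Solve slope * year + intercept = target for year."""
    return (target - intercept) / slope


# HYPOTHETICAL (year, claimed value of pi) pairs, invented for illustration.
claims = [(0.0, 3.20), (1000.0, 3.16), (2000.0, 3.14)]
slope, intercept = fit_line(claims)
doomsday = year_reaching(1.0, slope, intercept)
print(f"Fitted line predicts pi = 1 in year {doomsday:.0f}")
```

The arithmetic is fine; the error is trusting a line far outside the range of the points it was fitted to.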
Four short links: 20 July 2011

Meaningful Subsets, iPhone Reading, JSON Parser, The Epiphanator

  1. Random Khan Exercises — elegant hack to ensure repeatability for a user but difference across users. Note that they need these features of exercises so that they can perform meaningful statistical analyses on the results.
  2. Float, the Netflix of Reading (Wired) — an interesting Instapaper variant with a stab at an advertising business model. I would like to stab at the advertising business model, too. What I do like is that it’s trying to do something with the links that friends tweet, an unsolved problem for your humble correspondent. (via Steven Levy)
  3. JSON Parser Online — nifty web app for showing JSON parses. (via Hilary Mason)
  4. Facebook and the Epiphanator (NY Magazine) — Paul Ford has a lovely frame through which to see the relationship between traditional and social media. So it would be easy to think that the Whole Earthers are winning and the Epiphanators are losing. But this isn’t a war as much as a trade dispute. Most people never chose a side; they just chose to participate. No one joined Facebook in the hope of destroying the publishing industry.
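
The "repeatable for a user, different across users" trick in item 1 can be approximated by seeding a random number generator from the user and the exercise. This is a minimal sketch under that assumption, not Khan Academy's actual code; `exercise_rng`, `make_problem`, and the ID strings are hypothetical names.

```python
import hashlib
import random
from typing import Tuple


def exercise_rng(user_id: str, exercise_id: str) -> random.Random:
    """Return an RNG whose sequence depends only on (user_id, exercise_id)."""
    key = f"{user_id}:{exercise_id}".encode("utf-8")
    # Use a stable hash rather than Python's built-in hash(), which is
    # salted per process for strings and so would change between runs.
    seed = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return random.Random(seed)


def make_problem(user_id: str, exercise_id: str) -> Tuple[int, int]:
    """Generate the operands of a hypothetical addition problem."""
    rng = exercise_rng(user_id, exercise_id)
    return rng.randint(1, 99), rng.randint(1, 99)


# The same user always gets the same problem for a given exercise.
print(make_problem("alice", "addition_1") == make_problem("alice", "addition_1"))
```

Because the seed is derived from stable inputs, a problem can be regenerated later for analysis without storing it.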
Four short links: 23 June 2011

Communities, Statistics, News, and Doubting Data

  1. The Wisdom of Communities — Luke Wroblewski’s notes from Derek Powazek’s talk at Event Apart. Wisdom of Crowds theory shows that, in aggregate, crowds are smarter than any single individual in the crowd. See this online in most emailed features, bit torrent, etc. Wise crowds are built on a few key characteristics: diversity (of opinion), independence (of other ideas), decentralization, and aggregation.
  2. How to Fit an Elephant (John D. Cook) — for the stats geeks out there. Someone took von Neumann’s famous line “with four parameters I can fit an elephant, and with five I can make him wiggle his trunk”, and found the four complex parameters that do, indeed, fit an elephant.
  3. How to Run a News Site and Newspaper Using WordPress and Google Docs — clever workflow that’s digital first but integrated with print. (via Sacha Judd)
  4. All Watched Over: On Foo, Cybernetics, and Big Data — I’m glad someone preserved Matt Jones’s marvelous line, “the map-reduce is not the territory”. (via Tom Armitage)
Four short links: 14 June 2011

ASCII Diagrams, Bayesian Textbook, Telehacks Interview, and Table Resizing in CSS

  1. ASCII Flow — create ASCII diagrams. Awesome. (via Hacker News)
  2. Principles of Uncertainty — probability and statistics textbook, for maths students to build up to understanding Bayesian reasoning.
  3. Playable Archaeology: An Interview with the Telehacks Anonymous Creator (Andy Baio) — The inspiration was my son. I had shown him the old movies Hackers, Wargames, and Colossus: The Forbin Project and he really liked them. After seeing Hackers and Wargames, he really wanted to start hacking stuff on his own. I’d taught him some programming, but I didn’t want him doing any actual hacking, so I decided to make a simulation so he could telnet to hosts, hack them, and get the feel of it, but safely. (Andy was the interviewer, not the creator)
  4. Responsive Data Tables — CSS ways to reformat data tables if the screen width is inadequate for the default table layout. (via Keith Bolland)