"statistics" entries

Four short links: 16 March 2010

Platform Games, NoSQL Conf, Ratings, and How to Teach

  1. Government is an Elephant (Public Strategist) — if Government is to be a platform, it will end up competing with the members of its ecosystems (the same way Apple’s Dashboard competed with Konfabulator, and Google’s MyMaps competed with Platial). If you think people squawk when a company competes, just wait until the competition is taxpayer-funded ….
  2. Recordings from NoSQL Live Boston — also available in podcasts.
  3. Modeling Scale Usage Heterogeneity the Bayesian Way — people use 1-5 scales in different ways (some cluster around the middle, some choose extremes, etc.). This shows how to identify the types of users, compensate for their interpretation of the scale, and how it leads to more accurate results.
  4. Building a Better Teacher — fascinating discussion about classroom management that applies to parenting, training, leading a meeting, and many other activities that take place outside of the school classroom. (via Mind Hacks)
Four short links: 5 March 2010

GMail CRM, Django Best Practices, Stats-Think, and WoW Number Crunching

  1. Rapportive: a simple social CRM built into Gmail. They replace the ads in Gmail with photos, bio, and info from social media sites. (via ReadWrite Web)
  2. Best Practices in Web Development with Django and Python — great set of recommendations. (via Jon Udell’s article on checklists)
  3. Think Like a Statistician Without The Math (Flowing Data) — Finally, and this is the most important thing I’ve learned, always ask why. When you see a blip in a graph, you should wonder why it’s there. If you find some correlation, you should think about whether or not it makes any sense. If it does make sense, then cool, but if not, dig deeper. Numbers are great, but you have to remember that when humans are involved, errors are always a possibility. This is basically how to be a scientist: know the big picture, study the details to find deviations, and always ask “why”.
  4. WoW Armory Data Mining — a blog devoted to data mining on the info from the WoW Armory, which has a lot of data taken from the servers. It’s baseball statistics for World of Warcraft. Fascinating! (via Chris Lewis)
Four short links: 19 February 2010

Data Adjustments, Grasping Telcos, Open Data Panacea Denied, Newspaper Software

  1. How to Seasonally Adjust Data: Most statisticians, economists and government agencies that report data use a method called the X12 procedure to adjust data for seasonal patterns. The X12 procedure and its predecessor X11, which is still widely used, were developed by the U.S. Census Bureau. When applied to a data series, the X12 process first estimates effects that occur in the same month every year with similar magnitude and direction. These estimates are the “seasonal” components of the data series. (via bengebre on Delicious)
  2. Vodafone Chief: Mobile Groups Should Be Able to Bypass Google (Guardian) — Vodafone and other telcos want to charge both ends, to charge not just the person with a monthly mobile data subscription but also the companies with whom that person communicates. It’s double-dipping and offensively short-sighted. Vodafone apparently wants to stripmine all the value their product creates. This is not shearing the sheep, this is a recipe for lamb in mint sauce.
  3. Open Data is Not A Panacea, But It Is A Start: The reality is that releasing the data is a small step in a long walk that will take many years to see any significant value. Sure there will be quick wins along the way – picking on MP’s expenses is easy. But to build something sustainable, some series of things that serve millions of people directly, will not happen overnight. And the reality, as Tom Loosemore pointed out at the London Data Store launch, it won’t be a sole developer who ultimately brings it to fruition. (via sebchan on Twitter)
  4. Our GeoDjango EC2 Image for News Apps — Chicago Tribune releasing an Amazon EC2 image of the base toolchain they use. Very good to see participation and contribution from organisations historically seen as pure consumers of technology. All businesses are becoming technology-driven businesses, realising the old mindset of “leave the tech to those who do it best” isn’t compatible with being a leader in your industry.
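
The first link in this list describes the core idea of seasonal adjustment: estimate the effect that recurs in the same month every year, then remove it. Here is a minimal sketch of that one step in Python. It is not the X12 procedure, which does far more (trend estimation, outlier handling, calendar effects). It assumes an additive model, no trend, and at least one full year of data. The function names and the sample numbers are invented for illustration.

```python
from collections import defaultdict
from statistics import mean


def seasonal_components(values, period=12):
    """Estimate one additive effect per position in the cycle.

    Each effect is the average of the values at that position (e.g. all
    Decembers) minus the overall average. Assumes len(values) >= period.
    """
    overall = mean(values)
    by_position = defaultdict(list)
    for i, value in enumerate(values):
        by_position[i % period].append(value)
    return [mean(by_position[p]) - overall for p in range(period)]


def seasonally_adjust(values, period=12):
    """Subtract the estimated seasonal effect from every observation."""
    effects = seasonal_components(values, period)
    return [value - effects[i % period] for i, value in enumerate(values)]


# Invented data: two years of monthly figures, flat at 100 with a
# December spike to 130.
monthly = ([100] * 11 + [130]) * 2
effects = seasonal_components(monthly)
adjusted = seasonally_adjust(monthly)
```

With the invented series, the December effect is the only positive one, and the adjusted series comes out flat because the spike is entirely seasonal.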
Four short links: 17 December 2009

Desirable Devices, iPhone Piracy Numbers, Internet Trend Numbers, Value of Privacy

  1. New Device Desirable, Old Device Undesirable: “I’m going to take my new device wherever I go,” said Larson, holding the expensive item directly in the eyeline of several reporters. “That way no one on the street, inside the elevator, or at my place of business will ever mistake me for the sort of individual who does not own the new device.” Added Larson, “The new device brings me satisfaction.” (via liza on Twitter)
  2. iPhone Piracy — over 70% of submitted game scores for this game were from pirated copies. Having seen our data and the fact that not a single pirate bought Tap-Fu after playing it, these arguments all sound a bit delusional to me. It seems like an attempt at trying to be legitimate while hiding the real reason. They should just change their page to say “We pirate because we can”. That seems to be a much more honest statement based on the data we’ve seen. (via timoreilly on Twitter)
  3. World Internet Project — global research into Internet adoption and trends. Found via the New Zealand partner who published their dataset in the New Zealand Social Science Datasets repository.
  4. The Eternal Value of Privacy (Bruce Schneier) — powerful notes about the right to privacy. Privacy protects us from abuses by those in power, even if we’re doing nothing wrong at the time of surveillance. […] Privacy is a basic human need. […] For if we are observed in all matters, we are constantly under threat of correction, judgment, criticism, even plagiarism of our own uniqueness. We become children, fettered under watchful eyes, constantly fearful that — either now or in the uncertain future — patterns we leave behind will be brought back to implicate us, by whatever authority has now become focused upon our once-private and innocent acts. We lose our individuality, because everything we do is observable and recordable.
Four short links: 16 December 2009

Global Broadband, A/B Testing Stats, Streaming with SSDs, Online Videos Sell

  1. OECD Broadband Portal — global data on broadband penetration and pricing available from June 2009.
  2. Easy Statistics for A/B Testing — it really is easy. And it mentions hamsters. This is worth reading. (via Hacker News)
  3. last.fm’s SSD Streaming Infrastructure: Each single SSD can support around 7000 concurrent listeners, and the serving capacity of the machine topped out at around 30,000 concurrent connections in its tested configuration. Lots of hardware and OS configuration geeking here, it’s great. (via Hacker News)
  4. Videos Sell More Product — Zappos sells 6-30% more merchandise when accompanied by video demos. By the end of next year, Zappos will have ten full working video studios, with the goal of producing around 50,000 product videos by 2010, up from the 8,000 videos they have on the site today (via johnclegg on Twitter)
Four short links: 5 November 2009

Heat Maps in R, EC2 Blackhat Tricks, Snickersome Unicode, and Decoding Statistics

  1. Heat Maps in R: We used financial data here because it’s easier to access than the airline data, but it’s actually a pretty interesting way of looking at a financial time series. Weekend and holiday effects are a bit more obvious, and it’s a bit like being able to see the daily, weekly, monthly and yearly closes all at once (by scanning your eye over the calendar in different directions). Includes source code. (via migurski on Delicious)
  2. BlackHat and EC2: Theft of resources is the red-headed step-child of attack classes and doesn’t get much attention, but on cloud platforms where resources are shared amongst many users these attacks can have a very real impact. With this in mind, we wanted to show how EC2 was vulnerable to a number of resource theft attacks and the videos below demonstrate three separate attacks against EC2 that permit an attacker to boot up massive numbers of machines, steal computing time/bandwidth from other users and steal paid-for AMIs. (via straup on Delicious)
  3. Funny Characters in Unicode — I never get tired of the wacky stuff in Unicode. I love the thought of a Unicode committee somewhere arguing passionately about the number of buttons on the snowman …. (via Hacker News)
  4. Statistics to English Translation: The terms sensitivity and specificity generally refer to diagnostic or screening procedures, such as HIV or allergy tests. The sensitivity of a test is its true positive rate; the specificity is its true negative rate, although it can be more intuitive to think of specificity as the complement of the false positive rate. This matters. Bandying around numbers with misleading labels, or misinterpreting numbers that have a precise and defined meaning, does not further understanding. (Said 78.4% of statisticians, with a 20% confidence factor probability of false positives)
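
The definitions in the last item above are easy to pin down with a few lines of Python. This is a small sketch, assuming the usual confusion-matrix counts (true/false positives and negatives). The function names and the screening numbers are invented for illustration and do not come from the linked article.

```python
def sensitivity(true_pos, false_neg):
    """True positive rate: share of actual positives the test flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """True negative rate: share of actual negatives the test clears."""
    return true_neg / (true_neg + false_pos)


def false_positive_rate(true_neg, false_pos):
    """Share of actual negatives the test wrongly flags."""
    return false_pos / (false_pos + true_neg)


# Invented screening results: 100 people with the condition,
# 1000 people without it.
tp, fn = 90, 10
tn, fp = 950, 50

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
fpr = false_positive_rate(tn, fp)
```

Because specificity and the false positive rate share the same denominator (all actual negatives), they always sum to one, which is the "complement" relationship the quoted passage mentions.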

 

Four short links: 14 August 2009

EPub FTW, SQL Horror, Computer Vision Explained, and A Massive Dump of Twitter Stats

  1. Page2Pub — harvest wiki content and turn it into EPub and PDF. See also Sony dropping its proprietary format and moving to EPub. Open standards rock. (via oreillylabs on Twitter)
  2. SQL Pie Chart — an ASCII pie chart, drawn by SQL code. Horrifying and yet inspiring. Compare to PostgreSQL code to produce ASCII Mandelbrot set. (via jdub on Twitter and Simon Willison)
  3. How SudokuGrab Works — the computer vision techniques behind an iPhone app that solves Sudoku puzzles that you take a photo of. Well explained! These CV techniques are an essential part of the sensor web. (via blackbeltjones on Delicious)
  4. Twitter by the Numbers — massive dump of charts and stats on Twitter. I love that there’s a section devoted to social media marketers, the Internet’s head lice. (via Kevin Marks on Twitter)
Four short links: 13 August 2009

  1. Under the Hood of App Inventor for Android — regular readers know I’m a big fan of visual programming language Scratch, and apparently Google are too. They’ve got twelve university classes testing App Inventor for Android, a visual connect-the-bits programming environment for Android. University classes probably because one of the co-creators is Hal Abelson, coauthor of the definitive programming textbook. Also found online: the PR-type announcement, a Professor using it, and @AppInv (nothing juicy on Twitter; it looks like it might be a channel for tech support for the students). (via Hacker News)
  2. Google Web Optimizer Case Study (Four Hour Work Week) — GWO manages A/B tests for you, with a lot of statistical analysis. It’s a fascinating read to see how these should be done. Every equation may halve the readership of a book, but every table of numbers and relevancy analysis doubles the value of a post like this. (via Hacker News)
  3. Opening Up The BBC’s Natural History Archive — the BBC are releasing programme segments and a whole lot of metadata around their programming. Audio and video segmented, tagged with DBpedia terms, and aggregated into a URI structure based on natural history concepts: species, habitats, adaptations, etc. Gorgeous!
  4. Yahoo! Term Extraction API to Close: Internally, both services share a backend data source that is closing down, so the publicly-facing YDN services will be closing as well. I think it’s the most significant casualty of Y! outsourcing search to MSFT, as this API was used by a lot of projects. (via Simon Willison)

Making Government Transparent Using R

Danese Cooper thinks it will be an important tool in Open Gov

With Open Source now considered an accepted part of the software industry, some people are starting to wonder if we can’t bring the same degree of openness and innovation into government. Danese Cooper, who is actively involved in the open source community through her work with the Open Source Initiative and Apache, as well as working as an R wonk for Revolution Computing, would love to see the government become more open. Part of that openness is being able to access and interpret the mass of data that the government collects, something Cooper thinks R would be a great tool for. She’ll be talking about R and Open Government at O’Reilly’s Open Source Conference, OSCON.

Four short links: 7 July 2009

Motivation, R, Games, and Open Source Medicine

  1. Announcing your plans makes you less motivated to accomplish them: Tests done since 1933 show that people who talk about their intentions are less likely to make them happen. Announcing your plans to others satisfies your self-identity just enough that you’re less motivated to do the hard work needed. I have noticed this myself. It must be balanced against the other finding that public commitment increases the probability of follow-through, which might work in sales but seems to fail miserably in getting me to do anything productive. (via migurski on Delicious)
  2. Rseek — search engine for info on R. Necessary because of the non-unique project name. (via Benjamin Mako Hill)
  3. Treasure World (Offworld) — Nintendo DS game that turns wifi spots into collectible treasure. You have to explore the real world as you play the game, another of these games that mix the online and offline worlds. (via waxy)
  4. 50 Successful Open Source Projects That Are Changing Medicine — notice the large number of electronic health record (EHR) suites. What are the chances of any of them getting a slice of Obama’s EHR money that the ex-RedHatters behind The Axial Project are going for? (via timoreilly on Twitter)