Inside the Aaron Swartz Investigation, Multivariate Dataset Exploration, Augmediated Life, and Public Experience

  1. Life Inside the Aaron Swartz Investigation — do hard things and risk failure. What else are we on this earth for?
  2. crossfilter — open source (Apache 2) JavaScript library for exploring large multivariate datasets in the browser. Crossfilter supports extremely fast (<30ms) interaction with coordinated views, even with datasets containing a million or more records.
  3. Steve Mann: My Augmediated Life (IEEE) — Until recently, most people tended to regard me and my work with mild curiosity and bemusement. Nobody really thought much about what this technology might mean for society at large. But increasingly, smartphone owners are using various sorts of augmented-reality apps. And just about all mobile-phone users have helped to make video and audio recording capabilities pervasive. Our laws and culture haven’t even caught up with that. Imagine if hundreds of thousands, maybe millions, of people had video cameras constantly poised on their heads. If that happens, my experiences should take on new relevance.
  4. The Google Glass Feature No-One Is Talking About — The most important Google Glass experience is not the user experience – it’s the experience of everyone else. The experience of being a citizen, in public, is about to change.
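
The "coordinated views" idea behind crossfilter (item 2) can be sketched in a few lines. This is a naive Python illustration, not Crossfilter's JavaScript API: the class name, method names, and sample records are all hypothetical, and it rescans every record on each call rather than using the incremental indexes that make the real library fast. It assumes the usual coordinated-view convention that each view's counts respect the filters set on every other dimension but not its own.

```python
from collections import Counter


class CoordinatedViews:
    """Hypothetical helper: per-dimension filters shared across several views."""

    def __init__(self, records):
        self.records = list(records)
        self.filters = {}  # dimension name -> predicate on that field's value

    def set_filter(self, dimension, predicate):
        self.filters[dimension] = predicate

    def clear_filter(self, dimension):
        self.filters.pop(dimension, None)

    def group_counts(self, dimension):
        """Count records per value of `dimension`.

        Filters on all OTHER dimensions apply; the dimension's own filter
        is skipped so its view still shows the unselected values.
        """
        counts = Counter()
        for record in self.records:
            if all(
                predicate(record[name])
                for name, predicate in self.filters.items()
                if name != dimension
            ):
                counts[record[dimension]] += 1
        return counts


# Made-up sample data for illustration only.
payments = [
    {"type": "tab", "hour": 9},
    {"type": "tab", "hour": 10},
    {"type": "cash", "hour": 9},
    {"type": "visa", "hour": 10},
]

views = CoordinatedViews(payments)
views.set_filter("type", lambda value: value == "tab")
by_hour = views.group_counts("hour")   # only "tab" records are counted
by_type = views.group_counts("type")   # own filter ignored: all types shown
```

Selecting "tab" in the type view narrows the hour view, while the type view itself keeps showing every category so the selection can be changed.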

Design matters more than math

Design compels. Math is proof.

At Strata Santa Clara later this month, we’re reprising what has become a tradition: Great Debates. These Oxford-style debates pit two teams against one another to argue a hot topic in the fields of big data, ubiquitous computing, and emerging interfaces.

What matters more? Our teams for the Great Debate.

Part of the fun is the scoring: attendees vote on whether they agree with the proposition before the debaters speak, and after both sides have said their piece, the audience votes again. Whoever moves the needle wins.
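
The scoring can be sketched as a small calculation. The exact rule is not spelled out here, so this assumes "moving the needle" means the change in the share of the audience agreeing with the proposition between the two votes; the function names and vote counts are hypothetical.

```python
def agreement_share(yes_votes: int, total_votes: int) -> float:
    """Fraction of voters who agree with the proposition."""
    if total_votes <= 0:
        raise ValueError("total_votes must be positive")
    return yes_votes / total_votes


def debate_winner(before_yes, before_total, after_yes, after_total):
    """Return (winner, swing).

    swing is the change in agreement in percentage points, rounded to
    one decimal place. A positive swing favors the team arguing for the
    proposition; a negative swing favors the team arguing against it.
    """
    before = agreement_share(before_yes, before_total)
    after = agreement_share(after_yes, after_total)
    swing = round((after - before) * 100, 1)
    if swing > 0:
        return "for", swing
    if swing < 0:
        return "against", swing
    return "tie", swing


# Made-up example: 60% agree before the debate, 45% agree after it.
result = debate_winner(120, 200, 90, 200)
```

Under this rule a team can win even if most of the room still disagrees with it, as long as it shifted more opinion than its opponents.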

This year’s proposition — that design matters more than math — is sure to inspire some vigorous discussion. The argument for math is pretty strong. Math is proof. Given enough data — and today, we have plenty — we can know. “The right information in the right place just changes your life,” said Stewart Brand. Properly harnessed, the power of data analysis and modeling can fix cities, predict epidemics, and revitalize education. Abused, it can invade our lives, undermine economies, and steal elections. Surely the algorithms of big data matter!

But your life won’t change by itself. Bruce Mau defines design as “the human capacity to plan and produce desired outcomes.” Math informs; design compels. Without design, math can’t do its thing. Poorly designed experiments collect the wrong data. And if the data can’t be understood and acted upon, it may as well not have been crunched in the first place.

This is the question we’ll be putting to our debaters: Which matters more? A well-designed collection of flawed information — or an opaque, hard-to-parse, but unerringly accurate model? From mobile handsets to social policy, we need both good math and good design. Which is more critical?

SCADA 0-Day, Complexity Course, ToS Tracking, and Custom Manufacturing Prostheses

  1. Tridium Niagara (Wired) — A critical vulnerability discovered in an industrial control system used widely by the military, hospitals and others would allow attackers to remotely control electronic door locks, lighting systems, elevators, electricity and boiler systems, video surveillance cameras, alarms and other critical building facilities, say two security researchers. cf the SANS SCADA conference.
  2. Santa Fe Institute Course: Introduction to Complexity — 11-week course on understanding complex systems: dynamics, chaos, fractals, information theory, self-organization, agent-based modeling, and networks. (via BoingBoing)
  3. Terms of Service Changes — a site that tracks changes to terms of service. (via Andy Baio)
  4. 3D Printing a Replacement Hand for a 5 Year Old Boy (Ars Technica) — the designs are on Thingiverse. For more, see their blog.

Improve your math skills

Practical advice for those considering a career in data science

When I was a youngster in college I found myself dissatisfied after I took a stats class from the math department. So I decided to take another stats class. Classmates thought I was crazy. Let’s be real, what precocious over-achieving teenager majoring in English lit seeks to retake a math class? And not because of a grade, but because they were dissatisfied with what they got out of it? After a bit of research, I decided to take the stats class offered by the psych department.

It made a significant difference.

Thinking about math from the perspectives of research design methodology and how data can be used to manipulate people made quite an impact on my teenage worldview. This experience also reinforced my belief that education is what you decide it will be. There is always more than one way to learn and education doesn’t necessarily have to happen in a physical classroom. Growing up in the San Francisco Bay Area where friends and loved ones decided to forgo traditional higher ed completely to start their own companies or immediately work in jobs in technology also contributed to this belief.

While full-time students who are looking at a career in data science may have the time to do seemingly nutty things like take overlapping math classes, this is not something that most people with full-time jobs are able to do. When people with full-time jobs ask me what they need to do to move into data science, I probe them about the kind of job in data science they want and about their analytical and empathy skills. Then, I immediately follow up with “So, how are your math skills?” Interestingly enough, I get a lot of people saying that they don’t have time to physically go into a classroom or that it has been, like, forever since they’ve used statistics and/or linear algebra for data analysis. Even more interesting is how often people don’t realize just how many resources are available to learn math outside of the physical-attendance-in-a-classroom model.

Huh.

SSH/L Multiplexer, GitHub Bots, Test Your Assumptions, and Tech Trends

  1. sslh — ssh/ssl multiplexer.
  2. Github Says No to Bots (Wired) — what’s interesting is that bots augmenting photos is awesome in Flickr: take a photo of the sky and you’ll find your photo annotated with stars and whatnot. What can GitHub learn from Flickr?
  3. Four Assumptions of Multiple Regression That Researchers Should Always Test — “but I found the answer I wanted! What do you mean, it might be wrong?!”
  4. Tenth Grade Tech Trends (Medium) — if you want to know what will have mass success, talk to early adopters in the mass market. We alpha geeks aren’t that any more.

Win95 Tips, Obama's Big Data, Aggregate Statistics, and Foxconn Robots

  1. Windows 95 Tips — hilarious tumblr showing the dark side of life through Windows 95 UI tips. (via Juha Saarinen)
  2. Everything We Know About Obama’s Big Data Operation (ProPublica) — “White suburban women? They’re not all the same. The Latino community is very diverse with very different interests,” Dan Wagner, the campaign’s chief analytics officer, told The Los Angeles Times. “What the data permits you to do is figure out that diversity.”
  3. cube (GitHub) — time-series data collection and analysis. Cube lets you compute aggregate statistics post hoc. It also enables richer analysis, such as quantiles and histograms of arbitrary event sets. Cube is built on MongoDB and available under the Apache License on GitHub.
  4. 1M Robots to Replace 1M Human Jobs at Foxconn (Singularity Hub) — Foxconn plant opening, making manufacturing robots, and they appear to be dogfooding by using them in other plants. $25k each, 10k+ made, and fits into the pattern: the number of operational robots in China increased by 42 percent from 2010 to 2011.
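
The "post hoc" aggregation described for cube (item 3) can be illustrated with a short sketch: keep the raw events, and decide which statistics to compute only when a question comes up. This is plain Python rather than Cube's own interface; the event layout, function names, and sample values are assumptions made for illustration.

```python
from collections import Counter
from statistics import median

# Hypothetical raw events: (timestamp_seconds, event_type, duration_ms).
EVENTS = [
    (0, "request", 120),
    (10, "request", 80),
    (70, "request", 200),
    (75, "error", 0),
    (130, "request", 95),
]


def select(events, event_type, start, stop):
    """Values of events of one type with start <= timestamp < stop."""
    return [
        value
        for (timestamp, kind, value) in events
        if kind == event_type and start <= timestamp < stop
    ]


def histogram(values, bin_width):
    """Count values per bin; each key is the lower edge of its bin."""
    return Counter((value // bin_width) * bin_width for value in values)


def per_minute_counts(events, event_type):
    """Number of events of one type in each 60-second bucket."""
    return Counter(
        timestamp // 60
        for (timestamp, kind, _value) in events
        if kind == event_type
    )


# Questions asked after the fact, against the same stored events.
first_two_minutes = select(EVENTS, "request", 0, 120)
typical_duration = median(first_two_minutes)       # a quantile (the 50th percentile)
duration_bins = histogram(first_two_minutes, 100)  # a histogram of an arbitrary event set
requests_per_minute = per_minute_counts(EVENTS, "request")
```

Because nothing is pre-aggregated, a new metric only needs a new query over the stored events, at the cost of keeping every raw event around.
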

Intuitive Linear Algebra, Bayes Intro, State of JavaScript, and Web App Builders

  1. An Intuitive Guide to Linear Algebra — Here’s the linear algebra introduction I wish I had. I wish I’d had it, too. (via Hacker News)
  2. Think Bayes — an introduction to Bayesian statistics using computational methods.
  3. The State of JavaScript 2012 (Brendan Eich) — JavaScript continues its march up and down the stack, simultaneously becoming an application language while becoming the bytecode for the world.
  4. Divshot — a startup turning mockups into web apps, built on top of the Bootstrap front-end framework. I feel momentum and a tipping point approaching, where building things on the web is about to get easier again (the way it did with Ruby on Rails). cf Jetstrap.

ID-based Democracy, Web Documentation, American Telco Gouging, and Stats Cookbook

  1. Finland Crowdsourcing New Laws (GigaOM) — online referenda. The Finnish government enabled something called a “citizens’ initiative”, through which registered voters can come up with new laws – if they can get 50,000 of their fellow citizens to back them up within six months, then the Eduskunta (the Finnish parliament) is forced to vote on the proposal. Now this crowdsourced law-making system is about to go online through a platform called the Open Ministry. Petitions and online voting are notoriously prone to fraud, so it will be interesting to see how well the online identity system behind this holds up.
  2. WebPlatform — wiki of information about developing for the open web. Joint production of many of the $BIGCOs of the web and the W3C, so will be interesting to see, as it develops, whether it has the best aspects of each or the worst.
  3. Why Your Phone, Cable, Internet Bills Cost So Much (Yahoo) — “The companies essentially have a business model that is antithetical to economic growth,” he says. “Profits go up if they can provide slow Internet at super high prices.” Excellent piece!
  4. Probability and Statistics Cookbook (Matthias Vallentin) — The cookbook contains a succinct representation of various topics in probability theory and statistics. It provides a comprehensive reference reduced to the mathematical essence, rather than aiming for elaborate explanations. CC-BY-NC-SA licensed, LaTeX source on GitHub.

Mobile Content, Google Math, Mobile Linux, and Mozilla's Strategy

  1. Mobile Content Strategy — Mobile is a catalyst that can help you make your content tighter without loss of clarity or information. If you make your content work well on mobile, it will work everywhere. Excellent presentation, one I want to thump on every decision-maker’s desk and say “THIS!”.
  2. Math at Google (PDF) — presentation showing the different types of math used to build Google. Good as an overview, and as a way to motivate high school and college kids to do their math homework. “See, it really is useful! Really!” (via Ben Lorica)
  3. Tizen 2.0 Alpha Released — Tizen is the Linux Foundation’s mobile Linux kernel, device drivers, middleware subsystems, and Web APIs. (via The Linux Foundation)
  4. Explaining WebMaker Crisply (Mark Surman) — if you’ve wondered wtf Mozilla is up to, this is excellent. Mozilla has big priorities right now: the web on the desktop; the web on mobile; and web literacy.

Open Publishing, Theatre Sensing, Reddit First, and Math Podcasts

  1. Open Monograph Press — an open source software platform for managing the editorial workflow required to see monographs, edited volumes, and scholarly editions through internal and external review, editing, cataloguing, production, and publication. OMP will operate, as well, as a press website with catalog, distribution, and sales capacities. (via OKFN)
  2. Sensing Activity in Royal Shakespeare Theatre (NLTK) — sensing activity in the theatre, for graphing. Raw data available. (via Infovore)
  3. Why Journalists Love Reddit (GigaOM) — “Stories appear on Reddit, then half a day later they’re on Buzzfeed and Gawker, then they’re on the Washington Post, The Guardian and the New York Times. It’s a pretty established pattern.”
  4. Relatively Prime: The Toolbox — Kickstarted podcasts on mathematics. (via BoingBoing)