- Ubiquity — Sears Holdings has formed a new unit to market space from former Sears and Kmart retail stores as a home for data centers, disaster recovery space and wireless towers.
- Google Abandons Open Standards for Instant Messaging (EFF) — it has to be a sign of the value to users of open standards that small companies embrace them and large companies reject them.
- How Does Copyright Work in Space? (The Economist) — amazingly complex rights trail for the International Space Station-recorded cover of “Space Oddity”. Sample: Commander Hadfield and his son Evan spent several months hammering out details with Mr Bowie’s representatives, and with NASA, Russia’s space agency ROSCOSMOS and the CSA. That’s the SIMPLE HAPPY ENDING.
- Great Lessons: Evan Weinberg’s “Do You Know Blue?” (Dan Meyer) — It’s a bridge from math to computer science. Students get a chance to write algorithms in a language understood by both mathematicians and the computer scientists. It’s analogous to the Netflix Prize for grown-up computer scientists.
Areas concerned with shapes, invariants, and dynamics in high dimensions are proving useful in data analysis
I’ve been noticing unlikely areas of mathematics pop up in data analysis. While signal processing is a natural fit, topology and differential and algebraic geometry aren’t exactly areas you associate with data science. But upon further reflection, perhaps it shouldn’t be so surprising that areas that deal in shapes, invariants, and dynamics in high dimensions would have something to contribute to the analysis of large data sets. Without further ado, here are a few examples that stood out for me. (If you know of other examples of recent applications of math in data analysis, please share them in the comments.)
Compressed sensing is a signal processing technique that makes efficient data collection possible. For example, compressed sensing lets images be reconstructed from small amounts of data. Idealized sampling collects just enough information to measure the most important components. Vastly decreasing the number of measurements to be collected means less data needs to be stored, and it reduces the amount of time and energy needed to collect signals. Already there have been applications in medical imaging and mobile phones.
The problem is that you don’t know ahead of time which signals/components are important. A series of numerical experiments led Emmanuel Candès to believe that random samples may be the answer. The theoretical foundations for why a random set of signals would work were laid down in a series of papers by Candès and Fields Medalist Terence Tao.
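To make the idea concrete, here is a minimal sketch of recovering a sparse signal from random measurements. It is an illustration under stated assumptions, not the method from the Candès and Tao papers: it uses orthogonal matching pursuit (a simple greedy recovery) in place of the convex optimization usually associated with compressed sensing, and the sizes (a 128-entry signal with 3 nonzero entries, 64 random measurements), the seed, and all function names are made up for the example.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(G, b):
    """Solve the small square system G z = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(G, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z

def omp(columns, y, sparsity):
    """Orthogonal matching pursuit; columns[j] is column j of the measurement matrix."""
    support, coeffs, residual = [], [], y[:]
    for _ in range(sparsity):
        # Greedy step: pick the unused column most correlated with the residual.
        candidates = [j for j in range(len(columns)) if j not in support]
        best = max(candidates, key=lambda j: abs(dot(columns[j], residual)))
        support.append(best)
        # Least-squares fit of y on the chosen columns (normal equations).
        G = [[dot(columns[a], columns[b]) for b in support] for a in support]
        rhs = [dot(columns[a], y) for a in support]
        coeffs = solve(G, rhs)
        residual = [yi - sum(c * columns[j][i] for c, j in zip(coeffs, support))
                    for i, yi in enumerate(y)]
    x_hat = [0.0] * len(columns)
    for c, j in zip(coeffs, support):
        x_hat[j] = c
    return x_hat

random.seed(0)           # arbitrary seed, only for repeatability
n, m = 128, 64           # signal length, number of random measurements (assumed sizes)
x = [0.0] * n
x[5], x[40], x[99] = 3.0, -2.0, 1.5   # the unknown sparse signal

# Random measurement matrix, stored column by column, columns normalized.
columns = []
for _ in range(n):
    col = [random.gauss(0, 1) for _ in range(m)]
    norm = math.sqrt(dot(col, col))
    columns.append([c / norm for c in col])

# Each measurement is a random linear combination of the signal's entries.
y = [sum(columns[j][i] * x[j] for j in range(n)) for i in range(m)]

x_hat = omp(columns, y, sparsity=3)
max_error = max(abs(a - b) for a, b in zip(x, x_hat))
print("largest reconstruction error:", max_error)
```

The point of the sketch is the shape of the problem: half as many measurements as unknowns, no advance knowledge of which entries matter, and the sparse signal still comes back. The sparsity level is passed in as a known quantity here, which is a simplification.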
Repurposing Dead Retail Space, Open Standards, Space Copyright, and Bridging Lessons
Exploiting Glass, Teaching Probability, Product Design, and Subgraph Matching
- Exploiting a Bug in Google Glass — unbelievably detailed and yet easy-to-follow explanation of how the bug works, how the author found it, and how you can exploit it too. The second guide was slightly more technical, so when he returned a little later I asked him about the Debug Mode option. The reaction was interesting: he kind of looked at me, somewhat confused, and asked “wait, what version of the software does it report in Settings”? When I told him “XE4” he clarified “XE4, not XE3”, which I verified. He had thought this feature had been removed from the production units.
- Probability Through Problems — motivating problems to hook students on probability questions, structured to cover high-school probability material.
- Connbox — love the section “The importance of legible products” where the physical UI interacts seamlessly with the digital device … it’s glorious. Three amazing videos.
- The Index-Based Subgraph Matching Algorithm (ISMA): Fast Subgraph Enumeration in Large Networks Using Optimized Search Trees (PLoS ONE) — The central question in all these fields is to understand behavior at the level of the whole system from the topology of interactions between its individual constituents. In this respect, the existence of network motifs, small subgraph patterns which occur more often in a network than expected by chance, has turned out to be one of the defining properties of real-world complex networks, in particular biological networks. […] An implementation of ISMA in Java is freely available.
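For readers new to network motifs, a toy sketch of the "more often than expected by chance" idea from the last item may help. This is not ISMA: it is a brute-force triangle count on a made-up six-node graph, compared against the expected count in a uniformly random graph with the same number of nodes and edges. That null model is a simplifying assumption; motif studies typically use more careful randomizations.

```python
from itertools import combinations
from math import comb

def count_triangles(edges):
    """Brute force: check every triple of nodes for all three edges."""
    edge_set = {frozenset(e) for e in edges}
    nodes = sorted({node for e in edges for node in e})
    return sum(
        1
        for a, b, c in combinations(nodes, 3)
        if {frozenset((a, b)), frozenset((b, c)), frozenset((a, c))} <= edge_set
    )

def expected_triangles(num_nodes, num_edges):
    """Expected triangle count when num_edges edges are placed uniformly at random."""
    possible_edges = comb(num_nodes, 2)
    # Probability that three specific edges are all among the chosen ones.
    p_triple = comb(possible_edges - 3, num_edges - 3) / comb(possible_edges, num_edges)
    return comb(num_nodes, 3) * p_triple

# Hypothetical toy network: two triangles sharing node "c", plus one pendant edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("c", "d"), ("d", "e"), ("c", "e"),
         ("e", "f")]

observed = count_triangles(edges)
expected = expected_triangles(num_nodes=6, num_edges=len(edges))
print(f"observed triangles: {observed}, expected by chance: {expected:.2f}")
```

Checking every triple is fine for six nodes and hopeless for large networks, which is the gap that indexed search algorithms like the one in the paper aim to close.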
Analytics vs Learning, Reproducible Science, Ramping up Military Internet Attacks, and Compressed Sensing
- Analytics for Learning — Since doing good learning analytics is hard, we often do easy learning analytics and pretend that they are good instead. But pretending doesn’t make it so. (via Dan Meyer)
- Reproducible Research — a list of links to related work about reproducible research, reproducible research papers, etc. (via Stijn Debrouwere)
- Pentagon Deploying 100+ Cyber Teams — The organization defending military networks — cyber protection forces — will comprise more than 60 teams, a Pentagon official said. The other two organizations — combat mission forces and national mission forces — will conduct offensive operations. I’ll repeat that: offensive operations.
- Towards Deterministic Compressed Sensing (PDF) — instead of taking lots of data, compressing by throwing some away, can we only take a few samples and reconstruct the original from that? (more mathematically sound than my handwaving explanation). See also Compressed sensing and big data from the Practical Quant. (via Ben Lorica)
Inside the Aaron Swartz Investigation, Multivariate Dataset Exploration, Augmediated Life, and Public Experience
- Life Inside the Aaron Swartz Investigation — do hard things and risk failure. What else are we on this earth for?
- Steve Mann: My Augmediated Life (IEEE) — Until recently, most people tended to regard me and my work with mild curiosity and bemusement. Nobody really thought much about what this technology might mean for society at large. But increasingly, smartphone owners are using various sorts of augmented-reality apps. And just about all mobile-phone users have helped to make video and audio recording capabilities pervasive. Our laws and culture haven’t even caught up with that. Imagine if hundreds of thousands, maybe millions, of people had video cameras constantly poised on their heads. If that happens, my experiences should take on new relevance.
- The Google Glass Feature No-One Is Talking About — The most important Google Glass experience is not the user experience – it’s the experience of everyone else. The experience of being a citizen, in public, is about to change.
Design compels. Math is proof. Both sides will defend their domains at Strata's next Great Debate.
At Strata Santa Clara later this month, we’re reprising what has become a tradition: Great Debates. These Oxford-style debates pit two teams against one another to argue a hot topic in the fields of big data, ubiquitous computing, and emerging interfaces.
Part of the fun is the scoring: attendees vote on whether they agree with the proposition before the debaters speak, and after both sides have said their piece, the audience votes again. Whoever moves the needle wins.
This year’s proposition — that design matters more than math — is sure to inspire some vigorous discussion. The argument for math is pretty strong. Math is proof. Given enough data — and today, we have plenty — we can know. “The right information in the right place just changes your life,” said Stewart Brand. Properly harnessed, the power of data analysis and modeling can fix cities, predict epidemics, and revitalize education. Abused, it can invade our lives, undermine economies, and steal elections. Surely the algorithms of big data matter!
But your life won’t change by itself. Bruce Mau defines design as “the human capacity to plan and produce desired outcomes.” Math informs; design compels. Without design, math can’t do its thing. Poorly designed experiments collect the wrong data. And if the data can’t be understood and acted upon, it may as well not have been crunched in the first place.
This is the question we’ll be putting to our debaters: Which matters more? A well-designed collection of flawed information — or an opaque, hard-to-parse, but unerringly accurate model? From mobile handsets to social policy, we need both good math and good design. Which is more critical?
SCADA 0-Day, Complexity Course, ToS Tracking, and Custom Manufacturing Prostheses
- Tridium Niagara (Wired) — A critical vulnerability discovered in an industrial control system used widely by the military, hospitals and others would allow attackers to remotely control electronic door locks, lighting systems, elevators, electricity and boiler systems, video surveillance cameras, alarms and other critical building facilities, say two security researchers. cf the SANS SCADA conference.
- Santa Fe Institute Course: Introduction to Complexity — 11 week course on understanding complex systems: dynamics, chaos, fractals, information theory, self-organization, agent-based modeling, and networks. (via BoingBoing)
- Terms of Service Changes — a site that tracks changes to terms of service. (via Andy Baio)
- 3D Printing a Replacement Hand for a 5 Year Old Boy (Ars Technica) — the designs are on Thingiverse. For more, see their blog.
Practical advice for those considering a career in data science
When I was a youngster in college I found myself dissatisfied after I took a stats class from the math department. So I decided to take another stats class. Classmates thought I was crazy. Let’s be real, what precocious over-achieving teenager majoring in English lit seeks to retake a math class? And not because of a grade but because they were dissatisfied with what they didn’t get out of it? After a bit of research, I decided to take the stats class offered by the psych department.
It made a significant difference.
Thinking about math from the perspectives of research design methodology and how data can be used to manipulate people made quite an impact on my teenage worldview. This experience also reinforced my belief that education is what you decide it will be. There is always more than one way to learn and education doesn’t necessarily have to happen in a physical classroom. Growing up in the San Francisco Bay Area where friends and loved ones decided to forgo traditional higher ed completely to start their own companies or immediately work in jobs in technology also contributed to this belief.
While full-time students who are looking at a career in data science may have the time to do seemingly nutty things like take overlapping math classes, this is not something that most people with full-time jobs are able to do. When people with full-time jobs ask me what they need to do to move into data science, I probe them about the kind of job in data science they want and about their analytical and empathy skills. Then, I immediately follow up with “So, how are your math skills?” Interestingly enough, I get a lot of people saying that they don’t have time to physically go into a classroom or that it has been, like, forever since they’ve used statistics and/or linear algebra for data analysis. Even more interesting is how often people don’t realize just how many resources are available to learn math outside of the physical-attendance-in-a-classroom model.
Huh.
SSH/L Multiplexer, GitHub Bots, Test Your Assumptions, and Tech Trends
- sslh — ssh/ssl multiplexer.
- Github Says No to Bots (Wired) — what’s interesting is that bots augmenting photos is awesome in Flickr: take a photo of the sky and you’ll find your photo annotated with stars and whatnot. What can GitHub learn from Flickr?
- Four Assumptions of Multiple Regression That Researchers Should Always Test — “but I found the answer I wanted! What do you mean, it might be wrong?!”
- Tenth Grade Tech Trends (Medium) — if you want to know what will have mass success, talk to early adopters in the mass market. We alpha geeks aren’t that any more.
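As a rough illustration of what an ssh/ssl multiplexer like the one in the first item has to decide, here is a sketch of classifying a new connection by its first bytes so that both protocols can share one port. It is not sslh's code. It relies only on two protocol facts: an SSH identification string begins with `SSH-`, and a TLS handshake record begins with the bytes 0x16 0x03. The function name and the backend addresses are hypothetical.

```python
def classify_first_bytes(data: bytes) -> str:
    """Guess the protocol of a new connection from the first bytes it sends."""
    if data.startswith(b"SSH-"):
        return "ssh"   # SSH identification string, e.g. b"SSH-2.0-..."
    if len(data) >= 2 and data[0] == 0x16 and data[1] == 0x03:
        return "tls"   # TLS record: handshake content type, major version 3
    return "unknown"

# Hypothetical backends a multiplexer might forward to.
BACKENDS = {
    "ssh": ("127.0.0.1", 22),
    "tls": ("127.0.0.1", 443),
}

def choose_backend(data: bytes, default: str = "tls"):
    """Pick where to forward the connection; fall back to a default backend."""
    return BACKENDS.get(classify_first_bytes(data), BACKENDS[default])

print(choose_backend(b"SSH-2.0-OpenSSH_6.2\r\n"))
print(choose_backend(bytes([0x16, 0x03, 0x01, 0x00, 0x2E])))
```

A real multiplexer also has to cope with clients that say nothing until the server speaks, which this sketch ignores.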
Win95 Tips, Obama's Big Data, Aggregate Statistics, and Foxconn Robots
- Windows 95 Tips — hilarious tumblr showing the dark side of life through Windows 95 UI tips. (via Juha Saarinen)
- Everything We Know About Obama’s Big Data Operation (ProPublica) — “White suburban women? They’re not all the same. The Latino community is very diverse with very different interests,” Dan Wagner, the campaign’s chief analytics officer, told The Los Angeles Times. “What the data permits you to do is figure out that diversity.”
- cube (GitHub) — time-series data collection and analysis. Cube lets you compute aggregate statistics post hoc. It also enables richer analysis, such as quantiles and histograms of arbitrary event sets. Cube is built on MongoDB and available under the Apache License on GitHub.
- 1M Robots to Replace 1M Human Jobs at Foxconn (Singularity Hub) — Foxconn plant opening, making manufacturing robots, and they appear to be dogfooding by using them in other plants. $25k each, 10k+ made, and fits into the pattern: the number of operational robots in China increased by 42 percent from 2010 to 2011.
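The "aggregate statistics post hoc" idea in the cube item can be sketched in a few lines: keep the raw timestamped events, and decide later which subsets and summaries you want. This is not Cube's API or data model. The event fields, the `aggregate` helper, and the sample data are all invented for illustration.

```python
from collections import defaultdict
from statistics import median

# Hypothetical raw events, stored as-is with a timestamp in seconds.
events = [
    {"time": 5,   "type": "request", "duration_ms": 120},
    {"time": 20,  "type": "request", "duration_ms": 80},
    {"time": 50,  "type": "request", "duration_ms": 200},
    {"time": 65,  "type": "request", "duration_ms": 40},
    {"time": 110, "type": "request", "duration_ms": 60},
    {"time": 115, "type": "error",   "duration_ms": 900},
]

def aggregate(events, step, reducer, predicate=lambda e: True, field="duration_ms"):
    """Group matching events into time buckets of `step` seconds and reduce each bucket."""
    buckets = defaultdict(list)
    for event in events:
        if predicate(event):
            bucket_start = event["time"] // step * step
            buckets[bucket_start].append(event[field])
    return {start: reducer(values) for start, values in sorted(buckets.items())}

# Questions chosen after the data was collected:
median_by_minute = aggregate(events, 60, median, lambda e: e["type"] == "request")
count_by_minute = aggregate(events, 60, len)

print(median_by_minute)
print(count_by_minute)
```

Because the raw events are kept, swapping `median` for another reducer or changing the predicate needs no change to how the data was collected.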