Distributed Browser-Based Computation, Streaming Regex, Preventing SQL Injections, and SVM for Faster Deep Learning
- WeevilScout — browser app that turns your browser into a worker for distributed computation tasks. See the poster (PDF). (via Ben Lorica)
- sregex (GitHub) — A non-backtracking regex engine library for large data streams. See also slide notes from a YAPC::NA talk. (via Ivan Ristic)
- Bobby Tables — a guide to preventing SQL injections. (via Andy Lester)
- Deep Learning Using Support Vector Machines (arXiv) — we are proposing to train all layers of the deep networks by backpropagating gradients through the top level SVM, learning features of all layers. Our experiments show that simply replacing softmax with linear SVMs gives significant gains on datasets MNIST, CIFAR-10, and the ICML 2013 Representation Learning Workshop’s face expression recognition challenge. (via Olivier Grisel)
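For the Bobby Tables item above, here is a minimal sketch of the usual defence: a parameterized query. It uses Python's standard `sqlite3` module; the `students` table and the `find_student` helper are made up for illustration and are not taken from the guide.

```python
import sqlite3


def find_student(conn, name):
    """Look up students by name using a bound parameter, not string formatting."""
    # The ? placeholder makes sqlite3 pass `name` as data, never as SQL text.
    cur = conn.execute("SELECT id, name FROM students WHERE name = ?", (name,))
    return cur.fetchall()


# Hypothetical in-memory table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO students (name) VALUES (?)", ("Alice",))

hostile = "Robert'); DROP TABLE students;--"
rows_hostile = find_student(conn, hostile)  # treated as a literal name
rows_alice = find_student(conn, "Alice")
remaining = conn.execute("SELECT COUNT(*) FROM students").fetchone()[0]
```

Because the hostile string is bound as a value, it simply fails to match any row and the table is left alone. Building the same query with string concatenation is the pattern the guide warns against.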
Notable Release, SVG Library, Modular Robot, and Factchecking Politicians Will Not Work
- Quick Reads of Notable New Zealanders — notable for two reasons: (a) CC-NC-BY licensed, and (b) gorgeous gorgeous web design. Not what one normally associates with Government web sites!
- Linkbot: Create with Robots (Kickstarter) — accessible and expandable modular robot. Loaded w/ absolute encoding, accelerometer, rechargeable lithium ion battery and ZigBee. (via IEEE Spectrum)
- The Promise and Peril of Real-Time Corrections to Political Misperceptions (PDF) — paper presenting results of an experiment comparing the effects of real-time corrections to corrections that are presented after a short distractor task. Although real-time corrections are modestly more effective than delayed corrections overall, closer inspection reveals that this is only true among individuals predisposed to reject the false claim. In contrast, individuals whose attitudes are supported by the inaccurate information distrust the source more when corrections are presented in real time, yielding beliefs comparable to those never exposed to a correction. We find no evidence of realtime corrections encouraging counterargument. Strategies for reducing these biases are discussed. So much for the Google Glass bullshit detector transforming politics. (via Vaughan Bell)
Solar Numbers, Process Managers, BitTorrent Sync, and Motherfrickin' Snakes in Your Motherfrickin' Browser
- Solar Energy: This is What a Disruptive Technology Looks Like (Brian McConnell) — In 1977, solar cells cost upwards of $70 per Watt of capacity. In 2013, that cost has dropped to $0.74 per Watt, a 100:1 improvement (source: The Economist). On average, solar power improves 14% per year in terms of energy production per dollar invested.
- Process Managers — overview of the tools that keep your software running.
- BitTorrent Sync — Dropbox-like features, BitTorrent under the hood.
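The solar figures quoted above can be sanity-checked with a few lines of arithmetic. This is a rough sketch that assumes the two prices are exact endpoints and that improvement compounds evenly each year; the quote itself says "upwards of $70", so the result is only a lower bound on the real ratio.

```python
def annual_rate(start_cost, end_cost, years):
    """Compound annual rate of improvement in watts per dollar."""
    improvement = start_cost / end_cost  # how many times more watts per dollar
    return improvement ** (1 / years) - 1


ratio = 70 / 0.74                           # price per watt, 1977 vs 2013
rate = annual_rate(70, 0.74, 2013 - 1977)   # 36 years of compounding
print(f"improvement: {ratio:.1f}x, implied annual rate: {rate:.1%}")
```

Under those assumptions the ratio comes out a little under 95:1 and the implied rate at roughly 13.5% a year, which is in line with the rounded "100:1" and "14%" in the excerpt.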
Binary Data Is Back, Scala Data, Visualization Grammar, and Pastebin Monitor
- Cap'n Proto — an open source, faster alternative to Protocol Buffers (binary data interchange format and RPC system).
- Saddle — a high performance data manipulation library for Scala.
- Vega — a visualization grammar, a declarative format for creating, saving and sharing visualization designs. (via Flowing Data)
- dumpmon — Twitter bot that monitors paste sites for password dumps and other sensitive information. Source on GitHub, see the announcement for more.
Master Coding, Rethinking Textbooks, Blocking Open Access, VPN from your Pi
- Analyzing mbostock’s queue.js — beautiful walkthrough of a small library, showing the how and why of good coding.
- What Job Would You Hire a Textbook To Do? (Karl Fisch) — notes from a Discovery Education “Beyond the Textbook” event. The issues Karl highlights for textbooks (why digital, etc.) are there for all books as we create this new genre.
- Neutralizing Open Access (Glyn Moody) — the publishers appear to have captured the UK group implementing the UK’s open access policy. At every single step of the way, the RCUK policy has been weakened. From being the best and most progressive in the world, it’s now considerably weaker than policies already in action elsewhere in the world, and hardly represents an increment on their 2006 policy. What’s at stake? Opportunity to do science faster, to provide source access to research for the public, and to redirect back to research the millions of pounds spent on journal subscriptions.
- Turn the Raspberry Pi into a VPN Server (LinuxUser) — One possible scenario for wanting a cheap server that you can leave somewhere is if you have recently moved away from home and would like to be able to easily access all of the devices on the network at home, in a secure manner. This will enable you to send files directly to computers, diagnose problems and other useful things. You’ll also be leaving a powered USB hub connected to the Pi, so that you can tell someone to plug in their flash drive, hard drive etc and put files on it for them. This way, they can simply come and collect it later whenever the transfer has finished.
- Digital Music Consumption on the Internet: Evidence from Clickstream Data (Scribd) — The goal of this paper is to analyze the behavior of digital music consumers on the Internet. Using clickstream data on a panel of more than 16,000 European consumers, we estimate the effects of illegal downloading and legal streaming on the legal purchases of digital music. Our results suggest that Internet users do not view illegal downloading as a substitute to legal digital music. Although positive and significant, our estimated elasticities are essentially zero: a 10% increase in clicks on illegal downloading websites leads to a 0.2% increase in clicks on legal purchases websites. Online music streaming services are found to have a somewhat larger (but still small) effect on the purchases of digital sound recordings, suggesting complementarities between these two modes of music consumption. According to our results, a 10% increase in clicks on legal streaming websites lead to up to a 0.7% increase in clicks on legal digital purchases websites. We find important cross country difference in these effects. A paper from the EU commission’s in-house science service. (via Don Christie)
- Six Degrees of Francis Bacon — data-driven research into “the early-modern social network”. (via Jonathan Gray)
- Internet Census 2012 — scanning the net via botnet. Appalling how many unsecured devices are directly connected to the net. Also appalling how underused the address space is.
- A Quantitative Literary History of 2,958 Nineteenth-Century British Novels: The Semantic Cohort Method (PDF) — This project was simultaneously an experiment in developing quantitative and computational methods for tracing changes in literary language. We wanted to see how far quantifiable features such as word usage could be pushed toward the investigation of literary history. Could we leverage quantitative methods in ways that respect the nuance and complexity we value in the humanities? To this end, we present a second set of results, the techniques and methodological lessons gained in the course of designing and running this project. Even litcrit becoming a data game.
- Easy6502 — get started writing 6502 assembly language. Fun way to get started with low-level coding.
- How Analytics Really Work at a Small Startup (Pete Warden) — The key for us is that we’re using the information we get primarily for decision-making (should we build out feature X?) rather than optimization (how can we improve feature X?). Nice rundown of tools and systems he uses, with plug for KissMetrics.
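The "essentially zero" elasticities in the music clickstream excerpt above are easy to make concrete. A minimal sketch, assuming the simple ratio definition (percentage change in the outcome divided by percentage change in the driver) and using only the percentages quoted in the excerpt; the paper's own estimation method is not reproduced here.

```python
def elasticity(pct_change_outcome, pct_change_driver):
    """Elasticity as the ratio of two percentage changes."""
    return pct_change_outcome / pct_change_driver


# Figures quoted in the excerpt: a 10% rise in clicks on each kind of site.
downloading = elasticity(0.2, 10)  # effect of illegal downloading on legal purchases
streaming = elasticity(0.7, 10)    # effect of legal streaming (upper figure quoted)
print(f"downloading: {downloading:.2f}, streaming: {streaming:.2f}")
```

Both values are positive but tiny, which is the excerpt's point: neither activity appears to displace legal purchases.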
Inside the Aaron Swartz Investigation, Multivariate Dataset Exploration, Augmediated Life, and Public Experience
- Life Inside the Aaron Swartz Investigation — do hard things and risk failure. What else are we on this earth for?
- Steve Mann: My Augmediated Life (IEEE) — Until recently, most people tended to regard me and my work with mild curiosity and bemusement. Nobody really thought much about what this technology might mean for society at large. But increasingly, smartphone owners are using various sorts of augmented-reality apps. And just about all mobile-phone users have helped to make video and audio recording capabilities pervasive. Our laws and culture haven’t even caught up with that. Imagine if hundreds of thousands, maybe millions, of people had video cameras constantly poised on their heads. If that happens, my experiences should take on new relevance.
- The Google Glass Feature No-One Is Talking About — The most important Google Glass experience is not the user experience – it’s the experience of everyone else. The experience of being a citizen, in public, is about to change.
- The Network of Global Control (PLoS One) — We find that transnational corporations form a giant bow-tie structure and that a large portion of control flows to a small tightly-knit core of financial institutions. [...] From an empirical point of view, a bow-tie structure with a very small and influential core is a new observation in the study of complex networks. We conjecture that it may be present in other types of networks where “rich-get-richer” mechanisms are at work. (via The New Aesthetic)
- Using SimCity to Diagnose My Home Town’s Traffic Problems — no actual diagnosis performed, but the modeling and observations gave insight. I always feel that static visualizations (infographics) are far less useful than an interactive simulation that can give you an intuitive sense of relationships and behaviour. Once I’d built East Didsbury, the strip of shops in Northenden stopped making as much money as they once were, and some were even beginning to close down as my time ran out. Walk along Northenden high street, and you’ll know that feeling.
- How the Harlem Shake Went from Viral Sideshow to Global Meme (The Verge) — interesting because again the musician is savvy enough (and has tools and connections) to monetize popularity without trying to own every transaction involving his idea. Baauer and Mad Decent have generally been happy to let a hundred flowers bloom, permitting over 4,000 videos to use an excerpt of the song but quietly adding each of them to YouTube’s Content ID database, asserting copyright over the fan videos and claiming a healthy chunk of the ad revenue for each of them.
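The "bow-tie structure" in the Network of Global Control item above has a standard graph-theoretic reading: a strongly connected core, an IN set that can reach the core, and an OUT set reachable from it. Here is a small sketch of that decomposition on a toy directed graph. The edge list and function names are hypothetical, and the paper's actual analysis (which weights control by ownership) is not reproduced.

```python
from collections import deque


def reachable(adj, sources):
    """Return every node reachable from `sources` by following edges in `adj`."""
    seen = set(sources)
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def bow_tie(edges):
    """Split a directed graph into core (largest SCC), IN, OUT and the rest."""
    fwd, rev, nodes = {}, {}, set()
    for src, dst in edges:
        fwd.setdefault(src, []).append(dst)
        rev.setdefault(dst, []).append(src)
        nodes.update((src, dst))

    # A node's strongly connected component is the set of nodes
    # it can reach and that can also reach it.
    core, assigned = set(), set()
    for node in sorted(nodes):
        if node in assigned:
            continue
        component = reachable(fwd, [node]) & reachable(rev, [node])
        assigned |= component
        if len(component) > len(core):
            core = component

    in_part = reachable(rev, core) - core   # can reach the core
    out_part = reachable(fwd, core) - core  # reachable from the core
    other = nodes - core - in_part - out_part
    return {"core": core, "in": in_part, "out": out_part, "other": other}


# Toy graph: a 3-node cycle with one feeder, one sink and a detached pair.
toy_edges = [("a", "b"), ("b", "c"), ("c", "a"),
             ("x", "a"), ("c", "y"), ("p", "q")]
parts = bow_tie(toy_edges)
```

The component search here is quadratic in the worst case, which is fine for a toy graph; a network of real size would call for a linear-time algorithm such as Tarjan's.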
Google's Autonomous Cars, DIY BioPrinter, Forms Validation, and Machine Learning Workflow
- Google’s Driverless Car is Worth Trillions (Forbes) — Much of the reporting about Google’s driverless car has mistakenly focused on its science-fiction feel. [...] In fact, the driverless car has broad implications for society, for the economy and for individual businesses. Just in the U.S., the car puts up for grab some $2 trillion a year in revenue and even more market cap. It creates business opportunities that dwarf Google’s current search-based business and unleashes existential challenges to market leaders across numerous industries, including car makers, auto insurers, energy companies and others that share in car-related revenue.
- DIY BioPrinter (Instructables) — Think of it as 3D printing, but with squishier ingredients! How to piggyback on inkjet printer technology to print with your own biomaterials. It’s an exciting time for biohackery: FOO Ewan Birney is kicking ass and taking names, and he was just involved in a project storing and retrieving data from DNA.
- ADAMS — open sourced workflow tool for machine learning, from the excellent people at Waikato who brought you WEKA. ADAMS = Advanced Data mining And Machine learning System.