- Repo Surveillance Network — An automated reader attached to the spotter car takes a picture of every license plate it passes and sends it to a company in Texas that already has more than 1.8 billion plate scans from vehicles across the country.
- Mobile Companies Work Big Data — Meanwhile companies are taking different approaches to user consent. Orange collects data for its Flux Vision data product from French mobile users without offering a way for them to opt-out, as does Telefonica’s equivalent service. Verizon told customers in 2011 it could use their data and now includes 100 million retail mobile customers by default, though they can opt out online.
- Serfdom — a decentralised solution for service discovery and orchestration that is lightweight, highly available, and fault tolerant.
- Longomatch — a free video analysis software for sport analysts with unlimited possibilities: Record, Tag, Review, Draw, Edit Videos and much more! (via Mark Osborne)
Learning Machine Learning, Pokemon Coding, Drone Coverage, and Optimization Guide
- CalTech Machine Learning Video Library — a pile of video introductions to different machine learning concepts.
- Awesome Pokemon Hack — each inventory item has a number associated with it, they are kept at a particular memory location, and there’s a glitch in the game that executes code at that location so … you can program by assembling items and then triggering the glitch. SO COOL.
- Drone Footage of Bangkok Protests — including water cannons.
- The Mature Optimization Handbook — free, well thought out, and well written. My favourite line: In exchange for that saved space, you have created a hidden dependency on clairvoyance.
Will WebRTC disrupt or be disrupted?
WebRTC promises to deliver computer to computer communications with minimal reliance on central servers to manage the conversation. Peer-to-peer systems promise smoother exchanges without the tremendous scale challenges of running video, for example, through central points.
The WebRTC Conference and Expo was unlike any other web conference I’ve attended. Though technologies in development are common at tech conferences, I can’t remember attending a show that was focused on a technology whose future had these levels of promise and uncertainty. Also, despite its name, WebRTC doesn’t much resemble the Web, even though it is built into some browsers (more hopefully coming soon) and supports HTTP(S) proxying.
Flying Robot, State of Cyberspace, H.264, and Principal Component Analysis
- Insect-Inspired Collision-Resistant Robot — clever hack to make it stable despite bouncing off things.
- The Battle for Power on the Internet (Bruce Schneier) — the state of cyberspace. [M]ost of the time, a new technology benefits the nimble first. […] In other words, there will be an increasing time period during which nimble distributed powers can make use of new technologies before slow institutional powers can make better use of those technologies.
- Cisco’s H.264 Good News (Brendan Eich) — Cisco is paying the license fees for a particular implementation of H.264 to be used in open source software, enabling it to be the basis of web streaming video across all browsers (even the open source ones). It’s not as ideal a solution as it might sound.
- Principal Component Analysis for Dummies — This post will give a very broad overview of PCA, describing eigenvectors and eigenvalues (which you need to know about to understand it) and showing how you can reduce the dimensions of data using PCA. As I said it’s a neat tool to use in information theory, and even though the maths is a bit complicated, you only need to get a broad idea of what’s going on to be able to use it effectively.
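As a rough illustration of the idea in the PCA link above, here is a minimal sketch that finds the first principal component of a small synthetic data set and projects the points onto it. It is not taken from the linked post: the helper names (`mean_center`, `covariance_matrix`, `top_eigenvector`, `project`) are hypothetical, the data is randomly generated, and power iteration is swapped in as a simple stand-in for a full eigendecomposition.

```python
import math
import random


def mean_center(rows):
    """Subtract each column's mean so the data is centred on the origin."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    return [[r[j] - means[j] for j in range(dims)] for r in rows]


def covariance_matrix(centered):
    """Sample covariance matrix of already-centred rows."""
    n, dims = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dims)]
        for i in range(dims)
    ]


def top_eigenvector(matrix, iterations=200):
    """Power iteration: repeatedly multiply and normalise to find the
    eigenvector with the largest eigenvalue (the first principal component)."""
    dims = len(matrix)
    vec = [1.0] * dims
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    # Rayleigh quotient gives the matching eigenvalue for a unit vector.
    eigenvalue = sum(
        vec[i] * sum(matrix[i][j] * vec[j] for j in range(dims)) for i in range(dims)
    )
    return eigenvalue, vec


def project(centered, direction):
    """Reduce each row to one number: its coordinate along `direction`."""
    return [sum(a * b for a, b in zip(row, direction)) for row in centered]


# Synthetic 2-D data lying close to the line y = 2x (illustrative only).
random.seed(0)
xs = [random.gauss(0, 1) for _ in range(200)]
points = [[x, 2 * x + random.gauss(0, 0.1)] for x in xs]

centered = mean_center(points)
cov = covariance_matrix(centered)
eigenvalue, direction = top_eigenvector(cov)
reduced = project(centered, direction)  # 2-D points reduced to 1-D

# Share of total variance kept by the single retained component.
explained = eigenvalue / (cov[0][0] + cov[1][1])
print(direction, explained)
```

The eigenvector points along the direction in which the data varies most, and the eigenvalue says how much variance lies along it, which is why dropping the remaining component loses very little here.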
Disk Over Ethernet, Inside Elite, Polar Charts, and R Videos
- Seagate Kinetic Storage — In the words of Geoff Arnold: The physical interconnect to the disk drive is now Ethernet. The interface is a simple key-value object oriented access scheme, implemented using Google Protocol Buffers. It supports key-based CRUD (create, read, update and delete); it also implements third-party transfers (“transfer the objects with keys X, Y and Z to the drive with IP address 184.108.40.206”). Configuration is based on DHCP, and everything can be authenticated and encrypted. The system supports a variety of key schemas to make it easy for various storage services to shard the data across multiple drives.
- Masters of Their Universe (Guardian) — well-written and fascinating story of the creation of the Elite game (one founder of which went on to make the Raspberry Pi). The classic action game of the early 1980s – Defender, Pac Man – was set in a perpetual present tense, a sort of arcade Eden in which there were always enemies to zap or gobble, but nothing ever changed apart from the score. By letting the player tool up with better guns, Bell and Braben were introducing a whole new dimension, the dimension of time.
- Micropolar (github) — A tiny polar charts library made with D3.js.
- Introduction to R (YouTube) — 21 short videos from Google.
Video Editing, Game Engine, Python Debugger, and P2P VPN
Filmic Photogrammetry, Car APIs, Takedowns, and OpenCV for Processing
- Sifted — 7 minute animation set in a point cloud world, using photogrammetry in film-making. My brilliant cousin Ben wrote the software behind it. See this newspaper article and tv report for more.
- Vehicle Tech Out of Sync with Drivers’ Devices — Ford Motor Co. has its own system. Apple Inc. is working with one set of automakers to design an interface that works better with its iPhone line. Some of the same car companies and others have joined the Car Connectivity Consortium, which is working with the major Android phone brands to develop a different interface. FFS. “… you are changing your phone every other year, and the top-of-mind apps are continuously changing.” That’s why Chevrolet, Mini and some other automakers are starting to offer screens that mirror apps from a smartphone.
- Incentives in Notice and Takedown (PDF) — findings summarised in Blocking and Removing Illegal Child Sexual Content: Analysis from a Technical and Legal Perspective: financial institutions seemed to be relatively successful at removing phishing websites while it took on average 150 times longer to remove child pornography.
- OpenCV for Processing (Github) — OpenCV for Processing is based on the official OpenCV Java bindings. Therefore, in addition to a suite of friendly functions for all the basics, you can also do anything that OpenCV can do. And a book from O’Reilly, and it’ll be CC-licensed. All is win. (via Greg Borenstein)
Microvideos for Microhelp, Organic Search, Probabilistic Programming, and Cluster Management
- How to Make Help Microvideos For Your Site (Adrian Holovaty) — Instead of one monolithic video, we decided to make dozens of tiny, five-second videos separately demonstrating features.
- How Google is Killing Organic Search — 13% of the real estate is organic results in a search for “auto mechanic”, 7% for “italian restaurant”, 0% if searching on an iPhone where organic results are four page scrolls away. SEO Book did an extensive analysis of just how important the top left of the page, previously occupied by organic results actually is to visitors. That portion of the page is now all Google. (via Alex Dong)
- Church — probabilistic programming language from MIT, with tutorials. (via Edd Dumbill)
- mesos — a cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks. It can run Hadoop, MPI, Hypertable, Spark (a new framework for low-latency interactive and iterative jobs), and other applications. Mesos is open source in the Apache Incubator. (via Ben Lorica)
Velocity 2013 Speaker Series
Be honest, have you ever wanted to play Steve Souders for a day and pull some revealing stats or trends about some web sites of your choice? Or maybe dig around the HTTP archive? You can do that and more by setting up your own HTTP Archive.
httparchive.org is a fantastic tool to track, monitor, and review how the web is built. You can dig into trends around page size, page load time, content delivery network (CDN) usage, distribution of different mimetypes, and many other stats. With the integration of WebPagetest, it’s a great tool for synthetic testing as well.
You can download an HTTP Archive MySQL dump (warning: it’s quite large) and the source code from the download page and dissect a snapshot of the data yourself. Once you’ve set up the database, you can easily query anything you want.
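To give a feel for what such a query might look like, here is a minimal sketch that lists the heaviest pages in a snapshot. Several things are assumptions rather than facts about the real dump: the `pages` table and its `url`, `bytesTotal`, and `reqTotal` columns are only loosely modelled on the HTTP Archive schema, the rows are made-up placeholders, and an in-memory SQLite database stands in for MySQL so the example is self-contained.

```python
import sqlite3


def build_sample_db():
    """Create an in-memory stand-in for the imported database.
    Table and column names are assumptions, not the verified schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pages ("
        "pageid INTEGER PRIMARY KEY, url TEXT, bytesTotal INTEGER, reqTotal INTEGER)"
    )
    # Placeholder rows for illustration only, not real measurements.
    conn.executemany(
        "INSERT INTO pages (url, bytesTotal, reqTotal) VALUES (?, ?, ?)",
        [
            ("https://example.com/news", 1200000, 80),
            ("https://example.com/radio", 900000, 60),
            ("https://example.com/video", 2400000, 95),
        ],
    )
    return conn


def heaviest_pages(conn, limit):
    """Return (url, bytesTotal, reqTotal) for the largest pages by total bytes."""
    cursor = conn.execute(
        "SELECT url, bytesTotal, reqTotal FROM pages "
        "ORDER BY bytesTotal DESC LIMIT ?",
        (limit,),
    )
    return cursor.fetchall()


conn = build_sample_db()
for url, total_bytes, requests in heaviest_pages(conn, 2):
    print(f"{url}: {total_bytes / 1024:.0f} KB over {requests} requests")
```

Against a real import, the same `SELECT` would be run through a MySQL client instead, after checking the actual table and column names in the downloaded schema.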
You need MySQL, PHP, and your own webserver running. As I mentioned above, HTTP Archive relies on WebPagetest—if you choose to run your own private instance of WebPagetest, you won’t have to request an API key. I decided to ask Patrick Meenan for an API key with limited query access. That was sufficient for me at the time. If I ever wanted to use more than 200 page loads per day, I would probably want to set up a private instance of WebPagetest.
To find more details on how to set up an HTTP Archive instance yourself and any further advice, please check out my blog post.
Going back to the scenario I described above: the real motivation is that often you don’t want to throw your website(s) in a pile of other websites (e.g. not related to your business) to compare or define trends. Our digital property at the Canadian Broadcasting Corporation (CBC) spans dozens of URLs that have different purposes and audiences. For example, CBC Radio covers most of the Canadian radio landscape, CBC News offers the latest breaking news, CBC Hockey Night in Canada offers great insights on anything related to hockey, and CBC Video is the home for any video available on CBC. It’s valuable for us to not only compare cbc.ca to the top 100K Alexa sites but also to verify stats and data against our own pool of web sites.
In this case, we want to use a set of predefined URLs that we can collect HTTP Archive stats for. Hence a private instance can come in handy—we can run tests every day, or every week, or just every month to gather information about the performance of the sites we’ve selected. From there, it’s easy to not only compare trends from httparchive.org to our own instance as a performance baseline, but also have a great amount of data in our local database to run queries against and to do proper performance monitoring and investigation.
The beautiful thing about having your own instance is that you can be your own master of data visualization: you can now create more charts in addition to the ones that came out of the box with the default HTTP Archive setup. And if you don’t like Google chart tools, you may even want to check out D3.js or Highcharts instead.
The image below shows all mime types used by CBC web properties that are captured in our HTTP archive database, using D3.js bubble charts for visualization.
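The data behind a chart like this can be prepared with a small aggregation step. The sketch below is an assumption about how that might be done, not the code used at CBC: `mime_type_hierarchy` is a hypothetical helper, the input list is placeholder data, and the `name`/`children`/`value` field names follow a nesting commonly fed to D3's hierarchical layouts rather than anything confirmed by the original setup.

```python
import json
from collections import Counter


def mime_type_hierarchy(mime_types):
    """Count how often each mime type occurs and nest the counts in a
    name/children/value structure (field names are an assumption)."""
    counts = Counter(mime_types)
    children = [
        {"name": name, "value": count} for name, count in counts.most_common()
    ]
    return {"name": "mimeTypes", "children": children}


# Placeholder input; in practice this would come from the requests
# recorded in the local HTTP Archive database.
sample = ["image/png", "text/html", "image/png", "application/javascript",
          "image/png", "text/html"]

print(json.dumps(mime_type_hierarchy(sample), indent=2))
```

Each bubble's size would then be driven by the `value` field, so the most frequently served mime types show up as the largest circles.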