- Mining the World’s Data by Selling Street Lights and Farm Drones (Quartz) — Depending on what kinds of sensors the light’s owners choose to install, Sensity’s fixtures can track everything from how much power the lights themselves are consuming to movement under the post, ambient light, and temperature. More sophisticated sensors can measure pollution levels, radiation, and particulate matter (for air quality levels). The fixtures can also support sound or video recording. Bring these lights onto city streets and you could isolate the precise location of a gunshot within seconds.
- An Investor’s Guide to Hardware Startups — good to know if you’re thinking of joining one, too.
- WebScaleSQL — a MySQL downstream patchset built for “large scale” (aka Google, Facebook type loads).
ENTRIES TAGGED "scaling"
High-performing memory throws many traditional decisions overboard
Over the past decade, solid-state drives (popularly known as Flash) have radically changed computing at both the consumer level, where USB sticks have effectively replaced CDs for transporting files, and the server level, where they offer a price/performance ratio radically different from both RAM and disk drives. But databases have only started to catch up during the past few years. Most still depend on internal data structures and storage management fine-tuned for spinning disks.
Citing price and performance, one author advised a wide range of database vendors to move to Flash. Certainly, a database administrator can speed up old databases just by swapping out disk drives and inserting Flash, but doing so captures just a sliver of the potential performance improvement promised by Flash. For this article, I asked several database experts — including representatives of Aerospike, Cassandra, FoundationDB, RethinkDB, and Tokutek — how Flash changes the design of storage engines for databases. The various ways these companies have responded to its promise in their database designs are instructive to readers designing applications and looking for the best storage solutions.
- Salesforce Architecture — Our search tier runs on commodity Linux hosts, each of which is augmented with a 640 GiB PCI-E flash drive which serves as a caching layer for search requests. These hosts get their data from a shared SAN array via an NFS file system. Search indexes are stored on the flash drive to enable greater performance for search throughput. Architecture porn.
- Gerrit Code Review (Github) — tool for doing code reviews on Github codebases. (via Chris Aniszczyk)
- Users vs Apps (Tim Bray) — the wrong thing being shared with the wrong people, even once, can ruin a trust relationship forever. Personally, I’m pretty hard-line about this one. I’m currently refusing to update the Android app from my bank, CIBC, because it wants access to my contacts. You know what the right amount of “social” content is in my relationship with my bank? Zero, that’s what.
Flexible Layouts, Web Components, Distributed SQL Database, and Reverse-Engineering Dropbox Client
- intention.js — manipulates the DOM via HTML attributes. The methods for manipulation are placed with the elements themselves, so flexible layouts don’t seem so abstract and messy.
- F1: A Distributed SQL Database That Scales — a distributed relational database system built at Google to support the AdWords business. F1 is a hybrid database that combines high availability, the scalability of NoSQL systems like Bigtable, and the consistency and usability of traditional SQL databases. F1 is built on Spanner, which provides synchronous cross-datacenter replication and strong consistency. Synchronous replication implies higher commit latency, but we mitigate that latency by using a hierarchical schema model with structured data types and through smart application design. F1 also includes a fully functional distributed SQL query engine and automatic change tracking and publishing.
- Looking Inside The (Drop)Box (PDF) — This paper presents new and generic techniques, to reverse engineer frozen Python applications, which are not limited to just the Dropbox world. We describe a method to bypass Dropbox’s two factor authentication and hijack Dropbox accounts. Additionally, generic techniques to intercept SSL data using code injection techniques and monkey patching are presented. (via Tech Republic)
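To make the F1 item's "hierarchical schema model" a little more concrete, here is a minimal sketch of one way such a layout can work: child rows carry their parent's key as a prefix, so a whole subtree sits in one contiguous key range and can be read together. The blurb does not spell out F1's actual mechanism, so treat this as an illustration under that assumption. The `HierarchicalStore` class and the customer/campaign/ad-group IDs are hypothetical names, not F1's API.

```python
import bisect
from typing import Dict, List, Tuple

Key = Tuple[int, ...]


class HierarchicalStore:
    """Toy in-memory store; keys are tuples such as (customer,), (customer, campaign)."""

    def __init__(self) -> None:
        self._keys: List[Key] = []        # kept sorted so a subtree is contiguous
        self._rows: Dict[Key, dict] = {}  # key -> row payload

    def put(self, key: Key, row: dict) -> None:
        # Insert the key in sorted position the first time it is seen.
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = row

    def subtree(self, prefix: Key) -> List[Tuple[Key, dict]]:
        # Jump to the first key >= prefix, then scan while the prefix matches.
        start = bisect.bisect_left(self._keys, prefix)
        found = []
        for key in self._keys[start:]:
            if key[:len(prefix)] != prefix:
                break
            found.append((key, self._rows[key]))
        return found


store = HierarchicalStore()
store.put((2,), {"name": "customer 2"})
store.put((1,), {"name": "customer 1"})
store.put((1, 3), {"name": "campaign 3"})
store.put((1, 4), {"name": "campaign 4"})
store.put((2, 5), {"name": "campaign 5"})
store.put((1, 3, 7), {"name": "ad group 7"})

customer_1_keys = [key for key, _ in store.subtree((1,))]
```

The point of the design is locality: everything belonging to customer 1 comes back from a single range scan rather than from separate lookups per table.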
Carsharing boosts city governments, why complex systems fail, and what web ops teams could do with big data.
This week on O'Reilly: How Zipcar's technology is saving big money for U.S. city governments, why scalable clouds need simple parts, and pondering the possibilities of web ops and machine learning.
Maximum MySQL, Digital News, Unbiased Mining, and Congressional Clue
- How Twitter Stores 250M Tweets a Day Using MySQL (High Scalability) — notes from a talk at the MySQL conference on how Twitter built a high-volume MySQL store.
- How The Atlantic Got Profitable With Digital First (Mashable) — Lauf says his team has focused on putting together premium advertising experiences that span print, digital, events and (increasingly) mobile.
- Data Mining Without Prejudice — an attempt to measure fit without pre-favouring one type of curve over another.
- It Is No Longer OK Not To Know How Congress Works (Clay Johnson) — looking for a specific innovation to try and change the way Washington works by the time Congress votes on SOPA is about as foolish as Steve Jobs trying to diet his way out of having pancreatic cancer.
The ability to augment people (meat) with data and processes (math) is a key to success.
Successful companies find ways to augment their employees, allowing them to operate at scale with customers. Big data, machine learning, and an iterative, experimental mindset are essential — and increasingly, company valuations are tied to the efficiency with which firms put information to work.
Rare Visualization, Google+ Tech, Scala+Erlang, and In-Database Analytics
- Slopegraphs — a nifty Tufte visualization which conveys rank, value, and delta over time. Includes pointers to how to make them, and guidelines for when and how they work. (via Avi Bryant)
- scalang (github) — a Scala wrapper that makes it easy to interface with Erlang, so you can use two hipster-compliant built-to-scale technologies in the same project. (via Justin Sheehy)
- Madlib — an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical and machine learning methods for structured and unstructured data. (via Mike Loukides)
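For the slopegraph item above, a rough sketch of the data behind one: each label needs a rank in both periods plus the change in value. The function name `slopegraph_rows` and the sample figures are made up for illustration, and drawing the chart itself is left out.

```python
from typing import Dict, List, Tuple


def slopegraph_rows(
    before: Dict[str, float], after: Dict[str, float]
) -> List[Tuple[str, int, int, float]]:
    """Return (label, rank_before, rank_after, delta) for labels present in both periods."""
    labels = sorted(set(before) & set(after))

    def ranks(values: Dict[str, float]) -> Dict[str, int]:
        # Rank 1 is the largest value; ties are broken alphabetically by label.
        ordered = sorted(labels, key=lambda label: (-values[label], label))
        return {label: position + 1 for position, label in enumerate(ordered)}

    rank_before, rank_after = ranks(before), ranks(after)
    rows = [
        (label, rank_before[label], rank_after[label], after[label] - before[label])
        for label in labels
    ]
    # Order rows by their left-hand (before) rank, as the left column would be drawn.
    return sorted(rows, key=lambda row: row[1])


# Hypothetical sample data for two periods.
rows = slopegraph_rows(
    {"A": 10, "B": 30, "C": 20},
    {"A": 25, "B": 28, "C": 15},
)
```

Each row maps to one line on the chart: the two ranks give the vertical positions of its endpoints, and the sign of the delta tells you whether the line slopes up or down.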