- Salesforce Architecture — Our search tier runs on commodity Linux hosts, each of which is augmented with a 640 GiB PCI-E flash drive which serves as a caching layer for search requests. These hosts get their data from a shared SAN array via an NFS file system. Search indexes are stored on the flash drive to enable greater performance for search throughput. Architecture porn.
- Gerrit Code Review (GitHub) — a tool for doing code reviews on GitHub codebases. (via Chris Aniszczyk)
- Users vs Apps (Tim Bray) — the wrong thing being shared with the wrong people, even once, can ruin a trust relationship forever. Personally, I’m pretty hard-line about this one. I’m currently refusing to update the Android app from my bank, CIBC, because it wants access to my contacts. You know what the right amount of “social” content is in my relationship with my bank? Zero, that’s what.
ENTRIES TAGGED "scaling"
Flexible Layouts, Web Components, Distributed SQL Database, and Reverse-Engineering Dropbox Client
- intention.js — manipulates the DOM via HTML attributes. The methods for manipulation are placed with the elements themselves, so flexible layouts don’t seem so abstract and messy.
- F1: A Distributed SQL Database That Scales — a distributed relational database system built at Google to support the AdWords business. F1 is a hybrid database that combines high availability, the scalability of NoSQL systems like Bigtable, and the consistency and usability of traditional SQL databases. F1 is built on Spanner, which provides synchronous cross-datacenter replication and strong consistency. Synchronous replication implies higher commit latency, but we mitigate that latency by using a hierarchical schema model with structured data types and through smart application design. F1 also includes a fully functional distributed SQL query engine and automatic change tracking and publishing.
- Looking Inside The (Drop)Box (PDF) — This paper presents new and generic techniques, to reverse engineer frozen Python applications, which are not limited to just the Dropbox world. We describe a method to bypass Dropbox’s two factor authentication and hijack Dropbox accounts. Additionally, generic techniques to intercept SSL data using code injection techniques and monkey patching are presented. (via Tech Republic)
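The F1 excerpt mentions a "hierarchical schema model" as a way to soften commit latency. A rough way to picture that idea, assuming it means child rows carry their parent's primary key as a prefix so that a parent and its descendants are stored next to each other: the sketch below sorts rows by key and reads one subtree as a contiguous run. The `Customer` and `Campaign` table names, the `Row` type, and both helper functions are hypothetical illustrations, not F1's actual API.

```python
from typing import NamedTuple


class Row(NamedTuple):
    table: str   # hypothetical table name
    key: tuple   # full primary key; a child's key starts with its parent's key
    data: str


# Toy rows in arbitrary insertion order (illustrative data only).
rows = [
    Row("Campaign", (2, 10), "campaign 10 of customer 2"),
    Row("Customer", (1,), "customer 1"),
    Row("Campaign", (1, 7), "campaign 7 of customer 1"),
    Row("Customer", (2,), "customer 2"),
    Row("Campaign", (1, 3), "campaign 3 of customer 1"),
]


def clustered(rows):
    """Order rows by primary key, so each parent is followed by its children."""
    return sorted(rows, key=lambda r: r.key)


def subtree(rows, root_key):
    """Return the root row and every descendant whose key starts with root_key."""
    n = len(root_key)
    return [r for r in clustered(rows) if r.key[:n] == root_key]


if __name__ == "__main__":
    for r in clustered(rows):
        print(r.key, r.table, r.data)
```

Because everything under one root key sorts together, an update that touches a customer and its campaigns stays within one contiguous key range, which is the kind of locality the excerpt credits with mitigating latency.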
Carsharing boosts city governments, why complex systems fail, and what web ops teams could do with big data.
This week on O'Reilly: How Zipcar's technology is saving big money for U.S. city governments, why scalable clouds need simple parts, and pondering the possibilities of web ops and machine learning.
Maximum MySQL, Digital News, Unbiased Mining, and Congressional Clue
- How Twitter Stores 250M Tweets a Day Using MySQL (High Scalability) — notes from a talk at the MySQL conference on how Twitter built a high-volume MySQL store.
- How The Atlantic Got Profitable With Digital First (Mashable) — Lauf says his team has focused on putting together premium advertising experiences that span print, digital, events and (increasingly) mobile.
- Data Mining Without Prejudice — an attempt to measure fit without pre-favouring one type of curve over another.
- It Is No Longer OK Not To Know How Congress Works (Clay Johnson) — looking for a specific innovation to try and change the way Washington works by the time Congress votes on SOPA is about as foolish as Steve Jobs trying to diet his way out of having pancreatic cancer.
The ability to augment people (meat) with data and processes (math) is a key to success.
Successful companies find ways to augment their employees, allowing them to operate at scale with customers. Big data, machine learning, and an iterative, experimental mindset are essential — and increasingly, company valuations are tied to the efficiency with which firms put information to work.
Rare Visualization, Google+ Tech, Scala+Erlang, and In-Database Analytics
- Slopegraphs — a nifty Tufte visualization which conveys rank, value, and delta over time. Includes pointers to how to make them, and guidelines for when and how they work. (via Avi Bryant)
- scalang (GitHub) — a Scala wrapper that makes it easy to interface with Erlang, so you can use two hipster-compliant built-to-scale technologies in the same project. (via Justin Sheehy)
- Madlib — an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical and machine learning methods for structured and unstructured data. (via Mike Loukides)
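For the slopegraph link above, here is a minimal sketch of the three quantities such a chart conveys (rank, value, and delta) computed for two points in time. The function name `slopegraph_rows`, the field names, and the sample figures are all made up for illustration; drawing the chart itself is left to whatever plotting tool you prefer.

```python
def slopegraph_rows(before, after):
    """Compute per-item start/end values, delta, and rank at each time point.

    Assumes `before` and `after` are dicts with the same keys.
    Rank 1 is the largest value.
    """
    def ranks(values):
        ordered = sorted(values, key=values.get, reverse=True)
        return {name: i + 1 for i, name in enumerate(ordered)}

    rank_before, rank_after = ranks(before), ranks(after)
    return [
        {
            "name": name,
            "start": before[name],
            "end": after[name],
            "delta": after[name] - before[name],
            "rank_start": rank_before[name],
            "rank_end": rank_after[name],
        }
        # List items in their starting rank order, as on the chart's left edge.
        for name in sorted(before, key=rank_before.get)
    ]


if __name__ == "__main__":
    # Hypothetical sample values, not real data.
    before = {"A": 30, "B": 20, "C": 10}
    after = {"A": 25, "B": 35, "C": 12}
    for row in slopegraph_rows(before, after):
        print(row)
```

Each resulting row corresponds to one line on the chart: its endpoints are the two values, its slope reflects the delta, and a change between `rank_start` and `rank_end` shows up as lines crossing.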