"scaling" entries

Scaling NoSQL databases: 5 tips for increasing performance

How NoSQL databases scale vertically and horizontally, and what you should consider when building a DB cluster.

Editor’s note: this post is a follow-up to a recent webcast, “Getting the Most Out of Your NoSQL DB,” by the post author, Alex Bordei.

As product manager for Bigstep’s Full Metal Cloud, I work with a lot of amazing technologies. Most of my work actually involves pushing applications to their limits. My mission is simple: make sure we get the highest performance possible out of each setup we test, then use that knowledge to constantly improve our services.

Here are some of the things I’ve learned along the way about how NoSQL databases scale vertically and horizontally, and what you should consider when building a DB cluster. Some of these findings can be applied to RDBMS as well, so read on even if you’re still a devoted SQL fan. You might just get up to 60% more performance out of that database soon enough.

How Flash changes the design of database storage engines

High-performing memory throws many traditional decisions overboard

Over the past decade, solid-state drives (SSDs, popularly known as Flash) have radically changed computing at both the consumer level, where USB sticks have effectively replaced CDs for transporting files, and the server level, where they offer a price/performance ratio radically different from both RAM and disk drives. But databases have only started to catch up during the past few years. Most still depend on internal data structures and storage management fine-tuned for spinning disks.

Citing price and performance, one author advised a wide range of database vendors to move to Flash. Certainly, a database administrator can speed up old databases just by swapping out disk drives and inserting Flash, but doing so captures just a sliver of the potential performance improvement promised by Flash. For this article, I asked several database experts — including representatives of Aerospike, Cassandra, FoundationDB, RethinkDB, and Tokutek — how Flash changes the design of storage engines for databases. The various ways these companies have responded to its promise in their database designs are instructive to readers designing applications and looking for the best storage solutions.

Four short links: 28 March 2014

Javascript on Glass, Smart Lights, Hardware Startups, MySQL at Scale

  1. WearScript — open source project putting Javascript on Glass. See story on it. (via Slashdot)
  2. Mining the World’s Data by Selling Street Lights and Farm Drones (Quartz) — Depending on what kinds of sensors the light’s owners choose to install, Sensity’s fixtures can track everything from how much power the lights themselves are consuming to movement under the post, ambient light, and temperature. More sophisticated sensors can measure pollution levels, radiation, and particulate matter (for air quality levels). The fixtures can also support sound or video recording. Bring these lights onto city streets and you could isolate the precise location of a gunshot within seconds.
  3. An Investor’s Guide to Hardware Startups — good to know if you’re thinking of joining one, too.
  4. WebScaleSQL — a MySQL downstream patchset built for “large scale” (aka Google, Facebook type loads).

Four short links: 25 September 2013

Scaling Systems, Code Reviews in Github, Humane Javascript, and Privacy in Identity

  1. Salesforce Architecture — Our search tier runs on commodity Linux hosts, each of which is augmented with a 640 GiB PCI-E flash drive which serves as a caching layer for search requests. These hosts get their data from a shared SAN array via an NFS file system. Search indexes are stored on the flash drive to enable greater performance for search throughput. Architecture porn.
  2. Gerrit Code Review (Github) — tool for doing code reviews on Github codebases. (via Chris Aniszczyk)
  3. Humanize (Github) — Javascript to turn “first” into a list position, format numbers, generate plurals in English, etc. (via Pete Warden)
  4. Users vs Apps (Tim Bray) — the wrong thing being shared with the wrong people, even once, can ruin a trust relationship forever. Personally, I’m pretty hard-line about this one. I’m currently refusing to update the Android app from my bank, CIBC, because it wants access to my contacts. You know what the right amount of “social” content is in my relationship with my bank? Zero, that’s what.

Four short links: 30 August 2013

Flexible Layouts, Web Components, Distributed SQL Database, and Reverse-Engineering Dropbox Client

  1. intention.js — manipulates the DOM via HTML attributes. The methods for manipulation are placed with the elements themselves, so flexible layouts don’t seem so abstract and messy.
  2. Introducing Brick: Minimal-markup Web Components for Faster App Development (Mozilla) — a cross-browser library that provides new custom HTML tags to abstract away common user interface patterns into easy-to-use, flexible, and semantic Web Components. Built on Mozilla’s x-tags library, Brick allows you to plug simple HTML tags into your markup to implement widgets like sliders or datepickers, speeding up development by saving you from having to initially think about the under-the-hood HTML/CSS/JavaScript.
  3. F1: A Distributed SQL Database That Scales — a distributed relational database system built at Google to support the AdWords business. F1 is a hybrid database that combines high availability, the scalability of NoSQL systems like Bigtable, and the consistency and usability of traditional SQL databases. F1 is built on Spanner, which provides synchronous cross-datacenter replication and strong consistency. Synchronous replication implies higher commit latency, but we mitigate that latency by using a hierarchical schema model with structured data types and through smart application design. F1 also includes a fully functional distributed SQL query engine and automatic change tracking and publishing.
  4. Looking Inside The (Drop)Box (PDF) — This paper presents new and generic techniques, to reverse engineer frozen Python applications, which are not limited to just the Dropbox world. We describe a method to bypass Dropbox’s two factor authentication and hijack Dropbox accounts. Additionally, generic techniques to intercept SSL data using code injection techniques and monkey patching are presented. (via Tech Republic)

Top Stories: April 9-13, 2012

Carsharing boosts city governments, why complex systems fail, and what web ops teams could do with big data.

This week on O'Reilly: How Zipcar's technology is saving big money for U.S. city governments, why scalable clouds need simple parts, and pondering the possibilities of web ops and machine learning.

Four short links: 20 December 2011

Maximum MySQL, Digital News, Unbiased Mining, and Congressional Clue

  1. How Twitter Stores 250M Tweets a Day Using MySQL (High Scalability) — notes from a talk at the MySQL conference on how Twitter built a high-volume MySQL store.
  2. How The Atlantic Got Profitable With Digital First (Mashable) — Lauf says his team has focused on putting together premium advertising experiences that span print, digital, events and (increasingly) mobile.
  3. Data Mining Without Prejudice — an attempt to measure fit without pre-favouring one type of curve over another.
  4. It Is No Longer OK Not To Know How Congress Works (Clay Johnson) — looking for a specific innovation to try and change the way Washington works by the time Congress votes on SOPA is about as foolish as Steve Jobs trying to diet his way out of having pancreatic cancer.

The Meat to Math ratio

The ability to augment people (meat) with data and processes (math) is a key to success.

Successful companies find ways to augment their employees, allowing them to operate at scale with customers. Big data, machine learning, and an iterative, experimental mindset are essential — and increasingly, company valuations are tied to the efficiency with which firms put information to work.

Four short links: 12 July 2011

Rare Visualization, Google+ Tech, Scala+Erlang, and In-Database Analytics

  1. Slopegraphs — a nifty Tufte visualization which conveys rank, value, and delta over time. Includes pointers to how to make them, and guidelines for when and how they work. (via Avi Bryant)
  2. Ask Me Anything: A Technical Lead on the Google+ Team — lots of juicy details about technology and dev process. A couple nifty tricks we do: we use the HTML5 History API to maintain pretty-looking URLs even though it’s an AJAX app (falling back on hash-fragments for older browsers); and we often render our Closure templates server-side so the page renders before any JavaScript is loaded, then the JavaScript finds the right DOM nodes and hooks up event handlers, etc. to make it responsive (as a result, if you’re on a slow connection and you click on stuff really fast, you may notice a lag before it does anything, but luckily most people don’t run into this in practice). (via Nahum Wild)
  3. scalang (github) — a Scala wrapper that makes it easy to interface with Erlang, so you can use two hipster-compliant built-to-scale technologies in the same project. (via Justin Sheehy)
  4. Madlib — an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical and machine learning methods for structured and unstructured data. (via Mike Loukides)