"open source" entries

Big data, small cluster

Finding new ways to shrink disk space for storing partitionable data.

1024px-30_Doradus_or_NGC_2070

Register for the free webcast, “Extending Cassandra with Doradus OLAP for High Performance Analytics,” which will be held July 29 at 9 a.m. PT.

Engineers at Dell were developing customer apps when they found that the query response times their customers were demanding — something on the order of seconds (in other words, the need to scan millions of objects/second) — required a new type of query engine. This led them on a four-year journey to create Doradus, one of Dell Software Group’s first open-source projects.

Doradus is a server framework that runs on top of Cassandra. To build Doradus, the team borrowed from several well-accepted paradigms. They used traditional OLAP techniques to allow data to be arranged into static, multidimensional cubes. They leveraged the vertical orientation and efficient compression of columnar databases. And, from the NoSQL world, they employed sharding. The result: a storage and query engine called Doradus OLAP that stores data up to 1M objects/second/node, providing nearly real-time data warehousing. This architecture also allows for extreme compression of the data, sometimes producing up to a 99% reduction in space usage.

This extremely dense storage means that data that once took multiple nodes can now be stored on a single node, allowing for fast queries without the expense of a large cluster. Because Doradus is built on top of Cassandra, the option to scale out is still there. This allows for sharding and replication, and also takes advantage of Cassandra’s failover features. Read more…

Comment
Four short links: 20 July 2015

Four short links: 20 July 2015

Less Spam, Down on Dropdowns, Questioning Provable Security, and Crafting Packets

  1. Spam Under Half of Email (PDF) — Symantec report: There is good news this month on the email-based front of the threat landscape. According to our metrics, the overall spam rate has dropped to 49.7%. This is the first time this rate has fallen below 50% of email for over a decade. The last time Symantec recorded a similar spam rate was clear back in September of 2003.
  2. Dropdowns Should be the UI of Last Resort (Luke Wroblewski) — Well-designed forms make use of the most appropriate input control for each question they ask. Sometimes that’s a stepper, a radio group, or even a dropdown menu. But because they are hard to navigate, hide options by default, don’t support hierarchies, and only enable selection not editing, dropdowns shouldn’t be the first UI control you reach for. In today’s software designs, they often are. So instead, consider other input controls first and save the dropdown as a last resort.
  3. Another Look at Provable SecurityIn our time, one of the dominant paradigms in cryptographic research goes by the name “provable security.” This is the notion that the best (or, some would say, the only) way to have confidence in the security of a cryptographic protocol is to have a mathematically rigorous theorem that establishes some sort of guarantee of security (defined in a suitable way) under certain conditions and given certain assumptions. The purpose of this website is to encourage the emergence of a more skeptical and less credulous attitude toward this notion and to contribute to a process of critical analysis of the positive and negative features of the “provable security” paradigm.
  4. Pig (github) — a Linux packet crafting tool. You can use Pig to test your IDS/IPS among other stuffs.
Comment
Four short links: 17 July 2015

Four short links: 17 July 2015

Smalltalky Web, Arduino Speech, Testing Distributed Systems, and Dataflow for FP

  1. Project Journal: Objects (Ian Bicking) — a view askew at the Web, inspired by Alan Kay’s History of Smalltalk.
  2. Speech Recognition for Arduino (Kickstarter) — for all your creepy toy hacking needs!
  3. Conductor (github) — a framework for testing distributed systems.
  4. Dataflow Syntax for Functional Programming? — two great tastes that will make your head hurt together!
Comment
Four short links: 16 July 2015

Four short links: 16 July 2015

Consumer Exoskeleton, Bitcoin Trends, p2p Sockets, and Plain Government Comms

  1. ReWalk Robotics Exoskeleton — first exoskeleton for the paralyzed to receive regulatory approval; 66 bought so far, 11 with reimbursement from insurance. The software upgrades for the ReWalk 6.0 provide a smoother walking gait (with less of a soldier-like marching step), an easier stopping mechanism, and a much-improved mode for ascending and descending stairs. The user wears a wristwatch-like controller to switch the suit between sit, stand, walk, and stair modes. How long until a cheaper version hits the market, but you don’t always get to control where it takes you if there’s a sale on featuring brands you love? (via IEEE)
  2. Bitcoin Trends in First Half of 201594% increase in monthly transactions over the past year. 47% of Coinbase wallet holders are now from countries outside the U.S.
  3. Socket.io p2pan easy and reliable way to set up a WebRTC connection between peers and communicate using the socket.io-protocol.
  4. 18F Content Guide — communications guide for government content writers that bears in mind the frustrations citizens have with gov-speak websites.
Comment
Four short links: 15 July 2015

Four short links: 15 July 2015

OpeNSAurce, Multimaterial Printing, Functional Javascript, and Outlier Detection

  1. System Integrity Management Platform (Github) — NSA releases security compliance tool for government departments.
  2. 3D-Printed Explosive Jumping Robot Combines Firm and Squishy Parts (IEEE Spectrum) — Different parts of the robot grade over three orders of magnitude from stiff like plastic to squishy like rubber, through the use of nine different layers of 3D printed materials.
  3. Professor Frisby’s Mostly Adequate Guide to Functional Programming — a book on functional programming, using Javascript as the programming language.
  4. Tracking Down Villains — the software and algorithms that Netflix uses to detect outliers in their infrastructure monitoring.
Comment
Four short links: 9 July 2015

Four short links: 9 July 2015

Conservation of Attractive Profits, Prototyping Tool, Open Source PM, and Dev Snark

  1. Netflix and the Conservation of Attractive Profits (Stratechery) — Note the common element to all three of these companies: all have managed to modularize the production/delivery of their service which has allowed them to move closer to the customer. To put it another way, all of this new value is being created by specialized CRM companies: Airbnb for travelers, Uber for commuters, and Netflix for the bored.
  2. Origami — Facebook’s prototyping tool.
  3. Go as Open Source — keynote from this year’s Gophercon. I’ve been pondering lately how successful open source projects go beyond “anyone can scratch their itch,” and instead actively manage the tendency for scope creep. We’d rather have a small number of features that enable, say, 90% of the target use cases, and avoid the orders of magnitude more features necessary to reach 99% or 100%.
  4. The Universal Data StructureGiven the abysmal state of of today’s software engineering, I believe that a full embrace of the universal hash will result in better, simpler programs. Your weekly dose of snark.
Comment
Four short links: 7 July 2015

Four short links: 7 July 2015

SCIP Berkeley Style, Regular Failures, Web Material Design, and Javascript Breakouts

  1. CS 61AS — Berkeley self-directed Structure and Interpretation of Computer Programs course.
  2. Harbingers of Failure (PDF) — We show that some customers, whom we call ‘Harbingers’ of failure, systematically purchase new products that flop. Their early adoption of a new product is a strong signal that a product will fail – the more they buy, the less likely the product will succeed. Firms can identify these customers either through past purchases of new products that failed, or through past purchases of existing products that few other customers purchase.
  3. Google Material Design LiteA library of Material Design components in CSS, JS, and HTML.
  4. Breakoutsvarious implementations of the classic game Breakout in numerous different [Javascript] engines.
Comment
Four short links: 3 July 2015

Four short links: 3 July 2015

Storage Interference, Open Source SSL, Pub-Sub Reverse-Proxy, and Web Components Checklist

  1. The Storage Tipping Pointthe performance optimization technologies of the last decade – log structured file systems, coalesced writes, out-of-place updates and, soon, byte-addressable NVRAM – are conflicting with similar-but-different techniques used in SSDs and arrays. The software we use is written for dumb storage; we’re getting smart storage; but smart+smart = fragmentation, write amplification, and over-consumption.
  2. s2n — Amazon’s open source ssl implementation.
  3. pushpina reverse proxy server that makes it easy to implement WebSocket, HTTP streaming, and HTTP long-polling services. It communicates with backend web applications using regular, short-lived HTTP requests (GRIP protocol). This allows backend applications to be written in any language and use any webserver.
  4. The Gold Standard Checklist for Web ComponentsThis is a working draft of a checklist to define a “gold standard” for web components that aspire to be as predictable, flexible, reliable, and useful as the standard HTML elements.
Comment
Four short links: 30 June 2015

Four short links: 30 June 2015

Ductile Systems, Accessibility Testing, Load Testing, and CRAP Data

  1. Brittle SystemsMore than two decades ago at Sun, I was convinced that making systems ductile (the opposite of brittle) was the hardest and most important problem in system engineering.
  2. tota11y — accessibility testing toolkit from Khan.
  3. Locustan open source load testing tool.
  4. Impala: a Modern, Open-source SQL Engine for Hadoop (PDF) — CRAP, aka Create, Read, and APpend, as coined by an ex-colleague at VMware, Charles Fan (note the absence of update and delete capabilities). (via A Paper a Day)
Comment
Four short links: 22 June 2015

Four short links: 22 June 2015

Power Analysis, Data at Scale, Open Source Fail, and Closing the Virtuous Loop

  1. Power Analysis of a Typical Psychology Experiment (Tom Stafford) — What this means is that if you don’t have a large effect, studies with between groups analysis and an n of less than 60 aren’t worth running. Even if you are studying a real phenomenon you aren’t using a statistical lens with enough sensitivity to be able to tell. You’ll get to the end and won’t know if the phenomenon you are looking for isn’t real or if you just got unlucky with who you tested.
  2. The Future of Data at ScaleData curation, on the other hand, is “the 800-pound gorilla in the corner,” says Stonebraker. “You can solve your volume problem with money. You can solve your velocity problem with money. Curation is just plain hard.” The traditional solution of extract, transform, and load (ETL) works for 10, 20, or 30 data sources, he says, but it doesn’t work for 500. To curate data at scale, you need automation and a human domain expert.
  3. Why Are We Still Explaining? (Stephen Walli) — Within 24 hours we received our first righteous patch. A simple 15-line change that provided a 10% boost in Just-in-Time compiler performance. And we politely thanked the contributor and explained we weren’t accepting changes yet. Another 24 hours and we received the first solid bug fix. It was golden. It included additional tests for the test suite to prove it was fixed. And we politely thanked the contributor and explained we weren’t accepting changes yet. And that was the last thing that was ever contributed.
  4. Blood Donors in Sweden Get a Text Message When Their Blood Helps Someone (Independent) — great idea to close the feedback loop. If you want to get more virtuous behaviour, make it a relationship and not a transaction. And if a warm feeling is all you have to offer in return, then offer it!
Comment