"databases" entries

Graph tools forge path to new solutions

Find emergent properties and solutions to new computing problems with graphs

by Andy Oram | @praxagora | +Andy Oram | July 25, 2014

Graph databases haven’t made the news much because, I think, they don’t fit in convenient categories. They certainly aren’t the relational databases we’re all familiar with, nor are they the arbitrary keys and values provided by many NoSQL stores. But in a highly connected world–where it’s not what you know but whom you know–it makes intuitive sense to arrange our knowledge as nodes and edges.

Ted Nelson, inventor of the hyperlink, recognized the power of viewing life in graphs. After the implosion of his historic Xanadu project, he embarked on a graph database tool called ZigZag. The most modern instantiations of graphs–the Neo4j store and the Alchemy.js tool for interactively visualizing graphs–were well represented this year at O’Reilly’s Open Source convention.
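To make "nodes and edges" concrete, here is a minimal sketch of the "whom you know" question as a graph traversal: an adjacency-set graph plus a breadth-first search that finds everyone within a given number of hops. The people and edges are hypothetical, and this is plain Python, not the Neo4j or Alchemy.js API.

```python
from collections import deque

# Hypothetical toy social graph given as undirected (u, v) edges.
edges = [("alice", "bob"), ("bob", "carol"), ("carol", "dave"), ("alice", "erin")]

def build_graph(edge_list):
    """Build an undirected adjacency-set graph from (u, v) pairs."""
    graph = {}
    for u, v in edge_list:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph

def within_hops(graph, start, max_hops):
    """Return nodes reachable from start in 1..max_hops edges (breadth-first)."""
    seen = {start: 0}              # node -> hop distance from start
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:  # don't expand past the hop limit
            continue
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    return {n for n, d in seen.items() if d > 0}

graph = build_graph(edges)
print(sorted(within_hops(graph, "alice", 2)))  # friends and friends-of-friends
```

A relational store would answer the same question with one self-join per hop; a graph store makes the traversal itself the primary operation.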


Four short links: 22 July 2014

English lint, Scalable Replicated Datastore, There's People in my Software, and Sci-Fi for Ethics

by Nat Torkington | @gnat | +Nat Torkington | July 22, 2014

write-good — a naive ‘lint’ for English prose.
cockroachdb — a scalable, geo-replicated, transactional datastore from a team that includes the person who built Spanner for Google. Spanner requires atomic clocks; Cockroach does not (which has corresponding performance consequences). (via Wired)
The Deep Convergence of Networks, Software, and People — as we wire up our digital products increasingly with interconnected networks, their nature is increasingly a product of the responses that come back from those networks. The experience cannot be wholly represented in mock prototypes that are coded to respond in predictable ways, or even using a set of preset random responses. The power of the application is seeing the emergent behaviour of the system, and recognizing that you are a participant in that emergent behaviour. (via Tim O’Reilly)
An Ethics Class for Inventors, via Sci-Fi — “Reading science fiction is kind of like ethics class for inventors,” says Brueckner. Traditionally, technology schools ask ‘how do we build it?’ This class asks a different question: ‘should we?’

Four short links: 9 June 2014

SQL against Text, Fake Social Networks, Hidden Biases, and Versioned Data

by Nat Torkington | @gnat | +Nat Torkington | June 9, 2014

textql — execute SQL against structured text like CSV or TSV.
Social Network Structure of Fake Friends — the author bought 4,000 Twitter followers and studied their relationships.
Hidden Biases in Big Data — with every big data set, we need to ask which people are excluded. Which places are less visible? What happens if you live in the shadow of big data sets? (via Quinn Norton)
CoreObject — a version-controlled object database for Objective-C that supports powerful undo, semantic merging, and real-time collaborative editing.
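The textql entry above is about running SQL directly over delimited text. As a rough illustration of the idea (not textql's implementation), this sketch loads CSV text into an in-memory SQLite table with Python's standard `csv` and `sqlite3` modules and then queries it. The sample data is hypothetical; every column is loaded as text, so the query casts explicitly.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for a CSV file on disk.
CSV_TEXT = """name,city,visits
ann,Boston,3
bo,Austin,5
cy,Boston,4
"""

def load_csv(conn, table, text):
    """Create a table from CSV text, using the header row as column names."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    cols = ", ".join(f'"{c}"' for c in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', body)

conn = sqlite3.connect(":memory:")
load_csv(conn, "people", CSV_TEXT)

# Values were stored as text, so cast before aggregating.
query = "SELECT city, SUM(CAST(visits AS INTEGER)) FROM people GROUP BY city ORDER BY city"
results = conn.execute(query).fetchall()
print(results)
```

Tools in this space typically add type inference and a command-line front end; the core trick is simply treating the file as a table.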

Four short links: 3 June 2014

Machine Learning Mistakes, Recommendation Bandits, Droplet Robots, and Plain English

by Nat Torkington | @gnat | +Nat Torkington | June 3, 2014

Machine Learning Done Wrong — [M]ost practitioners pick the modeling algorithm they are most familiar with rather than pick the one which best suits the data. In this post, I would like to share some common mistakes (the don’t-s).
Bandits for Recommendations — A common problem for internet-based companies is: which piece of content should we display? Google has this problem (which ad to show), Facebook has this problem (which friend’s post to show), and RichRelevance has this problem (which product recommendation to show). Many of the promising solutions come from the study of the multi-armed bandit problem.
Droplets — the Droplet is almost spherical, can self-right after being poured out of a bucket, and has the hardware capabilities to organize into complex shapes with its neighbors due to accurate range and bearing. Droplets are available open-source and use cheap vibration motors and a 3D printed shell. (via Robohub)
Apple’s App Store Approval Guidelines — some of the plainest English I’ve seen, especially the Introduction. I can only aspire to that clarity. If your App looks like it was cobbled together in a few days, or you’re trying to get your first practice App into the store to impress your friends, please brace yourself for rejection. We have lots of serious developers who don’t want their quality Apps to be surrounded by amateur hour.
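The bandits entry above frames "which content should we display?" as a multi-armed bandit problem. One of the simplest strategies is epsilon-greedy: usually show the item with the best observed reward, but explore a random item a small fraction of the time. This is a minimal sketch of that generic strategy, not RichRelevance's system; the click-through rates, epsilon, and seed are hypothetical simulation parameters.

```python
import random

class EpsilonGreedy:
    """Epsilon-greedy multi-armed bandit over a fixed set of arms (items to show)."""

    def __init__(self, n_arms, epsilon=0.1, rng=None):
        self.epsilon = epsilon
        self.counts = [0] * n_arms       # times each arm was shown
        self.values = [0.0] * n_arms     # running mean reward per arm
        self.rng = rng or random.Random()

    def select(self):
        # Explore with probability epsilon, otherwise exploit the best mean so far.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def update(self, arm, reward):
        # Incremental mean: new = old + (reward - old) / n
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

# Simulated (hypothetical) click-through rates for three pieces of content.
true_ctr = [0.02, 0.05, 0.10]
rng = random.Random(42)
bandit = EpsilonGreedy(len(true_ctr), epsilon=0.1, rng=rng)
for _ in range(20000):
    arm = bandit.select()
    clicked = 1.0 if rng.random() < true_ctr[arm] else 0.0
    bandit.update(arm, clicked)
print(bandit.counts)  # most impressions should go to the highest-CTR item
```

More refined bandit algorithms (UCB, Thompson sampling) replace the fixed epsilon with exploration that shrinks as confidence grows.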

Four short links: 30 May 2014

Video Transparency, Software Traffic, Distributed Database, and Open Source Sustainability

by Nat Torkington | @gnat | +Nat Torkington | May 30, 2014

Video Quality Report — transparency is a great way to indirectly exert leverage.
Control Your Traffic Flows with Software — using BGP to balance traffic. It will be interesting to see how the more extreme traffic managers deploy SDN in the data center.
Cockroach — a distributed key/value datastore which supports ACID transactional semantics and versioned values as first-class features. The primary design goal is global consistency and survivability, hence the name. Cockroach aims to tolerate disk, machine, rack, and even datacenter failures with minimal latency disruption and no manual intervention. Cockroach nodes are symmetric; a design goal is one binary with minimal configuration and no required auxiliary services.
Linux Foundation Providing for Core Infrastructure Projects — a press release, but I’m interested in how they’re tackling sustainability: they’re taking on identifying worthies (glad I’m not the one who says “you’re not worthy” to a project) and being the non-profit conduit for the dosh. Interesting that this implies they think the reason companies weren’t supporting necessary open source projects was some combination of being unsure whom to support (projects you use, surely?) and how to get them money (ask?). (Sustainability of open source projects is a pet interest of mine.)
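The Cockroach entry above lists "versioned values as first-class features." As a rough illustration of what versioned values mean (not Cockroach's actual storage design), this toy store keeps every write under a timestamp and lets a reader ask for a key as of a given time. The class and method names are hypothetical.

```python
import bisect

class VersionedStore:
    """Toy multi-version key/value map: every write is kept under its timestamp."""

    def __init__(self):
        # key -> (sorted list of timestamps, values in matching order)
        self._versions = {}

    def put(self, key, ts, value):
        ts_list, vals = self._versions.setdefault(key, ([], []))
        i = bisect.bisect_right(ts_list, ts)  # keep timestamps sorted
        ts_list.insert(i, ts)
        vals.insert(i, value)

    def get(self, key, ts):
        """Return the newest value written at or before ts, or None."""
        ts_list, vals = self._versions.get(key, ([], []))
        i = bisect.bisect_right(ts_list, ts)
        return vals[i - 1] if i else None

store = VersionedStore()
store.put("balance", 10, 100)
store.put("balance", 20, 75)
print(store.get("balance", 15))  # a reader "as of" time 15 still sees the old value
```

Keeping old versions is what lets a database serve consistent snapshot reads while newer writes are in flight.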

Four short links: 24 January 2014

Floating Point, Secure Distributed FS, Cloud Robotics, and Domestic Sensors

by Nat Torkington | @gnat | +Nat Torkington | January 24, 2014

What Every Computer Scientist Should Know About Floating Point Arithmetic — in short, “it will hurt you.”
Ori — a distributed file system built for offline operation that empowers the user with control over synchronization operations and conflict resolution. We provide history through lightweight snapshots and allow users to verify the history has not been tampered with. Through the use of replication, instances can be resilient and recover damaged data from other nodes.
RoboEarth — a Cloud Robotics infrastructure, which includes everything needed to close the loop from robot to the cloud and back to the robot. RoboEarth’s World-Wide-Web style database stores knowledge generated by humans – and robots – in a machine-readable format. Data stored in the RoboEarth knowledge base include software components, maps for navigation (e.g., object locations, world models), task knowledge (e.g., action recipes, manipulation strategies), and object recognition models (e.g., images, object models).
Mother — domestic sensors and an app with an appallingly presumptuous name. (Also, wasn’t “Mother” the name of the ship computer in Alien?) (via BoingBoing)
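To see why the floating point paper's summary is "it will hurt you," here is a short Python demonstration of two classic surprises: decimal fractions like 0.1 have no exact binary representation, and floating point addition is not associative.

```python
import math
from fractions import Fraction

a = 0.1 + 0.2
print(a == 0.3)               # False: 0.1 and 0.2 are not exactly representable in binary
print(repr(a))                # 0.30000000000000004
print(math.isclose(a, 0.3))   # True: compare with a tolerance instead of ==
print(Fraction(0.1))          # 3602879701896397/36028797018963968, the double actually stored

# Addition is not associative, so summation order changes the result.
print((0.1 + 0.2) + 0.3)      # 0.6000000000000001
print(0.1 + (0.2 + 0.3))      # 0.6
```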

Four short links: 10 December 2013

Flexible Data, Google's Bottery, GPU-Assisted Deep Learning, and Open Sourcing

by Nat Torkington | @gnat | +Nat Torkington | December 10, 2013

ArangoDB — open-source database with a flexible data model for documents, graphs, and key-values. Build high performance applications using a convenient SQL-like query language or JavaScript extensions.
Google’s Seven Robotics Companies (IEEE) — The seven companies are capable of creating technologies needed to build a mobile, dexterous robot. Mr. Rubin said he was pursuing additional acquisitions. Rundown of those seven companies.
Hebel (Github) — GPU-Accelerated Deep Learning Library in Python.
What We Learned Open Sourcing — my eye was caught by the way they offered APIs to closed source code, found and solved performance problems, then open sourced the fixed code.

Four short links: 3 December 2013

by Nat Torkington | @gnat | +Nat Torkington | December 3, 2013

SAMOA — Yahoo!’s distributed streaming machine learning (ML) framework that contains a programming abstraction for distributed streaming ML algorithms. (via Introducing SAMOA)
madlib — an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical and machine-learning methods for structured and unstructured data.
Data Portraits: Connecting People of Opposing Views — Yahoo! Labs research to break the filter bubble. Connect people who disagree on issue X (e.g., abortion) but who agree on issue Y (e.g., Latin American interventionism), and present the differences and similarities visually (they used wordclouds). Our results suggest that organic visualisation may revert the negative effects of providing potentially sensitive content. (via MIT Technology Review)
Disguise Detection — using Raspberry Pi, Arduino, and Python.

Four short links: 11 November 2013

Squid in the Dark, Beautiful Automation, Fan Criticism, and Petabyte Queries

by Nat Torkington | @gnat | +Nat Torkington | November 11, 2013

Living Light — 3D printed cephalopods filled with bioluminescent bacteria. PAGING CORY DOCTOROW, YOUR ORGASMATRON HAS ARRIVED. (via Sci Blogs)
Repacking Lego Batteries with a CNC Mill — check out the video. Patrick programmed a CNC machine to drill out the rivets holding the Mindstorms battery pack together. Coding away a repetitive task like this is gorgeous to see at every scale. We don’t have to teach our kids a particular programming language, but they should know how to automate cruft.
My Thoughts on Google+ (YouTube) — when your fans make hatey videos like this one protesting Google putting the pig of Google Plus onto the lipstick that was YouTube, you are Doin’ It Wrong.
Presto: Interacting with Petabytes of Data at Facebook — a distributed SQL query engine optimized for ad-hoc analysis at interactive speed. It supports standard ANSI SQL, including complex queries, aggregations, joins, and window functions. For details, see the Facebook post about its launch.

Four short links: 5 November 2013

Time Series Database, Cluster Schedulers, Structural Search-and-Replace, and TV Data

by Nat Torkington | @gnat | +Nat Torkington | November 5, 2013

Influx DB — open-source, distributed, time series, events, and metrics database with no external dependencies.
Omega (PDF) — flexible, scalable schedulers for large compute clusters. From Google Research.
GraspJS — Search and replace your JavaScript code based on its structure rather than its text.
Amazon Mines Its Data Trove To Bet on TV’s Next Hit (WSJ) — Amazon produced about 20 pages of data detailing, among other things, how much a pilot was viewed, how many users gave it a 5-star rating and how many shared it with friends.
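The GraspJS entry above describes search-and-replace by code structure rather than text. GraspJS works on JavaScript; as a rough analog in Python (not GraspJS's API), this sketch uses the standard `ast` module (Python 3.9+ for `ast.unparse`) to rewrite `x == None` comparisons to `x is None` while leaving an identical-looking string literal untouched, which a plain text replace would not do. The `NoneComparisonFixer` name is hypothetical.

```python
import ast

class NoneComparisonFixer(ast.NodeTransformer):
    """Rewrite `expr == None` to `expr is None` (and `!=` to `is not`) by structure."""

    def visit_Compare(self, node):
        self.generic_visit(node)
        new_ops = []
        for op, right in zip(node.ops, node.comparators):
            is_none = isinstance(right, ast.Constant) and right.value is None
            if is_none and isinstance(op, ast.Eq):
                new_ops.append(ast.Is())
            elif is_none and isinstance(op, ast.NotEq):
                new_ops.append(ast.IsNot())
            else:
                new_ops.append(op)
        node.ops = new_ops
        return node

source = 'if user == None:\n    label = "user == None"\n'
tree = NoneComparisonFixer().visit(ast.parse(source))
result = ast.unparse(tree)
print(result)  # the comparison changes; the string literal does not
```

One trade-off: `ast.unparse` regenerates code from the tree, so comments and original formatting are lost; dedicated structural tools work harder to preserve them.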