ENTRIES TAGGED "nosql"
Facebook Bank, New in NoSQL, Twitter Numbers, and Open Source EEG Driver
- ASB Bank’s Facebook Virtual Branch — the world’s first Facebook branch of a bank, where you can live chat with tellers. (via Vaughn Davis)
- SciDB — GPLv3 NoSQL database. In addition to being multi-dimensional and offering array-based scaling from megabytes to petabytes and running on tens of thousands of clustered nodes, SciDB will be write once read many, allow bulk load rather than single row insert, provide parallel computation, be designed for automatic rather than manual administration, and work with R, Matlab, IDL, C++ and Python. (from The Register) (via jsteeleeditor on Twitter)
- Twitter By The Numbers (Raffi Krikorian) — given to answer the question “what’s so hard about delivering 140 characters?”. They hit a peak of 3283 inbound tweets/second. Every time Lady Gaga tweets, 6.1M people have to get it. (via Alex Russell)
- EmoKit — an open source driver for the $300 Emotiv EPOC EEG headset. (via BoingBoing)
Community Deconstructed, Sparklines Explained, NoSQL Navigated, and Foxconn Surveyed
- Open Source Community Types (Simon Phipps) — draws a distinction between extenders and deployers to take away the “who do you mean?” confusion that comes with the term “community”.
- Sparklines — Tufte’s coverage of sparkline graphs in Beautiful Evidence. (via Hacker News)
- Why NoSQL Matters (Heroku blog) — a very nice precis of the use cases for various NoSQL systems. Frequently-written, rarely read statistical data (for example, a web hit counter) should use an in-memory key/value store like Redis, or an update-in-place document store like MongoDB. I’m sure there are as many opinions as there are people, but I’d welcome an “if you want to do X, look at Y” guide to the NoSQL space. If you know of such a beast, please leave pointers in the comments. Thanks!
- The Man Who Makes Your iPhone (BusinessWeek) — a fascinating survey of Foxconn’s CEO, history, operations, culture, and plans. This line resonated for me: “I never think I am successful,” he says. “If I am successful, then I should be retired. If I am not retired, then that means I should still be working hard, keeping the company running.”
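The hit-counter use case from the Heroku post above can be sketched in a few lines. This is a minimal illustration only: the `HitCounter` class and its `incr`/`get` methods are hypothetical names, and a plain in-process dictionary stands in for a real in-memory key/value store, so nothing here is Redis's or MongoDB's actual API.

```python
import threading
from collections import defaultdict


class HitCounter:
    """In-process stand-in for an in-memory key/value store (hypothetical)."""

    def __init__(self):
        self._counts = defaultdict(int)
        self._lock = threading.Lock()

    def incr(self, key, amount=1):
        # Write path: a cheap atomic increment, called on every page view.
        with self._lock:
            self._counts[key] += amount
            return self._counts[key]

    def get(self, key):
        # Read path: rare, for example when a stats page is rendered.
        with self._lock:
            return self._counts.get(key, 0)


counter = HitCounter()
for path in ["/", "/about", "/", "/"]:
    counter.incr(path)
```

The point of the pattern is that each write touches one key in memory and never rewrites a larger record. With an actual Redis server, the increment would most likely map onto its `INCR` command.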
Thumb Drives and the Cloud, FCC APIs, Mining on GFS, Check Your Prose with Scribe
- CloudUSB — a USB key containing your operating environment and your data + a protected folder so nobody can access your data, even if you lose the key + a backup program which keeps a copy of your data on an online disk, with double password protection. (via ferrouswheel on Twitter)
- FCC APIs — for spectrum licenses, consumer broadband tests, census block search, and more. (via rjweeks70 on Twitter)
- Sibyl: A system for large scale machine learning (PDF) — paper from Google researchers on how to build machine learning on top of a system designed for batch processing. (via Greg Linden)
- The Surprisingness of What We Say About Ourselves (BERG London) — I made a chart of word-by-word surprisingness: given the statement so far, could Scribe predict what would come next?
Data Pointed, CouchDB in the Cloud, Launching Strata
Data Week is a new series that brings together notable stories and developments from the data world. Links in this edition include: the connection between visualizations and art, advice on becoming a data scientist, BigCouch goes open source, and more.
Faces in R, Open Source Web Analytics, Small File Store, Building Mapper
- R Library for Chernoff Faces — faces represent the rows of a data matrix by faces. plot.faces plots faces into a scatterplot. Interesting emotional way to visualize data, which was used to good effect (though not with this library) by BERG in Schooloscope. (via the tutorial at Flowing Data)
- Piwik — GPLed web analytics package.
- Pomegranate — a data store for billions of tiny files. (via the High Scalability blog interview with the creator of Pomegranate)
- New Backpack Makes 3D Maps of Buildings — the backpack indoor equivalent of the Google Maps cars, from Berkeley researchers.
Scientific Literacy, Load Balancing, Indoors Geolocation, and iPhone Security
- The Myth of Scientific Literacy — I’d love it if there was a simple course we could send our elected officials on which would guarantee future science policy would be reliably high quality. Being educated in science (or even “about science”) isn’t going to do it. It’s social connections that will. We need to keep our elected officials honest, constantly check they are applying the evidence we want them to, in the ways we want them to. And if the scientific community want to be listened to, they need to work to build connections. Get political and scientific communities overlapping, embed scientists in policy institutions (and vice versa), get MP’s constituents onside to help foster the sorts of public pressure you want to see: build trust so scientists become people MPs want to be briefed by. (via foe on Twitter)
- Three Papers on Load Balancing (Alex Popescu) — three papers on distributed hash tables.
- Meridian — iPhone app that does in-building location, sample app is the AMNH Explorer which shows you maps of where you are. Uses wifi-based positioning. (via raffi on Twitter)
- Fixing What Apple Won’t — the jailbreakers are releasing security patches for systems that Apple have abandoned. (via ardgedee on Twitter)
Delicious Graphs, Charities and Data, Climate Psychology, Data Structure Portability
- Delicious Links Clustered and Stacked (Matt Biddulph) — six years of his delicious links, k-means clustered by tag and graphed. The clusters are interesting, but I wonder whether Matt can identify significant life/work events by the spikes in the graph.
- Open Data and the Voluntary Sector (OKFN) — Open data will give charities new ways to find and share information on the need of their beneficiaries – who needs their services most and where they are located. The sharing of information will be key to this – it’s not just about using data that the government has opened up, but also opening your own data.
- Cognitive and Behavioral Challenges in Responding to Climate Change — At the deepest level, large scale environmental problems such as global warming threaten people’s sense of the continuity of life – what sociologist Anthony Giddens calls ontological security. Ignoring the obvious can, however, be a lot of work. Both the reasons for and process of denial are socially organized; that is to say, both cognition and denial are socially structured. Denial is socially organized because societies develop and reinforce a whole repertoire of techniques or “tools” for ignoring disturbing problems. Fascinating paper. (via Jez)
- Blueprints — provides a collection of interfaces and implementations to common, complex data structures. Blueprints contains a property graph model and its implementations for TinkerGraph, Neo4j, and SAIL. Also, it contains an object document model and implementations for TinkerDoc, CouchDB, and MongoDB. In short, Blueprints provides a one stop shop for implemented interfaces to help developers create software without being tied to particular underlying data management systems.
More NoSQL, Data Medicine, Startups to Government, and Cake-and-eat-it Open Source
- Membase — an open-source (Apache 2.0 license) distributed, key-value database management system optimized for storing data behind interactive web applications. These applications must service many concurrent users; creating, storing, retrieving, aggregating, manipulating and presenting data in real-time. Supporting these requirements, membase processes data operations with quasi-deterministic low latency and high sustained throughput. (via Hacker News)
- Sergey’s Search (Wired) — Sergey Brin, one of the Google founders, learned he had a gene allele that gave him much higher odds of getting Parkinson’s. His response has been to help medical research, both with money and through 23andme. Langston decided to see whether the 23andMe Research Initiative might be able to shed some insight on the correlation, so he rang up 23andMe’s Eriksson, and asked him to run a search. In a few minutes, Eriksson was able to identify 350 people who had the mutation responsible for Gaucher’s. A few clicks more and he was able to calculate that they were five times more likely to have Parkinson’s disease, a result practically identical to the NEJM study. All told, it took about 20 minutes. “It would’ve taken years to learn that in traditional epidemiology,” Langston says. “Even though we’re in the Wright brothers early days with this stuff, to get a result so strongly and so quickly is remarkable.”
- Startup.gov (YouTube) — Anil Dash talk at Personal Democracy Forum on applying insights from startups to government. I hope the more people say this, the greater the odds it’ll be acted on.
- Open Core Software — Marten Mickos (ex-MySQL) talks up “open core” (open source base, proprietary extensions) as a way to resolve the conflict of “change the world with open source” and “make money”. Brian Aker disagrees: There has been no successful launch of an open core company that has reached any significant size, especially of the size that Marten hints at in the article. My take: there are three reasons for open source (freedoms, price, and development scale) and if you close the source to part of your product then the whole product loses those benefits. If you open source enough that the open source bit has massive momentum, then you probably don’t have enough left proprietary to gain huge financial benefit.
Fair Use Economy, Deconstituted Appliances, 3D Vision, Redis for Fun and Profit
- Fair Use in the US Economy (PDF) — prepared by the IT lobby in the US, it’s the counterpart to Big ©’s fictitious billions of dollars of losses due to file sharing. Take each with a grain of salt, but this is interesting because it talks about the industries and businesses that the fair use laws make possible.
- Disassembled Household Appliances — neat photos of the pieces in common equipment like waffle irons, sandwich makers, can openers, etc. (via evilmadscientist)
- GelSight — gel block on a sheet of glass, lit from below with lights and then scanned with cameras, lets you easily capture 3D qualities of the objects pressed into it. Very cool demo: you can see fingerprints, pulse, and even make out designs on a $100 bill.
- Redis Tutorial (Simon Willison) — Redis is a very fast collection of useful behaviours wrapped around a distributed key-value store. You get locks, IDs, counters, sets, lists, queues, replication, and more.
The growing popularity of Big Data management tools (Hadoop; MPP, real-time SQL, NoSQL databases; and others) means many more companies can handle large amounts of data. But how do companies analyze and mine their vast amounts of data? For companies that already have large amounts of data in Hadoop, there's room for even simpler tools that would allow business users to directly interact with Big Data.