ENTRIES TAGGED "nosql"
Digital Subscriptions, Graph Database, Data Science, and High Speed Compression
- Digital Subscription Prices — the NY Times in context. Aie.
- Trinity — Microsoft Research graph database. (via Hacker News)
- Data Science Toolkit — prepackaged EC2 image of most useful data tools. (via Pete Warden)
- Snappy — Google’s open sourced compression library, as used in BigTable and MapReduce. Emphasis is on speed, at the cost of compression ratio (output is 20-100% bigger than zlib’s).
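The speed-versus-size trade-off mentioned for Snappy can be sketched with Python's standard `zlib` module. This is only an illustration under stated assumptions: it does not use Snappy itself, the sample payload is made up, and zlib's fastest and strongest levels stand in for the two ends of the trade-off.

```python
import zlib

# Made-up, highly repetitive sample payload; real ratios depend on the data.
data = b"the quick brown fox jumps over the lazy dog. " * 2000

fast = zlib.compress(data, 1)   # level 1: favour speed over size
small = zlib.compress(data, 9)  # level 9: favour size over speed

# Both settings are lossless: the original bytes come back unchanged.
assert zlib.decompress(fast) == data
assert zlib.decompress(small) == data

print("original:", len(data), "level 1:", len(fast), "level 9:", len(small))
```

A speed-oriented compressor pushes further in the level 1 direction: less CPU time per byte in exchange for larger output.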
Web Memory, Phones Read Cards, Military and Public Data, and NoSQL Merger
- Erase and Rewind — the BBC are planning to close (delete) 172 websites as some kind of cost-cutting measure. I’m very saddened to see the BBC join the ranks of online services that don’t give a damn for posterity. As Simon Willison points out, the British Library will have archived some of the sites (and the Internet Archive others, possibly).
- Announcing Farebot for Android — dumps the information stored on transit cards using Android’s NFC (near field communication, aka RFID) support. When demonstrating FareBot, many people are surprised to learn that much of the data on their ORCA card is not encrypted or protected. This fact is published by ORCA, but is not commonly known and may be of concern to some people who would rather not broadcast where they’ve been to anyone who can brush against the outside of their wallet. Transit agencies across the board should do a better job explaining to riders how the cards work and what the privacy implications are.
- Using Public Data to Fight a War (ReadWriteWeb) — uncomfortable use of the data you put in public?
- CouchOne and Membase Merge — consolidation in the commercial NoSQL arena. The merger not only results in the joining of two companies, but also combines CouchDB, memcached and Membase technologies. Together, the new company, Couchbase, will offer an end-to-end database solution that can be stored on a single server or spread across hundreds of servers.
Microsoft and the Web, URL Library, Optimism, and NoSQL Instruction
- Dive Into 2010 (Mark Pilgrim) — Mark wrote a hugely popular guide to HTML5 which was available online and published by O’Reilly. 6% of visitors used some version of Internet Explorer. That is not a typo. The site works fine in Internet Explorer — the site practices what it preaches, and the live examples use a variety of fallbacks for legacy browsers — so this is entirely due to the subject matter. Microsoft has completely lost the web development community.
- google-url — the Google URL-parsing library, designed to be embeddable.
- Reasons to be Cheerful (Charlie Stross) — if all we ever do is gripe about ways in which the world is not perfect, we will make ourselves miserable and fail to appreciate ways in which things are getting better. Important.
- NoSQL Tapes — videos of lectures on NoSQL topics. (via Hacker News)
MySQL as NoSQL, Handmade SLR, Mac App Store, and Datamining Privacy Workshop
- Using MySQL as NoSQL — 750,000+ qps on a commodity MySQL/InnoDB 5.1 server from remote web clients.
- Making an SLR Camera from Scratch — amazing piece of hardware devotion. (via hackaday.com)
- Mac App Store Guidelines — Apple announce an app store for the Macintosh, similar to its app store for iPhones and iPads. “Mac App” no longer means generic “program”, it has a new and specific meaning, a program that must be installed through the App Store and which has limited functionality (only one can run at a time, it’s full-screen, etc.). The list of guidelines for what kinds of programs you can’t sell through the App Store is interesting. Many have good reasons to exist, but “It creates a store inside itself for selling or distributing other software (i.e., an audio plug-in store in an audio app)” is pure greed. Some are afeared that the next step is to make the App Store the only way to install apps on a Mac, a move that would drive me away. It would be a sad day for Mac-lovers if Microsoft were to be the more open solution than Apple. cf the Owner’s Manifesto.
- Privacy Aspects of Data Mining — CFP for an IEEE workshop in December. (via jschneider on Twitter)
Bad Game Mechanics, Under NoSQL Covers, the LAN of Things, and the Smithsonian Commons
- Pwned: Gamification and its Discontents (Slideshare) — hear, hear! Video games are not fun because they’re video games, but if and only if they are well-designed. Just adding something from games isn’t a guarantee for fun. (via jameshome on Twitter)
- Redis Under the Hood — explanation of the insides and mechanisms of this popular distributed key-value store. (via tlockney on delicious)
- The LAN of Things (Mike Kuniavsky) — Before we can have an Internet of Things, we will need to have a LAN of things.[...] Most of the utility of a LAN came from its local functionality. Thus, before we can build a useful (from a user perspective) Internet of Things, we need to learn to build useful LANs of Things. [...] I think it’s important to start thinking about what the highly localized uses of sparsely distributed technology can be. What can we do when there are only a couple of things with RFIDs in our house? What totally great service can be built on having two light switches that report their telemetry in the house? What totally valuable information can you tell me if I only wear my motion sensor every once in a while? Love it. (via Matt Jones on Delicious)
- Mike Edson’s Talk at Powerhouse Museum — the Director of Web and New Media Technology at the Smithsonian is smart, articulate, and trying to do something cool with the Smithsonian Commons prototype. (via sebchan on Twitter)
Storage, MapReduce and Query are ushering in data-driven products and services.
We're at the beginning of a revolution in data-driven products and services, driven by a software stack that enables big data processing on commodity hardware. Learn about the SMAQ stack, and where today's big data tools fit in.
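As a rough illustration of the MapReduce layer named in the SMAQ stack, here is a toy, single-process word count. It is a sketch of the programming model only; the function names and sample documents are made up for this example, and real systems distribute each phase across many machines.

```python
from collections import defaultdict

def map_phase(doc):
    """Emit a (word, 1) pair for every word in one document."""
    for word in doc.lower().split():
        yield word, 1

def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values seen for one key into a single result."""
    return key, sum(values)

# Hypothetical input documents.
docs = ["big data on commodity hardware", "big data tools"]

pairs = [pair for doc in docs for pair in map_phase(doc)]
counts = dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
print(counts)
```

The map and reduce functions know nothing about each other or about where the data lives, which is what lets a framework run them in parallel on commodity hardware.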
Facebook Bank, New in NoSQL, Twitter Numbers, and Open Source EEG Driver
- ASB Bank’s Facebook Virtual Branch — the world’s first Facebook branch of a bank, where you can live chat with tellers. (via Vaughn Davis)
- SciDB — GPLv3 NoSQL database. In addition to being multi-dimensional and offering array based scaling from megabytes to petabytes and running on tens of thousands of clustered nodes, SciDB will be write once read many, allow bulk load rather than single row insert, provide parallel computation, be designed for automatic rather than manual administration, and work with R, Matlab, IDL, C++ and Python. (that from The Register) (via jsteeleeditor on Twitter)
- Twitter By The Numbers (Raffi Krikorian) — given to answer the question “what’s so hard about delivering 140 characters?”. They hit a peak of 3283 inbound tweets/second. Every time Lady Gaga tweets, 6.1M people have to get it. (via Alex Russell)
- EmoKit — an open source driver to the $300 Emotiv EPOC EEG headset. (via BoingBoing)
Community Deconstructed, Sparklines Explained, NoSQL Navigated, and Foxconn Surveyed
- Open Source Community Types (Simon Phipps) — draws a distinction between extenders and deployers to take away the “who do you mean?” confusion that comes with the term “community”.
- Sparklines — Tufte’s coverage of sparkline graphs in Beautiful Evidence. (via Hacker News)
- Why NoSQL Matters (Heroku blog) — a very nice precis of the use cases for various NoSQL systems. Frequently-written, rarely read statistical data (for example, a web hit counter) should use an in-memory key/value store like Redis, or an update-in-place document store like MongoDB. I’m sure there are as many opinions as there are people, but I’d welcome a “if you want to do X, look at Y” guide to the NoSQL space. If you know of such a beast, please leave pointers in the comments. Thanks!
- The Man Who Makes Your iPhone (BusinessWeek) — a fascinating survey of Foxconn’s CEO, history, operations, culture, and plans. This line resonated for me: “I never think I am successful,” he says. “If I am successful, then I should be retired. If I am not retired, then that means I should still be working hard, keeping the company running.”
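The web hit counter use case from the Heroku entry above (frequently written, rarely read) can be sketched as follows. This is a minimal in-memory stand-in, not a Redis client: the `HitCounter` class and its method names are hypothetical, and a real deployment would replace the `Counter` with the store's own atomic increment.

```python
from collections import Counter

class HitCounter:
    """Hypothetical in-memory stand-in for a key/value store's increment."""

    def __init__(self):
        self._hits = Counter()

    def record(self, path):
        """Cheap write path: bump the counter for one page view."""
        self._hits[path] += 1
        return self._hits[path]

    def read(self, path):
        """Rare read path: return the current count (0 if never seen)."""
        return self._hits[path]

counter = HitCounter()
for path in ["/", "/about", "/", "/"]:
    counter.record(path)

print(counter.read("/"), counter.read("/about"), counter.read("/missing"))
```

The point of the pattern is that every page view is a tiny write with no read-before-write round trip, which suits an in-memory key/value store.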
Thumb Drives and the Cloud, FCC APIs, Mining on GFS, Check Your Prose with Scribe
- CloudUSB — a USB key containing your operating environment and your data + a protected folder so nobody can access your data, even if you lose the key + a backup program which keeps a copy of your data on an online disk, with double password protection. (via ferrouswheel on Twitter)
- FCC APIs — for spectrum licenses, consumer broadband tests, census block search, and more. (via rjweeks70 on Twitter)
- Sibyl: A system for large scale machine learning (PDF) — paper from Google researchers on how to build machine learning on top of a system designed for batch processing. (via Greg Linden)
- The Surprisingness of What We Say About Ourselves (BERG London) — I made a chart of word-by-word surprisingness: given the statement so far, could Scribe predict what would come next?
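One common way to put a number on word-by-word "surprisingness" is surprisal: the negative log probability of the next word given what came before. The sketch below uses a simple bigram model over a made-up corpus. It is an assumption-laden illustration and not a description of how Scribe or the BERG chart actually works.

```python
import math
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count which words follow each word in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        following[prev][nxt] += 1
    return following

def surprisal(following, prev, nxt):
    """Bits of surprise at seeing nxt after prev; infinite if never seen."""
    counts = following.get(prev)
    if not counts or counts[nxt] == 0:
        return math.inf
    return -math.log2(counts[nxt] / sum(counts.values()))

# Made-up training corpus of self-descriptions.
corpus = "i am a writer . i am a designer . i am happy ."
model = train_bigrams(corpus)

for prev, nxt in [("i", "am"), ("am", "a"), ("am", "happy"), ("am", "sad")]:
    print(prev, nxt, surprisal(model, prev, nxt))
```

Words the model always sees in a given position score zero bits, rarer continuations score higher, and continuations it has never seen are treated as infinitely surprising.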