- Digital Subscription Prices — the NY Times in context. Aie.
- Trinity — Microsoft Research graph database. (via Hacker News)
- Data Science Toolkit — prepackaged EC2 image of most useful data tools. (via Pete Warden)
- Snappy — Google’s open-sourced compression library, as used in BigTable and MapReduce. Emphasis is on speed, at the cost of compression ratio: output is 20-100% bigger than zlib’s.
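The speed-versus-size trade-off in the Snappy item can be felt without installing anything. The sketch below is only a stand-in: it does not use Snappy at all, but compares zlib's fastest and strongest settings from the Python standard library on invented log-like data. The `compress_report` helper and the sample data are assumptions made for illustration.

```python
import time
import zlib


def compress_report(data: bytes, level: int) -> dict:
    """Compress data at the given zlib level and record size and wall time."""
    start = time.perf_counter()
    packed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    return {"level": level, "bytes": len(packed), "seconds": elapsed, "packed": packed}


# Invented, repetitive log-like sample data (an assumption for illustration).
sample = b"".join(
    b"row %d: status=ok latency=%dms\n" % (i, i % 97) for i in range(20000)
)

fast = compress_report(sample, 1)   # favors speed
small = compress_report(sample, 9)  # favors output size

for report in (fast, small):
    print(
        "level %d: %d -> %d bytes in %.4f s"
        % (report["level"], len(sample), report["bytes"], report["seconds"])
    )
```

Timings vary by machine, so none are quoted here; the point is only that a compressor can trade output size for time, which is the choice Snappy makes more aggressively than zlib.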
ENTRIES TAGGED "databases"
Memcached lets your servers spend time on the important stuff.
Memcached is one of the linchpin technologies that holds the modern Internet together, but do you know what it actually does? Brian Aker offers a peek under the hood.
CouchDB proves a good fit for a project with technical limits.
A new project in Zambia is trying to integrate supervisors, clinics, and community healthcare workers into a system that can improve patient service and provide more data. In this interview, Cory Zue explains how CouchDB is playing a role.
Digital Subscriptions, Graph Database, Data Science, and High Speed Compression
Amazon buys itself a lawsuit, a setting Sun.com, and the new name in databases
What's in a name? For Amazon's new Appstore, it was a lawsuit. For Oracle's sun.com domain, big money. And would MySQL by any other name smell as sweet?
Job Titles, Android Copyright, Error Hosting, and Drizzle Ships
- Titles and Promotions (Ben Horowitz) — Andreessen argues that people ask for many things from a company: salary, bonus, stock options, span of control, and titles. Of those, title is by far the cheapest, so it makes sense to give the highest titles possible. The hierarchy should have Presidents, Chiefs, and Senior Executive Vice Presidents. If it makes people feel better, let them feel better. Titles cost nothing. Better yet, when competing for new employees with other companies, using Andreessen’s method you can always outbid the competition in at least one dimension.
- Android’s Linux Copyrights Issue — Google copied 2.5 megabytes of code from more than 700 Linux kernel header files with a homemade program that drops source code comments and some other elements, and daringly claims (in a notice at the start of each generated file) that the extracted material constitutes “no copyrightable information.”
- errbit — self-hosted error catcher, an open source alternative to HopToad. (via Glen Barnes)
- Drizzle: From What If to What Has (Brian Aker) — fantastic retrospective of lessons learned in the shipping of Drizzle. “We have fixed all the warnings in Drizzle. This is something that isn’t sexy work, and the only way it is justified is because cleaning up warnings fixes bugs. If you are starting a new code base let me implore upon on you the necessity of doing this from the beginning.” They sweat the dull stuff that matters, not just the shiny sexy featureitis.
Data acquisition for a site like CrunchBase may not carry the costs some assume.
The data acquisition process should be increasingly automatic, and so increasingly cheap. I'm hoping for a world where information producers are paid for extracting value from that data.
Faces in R, Open Source Web Analytics, Small File Store, Building Mapper
- R Library for Chernoff Faces — faces represent the rows of a data matrix by faces. plot.faces plots faces into a scatterplot. Interesting emotional way to visualize data, which was used to good effect (though not with this library) by BERG in Schooloscope. (via the tutorial at Flowing Data)
- Piwik — GPLed web analytics package.
- Pomegranate — a data store for billions of tiny files. (via the High Scalability blog interview with the creator of Pomegranate)
- New Backpack Makes 3D Maps of Buildings — the backpack indoor equivalent of the Google Maps cars, from Berkeley researchers.
Revolutionaries, Sentiment, UX, and Data Warehouses
- Rules for Revolutionaries — Carl Malamud’s talk to the WWW2010 Conference. Video, slides, and text available.
- Self-Improving Bayesian Sentiment Analysis for Twitter — a how-I-did-it for a homegrown project to do sentiment analysis on Twitter.
- LUXr — the Lean User Experience Residency program. LUXr brings user experience and design services to early stage teams in a lower cost, more efficient way than traditional project-based consulting. The latest from Adaptive Path’s Janice Fraser.
- My Top Ten Assertions About Data Warehouses (CACM) — Michael Stonebraker’s take on the data warehouse world, and his predictions cut across a lot of our O’Reilly trends. Assertion 5: “No knobs” is the only thing that makes any sense. It is pretty clear that human operational costs dominate the cost of running a data warehouse. [...] Almost all DBMSs have 100 or more complicated tuning “knobs.” This requires DBAs to be “4-star wizards” and drives up operating costs. Obviously, the only thing that makes sense is to have a program that adjusts these knobs automatically. In other words, look for “no knobs” as the only way to cut down DBA costs. (via mikeolson on Twitter)
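For readers curious what the Bayesian sentiment analysis item in this list involves, here is a minimal naive Bayes sketch. It is not the linked project's code: the `NaiveBayesSentiment` class, the whitespace tokenizer, the add-one smoothing, and the four training tweets are all assumptions made for illustration.

```python
import math
from collections import Counter


class NaiveBayesSentiment:
    """Toy two-class naive Bayes classifier with add-one smoothing."""

    LABELS = ("pos", "neg")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, text, label):
        """Count the words of one labeled message."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def classify(self, text):
        """Return the label with the highest log-probability score."""
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in self.LABELS:
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Smoothed prior, so an untrained label never yields log(0).
            score = math.log((self.doc_counts[label] + 1) / (total_docs + len(self.LABELS)))
            for word in text.lower().split():
                # Smoothed likelihood of each word given the label.
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            scores[label] = score
        return max(scores, key=scores.get)


# Invented training tweets (assumptions for illustration).
model = NaiveBayesSentiment()
model.train("love this great phone", "pos")
model.train("what a wonderful happy day", "pos")
model.train("hate this awful service", "neg")
model.train("terrible sad broken day", "neg")

print(model.classify("great happy day"))       # pos
print(model.classify("awful broken service"))  # neg
```

One way such a classifier could become "self-improving" is to feed its own confident classifications back through `train`; whether the linked project does exactly that is not stated here.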
Delicious Graphs, Charities and Data, Climate Psychology, Data Structure Portability
- Delicious Links Clustered and Stacked (Matt Biddulph) — six years of his delicious links, k-means clustered by tag and graphed. The clusters are interesting, but I wonder whether Matt can identify significant life/work events by the spikes in the graph.
- Open Data and the Voluntary Sector (OKFN) — Open data will give charities new ways to find and share information on the need of their beneficiaries – who needs their services most and where they are located. The sharing of information will be key to this – it’s not just about using data that the government has opened up, but also opening your own data.
- Cognitive and Behavioral Challenges in Responding to Climate Change — At the deepest level, large scale environmental problems such as global warming threaten people’s sense of the continuity of life – what sociologist Anthony Giddens calls ontological security. Ignoring the obvious can, however, be a lot of work. Both the reasons for and process of denial are socially organized; that is to say, both cognition and denial are socially structured. Denial is socially organized because societies develop and reinforce a whole repertoire of techniques or “tools” for ignoring disturbing problems. Fascinating paper. (via Jez)
- Blueprints — provides a collection of interfaces and implementations to common, complex data structures. Blueprints contains a property graph model and its implementations for TinkerGraph, Neo4j, and SAIL. Also, it contains an object document model and implementations for TinkerDoc, CouchDB, and MongoDB. In short, Blueprints provides a one stop shop for implemented interfaces to help developers create software without being tied to particular underlying data management systems.
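The idea behind the Blueprints item, application code written against an interface rather than a particular store, can be sketched in a few lines. This is not the Blueprints API (Blueprints is a Java library); the `PropertyGraph` interface, the `InMemoryGraph` backend, and the `friends_of_friends` helper are hypothetical names invented for illustration.

```python
from abc import ABC, abstractmethod


class PropertyGraph(ABC):
    """Hypothetical backend-neutral interface (not the Blueprints API)."""

    @abstractmethod
    def add_vertex(self, vertex_id, **properties):
        """Create a vertex carrying key/value properties."""

    @abstractmethod
    def add_edge(self, out_id, in_id, label, **properties):
        """Create a labeled, directed edge carrying key/value properties."""

    @abstractmethod
    def out_neighbors(self, vertex_id, label):
        """Return ids reachable from vertex_id over edges with this label."""


class InMemoryGraph(PropertyGraph):
    """Toy backend; a class backed by a real graph store could implement the same methods."""

    def __init__(self):
        self.vertices = {}
        self.edges = []

    def add_vertex(self, vertex_id, **properties):
        self.vertices[vertex_id] = dict(properties)

    def add_edge(self, out_id, in_id, label, **properties):
        self.edges.append((out_id, in_id, label, dict(properties)))

    def out_neighbors(self, vertex_id, label):
        return [dst for src, dst, lbl, _ in self.edges
                if src == vertex_id and lbl == label]


def friends_of_friends(graph: PropertyGraph, start):
    """Application code: depends only on the interface, not the backend."""
    found = set()
    for friend in graph.out_neighbors(start, "knows"):
        found.update(graph.out_neighbors(friend, "knows"))
    found.discard(start)
    return sorted(found)


graph = InMemoryGraph()
for name in ("ann", "bob", "cat"):
    graph.add_vertex(name, kind="person")
graph.add_edge("ann", "bob", "knows", since=2009)
graph.add_edge("bob", "cat", "knows", since=2010)

print(friends_of_friends(graph, "ann"))  # ['cat']
```

Swapping the backend would mean writing another `PropertyGraph` subclass; `friends_of_friends` would not change.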