- Digital Subscription Prices — the NY Times in context. Aie.
- Trinity — Microsoft Research graph database. (via Hacker News)
- Data Science Toolkit — prepackaged EC2 image of most useful data tools. (via Pete Warden)
- Snappy — Google’s open-sourced compression library, as used in BigTable and MapReduce. The emphasis is on speed at the cost of compression ratio: output files are 20-100% bigger than zlib’s.
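Snappy’s own bindings aren’t part of Python’s standard library, so this sketch uses zlib’s compression levels as a stand-in to illustrate the same speed-versus-size tradeoff. A lower level usually runs faster and produces larger output. This is not Snappy, and the sample payload is made up for illustration.

```python
import time
import zlib


def measure(data: bytes, level: int) -> tuple[int, float]:
    """Compress `data` at the given zlib level; return (compressed size, seconds)."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    # Round-trip check: compression is lossless at every level.
    assert zlib.decompress(compressed) == data
    return len(compressed), elapsed


if __name__ == "__main__":
    # Hypothetical sample payload: repetitive, log-like lines.
    sample = b"".join(b"GET /item/%d HTTP/1.1 200\n" % (i % 500) for i in range(50_000))
    for level in (1, 6, 9):
        size, secs = measure(sample, level)
        print(f"level {level}: {size} bytes in {secs * 1000:.1f} ms")
```

Snappy is designed to sit further toward the fast end than even zlib’s fastest level, which is where the 20-100% size penalty comes from.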
ENTRIES TAGGED "databases"
Key themes from MySQL 2011. Plus, what you sacrifice when you use a NoSQL solution.
Two dominant themes emerged at MySQL 2011: Mix your relational database with less formal solutions and move to the cloud. This may actually be the best environment MySQL has ever enjoyed.
Memcached lets your servers spend time on the important stuff.
Memcached is one of the linchpin technologies that hold the modern Internet together, but do you know what it actually does? Brian Aker offers a peek under the hood.
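One common way applications use memcached is the cache-aside pattern: check the cache first, and only on a miss do the expensive work and store the result with an expiry. Here is a minimal sketch, assuming a dictionary-backed class as a stand-in for a real memcached client; `InMemoryCache` and `cached_fetch` are hypothetical names, not part of any memcached library.

```python
import time
from typing import Any, Callable, Optional


class InMemoryCache:
    """Toy stand-in for a memcached client: get/set with a per-key TTL.

    Assumption: a real deployment would talk to memcached through a client
    library; this class only mimics its get/set-with-expiry behaviour.
    """

    def __init__(self) -> None:
        self._store: dict[str, tuple[Any, float]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Expired: drop the entry and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._store[key] = (value, time.monotonic() + ttl)


def cached_fetch(cache: InMemoryCache, key: str,
                 loader: Callable[[str], Any], ttl: float = 60.0) -> Any:
    """Cache-aside read: try the cache, fall back to `loader` on a miss.

    Note: a loader that returns None is never cached, since None means "miss".
    """
    value = cache.get(key)
    if value is None:
        value = loader(key)          # the expensive call, e.g. a database query
        cache.set(key, value, ttl)   # populate so later reads skip the loader
    return value
```

The point of the pattern is the one in the headline: repeated reads are served from memory, so the database only does work on misses and after expiry.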
CouchDB proves a good fit for a project with technical limits.
A new project in Zambia is trying to integrate supervisors, clinics, and community healthcare workers into a system that can improve patient service and provide more data. In this interview, Cory Zue explains how CouchDB is playing a role.
Digital Subscriptions, Graph Database, Data Science, and High Speed Compression
Amazon buys itself a lawsuit, a setting Sun.com, and the new name in databases
What's in a name? For Amazon's new Appstore, it was a lawsuit. For Oracle's sun.com domain, big money. And would MySQL by any other name smell as sweet?
Job Titles, Android Copyright, Error Hosting, and Drizzle Ships
- Titles and Promotions (Ben Horowitz) — “Andreessen argues that people ask for many things from a company: salary, bonus, stock options, span of control, and titles. Of those, title is by far the cheapest, so it makes sense to give the highest titles possible. The hierarchy should have Presidents, Chiefs, and Senior Executive Vice Presidents. If it makes people feel better, let them feel better. Titles cost nothing. Better yet, when competing for new employees with other companies, using Andreessen’s method you can always outbid the competition in at least one dimension.”
- Android’s Linux Copyrights Issue — Google copied 2.5 megabytes of code from more than 700 Linux kernel header files with a homemade program that drops source code comments and some other elements, and daringly claims (in a notice at the start of each generated file) that the extracted material constitutes “no copyrightable information.”
- errbit — self-hosted error catcher, an open source alternative to HopToad. (via Glen Barnes)
- Drizzle: From What If to What Has (Brian Aker) — fantastic retrospective of lessons learned in shipping Drizzle. “We have fixed all the warnings in Drizzle. This is something that isn’t sexy work, and the only way it is justified is because cleaning up warnings fixes bugs. If you are starting a new code base let me implore upon you the necessity of doing this from the beginning.” They sweat the dull stuff that matters, not just the shiny, sexy featureitis.
Data acquisition for a site like CrunchBase may not carry the costs some assume.
The data acquisition process should be increasingly automatic, and so increasingly cheap. I'm hoping for a world where information producers are paid for extracting value from that data.
Faces in R, Open Source Web Analytics, Small File Store, Building Mapper
- R Library for Chernoff Faces — the faces function represents the rows of a data matrix as faces, and plot.faces plots faces into a scatterplot. An interesting, emotional way to visualize data, used to good effect (though not with this library) by BERG in Schooloscope. (via the tutorial at Flowing Data)
- Piwik — GPLed web analytics package.
- Pomegranate — a data store for billions of tiny files. (via the High Scalability blog interview with the creator of Pomegranate)
- New Backpack Makes 3D Maps of Buildings — a backpack-mounted, indoor equivalent of the Google Maps cars, from Berkeley researchers.
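The Chernoff faces entry above turns each row of a data matrix into a face. Drawing is the library’s job; the core step is mapping each column to a facial feature. Here is a minimal Python sketch of just that mapping, assuming min-max scaling per column and an arbitrary feature order. `FEATURES` and `to_face_params` are hypothetical names, and the R package may scale and assign features differently.

```python
from typing import Dict, List, Sequence

# Hypothetical assignment: which facial trait each data column drives, in order.
FEATURES = ["face_width", "eye_size", "mouth_curve", "brow_slant", "nose_length"]


def to_face_params(rows: Sequence[Sequence[float]]) -> List[Dict[str, float]]:
    """Map each data row to facial-feature parameters in [0, 1].

    Assumes a non-empty matrix. Each column is min-max scaled across all rows
    and assigned to one feature in FEATURES order; columns beyond the feature
    list are ignored.
    """
    n_cols = min(len(rows[0]), len(FEATURES))
    lows = [min(r[j] for r in rows) for j in range(n_cols)]
    highs = [max(r[j] for r in rows) for j in range(n_cols)]
    faces = []
    for row in rows:
        params = {}
        for j in range(n_cols):
            span = highs[j] - lows[j]
            # A constant column maps to the midpoint, so that feature looks neutral.
            params[FEATURES[j]] = 0.5 if span == 0 else (row[j] - lows[j]) / span
        faces.append(params)
    return faces
```

Each resulting dictionary describes one face. Rows that are close in the data produce similar-looking faces, which is what makes the technique readable at a glance.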
Revolutionaries, Sentiment, UX, and Data Warehouses
- Rules for Revolutionaries — Carl Malamud’s talk to the WWW2010 Conference. Video, slides, and text available.
- Self-Improving Bayesian Sentiment Analysis for Twitter — a how-I-did-it for a homegrown project to do sentiment analysis on Twitter.
- LUXr — the Lean User Experience Residency program. LUXr brings user experience and design services to early-stage teams in a lower-cost, more efficient way than traditional project-based consulting. The latest from Adaptive Path’s Janice Fraser.
- My Top Ten Assertions About Data Warehouses (CACM) — Michael Stonebraker’s take on the data warehouse world, and his predictions cut across a lot of our O’Reilly trends. “Assertion 5: ‘No knobs’ is the only thing that makes any sense. It is pretty clear that human operational costs dominate the cost of running a data warehouse. [...] Almost all DBMSs have 100 or more complicated tuning ‘knobs.’ This requires DBAs to be ‘4-star wizards’ and drives up operating costs. Obviously, the only thing that makes sense is to have a program that adjusts these knobs automatically. In other words, look for ‘no knobs’ as the only way to cut down DBA costs.” (via mikeolson on Twitter)
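The Twitter sentiment item above describes a homegrown Bayesian classifier. As a rough illustration of the general technique, not that project’s code, here is a multinomial naive Bayes classifier with add-one smoothing. Its `learn` method can be called again whenever new labelled tweets arrive. The class name and the whitespace tokenizer are assumptions.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSentiment:
    """Multinomial naive Bayes over whitespace tokens, with add-one smoothing.

    Training is incremental: each learn() call updates the counts, so the
    model keeps absorbing newly labelled examples.
    """

    def __init__(self) -> None:
        self.doc_counts: Counter = Counter()          # label -> number of documents
        self.word_counts = defaultdict(Counter)       # label -> word -> count
        self.vocab: set = set()

    @staticmethod
    def tokenize(text: str) -> list:
        return text.lower().split()

    def learn(self, text: str, label: str) -> None:
        words = self.tokenize(text)
        self.doc_counts[label] += 1
        self.word_counts[label].update(words)
        self.vocab.update(words)

    def classify(self, text: str) -> str:
        words = self.tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            # log P(label) + sum over words of log P(word | label), Laplace-smoothed.
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in words:
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Working in log space avoids underflow when multiplying many small per-word probabilities, and the +1 smoothing keeps an unseen word from zeroing out a label’s score.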