ENTRIES TAGGED "scale"
Government Dashboard, Science Code Errors, Scaling Online Games, Information Theory
- Track DC — informative drill-down report from the Washington, DC government about its different departments. (via Sunlight Labs blog)
- Errors in Scientific Software — a 1994 study of scientific software that found inconsistent interfaces (1 in 7 for Fortran, 1 in 37 for C) and arithmetic so poorly handled that accuracy fell from six significant figures in the data to one significant figure in the result. (via “If you’re going to do good science, release the computer code too” in the Guardian)
- How Farmville Scales — 75M players/month (28M/day), a quarter of disk activity is writes, 50% higher load spikes, 3G/s of traffic between Farmville and Facebook at peak, LAMP stack, Nagios + Munin + Puppet. (via Hacker News)
- Mathematical Philology — when two manuscripts of the same text differ, which is correct? This PLoS ONE paper looked at all such discrepancies in Lucretius’s De Rerum Natura and found that the traditional principle of choosing the more difficult reading (on the grounds that errors come from copyists unconsciously simplifying) has a strong information-theoretic justification. It’s interesting to see this less than a week after an MIT Technology Review article on quantum teleportation remarked, “There is a growing sense that the properties of the universe are best described not by the laws that govern matter but by the laws that govern information.”
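The Errors in Scientific Software entry describes significant figures evaporating between the data and the result. The study's own examples aren't reproduced here; as a hedged illustration of one common way this happens, the sketch below compares a naive one-pass variance formula, which subtracts two huge and nearly equal totals, with Welford's running algorithm. The function names and the sample data are our own, chosen only for illustration.

```python
def naive_variance(xs):
    """Population variance via "mean of squares minus square of mean".

    Algebraically correct, but it subtracts two large, nearly equal
    floating-point totals, so most significant figures can cancel away.
    """
    n = len(xs)
    s = sum(xs)
    sq = sum(x * x for x in xs)
    return (sq - s * s / n) / n


def welford_variance(xs):
    """Population variance via Welford's running (one-pass) algorithm.

    Each step works with the deviation from the running mean, so it never
    forms the huge sums that the naive formula has to subtract.
    """
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for k, x in enumerate(xs, start=1):
        delta = x - mean
        mean += delta / k
        m2 += delta * (x - mean)
    return m2 / len(xs)


if __name__ == "__main__":
    # Small spread on top of a large offset: the true variance is 22.5.
    data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
    print(naive_variance(data))    # nowhere near 22.5: the digits cancelled
    print(welford_variance(data))  # 22.5
```

Both functions agree on well-scaled inputs; they only diverge when the values are large relative to their spread, which is exactly the situation where a careless formula silently throws precision away.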
Code for Speed, Wooden Locks, Font Design, and a Java Distributed Data Store
- Why Git Is So Fast — interesting mailing list post about the problems that the JGit folks had when they tried to make their Java version of Git go faster. “Higher level languages hide enough of the machine that we can’t make all of these optimizations.” A reminder that you must know and control the systems you’re running on if you want to get great performance. (via Hacker News)
- Wooden Combination Lock — you’ll easily understand how combination locks work with this fine piece of crafty construction work.
- From Moleskine to Market — how a leading font designer designs fonts. Fascinating, and beautiful, and it makes me covet his skills.
- Terrastore — open source distributed document store, HTTP accessible, data and queries are distributed, built on Terracotta, which is built on ehcache (update: Terracotta has an ehcache plugin, but isn’t built on ehcache). A NoSQL database built on Java tools that serious Java developers respect, the first such one that I’ve noticed (update: I brain-farted: neo4j was definitely on my radar). Notice that all the interesting work going on in the NoSQL arena is happening in open source projects.
Trading Systems, Streaming iTunes, Scheduling App, Crowdsourcing Lessons
- Trading Shares in Milliseconds (Technology Review) — With the rise of automation, the bulk of U.S. stock trading has moved from the once-crowded floor of Manhattan’s New York Stock Exchange (NYSE) to silent server farms run by exchanges and broker-dealers across the country: the proportion of all trades that the NYSE handles has shrunk from 80 percent in 2005 to 40 percent today. Trading is now essentially a virtual art, and its practitioners put such a premium on speed that NASDAQ has considered issuing equal 100-foot lengths of cable to the brokers who send orders to its exchange servers. (via Hacker News)
- Stream iTunes Over SSH — short script that lets you tunnel iTunes from one machine to another over SSH (by default iTunes only shares on the local network).
- Doodle — simple way to schedule a common meeting time. (via joshua on Delicious)
- Crowdsourcing — Simon Willison’s thoughtful “lessons learned” from his crowdsourcing projects at the Guardian. Crowdsourcing is not as simple as “give them a wiki and they will fill it” (this is related to the failed “everyone in the world wants to work on my broken payroll system” theory of open source), and Simon explains some of the subtleties: “The reviewing experience the first time round was actually quite lonely. We deliberately avoided showing people how others had marked each page because we didn’t want to bias the results. Unfortunately this meant the site felt like a bit of a ghost town, even when hundreds of other people were actively reviewing things at the same time. For the new version, we tried to provide a much better feeling of activity around the site. We added “top reviewer” tables to every assignment, MP and political party as well as a “most active reviewers in the past 48 hours” table on the homepage (this feature was added to the first project several days too late). User profile pages got a lot more attention, with more of a feel that users were collecting their favourite pages in to tag buckets within their profile.”
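Simon’s “most active reviewers in the past 48 hours” table boils down to a count over a sliding time window. A minimal sketch, assuming reviews are available as hypothetical (reviewer, timestamp) pairs; the function name, the tuple schema, and the default limit are illustrative, not the Guardian’s actual code:

```python
from collections import Counter
from datetime import datetime, timedelta


def most_active_reviewers(reviews, now, window=timedelta(hours=48), limit=5):
    """Return up to `limit` (reviewer, count) pairs, busiest first.

    `reviews` is an iterable of (reviewer_name, reviewed_at) tuples, a
    hypothetical schema. Only reviews inside the trailing `window` ending
    at `now` are counted; ties keep the order reviewers first appear in.
    """
    cutoff = now - window
    counts = Counter(name for name, when in reviews if cutoff < when <= now)
    return counts.most_common(limit)


if __name__ == "__main__":
    now = datetime(2009, 12, 1, 12, 0)
    reviews = [
        ("alice", now - timedelta(hours=1)),
        ("bob", now - timedelta(hours=3)),
        ("alice", now - timedelta(hours=10)),
        ("carol", now - timedelta(hours=60)),  # outside the 48-hour window
    ]
    print(most_active_reviewers(reviews, now))  # [('alice', 2), ('bob', 1)]
```

The per-assignment, per-MP, and per-party “top reviewer” tables would be the same count without the time filter, grouped by whichever key the page is about.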
Cognitive Surplus, Scaling, Chinese Blogs, CS Education for Growth
- Eight Billion Minutes Spent on Facebook Daily — you weren’t using that cognitive surplus, were you?
- How We Made GitHub Fast — high-level summary is that the new “fast, good, cheap–pick any two” is “fast, new, easy–pick any two”. (via Simon Willison)
- Isaac Mao, China, 40M Blogs and Counting — Today, there are 40 million bloggers in China and around 200 million blogs, according to Mao. Some blogs survive only a few days before being shut down by authorities. More than 80% of people in China don’t know that the internet is censored in their country. When riots broke out in Xinjiang province this year, the authorities shut down internet access for the whole region. No one could get online.
- Congress Endorses CS Education as Driver of Economic Growth — compare to the Economist’s “Optimism that tech firms will help kick-start economic recovery is overdone.”
Network File System, Internet Use, Lovelace Comic, Search User Interfaces
- Ceph — open source distributed filesystem from UCSC. Ceph is built from the ground up to seamlessly and gracefully scale from gigabytes to petabytes and beyond. Scalability is considered in terms of workload as well as total storage. Ceph is designed to handle workloads in which tens of thousands of clients or more simultaneously access the same file, or write to the same directory: usage scenarios that bring typical enterprise storage systems to their knees. (via joshua on delicious)
- Daily Internet Activities, 2000-2009 — Pew Charitable Trust’s Internet usage survey. We’ve finally broken 50% of Americans using the Internet daily. Twitter is almost a rounding error. (via dhowell on Twitter)
- The Thrilling Adventures of Lovelace and Babbage — fantastic comic, with end-notes that explain how Babbage and Lovelace’s lives and works are reflected in the action of the comic. (via suw on Twitter)
- Search User Interfaces — full text of this book about the different (successful and un-) interfaces to search. (via sebchan on Twitter)
Guest blogger Scott Ruthfield is a Program Committee member of the O’Reilly Velocity: Web Performance & Operations Conference. Web Operations is not for the casual observer: it’s for a particular kind of adrenaline junkie that’s motivated by graphs and servers spinning out of control. Jumping in, on-your-feet analysis, and experience-based-experimentation are all part of solving new problems caused by unexpected user and machine behavior,…
(tag cloud created from Velocity session & speaker information using wordle.net) My favorite interview question to ask candidates is: “What happens when you type www.(amazon|google|yahoo).com in your browser and press return?” While the actual process of serving and rendering a page takes seconds to complete, describing it in real detail can take an hour. A good answer spans every part…
In a post looking at the future interplay of content, gatekeepers and consumers, David Nygren touches on a key issue for large book publishers: scale. Mega Publishing Conglomerates Go Bye-Bye: Or at least they will look very different than they do today. Their scale is not sustainable. The partial implosion we saw in the publishing industry last week was just…