- CloudUSB — a USB key containing your operating environment and your data + a protected folder so nobody can access your data, even if you lose the key + a backup program which keeps a copy of your data on an online disk, with double password protection. (via ferrouswheel on Twitter)
- FCC APIs — for spectrum licenses, consumer broadband tests, census block search, and more. (via rjweeks70 on Twitter)
- Sibyl: A system for large scale machine learning (PDF) — paper from Google researchers on how to build machine learning on top of a system designed for batch processing. (via Greg Linden)
- The Surprisingness of What We Say About Ourselves (BERG London) — I made a chart of word-by-word surprisingness: given the statement so far, could Scribe predict what would come next?
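The "surprisingness" in the last item can be made concrete as surprisal: the negative log-probability of a word given what came before it. The sketch below is a toy bigram model, not a description of how Scribe works (the post does not say); the corpus and the names `train_bigrams` and `surprisal` are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train_bigrams(words):
    """Count how often each word follows each other word."""
    follows = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows


def surprisal(follows, prev, nxt):
    """Surprisal in bits: log2(1 / P(nxt | prev)). Unseen pairs count as infinite."""
    total = sum(follows[prev].values())
    count = follows[prev][nxt]
    if total == 0 or count == 0:
        return math.inf
    return math.log2(total / count)


# Hypothetical toy corpus; a real model would train on far more text.
corpus = "i am a developer i am a designer i am happy".split()
model = train_bigrams(corpus)

sentence = "i am a designer".split()
for prev, nxt in zip(sentence, sentence[1:]):
    print(f"{prev} -> {nxt}: {surprisal(model, prev, nxt):.2f} bits")
```

A word that always follows its predecessor in the training text costs zero bits, while a pair never seen in training is treated as infinitely surprising. A practical model would smooth those unseen cases instead.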
ENTRIES TAGGED "scale"
Thumb Drives and the Cloud, FCC APIs, Mining on GFS, Check Your Prose with Scribe
Non-Profits, UK Legislation, Mobile Web Variation, and Scaling
- How to Raise Funds for Non-Profits (Joi Ito) — One organization sent a message to all of their donors during the Haiti crisis asking them to give to an NGO that they had vetted. They didn’t ask for any money for themselves. This had a hugely positive effect and the donors’ trust in the group increased. Wallets aren’t zero sum.
- legislation.gov.uk — very elegant legislation system for the UK. Check out the annual analysis, for example. (via rchards on Twitter)
- The Great WebKit Comparison Table — So far I’ve tested 14 different mobile WebKits, and they are all slightly different. You can find the details below. (via Andrew Savikas)
- Node and Scaling in the Small vs Scaling in the Large (al3x) — In a system of no significant scale, basically anything works. The power of today’s hardware is such that, for example, you can build a web application that supports thousands of users using one of the slowest available programming languages, brutally inefficient datastore access and storage patterns, zero caching, no sensible distribution of work, no attention to locality, etc. etc. Basically, you can apply every available anti-pattern and still come out the other end with a workable system, simply because the hardware can move faster than your bad decision-making.
- Tondo Interactive Table to Analyze Medical Errors (MedGadget) — use of a multitouch table to help clinical staff identify and track medical errors. (via IVLINE on Twitter)
- Steve Huffman Lessons Learned While at Reddit (SlideShare) — uptime and scale. It’s interesting that most everyone reinvents tuples as a way to scale databases, hence the popularity of NoSQL systems.
- Hernando de Soto: Shadow Economies — de Soto is an economist, and this ends up talking about the need for transparency and open data. As long as you don’t know who owns the greatest amount of your assets, there’s no info as to who owns what, who is related to what, you have a shadow economy. We live in one, and it has as a characteristic a permanent credit crunch. We know more about it than you do. Credit crunch is where you don’t know who you’d be lending to, so you don’t lend. It’s permanent, we live with it, and now you’re going to have to learn to live with it too, because until you know who is solvent how can you give anybody credit? You’re flying blind. (via Jon Udell)
Fair Use Economy, Deconstituted Appliances, 3D Vision, Redis for Fun and Profit
- Fair Use in the US Economy (PDF) — prepared by the IT lobby in the US, it’s the counterpart to Big ©’s fictitious billions of dollars of losses due to file sharing. Take each with a grain of salt, but this is interesting because it talks about the industries and businesses that the fair use laws make possible.
- Disassembled Household Appliances — neat photos of the pieces in common equipment like waffle irons, sandwich makers, can openers, etc. (via evilmadscientist)
- GelSight — gel block on a sheet of glass, lit from below with lights and then scanned with cameras, lets you easily capture 3D qualities of the objects pressed into it. Very cool demo–you can see fingerprints, pulse, and even make out designs on a $100 bill.
- Redis Tutorial (Simon Willison) — Redis is a very fast collection of useful behaviours wrapped around a distributed key-value store. You get locks, IDs, counters, sets, lists, queues, replication, and more.
Government Dashboard, Science Code Errors, Scaling Online Games, Information Theory
- Track DC — informative drill-down report from Washington DC government about the different departments. (via Sunlight Labs blog)
- Errors in Scientific Software — a 1994 study of scientific software that found inconsistent interfaces (1 in 7 for Fortran, 1 in 37 for C) and poor use of arithmetic such that significant figures declined from 6sf in the data to 1sf in the result. (via “If you’re going to do good science, release the computer code too” in the Guardian)
- How Farmville Scales — 75M players/month (28M/day), 1/4 of disk activity is writes, 50% higher load spikes, 3G/s of traffic goes between Farmville and Facebook at peak, LAMP stack, nagios+munin+puppet. (via Hacker News)
- Mathematical Philology — when two manuscripts of the same text differ, which is correct? This PLoSONE paper looked at all such discrepancies in Lucretius’s De Rerum Natura and found that the traditional principle of choosing the more difficult reading (on the grounds that errors are from humans unconsciously simplifying) has a strong information theory justification for it. Interesting to see this less than a week after an MIT Technology Review article on quantum teleportation remarked, There is a growing sense that the properties of the universe are best described not by the laws that govern matter but by the laws that govern information.
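The loss of significant figures described in the scientific-software item above often comes from subtracting two nearly equal numbers. The study's own examples are not reproduced here; what follows is a generic sketch with hypothetical function names, showing how an algebraically equivalent rewrite can keep the digits that a literal translation of the formula throws away.

```python
import math


def naive(x):
    """(1 - cos x) / x^2, computed literally. Cancels badly for small x."""
    return (1.0 - math.cos(x)) / (x * x)


def stable(x):
    """Same quantity via the identity 1 - cos x = 2 sin^2(x / 2)."""
    s = math.sin(x / 2.0)
    return 2.0 * s * s / (x * x)


# The true value approaches 0.5 as x shrinks.
for x in (1e-1, 1e-4, 1e-9):
    print(f"x={x:g}  naive={naive(x):.6f}  stable={stable(x):.6f}")
```

For very small x the naive version loses every digit, because cos(x) rounds to exactly 1.0 in double precision and the subtraction returns zero. The rewritten form avoids the subtraction and stays close to 0.5.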
Code for Speed, Wooden Locks, Font Design, and a Java Distributed Data Store
- Why Git Is So Fast — interesting mailing list post about the problems that the JGit folks had when they tried to make their Java version of Git go faster. Higher level languages hide enough of the machine that we can’t make all of these optimizations. A reminder that you must know and control the systems you’re running on if you want to get great performance. (via Hacker News)
- Wooden Combination Lock — you’ll easily understand how combination locks work with this fine piece of crafty construction work.
- From Moleskine to Market — how a leading font designer designs fonts. Fascinating, and beautiful, and it makes me covet his skills.
- Terrastore — open source distributed document store, HTTP accessible, data and queries are distributed, built on Terracotta which is built on ehcache (updated: Terracotta has an ehcache plugin, but isn’t built on ehcache). A NoSQL database built on Java tools that serious Java developers respect, the first such one that I’ve noticed (update: I brain-farted: neo4j was definitely on my radar). Notice that all the interesting work going on in the NoSQL arena is happening in open source projects.
Trading Systems, Streaming iTunes, Scheduling App, Crowdsourcing Lessons
- Trading Shares in Milliseconds (Technology Review) — With the rise of automation, the bulk of U.S. stock trading has moved from the once-crowded floor of Manhattan’s New York Stock Exchange (NYSE) to silent server farms run by exchanges and broker-dealers across the country: the proportion of all trades that the NYSE handles has shrunk from 80 percent in 2005 to 40 percent today. Trading is now essentially a virtual art, and its practitioners put such a premium on speed that NASDAQ has considered issuing equal 100-foot lengths of cable to the brokers who send orders to its exchange servers. (via Hacker News)
- Stream iTunes Over SSH — short script that lets you tunnel iTunes from one machine to another over SSH (by default iTunes only shares on the local network).
- Doodle — simple way to schedule a common meeting time. (via joshua on Delicious)
- Crowdsourcing — Simon Willison’s thoughtful “lessons learned” from his crowdsourcing projects at the Guardian. Crowdsourcing is not as simple as “give them a wiki and they will fill it” (this is related to the failed “everyone in the world wants to work on my broken payroll system” theory of open source), and Simon explains some of the subtleties. The reviewing experience the first time round was actually quite lonely. We deliberately avoided showing people how others had marked each page because we didn’t want to bias the results. Unfortunately this meant the site felt like a bit of a ghost town, even when hundreds of other people were actively reviewing things at the same time. For the new version, we tried to provide a much better feeling of activity around the site. We added “top reviewer” tables to every assignment, MP and political party as well as a “most active reviewers in the past 48 hours” table on the homepage (this feature was added to the first project several days too late). User profile pages got a lot more attention, with more of a feel that users were collecting their favourite pages in to tag buckets within their profile.
Cognitive Surplus, Scaling, Chinese Blogs, CS Education for Growth
- Eight Billion Minutes Spent on Facebook Daily — you weren’t using that cognitive surplus, were you?
- How We Made Github Fast — high-level summary is that the new “fast, good, cheap–pick any two” is “fast, new, easy–pick any two”. (via Simon Willison)
- Isaac Mao, China, 40M Blogs and Counting — Today, there are 40 million bloggers in China and around 200 million blogs, according to Mao. Some blogs survive only a few days before being shut down by authorities. More than 80% of people in China don’t know that the internet is censored in their country. When riots broke out in Xinjiang province this year, the authorities shut down internet access for the whole region. No one could get online.
- Congress Endorses CS Education as Driver of Economic Growth — compare to the Economist’s “Optimism that tech firms will help kick-start economic recovery is overdone”.
Network File System, Internet Use, Lovelace Comic, Search User Interfaces
- Ceph — open source distributed filesystem from UCSC. Ceph is built from the ground up to seamlessly and gracefully scale from gigabytes to petabytes and beyond. Scalability is considered in terms of workload as well as total storage. Ceph is designed to handle workloads in which tens of thousands of clients or more simultaneously access the same file, or write to the same directory–usage scenarios that bring typical enterprise storage systems to their knees. (via joshua on Delicious)
- Daily Internet Activities, 2000-2009 — Pew Charitable Trusts’ Internet usage survey. We’ve finally broken 50% of Americans using the Internet daily. Twitter is almost a rounding error. (via dhowell on Twitter)
- The Thrilling Adventures of Lovelace and Babbage — fantastic comic, with end-notes that explain how Babbage and Lovelace’s lives and works are reflected in the action of the comic. (via suw on Twitter)
- Search User Interfaces — full text of this book about the different (successful and un-) interfaces to search. (via sebchan on Twitter)
Guest blogger Scott Ruthfield is a Program Committee member of the O’Reilly Velocity: Web Performance & Operations Conference. Web Operations is not for the casual observer: it’s for a particular kind of adrenaline junkie that’s motivated by graphs and servers spinning out of control. Jumping in, on-your-feet analysis, and experience-based-experimentation are all part of solving new problems caused by unexpected user and machine behavior,…