ENTRIES TAGGED "scale"
Data Sets, Data-driven Policy, Task Queues, and 8-Bit Browser
- DSPL: DataSet Publishing Language (Google Code) — a representation language for the data and metadata of datasets. Datasets described in this format can be processed by Google and visualized in the Google Public Data Explorer. XML metadata on CSV, geo-enabled, with linkable data. (via Michal Migurski on Delicious)
- Why is Evidence So Hard for Politicians — Ben Goldacre nails how politicians go about “evidence-based policy making”: So the Minister has cherry picked only the good findings, from only one report, while ignoring the peer-reviewed literature. Most crucially, he cherry-picks findings he likes whilst explicitly claiming that he is fairly citing the totality of the evidence from a thorough analysis. I can produce good evidence that I have a magical two-headed coin, if I simply disregard all the throws where it comes out tails.
- Celery: Distributed Task Queue — asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well. MIT-style licensed, written in Python, RabbitMQ is the recommended message broker. (via Joshua Schachter on Delicious)
- pixelfari — Safari hacked to look like it’s running on an 8-bit computer. This sense of playfulness with the medium is something I love about the best coders. They think “ha, wouldn’t it be funny if …” and then can make it happen.
Amazon as Vendor, Distributed Tasks, Evolutionary Photofitting, and Basic Physics
- The Rise of Amazon Web Services — Stephen O’Grady points out that Amazon has become an enterprise sales company but we don’t treat it as such because we think of it as a retail company that’s dabbling in technology. I think of Amazon as an automation company: they automate and optimize everything, and a data center is just a warehouse for MIPS. (via Matt Asay)
- Celery Project — a distributed task queue. (via joshua on Delicious)
- Memory Upgrade (The Economist) — a photofit system that uses evolutionary algorithms to generate the suspects’ faces, and does clever things like animated distortions to call out features the witness might recall. Technology going beyond automated sketch artists.
- The Particle Adventure: The Fundamentals of Matter and Force — basic physics in easy-to-understand language with illustrations, all in bite-size pieces (and 1998-era web design). I’m pondering what one of these would be like for computers, and whether “how do these actually work?” has the same romance as “how does the world really work?”.
Thumb Drives and the Cloud, FCC APIs, Mining on GFS, Check Your Prose with Scribe
- CloudUSB — a USB key containing your operating environment and your data + a protected folder so nobody can access your data, even if you lose the key + a backup program which keeps a copy of your data on an online disk, with double password protection. (via ferrouswheel on Twitter)
- FCC APIs — for spectrum licenses, consumer broadband tests, census block search, and more. (via rjweeks70 on Twitter)
- Sibyl: A system for large scale machine learning (PDF) — paper from Google researchers on how to build machine learning on top of a system designed for batch processing. (via Greg Linden)
- The Surprisingness of What We Say About Ourselves (BERG London) — I made a chart of word-by-word surprisingness: given the statement so far, could Scribe predict what would come next?
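To make "word-by-word surprisingness" concrete, here is a minimal sketch of the idea. It does not use Scribe and is not the method behind the BERG chart: it assumes a toy bigram model trained on a made-up three-sentence corpus, and scores each word by its surprisal, -log2 P(word | previous word). The corpus, the function names, and the add-one smoothing are all illustrative assumptions.

```python
import math
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count how often each word follows each previous word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split()  # <s> marks the sentence start
        for prev, word in zip(words, words[1:]):
            counts[prev][word] += 1
    return counts

def surprisal(counts, vocab_size, prev, word):
    """Surprisal in bits, -log2 P(word | prev), using add-one smoothing
    so that unseen words get a small non-zero probability."""
    following = counts.get(prev, Counter())
    prob = (following[word] + 1) / (sum(following.values()) + vocab_size)
    return -math.log2(prob)

def word_by_word(counts, vocab_size, statement):
    """Return (word, surprisal) pairs for each word in the statement."""
    words = ["<s>"] + statement.lower().split()
    return [(w, surprisal(counts, vocab_size, p, w))
            for p, w in zip(words, words[1:])]

# Hypothetical training text standing in for a real language model.
corpus = [
    "i am a designer",
    "i am a writer",
    "i am a designer and writer",
]
counts = train_bigrams(corpus)
vocab = {w for s in corpus for w in s.lower().split()}

scores = word_by_word(counts, len(vocab), "i am a plumber")
for word, bits in scores:
    print(f"{word:10s} {bits:5.2f} bits")
```

Words the model has seen in that position score low, while a word it has never seen after the preceding one scores high. A real predictive-text system would condition on far more context than one previous word.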
Non-Profits, UK Legislation, Mobile Web Variation, and Scaling
- How to Raise Funds for Non-Profits (Joi Ito) — One organization sent a message to all of their donors during the Haiti crisis asking them to give to an NGO that they had vetted. They didn’t ask for any money for themselves. This had a hugely positive effect and the donors’ trust in the group increased. Wallets aren’t zero sum.
- legislation.gov.uk — very elegant legislation system for the UK. Check out the annual analysis, for example. (via rchards on Twitter)
- The Great WebKit Comparison Table — So far I’ve tested 14 different mobile WebKits, and they are all slightly different. You can find the details below. (via Andrew Savikas)
- Node and Scaling in the Small vs Scaling in the Large (al3x) — In a system of no significant scale, basically anything works. The power of today’s hardware is such that, for example, you can build a web application that supports thousands of users using one of the slowest available programming languages, brutally inefficient datastore access and storage patterns, zero caching, no sensible distribution of work, no attention to locality, etc. etc. Basically, you can apply every available anti-pattern and still come out the other end with a workable system, simply because the hardware can move faster than your bad decision-making.
- Tondo Interactive Table to Analyze Medical Errors (MedGadget) — use of a multitouch table to help clinical staff identify and track medical errors. (via IVLINE on Twitter)
- Steve Huffman Lessons Learned While at Reddit (SlideShare) — uptime and scale. It’s interesting that most everyone reinvents tuples as a way to scale databases, hence the popularity of NoSQL systems.
- Hernando de Soto: Shadow Economies — de Soto is an economist, and this ends up talking about the need for transparency and open data. As long as you don’t know who owns the greatest amount of your assets, there’s no info as to who owns what, who is related to what, you have a shadow economy. We live in one, and it has as a characteristic a permanent credit crunch. We know more about it than you do. Credit crunch is where you don’t know who you’d be lending to, so you don’t lend. It’s permanent, we live with it, and now you’re going to have to learn to live with it too, because until you know who is solvent how can you give anybody credit? You’re flying blind. (via Jon Udell)
Fair Use Economy, Deconstituted Appliances, 3D Vision, Redis for Fun and Profit
- Fair Use in the US Economy (PDF) — prepared by IT lobby in the US, it’s the counterpart to Big ©’s fictitious billions of dollars of losses due to file sharing. Take each with a grain of salt, but this is interesting because it talks about the industries and businesses that the fair use laws make possible.
- Disassembled Household Appliances — neat photos of the pieces in common equipment like waffle irons, sandwich makers, can openers, etc. (via evilmadscientist)
- GelSight — gel block on a sheet of glass, lit from below with lights and then scanned with cameras, lets you easily capture 3D qualities of the objects pressed into it. Very cool demo–you can see fingerprints, pulse, and even make out designs on a $100 bill.
- Redis Tutorial (Simon Willison) — Redis is a very fast collection of useful behaviours wrapped around a distributed key-value store. You get locks, IDs, counters, sets, lists, queues, replication, and more.
Government Dashboard, Science Code Errors, Scaling Online Games, Information Theory
- Track DC — informative drill-down report from Washington DC government about the different departments. (via Sunlight Labs blog)
- Errors in Scientific Software — a 1994 study of scientific software that found inconsistent interfaces (1 in 7 for Fortran, 1 in 37 for C) and poor use of arithmetic such that significant figures declined from 6sf in the data to 1sf in the result. (via “If you’re going to do good science, release the computer code too” in the Guardian)
- How Farmville Scales — 75M players/month (28M/day), 1/4 of disk activity is writes, 50% higher load spikes, 3G/s of traffic goes between Farmville and Facebook at peak, LAMP stack, nagios+munin+puppet. (via Hacker News)
- Mathematical Philology — when two manuscripts of the same text differ, which is correct? This PLoSONE paper looked at all such discrepancies in Lucretius’s De Rerum Natura and found that the traditional principle of choosing the more difficult reading (on the grounds that errors are from humans unconsciously simplifying) has a strong information theory justification for it. Interesting to see this less than a week after an MIT Technology Review article on quantum teleportation remarked, There is a growing sense that the properties of the universe are best described not by the laws that govern matter but by the laws that govern information.
Code for Speed, Wooden Locks, Font Design, and a Java Distributed Data Store
- Why Git Is So Fast — interesting mailing list post about the problems that the JGit folks had when they tried to make their Java version of Git go faster. Higher level languages hide enough of the machine that we can’t make all of these optimizations. A reminder that you must know and control the systems you’re running on if you want to get great performance. (via Hacker News)
- Wooden Combination Lock — you’ll easily understand how combination locks work with this fine piece of crafty construction work.
- From Moleskine to Market — how a leading font designer designs fonts. Fascinating, and beautiful, and it makes me covet his skills.
- Terrastore — open source distributed document store, HTTP accessible, data and queries are distributed, built on Terracotta which is built on ehcache (updated: Terracotta has an ehcache plugin, but isn’t built on ehcache). A NoSQL database built on Java tools that serious Java developers respect, the first such one that I’ve noticed (update: I brain-farted: neo4j was definitely on my radar). Notice that all the interesting work going on in the NoSQL arena is happening in open source projects.
Trading Systems, Streaming iTunes, Scheduling App, Crowdsourcing Lessons
- Trading Shares in Milliseconds (Technology Review) — With the rise of automation, the bulk of U.S. stock trading has moved from the once-crowded floor of Manhattan’s New York Stock Exchange (NYSE) to silent server farms run by exchanges and broker-dealers across the country: the proportion of all trades that the NYSE handles has shrunk from 80 percent in 2005 to 40 percent today. Trading is now essentially a virtual art, and its practitioners put such a premium on speed that NASDAQ has considered issuing equal 100-foot lengths of cable to the brokers who send orders to its exchange servers. (via Hacker News)
- Stream iTunes Over SSH — short script that lets you tunnel iTunes from one machine to another over SSH (by default iTunes only shares on the local network).
- Doodle — simple way to schedule a common meeting time. (via joshua on Delicious)
- Crowdsourcing — Simon Willison’s thoughtful “lessons learned” from his crowdsourcing projects at the Guardian. Crowdsourcing is not as simple as “give them a wiki and they will fill it” (this is related to the failed “everyone in the world wants to work on my broken payroll system” theory of open source), and Simon explains some of the subtleties. The reviewing experience the first time round was actually quite lonely. We deliberately avoided showing people how others had marked each page because we didn’t want to bias the results. Unfortunately this meant the site felt like a bit of a ghost town, even when hundreds of other people were actively reviewing things at the same time. For the new version, we tried to provide a much better feeling of activity around the site. We added “top reviewer” tables to every assignment, MP and political party as well as a “most active reviewers in the past 48 hours” table on the homepage (this feature was added to the first project several days too late). User profile pages got a lot more attention, with more of a feel that users were collecting their favourite pages in to tag buckets within their profile.
Cognitive Surplus, Scaling, Chinese Blogs, CS Education for Growth
- Eight Billion Minutes Spent on Facebook Daily — you weren’t using that cognitive surplus, were you?
- How We Made GitHub Fast — high-level summary is that the new “fast, good, cheap–pick any two” is “fast, new, easy–pick any two”. (via Simon Willison)
- Isaac Mao, China, 40M Blogs and Counting — Today, there are 40 million bloggers in China and around 200 million blogs, according to Mao. Some blogs survive only a few days before being shut down by authorities. More than 80% of people in China don’t know that the internet is censored in their country. When riots broke out in Xinjiang province this year, the authorities shut down internet access for the whole region. No one could get online.
- Congress Endorses CS Education as Driver of Economic Growth — compare to Economist’s Optimism that tech firms will help kick-start economic recovery is overdone.