- The Internet Arcade — classic arcade games, emulated in the browser.
- Writing Reviewable Code — good advice.
New Math, Business Math, Summarising Text, Clipping Images
- Scientific Data Has Become So Complex, We Have to Invent New Math to Deal With It (Jennifer Ouellette) — Yale University mathematician Ronald Coifman says that what is really needed is the big data equivalent of a Newtonian revolution, on par with the 17th century invention of calculus, which he believes is already underway.
- Is Google Jumping the Shark? (Seth Godin) — Public companies almost inevitably seek to grow profits faster than expected, which means beyond the organic growth that comes from doing what made them great in the first place. In order to gain that profit, it’s typical to hire people and reward them for measuring and increasing profits, even at the expense of what the company originally set out to do. Eloquent redux.
- textteaser — open source text summarisation algorithm.
- Clipping Magic — Instantly create masks, cutouts, and clipping paths online.
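For readers curious what extractive summarisation looks like in practice, here is a minimal sketch. It is not TextTeaser's algorithm; it is a generic word-frequency approach, and the function names, stopword list, and scoring rule are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from any library).
STOPWORDS = {"a", "an", "and", "the", "is", "are", "of", "to", "in", "it", "that"}


def tokenize(text):
    """Lowercase the text and return its non-stopword tokens."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarise(text, max_sentences=2):
    """Return the highest-scoring sentences, kept in their original order.

    A sentence's score is the average document-wide frequency of its
    non-stopword tokens.
    """
    sentences = split_sentences(text)
    freq = Counter(tokenize(text))

    def score(sentence):
        tokens = tokenize(sentence)
        if not tokens:
            return 0.0
        return sum(freq[t] for t in tokens) / len(tokens)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]),
                    reverse=True)
    keep = sorted(ranked[:max_sentences])
    return [sentences[i] for i in keep]


if __name__ == "__main__":
    sample = "Cats chase mice. Cats sleep a lot. The weather is nice today."
    for line in summarise(sample, max_sentences=2):
        print(line)
```

Sentences that share the most frequent words float to the top, which is crude but often enough for a first pass; real summarisers typically add signals such as title overlap and sentence position.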
How illustrations and a clear path can enhance a story.
A clear reading path isn't always a bad thing. Here's an example where imagery advances the narrative and guides the reader along a defined trajectory.
Two examples of how digital images and associated text can stick together.
The fluidity of digital content occasionally sends images in one direction and text in another. Here's a look at two design experiments that keep digital assets together.
Regular Expressions, Mac Git, Open Source Patents, and Pepys Lessons
- Rubular — a way to write and test regular expressions interactively. Very cool. (via Adam Fields)
- gitx — OS X UI for Git. (via Marc Hedlund)
- Open Source Critical to Competition (Simon Phipps) — DOJ and German Federal Cartel Office see danger for open source in Novell’s patents being acquired by a consortium of Oracle, Microsoft, Apple, and EMC (fancy!) and are taking steps to ensure open source is protected.
- My Talk about Samuel Pepys’s Diary as an Online Story (Phil Gyford) — I love the ways Phil has stretched and repurposed the web’s affects for storytelling. Listen to this talk. (via BoingBoing)
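In the spirit of Rubular, here is a small sketch of testing a regular expression against sample input, this time in Python rather than Ruby. The pattern and the `via_source` helper are hypothetical, written only to pull the "(via …)" attribution off the end of a link entry like the ones above.

```python
import re

# Matches a trailing "(via Someone)" attribution and captures the name.
VIA = re.compile(r"\(via (?P<source>[^)]+)\)\s*$")


def via_source(line):
    """Return the attributed source at the end of a line, or None if absent."""
    match = VIA.search(line)
    return match.group("source") if match else None


if __name__ == "__main__":
    samples = [
        "Rubular: write and test regular expressions. (via Adam Fields)",
        "An entry with no attribution.",
    ]
    for sample in samples:
        print(repr(via_source(sample)))
```

Trying the pattern against both a matching and a non-matching line is the same habit an interactive tester encourages: check what the expression accepts and what it rejects.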
Stream Processing, Semantic Web, Location Services, and PDF Extraction
- S4 — S4 is a general-purpose, distributed, scalable, partially fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of data. Open-sourced (Apache license) by Yahoo!.
- RDF and Semantic Web: Can We Reach Escape Velocity? (PDF) — spot-on presentation from the data.gov.uk linked data advisor. It nails, clearly and in only 12 slides, why there’s still resistance to linked data uptake and what should happen to change this. Amen! (via Simon St Laurent)
- Pew Internet Report on Location-based Services — 10% of online Hispanics use these services – significantly more than online whites (3%) or online blacks (5%).
- Slate — Python library for extracting text from PDFs easily.
Amazon Margins, Crowdsourced Science, Data Tool Opensourced, Document Splitting
- AWS: Forget the Revenue, Did You See the Margins? (RedMonk) — According to UBS, Amazon Web Services gross margins for the years 2006 through 2014 are 47%, 48%, 48%, 49%, 49%, 50%, 50.5%, 51%, 53%. (These are analyst projections, so take them with a grain of salt, but those are some sweet margins if they’re even close to accurate.)
- Science Pipes — an environment in which students, educators, citizens, resource managers, and scientists can create and share analyses and visualizations of biodiversity data. It is built to support inquiry-based learning, allowing analysis results and visualizations to be dynamically incorporated into web sites (e.g. blogs) for dissemination and consumption beyond SciencePipes.org itself. (via mikeloukides on Twitter)
- ScraperWiki Source Code — AGPL-licensed source for ScraperWiki, a tool for data storage, cleaning, search, visualization, and export.
- Docsplit — a command-line utility and Ruby library for splitting apart documents into their component parts: searchable UTF-8 plain text via OCR if necessary, page images or thumbnails in any format, PDFs, single pages, and document metadata (title, author, number of pages…).