- The Open Spending Data that Isn’t (OKFN) — the UK government mandated that councils release details of all expenditure over 500 pounds. Councils have been sending data to a proprietary service and claiming this counts as releasing it. Everyone needs to realise that government must always wholesale its data (offer bulk downloads), even when it doesn’t retail that data (offer useful visualisation or analysis tools for it).
- SenseAware — sensors for shipping that wirelessly report back where they are, whether there’s light (i.e., has the container been opened), what the temperature is. (via data4all on Twitter)
- Open Science, Open Data, Open Methods (Ben Goldacre) — open data is sometimes no use unless we also have open methods. (via OKFN)
- Sones — cross-platform open source graph database built on Mono.
ENTRIES TAGGED "databases"
Data Wholesaling, Sensor Networks, Transparent Science, and a Graph Database
A deep look at Oracle's motivations and MySQL's future
MySQL, MySociety, NoSQL DB, and NoSQL Conference Notes
- Common MySQL Queries — a useful reference.
- MySociety’s Next 12 Months — two new projects, FixMyTransport and “Project Fosbury”. The latter is a more general tool to help people organise their own campaigns for change.
- riak — scalable key-value store with JSON interface. (via joshua on Delicious)
- Notes from NoSQL Live Boston — full of juicy nuggets of info from the NoSQL conference.
On March 11 Boston will join several other cities that have hosted conferences on the movement broadly known as NoSQL. Cassandra, CouchDB, HBase, HypergraphDB, Hypertable, Memcached, MongoDB, Neo4j, Riak, SimpleDB, Voldemort, and probably other projects as well will be represented at the one-day affair. The interviews I had with various project leaders for this article turned up a recurring usage pattern for NoSQL. What connects the users is that they carry out web-related data crunching, searching, and other Web 2.0-related work. I think these companies use NoSQL tools because they’re the companies who understand leading-edge technologies and are willing to take risks in those areas. As the field gets better known, usage will spread.
Access to local information is great, but context is even better
There’s plenty of enthusiasm for local / hyperlocal projects, but the sweepstakes has yet to be won. So many of these local efforts rely on traditional information delivery through news articles or databases. That material has use, no doubt. Yet few projects take the extra step and put that data into context.
Electronics Hacking FAQs, Speech-To-Text Democracy, Open Source Column Database, Massive Online Analysis
- ChipHacker — collaborative FAQ site for electronics hacking. Based on the same StackExchange software as RedMonk’s FOSS FAQ for open source software.
- Democracy Live — BBC launch searchable coverage of parliamentary discussion, using speech-to-text. One aspect we’re particularly proud of is that we’ve managed to deliver good results for speech-to-text in Welsh, which, we’re told, is unique. I think of this as the start of a They Work For You for video coverage. I’d love to be able to scale this to local government coverage, which is disappearing as local newspapers turn into delivery mechanisms for real estate advertisements.
- InfiniDB: Open Source Column Database — hooks into MySQL, uses MySQL for SQL parsing, security, etc. The commercial enterprise version has multi-server support (parallel scale-out). (via Brian Aker)
- Massive Online Analysis — MOA is a framework for data stream mining. Includes tools for evaluation and a collection of machine learning algorithms. Related to the WEKA project, also written in Java, while scaling to more demanding problems. (via joshua on Delicious)
iPhone App Backstory, Cookie Resurrection, The Entrepreneuralism Lickmus test, and An Interesting Database
- The Making of the NPR News iPhone App — interesting behind-the-scenes look, with sketches and all. Station streams, however, presented a larger challenge. To begin with, NPR didn’t have direct stream links for any of its stations, so we built a Web spider that identified and captured more than 300 iPhone-compatible station streams. After that first pass, we worked with our station representatives to manually test each stream. In the process they found enough new streams to double our database. All of these streams are delivered to the app from NPR’s Station Finder API. (via mattb on Twitter)
- You Deleted Your Cookies? Think Again (Wired) — Flash keeps its own cookies, which are harder to delete. Several services even use the surreptitious data storage to reinstate traditional cookies that a user deleted, which is called ‘re-spawning’ in homage to video games where zombies come back to life even after being “killed,” the report found. So even if a user gets rid of a website’s tracking cookie, that cookie’s unique ID will be assigned back to a new cookie again using the Flash data as the “backup.” (via Simon Willison)
- Would You Lick It? (Rowan Simpson) — clever example of what it takes to be an entrepreneur.
- FluidDB — a shared “in the cloud” database built around tags: an object is a container for a set of tags which are name:value pairs, tag names have simple namespaces (e.g., “gnat/review” is the “review” tag in my namespace), all objects are world readable and writable but there are ACLs for tags, values can be any type (string, number, URL, Excel spreadsheet), and there’s a simple query language. I’m curious to see what applications spring up around shared data. They’re in limited alpha, controlling the number of users, so register now to play before everyone else.
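The FluidDB data model described above (objects as bags of namespaced tags, with permissions on tags rather than on objects) can be sketched as a toy in-memory store. This is only an illustration of the idea, not FluidDB's actual API: the class `ToyTagStore`, its methods, and the rule that a user may write tags in the namespace matching their own name are all assumptions made for the example.

```python
class ToyTagStore:
    """In-memory toy model of objects carrying namespaced tags (hypothetical)."""

    def __init__(self):
        self.objects = {}   # object id -> {"namespace/tag": value}
        self.writers = {}   # namespace -> extra users allowed to write its tags
        self._next_id = 0

    def create_object(self):
        # Objects are plain containers; anyone may create or read one.
        obj_id = self._next_id
        self._next_id += 1
        self.objects[obj_id] = {}
        return obj_id

    def grant(self, namespace, user):
        # Assumed ACL rule: a namespace can list extra users allowed to write.
        self.writers.setdefault(namespace, set()).add(user)

    def tag(self, user, obj_id, path, value):
        # "gnat/review" -> namespace "gnat"; values may be of any type.
        namespace = path.split("/", 1)[0]
        allowed = user == namespace or user in self.writers.get(namespace, set())
        if not allowed:
            raise PermissionError(f"{user} may not write {path}")
        self.objects[obj_id][path] = value

    def query(self, path, predicate=lambda value: True):
        # Stand-in for a query language: ids of objects whose tag passes a test.
        return sorted(
            obj_id for obj_id, tags in self.objects.items()
            if path in tags and predicate(tags[path])
        )


if __name__ == "__main__":
    store = ToyTagStore()
    book = store.create_object()
    store.tag("gnat", book, "gnat/review", "worth reading")
    store.tag("alice", book, "alice/rating", 4)   # same object, different namespace
    print(store.query("alice/rating", lambda v: v >= 3))
```

The point of the sketch is that two users can annotate the same object without coordinating, because each writes only within a namespace they control.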
Ancient Language, NoSQL, Molecular Gastronomy, SQL Weirdness
- Computers Unlock More Secrets of the Indus Valley Script — Four-thousand years ago, an urban civilization lived and traded on what is now the border between Pakistan and India. During the past century, thousands of artifacts bearing hieroglyphics left by this prehistoric people have been discovered. Today, a team of Indian and American researchers are using mathematics and computer science to try to piece together information about the still-unknown script. The team led by a University of Washington researcher has used computers to extract patterns in ancient Indus symbols. The study, published this week in the Proceedings of the National Academy of Sciences, shows distinct patterns in the symbols’ placement in sequences and creates a statistical model for the unknown language. (via ACM TechNews)
- NoSQL: If Only It Was That Easy — war stories about the problems of using NoSQL systems to handle high throughput. We liked Tokyo Tyrant so much, we put it in production. In fact, every request to AboutUs.org hits Tokyo. One of the uses is as a persistent memcached replacement for caching 10 million+ wiki pages (as a json document of all the pieces of our page, which comes out to around 51gb(edited) of data), and it works great. It runs on a single server, it serves up a single type of data, very quickly, and has been a pleasure to use. We keep other ancillary data sets on some other servers too, and it’s great for this. Tokyo Tyrant is a great example of very performant software, but it doesn’t scale. (via straup on Delicious)
- WillPowder — Specialty Powders and Spices from Chef Will Goldfarb — molecular gastronomy products from “the golden boy of pastry”. (via joshua on Delicious)
- What is the Deal with NULLs? — In the past, I’ve criticized NULL semantics, but in this post I’d just like to explain some corner cases that I think you’ll find interesting, and try to straighten out some myths and misconceptions. [...] I believe the above shows, beyond a reasonable doubt, that NULL semantics are unintuitive, and if viewed according to most of the “standard explanations,” highly inconsistent. (via bos on Delicious)
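For readers who have not run into the NULL behaviour that the last item refers to, here is a small sketch of a few well-known corner cases. It uses Python's built-in `sqlite3` module with an in-memory database as an assumption of convenience; the linked post may use a different database and different examples, and the table `t` and function name are made up for illustration.

```python
import sqlite3


def null_corner_cases():
    """Run a few standard three-valued-logic queries and return the results."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.executemany("INSERT INTO t (x) VALUES (?)", [(1,), (2,), (None,)])

    def scalar(sql):
        return conn.execute(sql).fetchone()[0]

    results = {
        # NULL = NULL is not true; it evaluates to NULL (None in Python).
        "null_equals_null": scalar("SELECT NULL = NULL"),
        # The table holds three rows, one of them NULL.
        "total_rows": scalar("SELECT count(*) FROM t"),
        # "x = 1 OR x <> 1" looks like a tautology but skips the NULL row.
        "x_eq_1_or_x_ne_1": scalar("SELECT count(*) FROM t WHERE x = 1 OR x <> 1"),
        # A NULL in a NOT IN list makes the predicate unknown for every row.
        "not_in_with_null": scalar("SELECT count(*) FROM t WHERE x NOT IN (3, NULL)"),
        # SUM over zero rows yields NULL rather than 0.
        "sum_of_no_rows": scalar("SELECT sum(x) FROM t WHERE x > 100"),
    }
    conn.close()
    return results


if __name__ == "__main__":
    for name, value in null_corner_cases().items():
        print(name, "=", value)
```

Each query is one that a reader might expect to behave like ordinary two-valued logic, which is where the "unintuitive" label comes from.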
Meatware Hacks, iPhone Web Stats, Distributed Hash Tables, Richard Feynman Fun
- Freedom for OS X — Mac app that disables networking for up to eight hours so you can get work done without Internet distractions. Technology workarounds for meatware bugs. (via Joshua-Michèle Ross).
- iPhone Casts a Giant Shadow on the Web — 43% of mobile web traffic is from iPhone users, as measured by “the world’s largest purveyor of ads on mobile apps and websites”. As I was told today, “more people are spending more time looking at the web through one of these. For how much longer can you afford to ignore it?” (via timoreilly on Twitter)
- Why you won’t be building your killer app on a distributed hash table (Jonathan Ellis) — locking and sophisticated queries. I’m still trying to figure out where we’ll end up with these “let’s do something simple in a way that lets us scale horizontally, and then build on top of that” approaches to solving the big data/graph theory problems behind many modern apps.
- Richard Feynman Interviews at Microsoft — a bit of fun to start the weekend on. (new URL 20090601)
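As a rough illustration of the "sophisticated queries" objection in the distributed hash table item above, the sketch below partitions keys across nodes by hash. It is a deliberate simplification and an assumption throughout: real DHTs use consistent hashing and network calls rather than a modulo over local dictionaries, and `ToyDHT` and its methods are hypothetical names.

```python
import hashlib


class ToyDHT:
    """Hash-partitioned key-value store held in local dicts (illustrative only)."""

    def __init__(self, node_count):
        self.nodes = [dict() for _ in range(node_count)]

    def _node_for(self, key):
        # Hashing scatters neighbouring keys across unrelated nodes.
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def put(self, key, value):
        self.nodes[self._node_for(key)][key] = value

    def get(self, key):
        # A point lookup touches exactly one node.
        return self.nodes[self._node_for(key)].get(key)

    def range_query(self, low, high):
        # Key order is lost by hashing, so every node must be scanned.
        hits = {}
        nodes_touched = 0
        for node in self.nodes:
            nodes_touched += 1
            for key, value in node.items():
                if low <= key <= high:
                    hits[key] = value
        return hits, nodes_touched


if __name__ == "__main__":
    dht = ToyDHT(node_count=8)
    for i in range(1, 101):
        dht.put(f"user:{i:03d}", i)
    print(dht.get("user:042"))
    hits, touched = dht.range_query("user:010", "user:019")
    print(len(hits), "matches found by scanning", touched, "nodes")
```

Point lookups scale out cleanly, while anything that depends on key order or spans several keys (ranges, joins, multi-key locks) has to involve many nodes at once.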