"databases" entries

Four short links: 1 March 2013

Four short links: 1 March 2013

Drone Journalism, DNS Sniffing, E-Book Lending, and Structured Data Server

  1. Drone Journalismtwo universities in the US have already incorporated drone use in their journalism programs. The Drone Journalism Lab at the University of Nebraska and the Missouri Drone Journalism Program at the University of Missouri both teach journalism students how to make the most of what drones have to offer when reporting a story. They also teach students how to fly drones, the Federal Aviation Administration (FAA) regulations and ethics.
  2. passivednsA network sniffer that logs all DNS server replies for use in a passive DNS setup.
  3. IFLA E-Lending Background Paper (PDF) — The global dominance of English language eBook title availability reinforced by eReader availability is starkly evident in the statistics on titles available by country: in the USA: 1,000,000; UK: 400,000; Germany/France: 80,000 each; Japan: 50,000; Australia: 35,000; Italy: 20,000; Spain: 15,000; Brazil: 6,000. Many more stats in this paper prepared as context for the International Federation of Library Associations.
  4. The god Architecturea scalable, performant, persistent, in-memory data structure server. It allows massively distributed applications to update and fetch common data in a structured and sorted format. Its main inspirations are Redis and Chord/DHash. Like Redis it focuses on performance, ease of use and a small, simple yet powerful feature set, while from the Chord/DHash projects it inherits scalability, redundancy, and transparent failover behaviour.
Four short links: 20 December 2012

Four short links: 20 December 2012

SQL Indexes, Instagram Effects in JS, Evil Fake Keyboard, and Preschool UX

  1. Use The Index, Luke — free ebook on tuning SQL database access.
  2. CamanJS — Instagram-like filters in Javascript, permissively-licensed open source. (via VentureBeat)
  3. Don’t Stick That There — USB device pretending to be a keyboard. The benefit of this is that even with USB auto-run disabled, our exploit will still work as it emulates a keyboard. No one ever blocks USB keyboards! (via David Sklar)
  4. Best Practices: Designing Touch Tablet Experiences for Preschoolers (Sesame Workshop) — the good people at Sesame Street Workshop tell what works and what doesn’t when you make tablet touch UIs for kids. Double Tap: Children expect immediate feedback from their touch and tend to think the app is unresponsive when a double tap is required. We suggest only using double tap to prevent a child from accidental navigation (e.g., leaving an activity, accessing parent content).
Four short links: 25 October 2012

Four short links: 25 October 2012

Big Data's Big Picture, Real-Time Queries, Real-Time Queries, Single-Process Real-Time Queries

  1. Big Data: the Big Picture (Vimeo) — Jim Stogdill’s excellent talk: although Big Data is presented as part of the Gartner Hype Cycle, it’s an epoch of the Information Age which will have significant effects on the structure of corporations and the economy.
  2. Impala (github) — Cloudera’s open source (Apache) implementation of Google’s F1 (PDF), for realtime queries across clusters. Impala is different from Hive and Pig because it uses its own daemons that are spread across the cluster for queries. Furthermore, Impala does not leverage MapReduce, allowing Impala to return result in real-time. (via Wired)
  3. druid (github) — open source (GPLv2) a distributed, column-oriented analytical datastore. It was originally created to resolve query latency issues seen with trying to use Hadoop to power an interactive service. See also the announcement of its open-sourcing.
  4. Supersonic (Google Code) — an ultra-fast, column oriented query engine library written in C++. It provides a set of data transformation primitives which make heavy use of cache-aware algorithms, SIMD instructions and vectorised execution, allowing it to exploit the capabilities and resources of modern, hyper pipelined CPUs. It is designed to work in a single process. Apache-licensed.

A grisly job for data scientists

Matching the missing to the dead involves reconciling two national databases.

Missing Person: Ai Weiwei by Daquella manera, on FlickrJavier Reveron went missing from Ohio in 2004. His wallet turned up in New York City, but he was nowhere to be found. By the time his parents arrived to search for him and hand out fliers, his remains had already been buried in an unmarked indigent grave. In New York, where coroner’s resources are precious, remains wait a few months to be claimed before they’re buried by convicts in a potter’s field on uninhabited Hart Island, just off the Bronx in Long Island Sound.

The story, reported by the New York Times last week, has as happy an ending as it could given that beginning. In 2010 Reveron’s parents added him to a national database of missing persons. A month later police in New York matched him to an unidentified body and his remains were disinterred, cremated and given burial ceremonies in Ohio.

Reveron’s ordeal suggests an intriguing, and impactful, machine-learning problem. The Department of Justice maintains separate national, public databases for missing people, unidentified people and unclaimed people. Many records are full of rich data that is almost never a perfect match to data in other databases — hair color entered by a police department might differ from how it’s remembered by a missing person’s family; weights fluctuate; scars appear. Photos are provided for many missing people and some unidentified people, and matching them is difficult. Free-text fields in many entries describe the circumstances under which missing people lived and died; a predilection for hitchhiking could be linked to a death by the side of a road.

I’ve called the Department of Justice (DOJ) to ask about the extent to which they’ve worked with computer scientists to match missing and unidentified people, and will update when I hear back. One thing that’s not immediately apparent is the public availability of the necessary training set — cases that have been successfully matched and removed from the lists. The DOJ apparently doesn’t comment on resolved cases, which could make getting this data difficult. But perhaps there’s room for a coalition to request the anonymized data and manage it to the DOJ’s satisfaction while distributing it to capable data scientists.

Photo: Missing Person: Ai Weiwei by Daquella manera, on Flickr

Read more…

Four short links: 26 July 2012

Four short links: 26 July 2012

Drone Overload, Mac MySQL Tool, Better Cancer Diagnosis Through AI, and Inconstant Identifiers

  1. Drones Over Somalia are Hazard to Air Traffic (Washington Post) — In a recently completed report, U.N. officials describe several narrowly averted disasters in which drones crashed into a refuĀ­gee camp, flew dangerously close to a fuel dump and almost collided with a large passenger plane over Mogadishu, the capital. (via Jason Leopold)
  2. Sequel Pro — free and open source Mac app for managing MySQL databases. It’s an update of CocoaMySQL.
  3. Neural Network Improves Accuracy of Least Invasive Breast Cancer Test — nice use of technology to make lives better, for which the creator won the Google Science Fair. Oh yeah, she’s 17. (via Miss Representation)
  4. Free Harder to Find on Amazon — so much for ASINs being permanent and unchangeable. Amazon “updated” the ASINs for a bunch of Project Gutenberg books, which means they’ve lost all the reviews, purchase history, incoming links, and other juice that would have put them at the top of searches for those titles. Incompetence, malice, greed, or a purely innocent mistake? (via Glyn Moody)

The key web technologies that work together for dynamic web sites

An interview with the author of Learning PHP, MySQL, JavaScript, and CSS

The technologies that led to an explosion of interactive web sites — PHP, MySQL, JavaScript, and CSS — are still as popular today, and a non-programmer can master them quickly.

Four short links: 15 June 2012

Four short links: 15 June 2012

On Anonymous, Graph Database, Leap Second, and Debugging Creativity

  1. In Flawed, Epic Anonymous Book, the Abyss Gazes Back (Wired) — Quinn Norton’s review of a book about Anonymous is an excellent introduction to Anonymous. Anonymous made us, its mediafags, masters of hedging language. The bombastic claims and hyperbolic declarations must be reported from their mouths, not from our publications. And yet still we make mistakes and publish lies and assumptions that slip through. There is some of this in all of journalism, but in a world where nothing is true and everything is permitted, it’s a constant existential slog. It’s why there’s not many of us on this beat.
  2. Titan (GitHub) — Apache2-licensed distributed graph database optimized for storing and processing large-scale graphs within a multi-machine cluster. Cassandra and HBase backends, implements the Blueprints graph API. (via Hacker News)
  3. Extra Second This June — we’re getting a leap second this year: there’ll be 2012 June 30, 23h 59m 60s. Calendars are fun.
  4. On Creativity (Beta Knowledge) — I wanted to create a game where even the developers couldn’t see what was coming. Of course I wasn’t thinking about debugging at this point. The people who did the debugging asked me what was a bug. I could not answer that. — Keita Takahashi, game designer (Katamari Damacy, Noby Noby Boy). Awesome quote.

MySQL in 2012: Report from Percona Live

Checking in on the state of MySQL.

Contrasting deployments at craigslit and Pinterest, trends, commercial offerings, and more

The NoSQL movement

How to think about choosing a database.

A relational database is no longer the default choice. Mike Loukides charts the rise of the NoSQL movement and explains how to choose the right database for your application.

Four short links: 8 August 2011

Four short links: 8 August 2011

Graph ORM, Graphic Computation, Web Intents, and Async RPC

  1. Bulbflow — a Python framework for graph databases: it’s like an ORM for graphs. (via Joshua Schachter)
  2. Nomograms — the lost art of graphical computing. (via John D Cook)
  3. Web Intents — adding Android-style Intents to the web. Services register their intention to be able to handle an action on the user’s behalf. Applications request to start an Action of a certain verb (share, edit, view, pick etc) and the system will find the appropriate Services for the user to use based on the user’s preference.
  4. Finagle (GitHub) — Twitter’s asynchronous network stack for the JVM that you can use to build asynchronous Remote Procedure Call (RPC) clients and servers in Java, Scala, or any JVM-hosted language. Finagle provides a rich set of tools that are protocol independent.