"government data" entries

Making open data more valuable, one micropayment at a time

Yo Yoshida's startup, Appallicious, is using San Francisco's government data as a backbone.

When it comes to making sense of the open data economy, tracking cents is valuable. In San Francisco, where Mayor Ed Lee’s administration has reinvigorated city efforts to release open data for economic benefits, entrepreneur Yo Yoshida has made the City by the Bay’s government data central to his mobile ecommerce startup, Appallicious.

Appallicious is positioning its Skipitt mobile platform as a way for cities to easily process mobile transactions for their residents. The startup generates revenue through micropayments on each transaction a city processes with its platform, a strategy that’s novel in the world of open data but one that has earned Appallicious enough money to hire more employees and look to expand to other municipalities. I spoke to Yoshida last fall about his startup, what it’s like to go through city procurement, and whether he sees a market opportunity in more open government data.

Where did the idea for Appallicious come from?

Yo Yoshida: About three years ago, I was working on another platform with a friend I’d met years ago while working on a company called Beaker. We discovered a number of problems. One of them was being able to find our way around San Francisco and not only get information, but be able to transact with different services and facilities, including going to a football game at the 49ers stadium. Why couldn’t we order a beer to our seats or order merchandise? Or find the food trucks that were sitting in some of the parks and then place an order from that?

So we were looking at what solutions were out there via mobile. We started exploring how to go about doing this. We looked first at the vendors and approaching them. That’s been done with a lot of other specific verticals. We started talking to the city a little bit. We looked at the open data legislation that was coming out at that time and said, “This is the information we need, but now we also need to be able to figure out how to monetize and populate that.” Read more…

Want to analyze performance data for accountability? Focus on quality first.

A data-driven investigation of emergency response times by the Los Angeles Data Desk found larger issues.

Here’s an ageless insight that will endure well beyond the “era of big data”: poor collection practices and aging IT will derail any institutional efforts to use data analysis to improve performance.

According to an investigation by the Los Angeles Times, poor record-keeping is holding back state government efforts to upgrade California’s 911 system. As with any database project, beware “garbage in, garbage out,” or “GIGO.”

As Ben Welsh and Robert J. Lopez reported for the L.A. Times in December, California’s Emergency Medical Services Authority has been working to centralize performance data since 2009.

Unfortunately, it’s difficult to achieve data-driven improvements, or to manage against perceived issues by applying big data in the public sector, if the data collection itself is flawed. The L.A. Times reported that the quality issues ranged from inconsistencies in how response times were measured, to records kept on paper, to a failure to keep records at all. Read more…
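
To make “garbage in, garbage out” concrete, here is a minimal sketch of the sort of audit that flags bad response-time records before any analysis; the incident fields and thresholds below are hypothetical, not the EMS Authority’s actual schema:

```python
from datetime import datetime

# Hypothetical incident records as they might arrive from local agencies.
# Field names and values are illustrative only.
incidents = [
    {"id": 1, "dispatched": "2012-11-03 14:02:10", "arrived": "2012-11-03 14:09:45"},
    {"id": 2, "dispatched": "2012-11-03 15:30:00", "arrived": None},                   # never recorded
    {"id": 3, "dispatched": "2012-11-03 16:00:00", "arrived": "2012-11-03 15:58:00"},  # clock skew
]

FMT = "%Y-%m-%d %H:%M:%S"

def audit(record):
    """Return a list of quality problems found in one incident record."""
    problems = []
    if not record.get("arrived"):
        problems.append("missing arrival time")
        return problems
    elapsed = (datetime.strptime(record["arrived"], FMT)
               - datetime.strptime(record["dispatched"], FMT)).total_seconds()
    if elapsed < 0:
        problems.append("arrival before dispatch (clock skew or data entry error)")
    elif elapsed > 2 * 3600:
        problems.append("implausibly long response; verify units and data entry")
    return problems

for rec in incidents:
    for problem in audit(rec):
        print(f"incident {rec['id']}: {problem}")
```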

Visualization of the Week: Australia’s weather and wave forecast maps

The Australian Bureau of Meteorology adds new colors to its forecast maps, extending the scale to 129.2 degrees Fahrenheit to accommodate rising high temperatures.

Australia’s Bureau of Meteorology recently had to update its interactive weather forecasting chart to add new colors. Peter Hannam explains at The Sydney Morning Herald that the previous temperature range topped out at 50 degrees Celsius (122 degrees Fahrenheit). To accommodate forecast high temperatures, the bureau has added two new colors, dark purple and bright pink, representing temperatures up to 54 degrees Celsius (129.2 degrees Fahrenheit). Hannam captured a forecast map for 5 p.m. January 14 that required the new dark purple color.
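
The arithmetic behind the new top of the scale is a simple unit conversion, and extending a color ramp amounts to adding bins above the old maximum. Here is a minimal sketch; the breakpoints and color names are illustrative, not the bureau’s actual palette:

```python
# Illustrative color scale: upper breakpoints in degrees Celsius mapped
# to colors. The two top entries mimic the newly added bands; the rest
# of the palette is hypothetical.
SCALE = [
    (40, "orange"),
    (45, "red"),
    (50, "dark red"),
    (52, "dark purple"),   # new band
    (54, "bright pink"),   # new band
]

def to_fahrenheit(celsius):
    """Convert Celsius to Fahrenheit: F = C * 9/5 + 32."""
    return celsius * 9 / 5 + 32

def color_for(temp_c):
    """Return the first band whose upper bound covers the temperature."""
    for upper, color in SCALE:
        if temp_c <= upper:
            return color
    return "off the scale"

print(to_fahrenheit(54))   # 129.2
print(color_for(51.5))     # dark purple
```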

Read more…

Panjiva uses government data to build a global search engine for commerce

Successful startups look to solve a problem first, then look for the datasets they need.

“If you go back to how we got started,” mused Josh Green, “government data really is at the heart of that story.” Green, who co-founded Panjiva with Jim Psota in 2006, was demonstrating the newest version of Panjiva.com to me over the web, thinking back to the startup’s origins in Cambridge, Mass.

At first blush, the search engine for products, suppliers and shipping services didn’t have a clear connection to the open data movement I’d been chronicling over the past several years. Green’s account of the startup’s back story, however, is a case study that aspiring civic entrepreneurs, Congress and the White House should take to heart.

“I think there are a lot of entrepreneurs who start with datasets,” said Green, “but it’s hard to start with datasets and build a business. You’re better off starting with a problem that needs to be solved and then going hunting for the data that will solve it. That’s the experience I had.”

The problem that the founders of Panjiva wanted to address is one many other entrepreneurs face: how do you connect with companies in faraway places? Green came to the realization that a better solution was needed the way many people with an innovative idea do: he had a frustrating experience and wanted to scratch his own itch. When he was working at an electronics company earlier in his career, his boss asked him to find a supplier they could do business with in China.

“I thought I could do that, but I was stunned by the lack of reliable information,” said Green. “At that moment, I realized we were talking about a problem that should be solvable. At a time when people are interested in doing business globally, there should be reliable sources of information. So, let’s build that.”

Today, Panjiva offers a higher-tech way to find overseas suppliers. The way the company built it, however, deserves more attention.

Read more…

Visualization of the Week: Visualizing D.C. homicides

The Washington Post developed an interactive map using data on area homicides from 2000 through 2011.

Residents of Washington, D.C., and anyone considering a move to the city, have a new tool to assess its homicide rate. As part of a 15-month investigative study, The Washington Post has created an interactive map of the homicides in D.C. from 2000 through 2011. The interactive tool lets users drill down into the information by demographic, motive and manner of murder, all of which can also be isolated by neighborhood or by individual homicide.
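
The drill-down interaction the Post describes amounts to filtering one dataset on several attributes at once. Here is a minimal sketch of that idea; the field names are hypothetical, not the Post’s actual schema:

```python
# Hypothetical homicide records; field names mirror the map's filters
# (demographic, motive, manner, neighborhood) but are illustrative only.
homicides = [
    {"year": 2004, "neighborhood": "Columbia Heights", "manner": "gunshot",
     "motive": "robbery", "victim_age": 27},
    {"year": 2008, "neighborhood": "Anacostia", "manner": "stabbing",
     "motive": "dispute", "victim_age": 19},
    {"year": 2010, "neighborhood": "Columbia Heights", "manner": "gunshot",
     "motive": "dispute", "victim_age": 34},
]

def drill_down(records, **criteria):
    """Keep only the records matching every supplied attribute."""
    return [r for r in records
            if all(r.get(field) == value for field, value in criteria.items())]

# Isolate gunshot homicides in one neighborhood, as the map's filters do.
for case in drill_down(homicides, neighborhood="Columbia Heights", manner="gunshot"):
    print(case["year"], case["motive"])
```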

Click here for the full interactive map.

Read more…

A grisly job for data scientists

Matching the missing to the dead involves reconciling two national databases.

Javier Reveron went missing from Ohio in 2004. His wallet turned up in New York City, but he was nowhere to be found. By the time his parents arrived to search for him and hand out fliers, his remains had already been buried in an unmarked indigent grave. In New York, where coroner’s resources are precious, remains wait a few months to be claimed before they’re buried by convicts in a potter’s field on uninhabited Hart Island, just off the Bronx in Long Island Sound.

The story, reported by the New York Times last week, has as happy an ending as it could, given that beginning. In 2010, Reveron’s parents added him to a national database of missing persons. A month later, police in New York matched him to an unidentified body, and his remains were disinterred, cremated and given burial ceremonies in Ohio.

Reveron’s ordeal suggests an intriguing, and impactful, machine-learning problem. The Department of Justice maintains separate national, public databases for missing people, unidentified people and unclaimed people. Many records are full of rich data that is almost never a perfect match to data in other databases — hair color entered by a police department might differ from how it’s remembered by a missing person’s family; weights fluctuate; scars appear. Photos are provided for many missing people and some unidentified people, and matching them is difficult. Free-text fields in many entries describe the circumstances under which missing people lived and died; a predilection for hitchhiking could be linked to a death by the side of a road.
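
To make the matching problem concrete, here is a minimal sketch of pairwise scoring across noisy fields, using only Python’s standard library; the fields, weights, and tolerances are assumptions for illustration, not the DOJ’s method:

```python
from difflib import SequenceMatcher

# Hypothetical records from the missing-persons and unidentified-remains
# databases. Field names, weights, and tolerances are illustrative only.
missing = {"hair": "light brown", "weight_lb": 160,
           "notes": "often hitchhiked along interstate highways"}
unidentified = {"hair": "brown", "weight_lb": 172,
                "notes": "found by the side of a road near an interstate"}

def text_similarity(a, b):
    """Rough 0-1 similarity for categorical and free-text fields."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_score(m, u):
    """Weighted score tolerating the noise described above: inexact hair
    descriptions, fluctuating weights, and free-text circumstantial clues."""
    hair = text_similarity(m["hair"], u["hair"])
    # Weights fluctuate; treat anything within ~25 lb as a plausible match.
    weight = max(0.0, 1.0 - abs(m["weight_lb"] - u["weight_lb"]) / 25)
    notes = text_similarity(m["notes"], u["notes"])
    return 0.35 * hair + 0.25 * weight + 0.40 * notes

# Pairs scoring above some tuned threshold would go to a human for review.
print(f"candidate pair score: {match_score(missing, unidentified):.2f}")
```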

I’ve called the Department of Justice (DOJ) to ask about the extent to which they’ve worked with computer scientists to match missing and unidentified people, and will update when I hear back. One thing that’s not immediately apparent is the public availability of the necessary training set — cases that have been successfully matched and removed from the lists. The DOJ apparently doesn’t comment on resolved cases, which could make getting this data difficult. But perhaps there’s room for a coalition to request the anonymized data and manage it to the DOJ’s satisfaction while distributing it to capable data scientists.

Photo: Missing Person: Ai Weiwei by Daquella manera, on Flickr

Read more…

Visualization of the Week: 56 years of tornadoes

Data from NOAA is used to map the strength and paths of tornadoes.

John Nelson's visualization taps NOAA historical data to map tornado paths and strengths.

Strata Week: Big data boom and big data gaps

One report says the Hadoop market is booming while another says federal data usage isn't.

In this week's big data news, an IDC report points to the booming market for Hadoop and MapReduce (and if proposals for Strata are any indication, this is indeed a good time for big data).

Strata Week: New life for an old census

The 1940 census makes its data debut, and the White House shows off its data initiative.

In this week's data news, the National Archives releases the data from the 1940 Census, the federal government outlines its big data plans, and an app uproar leads to good thinking on privacy and sharing.

Help drive the data revolution in health care

The goal of the Health Data Initiative is to be the NOAA of health data.

The Health Data Initiative’s annual “Health Datapalooza” is being held June 5-6 in Washington, D.C. The deadline for applications is just a few weeks away (March 30).