Panjiva uses government data to build a global search engine for commerce

Successful startups look to solve a problem first, then look for the datasets they need.

“If you go back to how we got started,” mused Josh Green, “government data really is at the heart of that story.” Green, who co-founded Panjiva with Jim Psota in 2006, was demonstrating the newest version of Panjiva.com to me over the web, thinking back to the startup’s origins in Cambridge, Mass.

At first blush, the search engine for products, suppliers and shipping services didn’t have a clear connection to the open data movement I’d been chronicling over the past several years. His account of the back story of the startup is a case study that aspiring civic entrepreneurs, Congress and the White House should take to heart.

“I think there are a lot of entrepreneurs who start with datasets,” said Green, “but it’s hard to start with datasets and build business. You’re better off starting with a problem that needs to be solved and then going hunting for the data that will solve it. That’s the experience I had.”

The problem that the founders of Panjiva wanted to help address was one that many other entrepreneurs face: how do you connect with companies in faraway places? Green came to the realization that a better solution was needed in the same way that many people who come up with an innovative idea do: he had a frustrating experience and wanted to scratch his own itch. When he was working at an electronics company earlier in his career, his boss asked him to find a supplier they could do business with in China.

“I thought I could do that, but I was stunned by the lack of reliable information,” said Green. “At that moment, I realized we were talking about a problem that should be solvable. At a time when people are interested in doing business globally, there should be reliable sources of information. So, let’s build that.”

Today, Panjiva has created a higher-tech way to find overseas suppliers. The way they built it, however, deserves more attention.

Government data as a platform

By 2009, the startup had an initial product they could bring to market and launched a search engine that used government data as a platform for international trade. An importer could type in “patio furniture” and determine who shipped it and who their customers were. The company chose a freemium model, where search is available for free but relationships between suppliers are only available to subscribers. The mapping of relationships between buyers and suppliers is where Panjiva delivered added value on top of public data.

That added value is crucial, given that competitors can also request and use the dataset. “Companies have been packaging and reselling this data in one way or another for years,” said Green. “If you looked at this data, people are going to find value. It’s typically folks in the shipping industry, who want to know what’s going into ports or moving on different shipping lines. For us, the central purpose of the data was something different and required more work.”

That work paid off. In 2010, Panjiva built a search engine for global commerce that worked. Today, it has more than 100,000 users in 190 countries on its free service and some 3,700 companies subscribing to the paid version, including 42 Fortune 500 companies.

Notably, the Department of Homeland Security (DHS) itself is also a paying subscriber. Green declined to disclose the terms of relationships with all of Panjiva’s partners or data suppliers, some of which include nonprofits. Some users “do a revenue share, some are paying for data, others are providing data because they think there’s public good for that data being on the platform,” he said.

Panjiva competes with ImportGenius, Zepol, Alibaba and PIERS. Green credits PIERS for extracting similar value from customs datasets.

The turning point

When they started, the first approach that Green and his co-founder decided to take was to build a “Yelp for global trade” that would be based on feedback from people who work with companies. Unfortunately for the young startup, they couldn’t get off the starting blocks in generating reviews, much less reach critical mass.

They also encountered a new problem: even if they were able to get ratings of exporters and suppliers, how would they ensure the reviews came from people who had actually done business with the entities being rated? In retrospect, that focus was a bit silly, said Green, because they couldn’t get engagement, but talking about how to solve it led them to an unexpected answer: government data.

That direction came from a meeting where a staffer for a trade promotion organization told them it was straightforward to get shipping data on what’s coming into the country from the United States Customs Agency, which is now part of the Department of Homeland Security.

“It was a turning point for the company,” said Green. “We realized there was a dataset available to the public for a fee. They make available data about shipments that enter the U.S. While not all data is made available to the public and there are a bunch of limitations, the data that is made available is amazing. There’s about 10 million shipping records every year, typically including who is sending goods, who is receiving goods, what’s inside, and how much is inside a container.”

While useful, these government datasets do come with inherent limitations, cautioned Green. For one, they only contain data about shipments coming into the United States, not what’s going into Europe or Asia. For another, the data made available to the public only covers shipments made by boat, which is about half of the shipments that come into the United States.

“It’s unfortunate that government cannot make available data on other modes of transport,” observed Green, with a hint of frustration in his voice. “That leaves out truck, rail, and air. Congress actually attempted to clarify that the regulations that govern this data weren’t just about boats but applied to air. Thus far, DHS hasn’t acted.”

Given the lens that has been focused on trade deficits between other countries and the United States in recent decades, there’s also a political angle to the market intelligence Panjiva provides that Congress and taxpayers may find of interest. For instance, Panjiva data showed global trade growth slowing in the first part of 2012.

“What we’ve organized, by its nature, gives us insight on companies around the world that serve the U.S. market,” said Green. “We’re helping people find overseas suppliers. Why not help find suppliers here at home? It turns out there’s a similar story on export data that’s supposed to be made to the public as well. DHS has a hard time with that as well. We can’t get the data.”

Data availability is also affected by the actions of the companies themselves, which have the ability to petition the government to hide shipments that are coming to them. “In about a third of the cases, you cannot see who is sending and receiving the goods,” said Green. “Government can see, but what’s released to the public has information pulled from it.”

This government data comes at a cost

Accessing this public data comes at a cost of some $100 per day, which is the service fee DHS charges for providing a daily CD-ROM. Each disc includes one day’s worth of shipments, which is generally around 30,000 shipping records. Panjiva started requesting data on July 1, 2007, and now has a little over five years of records.
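The figures quoted in this article can be cross-checked with a little arithmetic. The Python sketch below uses only the numbers stated here ($100 per daily disc, roughly 30,000 records per disc) and assumes one disc for every day of a 365-day year. The variable names are illustrative.

```python
# Back-of-the-envelope check of the figures quoted in the article.
# Assumption: one disc is issued for every day of a 365-day year.
FEE_PER_DISC_USD = 100        # daily service fee cited in the article
RECORDS_PER_DISC = 30_000     # approximate shipping records per disc
DAYS_PER_YEAR = 365

annual_cost_usd = FEE_PER_DISC_USD * DAYS_PER_YEAR
annual_records = RECORDS_PER_DISC * DAYS_PER_YEAR
cost_per_record_usd = annual_cost_usd / annual_records

print(f"Annual data cost: ${annual_cost_usd:,}")
print(f"Records per year: about {annual_records:,}")
print(f"Cost per record: about ${cost_per_record_usd:.4f}")
```

Under those assumptions the yearly cost works out to $36,500 and the yearly volume to roughly 11 million records, in line with the "about 10 million" records and "$36,500 per year" that Green cites elsewhere in the piece.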

“This data, on a record-by-record basis, is interesting,” said Green. “If you can organize, it’s phenomenal. If you can associate with companies, can say this company has experience with these supplies and this company has experience with these customers, it’s very useful in deciding if a company is a good fit. You can see by customers if they’re reasonably high quality.”
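To make the idea of associating records with companies concrete, here is a minimal Python sketch. It is not Panjiva's implementation, which the article does not describe. The shipment records, company names, and field names (`shipper`, `consignee`, `goods`) are all made up for illustration.

```python
from collections import defaultdict

# Made-up shipment records; the field names are assumptions for illustration.
shipments = [
    {"shipper": "Sunrise Furniture", "consignee": "Acme Outdoor", "goods": "patio furniture"},
    {"shipper": "Sunrise Furniture", "consignee": "Backyard Depot", "goods": "patio furniture"},
    {"shipper": "Harbor Textiles", "consignee": "Acme Outdoor", "goods": "seat cushions"},
]

def customers_by_supplier(records, keyword):
    """Map each supplier whose goods match the keyword to its set of customers."""
    result = defaultdict(set)
    for rec in records:
        if keyword.lower() in rec["goods"].lower():
            result[rec["shipper"]].add(rec["consignee"])
    return dict(result)

profile = customers_by_supplier(shipments, "patio furniture")
for supplier, customers in profile.items():
    print(supplier, "->", sorted(customers))
```

Grouping individual records this way is what turns a pile of shipments into a supplier profile: a buyer can see at a glance which customers a given supplier already serves.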

Making those CD-ROMs into a useful, searchable resource, however, was far from a simple matter of just inserting them into an optical drive and moving their contents into a structured database.

“Jim and a team of engineers went to work organizing the datasets initially,” said Green. “They were very hard to work with — absurdly messy. Think about the number of ways you can misname a Chinese factory. It was really problematic. You need to build company profiles, correct for misspellings and variations on names. We spent years getting that right.” Eventually, Panjiva was able to automate the process of ingesting the data from the CD-ROMs, building an algorithm to take the data and clean it up.
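The name-cleanup problem Green describes can be illustrated with a small Python sketch. This is one simple, generic approach (normalize the name, then compare with a fuzzy string ratio from the standard library's `difflib`), not the algorithm Panjiva built, which the article does not detail. The company names, the suffix list, and the 0.85 threshold are all hypothetical.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of corporate suffixes to ignore when comparing names.
SUFFIXES = {"co", "ltd", "inc", "corp", "company", "limited", "llc"}

def normalize(name: str) -> str:
    """Lowercase, strip punctuation, and drop common corporate suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)

def similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """True when two normalized names are close enough to be one company."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold

def group_names(names, threshold=0.85):
    """Greedy clustering: attach each name to the first group it resembles."""
    groups = []  # each group is a list of raw name variants
    for name in names:
        for group in groups:
            if similar(name, group[0], threshold):
                group.append(name)
                break
        else:
            groups.append([name])
    return groups

# Made-up names showing case, punctuation, and spelling variations.
raw_names = [
    "Shenzhen Sunrise Furniture Co., Ltd.",
    "SHENZHEN SUNRISE FURNITURE CO LTD",
    "Shenzen Sunrise Furnitre Company",
    "Ningbo Harbor Textiles Inc.",
]
for group in group_names(raw_names):
    print(group)
```

A toy like this handles a handful of variants. At the scale of millions of records, real entity resolution needs far more care, which is consistent with Green's remark that it took years to get right.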

Making data a strategic asset

Panjiva’s initial foray, which created a search engine for customs data, didn’t meet with strong demand out of the gate. As they refined the product, it generated what Green described as a “nice business.” The startup was profitable, in other words, but its leadership aspired to build something bigger.

The direction they took was driven by user feedback. When Panjiva asked its users how they were making buying decisions, a pattern emerged that looked like a bigger opportunity.

“Users started with Panjiva then went to search for additional information on B2B sites or on Google,” said Green. “We heard this process and it sounded a lot like the experience consumers had searching for flights before search engines or Kayak.com — except that instead of airline sites, people are going to B2B sites. The difference is it’s not just every airline. It’s like every flight has its own website.”

The founders have now raised just under $10 million from Battery Ventures and Harrison Metal and invested it in technology and data acquisition. They’ve grown their engineering team to 10 people, out of a total of 50 or so current employees. The engineering team is focused on improving search and enriching Panjiva’s data with other sources, beyond government data.

This October, the startup relaunched Panjiva.com with another layer: data supplied by the companies themselves.

“We have a database of six million companies spread around the world and contact information on four million companies,” said Green. “We have product photos for 34 million products. There was a lot of investment required to do that, but none of this would be possible if we hadn’t had a backbone of data that came from the U.S. government.”

Since Panjiva added global search, Green said that traffic to the search engine has gone up 50%.

The data sources that Panjiva integrated were also driven by customer interest. As the founders shared their product with potential subscribers, they kept hearing the same thing: 1) “that’s awesome” and 2) “I’d like more data.”

“We loved the first one and hated the second,” said Green. “In retrospect, we should have loved both. The second one was a roadmap for us to build them a really great differentiated product.”

When they asked users exactly which kinds of data would make the service more useful, a map to the future of the company emerged.

The first was operational data. “Customs data is a perfect example,” said Green. “It gives you a sense of what companies have done and their track record.”

The second was financial data. “Sure, a company has experience, but are they financially healthy?” asked Green. “Some of that you can infer, but there’s other things you can use. We’ve partnered with Dun & Bradstreet and Experian to pull that data into our platform.”

The third was positive and negative data about a company. “That includes getting certified as financially responsible,” said Green. “We’ve partnered with nonprofits and added that data, showing you information about companies doing wrong, including a blacklist of illicit global trade.”

The key insight that anyone interested in building a business on top of government data should take away here is to go beyond the government data itself: combine it with other sources and add value on top.

What happens if government data becomes open?

Green thinks that Panjiva is well-positioned to be both competitive and profitable, even if DHS decided to start publishing customs data online. “We don’t worry that much about data becoming more accessible,” he said, “even if government data becomes free. It’s not the $36,500 per year to buy the data — it’s the engineering talent to clear it up. That’s a massive problem, and it wouldn’t be as simple as getting the data.”

Panjiva is betting that the investments they’ve made in technology, talent and — crucially — combining so many different data sources have created a differentiated product that solves a problem for its customers.

“We’re not trying to build out a data business where we’re reselling government data,” said Green. “We’re trying to build a platform where serious buyers and sellers can connect. We’re now going to the world’s most important buyers. We have two revenue streams: selling premium access to data and selling access to suppliers who want it. The starting point for customers is $99 per month, going up to $10,000 per month for unlimited access for an unlimited number of users, then services that we sell on the top.”

The experience that Panjiva has had with government data and building a business using it has left Green with a strong perspective on what works — and what doesn’t.

“We don’t think there are infinite numbers of possibilities in terms of ways to build sustainable value with public data,” he said. “One is to take datasets that are commoditizeable and add value. Another is to feed the creation of more data. Another is to build a service. Another is to create network effects, where the data is the honey that attracts the bees.”

Most important, Green suggested, is to use public data to solve a problem that’s both hard and important. For Panjiva, that means making global trade more efficient and more transparent.

“There is a future where information is consolidated and accessible to people making key decisions, from a buying or regulatory standpoint,” he said. “Once that happens — and we’re close — there’s potentially a place where there’s a race to the top instead of the bottom, in terms of supply chain records. That will make a difference when you’re under scrutiny. Right now, the fragmentation of data is the ally of bad behavior. Our hope is to change that reality.”

