Open data groups plan volunteer search for all data sets maintained by Calif. localities

A trio of open data advocacy groups wants to create a centralized catalog of the databases that localities in California use.

A trio of open data advocacy groups is banding together to lead a volunteer effort aimed at creating a comprehensive list of all of the government databases maintained by California’s localities.

The Data Foundation, the Electronic Frontier Foundation and the Sunlight Foundation are teaming up to hold the “Great California Database Hunt” on Saturday, coordinating a daylong campaign of volunteers around the country to sift through the inventories of enterprise systems that each local government agency in the state maintains.

Hudson Hollister, interim president of the Data Foundation, told StateScoop that this joint effort stems from a change to California’s public records law that took effect on July 1. Last year’s S.B. 272 directed all local agencies (except school districts) to post lists of the public databases they maintain, and while Hollister sees tremendous potential in that particular requirement, he also feels there’s a clear need for a centralized “data set of all the data sets.”

“This new California law tries to make sense of all these valuable open data sets across all the local agencies, but it doesn’t set up any way of figuring out where all of them are,” Hollister said. “So as a public service, we’re rallying people who, surprisingly, are willing to come in on a Saturday, eat some pizza and drink some beer and try to identify these.”


Hollister noted that while the new law requires agencies to publish these lists of databases, it doesn’t mandate how departments must post them. That’s led to “very little consistency” among the formats agencies are using, which he thinks makes examining the state’s data offerings cumbersome for people hoping to use localities’ lists to file public records requests and get their hands on the information.

“Some agencies have just chosen to publish a PDF document on their website,” Hollister said. “On the other end of the spectrum, there are some agencies that have published their inventory as itself a fully open data set, with rich metadata, all sorts of great stuff.”

Accordingly, Hollister hopes that some volunteers will be able to “map out where the useful ones are and where the ones that need improvement are,” and take a “step in the direction” of providing a single resource of all available databases at the local level.

“I’m not saying we’re going to create a single dataset of all the hundreds and hundreds of different municipalities and local governments in California all publishing, we hope, hundreds of data sets just yet,” Hollister said. “But this is going to be a baseline for maybe finding those connections.”


Indeed, Hollister sees this sort of effort moving California in the same direction the federal government is heading thanks to the Digital Accountability and Transparency Act of 2014, which directs all federal agencies to make their spending information available as open data.

“Every department has instructions to combine seven different data sources into one data set and then report that out and aggregate it across the entire federal government,” Hollister said. “And that’s one of the future benefits of open data that we’re working toward. When will it become possible to take a particular company and see the filings that it submits to the government as a regulated entity?”

But to realize that sort of goal, Hollister believes “you’ve got to start at least identifying” what data sets are out there, “then you can get an idea of what the possibilities are of combining them.”

He even hopes that this sort of advocacy can help move California toward adopting a “standard business reporting” model, where “all the different agencies and entities in government that collect information from businesses, they all adopt a consistent open data structure.”

“This means that a company can automatically report simultaneously to all these different entities,” Hollister said. “By doing this kind of project, we are both identifying where the resources are for people who want to use them to build businesses or do good things, or do the research if we’re ever able to persuade California or some other state to see and pursue the possibility of this.”


Hollister also plans to continue pushing California’s state agencies to embrace the same data set reporting standards as the localities. But in the meantime, he’s hopeful that this first effort to track down the state’s databases proves to be a good jumping-off point for similar events going forward.

“I’d be happy if one person came and had a beer and a pizza with me on Saturday, and I think we’re going to have an office full and that’s fantastic,” Hollister said. “And this is not the only time we’ll do this. I’m hoping that we get the chance to do it again.”
