California to clean up messy open data with metadata

A recent report from the state portal's governing agency reveals incoming upgrades and a national trend in open data toward increased accessibility and standardization.

California’s open data portal has grown from a concept to a pilot, and then into an official site in September. Now the state is taking on what may be one of its most ambitious updates so far: the use of metadata to interpret its highly diverse data sets.

The move is ambitious considering that, like most states, California uses different formats and schemas to label its data from agency to agency and sometimes even from project to project. This is one of the ongoing challenges that officials — and developers of apps designed to track government data — are burdened with when searching for data sets across agencies and services. The state’s drive to create metadata standards for the portal — one part in a series of ongoing developments — could lay a framework for civic tech advocates and officials to more readily build tools that can process data sets and ultimately change lives.

The Los Angeles Mayor’s Dashboard illustrates how this kind of data clarity can track progress and spur results. Using open data, the dashboard can monitor dozens of city performance indicators in a single view. These range from the number of housing permits issued, to new payroll jobs, to the number of graffiti reports serviced within 48 hours. If California were to have a single statewide lens to view its performance metrics, leaders could gain a greater understanding of how the state’s resources are used and where it can improve its services, but that’s a dream unlikely to be realized with the messy data it deals with today.

The California Department of Technology reported this month that, since launch, the portal has grown to 325 data sets drawn from across the state. The department says its goal is to publish a total of 425 data sets by next December as it reviews new features and technologies to enhance user experience.


Stuart Drown of the California Government Operations Agency (CalGov Ops), the portal’s founder and the agency’s deputy secretary for innovation and accountability, said the site has undergone a radical metamorphosis since its start in 2015, when it was just a micro-portal for the GreenGov Data Challenge, the state’s sustainability coding competition. In 2017, Drown said, metadata will play an important role in improving the portal’s architecture.

“We’re excited about it. More people in the state are learning about it and those who already use it are learning California’s data can be used for lots of different applications,” Drown said. “The big hits we hope to achieve this year with the open data portal is to get metadata standards out — we’re holding a meeting tomorrow about it,” he added. “And we’re looking to automate how data sets are refreshed and updated this year.”

This is just the beginning of the updates that are coming this year. CalGov Ops Chief Data Officer Zachary Townsend, who is taking over management of the portal, will be working to improve the speed of data updates and to use the metadata to make the data more searchable.

‘The big question’

After finishing the portal’s redesign and pilot phase in 2016, CalGov Ops partnered with the civic tech company Granicus to open the portal’s source code using DKAN, an open source data management platform. Drown said the state wanted to ensure the portal was completely transparent, and that demanded code that was as searchable and accessible as the data sets it hosted. Transparency and civic tech advocates have long rallied behind open source because it’s shareable and non-proprietary.


Using metadata and creating openness via open source are core tenets of the Sunlight Foundation, the national transparency organization based in Washington, D.C. Stephen Larrick, Sunlight’s open cities director, said the potential for the portal to have a strong impact is high.

“The big question now is how can they institutionalize this?” Larrick said. “How can they make this a continuous improvement project, a continuous part of how the site goes through iterations and how, ultimately, the data gets used. That’s the big issue, and not just for California, but for open data programs and governments throughout the country.”

Larrick said the state’s transparency programs combined with a desire for a more collaborative government will likely define how the state’s portal and open data initiatives progress.

Transparency initiatives in California have included the launch of Open Justice in 2015, a public dashboard for criminal justice data; the development of URSUS in 2016, a police collection system for incident and use-of-force data; and the California Health & Human Services portal, which continues to be a source of state health information.


“One of the things that … I think is critical going forward is that you don’t want the state’s data to be in silos,” Larrick said. “Because a lot of times the true value of this data comes from the way you can layer different perspectives on different problems and different data sets can be interwoven together, and so the state’s goal of integrating this seems right to me.”

Other major upgrades to California’s portal may depend on new transparency legislation. For instance, the publication of enterprise data inventories — the complete catalogue of an agency’s private and public data — would require stricter transparency laws, but ultimately support a wider array of data requests from the public.

Human-centered data

While open source and metadata add to a portal’s value, Larrick noted there are areas where California can improve.

Cities like New York and Boston have widened the reach of their data by making their portals more user-friendly and accessible, offering suggestions and examples on how to begin exploring and using the data. To celebrate its five-year anniversary, New York City recently gave its data portal a simplified design, making it more navigable for the average resident. In Boston, the city launched a redesigned beta version of its open data portal that embraces similar user-friendly features.


“This changes the platform and the way the user interacts with the site so it’s intuitive as possible, but it also is about a human-centered mentality that where government talks to the people who are actually in need for this data,” Larrick said.

One page he singled out was “Contact Us.” California’s site has only an email address for residents to send their data requests and comments. New York City, on the other hand, has a contact page that is geared toward civic engagement and includes a civic tech event calendar alongside numerous prompts for data requests and comments.

“It’s early, but user-centered design is not ever far from our minds and we have some other things in the works coming soon,” Drown said.
