Strong public engagement and rigorous planning will allow agencies to open new data with confidence, the Berkman Klein Center says.
To provide answers in a field where they are perpetually in short supply, the Berkman Klein Center for Internet & Society released a new guide on Monday called Open Data Privacy.
The 110-page tactical guide represents a year of work by researchers who sought to equip city leaders with tools and information that would allow them to walk the line between privacy and function. Proponents of open data have claimed for years that much of technology's true potential remains locked away in databases. Case studies showing best practices, and what not to do, are paired with guidelines and printable charts that allow leaders to map the risks and benefits of releasing new data sets into a world filled with bad actors who exploit personally identifying information against unsuspecting citizens for personal gain.
Key recommendations offered by the report include:
- Conduct risk-benefit analyses to inform the design and implementation of open data programs.
- Consider privacy at each stage of the data lifecycle.
- Develop operational structures and processes that codify privacy management widely throughout the city.
- Emphasize public engagement and public priorities as essential aspects of data management programs.
Anyone can provide high-level advice, but this report aims to provide leaders with a tool for tactical decision-making, said Ben Green, fellow at the Berkman Klein Center and data analytics fellow at the Boston Department of Innovation and Technology.
A risk-benefit matrix allows leaders to think through their plans to open new data sets. (Berkman Klein Center for Internet & Society)
"In today's data-rich world, the definition of sensitive and not sensitive is no longer black and white and so cities are really struggling to figure out what data to release, how they should release it, how they should talk about it, what those risks actually are and what they can do about it, so we wanted to provide a guide to help them navigate this really complex thicket of challenges," Green said.
John Correllus, North Carolina's chief data officer and deputy chief information officer, told StateScoop in an email that this issue is central to government's challenges with open data.
"One of our biggest concerns is to ensure that the data has been properly classified so that we understand how the data needs to be protected, and of course, ensuring we de-identify where appropriate to protect our data owners — the citizens," Correllus said.
Avoiding the privacy challenges of open data is easy if a leader decides simply not to open data, but in recognition of the immense value that data can hold, the researchers attempted to place the balance between utility and risk at the center of their work, Green explained. But privacy is among the most complex issues in technology. This report attempts to cover the breadth of the issue — from lifecycle management to public engagement.
"One thing we find doesn't get enough attention is public relationships," Green said. "We talk a lot about how to manage the data, how to release the data and all of that, but at the end of the day, so much of this boils down to public trust, because we're working in this sort of gray area where there's not a law saying this is acceptable to release, this is not acceptable to release. Most of the data is not really defined."
Some cities, like Seattle, have addressed the public engagement piece in part through the designation of a chief privacy officer, an official charged with first surveying and then protecting the public's trust with regard to privacy. That task grows increasingly difficult the more sophisticated technology becomes. Around 2010, an idea began circulating in popular media that it takes only 33 bits of information, just a few data points that seem benign taken alone, to identify an anonymous internet user.
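The arithmetic behind the 33-bit figure is simple: 33 bits can distinguish 2^33 values, or roughly 8.6 billion, which is more than the number of people on Earth. The following is a minimal sketch of that calculation, not anything taken from the report. The population figure is a rough assumption for around 2010, and the attribute shares (sex, birthday, ZIP code) are hypothetical values chosen only to show how quickly benign-looking data points add up.

```python
import math

# Assumption: a rough world population figure for around 2010.
WORLD_POPULATION = 7_000_000_000


def bits_to_identify(population: int) -> float:
    """Bits of information needed to single out one person in a population."""
    return math.log2(population)


def bits_from_attribute(share: float) -> float:
    """Bits revealed by an attribute that a given fraction of people share."""
    return -math.log2(share)


bits_needed = bits_to_identify(WORLD_POPULATION)

# Hypothetical, illustrative shares; real distributions are not uniform.
attributes = {
    "sex": 1 / 2,            # about half the population
    "birthday": 1 / 365,     # day of the year, ignoring leap years
    "zip_code": 1 / 40_000,  # assumed: one of 40,000 equally sized areas
}

bits_revealed = sum(bits_from_attribute(share) for share in attributes.values())

print(f"Bits needed to identify one person: {bits_needed:.1f}")
print(f"Bits revealed by three attributes:  {bits_revealed:.1f}")
print(f"Bits still missing:                 {bits_needed - bits_revealed:.1f}")
```

Under these assumptions, three ordinary attributes already supply most of the bits needed, which is why data points that seem harmless alone can become identifying in combination.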
"The strongest consensus that you'll get today among computer scientists is that there's no clear definition of anonymity," Green said. "De-identification is not a well-defined thing. That's a huge issue if your job as a data official is to release data that has been properly de-identified."
Most data published online today steers clear of the ill-defined bounds of the privacy landscape, said Abhi Nemani, former chief data officer at the City of Los Angeles and board member of the OpenGov Foundation and Data4America, in an email to StateScoop.
"Probably the biggest challenge for privacy issues when it comes to government data is the absence of rules or regulations (i.e., oversight regimes) to dictate what is and what isn't allowable," Nemani said. Most policies err on the side of withholding data rather than risk publishing something that might come back to haunt everyone.
The full report, Open Data Privacy, is available from the Berkman Klein Center.