A year after updating its open data policy, New York City is pushing ahead with plans to automatically refresh more of its data sets and tie public records requests to the open data process — moves hailed by open government advocates, even as some worry about the data’s quality.
In a letter to Mayor Bill de Blasio and the city council earlier this month, a year after his “Open Data for All” initiative launched, the Department of Information Technology and Telecommunications reported that it added 156 new data sets to the city’s online portal over the last year, and noted that more than 200 of the roughly 1,500 data sets included on the site are now configured to update automatically.
Additionally, the letter highlighted how the IT department and the Mayor’s Office of Data Analytics have been working to comply with the seven new laws de Blasio signed to improve the portal, including a new focus on reviewing data made available through Freedom of Information Law, or FOIL, requests for inclusion on the site.
“At its core, it’s about making sure that the people who the data belongs to can get access to it,” Amen Ra Mashariki, the city’s chief analytics officer and director of the mayor’s data office, told StateScoop.
Mashariki said some of the city’s successes meeting the goals of the “Open Data for All” initiative, which de Blasio launched last July, can be chalked up to the fact that they’re now managing the open data program more holistically. Rather than DoITT and the mayor’s data office each working to monitor the portal, they’ve formed a single “open data team” to oversee their efforts in the area.
In particular, he thinks this increased emphasis on teamwork has helped the departments adapt to the new open data laws; he said the team “spent a lot of time diving into the laws and understanding their impact on the agencies.” With the directive from the council and the mayor to start examining FOIL requests to see if the data released can be included on the city’s portal, the open data team found themselves with plenty of complicated questions to answer.
“The idea is that if a data set can be released, then we should probably investigate releasing the full data set so it’s no longer just piecemealing things out,” Mashariki said. “If we can release this portion of it, then should we actually release all of it?”
But Albert Webber, DoITT’s open data program manager, noted that examining open records requests can also help smooth the data release process. As they’ve started to look at the data sets people are frequently requesting, he thinks they can chart a course toward saving agencies time and money by releasing them proactively.
“Looking at the FOIL requests is sort of like crowdsourcing or letting the public help with the identifying of the data that should be public,” Webber said.
John Kaehny, executive director of the New York government transparency group Reinvent Albany, is “really impressed” with those results as well. He points to the city’s Taxi and Limousine Commission dataset detailing yellow cab trips as one that’s “hugely, hugely useful” but only accessible via a cumbersome records request.
“That was a data set getting roughly 75 FOIL requests a year … and you’d literally have to go to the [commission] with a brand new hard drive in the box, give it to them, they would dump the data onto it and then give it back to you,” Kaehny said. “So this is a perfect example of how open data is supposed to work.”
National observers also praised the city for the practice, noting that the notion of tying open data to open records requests is still a relatively rare one.
Stephen Larrick, open data program lead for the nonprofit Sunlight Foundation, said his research indicates that only 38 states and localities even mention public records in their open data policies and fewer still tie the release of data to information that people frequently request — in fact, just nine other localities have adopted that sort of policy.
“Open data is a continuation of that feedback loop that open records rely on,” Larrick said.
But Kaehny is equally enthusiastic about the city’s automation efforts, calling staffers’ work to automate 100 data sets in the last year an “A-plus performance” that keeps information more up to date and cuts down on human data entry errors.
Mashariki hopes those improvements get more people using the automated data sets, since they can be more confident in their accuracy. Meanwhile, Webber is thrilled by how much time each automation saves his team, particularly after they found a way to automatically update “geospatial data sets” like maps a few months back.
“It used to be we’d get 50 files at a time from agencies that we needed to update, and that could take two or three days of time,” Webber said. “Now we don’t have to load that data.”
These advances aside, Kaehny said the “big criticism” he has left for the city’s progress report revolves around what it doesn’t include: any mention of how the department plans to address issues with the quality of some data sets.
From “inconsistent geospatial data” to “bad metadata” to erroneous data duplication, Kaehny sees issues that the city could address if it had a better process for accepting feedback from the public and reacting to those comments.
“The average person, average user, they have a lot of trouble getting answers when they report data problems, and this is a big deal,” Kaehny said. “But there’s zero discussion of that in the progress report, no forward-looking statement on ‘Here’s how we’re going to address this.’”
But Mashariki charges that the city is indeed considering these sorts of questions, and is currently “working to have a more robust customer service capability in short order.”
“We want to begin to have a more city-centric landing page for all things open data where we can have communications and engagement,” Mashariki said. “We are moving quite aggressively, and so we’ll be rolling out our mechanism and ability to be responsive to any level of users’ requests and observations, and we’re looking at all of the technologies and tools available to us.”
Kaehny suggests that something as simple as setting up “expert email lists,” letting experienced users communicate directly with staffers about issues, has worked in places like Philadelphia and Chicago. Similarly, he thinks devoting more staff to this problem could help yield solutions as well.
Yet Larrick also cautions that the city will have to think about how to encourage users beyond experienced developers to actually contribute comments. While many cities (New York included) have comment sections and “request a dataset” buttons, he feels few have managed to find a way to build engaging tools for “real-time feedback.”
“There’s a real UX question here,” Larrick said. “How much human-centered or user-centered design has really been done on this stuff?”
Indeed, Kaehny recognizes that these are uncharted waters for the city to navigate, but he’s eager to see some results all the same.
“There are no great examples of that happening anywhere in the world right now, so this is hard work and this is new territory,” Kaehny said. “Now we’re getting to the next level of open data, making sure that it’s right and that it works.”