Larger cities tend to lead the way when it comes to releasing popular data sets in bulk, according to new research from a government transparency advocacy group.
The Sunlight Foundation published a blog post Wednesday detailing how cities post open data online, based on results from the U.S. City Open Data Census — a project the group supports where people can submit details about their city’s open data offerings online. The foundation found that, of the six cities with the highest percentage of key data sets available for bulk download, four had populations of 850,000 or more.
Indeed, Los Angeles and New York led the way nationwide, with each posting 94.7 percent of their data sets on popular topics like crime, property taxes and city contracts in formats that let users download the data in bulk. San Francisco and Chicago placed fourth and fifth respectively, with totals just under 90 percent.
“Although many cities on the census digitize official documents and put them online, looking at the availability of bulk data downloads could help show which cities are more thoroughly engaged with opening their datasets,” Richard Yarrow, a Sunlight policy intern, wrote in the post.
Hartford, Connecticut, and Santa Monica, California, were the outliers among the top group. Neither is a particularly large city, with populations of roughly 125,000 and 92,000 respectively, yet they ranked third and sixth in bulk data percentages.
Yarrow suggests in the post that those cities stick out because of their “long and well-recognized track records with open data,” but in general, “cities with larger populations tend to do better at implementing open data.” He notes that “the correlation remains even when excluding the census’ largest and smallest cities.”
He cautions that “small population sizes don’t prevent cities from having strong open data programs,” but larger cities appear to have a leg up in posting quality data that’s useful to people hoping to work with the information. As for why this trend has emerged, Yarrow acknowledges that several explanations are possible.
“It is possible that cities learn from and compete with a select group of ‘peers’ with similar sizes or locations,” Yarrow wrote. “Many of the first cities to develop open data programs were large metropolises like New York and Los Angeles; other large or nearby ‘peer’ cities may have been faster to see what the early adopters accomplished and then catch on. Another possibility is that, while many cities face calls for greater openness and transparency, cities with particularly active tech communities are more oriented toward the finer details of open data — thus presenting machine-readable bulk data rather than many separate pages of scanned PDF.”
Sunlight also examined which cities struggle to make data available in bulk formats.
Atlanta has the dubious distinction of appearing at the bottom of the list with just 33.3 percent of its data available in bulk, while Detroit isn’t much ahead at 35.7 percent. The group found that St. Louis; Tulsa, Oklahoma; Anchorage, Alaska and Washington, D.C. all tied at 36.8 percent to round out the bottom six cities included in the census.
There are fewer population-related trends to note among the struggling cities — they range in size from Detroit and Washington, D.C.’s populations of more than 670,000 people each to Anchorage’s population of 300,000.
However, Yarrow does point out that, though each city may have issues when it comes to releasing data sets in bulk, “otherwise they may be doing a decent job releasing datasets.”
Specifically, he notes that Atlanta even has a higher overall score on the census than Hartford does, though the cities are at opposite ends of the spectrum when it comes to bulk data. Yarrow isn’t certain why this trend has emerged, but he has one key suggestion.
“Large populations, however, may give cities an extra push — not only to create an open data initiative, but to create one well,” Yarrow wrote. “That could explain why bulk access to data is even more closely linked to population size than generic open data indicators, and why more complicated facets of open data — like bulk access, machine-readability and open licensing — tend to appear in the same cities.”