
PDFs are the tip of the spear for Code for America’s new AI studio

A new project at Code for America aims to help state and local governments make their documents more accessible in time for a 2026 deadline.

State and local governments overseeing populations of at least 50,000 people have 10 months remaining to comply with new accessibility rules, to ensure that people with disabilities of all sorts can find information and use services. And so governments are chipping away at their sprawling online presences: adding alternative text to images on their websites, swapping out low-contrast or hard-to-navigate designs, and perhaps most tedious of all, ensuring that thousands upon thousands of PDF files are properly formatted so that they can be parsed by assistive devices like screen readers.

Georgia.gov, which is just one state government’s main website, is home to about 50,000 PDFs and other static files, like PowerPoint slide decks. (That figure doesn’t even account for all the files on miscellaneous Georgia websites that aren’t part of the main site.) And about one-third of Georgia’s static files will need some form of remediation, according to Jennifer Thom, senior director of data science with the civic tech nonprofit Code for America.

State and local governments have gotten much more adept at web accessibility in recent years. What was once an arcane science, taken seriously only by a dedicated minority, has become a routine part of doing stuff on the web. Thanks to the popularity of human-centered design and the growing cultural relevance of digital technology, government web designers have largely escaped the institutional inertia that had justly made their digital creations objects of ridicule and scorn. Sites whose main lineages can be traced back to GeoCities have been traded in for crisp typography, functional designs and 404 pages with cutesy warnings like the one on Texas.gov that says “hold on a second partner.”

In Georgia, officials have been following modern web standards for a decade. It ranks among the top five states in the nation for following accessibility rules, and plenty of other states are gaining ground. What remains of the accessibility problem in government is, for the most part, no longer cultural, but technical and specific.


“The last-mile problem people are experiencing is dealing with PDFs,” Thom said. “PDFs remain a huge problem in government and that was the project we felt like AI could do a lot to help cities and states be able to review the thousands of PDFs they’ll have to go through in order to meet this accessibility deadline.”

The artificial intelligence Thom referred to is a months-old tool developed by Code for America that Georgia and Salt Lake City, Utah, are using to wrangle their static files in time for the deadline next April. (Governments overseeing populations of fewer than 50,000 are allowed an additional year.) Its interface looks like a file manager, displaying rows of data that can be filtered and sorted, by type or date. A machine learning algorithm classifies the files into types — agreement, agenda, brochure, etc. — while a generative AI model of the user’s choosing summarizes each file’s contents and determines if it’s eligible for exemption from the new standards. (Files eligible for exemption are those that are kept online purely for archival purposes and have no relation to accessing services.)

The tool doesn’t automate much of the work states have to do in making their static files accessible, unless they want to use the tool’s generated descriptions for alt text, but it is helping them size up the project. Will Alford, director of content at the Georgia Technology Authority, the state’s IT bureau, said he thinks of it as a project management tool.

“One idea we had is that you could use this tool to identify fillable forms,” Alford said. “Thinking about those forms might be the most time-consuming, because they need to be recreated from scratch, versus many of the PDFs we see that need to be remediated, just need to be marked up with a title, heading, body text within the document so the screenreaders can know what they’re looking at.”

While Georgia wants to improve its forms, Salt Lake City is concerned additionally with avoiding lawsuits and reducing its legal liability, said Code for America’s Thom. The web standards governments are meant to be following are set by the World Wide Web Consortium, the influential group founded in the 1990s by the web’s creator, Tim Berners-Lee. But the requirement is driven by the Americans with Disabilities Act, and failures to comply might be legally interpreted as discrimination.


The ADA’s website cheerfully notes that the new rules are designed to “better serve all members of your community, including people with visual, auditory, physical, speech, cognitive, and neurological disabilities.” And it points out that following the rules comes with incentives: “For example, if a state government’s online tax form is inaccessible to people with disabilities, it can make it hard for the government to efficiently collect taxes.”

One of Salt Lake City’s files in need of remediation dates back to 1998 — “It’s a millennial PDF,” Thom joked (it actually missed the cut-off, making it Gen Z), “and so the person who created the PDF is probably not the person who’s reviewing it. So the summary [generated by the tool] helps the person who’s reviewing it get a little more context.”

The tool, developed using Ruby on Rails, is open source and available for free on GitHub, alongside many other Code for America projects. Georgia and Salt Lake City will continue to get tailored support from Thom and her “nimble, tech-focused” three-to-five-person AI squad until next year’s deadline, but she said other governments may find it useful on their own. It was designed so that agencies can swap in different AI models as they like. Her team is also considering new features that would allow users to remediate files directly from the interface, rather than only label them for various fates: “remove PDF from website”, “convert to HTML” and “PDF in review” are three current examples.

The PDF tool is the first project developed by Code for America’s AI Studio, a new team that launched early this year and is intended to be nimble enough to keep up with the rapid pace of development in the AI field, where fierce global competition drives weekly releases of smarter and cheaper models. Thom said the PDF problem rose to the top of the pile quickly once her team started searching for problems, because it’s so common in government, and because it presents a salient challenge to those who care about public service.

“Websites are the front door to government services and these websites often have a lot of PDFs,” Thom said. “The federal government in 2023 had 4 billion PDF downloads across agencies and some of those top forms that were downloaded were tax forms. These are things that provide direct government services to people, and if the PDFs are inaccessible then it really makes getting through that front door a lot harder.”


At a time when the second Donald Trump administration is cutting programs, particularly diversity and accessibility initiatives, eliminating federal workers who provide services and disappearing people who might already have been skeptical of authority, the government’s front door appears to many not just difficult to access, but bricked shut. The effects of federal cuts are already being felt in state and local governments, where many services are distributed, though they are still working. Fixing PDFs is a relatively modest, if onerous, project, but to Thom’s eye, an undeniably constructive one.
