In an interview with StateScoop, the group's project lead says it's almost finished with an unprecedented data set to illuminate the behavior and motivations of the American voter.
The open government movement is many-faceted, but with democratic principles serving as such a prominent motivator for many of its participants, gathering election data is considered by some to be their most crucial pursuit.
To learn more about ongoing efforts to collect and share data pertaining to how and why people vote, StateScoop spoke with Derek Willis. He's a journalist, data scientist and project lead for OpenElections, a project funded by the Knight Foundation that is close to completing a data set of nationwide precinct-level election results.
StateScoop: What is OpenElections?
OpenElections is a nonprofit effort to collect and publish certified election results at the precinct level. We do county-level results as well, but it's focused on collecting official certified precinct-level election results from all 50 states and the District of Columbia. Our goal is to do the most recent elections and then work our way backwards, ideally to about 2000 or so if we can. In some states that's pretty easy and in other states that's pretty difficult.
OK, so what does this data look like?
It looks like a lot of different things, because states are in charge of how elections are run, so the formats and appearance of election results really vary, not only state to state but even within states. Most states do publish, for example, a single county-level results file for an election. But when you're talking about precinct-level data, there are some states that produce a statewide precinct-level results file, and there are others that do not. For those, we have to go and contact individual counties, and there you see a lot of variation in terms of whether it's a spreadsheet, or whether it's a PDF with a printout from some software that they've been using to tally votes and produce reports. In other cases, it's essentially pictures of handwritten documents.
This sounds like it could be a lot of work.
Yeah, it varies. We've got some great volunteers and obviously there are some similarities. Even with all the differences, there are some things that are common to a lot of places. There's some tools we have that can help make that process go faster.
So, why is this needed — why are you doing this?
When this project started, or even when the idea for it started, I was working at the New York Times, and every couple of years when an election came around we'd have to put together a bunch of previous election results, and we'd find ourselves redoing that work each time. We thought it would be a great idea to have a set of historic election results that we could rely on. I talked to a friend of mine, who at the time was working at the Washington Post, and he agreed. And so, we decided to try to do that. We got some funding from the Knight Foundation and were able to get our website up and running and hire a developer to help us design a system that would help us collect and also parse through and convert this information into actual data, which we would then publish back out.
Originally the idea was that this data would be more useful for our work. That's still true, but what we've found is that there are other folks out there who also have a need for this kind of data, particularly at the precinct level. That includes political scientists, people who write about and study elections even if they're not political scientists, and other reporters and journalists. Even campaigns sometimes find it difficult to get this information, depending on how far back they want to go and which state they're talking about.
Can you provide an example of the kind of question you might want to answer with this data?
Yeah, for example, say you are trying to see whether support for a state legislative candidate or a member of Congress might be weakening over time. One way to look at that is to take a look at their primary elections, which a lot of folks don't collect data for. If somebody is pretty strong, they usually don't even get a primary challenger. But maybe somebody gets a primary challenger, or even if they're unchallenged but they still have an election to go through, maybe the number of votes falls off a little bit. It's a way to kind of see what might be an issue there, in terms of whether there is some softness there, some weakness there that other political opponents might try to exploit. Or if you're a candidate or a campaign, whether you might consider running against that person. So that's one example.
Another example would be essentially from the November '16 election, there was a lot of talk about who or where the voters that support the president are. Whether they're traditional Republicans or crossovers from other parties and things like that, so having results from multiple offices within the same jurisdiction can sometimes help to indicate whether, in this case, there's sort of like a generic Republican vote versus a Donald Trump vote.
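As a rough illustration of the primary-election comparison Willis describes, the sketch below computes the percentage change in a candidate's primary vote total from one election to the next. The function name `vote_changes` and the vote totals are made up for illustration; they are not OpenElections data or code.

```python
def vote_changes(votes_by_year):
    """Return {year: percent change in votes versus the previous election year}."""
    years = sorted(votes_by_year)
    changes = {}
    # Pair each election year with the one immediately before it.
    for prev, cur in zip(years, years[1:]):
        before = votes_by_year[prev]
        after = votes_by_year[cur]
        changes[cur] = (after - before) / before * 100
    return changes


# Hypothetical primary vote totals for one incumbent (invented numbers).
primary_votes = {2010: 12000, 2012: 11500, 2014: 9800}
changes = vote_changes(primary_votes)

for year, pct in changes.items():
    print(f"{year}: {pct:+.1f}% versus the previous primary")
```

A steady run of negative values would be one possible sign of the "softness" mentioned above, though turnout differences between election years would need to be accounted for before drawing conclusions.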
You've got plenty of counties that report not only candidate results, but they have the option to vote an entire straight ticket. So, it's like, I'm gonna vote for the Republican candidate in every election where there's a Republican running. You can do that. You just check one box and you vote for all of the Republicans. So you can look at the Trump figures or you look at the straight party figures from this election, then compare them to the previous election. Or look at the straight party ticket numbers and then look at the results from the presidential race and then also results from other races on the ballot.
There are many, many places in November where Trump, for example, was not the leading vote getter among Republican candidates, which is a little unusual. It's not unheard of, but it's a little unusual for a presidential candidate not to be the recipient of the most votes among candidates of the same party. You have instances in which state legislative candidates would be out-polling the presidential candidate. So, it kind of helps you look at who the electorate is and the particular appeal of a candidate.
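The comparison described here can be sketched in a few lines of Python: for each precinct, find cases where a Republican candidate for another office received more votes than the Republican presidential candidate. The column names (`precinct`, `office`, `party`, `candidate`, `votes`), the office labels and all of the rows are assumptions made for this example, not a description of any actual results file.

```python
import csv
import io

# Invented sample rows; real results files may use different columns and labels.
SAMPLE = """precinct,office,party,candidate,votes
P1,President,REP,Donald Trump,410
P1,State House,REP,Jane Roe,455
P1,Straight Party,REP,,300
P2,President,REP,Donald Trump,520
P2,State House,REP,John Doe,480
P2,Straight Party,REP,,350
"""


def outpolled_precincts(rows, party="REP", top_office="President"):
    """Map precinct -> (office, candidate, votes) for the best-performing
    same-party candidate who beat the top-of-ticket candidate's total."""
    top = {}         # precinct -> votes for the top-of-ticket candidate
    best_other = {}  # precinct -> best same-party candidate for another office
    for row in rows:
        if row["party"] != party or not row["candidate"]:
            continue  # skip other parties and straight-ticket tallies
        votes = int(row["votes"])
        precinct = row["precinct"]
        if row["office"] == top_office:
            top[precinct] = votes
        elif precinct not in best_other or votes > best_other[precinct][2]:
            best_other[precinct] = (row["office"], row["candidate"], votes)
    return {
        p: best_other[p]
        for p in top
        if p in best_other and best_other[p][2] > top[p]
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
result = outpolled_precincts(rows)
print(result)
```

In this made-up sample only precinct P1 is flagged, because the state House candidate there received 455 votes against 410 for the presidential candidate. The straight-party rows are skipped here, but they could be compared against the presidential totals in the same way.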
So, what is the ultimate goal of your work in getting this data out there?
There's a couple possible uses I think would be really what we had set out to do. One was to make it easier for reporters and people studying elections to have better context — to understand the context of elections in a particular state or county. To the extent it leads to better, more informed reporting — that was definitely a goal of ours.
The other one is for web applications, for making maps, interactive things that allow people to sort of contrast and compare a geography that they recognize. That's something that people have done with our data. That's also something that we were really interested in making possible. In many cases this information doesn't come as data. It comes as a PDF or something like that. We really wanted to make it possible for people to build things with election results data and then try to provide that data where we could.
What you're doing is a little different from the usual efforts you see in open data. Where do you see yourselves fitting into the whole open government space?
Yeah, there's definitely some overlap. There's definitely things that we have in common in terms of we're both about making more information available in a format that's easier to use.
I think one of the key differences here is that there's no single government agency that can really provide this information in the formats that we want it. In many places, it's not so much a matter of the state government not doing it, or not being willing to do it, or able to do it. It's just sort of the fact that they don't have some of this information. We're actually creating a data set by going out and finding the components of it, then putting it all together and converting it into a format that people can use.
I think government can and does do a lot of that work, but a lot of that is driven by statutes, where like, "Hey, you have to compile and release this information." Or it's driven by these statutory-specific duties of an office. Put it this way, if every state government did this, there wouldn't really be a need for our project. But not every state currently does, so that's the difference. We're doing what they are not doing.
What's the best way for people to follow your work?
There's a couple different ways. One is that when we publish election results data, we do so on our website, which is OpenElections.net, if folks are really interested in a specific state, for example. We post information and data on GitHub as we go along, before we even finish a state. We've got a Twitter account, @openelex, and also a Google group that people can sign up for, and we send out periodic emails telling them what we're doing.
Any or all of those would be a good way to keep track of what we're doing. We get so many people who are interested in elections who maybe feel like they have something to offer us in terms of whether they have old election results they'd like to give us or point us to, or whether they want to volunteer. We definitely welcome that.
Great — thank you. Are there any announcements coming up that we should watch for?
We are closing in on a nationwide precinct-level election results data set. This is not just for the presidential race, but it goes all the way back to state legislature. We're about eight states away from finishing up and several of those eight are pretty far along. There's only a handful where we really have a lot of work to do. It's one of those where we have to go to every county, and there's 254 counties in Texas, so that's gonna take a while. But we're really pretty close to being able to say, "Hey, if you would like a precinct-level results file for the presidential race, or for Senate candidates" or things like that, we'd be able to provide that.