Commentary: Rémi Mercier of OpenDataSoft shares how the company's comprehensive list and map of open data portals came to be and what they've learned along the way.
Rémi Mercier is part of the team at French startup OpenDataSoft, where he is a marketing and communication project manager....
Have you ever browsed the internet for hours on end trying to find data portals? Ever stalked data geeks on Twitter hoping they would tweet a worthy data repository? Ever maniacally clicked on a five-year-old thread asking for data resources, only to be disappointed by the answer (or lack thereof)?
If you answered “yes” to one of these questions, this article is for you! Let me tell you the story of opendatainception.io — a list and a map of 2,700+ open data portals around the world. This quirky side project was born out of the difficulty of finding data portals on the internet.
The initial pain point
In November 2015, my colleague Nicolas Terpolilli and I grew tired of seeing that one question over and over:
“Where can I find clean and usable data?”
For a few days, we joked about making a list that would gather every open data portal around the world. No less! We knew of several similar projects, but none was as comprehensive or user-friendly as we would have liked. OpenGeocode (now disused) only listed US portals. Dataportals.org made it impossible for the user to sort its list by country.
We decided that our list of every open data portal around the world should be a list that every data aficionado could bookmark and go to when in need. It should always be up to date. And it should be darn easy to use.
We had each gathered several lists of data portal URLs over the years. After pooling our resources and scraping the websites that needed to be scraped, we had a vast number of open data portals, far more than what was listed on other websites!
We had the raw material, but we couldn’t use it as such. When collecting data from multiple sources, you seldom have a clean resource at first: there are typos, missing coordinates, duplicates, and inconsistent naming conventions.
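To make the cleanup concrete, here is a minimal Python sketch of the kind of pass described above: it removes duplicates that come from overlapping source lists and reports records that still lack coordinates. The field names (`name`, `url`, `lat`, `lon`) and the URL normalization rule are assumptions for illustration, not the schema or method we actually used.

```python
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Reduce a URL to a comparable key: lowercase host, no scheme, no 'www.', no trailing slash."""
    url = url.strip()
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def clean_portals(rows):
    """Drop duplicate portals and report which ones still lack coordinates."""
    seen = set()
    cleaned, missing_coords = [], []
    for row in rows:
        key = normalize_url(row["url"])
        if key in seen:
            continue  # same portal collected from another source list
        seen.add(key)
        # Collapse stray whitespace in the portal name.
        record = {**row, "name": " ".join(row["name"].split())}
        cleaned.append(record)
        if record.get("lat") is None or record.get("lon") is None:
            missing_coords.append(record["name"])
    return cleaned, missing_coords
```

The first occurrence of a portal wins, so the order in which source lists are concatenated decides which copy survives. Typos in names would still need a manual pass.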
We poured ourselves a cup of coffee and got to work. All we needed was a lot of elbow grease to turn these hundreds of links into something usable for the average internet user.
How we built opendatainception.io
The first thing we wanted to build was a list, in alphabetical order, of all portals.
We started by cleaning the data we had gathered. We limited ourselves to the name of the portal, the organization, a link, and a location. We hit our first roadblock thanks to our lack of geopolitical knowledge. Should we put England, Wales and Northern Ireland in different rows or include them in the United Kingdom? What about the Isle of Man, which is a self-governing British Crown Dependency? We used the United Nations list of sovereign states as a go-to resource for the job.
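A rule like this can be written down as a small lookup, sketched here in Python. It assumes that constituent countries are folded into their sovereign state, and it only flags anything not on the reference list for a human decision. The table contents and the three-entry stand-in for the list of sovereign states are hypothetical, and the sketch deliberately does not decide what happens to cases such as the Isle of Man.

```python
# Hypothetical lookup table: only the cases named in the text are listed.
PARENT_STATE = {
    "England": "United Kingdom",
    "Wales": "United Kingdom",
    "Northern Ireland": "United Kingdom",
}

# Tiny stand-in for a reference list of sovereign states (not the full list).
SOVEREIGN_STATES = {"United Kingdom", "France", "Canada"}


def resolve_country(label: str):
    """Return (country, needs_review) for a raw location label."""
    cleaned = label.strip()
    name = PARENT_STATE.get(cleaned, cleaned)
    # Anything not on the reference list is flagged rather than guessed.
    return name, name not in SOVEREIGN_STATES
```

For example, `resolve_country("Wales")` resolves to the United Kingdom, while `resolve_country("Isle of Man")` comes back flagged for manual review.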
Our second task was to fill in missing geographic coordinates. It took us a couple of days of copying and pasting coordinates for every record. Why go to such lengths with geographic data? The main idea was to have a list of all open data portals classified by country that people could browse through and bookmark. And we were already planning on pushing this data as a map, too!
Once version 0.1 of our dataset was good enough, Nicolas ran it through a Ruby script. The output was a pre-formatted HTML file that I could paste into the OpenDataSoft website. But every time we updated the dataset, Nicolas needed to rerun the script and I needed to paste the result into our website again. So we quickly decided to use the OpenDataSoft open source widgets library. This is a library of web components that can be used to build rich, interactive data visualization pages, fed live by data available from a remote API. We uploaded our dataset to our open data solution and used the automatically generated API as the main source for our widgets.
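For readers curious about the first, script-based approach, here is a rough Python sketch of what such a generator could look like. The original was a Ruby script whose code is not reproduced here, so the function name, the record fields (`name`, `country`, `url`) and the HTML layout are all assumptions.

```python
from html import escape
from itertools import groupby


def render_portal_list(portals):
    """Build an HTML fragment: one heading per country, portals in alphabetical order."""
    ordered = sorted(portals, key=lambda p: (p["country"], p["name"].lower()))
    lines = []
    # groupby needs its input sorted by the grouping key, which `ordered` is.
    for country, group in groupby(ordered, key=lambda p: p["country"]):
        lines.append(f"<h3>{escape(country)}</h3>")
        lines.append("<ul>")
        for p in group:
            link = escape(p["url"], quote=True)
            lines.append(f'  <li><a href="{link}">{escape(p["name"])}</a></li>')
        lines.append("</ul>")
    return "\n".join(lines)
```

The weakness of this approach is visible in the signature: the output is a static string, so every change to the dataset means regenerating and pasting the fragment again.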
From then on, every user loading the page would see an always up-to-date version of the list, thanks to the widgets fetching the latest version of the dataset through the API.
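The principle is simply to read the data at display time instead of baking it into the page. The widgets do this in the browser. The Python sketch below only illustrates the idea, and the endpoint URL and the response shape (a JSON list of objects with a `name` key) are placeholders, not the real API.

```python
import json
from urllib.request import urlopen

# Placeholder endpoint, not the real API URL.
DATASET_URL = "https://example.org/api/portals.json"


def fetch_json(url: str):
    """Download and decode a JSON document."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def load_portal_names(fetch=fetch_json, url=DATASET_URL):
    """Fetch the dataset at call time and return portal names, sorted."""
    records = fetch(url)  # assumed shape: a list of {"name": ...} objects
    return sorted(record["name"] for record in records)
```

Because nothing is cached, a portal added to the dataset shows up on the next call without anyone regenerating a file. Passing `fetch` as a parameter also makes the function easy to try without network access.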
As I said earlier, we also wanted to build a map displaying all these portals around the world. This is how opendatainception.io came into existence, and we used the same widgets library to build it. Opendatainception.io would serve a different kind of user: more visual people who look for portals by zooming in and out and panning across the map.
Of course, we’ve kept adding open data portals to the dataset. It now gathers more than 2,700 portals.
Who used it?
During the first year, people shared the list and the map every single day on social networks. To date, these two resources have been shared more than 4,000 times, mostly on Twitter and LinkedIn. The prominence of these two networks suggests that the resources resonated especially with non-technical users. The list of 2,700+ open data portals around the world still drives a lot of our traffic. More than 80,000 unique visitors have viewed the map and the list since their creation.
Over the course of 2017, we monitored API usage of the dataset on which the map and the list are built. We wondered whether usage would follow the distribution of data portals around the world.
Regarding the number of portals listed, the Top 5 are:
There’s no surprise here. The United States has had a significant head start in the open data game. Its portals range from city-level ones to those of intergovernmental organizations, such as the UN.
When we analyzed the API usage data, we found four of the five countries from the Top 5 above: France, the United States, Spain and Canada.
But one of the best surprises we had was that opendatainception.io is used all over the world!
Each numbered cluster represents a precise location where several API calls were made (by searching for a country, or zooming in on the map, for instance). The distribution of API calls ranges from Cameroon to China, and even French Polynesia.
Data is not enough — focus on usage
If I had to pick one key takeaway from this project, it’d be: data is not enough, focus on usage.
We could have gathered these thousands of open data portals and shared them as a repository on GitHub. Techies would have been happy. We could have moved on to something else. But when you want to reach non-technical users, this is not enough.
The average Joe doesn't know what to do with a CSV. They want resources they can wrap their heads around in an instant. They want basic features such as searching for a country or selecting an area on a map to refine their search.
At the same time, sharing a list or map is not enough. What if someone with technical skills wants to access the dataset to build a better product on top of it? They need raw CSV or an API. That’s why we provided all of these options. We adapted to the demand.
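Serving both audiences from the same records does not have to be complicated. As a hedged illustration, the Python sketch below offers a simple country search for the list page and a raw CSV export for technical users. The column names mirror the fields mentioned earlier (name, organization, link, location) but are assumptions, as are the function names.

```python
import csv
import io

# Assumed column names, modeled on the fields described in the article.
FIELDS = ["name", "organization", "url", "country"]


def filter_by_country(portals, query):
    """Case-insensitive country search, the kind of basic feature a list page needs."""
    q = query.strip().lower()
    return [p for p in portals if q in p["country"].lower()]


def to_csv(portals):
    """Serialize the same records as raw CSV for technical users."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=FIELDS, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(portals)
    return buffer.getvalue()
```

Both functions read the same list of records, so the friendly view and the raw export cannot drift apart.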
What matters, in the end, is to give every potential user the possibility to embrace what you share with them regardless of the time it’ll cost you to do it properly.
Now you can check the list of 2,700+ open data portals around the world on opendatainception.io. Is your portal missing? Send word to Nicolas Terpolilli or the author of this article on Twitter. We’re pretty responsive there!