Mill: the unitedstates project: A Modern Approach to Open Data

Eric Mill of the Sunlight Foundation has posted A Modern Approach to Open Data on the Sunlight Foundation Blog.

Here are excerpts from the post:

Last year, a group of us who work daily with open government data — Josh Tauberer of GovTrack.us, Derek Willis at The New York Times, and myself — decided to stop each building the same basic tools over and over, and start building a foundation we could share.

We set up a small home at github.com/unitedstates [a.k.a. the unitedstates project], and kicked it off with a couple of projects to gather data on the people and work of Congress. Using a mix of automation and curation, they gather basic information from all over the government (THOMAS.gov, the House and Senate, the Congressional Bioguide, GPO’s FDSys, and others) that everyone needs to report, analyze, or build nearly anything to do with Congress.

Once we centralized this work and started maintaining it publicly, we began getting contributions nearly immediately. […]

This is an unusual, and occasionally chaotic, model for an open data project. github.com/unitedstates is a neutral space; GitHub’s permissions system allows many of us to share the keys, so no one person or institution controls it. What this means is that while we all benefit from each other’s work, no one is dependent or “downstream” from anyone else. It’s a shared commons in the public domain.

There are a few principles that have helped make the unitedstates project something that’s worth our time:

  • We collaborate in public. When we have questions or ideas, we bring them up and talk them out using GitHub’s issue tracker. Questions get answers very quickly, unexpected participants hop in, and (as with other Q&A systems like Stack Overflow and Quora) discussions themselves become valuable long-term artifacts. GitHub is extremely well designed for this.
  • Our congressional tools can be used in a standalone, language-agnostic way, with no required configuration. You just need a command line, and data gets placed on disk in bulk. Nothing depends on a database.
  • We started using our new data in a live product right away. Instead of waiting for something that felt “1.0”, Sunlight and GovTrack replaced their pre-existing collection infrastructure with our new tools as soon as they were functional. Because of this, we were forced to promptly fix bugs and fill gaps, and create a stable platform to iterate on. This guarantees momentum.
  • No brand names. Our organization’s name, “unitedstates”, is harder to describe to someone in an elevator, but it makes it clearer to volunteers that they’re contributing to the public domain and the common good. Repository names project authority by being clear and descriptive, rather than catchy. […]
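To illustrate the "data on disk, no database" principle in the list above, here is a minimal sketch of how a downstream user might consume bulk files with nothing but a standard library. It is not taken from Mill's post or from the project's code. The directory layout, the `data.json` file name, and the `bill_type` field are assumptions made for illustration only, and the sketch further assumes that each file holds a single JSON object.

```python
import json
from pathlib import Path


def load_records(data_dir):
    """Yield (path, parsed JSON) for every .json file under data_dir.

    Assumption: the bulk data is a tree of JSON files on disk. The real
    layout used by any given tool may differ.
    """
    for path in sorted(Path(data_dir).rglob("*.json")):
        with path.open(encoding="utf-8") as handle:
            yield path, json.load(handle)


def count_by_field(data_dir, field):
    """Count records by the value of one top-level field.

    Assumption: each file holds one JSON object. Records that lack the
    field are counted under "unknown".
    """
    counts = {}
    for _, record in load_records(data_dir):
        key = record.get(field, "unknown")
        counts[key] = counts.get(key, 0) + 1
    return counts
```

A call such as `count_by_field("data", "bill_type")` (both names hypothetical) would tally the files under `data/` by that field. Because the input is plain files, the same few lines could be written in any language, which is the point of the language-agnostic design.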

For more details, please see the complete post.

HT @derekwillis
