What's This We Know?

Our mission is to present the information the U.S. government collects about every community. By publishing this data in a consistent, easy-to-understand manner, we seek to empower citizens to act on what's known.

In this first phase of development, we focused on a handful of nationwide data sets from six different agencies in the data.gov catalog. We picked data sets that each had a spatial component. All the data sets were converted to RDF and loaded into an RDF database that serves as the foundation for this website.

Our Vision

Our long-term vision for ThisWeKnow is to model the entire data.gov catalog and make it available to the public using Semantic Web standards as a large-scale online database. ThisWeKnow will provide citizens with a single destination where they can search and browse all the information the government collects. It will also provide other application developers with a powerful standards-based API for accessing the data.

Loading governmental databases into a single, flexible data store breaks down silos of information and facilitates inferences across multiple data sources. For example, inferences can be made by combining census demographic data from the Department of Commerce, factory information from the Environmental Protection Agency, information about employment from the Department of Labor, and so on. We can't even begin to imagine the discoveries that will become possible after all these data are loaded into an integrated repository.

Tim Berners-Lee describes a Semantic Web in which data distributed across the Internet is readable by people while also being available to, and manipulable by, software agents. To that end, in addition to building our Web pages for browsing and searching these data, we will expose all the data in the catalog to computers as RDF that can be queried with the SPARQL query language.
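As a rough illustration of what machine access through SPARQL might look like, the sketch below builds a SPARQL Protocol GET request in Python. The endpoint URL, the ex: vocabulary, and the property names are hypothetical placeholders and not the actual ThisWeKnow API; only the "query" parameter name comes from the SPARQL Protocol itself. The code builds the request URL without sending it.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint; the real service URL is not given on this page.
ENDPOINT = "http://example.org/sparql"


def build_query(city_name: str, limit: int = 10) -> str:
    """Return a SPARQL SELECT query for facilities located in a named city.

    The ex: vocabulary is invented for illustration only.
    """
    # Escape backslashes and double quotes so the name is a valid string literal.
    safe_name = city_name.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX ex: <http://example.org/vocab#>\n"
        "SELECT ?facility ?name WHERE {\n"
        "  ?facility ex:locatedIn ?city ;\n"
        "            ex:name ?name .\n"
        f'  ?city ex:name "{safe_name}" .\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request_url(query: str) -> str:
    """Encode the query as a SPARQL Protocol GET request ('query' parameter)."""
    return ENDPOINT + "?" + urlencode({"query": query})


url = build_request_url(build_query("Exampleville"))

# Round trip: decoding the URL recovers the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```

A client would send this URL with an HTTP GET and an Accept header naming the result format it wants; those details depend on the endpoint and are left out here.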

Implementation

To manage the complexity of presenting so much information, we developed the 'factoid' approach, where queries are presented as simple sentences about a city or town. While we also envision presenting the data spatially and graphically in the future, we felt that factoids are a helpful summary of these data, and a good entry point into the data browser.
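To make the factoid idea concrete, here is a minimal sketch of how a query result might be turned into a sentence. The Factoid class, its field names, and the sample values are all invented for illustration; they do not describe the site's actual code or any real data.

```python
from dataclasses import dataclass


@dataclass
class Factoid:
    """A query result summarized as one sentence (hypothetical structure)."""
    count: int      # number of matching entities returned by the query
    singular: str   # noun for a count of one, e.g. "factory"
    plural: str     # noun for any other count, e.g. "factories"
    place: str      # city or town the query was run for
    source: str     # agency that published the underlying data set

    def sentence(self) -> str:
        """Render the result as a plain-English sentence."""
        noun = self.singular if self.count == 1 else self.plural
        verb = "is" if self.count == 1 else "are"
        return (
            f"There {verb} {self.count:,} {noun} in {self.place}, "
            f"according to the {self.source}."
        )


# Made-up values, used only to show the output shape.
example = Factoid(3, "factory", "factories", "Exampleville",
                  "Environmental Protection Agency")
text = example.sentence()
```

Keeping the count, the noun, and the place as separate fields means the same record can later drive a table or a map as well as the sentence.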

By clicking on a factoid, citizens bring up tables of data, which in turn are linked to detailed information about the entities. We envision citizens who use the application being able to generate their own factoids, which could then be shared and voted on by the community.

RDF triples (like the subject, verb and object of a sentence) express the data loaded from data.gov. In RDF, particular things (people, companies or any other entity in the data.gov collection) have properties (such as "is the parent company of," "is the author of") with certain values (another person, another Web page). An advantage of storing the data.gov information using RDF is that the database and applications can readily expand as new data sources are added to the catalog, without requiring new code or revisions to existing code. In a relational database, by contrast, the connections between information must be defined in advance, revisions are necessary as new databases are loaded, and the data model becomes extremely large and unwieldy if thousands of databases are modeled in a single schema.
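The triple model can be sketched in a few lines of Python. This is a toy in-memory store, not the RDF database behind the site, and the ex: identifiers and values are made up. It shows the point made above: a later data source can add a new property without any change to a schema.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class TripleStore:
    """Toy in-memory triple store; None acts as a wildcard when matching."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add((subject, predicate, obj))

    def match(self, subject: Optional[str] = None,
              predicate: Optional[str] = None,
              obj: Optional[str] = None) -> List[Triple]:
        """Return all triples that fit the pattern, in sorted order."""
        return sorted(
            t for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


store = TripleStore()

# Triples from a first (hypothetical) data source.
store.add("ex:AcmeHoldings", "ex:isParentCompanyOf", "ex:AcmePlant")
store.add("ex:AcmePlant", "ex:locatedIn", "ex:Exampleville")

# A second source introduces a property the store has never seen.
# No table or column has to be created first.
store.add("ex:AcmePlant", "ex:employeeCount", "250")

# Everything known about one entity, regardless of which source supplied it.
facts = store.match(subject="ex:AcmePlant")
```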

We have developed an ontology that defines the relationships between fields in the data.gov databases we have loaded so far. The ontology needs to grow to include all databases in the data.gov catalog. It would then depict the relationships among information collected across the federal government. We recommend that the collaborative development of a data.gov ontology become a major component of future work on this initiative.
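One simple way to picture the role of such an ontology is as a mapping from each data set's own field names to shared properties. The sketch below assumes that reading; the data set names, column names, and ex: properties are all hypothetical and are not taken from the actual ThisWeKnow ontology.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from (data set, column) to a shared ontology property.
# Two differently named columns map to the same property, ex:city.
FIELD_TO_PROPERTY: Dict[Tuple[str, str], str] = {
    ("facilities", "FAC_CITY"): "ex:city",
    ("facilities", "FAC_NAME"): "ex:name",
    ("census", "PLACE_NAME"): "ex:city",
    ("census", "TOTAL_POP"): "ex:population",
}


def row_to_triples(dataset: str, row_id: str,
                   row: Dict[str, object]) -> List[Tuple[str, str, str]]:
    """Convert one source row into triples using the shared properties.

    Columns with no mapping are skipped rather than guessed at.
    """
    subject = f"ex:{dataset}/{row_id}"
    triples = []
    for column, value in row.items():
        prop = FIELD_TO_PROPERTY.get((dataset, column))
        if prop is not None:
            triples.append((subject, prop, str(value)))
    return triples


# Made-up rows from two data sets that describe the same town.
facility = row_to_triples("facilities", "17",
                          {"FAC_NAME": "Acme Plant", "FAC_CITY": "Exampleville"})
place = row_to_triples("census", "42",
                       {"PLACE_NAME": "Exampleville", "TOTAL_POP": 1200,
                        "INTERNAL_CODE": "zz"})

# Because both rows use ex:city, they can be linked on its value.
shared_cities = ({o for _, p, o in facility if p == "ex:city"}
                 & {o for _, p, o in place if p == "ex:city"})
```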

Sources of Data

  • Datasets appearing on Data.gov (sourced throughout the application)
  • GeoNames.org: geographical information
  • GovTrack.us: legislative data
  • Joshua Tauberer: an RDF version of the 2000 U.S. Census

Technologies

ThisWeKnow is written in Ruby on Rails. It communicates via SPARQL to an RDF database. The source code is available under an MIT license at github.com/btucker/thisweknow.

About the Developers

The developers of this application are a consortium of a Web application development and data analysis firm (GreenRiver.org), a Web design studio (Sway Design), and a Semantic Web database company (Intellidimension). This collaboration allowed us to take on the project of transforming the current data.gov site into a Semantic Web application. The potential for this project is very exciting, and we hope you enjoy our phase one application.

