This algorithm converts 9-digit U.S. ZIP codes (ZIP+4 codes) into their congressional districts and geographic coordinates (latitude and longitude). The benchmark throughput is 50,000 ZIP codes per second. Download the plugin here.
Pros & Cons
- The algorithm uses a local mapping database stored in a 25 MB file. Bundling it would increase the application size dramatically, so it is built as an external plugin.
- On its first execution in an application window, the plugin requires 5 seconds to load the database. Subsequent executions in the same window skip this pre-loading phase.
- Because some 5-digit ZIP codes span multiple districts, 9-digit ZIP codes are required for the conversion. A warning message is printed to notify the user if a given 5-digit ZIP code spans multiple districts.
- Congressional districts may change with each election, so the database needs to be maintained and updated accordingly.
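The load-once behaviour described in the list above can be illustrated with a small lazy holder. This is only a sketch under assumptions: `LazyDatabase`, its loader and the placeholder entry are hypothetical names and data, not the plugin's actual classes or contents.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical holder that loads a lookup table on first use and then reuses it. */
final class LazyDatabase {
    private final Supplier<Map<String, String>> loader;
    private Map<String, String> cached; // stays null until the first call to get()
    private int loadCount = 0;

    LazyDatabase(Supplier<Map<String, String>> loader) {
        this.loader = loader;
    }

    /** Runs the (slow) loader on the first call only; later calls return the cached table. */
    synchronized Map<String, String> get() {
        if (cached == null) {
            cached = loader.get();
            loadCount++;
        }
        return cached;
    }

    synchronized int getLoadCount() {
        return loadCount;
    }

    public static void main(String[] args) {
        LazyDatabase db = new LazyDatabase(() -> {
            // Placeholder data (assumption); a real loader would read the mapping file here.
            Map<String, String> table = new HashMap<>();
            table.put("00000", "XX-01");
            return table;
        });
        db.get(); // first call pays the loading cost
        db.get(); // second call reuses the cached table
        System.out.println("loads performed: " + db.getLoadCount());
    }
}
```

Keeping the holder alive for the lifetime of the application window is what would let later runs skip the loading phase.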
This plugin only supports U.S. ZIP codes. It converts 9-digit ZIP codes to the congressional districts they belong to. It is an external plugin because the data size is so large. The dataset is based on the 2008 election.
A note for developers: please take a look at the ZIP code wiki here to better understand how the U.S. ZIP+4 code system works. The first 5 digits of a ZIP code are called the uzip. The last 4 digits of a ZIP+4 code are the Post Office box number; see here for details.
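To make the split between the uzip and the post box number concrete, here is a minimal parsing sketch. `ZipPlusFour` is a hypothetical class written for illustration (it is not the plugin's USZIPCode class), and the accepted input formats (an optional hyphen or space before the last 4 digits) are an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical value class that splits a ZIP or ZIP+4 string into uzip and post box. */
final class ZipPlusFour {
    // 5 digits, optionally followed by a hyphen or space and 4 more digits (assumed formats).
    private static final Pattern FORMAT =
            Pattern.compile("^(\\d{5})(?:[-\\s]?(\\d{4}))?$");

    final String uzip;     // first 5 digits
    final String postBox;  // last 4 digits, or null for a plain 5-digit ZIP code

    private ZipPlusFour(String uzip, String postBox) {
        this.uzip = uzip;
        this.postBox = postBox;
    }

    /** Parses the raw text, rejecting empty or malformed entries. */
    static ZipPlusFour parse(String raw) {
        if (raw == null) {
            throw new IllegalArgumentException("ZIP code is missing");
        }
        Matcher m = FORMAT.matcher(raw.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a valid U.S. ZIP code: " + raw);
        }
        return new ZipPlusFour(m.group(1), m.group(2));
    }

    boolean hasPostBox() {
        return postBox != null;
    }

    public static void main(String[] args) {
        ZipPlusFour zip = ZipPlusFour.parse("12345-6789");
        System.out.println(zip.uzip + " / " + zip.postBox);
    }
}
```

Strings are used instead of integers so that leading zeros in either part are preserved.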
The challenge of the implementation is designing the mapping model used to look up congressional districts from ZIP+4 codes. This meant understanding the metadata file (provided by GovTrack) and creating a mapping model that has constant (O(1)) lookup time and is easy to maintain. The implementation details are documented in the source code.
The following provides a high-level view of the design.
- The algorithm follows the Model-View-Controller pattern.
- Model - The core of this implementation, formed by ZipCodeToDistrictMap, PostBoxToDistrictMap and DistrictRegistry.
- ZipCodeToDistrictMap holds a map of uzip to USDistrict and a map of uzip to PostBoxToDistrictMap.
- PostBoxToDistrictMap holds a map of postBox to USDistrict and a map of wildcard to USDistrict.
- DistrictRegistry contains a duplicate-free set of USDistrict objects. It holds the information for all U.S. congressional districts.
- USDistrict contains a district label and geolocation. The class is imported from the edu.iu.scipolicy.model.geocode package.
- View - ZipToDistrictAlgorithmFactory contains all the view setup, including the title, windows and options.
- Controller - ZipToDistrictAlgorithm prepares the model, parses the input ZIP codes into USZIPCode objects, performs the district lookup, handles exceptions and saves the result to a CSV file.
- The lookup is performed through ZipCodeToDistrictMap. If no direct match from uzip to USDistrict is found, a second lookup is performed through the PostBoxToDistrictMap held for that uzip. It returns a USDistrict on success and throws ZipToDistrictException if no match is found.
- Dependency: dist2geolocation.txt and zip4dist-prefix.txt
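The two-level lookup in the design above can be sketched with nested hash maps. All class names below are hypothetical stand-ins for the plugin's classes, the sample labels and coordinates are placeholders, and treating a wildcard as a leading prefix of the 4-digit post box is an assumption; the source code documents the actual wildcard rules.

```java
import java.util.HashMap;
import java.util.Map;

/** Stand-in for USDistrict: a label plus coordinates. */
final class DistrictSketch {
    final String label;
    final double latitude;
    final double longitude;

    DistrictSketch(String label, double latitude, double longitude) {
        this.label = label;
        this.latitude = latitude;
        this.longitude = longitude;
    }
}

/** Stand-in for ZipToDistrictException. */
final class DistrictNotFoundException extends Exception {
    DistrictNotFoundException(String message) {
        super(message);
    }
}

/** Second level: exact post box, then post box prefix (assumed wildcard form). */
final class PostBoxLookupSketch {
    private final Map<String, DistrictSketch> exact = new HashMap<>();
    private final Map<String, DistrictSketch> byPrefix = new HashMap<>();

    void putExact(String postBox, DistrictSketch d) { exact.put(postBox, d); }
    void putPrefix(String prefix, DistrictSketch d) { byPrefix.put(prefix, d); }

    DistrictSketch find(String postBox) {
        DistrictSketch d = exact.get(postBox);
        // Longest prefix first; at most three extra hash lookups, so still O(1).
        for (int len = postBox.length() - 1; d == null && len >= 1; len--) {
            d = byPrefix.get(postBox.substring(0, len));
        }
        return d;
    }
}

/** First level: uzip to district, or uzip to a second-level post box lookup. */
final class ZipLookupSketch {
    private final Map<String, DistrictSketch> direct = new HashMap<>();
    private final Map<String, PostBoxLookupSketch> split = new HashMap<>();

    void putDirect(String uzip, DistrictSketch d) { direct.put(uzip, d); }

    PostBoxLookupSketch splitFor(String uzip) {
        return split.computeIfAbsent(uzip, k -> new PostBoxLookupSketch());
    }

    DistrictSketch lookup(String uzip, String postBox) throws DistrictNotFoundException {
        DistrictSketch d = direct.get(uzip);
        if (d != null) {
            return d; // the whole 5-digit ZIP code lies in one district
        }
        PostBoxLookupSketch second = split.get(uzip);
        if (second != null && postBox != null) {
            d = second.find(postBox);
        }
        if (d == null) {
            throw new DistrictNotFoundException("No district found for " + uzip);
        }
        return d;
    }

    public static void main(String[] args) throws DistrictNotFoundException {
        ZipLookupSketch map = new ZipLookupSketch();
        map.putDirect("11111", new DistrictSketch("XX-01", 1.0, 2.0));
        map.splitFor("22222").putPrefix("12", new DistrictSketch("XX-02", 3.0, 4.0));
        System.out.println(map.lookup("11111", null).label);
        System.out.println(map.lookup("22222", "1299").label);
    }
}
```

Sharing one district object per label across both maps, as DistrictRegistry is described as doing, would keep memory use down when many ZIP codes point to the same district.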
The output table contains all columns of the input table plus three new columns (congressional district, latitude and longitude).
Here is a four-step guide to using the plugin:
- Load your input data file containing the 9-digit U.S. ZIP codes to be geocoded.
- Select Analysis > Geospatial > Congressional District Geocoder from the menu bar. A window will pop up.
- Choose the place name column that represents the ZIP code field in your data file.
- Press the OK button to start the geocoding.
5-digit ZIP codes with multiple congressional districts, empty entries and invalid ZIP codes that fail to be geocoded are listed in warning messages on the console.
The output of this algorithm is the original input table with 3 additional columns (congressional district, latitude and longitude). ZIP codes that failed to be geocoded will have blank entries.
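The shape of an output row can be sketched as follows. `OutputRowSketch` and `appendGeocode` are hypothetical names, the sample values are placeholders, and CSV quoting and escaping are left out of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that builds one output row from one input row. */
final class OutputRowSketch {
    /**
     * Returns a copy of the input row with district, latitude and longitude appended.
     * Null values (a failed geocoding) become blank cells.
     */
    static List<String> appendGeocode(List<String> inputRow, String district,
                                      Double latitude, Double longitude) {
        List<String> row = new ArrayList<>(inputRow);
        row.add(district == null ? "" : district);
        row.add(latitude == null ? "" : latitude.toString());
        row.add(longitude == null ? "" : longitude.toString());
        return row;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("Some name", "12345-6789");
        // Placeholder district label and coordinates (assumptions).
        System.out.println(appendGeocode(input, "XX-01", 1.5, -2.5));
        // A ZIP code that could not be geocoded keeps its row but gets blank cells.
        System.out.println(appendGeocode(input, null, null, null));
    }
}
```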
Our benchmark is 50,000 ZIP codes per second.
Geomap the congressional districts
- First, you may want to aggregate your data by congressional district. To do this, follow the user hints here.
- You are then ready to plot your aggregated result on a geomap. It is recommended to plot the congressional district results on a country map because some U.S. districts are located outside the American continent. To geomap the congressional districts, follow the user hints here.
- Source Code: Link
- External Package: Link
- Home Page: Link
- GovTrack: Link
- WATCHDOG.NET: Link
- ZIP code's wiki: Link
Comments on Data Source (updated December 2013)
The data used to power this plugin was originally sourced from the GovTrack.us website. As of the 113th Congress, GovTrack no longer supports or updates the district-to-geolocation or ZIP-code-to-district data. We are currently in the process of updating the data that this plugin uses, and have found some updated data here:
- US Census Congressional District data: Link
The data links ZIP codes to congressional districts, but does not provide updated geocoordinates for the districts. We are in the process of locating updated geocoordinate data and will update this page accordingly.
The geocoding algorithm was authored, implemented, integrated and documented by Chin Hua Kong. Many thanks to the Sprint team for their advice and suggestions. Many thanks to GovTrack for providing the ZIP-to-district mapping data and the districts' geolocation information. Thanks to Carl Malamud and Aaron Swartz, who made the data available on WATCHDOG.NET for GovTrack.
It was interesting to work on this algorithm starting from zero knowledge of ZIP codes and congressional districts. A lot of exploration and analysis was done during development, which made the design and preparation period in the Sprint longer than expected. There are many mapping databases available for sale. However, we were lucky to find GovTrack, which provides free data and a web service for the mapping. Many revisions and improvements were made during development, which made the plugin better and more accurate. It was fun and worth it for the knowledge I gained. I now have a better understanding of the ZIP code system and the concept of congressional districts.