How I built FOIAshare

December 12, 2011

December 2012 Update

FOIAshare has been discontinued since I do not have the time to update the datasets and maintain the service.

The source code can be found on GitHub.

Please use the City of Chicago Data Portal to find FOIA requests.


This all started back in June, when I read that Gabe Klein would be the new commissioner for the Chicago Department of Transportation. I've been a year-round bicyclist in the city for a few years now, and it was great to read about all the improvements that were planned for the city. That led me to finally set up my Twitter account to follow city leadership and fellow developers. Shortly after, I heard about the Apps for Metro Chicago, Illinois competition. The commitment to open data and the potential for innovation in the city got me excited about the competition, and I wanted to participate. I went to hacksalons and Open Government Meetups, and got to interact with a lot of smart people. At the time, I wasn't able to put together an entry for the transportation or community rounds, but at the start of November, I made the time and a commitment to build an app for the Grand Challenge Round.

I started off by investigating the City of Chicago Data Portal, along with mapping, visualization, and data mining tools. I found the FOIA request logs interesting since they were regularly updated, had a substantial number of records, and were uniquely made available in bulk by Chicago. I initially wanted to build a scraper to fetch the data, but Christopher Groskopf had already built a great FOIA Firehose scraper on ScraperWiki to pull in all the request logs from the data portal. It was broken at the time, so I fixed the errors, made a few adjustments, and started refining the data. Eventually, I wanted to make bigger changes to the existing scraper, but I didn't want to completely change it from its original intent. So, I created a local scraper using the ScraperWiki development tools and a SQLite database. I got the scraper to fetch data from as many FOIA request logs as I could find. A lot of work went into scraping and cleaning the data, and that is something that can still be improved. Currently, the data is scraped with Python, refined with Google Refine, and imported using Ruby. This could be updated to automatically fetch, refine, and import the data using Ruby. I left that task until after the deadline, since I could either spend a week getting the application to import automatically and near perfectly, or spend that time to actually, you know, build the app. Luckily, I picked the latter.
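To give a rough idea of what the local scrape-and-store step looks like, here is a minimal sketch in Python with SQLite. It is not the actual FOIAshare scraper: the field names, the `requests` table, and the helper functions are all hypothetical, and it assumes the rows have already been fetched from a request log as dictionaries. It only shows light cleanup and skipping rows that were already saved.

```python
import hashlib
import sqlite3

# Hypothetical field names; the real request logs vary by department.
FIELDS = ("department", "requestor", "organization", "description", "date_received")


def normalize(row):
    """Trim each field and collapse runs of whitespace; missing fields become ''."""
    cleaned = {}
    for field in FIELDS:
        value = row.get(field) or ""
        cleaned[field] = " ".join(value.split())
    return cleaned


def record_key(row):
    """Build a stable key from the cleaned fields so re-scraped rows can be skipped."""
    joined = "\x1f".join(row[field] for field in FIELDS)
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


def save_requests(conn, rows):
    """Insert cleaned rows into SQLite and return how many were new."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS requests (
               record_key TEXT PRIMARY KEY,
               department TEXT,
               requestor TEXT,
               organization TEXT,
               description TEXT,
               date_received TEXT
           )"""
    )
    before = conn.total_changes
    for row in rows:
        cleaned = normalize(row)
        values = [record_key(cleaned)] + [cleaned[field] for field in FIELDS]
        # OR IGNORE skips rows whose key is already in the table.
        conn.execute("INSERT OR IGNORE INTO requests VALUES (?, ?, ?, ?, ?, ?)", values)
    conn.commit()
    return conn.total_changes - before
```

Calling `save_requests(sqlite3.connect("foia.sqlite"), rows)` twice with the same rows should add nothing the second time, which makes it safe to re-run the scraper against logs that are updated regularly. The database file name here is made up as well.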

Now that I had the data, it was time to start building the app. My first git commit, a default Rails 3.1 app with PostgreSQL support, was on November 19th, 2011, two weeks before the competition deadline. Shortly after, I added haml, will_paginate, Twitter Bootstrap with bootstrap-sass, and friendly_id. Then, I started laying out the URLs, scaffolds, and visualizations. During those two weeks, I worked the majority of my waking hours, and I actually had a substantial part of the site built in a week. Every day I would lay out what I could do in the time before the deadline. I focused on what was possible: clarifying the goals, doing the most with the least amount of code, getting it to work now, and ruthlessly cutting features that could compromise delivery.

Originally, I planned on deploying to a free Heroku instance, but the database was bigger than their 5 MB limit. I already had a Linode sitting around with another Rails app on it, so I decided I would be better off deploying to that machine. That process took much more time than I expected. In retrospect, I may have been better off paying for the larger shared database at Heroku, but I didn't want to pay for a service that I already had. After analyzing a month's worth of traffic for FOIAshare, I will be able to make a more informed decision about hosting.

Building FOIAshare has been an exciting journey. There are still a lot of possibilities for its future and I look forward to seeing the response the application gets through the competition. Regardless of the outcome, I know that I have learned a lot more about government and software development. Overall, I am proud that I was able to take on and deliver such an ambitious personal project.

 
