Data Preprocessing for Predictive Models

Greetings, I am Connor Moen! I’m a rising sophomore at Northwestern University studying computer science and environmental engineering. This summer I am working under Dr. Stefan Wild at Argonne National Laboratory, where I am helping him develop accurate flood prediction models for the City of Chicago. The goal of these models is to analyze weather conditions and soil moisture on a block-by-block basis (or, for the time being, wherever sensors are installed) and then determine whether flooding will occur. These predictions can be used to warn homeowners in flood-prone areas ahead of time, thereby minimizing property damage and disruption after heavy storms.
I have spent much of the summer collecting vast amounts of data from the Chicago Data Portal and UChicago’s Thoreau Sensor Network, preprocessing it using the AWK programming language, and working to visualize it in MATLAB. Below is a MATLAB plot showing the Volumetric Water Content for all sensors in the Thoreau Network over the past few months.

The next phase of the project will involve qualitatively describing the trends we see in our data (for example, could the uncharacteristic behavior seen in a number of sensors after mid-June be caused by an outside factor such as sprinklers?) and then writing, testing, and refining the predictive models. Personally, I am most excited to dive into these predictive models; I am fascinated by the idea of combining environmental sensing with machine learning to directly help those living in my neighboring city.
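
To give a rough sense of what the preprocessing step described above can look like, here is a minimal Python sketch. It is not the AWK code used in the project, only an illustration: the column names (`sensor_id`, `timestamp`, `vwc`), the sample readings, and the valid range of 0 to 1 are all assumptions made for this example. It drops missing or physically implausible Volumetric Water Content readings and averages what remains per sensor per day.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample export; the real sensor data may use different
# column names, units, and timestamp formats.
SAMPLE = """sensor_id,timestamp,vwc
A,2000-06-01T00:00:00,0.25
A,2000-06-01T12:00:00,0.35
A,2000-06-02T00:00:00,
B,2000-06-01T06:00:00,0.40
B,2000-06-01T18:00:00,1.70
B,2000-06-02T06:00:00,0.50
"""


def daily_mean_vwc(text, lo=0.0, hi=1.0):
    """Return {(sensor_id, 'YYYY-MM-DD'): mean VWC} for valid readings.

    Readings that are blank, non-numeric, or outside [lo, hi] are skipped.
    The default bounds are an assumption for this sketch.
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum, count]
    for row in csv.DictReader(io.StringIO(text)):
        raw = (row.get("vwc") or "").strip()
        try:
            vwc = float(raw)
        except ValueError:
            continue  # missing or garbled reading
        if not (lo <= vwc <= hi):
            continue  # implausible value (also filters NaN)
        day = row["timestamp"][:10]  # date part of an ISO-style timestamp
        acc = totals[(row["sensor_id"], day)]
        acc[0] += vwc
        acc[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


if __name__ == "__main__":
    for (sensor, day), mean in sorted(daily_mean_vwc(SAMPLE).items()):
        print(sensor, day, round(mean, 3))
```

Cleaning before averaging matters here: a single out-of-range reading, like the 1.70 in the sample, would otherwise skew a daily mean and could be mistaken for a real moisture spike.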