Predicting Household Lead Inspection Results in Minneapolis

Using Multiple Machine Learning Models

Minneapolis is facing a public health crisis as children are exposed to lead through the city’s aging housing stock. Although lead-based paint was banned for residential use in 1978, two-thirds of Minneapolis homes built before the ban may still contain lead. The City of Minneapolis is beginning proactive outreach to residents through its Lead Hazard Reduction Grant program. This analysis takes a data-driven approach to help the city identify in advance which residences are likely to contain hazardous lead paint.

The initial home inspections performed under this program serve as the base data for our analysis. The inspection data include just over 1,500 observations, roughly two-thirds of which found lead present. To structure our modeling approach, we limited the data to homes built before 1978 and hypothesized that investment in a home since then would decrease the likelihood that lead is present.
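The filtering step can be sketched as follows. This is a minimal illustration rather than the project's code: the `Inspection` record, its fields (`year_built`, `lead_present`), and the sample records are hypothetical stand-ins for the real inspection data.

```python
from dataclasses import dataclass


# Hypothetical record layout; the real inspection data may use different fields.
@dataclass
class Inspection:
    parcel_id: str
    year_built: int
    lead_present: bool


BAN_YEAR = 1978  # year of the ban cited in the text


def pre_ban_homes(records):
    """Keep only homes built before the ban year."""
    return [r for r in records if r.year_built < BAN_YEAR]


def lead_rate(records):
    """Share of inspected homes where lead was found (0.0 for an empty list)."""
    if not records:
        return 0.0
    return sum(r.lead_present for r in records) / len(records)


if __name__ == "__main__":
    # Made-up sample records for illustration only.
    sample = [
        Inspection("A", 1920, True),
        Inspection("B", 1955, True),
        Inspection("C", 1972, False),
        Inspection("D", 1990, False),
    ]
    kept = pre_ban_homes(sample)
    print(len(kept), round(lead_rate(kept), 2))  # 3 0.67
```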

We drew a range of independent variables from open data sources to examine how each relates to the presence or absence of lead in a home. We grouped these variables at three geographic levels: the parcel of an individual home, the census block group, and the census tract. The candidates included all available parcel data (such as square footage, building type, and home price), demographic data, and historical demographic data. We then ran a series of statistical tests to measure the correlation between these variables and the presence of lead.
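One way to picture the three-level structure is to attach block-group and tract attributes to each parcel through its geographic identifiers, then measure how each variable relates to the lead outcome. The sketch below assumes hypothetical keys (`block_group_id`, `tract_id`) and variable names, and uses the point-biserial correlation (Pearson's r against a 0/1 outcome) as one example of a statistical test; the project's actual tests are not specified.

```python
import math


def pearson(xs, ys):
    """Pearson's r; with a 0/1 outcome this equals the point-biserial correlation.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def build_features(parcels, block_groups, tracts):
    """Return one flat row per home: parcel fields plus its block-group and tract fields."""
    rows = []
    for p in parcels:
        row = dict(p)                                  # parcel level
        row.update(block_groups[p["block_group_id"]])  # block-group level
        row.update(tracts[p["tract_id"]])              # census-tract level
        rows.append(row)
    return rows


def correlations(rows, outcome, variables):
    """Correlation of each variable with the 0/1 outcome column."""
    y = [float(r[outcome]) for r in rows]
    return {v: pearson([float(r[v]) for r in rows], y) for v in variables}


if __name__ == "__main__":
    # Made-up example data; identifiers and variable names are hypothetical.
    parcels = [
        {"block_group_id": "bg1", "tract_id": "t1", "sqft": 1200, "lead": 1},
        {"block_group_id": "bg1", "tract_id": "t1", "sqft": 1500, "lead": 1},
        {"block_group_id": "bg2", "tract_id": "t2", "sqft": 2400, "lead": 0},
        {"block_group_id": "bg2", "tract_id": "t2", "sqft": 2600, "lead": 0},
    ]
    block_groups = {"bg1": {"median_income": 40000}, "bg2": {"median_income": 85000}}
    tracts = {"t1": {"pct_renters_1970": 0.6}, "t2": {"pct_renters_1970": 0.3}}
    rows = build_features(parcels, block_groups, tracts)
    for name, r in correlations(rows, "lead", ["sqft", "median_income", "pct_renters_1970"]).items():
        print(f"{name}: {r:+.2f}")
```

Flattening everything to one row per home keeps later modeling steps simple, at the cost of repeating block-group and tract values across every parcel they contain.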

Using several feature selection techniques, including stepwise selection, Boruta, and XGBoost, we built five predictive models to estimate the probability of lead in any given home. We chose the stepwise selection model as the final model; it had an accuracy of 82%, meaning its prediction of lead presence or absence was correct in 82% of cases.
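Stepwise selection adds or drops predictors one at a time based on a model-fit criterion. The sketch below shows only the forward direction, with AIC as the criterion and a plain gradient-ascent logistic regression written in the standard library. It is an assumption-laden illustration: the project's actual stepwise procedure, criterion, and software are not stated, and the feature names and synthetic data are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(rows, y, lr=0.5, epochs=1000):
    """Logistic regression with an intercept, fit by batch gradient ascent on the log-likelihood."""
    X = [[1.0] + list(r) for r in rows]
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = yi - sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj + lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def predict(w, row):
    """Predicted probability of lead for one row of selected feature values."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], row)))


def aic(data, y, features):
    """Fit on the given features and return (AIC, weights); AIC = 2k - 2 log L."""
    rows = [[d[f] for f in features] for d in data]
    w = fit_logistic(rows, y)
    ll = 0.0
    for r, yi in zip(rows, y):
        p = min(max(predict(w, r), 1e-12), 1 - 1e-12)
        ll += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return 2 * len(w) - 2 * ll, w


def forward_stepwise(data, y, candidates):
    """Start from the intercept-only model; add the feature that lowers AIC most, until none does."""
    selected = []
    best_aic, best_w = aic(data, y, selected)
    while True:
        trials = [(aic(data, y, selected + [f]), f) for f in candidates if f not in selected]
        if not trials:
            break
        (score, w), f = min(trials, key=lambda t: t[0][0])
        if score >= best_aic:
            break
        selected.append(f)
        best_aic, best_w = score, w
    return selected, best_w


def accuracy(w, data, y, features, threshold=0.5):
    """Share of homes whose thresholded prediction matches the inspection result."""
    hits = sum((predict(w, [d[f] for f in features]) >= threshold) == bool(yi)
               for d, yi in zip(data, y))
    return hits / len(y)


def make_synthetic(seed=0, n=100):
    """Hypothetical data: one informative standardized feature and one pure-noise feature."""
    rng = random.Random(seed)
    data, y = [], []
    for _ in range(n):
        age, noise = rng.gauss(0, 1), rng.gauss(0, 1)
        data.append({"building_age_z": age, "noise": noise})
        y.append(1 if rng.random() < sigmoid(2.5 * age) else 0)
    return data, y


if __name__ == "__main__":
    data, y = make_synthetic()
    selected, w = forward_stepwise(data, y, ["building_age_z", "noise"])
    print(selected, round(accuracy(w, data, y, selected), 2))
```

Accuracy is computed in-sample here for brevity; a held-out test set would give a fairer estimate of how often predictions match new inspections.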

The project has three highlights. First, this analysis served as a proof of concept that a classification model can predict the presence or absence of lead for every household location across a city. Second, we added a cost-benefit analysis to provide practical guidance for lead inspection activities. Finally, we published a web application that displays the results for lead inspectors. The City of Minneapolis has accepted the final results of this study and applied them to its lead inspection activities.
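A cost-benefit analysis of this kind can be illustrated by choosing the probability cutoff above which a home is flagged for inspection. The sketch assumes a hypothetical fixed cost per inspection and a hypothetical benefit per home where lead is correctly found; the project's actual cost figures and decision rule are not given here, and the predictions below are made up.

```python
def net_benefit(probs, labels, threshold, cost_per_inspection, benefit_per_find):
    """Net benefit of inspecting every home whose predicted probability is >= threshold."""
    inspected = [label for p, label in zip(probs, labels) if p >= threshold]
    return sum(inspected) * benefit_per_find - len(inspected) * cost_per_inspection


def best_threshold(probs, labels, cost_per_inspection, benefit_per_find, steps=101):
    """Scan cutoffs 0.00, 0.01, ..., 1.00 and return the first one with the highest net benefit."""
    cutoffs = [i / (steps - 1) for i in range(steps)]
    return max(cutoffs, key=lambda t: net_benefit(probs, labels, t,
                                                  cost_per_inspection, benefit_per_find))


if __name__ == "__main__":
    # Made-up predictions, outcomes, and dollar figures for illustration only.
    probs = [0.95, 0.9, 0.8, 0.6, 0.4, 0.2, 0.1]
    labels = [1, 1, 1, 0, 1, 0, 0]
    t = best_threshold(probs, labels, cost_per_inspection=300, benefit_per_find=1000)
    print(t, net_benefit(probs, labels, t, 300, 1000))  # 0.21 2500
```

Raising the assumed cost per inspection relative to the benefit tends to push the chosen cutoff upward, so fewer homes are flagged but each flagged home is more likely to contain lead.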

Teammates
Evan Cernea, Maureen McQuilkin
Categories
Predictive Modeling
Date
04/25/2018