The National Aquarium is the state coordinator for all of Ocean Conservancy's International Coastal Cleanup (ICC) events in Maryland. The information that volunteers record on their debris data cards is compiled and published annually in the Ocean Trash Index, an item-by-item, location-by-location accounting of marine debris picked up by volunteers on just one day. The main task of this project is to use this dataset to predict likely high-risk marine debris locations in the mid-Atlantic region, to better inform and target future reduction efforts.
The cleanup records are biased toward urban development and population, so finding the best predictive modeling approach was the main difficulty of this project. I performed data cleaning, feature engineering, and exploratory analysis, then tried multiple modeling approaches to predict high-risk marine debris locations: decision tree, random forest, gradient boosting, extreme gradient boosting, and geostatistical regression, including Kriging interpolation.
As the final result of this project, I aggregated the geostatistical regression results to the municipality level and created a map of the areas at high risk of marine debris to better inform future cleanup efforts.
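
The aggregation step can be illustrated with a minimal sketch. It assumes each point-level prediction is already tagged with a municipality name and that the mean is the aggregation statistic; the summary does not say which statistic was actually used. The municipality names, scores, and the helper functions `aggregate_by_municipality` and `rank_high_risk` are all hypothetical and are not taken from the project.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical point-level predictions: (municipality, predicted debris score).
# These values are made up for illustration, not project results.
predictions = [
    ("Town A", 12.0),
    ("Town A", 18.0),
    ("Town B", 40.0),
    ("Town B", 50.0),
    ("Town C", 5.0),
]


def aggregate_by_municipality(points):
    """Average the predicted scores of all points in each municipality."""
    grouped = defaultdict(list)
    for name, score in points:
        grouped[name].append(score)
    return {name: mean(scores) for name, scores in grouped.items()}


def rank_high_risk(averages, top_n=2):
    """Return the top_n municipality names, highest average score first."""
    return sorted(averages, key=averages.get, reverse=True)[:top_n]


municipality_scores = aggregate_by_municipality(predictions)
high_risk = rank_high_risk(municipality_scores)

print(municipality_scores)
print(high_risk)
```

In a real workflow the municipality label would typically come from a spatial join between prediction locations and municipal boundary polygons, and the resulting per-municipality scores would drive the map colors.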