Pump It Up: Data Mining the Water Table

Introduction

This is a comprehensive data science project encompassing exploratory data analysis (EDA), machine learning models, and data visualization. It is based on the DrivenData competition of the same name.

Method

A colleague and I worked together on this project. The objective was to predict the condition of water pumps in Tanzania from a range of features. We tried a variety of preprocessing and ML techniques to find the best approach. Our best-performing model was a VotingClassifier that ensembled a BaggingClassifier, an XGBoost model, a HistGradientBoosting model, and a CatBoost model. It achieved 81% accuracy on the test set.
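To illustrate how a voting ensemble combines its members, here is a minimal sketch of hard (majority) voting in plain Python. It is not the code from our notebook: the function names `hard_vote` and `accuracy`, the tie-breaking rule, and the predictions are all hypothetical, and whether the final model used hard or soft voting is not stated here. The three labels are the competition's target classes.

```python
from collections import Counter


def hard_vote(predictions_per_model):
    """Combine per-model label predictions by majority vote.

    predictions_per_model: list of equal-length lists, one list per model.
    Ties go to the label that sorts first, so the result is deterministic
    (an assumption made for this sketch).
    """
    n_samples = len(predictions_per_model[0])
    if any(len(p) != n_samples for p in predictions_per_model):
        raise ValueError("all models must predict the same number of samples")

    combined = []
    # zip(*...) regroups the predictions so each tuple holds one sample's votes.
    for votes in zip(*predictions_per_model):
        counts = Counter(votes)
        # max() returns the first maximal item, so iterating over the sorted
        # labels resolves ties in favour of the label that sorts first.
        combined.append(max(sorted(counts), key=counts.get))
    return combined


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up predictions from four models on five pumps (illustrative only).
model_preds = [
    ["functional", "non functional", "functional", "functional needs repair", "functional"],
    ["functional", "non functional", "non functional", "functional", "functional"],
    ["functional", "functional", "non functional", "functional needs repair", "non functional"],
    ["non functional", "non functional", "non functional", "functional", "functional"],
]

ensemble_preds = hard_vote(model_preds)
print(ensemble_preds)
```

In scikit-learn, `VotingClassifier(estimators=[...], voting="hard")` applies the same idea to fitted estimators, while `voting="soft"` averages predicted class probabilities instead.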

To read more about our process, see the Submission Notebook in the GitHub repository.

Key Technologies

  • Python
  • Pandas
  • Weights and Biases