Data Collection and Preprocessing Pipeline

Added by Egor Popenov about 3 hours ago

Hello everyone,
Before we start training any ML models, we need to define our data pipeline.
I suggest we focus on the following:
Data Cleaning: Handling missing values in our MIS database.
Feature Engineering: Which variables will have the most impact on our model performance?
Scaling: Choosing between StandardScaler and MinMaxScaler for our inputs.
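Here is a minimal sketch of the cleaning and scaling steps, assuming we use scikit-learn. The two numeric columns in `X` are hypothetical stand-ins for fields from the MIS database, and `make_preprocessor` is a hypothetical helper name. Missing values are filled with the column median via `SimpleImputer`. Then either `StandardScaler` or `MinMaxScaler` is applied inside one `Pipeline`, so switching scalers only means changing one argument.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical numeric columns (e.g. age, income) standing in for MIS fields;
# np.nan marks a missing value.
X = np.array([
    [25.0, 50_000.0],
    [np.nan, 62_000.0],
    [40.0, np.nan],
    [33.0, 58_000.0],
])

def make_preprocessor(scaler: str = "standard") -> Pipeline:
    """Median-impute missing values, then scale with the chosen scaler."""
    scalers = {"standard": StandardScaler(), "minmax": MinMaxScaler()}
    if scaler not in scalers:
        raise ValueError(f"unknown scaler: {scaler!r}")
    return Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # fills np.nan per column
        ("scale", scalers[scaler]),
    ])

# Each column ends up with mean 0 and unit variance.
standard = make_preprocessor("standard").fit_transform(X)
# Each column ends up in the range [0, 1].
minmax = make_preprocessor("minmax").fit_transform(X)
```

Whichever scaler we pick, the pipeline should be fit on the training split only and then reused with `transform` on validation and test data, so that statistics from held-out rows don't leak into training. `MinMaxScaler` is more sensitive to outliers than `StandardScaler`, which may matter depending on how clean the MIS data turns out to be.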
I’ve found a great GitHub resource for ML pipelines that we can adapt: https://github.com
Let’s discuss how we want to handle the ETL (Extract, Transform, Load) process.
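As a starting point for that discussion, here is a minimal ETL sketch using only Python's standard library. It assumes the MIS data can be read through SQLite. The table names `raw_records` and `clean_records` and the `age`/`income` columns are hypothetical placeholders, not our actual schema. Rows missing the (hypothetical) target `income` are dropped. Missing `age` values are then filled with the median of the remaining ages, and the result is written to a separate table.

```python
import sqlite3
from statistics import median

def extract(conn):
    """Extract: read raw rows from the hypothetical source table."""
    return conn.execute("SELECT id, age, income FROM raw_records").fetchall()

def transform(rows):
    """Transform: drop rows without income, then median-impute missing ages."""
    kept = [r for r in rows if r[2] is not None]
    observed_ages = [age for _, age, _ in kept if age is not None]
    # If no ages were observed at all, leave missing ages as None.
    fill_age = median(observed_ages) if observed_ages else None
    return [(rid, fill_age if age is None else age, income)
            for rid, age, income in kept]

def load(conn, rows):
    """Load: write cleaned rows into a separate target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS clean_records "
        "(id INTEGER PRIMARY KEY, age REAL, income REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO clean_records VALUES (?, ?, ?)", rows)
    conn.commit()

def run_etl(conn):
    load(conn, transform(extract(conn)))

# Demo: an in-memory database standing in for the MIS source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_records (id INTEGER PRIMARY KEY, age REAL, income REAL)")
conn.executemany(
    "INSERT INTO raw_records VALUES (?, ?, ?)",
    [(1, 25.0, 50000.0), (2, None, 62000.0), (3, 40.0, None), (4, 33.0, 58000.0)],
)
run_etl(conn)
```

Keeping the three stages as separate functions lets us test the transform logic on plain Python tuples. Only `extract` and `load` would need to change if the MIS database turns out not to be SQLite.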