
Loan Status Prediction
Machine learning project that predicts loan repayment status to assess credit risk, using an XGBoost-based model deployed with a real-time Streamlit application.

This project develops a predictive model that determines whether a loan will be fully paid or charged off.
The model was trained on a Kaggle dataset using machine learning techniques, with the aim of helping financial institutions assess credit risk for loan applicants.
Problem to Solve
Financial institutions face the challenge of identifying which clients will repay their loans and which ones represent a high risk of default.
Objective: Build a model that predicts whether a loan will be fully paid or charged off.
Impact: Helps reduce financial risks, minimize losses, and improve credit approval decision-making.
Data Used
Source: Kaggle dataset ("Credit_train.csv")
Key Features:
- Credit Score: Applicant's credit rating.
- Annual Income: Yearly earnings.
- Monthly Debt: Monthly liabilities.
- Years of Credit History: Number of years of credit history.
- Number of open accounts, current credit balance, and maximum open credit.
- Loan purpose and home ownership status.
Methodology Applied
- Data Exploration: Statistical analysis and correlation visualization.
- Preprocessing:
  - Handling missing values and removing duplicates.
  - Encoding categorical variables (Label Encoding).
  - Scaling numerical features with StandardScaler.
  - Splitting data into training and validation sets.
- Model Training: Multiple classification models were tested.
- Best Model Selection: Comparison based on performance metrics such as Accuracy, F1 Score, and AUC-ROC.
- Model Deployment: Implementation using Streamlit Cloud for real-time interaction.
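
The preprocessing steps listed above could look roughly like the following. This is a minimal sketch, not the project's actual notebook code: the toy DataFrame stands in for "Credit_train.csv", the column names and the `preprocess` helper are assumptions made for illustration, and median imputation is one possible way to handle missing values. The scaler is fitted on the training split only, a common way to avoid leakage; the project does not state which order it used.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical toy data standing in for Credit_train.csv (column names assumed).
rng = np.random.default_rng(0)
n = 200
raw = pd.DataFrame({
    "Credit Score": rng.integers(580, 750, n).astype(float),
    "Annual Income": rng.normal(60000, 15000, n).round(2),
    "Monthly Debt": rng.normal(900, 300, n).round(2),
    "Home Ownership": rng.choice(["Rent", "Own Home", "Home Mortgage"], n),
    "Loan Status": rng.choice(["Fully Paid", "Charged Off"], n, p=[0.8, 0.2]),
})
raw.loc[::25, "Credit Score"] = np.nan                   # inject missing values
raw = pd.concat([raw, raw.head(5)], ignore_index=True)   # inject duplicate rows


def preprocess(df, target="Loan Status", seed=42):
    """Deduplicate, impute, label-encode, split, and scale (illustrative helper)."""
    df = df.drop_duplicates().copy()

    # Fill missing numeric values with the column median (an assumed strategy;
    # computed before the split here only to keep the sketch short).
    num_cols = list(df.select_dtypes(include="number").columns)
    df[num_cols] = df[num_cols].fillna(df[num_cols].median())

    # Label-encode every non-numeric feature column.
    cat_cols = [c for c in df.columns if c not in num_cols and c != target]
    for col in cat_cols:
        df[col] = LabelEncoder().fit_transform(df[col].astype(str))

    # Classes are sorted alphabetically: "Charged Off" -> 0, "Fully Paid" -> 1.
    y = LabelEncoder().fit_transform(df[target])
    X = df.drop(columns=[target])

    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )

    # Fit the scaler on the training split, then apply it to both splits.
    scaler = StandardScaler().fit(X_train[num_cols])
    X_train, X_val = X_train.copy(), X_val.copy()
    X_train[num_cols] = scaler.transform(X_train[num_cols])
    X_val[num_cols] = scaler.transform(X_val[num_cols])
    return X_train, X_val, y_train, y_val


X_train, X_val, y_train, y_val = preprocess(raw)
print(X_train.shape, X_val.shape)
```
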
Models Tested
The following classification models were evaluated:
- XGBRFClassifier (Best model)
- Random Forest
- Logistic Regression
- Gradient Boosting
- AdaBoost
- SGDClassifier
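
A comparison like the one above might be organized as a loop over candidate models that records the same three metrics for each. The sketch below is illustrative only: it uses synthetic data from `make_classification` rather than the Kaggle dataset, default or near-default hyperparameters (the settings actually used are not documented here), and it adds XGBRFClassifier only if the `xgboost` package happens to be installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (assumption: not the real loan dataset).
X, y = make_classification(
    n_samples=600, n_features=8, weights=[0.25, 0.75], random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

candidates = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "SGDClassifier": SGDClassifier(random_state=0),
}
try:
    # Optional dependency; skipped when xgboost is not available.
    from xgboost import XGBRFClassifier
    candidates["XGBRFClassifier"] = XGBRFClassifier(n_estimators=100, random_state=0)
except ImportError:
    pass


def evaluate(model):
    """Fit on the training split and score on the validation split."""
    model.fit(X_train, y_train)
    pred = model.predict(X_val)
    # AUC-ROC needs a continuous score; fall back to decision_function
    # for models without predict_proba (e.g. SGDClassifier with hinge loss).
    if hasattr(model, "predict_proba"):
        scores = model.predict_proba(X_val)[:, 1]
    else:
        scores = model.decision_function(X_val)
    return {
        "accuracy": accuracy_score(y_val, pred),
        "f1": f1_score(y_val, pred),
        "auc_roc": roc_auc_score(y_val, scores),
    }


results = {name: evaluate(model) for name, model in candidates.items()}
for name, m in sorted(results.items(), key=lambda kv: kv[1]["auc_roc"], reverse=True):
    print(f"{name:20s} acc={m['accuracy']:.3f} f1={m['f1']:.3f} auc={m['auc_roc']:.3f}")
```
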

Best Model Performance (XGBRFClassifier)
Accuracy: 82.7%
F1 Score: 0.89
AUC-ROC: 0.64
Inference Time: 32.3 ms
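
For context, the sketch below shows one way figures like these could be measured, next to a majority-class baseline. It is an illustration under stated assumptions: the data is synthetic with roughly 80% positive labels (the real class balance is not given here), a Random Forest stands in for the deployed model, and the timing covers one `predict` call on the whole validation set (whether the 32.3 ms above refers to a single row or a batch is not specified).

```python
import time

from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic data with about 80% of labels in class 1 (assumed balance).
X, y = make_classification(
    n_samples=1000, n_features=8, weights=[0.2, 0.8], random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def report(model):
    """Return accuracy, F1, AUC-ROC, and prediction time in milliseconds."""
    model.fit(X_train, y_train)
    start = time.perf_counter()
    pred = model.predict(X_val)
    elapsed_ms = (time.perf_counter() - start) * 1000
    scores = model.predict_proba(X_val)[:, 1]
    return {
        "accuracy": accuracy_score(y_val, pred),
        "f1": f1_score(y_val, pred),
        "auc_roc": roc_auc_score(y_val, scores),
        "inference_ms": elapsed_ms,
    }


# Baseline that always predicts the most frequent class.
baseline = report(DummyClassifier(strategy="most_frequent"))
# Stand-in model (assumption: not the project's XGBRFClassifier).
model = report(RandomForestClassifier(n_estimators=100, random_state=0))

for name, r in [("baseline", baseline), ("model", model)]:
    print(name, {k: round(v, 3) for k, v in r.items()})
```

On imbalanced data, the baseline already reaches high accuracy and F1 while its AUC-ROC stays at 0.5, because a constant score cannot rank any positive above any negative. Reading the three metrics together therefore gives a fuller picture than accuracy alone.
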
Most influential features in prediction:
- Credit Score
- Annual Income
- Years of Credit History
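
A ranking like this is typically read from a tree ensemble's `feature_importances_` attribute. The following sketch shows the mechanics only: the data is synthetic, the label is constructed to depend on "Credit Score" by design, and a Random Forest stands in for the project's model, so the resulting order says nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic features with assumed column names and distributions.
rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame({
    "Credit Score": rng.normal(700, 40, n),
    "Annual Income": rng.normal(60000, 15000, n),
    "Years of Credit History": rng.normal(15, 5, n),
    "Monthly Debt": rng.normal(900, 300, n),
})
# The label depends on Credit Score by construction (illustration only).
y = (X["Credit Score"] + rng.normal(0, 10, n) > 680).astype(int)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances sum to 1; sort them from most to least influential.
importances = pd.Series(model.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```
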
Model Deployment
Technologies used:
- Google Colab for model development and data exploration.
- Streamlit for interactive UI.
- Joblib for saving and loading the trained model.
- GitHub + Streamlit Cloud for online deployment.
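
Saving and reloading a trained model with Joblib could look like the sketch below. The file name `loan_model.joblib`, the `predict_status` helper, and the scaler-plus-Random-Forest pipeline are hypothetical placeholders; the label mapping (1 for "Fully Paid") is also an assumption.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Train a small stand-in pipeline on synthetic data.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
pipeline = make_pipeline(
    StandardScaler(),
    RandomForestClassifier(n_estimators=50, random_state=0),
)
pipeline.fit(X, y)

# Persist the fitted pipeline (hypothetical file name, temporary directory).
model_path = os.path.join(tempfile.mkdtemp(), "loan_model.joblib")
joblib.dump(pipeline, model_path)

# Later, for example when the app starts, load it back.
loaded = joblib.load(model_path)


def predict_status(features):
    """Map one row of features to a readable label (assumed mapping: 1 = Fully Paid)."""
    row = np.asarray(features, dtype=float).reshape(1, -1)
    label = loaded.predict(row)[0]
    return "Fully Paid" if label == 1 else "Charged Off"


print(predict_status(X[0]))
```

Saving the scaler and the classifier together as one pipeline keeps the preprocessing used at prediction time identical to training. In a Streamlit app, the `joblib.load` call is often wrapped in a function decorated with `st.cache_resource` so the model is loaded once per session rather than on every rerun.
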
Key Findings
Credit Score and Annual Income were among the most influential features for predicting loan repayment status.
XGBRFClassifier, which trains a random forest using XGBoost's tree-building engine, performed best among the models tested.
Inference time matters because it affects decision-making speed in financial institutions.
Conclusions & Learnings
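A model was built to predict loan repayment status, reaching 82.7% accuracy and an F1 score of 0.89; the AUC-ROC of 0.64 suggests its ranking of risky versus safe loans still has room to improve.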
Key influencing factors were identified in credit classification.
The model was optimized to balance accuracy and inference speed.
A user-friendly interactive app was deployed for real-time predictions.