Diamond Price Prediction (Regression)

Overview

An applied regression project on the classic diamonds dataset. We compare multiple models (linear/polynomial regression, random forest, XGBoost), evaluate out-of-sample error, and package a minimal Flask GUI for experimentation.

Goals

  • Build and compare several regression baselines and tree-based methods.
  • Explain key drivers of price via features and model diagnostics.
  • Provide a demo UI and export trained artifacts for reproducibility.

Data

  • Dataset: /projects/diamond-price/data/diamonds.csv

Methods

  • Benchmarks: Linear/Polynomial regression, Random Forest, XGBoost.
  • Evaluation: train/validation split, MAE/RMSE, error analysis by feature bins.
  • Packaging: simple Flask app with pre-fitted artifacts (scaler + models).

Deliverables

  • Report: /projects/diamond-price/reports/report.pdf
  • Notebook: /projects/diamond-price/notebooks/diamond-price.ipynb
  • Demo: /projects/diamond-price/demo/demo.mp4
  • Flask code: /projects/diamond-price/code/app.py
  • Artifacts: /projects/diamond-price/artifacts/

Outcomes

  • Tree ensembles (especially XGBoost) outperform linear baselines on MAE/RMSE while maintaining reasonable inference latency.
  • Feature effects align with domain intuition (cut, color, clarity, carat). The UI demonstrates interactive predictions.