Handwritten Digit Classification: LR, SVM, and SVD

Overview

Course project (MAT167, Applied Linear Algebra). We compare two classic classifiers (logistic regression, linear SVM) against a linear-algebraic approach (SVD-based classification) for handwritten digits. Below is a brief dataset introduction followed by the confusion matrices, rendered inline for a quick skim.

Dataset

  • USPS digits (10 classes), 16x16 grayscale images.
  • Pre-split train/test; flattened to 256-dim vectors; z-score standardized.
Figure: sample USPS digits (0–9), 16x16 grayscale.
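The preprocessing described above (flatten to 256-dim vectors, z-score standardize) can be sketched as follows. This is a minimal illustration, not the project's exact code; in particular, it assumes images arrive as an (n, 16, 16) array, and it fits the standardization statistics on the training split only to avoid leakage.

```python
import numpy as np

def flatten(images):
    """Flatten (n, 16, 16) grayscale images into (n, 256) row vectors."""
    return images.reshape(len(images), -1)

def standardize(X_train, X_test, eps=1e-8):
    """Z-score each pixel using training-set statistics only, then apply
    the same transform to the test set (eps guards constant pixels)."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    return (X_train - mu) / (sigma + eps), (X_test - mu) / (sigma + eps)
```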

SVD (Main Method)

  • For each digit class, compute a rank-k SVD basis U_k on that class’s training images.
  • For a test image x, compute reconstruction error ||x − U_k U_k^T x|| for each class and predict the class with minimal error.
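The two steps above can be sketched in a few lines of NumPy. This is an illustrative implementation of the method as described (per-class rank-k left-singular bases, then minimum reconstruction error); function names and the dict-based interface are choices made here, not taken from the project code.

```python
import numpy as np

def fit_class_bases(X_train, y_train, k=17):
    """For each class, compute the first k left singular vectors of the
    matrix whose columns are that class's training images."""
    bases = {}
    for c in np.unique(y_train):
        Xc = X_train[y_train == c].T               # columns = images of class c
        U, _, _ = np.linalg.svd(Xc, full_matrices=False)
        bases[c] = U[:, :k]                        # rank-k basis U_k
    return bases

def predict(bases, X_test):
    """Assign each test image x to the class whose subspace reconstructs
    it with the smallest residual ||x - U_k U_k^T x||."""
    preds = []
    for x in X_test:
        errs = {c: np.linalg.norm(x - U @ (U.T @ x)) for c, U in bases.items()}
        preds.append(min(errs, key=errs.get))
    return np.array(preds)
```

Since U_k has orthonormal columns, U_k U_k^T x is the orthogonal projection of x onto the class subspace, so the residual norm measures how well the class's low-rank structure explains the image.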
Figure: accuracy vs. rank k for the SVD-based classifier, illustrating the accuracy/rank trade-off.
Figure: SVD (k=17) confusion matrix (accuracy ≈ 0.966).

Confusion Matrices

Figure: logistic regression confusion matrix (accuracy ≈ 0.943).
Figure: linear SVM confusion matrix (accuracy ≈ 0.931).

Results Summary

  • Logistic Regression (one-vs-rest) accuracy: ~0.943
  • Support Vector Machine (linear kernel) accuracy: ~0.931
  • SVD-based classifier (k=17, reconstruction error): ~0.966

Metrics shown are from the current run; small variation is expected across seeds/hyperparameters.

Takeaways

  • The SVD-based classifier yields the best accuracy on this USPS split; logistic regression outperforms the linear SVM.
  • SVD also offers compact, interpretable low-rank structure per class, and prediction reduces to cheap projections onto small bases.
  • Regularization, scaling, and rank choice materially affect outcomes.