Handwritten Digit Classification: LR, SVM, and SVD

Overview

Course project (MAT167, Applied Linear Algebra). We compare two classic classifiers (logistic regression, linear SVM) against a linear-algebraic approach (SVD-based classification) for handwritten digits. Below is a brief dataset introduction followed by the confusion matrices, rendered inline for a quick skim.

Dataset

  • USPS digits (10 classes), 16x16 grayscale images.
  • Pre-split train/test; flattened to 256-dim vectors; z-score standardized.
Figure: sample USPS digits (0–9), 16x16 grayscale.
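The preprocessing described above (flatten to 256-dim vectors, z-score standardize) can be sketched as follows. This is a minimal illustration, not the project's exact code; in particular, it assumes images arrive as an (n, 16, 16) array, and it fits the standardization statistics on the training split only to avoid leakage.

```python
import numpy as np

def flatten(images):
    """Flatten (n, 16, 16) grayscale images into (n, 256) row vectors."""
    return images.reshape(len(images), -1)

def standardize(X_train, X_test, eps=1e-8):
    """Z-score each pixel using training-set statistics only, then apply
    the same transform to the test set (eps guards constant pixels)."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    return (X_train - mu) / (sigma + eps), (X_test - mu) / (sigma + eps)
```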

SVD (Main Method)

  • For each digit class, compute a rank-k SVD basis U_k on that class’s training images.
  • For a test image x, compute reconstruction error ||x − U_k U_k^T x|| for each class and predict the class with minimal error.
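The two steps above can be sketched in a few lines of NumPy. This is an illustrative implementation of the method as described (per-class rank-k left-singular bases, then minimum reconstruction error); function names and the dict-based interface are choices made here, not taken from the project code.

```python
import numpy as np

def fit_class_bases(X_train, y_train, k=17):
    """For each class, compute the first k left singular vectors of the
    matrix whose columns are that class's training images."""
    bases = {}
    for c in np.unique(y_train):
        Xc = X_train[y_train == c].T               # columns = images of class c
        U, _, _ = np.linalg.svd(Xc, full_matrices=False)
        bases[c] = U[:, :k]                        # rank-k basis U_k
    return bases

def predict(bases, X_test):
    """Assign each test image x to the class whose subspace reconstructs
    it with the smallest residual ||x - U_k U_k^T x||."""
    preds = []
    for x in X_test:
        errs = {c: np.linalg.norm(x - U @ (U.T @ x)) for c, U in bases.items()}
        preds.append(min(errs, key=errs.get))
    return np.array(preds)
```

Since U_k has orthonormal columns, U_k U_k^T x is the orthogonal projection of x onto the class subspace, so the residual norm measures how well the class's low-rank structure explains the image.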
Figure: accuracy vs. rank k for the SVD-based classifier, illustrating the accuracy/rank trade-off.
Figure: SVD (k=17) confusion matrix (accuracy ≈ 0.966).

Confusion Matrices

Figure: logistic regression confusion matrix (accuracy ≈ 0.943).
Figure: linear SVM confusion matrix (accuracy ≈ 0.931).

Results Summary

  • Logistic Regression (one-vs-rest) accuracy: ~0.943
  • Support Vector Machine (linear kernel) accuracy: ~0.931
  • SVD-based classifier (k=17, reconstruction error): ~0.966

Metrics shown are from the current run; small variation is expected across seeds/hyperparameters.

Takeaways

  • The SVD-based classifier yields the best accuracy on this USPS split; logistic regression outperforms the linear SVM.
  • SVD also offers compact, interpretable low-rank structure per class, and prediction reduces to cheap projections onto small bases.
  • Regularization, scaling, and rank choice materially affect outcomes.