If you want to keep things simple, let K = 5 and call it a day. If you want brownie points, devise and implement a strategy for finding the best value for K.

Optimizing hyperparameters is important, but it's not the focus of this challenge.

Obviously, there are many ways to code a solution. Here's mine (discussed below) 👇

I decided to wrap my model into a class called KNN. It's instantiated with:

- X: the training features, in array form
- y: the training labels
- k: the number of neighbors to use

If you look at __init__(...), you won't see anything fancy. It merely "memorizes" the training data so it can be used in the predict() method.
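A minimal sketch of that setup (the class name, X, y, and k match the description above; the exact body is my assumption, not the post's code):

```python
import numpy as np

class KNN:
    """Nearest-neighbors classifier that simply memorizes the training set."""

    def __init__(self, X, y, k):
        self.X = np.asarray(X)  # training images, shape (n_samples, H, W)
        self.y = np.asarray(y)  # training labels, shape (n_samples,)
        self.k = k              # number of neighbors to consult
```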

The predict() method operates on an array, X, which can be 2-D or 3-D. If X is 2-D, KNN assumes it's a single image. If X is 3-D, KNN assumes it's a collection of images organized such that the first axis represents the sample dimension. (In other words, X[i] returns a 2-D array that is the ith image.)

The predict() method uses L2 for the distance metric, implemented with the help of np.linalg.norm(). From there, it's just some fancy array manipulation to grab the top k closest training samples and get their vote for the predicted label.
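The logic might look something like the function below. This is a sketch under my own assumptions, written as a standalone function rather than a method, and the name knn_predict is mine; only the use of np.linalg.norm for L2 distance and the top-k vote come from the description above:

```python
import numpy as np

def knn_predict(train_X, train_y, k, X):
    """L2 distances to every training image, then a k-nearest majority vote."""
    X = np.asarray(X)
    single = X.ndim == 2             # a lone image rather than a stack
    if single:
        X = X[np.newaxis]            # promote to a batch of one

    # Broadcasting yields shape (n_test, n_train, H, W); taking the norm over
    # the last two axes collapses it to an (n_test, n_train) distance matrix.
    dists = np.linalg.norm(X[:, np.newaxis] - train_X[np.newaxis], axis=(2, 3))

    # Indices of the k closest training samples for each test image.
    nearest = np.argpartition(dists, k, axis=1)[:, :k]

    # Majority vote among the neighbors' labels.
    preds = np.array([np.bincount(row).argmax() for row in train_y[nearest]])
    return preds[0] if single else preds
```

Note the broadcasted subtraction materializes an (n_test, n_train, H, W) array, which is exactly why full-dataset prediction gets memory hungry.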

Struggling to understand the NumPy array logic?

Simplify the arrays into tiny, toy examples that are easy to follow. Then step through the code line by line.

Check out my course on NumPy.
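For instance, with a couple of made-up 2x2 "images" you can print the shape at each step and watch the broadcasting happen (the array contents here are arbitrary):

```python
import numpy as np

# Two toy "test images" and three toy "training images", each 2x2.
test = np.arange(8).reshape(2, 2, 2).astype(float)
train = np.ones((3, 2, 2))

# Insert new axes so the arrays broadcast against each other.
diff = test[:, np.newaxis] - train[np.newaxis]   # shape (2, 3, 2, 2)
dists = np.linalg.norm(diff, axis=(2, 3))        # shape (2, 3)

print(diff.shape)   # (2, 3, 2, 2)
print(dists.shape)  # (2, 3)
```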

The final task is to make predictions on the test data. Since the training data has 60,000 images and the test data has 10,000 images, prediction is fairly slow and memory intensive. For that reason, I ended up making predictions on the test data in batches and then combining the results.
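One simple way to batch, sketched under my own assumptions (the helper name, the batch size, and the use of np.array_split are all mine; predict_fn stands in for whatever per-batch prediction call you use):

```python
import numpy as np

def predict_in_batches(predict_fn, X_test, batch_size=500):
    """Run predictions batch by batch, then concatenate the results."""
    n_batches = max(1, len(X_test) // batch_size)
    batches = np.array_split(X_test, n_batches)   # tolerates uneven splits
    return np.concatenate([predict_fn(batch) for batch in batches])
```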

Results

This model scores 96.9% accuracy on the test data. Not bad!

Grouping by label, we see that the model did a great job predicting 1s, but not so well on 8s.

Breaking the results into a confusion matrix, we see that a lot of 2s were incorrectly predicted to be 7s. That makes sense: a 2 written without the bottom loop can look a lot like a 7.