Knowledge Base SageMaker: Analysis and model training on the Iris dataset
A look at training K-Nearest Neighbor models using a Jupyter notebook running on a SageMaker notebook instance.
The Iris dataset is one of the Hello World datasets of machine learning (along with the Titanic and MNIST datasets, if you were curious).
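The dataset contains 150 observations of Iris flowers. Each observation records four attributes (sepal length, sepal width, petal length and petal width, in centimetres) together with the species: Iris setosa, Iris versicolor or Iris virginica, with 50 observations of each. Because the species is included, this is labelled data, which means we can train a model using a supervised ML algorithm. The model can then be used to predict the species of new observations.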
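To make "labelled data" and "predict the species of new observations" concrete, here is a minimal pure-Python sketch of K-Nearest Neighbor classification. It is not the SageMaker built-in algorithm used later. It assumes a tiny excerpt of nine Iris rows as the training set, Euclidean distance and k=3. The helper `predict_species` and the two query measurements are hypothetical, for illustration only.

```python
import math
from collections import Counter

# A small excerpt of Iris observations (assumed for illustration):
# (sepal length, sepal width, petal length, petal width) in cm, plus the species label.
TRAINING_DATA = [
    ((5.1, 3.5, 1.4, 0.2), "setosa"),
    ((4.9, 3.0, 1.4, 0.2), "setosa"),
    ((4.7, 3.2, 1.3, 0.2), "setosa"),
    ((7.0, 3.2, 4.7, 1.4), "versicolor"),
    ((6.4, 3.2, 4.5, 1.5), "versicolor"),
    ((6.9, 3.1, 4.9, 1.5), "versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "virginica"),
    ((5.8, 2.7, 5.1, 1.9), "virginica"),
    ((7.1, 3.0, 5.9, 2.1), "virginica"),
]


def predict_species(observation, training_data=TRAINING_DATA, k=3):
    """Hypothetical helper: predict a species by majority vote of the k nearest labelled rows."""
    # Sort the labelled observations by Euclidean distance to the new observation.
    nearest = sorted(training_data, key=lambda row: math.dist(row[0], observation))[:k]
    # Count the species labels among the k nearest neighbours and take the most common.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(predict_species((5.0, 3.4, 1.5, 0.2)))  # → setosa
print(predict_species((6.7, 3.0, 5.2, 2.3)))  # → virginica
```

There is no separate training step here: a K-Nearest Neighbor model essentially stores the labelled observations and defers the work to prediction time, when it compares a new observation against them.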
In this walkthrough, I train two K-Nearest Neighbor models using a Jupyter notebook running on a SageMaker notebook instance. Before training, let’s dive into some simple analysis to find out a bit more about the dataset itself.