SageMaker: Analysis and model training on the Iris dataset

Dan Cooper AWS

A look at training K-Nearest Neighbor models using a Jupyter notebook running on a SageMaker notebook instance.

The Iris dataset is one of the "Hello World" datasets of machine learning (along with the Titanic and MNIST datasets, if you were curious).

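The dataset contains 150 observations of Iris flowers, 50 from each of three species (Iris setosa, Iris versicolor and Iris virginica). Each observation records four attributes (sepal length, sepal width, petal length and petal width, all in centimetres) alongside the species. Because the species is present, this is labelled data, which means we can train a model using a supervised ML algorithm. The model can then be used to predict the species of new observations.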
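This minimal sketch makes "labelled data" concrete. It shows how rows shaped like the Iris data split into the features a model learns from (X) and the label it learns to predict (y). The measurement values are illustrative and are not quoted from the dataset. The `FEATURES` list and the `split_features_labels` helper are hypothetical names introduced for this example.

```python
from collections import Counter

# Hypothetical attribute names, in the usual Iris column order (all in cm).
FEATURES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

# Illustrative rows in the dataset's shape: four measurements, then the species label.
# These values are for demonstration only and are not taken from the real dataset.
rows = [
    (5.0, 3.4, 1.5, 0.2, "setosa"),
    (4.8, 3.1, 1.6, 0.2, "setosa"),
    (6.0, 2.8, 4.4, 1.3, "versicolor"),
    (6.6, 3.0, 5.6, 2.1, "virginica"),
]


def split_features_labels(rows):
    """Separate the model inputs (X) from the label it must learn to predict (y)."""
    X = [list(row[: len(FEATURES)]) for row in rows]
    y = [row[len(FEATURES)] for row in rows]
    return X, y


X, y = split_features_labels(rows)
print(len(X), "observations,", len(X[0]), "features each")  # 4 observations, 4 features each
print(Counter(y))  # Counter({'setosa': 2, 'versicolor': 1, 'virginica': 1})
```

The species column is what turns this into a supervised learning problem. A clustering algorithm would only ever see `X`, whereas a supervised algorithm is trained on the `(X, y)` pairs.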

Iris Setosa, or is it? | Source:

In this walkthrough, I train two K-Nearest Neighbor models using a Jupyter notebook running on a SageMaker notebook instance. Before training, let's dive into some simple analysis to find out a bit more about the dataset itself.
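The article's own notebook trains its models on SageMaker and is not reproduced here. To illustrate the technique itself, here is a minimal pure-Python k-nearest-neighbour classifier. "Training" just stores the labelled observations. Prediction ranks them by Euclidean distance and takes a majority vote among the k closest. This is not SageMaker's built-in k-NN algorithm or its API. `KNNClassifier` and its methods are hypothetical, and the sample measurements are illustrative values rather than rows from the dataset.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class KNNClassifier:
    """Minimal k-nearest-neighbour classifier (hypothetical helper, not SageMaker's k-NN)."""

    def __init__(self, k=3):
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k
        self._X = []
        self._y = []

    def fit(self, X, y):
        # k-NN is a "lazy" learner: training just memorises the labelled observations.
        if len(X) != len(y):
            raise ValueError("X and y must have the same length")
        self._X = [list(row) for row in X]
        self._y = list(y)
        return self

    def predict_one(self, x):
        # Rank every stored observation by its distance to x and keep the k closest.
        ranked = sorted(zip(self._X, self._y), key=lambda pair: euclidean(pair[0], x))
        nearest_labels = [label for _, label in ranked[: self.k]]
        # Majority vote; on a tie, Counter.most_common keeps first-seen order,
        # so the label of the closer neighbour wins.
        return Counter(nearest_labels).most_common(1)[0][0]

    def predict(self, X):
        return [self.predict_one(x) for x in X]


# Illustrative training data in the Iris format (not rows from the real dataset).
X_train = [
    [5.0, 3.4, 1.5, 0.2], [4.8, 3.1, 1.6, 0.2],  # setosa-like
    [6.0, 2.8, 4.4, 1.3], [5.7, 2.9, 4.2, 1.3],  # versicolor-like
    [6.6, 3.0, 5.6, 2.1], [6.9, 3.1, 5.7, 2.2],  # virginica-like
]
y_train = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]

new_observations = [[5.1, 3.3, 1.4, 0.3], [6.2, 2.9, 4.6, 1.4], [6.7, 3.0, 5.5, 2.0]]

# Two models that differ only in k (an illustrative choice).
model_k1 = KNNClassifier(k=1).fit(X_train, y_train)
model_k3 = KNNClassifier(k=3).fit(X_train, y_train)
print(model_k1.predict(new_observations))  # ['setosa', 'versicolor', 'virginica']
print(model_k3.predict(new_observations))  # ['setosa', 'versicolor', 'virginica']
```

Training two models with k = 1 and k = 3 is only an example of how two k-NN models might differ. The article does not say which settings its two models used. Because the vote is based on raw distances, attributes measured on larger scales dominate. Feature scaling is therefore often worth considering before fitting a k-NN model.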

Dan Cooper

CEO and Principal Consultant, Allies Computing Professional Services

Dan has a proven track record of helping customers leverage their data and technology to gain a commercial edge. During his 20-year career, Dan has worked in roles covering solution design, hands-on development, product management and IT strategy within both SME and enterprise organisations. He is an AWS Certified Solutions Architect and a Certified Scrum Master, and is qualified in business analysis, PRINCE2 and ITIL 4.