Fisher’s Irises

Sep 13, 2021

Measurements of the sepals and petals for three varieties of iris were analyzed to determine whether any of these characteristics could be used to distinguish the varieties. The most useful variables were sepal length, sepal width and petal width.


The Iris flower data set, or Fisher’s Iris data set, is a multivariate data set introduced by Ronald Fisher in his 1936 paper “The use of multiple measurements in taxonomic problems” as an example of linear discriminant analysis. It is sometimes called Anderson’s Iris data set because Edgar Anderson collected the data to quantify the variation of Iris flowers.

The data set consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor). Four features were measured from each sample: the length and the width of the sepals and petals, in centimetres.


Setosa had the highest average sepal width as well as the smallest average sepal length and petal width. Virginica had the largest average sepal length. Versicolor fell in the middle for average sepal length and petal width.
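The per-species averages described here can be computed with a simple group-by. The following is a minimal sketch using only the Python standard library, not the analysis actually used for this post. The names `SAMPLE_ROWS`, `FEATURES` and `species_means` are hypothetical, and `SAMPLE_ROWS` holds only the first three rows of each species, so its averages will not match those of the full 150-row data set.

```python
from collections import defaultdict
from statistics import mean

# Assumption: a small illustrative subset (first three rows per species).
# Columns: sepal length, sepal width, petal length, petal width (cm), species.
SAMPLE_ROWS = [
    (5.1, 3.5, 1.4, 0.2, "setosa"),
    (4.9, 3.0, 1.4, 0.2, "setosa"),
    (4.7, 3.2, 1.3, 0.2, "setosa"),
    (7.0, 3.2, 4.7, 1.4, "versicolor"),
    (6.4, 3.2, 4.5, 1.5, "versicolor"),
    (6.9, 3.1, 4.9, 1.5, "versicolor"),
    (6.3, 3.3, 6.0, 2.5, "virginica"),
    (5.8, 2.7, 5.1, 1.9, "virginica"),
    (7.1, 3.0, 5.9, 2.1, "virginica"),
]

FEATURES = ("sepal_length", "sepal_width", "petal_length", "petal_width")


def species_means(rows):
    """Return {species: {feature: mean}} for rows shaped like SAMPLE_ROWS."""
    grouped = defaultdict(list)
    for *measurements, species in rows:
        grouped[species].append(measurements)
    return {
        species: {
            # zip(*values) turns the list of rows into one tuple per column.
            name: mean(column)
            for name, column in zip(FEATURES, zip(*values))
        }
        for species, values in grouped.items()
    }


if __name__ == "__main__":
    for species, averages in species_means(SAMPLE_ROWS).items():
        print(species, {k: round(v, 2) for k, v in averages.items()})
```

Passing all 150 rows to `species_means` in the same shape would give the averages that the comparison above relies on.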

Graphical Summaries

Versicolor has an average sepal length between those of Setosa and Virginica. Virginica has the greatest variation, with more outliers in the upper quartile. Setosa has the smallest sepal lengths.
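The quartiles and outliers mentioned above are the quantities a box plot displays. As a rough sketch, assuming the common Tukey rule (points more than 1.5 times the interquartile range beyond the quartiles count as outliers), they can be computed as follows. The function name `box_summary` and its `whisker` parameter are hypothetical. Plotting tools differ slightly in how they interpolate quartiles, so the values may not match a given chart exactly.

```python
from statistics import quantiles


def box_summary(values, whisker=1.5):
    """Quartiles and Tukey-rule outliers for one group of measurements."""
    # method="inclusive" treats the data as the full population of interest.
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "low_outliers": sorted(v for v in values if v < low_fence),
        "high_outliers": sorted(v for v in values if v > high_fence),
    }
```

Calling `box_summary` once per species on the sepal length column gives the numbers behind a side-by-side box plot, which makes it possible to check the spread and outlier claims for each variety.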