Tuesday, 15 April 2014

Iris dataset classification

First, let us visualize the Iris dataset.

1. Drag and drop the File widget from the data palette.

2. Add a Scatter Plot widget from the data visualisation palette.

3. Double-click the File widget and load the Iris data set.

The Iris data set can be downloaded from http://orange.biolab.si/datasets.psp

Iris Data Visualisation

4. Double-click the Scatter Plot widget and explore the data by changing the x and y parameters.

Scatter plot

 

How to change the point shape?

Under additional point properties, change the point shape attribute to iris.

Scatter plot change shape

Can I save the plot?

You can save the plot by using Save Graph. Plots can be saved as a picture or as a matplotlib script.

Sp_Len_Vs_Sp_wid

 

Building the classification model

From the plot below, it seems easy to separate Iris-setosa from the other two iris species.

Spllen_ptllen

We can write a simple rule: if the petal length is less than 2.0 cm, the flower is Iris-setosa; otherwise it is one of the other two species.
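Outside Orange, the same rule can be sketched in a few lines of Python. This is only an illustration: the function name `is_setosa` and the `SAMPLES` list are made up for this example, and the list holds just six rows of the Iris data rather than the full file.

```python
# A handful of rows from the Iris data (not the full data set):
# (sepal length, sepal width, petal length, petal width, species)
SAMPLES = [
    (5.1, 3.5, 1.4, 0.2, "Iris-setosa"),
    (4.9, 3.0, 1.4, 0.2, "Iris-setosa"),
    (7.0, 3.2, 4.7, 1.4, "Iris-versicolor"),
    (6.4, 3.2, 4.5, 1.5, "Iris-versicolor"),
    (6.3, 3.3, 6.0, 2.5, "Iris-virginica"),
    (5.8, 2.7, 5.1, 1.9, "Iris-virginica"),
]


def is_setosa(petal_length_cm, cutoff=2.0):
    """Rule from the text: petal length below the cut-off means Iris-setosa."""
    return petal_length_cm < cutoff


for row in SAMPLES:
    predicted = "Iris-setosa" if is_setosa(row[2]) else "other species"
    print(row[2], row[4], "->", predicted)
```

The cut-off is a parameter so that other values can be tried without editing the function.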

How to classify based on the petal length?

1. Add the Interactive Tree Builder widget from the classify palette and rename it to Iris setosa classifier.

Classifier

2. Double-click the Interactive Tree Builder. Set the split selection to petal length and the cut-off point to 2.0, then click Split and look at the report.


We have successfully classified iris-setosa.

How to classify the other two species?

We will first remove Iris-setosa from the data and plot only Iris-virginica and Iris-versicolor.

Identifying a threshold to separate these two species is not as easy as it was for Iris-setosa, so we need a better method.

I found a simple built-in method to identify the best threshold for separating these two species. I am not sure it will work for all data.
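To give a feel for what "finding the best threshold" means, here is a small brute-force search in Python. It is not what VizRank does internally; it is a simpler stand-in that tries the midpoints between neighbouring values of one attribute and keeps the one with the fewest mistakes. The function name `best_threshold` and the two short lists (petal widths of five versicolor and five virginica rows from the Iris data) are assumptions made for this sketch.

```python
# Petal widths (cm) of a few rows from the Iris data, not the full data set.
VERSICOLOR_PETAL_WIDTH = [1.4, 1.5, 1.5, 1.3, 1.5]
VIRGINICA_PETAL_WIDTH = [2.5, 1.9, 2.1, 1.8, 2.2]


def best_threshold(low_values, high_values):
    """Return (threshold, errors) for the rule 'value > threshold -> high class'.

    Candidates are the midpoints between neighbouring distinct values.
    """
    points = sorted(set(low_values) | set(high_values))
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]

    def errors(threshold):
        # Low-class values above the threshold plus high-class values at or below it.
        wrong_low = sum(v > threshold for v in low_values)
        wrong_high = sum(v <= threshold for v in high_values)
        return wrong_low + wrong_high

    best = min(candidates, key=errors)
    return best, errors(best)


threshold, mistakes = best_threshold(VERSICOLOR_PETAL_WIDTH, VIRGINICA_PETAL_WIDTH)
print("threshold:", round(threshold, 2), "mistakes:", mistakes)
```

With only ten rows the result says little about the full data set; the point is just to show the idea of scoring candidate cut-offs.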

Separate Iris-setosa

1. Add the Select Data widget from the data palette.


 

2. Double-click the Select Data widget. Select iris under Attribute, equals under Operator and Iris-setosa under Value, and check Negate.

3. Click the Add button.
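The effect of this "equals Iris-setosa, negated" condition can be sketched in plain Python as a filter that keeps every row except the setosa ones. The helper `select_data` and the three example rows are hypothetical and only mimic what the widget does; they are not Orange's API.

```python
def select_data(rows, attribute_index, value, negate=False):
    """Keep rows whose attribute equals value; with negate=True keep all other rows."""
    return [row for row in rows if (row[attribute_index] == value) != negate]


# Example rows: (petal length, petal width, species)
rows = [
    (1.4, 0.2, "Iris-setosa"),
    (4.7, 1.4, "Iris-versicolor"),
    (6.0, 2.5, "Iris-virginica"),
]

# Same condition as in the widget: iris equals Iris-setosa, with Negate checked.
remaining = select_data(rows, 2, "Iris-setosa", negate=True)
for row in remaining:
    print(row)
```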


Built-in method to find the best threshold

1. Double-click Scatter Plot (1).

2. Click VizRank under optimisation dialogs.


3. Click the Start evaluating projections button in the VizRank dialog.

4. Click the Locally optimise best projections button (watch the plots while it runs).

5. Look at the results: the best projection is achieved with the combination of petal width and petal length.


6. The best projection is achieved with the default settings.


7. Now, by applying a threshold of 1.65 on petal width, we can separate Iris-virginica from Iris-versicolor.

8. Add an Interactive Tree Builder, set the split attribute to petal width and the cut-off point to 1.65.
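Put together, the two cut-offs used in this post (petal length 2.0 and petal width 1.65) give a tiny decision tree, which can be sketched in Python as below. The function name `classify_iris` is made up for this example, and the direction of the second split (wider petals mean Iris-virginica) is my assumption based on the data; the text above only gives the threshold value.

```python
def classify_iris(petal_length_cm, petal_width_cm):
    """Two-step rule built from the cut-offs used in the post."""
    # Step 1: short petals are Iris-setosa.
    if petal_length_cm < 2.0:
        return "Iris-setosa"
    # Step 2 (assumed direction): wider petals are Iris-virginica.
    if petal_width_cm > 1.65:
        return "Iris-virginica"
    return "Iris-versicolor"


# A few rows from the Iris data: (petal length, petal width)
print(classify_iris(1.4, 0.2))
print(classify_iris(4.7, 1.4))
print(classify_iris(6.0, 2.5))
```

Such a simple rule will misclassify some flowers near the boundary, so it is best treated as a first approximation.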


Our Final Code

2014-04-16_16-56-35

If you have any problem understanding this, feel free to call me.
