Thursday, 17 April 2014

Classification of wheat seeds dataset

In the previous post we built a classifier for the Iris dataset without knowing or acquiring any theoretical knowledge of machine learning.

In the same way, in this post I am going to try to classify the more complex seeds dataset, without attempting to learn the theory.

The wheat seeds dataset contains 3 varieties and 7 features. It can be obtained from http://archive.ics.uci.edu/ml/
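For readers who want to look at the data outside the GUI, here is a minimal sketch of reading the file in plain Python. It assumes the downloaded file is whitespace-separated with the 7 features followed by a numeric class label, and that labels 1, 2, 3 stand for Kama, Rosa and Canadian. The short column names and the sample rows are my own, made up for illustration.

```python
# Short feature names are my own labels, not taken from the file itself.
FEATURES = [
    "area", "perimeter", "compactness", "kernel_length",
    "kernel_width", "asymmetry", "groove_length",
]
# Assumed mapping from numeric class label to variety name.
VARIETIES = {1: "Kama", 2: "Rosa", 3: "Canadian"}


def parse_seeds(text):
    """Parse whitespace-separated rows into a list of dicts."""
    rows = []
    for line in text.splitlines():
        parts = line.split()  # tolerates repeated tabs or spaces
        if len(parts) != len(FEATURES) + 1:
            continue  # skip blank or malformed lines
        row = dict(zip(FEATURES, (float(p) for p in parts[:-1])))
        row["variety"] = VARIETIES[int(parts[-1])]
        rows.append(row)
    return rows


# Two made-up rows (not real measurements), the second with a doubled tab.
sample = (
    "15.0\t14.5\t0.87\t5.6\t3.3\t2.2\t5.2\t1\n"
    "18.5\t16.2\t0.88\t6.2\t\t3.7\t3.6\t6.0\t2\n"
)
rows = parse_seeds(sample)
print(rows[1]["variety"], rows[1]["groove_length"])
```

With the real file you would pass its contents to `parse_seeds` instead of `sample`.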

Visualising data

1. Drag and drop the file widget from the data palette

2. Add a scatter plot from the data visualisation palette

Find best threshold using inbuilt method

1. Double click the scatter plot

2. Click VizRank under optimisation dialogs

3. Click Start evaluating procedure and see the result

4. For better results, click Locally optimise best projection and see the result

5. From the result it is easy to separate Rosa from Kama and Canadian

6. Let us find the threshold. We found threshold = 5.573
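The threshold above was read off the plot. As a rough sketch of how a single cut-off could also be found by brute force, the code below tries the midpoint between every pair of neighbouring values and keeps the one with the best accuracy. This is not what VizRank does internally; it is only an illustration. The function name, the toy values, and the assumption that Rosa lies above the cut-off are mine.

```python
def best_threshold(values, is_target):
    """Return (threshold, accuracy) for the rule `value > threshold` => target."""
    points = sorted(set(values))
    # Candidate cut-offs are midpoints between neighbouring distinct values.
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    best_t, best_acc = None, 0.0
    for t in candidates:
        correct = sum((v > t) == y for v, y in zip(values, is_target))
        acc = correct / len(values)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


# Made-up groove lengths, not real rows from the dataset.
groove = [5.0, 5.2, 5.3, 5.1, 6.0, 6.2, 5.9]
is_rosa = [False, False, False, False, True, True, True]

threshold, accuracy = best_threshold(groove, is_rosa)
print(threshold, accuracy)
```

On real data the classes overlap a little, so the best accuracy will usually be below 1.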

Apply threshold for Rosa seeds

1. Add the interactive tree builder from the classify palette

2. Double click the interactive tree builder. Set split selection to Length of kernel groove and cut off point to 5.573, then click split. See the report.
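The split can be mimicked in a few lines of Python. This sketch counts how many seeds of each variety fall on each side of the 5.573 cut-off, which is roughly the kind of summary a tree report gives. The function name and the toy samples are hypothetical, and I am assuming Rosa seeds sit above the cut-off.

```python
from collections import Counter

GROOVE_CUTOFF = 5.573  # cut-off point on length of kernel groove, from the post


def split_report(samples, cutoff=GROOVE_CUTOFF):
    """Count varieties on each side of a single split."""
    report = {"<= cutoff": Counter(), "> cutoff": Counter()}
    for groove_length, variety in samples:
        branch = "> cutoff" if groove_length > cutoff else "<= cutoff"
        report[branch][variety] += 1
    return report


# Made-up (groove_length, variety) pairs, not real rows from the dataset.
samples = [
    (5.2, "Kama"), (5.1, "Kama"), (5.0, "Canadian"),
    (5.3, "Canadian"), (6.0, "Rosa"), (6.2, "Rosa"),
]
report = split_report(samples)
for branch, counts in report.items():
    print(branch, dict(counts))
```

A clean split shows only Rosa in the upper branch and only the other two varieties in the lower one.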

Find best threshold for separating Canadian seeds and Kama

1. Double click scatter plot (1)

2. Click VizRank under optimisation dialogs

3. Click Start evaluating procedure, then click Locally optimise best projection, and see the result

4. From the result I selected the rank 7 combination, length of kernel groove vs area. In this projection it is easy to place a threshold, but the accuracy of this classification is only 89.8 percent. Threshold: area = 13.55.
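Putting the two thresholds together gives a tiny two-level rule. The sketch below is only my reading of the steps above: it assumes Rosa lies above the groove cut-off and that, among the rest, Kama has the larger area and Canadian the smaller. The function names and the toy samples are made up, and the accuracy printed here is for the toy data only, not the 89.8 percent reported above.

```python
def classify(groove_length, area, groove_cutoff=5.573, area_cutoff=13.55):
    """Two-level threshold rule using the cut-offs found in the post."""
    if groove_length > groove_cutoff:
        return "Rosa"
    # Assumed direction: larger area => Kama, smaller area => Canadian.
    return "Kama" if area > area_cutoff else "Canadian"


def accuracy(samples):
    """Fraction of (groove_length, area, variety) samples classified correctly."""
    correct = sum(classify(g, a) == v for g, a, v in samples)
    return correct / len(samples)


# Made-up samples; the last Kama seed is deliberately on the wrong side.
samples = [
    (6.0, 18.5, "Rosa"),
    (5.2, 14.8, "Kama"),
    (5.1, 11.9, "Canadian"),
    (5.0, 12.3, "Canadian"),
    (5.1, 13.0, "Kama"),
]
print(accuracy(samples))
```

The misclassified Kama seed shows why a single area threshold cannot reach 100 percent when the two varieties overlap.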

Now I feel I should understand what is really happening.
