In the previous post we built a classifier for the Iris dataset without knowing or acquiring any theoretical knowledge of machine learning.
In the same way, in this post I am going to try to classify the more complex seeds dataset, again without attempting to learn the theory.
The wheat seeds dataset contains 3 varieties and 7 features. It can be obtained from http://archive.ics.uci.edu/ml/
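For readers who want to look at the same data outside the GUI, here is a minimal sketch of reading the file in Python. It assumes the file is whitespace-separated with no header row, 7 feature columns followed by a numeric class column, and that classes 1, 2 and 3 mean Kama, Rosa and Canadian. The column names are my own labels, and the two sample rows are made up only to show the call.

```python
# Column names are my own labels (an assumption); the file is assumed to have no header row.
FEATURES = [
    "area", "perimeter", "compactness", "length_of_kernel",
    "width_of_kernel", "asymmetry_coefficient", "length_of_kernel_groove",
]
# Assumed mapping of the numeric class column to variety names.
VARIETIES = {1: "Kama", 2: "Rosa", 3: "Canadian"}


def parse_seeds(text):
    """Parse whitespace-separated rows into a list of dicts."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != len(FEATURES) + 1:
            continue  # skip blank or malformed lines
        row = {name: float(value) for name, value in zip(FEATURES, parts)}
        row["variety"] = VARIETIES[int(parts[-1])]
        rows.append(row)
    return rows


# Two made-up rows in the assumed layout, only to show the call.
sample = "15.0 14.5 0.87 5.6 3.3 2.2 5.2 1\n19.0 16.5 0.88 6.3 3.8 3.5 6.1 2\n"
rows = parse_seeds(sample)
print(rows[0]["variety"], rows[1]["length_of_kernel_groove"])
```

If the real file uses a different column order or class coding, only FEATURES and VARIETIES need to change.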
Visualising data
1. Drag and drop the File widget from the data palette
2. Add a scatter plot from the data visualisation palette
Find best threshold using inbuilt method
1. Double click the scatter plot
2. Click VizRank under optimisation dialogs
3. Click Start evaluating procedure and see the result
4. For better results, click Locally optimise best projection and see the result
5. From the result it is easy to separate Rosa from Kama and Canadian
6. Let us find the threshold. We found threshold = 5.573 on length of kernel groove
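To get a feel for what "finding a threshold" means, here is a rough sketch in Python. It is not what VizRank does internally. It is a plain brute-force scan that tries every midpoint between neighbouring values of one feature and keeps the cut-off that classifies the most samples correctly. The function name best_threshold and the sample numbers are made up for illustration.

```python
def best_threshold(values, labels, target):
    """Return (threshold, accuracy) for the rule: value > threshold means target."""
    points = sorted(set(values))
    # Candidate cut-offs are the midpoints between neighbouring distinct values.
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    best = (None, 0.0)
    for cut in candidates:
        correct = sum((v > cut) == (lab == target) for v, lab in zip(values, labels))
        accuracy = correct / len(values)
        if accuracy > best[1]:
            best = (cut, accuracy)
    return best


# Made-up groove lengths, not taken from the real file.
groove = [5.0, 5.1, 5.2, 5.3, 5.9, 6.0, 6.2, 5.1]
variety = ["Kama", "Canadian", "Kama", "Canadian", "Rosa", "Rosa", "Rosa", "Canadian"]
cut, acc = best_threshold(groove, variety, "Rosa")
print(cut, acc)
```

On the toy values the scan lands between the largest non-Rosa value and the smallest Rosa value, which is the same idea as reading the gap off the scatter plot.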
Apply threshold for Rosa seeds
1. Add the interactive tree builder from the classify palette
2. Double click the interactive tree builder. Set split selection to Length of kernel groove, set the cut-off point to 5.573 and click Split. See the report.
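The split made in the tree builder can be imitated in a few lines of Python. This is only a sketch of the idea, assuming that seeds with a groove length above the cut-off are the Rosa ones. The helper split_report and the sample rows are made up for illustration.

```python
from collections import Counter

CUT_OFF = 5.573  # cut-off point used for length of kernel groove


def split_report(rows, feature, cut_off):
    """Count varieties on each side of a single split, like one tree node."""
    below = Counter(r["variety"] for r in rows if r[feature] <= cut_off)
    above = Counter(r["variety"] for r in rows if r[feature] > cut_off)
    return below, above


# Made-up rows for illustration only.
rows = [
    {"length_of_kernel_groove": 5.1, "variety": "Kama"},
    {"length_of_kernel_groove": 5.0, "variety": "Canadian"},
    {"length_of_kernel_groove": 6.0, "variety": "Rosa"},
    {"length_of_kernel_groove": 5.9, "variety": "Rosa"},
]
below, above = split_report(rows, "length_of_kernel_groove", CUT_OFF)
print("<= cut-off:", dict(below))
print(">  cut-off:", dict(above))
```

A clean split shows only one variety on one side of the cut-off, which is what the report in the tree builder lets you check.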
Find best threshold for separating Canadian seeds and Kama
1. Double click scatter plot (1)
2. Click VizRank under optimisation dialogs
3. Click Start evaluating procedure, then click Locally optimise best projection and see the result
4. From the result I selected the rank 7 combination, Length of kernel groove vs area. In this projection it is easy to put a threshold, but the accuracy of this classification is only 89.8 percent. Threshold = 13.55 on area.
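Putting the two thresholds together gives a tiny two-level rule. The sketch below is my own reading of the steps, not something Orange produces. In particular, the direction of each comparison (larger groove means Rosa, larger area means Kama) is an assumption, and the samples are made up, so the printed accuracy says nothing about the real dataset.

```python
GROOVE_CUT = 5.573  # first split, on length of kernel groove
AREA_CUT = 13.55    # second split, on area


def classify(groove, area):
    """Two-level rule; the direction of each comparison is an assumption."""
    if groove > GROOVE_CUT:
        return "Rosa"
    return "Kama" if area > AREA_CUT else "Canadian"


def accuracy(samples):
    """Fraction of (groove, area, variety) samples the rule gets right."""
    hits = sum(classify(g, a) == v for g, a, v in samples)
    return hits / len(samples)


# Made-up samples for illustration only.
samples = [
    (6.0, 18.5, "Rosa"),
    (5.1, 14.8, "Kama"),
    (5.0, 11.9, "Canadian"),
    (5.2, 13.0, "Kama"),  # a Kama seed the area rule would mislabel
]
print(accuracy(samples))
```

The last sample shows how a single cut on area can mislabel seeds that sit on the wrong side of it, which is why the second split is less accurate than the first.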
Now I feel I should understand what is really happening.