Machine Learning With Orange: Counting Words - Clustering - Find Related Post

Assume our website already has a few posts. We want to find the post most relevant to the one currently being viewed, “Imaging database”.

The table below shows the existing posts on the website:

Sl. No	Post Content
1	This is a toy post about machine learning. Actually, it contains not much interesting stuff.
2	Imaging databases provide storage capabilities.
3	Most imaging databases safe images permanently.
4	Imaging databases store data.
5	Imaging databases store data. Imaging databases store data. Imaging databases store data.

In the book, the posts are read with the following script:

Script 1 – reading the files and printing the output

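    import os

    # Folder holding one text file per post (adjust to your own location)
    data_Dir = "E:\\Machine Learning\\Orange\\Finding Related Post\\toy\\"

    # Read each file into a list of strings, one string per post.
    # sorted() gives a fixed order; os.listdir() returns names in arbitrary order.
    posts = []
    for f in sorted(os.listdir(data_Dir)):
        with open(os.path.join(data_Dir, f)) as post_file:
            posts.append(post_file.read())

    print(posts)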
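Script 1 only runs if the folder on drive E: exists. To try the same reading step anywhere, here is a minimal sketch that writes the five toy posts into a temporary folder and reads them back. The file names 01.txt to 05.txt and the helper `read_posts` are hypothetical; the original post does not list the actual file names.

```python
import os
import tempfile

# Hypothetical file names; the post texts are the ones printed in Result 1.
TOY_POSTS = {
    "01.txt": "This is a toy post about machine learning. Actually, it contains not much interesting stuff.",
    "02.txt": "Imaging databases provide storage capabilities.",
    "03.txt": "Most imaging databases safe images permanently.",
    "04.txt": "Imaging databases store data.",
    "05.txt": "Imaging databases store data. Imaging databases store data. Imaging databases store data.",
}


def read_posts(data_dir):
    """Return the contents of every file in data_dir, ordered by file name."""
    posts = []
    for name in sorted(os.listdir(data_dir)):  # os.listdir order is arbitrary
        with open(os.path.join(data_dir, name), encoding="utf-8") as f:
            posts.append(f.read())
    return posts


# Write the toy posts to a throwaway folder, then read them back.
with tempfile.TemporaryDirectory() as data_dir:
    for name, text in TOY_POSTS.items():
        with open(os.path.join(data_dir, name), "w", encoding="utf-8") as f:
            f.write(text)
    posts = read_posts(data_dir)

print(posts)
```

The temporary folder and its files are deleted when the `with` block ends, but `posts` keeps the text that was read.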

Result 1:

Running script:

['This is a toy post about machine learning. Actually, it contains not much interesting stuff.', 'Imaging databases provide storage capabilities.', 'Most imaging databases safe images permanently.', 'Imaging databases store data.', 'Imaging databases store data. Imaging databases store data. Imaging databases store data.']
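Reading the posts is only the first step. A minimal sketch of the next one, counting words with `CountVectorizer` and picking the post closest to “Imaging database”, follows. It assumes scikit-learn and NumPy are installed. The distance measure is an assumption of this sketch, not necessarily the book's final method: the Euclidean distance between two count vectors after each is scaled to unit length. Without the scaling, post 5 (post 4 repeated three times) would look much less related than post 4 just because its counts are larger. `dist_norm` is a hypothetical helper name.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# The five posts as printed in Result 1.
posts = [
    "This is a toy post about machine learning. Actually, it contains not much interesting stuff.",
    "Imaging databases provide storage capabilities.",
    "Most imaging databases safe images permanently.",
    "Imaging databases store data.",
    "Imaging databases store data. Imaging databases store data. Imaging databases store data.",
]
new_post = "Imaging database"  # the post currently being viewed

# Default settings: lowercase, tokens of two or more word characters, no stemming.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(posts)  # sparse word-count matrix, one row per post

# Words not seen during fit are ignored. Because there is no stemming,
# "database" (singular) is not in the vocabulary, so only "imaging" is counted.
new_post_vec = vectorizer.transform([new_post])


def dist_norm(v1, v2):
    """Euclidean distance between two count vectors, each scaled to unit length."""
    n1, n2 = np.linalg.norm(v1), np.linalg.norm(v2)
    if n1 == 0 or n2 == 0:
        raise ValueError("a post with no known words cannot be compared")
    return float(np.linalg.norm(v1 / n1 - v2 / n2))


query = new_post_vec.toarray().ravel()
distances = [dist_norm(X_train[i].toarray().ravel(), query) for i in range(len(posts))]
best = int(np.argmin(distances))  # first index with the smallest distance

print("#samples: %d, #features: %d" % X_train.shape)
for i, d in enumerate(distances):
    print("Post %d: distance %.2f" % (i + 1, d))
print("Most related to %r: post %d" % (new_post, best + 1))
```

With these posts, posts 4 and 5 come out tied at a distance of 1.00, because after scaling their vectors are identical, and `np.argmin` reports the first of them, post 4.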