Friday, 18 April 2014

Counting Words - Clustering - Finding Related Posts - Part 1

Assume our website already has a few posts. We want to find the post that is most relevant to the post currently being viewed, "Imaging database".

The table below shows the existing posts on the website.

Sl.No  Post content
1      This is a toy post about machine learning. Actually, it contains not much interesting stuff.
2      Imaging databases can get huge.
3      Most imaging databases safe images permanently.
4      Imaging databases store images.
5      Imaging databases store images. Imaging databases store images. Imaging databases store images.
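The goal stated above, finding the existing post closest to the one being viewed, can be sketched in plain Python. This is a minimal sketch under stated assumptions: each post is represented by its raw word counts (a bag of words), relatedness is the Euclidean distance between those counts, and the query text "imaging databases" stands in for the viewed post. The helper names are hypothetical, only the standard library is used (not scikit-learn's CountVectorizer), and this is not necessarily the method used later in this series.

```python
import math
import re
from collections import Counter


def count_words(text):
    """Bag of words: lowercase the text, split it into words, count each word."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def normalize(vec):
    """Scale a word-count vector to unit length (leaves an all-zero vector as is)."""
    length = math.sqrt(sum(c * c for c in vec.values()))
    return {w: c / length for w, c in vec.items()} if length else dict(vec)


def euclidean(a, b):
    """Euclidean distance between two sparse vectors; a missing word counts as 0."""
    vocab = set(a) | set(b)
    return math.sqrt(sum((a.get(w, 0) - b.get(w, 0)) ** 2 for w in vocab))


def most_related(query, posts, normalized=False):
    """Return the index of the post closest to `query`, plus every distance."""
    q = count_words(query)
    vectors = [count_words(p) for p in posts]
    if normalized:
        q = normalize(q)
        vectors = [normalize(v) for v in vectors]
    distances = [euclidean(q, v) for v in vectors]
    # min() keeps the first index when two posts tie
    best = min(range(len(posts)), key=lambda i: distances[i])
    return best, distances


# The five posts from the table above
posts = [
    "This is a toy post about machine learning. Actually, it contains not much interesting stuff.",
    "Imaging databases can get huge.",
    "Most imaging databases safe images permanently.",
    "Imaging databases store images.",
    "Imaging databases store images. Imaging databases store images. Imaging databases store images.",
]

# Hypothetical text of the post currently being viewed
query = "imaging databases"
for normalized in (False, True):
    best, dists = most_related(query, posts, normalized)
    print("normalized" if normalized else "raw", "-> post", best + 1,
          [round(d, 2) for d in dists])
```

With raw counts, post 4 comes out closest (distance √2 ≈ 1.41), while post 5, which only repeats post 4's sentence three times, ends up farthest (√26 ≈ 5.10). Normalizing each vector to unit length removes that length effect, and posts 4 and 5 then tie.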

In the book, the posts are read using the following script.


Script 1 - reading the files and printing their contents

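    import os
    import sys
    import scipy as sp
    from sklearn.feature_extraction.text import CountVectorizer
    # sys, scipy and CountVectorizer are imported as in the book but not used in this script

    # Directory that holds one text file per post
    data_Dir = "E:\\Machine Learning\\Orange\\Finding Related Post\\toy\\"

    # Read every file into a list, one string per post. os.listdir() returns
    # names in arbitrary order, so sort them to keep the posts in file order.
    posts = [open(os.path.join(data_Dir, f)).read() for f in sorted(os.listdir(data_Dir))]
    print(posts)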

Result 1:

Running script:

['This is a toy post about machine learning. Actually, it contains not much interesting stuff.', 'Imaging databases provide storage capabilities.', 'Most imaging databases safe images permanently.', 'Imaging databases store data.', 'Imaging databases store data. Imaging databases store data. Imaging databases store data.']
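Script 1 leaves every file open and picks up anything that sits in the directory. A slightly more defensive version is sketched below for Python 3. It assumes the post files are UTF-8 text (pass another encoding, such as "cp1252", if they were saved differently); read_posts, demo and the sample file names are hypothetical, used only to show the pattern of listing a directory, skipping sub-directories, and reading each file with a with-block so it is closed right away.

```python
import os
import tempfile


def read_posts(data_dir, encoding="utf-8"):
    """Return the text of every regular file in data_dir, one string per file.

    Files are read in sorted name order, because os.listdir() returns names
    in arbitrary order, and each file is closed as soon as it has been read.
    """
    posts = []
    for name in sorted(os.listdir(data_dir)):
        path = os.path.join(data_dir, name)
        if not os.path.isfile(path):  # skip sub-directories
            continue
        with open(path, encoding=encoding) as f:
            posts.append(f.read())
    return posts


def demo():
    """Write two hypothetical post files plus a sub-directory to a temporary
    directory, then read the posts back with read_posts()."""
    with tempfile.TemporaryDirectory() as tmp:
        samples = {"01.txt": "Imaging databases store images.",
                   "02.txt": "Imaging databases can get huge."}
        for name, text in samples.items():
            with open(os.path.join(tmp, name), "w", encoding="utf-8") as f:
                f.write(text)
        os.mkdir(os.path.join(tmp, "drafts"))  # ignored by read_posts()
        return read_posts(tmp)


print(demo())
```

For the directory used in Script 1, the call would be read_posts("E:\\Machine Learning\\Orange\\Finding Related Post\\toy").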

 

Implementing Script 1 in Orange

We will modify the Bag of Words file.


1. Replace the Textable\Text Field widget with a Textable\Text Files widget.

2. Click Advanced settings.

3. Click the Browse button to choose the files.

4. Select all the post files and click Open.

5. To see the result, connect a Textable\Display widget to the Textable\Lowercase widget.
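For comparison, the chain configured in the steps above (Text Files, then Lowercase, then Display) roughly corresponds to the plain-Python sketch below. It is not Orange or Textable code: the function name is hypothetical, and the numbered one-post-per-line format is an illustrative choice, not what the Display widget actually shows.

```python
def lowercase_and_display(posts):
    """Plain-Python stand-in for the schema: the posts loaded by Text Files are
    lowercased (Lowercase widget) and listed one per line (Display widget)."""
    return "\n".join(f"{i}: {post.lower()}" for i, post in enumerate(posts, start=1))


posts = ["Imaging databases store images.",
         "Most imaging databases safe images permanently."]
print(lowercase_and_display(posts))
```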

 

 

What we learned in this post

- Opening a directory and reading the files in it with Python


References

- http://langtech.ch/forum/textable/viewtopic.php?f=4&t=4
- http://moodle2.unil.ch/course/view.php?id=574
- https://orange-textable.readthedocs.org/en/latest/
- http://orange.biolab.si/forum/viewtopic.php?f=4&t=1949#p5655
- How to work with directories: http://www.diveintopython.net/file_handling/os_module.html
