Assume our website already has a few posts. We want to find the post most relevant to the one currently being viewed, “Imaging database”.
The table below shows the existing posts on the website.
Sl. No. | Post Content |
1 | This is a toy post about machine learning. Actually, it contains |
2 | Imaging databases can get huge. |
3 | Most imaging databases safe images permanently. |
4 | Imaging databases store images. |
5 | Imaging databases store images. Imaging databases store |
In the book, the posts are read using the following script.
Script1 – reading the files and printing the output
import os
import sys
import scipy as sp
from sklearn.feature_extraction.text import CountVectorizer

# Directory that holds the toy posts, one text file per post
data_Dir = "E:\\Machine Learning\\Orange\\Finding Related Post\\toy\\"
# Read the content of every file in the directory into a list of strings
posts = [open(os.path.join(data_Dir, f)).read() for f in os.listdir(data_Dir)]
print(posts)
Result1:
Running script:
['This is a toy post about machine learning. Actually, it contains not much interesting stuff.', 'Imaging databases provide storage capabilities.', 'Most imaging databases safe images permanently.', 'Imaging databases store data.', 'Imaging databases store data. Imaging databases store data. Imaging databases store data.']
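The stated goal is to find the post most related to “Imaging database”. A minimal sketch of that idea follows, assuming a plain bag-of-words comparison written with the standard library only. It is a stand-in for CountVectorizer, not the book's code. The helper names (bag_of_words, normalize, distance, most_related) are hypothetical, and the post texts are copied from Result1.

```python
import math
import re
from collections import Counter

# Post texts copied from Result1 above.
posts = [
    "This is a toy post about machine learning. Actually, it contains not much interesting stuff.",
    "Imaging databases provide storage capabilities.",
    "Most imaging databases safe images permanently.",
    "Imaging databases store data.",
    "Imaging databases store data. Imaging databases store data. Imaging databases store data.",
]


def bag_of_words(text):
    """Lowercase the text and count every word of two or more letters."""
    return Counter(re.findall(r"[a-z]{2,}", text.lower()))


def normalize(counts):
    """Scale the word counts to unit length so repeated text is not penalized."""
    length = math.sqrt(sum(v * v for v in counts.values()))
    return {w: v / length for w, v in counts.items()} if length else {}


def distance(a, b):
    """Euclidean distance between two length-normalized word-count vectors."""
    na, nb = normalize(a), normalize(b)
    words = set(na) | set(nb)
    return math.sqrt(sum((na.get(w, 0.0) - nb.get(w, 0.0)) ** 2 for w in words))


def most_related(query, candidates):
    """Return (distance, index) of the candidate closest to the query text."""
    q = bag_of_words(query)
    return min((distance(q, bag_of_words(p)), i) for i, p in enumerate(candidates))


if __name__ == "__main__":
    dist, index = most_related("Imaging database", posts)
    print(index + 1, posts[index], round(dist, 3))
```

Because the sketch does no stemming, "database" in the query does not match "databases" in the posts, so only the word "imaging" contributes to the similarity. Posts 4 and 5 contain the same words in the same proportions, so after normalization they are equally close to the query.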
Implementation of Script1 in Orange
We will modify the Bag of words file.
1. Replace the Textable\TextField widget with the Textable\TextFile widget.
2. Click Advanced settings.
3. Click the Browse button to open the file dialog.
4. Select all the files and click Open.
5. You can see the result by connecting the Textable\Display widget to the Textable\Lowercase widget.
What we learned from this post
- Opening a directory path and reading its files in Python
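As a complement to Script1, here is a hedged sketch of the same directory read for Python 3. The function name read_posts is hypothetical, and the sketch assumes the posts are UTF-8 text files. It closes each file after reading, skips sub-directories, and sorts the file names so the order of the posts is predictable.

```python
import os


def read_posts(directory):
    """Return the text of every regular file in `directory`, sorted by file name."""
    posts = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Skip sub-directories; only regular files are treated as posts.
        if os.path.isfile(path):
            # The with-statement closes the file as soon as it has been read.
            with open(path, encoding="utf-8") as handle:
                posts.append(handle.read())
    return posts
```

Calling read_posts(data_Dir) with the directory from Script1 should give the same list of strings, although possibly in a different order, since os.listdir does not guarantee any particular ordering.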
References
http://langtech.ch/forum/textable/viewtopic.php?f=4&t=4
http://moodle2.unil.ch/course/view.php?id=574
https://orange-textable.readthedocs.org/en/latest/
http://orange.biolab.si/forum/viewtopic.php?f=4&t=1949#p5655
How to work with directories: http://www.diveintopython.net/file_handling/os_module.html