Sunday, 8 June 2014

Finding norm

Finding the norm of a vector in mathematics is nothing but finding a distance: the length of the vector, i.e. how far it lies from the origin.
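A minimal sketch of this idea, assuming the Euclidean norm is the one meant (other norms exist). The function names euclidean_norm and euclidean_distance are my own for illustration and do not come from any library.

```python
import math

def euclidean_norm(vector):
    """Return the Euclidean norm: the distance of the vector from the origin."""
    return math.sqrt(sum(x * x for x in vector))

def euclidean_distance(a, b):
    """The distance between two points is the norm of their difference."""
    return euclidean_norm([x - y for x, y in zip(a, b)])

print(euclidean_norm([3, 4]))               # 5.0
print(euclidean_distance([1, 1], [4, 5]))   # 5.0
```

The second function shows why norm and distance are the same thing seen from two sides: the distance between two points is simply the norm of the vector that joins them.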

Thursday, 5 June 2014

Stemming defined inside a class in Python


In chapter 3 of the book, the following code is given for implementing stemming with a class and functions.

import nltk.stem
from sklearn.feature_extraction.text import CountVectorizer

english_stemmer = nltk.stem.SnowballStemmer('english')

class StemmedCountVectorizer(CountVectorizer):
    def build_analyzer(self):
        # reuse the analyzer that the parent class CountVectorizer builds
        analyzer = super(StemmedCountVectorizer, self).build_analyzer()
        # stem every token that the parent analyzer produces
        return lambda doc: (english_stemmer.stem(w) for w in analyzer(doc))

vectorizer = StemmedCountVectorizer(min_df=1, stop_words='english')


First of all, I want to understand how a class can be defined in a program.

How to define a class?


http://freepythontips.wordpress.com/2013/08/07/the-self-variable-in-python-explained/
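A small sketch of a class definition, just to see the pieces in one place. The class Greeter and its attribute name are made up for illustration and have nothing to do with the book's code.

```python
class Greeter(object):
    """A tiny illustrative class."""

    def __init__(self, name):
        # self refers to the instance being created; store the name on it
        self.name = name

    def greet(self):
        # methods receive the instance as their first argument, self
        return "Hello, " + self.name

g = Greeter("world")
print(g.greet())   # Hello, world
```

Calling g.greet() passes g as self automatically, which is why the method is declared with one parameter but called with none.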

What is the super function?


http://learnpythonthehardway.org/book/ex44.html

How to use the super function?


http://orangeml.blogspot.in/2014/06/use-of-super-in-built-python-function.html
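Here is a simplified sketch of the same pattern the book uses: a child class overrides a method, calls the parent's version through super, and then builds on the result. BaseAnalyzer and ShoutingAnalyzer are hypothetical classes of my own, standing in for CountVectorizer and StemmedCountVectorizer.

```python
class BaseAnalyzer(object):
    def build_analyzer(self):
        # return a function that lowercases a document and splits it into words
        return lambda doc: doc.lower().split()

class ShoutingAnalyzer(BaseAnalyzer):
    def build_analyzer(self):
        # super(...) gives access to the parent's build_analyzer
        analyzer = super(ShoutingAnalyzer, self).build_analyzer()
        # post-process every word the parent analyzer returns
        return lambda doc: [w.upper() for w in analyzer(doc)]

shout = ShoutingAnalyzer().build_analyzer()
print(shout("A cook cooks"))   # ['A', 'COOK', 'COOKS']
```

Without super, the child would have to repeat the parent's splitting logic itself; with it, the child only adds the extra step.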

What is lambda?


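Lambda is used to create an anonymous function, usually a small one that is defined right where it is needed instead of being given a name with def. In the above example, the anonymous function is created each time build_analyzer() is called, and it then runs once for every document that is passed to it.

The body of a lambda must be a single expression, so it is a compact way to build a function and hand it back directly in the return statement.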

http://pythonconquerstheuniverse.wordpress.com/2011/08/29/lambda_tutorial/ 
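A short sketch comparing def and lambda. All the names here (double_def, double_lambda, make_multiplier) are made up for illustration.

```python
# a named function written with def
def double_def(x):
    return 2 * x

# the same function written as a lambda and bound to a name
double_lambda = lambda x: 2 * x

def make_multiplier(n):
    # returning a lambda: the caller gets back a new function
    return lambda x: n * x

triple = make_multiplier(3)
print(double_def(5), double_lambda(5), triple(5))   # 10 10 15
```

make_multiplier has the same shape as build_analyzer in the book: the return statement does not compute a value, it returns a function that will compute values later.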

How does the return statement execute?


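The return statement in the above example creates "lambda doc", an anonymous function, and returns it without running it. When this anonymous function is later called with a document, it calls two functions:
1. english_stemmer.stem()
2. analyzer()

The analyzer function splits the contents of the doc variable into tokens (it does not vectorize them). The for clause iterates over each token that analyzer() returns, and each token is passed to english_stemmer.stem() to be stemmed.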

Examples of for loops: http://orangeml.blogspot.in/2014/07/using-for-loop-in-python-for-lists-as.html

https://wiki.python.org/moin/ForLoop
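To see the return statement at work without installing anything, here is a sketch with toy stand-ins. toy_stem and toy_analyzer are hypothetical replacements of my own for english_stemmer.stem and the CountVectorizer analyzer; the real ones behave differently (for example, this toy stemmer only strips a trailing "s").

```python
def toy_stem(word):
    # hypothetical stand-in for english_stemmer.stem: strip a trailing "s"
    return word[:-1] if word.endswith("s") else word

def toy_analyzer(doc):
    # hypothetical stand-in for the analyzer: lowercase and split on whitespace
    return doc.lower().split()

def build_stemmed_analyzer():
    # same shape as the book's return statement
    return lambda doc: (toy_stem(w) for w in toy_analyzer(doc))

stemmed = build_stemmed_analyzer()      # nothing is analyzed or stemmed yet
result = stemmed("A cook cooks meals")  # a generator; tokens are stemmed lazily
print(list(result))                     # ['a', 'cook', 'cook', 'meal']
```

Note the round brackets after "lambda doc:". They make a generator expression, so the tokens are stemmed one by one only when something loops over the result. Square brackets would build the whole list at once.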


Further explanation


This particular example confused me initially, and I could not understand it. So I wrote to the author through his official website. He gave me another example, which is very clear and easy to understand.

Code

from sklearn.feature_extraction.text import CountVectorizer
import nltk.stem

# the default analyzer lowercases the text and splits it into tokens
cv = CountVectorizer()
analyzer = cv.build_analyzer()

english_stemmer = nltk.stem.SnowballStemmer('english')

doc = "A cook cooks delicious meals"
# tokens after stemming
print([english_stemmer.stem(w) for w in analyzer(doc)])
# tokens before stemming, for comparison
print(analyzer(doc))

Output

[u'cook', u'cook', u'delici', u'meal']
[u'cook', u'cooks', u'delicious', u'meals']

Here is the link to my discussion with the author: http://www.twotoreal.com/q/156/return-lambda-function-in-ntlks-stemmer-chapter-3-page-59