Shallow Module

Module for shallow preprocessing techniques: lowercasing, contraction expansion, URL replacement, abbreviation substitution, POS tagging, etc.

This module inherits the normalization techniques and implements POS tagging, lemmatization, and stemming.

preprocess.shallow.lemmatization(text, lang='en', input_type='raw_value')

Lemmatize words based on the WordNet corpus.

preprocess.shallow.remove_stopwords(text, lang='en', stops_path='', ignore_case=True)

Remove stopwords based on language.

Software

Based on the remove_stop_words function of the Normalizr package.
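To illustrate the general idea of stopword removal and the effect of a case-insensitive comparison, here is a minimal standalone sketch. It is not this module's implementation: the function name `remove_stopwords_sketch`, the `stops` parameter, and the tiny `EXAMPLE_STOPS` set are hypothetical, and tokenization is assumed to be simple whitespace splitting.

```python
# Hypothetical, tiny stopword set for illustration only; a real list would
# come from language resources or from a user-supplied file.
EXAMPLE_STOPS = {"the", "a", "an", "is", "of", "and", "on"}


def remove_stopwords_sketch(text, stops=EXAMPLE_STOPS, ignore_case=True):
    """Drop whitespace-separated tokens that appear in `stops`."""
    kept = []
    for token in text.split():
        # With ignore_case, compare the lowercased token against the list.
        key = token.lower() if ignore_case else token
        if key not in stops:
            kept.append(token)
    return " ".join(kept)


print(remove_stopwords_sketch("The cat is on the mat"))
# prints "cat mat"
print(remove_stopwords_sketch("The cat is on the mat", ignore_case=False))
# prints "The cat mat"
```

In this sketch, a case-sensitive comparison keeps the capitalized "The" because only the lowercase form is in the list.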

preprocess.shallow.stemming(text, lang='en')

Stem words using the Snowball algorithm.
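For readers unfamiliar with stemming, the following sketch shows the basic idea of reducing inflected forms to a common stem. It is a deliberately naive suffix stripper, not the Snowball algorithm and not this module's code: the names `SUFFIXES` and `naive_stem` and the minimum stem length of 3 are assumptions made for illustration.

```python
# Suffixes are tried in order, longest first. This is a toy rule set;
# Snowball applies many more rules and conditions than this.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word):
    """Strip one common English suffix if at least 3 characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


for w in ("jumping", "jumped", "jumps", "is"):
    print(w, "->", naive_stem(w))
# jumping -> jump
# jumped -> jump
# jumps -> jump
# is -> is
```

A toy stripper like this handles only the simplest cases. For example, it turns "running" into "runn", whereas a full stemmer also undoes the doubled consonant.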