Source code for preprocess.basic.hyphen
def hyphenation(text: str, collocations: list) -> str:
    """Join the words of each collocation in the text with underscores.

    Looking for collocations recursively makes it possible to find
    important expressions that define the topic (there are, of course,
    better techniques for this, such as Deep Learning and other more
    complex methods). Once a collocation is hyphenated it becomes a
    single word and is no longer mixed with the rest of the text. For
    example, hyphenating the collocation [natural, language] as
    "natural_language" is more informative in a Luhn term evaluation
    than using "natural" and "language" separately.

    Parameters
    ----------
    text : str
        Normalized text.
    collocations : list of tuple
        List of collocations, each one a tuple of words.

    Returns
    -------
    text : str
        The same text with every collocation joined by the underscore
        character.
    """
    for collocation in collocations:
        # Form of the collocation as it appears in the text: "natural language"
        expression = ' '.join(collocation)
        # Single-token form that replaces it: "natural_language"
        replacement = '_'.join(collocation)
        # str.replace matches substrings, so every occurrence is rewritten
        text = text.replace(expression, replacement)
    return text
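
A minimal usage sketch follows. The function is re-declared with equivalent logic so the snippet runs on its own, and the sample text and collocations are made up for illustration. It shows two consequences of substring replacement: overlapping collocations chain into one longer token, and a collocation is also matched inside a longer word.

```python
def hyphenation(text: str, collocations: list) -> str:
    # Re-declared with equivalent logic so this snippet is self-contained.
    for collocation in collocations:
        expression = ' '.join(collocation)
        replacement = '_'.join(collocation)
        text = text.replace(expression, replacement)
    return text


# Hypothetical input: normalized text and collocations chosen for illustration.
sample_text = "natural language processing makes natural language easier to analyze"
sample_collocations = [("natural", "language"), ("language", "processing")]

result = hyphenation(sample_text, sample_collocations)
print(result)
# natural_language_processing makes natural_language easier to analyze

# Caveat: str.replace matches substrings, not whole words.
print(hyphenation("supernatural language", [("natural", "language")]))
# supernatural_language
```

If whole-word matching matters, one option (an assumption, not something the module does) is to replace with `re.sub` and `\b` word boundaries instead of `str.replace`.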