pattern.fr
The pattern.fr module contains a fast part-of-speech tagger for French (identifies nouns, adjectives, verbs, etc. in a sentence), sentiment analysis, and tools for French verb conjugation and noun singularization & pluralization.
It can be used by itself or with other pattern modules: web | db | en | search | vector | graph.
The functions in this module take the same parameters and return the same values as their counterparts in pattern.en. Refer to the documentation there for more details.
For French nouns there is singularize() and pluralize(). The implementation uses a statistical approach with 93% accuracy for singularization and 92% for pluralization.
>>> from pattern.fr import singularize, pluralize
>>>
>>> print singularize('chats')
>>> print pluralize('chat')
chat
chats
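To give a feel for what suffix-based pluralization involves, here is a minimal sketch using a handful of textbook French plural rules. It is not the implementation used by pattern.fr: the name toy_pluralize and the rule list are assumptions made for illustration, and the rules ignore common exceptions (e.g., bal → bals, pneu → pneus).

```python
import re

# Toy suffix rules, checked in order; the first match wins.
# Illustrative assumptions only, not the rules used by pattern.fr.
PLURAL_RULES = [
    (re.compile(r"(s|x|z)$"), r"\1"),       # souris, prix, nez: unchanged
    (re.compile(r"al$"), "aux"),            # cheval -> chevaux
    (re.compile(r"(eau|au|eu)$"), r"\1x"),  # bateau -> bateaux
    (re.compile(r"$"), "s"),                # default: chat -> chats
]

def toy_pluralize(noun):
    """Return a plural for `noun` using the first matching suffix rule."""
    for pattern, replacement in PLURAL_RULES:
        if pattern.search(noun):
            return pattern.sub(replacement, noun, count=1)
    return noun
```

A purely rule-based approach like this breaks on irregular nouns, which is one reason a tool would combine rules with learned or listed exceptions.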
For French verbs there is conjugate(), lemma(), lexeme() and tenses(). The lexicon for verb conjugation contains about 1,750 common French verbs (constructed with Bob Salita's verb conjugation rules). For unknown verbs it will fall back to regular expressions with an accuracy of about 83%.
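As an illustration of what a regular-expression fallback for unknown verbs could look like, the sketch below conjugates regular -er verbs in the present indicative. The function toy_conjugate_er and its "sg"/"pl" strings are hypothetical and are not part of pattern.fr; the sketch also ignores spelling adjustments (manger → nous mangeons) and irregular verbs such as aller.

```python
import re

# Present indicative endings for regular -er verbs, keyed by (person, number).
ER_PRESENT = {
    (1, "sg"): "e", (2, "sg"): "es", (3, "sg"): "e",
    (1, "pl"): "ons", (2, "pl"): "ez", (3, "pl"): "ent",
}

def toy_conjugate_er(infinitive, person, number):
    """Conjugate a regular -er verb in the present indicative.

    Returns None when the word does not look like an -er infinitive.
    """
    match = re.match(r"^(.+)er$", infinitive)
    if match is None:
        return None
    # Stem (infinitive without -er) plus the ending for this person/number.
    return match.group(1) + ER_PRESENT[(person, number)]
```

For example, toy_conjugate_er('parler', 1, 'pl') returns 'parlons', while toy_conjugate_er('finir', 1, 'pl') returns None because the pattern only covers -er verbs.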
French verbs have more tenses than English verbs. In particular, the plural differs for each person, and there are additional forms for the FUTURE tense, the IMPERATIVE, CONDITIONAL and SUBJUNCTIVE moods and the PERFECTIVE aspect:
>>> from pattern.fr import conjugate
>>> from pattern.fr import INFINITIVE, PRESENT, PAST, SG, SUBJUNCTIVE, PERFECTIVE
>>>
>>> print conjugate('suis', INFINITIVE)
>>> print conjugate('suis', PRESENT, 1, SG, mood=SUBJUNCTIVE)
>>> print conjugate('suis', PAST, 3, SG)
>>> print conjugate('suis', PAST, 3, SG, aspect=PERFECTIVE)
être
sois
était
fut
For PAST tense + PERFECTIVE aspect we can also use PRETERITE (passé simple). For PAST tense + IMPERFECTIVE aspect we can also use IMPERFECT (imparfait):
>>> from pattern.fr import conjugate
>>> from pattern.fr import IMPERFECT, PRETERITE
>>>
>>> print conjugate('suis', IMPERFECT, 3, SG)
>>> print conjugate('suis', PRETERITE, 3, SG)
était
fut
The conjugate() function takes the following optional parameters:
Tense | Person | Number | Mood | Aspect | Alias | Example |
INFINITIVE | None | None | None | None | "inf" | être |
PRESENT | 1 | SG | INDICATIVE | IMPERFECTIVE | "1sg" | je __suis__ |
PRESENT | 2 | SG | INDICATIVE | IMPERFECTIVE | "2sg" | tu __es__ |
PRESENT | 3 | SG | INDICATIVE | IMPERFECTIVE | "3sg" | il __est__ |
PRESENT | 1 | PL | INDICATIVE | IMPERFECTIVE | "1pl" | nous __sommes__ |
PRESENT | 2 | PL | INDICATIVE | IMPERFECTIVE | "2pl" | vous __êtes__ |
PRESENT | 3 | PL | INDICATIVE | IMPERFECTIVE | "3pl" | ils __sont__ |
PRESENT | None | None | INDICATIVE | PROGRESSIVE | "part" | étant |
PRESENT | 2 | SG | IMPERATIVE | IMPERFECTIVE | "2sg!" | sois |
PRESENT | 1 | PL | IMPERATIVE | IMPERFECTIVE | "1pl!" | soyons |
PRESENT | 2 | PL | IMPERATIVE | IMPERFECTIVE | "2pl!" | soyez |
PRESENT | 1 | SG | CONDITIONAL | IMPERFECTIVE | "1sg->" | je __serais__ |
PRESENT | 2 | SG | CONDITIONAL | IMPERFECTIVE | "2sg->" | tu __serais__ |
PRESENT | 3 | SG | CONDITIONAL | IMPERFECTIVE | "3sg->" | il __serait__ |
PRESENT | 1 | PL | CONDITIONAL | IMPERFECTIVE | "1pl->" | nous __serions__ |
PRESENT | 2 | PL | CONDITIONAL | IMPERFECTIVE | "2pl->" | vous __seriez__ |
PRESENT | 3 | PL | CONDITIONAL | IMPERFECTIVE | "3pl->" | ils __seraient__ |
PRESENT | 1 | SG | SUBJUNCTIVE | IMPERFECTIVE | "1sg?" | je __sois__ |
PRESENT | 2 | SG | SUBJUNCTIVE | IMPERFECTIVE | "2sg?" | tu __sois__ |
PRESENT | 3 | SG | SUBJUNCTIVE | IMPERFECTIVE | "3sg?" | il __soit__ |
PRESENT | 1 | PL | SUBJUNCTIVE | IMPERFECTIVE | "1pl?" | nous __soyons__ |
PRESENT | 2 | PL | SUBJUNCTIVE | IMPERFECTIVE | "2pl?" | vous __soyez__ |
PRESENT | 3 | PL | SUBJUNCTIVE | IMPERFECTIVE | "3pl?" | ils __soient__ |
PAST | 1 | SG | INDICATIVE | IMPERFECTIVE | "1sgp" | j' __étais__ |
PAST | 2 | SG | INDICATIVE | IMPERFECTIVE | "2sgp" | tu __étais__ |
PAST | 3 | SG | INDICATIVE | IMPERFECTIVE | "3sgp" | il __était__ |
PAST | 1 | PL | INDICATIVE | IMPERFECTIVE | "1ppl" | nous __étions__ |
PAST | 2 | PL | INDICATIVE | IMPERFECTIVE | "2ppl" | vous __étiez__ |
PAST | 3 | PL | INDICATIVE | IMPERFECTIVE | "3ppl" | ils __étaient__ |
PAST | None | None | INDICATIVE | PROGRESSIVE | "ppart" | été |
PAST | 1 | SG | INDICATIVE | PERFECTIVE | "1sgp+" | je __fus__ |
PAST | 2 | SG | INDICATIVE | PERFECTIVE | "2sgp+" | tu __fus__ |
PAST | 3 | SG | INDICATIVE | PERFECTIVE | "3sgp+" | il __fut__ |
PAST | 1 | PL | INDICATIVE | PERFECTIVE | "1ppl+" | nous __fûmes__ |
PAST | 2 | PL | INDICATIVE | PERFECTIVE | "2ppl+" | vous __fûtes__ |
PAST | 3 | PL | INDICATIVE | PERFECTIVE | "3ppl+" | ils __furent__ |
PAST | 1 | SG | SUBJUNCTIVE | IMPERFECTIVE | "1sgp?" | je __fusse__ |
PAST | 2 | SG | SUBJUNCTIVE | IMPERFECTIVE | "2sgp?" | tu __fusses__ |
PAST | 3 | SG | SUBJUNCTIVE | IMPERFECTIVE | "3sgp?" | il __fût__ |
PAST | 1 | PL | SUBJUNCTIVE | IMPERFECTIVE | "1ppl?" | nous __fussions__ |
PAST | 2 | PL | SUBJUNCTIVE | IMPERFECTIVE | "2ppl?" | vous __fussiez__ |
PAST | 3 | PL | SUBJUNCTIVE | IMPERFECTIVE | "3ppl?" | ils __fussent__ |
FUTURE | 1 | SG | INDICATIVE | IMPERFECTIVE | "1sgf" | je __serai__ |
FUTURE | 2 | SG | INDICATIVE | IMPERFECTIVE | "2sgf" | tu __seras__ |
FUTURE | 3 | SG | INDICATIVE | IMPERFECTIVE | "3sgf" | il __sera__ |
FUTURE | 1 | PL | INDICATIVE | IMPERFECTIVE | "1plf" | nous __serons__ |
FUTURE | 2 | PL | INDICATIVE | IMPERFECTIVE | "2plf" | vous __serez__ |
FUTURE | 3 | PL | INDICATIVE | IMPERFECTIVE | "3plf" | ils __seront__ |
Instead of optional parameters, a single short alias, or PARTICIPLE or PAST+PARTICIPLE can also be given. With no parameters, the infinitive form of the verb is returned.
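The alias column follows a regular pattern: a person digit, a combined number/tense marker, and an optional suffix for mood or aspect. The sketch below decodes an alias into a (tense, person, number, mood, aspect) tuple. It is only a reading aid for the table: decode_alias is a hypothetical helper, the lowercase strings stand in for the pattern.fr constants, and the regular expression also accepts some combinations the table does not list (e.g., "1sg!").

```python
import re

# Person digit, number/tense marker, optional mood/aspect suffix.
ALIAS = re.compile(r"^([123])(sgp|ppl|sgf|plf|sg|pl)(!|->|\?|\+)?$")

# Hypothetical string stand-ins for the pattern.fr constants.
NUMBER_TENSE = {
    "sg": ("present", "singular"), "pl": ("present", "plural"),
    "sgp": ("past", "singular"), "ppl": ("past", "plural"),
    "sgf": ("future", "singular"), "plf": ("future", "plural"),
}
SUFFIX = {
    None: ("indicative", "imperfective"),
    "!": ("imperative", "imperfective"),
    "->": ("conditional", "imperfective"),
    "?": ("subjunctive", "imperfective"),
    "+": ("indicative", "perfective"),
}
# Aliases without person and number.
SPECIAL = {
    "inf": ("infinitive", None, None, None, None),
    "part": ("present", None, None, "indicative", "progressive"),
    "ppart": ("past", None, None, "indicative", "progressive"),
}

def decode_alias(alias):
    """Return (tense, person, number, mood, aspect) for a table alias."""
    if alias in SPECIAL:
        return SPECIAL[alias]
    match = ALIAS.match(alias)
    if match is None:
        raise ValueError("unknown alias: %r" % alias)
    person, marker, suffix = match.groups()
    tense, number = NUMBER_TENSE[marker]
    mood, aspect = SUFFIX[suffix]
    return (tense, int(person), number, mood, aspect)
```

For example, decode_alias('3sgp+') gives ('past', 3, 'singular', 'indicative', 'perfective'), which corresponds to the row for il fut.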
Reference: Salita, B. (2011). French Verb Conjugation Rules. Retrieved from: http://fvcr.sourceforge.net.
French adjectives inflect with an -e, -s or -es suffix depending on gender and number. There are many irregular cases (e.g., curieux → une fille curieuse). You can get the base form with the predicative() function. A statistical approach is used with an accuracy of 95%.
>>> from pattern.fr import predicative
>>> print predicative('curieuse')
curieux
For opinion mining there is sentiment(), which returns a (polarity, subjectivity)-tuple, based on a lexicon of adjectives. Polarity is a value between -1.0 and +1.0, subjectivity between 0.0 and 1.0. The accuracy is around 74% (P 0.77, R 0.73) for book reviews:
>>> from pattern.fr import sentiment
>>> print sentiment('Un livre magnifique!')
(1.0, 1.0)
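One simple way to picture lexicon-based scoring is to average the scores of the known adjectives in a sentence. The sketch below does just that. It is not how pattern.fr is implemented: toy_sentiment is a hypothetical function, and the lexicon entries and their scores are invented for illustration.

```python
import re

# Hypothetical (polarity, subjectivity) scores, invented for illustration.
TOY_LEXICON = {
    "magnifique": (1.0, 1.0),
    "bon": (0.7, 0.6),
    "ennuyeux": (-0.6, 0.8),
    "mauvais": (-0.7, 0.7),
}

def toy_sentiment(text):
    """Average the scores of all lexicon words found in `text`."""
    words = re.findall(r"\w+", text.lower(), re.UNICODE)
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    if not scores:
        # No known adjectives: treat the text as neutral and objective.
        return (0.0, 0.0)
    polarity = sum(p for p, _ in scores) / len(scores)
    subjectivity = sum(s for _, s in scores) / len(scores)
    return (polarity, subjectivity)
```

With this toy lexicon, toy_sentiment('Un livre magnifique!') returns (1.0, 1.0), and a sentence without any listed adjective returns (0.0, 0.0).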
For parsing there is parse(), parsetree() and split(). The parse() function annotates words in the given string with their part-of-speech tags (e.g., NN for nouns and VB for verbs). The parsetree() function takes a string and returns a tree of nested objects (Text → Sentence → Chunk → Word). The split() function takes the output of parse() and returns a Text. See the pattern.en documentation for how to manipulate Text objects.
>>> from pattern.fr import parse, split
>>>
>>> s = parse(u"Le chat noir s'était assis sur le tapis.")
>>> for sentence in split(s):
>>>     print sentence
Sentence('Le/DT/B-NP/O chat/NN/I-NP/O noir/JJ/I-NP/O'
"s'/PRP/B-NP/O était/VB/B-VP/O assis/VBN/I-VP/O"
'sur/IN/B-PP/B-PNP le/DT/B-NP/I-PNP tapis/NN/I-NP/I-PNP ././O/O')
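The tagged output shown above is a string of space-separated tokens, each with slash-separated fields. Assuming the field order seen in that example (word, part-of-speech tag, chunk tag, PNP tag), the string can be read without any pattern objects. The helper read_tagged below is hypothetical and only a sketch of that idea.

```python
def read_tagged(sentence):
    """Split 'word/POS/chunk/PNP' tokens into 4-tuples."""
    tokens = []
    for token in sentence.split():
        # Split from the right so the last three fields are always the tags.
        word, pos, chunk, pnp = token.rsplit("/", 3)
        tokens.append((word, pos, chunk, pnp))
    return tokens

tagged = "Le/DT/B-NP/O chat/NN/I-NP/O noir/JJ/I-NP/O"
# Keep only the words whose part-of-speech tag marks a noun.
nouns = [word for word, pos, chunk, pnp in read_tagged(tagged)
         if pos.startswith("NN")]
```

Here nouns ends up as ['chat']. For anything beyond simple filtering, the Text and Sentence objects returned by split() are the more convenient route.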
The parser is based on Lefff. For words in Lefff that can have multiple part-of-speech tags, we used Lexique to find the most frequent POS-tag.
References:
Sagot, B. (2010). The Lefff, a freely available and large-coverage morphological and syntactic lexicon for French. Proceedings of LREC'10.
New, B., Pallier, C., Ferrand, L. & Matos, R. (2001). A lexical database for contemporary French: LEXIQUE. L'année Psychologique.