The original Hunspell had two important utilities:
# print all forms for all words whose roots are given in `roots.dic`
# and make use of affix rules defined in `affixes.aff`:
unmunch roots.dic affixes.aff
# print the forms of ONE given word (a single root with no affix rule)
# which are allowed by the reference dictionary defined by the pair of
# `roots.dic` and `affixes.aff`:
wordforms affixes.aff roots.dic word
How can this be achieved in `spylls.hunspell`?
I use Hunspell to generate Scrabble dictionaries, and I am looking into replacing it with `spylls.hunspell`.
zverok commented on Apr 28, 2022
There is an `examples/unmunch.py` script.
The comments there explain its limitations.
I haven't had the resources to work on something more robust, unfortunately 🤷
exander77 commented on Apr 28, 2022
Works superbly compared to running `wordforms` over all roots (which took me three days); `unmunch` has not been supported for a while. This took about 30 seconds. But I see some differences: I am missing 2851 words and I have 319493 new words.
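The "missing" and "new" counts come from comparing the two generated word lists. A minimal sketch of such a comparison, assuming each tool's output is available as one word per line; the function name and the toy data are hypothetical and not part of Spylls or Hunspell:

```python
def diff_wordlists(reference_words, candidate_words):
    """Return (missing, new) as sorted lists.

    missing: words in the reference list but not in the candidate list.
    new: words in the candidate list but not in the reference list.
    """
    reference = {w.strip() for w in reference_words if w.strip()}
    candidate = {w.strip() for w in candidate_words if w.strip()}
    return sorted(reference - candidate), sorted(candidate - reference)


# Toy data standing in for the Hunspell and Spylls outputs (assumption).
hunspell_forms = ["rychlý", "rychlejší", "nejrychlejší"]
spylls_forms = ["rychlý", "rychlejší", "rychle"]
missing, new = diff_wordlists(hunspell_forms, spylls_forms)
print(missing)
print(new)
```

With real output files, the two arguments could simply be open file objects, since iterating over a file yields its lines.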
exander77 commented on Apr 28, 2022
Running with the Czech Hunspell dictionary: http://www.translatoblog.cz/wp-content/uploads/2021/03/hunspell_cs.zip
exander77 commented on Apr 28, 2022
All words missed by Spylls:
cs_CZ.txt.missing.txt
This most likely means that they will not be accepted as correct during spellchecking.
exander77 commented on Apr 29, 2022
The new words produced by Spylls seem to reflect a deficiency in the original `wordforms`.
exander77 commented on Apr 29, 2022
Basically, I see missing words of two kinds.
The ones with the prefix `nej` (basically the same as the suffix `est` in English: rychlejší => nejrychlejší, fast => fastest). But a lot of words with `nej` are present, so is it some combination of properties?
The other ones are some foreign surname forms.
Spylls:
Hunspell:
The surnames are maybe correct with Spylls, but wrong in Hunspell? But the `nej` prefix is definitely some bug in Spylls.
exander77 commented on Apr 29, 2022
Running:
Produces:
So this looks more like an `unmunch` bug and not a general Spylls bug.
exander77 commented on Apr 29, 2022
vs
`Seniorní` is missing `nej` variants compared to `rychlý`.
exander77 commented on Apr 29, 2022
Found obviously missing code: #23
Suffix crossproduct is not analysed for prefixes. Btw, maybe secondary suffix crossproduct needs to be analysed as well?
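To illustrate what the cross product means here: in a Hunspell `.aff` file each PFX/SFX class declares whether it may be combined with an affix of the other kind, and a prefix plus suffix form is generated only when both sides allow it. A minimal toy sketch of that idea, assuming a heavily simplified affix model (no stripping, no conditions, no flags); the `Affix` class and `expand` function are hypothetical and are not the code from `unmunch.py` or from #23:

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Affix:
    text: str           # characters added to the stem
    crossproduct: bool  # may this affix combine with one of the other kind?


def expand(stem, prefixes, suffixes):
    """Generate forms of stem, including prefix+suffix combinations."""
    forms = {stem}
    forms.update(p.text + stem for p in prefixes)
    forms.update(stem + s.text for s in suffixes)
    # Cross product: combine a prefix and a suffix only if both allow it.
    for p, s in product(prefixes, suffixes):
        if p.crossproduct and s.crossproduct:
            forms.add(p.text + stem + s.text)
    return sorted(forms)


forms = expand("lock", [Affix("un", True)], [Affix("ed", True), Affix("s", False)])
print(forms)
```

In this toy example `unlocked` is generated but `unlocks` is not, because the `s` suffix does not allow the cross product. Skipping the combination loop entirely would lose forms in the same way the `nej` variants were lost.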
exander77 commented on Apr 29, 2022
Btw, I am not even sure if the code in `unmunch` is the right approach; shouldn't it be a recursive check?
After each prefix or suffix is added, check whether new prefixes or suffixes can be added on top of that?
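One way to read this suggestion is as a worklist expansion in which every newly produced form is re-examined for further applicable affixes. A rough sketch under strong assumptions: the `Rule` model, in which applying a rule determines which rules may follow, is hypothetical and much simpler than real Hunspell affix handling, which has additional constraints that this toy ignores.

```python
from collections import deque
from typing import NamedTuple


class Rule(NamedTuple):
    flag: str           # hypothetical identifier of the rule
    kind: str           # "prefix" or "suffix"
    text: str           # characters to add
    enables: frozenset  # flags of rules that may be applied afterwards


def expand_recursively(word, start_flags, rules, max_depth=3):
    """Breadth-first expansion: after each affix, re-check which rules apply."""
    by_flag = {r.flag: r for r in rules}
    start = (word, frozenset(start_flags))
    seen = {start}
    queue = deque([(start[0], start[1], 0)])
    while queue:
        form, flags, depth = queue.popleft()
        if depth == max_depth:
            continue  # depth limit guards against endless affix chains
        for flag in flags:
            rule = by_flag[flag]
            new_form = rule.text + form if rule.kind == "prefix" else form + rule.text
            state = (new_form, rule.enables)
            if state not in seen:
                seen.add(state)
                queue.append((new_form, rule.enables, depth + 1))
    return sorted({form for form, _ in seen})


rules = [
    Rule("A", "prefix", "pref", frozenset({"B"})),
    Rule("B", "suffix", "suf", frozenset({"C"})),
    Rule("C", "prefix", "pref2", frozenset()),
]
chain = expand_recursively("word", {"A"}, rules)
print(chain)
```

The `seen` set and the `max_depth` limit are design choices to keep the expansion finite; a real implementation would need limits that match what the spellchecker itself accepts.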
I can imagine a `word` where, once you add a prefix to get `prefword`, you can now add a suffix to get `prefwordsuf` even though `wordsuf` would not be valid. And then you can add another prefix to get `pref2prefwordsuf` even though `pref2prefword` would not be valid? And so on?
zverok commented on Apr 30, 2022
@exander77 `unmunch.py` is a quick hack I did while discussing a similar question in #10. I don't consider it feature-complete (that's why it is in `examples/`; it just shows the direction in which one should go to use `spylls` to produce a word list).
ATM I, unfortunately, don't have many resources to discuss/debug it (I am in Kharkiv, Ukraine, splitting my days between volunteering, my day job, and doomscrolling).
I'm thankful for your PR and I'll merge it if it works for you :)
In case you are willing to work on improving `unmunch.py`, it might make sense to "promote" it to a real feature with code in `spylls/hunspell/`, a script in `bin/`, and maybe some tests to make sure it works (and improve it when it doesn't), WDYT?
exander77 commented on Apr 30, 2022
With that PR, `unmunch` pretty much works for `cs_CZ`, and I support turning it into a real feature and offer my help with improving it. It definitely works better than Hunspell's `wordforms` and `unmunch` even now. Putting it into `bin/` and adding tests etc. sounds reasonable.
I don't want to get political on GitHub, but I am sending my:
from the Czech Republic. We had the Soviet occupation here in 1968... I hope the Czech Republic, and the whole European Union and NATO by extension, will send enough support, including weapons, so Ukraine can put Russia in its place. I think the hearts and minds of most Czech people are with Ukraine.