ask for the Stemming feature  #20

Open
@redstoneleo

Description

Activity

zverok (Owner) commented on Jan 30, 2022

A possible implementation of stemming with Spylls is demonstrated in this discussion: #19 (comment)

I'll gladly accept a PR that makes it more convenient, but unfortunately I don't have time to work on this myself.

redstoneleo (Author) commented on Oct 19, 2024

I tested it with the word "wrote", and the result is not accurate.

zverok (Owner) commented on Oct 19, 2024

That just depends on the dictionary.

In the standard en-US dictionary, "wrote" is specified as a separate word form (like most irregular verbs, I guess). Note that Hunspell and its dictionaries are designed, first and foremost, as a spell-checking tool, not a full-fledged linguistic analysis package.
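To illustrate the point above, here is a minimal, self-contained sketch of suffix-stripping stemming over a toy dictionary. The word list, the flags, and the rules are all made up for the example (they are not copied from the real en_US files, and this is not how Spylls is implemented internally). It only shows why a form listed as its own entry stems to itself.

```python
# Toy model of Hunspell-style suffix stripping.
# WORDS and RULES are hypothetical, invented for this illustration.

# Each dictionary entry maps a stem to the set of suffix flags it accepts.
WORDS = {
    "walk": {"D", "G"},   # walk, walked, walking
    "write": {"G"},       # write, writing
    "wrote": set(),       # irregular form listed as its own entry
}

# Each rule is (flag, text stripped from the stem, text appended).
RULES = [
    ("D", "", "ed"),
    ("G", "", "ing"),
    ("G", "e", "ing"),
]


def stems(word):
    """Return every dictionary stem that `word` can be reduced to."""
    found = []
    # A word that is itself a dictionary entry is its own stem.
    if word in WORDS:
        found.append(word)
    # Otherwise, undo each suffix rule and check the candidate stem.
    for flag, strip, add in RULES:
        if word.endswith(add):
            candidate = word[: len(word) - len(add)] + strip
            if flag in WORDS.get(candidate, set()) and candidate not in found:
                found.append(candidate)
    return found


print(stems("walked"))   # ['walk']
print(stems("writing"))  # ['write']
print(stems("wrote"))    # ['wrote']
```

In this toy model nothing links "wrote" to "write", because no affix rule turns one into the other. Recovering that link would need extra information beyond affix rules.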

redstoneleo (Author) commented on Oct 20, 2024

I tested with https://github.com/cdhigh/chunspell, and it worked as expected.

zverok (Owner) commented on Oct 20, 2024

Please show which Hunspell dictionaries you used with both libraries, and the code you tried.

redstoneleo (Author) commented on Oct 21, 2024

For chunspell, only the en_US dictionary is available by default (see https://github.com/cdhigh/chunspell?tab=readme-ov-file#dictionaries):

from hunspell import Hunspell

hunSpell = Hunspell()  # no arguments: uses the default dictionary
print(hunSpell.stem('wrote'))  # gives ('write', 'wrote')

With spylls, I used the code you gave in #19 (comment):

from spylls.hunspell import Dictionary
from spylls.hunspell.algo.capitalization import Type as CapType

# en_US dictionary is distributed with spylls
# See docs to load other dictionaries
dictionary = Dictionary.from_files('en_US')

for form in dictionary.lookuper.affix_forms('wrote', captype=CapType.NO):
    print(form.stem)  # only gives 'wrote'
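For convenience, the loop above could be wrapped in a small helper. This is only a sketch: `unique_stems` is a hypothetical name, not part of spylls, and it assumes nothing beyond the `lookuper.affix_forms(word, captype=...)` call and the `.stem` attribute already used in this thread. It drops repeated stems in case the same stem is produced more than once.

```python
def unique_stems(dictionary, word, captype):
    """Collect the distinct stems produced for `word`, in first-seen order.

    `dictionary` is expected to expose `lookuper.affix_forms(word, captype=...)`,
    as the spylls Dictionary object does in the snippet above.
    """
    stems = []
    for form in dictionary.lookuper.affix_forms(word, captype=captype):
        # Skip stems that were already collected.
        if form.stem not in stems:
            stems.append(form.stem)
    return stems
```

With the `dictionary` and `CapType` objects from the snippet above, it would be called as `unique_stems(dictionary, 'wrote', CapType.NO)`. It does not change which stems are found, so it would still return only 'wrote' with the bundled en_US dictionary.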
Labels: enhancement (New feature or request)

ask for the Stemming feature · Issue #20 · zverok/spylls