Ignore, or deprecate the confidence for very common words like "beta" #1

Open
dannguyen opened this issue Jul 26, 2020 · 3 comments
Labels
Non-Ideal Fix Implemented: More ideal fixes discussed within issue, but non-optimal solution has been implemented for now

Comments

@dannguyen

The bot seems to always report 100% confidence for "Beta", even though it's a common non-card word in thread titles, e.g. "Playing since beta, 480 hours logged, I have now beat the heart on A20 with all four characters, when is the 5th coming?" I'm not sure how confidence is calculated right now, and I assume it would be too much work to train it at this point (e.g. to measure confidence against how likely a card name is to be used in a non-card sense, by bulk downloading and analyzing r/slaythespire data), but putting "Beta" on the ignore list for now seems reasonable, especially since it seems very rare in practice that anyone will start a thread about the "Beta" card, given its actual mechanics.

("Collect" might be another common false positive, at least when it's uncapitalized.)
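
The corpus-based idea mentioned above could look roughly like the Python sketch below. This is only an illustration under stated assumptions: the function names, the hand-labeled sample, and the idea of multiplying the match confidence by a per-card rate are all hypothetical and are not taken from the bot's code.

```python
from collections import Counter

def card_sense_rates(labeled_mentions):
    """Estimate, per card name, how often a mention really refers to the card.

    labeled_mentions: iterable of (name, is_card_reference) pairs, e.g. from a
    hand-labeled sample of thread titles (hypothetical data source).
    """
    total = Counter()
    as_card = Counter()
    for name, is_card in labeled_mentions:
        key = name.lower()
        total[key] += 1
        if is_card:
            as_card[key] += 1
    return {key: as_card[key] / total[key] for key in total}

def adjusted_confidence(match_confidence, card_name, rates, default_rate=1.0):
    """Scale the string-match confidence by the estimated card-sense rate.

    Names with no labeled data fall back to default_rate (an assumption).
    """
    return match_confidence * rates.get(card_name.lower(), default_rate)

# Made-up labels, purely for illustration.
sample = [
    ("Beta", False), ("beta", False), ("Beta", False), ("Beta", True),
    ("Catalyst", True),
]
rates = card_sense_rates(sample)
print(adjusted_confidence(1.0, "Beta", rates))
```

With labels like these, a perfect string match on "Beta" would no longer come out at 100%, while names that are almost always used in the card sense would be left alone.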

@TrippW
Owner

TrippW commented Jul 30, 2020

At the moment, the bot's confidence is based solely on string similarity. Options include: looking at the surrounding words to determine whether "beta" is being used as a noun rather than an adjective (best); ignoring "beta" entirely (worst); or ignoring "beta" only when certain words appear near it, e.g. "art" or "since" (moderate).
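
For reference, the "moderate" option could be sketched along these lines in Python. Everything here is an assumption made for illustration: the function names, the window size, and the tokenizer are hypothetical, and only the context words "art" and "since" come from the comment above. It also handles single-word card names only.

```python
import re

# "art" and "since" are the examples from this thread; a real list would be longer.
CONTEXT_WORDS = {"art", "since"}

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def has_plausible_mention(text, card="beta", context_words=CONTEXT_WORDS, window=2):
    """Return True if at least one occurrence of `card` has none of the
    context words within `window` tokens on either side."""
    tokens = tokenize(text)
    for i, token in enumerate(tokens):
        if token != card:
            continue
        nearby = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if not context_words.intersection(nearby):
            return True
    return False

print(has_plausible_mention("Playing since beta, 480 hours logged"))
print(has_plausible_mention("Is Beta worth keeping in hand?"))
```

The first title is suppressed because "since" sits right next to "beta"; the second has no context word nearby, so the mention is kept.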

@TrippW
Owner

TrippW commented Aug 10, 2020

Added "Beta" to the ignore list for now. I'll come up with an improved, NLP-based solution later.

TrippW added the "Non-Ideal Fix Implemented" label on Aug 19, 2020
@TrippW
Owner

TrippW commented Aug 19, 2020

It's not perfect, but I've expanded the ignore system so that I can now choose specific phrases to ignore.

I've added phrases like "since beta", "the beta", "beta art", and "beta card art" to hopefully improve precision around the "Beta" card specifically. I'll keep updating the card.ignore file if precision drops too far. It's not ideal (a full NLP system would be), but it's really all I can do until I become more knowledgeable on the topic or someone else swoops in. I'm going to leave the issue open with the above label until I can get a better system in place.
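
For anyone reading along, phrase-level ignoring of this kind can be sketched as below. This is a minimal Python illustration, not the bot's actual implementation: the function names are hypothetical, the format of card.ignore is not shown in this thread, and the phrases are hard-coded here rather than read from that file.

```python
import re

# The four phrases quoted in the comment above.
IGNORE_PHRASES = ["since beta", "the beta", "beta art", "beta card art"]

def build_ignore_pattern(phrases):
    """Compile one case-insensitive regex that matches any ignored phrase."""
    # Longest first, so "beta card art" is preferred over shorter alternatives.
    ordered = sorted(phrases, key=len, reverse=True)
    alternatives = "|".join(
        r"\s+".join(re.escape(word) for word in phrase.split())
        for phrase in ordered
    )
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)

def mentions_card(text, card, pattern):
    """Blank out ignored phrases, then look for the card name in what is left."""
    remaining = pattern.sub(" ", text)
    card_pattern = r"\b" + re.escape(card) + r"\b"
    return re.search(card_pattern, remaining, re.IGNORECASE) is not None

pattern = build_ignore_pattern(IGNORE_PHRASES)
print(mentions_card("Playing since beta, 480 hours logged", "beta", pattern))
print(mentions_card("Is Beta worth keeping in hand?", "beta", pattern))
```

Blanking out the ignored phrases before matching means a title that contains both an ignored phrase and a separate, genuine mention of the card still triggers a match.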
