Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Outdoor legacy classifications and generic classification API #103

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

TomHall2020
Copy link
Contributor

@TomHall2020 TomHall2020 commented Dec 9, 2024

First steps towards implementing the pre-2023 AGB outdoor classifications into archery utils.
Generally I have started by just trying to do the same style of implementation as the other 5 modules in the classifications package. I went for an explict style of declaring the categories (bowstyle/gender/age) and requirements rather than a looped implict method as in the new style, as that fits better with how its done for the old indoor and field requirements, and also it was easier to construct from scratch and visualise what they are.

I've not dug up a definitieve reference to the old rules of shooting yet but from memory and a bit of pouring over some old classification tables I've worked on the following basis:

  • Minimum 12 dozen rounds at max distance for GMB/MB (and JMB? much less familiar with the old requirements for juniors)
  • Max distance required for Bowman
  • Step down 10m/y for each of 1st, 2nd and 3rd class.
  • No minimum arrows for bowman or below?

Checklist

  • Made required modules and wired in imports
  • Convert newer age categories to their equivalents.
  • Add checks for distance and 12 dozen rounds
  • Fill in placeholder handicap values
  • Add parametrized tests based on actual known scores

Plan is to get this part at least mostly working, and then get to work on the generic API for classifications, but we can begin the discussions on that while this is being done. There are also a few minor inconsistencies and tweaks to improve the existing classifications code that I'd like to work on while I've got the whole lot in the operating theatre so to speak...

  • multiple nested for loops over age/bowstyle/gender in new system data generation -> replace with itertools.product, will reduce indentation and make easier to read without any changes to the underlying code
  • agb_old_field returns "unclassified" for scores below lowest threshold, where all other modules return "UC"
  • mismatched specific long/short forms for age categories that are accepted 'adult', 'under 18' but '50+'? would make more sense to take over 50, but this also points to a deeper issue I'll get to later
  • exposed functions/objects old_agb_field_classification_scores and old_agb_field_classifications are inconsistent in anming convention with the rest of the package, suggest chaning instead to agb_old_field_classification_scores etc. (this really got me for a while where I was trying the same parameters in different function by just editing the function name)
  • importing classifications module as class_funcs in test suite when in source code the very similar name cls_funcs is used to import classification_utils - suggest changing this to cf similar to importing archeryutils.handicaps as hc
  • couple of documentation tweaks where the wrong references are made to indoors/outdoors

More generally beyond this it comes to looking at the generic api, which I think hope will be quite simple to implement along similar lines to how we did in #78, with generic functions along the lines of:

from archeryutils import classifications as cf
cf.calculate_classification(score, roundname, bowstyle, gender, age, hc_sys="AGB", discipline="indoor")

That then just dispatches to the appropriate module. Currently each module publicly exposes its calculate_xx_classification method and a xx_classification_scores method which is required to power the calculation, though the return type with its fill value of -9999 isn't super friendly to work with. Both of these are also moved up into the overall classifications package namespace along with the module itself and I would plan to leave them there for backwards compatibility.

Theres also a dictionary of the same name as the module which is essentially the compiled raw data needed by the functions, which we take great pains to make sure can never be recreated by deleting the private function that generatea it, something I can't quite work out a reason for. It's definitely bit me before earlier on when I first tried to work on the module and was trying to understand the code. I remember using some of the other private functions in the new outdoor classification module which mutate the underlying data in place and then being unable to regenerate the original data.

A side effect of that is that the namespace is really verbose when trying to get around in an interactive shell or do imports, especailly as tab completion will happily jump you into the modules quite easily where you have to repeat yourself over again to find the function you wanted. Even though they are promoted up to the package namespace I find myself accidentally doing this surprisingly often, and also getting confused by the module and then the data dictionary within that module having the same name.

from archeryutils.classifications.agb_out[tab-completed: door_classificaitons] import calculate_agb_outdoor_classification

I propose while adding the generic access function, we could simplify the names of the functions within each module, and just re-alias them to their current form when importing them up to the package. Such that each module would contain a make_classification_dict/data/requirements function, a calculate_classification and a classification_scores (or score_thresholds function. This would also make life a bit easier when doing the implementation of the generic functions, as we could just match the correct module with the user provided system and discipline, and then call that modules identically named functions to do the calculations.

At any rate thats a decent brain dump of where my thinking is.

Hooked up to imports, maintianing same api, built required dictionaries and functions. Placeholder data used for handicap thresholds. Checks for minimum distance/number of arrows not yet applied.
Copy link

codecov bot commented Dec 9, 2024

Codecov Report

Attention: Patch coverage is 63.01370% with 27 lines in your changes missing coverage. Please review.

Project coverage is 96.41%. Comparing base (99fe0a9) to head (a45254d).

Files with missing lines Patch % Lines
...classifications/agb_old_outdoor_classifications.py 55.73% 27 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #103      +/-   ##
==========================================
- Coverage   97.81%   96.41%   -1.40%     
==========================================
  Files          30       32       +2     
  Lines        1742     1815      +73     
==========================================
+ Hits         1704     1750      +46     
- Misses         38       65      +27     
Files with missing lines Coverage Δ
archeryutils/classifications/__init__.py 100.00% <100.00%> (ø)
...tils/classifications/tests/test_agb_old_outdoor.py 100.00% <100.00%> (ø)
archeryutils/rounds.py 100.00% <100.00%> (ø)
archeryutils/tests/test_rounds.py 100.00% <100.00%> (ø)
...classifications/agb_old_outdoor_classifications.py 55.73% <55.73%> (ø)

@TomHall2020
Copy link
Contributor Author

but this also points to a deeper issue I'll get to later

As promised, I think fundamentally the fact that everything going into these classification functions aside from a score is a string, is a serious issue.

Both in terms of what has to happen internally in the code to process these strings:

#Theres lots of string processing and mangling going on to clean up the arguments
/archeryutils/classifications/agb_old_field_classifications.py:
  178:     if age_group.lower().replace(" ", "") in ("adult", "50+", "under21"):
  179          age_group = "Adult"
  180:     elif re.compile("under(18|16|15|14|12)").match(age_group.lower().replace(" ", "")):
  181          age_group = "Under 18"

  190      if (
  191:         bowstyle.lower().replace(" ", "") in ("compound", "recurve")
  192          and "wa_field_24_red_" not in roundname
  193      ):
  194          return "unclassified"
  195      if (
  196:         bowstyle.lower().replace(" ", "")
  197          in (
  198              "barebow",

...

And also for the end user as they are trying to figure out how to use the functions

# you are utterly on your own when you are trying to find the right round name or category names...
In [6]: import archeryutils.classifications as cf

In [7]: cf.agb_outdoor_classification_scores('short warwick', 'recurve', 'womens', 'u18') #wrong round, wrong gender, wrong age class...
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[7], line 1
----> 1 cf.agb_outdoor_classification_scores('short warwick', 'recurve', 'womens', 'u18')

File ./archeryutils/classifications/agb_outdoor_classifications.py:543, in agb_outdoor_classification_scores(roundname, bowstyle, gender, age_group)
    540     bowstyle = "Compound"
    542 groupname = cls_funcs.get_groupname(bowstyle, gender, age_group)
--> 543 group_data = agb_outdoor_classifications[groupname]
    545 # Get scores required on this round for each classification
    546 class_scores = [
    547     hc.score_for_round(
    548         group_data["class_HC"][i],
   (...)
    553     for i in range(len(group_data["classes"]))
    554 ]

KeyError: 'u18_womens_recurve'

Finally you end up with the frustating situation that if you have already constructed a round object from scratch, or even loaded one from the pre-assigned rounds. You can't actually use it to work out classifications, even though the classification functions will use a fully instantiated round object to do thier own work.

from archeryutils import load_rounds
rnd = load_rounds.AGB_indoor.portsmouth
cf.agb_indoor_classification_scores(rnd, 'compound', 'male', 'under 14')

# TypeError: Unhasable type: 'Round'

It is, I think, a bit too far beyond the scope of what I want to get done in this request. But is something I'm keeping in mind as I work on it.

@jatkinson1000
Copy link
Owner

jatkinson1000 commented Dec 10, 2024

@TomHall2020 a lot to unpack, so will try and place comments here, but please remind me if I miss something.

As you already raise, this should remain a PR for reviving old outdoor classifications, with other amendments made through other pull requests please.

These could include:

Other points:

  • under 18 vs. 50+ was to match the WA/ArcheryGB terminology (and this was after a midway change from master)... Would need some thought on what is most sensible - there is an argument for sticking with the WA/AGB terminology that users would expect.
  • That said, it would be useful to have some way of easily checking what the available options are, as this has caught be before and I agree it is irritating digging back around to remind yourself what to use.
  • IIRC the reason for not using a Round object was because each classification scheme is restricted to a set of named rounds, and I enforce this by checking that the name appears in the dict of rounds instantiated in each classification module. I suppose one could pass in a Round and check that it's name appears in this list and I can perhaps see where this might come in useful, but could this not also be resolved by passing Round.name to the classification function?

For the linting issues you are seeing I have submitted a patch in #106 which you can rebase off of once merged.

@TomHall2020
Copy link
Contributor Author

TomHall2020 commented Dec 10, 2024

Use of itertools.product() Yep, not come across this before, so useful to learn about. Looks like it should do the job of flattening those loops.

In the spirit of seperating things out I'll do a quick PR for that once the other small ones are merged in, and then I can rebase the next steps for this on top of everything else. Should be a trivial change as its just an implementation detail, no changes to docs/testing required.

Moving to use cf rather than class_funcs. I think I am in agreement with this in principle. Is this only a documentation and notebook change?

And using the same convention where the package is imported into test files as well.

under 18 vs. 50+ was to match the WA/ArcheryGB terminology (and this was after a midway change from master)... Would need some thought on what is most sensible - there is an argument for sticking with the WA/AGB terminology that users would expect.

Huh, I hadn't really clocked that was the case but right you are. Example Ianseo results page. Having said that I don't know if most people will realise that there is that distinction, and at least at WA events it is also very common to see the abbreviated U21, U18 as well on schedules, target lists etc. for example Nimes.

That said, it would be useful to have some way of easily checking what the available options are, as this has caught be before and I agree it is irritating digging back around to remind yourself what to use.

At a minimum it needs to be documented, at the moment you can only find out from the souce code itself, and due to the way the data dictionaries are constructed even then the only place that actually contains all of the supported age category strings in a single file is the json data file.

Having said all that it seems a bit silly to put too much effort into deciding which nomenclature is the most correct because I think its clear that the best thing to do is to support common aliases at the point of use, so long as the input is unambigous. We aren't really here to catch users out.

I'm wondering if this is a use case for an Enum? That would be more self documenting and allows autocompletion. I'm fairly sure with the right use of StrEnum we could make it so that providing the old string values or valid aliases that we pre-define also still passes so it remains backwards compatible.

I'll open an issue on the topic of supporting aliases, and list some suggestions as to which ones could be included. Then any implementation details could follow on from that.

IIRC the reason for not using a Round object was because each classification scheme is restricted to a set of named rounds, and I enforce this by checking that the name appears in the dict of rounds instantiated in each classification module. I suppose one could pass in a Round and check that it's name appears in this list and I can perhaps see where this might come in useful, but could this not also be resolved by passing Round.name to the classification function?

Yeah, I did guess enforcing supported rounds per system was the reason. Passing round.name works fine, but on principle of least surprise it still seems frustrating to have to parse out my roundname from my perfectly functional round object just so it can be looked up again. I'd argue that given the fix of looking up the rounds name is so simple, why not just allow the functions to accept a round and we do that check as well? Could be abstracted to a utility function.

#inside xx_classification_scores()
rnd = get_check_round_by_name(roundname, ALL_XX_ROUNDS)
...

def get_check_round_by_name(rnd, accepted_rounds)
if isinstance(rnd, Round) and rnd.name in accepted_rounds
    return rnd
else:
    return accepted_rounds[rnd]

Naming feels a bit clunky but the principle is there.

@TomHall2020
Copy link
Contributor Author

For the linting issues you are seeing I have submitted a patch in #106 which you can rebase off of once merged.

D'oh, once again caught out by linter versioning. I actually hadn't updated my venv with the changes to main since my last contributions.

@TomHall2020
Copy link
Contributor Author

TomHall2020 commented Dec 10, 2024

On mentioning the linting, I'm pre-empting making the generic classification functions and related to the talk about string arugments to the underlying methods, we are going to get complaints from the linters about having too many function arguments when implementing those, and to be fair they do seem right. An example will be clearer:

# current example from agb_outdoors:
def calculate_agb_outdoor_classification(
    score: float,
    roundname: str,
    bowstyle: str,
    gender: str,
    age_group: str,
) -> str:
    ...

# to make generic we now also need to take 
# the handicap system (old or new) 
# and discipline/format (indoor/outdoor/field)
# as parameters

def calculate_classification(
    score: float,
    roundname: str,
    bowstyle: str,
    gender: str,
    age_group: str,
    handicap_system: str,
    discipline: str
) -> str:
    ...

# ERROR: PLR0913 Too many arguments in function definition (7 > 5)

I agree it is too many, they are nearly all strings so type safety is pretty useless, and would be pretty unwieldy to use:

>>> calculate_classification(1200, 'york', 'recurve', 'female', '50+', 'AGB', 'outdoor') 
# works
>>> calculate_classification(1200, 'york', 'female', 'recurve', '50+', 'AGB', 'outdoor')
# oops wrong order of args, mypy can't catch this

This ordering one especially bites me because while it matches the order the composite keys that get_groupname produces, it doesn't align with the order I would say it in my head which is probably one of either "recurve 50+ women" or "womens recurve 50+".

Accepting preformed group names such as"under15_male_recurve" is no good because:

  1. Makes it extra work for the consumer who now has to do thier own function call or else they will try to hack the group name together themselves
  2. We sometimes do business logic such as transforming the age categories or bowstyles before getting the group name

Perhaps one way to cheat the linter would be to take those as a (named)tuple rather than seperate arugments:

>>> calculate_classification(1200, ('recurve', 'female', '50+'), 'AGB', 'outdoor')
# smallest change from existing format, just need to enclose the category arguments in parentheses
>>> category = Category(gender='female', bowstyle='recurve', age='50+')
>>> calculate_classification(1200, category, 'AGB', 'outdoor')
# now we can also encapsulate the category information and pass arguements out of order by keyword

I was remembering how we handled the parsing of units as tuples in the target/pass/round functionality. Food for thought.

@jatkinson1000
Copy link
Owner

@TomHall2020 I have just added the handicap thresholds for the various categories.

I pushed them direct to this branch (couldn't get GitHub to play nice with opening a PR from my repo to your fork) so you'll want to pull to get these locally before any further changes to avoid conflicts.

More generally if you rebase off of the latest main on my repo that should hopefully resolve the conflicts and linting issues.

For tests there is a version of the old classification tables available here: https://southleedsarchers.org.uk/wp-content/uploads/2022/03/AGB-Shooting-Admin-Procedures-2020.pdf

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants