Annotations/sets_metadata #487

Draft
wants to merge 41 commits into
base: master

Contributor

@LoannPeurey LoannPeurey commented Dec 11, 2024

Allow annotation sets to carry additional metadata in a YAML file stored inside the annotation set folder.
The file is optional. It has a list of known entries but is not limited to them.
Functions that create new sets will generate a file containing the known metadata, inherited from the sets the new set is merged or derived from.
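
To illustrate the "known entries, but not limited to those" behaviour, here is a minimal sketch that splits an already parsed metadata mapping into known and free-form entries. The names `KNOWN_ENTRIES` and `split_metadata`, as well as the example keys, are hypothetical and are not taken from the PR; the actual list of known entries is defined by the implementation.

```python
# Hypothetical list of known entries; the real list lives in the PR code.
KNOWN_ENTRIES = {"method", "annotator", "date_annotation"}


def split_metadata(raw):
    """Separate known entries from free-form extras; nothing is dropped.

    `raw` is the mapping parsed from the set's YAML file, or None when
    the optional file is absent or empty.
    """
    raw = raw or {}  # a missing or empty file simply yields no metadata
    known = {k: v for k, v in raw.items() if k in KNOWN_ENTRIES}
    extra = {k: v for k, v in raw.items() if k not in KNOWN_ENTRIES}
    return known, extra


known, extra = split_metadata({"method": "manual", "lab_note": "second pass"})
```

Keeping the extras in a separate mapping is only one possible design; they could equally be stored alongside the known entries as additional dataframe columns.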

  • Have an attribute in AnnotationManager that stores the metadata in a dataframe, read from the YAML file in each set
  • Add a function to get the metadata of sets (the content of the dataframe plus values calculated from other sources, such as total duration annotated), and a CLI command to get this metadata in different formats
  • Extend the overview command to report all of this as well. The overview should be available in an exportable format (both CLI and internal function) and have a readable CLI representation when requested
  • functions to adapt:
    • derive_annotations: make sure it generates a metadata YAML file that inherits from the source set (Q: should this file be created even if one already exists? Some info could have been added manually, and rerunning would erase it)
    • merge_sets: create a file inherited from the two sets? Which set has priority when both define a value? Some hard-coded lines could check precisely which columns are kept and what info they contain (a bit forceful but should be fine). EDIT: there is an option to keep metadata from the left set, the right set, or neither; content metadata is determined by column names
    • metrics: offer the option to include metadata from the set category in the final dataframe
    • conversations_summary: same as metrics, offer the option to include metadata from the set category in the final dataframe
    • rename_set: make sure metannots.yml is carried over with the rename
  • tests:
    • tests for basic functionality and CLI
    • tests for importation, validation, derivation and merging, with and without the metadata present
    • tests for overview pipelines
    • tests for metrics and conversations, with and without inclusion of set metadata
  • documentation:
    • doc for the YAML file (must be covered in the general presentation and also in the importation section); metannots.yml should be UTF-8 encoded
    • doc for CLI call
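
The merge and derive policies discussed in the list above could be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: the function names, the `keep` parameter and its values are hypothetical, and the derive policy shown (existing entries win over inherited ones) is only one possible answer to the open question about overwriting manually added info.

```python
def merged_metadata(left, right, keep="left"):
    """Pick which set's metadata the merged set receives.

    `keep` is a hypothetical parameter: "left", "right" or "none".
    """
    if keep not in ("left", "right", "none"):
        raise ValueError(f"keep must be 'left', 'right' or 'none', got {keep!r}")
    if keep == "none":
        return {}
    # Copy so that the source set's mapping is never mutated.
    return dict(left if keep == "left" else right)


def inherited_metadata(source, existing=None):
    """Inherit from the source set without erasing manual edits.

    Entries already present in the derived set's file take priority,
    so rerunning a derivation does not overwrite them.
    """
    result = dict(source)
    result.update(existing or {})
    return result


left = {"method": "manual", "annotator": "A"}
right = {"method": "automated"}
from_right = merged_metadata(left, right, keep="right")
derived = inherited_metadata(left, existing={"annotator": "B"})
```

With this policy, a rerun of the derivation keeps `annotator: B` from the existing file while still picking up `method` from the source set.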

other changes bundled here:

  • use pathlib instead of os.path where it makes sense
  • make usage of segments vs annotations more consistent
  • fix the init command line when used with the force option
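
As an example of the os.path to pathlib change, the path to a set's metadata file could be built in either style as below. The `annotations/<set>/` layout and the helper names are assumptions made for illustration; only the file name metannots.yml comes from this description.

```python
import os.path
from pathlib import Path

METADATA_FILENAME = "metannots.yml"  # file name from the PR description


def metadata_path_os(project_root, set_name):
    """os.path style: string concatenation through os.path.join."""
    # Assumed layout: <project>/annotations/<set>/metannots.yml
    return os.path.join(project_root, "annotations", set_name, METADATA_FILENAME)


def metadata_path(project_root, set_name):
    """pathlib style: the same path as a Path object."""
    return Path(project_root) / "annotations" / set_name / METADATA_FILENAME


path = metadata_path("project", "vtc")
# The file is optional, so callers would check path.exists() before reading.
```

Both helpers produce the same location; the Path version additionally gives access to attributes such as `.name` and `.parent` and methods such as `.exists()` without further os.path calls.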

@LoannPeurey LoannPeurey self-assigned this Dec 11, 2024

1 participant