[QA] Score tests improvements #935
base: main
Conversation
for more information, see https://pre-commit.ci
…y/vizro into score_tests_improvements
I like these enhancements, Alexey. There are a few comments, but other than that it's all good.
Wow I really like this whole Score tests logic you created! 💯 I think it's ready to be checked while we touch the vizro-ai code, and can help track the performance more easily.
I just have some minor questions and suggestions. Overall it's really cool!
…ests_improvements
for more information, see https://pre-commit.ci
…y/vizro into score_tests_improvements
Huge thanks for all of your comments!
# temporary for development
pull_request:
  branches:
    - main
Don't forget to remove this before merging.
Great improvement! 🎉
Did a light review, looks good! I think this is going in the right direction! Left a few small comments. ⭐ 💯
In general I think my main comments are:
- Make adding tests as easy as possible (ideally just another prompt + expectation pair, nothing else)
- Make adding and removing models etc as easy as possible (ideally just some test parametrization or so)
- Make the complexity of the dashboard a column in the report, and allow individual names for newly added tests (so that in the future we can have something like easy_1, easy_2, easy_abc, etc.)
Other than that, I think it's exciting!
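One hypothetical way the suggestions above could look in code (all names, tiers, and prompts below are invented for illustration, not taken from this PR): a registry where each score test is just a prompt + expectation pair tagged with a complexity tier and an individual name, so the tier can be emitted as a plain column in the report.

```python
# Hypothetical sketch: a registry of score-test cases. Adding a new test
# means appending one entry here, nothing else.
from dataclasses import dataclass

@dataclass(frozen=True)
class ScoreCase:
    name: str            # individual test name, e.g. "easy_1" or "easy_abc"
    tier: str            # "easy" | "medium" | "complex"
    prompt: str          # the prompt sent to the dashboard generator
    expected_pages: int  # a minimal expectation for the generated dashboard

CASES = [
    ScoreCase("easy_1", "easy", "Create a one-page dashboard with a table.", 1),
    ScoreCase("easy_2", "easy", "One page with a scatter chart of tips.", 1),
    ScoreCase("medium_1", "medium", "Two pages: a chart page and a table page.", 2),
]

# Report rows then carry the complexity tier as an ordinary column:
report_rows = [{"test": c.name, "complexity": c.tier} for c in CASES]
```

This keeps the tier as data rather than as part of the test function name, which is what makes per-test names like easy_1 and easy_2 cheap to add.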
@@ -51,7 +51,7 @@ prep-release = [
pypath = "hatch run python -c 'import sys; print(sys.executable)'"
test = "pytest tests {args}"
test-integration = "pytest -vs --reruns 1 tests/integration --headless {args}"
-test-score = "pytest -vs --reruns 1 tests/score --headless {args}"
+test-score = "pytest -vs tests/score --headless {args}"
Do you think it would make sense to add a comment above on where to enter the API key? When I run the above command, all tests fail, but only because the API key does not work.
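For example, something like this could work (assuming the key is read from an environment variable; OPENAI_API_KEY is an assumed name here and the actual variable depends on the provider configuration):

```shell
# Export the LLM API key before running the score tests.
# OPENAI_API_KEY is an assumed variable name -- adjust to your setup.
export OPENAI_API_KEY="sk-your-key-here"
# Then run the score tests:
# hatch run test-score
```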
vizro-ai/tests/score/prompts.py
complex_prompt = """
<Page 1>
Show me 1 table on the first page that shows tips and sorted by day
Using export button I want to export data to csv
Is that even possible @lingyielia? I am not sure the JSON schema for the button actually includes possible custom actions?
-["gpt-4o-mini"],
-ids=["gpt-4o-mini"],
+[
+    "gpt-4o-mini",
What about gpt-4o (not mini)?
@pytest.mark.medium_dashboard
@pytest.mark.parametrize("model_name", ["gpt-4o-mini"], ids=["gpt-4o-mini"])
def test_medium_dashboard(dash_duo, model_name):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need individual tests for easy, medium and complex dashboards? Should this not be another parameter to a single test?
In general, as I have mentioned before, I think it would be good not to have three specific dashboards, but rather any number of dashboards that belong to a tier (three tiers are fine). In the future we could then easily add more simply by adding a new pair of prompt + expectation; that should be the aim.
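A minimal sketch of that idea, assuming pytest parametrization (the case names, prompts, and the test body below are invented for illustration; they are not the PR's actual implementation):

```python
import pytest

# One test, parametrized over models and over tier-tagged cases, instead of
# separate test_easy / test_medium / test_complex functions.
MODELS = ["gpt-4o-mini"]  # add or remove a model by editing this list only

CASES = {  # case id -> (tier, prompt); new test = new dict entry
    "easy_1": ("easy", "Show one table sorted by day."),
    "medium_1": ("medium", "Two pages with a chart and a table."),
}

@pytest.mark.parametrize("model_name", MODELS)
@pytest.mark.parametrize("case_id", list(CASES), ids=list(CASES))
def test_dashboard_score(model_name, case_id):
    tier, prompt = CASES[case_id]
    # ...here the real test would generate a dashboard from `prompt` with
    # `model_name`, score it, and record `tier` as a report column...
    assert tier in {"easy", "medium", "complex"}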
Done
Description
numpy lib
Reference to potential complexity prompts improvements -> #935 (comment)
Notice
I acknowledge and agree that, by checking this box and clicking "Submit Pull Request":