
[QA] Score tests improvements #935

Open · l0uden wants to merge 20 commits into main
Conversation

@l0uden (Contributor) commented Dec 24, 2024

Description

  • rewrote the score calculation with the numpy library (see the sketch below)
  • added the prompt text to the report
  • added an Anthropic provider for easy dashboard creation (it failed to build the medium and complex dashboards)
  • added a complex prompt so that the score comes out below 1.0

Reference to potential complexity prompts improvements -> #935 (comment)
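
For illustration, a minimal sketch of what a numpy-based weighted score calculation could look like; calculate_score and its arguments are hypothetical names, not the actual test code from this PR.

import numpy as np

def calculate_score(component_scores, weights):
    """Return a weighted average of per-component scores in [0.0, 1.0]."""
    scores = np.asarray(component_scores, dtype=float)
    w = np.asarray(weights, dtype=float)
    # np.average computes the weighted mean in one vectorized call,
    # replacing a manual Python loop over components.
    return float(np.average(scores, weights=w))

# Example: scores for components, layout and controls, equally weighted.
print(calculate_score([1.0, 0.5, 1.0], [1.0, 1.0, 1.0]))  # ~0.833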

Notice

  • I acknowledge and agree that, by checking this box and clicking "Submit Pull Request":

    • I submit this contribution under the Apache 2.0 license and represent that I am entitled to do so on behalf of myself, my employer, or relevant third parties, as applicable.
    • I certify that (a) this contribution is my original creation and / or (b) to the extent it is not my original creation, I am authorized to submit this contribution on behalf of the original creator(s) or their licensees.
    • I certify that the use of this contribution as authorized by the Apache 2.0 license does not violate the intellectual property rights of anyone else.
    • I have not referenced individuals, products or companies in any commits, directly or indirectly.
    • I have not added data or restricted code in any commits, directly or indirectly.

@github-actions bot added the Vizro-AI 🤖 (Issue/PR that addresses Vizro-AI package) label on Dec 24, 2024
@petar-qb (Contributor) left a comment

I like these enhancements, Alexey. There are a few comments, but other than that it's all good.

Review threads on vizro-ai/tests/score/prompts.py (4, of which 3 outdated) and vizro-ai/tests/score/test_dashboard.py (4, of which 3 outdated).
@lingyielia (Contributor) left a comment

Wow, I really like this whole Score tests logic you created! 💯 I think it's ready to run as a check whenever we touch the vizro-ai code, and it can help track performance more easily.

I just have some minor questions and suggestions. Overall it's really cool!

Review threads on .github/workflows/test-score-vizro-ai.yml (2, of which 1 outdated) and vizro-ai/tests/score/prompts.py (1).
@l0uden (Contributor, Author) commented Jan 16, 2025

Huge thanks for all of your comments!
It is ready for review again.

Comment on lines +11 to +14
#temporary for development
pull_request:
branches:
- main

Don't forget to remove this before merging.

Further review threads on vizro-ai/tests/score/prompts.py (1) and vizro-ai/tests/score/test_dashboard.py (3, all outdated).
@lingyielia lingyielia self-requested a review January 17, 2025 18:07
@lingyielia (Contributor) left a comment

Great improvement! 🎉

@maxschulz-COL (Contributor) left a comment

Did a light review, looks good! I think this is going in the right direction! Left a few small comments. ⭐ 💯

In general, I think my main comments are:

  1. Make adding tests as easy as possible (ideally just another prompt + expectation pair, nothing else).
  2. Make adding and removing models as easy as possible (ideally just some test parametrization or similar); see the sketch after this list.
  3. Make the complexity of the dashboard a column in the report, and allow individual names for newly added tests (so that in the future we can have something like easy_1, easy_2, easy_abc, etc.).
Other than that, I think it's exciting!
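
As an illustration of points 1 and 2, a hypothetical pytest parametrization where both the model list and the prompt/expectation pairs are plain data; MODELS, DASHBOARD_CASES and run_dashboard_test are made-up names, not the PR's actual code.

import pytest

MODELS = ["gpt-4o-mini", "gpt-4o"]

# Adding a test means appending one (name, prompt, minimum score) entry here.
DASHBOARD_CASES = [
    ("easy_1", "One page with a bar chart of tips by day.", 1.0),
    ("medium_1", "Two pages with a filter and a sorted table.", 0.8),
]

def run_dashboard_test(model_name, prompt):
    # Stub standing in for the real build-and-score logic.
    return 1.0

@pytest.mark.parametrize("model_name", MODELS)
@pytest.mark.parametrize(
    "case_name,prompt,min_score", DASHBOARD_CASES, ids=[c[0] for c in DASHBOARD_CASES]
)
def test_dashboard(model_name, case_name, prompt, min_score):
    assert run_dashboard_test(model_name, prompt) >= min_score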

@@ -51,7 +51,7 @@ prep-release = [
 pypath = "hatch run python -c 'import sys; print(sys.executable)'"
 test = "pytest tests {args}"
 test-integration = "pytest -vs --reruns 1 tests/integration --headless {args}"
-test-score = "pytest -vs --reruns 1 tests/score --headless {args}"
+test-score = "pytest -vs tests/score --headless {args}"

Do you think it would make sense to add a comment above about where to enter the API key? When I run the above command, all tests fail, but only because the API key does not work. A possible sketch follows below.
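
One hypothetical way to handle this, as a conftest.py snippet: skip the score tests with a clear message when no API key is set, so the failure mode points at configuration rather than the tests. The OPENAI_API_KEY variable name is an assumption, not confirmed by the PR.

import os

import pytest

@pytest.fixture(autouse=True)
def require_api_key():
    # Skip rather than fail when the key is missing.
    if not os.environ.get("OPENAI_API_KEY"):
        pytest.skip("Set OPENAI_API_KEY before running `hatch run test-score`.")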

complex_prompt = """
<Page 1>
Show me 1 table on the first page that shows tips and sorted by day
Using export button I want to export data to csv

Is that even possible, @lingyielia? I'm not sure the JSON schema for the button actually includes custom actions.

["gpt-4o-mini"],
ids=["gpt-4o-mini"],
[
"gpt-4o-mini",

What about gpt-4o (not mini)?


@pytest.mark.medium_dashboard
@pytest.mark.parametrize("model_name", ["gpt-4o-mini"], ids=["gpt-4o-mini"])
def test_medium_dashboard(dash_duo, model_name):

Why do we need individual tests for easy, medium and complex dashboards? Should this not be another parameter to a single test?

In general, as I have mentioned before, I think it would be good not to have three specific dashboards, but rather any number of dashboards that belong to a tier (three tiers are fine). In the future we could then easily add more by simply adding a new prompt + expectation pair; that should be the aim. A sketch of what that could look like follows below.
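
For illustration, a hypothetical single test where the tier is just another parameter (and could be written into the report as its own column); TIERED_CASES and build_and_score are made-up names, not the PR's actual code.

import pytest

TIERED_CASES = [
    ("easy", "easy_1", "One page with a scatter chart of tips."),
    ("medium", "medium_1", "Two pages with a filter and a table."),
    ("complex", "complex_1", "Three pages with filters, tabs and CSV export."),
]

def build_and_score(prompt):
    # Stub standing in for the real dashboard build + scoring.
    return 1.0

@pytest.mark.parametrize(
    "complexity,case_id,prompt", TIERED_CASES, ids=[c[1] for c in TIERED_CASES]
)
def test_dashboard(complexity, case_id, prompt):
    score = build_and_score(prompt)
    # "complexity" can go straight into the report row as a column.
    assert 0.0 <= score <= 1.0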

@l0uden (Contributor, Author) replied:

Done

Review thread on vizro-ai/tests/score/prompts.py (1).
Labels
Vizro-AI 🤖 Issue/PR that addresses Vizro-AI package
4 participants