Fix report output formatting #687

Merged
merged 4 commits into from
Apr 28, 2023

@elisno (Member) commented Apr 28, 2023

This PR refines the report output of Datalab, primarily by adjusting the verbosity level required to display the threshold used in near-duplicate checks.
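
To illustrate what gating a detail behind a verbosity level means, here is a minimal sketch. It is not the Datalab implementation: the function name `near_duplicate_report_lines`, the `min_verbosity` parameter, and its default value are assumptions made up for this example.

```python
from typing import List


def near_duplicate_report_lines(
    threshold: float, verbosity: int, min_verbosity: int = 2
) -> List[str]:
    """Hypothetical helper: build report lines for the near-duplicate check.

    The threshold line is only included when `verbosity` reaches
    `min_verbosity` (an assumed cutoff, not the value used in cleanlab).
    """
    lines = ["issue_type: near_duplicate"]
    if verbosity >= min_verbosity:
        lines.append(f"threshold: {threshold}")
    return lines


# At low verbosity the threshold is hidden; at higher verbosity it is shown.
print(near_duplicate_report_lines(threshold=0.13, verbosity=0))
print(near_duplicate_report_lines(threshold=0.13, verbosity=2))
```

Raising or lowering the cutoff changes only when the detail appears, not the check itself.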

Minor change

lab.report() gives us extra info on the dataset passed to Datalab.

Changed output

Here is a summary of the different kinds of issues found in the data:

    issue_type    score  num_issues
         label 0.909091          11
       outlier 0.522080           6
near_duplicate 0.246459           4

(Note: A lower score indicates a more severe issue across all examples in the dataset.)
+
+ Dataset Information: num_examples: 132, num_classes: 3

...
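
For illustration, the added line can be reproduced with a small formatter. This is a sketch under assumptions, not the code in cleanlab/datalab/report.py: `format_dataset_information` and `append_dataset_information` are hypothetical names, and the dictionary keys are taken from the output above.

```python
from typing import Dict


def format_dataset_information(info: Dict[str, object]) -> str:
    """Render key/value pairs as a single 'Dataset Information' line."""
    details = ", ".join(f"{key}: {value}" for key, value in info.items())
    return f"Dataset Information: {details}"


def append_dataset_information(report: str, info: Dict[str, object]) -> str:
    """Return the report text followed by a blank line and the dataset line."""
    return f"{report}\n\n{format_dataset_information(info)}"


print(format_dataset_information({"num_examples": 132, "num_classes": 3}))
# Dataset Information: num_examples: 132, num_classes: 3
```

Keeping the formatter separate from the report body makes the extra line easy to test on its own.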

codecov bot commented Apr 28, 2023

Codecov Report

Merging #687 (78a1307) into master (aa2423b) will decrease coverage by 0.03%.
The diff coverage is 83.33%.

❗ Current head 78a1307 differs from pull request most recent head 652979b. Consider uploading reports for the commit 652979b to get more accurate results

@@            Coverage Diff             @@
##           master     #687      +/-   ##
==========================================
- Coverage   95.82%   95.80%   -0.03%     
==========================================
  Files          46       46              
  Lines        3639     3645       +6     
  Branches      648      649       +1     
==========================================
+ Hits         3487     3492       +5     
  Misses         79       79              
- Partials       73       74       +1     

Impacted Files                                    Coverage Δ
cleanlab/datalab/issue_manager/duplicate.py       96.59% <ø> (ø)
cleanlab/datalab/issue_manager/issue_manager.py   90.65% <ø> (ø)
cleanlab/datalab/report.py                        96.29% <83.33%> (-3.71%) ⬇️

... and 1 file with indirect coverage changes

@huiwengoh (Contributor) left a comment

Nice addition - LGTM!
