[#860] Adding Spurious Correlation feature #1140

Merged 10 commits on Jun 27, 2024

Conversation

allincowell
Contributor

@allincowell allincowell commented Jun 12, 2024

Summary

🎯 Purpose: Adding Spurious Correlation feature for Image datasets.

📜 Example Usage: Computes a correlation score between an image property (such as dark score, blurry score, information score, size, or aspect ratio) and the class labels. The score is derived from metrics such as baseline accuracy and held-out accuracy, obtained by fitting a univariate model on that property.

  1. Added the spurious_correlation.py module under cleanlab/datalab/internal.
  2. Added a private instance method, _spurious_correlation, to the Datalab class; it uses an instance of the SpuriousCorrelations class.
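
A minimal sketch of the idea in the Example Usage description, not the code added in this PR: fit a univariate model on a single image property, measure its held-out accuracy, and compare that with the accuracy of always predicting the most frequent class. The function name `univariate_property_score`, the nearest-class-mean model, the interleaved folds, and the final normalization are all assumptions chosen to keep the example dependency-free.

```python
from collections import Counter, defaultdict


def univariate_property_score(property_values, labels, n_folds=5):
    """Hypothetical helper returning a score in [0, 1].

    Lower values mean the single property predicts the labels better than
    the majority-class baseline, which hints at a spurious correlation.
    """
    n = len(labels)
    # Baseline accuracy: always predict the most frequent class.
    baseline_accuracy = Counter(labels).most_common(1)[0][1] / n

    correct = 0
    for fold in range(n_folds):
        # Interleaved folds: example i is held out when i % n_folds == fold.
        sums, counts = defaultdict(float), defaultdict(int)
        for i, (x, y) in enumerate(zip(property_values, labels)):
            if i % n_folds != fold:
                sums[y] += x
                counts[y] += 1
        means = {y: sums[y] / counts[y] for y in counts}
        # Univariate model: predict the class whose training mean is nearest.
        for i, (x, y) in enumerate(zip(property_values, labels)):
            if i % n_folds == fold:
                predicted = min(means, key=lambda c: abs(x - means[c]))
                correct += predicted == y
    held_out_accuracy = correct / n

    # Assumed normalization: share of the possible gain over the baseline
    # that the univariate model did NOT achieve, clipped to [0, 1].
    if baseline_accuracy >= 1.0:
        return 1.0
    gain = (held_out_accuracy - baseline_accuracy) / (1.0 - baseline_accuracy)
    return 1.0 - min(max(gain, 0.0), 1.0)


# Toy data in which the property tracks the label perfectly.
labels = [0, 1] * 50
dark_scores = [0.1 if y == 0 else 0.9 for y in labels]
print(univariate_property_score(dark_scores, labels))  # 0.0
```

Under this assumed normalization, a property that separates the classes perfectly scores 0.0, while a property that carries no label information scores 1.0.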

Links to Relevant Issues or Conversations

Issue Link: #860
Early PR attempted: #872

@jwmueller jwmueller requested a review from elisno June 13, 2024 22:07
@@ -635,3 +636,64 @@ def load(path: str, data: Optional[Dataset] = None) -> "Datalab":
        load_message = f"Datalab loaded from folder: {path}"
        print(load_message)
        return datalab

    def _spurious_correlation(
        self, properties_of_interest: Optional[List[str]] = None
Member

For now, let's omit this argument in Datalab._spurious_correlation().

Remember to remove the parameter in the docstring as well.

        odd_aspect_ratio_score    0.900000
        """
        try:
            issues = self.get_issues()
Member

Please add a validation step that ensures the issues dataframe has all the relevant (image-specific) scores.
If it doesn't, an error with a helpful message should be raised.

Contributor Author

Added a validation step here to check that all vision/image issues are present in the correlations dataframe.

Member

To be clear, the issues dataframe should be validated, not the correlations_df.
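
A rough sketch of the kind of validation requested in this thread, checking the issues dataframe's columns rather than the correlations dataframe. This is not the code that was merged: the helper name `validate_image_score_columns` and the constant `REQUIRED_IMAGE_SCORE_COLUMNS` are hypothetical, and apart from `odd_aspect_ratio_score`, which appears in the docstring above, the column names are assumed.

```python
# Assumed set of image-specific score columns, for illustration only.
REQUIRED_IMAGE_SCORE_COLUMNS = (
    "dark_score",
    "blurry_score",
    "odd_aspect_ratio_score",
)


def validate_image_score_columns(columns, required=REQUIRED_IMAGE_SCORE_COLUMNS):
    """Raise a ValueError naming every required score column that is absent."""
    present = set(columns)
    missing = [name for name in required if name not in present]
    if missing:
        raise ValueError(
            "The issues dataframe is missing image-specific score column(s): "
            f"{', '.join(missing)}. Run the image issue checks before "
            "computing spurious correlations."
        )


# Passes silently when every required column is present.
validate_image_score_columns(
    ["dark_score", "blurry_score", "odd_aspect_ratio_score", "is_dark_issue"]
)
```

The helper accepts any iterable of column names, so it could be called with `issues.columns` right after `issues = self.get_issues()`. Listing every missing column in a single error saves the user from fixing them one at a time.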

cleanlab/datalab/datalab.py (outdated, resolved)

codecov bot commented Jun 15, 2024

Codecov Report

Attention: Patch coverage is 94.23077% with 3 lines in your changes missing coverage. Please review.

Project coverage is 95.82%. Comparing base (ffdbe77) to head (7a1b0dc).
Report is 1 commit behind head on master.

Files                                               Patch %   Lines
cleanlab/datalab/internal/spurious_correlation.py   92.85%    1 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1140      +/-   ##
==========================================
- Coverage   95.90%   95.82%   -0.08%     
==========================================
  Files          81       82       +1     
  Lines        6050     6102      +52     
  Branches      996     1071      +75     
==========================================
+ Hits         5802     5847      +45     
- Misses        148      151       +3     
- Partials      100      104       +4     
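
The project coverage figures in the table above can be reproduced from its Hits and Lines rows. The snippet below is only an arithmetic check on the numbers shown; it assumes project coverage is hits divided by total lines, and the helper name `coverage_percent` is made up for the example.

```python
def coverage_percent(hits, lines):
    """Coverage as a percentage: covered lines divided by tracked lines."""
    return 100.0 * hits / lines


# Hits, misses, and partials add up to the line totals reported in the table.
assert 5802 + 148 + 100 == 6050  # master
assert 5847 + 151 + 104 == 6102  # this PR

base = coverage_percent(5802, 6050)
head = coverage_percent(5847, 6102)
print(f"{base:.2f}% -> {head:.2f}% ({head - base:+.2f}%)")
# 95.90% -> 95.82% (-0.08%)
```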


@allincowell allincowell requested a review from elisno June 18, 2024 19:46
@allincowell allincowell force-pushed the spurious_correlation branch from 193543f to 2b01057 Compare June 24, 2024 23:20
2 participants