
[BUG] [timeseries] Some forecasting models fail during fit if an S3 path is used #4212

Open
shchur opened this issue May 21, 2024 · 2 comments
Labels: bug (Something isn't working), module: timeseries (related to the timeseries module)

@shchur (Collaborator) commented May 21, 2024

Bug Report Checklist

- [x] I provided code that demonstrates a minimal reproducible example.
- [x] I confirmed the bug exists on the latest mainline of AutoGluon via a source install.
- [x] I confirmed the bug exists on the latest stable version of AutoGluon.

Describe the bug
Setting TimeSeriesPredictor(path="s3://my-bucket/my-predictor") leads to some forecasting models failing during training.

Related issue: awslabs/gluonts#3171

Expected behavior
All models train successfully, same as if a local path was used.

To Reproduce

```python
from autogluon.timeseries import TimeSeriesPredictor

pred = TimeSeriesPredictor(path="s3://my-bucket/my-predictor").fit(
    "https://autogluon.s3.amazonaws.com/datasets/timeseries/m4_hourly_tiny/train.csv"
)
```
shchur added the bug and module: timeseries labels and removed the bug: unconfirmed and Needs Triage labels on May 21, 2024
shchur added this to the 1.2 Release milestone on May 21, 2024
shchur self-assigned this on May 21, 2024
@Innixma (Contributor) commented May 21, 2024

Open question: Should we support S3 paths?

In early versions of AutoGluon, I supported S3 paths for TabularPredictor. I eventually found that once I mixed in Ray and started using optimized model-saving techniques, it became hard to ensure the code worked properly in both scenarios.

Because of this difficulty, I eventually stopped supporting S3 paths for predictor artifacts.

One workaround could be to use a local directory for all training and prediction, and then upload the local files to the S3 location after fitting is complete. Similarly, the artifact could be downloaded when .load is called. I haven't implemented this approach, but it could work; a rough sketch is below.
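
A minimal sketch of that approach, assuming boto3 is available (both helper names are hypothetical, not part of the AutoGluon API):

```python
import os
import tempfile

import boto3
from autogluon.timeseries import TimeSeriesPredictor


def fit_predictor_to_s3(train_data, bucket: str, prefix: str, **fit_kwargs):
    """Fit using a local directory, then copy the artifacts to s3://bucket/prefix.

    Hypothetical helper, not part of the AutoGluon API.
    """
    local_dir = tempfile.mkdtemp()
    predictor = TimeSeriesPredictor(path=local_dir).fit(train_data, **fit_kwargs)
    s3 = boto3.client("s3")
    # Walk the local predictor directory and mirror it under the S3 prefix.
    for root, _, files in os.walk(local_dir):
        for fname in files:
            local_path = os.path.join(root, fname)
            key = f"{prefix}/{os.path.relpath(local_path, local_dir)}"
            s3.upload_file(local_path, bucket, key)
    return predictor


def load_predictor_from_s3(bucket: str, prefix: str) -> TimeSeriesPredictor:
    """Download the artifacts from s3://bucket/prefix and load them locally."""
    local_dir = tempfile.mkdtemp()
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            dest = os.path.join(local_dir, os.path.relpath(obj["Key"], prefix))
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            s3.download_file(bucket, obj["Key"], dest)
    return TimeSeriesPredictor.load(local_dir)
```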

If you have found a way to make S3 paths work with relative ease, I'd be interested in knowing how you did it.

@shchur (Collaborator, Author) commented May 22, 2024

@Innixma I assumed that TabularPredictor already supported S3 paths, since it worked fine with the medium_quality presets, but now I understand that this might not be the case for high_quality / best_quality, where Ray is used.

I will check how much work it would take to make S3 paths work in all scenarios. At the very least, we should raise an informative error message explaining that only local paths are supported. Currently, the TimeSeriesPredictor just hits model failures in the middle of the training process, which is a bad customer experience. A sketch of such a check is below.
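
For illustration, a minimal sketch of such an early check, assuming it would run when the predictor is constructed (the function name is hypothetical):

```python
def validate_predictor_path(path: str) -> str:
    """Fail fast on remote URIs instead of failing mid-training.

    Hypothetical check; AutoGluon does not currently perform this validation.
    """
    remote_prefixes = ("s3://", "gs://", "http://", "https://")
    if path.startswith(remote_prefixes):
        raise ValueError(
            f"Remote paths such as {path!r} are not supported for predictor artifacts; "
            "please use a local directory and upload the fitted predictor separately."
        )
    return path
```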

shchur modified the milestones: 1.2 Release → 1.3 Release on Nov 26, 2024