-
Notifications
You must be signed in to change notification settings - Fork 909
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat/static covs #966
Feat/static covs #966
Conversation
…names Please enter the commit message for your changes. Lines starting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've only reviewed TimeSeries
so far - will cover the rest later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks really great 💯💯. A few comments:
- Let's keep
TimeSeries
immutable and check static covariates format (e.g. DataFrame of right shape and maybe right dtype upon construction ofTimeSeries
). - On dimensionality of static covariates: I think what you have now is good and we can leave it like that. Maybe add some good documentation on any method to access those, so that it's clear what the dimensions are.
- You need add support for concatenation/stacking (hopefully quite easy) and maybe grep all parts of the code that calls
values()
orall_values()
to make sure static covariates are propagated where needed. - I don't think we need to scale the static covariates - at least not for now. But let's mention this in the scaler docstring.
- I think you can already pass on the static covariates tensors to all models and leave it up to the models (currently all of them but TFT) to ignore them.
- Categorical support: as discussed, let's be flexible on the types allowed in static covariates and not limit ourselves to numeric types. We can then throw a runtime error if a string is used in e.g. a torch model, and leave it up to the models (or datasets) to do these checks and use the covariates as intended (as discussed, one of the first thing to add later on will be support for categorical features in LGBM).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really great work @dennisbader - beautiful :)
And it's an excellent basis to start using static covariates in more places :)
I've had a few more very minor comments, but then from my perspective we can merge as soon as you feel confident you've covered all the cases (e.g. with the _xa
accesses).
DataFrame. If a Series, the index represents the static variables. The covariates are globally 'applied' | ||
to all components of the TimeSeries. If a DataFrame, the columns represent the static variables and the | ||
rows represent the components of the uni/multivariate TimeSeries. If a single-row DataFrame, the covariates | ||
are globally 'applied' to all components of the TimeSeries. If a multi-row DataFrame, the number of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you write "If a single-row DataFrame, the covariates are globally applied" --> That's a good idea. But is it implemented? In the constructor, I only see a check that the length of the DataFrame matches the number of components. Maybe we should "tile" the DataFrame if it has length 1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the constructor it checks the following:
Line 220 in 6dc7ff8
n_components > 1 and n_components != self.n_components, |
so either the df has 1 or n_components
components.
The component names are then set in the following lines:
Line 235 in 6dc7ff8
static_covariates.index = ( |
When the number of static cov components is equal to n_components
, it simply assign the component names to the static covs.
If the number is not equal n_components
, it must be a single global component and it gets the following component name:
Line 55 in 6dc7ff8
DEFAULT_GLOBAL_STATIC_COV_NAME = "global_components" |
I don't think we need to tile the DataFrame as it doesn't add any new information. WDYT?
If we have a multivariate series with the global staic covariates and we select on component from the series, the static covariate component name will change from DEFAULT_GLOBAL_STATIC_COV_NAME
to the name of the univariate component:
>>> sc = pd.Series([0., 1.], index=["sc1", "sc2"])
>>> ts = TimeSeries.from_values(values=np.random.rand(5, 2), columns=["a", "b"]).with_static_covariates(sc)
>>> print(ts.static_covariates)
>>> print("")
>>> print(ts["a"].static_covariates)
static_covariates sc1 sc2
global_components 0.0 1.0
static_covariates sc1 sc2
component
a 0.0 1.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I see. I had in mind to tile it so that whenever we access the covariates values (e.g. from training datasets), we always get a tensor with dimension matching the number of components.
Co-authored-by: Julien Herzen <julien@unit8.co>
Codecov Report
@@ Coverage Diff @@
## master #966 +/- ##
==========================================
+ Coverage 92.39% 92.92% +0.52%
==========================================
Files 76 76
Lines 7552 7632 +80
==========================================
+ Hits 6978 7092 +114
+ Misses 574 540 -34
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀
ts_new = abs(ts) | ||
assert ts_new.static_covariates.equals(ts.static_covariates) | ||
# arithmetics with series (left) and non-series (right) | ||
self.helper_test_cov_transfer(ts, ts / 3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯
Fixes #597.
Edit:
Summary
TimeSeries.with_static_covariates()
to add static covariates to any TimeSeriesTimeSeries.from_group_dataframe()
: group dataframes -> Basically multiple unit specific time series DataFrames that are vertically stacked. These can now be read and converted to multiple TimeSeries.TimeSeries.static_covariates_values()
,TimeSeries.static_covariates