How to generate forecasts with prediction_length > 64? #40
Hi,
I have time series data and split it into train and test (kept unseen) by slicing the `df` from the end. I used your pipeline over `data_train` and tried to forecast a horizon as long as `data_test`, but without success.

Is there a way to use `predict()` to forecast autoregressively over the unseen `data_test`, given that `prediction_length` is recommended to be <= 64? How can I generate forecasts with `prediction_length > 64`?
Alternatively, you can resample your dataset to a lower frequency. Here's an example with hourly resampling:

```python
#-----------------------------------------------------------
# Libs
#-----------------------------------------------------------
# for plotting, run: pip install pandas matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import torch
from chronos import ChronosPipeline

#-----------------------------------------------------------
# LOAD THE DATASET
#-----------------------------------------------------------
df = pd.read_csv('https://raw.githubusercontent.com/amcs1729/Predicting-cloud-CPU-usage-on-Azure-data/master/azure.csv')
df['timestamp'] = pd.to_datetime(df['timestamp'])
data = df.rename(columns={'min cpu': 'min_cpu',
                          'max cpu': 'max_cpu',
                          'avg cpu': 'avg_cpu'})

# Data preparation
# ==============================================================================
sliced_df = data[['timestamp', 'avg_cpu']].copy()

# Convert data from Hz to MHz
# ==============================================================================
sliced_df['avg_cpu_Mhz'] = sliced_df['avg_cpu'] / 1000000
sliced_df = sliced_df.set_index("timestamp").resample("1H").sum().reset_index()

# Configuration
# ==============================================================================
name_columns = 'avg_cpu_Mhz'
lags = 24
steps = 24
n_backtest = 3
step_size = steps * n_backtest
data_train = sliced_df[:-step_size]
data_test = sliced_df[-step_size:]  # unseen hold-out set

# Pipeline
# ==============================================================================
pipeline = ChronosPipeline.from_pretrained(
    "amazon/chronos-t5-small",
    device_map="cuda",
    torch_dtype=torch.bfloat16,
)

# context must be either a 1D tensor, a list of 1D tensors,
# or a left-padded 2D tensor with batch as the first dimension
context = torch.tensor(data_train['avg_cpu_Mhz'].to_numpy())
prediction_length = 72  # = len(data_test)
forecast = pipeline.predict(
    context,
    prediction_length,
    num_samples=20,
    temperature=1.0,
    top_k=50,
    top_p=1.0,
    limit_prediction_length=False,  # allow prediction_length > 64
)  # forecast shape: [num_series, num_samples, prediction_length]

# visualize the forecast
forecast_index = range(len(data_train), len(data_train) + prediction_length)
low, median, high = np.quantile(forecast[0].numpy(), [0.1, 0.5, 0.9], axis=0)

plt.figure(figsize=(8, 4))
plt.plot(data_train['avg_cpu_Mhz'], color="royalblue", label="historical train data")
plt.plot(data_test['avg_cpu_Mhz'], color="navy", label="historical test data", linestyle='dashed')
plt.plot(forecast_index, median, color="tomato", label="median forecast")
plt.fill_between(forecast_index, low, high, color="tomato", alpha=0.3, label="80% prediction interval")
plt.title('Chronos forecast result')
plt.ylabel('CPU usage [MHz]', fontsize=15)
plt.xlabel('Timestamp', fontsize=15)
plt.legend()
plt.grid()
plt.show()
```
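A related workaround (my own sketch, not from this thread) is to roll the forecast forward autoregressively in chunks of at most 64 steps, feeding the median of each chunk back into the context. Quality can still degrade over long horizons, since the model consumes its own point forecasts instead of real observations. The helper name `rolling_forecast` is hypothetical; it reuses `pipeline` and `context` from the example above:

```python
def rolling_forecast(pipeline, context, total_length, chunk=64, num_samples=20):
    """Chunked autoregressive forecasting with Chronos: predict up to `chunk`
    steps, append the median path to the context, repeat until `total_length`."""
    context = context.clone()
    medians = []
    remaining = total_length
    while remaining > 0:
        step = min(chunk, remaining)
        samples = pipeline.predict(context, step, num_samples=num_samples)
        med = samples[0].median(dim=0).values  # pointwise median path, shape [step]
        medians.append(med)
        context = torch.cat([context, med.to(context.dtype)])
        remaining -= step
    return torch.cat(medians)

# e.g. three 24-step chunks instead of one 72-step call:
# long_median = rolling_forecast(pipeline, context, total_length=72, chunk=24)
```

Note that the prediction interval is lost with this approach, because only the median path is fed back; a full sample-level rollout would be needed to propagate uncertainty.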
@abdulfatir Thanks for your answer. I have a few questions. So, to avoid suboptimal results we need to resample the data. However, resampling with certain aggregation functions can sometimes damage the nature of the time series; in this case the nature was mostly preserved when you resampled.
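To make the aggregation point concrete, here is a small sketch (mine, not from the thread) comparing `.sum()` and `.mean()` when downsampling to hourly frequency. For a rate-like signal such as average CPU usage, `.mean()` keeps the original level and units, while `.sum()` scales with however many raw samples fall in each hour. It reuses `data` from the example above:

```python
# Rebuild the pre-resampling series from `data` and compare the two aggregations.
raw = data[['timestamp', 'avg_cpu']].copy()
raw['avg_cpu_Mhz'] = raw['avg_cpu'] / 1000000
raw = raw.set_index('timestamp')['avg_cpu_Mhz']

hourly_sum = raw.resample('1H').sum()    # hourly totals: level depends on the sampling rate
hourly_mean = raw.resample('1H').mean()  # hourly averages: same level as the raw series

print(hourly_sum.head())
print(hourly_mean.head())
```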
@clevilll sorry, missed this.
May I ask if it's possible to test the performance with a context length of 2048, as described in Section 5.6 on context length in the original paper? @abdulfatir
@GritLs Those experiments were ablations conducted with a different model which was trained with a longer context length. Unfortunately, that model is not publicly available and we don't have immediate plans to release it. That said, you should be able to train a similar model yourself. Please follow the pretraining instructions and let me know if you have questions.
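As a side note (my understanding, not something stated in this thread): the released `chronos-t5-*` checkpoints were trained with a context length of 512, and as far as I can tell the pipeline truncates longer inputs to the most recent `context_length` values, so simply passing a 2048-step context to the public model does not reproduce the paper's long-context ablation. A quick way to check the configured value (attribute path assumed):

```python
import torch
from chronos import ChronosPipeline

pipeline = ChronosPipeline.from_pretrained(
    "amazon/chronos-t5-small",
    device_map="cpu",
    torch_dtype=torch.float32,
)
# The Chronos config carries the training-time context length; attribute path assumed.
print(pipeline.model.config.context_length)  # expected: 512
```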