DOC: Closed parameter not intuitively documented in DataFrame.rolling #60485
Description
Pandas version checks
- I have checked that the issue still exists on the latest versions of the docs on main here
Location of the documentation
https://pandas.pydata.org/pandas-docs/dev/reference/api/pandas.DataFrame.rolling.html
Documentation problem
I believe the parameter closed is not very intuitively documented.
(I'm using pandas 2.2.2 on macOS Sequoia.)
Window size used for closed should be window+1
For this parameter to make sense, the actual window size should be thought of as window+1. For instance, when window=3, this is how closed should be thought of:
- closed='right': from a window size of 4 (window=3+1), take the current element and the 2 (4-2) elements just before it. That totals 3 elements.
- closed='left': from a window size of 4 (window=3+1), don't take the current element but take the 3 (4-1) elements just before it. That totals 3 elements.
- closed='both': from a window size of 4 (window=3+1), take the current element and the 3 (4-1) elements just before it. That totals 4 elements.
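The window+1 reading above can be written down as a small pure-Python sketch. This is only a model of the interpretation described in this issue, not pandas internals: the helper name window_positions is hypothetical, and the 'neither' branch is an assumption that follows the same pattern (drop both ends).

```python
def window_positions(i, window, closed):
    """Row positions used for row i under the window+1 reading.

    Hypothetical helper: a mental model only, not how pandas is implemented.
    """
    if closed not in ("right", "left", "both", "neither"):
        raise ValueError(f"unknown closed value: {closed!r}")
    # Start from window+1 candidate positions that end at the current row.
    candidates = list(range(i - window, i + 1))
    if closed in ("right", "neither"):
        candidates = candidates[1:]   # exclude the first (oldest) point
    if closed in ("left", "neither"):
        candidates = candidates[:-1]  # exclude the last (current) point
    # Positions before the start of the data do not exist.
    return [p for p in candidates if p >= 0]


for closed in ("right", "left", "both", "neither"):
    print(closed, window_positions(5, 3, closed))
```

For row 5 with window=3, this model gives 3 positions for 'right' and 'left', 4 for 'both', and 2 for 'neither'.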
closed='neither'
Either 'neither' isn't working or what it does isn't straightforward to me. See the examples below: in both, closed='neither' returns NaN in every position.
Intuitively, I would guess this option shrinks the window by two from a window size of window+1. So if window=3, the actual calculation would be done over a window of 3+1-2=2 elements, but as you can see below I only get NaN.
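One way to probe this further is to check whether min_periods is what suppresses the output. The sketch below rests on an assumption that is not confirmed in this issue: that 'neither' leaves only 2 points in each window, while min_periods defaults to the window size (3) for an integer window. The variable names are illustrative only.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8])

# Default settings: for an integer window, min_periods defaults to window (3 here).
default_neither = s.rolling(window=3, closed="neither").mean()

# Assumption under test: only 2 points remain per window with 'neither',
# so lowering min_periods to 2 should let results through.
relaxed_neither = s.rolling(window=3, closed="neither", min_periods=2).mean()

print(pd.DataFrame({"default": default_neither, "min_periods=2": relaxed_neither}))
```

If the assumption holds, the second column shows values from the third row onward, which would mean 'neither' does work and the all-NaN result comes from the default min_periods.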
Example 1: mean()
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7, 8]})
# Rolling mean with 'left' closed
df['rolling_mean_left'] = df['A'].rolling(window=3, closed='left').mean()
# Rolling mean with 'both' closed
df['rolling_mean_both'] = df['A'].rolling(window=3, closed='both').mean()
# Rolling mean with 'right' closed
df['rolling_mean_right'] = df['A'].rolling(window=3, closed='right').mean()
# Rolling mean with neither closed
df['rolling_mean_neither'] = df['A'].rolling(window=3, closed='neither').mean()
df
Example 2: sum()
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7, 8]})
# Rolling sum with 'left' closed
df['rolling_sum_left'] = df['A'].rolling(window=3, closed='left').sum()
# Rolling sum with 'both' closed
df['rolling_sum_both'] = df['A'].rolling(window=3, closed='both').sum()
# Rolling sum with 'right' closed
df['rolling_sum_right'] = df['A'].rolling(window=3, closed='right').sum()
# Rolling sum with neither closed
df['rolling_sum_neither'] = df['A'].rolling(window=3, closed='neither').sum()
df
Suggested fix for documentation
I would suggest stating that the window size taken into consideration for closed is actually the parameter window + 1; then what's stated in the docs would make sense. Alternatively, base closed on the actual window parameter, which would make much more sense to me. From the current docs:
- If 'right', the first point in the window is excluded from calculations.
- If 'left', the last point in the window is excluded from calculations.
- If 'both', no point in the window is excluded from calculations.
Maybe even add an image example like the ones I posted above.
As for 'neither', I don't have suggestions as I don't fully understand it from my testing.
Finally, I don't like the name closed for the parameter; it doesn't mean much to me. I would maybe prefer something like ends or ends_used. I believe it would be more intuitive.