
DOC: Closed parameter not intuitively documented in DataFrame.rolling #60485

Open
@caballerofelipe

Description

Pandas version checks

  • I have checked that the issue still exists on the latest versions of the docs on main here

Location of the documentation

https://pandas.pydata.org/pandas-docs/dev/reference/api/pandas.DataFrame.rolling.html

Documentation problem

I believe the parameter closed is not very intuitively documented.

(I'm using pandas 2.2.2 on macOS Sequoia.)

Window size used for closed should be window+1

For this parameter to make sense, the actual window size should be thought of as window+1. For instance, when window=3, this is how closed should be read:

  • closed='right': from a window size of 4 (window=3+1), take the current element and 2 (4-2) elements just before the current one. Totals to 3 elements.
  • closed='left': from a window size of 4 (window=3+1), don't take the current element but take the 3 (4-1) elements just before the current one. Totals to 3 elements.
  • closed='both': from a window size of 4 (window=3+1), take the current element and 3 (4-1) elements just before the current one. Totals to 4 elements.
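The three cases above can be checked directly, since iterating over a rolling object yields each window as a sub-Series. This is a minimal sketch, assuming a recent pandas version (the issue uses 2.2.2); the helper name `window_at` is made up for illustration and is not part of pandas.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8])


def window_at(series, position, window, closed):
    """Return the values inside the window that belongs to row `position`."""
    # Iterating over a Rolling object yields one sub-Series per row.
    all_windows = list(series.rolling(window=window, closed=closed))
    return all_windows[position].tolist()


# Row 4 holds the value 5; compare what each option includes for window=3.
for closed in ("right", "left", "both"):
    print(closed, window_at(s, 4, 3, closed))
```

For row 4 this should list three values for 'right' and 'left' and four values for 'both', in line with the totals given in the bullets.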

closed='neither'

Either 'neither' isn't working or what it does isn't straightforward to me. See the examples below: in both, the 'neither' column is NaN in every position.

Intuitively, I would guess this option shrinks the window by two, starting from a window size of window+1. So with window=3, the calculation would be done over a window of 3+1-2=2 elements, but as you can see below I only get NaN.
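One possible explanation for the all-NaN result, offered as an assumption rather than a confirmed diagnosis: the pandas docs state that for an integer window, min_periods defaults to the window size. If closed='neither' leaves only 2 values in each window when window=3, the default min_periods=3 can never be met. The sketch below lowers min_periods to test that idea.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8])

# Default min_periods (equal to window for integer windows).
default_result = s.rolling(window=3, closed="neither").sum()

# Assumption being tested: 'neither' keeps only the 2 values between the
# excluded endpoints, so min_periods=2 should be enough to get numbers.
relaxed_result = s.rolling(window=3, closed="neither", min_periods=2).sum()

print(pd.DataFrame({"default": default_result, "min_periods=2": relaxed_result}))
```

If the assumption holds, the first column stays NaN throughout while the second shows sums of the two elements just before the current one, starting from the third row.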

Example 1: mean()

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7, 8]})

# Rolling mean with 'left' closed
df['rolling_mean_left'] = df['A'].rolling(window=3, closed='left').mean()

# Rolling mean with 'both' closed
df['rolling_mean_both'] = df['A'].rolling(window=3, closed='both').mean()

# Rolling mean with 'right' closed
df['rolling_mean_right'] = df['A'].rolling(window=3, closed='right').mean()

# Rolling mean with 'neither' closed
df['rolling_mean_neither'] = df['A'].rolling(window=3, closed='neither').mean()

df
[Screenshot 2024-12-03 at 14:17:33]

Example 2: sum()

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7, 8]})

# Rolling sum with 'left' closed
df['rolling_sum_left'] = df['A'].rolling(window=3, closed='left').sum()

# Rolling sum with 'both' closed
df['rolling_sum_both'] = df['A'].rolling(window=3, closed='both').sum()

# Rolling sum with 'right' closed
df['rolling_sum_right'] = df['A'].rolling(window=3, closed='right').sum()

# Rolling sum with 'neither' closed
df['rolling_sum_neither'] = df['A'].rolling(window=3, closed='neither').sum()

df
[Screenshot 2024-12-03 at 14:18:32]

Suggested fix for documentation

I would suggest stating that the window size used for closed is actually the window parameter + 1; what's stated in the docs would then make sense. Alternatively, use the actual window parameter, which would make much more sense to me. From the current docs:

  • If 'right', the first point in the window is excluded from calculations.
  • If 'left', the last point in the window is excluded from calculations.
  • If 'both', no point in the window is excluded from calculations.
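To make the window + 1 reading of these bullets concrete, here is a small pure-Python model. It is a sketch of the mental model proposed in this issue, not pandas' actual implementation; the function name `included_positions` is hypothetical, and the 'neither' branch simply extrapolates the same rule to both ends, which is an assumption.

```python
def included_positions(i, window, closed):
    """Positions included for row `i` under the 'window + 1' reading.

    Start from the span of window + 1 positions ending at `i`, then drop
    the first and/or last point depending on `closed`.
    """
    span = list(range(i - window, i + 1))  # window + 1 candidate positions
    if closed in ("right", "neither"):
        span = span[1:]  # first point excluded
    if closed in ("left", "neither"):
        span = span[:-1]  # last point excluded
    # Positions before the start of the data do not exist.
    return [p for p in span if p >= 0]


for closed in ("right", "left", "both", "neither"):
    print(closed, included_positions(4, 3, closed))
```

Under this model, 'right' and 'left' both keep window positions, 'both' keeps window + 1, and 'neither' keeps window - 1.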

Maybe even add an image example like the ones I posted above.

As for 'neither', I don't have suggestions as I don't fully understand it from my testing.

Finally, I don't like the name closed for the parameter; it doesn't mean much to me. I would prefer something like ends or ends_used, which I believe would be more intuitive.

Metadata

Labels

Docs, Window (rolling, ewma, expanding)
