Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MAINT: stats.ContinuousDistribution: improve doc generation; fix broken doctests by making stats.Uniform public again #22027

Merged
merged 8 commits into from
Dec 11, 2024

Conversation

mdhaber
Copy link
Contributor

@mdhaber mdhaber commented Dec 9, 2024

Reference issue

gh-21050

What does this implement/fix?

The method documentation of ContinuousDistribution was written when Uniform was public, so many of the examples used Uniform to illustrate a point. Uniform was temporarily made private, breaking the examples, but we didn't notice the problem because apparently these doctests aren't running1. This PR fixes the problem by making Uniform public again (taking the opportunity to fill out the implementation).

Also eliminates the code that was caching/loading the docs to/from a json file rather than generating them at import time. If we need to add something like that back, we can, but it's fast enough right now that the complexity is not warranted. We still need to do something about the fact that method documentation is rendered separately for every subclass of _ProbabilityDistribution but that is also something that can wait because it is not a big problem unless we have many distributions.

Footnotes

  1. Not sure what that's about; if we can't figure it out here, I'll open a separate issue. Maybe something to do with the docstrings being inherited from an ABC?

@mdhaber mdhaber added scipy.stats Documentation Issues related to the SciPy documentation. Also check https://github.com/scipy/scipy.org labels Dec 9, 2024
@mdhaber mdhaber changed the title WIP: stats.ContinuousDistribution: improve doc generation MAINT: stats.ContinuousDistribution: improve doc generation; add stats.Uniform Dec 10, 2024
@mdhaber mdhaber changed the title MAINT: stats.ContinuousDistribution: improve doc generation; add stats.Uniform MAINT: stats.ContinuousDistribution: improve doc generation; expose missing stats.Uniform Dec 10, 2024
@mdhaber mdhaber changed the title MAINT: stats.ContinuousDistribution: improve doc generation; expose missing stats.Uniform MAINT: stats.ContinuousDistribution: improve doc generation; fix broken doctests by making stats.Uniform public again Dec 10, 2024
@mdhaber mdhaber marked this pull request as ready for review December 11, 2024 05:50
@mdhaber mdhaber removed request for rgommers and ev-br December 11, 2024 05:50
@ev-br
Copy link
Member

ev-br commented Dec 11, 2024

doctests aren't running

They shouldn't for private objects and should for public ones, anything else is a bug. Do they get collected as tests?

@mdhaber
Copy link
Contributor Author

mdhaber commented Dec 11, 2024

Do they get collected as tests?

I'm not sure I follow? Not sure how to check.

Take a look in main, for example.
Mixture is a (public) subclass of private ABC _ProbabilityDistribution.

class Mixture(_ProbabilityDistribution):

The public methods of Mixture (e.g. pdf) don't include a docstring, so they inherit the _ProbabilityDistribution documentation. This is definitely happening, since these docs are rendered.
image

In the examples:
image
stats.Uniform doesn't exist in main

from scipy.stats import Uniform
# ImportError: cannot import name 'Uniform' from 'scipy.stats' (/Users/matthaberland/Desktop/scipy/scipy/stats/__init__.py)

If these examples were running as doctests, I'd expect CI to fail, but it doesn't.

@mdhaber mdhaber added this to the 1.15.0 milestone Dec 11, 2024
@mdhaber mdhaber added the backport-candidate This fix should be ported by a maintainer to previous SciPy versions. label Dec 11, 2024
@ev-br
Copy link
Member

ev-br commented Dec 11, 2024

Mixture docstring does not seem to have examples, huh?

In [1]: from scipy.stats import Mixture

In [3]: Mixture.__doc__
Out[3]: 'Representation of a mixture distribution.\n\n    A mixture distribution is the distribution of a random variable\n    defined in the following way: first, a random variable is selected\n    from `components` according to the probabilities given by `weights`, then\n    the selected random variable is realized.\n\n    Parameters\n    ----------\n    components : sequence of `ContinuousDistribution`\n        The underlying instances of `ContinuousDistribution`.\n        All must have scalar shape parameters (if any); e.g., the `pdf` evaluated\n        at a scalar argument must return a scalar.\n    weights : sequence of floats, optional\n        The corresponding probabilities of selecting each random variable.\n        Must be non-negative and sum to one. The default behavior is to weight\n        all components equally.\n\n    Attributes\n    ----------\n    components : sequence of `ContinuousDistribution`\n        The underlying instances of `ContinuousDistribution`.\n    weights : ndarray\n        The corresponding probabilities of selecting each random variable.\n\n    Methods\n    -------\n    support\n\n    sample\n\n    moment\n\n    mean\n    median\n    mode\n\n    variance\n    standard_deviation\n\n    skewness\n    kurtosis\n\n    pdf\n    logpdf\n\n    cdf\n    icdf\n    ccdf\n    iccdf\n\n    logcdf\n    ilogcdf\n    logccdf\n    ilogccdf\n\n    entropy\n\n    Notes\n    -----\n    The following abbreviations are used throughout the documentation.\n\n    - PDF: probability density function\n    - CDF: cumulative distribution function\n    - CCDF: complementary CDF\n    - entropy: differential entropy\n    - log-*F*: logarithm of *F* (e.g. log-CDF)\n    - inverse *F*: inverse function of *F* (e.g. inverse CDF)\n\n    References\n    ----------\n    .. [1] Mixture distribution, *Wikipedia*,\n           https://en.wikipedia.org/wiki/Mixture_distribution\n\n    '

In [4]: 'Uniform' in Mixture.__doc__
Out[4]: False

In [5]: import scipy

In [6]: scipy.__version__
Out[6]: '1.16.0.dev0+git20241210.7f03fba'

So something fancy is going on with these docstrings, but as far as doctester is aware, Mixture does not have examples, so there's nothing to check.

@ev-br
Copy link
Member

ev-br commented Dec 11, 2024

Not sure how to check.

Either run $ python dev.py smoke-docs -v in the verbose mode for it to output all names it tests, or pass --collect-only to pytest:

$ python dev.py smoke-docs -s stats -- --collect-only 
💻  ninja -C /home/br/repos/scipy/scipy/build -j4
ninja: Entering directory `/home/br/repos/scipy/scipy/build'
[2/338] Generating subprojects/highs/src/HConfig.h with a custom command
Build OK
💻  meson install -C build --only-changed --tags runtime,python-runtime,tests,devel
Installing, see meson-install.log...
Installation OK
SciPy from development installed path at: /home/br/repos/scipy/scipy/build-install/lib/python3.12/site-packages
Running tests for scipy version:1.16.0.dev0+git20241210.7f03fba, installed at:/home/br/repos/scipy/scipy/build-install/lib/python3.12/site-packages/scipy
=================================================================== test session starts ====================================================================
platform linux -- Python 3.12.4, pytest-8.2.2, pluggy-1.5.0
rootdir: /home/br/repos/scipy/scipy
configfile: pytest.ini
plugins: cov-5.0.0, scipy_doctest-1.5.1, hypothesis-6.108.0, xdoctest-1.2.0, xdist-3.6.1, anyio-4.4.0, timeout-2.3.1
collected 846 items                                                                                                                                        

<Package build-install/lib/python3.12/site-packages/scipy>
  <Package stats>
    <DTModule __init__.py>
      <DoctestItem scipy.stats.CensoredData>
      <DoctestItem scipy.stats.CensoredData.interval_censored>
      <DoctestItem scipy.stats.CensoredData.left_censored>
      <DoctestItem scipy.stats.CensoredData.right_censored>
      <DoctestItem scipy.stats.Covariance>
      <DoctestItem scipy.stats.Covariance.colorize>
      <DoctestItem scipy.stats.Covariance.from_cholesky>
      <DoctestItem scipy.stats.Covariance.from_diagonal>
      <DoctestItem scipy.stats.Covariance.from_eigendecomposition>
      <DoctestItem scipy.stats.Covariance.from_precision>
      <DoctestItem scipy.stats.Covariance.whiten>
      <DoctestItem scipy.stats.Normal>
      ... snip ...

One caveat is that our pytest plugin skips empty examples, https://github.com/scipy/scipy_doctest/blob/main/scipy_doctest/plugin.py#L239, so not having an entry might mean it either is missed completely or the docstring exists but does not have examples.

@ev-br
Copy link
Member

ev-br commented Dec 11, 2024

Oh, actually, the problem is not in the class docstring, what's missing are examples from docstrings of methods, is this correct? For one, Uniform.__doc__ is being checked, but Uniform.pdf.__doc__ is not.

The issue seems to be that doctesting does not collect examples from superclasses (I only took a cursory look, so the analysis below can be still incorrect or incomplete).

The output of $ python dev.py smoke-docs -s stats -- --collect-only has Normal but does not have Normal.pdf or other methods.

Looking through the collection chain, in the end of the day, our $ python dev.py smoke-docs delegates doctest finding to the standard library doctest module,
https://github.com/scipy/scipy_doctest/blob/main/scipy_doctest/impl.py#L540

And it seems to miss Normal.pdf indeed.

Is this related to inherting from ABC? Taking a quick peek into the inheritance hierarchy which does not have ABCs:

Adding examples to docstrings of CubicSpline.__call__ and _PPolyBase.__call__ (the public PPoly class inherits from _PPolyBase):

$ git diff
diff --git a/scipy/interpolate/_bsplines.py b/scipy/interpolate/_bsplines.py
index 3d68e8d532..27a660a1f0 100644
--- a/scipy/interpolate/_bsplines.py
+++ b/scipy/interpolate/_bsplines.py
@@ -492,6 +492,9 @@ class BSpline:
             Shape is determined by replacing the interpolation axis
             in the coefficient array with the shape of `x`.
 
+        >>> 2
+        3
+
         """
         if extrapolate is None:
             extrapolate = self.extrapolate
diff --git a/scipy/interpolate/_interpolate.py b/scipy/interpolate/_interpolate.py
index 7558bd7db2..6f6e84ce54 100644
--- a/scipy/interpolate/_interpolate.py
+++ b/scipy/interpolate/_interpolate.py
@@ -770,6 +770,10 @@ class _PPolyBase:
         breakpoints. The polynomial intervals are considered half-open,
         ``[a, b)``, except for the last interval which is closed
         ``[a, b]``.
+
+        >>> 2
+        3
+
         """

and running $ python dev.py smoke-docs -s interpolate, picks up a failure in CubicSpline.__call__ but does not fail in PPoly.__call__.

Is this related to our filtering of public objects which we apply on top of the standard library doctest finding behavior?
Does not seem to be: running $ python dev.py smoke-docs -s interpolate -- --doctest-collect=None which should use the standard doctest finding strategy still does not pick up PPoly.__call__.

An intermediate conclusion: looks like the root problem is the behavior of the standard library doctest module. Whether this is a bug or a feature is a separate question; whether there is a switch we can turn or a workaround we can apply --- don't know.

@mdhaber
Copy link
Contributor Author

mdhaber commented Dec 11, 2024

Oh, actually, the problem is not in the class docstring, what's missing are examples from docstrings of methods, is this correct?

Yes. The class-level examples seem to run. One emitted a warning, causing a refguide check failure. So it's just the documentation of the methods.

@ev-br
Copy link
Member

ev-br commented Dec 11, 2024

Okay, it's a scipy-doctest bug, scipy/scipy_doctest#177

Apparently, our doctest finding algorithm misses docstrings of methods defined in private parent classes.
The bug has been there forever--literally, from the very first commit to what became scipy-doctest.

Why we did not notice is --- most likely, this is the first time in the last decade somebody wanted to inherit a method with a docstring example from a private parent class.

Possible ways out:

@mdhaber
Copy link
Contributor Author

mdhaber commented Dec 11, 2024

@steppi do you use dev.py? If so, can you try @ev-br's suggestion for testing the method docstrings in this branch? Once we get that passing, I think we can call it good from this PR's perspective, and hopefully scipy/scipy_doctest#177 will get resolved eventually (or we'll make ProbabilityDistribution public).

@steppi
Copy link
Contributor

steppi commented Dec 11, 2024

@steppi do you use dev.py? If so, can you try @ev-br's suggestion for testing the method docstrings in this branch? Once we get that passing, I think we can call it good from this PR's perspective, and hopefully scipy/scipy_doctest#177 will get resolved eventually (or we'll make ProbabilityDistribution public).

Yes, I use dev.py, I'll try it out

@steppi
Copy link
Contributor

steppi commented Dec 11, 2024

I observed one failure for _probability_distribution.py

______ [doctest] scipy.stats._probability_distribution._ProbabilityDistribution.median _______
408         Examples
409         --------
410         Instantiate a distribution with the desired parameters:
411 
412         >>> from scipy import stats
413         >>> X = stats.Uniform(a=0., b=10.)
414 
415         Compute the median:
416 
417         >>> X.median()
Expected:
    5
Got:
    np.float64(5.0)

/home/birbir/scipy/build-install/lib/python3.12/site-packages/scipy/stats/_probability_distribution.py:417: DocTestFailure
================================== short test summary info ===================================
FAILED scipy/stats/_probability_distribution.py::scipy.stats._probability_distribution._ProbabilityDistribution.median                         
================================ 1 failed, 21 passed in 0.49s ================================

Two for _distribution_infrastructure.py

scipy/stats/_distribution_infrastructure.py ..FF......                                 [100%]

========================================== FAILURES ==========================================
_______ [doctest] scipy.stats._distribution_infrastructure.OrderStatisticDistribution ________
4215     order statistic and compare with a normalized histogram from simulation.
4216 
4217     >>> import numpy as np
4218     >>> import matplotlib.pyplot as plt
4219     >>> from scipy import stats
4220     >>>
4221     >>> X = stats.Normal()
4222     >>> data = X.sample(shape=(10000, 5))
4223     >>> ranks = np.sort(data, axis=1)
4224     >>> Y = stats.OrderStatisticDistribution(X, r=4, n=5)
UNEXPECTED EXCEPTION: AttributeError("module 'scipy.stats' has no attribute 'OrderStatisticDistribution'")
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/doctest.py", line 1361, in __run
    exec(compile(example.source, filename, "single",
  File "<doctest scipy.stats._distribution_infrastructure.OrderStatisticDistribution[6]>", line 1, in <module>
AttributeError: module 'scipy.stats' has no attribute 'OrderStatisticDistribution'
/home/birbir/scipy/build-install/lib/python3.12/site-packages/scipy/stats/_distribution_infrastructure.py:4224: UnexpectedException
________________ [doctest] scipy.stats._distribution_infrastructure._log1mexp ________________
1168 
1169     Returns
1170     -------
1171     y : ndarray
1172         An array of the same shape as `x`.
1173 
1174     Examples
1175     --------
1176     >>> import numpy as np
1177     >>> from scipy.special import log1m
UNEXPECTED EXCEPTION: ImportError("cannot import name 'log1m' from 'scipy.special' (/home/birbir/scipy/build-install/lib/python3.12/site-packages/scipy/special/__init__.py)")
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/doctest.py", line 1361, in __run
    exec(compile(example.source, filename, "single",
  File "<doctest scipy.stats._distribution_infrastructure._log1mexp[1]>", line 1, in <module>
ImportError: cannot import name 'log1m' from 'scipy.special' (/home/birbir/scipy/build-install/lib/python3.12/site-packages/scipy/special/__init__.py)
/home/birbir/scipy/build-install/lib/python3.12/site-packages/scipy/stats/_distribution_infrastructure.py:1177: UnexpectedException
================================== short test summary info ===================================
FAILED scipy/stats/_distribution_infrastructure.py::scipy.stats._distribution_infrastructure.OrderStatisticDistribution                           
FAILED scipy/stats/_distribution_infrastructure.py::scipy.stats._distribution_infrastructure._log1mexp                                            
================================ 2 failed, 8 passed in 0.67s =================================

no failures for _new_distributions.py

@mdhaber
Copy link
Contributor Author

mdhaber commented Dec 11, 2024

OK, the failures in _distribution_infrastructure.py are ignorable because they are in the documentation of truly private functions. I guess the only other one fails because it needs a . to make 5 $\rightarrow$ 5.. I'll do that with [skip ci], and then I think that's all I had for this one. Anything else to do here, @steppi?

Copy link
Contributor

@steppi steppi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. I've confirmed that

python dev.py smoke-docs -t scipy/stats/_probability_distribution.py

passes now.

@steppi steppi merged commit a803070 into scipy:main Dec 11, 2024
tylerjereddy pushed a commit to tylerjereddy/scipy that referenced this pull request Dec 13, 2024
…en doctests by making `stats.Uniform` public again (scipy#22027)

* MAINT: stats.ContinuousDistribution: revert to dynamic doc generation

* ENH: stats.Uniform: add uniform distribution

* BUILD: stats: remove reference to eliminated _new_distribution_docs.json

* MAINT: stats.Uniform: improvements

* MAINT: integrate.tanhsinh: fix regression

* DOC: stats._distribution_infrastructure: fix doctests

[skip ci]
@tylerjereddy tylerjereddy mentioned this pull request Dec 13, 2024
5 tasks
@tylerjereddy tylerjereddy removed the backport-candidate This fix should be ported by a maintainer to previous SciPy versions. label Dec 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Documentation Issues related to the SciPy documentation. Also check https://github.com/scipy/scipy.org scipy.stats
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants