DOC: io.rst description and code inconsistent, plus the description is for deprecated behaviour #60705

Open
@wjandrea

Pandas version checks

  • I have checked that the issue still exists on the latest versions of the docs on main here

Location of the documentation

https://pandas.pydata.org/docs/dev/user_guide/io.html#reading-html-content

Read in the content of the file from the above URL and pass it to read_html as a string:

In [317]: html_str = """
   .....:          <table>
   .....:              <tr>
   .....:                  <th>A</th>
   .....:                  <th colspan="1">B</th>
   .....:                  <th rowspan="1">C</th>
   .....:              </tr>
   .....:              <tr>
   .....:                  <td>a</td>
   .....:                  <td>b</td>
   .....:                  <td>c</td>
   .....:              </tr>
   .....:          </table>
   .....:      """
   .....: 

In [318]: with open("tmp.html", "w") as f:
   .....:     f.write(html_str)
   .....: 

In [319]: df = pd.read_html("tmp.html")

In [320]: df[0]
Out[320]: 
   A  B  C
0  a  b  c

Documentation problems

Problem 1

The "above URL" is

url = 'https://www.sump.org/notes/request/' # HTTP request reflector

but data from that URL is not what's used in the code.

Problem 2

"pass it to read_html as a string" is not what's being demonstrated in the code.

Problem 3

read_html can take an HTML string, but that behaviour is deprecated, per its docs:

Deprecated since version 2.1.0: Passing html literal strings is deprecated. Wrap literal string/bytes input in io.StringIO/io.BytesIO instead.
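For illustration, here is a minimal sketch of the non-deprecated form that the notice describes, reusing the `html_str` table from the quoted docs example. It assumes an HTML parser that `read_html` can use (for example lxml) is installed. It is not necessarily the wording the docs should adopt.

```python
import io

import pandas as pd

# Same table as in the quoted docs example.
html_str = """
<table>
    <tr>
        <th>A</th>
        <th colspan="1">B</th>
        <th rowspan="1">C</th>
    </tr>
    <tr>
        <td>a</td>
        <td>b</td>
        <td>c</td>
    </tr>
</table>
"""

# Wrap the literal HTML in io.StringIO instead of passing the raw string,
# as the deprecation notice recommends.
dfs = pd.read_html(io.StringIO(html_str))

# read_html returns a list of DataFrames, one per <table> found.
df = dfs[0]
print(df)
```

Unlike the quoted docs code, which writes the string to `tmp.html` and passes the file path, this passes the HTML content itself, which is closer to what the description says.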

Suggested fix for documentation

I'm not sure!

Activity

Needs Triage label (issue that has not been reviewed by a pandas team member) added on Jan 12, 2025

ShashwatAgrawal20 (Contributor) commented on Feb 2, 2025

https://www.sump.org/notes/request/ is down, so that example shouldn't be there in the first place.

@WillAyd @rhshadrach


Labels: Docs, Needs Triage
