docs: move BigQuery to user docs (pypi#17162)
* docs: move BigQuery to user docs

Signed-off-by: William Woodruff <william@trailofbits.com>

* docs: APIs and Datasets

Signed-off-by: William Woodruff <william@trailofbits.com>

---------

Signed-off-by: William Woodruff <william@trailofbits.com>
woodruffw authored Nov 25, 2024
1 parent c863541 commit 1587fe2
Showing 4 changed files with 49 additions and 30 deletions.
24 changes: 4 additions & 20 deletions docs/dev/api-reference/bigquery-datasets.rst
@@ -1,26 +1,10 @@
BigQuery Datasets
=================

We use BigQuery to serve our public datasets. PyPI offers two tables whose
data is sourced from projects on PyPI. The tables and their associated data are licensed
under the `Creative Commons License <https://creativecommons.org/licenses/by/4.0/>`_.
.. important::

Download Statistics Table
-------------------------
This API documentation has been migrated to a new page in
the `user documentation <https://docs.pypi.org/>`_:

The download statistics table allows you to learn more about the download patterns of
packages hosted on PyPI. This table is populated through the `Linehaul
project <https://github.com/pypa/linehaul-cloud-function/>`_ by streaming download logs from PyPI
to BigQuery. For more information on analyzing PyPI package downloads, see the `Python
Package Guide <https://packaging.python.org/guides/analyzing-pypi-package-downloads/>`_
* `BigQuery Datasets <https://docs.pypi.org/api/bigquery/>`_

Project Metadata Table
----------------------

We also have a table that provides access to distribution metadata
as outlined by the `core metadata specifications <https://packaging.python.org/specifications/core-metadata/>`_.
The table is meant to be a data dump of metadata from every
release on PyPI, which means that the rows in this BigQuery table
are immutable and are not removed even if a release or project is deleted.
This data is accessible under the
``bigquery-public-data.pypi.distribution_metadata`` public dataset on BigQuery.
3 changes: 2 additions & 1 deletion docs/mkdocs-user-docs.yml
@@ -85,9 +85,10 @@ nav:
- "attestations/publish/v1.md"
- "attestations/security-model.md"
- "project_metadata.md"
- "API Reference":
- "APIs and Datasets":
- "api/index.md"
- "api/index-api.md"
- "api/upload.md"
- "api/integrity.md"
- "api/stats.md"
- "api/bigquery.md"
32 changes: 32 additions & 0 deletions docs/user/api/bigquery.md
@@ -0,0 +1,32 @@
# BigQuery Datasets

We use BigQuery to serve our public datasets. PyPI offers two tables whose
data is sourced from projects on PyPI. The tables and their associated data are licensed
under the [Creative Commons Attribution 4.0 International license][Creative Commons License].

## Download Statistics Table

*Table name*: `bigquery-public-data.pypi.file_downloads`

The download statistics table allows you to learn more about the download
patterns of packages hosted on PyPI.

This table is populated through the [Linehaul project] by streaming download
logs from PyPI to BigQuery. For more information on analyzing PyPI package
downloads, see the [Python Package Guide].
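A minimal sketch of counting one project's downloads from Python, assuming the `google-cloud-bigquery` client library and configured Google Cloud credentials. The `file.project` and `timestamp` column names follow the examples in the Python Packaging User Guide and should be checked against the live table schema; the helper names and the 10 GiB billing cap are illustrative choices, not part of PyPI's documentation.

```python
import datetime
from typing import List, Tuple

# Fully qualified name of the download statistics table, as given above.
FILE_DOWNLOADS_TABLE = "bigquery-public-data.pypi.file_downloads"


def build_download_count_sql() -> str:
    """Return a parameterized GoogleSQL query that counts downloads of one project.

    Assumption: `file.project` and `timestamp` match the column names used in the
    Python Packaging User Guide's examples for this table.
    """
    return (
        "SELECT COUNT(*) AS num_downloads\n"
        f"FROM `{FILE_DOWNLOADS_TABLE}`\n"
        "WHERE file.project = @project\n"
        # Restrict the query to a date range, as the guide's examples do to control cost.
        "  AND DATE(timestamp) BETWEEN @start_date AND @end_date"
    )


def query_parameters(
    project: str, start: datetime.date, end: datetime.date
) -> List[Tuple[str, str, object]]:
    """Return (name, BigQuery type, value) triples for the query's parameters."""
    if start > end:
        raise ValueError("start date must not be after end date")
    return [
        ("project", "STRING", project),
        ("start_date", "DATE", start),
        ("end_date", "DATE", end),
    ]


def count_downloads(
    project: str,
    start: datetime.date,
    end: datetime.date,
    max_bytes_billed: int = 10 * 1024**3,  # arbitrary safety cap (10 GiB)
) -> int:
    """Run the query; needs google-cloud-bigquery and Google Cloud credentials."""
    # Imported here so the helpers above can be used without the library installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter(name, type_, value)
            for name, type_, value in query_parameters(project, start, end)
        ],
        # BigQuery fails the job rather than bill more than this many bytes.
        maximum_bytes_billed=max_bytes_billed,
    )
    rows = client.query(build_download_count_sql(), job_config=job_config).result()
    return next(iter(rows)).num_downloads
```

For example, `count_downloads("pytest", datetime.date(2024, 11, 1), datetime.date(2024, 11, 24))`. Passing the project name and dates as query parameters, rather than formatting them into the SQL text, avoids quoting problems and keeps the query text constant across calls.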

## Project Metadata Table

*Table name*: `bigquery-public-data.pypi.distribution_metadata`

We also have a table that provides access to distribution metadata
as outlined by the [core metadata specifications].

The table is meant to be a data dump of metadata from every
release on PyPI, which means that the rows in this BigQuery table
are immutable and are not removed even if a release or project is deleted.
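Because the table is a dump of metadata for every release, a single release may appear as several rows, one per uploaded distribution file (an assumption based on the table carrying per-file fields such as a filename; verify it against the schema). The sketch below pairs a hypothetical query with a pure-Python helper that groups result rows by version. All column names are assumptions. Since rows are never removed, the result can also include releases that have since been deleted from PyPI.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping

# Hypothetical query: the column names (name, version, filename, upload_time)
# are assumptions about the distribution_metadata schema.
METADATA_SQL = """
SELECT name, version, filename, upload_time
FROM `bigquery-public-data.pypi.distribution_metadata`
WHERE name = @project
ORDER BY upload_time
"""


def files_per_release(rows: Iterable[Mapping[str, str]]) -> Dict[str, List[str]]:
    """Group rows into {version: [filename, ...]}, keeping first-seen order."""
    releases: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        releases[row["version"]].append(row["filename"])
    return dict(releases)
```

With `google-cloud-bigquery`, `METADATA_SQL` can be run by passing `ScalarQueryParameter("project", "STRING", ...)` in the job config; the returned rows support `row["version"]`-style lookup, so they can be handed to `files_per_release` directly.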

[Creative Commons License]: https://creativecommons.org/licenses/by/4.0/
[Linehaul project]: https://github.com/pypa/linehaul-cloud-function/
[Python Package Guide]: https://packaging.python.org/guides/analyzing-pypi-package-downloads/
[core metadata specifications]: https://packaging.python.org/specifications/core-metadata/
20 changes: 11 additions & 9 deletions docs/user/api/index.md
@@ -2,20 +2,20 @@

<!--[[ preview('user-api-docs') ]]-->

PyPI has several API endpoints, each of which is referenced in the table
of contents for this hierarchy.
PyPI has several API endpoints and public datasets, each of which is referenced
in the table of contents for this hierarchy.

## API policies

Please be aware of these PyPI API policies:
Please be aware of these PyPI API policies.

### Caching

All API requests are cached. Requests to the JSON, RSS or Legacy APIs are
All API requests are cached. Requests to the JSON, RSS or Index APIs are
cached by our CDN provider. You can determine if you've hit the cache based on
the ``X-Cache`` and ``X-Cache-Hits`` headers in the response.
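A small sketch for checking these headers from Python. The header format is an assumption: CDN caches commonly report `X-Cache` as a comma-separated list such as `MISS, HIT`, one entry per cache layer. The hypothetical helper below only reports whether any listed layer returned a hit.

```python
from typing import Mapping


def served_from_cache(headers: Mapping[str, str]) -> bool:
    """Return True if any cache layer listed in X-Cache reported a HIT.

    Assumes X-Cache is a comma-separated list of HIT/MISS values.
    """
    value = headers.get("X-Cache", "")
    return any(part.strip().upper() == "HIT" for part in value.split(","))
```

The `headers` object on a response from `urllib.request.urlopen` can be passed in directly; its `get` lookups are case-insensitive, unlike a plain `dict`.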

Requests to the JSON, RSS and Legacy APIs also provide an ``ETag`` header. If
Requests to the JSON, RSS and Index APIs also provide an ``ETag`` header. If
you're making a lot of repeated requests, ensure your API consumer will respect
this header to determine whether to actually repeat a request or not.
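A minimal standard-library sketch of a consumer that respects `ETag`: it remembers the `ETag` and body for each URL, sends `If-None-Match` on repeat requests, and reuses the stored body when the server answers `304 Not Modified`. The class is hypothetical and keeps everything in memory; the JSON API URL `https://pypi.org/pypi/<project>/json` is just one example of a URL it could be used with.

```python
import urllib.error
import urllib.request
from typing import Dict, Optional, Tuple


class ETagCache:
    """In-memory conditional-GET helper: one (ETag, body) pair per URL."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[str, bytes]] = {}

    def request_headers(self, url: str) -> Dict[str, str]:
        """Headers for the next request: If-None-Match once an ETag is known."""
        entry = self._entries.get(url)
        return {"If-None-Match": entry[0]} if entry else {}

    def handle_response(
        self, url: str, status: int, etag: Optional[str], body: bytes
    ) -> bytes:
        """Store fresh responses; on 304 Not Modified, return the stored body."""
        if status == 304:
            return self._entries[url][1]
        if etag:
            self._entries[url] = (etag, body)
        return body

    def get(self, url: str) -> bytes:
        request = urllib.request.Request(url, headers=self.request_headers(url))
        try:
            with urllib.request.urlopen(request) as response:
                return self.handle_response(
                    url, response.status, response.headers.get("ETag"), response.read()
                )
        except urllib.error.HTTPError as err:
            # urllib reports 304 as an HTTPError rather than a normal response.
            if err.code == 304:
                return self.handle_response(url, 304, None, b"")
            raise
```

Keeping `handle_response` separate from `get` means the caching decisions can be exercised without network access; a long-running consumer would persist the entries rather than hold them in memory.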

@@ -39,14 +39,16 @@ suggestions:
(minutes). Generally PyPI can handle it, but it's preferred to make requests
in serial over a longer amount of time if possible.
* If your consumer is actually an organization or service that will be
downloading a lot of packages from PyPI, consider `using your own index
mirror or cache
<https://packaging.python.org/guides/index-mirrors-and-caches/>`_.
downloading a lot of packages from PyPI, consider
[using your own index mirror or cache].
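The suggestion to make requests in serial over a longer period can be sketched as a loop that issues one request at a time with a pause between them. The function name and the one-second default delay are arbitrary placeholders, not a rate specified by PyPI; `fetch` is any callable that retrieves a single URL, and `sleep` is injectable so the pacing can be tested.

```python
import time
from typing import Callable, Dict, Iterable, TypeVar

T = TypeVar("T")


def fetch_serially(
    urls: Iterable[str],
    fetch: Callable[[str], T],
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, T]:
    """Fetch each URL in turn, pausing between requests instead of running them in parallel."""
    results: Dict[str, T] = {}
    for index, url in enumerate(urls):
        if index:  # no pause before the first request
            sleep(delay_seconds)
        results[url] = fetch(url)
    return results
```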

### API Preference

For periodically checking for new packages or updates to existing packages,
use our RSS feeds.
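A standard-library sketch of polling an RSS feed and reporting only the items not seen on an earlier poll. It assumes the feed at `https://pypi.org/rss/updates.xml` and the usual RSS 2.0 layout of `<item>` elements with `<title>` and `<link>` children; deduplicating by link and the helper names are illustrative choices.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import List, Set, Tuple

# PyPI's feed of recently updated releases.
UPDATES_FEED = "https://pypi.org/rss/updates.xml"

Item = Tuple[str, str]  # (title, link)


def parse_items(xml_bytes: bytes) -> List[Item]:
    """Extract (title, link) pairs from the <item> elements of an RSS 2.0 document."""
    root = ET.fromstring(xml_bytes)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


def new_items(items: List[Item], seen_links: Set[str]) -> List[Item]:
    """Return items whose link has not been seen before, and remember them."""
    fresh = [item for item in items if item[1] not in seen_links]
    seen_links.update(link for _, link in fresh)
    return fresh


def poll(seen_links: Set[str], url: str = UPDATES_FEED) -> List[Item]:
    """Fetch the feed once and return only the items that are new since the last poll."""
    with urllib.request.urlopen(url) as response:
        return new_items(parse_items(response.read()), seen_links)
```

Calling `poll(seen)` periodically with the same `seen` set yields only newly listed releases. A feed of recent updates holds a limited number of items, so polling too rarely can miss some.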

No new integrations should use the XML-RPC APIs as they are planned for
deprecation. Existing consumers should migrate to JSON/RSS/Legacy APIs.
deprecation. Existing consumers should migrate to JSON/RSS/[Index APIs].
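As an illustration of moving off XML-RPC, the sketch below lists a project's files through the JSON form of the Index API standardized in PEP 691, requested with the `application/vnd.pypi.simple.v1+json` media type. The helper names are hypothetical, and only the PEP 691 `files` list and its `filename` keys are relied on.

```python
import json
import urllib.request
from typing import Any, Dict, List

# Media type for the JSON form of the Index (simple) API, per PEP 691.
SIMPLE_JSON = "application/vnd.pypi.simple.v1+json"


def filenames(payload: Dict[str, Any]) -> List[str]:
    """Extract distribution filenames from a PEP 691 project detail response."""
    return [entry["filename"] for entry in payload.get("files", [])]


def project_files(project: str) -> List[str]:
    """Fetch a project's file list from the Index API's JSON form."""
    request = urllib.request.Request(
        f"https://pypi.org/simple/{project}/",
        headers={"Accept": SIMPLE_JSON},
    )
    with urllib.request.urlopen(request) as response:
        return filenames(json.load(response))
```

For example, `project_files("sampleproject")` returns the filenames of that project's sdists and wheels.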

[Index APIs]: ./index-api.md
[using your own index mirror or cache]: https://packaging.python.org/guides/index-mirrors-and-caches/
