diff --git a/docs/dev/api-reference/bigquery-datasets.rst b/docs/dev/api-reference/bigquery-datasets.rst index fc58e401f891..431da8625890 100644 --- a/docs/dev/api-reference/bigquery-datasets.rst +++ b/docs/dev/api-reference/bigquery-datasets.rst @@ -1,26 +1,10 @@ BigQuery Datasets ================= -We use BigQuery to serve our public datasets. PyPI offers two tables whose -data is sourced from projects on PyPI. The tables and its pertaining data are licensed -under the `Creative Commons License `_. +.. important:: -Download Statistics Table -------------------------- + This API documentation has been migrated to a new page in + the `user documentation `_: -The download statistics table allows you learn more about downloads patterns of -packages hosted on PyPI. This table is populated through the `Linehaul -project `_ by streaming download logs from PyPI -to BigQuery. For more information on analyzing PyPI package downloads, see the `Python -Package Guide `_ + * `BigQuery Datasets `_ -Project Metadata Table ----------------------- - -We also have a table that provides access to distribution metadata -as outlined by the `core metadata specifications `_. -The table is meant to be a data dump of metadata from every -release on PyPI, which means that the rows in this BigQuery table -are immutable and are not removed even if a release or project is deleted. -This data can be accessible under the -``bigquery-public-data.pypi.distribution_metadata`` public dataset on BigQuery. 
diff --git a/docs/mkdocs-user-docs.yml b/docs/mkdocs-user-docs.yml index 47aa4e48a649..97cee34bf8b3 100644 --- a/docs/mkdocs-user-docs.yml +++ b/docs/mkdocs-user-docs.yml @@ -85,9 +85,10 @@ nav: - "attestations/publish/v1.md" - "attestations/security-model.md" - "project_metadata.md" - - "API Reference": + - "APIs and Datasets": - "api/index.md" - "api/index-api.md" - "api/upload.md" - "api/integrity.md" - "api/stats.md" + - "api/bigquery.md" diff --git a/docs/user/api/bigquery.md b/docs/user/api/bigquery.md new file mode 100644 index 000000000000..a8de7595d329 --- /dev/null +++ b/docs/user/api/bigquery.md @@ -0,0 +1,32 @@ +# BigQuery Datasets + +We use BigQuery to serve our public datasets. PyPI offers two tables whose +data is sourced from projects on PyPI. The tables and their associated data are licensed +under the [Creative Commons License]. + +## Download Statistics Table + +*Table name*: `bigquery-public-data.pypi.file_downloads` + +The download statistics table allows you to learn more about download patterns of +packages hosted on PyPI. + +This table is populated through the [Linehaul project] by streaming download +logs from PyPI to BigQuery. For more information on analyzing PyPI package +downloads, see the [Python Package Guide]. + +## Project Metadata Table + +*Table name*: `bigquery-public-data.pypi.distribution_metadata` + +We also have a table that provides access to distribution metadata +as outlined by the [core metadata specifications]. + +The table is meant to be a data dump of metadata from every +release on PyPI, which means that the rows in this BigQuery table +are immutable and are not removed even if a release or project is deleted.
+ +[Creative Commons License]: https://creativecommons.org/licenses/by/4.0/ +[Linehaul project]: https://github.com/pypa/linehaul-cloud-function/ +[Python Package Guide]: https://packaging.python.org/guides/analyzing-pypi-package-downloads/ +[core metadata specifications]: https://packaging.python.org/specifications/core-metadata/ diff --git a/docs/user/api/index.md b/docs/user/api/index.md index 2678e9189d88..948bd4f27a39 100644 --- a/docs/user/api/index.md +++ b/docs/user/api/index.md @@ -2,20 +2,20 @@ -PyPI has several API endpoints, each of which is referenced in the table -of contents for this hierarchy. +PyPI has several API endpoints and public datasets, each of which is referenced +in the table of contents for this hierarchy. ## API policies -Please be aware of these PyPI API policies: +Please be aware of these PyPI API policies. ### Caching -All API requests are cached. Requests to the JSON, RSS or Legacy APIs are +All API requests are cached. Requests to the JSON, RSS or Index APIs are cached by our CDN provider. You can determine if you've hit the cache based on the ``X-Cache`` and ``X-Cache-Hits`` headers in the response. -Requests to the JSON, RSS and Legacy APIs also provide an ``ETag`` header. If +Requests to the JSON, RSS and Index APIs also provide an ``ETag`` header. If you're making a lot of repeated requests, ensure your API consumer will respect this header to determine whether to actually repeat a request or not. @@ -39,9 +39,8 @@ suggestions: (minutes). Generally PyPI can handle it, but it's preferred to make requests in serial over a longer amount of time if possible. * If your consumer is actually an organization or service that will be - downloading a lot of packages from PyPI, consider `using your own index - mirror or cache - `_. + downloading a lot of packages from PyPI, consider + [using your own index mirror or cache]. 
### API Preference @@ -49,4 +48,7 @@ For periodically checking for new packages or updates to existing packages, use our RSS feeds. No new integrations should use the XML-RPC APIs as they are planned for -deprecation. Existing consumers should migrate to JSON/RSS/Legacy APIs. +deprecation. Existing consumers should migrate to JSON/RSS/[Index APIs]. + +[Index APIs]: ./index-api.md +[using your own index mirror or cache]: https://packaging.python.org/guides/index-mirrors-and-caches/
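The minimal Python sketch below illustrates the `file_downloads` table that the new `bigquery.md` page describes. It counts a project's downloads over a recent date window. It assumes the `google-cloud-bigquery` client library is installed and that Google Cloud credentials with BigQuery access are configured. The helper names `build_download_count_query` and `count_downloads` are hypothetical. The `file.project` and `timestamp` column names are assumed from the table's published schema rather than taken from this diff, so check them against the live table before relying on the query.

```python
"""Count recent downloads of a PyPI project from the public BigQuery dataset.

Sketch only: assumes `google-cloud-bigquery` is installed and that Google
Cloud credentials with BigQuery access are configured. The column names
`file.project` and `timestamp` are assumptions about the table schema.
"""

FILE_DOWNLOADS_TABLE = "bigquery-public-data.pypi.file_downloads"


def build_download_count_query(days: int = 30) -> str:
    """Return SQL counting downloads of @project from `days` days ago to today."""
    # Validate before formatting `days` into the SQL text, since INTERVAL
    # takes a literal here rather than a query parameter.
    if not isinstance(days, int) or days < 1:
        raise ValueError("days must be a positive integer")
    # The date filter keeps the query to a recent window of this very large
    # table, which is intended to limit how much data is scanned.
    return (
        "SELECT COUNT(*) AS num_downloads\n"
        f"FROM `{FILE_DOWNLOADS_TABLE}`\n"
        "WHERE file.project = @project\n"
        "  AND DATE(timestamp)\n"
        f"    BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL {days} DAY)\n"
        "    AND CURRENT_DATE()"
    )


def count_downloads(project: str, days: int = 30) -> int:
    """Run the query against BigQuery (needs network access and credentials)."""
    # Imported lazily so the SQL builder above works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    # Pass the project name as a query parameter instead of interpolating it.
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("project", "STRING", project),
        ]
    )
    rows = client.query(
        build_download_count_query(days), job_config=job_config
    ).result()
    # COUNT(*) without GROUP BY always yields exactly one row.
    return next(iter(rows))["num_downloads"]
```

For example, `count_downloads("pytest", days=7)` would query the number of `pytest` file downloads recorded between seven days ago and today, inclusive. The project name is sent as a query parameter to avoid quoting issues. `days` is checked to be a positive integer before it is formatted into the `INTERVAL` clause.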