Allow fsspec URLs in open_(mf)dataset #4823

Merged · 24 commits · Feb 16, 2021
Changes from 1 commit
run black
Martin Durant committed Jan 18, 2021
commit bb4174e99e4ca93ea6d59cb1a4dcb76f0bbf03b1
6 changes: 4 additions & 2 deletions xarray/backends/api.py
@@ -882,11 +882,13 @@ def open_mfdataset(
     if isinstance(paths, str):
         if is_remote_uri(paths):
             from fsspec.core import get_fs_token_paths
+
             # get_fs_token_paths also allows arguments embedded in URLs
             fs, _, _ = get_fs_token_paths(
-                paths, mode='rb',
+                paths,
+                mode="rb",
                 storage_options=kwargs.get("backend_kwargs", {}).get("storage_options"),
-                expand=False
+                expand=False,
             )
             paths = fs.glob(fs._strip_protocol(paths))
             paths = [fs.get_mapper(path) for path in paths]
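
For context, a hedged usage sketch of what this code path enables; the bucket, paths, and credentials below are made up for illustration, not taken from the PR:

    import xarray as xr

    # Glob over a remote zarr collection; storage_options is forwarded to fsspec.
    ds = xr.open_mfdataset(
        "s3://example-bucket/collection/*.zarr",  # hypothetical remote glob
        engine="zarr",
        combine="by_coords",
        backend_kwargs={"storage_options": {"anon": True}},
    )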
Member
This is a bit tricky. This assumes the backend wants a mapper object (as the zarr backend does). But what if the glob returns a list of netCDF files? Wouldn't we want a list of file objects?

Contributor Author
Right, this is my comment about whether we should actually special-case zarr. It could create file objects instead; for now it would just error. We don't have tests for this, but now might be the time to start.

Member
Now, tracking with the comments above, I think we have two options:

  1. inject some backend-specific logic into the API here to decide what sort of object to return (if engine == 'zarr', return a mapper; else return a file object)
  2. only support globbing remote zarr stores

(1) seems to be the more reasonable thing to do here, but it is slightly less principled, as we've been working to cleanly separate the API from the backends.
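
A minimal sketch of what option (1) might look like; the availability of engine at this point in the API, and the exact names, are assumptions for illustration:

    # Hypothetical: choose the object type each backend expects.
    expanded = fs.glob(fs._strip_protocol(paths))
    if engine == "zarr":
        # zarr reads through a key-value mapper interface
        paths = [fs.get_mapper(p) for p in expanded]
    else:
        # file-based backends (netCDF4, h5netcdf, ...) take file-like objects
        paths = [fs.open(p, mode="rb") for p in expanded]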

Contributor Author

I suppose a third alternative might be to pass the paths through unchanged, create the mappers in the zarr backend (this will re-instantiate the filesystem, but that's fine), and add the opening of remote files to each of the other backends that can handle it.
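
A sketch of that third alternative, with a hypothetical helper inside the zarr backend (not the PR's actual code):

    import fsspec

    def _ensure_mapper(store, storage_options=None):
        # Hypothetical helper: if the store is still a plain remote URL,
        # build the mapper here, re-instantiating the filesystem via fsspec.
        if isinstance(store, str):
            return fsspec.get_mapper(store, **(storage_options or {}))
        return store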

Contributor Author

I have essentially done (1), but excluded HTTP from the non-zarr path, because HTTP URLs have a special meaning for some backends (dap...). In any case, I don't suppose anyone is globbing over HTTP, since it's generally unreliable.
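
A rough condensation of the behaviour described above (names approximate; fs comes from the earlier get_fs_token_paths call; this is not the exact merged code):

    if is_remote_uri(paths):
        if engine == "zarr":
            # zarr path: glob and hand back mappers (HTTP included)
            paths = [fs.get_mapper(p) for p in fs.glob(fs._strip_protocol(paths))]
        elif not paths.startswith(("http://", "https://")):
            # other remote paths become file-like objects
            paths = [fs.open(p, mode="rb") for p in fs.glob(fs._strip_protocol(paths))]
        # plain HTTP(S) URLs for non-zarr engines pass through untouched,
        # since some backends (dap, ...) consume them directly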

10 changes: 7 additions & 3 deletions xarray/backends/zarr.py
@@ -296,8 +296,12 @@ def open_group(
         if isinstance(store, pathlib.Path):
             store = os.fspath(store)

-        open_kwargs = dict(mode=mode, synchronizer=synchronizer, path=group,
-                           storage_options=storage_options)
+        open_kwargs = dict(
+            mode=mode,
+            synchronizer=synchronizer,
+            path=group,
+            storage_options=storage_options,
+        )
         if chunk_store:
             open_kwargs["chunk_store"] = chunk_store

@@ -645,7 +649,7 @@ def open_zarr(
         "consolidated": consolidated,
         "overwrite_encoded_chunks": overwrite_encoded_chunks,
         "chunk_store": chunk_store,
-        "storage_options": storage_options
+        "storage_options": storage_options,
     }

     ds = open_dataset(
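
For reference, a hedged example of how storage_options travels through this plumbing; the store URL and token are made up, and passing storage_options through to zarr assumes zarr >= 2.5:

    import xarray as xr

    # storage_options flows open_zarr -> open_dataset -> ZarrStore.open_group -> zarr/fsspec
    ds = xr.open_zarr(
        "gs://example-bucket/store.zarr",  # hypothetical store
        storage_options={"token": "anon"},
    )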