Folks, especially @Mikejmnez, I'm trying to get oceanspy to load the new datasets from SciServer-ceph. So far:
- I've transferred data to ceph.
- Mitya has provisioned new data volumes Poseidon-ceph and oceanography-ceph in the Grendel domain.
- I've forked oceanspy to work on the updated intake catalog code. See my ceph-dev branch and sciserver_catalogs/catalog_xarray.yaml.
- I've installed intake.
- I tried to open a ceph dataset and hit this:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In [2], line 1
----> 1 od = ospy.open_oceandataset.from_catalog("get_started")
File ~/workspace/Storage/Thomas.Haine/persistent/Poseidon testing/ceph-dev/oceanspy/oceanspy/open_oceandataset.py:138, in from_catalog(name, catalog_url)
133 for entry in entries:
134 if intake_switch:
135 # Use intake-xarray
136
137 # Pop metadata
--> 138 mtdt = cat[entry].metadata
140 # Create ds
141 ds = cat[entry].to_dask()
File ~/mambaforge/envs/py39/lib/python3.9/site-packages/intake/catalog/base.py:472, in Catalog.__getitem__(self, key)
463 """Return a catalog entry by name.
464
465 Can also use attribute syntax, like ``cat.entry_name``, or
(...)
468 cat['name1', 'name2']
469 """
470 if not isinstance(key, list) and key in self:
471 # triggers reload_on_change
--> 472 s = self._get_entry(key)
473 if s.container == "catalog":
474 s.name = key
File ~/mambaforge/envs/py39/lib/python3.9/site-packages/intake/catalog/utils.py:43, in reload_on_change.<locals>.wrapper(self, *args, **kwargs)
40 @functools.wraps(f)
41 def wrapper(self, *args, **kwargs):
42 self.reload()
---> 43 return f(self, *args, **kwargs)
File ~/mambaforge/envs/py39/lib/python3.9/site-packages/intake/catalog/base.py:355, in Catalog._get_entry(self, name)
353 ups = [up for name, up in self.user_parameters.items() if name not in up_names]
354 entry._user_parameters = ups + (entry._user_parameters or [])
--> 355 return entry()
File ~/mambaforge/envs/py39/lib/python3.9/site-packages/intake/catalog/entry.py:60, in CatalogEntry.__call__(self, persist, **kwargs)
58 def __call__(self, persist=None, **kwargs):
59 """Instantiate DataSource with given user arguments"""
---> 60 s = self.get(**kwargs)
61 s._entry = self
62 s._passed_kwargs = list(kwargs)
File ~/mambaforge/envs/py39/lib/python3.9/site-packages/intake/catalog/local.py:312, in LocalCatalogEntry.get(self, **user_parameters)
309 if not user_parameters and self._default_source is not None:
310 return self._default_source
--> 312 plugin, open_args = self._create_open_args(user_parameters)
313 data_source = plugin(**open_args)
314 data_source.catalog_object = self._catalog
File ~/mambaforge/envs/py39/lib/python3.9/site-packages/intake/catalog/local.py:283, in LocalCatalogEntry._create_open_args(self, user_parameters)
273 open_args = merge_pars(
274 params,
275 user_parameters,
(...)
279 client=False,
280 )
282 if len(self._plugin) == 0:
--> 283 raise ValueError(
284 "No plugins loaded for this entry: %s\n"
285 "A listing of installable plugins can be found "
286 "at https://intake.readthedocs.io/en/latest/plugin"
287 "-directory.html ." % self._driver
288 )
289 elif isinstance(self._plugin, list):
290 plugin = self._plugin[0]
ValueError: No plugins loaded for this entry: netcdf
A listing of installable plugins can be found at https://intake.readthedocs.io/en/latest/plugin-directory.html .
I'm confused because netCDF4 is installed. Any ideas on how to fix/what to do next?
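A note for anyone hitting the same ValueError: the netcdf named in the message is an intake driver, not the netCDF4 library, and as far as I know that driver is supplied by the intake-xarray plugin. Below is a minimal sketch for checking which of these packages the active kernel can actually see. It uses only the standard library, and the list of package names is an assumption about what matters here, not something taken from oceanspy.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_packages(names=("intake", "intake-xarray", "netCDF4")):
    """Map each distribution name to its version (None means not installed)."""
    return {name: installed_version(name) for name in names}


# Run this in the same kernel as the failing notebook cell.
for name, version in check_packages().items():
    print(f"{name}: {version if version is not None else 'NOT INSTALLED'}")
```

If intake-xarray shows as not installed in the kernel you are using, that would be consistent with the "No plugins loaded for this entry: netcdf" message.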
Mikejmnez commented on Jun 24, 2024
Hey @ThomasHaine - confusing. Some quick questions to help me understand:
Intake and intake-xarray are both needed. For example, in my Oceanography env I have Intake v2.0.3 and intake-xarray 0.7.0.
ThomasHaine commented on Jun 24, 2024
Thanks @Mikejmnez. Good point about the environment. It wasn't properly installed. To fix it, I ran:
Then conda info --envs gives:
Now I select the Oceanography-ceph kernel for my notebook. It still errors with:
This confuses me because this path has been replaced in sciserver_catalog/catalog_xarray.yaml.
ThomasHaine commented on Jun 25, 2024
OK, some progress: The .yaml catalogs are hard-coded in open_oceandataset.py and by default read the main stable release. Override the default like this:
Now it's reading the ceph directory,
Mikejmnez commented on Jun 25, 2024
Just catching up. That makes sense. Another option is to create your own yaml catalog with catalog_url and use that. I usually go this way since there is no need to undo the changes to oceanspy. Just make sure to reverse the change when you're ready to push onto main branch (PR).
ThomasHaine commented on Jun 26, 2024
Sounds good. Do you suggest I create (e.g.) catalog_xarray-ceph.yaml and catalog_xmitgcm-ceph.yaml and a new sciserver-ceph dataset in datasets_list.yaml? Then we can add the new data sources in open_oceandataset.py (I might need some help with this bit!).
Mikejmnez commented on Jun 26, 2024
No, I think the way you were doing it was appropriate. You are essentially migrating the data to ceph and that requires updating the access pattern. Once you push your changes to a new PR and before merging, we should restore how open_oceandataset.py reads from main. That is, replace ceph-dev with main below:
Were you able to read the datasets from ceph?
ThomasHaine commented on Jun 26, 2024
Sounds good. But we should maintain the original (filedb) functionality too, at least for a while. What's the easiest way to keep both access methods functional at the same time?
Yes, I can read the datasets from ceph. I've copied several (no LLC4320 or DYAMOND yet), and will test in the next few days.
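One possible way to keep both access methods working side by side is to select the catalog file per call instead of editing the hard-coded default. This is a rough sketch only: the backend names, the CATALOG_URLS mapping, both helper functions, and the two catalog paths are hypothetical placeholders. The only oceanspy call used is from_catalog(name, catalog_url), whose signature appears in the traceback above.

```python
# Hypothetical mapping from storage backend to catalog file. Both paths are
# placeholders and should point at whichever catalogs actually exist.
CATALOG_URLS = {
    "filedb": "sciserver_catalogs/catalog_xarray.yaml",
    "ceph": "sciserver_catalogs/catalog_xarray-ceph.yaml",
}


def catalog_url_for(backend="filedb"):
    """Return the catalog path registered for a storage backend."""
    try:
        return CATALOG_URLS[backend]
    except KeyError:
        known = ", ".join(sorted(CATALOG_URLS))
        raise ValueError(
            f"Unknown backend {backend!r}; expected one of: {known}"
        ) from None


def open_from_backend(name, backend="filedb"):
    """Open a dataset through oceanspy with the catalog for the chosen backend."""
    # Imported lazily so the helpers above can be used without oceanspy installed.
    import oceanspy as ospy

    return ospy.open_oceandataset.from_catalog(
        name, catalog_url=catalog_url_for(backend)
    )
```

With something like this, open_from_backend("get_started", backend="ceph") would read from the ceph catalog, while the default would keep using the existing filedb catalog.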
ThomasHaine commented on Jun 27, 2024
Actually, I can't read all the datasets. For IGPwinter, EGshelfIIseas2km_ASR_{crop,full}, and EGshelfIIseas2km_ERAI_{6H,1D} I get this error:
Any ideas what's going on?