Dev #9

Merged: 4 commits, Jul 4, 2022

README.md: 10 changes (4 additions, 6 deletions)

```diff
@@ -81,17 +81,15 @@ del cm["key"]

 Each `cloud-mapping` keeps an internal dict of [etags](https://en.wikipedia.org/wiki/HTTP_ETag) which it uses to ensure it is only reading/overwriting/deleting data it expects to. If the value in storage is not what the `cloud-mapping` expects, a `cloudmappings.errors.KeySyncError()` will be thrown.

-If you would like to enable read (get) operations without ensuring etags, you can set `read_blindly=True`. This can be set in the constructor, or dynamically turned on and off with `set_read_blindly(True)` and `set_read_blindly(False)` respectively. Blindly reading a value that doesn't exist in the cloud will return `None`.
+If you would like to enable read (get) operations without ensuring etags, you can set `read_blindly=True`. This can be set in the constructor, or changed dynamically by assigning to the `read_blindly` attribute of the cloud-mapping instance. Blindly reading a value that doesn't exist in the cloud will return the mapping's current value of `read_blindly_default` (which itself defaults to `None`).

 If you know what you are doing and you want an operation other than get to go through despite etags, you will need to sync your `cloud-mapping` with the cloud by calling either `.sync_with_cloud()` to sync all keys or `.sync_with_cloud(key_prefix)` to sync a specific key or subset of keys. By default `.sync_with_cloud()` is called on instantiation of a `cloud-mapping` if the underlying provider storage already exists. You may skip this initial sync by passing an additional `sync_initially=False` parameter when you instantiate your `cloud-mapping`.

 The `etags` property on a `cloud-mapping` can be manually inspected and adjusted for advanced use cases, but it is not recommended if your use case can be accomplished with the above methods.

 ### Serialisation

-If you don't call `.with_pickle()` and instead pass your providers configuration directly to the `CloudMapping` class, you will get a "raw" `cloud-mapping` which accepts only byte-likes as values. Along with the `.with_pickle()` serialisation utility, `.with_json()` and `.with_json_zlib()` also exist.
-
-You may build your own serialisation either using [zict](https://zict.readthedocs.io/en/latest/); or by calling `.with_serialisers([dumps_1, dumps_2, ..., dumps_N], [loads_1, loads_2, ..., loads_N])`, where `dumps` and `loads` are the ordered functions to serialise and parse your data respectively.
+If you don't call `.with_pickle()` and instead pass your provider's configuration directly to the `CloudMapping` class, you will get a "raw" `cloud-mapping` which accepts only bytes-like objects as values. Along with the `.with_pickle()` serialisation utility, `.with_json()` and `.with_json_zlib()` also exist. You may build your own serialisation by constructing your cloud-mapping with `ordered_dumps_funcs=[dumps_1, dumps_2, ..., dumps_N]` and `ordered_loads_funcs=[loads_1, loads_2, ..., loads_N]`, where `dumps` and `loads` are the ordered functions to serialise and parse your data respectively: the `dumps` functions are applied first to last when writing, so the last one must return a bytes-like object, and the `loads` functions are applied first to last when reading, so the first one must accept bytes.



```

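The read-blindly rules described in the README text can be modelled with a small in-memory stand-in. This is a sketch under stated assumptions, not the library's code: `ToyMapping`, its dict-based "cloud" of `key -> (value, etag)` pairs and the local `KeySyncError` are all hypothetical, and only the branching inside `__getitem__` follows the updated `CloudMapping.__getitem__` in this PR.

```python
from typing import Any, Dict, Optional, Tuple


class KeySyncError(ValueError):
    """Hypothetical stand-in for cloudmappings.errors.KeySyncError."""


class ToyMapping:
    """Hypothetical in-memory model of the documented read-blindly rules.

    Not the cloud-mappings implementation: the "cloud" is a plain dict of
    key -> (value, etag), and etag checking is a simple equality test.
    """

    def __init__(
        self,
        cloud: Dict[str, Tuple[bytes, str]],
        read_blindly: bool = False,
        read_blindly_default: Any = None,
    ) -> None:
        self._cloud = cloud
        self._etags: Dict[str, str] = {}  # etags this mapping expects to see
        self.read_blindly = read_blindly
        self.read_blindly_default = read_blindly_default

    def sync_with_cloud(self) -> None:
        # Accept whatever is currently in the "cloud" as the expected state
        self._etags = {key: etag for key, (_, etag) in self._cloud.items()}

    def _download(self, key: str, etag: Optional[str]) -> Optional[bytes]:
        # etag=None means "read blindly": return whatever is there, or None
        if key not in self._cloud:
            if etag is not None:
                raise KeySyncError(key)  # expected a value, found nothing
            return None
        value, cloud_etag = self._cloud[key]
        if etag is not None and etag != cloud_etag:
            raise KeySyncError(key)  # value changed since the last sync
        return value

    def __getitem__(self, key: str) -> Any:
        # Same branching as the updated CloudMapping.__getitem__
        if not self.read_blindly and key not in self._etags:
            raise KeyError(key)
        value = self._download(key, etag=None if self.read_blindly else self._etags[key])
        if self.read_blindly and value is None:
            return self.read_blindly_default
        return value


cloud = {"a": (b"1", "etag-1")}
cm = ToyMapping(cloud, read_blindly_default=b"")
cm.sync_with_cloud()
cloud["b"] = (b"2", "etag-2")  # written by another client after the sync

# cm["b"] would raise KeyError here: the mapping holds no etag for "b"
cm.read_blindly = True  # toggled on the instance, as the updated README describes
blind_b = cm["b"]  # b"2"
blind_missing = cm["missing"]  # b"", the configured read_blindly_default
```

Toggling the `read_blindly` attribute takes the place of the removed `set_read_blindly()` method; in this model a non-blind read of a key whose etag has changed raises the sync error rather than returning data the mapping did not expect.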
````diff
@@ -117,6 +115,6 @@ Set environment variables for each provider:

 Run tests with:
 ```bash
-pytest --test_container_id <container-to-use-for-tests>
+pytest --test_container_id <container-suffix-to-use-for-tests>
 ```
-_* Note that if the container specified it is expected that one test will fail._
+The testing container will be prefixed by "pytest", and the commit SHA is used within the build & release workflows. Note that if the container specified already exists, one test will fail.
````

setup.cfg: 4 changes (1 addition, 3 deletions)

```diff
@@ -1,6 +1,6 @@
 [metadata]
 name = cloud-mappings
-version = 0.10.0
+version = 1.0.0
 author = Lucas Sargent
 author_email = lucas.sargent@eliiza.com.au
 description = MutableMapping interfaces for common cloud storage providers
@@ -20,8 +20,6 @@ package_dir =
     = src
 packages = find:
 python_requires = >=3.6
-install_requires =
-    zict>=2.0

 [options.extras_require]
 azureblob = azure-identity==1.6.0; azure-storage-blob==12.8.1
```

src/cloudmappings/cloudstoragemapping.py: 240 changes (96 additions, 144 deletions)

```diff
@@ -1,5 +1,5 @@
 from functools import partial
-from typing import Callable, Dict, List, MutableMapping
+from typing import Any, Callable, Dict, List, MutableMapping

 from .storageproviders.storageprovider import StorageProvider

```

```diff
@@ -10,49 +10,85 @@ class CloudMapping(MutableMapping):
     Parameters
     ----------
     storage_provider : StorageProvider
-        The storage provider to use as the backing for the cloud-mapping
+        The storage provider to use as the backing for the cloud-mapping.
     sync_initially : bool, default=True
         Whether to call `sync_with_cloud` initially
     read_blindly : bool, default=False
-        Whether to read blindly or not by default. See `get_read_blindly` for more information
+        Whether to read blindly or not by default. See `read_blindly` attribute for more
+        information.
+    read_blindly_default : Any, default=None
+        The value to return when read_blindly is enabled and the key does not have
+        a value in the cloud.
+    ordered_dumps_funcs : List[Callable], default=None
+        An ordered list of functions to pass values through before saving bytes to the cloud.
+        The last function must return a bytes-like object.
+    ordered_loads_funcs : List[Callable], default=None
+        An ordered list of functions to pass bytes through after loading them from the cloud.
+        The first function must expect a bytes-like object as its input.
+    """

-    Attributes
-    ----------
-    etags : dict
-        A mapping of known keys to their expected etags.
-
-    Methods
-    -------
-    sync_with_cloud()
-        Synchronise the cloud-mapping with what is in the underlying cloud resource
-    get_read_blindly()
-        Get whether the cloud-mapping is currently set to read from the cloud blindly
-    set_read_blindly(read_blindly: bool)
-        Set whether the cloud-mapping should read from the cloud blindly or not
+    read_blindly: bool
+    """ Whether the cloud-mapping is currently set to read from the cloud blindly.
+
+    When read blindly is `False`, a cloud-mapping will raise a `KeyError` if a key that it
+    doesn't know about is accessed. If a key that it does know about is accessed but then
+    found to be out of sync with the cloud, a `cloudmappings.errors.KeySyncError` will be
+    raised.
+
+    When read blindly is `True`, a cloud-mapping will return the latest cloud version
+    for any key accessed, including keys it has no prior knowledge of (i.e. not in its etag
+    dict). If there is no value for a key in the cloud, the current value of
+    `read_blindly_default` will be returned.
+
+    When read blindly is `True`, a cloud-mapping will not raise `KeyError` or
+    `cloudmappings.errors.KeySyncError` errors for read/get operations.
+
+    By default a cloud-mapping is instantiated with read blindly set to `False`.
     """

     _etags: Dict[str, str]
+    read_blindly_default: Any
+    """The value to return when read_blindly is `True` and the key does not have
+    a value in the cloud.
+    """

     def __init__(
         self,
         storage_provider: StorageProvider,
         sync_initially: bool = True,
         read_blindly: bool = False,
+        read_blindly_default: Any = None,
+        ordered_dumps_funcs: List[Callable] = None,
+        ordered_loads_funcs: List[Callable] = None,
     ) -> None:
         """A cloud-mapping, a `MutableMapping` implementation backed by common cloud storage solutions.

         Parameters
         ----------
         storage_provider : StorageProvider
-            The storage provider to use as the backing for the cloud-mapping
+            The storage provider to use as the backing for the cloud-mapping.
         sync_initially : bool, default=True
             Whether to call `sync_with_cloud` initially
         read_blindly : bool, default=False
-            Whether to read blindly or not by default. See `get_read_blindly` for more information
+            Whether to read blindly or not by default. See `read_blindly` attribute for more
+            information.
+        read_blindly_default : Any, default=None
+            The value to return when read_blindly is enabled and the key does not have
+            a value in the cloud.
+        ordered_dumps_funcs : List[Callable], default=None
+            An ordered list of functions to pass values through before saving bytes to the cloud.
+            The last function must return a bytes-like object.
+        ordered_loads_funcs : List[Callable], default=None
+            An ordered list of functions to pass bytes through after loading them from the cloud.
+            The first function must expect a bytes-like object as its input.
         """
         self._storage_provider = storage_provider
         self._etags = {}
-        self._read_blindly = read_blindly
+        self._ordered_dumps_funcs = ordered_dumps_funcs if ordered_dumps_funcs is not None else []
+        self._ordered_loads_funcs = ordered_loads_funcs if ordered_loads_funcs is not None else []
+
+        self.read_blindly = read_blindly
+        self.read_blindly_default = read_blindly_default

         if self._storage_provider.create_if_not_exists() and sync_initially:
             self.sync_with_cloud()
```

```diff
@@ -98,64 +134,21 @@ def etags(self) -> Dict:
         """
         return self._etags

-    def get_read_blindly(self) -> bool:
-        """Get whether the cloud-mapping is currently set to read keys it doesn't know about
-        blindly or not.
-
-        When read blindly is `False`, a cloud-mapping will raise a KeyError if a key that it
-        doesn't know about is accessed. If a key that it does know about is accessed but then
-        found to be out of sync with the cloud, a `cloudmappings.errors.KeySyncError` will be
-        raised.
-
-        When read blindly is `True`, a cloud-mapping will return the latest cloud version
-        for any key accessed, including keys it has no prior knowledge of (ie not in it's etag
-        dict). If there is no value for a key in the cloud `None` will be returned.
-
-        When read blindly is `True` a cloud-mapping will not raise `KeyValue` or
-        `cloudmappings.errors.KeySyncError` errors for read/get operations.
-
-        By default a cloud-mapping is instantiated with read blindly set to `False`.
-
-        Returns
-        -------
-        bool
-            Current read blindly setting
-        """
-        return self._read_blindly
-
-    def set_read_blindly(self, read_blindly: bool) -> None:
-        """Set whether the cloud-mapping should read keys it doesn't know about blindly or
-        not.
-
-        When read blindly is `False`, a cloud-mapping will raise a KeyError if a key that it
-        doesn't know about is accessed. If a key that it does know about is accessed but then
-        found to be out of sync with the cloud, a `cloudmappings.errors.KeySyncError` will be
-        raised.
-
-        When read blindly is `True`, a cloud-mapping will return the latest cloud version
-        for any key accessed, including keys it has no prior knowledge of (ie not in it's etag
-        dict). If there is no value for a key in the cloud `None` will be returned.
-
-        When read blindly is `True` a cloud-mapping will not raise `KeyValue` or
-        `cloudmappings.errors.KeySyncError` errors for read/get operations.
-
-        By default a cloud-mapping is instantiated with read blindly set to `False`.
-
-        Parameters
-        ----------
-        read_blindly : bool
-            The value to set read_blindly to
-        """
-        self._read_blindly = read_blindly
-
-    def __getitem__(self, key: str) -> bytes:
-        if not self._read_blindly and key not in self._etags:
+    def __getitem__(self, key: str) -> Any:
+        if not self.read_blindly and key not in self._etags:
             raise KeyError(key)
-        return self._storage_provider.download_data(
-            key=self._encode_key(key), etag=None if self._read_blindly else self._etags[key]
+        value = self._storage_provider.download_data(
+            key=self._encode_key(key), etag=None if self.read_blindly else self._etags[key]
         )
-
-    def __setitem__(self, key: str, value: bytes) -> None:
+        if self.read_blindly and value is None:
+            return self.read_blindly_default
+        for loads in self._ordered_loads_funcs:
+            value = loads(value)
+        return value
+
+    def __setitem__(self, key: str, value: Any) -> None:
+        for dumps in self._ordered_dumps_funcs:
+            value = dumps(value)
         self._etags[key] = self._storage_provider.upload_data(
             key=self._encode_key(key),
             etag=self._etags.get(key, None),
```

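In the updated `__setitem__` and `__getitem__`, `ordered_dumps_funcs` are applied first to last on write and `ordered_loads_funcs` first to last on read, so the loads list has to undo the dumps list in reverse order. A minimal round-trip sketch using the same function lists that `with_json_zlib` passes to the constructor (with `encoding` fixed to `"utf-8"`); `serialise` and `deserialise` are hypothetical helper names, and the bytes check in `serialise` is an extra assumption that the diff itself does not perform.

```python
import json
import zlib
from functools import partial
from typing import Any, Callable, List

ENCODING = "utf-8"

# Same function lists that with_json_zlib passes to the constructor
DUMPS: List[Callable] = [
    partial(json.dumps, sort_keys=True),  # object -> str
    partial(bytes, encoding=ENCODING),  # str -> bytes
    zlib.compress,  # bytes -> compressed bytes
]
LOADS: List[Callable] = [
    zlib.decompress,  # compressed bytes -> bytes
    partial(str, encoding=ENCODING),  # bytes -> str
    json.loads,  # str -> object
]


def serialise(value: Any, dumps_funcs: List[Callable]) -> bytes:
    # Same loop as the updated __setitem__: the first function runs first
    for dumps in dumps_funcs:
        value = dumps(value)
    # Extra check (not in the diff): the docstring requires bytes at the end
    if not isinstance(value, (bytes, bytearray)):
        raise TypeError("the last dumps function must return a bytes-like object")
    return value


def deserialise(raw: bytes, loads_funcs: List[Callable]) -> Any:
    # Same loop as the updated __getitem__: the first function receives the raw bytes
    value: Any = raw
    for loads in loads_funcs:
        value = loads(value)
    return value


payload = {"b": [1, 2, 3], "a": "text"}
stored = serialise(payload, DUMPS)  # compressed JSON bytes
restored = deserialise(stored, LOADS)  # equal to payload
```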
```diff
@@ -182,63 +175,6 @@ def __len__(self) -> int:
     def __repr__(self) -> str:
         return f"cloudmapping<{self._storage_provider.logical_name()}>"

-    @classmethod
-    def with_serialisers(
-        cls,
-        ordered_dumps_funcs: List[Callable],
-        ordered_loads_funcs: List[Callable],
-        *args,
-        **kwargs,
-    ) -> "CloudMapping":
-        """Create a cloud-mapping instance with serialisation.
-
-        Creates a cloud-mapping which will pass all data input through the specified
-        `ordered_dumps_funcs` functions when setting, and inversely runs the bytes from the
-        cloud through the specified `ordered_loads_funcs` functions when getting.
-
-        Uses `zict` internally: https://zict.readthedocs.io/en/latest/
-
-        Parameters
-        ----------
-        ordered_dumps_funcs : List[Callable]
-            An ordered list of functions to pass values through before saving bytes to the cloud.
-            The last function must return a bytes-like object.
-        ordered_loads_funcs : List[Callable]
-            An ordered list of functions to pass values through before saving bytes to the cloud.
-            The first function must expect a bytes-like object as its input.
-        *args : tuple, optional
-            Additional positional arguments to pass to the CloudMapping constructor
-        **kwargs : dict, optional
-            Additional keyword arguments to pass to the CloudMapping constructor
-
-        Raises
-        ------
-        ValueError
-            If the number of `ordered_dumps_funcs` does not match the number of
-            `ordered_loads_funcs`
-
-        Returns
-        -------
-        CloudMapping
-            A new cloud-mapping setup with the specified serialisation functions
-        """
-        from zict import Func
-
-        if len(ordered_dumps_funcs) != len(ordered_loads_funcs):
-            raise ValueError("Must have an equal number of dumps functions as loads functions")
-
-        raw_mapping = cls(*args, **kwargs)
-        mapping = raw_mapping
-
-        for dump, load in zip(ordered_dumps_funcs[::-1], ordered_loads_funcs):
-            mapping = Func(dump, load, mapping)
-
-        mapping.sync_with_cloud = raw_mapping.sync_with_cloud
-        mapping.etags = raw_mapping.etags
-        mapping.get_read_blindly = raw_mapping.get_read_blindly
-        mapping.set_read_blindly = raw_mapping.set_read_blindly
-        return mapping
-
     @classmethod
     def with_pickle(cls, *args, **kwargs) -> "CloudMapping":
         """Create a cloud-mapping instance that pickles values using pythons `pickle`
```

```diff
@@ -257,7 +193,13 @@ def with_pickle(cls, *args, **kwargs) -> "CloudMapping":
         """
         import pickle

-        return cls.with_serialisers([pickle.dumps], [pickle.loads], *args, **kwargs)
+        kwargs.update(
+            dict(
+                ordered_dumps_funcs=[pickle.dumps],
+                ordered_loads_funcs=[pickle.loads],
+            )
+        )
+        return cls(*args, **kwargs)

     @classmethod
     def with_json(cls, encoding="utf-8", *args, **kwargs) -> "CloudMapping":
```

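The removed `with_serialisers` built nested zict `Func` layers, pairing the reversed dumps list with the loads list in order, and helpers such as `with_pickle` now pass the function lists straight to the constructor instead. The sketch below checks that the nested arrangement and the two flat loops now in `__setitem__`/`__getitem__` apply the functions in the same order. `FuncLayer` is a hypothetical stand-in for `zict.Func` (dump on write, load on read), used so the example runs without zict; `flat_dump` and `flat_load` are likewise illustrative names.

```python
from typing import Any, Callable, Dict, List


class FuncLayer:
    """Hypothetical minimal stand-in for zict.Func: dump on write, load on read."""

    def __init__(self, dump: Callable, load: Callable, inner: Any) -> None:
        self.dump = dump
        self.load = load
        self.inner = inner

    def __setitem__(self, key: str, value: Any) -> None:
        self.inner[key] = self.dump(value)

    def __getitem__(self, key: str) -> Any:
        return self.load(self.inner[key])


def nested_chain(dumps_funcs: List[Callable], loads_funcs: List[Callable], raw: Any) -> Any:
    # Same wrapping order as the removed with_serialisers
    mapping = raw
    for dump, load in zip(dumps_funcs[::-1], loads_funcs):
        mapping = FuncLayer(dump, load, mapping)
    return mapping


def flat_dump(value: Any, dumps_funcs: List[Callable]) -> Any:
    # Loop now used in CloudMapping.__setitem__
    for dumps in dumps_funcs:
        value = dumps(value)
    return value


def flat_load(value: Any, loads_funcs: List[Callable]) -> Any:
    # Loop now used in CloudMapping.__getitem__
    for loads in loads_funcs:
        value = loads(value)
    return value


# String-appending steps make the application order visible
dumps = [lambda v: v + "1", lambda v: v + "2", lambda v: v + "3"]
loads = [lambda v: v + "a", lambda v: v + "b", lambda v: v + "c"]

raw: Dict[str, Any] = {}
chained = nested_chain(dumps, loads, raw)
chained["k"] = "x"  # raw["k"] is now "x123", the same as flat_dump("x", dumps)
```

One practical difference visible in the diff: the nested wrapper had to copy `sync_with_cloud`, `etags` and the read-blindly accessors from the raw mapping onto the outermost layer, whereas the loop version keeps everything on a single `CloudMapping`. The removed helper also raised `ValueError` when the two lists differed in length; no equivalent check appears in the new constructor lines shown here.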
```diff
@@ -279,12 +221,13 @@ def with_json(cls, encoding="utf-8", *args, **kwargs) -> "CloudMapping":
         """
         import json

-        return cls.with_serialisers(
-            [json.dumps, partial(bytes, encoding=encoding)],
-            [partial(str, encoding=encoding), json.loads],
-            *args,
-            **kwargs,
+        kwargs.update(
+            dict(
+                ordered_dumps_funcs=[partial(json.dumps, sort_keys=True), partial(bytes, encoding=encoding)],
+                ordered_loads_funcs=[partial(str, encoding=encoding), json.loads],
+            )
         )
+        return cls(*args, **kwargs)

     @classmethod
     def with_json_zlib(cls, encoding="utf-8", *args, **kwargs) -> "CloudMapping":
```

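`with_json` and `with_json_zlib` now serialise with `partial(json.dumps, sort_keys=True)`. The PR does not state the motivation; one observable effect, checked here with the standard library only, is that equal dicts produce identical JSON text regardless of insertion order, so equal values end up as identical stored bytes.

```python
import json
from functools import partial

dumps_sorted = partial(json.dumps, sort_keys=True)

first = {"b": 2, "a": {"y": 1, "x": 0}}
second = {"a": {"x": 0, "y": 1}, "b": 2}  # equal value, different insertion order

assert first == second
# Plain json.dumps follows insertion order, so the text differs
assert json.dumps(first) != json.dumps(second)
# sort_keys=True sorts keys at every nesting level, so the text matches
assert dumps_sorted(first) == dumps_sorted(second)
```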
```diff
@@ -310,9 +253,18 @@ def with_json_zlib(cls, encoding="utf-8", *args, **kwargs) -> "CloudMapping":
         import json
         import zlib

-        return cls.with_serialisers(
-            [json.dumps, partial(bytes, encoding=encoding), zlib.compress],
-            [zlib.decompress, partial(str, encoding=encoding), json.loads],
-            *args,
-            **kwargs,
+        kwargs.update(
+            dict(
+                ordered_dumps_funcs=[
+                    partial(json.dumps, sort_keys=True),
+                    partial(bytes, encoding=encoding),
+                    zlib.compress,
+                ],
+                ordered_loads_funcs=[
+                    zlib.decompress,
+                    partial(str, encoding=encoding),
+                    json.loads,
+                ],
+            )
         )
+        return cls(*args, **kwargs)
```

src/cloudmappings/errors.py: 12 changes (12 additions, 0 deletions)

```diff
@@ -1,5 +1,12 @@
 class KeySyncError(ValueError):
+    storage_provider_name: str
+    key: str
+    expected_etag: str
+
     def __init__(self, storage_provider_name: str, key: str, etag: str) -> None:
+        self.storage_provider_name = storage_provider_name
+        self.key = key
+        self.expected_etag = etag
         super().__init__(
             f"Mapping is out of sync with cloud data.\n"
             f"Cloud storage: '{storage_provider_name}'\n"
```

```diff
@@ -8,7 +15,12 @@ def __init__(self, storage_provider_name: str, key: str, etag: str) -> None:


 class ValueSizeError(ValueError):
+    storage_provider_name: str
+    key: str
+
     def __init__(self, storage_provider_name: str, key: str) -> None:
+        self.storage_provider_name = storage_provider_name
+        self.key = key
         super().__init__(
             f"Value is too big to fit in cloud.\n" f"Cloud storage: '{storage_provider_name}'\n" f"Key: '{key}'"
         )
```

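`KeySyncError` and `ValueSizeError` now store their constructor arguments as attributes, so callers can inspect an error without parsing its message. A sketch, assuming a simplified copy of the updated `KeySyncError` (its real message has more lines than this diff shows, so the message here is abbreviated) and hypothetical provider, key and etag values:

```python
class KeySyncError(ValueError):
    """Simplified copy of the updated class; the message text is abbreviated."""

    storage_provider_name: str
    key: str
    expected_etag: str

    def __init__(self, storage_provider_name: str, key: str, etag: str) -> None:
        self.storage_provider_name = storage_provider_name
        self.key = key
        self.expected_etag = etag
        super().__init__(f"Mapping is out of sync with cloud data. Key: '{key}'")


def describe_sync_error(error: KeySyncError) -> str:
    # Read the structured fields instead of parsing str(error)
    return f"{error.storage_provider_name}/{error.key} (expected etag {error.expected_etag})"


try:
    # Hypothetical values; a real error would come from a CloudMapping operation
    raise KeySyncError("example-provider", "reports/2022", "0x1A2B")
except ValueError as err:  # KeySyncError subclasses ValueError, so this catches it
    summary = describe_sync_error(err)
```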