add get_blindly
JJ11teen committed Jun 26, 2021
1 parent 4b4e712 commit f430a80
Showing 11 changed files with 158 additions and 49 deletions.
41 changes: 38 additions & 3 deletions README.md
@@ -74,14 +74,49 @@ del cm["key"]
"key" in cm # returns false
```

### Cloud Sync
### Etags

Each `cloud-mapping` keeps an internal dict of [etags](https://en.wikipedia.org/wiki/HTTP_ETag) which it uses to ensure it is only reading/overwriting/deleting data it expects to. If the value in storage is not what the `cloud-mapping` expects, a `cloudmappings.errors.KeySyncError()` will be thrown. If you know what you are doing and want your operation to go through anyway, you will need to sync your `cloud-mapping` with the cloud by calling either `.sync_with_cloud()` to sync all keys or `.sync_with_cloud(key)` to sync a specific key. By default `.sync_with_cloud()` is called on instantiation of a `cloud-mapping` if the underlying provider storage already exists. You may skip this initial sync by passing an additional `sync_initially=False` parameter when you instantiate your `cloud-mapping`.
Each `cloud-mapping` keeps an internal dict of [etags](https://en.wikipedia.org/wiki/HTTP_ETag) which it uses to ensure it is only reading/overwriting/deleting data it expects to. If the value in storage is not what the `cloud-mapping` expects, a `cloudmappings.errors.KeySyncError()` will be thrown.
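
For example, the following sketch shows a `KeySyncError` arising between two sessions. This is illustrative only: `provider` is a placeholder for any configured storage provider, and the import paths are assumed from this repository's layout.

```python
from cloudmappings.cloudstoragemapping import CloudMapping
from cloudmappings.errors import KeySyncError

sess_a = CloudMapping(storage_provider=provider, sync_initially=False)
sess_b = CloudMapping(storage_provider=provider, sync_initially=False)

sess_a["key"] = b"from-a"  # sess_a now holds the latest etag
try:
    sess_b["key"] = b"from-b"  # sess_b holds no etag for "key", so this is rejected
except KeySyncError:
    pass  # sess_b must first sync with the cloud (see below)
```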

If you would like get (read) operations to skip etag checking, you can set `get_blindly=True`. This can be set in the constructor, or dynamically turned on and off directly on the `cloud-mapping` instance. Blindly getting a value that doesn't exist in the cloud will return `None`.
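
A sketch of both styles, with `provider` again standing in for a configured storage provider:

```python
cm = CloudMapping(storage_provider=provider, get_blindly=True)
cm["key-that-may-not-exist"]  # returns None if absent, with no etag check

cm.get_blindly = False  # toggle back to strict, etag-checked gets at any time
```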

If you know what you are doing and want an operation other than a get to go through despite a mismatched etag, you will need to sync your `cloud-mapping` with the cloud by calling either `.sync_with_cloud()` to sync all keys or `.sync_with_cloud(key_prefix)` to sync a specific key or subset of keys. By default `.sync_with_cloud()` is called on instantiation of a `cloud-mapping` if the underlying provider storage already exists. You may skip this initial sync by passing an additional `sync_initially=False` parameter when you instantiate your `cloud-mapping`.
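
A sketch of the syncing options (`provider` as above):

```python
cm = CloudMapping(storage_provider=provider)  # syncs all keys initially by default
cm.sync_with_cloud()               # re-sync every key's etag from the cloud
cm.sync_with_cloud("some/prefix")  # sync only keys under a prefix

cm_fresh = CloudMapping(storage_provider=provider, sync_initially=False)  # skip the initial sync
```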

### Serialisation

If you don't call `.with_pickle()` and instead pass your provider's configuration directly to the `CloudMapping` class, you will get a "raw" `cloud-mapping` which accepts only byte-likes as values. Along with the `.with_pickle()` serialisation utility, `.with_json()` and `.with_json_zlib()` also exist.

You may build your own serialisation either by using [zict](https://zict.readthedocs.io/en/latest/), or by calling `.with_buffers([dumps_1, dumps_2, ..., dumps_N], [loads_1, loads_2, ..., loads_N])`, where `dumps` and `loads` are the ordered functions to serialise and parse your data respectively.
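
For example, this sketch rebuilds pickle-based serialisation via `.with_buffers()` (`provider` is a placeholder as before):

```python
import pickle

cm = CloudMapping.with_buffers(
    [pickle.dumps],  # ordered dumps (serialise) functions
    [pickle.loads],  # ordered loads (parse) functions
    storage_provider=provider,
)
cm["key"] = {"any": "picklable value"}
```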

[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)




# Development

[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

This project uses a `.devcontainer` directory to describe the environment to use for development. You may use this environment directly (it integrates automatically with VS Code's 'Remote Containers' extension), or you may create your own environment with the same dependencies.

## Dependencies
Install development dependencies with:

`pip install .[azureblob,azuretable,gcpstorage,awss3,tests]`

## Tests
Set environment variables for each provider:
* Azure Blob: `AZURE_TENANT_ID`, `AZURE_CLIENT_ID`, `AZURE_CLIENT_SECRET`
* Azure Table: `AZURE_TABLE_STORAGE_CONNECTION_STRING`
* GCP Storage: `GOOGLE_APPLICATION_CREDENTIALS` (path to credentials file)
* AWS S3: `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`

Run tests with:
```bash
pytest
--azure_blob_storage_account_url <azure-blob-storage-account-url>
--azure_table
--gcp_storage_project <gcp-project-id>
--aws_s3
--test_container_id <unique-test-run-id>
```
You can turn on/off tests for individual providers by including/excluding their parameters in the above command. `--test_container_id` is always required.
4 changes: 2 additions & 2 deletions setup.cfg
@@ -1,6 +1,6 @@
[metadata]
name = cloud-mappings
version = 0.8.0
version = 0.9.0
author = Lucas Sargent
author_email = lucas.sargent@eliiza.com.au
description = MutableMapping interfaces for common cloud storage providers
@@ -25,7 +25,7 @@ install_requires =

[options.extras_require]
azureblob = azure-identity==1.6.0; azure-storage-blob==12.8.1
azuretable = azure-identity==1.6.0; azure-data-tables==12.0.0b7
azuretable = azure-identity==1.6.0; azure-data-tables==12.0.0
gcpstorage = google-cloud-storage==1.38.0
awss3 = boto3==1.17.73
tests = pytest==6.2.4
17 changes: 10 additions & 7 deletions src/cloudmappings/cloudstoragemapping.py
@@ -11,9 +11,11 @@ def __init__(
        self,
        storage_provider: StorageProvider,
        sync_initially: bool = True,
        get_blindly: bool = False,
    ) -> None:
        self._storage_provider = storage_provider
        self._etags = {}
        self.get_blindly = get_blindly
        if self._storage_provider.create_if_not_exists() and sync_initially:
            self.sync_with_cloud()

@@ -22,12 +24,12 @@ def _encode_key(self, unsafe_key: str) -> str:
            raise TypeError("Key must be of type 'str'. Got key:", unsafe_key)
        return self._storage_provider.encode_key(unsafe_key=unsafe_key)

    def sync_with_cloud(self, key: str = None) -> None:
        prefix_key = self._encode_key(key) if key is not None else None
    def sync_with_cloud(self, key_prefix: str = None) -> None:
        key_prefix = None if key_prefix is None else self._encode_key(key_prefix)
        self._etags.update(
            {
                self._storage_provider.decode_key(k): i
                for k, i in self._storage_provider.list_keys_and_etags(prefix_key).items()
                for k, i in self._storage_provider.list_keys_and_etags(key_prefix).items()
            }
        )

@@ -36,13 +38,13 @@ def etags(self):
        return self._etags

    def __getitem__(self, key: str) -> bytes:
        if key not in self._etags:
        if not self.get_blindly and key not in self._etags:
            raise KeyError(key)
        return self._storage_provider.download_data(key=self._encode_key(key), etag=self._etags[key])
        return self._storage_provider.download_data(
            key=self._encode_key(key), etag=None if self.get_blindly else self._etags[key]
        )

    def __setitem__(self, key: str, value: bytes) -> None:
        if not isinstance(value, bytes):
            raise ValueError("Value must be bytes like")
        self._etags[key] = self._storage_provider.upload_data(
            key=self._encode_key(key),
            etag=self._etags.get(key, None),
@@ -84,6 +86,7 @@ def with_buffers(cls, input_buffers, output_buffers, *args, **kwargs) -> "CloudMapping":

        mapping.sync_with_cloud = raw_mapping.sync_with_cloud
        mapping.etags = raw_mapping.etags
        mapping.get_blindly = raw_mapping.get_blindly
        return mapping

    @classmethod
4 changes: 4 additions & 0 deletions src/cloudmappings/storageproviders/awss3.py
@@ -71,9 +71,13 @@ def download_data(self, key: str, etag: str) -> bytes:
        body, existing_etag, _ = self._get_body_etag_version_id_if_exists(key)
        if etag is not None and (body is None or etag != existing_etag):
            self.raise_key_sync_error(key=key, etag=etag)
        if body is None:
            return None
        return body.read()

    def upload_data(self, key: str, etag: str, data: bytes) -> str:
        if not isinstance(data, bytes):
            raise ValueError("Data must be bytes like")
        _, existing_etag, _ = self._get_body_etag_version_id_if_exists(key)
        if etag != existing_etag:
            self.raise_key_sync_error(key=key, etag=etag)
32 changes: 23 additions & 9 deletions src/cloudmappings/storageproviders/azureblobstorage.py
@@ -2,7 +2,7 @@
import json

from azure.core import MatchConditions
from azure.core.exceptions import ResourceExistsError, ResourceModifiedError
from azure.core.exceptions import ResourceExistsError, ResourceModifiedError, ResourceNotFoundError
from azure.storage.blob import ContainerClient
from azure.identity import DefaultAzureCredential

@@ -39,21 +39,35 @@ def create_if_not_exists(self):
        return False

    def download_data(self, key: str, etag: str) -> bytes:
        args = dict(blob=key)
        if etag is not None:
            args.update(
                dict(
                    etag=etag,
                    match_condition=MatchConditions.IfNotModified,
                )
            )
        try:
            return self._container_client.download_blob(
                blob=key,
                etag=etag,
                match_condition=MatchConditions.IfNotModified if etag is not None else None,
            ).readall()
            return self._container_client.download_blob(**args).readall()
        except ResourceModifiedError as e:
            self.raise_key_sync_error(key=key, etag=etag, inner_exception=e)
        except ResourceNotFoundError as e:
            if etag is None:
                return None
            self.raise_key_sync_error(key=key, etag=etag, inner_exception=e)

    def upload_data(self, key: str, etag: str, data: bytes) -> str:
        if not isinstance(data, bytes):
            raise ValueError("Data must be bytes like")
        expecting_blob = etag is not None
        args = {"overwrite": expecting_blob}
        args = dict(overwrite=expecting_blob)
        if expecting_blob:
            args["etag"] = etag
            args["match_condition"] = MatchConditions.IfNotModified
            args.update(
                dict(
                    etag=etag,
                    match_condition=MatchConditions.IfNotModified,
                )
            )
        bc = self._container_client.get_blob_client(blob=key)
        try:
            response = bc.upload_blob(
26 changes: 17 additions & 9 deletions src/cloudmappings/storageproviders/azuretablestorage.py
@@ -3,7 +3,7 @@
from urllib.parse import quote, unquote

from azure.core import MatchConditions
from azure.core.exceptions import ResourceExistsError, HttpResponseError
from azure.core.exceptions import ResourceExistsError, HttpResponseError, ResourceNotFoundError
from azure.data.tables import TableClient, UpdateMode

from .storageprovider import StorageProvider
@@ -16,7 +16,7 @@ def _chunk_bytes(data: bytes) -> Dict[str, bytes]:


def _dechunk_entity(entity: Dict[str, bytes]) -> bytes:
return b"".join([v.value for k, v in entity.items() if k.startswith("d_")])
return b"".join([v for k, v in entity.items() if k.startswith("d_")])


class AzureTableStorageProvider(StorageProvider):
@@ -59,15 +59,23 @@ def create_if_not_exists(self):
        return False

    def download_data(self, key: str, etag: str) -> bytes:
        entity = self._table_client.get_entity(
            partition_key=key,
            row_key="cm",
        )
        if etag is not None and etag != entity.metadata["etag"]:
            self.raise_key_sync_error(key=key, etag=etag)
        return _dechunk_entity(entity)
        try:
            entity = self._table_client.get_entity(
                partition_key=key,
                row_key="cm",
            )
        except ResourceNotFoundError as e:
            if etag is None:
                return None
            self.raise_key_sync_error(key=key, etag=etag, inner_exception=e)
        else:
            if etag is not None and etag != entity.metadata["etag"]:
                self.raise_key_sync_error(key=key, etag=etag)
            return _dechunk_entity(entity)

    def upload_data(self, key: str, etag: str, data: bytes) -> str:
        if not isinstance(data, bytes):
            raise ValueError("Data must be bytes like")
        entity = {
            "PartitionKey": key,
            "RowKey": "cm",
6 changes: 5 additions & 1 deletion src/cloudmappings/storageproviders/googlecloudstorage.py
@@ -49,13 +49,17 @@ def download_data(self, key: str, etag: str) -> bytes:
            blob_name=key,
        )
        existing_etag = self._parse_etag(b)
        if etag != existing_etag:
        if etag is not None and etag != existing_etag:
            self.raise_key_sync_error(key=key, etag=etag)
        if b is None:
            return None
        return b.download_as_bytes(
            if_generation_match=b.generation,
        )

    def upload_data(self, key: str, etag: str, data: bytes) -> str:
        if not isinstance(data, bytes):
            raise ValueError("Data must be bytes like")
        b = self._bucket.get_blob(
            blob_name=key,
        )
2 changes: 1 addition & 1 deletion src/cloudmappings/storageproviders/storageprovider.py
@@ -49,7 +49,7 @@ def download_data(self, key: str, etag: str) -> bytes:
    @abstractmethod
    def upload_data(self, key: str, etag: str, data: bytes) -> str:
        """Upload data to cloud storage. Raise KeyCloudSyncError if etag does not match the latest
        version in the cloud.
        version in the cloud. Raise ValueError if data is not bytes.
        :param etag: Expected etag if key already exists. Otherwise None
        :return: Etag of newly uploaded data, as str.
        """
29 changes: 27 additions & 2 deletions tests/tests/1_storageproviders.py
@@ -27,9 +27,24 @@ def test_data_is_stored(self, storage_provider: StorageProvider, test_id: str):
        encoded_key = storage_provider.encode_key(key)

        etag = storage_provider.upload_data(encoded_key, None, b"data")
        data = storage_provider.download_data(encoded_key, etag)
        assert storage_provider.download_data(encoded_key, etag) == b"data"

        assert data == b"data"
    def test_non_byte_values_error(self, storage_provider: StorageProvider, test_id: str):
        key = test_id + "-non-bytes-error"
        encoded_key = storage_provider.encode_key(key)

        with pytest.raises(ValueError, match="must be bytes like"):
            storage_provider.upload_data(encoded_key, None, True)
        with pytest.raises(ValueError, match="must be bytes like"):
            storage_provider.upload_data(encoded_key, None, 10)
        with pytest.raises(ValueError, match="must be bytes like"):
            storage_provider.upload_data(encoded_key, None, "string-data")
        with pytest.raises(ValueError, match="must be bytes like"):
            storage_provider.upload_data(encoded_key, None, [0, 1, 0, 1])
        with pytest.raises(ValueError, match="must be bytes like"):
            storage_provider.upload_data(encoded_key, None, {"or": "something more", "elaborate": True})
        with pytest.raises(ValueError, match="must be bytes like"):
            storage_provider.upload_data(encoded_key, None, None)

    def test_keys_and_etags_are_listed(self, storage_provider: StorageProvider, test_id: str):
        key_1 = test_id + "-keys-and-etags-list-test-1"
@@ -87,3 +102,13 @@ def test_etags_change_with_same_data(self, storage_provider: StorageProvider, test_id: str):
        second_etag = storage_provider.upload_data(encoded_key, first_etag, b"static-data")

        assert first_etag != second_etag

    def test_download_with_no_etag(self, storage_provider: StorageProvider, test_id: str):
        key = test_id + "-download-with-no-etag"
        encoded_key = storage_provider.encode_key(key)

        # Downloading a missing key with no etag returns None
        assert storage_provider.download_data(encoded_key, None) is None
        # With a value present, the latest data is returned
        storage_provider.upload_data(encoded_key, None, b"data")
        assert storage_provider.download_data(encoded_key, None) == b"data"
25 changes: 10 additions & 15 deletions tests/tests/2_singlecloudmapping.py
@@ -32,21 +32,6 @@ def test_repr(self, storage_provider: StorageProvider):
        else:
            pytest.fail("Unknown provider repr")

    def test_non_byte_values_error(self, storage_provider: StorageProvider, test_id: str):
        cm = CloudMapping(storage_provider=storage_provider, sync_initially=False)
        key = test_id + "non-bytes-error"

        with pytest.raises(ValueError, match="must be bytes like"):
            cm[key] = True
        with pytest.raises(ValueError, match="must be bytes like"):
            cm[key] = 10
        with pytest.raises(ValueError, match="must be bytes like"):
            cm[key] = "string-data"
        with pytest.raises(ValueError, match="must be bytes like"):
            cm[key] = [0, 1, 0, 1]
        with pytest.raises(ValueError, match="must be bytes like"):
            cm[key] = {"or": "something more", "elaborate": True}

    def test_no_key_errors(self, storage_provider: StorageProvider, test_id: str):
        cm = CloudMapping(storage_provider=storage_provider, sync_initially=False)
        key = test_id + "/no-key-errors-test"
@@ -68,6 +53,16 @@ def test_basic_setting_and_getting(self, storage_provider: StorageProvider, test_id: str):
        assert cm[test_id + "-key-a"] == b"uncapitalised"
        assert cm[test_id + "-key-3"] == b"three"

    def test_get_blindly_defaults_none(self, storage_provider: StorageProvider, test_id: str):
        cm = CloudMapping(storage_provider=storage_provider, sync_initially=False)
        key = test_id + "/get-blindly-test"

        # CloudMappings default to not getting blindly:
        assert not cm.get_blindly
        # If get_blindly, values default to None
        cm.get_blindly = True
        assert cm[key] is None

    def test_complex_keys(self, storage_provider: StorageProvider, test_id: str):
        cm = CloudMapping(storage_provider=storage_provider, sync_initially=False)
        key1 = test_id + "/here/are/some/sub/dirs"
21 changes: 21 additions & 0 deletions tests/tests/3_multiplecloudmapping.py
@@ -33,6 +33,27 @@ def test_no_ownership_error(self, storage_provider: StorageProvider, test_id: str):
        with pytest.raises(KeySyncError):
            del sess_3[key]

    def test_blind_get(self, storage_provider: StorageProvider, test_id: str):
        sess_1 = CloudMapping(storage_provider=storage_provider, sync_initially=False)
        sess_2 = CloudMapping(storage_provider=storage_provider, sync_initially=False, get_blindly=True)
        sess_3 = CloudMapping(storage_provider=storage_provider, sync_initially=False, get_blindly=False)
        key = test_id + "/concurrent/blind-get-test"

        # Session 1 uploads data:
        sess_1[key] = b"data"

        # Session 2 blindly gets by default, but can be turned off:
        assert sess_2[key] == b"data"
        sess_2.get_blindly = False
        with pytest.raises(KeyError):
            sess_2[key]

        # Session 3 doesn't blindly get by default, but can be set to:
        with pytest.raises(KeyError):
            sess_3[key]
        sess_3.get_blindly = True
        assert sess_3[key] == b"data"

    def test_manual_change_error(self, storage_provider: StorageProvider, test_id: str):
        cm = CloudMapping(storage_provider=storage_provider, sync_initially=False)
        key = test_id + "/concurrent/manual-change-test"
