GCS support #1125

Merged
merged 90 commits into from
Sep 13, 2021
Commits
467ae13
Add GCS storage provider
kristinagrig06 Aug 20, 2021
b1082e5
Fix docstring
kristinagrig06 Aug 20, 2021
823057f
Add gcsfs to requirements
kristinagrig06 Aug 20, 2021
5a94c0f
Move requirement to common
kristinagrig06 Aug 20, 2021
2d63fab
Change circleci config
kristinagrig06 Aug 20, 2021
7b2d1f0
Fix lint
kristinagrig06 Aug 20, 2021
77e998b
Fix gcs_ds fixture
kristinagrig06 Aug 20, 2021
a8c46f0
Add token file input
kristinagrig06 Aug 20, 2021
bc834e1
Add print for test
kristinagrig06 Aug 20, 2021
b1c63a3
Add creds from env var
kristinagrig06 Aug 20, 2021
55387e4
Modify CONTRIBUTING.md
kristinagrig06 Aug 20, 2021
1361620
Check readonly
kristinagrig06 Aug 20, 2021
32f4f70
Format docstrings
kristinagrig06 Aug 20, 2021
b8a62ad
Modify circleci timeout
kristinagrig06 Aug 23, 2021
d7f2d66
Fix config
kristinagrig06 Aug 23, 2021
b4e8e50
Remove duplicate
kristinagrig06 Aug 23, 2021
d750e7c
Fix tab
kristinagrig06 Aug 23, 2021
48c54ac
Remove parallelism
kristinagrig06 Aug 23, 2021
c68a63a
Replace join for path
kristinagrig06 Aug 23, 2021
2bf9a89
Change to gcloud
kristinagrig06 Aug 25, 2021
b9b7349
Fix delete
kristinagrig06 Aug 25, 2021
83a7d0b
Move import
kristinagrig06 Aug 25, 2021
fceb0b8
Fix pickle
kristinagrig06 Aug 25, 2021
5afb3bd
Fix creds
kristinagrig06 Aug 25, 2021
b3ca9d6
Convert bytearrays
kristinagrig06 Aug 25, 2021
6ebe0f0
Creds formats
kristinagrig06 Aug 25, 2021
0169f50
Add browser token generation
kristinagrig06 Aug 26, 2021
fb7164b
Add exception
kristinagrig06 Aug 26, 2021
d2a865c
Fix lint
kristinagrig06 Aug 26, 2021
f9cf3dd
Add oauth to requirements
kristinagrig06 Aug 26, 2021
3272e38
Fix docstrings
kristinagrig06 Aug 26, 2021
0280028
Change no_output_timeout in config
kristinagrig06 Aug 26, 2021
b198245
Add project to state
kristinagrig06 Aug 26, 2021
1ad2514
Add multiprocessing import to test_transform
kristinagrig06 Aug 26, 2021
b7f7f22
Merge branch 'main' of https://github.com/activeloopai/Hub into featu…
kristinagrig06 Aug 26, 2021
733b0f6
Merge branch 'main' of https://github.com/activeloopai/Hub into featu…
kristinagrig06 Aug 26, 2021
96f928d
Rename class
kristinagrig06 Aug 26, 2021
a1a3fc5
Downgrade gcloud storage
kristinagrig06 Aug 27, 2021
1aaffaf
Revert "Downgrade gcloud storage"
kristinagrig06 Aug 27, 2021
3f36ab2
Skip some transform tests for gcs
kristinagrig06 Aug 27, 2021
bac5532
Add retry
kristinagrig06 Aug 27, 2021
4c26979
Add token tests
kristinagrig06 Aug 27, 2021
0858e98
Modify cache token
kristinagrig06 Aug 27, 2021
fa2bc3d
Remove import
kristinagrig06 Aug 27, 2021
908479f
Create tempfile for dicts
kristinagrig06 Aug 27, 2021
a37186d
Add exception test
kristinagrig06 Aug 27, 2021
0b14999
Add client reinitializing
kristinagrig06 Aug 30, 2021
33f8197
Disable some api tests for gcs
kristinagrig06 Aug 30, 2021
cd0490b
Change fixture name
kristinagrig06 Aug 30, 2021
ef1156e
Check cache
kristinagrig06 Aug 30, 2021
539f333
Fix check
kristinagrig06 Aug 30, 2021
8ac9a15
Change method name
kristinagrig06 Aug 30, 2021
190da7f
Ignore check
kristinagrig06 Aug 30, 2021
5cb0be0
Move typing_extensions requirement
kristinagrig06 Aug 30, 2021
8b8a923
Specify version
kristinagrig06 Aug 30, 2021
edbc2fd
Remove unused fixture
kristinagrig06 Aug 30, 2021
d81ba97
Remove print
kristinagrig06 Aug 30, 2021
529b664
Merge branch 'main' of https://github.com/activeloopai/Hub into featu…
kristinagrig06 Aug 31, 2021
8dacd6b
Remove reinitialization, add docstrings, modify setitem
kristinagrig06 Aug 31, 2021
a2af6d8
Remove reinitialize from transform
kristinagrig06 Aug 31, 2021
1839891
Fix __all_keys in iter
kristinagrig06 Aug 31, 2021
0a5a872
Increase maxfiles open
kristinagrig06 Sep 2, 2021
ccc1952
Add limit
kristinagrig06 Sep 3, 2021
82212b2
Revert reinit
kristinagrig06 Sep 3, 2021
5937936
Modify maxfiles per proc
kristinagrig06 Sep 3, 2021
74af2c9
Merge branch 'main' of https://github.com/activeloopai/Hub into featu…
kristinagrig06 Sep 3, 2021
dea7758
Close pools
kristinagrig06 Sep 3, 2021
94292de
Fix
kristinagrig06 Sep 3, 2021
a40fe17
Terminate pool
kristinagrig06 Sep 3, 2021
4679216
Change to closing
kristinagrig06 Sep 3, 2021
83e4a54
Use with
kristinagrig06 Sep 3, 2021
77d400a
Set sharing for pytorch
kristinagrig06 Sep 3, 2021
1389c5e
worker_init_fn in Dataloader
kristinagrig06 Sep 3, 2021
f8e852f
Add ulimit
kristinagrig06 Sep 3, 2021
dd010e4
Remove ulimit
kristinagrig06 Sep 3, 2021
06f7ecb
Change n
kristinagrig06 Sep 3, 2021
cd8a90a
Comment pytorch read test
kristinagrig06 Sep 3, 2021
90ab53e
Use deepcopy of batches
kristinagrig06 Sep 3, 2021
e71a38d
Change setitem
kristinagrig06 Sep 3, 2021
5515f52
Remove del
kristinagrig06 Sep 9, 2021
094e131
Unset limits
kristinagrig06 Sep 9, 2021
834bbaa
Merge remote-tracking branch 'origin' into feature/gcs_support
AbhinavTuli Sep 12, 2021
ae886b2
fix mac test
AbhinavTuli Sep 12, 2021
7b8585a
add back parallelism
AbhinavTuli Sep 12, 2021
3dd6815
fix mac tests by closing
AbhinavTuli Sep 13, 2021
a92c777
lint fix
AbhinavTuli Sep 13, 2021
d793a48
Add windows path
kristinagrig06 Sep 13, 2021
a6193dd
Merge branch 'feature/gcs_support' of https://github.com/activeloopai…
kristinagrig06 Sep 13, 2021
1f92d2a
Change platform to os.name
kristinagrig06 Sep 13, 2021
a2f65a8
Merge branch 'main' of https://github.com/activeloopai/Hub into featu…
kristinagrig06 Sep 13, 2021

11 changes: 7 additions & 4 deletions .circleci/config.yml
@@ -106,8 +106,9 @@ commands:
- run:
name: "Install dependencies"
command: |
brew install zlib libjpeg webp
brew install zlib libjpeg webp
sudo ln -s /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/* /usr/local/include/
sudo launchctl limit maxfiles 65536 200000
info:
steps:
- run:
@@ -176,7 +177,8 @@ commands:
command: |
$Env:GOOGLE_APPLICATION_CREDENTIALS = $Env:CI_GCS_PATH
setx /m GOOGLE_APPLICATION_CREDENTIALS "$Env:GOOGLE_APPLICATION_CREDENTIALS"
python3 -m pytest --cov-report=xml --cov=./ --local --s3 --hub-cloud --kaggle --ignore-glob=buH/*
python3 -m pytest --cov-report=xml --cov=./ --local --s3 --gcs --hub-cloud --kaggle --ignore-glob=buH/*
no_output_timeout: 30m
- when:
condition: << parameters.unix-like >>
steps:
@@ -186,9 +188,10 @@ commands:
BUGGER_OFF: "true"
command: |
export GOOGLE_APPLICATION_CREDENTIALS=$HOME/.secrets/gcs.json
python3 -m pytest --cov-report=xml --cov=./ --local --s3 --hub-cloud --kaggle --ignore-glob=buH/*
python3 -m pytest --cov-report=xml --cov=./ --local --s3 --gcs --hub-cloud --kaggle --ignore-glob=buH/*
no_output_timeout: 30m
parallelism: 10

run-backwards-compatibility-tests:
steps:
- run:
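The first hunk raises the macOS open-file limit in CI with `launchctl limit maxfiles 65536 200000` (soft limit, then hard limit), in line with the commits about maxfiles and closing pools. As a hedged, in-process analogue, a Unix test process could raise its own soft `RLIMIT_NOFILE` with the standard `resource` module. This is a swapped-in technique, not what the PR does; the helper name and the 65536 target are assumptions.

```python
import resource

# Hypothetical target, borrowed from the soft limit in the CI hunk above.
TARGET_SOFT_LIMIT = 65536


def raise_open_file_limit(target: int = TARGET_SOFT_LIMIT) -> tuple:
    """Raise this process's soft RLIMIT_NOFILE toward `target` (Unix only).

    Never touches the hard limit and returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # An unprivileged process may raise its soft limit only up to the hard limit.
    ceiling = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if soft == resource.RLIM_INFINITY or soft >= ceiling:
        return soft, hard  # already high enough, nothing to do
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (ceiling, hard))
    except ValueError:
        # A system-imposed cap can still reject the request; keep the old limit.
        pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

A per-process change like this only affects the test process and its children, whereas the `launchctl` line in the CI config changes the limit for the whole macOS session.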
2 changes: 2 additions & 0 deletions CONTRIBUTING.md
@@ -31,6 +31,7 @@ pip3 install -r hub/requirements/tests.txt
 - `pytest .`: Run all tests with memory only.
 - `pytest . --local`: Run all tests with memory and local.
 - `pytest . --s3`: Run all tests with memory and s3.
+- `pytest . --gcs`: Run all tests with memory and GCS.
 - `pytest . --kaggle`: Run all tests that use the kaggle API.
 - `pytest . --memory-skip --hub-cloud`: Run all tests with hub cloud only.
 #### Backwards Compatibility Tests
@@ -41,6 +42,7 @@ Combine any of the following options to suit your test cases.
 
 - `--local`: Enable local tests.
 - `--s3`: Enable S3 tests.
+- `--gcs`: Enable GCS tests.
 - `--hub-cloud`: Enable hub cloud tests.
 - `--memory-skip`: Disable memory tests.
 - `--s3-path`: Specify an s3 path if you don't have access to our internal testing bucket.
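Per the CONTRIBUTING options, memory tests run unless `--memory-skip` is passed, and every other backend is opt-in, so `pytest . --gcs` covers memory plus GCS. A small, hypothetical helper that maps those flags onto the `PYTEST_*_PROVIDER_BASE_ROOT` values from hub/constants.py makes the combinations concrete. It only models the documented behavior and is not how conftest.py actually wires fixtures; hub cloud and Kaggle are left out for brevity.

```python
from typing import Dict, List

# Base roots copied from the PYTEST_*_PROVIDER_BASE_ROOT constants in hub/constants.py.
PROVIDER_ROOTS: Dict[str, str] = {
    "memory": "mem://hub_pytest",
    "local": "/tmp/hub_pytest/",
    "s3": "s3://hub-2.0-tests/",
    "gcs": "gcs://snark-test/",
}


def enabled_roots(
    memory_skip: bool = False, local: bool = False, s3: bool = False, gcs: bool = False
) -> List[str]:
    """Hypothetical helper: storage roots a test run would target for a set of flags."""
    # Memory is on by default (--memory-skip turns it off); the rest are opt-in.
    flags = {"memory": not memory_skip, "local": local, "s3": s3, "gcs": gcs}
    return [PROVIDER_ROOTS[name] for name, on in flags.items() if on]
```

For example, `enabled_roots(memory_skip=True, gcs=True)` corresponds to a GCS-only run.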
3 changes: 3 additions & 0 deletions conftest.py
@@ -28,6 +28,9 @@ def pytest_addoption(parser):
         LOCAL_OPT, action="store_true", help="Local tests will run if enabled."
     )
     parser.addoption(S3_OPT, action="store_true", help="S3 tests will run if enabled.")
+    parser.addoption(
+        GCS_OPT, action="store_true", help="GCS tests will run if enabled."
+    )
     parser.addoption(
         HUB_CLOUD_OPT, action="store_true", help="Hub cloud tests will run if enabled."
     )
7 changes: 4 additions & 3 deletions hub/api/tests/test_api.py
@@ -20,6 +20,7 @@
 from hub.tests.dataset_fixtures import (
     enabled_datasets,
     enabled_persistent_dataset_generators,
+    enabled_non_gcs_datasets,
 )
 
 
@@ -145,15 +146,15 @@ def test_stringify_with_path(local_ds):
     assert str(ds) == f"Dataset(path='{local_ds.path}', tensors=[])"
 
 
-@enabled_datasets
+@enabled_non_gcs_datasets
 def test_compute_fixed_tensor(ds):
     ds.create_tensor("image")
     ds.image.extend(np.ones((32, 28, 28)))
     assert len(ds) == 32
     np.testing.assert_array_equal(ds.image.numpy(), np.ones((32, 28, 28)))
 
 
-@enabled_datasets
+@enabled_non_gcs_datasets
 def test_compute_dynamic_tensor(ds):
     ds.create_tensor("image")
 
@@ -216,7 +217,7 @@ def test_empty_samples(ds: Dataset):
     np.testing.assert_array_equal(actual, expected)
 
 
-@enabled_datasets
+@enabled_non_gcs_datasets
 def test_safe_downcasting(ds: Dataset):
     int_tensor = ds.create_tensor("int", dtype="uint8")
     int_tensor.append(0)
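These hunks move three tests from `@enabled_datasets` to `@enabled_non_gcs_datasets`, so they no longer run against GCS. The decorator's definition in hub/tests/dataset_fixtures.py is not part of this page; one plausible shape is a parametrized list of dataset fixture names with the GCS entry removed. The sketch assumes the fixture names `memory_ds` and `s3_ds`, which are not in the diff, and keeps the pytest wiring in comments so the block runs with the standard library alone.

```python
from typing import Iterable, List

# Hypothetical fixture names: only `local_ds` and `gcs_ds` appear in this PR.
ENABLED_DATASET_FIXTURES = ["memory_ds", "local_ds", "s3_ds", "gcs_ds"]


def without_gcs(fixture_names: Iterable[str]) -> List[str]:
    """Return the fixture names with the GCS-backed dataset removed."""
    return [name for name in fixture_names if name != "gcs_ds"]


# With pytest available, a decorator in this spirit could be built as
#
#     enabled_non_gcs_datasets = pytest.mark.parametrize(
#         "ds", without_gcs(ENABLED_DATASET_FIXTURES), indirect=True
#     )
#
# together with a `ds` fixture that resolves each name at run time:
#
#     @pytest.fixture
#     def ds(request):
#         return request.getfixturevalue(request.param)
```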
3 changes: 3 additions & 0 deletions hub/constants.py
@@ -62,17 +62,20 @@
 PYTEST_MEMORY_PROVIDER_BASE_ROOT = "mem://hub_pytest"
 PYTEST_LOCAL_PROVIDER_BASE_ROOT = "/tmp/hub_pytest/" # TODO: may fail for windows
 PYTEST_S3_PROVIDER_BASE_ROOT = "s3://hub-2.0-tests/"
+PYTEST_GCS_PROVIDER_BASE_ROOT = "gcs://snark-test/"
 PYTEST_HUB_CLOUD_PROVIDER_BASE_ROOT = f"hub://{HUB_CLOUD_DEV_USERNAME}/"
 
 # environment variables
 ENV_HUB_DEV_PASSWORD = "ACTIVELOOP_HUB_PASSWORD"
 ENV_KAGGLE_USERNAME = "KAGGLE_USERNAME"
 ENV_KAGGLE_KEY = "KAGGLE_KEY"
+ENV_GOOGLE_APPLICATION_CREDENTIALS = "GOOGLE_APPLICATION_CREDENTIALS"
 
 # pytest options
 MEMORY_OPT = "--memory-skip"
 LOCAL_OPT = "--local"
 S3_OPT = "--s3"
+GCS_OPT = "--gcs"
 HUB_CLOUD_OPT = "--hub-cloud"
 S3_PATH_OPT = "--s3-path"
 KEEP_STORAGE_OPT = "--keep-storage"
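The CI job exports `GOOGLE_APPLICATION_CREDENTIALS` (for example `$HOME/.secrets/gcs.json`) before running `pytest --gcs`, and this hunk records the variable's name as `ENV_GOOGLE_APPLICATION_CREDENTIALS`. A minimal sketch of a lookup that prefers an explicit key file and falls back to that variable follows. The helper is hypothetical: the PR's actual credential handling (the commits mention token files, browser-generated tokens and several credential formats) is not shown in this diff. Google's own client libraries also consult this variable when resolving Application Default Credentials.

```python
import os
from typing import Optional

# Name of the environment variable, as added to hub/constants.py.
ENV_GOOGLE_APPLICATION_CREDENTIALS = "GOOGLE_APPLICATION_CREDENTIALS"


def resolve_credentials_path(explicit_path: Optional[str] = None) -> Optional[str]:
    """Hypothetical lookup order: an explicit path wins, then the environment variable.

    Returns None when neither is set, leaving the fallback to the caller.
    """
    if explicit_path:
        return explicit_path
    # Treat an empty variable the same as an unset one.
    return os.environ.get(ENV_GOOGLE_APPLICATION_CREDENTIALS) or None
```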
5 changes: 5 additions & 0 deletions hub/core/compute/process.py
@@ -9,3 +9,8 @@ def __init__(self, workers):
 
     def map(self, func, iterable):
         return self.pool.map(func, iterable)
+
+    def close(self):
+        self.pool.close()
+        self.pool.join()
+        self.pool.clear()
4 changes: 4 additions & 0 deletions hub/core/compute/provider.py
@@ -13,3 +13,7 @@ def map(self, func, iterable):
         """Apply 'func' to each element in 'iterable', collecting the results
         in a list that is returned.
         """
+
+    @abstractmethod
+    def close(self):
+        """Closes the provider."""
3 changes: 3 additions & 0 deletions hub/core/compute/serial.py
@@ -7,3 +7,6 @@ def __init__(self):
 
     def map(self, func, iterable):
         return map(func, iterable)
+
+    def close(self):
+        return
5 changes: 5 additions & 0 deletions hub/core/compute/thread.py
@@ -9,3 +9,8 @@ def __init__(self, workers):
 
     def map(self, func, iterable):
         return self.pool.map(func, iterable)
+
+    def close(self):
+        self.pool.close()
+        self.pool.join()
+        self.pool.clear()
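process.py and thread.py shut their pool down in order: `close()` stops new submissions, `join()` waits for the workers to exit, and `clear()` follows. The standard library's `multiprocessing.pool.ThreadPool` provides `close()` and `join()` but no `clear()`, so the sketch below stops after `join()`. `clear()` presumably comes from whichever pool library the PR uses and is not reproduced here. The class mirrors thread.py's shape, but its name and constructor are assumptions.

```python
from multiprocessing.pool import ThreadPool


class ThreadProvider:
    """Sketch of a thread-backed provider using the standard library's ThreadPool."""

    def __init__(self, workers: int):
        self.pool = ThreadPool(processes=workers)

    def map(self, func, iterable):
        # Blocks until every result is ready and returns them as a list.
        return self.pool.map(func, iterable)

    def close(self):
        self.pool.close()  # stop accepting new work
        self.pool.join()   # wait for the worker threads to exit
```

Calling `map()` after `close()` raises `ValueError`, so a provider that is used after shutdown fails loudly instead of hanging.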