Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(sdk): enable loading both JSON and YAML pipelines IR #1089

Merged
merged 25 commits into from
Apr 6, 2022

Conversation

connor-mccarthy
Copy link
Member

@connor-mccarthy connor-mccarthy commented Mar 17, 2022

Fixes #1088

Main changes
This PR switches load_json to load_yaml. This change is backward-compatible, as YAML is a superset of JSON.

Other changes

  • Added tests for load_yaml to demonstrate that is reads both JSON and YAML.
  • Added pyyaml as a library dependency.
  • Updated copyright where applicable.
  • Formatted tests/unit/aiplatform/test_utils.py with black.

Manually tested the four primary codepaths

  1. Read JSON locally.
  2. Read YAML locally.
  3. Read JSON from Google Cloud Storage.
  4. Read YAML from Google Cloud Storage.

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

cc: @chensun @ji-yaqi

@connor-mccarthy connor-mccarthy requested a review from a team as a code owner March 17, 2022 16:36
@connor-mccarthy connor-mccarthy marked this pull request as draft March 17, 2022 17:50
@sasha-gitg sasha-gitg added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Mar 18, 2022
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Mar 18, 2022
@connor-mccarthy connor-mccarthy marked this pull request as ready for review March 22, 2022 22:42
Copy link
Member

@chensun chensun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding some YAML snippets here along side the existing JSON one?

_TEST_PIPELINE_SPEC_LEGACY = {
"pipelineInfo": {"name": "my-pipeline"},
"root": {
"dag": {"tasks": {}},
"inputDefinitions": {"parameters": {"string_param": {"type": "STRING"}}},
},
"schemaVersion": "2.0.0",
"components": {},
}
_TEST_PIPELINE_SPEC = {
"pipelineInfo": {"name": "my-pipeline"},
"root": {
"dag": {"tasks": {}},
"inputDefinitions": {
"parameters": {
"string_param": {"parameterType": "STRING"},
"bool_param": {"parameterType": "BOOLEAN"},
"double_param": {"parameterType": "NUMBER_DOUBLE"},
"int_param": {"parameterType": "NUMBER_INTEGER"},
"list_int_param": {"parameterType": "LIST"},
"list_string_param": {"parameterType": "LIST"},
"struct_param": {"parameterType": "STRUCT"},
}
},
},
"schemaVersion": "2.1.0",
"components": {},
}
_TEST_TFX_PIPELINE_SPEC = {
"pipelineInfo": {"name": "my-pipeline"},
"root": {
"dag": {"tasks": {}},
"inputDefinitions": {"parameters": {"string_param": {"type": "STRING"}}},
},
"schemaVersion": "2.0.0",
"sdkVersion": "tfx-1.4.0",
"components": {},
}

@@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-

# Copyright 2020 Google LLC
# Copyright 2022 Google LLC
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: preserve the initial year, or leave it untouched.
https://opensource.google/documentation/reference/copyright#the_year

from typing import Any, Dict, Optional

import yaml
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is an optional dependency is should be wrapped with an informative exception on import:

try:
import tensorflow as tf
except ImportError:
raise ImportError(
"Tensorflow is not installed and is required to load saved model. "
'Please install the SDK using "pip install google-cloud-aiplatform[lit]"'
)

Preference to import in the function that uses this package so it doesn't raise on module import.

class TestYamlUtils:
def test_load_yaml_from_local_file__with_json(self, yaml_file):
data = yaml_utils.load_yaml(yaml_file)
assert data["key"] == "val"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think pytest supports dict equality with assert d1 == d2

@sasha-gitg sasha-gitg added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Mar 25, 2022
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Mar 25, 2022
yield mock_load_json
def mock_load_yaml_and_json(job_spec):
with patch.object(storage.Blob, "download_as_bytes") as mock_load_yaml_and_json:
job_spec_str = json.dumps(job_spec) if isinstance(job_spec, dict) else job_spec
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems inconsistent to handle the json case as job_spec being deserialized and the yaml case as job_spec being serialized. Preference to add a comment describing this for future reference.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely. That originates from the inconsistent starting point of the two representations in the tests: the JSON starts as a dict and the YAML starts as a string. In real usage, they would likely both be read in as strings from a file.

One option to make it more consistent is to put the json/dicts into strings as well, either literally with """ or programmatically (my preference) by wrapping in json.dumps at definition. If not one of those two options, I'll add a comment. WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

json.dumps SGTM. Thanks!

@sasha-gitg sasha-gitg added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Mar 29, 2022
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Mar 29, 2022
@connor-mccarthy
Copy link
Member Author

@sasha-gitg, can you please re-add the automerge label if all looks good?

@sasha-gitg sasha-gitg added the automerge Merge the pull request once unit tests and other checks pass. label Mar 31, 2022
@gcf-merge-on-green
Copy link
Contributor

Merge-on-green attempted to merge your PR for 6 hours, but it was not mergeable because either one of your required status checks failed, one of your required reviews was not approved, or there is a do not merge label. Learn more about your required status checks here: https://help.github.com/en/github/administering-a-repository/enabling-required-status-checks. You can remove and reapply the label to re-run the bot.

@gcf-merge-on-green gcf-merge-on-green bot removed the automerge Merge the pull request once unit tests and other checks pass. label Apr 1, 2022
@product-auto-label product-auto-label bot added the size: l Pull request size is large. label Apr 1, 2022
@sasha-gitg sasha-gitg added the automerge Merge the pull request once unit tests and other checks pass. label Apr 1, 2022
@gcf-merge-on-green
Copy link
Contributor

Merge-on-green attempted to merge your PR for 6 hours, but it was not mergeable because either one of your required status checks failed, one of your required reviews was not approved, or there is a do not merge label. Learn more about your required status checks here: https://help.github.com/en/github/administering-a-repository/enabling-required-status-checks. You can remove and reapply the label to re-run the bot.

@gcf-merge-on-green gcf-merge-on-green bot removed the automerge Merge the pull request once unit tests and other checks pass. label Apr 1, 2022
@connor-mccarthy connor-mccarthy force-pushed the enable-use-of-yaml-ir branch from ec57bd7 to 3914dd0 Compare April 5, 2022 16:43
@connor-mccarthy connor-mccarthy requested a review from a team as a code owner April 5, 2022 16:43
@product-auto-label product-auto-label bot added size: xl Pull request size is extra large. and removed size: l Pull request size is large. labels Apr 5, 2022
@product-auto-label product-auto-label bot added size: l Pull request size is large. and removed size: xl Pull request size is extra large. labels Apr 5, 2022
@sasha-gitg sasha-gitg added the automerge Merge the pull request once unit tests and other checks pass. label Apr 5, 2022
@sasha-gitg sasha-gitg added automerge Merge the pull request once unit tests and other checks pass. and removed automerge Merge the pull request once unit tests and other checks pass. labels Apr 5, 2022
@gcf-merge-on-green
Copy link
Contributor

Merge-on-green attempted to merge your PR for 6 hours, but it was not mergeable because either one of your required status checks failed, one of your required reviews was not approved, or there is a do not merge label. Learn more about your required status checks here: https://help.github.com/en/github/administering-a-repository/enabling-required-status-checks. You can remove and reapply the label to re-run the bot.

@gcf-merge-on-green gcf-merge-on-green bot removed the automerge Merge the pull request once unit tests and other checks pass. label Apr 6, 2022
@sasha-gitg sasha-gitg merged commit f2e70b1 into googleapis:main Apr 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size: l Pull request size is large.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support YAML Pipelines IR serialization format
7 participants