Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow empty YAML fields #1890

Merged
merged 17 commits into from
Sep 13, 2023
Merged

Allow empty YAML fields #1890

merged 17 commits into from
Sep 13, 2023

Conversation

iojw
Copy link
Collaborator

@iojw iojw commented Apr 21, 2023

Users may often comment out the nested contents of YAML fields without remove the top-level field. Previously, this would throw the following error due to our YAML validation:

ValueError: Invalid task YAML: None is not of type 'string'. Check problematic field(s): $.run

This PR updates it so that such fields are treated as though they do not exist. It does this by removing all fields with a None value in the parsed dictionary before validation.

We do not solve this issue by adding null to the JSON schema for our YAML config because many places in the code test for the existence of specific fields in the dictionary so we would have to modify all of their logic. Moreover, having to add null to every single field in the JSON schema reduces readability.

Tested (run the relevant ones):

  • Tested with YAML configs with empty fields
  • Added unit test for yaml parsing
  • All smoke tests: pytest tests/test_smoke.py
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: bash tests/backward_comaptibility_tests.sh

Copy link
Member

@concretevitamin concretevitamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @iojw. A nit. Consider adding a unit test to ensure this doesn’t break in the future?

@@ -766,6 +766,7 @@ def warn_for_git_dir(source: str):

@classmethod
def from_yaml_config(cls, config: Dict[str, Any]) -> 'Storage':
config = {k: v for k, v in config.items() if v is not None}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible to move this into backend_utils.validate_schema since there are three places?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a flag so we can enable this for validation. The reason we don't want to do it for all cases is that it prevents us from actually having null values in the schema in the future.

In addition, a lot of the current code relies on the fact that if a key exists in the config dictionary, then the value is valid. E.g. if we allow None to be inside the config dictionary, then inside storage.from_yaml_config, force_delete = config.pop('_force_delete', False) would incorrectly set force_delete to None if the user specifies force_delete in the YAML without a value. Therefore, I also had to refactor the code such that it properly handle None config values.

@concretevitamin
Copy link
Member

concretevitamin commented Apr 21, 2023

Have we ensured that for all fields in the current YAML spec, None means absence means default value? Any cases where None means something else?

@iojw
Copy link
Collaborator Author

iojw commented Apr 25, 2023

@concretevitamin Yes, I've looked through Resources, Storage, and Task to ensure that for these schemas, None does not have a special meaning and is properly handled.

@iojw iojw force-pushed the yaml-schema-update branch from c150bd4 to 7b78873 Compare April 25, 2023 00:17
Copy link
Member

@concretevitamin concretevitamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, some questions.

sky/backends/backend_utils.py Outdated Show resolved Hide resolved
sky/backends/onprem_utils.py Outdated Show resolved Hide resolved
tests/test_yaml_parser.py Show resolved Hide resolved
tests/test_yaml_parser.py Outdated Show resolved Hide resolved
tests/test_yaml_parser.py Show resolved Hide resolved
tests/test_yaml_parser.py Outdated Show resolved Hide resolved
tests/test_yaml_parser.py Show resolved Hide resolved
@@ -2471,7 +2471,10 @@ def stop_handler(signum, frame):
raise KeyboardInterrupt(exceptions.SIGTSTP_CODE)


def validate_schema(obj, schema, err_msg_prefix=''):
def validate_schema(obj, schema, err_msg_prefix='', skip_none=False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When do we want skip_none=False? In cli.py’s admin_deploy()?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want it in cases where None is a valid value for some config we are validating. After thinking about it further, it makes more sense to set the default to True, since it is likelier than we want to remove None values than not. I've updated to code for this.

sky/resources.py Show resolved Hide resolved
@iojw iojw requested a review from concretevitamin June 2, 2023 18:29
@concretevitamin
Copy link
Member

@iojw Oops, slipped through the radar. Could you merge with master first?

@iojw
Copy link
Collaborator Author

iojw commented Jun 5, 2023

@concretevitamin Updated!

@concretevitamin concretevitamin mentioned this pull request Jun 7, 2023
3 tasks
@concretevitamin
Copy link
Member

LGTM. With the latest commit,

FAILED tests/test_smoke.py::test_gcp_stale_job_manual_restart - AttributeError: module 'sky.skylet.events' has no attribute 'JobUpdateEvent'. Did you mean: 'SpotJobUpdateEvent'?
FAILED tests/test_smoke.py::test_file_mounts - Exception: test failed: less /tmp/using_file_mounts-bsvqi8s_.log
FAILED tests/test_smoke.py::test_large_job_queue - Exception: test failed: less /tmp/large_job_queue-x96b5ene.log
FAILED tests/test_smoke.py::test_job_queue_multinode - Exception: test failed: less /tmp/job_queue_multinode-iyo5bv9n.log
FAILED tests/test_smoke.py::test_multi_echo - Exception: test failed: less /tmp/multi_echo-86fyim3q.log
FAILED tests/test_smoke.py::test_env_check - Exception: test failed: less /tmp/env_check-v7b2xnyx.log
6 failed, 62 passed, 35 skipped, 1646 warnings in 3462.05s (0:57:42)

These tests failed, most likely due to merge conflicts. See comment: https://github.com/skypilot-org/skypilot/pull/1636/files#r1220710715

Could you merge again?

Copy link
Member

@concretevitamin concretevitamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @iojw.

@iojw iojw merged commit ddbf3b9 into master Sep 13, 2023
@iojw iojw deleted the yaml-schema-update branch September 13, 2023 04:12
@iojw
Copy link
Collaborator Author

iojw commented Sep 13, 2023

Confirmed that smoke tests pass with these changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants