Approach to dynamically (on the fly) provide system data from runs #3833

Open
SilinPavel opened this issue Dec 19, 2024 · 0 comments
Labels
kind/enhancement New feature or request


Background
In some cases it is very useful for a run to have a mechanism for providing data about itself, i.e. periodically syncing specific data files from the run instance to a central location.

For example, each Nextflow run produces a trace.txt file, which is a source of very helpful information such as task statuses, resource consumption, etc.

For Cloud-Pipeline it would be very helpful to have a unified mechanism for runs to provide such information on the fly.

Let's implement the following approach:

  • new System Preference launch.run.sync.data:
{
  "syncTimeout": dd, # timeout in seconds, used to configure CP_SYNC_TO_STORAGE_TIMEOUT_SEC
  "data": {
    "<data-type>": {
      "storagePathPrefix": <path-prefix> # path prefix under which the data is stored, e.g. storagePathPrefix = "s3://bucket/prefix" -> the data will be stored under the "s3://bucket/prefix/<run-id>" path
    }
  }
}
  • usage of the sync_to_storage functionality inside a run to sync this data
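
To illustrate how a run could interpret the preference above, here is a minimal Python sketch. It is not existing Cloud-Pipeline code: the function name `resolve_sync_targets`, the data type `nextflow-trace` and the timeout value `60` are hypothetical and only serve the example. It assumes the preference value is delivered to the run as a JSON string and that the destination is built as `<storagePathPrefix>/<run-id>`, as described in the comment on `storagePathPrefix`.

```python
import json


def resolve_sync_targets(preference_json, run_id):
    """Parse a launch.run.sync.data value and build per-run destinations.

    Hypothetical helper: returns (timeout_sec, {data_type: destination_path}).
    """
    preference = json.loads(preference_json)
    # syncTimeout is meant to configure CP_SYNC_TO_STORAGE_TIMEOUT_SEC
    timeout_sec = int(preference["syncTimeout"])
    targets = {}
    for data_type, settings in preference.get("data", {}).items():
        # Drop a trailing slash so the joined path has exactly one separator
        prefix = settings["storagePathPrefix"].rstrip("/")
        targets[data_type] = f"{prefix}/{run_id}"
    return timeout_sec, targets


# Illustrative preference value; the data type name and timeout are made up
EXAMPLE_PREFERENCE = json.dumps({
    "syncTimeout": 60,
    "data": {
        "nextflow-trace": {"storagePathPrefix": "s3://bucket/prefix"}
    },
})

if __name__ == "__main__":
    timeout, targets = resolve_sync_targets(EXAMPLE_PREFERENCE, 12345)
    print(timeout, targets)
```

The resolved destinations could then be handed to the sync_to_storage functionality, with the timeout exported as CP_SYNC_TO_STORAGE_TIMEOUT_SEC; how that hand-off is wired is left open here.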

The resulting scheme would look like this:
[Image: Untitled Diagram drawio]
