Containerize Flexpart and Automate S3 Data Handling for GRIB Input #10

Open
wants to merge 19 commits into
base: main

Collaborator

@ninaburg ninaburg commented Nov 1, 2024

Purpose

This pull request aims to achieve the following:

  1. Containerize the Flexpart Application: The Flexpart application has been containerized to streamline deployment and execution. Jenkins is set up to build this container image automatically and push it to both NEXUS and AWS ECR repositories.

  2. Add Job Automation Scripts:

  • Data Download Script: A script has been added to download the necessary input GRIB files from an S3 bucket. This input data is required to run Flexpart.
  • Flexpart Job Launch Script: A script to initiate the Flexpart job inside the container.
  • Data Upload Script: After processing, the output data will be uploaded to a specified S3 bucket.
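The three steps above can be sketched as one small driver. This is only an illustration and not the code in this PR: `run_job` and its parameters are hypothetical names, and the download and upload steps are passed in as callables so the ordering can be exercised without S3 access.

```python
import subprocess
from pathlib import Path
from typing import Callable, Sequence


def run_job(
    download: Callable[[Path], None],
    command: Sequence[str],
    upload: Callable[[Path], None],
    input_dir: Path,
    output_dir: Path,
) -> None:
    """Download inputs, run the model, then upload outputs, in that order."""
    input_dir.mkdir(parents=True, exist_ok=True)
    output_dir.mkdir(parents=True, exist_ok=True)

    download(input_dir)
    # check=True raises CalledProcessError on a non-zero exit code,
    # so a failed run never reaches the upload step.
    subprocess.run(list(command), check=True)
    upload(output_dir)
```

Passing the model invocation as an argument list keeps the sketch independent of how Flexpart is actually started inside the container.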

Notes

  • Most of the code in this PR is adapted from the fdb branch in this repository, with the fdb option removed.

ARG container_registry=dockerhub.apps.cp.meteoswiss.ch
ARG base_tag=v0.2.1

FROM ${container_registry}/flexpart-poc/spack-dependencies:${base_tag} as intermediate
Collaborator

If we depend on another image, this repo should also include the Dockerfile for that image, e.g. Dockerfile.base. Alternatively, we could merge the two into one Dockerfile, but that would take much longer to build (the base only needs to be rebuilt every now and then).

Collaborator

Also, I would rename that base image so that it does not include the word 'poc'.

jenkins/test.sh Outdated
wget -q https://nexus.meteoswiss.ch/nexus/repository/app-artifacts-mch/nwp-rzplus/flexpart-poc/flexpart/IGBP_int1.dat


podman run --name flexpart-container-test-$BRANCH_NAME -v ${data_dir}:/data:ro $IMAGE_INTERN --fdb
Collaborator

Does this --fdb flag do anything? It should be removed.

Collaborator Author

Removed

IBTIME: 000006
IEDATE: 20230918
IETIME: 000010
FDBFLAG: 1
Collaborator

Should be removed

Suggested change
FDBFLAG: 1

}
}
}
stage('Test container') {
Collaborator

Build and test should run before the Deploy stage. I would copy the layout from here: https://github.com/C2SM-RCM/flexpart/blob/main/Jenkinsfile

&& apt-get -yqq install --no-install-recommends \
zlib1g && spack external find

RUN spack env activate -p spack-env && spack concretize && spack install
Collaborator

This will always install the main branch of the image, which may be what you want. If we want to always build the current commit in Jenkins, though, I would do something like this: https://github.com/C2SM-RCM/flexpart/blob/main/Dockerfile#L19-L25

from yaml import Loader, Dumper


def download_file(obj, file):
Collaborator

It would be neater to have a new file called s3_utils.py containing the contents of upload_s3.py, this function, and anything else S3-related; upload_s3.py could then be removed.

Collaborator

As mentioned above, I would restructure the Python code into a library with separate functions rather than one script. That would also make it much easier to add pytest tests.
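One possible shape for such an s3_utils.py is sketched below. It is not the code in this PR: the function names (`list_keys`, `download_prefix`, `upload_directory`) are hypothetical, and the S3 client is passed in as an argument. It assumes the client exposes the boto3 S3 client methods `list_objects_v2`, `download_file`, and `upload_file`, so a test can substitute a small fake object for the real client.

```python
"""s3_utils.py: hypothetical layout for the S3 helpers discussed above."""
from pathlib import Path


def list_keys(client, bucket, prefix=""):
    """Return every object key under prefix, following pagination."""
    keys = []
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        page = client.list_objects_v2(**kwargs)
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
        if not page.get("IsTruncated"):
            return keys
        kwargs["ContinuationToken"] = page["NextContinuationToken"]


def download_prefix(client, bucket, prefix, dest_dir):
    """Download all objects under prefix into dest_dir; return local paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    paths = []
    for key in list_keys(client, bucket, prefix):
        if key.endswith("/"):  # skip "directory" placeholder objects
            continue
        target = dest / Path(key).name
        client.download_file(bucket, key, str(target))
        paths.append(target)
    return paths


def upload_directory(client, src_dir, bucket, prefix=""):
    """Upload every file below src_dir; return the keys that were written."""
    src = Path(src_dir)
    keys = []
    for path in sorted(p for p in src.rglob("*") if p.is_file()):
        key = f"{prefix}{path.relative_to(src).as_posix()}"
        client.upload_file(str(path), bucket, key)
        keys.append(key)
    return keys
```

In production the caller would pass `boto3.client("s3")`; in a pytest test, any object with the same three methods is enough, so the tests need neither credentials nor network access.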

spack:
# add package specs to the `specs` list
specs:
- flexpart-ifs@main
Collaborator

Here is an example of using spack develop to build from a local source (if we want to support building the current commit in CI/CD):

spack:
  specs:
  - flexpart-cosmo@main
  config:
    install_tree:
      root: /home/vcherkas/dev/flexpart-cosmo-sandbox/spack-root
  develop:
    flexpart-cosmo:
      path: /home/vcherkas/dev/flexpart-cosmo-sandbox/flexpart
      spec: flexpart-cosmo@=main

See also its use in CI/CD here: https://github.com/C2SM-RCM/flexpart/blob/main/Dockerfile.base#L91

NAMESPACE = "numericalweatherpredictions/dispersionmodelling/flexpart-ifs/flexpart-containerize"
IMAGE_INTERN = "docker-intern-nexus.meteoswiss.ch/$NAMESPACE:$TAG"
IMAGE_PUBLIC = "docker-public-nexus.meteoswiss.ch/$NAMESPACE:$TAG"
REPO_ECR = "493666016161.dkr.ecr.eu-central-2.amazonaws.com"
Collaborator

As discussed, I would not expose the AWS account ID. I would fetch it from Vault.
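A minimal sketch of that idea, assuming the CI job reads the secret from Vault and exposes it to the build as an environment variable. The variable name `AWS_ACCOUNT_ID` and the helper `ecr_registry` are hypothetical; the Vault lookup itself is not shown. Only the registry host format and the 12-digit account ID are standard.

```python
import os


def ecr_registry(region, env=None):
    """Build the ECR registry host from an account ID supplied at runtime."""
    env = os.environ if env is None else env
    # AWS_ACCOUNT_ID is an assumed variable name, injected by the CI job.
    account_id = env.get("AWS_ACCOUNT_ID", "")
    if not (account_id.isdigit() and len(account_id) == 12):
        raise RuntimeError("AWS_ACCOUNT_ID is missing or not a 12-digit ID")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"
```

With this approach the repository only contains the region and the variable name, and the job fails early if the secret was not injected.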

script: 'if [ "$BRANCH_NAME" = "main" ]; then echo -n 1; elif [ -n "${TAG_NAME+1}" ]; then echo -n 1; else echo -n 0; fi'
)

NAMESPACE = "numericalweatherpredictions/dispersionmodelling/flexpart-ifs/flexpart-containerize"
Collaborator

Suggested change
NAMESPACE = "numericalweatherpredictions/dispersionmodelling/flexpart-ifs/flexpart-containerize"
NAMESPACE = "numericalweatherpredictions/dispersionmodelling/flexpart-ifs/flexpart"

The name flexpart is already descriptive enough. Or is there already an image with that name?
Alternatively, we could add ewc to the $TAG.

Collaborator Author

@ninaburg ninaburg Nov 27, 2024

No, there isn’t another flexpart-ifs/flexpart image, so I switched to that name to maintain consistency with the flexpart-cosmo naming convention.

@ninaburg ninaburg changed the base branch from containerize to main November 26, 2024 12:52