-
Notifications
You must be signed in to change notification settings - Fork 178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Cosmos enable_cache setting #1025
Conversation
During refactoring, it was changed to a non-boolean, not allowing users to disable the cache in Cosmos using AIRFLOW__COSMOS__ENABLE_CACHE=0
✅ Deploy Preview for sunny-pastelito-5ecb04 canceled.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing it!
As of Cosmos 1.4.1, users are not able to disable the cache in Cosmos using `AIRFLOW__COSMOS__ENABLE_CACHE=0`. During a recent refactoring #975, the `enable_cache` was changed to a non-boolean config. This PR fixes this issue.
Bug fixes * Fix the invocation mode for ``ExecutionMode.VIRTUALENV`` by @marco9663 in #1023 * Fix Cosmos ``enable_cache`` setting by @tatiana in #1025 * Make ``GoogleCloudServiceAccountDictProfileMapping`` dataset profile arg optional by @oliverrmaa and @pankajastro in #839 and #1017 * Athena profile mapping set ``aws_session_token`` in profile only if it exists by @pankajastro in #1022 Others * Update dbt and Airflow conflicts matrix by @tatiana in #1026 * Enable Python 3.12 unittest by @pankajastro in #1018 * Improve error logging in ``DbtLocalBaseOperator`` by @davidsteinar in #1004 * Add GitHub issue templates for bug reports and feature request by @pankajkoti in #1009 * Add more fields in bug template to reduce turnaround in issue triaging by @pankajkoti in #1027 * Fix ``dev/Dockerfile`` + Add ``uv pip install`` for faster build time by @dwreeves in #997 * Drop support for Airflow 2.3 by @pankajkoti in #994 * Update Astro Runtime image by @RNHTTR in #988 and #989 * Enable ruff F linting by @pankajastro in #985 * Move Cosmos Airflow configuration to settings.py by @pankajastro in #975 * Fix CI Issues by @tatiana in #1005 * Pre-commit hook updates in #1000, #1019 --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Reeves <31971762+dwreeves@users.noreply.github.com> Co-authored-by: Pankaj Koti <pankajkoti699@gmail.com> Co-authored-by: davidsteinar <davidsteinar@gmail.com> Co-authored-by: David Steinar Asgrimsson <david-steinar.asgrimsson@kommuninvest.se> Co-authored-by: Marco Yuen <33394715+marco9663@users.noreply.github.com> Co-authored-by: Pankaj Singh <98807258+pankajastro@users.noreply.github.com> Co-authored-by: Ollie Ma <oliver.zheyi.ma@gmail.com>
@tatiana Does |
@maslick There was an issue in the |
@tatiana I understand. The reason I am asking is that we're using |
@maslick In Cosmos 1.4.1, if you want to disable the cache, you can do this via
If you check this PR, we fixed how we retrieve the envvar to use Airflow
In Cosmos 1.4.1, this was implemented From Cosmos 1.4.2, both Did you have any issues with cache that made you want to switch off this feature? |
@tatiana our issue with cache enabled was that it quickly fills up the 30 GB storage volume on MWAA workers. We are running a large number of DAGs (>500) with lots of dbt models/transformations... We can't increase the storage volume on MWAA either. That's why we decided to disable cache entirely. Of course, this would increase the parsing times... |
@tatiana With cache disabled it's no longer possible to do partial parsing from a supplied manifest. Is this intended behavior and or is there some config that would need to change to get it working? |
@maslick this issue will hopefully be resolved once #927 is implemented. I'm sorry that the cache had such a massive impact on your deployment - we weren't expecting for that to take 30 GB volume storage. It's a shame MWAA doesn't handle better temp files. @josefbits I believe this is unintended behaviour. Please open an issue using https://github.com/astronomer/astronomer-cosmos/issues, and we'll follow up on this ASAP. Contributions are also welcome; it should be a small fix. |
@tatiana Not sure if this is the right place to continue this discussion, but with regards to
Is there some specific reason that you need to cache multiple copies of the manifests? My best guess was in case of write locks by multiple Airflow tasks. Right now Cosmos is caching once per DbtTaskGroup and for projects with a large number of models, dags, task groups etc it can become problematic on managed Airflow solutions. Speaking of dbt parsing specifically, I've done a bunch of comparisons on the manifest.json before and after run, with partial parse enabled/disabled, and different methods to change the dbt state. From what I've seen the only thing that really changes is metadata like created_at, raw_code, and compile/build paths. This is, of course, assuming parity between CI build environment and run time, and that one is not doing something strange that might change the model code or dependencies. Just looking for a bit more insight :) |
@josefbits, thanks for your insights! The issue we're discussing was not introduced by this PR, which was already merged last week - so new discussions may get lost. |
As of Cosmos 1.4.1, users are not able to disable the cache in Cosmos using `AIRFLOW__COSMOS__ENABLE_CACHE=0`. During a recent refactoring astronomer#975, the `enable_cache` was changed to a non-boolean config. This PR fixes this issue.
Bug fixes * Fix the invocation mode for ``ExecutionMode.VIRTUALENV`` by @marco9663 in astronomer#1023 * Fix Cosmos ``enable_cache`` setting by @tatiana in astronomer#1025 * Make ``GoogleCloudServiceAccountDictProfileMapping`` dataset profile arg optional by @oliverrmaa and @pankajastro in astronomer#839 and astronomer#1017 * Athena profile mapping set ``aws_session_token`` in profile only if it exists by @pankajastro in astronomer#1022 Others * Update dbt and Airflow conflicts matrix by @tatiana in astronomer#1026 * Enable Python 3.12 unittest by @pankajastro in astronomer#1018 * Improve error logging in ``DbtLocalBaseOperator`` by @davidsteinar in astronomer#1004 * Add GitHub issue templates for bug reports and feature request by @pankajkoti in astronomer#1009 * Add more fields in bug template to reduce turnaround in issue triaging by @pankajkoti in astronomer#1027 * Fix ``dev/Dockerfile`` + Add ``uv pip install`` for faster build time by @dwreeves in astronomer#997 * Drop support for Airflow 2.3 by @pankajkoti in astronomer#994 * Update Astro Runtime image by @RNHTTR in astronomer#988 and astronomer#989 * Enable ruff F linting by @pankajastro in astronomer#985 * Move Cosmos Airflow configuration to settings.py by @pankajastro in astronomer#975 * Fix CI Issues by @tatiana in astronomer#1005 * Pre-commit hook updates in astronomer#1000, astronomer#1019 --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Reeves <31971762+dwreeves@users.noreply.github.com> Co-authored-by: Pankaj Koti <pankajkoti699@gmail.com> Co-authored-by: davidsteinar <davidsteinar@gmail.com> Co-authored-by: David Steinar Asgrimsson <david-steinar.asgrimsson@kommuninvest.se> Co-authored-by: Marco Yuen <33394715+marco9663@users.noreply.github.com> Co-authored-by: Pankaj Singh <98807258+pankajastro@users.noreply.github.com> Co-authored-by: Ollie Ma <oliver.zheyi.ma@gmail.com>
As of Cosmos 1.4.1, users are not able to disable the cache in Cosmos using
AIRFLOW__COSMOS__ENABLE_CACHE=0
.During a recent refactoring #975, the
enable_cache
was changed to a non-boolean config. This PR fixes this issue.