Per segment chunks (#8272)

### Motivation and context

- Changed chunk generation from per-task chunks to per-segment chunks
- Fixed a memory leak in video reading on the server side (only in media_extractors, so there are several more left)
- Fixed a potential hang in the `import` worker or the server process on process shutdown
- Disabled multithreading in video reading in endpoints (not in static chunk generation)
- Refactored static chunk generation code (moved after job creation)
- Refactored various server internal APIs for frame retrieval
- Updated UI logic to access chunks, added support for non-sequential frames in chunks
- Added a new server configuration option `CVAT_ALLOW_STATIC_CACHE` (boolean) to enable and disable static cache support. The option is disabled by default (this is a change from the previous behavior)
- Added tests for the changes made
- Added the missing original chunk type field in job responses
- Fixed invalid kvrocks cleanup in tests for Helm deployment
- Added a new 0-based `index` parameter in `GET /api/jobs/{id}/data/?type=chunk` to simplify indexing
- GT job chunks with non-sequential frames have no placeholders inside

When this update is applied to the server, there will be a data storage setting migration for the tasks. Existing tasks using static chunks (`task.data.storage_method == FILE_SYSTEM`) will be switched to the dynamic cache (i.e. to `== CACHE`). The remaining files should be removed manually; the migration log file will contain a list of such tasks.

After this update, you'll have an option to enable or disable static cache use during task creation. In particular, this makes it possible to prohibit new tasks from using the static cache. When the static cache is disabled, any tasks using the static cache will use the dynamic cache instead on data access.

User-observable changes:

- Job chunk ids now start from 0 for each job instead of using parent task ids
- The `use_cache = false` or `storage_method = filesystem` parameters in task creation can be ignored by the server
- Task chunk access may be slower for some chunks (particularly, for tasks with overlap configured, for chunks on segment boundaries, and for tasks previously using static chunks)
- The last chunk in a job will contain only the frames from the current job, even if there are more frames in the task

### How has this been tested?

### Checklist

- [ ] I submit my changes into the `develop` branch
- [ ] I have created a changelog fragment
- [ ] I have updated the documentation accordingly
- [ ] I have added tests to cover my changes
- [ ] I have linked related issues (see [GitHub docs](https://help.github.com/en/github/managing-your-work-on-github/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword))
- [ ] I have increased versions of npm packages if it is necessary ([cvat-canvas](https://github.com/cvat-ai/cvat/tree/develop/cvat-canvas#versioning), [cvat-core](https://github.com/cvat-ai/cvat/tree/develop/cvat-core#versioning), [cvat-data](https://github.com/cvat-ai/cvat/tree/develop/cvat-data#versioning) and [cvat-ui](https://github.com/cvat-ai/cvat/tree/develop/cvat-ui#versioning))

### License

- [ ] I submit _my code changes_ under the same [MIT License](https://github.com/cvat-ai/cvat/blob/develop/LICENSE) that covers the project. Feel free to contact the maintainers if that's a concern.

## Summary by CodeRabbit

- **New Features**
  - Introduced a new server setting to disable media chunks on the local filesystem.
  - Enhanced frame prefetching with a `startFrame` parameter for improved chunk calculations.
  - Added a new property, `data_original_chunk_type`, for enhanced job differentiation in the metadata.
- **Bug Fixes**
  - Resolved memory management issues to prevent leaks during video processing.
  - Corrected naming inconsistencies related to the `prefetchAnalyzer`.
- **Documentation**
  - Included configuration for code formatting tools to ensure consistent code quality across the project.
- **Refactor**
  - Restructured classes and methods for improved clarity and maintainability, particularly in media handling and task processing.
- **Chores**
  - Updated formatting scripts to include additional directories for automated code formatting.
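
To illustrate the per-job chunk numbering described in the commit message, here is a minimal Python sketch. It is not taken from the CVAT code base: the helper names (`split_job_into_chunks`, `job_chunk_index`, `job_chunk_url`), the host, and the sample frame range are assumptions made for illustration, and any other query parameters the real endpoint may require are omitted. It assumes a job is a list of task frame numbers and that chunks hold a fixed number of frames.

```python
from urllib.parse import urlencode


def split_job_into_chunks(job_frames, chunk_size):
    """Split a job's frames into chunks that never cross the job boundary.

    Hypothetical helper: the last chunk holds only this job's remaining
    frames, even if the task has more frames after them.
    """
    return [
        job_frames[start:start + chunk_size]
        for start in range(0, len(job_frames), chunk_size)
    ]


def job_chunk_index(job_frames, frame, chunk_size):
    """Return the 0-based chunk index of `frame` within its job.

    The index depends on the frame's position inside the job, not on its
    absolute number in the task, so non-sequential frame lists also work.
    """
    return job_frames.index(frame) // chunk_size


def job_chunk_url(base_url, job_id, index):
    """Build a chunk request URL using the `type` and `index` parameters."""
    query = urlencode({"type": "chunk", "index": index})
    return f"{base_url}/api/jobs/{job_id}/data/?{query}"


# Assumed example: a job covering task frames 10..24 with 6 frames per chunk.
frames = list(range(10, 25))
chunks = split_job_into_chunks(frames, chunk_size=6)
index = job_chunk_index(frames, frame=17, chunk_size=6)
url = job_chunk_url("http://localhost:8080", job_id=5, index=index)
```

In this sketch, frame 17 falls into job chunk 1, whereas numbering chunks across the whole task with the same chunk size would place it in chunk `17 // 6 == 2`. That difference is what the new `index` parameter is meant to hide from clients.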
cvat-ai · Sep 24, 2024 · c9754a9
1 parent f227718
Showing 50 changed files with 3,516 additions and 1,373 deletions.
diff --git a/.github/workflows/full.yml b/.github/workflows/full.yml
@@ -165,6 +165,8 @@ jobs:
         id: run_tests
         run: |
           pytest tests/python/
+          ONE_RUNNING_JOB_IN_QUEUE_PER_USER="true" pytest tests/python/rest_api/test_queues.py
+          CVAT_ALLOW_STATIC_CACHE="true" pytest -k "TestTaskData" tests/python
 
       - name: Creating a log file from cvat containers
         if: failure() && steps.run_tests.conclusion == 'failure'

diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
@@ -177,8 +177,9 @@ jobs:
           COVERAGE_PROCESS_START: ".coveragerc"
         run: |
           pytest tests/python/ --cov --cov-report=json
-          for COVERAGE_FILE in `find -name "coverage*.json" -type f -printf "%f\n"`; do mv ${COVERAGE_FILE} "${COVERAGE_FILE%%.*}_0.json"; done
           ONE_RUNNING_JOB_IN_QUEUE_PER_USER="true" pytest tests/python/rest_api/test_queues.py --cov --cov-report=json
+          CVAT_ALLOW_STATIC_CACHE="true" pytest -k "TestTaskData" tests/python --cov --cov-report=json
+          for COVERAGE_FILE in `find -name "coverage*.json" -type f -printf "%f\n"`; do mv ${COVERAGE_FILE} "${COVERAGE_FILE%%.*}_0.json"; done
 
       - name: Uploading code coverage results as an artifact
         uses: actions/upload-artifact@v4

diff --git a/.github/workflows/schedule.yml b/.github/workflows/schedule.yml
@@ -170,6 +170,12 @@ jobs:
           pytest tests/python/
           pytest tests/python/ --stop-services
 
+          ONE_RUNNING_JOB_IN_QUEUE_PER_USER="true" pytest tests/python/rest_api/test_queues.py
+          pytest tests/python/ --stop-services
+
+          CVAT_ALLOW_STATIC_CACHE="true" pytest tests/python
+          pytest tests/python/ --stop-services
+
       - name: Unit tests
         env:
           HOST_COVERAGE_DATA_DIR: ${{ github.workspace }}

diff --git a/changelog.d/20240812_161617_mzhiltso_job_chunks.md b/changelog.d/20240812_161617_mzhiltso_job_chunks.md
@@ -0,0 +1,24 @@
+### Added
+
+- A server setting to enable or disable storage of permanent media chunks on the server filesystem
+  (<https://github.com/cvat-ai/cvat/pull/8272>)
+- \[Server API\] `GET /api/jobs/{id}/data/?type=chunk&index=x` parameter combination.
+  The new `index` parameter allows to retrieve job chunks using 0-based index in each job,
+  instead of the `number` parameter, which used task chunk ids.
+  (<https://github.com/cvat-ai/cvat/pull/8272>)
+
+### Changed
+
+- Job assignees will not receive frames from adjacent jobs in chunks
+  (<https://github.com/cvat-ai/cvat/pull/8272>)
+
+### Deprecated
+
+- \[Server API\] `GET /api/jobs/{id}/data/?type=chunk&number=x` parameter combination
+  (<https://github.com/cvat-ai/cvat/pull/8272>)
+
+
+### Fixed
+
+- Various memory leaks in video reading on the server
+  (<https://github.com/cvat-ai/cvat/pull/8272>)