Per segment chunks (#8272)

### Motivation and context

- Changed chunk generation from per-task chunks to per-segment chunks
- Fixed a memory leak in video reading on the server side (only in
`media_extractors`; several other leaks remain)
- Fixed a potential hang in the `import` worker or the server process on
process shutdown
- Disabled multithreading in video reading in endpoints (not in static
chunk generation)
- Refactored static chunk generation code (moved after job creation)
- Refactored various server internal APIs for frame retrieval
- Updated UI logic to access chunks, added support for non-sequential
frames in chunks
- Added a new server configuration option `CVAT_ALLOW_STATIC_CACHE`
(boolean) to enable or disable static cache support. The option is
disabled by default, which is a change from the previous behavior
- Added tests for the changes made
- Added missing original chunk type field in job responses
- Fixed invalid kvrocks cleanup in tests for Helm deployment
- Added a new 0-based `index` parameter in `GET
/api/jobs/{id}/data/?type=chunk` to simplify indexing
  - GT job chunks with non-sequential frames have no placeholders inside
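
As an illustration of the new `index` parameter, here is a minimal sketch that builds the request URL for a job chunk. The helper name `job_chunk_url` and the host are hypothetical, and any other query parameters the endpoint may require are omitted; only the path and the `type=chunk` / `index` parameters come from the description above.

```python
from urllib.parse import urlencode


def job_chunk_url(base_url: str, job_id: int, index: int) -> str:
    """Build the URL of a job chunk addressed by its 0-based index in the job.

    Hypothetical helper for illustration; it is not part of the CVAT SDK.
    """
    if index < 0:
        raise ValueError("index is 0-based and must be non-negative")
    query = urlencode({"type": "chunk", "index": index})
    return f"{base_url.rstrip('/')}/api/jobs/{job_id}/data/?{query}"


# The host and job id below are placeholders.
print(job_chunk_url("http://localhost:8080", 42, 0))
# http://localhost:8080/api/jobs/42/data/?type=chunk&index=0
```

With `index`, a client can address the first chunk of any job as `0` without knowing where the job starts in the parent task.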

When this update is applied to the server, a data storage setting
migration runs for existing tasks. Tasks using static chunks
(`task.data.storage_method == FILE_SYSTEM`) are switched to the
dynamic cache (i.e. to `== CACHE`). The remaining static chunk files
must be removed manually; the migration log file lists the affected
tasks.

After this update, the server has an option to allow or disallow static
cache use during task creation. In particular, this makes it possible to
prohibit new tasks from using the static cache. With static cache
disallowed, any task configured to use the static cache will use the
dynamic cache instead on data access.
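
The fallback described above can be sketched as follows. This is only an illustration of the rule, not the server's actual implementation: the `StorageMethod` string values, the set of accepted "true" spellings, and both function names are assumptions. Only the option name `CVAT_ALLOW_STATIC_CACHE`, its disabled default, and the `FILE_SYSTEM` / `CACHE` storage methods come from the description.

```python
from enum import Enum
from typing import Mapping


class StorageMethod(str, Enum):
    # The member names follow the description; the string values are assumed.
    FILE_SYSTEM = "file_system"  # static chunks stored on the server filesystem
    CACHE = "cache"              # chunks produced on demand (dynamic cache)


def allow_static_cache(env: Mapping[str, str]) -> bool:
    """Read the boolean option; it is disabled when unset (assumed parsing)."""
    value = env.get("CVAT_ALLOW_STATIC_CACHE", "false")
    return value.strip().lower() in {"1", "true", "yes", "on"}


def effective_storage_method(requested: StorageMethod, allowed: bool) -> StorageMethod:
    """Fall back to the dynamic cache when static cache use is not allowed."""
    if requested is StorageMethod.FILE_SYSTEM and not allowed:
        return StorageMethod.CACHE
    return requested


# With the option unset, a request for static chunks falls back to the cache.
print(effective_storage_method(StorageMethod.FILE_SYSTEM, allow_static_cache({})).name)
# CACHE
```

Passing the environment as a mapping (for example `os.environ`) keeps the sketch easy to test without changing process state.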

User-observable changes:
- Job chunk ids now start from 0 for each job instead of using parent
task ids
- The `use_cache = false` or `storage_method = filesystem` parameters in
task creation can be ignored by the server
- Task chunk access may be slower for some chunks (particularly, for
tasks with overlap configured, for chunks on segment boundaries, and for
tasks previously using static chunks)
- The last chunk in a job will contain only the frames from the current
job, even if there are more frames in the task
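
The per-job chunk numbering can be pictured with a small sketch. It assumes chunks are formed by splitting a job's own frame list into fixed-size groups; the function name `split_job_frames` and the chunk size are hypothetical, and the real server logic may differ in detail.

```python
from typing import List, Sequence


def split_job_frames(frames: Sequence[int], chunk_size: int) -> List[List[int]]:
    """Split one job's frame numbers into chunks; the list position is the chunk index.

    Only the job's own frames are used, so the last chunk may be shorter,
    and non-sequential frame lists get no placeholder entries.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [list(frames[i:i + chunk_size]) for i in range(0, len(frames), chunk_size)]


# A job covering task frames 10..21 with a chunk size of 5:
# chunk indices start at 0 and the last chunk holds only frames 20 and 21.
print(split_job_frames(list(range(10, 22)), 5))
# [[10, 11, 12, 13, 14], [15, 16, 17, 18, 19], [20, 21]]

# A job with non-sequential frames (such as a GT job): no placeholders are added.
print(split_job_frames([3, 7, 20, 41, 42], 2))
# [[3, 7], [20, 41], [42]]
```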

### How has this been tested?

### Checklist
- [ ] I submit my changes into the `develop` branch
- [ ] I have created a changelog fragment <!-- see top comment in
CHANGELOG.md -->
- [ ] I have updated the documentation accordingly
- [ ] I have added tests to cover my changes
- [ ] I have linked related issues (see [GitHub docs](https://help.github.com/en/github/managing-your-work-on-github/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword))
- [ ] I have increased versions of npm packages if it is necessary
  ([cvat-canvas](https://github.com/cvat-ai/cvat/tree/develop/cvat-canvas#versioning),
  [cvat-core](https://github.com/cvat-ai/cvat/tree/develop/cvat-core#versioning),
  [cvat-data](https://github.com/cvat-ai/cvat/tree/develop/cvat-data#versioning)
  and [cvat-ui](https://github.com/cvat-ai/cvat/tree/develop/cvat-ui#versioning))

### License

- [ ] I submit _my code changes_ under the same
  [MIT License](https://github.com/cvat-ai/cvat/blob/develop/LICENSE)
  that covers the project.
  Feel free to contact the maintainers if that's a concern.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
  - Introduced a new server setting to disable media chunks on the local
    filesystem.
  - Enhanced frame prefetching with a `startFrame` parameter for improved
    chunk calculations.
  - Added a new property, `data_original_chunk_type`, for enhanced job
    differentiation in the metadata.

- **Bug Fixes**
  - Resolved memory management issues to prevent leaks during video
    processing.
  - Corrected naming inconsistencies related to the `prefetchAnalyzer`.

- **Documentation**
  - Included configuration for code formatting tools to ensure consistent
    code quality across the project.

- **Refactor**
  - Restructured classes and methods for improved clarity and
    maintainability, particularly in media handling and task processing.

- **Chores**
  - Updated formatting scripts to include additional directories for
    automated code formatting.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
zhiltsov-max authored Sep 24, 2024
1 parent f227718 commit c9754a9
Showing 50 changed files with 3,516 additions and 1,373 deletions.
2 changes: 2 additions & 0 deletions .github/workflows/full.yml
@@ -165,6 +165,8 @@ jobs:
id: run_tests
run: |
pytest tests/python/
ONE_RUNNING_JOB_IN_QUEUE_PER_USER="true" pytest tests/python/rest_api/test_queues.py
CVAT_ALLOW_STATIC_CACHE="true" pytest -k "TestTaskData" tests/python
- name: Creating a log file from cvat containers
if: failure() && steps.run_tests.conclusion == 'failure'
3 changes: 2 additions & 1 deletion .github/workflows/main.yml
@@ -177,8 +177,9 @@ jobs:
COVERAGE_PROCESS_START: ".coveragerc"
run: |
pytest tests/python/ --cov --cov-report=json
for COVERAGE_FILE in `find -name "coverage*.json" -type f -printf "%f\n"`; do mv ${COVERAGE_FILE} "${COVERAGE_FILE%%.*}_0.json"; done
ONE_RUNNING_JOB_IN_QUEUE_PER_USER="true" pytest tests/python/rest_api/test_queues.py --cov --cov-report=json
CVAT_ALLOW_STATIC_CACHE="true" pytest -k "TestTaskData" tests/python --cov --cov-report=json
for COVERAGE_FILE in `find -name "coverage*.json" -type f -printf "%f\n"`; do mv ${COVERAGE_FILE} "${COVERAGE_FILE%%.*}_0.json"; done
- name: Uploading code coverage results as an artifact
uses: actions/upload-artifact@v4
6 changes: 6 additions & 0 deletions .github/workflows/schedule.yml
@@ -170,6 +170,12 @@ jobs:
pytest tests/python/
pytest tests/python/ --stop-services
ONE_RUNNING_JOB_IN_QUEUE_PER_USER="true" pytest tests/python/rest_api/test_queues.py
pytest tests/python/ --stop-services
CVAT_ALLOW_STATIC_CACHE="true" pytest tests/python
pytest tests/python/ --stop-services
- name: Unit tests
env:
HOST_COVERAGE_DATA_DIR: ${{ github.workspace }}
24 changes: 24 additions & 0 deletions changelog.d/20240812_161617_mzhiltso_job_chunks.md
@@ -0,0 +1,24 @@
### Added

- A server setting to enable or disable storage of permanent media chunks on the server filesystem
(<https://github.com/cvat-ai/cvat/pull/8272>)
- \[Server API\] `GET /api/jobs/{id}/data/?type=chunk&index=x` parameter combination.
The new `index` parameter allows to retrieve job chunks using 0-based index in each job,
instead of the `number` parameter, which used task chunk ids.
(<https://github.com/cvat-ai/cvat/pull/8272>)

### Changed

- Job assignees will not receive frames from adjacent jobs in chunks
(<https://github.com/cvat-ai/cvat/pull/8272>)

### Deprecated

- \[Server API\] `GET /api/jobs/{id}/data/?type=chunk&number=x` parameter combination
(<https://github.com/cvat-ai/cvat/pull/8272>)


### Fixed

- Various memory leaks in video reading on the server
(<https://github.com/cvat-ai/cvat/pull/8272>)