Custom chunked upload with custom progress #457

Open
gsemet opened this issue Oct 22, 2024 · 1 comment


gsemet commented Oct 22, 2024

Hello,

I would be interested in having the equivalent of writeto but for upload. I would like to implement a tqdm progress bar for long uploads, and for that I need either not to be blocked during deploy_file, or a callback mechanism.

Is this something that is possible to do right now?

Here is my code for the download progress using tqdm:

from pathlib import Path

import tqdm
from artifactory import ArtifactoryPath

CHUNK_SIZE = 1024 * 1024  # 1 MB


def download_file_tqdm(
    source_file: ArtifactoryPath,
    target_file: Path,
    chunk_size: int = CHUNK_SIZE,
    progress_bar_desc: str = "Downloading",
):
    """
    Download a file from Artifactory with a tqdm progress bar.

    Args:
        source_file: The source file path in Artifactory.
        target_file: The target file path on the local filesystem. Parent folders are created.
        chunk_size: The size of each chunk to read at a time. Defaults to 1 MB.
        progress_bar_desc: The description of the progress bar. Defaults to "Downloading".

    Example:
        .. code-block:: python

            source = ArtifactoryPath("http://artifactory.example.com/path/to/file")
            target = Path("/local/path/to/file")
            download_file_tqdm(source, target)
    """

    class _TqdmArtifactoryPathWriteTo(tqdm.tqdm):
        def progress_func(self, bytes_now, total_size):
            if total_size is not None:
                self.total = total_size
            # update() expects an increment, so pass the bytes read since the
            # last callback (self.n is the running total tqdm has seen so far)
            return self.update(bytes_now - self.n)

    target_file.parent.mkdir(parents=True, exist_ok=True)
    with open(str(target_file), "wb") as target_fd:
        with _TqdmArtifactoryPathWriteTo(
            desc=progress_bar_desc,
            unit="B",
            unit_scale=True,
            unit_divisor=1024,
            miniters=1,
        ) as t:
            source_file.writeto(
                target_fd,
                chunk_size=chunk_size,
                progress_func=t.progress_func,
            )

Thanks!
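The tqdm subclass above works because writeto reports absolute byte counts while tqdm's update() takes an increment. The conversion can be illustrated with a stdlib-only stand-in (DeltaProgress is a hypothetical name for illustration, mimicking tqdm's .n and update()):

```python
class DeltaProgress:
    """Converts absolute (bytes_so_far, total) callbacks into the
    incremental updates that tqdm.update() expects."""

    def __init__(self):
        self.n = 0        # bytes reported so far (tqdm keeps this as .n)
        self.total = None

    def update(self, delta):
        self.n += delta

    def progress_func(self, bytes_now, total_size):
        if total_size is not None:
            self.total = total_size
        self.update(bytes_now - self.n)


bar = DeltaProgress()
for done in (100, 250, 600):
    bar.progress_func(done, 600)
assert bar.n == 600 and bar.total == 600
```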


gsemet commented Oct 23, 2024

Edit: I think I found a way:

def upload_file_tqdm(
    local_file_path: Path,
    remote_path: ArtifactoryPath,
    chunk_size: int = CHUNK_SIZE,
    progress_bar_desc: str = "Uploading",
):
    """
    Upload a file to Artifactory with a progress bar.

    Args:
        local_file_path: The local file path to upload.
        remote_path: The remote Artifactory path where the file will be uploaded. An existing
            file will be replaced.
        chunk_size: The size of each chunk to read at a time. Defaults to 1 MB.
        progress_bar_desc: The description of the progress bar. Defaults to "Uploading".

    Example:
        .. code-block:: python

            local_file = Path("/local/path/to/file")
            remote = ArtifactoryPath("http://artifactory.example.com/path/to/file")
            upload_file_tqdm(local_file, remote)
    """
    if not local_file_path.exists():
        raise FileNotFoundError(f"File {local_file_path} does not exist")

    filesize = local_file_path.stat().st_size
    if filesize == 0:
        # File is empty, just create an empty remote file
        remote_path.touch()
        return
    with local_file_path.open("rb") as fobj:
        with tqdm.tqdm(
            desc=progress_bar_desc,
            total=filesize,
            unit_scale=True,
            unit="B",
            unit_divisor=1024,
            miniters=1,
        ) as t:

            def iter_in_file():
                # read_in_chunks: helper that yields fixed-size chunks from the
                # file object (defined elsewhere)
                for chunk in read_in_chunks(
                    fobj,
                    chunk_size=chunk_size,
                ):
                    yield chunk
                    t.update(len(chunk))

            session = remote_path.session
            response = session.put(
                str(remote_path),
                data=iter_in_file(),
                stream=True,
            )
            response.raise_for_status()
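The snippet references a read_in_chunks helper without defining it. A minimal sketch of such a generator could look like this (the name and signature are assumptions taken from the call site, not a documented API):

```python
import io


def read_in_chunks(file_object, chunk_size=1024 * 1024):
    """Yield successive fixed-size chunks from a binary file object until EOF."""
    while True:
        chunk = file_object.read(chunk_size)
        if not chunk:
            break
        yield chunk


# Quick check against an in-memory file: 2500 bytes in 1000-byte chunks
chunks = list(read_in_chunks(io.BytesIO(b"x" * 2500), chunk_size=1000))
assert [len(c) for c in chunks] == [1000, 1000, 500]
```

Because requests treats a generator passed as data as a chunked transfer body, this keeps memory usage bounded by chunk_size regardless of file size.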
