
GCP cloud storage downloaded file corruption #2301


Description

@Cai-Chen

Hi, we recently hit an intermittent issue where the size of a file downloaded via the storage SDK differs from the object's size in GCP Cloud Storage. Our initial investigation pointed us here (code): when an exception is thrown, the retry does not update position, so data gets duplicated/corrupted.
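To make that failure mode concrete, here is a stdlib-only sketch (no GCS dependency; all class and method names are ours, not the SDK's) of what happens when a retry resumes from a position that was never advanced past bytes already handed to the caller: the partially delivered bytes are written once by the failed attempt and again by the retry.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

public class RetryDuplicationDemo {
    static final byte[] DATA = "ABCDEFGHIJKLMNOP".getBytes();

    /** Simulated remote source: on the first read that crosses byte 10 it
     *  copies a partial chunk into the caller's buffer and then throws,
     *  mimicking a socket timeout after some bytes were already delivered. */
    static class FlakySource {
        boolean thrown = false;

        int read(ByteBuffer dst, int position) throws IOException {
            if (position >= DATA.length) return -1;
            int want = Math.min(dst.remaining(), DATA.length - position);
            if (!thrown && position < 10 && position + want > 10) {
                dst.put(DATA, position, 10 - position); // partial delivery...
                thrown = true;
                throw new IOException("simulated timeout"); // ...then failure
            }
            dst.put(DATA, position, want);
            return want;
        }
    }

    /** Buggy reader: position is only advanced on a successful read, so the
     *  retry re-fetches bytes that already reached the output. */
    static byte[] download() throws IOException {
        FlakySource src = new FlakySource();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ByteBuffer buf = ByteBuffer.allocate(8);
        int position = 0;
        while (true) {
            buf.clear();
            int n;
            try {
                n = src.read(buf, position);
            } catch (IOException e) {
                n = 0; // retry, but position is NOT advanced past delivered bytes
            }
            if (n < 0) break;
            position += n;
            buf.flip();
            out.write(buf.array(), 0, buf.limit()); // partial bytes written too
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] got = download();
        System.out.println(new String(got)); // "IJ" appears twice
        System.out.println("expected " + DATA.length + " bytes, got " + got.length);
    }
}
```

The 16-byte input comes back as 18 bytes, with the chunk delivered just before the simulated timeout duplicated.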

We wrote a simple test to verify.

package test;

import com.google.auth.oauth2.GoogleCredentials;
import com.google.cloud.ReadChannel;
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.StorageOptions;
import org.testng.annotations.Test;
import org.testng.collections.Lists;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.nio.Buffer;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;

public class DownloadTest {
    @Test
    public void testDownload() throws Exception {
        GoogleCredentials credentials = GoogleCredentials.fromStream(new FileInputStream("/path/secret.json"))
                .createScoped(Lists.newArrayList("https://www.googleapis.com/auth/cloud-platform"));
        var storage = StorageOptions.newBuilder().setCredentials(credentials).build().getService();

        var bucket = "test-bucket";
        var name = "test.file";

        // original code wrapped this in an internal GCPRemoteObjectReference;
        // using the BlobId directly keeps the snippet self-contained
        var blobId = BlobId.of(bucket, name);

        final File localFilePath = Paths.get("/local/path/test.file").toFile();

        try (final ReadChannel inputChannel = storage.reader(blobId)) {
            localFilePath.getParentFile().mkdirs();
            try (FileChannel fileChannel = new FileOutputStream(localFilePath).getChannel()) {
                ByteBuffer bytes = ByteBuffer.allocate(64 * 1024);

                while (inputChannel.read(bytes) > 0) {
                    // cast to Buffer keeps the call compatible when compiled
                    // against newer JDKs for older bytecode targets
                    ((Buffer) bytes).flip();
                    fileChannel.write(bytes);
                    bytes.clear();
                }
            }
        }
    }

}
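A side note on the copy loop above, unrelated to the retry bug: a single FileChannel.write call is not guaranteed to consume the whole buffer, so a fully defensive copy drains the buffer in a loop. A small stdlib sketch (class and method names are ours):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class DrainWriteDemo {
    /** Writes the buffer's entire remaining content, looping because
     *  FileChannel.write may write fewer bytes than remaining(). */
    static void writeFully(FileChannel ch, ByteBuffer buf) throws IOException {
        while (buf.hasRemaining()) {
            ch.write(buf);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("drain", ".bin");
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
            writeFully(ch, ByteBuffer.wrap("hello".getBytes()));
        }
        System.out.println(Files.size(tmp)); // 5
        Files.delete(tmp);
    }
}
```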

We then set a breakpoint in java.nio.channels.Channels (screenshot omitted).

When debugging this test and hitting the breakpoint, we manually throw a java.net.SocketTimeoutException, then remove the breakpoint and resume the program to let it proceed. Afterwards we compare the local file size with the object size in the bucket (screenshot omitted).

I know this internal/hacky approach is not a perfect way to reproduce the issue, but it is just our first investigation, and the problem is hard to reproduce externally.
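Independent of the debugger trick, this kind of corruption can also be detected after the fact: GCS stores a CRC32C checksum for every object (Blob.getCrc32c() returns it as base64 of the big-endian 4-byte value), and JDK 9+ ships java.util.zip.CRC32C, so the local file can be checksummed and compared. A stdlib-only sketch (the class and method names are ours):

```java
import java.util.Base64;
import java.util.zip.CRC32C;

public class Crc32cCheck {
    /** CRC32C of data, encoded the way GCS reports it:
     *  base64 of the 4-byte big-endian checksum. */
    static String gcsStyleCrc32c(byte[] data) {
        CRC32C crc = new CRC32C();
        crc.update(data, 0, data.length);
        int v = (int) crc.getValue();
        byte[] be = { (byte) (v >>> 24), (byte) (v >>> 16), (byte) (v >>> 8), (byte) v };
        return Base64.getEncoder().encodeToString(be);
    }

    public static void main(String[] args) {
        // "123456789" is the standard CRC32C check input (0xE3069283)
        System.out.println(gcsStyleCrc32c("123456789".getBytes()));
    }
}
```

Comparing this value against the blob's stored checksum after download would flag duplicated bytes even in the unlucky case where the sizes happen to match.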

Could this be a false alarm?

Thanks.


Metadata

Labels

api: storage (Issues related to the googleapis/java-storage API)
priority: p1 (Important issue which blocks shipping the next release. Will be fixed prior to next release.)
type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)
