feat(storage/transfermanager): automatically shard downloads #10379
Conversation
Note that this is missing a few components still:
- checksums
- unit tests for DownloadBuffer
- more integration tests (error testing, transcoding...)

It does work as-is (tests pass).
Good start on this, a few initial comments...
requiredLength := int64(len(p)) + off

// Our buffer isn't big enough, let's grow it.
if int64(cap(db.bytes)) < requiredLength {
This seems like it could potentially grow by a lot of small increments; we should see whether that causes problems in profiling (and if so, maybe grow in bigger increments).
It's completely possible that that happens (it depends on partsize), but I think we also don't want to grow by a lot more than is needed. I was initially thinking to double it as append does, but that could more easily run into problems with large buffers.
To avoid a lot of small increments:
- users could pass in a buffer that's already close to the object size to NewDownloadBuffer, or
- we could schedule the last shard first, so the buffer grows to its full size from the beginning (but that has other potential performance implications), or
- we could give users the ability to configure how much the buffer grows by.
I think even if partsize is large, it might still be possible for many small writes to happen. But yeah, profiling is the way to diagnose whether this is an issue.
I think we can expose the growth rate as an exported field later, so we can merge as-is and add that if need be.
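To make that follow-up concrete, here is a minimal sketch of what an exported growth setting could look like. The names (GrowthFactor, grow) and the growth policy are hypothetical and not part of this PR; only the capacity check mirrors the diff above.

```go
// Hypothetical sketch only: a DownloadBuffer with a configurable growth
// factor. Names are illustrative, not taken from this PR.
type DownloadBuffer struct {
	bytes []byte

	// GrowthFactor controls how aggressively the buffer grows when a write
	// lands past the current capacity. 1 grows exactly to the required
	// length; 2 doubles the capacity, similar to append.
	GrowthFactor float64
}

// grow ensures the buffer capacity is at least requiredLength, over-allocating
// according to GrowthFactor to avoid many small reallocations.
func (db *DownloadBuffer) grow(requiredLength int64) {
	if int64(cap(db.bytes)) >= requiredLength {
		return
	}
	newCap := requiredLength
	if db.GrowthFactor > 1 {
		if scaled := int64(float64(cap(db.bytes)) * db.GrowthFactor); scaled > newCap {
			newCap = scaled
		}
	}
	grown := make([]byte, len(db.bytes), newCap)
	copy(grown, db.bytes)
	db.bytes = grown
}
```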
errs := []error{}
var shardOut *DownloadOutput
for i := 1; i < shards; i++ {
	// Add monitoring here? This could hang if any individual piece does.
By monitoring, do you mean metrics or some kind of goroutine to monitor this?
A goroutine, but it might not be necessary, as everything should cancel when the ctx does (so timeouts would stop this from hanging).
Okay, I think as long as we have a test for a hanging part then we can remove the comment.
I'm not sure there's an easy way to simulate a hanging part; besides, that would just be testing that a read request gets cancelled when its context is, which feels out of scope for the transfer manager.
Other than that, if we add some logic here then we could test it. Some options:
- extend the perOpTimeout and stop listening for pieces when done (start a timer with per_op_timeout * num_shards; see the sketch after this list)
- have a default timer for collecting all pieces (probably not this)
- check for ctx cancellation directly in here and stop when cancelled, but I don't like that because it would miss errors from shards that were just completing...
- a variation of the previous option: instead of stopping immediately on cancellation, start a timer for a couple of seconds once ctx cancellation is received, and only stop listening once that timer is up
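As a rough illustration of the first option, here is a minimal sketch of a collection loop bounded by an overall deadline of perOpTimeout multiplied by the number of shards. The helper name, the error-only results channel, and the error message are assumptions for the sketch, not the PR's actual types.

```go
package transfermanager

import (
	"errors"
	"time"
)

// collectShards is a hypothetical helper sketching the first option above: it
// gathers per-shard results but stops waiting once an overall deadline
// (perOpTimeout * number of shards) has elapsed, so a single hanging piece
// cannot block the loop forever.
func collectShards(results <-chan error, shards int, perOpTimeout time.Duration) []error {
	overall := time.NewTimer(perOpTimeout * time.Duration(shards))
	defer overall.Stop()

	var errs []error
	for i := 1; i < shards; i++ {
		select {
		case err := <-results:
			if err != nil {
				errs = append(errs, err)
			}
		case <-overall.C:
			errs = append(errs, errors.New("transfermanager: timed out waiting for shard results"))
			return errs
		}
	}
	return errs
}
```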
// WithPartSize returns a TransferManagerOption that specifies the size of the
// shards to transfer; that is, if the object is larger than this size, it will
// be uploaded or downloaded in concurrent pieces.
// The default is 32 MiB for downloads.
Is this based on other langs' default?
Yes, this is based on Python's default. We should do some perf testing and adjust accordingly.
I think we should also probably set a minimum for partSize, and a way to turn off sharding (set it to zero, or negative?) But that can come in a separate PR.
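Sketching what that separate PR could look like, assuming a hypothetical config struct: the field names, the disable-on-non-positive behavior, and the 1 MiB floor below are illustrative only, not decided here.

```go
// Hypothetical sketch of part-size validation; none of these names or values
// are final.
const minPartSize = 1 << 20 // illustrative floor of 1 MiB

type transferManagerConfig struct {
	partSize         int64
	shardingDisabled bool
}

// applyPartSize clamps small part sizes and treats a non-positive value as a
// request to turn sharding off entirely.
func applyPartSize(cfg *transferManagerConfig, partSize int64) {
	switch {
	case partSize <= 0:
		cfg.shardingDisabled = true
	case partSize < minPartSize:
		cfg.partSize = minPartSize
	default:
		cfg.partSize = partSize
	}
}
```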
A few more minor comments, but otherwise looks good.
@@ -290,6 +293,7 @@ func NewDownloader(c *storage.Client, opts ...Option) (*Downloader, error) {
}

// DownloadRange specifies the object range.
// Transcoded objects do not support ranged reads.
I think this needs a little more description; I would copy the note from the NewRangeReader docs: https://pkg.go.dev/cloud.google.com/go/storage#ObjectHandle.NewRangeReader
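For example, the doc comment could be expanded roughly along these lines, paraphrasing the NewRangeReader semantics rather than quoting them verbatim; the struct fields shown are assumed for illustration.

```go
// DownloadRange specifies the object range to download.
//
// If Length is negative, the object is read until the end. If Offset is
// negative, the object is read starting that many bytes from the end
// (a suffix read).
//
// Transcoded objects do not support ranged reads; see the note on
// ObjectHandle.NewRangeReader in the storage package for details.
type DownloadRange struct {
	Offset int64
	Length int64
}
```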
)

func TestDownloadBuffer(t *testing.T) {
	t.Parallel()
	// Unit test DownloadBuffer

	// Create without an underlying buffer.
Nice test!
This is missing a few components that should be added in follow-ups: