[CELEBORN-1300] Optimize CelebornInputStreamImpl's memory usage #2348

waitinfuture · 2024-03-01T08:24:41Z

What changes were proposed in this pull request?

This PR reduces memory usage when CelebornShuffleReader creates input streams. It makes the following changes:

1. The constructor of CelebornInputStream no longer fetches a chunk.
2. compressedBuf and rawDataBuf are created the first time fillBuffer is called.
3. When fillBuffer returns false, meaning the input stream is exhausted, close is called and resources are released.
4. CelebornFetchFailureSuite runs only for Spark 3.0 and newer.
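The first three changes can be illustrated with a minimal sketch. This is not the actual CelebornInputStream code: the class `LazyChunkStream`, its in-memory chunk queue, and the helper methods `isBufferAllocated` and `isClosed` are hypothetical names chosen for illustration, and the sketch assumes every chunk fits in the buffer. It only shows the pattern of deferring allocation to the first `fillBuffer` call and releasing resources as soon as the stream is exhausted.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Hypothetical stand-in for a chunk-backed input stream; not Celeborn's real class. */
class LazyChunkStream implements AutoCloseable {
  // Stands in for remote chunks that have not been fetched yet (assumption).
  private final Deque<byte[]> pendingChunks;
  private final int bufferSize;
  private byte[] rawDataBuf; // allocated on the first fillBuffer(), not in the constructor
  private int position;
  private int limit;
  private boolean closed;

  LazyChunkStream(List<byte[]> chunks, int bufferSize) {
    // The constructor only records state: no fetch and no buffer allocation.
    this.pendingChunks = new ArrayDeque<>(chunks);
    this.bufferSize = bufferSize;
  }

  boolean isBufferAllocated() {
    return rawDataBuf != null;
  }

  boolean isClosed() {
    return closed;
  }

  /** Loads the next chunk; returns false once the stream is exhausted. */
  private boolean fillBuffer() {
    if (closed) {
      return false;
    }
    byte[] chunk = pendingChunks.poll();
    if (chunk == null) {
      close(); // exhausted: release the buffer right away instead of waiting for the caller
      return false;
    }
    if (chunk.length > bufferSize) {
      throw new IllegalArgumentException("chunk larger than buffer");
    }
    if (rawDataBuf == null) {
      rawDataBuf = new byte[bufferSize]; // lazy allocation on first use
    }
    System.arraycopy(chunk, 0, rawDataBuf, 0, chunk.length);
    position = 0;
    limit = chunk.length;
    return true;
  }

  /** Returns the next byte as an int in [0, 255], or -1 at end of stream. */
  int read() {
    while (position >= limit) { // the loop also skips empty chunks
      if (!fillBuffer()) {
        return -1;
      }
    }
    return rawDataBuf[position++] & 0xFF;
  }

  @Override
  public void close() {
    closed = true;
    rawDataBuf = null; // drop the reference so the buffer can be garbage collected
    position = 0;
    limit = 0;
    pendingChunks.clear();
  }
}
```

With this shape, a reader that creates many streams up front holds no buffers for streams it has not started reading, and none for streams it has already drained.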

Why are the changes needed?

Same as above: to avoid excessive memory usage when CelebornShuffleReader creates input streams.

Does this PR introduce any user-facing change?

No

How was this patch tested?

GA and e2e test.

waitinfuture · 2024-03-01T08:24:54Z

cc @CodingCat @RexXiong @FMX @ErikFang

codecov · 2024-03-01T09:04:54Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 48.85%. Comparing base (cae4de1) to head (5bdd9ad).
Report is 2 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2348      +/-   ##
==========================================
- Coverage   48.90%   48.85%   -0.04%     
==========================================
  Files         207      207              
  Lines       12965    12965              
  Branches     1113     1113              
==========================================
- Hits         6339     6333       -6     
- Misses       6220     6224       +4     
- Partials      406      408       +2

RexXiong

LGTM, thanks!

waitinfuture · 2024-03-05T02:53:02Z

Comments addressed, PTAL @CodingCat @SteNicholas @pan3793

CodingCat · 2024-03-05T04:48:55Z

LGTM

pan3793 · 2024-03-05T05:45:24Z

tests/spark-it/src/test/scala/org/apache/celeborn/tests/spark/SparkTestBase.scala

@@ -32,6 +33,10 @@ import org.apache.celeborn.service.deploy.MiniClusterFeature

 trait SparkTestBase extends AnyFunSuite
  with Logging with MiniClusterFeature with BeforeAndAfterAll with BeforeAndAfterEach {
+
+  val Spark3OrNewer = SPARK_VERSION >= "3.0"


Not a big deal, but in some cases SPARK_VERSION returns "unknown", and

scala> "unknown" >= "3.0"
res0: Boolean = true
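The pitfall in this comment comes from comparing version strings lexicographically. A rough Java sketch of the same behavior, and of one possible guard, follows. The class `SparkVersionCheck` and both method names are hypothetical and not part of the PR, and treating an unparseable version as "not new enough" is an assumption made here, not a decision taken in this thread.

```java
/** Hypothetical helper illustrating the string-comparison pitfall; not part of the PR. */
class SparkVersionCheck {

  /** Lexicographic comparison, analogous to using >= on two Scala strings. */
  static boolean lexicographicAtLeast(String version, String minimum) {
    return version.compareTo(minimum) >= 0;
  }

  /**
   * Compares only the numeric major component.
   * Assumption: an unparseable version such as "unknown" counts as not meeting the minimum.
   */
  static boolean majorAtLeast(String version, int minMajor) {
    int dot = version.indexOf('.');
    String major = dot < 0 ? version : version.substring(0, dot);
    try {
      return Integer.parseInt(major) >= minMajor;
    } catch (NumberFormatException e) {
      return false;
    }
  }
}
```

Here `lexicographicAtLeast("unknown", "3.0")` is true because `'u'` sorts after `'3'`, while `majorAtLeast("unknown", 3)` is false. The lexicographic form would also misorder a hypothetical two-digit major version such as "10.0", which sorts before "3.0".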

### What changes were proposed in this pull request?

To avoid too much memory usage when CelebornShuffleReader creates input streams. This PR does the following:

1. Constructor of `CelebornInputStream` does not fetch chunk
2. `compressedBuf` and `rawDataBuf` are created first time `fillBuffer` is called
3. When `fillBuffer` returns false, which means the inputstream is exhausted, `close` is called and resource released
4. `CelebornFetchFailureSuite` is only run for Spark 3.0 and newer

### Why are the changes needed?

ditto

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

GA and e2e test.

Closes #2348 from waitinfuture/1300.

Lead-authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Co-authored-by: Keyong Zhou <waitinfuture@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
(cherry picked from commit 8b6bc35)
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>

waitinfuture · 2024-03-05T06:05:44Z

Thanks! Merged to main (0.5.0) and branch-0.4 (v0.4.1).

…ream

### What changes were proposed in this pull request?

#2348 avoids fetching first chunk in the constructor of `CelebornInputStreamImpl`, but in some cases, i.e. coalescing 3000 partitions into one in Spark, it can be beneficial to do so for performance. This PR adds back prefetching with knobs default to false.

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

Yes, two configs are added.

### How was this patch tested?

Extended `MemorySkewJoinSuite` and `ReusedExchangeSuite`, and manual test.

Closes #2549 from waitinfuture/1446.

Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
(cherry picked from commit d692e49)
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>

waitinfuture marked this pull request as draft March 1, 2024 08:39

waitinfuture changed the title from "[CELEBORN-1300] Don't fetch chunk in CelebornInputStreamImpl's constructor" to "[CELEBORN-1300] Optimize CelebornInputStreamImpl's memory usage" Mar 3, 2024

[CELEBORN-1300] Optimize CelebornInputStreamImpl's memory usage

9769a00

waitinfuture force-pushed the 1300 branch from a79accf to 9769a00 March 3, 2024 07:34

waitinfuture marked this pull request as ready for review March 3, 2024 07:35

RexXiong approved these changes Mar 4, 2024

SteNicholas reviewed Mar 4, 2024

client/src/main/java/org/apache/celeborn/client/read/CelebornInputStream.java (outdated)

SteNicholas reviewed Mar 4, 2024

client/src/main/java/org/apache/celeborn/client/read/CelebornInputStream.java (outdated)

SteNicholas reviewed Mar 4, 2024

tests/spark-it/src/test/scala/org/apache/celeborn/tests/spark/SparkTestBase.scala (outdated)

pan3793 reviewed Mar 4, 2024

tests/spark-it/src/test/scala/org/apache/celeborn/tests/spark/SparkTestBase.scala (outdated)

CodingCat approved these changes Mar 4, 2024

client/src/main/java/org/apache/celeborn/client/read/CelebornInputStream.java (outdated)

tests/spark-it/src/test/scala/org/apache/celeborn/tests/spark/SparkTestBase.scala (outdated)

waitinfuture and others added 3 commits March 5, 2024 10:28

Update client/src/main/java/org/apache/celeborn/client/read/CelebornI…

83983c8

…nputStream.java Co-authored-by: Nicholas Jiang <programgeek@163.com>

Update tests/spark-it/src/test/scala/org/apache/celeborn/tests/spark/…

7d17339

…SparkTestBase.scala Co-authored-by: Nicholas Jiang <programgeek@163.com>

address comments

e50b8cf

SteNicholas approved these changes Mar 5, 2024

SteNicholas reviewed Mar 5, 2024

client/src/main/java/org/apache/celeborn/client/read/CelebornInputStream.java

refine

5bdd9ad

SteNicholas approved these changes Mar 5, 2024

pan3793 reviewed Mar 5, 2024

pan3793 approved these changes Mar 5, 2024

waitinfuture closed this in 8b6bc35 Mar 5, 2024

waitinfuture mentioned this pull request Jun 9, 2024

[CELEBORN-1446] Enable chunk prefetch when initialize CelebornInputStream #2549

Closed
