Commit
[CELEBORN-1034] Offer slots uses random range of available workers instead of shuffling

### What changes were proposed in this pull request?

In the original design, (primary worker, replica worker) pairs tend to stay stable. For example, for primary PartitionLocations on Worker A, the replica PartitionLocations will all be on Worker B in general scenarios, i.e. when all workers are healthy and work well. This way, each Worker has only one (or very few) connections to other workers' replicate netty servers.

However, apache#1790 calls `Collections.shuffle(availableWorkers)`, which causes the number of replica connections to increase dramatically:

![image](https://github.com/apache/incubator-celeborn/assets/948245/013c7bc8-a224-413e-9c0c-519ae76c9d32)

### Why are the changes needed?

This PR refines the logic for selecting a limited number of workers: instead of shuffling, Master just randomly picks a range of available workers.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual test.

Closes apache#1975 from waitinfuture/1034.

Lead-authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Co-authored-by: Keyong Zhou <waitinfuture@gmail.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
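To illustrate the idea of picking a random range instead of shuffling, here is a minimal sketch. It is not the code from this commit: the class `RangePick`, the method `pickRange`, and the wrap-around behavior at the end of the list are assumptions made for illustration. The point it shows is that a range preserves the relative order of workers, so neighboring workers stay neighbors, whereas `Collections.shuffle` discards that order on every call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration, not Celeborn's actual implementation.
class RangePick {
    // Returns up to `limit` consecutive workers starting at a random index.
    // Assumption: the range wraps around to the front of the list when it
    // runs past the end, so every start index yields the same number of workers.
    public static <T> List<T> pickRange(List<T> workers, int limit, Random rng) {
        int n = workers.size();
        int count = Math.max(0, Math.min(limit, n));
        List<T> picked = new ArrayList<>(count);
        if (count == 0) {
            return picked; // nothing to pick; also avoids rng.nextInt(0)
        }
        int start = rng.nextInt(n);
        for (int i = 0; i < count; i++) {
            // Preserve the original relative order of the workers.
            picked.add(workers.get((start + i) % n));
        }
        return picked;
    }
}
```

For example, `RangePick.pickRange(List.of("A", "B", "C", "D", "E"), 3, new Random())` returns three workers that are adjacent in the input order, such as `[D, E, A]` when the random start index is 3.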