[RFC] Replace "parallel learning" in docs with "distributed learning"? #3596

Closed
@jameslamb

Description

LightGBM comes with the ability to use multiple machines for training. This can be done with the CLI, or with integrations like Spark, Kubeflow Fairing, and Dask (#3515).

Today, the docs refer to training with multiple machines as "parallel learning".
I think this is not precise enough, and it can lead to confusion.

LightGBM has at least two types of parallelism:

  • within one process (shared memory), using multithreading with OpenMP
  • across multiple processes (possibly on multiple machines, and with distributed data), using either sockets or MPI
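As an illustration, the two modes above map to different LightGBM parameters. This is a minimal sketch using parameter names from the LightGBM parameter docs (`num_threads`, `tree_learner`, `num_machines`, `local_listen_port`, `machine_list_filename`); the values and the machine-list filename are hypothetical:

```python
# Sketch: LightGBM parameter dicts illustrating the two kinds of parallelism.
# Parameter names follow the LightGBM docs; all values are illustrative only.

# 1. Within one process (shared memory): OpenMP multithreading.
single_process_params = {
    "objective": "binary",
    "num_threads": 8,  # number of OpenMP threads on one machine
}

# 2. Across multiple processes (possibly multiple machines, distributed data):
#    socket-based distributed training.
distributed_params = {
    "objective": "binary",
    "tree_learner": "data",                # data-parallel distributed learning
    "num_machines": 4,                     # machines participating in training
    "local_listen_port": 12400,            # socket port this machine listens on
    "machine_list_filename": "mlist.txt",  # hypothetical file of machine IP:port pairs
}
```

Only the second dict involves "distributed" learning in the proposed sense; the first is parallel but entirely within one process.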

https://lightgbm.readthedocs.io/en/latest/Parallel-Learning-Guide.html#parallel-learning-guide only refers to the second case today.

I think we should rename this guide to "Distributed Learning" and use the word "distributed" everywhere in the documentation that talks about using multiple machines to accomplish model training.

Wanted to open this request for comment before I start making changes. What do you think?
