[RFC] Replace "parallel learning" in docs with "distributed learning"? #3596

Closed
@jameslamb

Description

LightGBM comes with the ability to use multiple machines for training. This can be done with the CLI, or with integrations like Spark, Kubeflow Fairing, and Dask (#3515).

Today, the docs refer to training with multiple machines as "parallel learning".
I think this is not precise enough, and it can lead to confusion.

LightGBM has at least two types of parallelism:

  • within one process (shared memory), using multithreading with OpenMP
  • across multiple processes (possibly on multiple machines, and with distributed data), using either sockets or MPI
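As an illustration, the two modes above map to different LightGBM parameters. This is a minimal sketch using parameter names from the LightGBM parameter docs (`num_threads`, `tree_learner`, `num_machines`, `local_listen_port`, `machine_list_filename`); the values and the machine-list filename are hypothetical:

```python
# Sketch: LightGBM parameter dicts illustrating the two kinds of parallelism.
# Parameter names follow the LightGBM docs; all values are illustrative only.

# 1. Within one process (shared memory): OpenMP multithreading.
single_process_params = {
    "objective": "binary",
    "num_threads": 8,  # number of OpenMP threads on one machine
}

# 2. Across multiple processes (possibly multiple machines, distributed data):
#    socket-based distributed training.
distributed_params = {
    "objective": "binary",
    "tree_learner": "data",                # data-parallel distributed learning
    "num_machines": 4,                     # machines participating in training
    "local_listen_port": 12400,            # socket port this machine listens on
    "machine_list_filename": "mlist.txt",  # hypothetical file of machine IP:port pairs
}
```

Only the second dict involves "distributed" learning in the proposed sense; the first is parallel but entirely within one process.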

https://lightgbm.readthedocs.io/en/latest/Parallel-Learning-Guide.html#parallel-learning-guide only refers to the second case today.

I think we should rename this guide to "Distributed Learning" and use the word "distributed" everywhere in the documentation that talks about using multiple machines to accomplish model training.

Wanted to open this request for comment before I start making changes. What do you think?
