
[pipeline] refactor pipeline #679

Merged

Conversation

YuliangLiu0306
Copy link
Contributor

infer pipeline schedule parameters, such as tensor_shape or scatter_gather_tensors, from user config file
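For illustration, a rough sketch of what this kind of inference could look like; the helper name and config keys below are hypothetical and not necessarily the ones used in this PR:

```python
# Hypothetical sketch: inferring pipeline schedule arguments from a user config.
# The config keys (BATCH_SIZE, SEQ_LENGTH, HIDDEN_SIZE, NUM_MICRO_BATCHES) and the
# helper itself are illustrative, not the exact names used in this PR.

def infer_schedule_args(config: dict):
    """Derive tensor_shape and scatter_gather_tensors from a training config."""
    num_micro_batches = config["NUM_MICRO_BATCHES"]           # required in pipeline parallel mode
    micro_batch_size = config["BATCH_SIZE"] // num_micro_batches

    # Activation passed between stages of a transformer has shape (b, s, h).
    tensor_shape = None
    if all(k in config for k in ("SEQ_LENGTH", "HIDDEN_SIZE")):
        tensor_shape = (micro_batch_size, config["SEQ_LENGTH"], config["HIDDEN_SIZE"])

    # scatter_gather_tensors only pays off with 1D tensor parallelism (see discussion below).
    tp = config.get("parallel", {}).get("tensor", {})
    scatter_gather_tensors = isinstance(tp, dict) and tp.get("mode") == "1d"

    return tensor_shape, scatter_gather_tensors
```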

@YuliangLiu0306 YuliangLiu0306 linked an issue Apr 6, 2022 that may be closed by this pull request
@YuliangLiu0306 YuliangLiu0306 marked this pull request as ready for review April 6, 2022 07:35
@feifeibear feifeibear changed the title Feature/refactor pipeline [pipeline] refactor pipeline Apr 6, 2022
@ver217
Member

ver217 commented Apr 6, 2022

Is this a good practice? In this way, users rely more heavily on the config file, which is different from PyTorch's pattern.

@YuliangLiu0306
Contributor Author

Is this a good practice? In this way, users rely more heavily on the config file, which is different from PyTorch's pattern.

Actually, this PR does not add any additional requirement to the config file except setting NUM_MICRO_BATCHES in pipeline parallel mode. We could have inferred those schedule arguments before, but we didn't. This PR just fills that gap.
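For reference, a representative config of the kind the rule is induced from might look like the following; the field names follow the common Colossal-AI example style, and the values are illustrative:

```python
# config.py -- illustrative pipeline-parallel config; values are made up.
BATCH_SIZE = 64
SEQ_LENGTH = 512
HIDDEN_SIZE = 768
NUM_MICRO_BATCHES = 4        # the only field the pipeline schedule strictly requires

parallel = dict(
    pipeline=2,                      # two pipeline stages
    tensor=dict(size=2, mode="1d"),  # 1D tensor parallelism
)
```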

@kurisusnowdeng
Member

I think a good way to define the tensor shape for the pipeline is to set a static batch size, sequence length, and hidden size for the model. The schedule reads these sizes from the model. If they are None, we transfer the shape in the first step.

@YuliangLiu0306
Contributor Author

I think a good way to define the tensor shape for the pipeline is to set a static batch size, sequence length, and hidden size for the model. The schedule reads these sizes from the model. If they are None, we transfer the shape in the first step.

I agree with you, but I couldn't find a standard for defining the config file, so the rule in this PR is derived from the example configs. If they are None, we will get the shape in the warmup stage.
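A minimal sketch of that warmup-stage fallback, assuming torch.distributed point-to-point communication; the function and variable names are hypothetical:

```python
# Warmup-stage fallback: when tensor_shape is None, exchange the shape once
# before receiving the activation itself. Requires an initialized process group
# and a matching sender on src_rank.
import torch
import torch.distributed as dist

def recv_activation(src_rank, tensor_shape=None, dtype=torch.float32, device="cuda"):
    if tensor_shape is None:
        # First receive the number of dims, then the shape itself.
        ndim = torch.empty(1, dtype=torch.long, device=device)
        dist.recv(ndim, src=src_rank)
        shape = torch.empty(ndim.item(), dtype=torch.long, device=device)
        dist.recv(shape, src=src_rank)
        tensor_shape = tuple(shape.tolist())
    # Later iterations can reuse the returned tensor_shape and skip the exchange.
    buf = torch.empty(tensor_shape, dtype=dtype, device=device)
    dist.recv(buf, src=src_rank)
    return buf, tensor_shape
```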

@FrankLeeeee
Contributor

FrankLeeeee commented Apr 6, 2022

I think a good way to define the tensor shape for the pipeline is to set a static batch size, sequence length, and hidden size for the model. The schedule reads these sizes from the model. If they are None, we transfer the shape in the first step.

I agree with you, but I couldn't find a standard for defining the config file, so the rule in this PR is derived from the example configs. If they are None, we will get the shape in the warmup stage.

In fact, I am pondering the necessity of setting a static tensor shape at all. In my opinion, there are several problems if we ask the user to set a static tensor shape.

  1. The user may not know the tensor shape. For example, NLP experts surely know that the intermediate activation of a transformer has shape (b, s, h), but there is no guarantee that novices can get it right instantly.
  2. The static tensor shape is tightly coupled with the tensor parallel mode. For example, 1D and 2D parallelism split the tensor differently, and this is prior knowledge a user must have in order to give the correct tensor shape.
  3. A static tensor shape does not help with passing multiple tensors between pipeline stages. If a layer produces multiple outputs that should be passed to the next stage, how can the user specify the tensor shape?
  4. A static tensor shape may fail on the last batch of a training epoch, as the last batch may have a different shape when there is not enough data to make up a full batch. This forces the user to configure other components, e.g. dropping the last batch, to prevent errors.

I propose to solve this problem by recording the tensor shape on the fly. In the first iteration, we check the tensor shape and data type and store this metadata in an object (e.g. PipelineMetaTracker). Subsequent iterations re-use the metadata stored there for communication between pipeline stages. When a different tensor shape is detected, the object is updated. In this way, we can solve problems 1-4.
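A minimal sketch of this proposal; PipelineMetaTracker is the name suggested above, and the fields and update logic here are illustrative rather than the implementation that was eventually merged:

```python
# Illustrative on-the-fly shape tracker for pipeline communication.
from dataclasses import dataclass, field
from typing import Dict, Tuple
import torch

@dataclass
class PipelineMetaTracker:
    # Maps an output index to its recorded (shape, dtype), so multiple tensors
    # passed between stages (problem 3) each get their own metadata.
    meta: Dict[int, Tuple[torch.Size, torch.dtype]] = field(default_factory=dict)

    def record(self, tensors):
        """Record or refresh metadata whenever the observed shapes change."""
        for i, t in enumerate(tensors):
            if self.meta.get(i) != (t.shape, t.dtype):
                self.meta[i] = (t.shape, t.dtype)   # also handles a smaller last batch (problem 4)

    def allocate_recv_buffers(self, device="cuda"):
        """Pre-allocate receive buffers from the recorded metadata for later iterations."""
        return [torch.empty(shape, dtype=dtype, device=device)
                for shape, dtype in self.meta.values()]
```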

@kurisusnowdeng
Member

@FrankLeeeee In what cases do we need to transfer multiple tensors across the pipeline?

@FrankLeeeee
Contributor

@FrankLeeeee In what cases do we need to transfer multiple tensors across the pipeline?

In cases such as dual-path networks. One example is SlowFast, one of the most popular video networks.
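A toy illustration of the dual-path case: a stage whose forward returns two activations of different shapes, which a single static tensor_shape cannot describe. The module below is made up for illustration and is not taken from SlowFast:

```python
# A pipeline stage with two pathways (a "slow" and a "fast" branch), so two
# tensors must be sent to the next stage instead of one.
import torch.nn as nn

class DualPathStage(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.slow = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.fast = nn.Conv3d(channels // 8, channels // 8, kernel_size=3, padding=1)

    def forward(self, slow_x, fast_x):
        # Two outputs with different shapes are handed to the next pipeline stage.
        return self.slow(slow_x), self.fast(fast_x)
```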

@YuliangLiu0306
Contributor Author

@YuliangLiu0306 scatter_gather_tensors is only used for 1D parallelism. We suggest enabling it when 1D is used. So, how about hiding this option from users?

Yes. Since we put the schedule into the engine, not only that option but the whole schedule initialization process is hidden. From the user's point of view, they don't have to care about how to initialize the schedule.
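A simplified sketch of this user-facing effect, assuming the inference helper sketched earlier in this thread; the names here are illustrative and not the actual engine API:

```python
# The engine picks and configures the schedule internally from the config,
# so users never touch tensor_shape or scatter_gather_tensors themselves.

def build_schedule(config: dict):
    pipeline_size = config.get("parallel", {}).get("pipeline", 1)
    if pipeline_size > 1:
        # infer_schedule_args is the hypothetical helper sketched near the top of this thread.
        tensor_shape, scatter_gather = infer_schedule_args(config)
        return dict(
            type="pipeline",
            num_microbatches=config["NUM_MICRO_BATCHES"],
            tensor_shape=tensor_shape,
            scatter_gather_tensors=scatter_gather,
        )
    return dict(type="non_pipeline")
```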

@FrankLeeeee FrankLeeeee merged commit 0ed7042 into hpcaitech:main Apr 7, 2022
@YuliangLiu0306 YuliangLiu0306 deleted the feature/refactor_pipeline branch July 27, 2022 03:09
Development

Successfully merging this pull request may close these issues.

[FEATURE]: infer pipeline schedule params from config