[pipeline] refactor pipeline #679
Conversation
Is this a good practice? In this way, users rely more heavily on the config file, which is different from PyTorch's pattern.
Actually, this PR does not add any additional requirements to the config file except setting NUM_MICRO_BATCHES in pipeline parallel mode. We could have inferred those arguments for the schedule before, but we didn't. This PR just fills that gap.
I think a good way to define the tensor shape for the pipeline is to set a static batch size, sequence length, and hidden size for the model. The schedule reads the sizes from the model. If they are None, we transfer the shape in the first step.
I agree with you, but I couldn't find a standard for defining the config file, so the rule in this PR is induced from the example configs. If they are None, we get the shape in the warmup stage.
In fact, I am pondering the necessity of setting a static tensor shape at all. In my opinion, there are several problems if we ask the user to set the static tensor shape.
I propose to solve this problem by recording the tensor shape on the fly. For the first iteration, we check the tensor shape and data type and store this meta information in an object (e.g. PipelineMetaTracker). Subsequent iterations will reuse the meta information stored in it for communication between pipeline stages. When a different tensor shape is detected, the object updates its data. In this way, we can solve problems 1-4.
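To make the proposal concrete, here is a minimal, dependency-free sketch of what a tracker like the `PipelineMetaTracker` named above could look like. The class name comes from the comment, but the API (`update`/`get`) and the idea of returning a flag when the metadata changes are assumptions for illustration, not the actual ColossalAI implementation:

```python
class PipelineMetaTracker:
    """Sketch of the proposed tracker: records tensor metadata on the fly
    so later iterations can skip the shape handshake between stages."""

    def __init__(self):
        # (shape, dtype) observed in the most recent iteration, or None
        self._meta = None

    def update(self, shape, dtype):
        """Record metadata for the current iteration.

        Returns True if the metadata changed, signalling that a new
        shape handshake is needed before the next communication.
        """
        meta = (tuple(shape), dtype)
        changed = meta != self._meta
        self._meta = meta
        return changed

    def get(self):
        """Return the cached (shape, dtype) pair, or None before warmup."""
        return self._meta


tracker = PipelineMetaTracker()
print(tracker.update((4, 128, 768), "float16"))  # first iteration: handshake needed
print(tracker.update((4, 128, 768), "float16"))  # same meta: reuse cached info
print(tracker.update((2, 128, 768), "float16"))  # smaller last batch: re-sync
```

Under this scheme only iterations whose metadata differs from the cached entry pay the handshake cost, which addresses the variable-shape cases discussed above.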
@FrankLeeeee In what cases do we need to transfer multiple tensors across the pipeline?
In cases such as dual-path networks; one example is SlowFast, one of the most popular video networks.
Yes. Since we put the schedule into the engine, not only that option but the whole initialization process of the schedule is hidden. From the user's point of view, they don't have to care about how to initialize the schedule.
Infer pipeline schedule parameters, such as tensor_shape or scatter_gather_tensors, from the user config file.
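The inference rule discussed above (static sizes when the config provides them, warmup-stage discovery otherwise) could be sketched as follows. The config keys `BATCH_SIZE`, `SEQ_LENGTH`, `HIDDEN_SIZE`, and `NUM_MICRO_BATCHES` follow the naming seen in the thread, but the exact rule here is an assumption for illustration, not the PR's actual code:

```python
def infer_tensor_shape(config):
    """Sketch: derive the pipeline communication tensor shape from a
    user config dict, or return None to fall back to warmup-stage
    shape discovery (as discussed in the thread above)."""
    required = ("BATCH_SIZE", "NUM_MICRO_BATCHES", "SEQ_LENGTH", "HIDDEN_SIZE")
    if not all(key in config for key in required):
        # Static sizes not fully specified: let warmup discover the shape.
        return None
    # Each micro-batch carries BATCH_SIZE / NUM_MICRO_BATCHES samples.
    micro_batch_size = config["BATCH_SIZE"] // config["NUM_MICRO_BATCHES"]
    return (micro_batch_size, config["SEQ_LENGTH"], config["HIDDEN_SIZE"])


full_cfg = {"BATCH_SIZE": 32, "NUM_MICRO_BATCHES": 4,
            "SEQ_LENGTH": 128, "HIDDEN_SIZE": 768}
print(infer_tensor_shape(full_cfg))                    # static shape inferred
print(infer_tensor_shape({"NUM_MICRO_BATCHES": 4}))    # None: warmup discovers it
```

Returning None rather than raising keeps static shapes optional, matching the thread's agreement that users should not be forced to specify them.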