add pytorch hooks #179
Hi, thanks for your awesome PR. In |
Sure, I will fix it. |
Hey, I polished the code and added a simple GPU memory tracer using the ophooks. It can dump the GPU memory usage curve during #niter iterations to a file. I hope you like the feature.
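For readers following along, here is a minimal sketch of what a hook-based GPU memory tracer could look like. It is not the code from this PR: `MemTracer`, `cuda_memory_allocated`, and `trace_iterations` are hypothetical names, and the only PyTorch calls assumed are `register_forward_pre_hook`, `register_forward_hook`, and `torch.cuda.memory_allocated`.

```python
import json
from typing import Callable, List


class MemTracer:
    """Hypothetical tracer: samples a memory probe around every leaf module."""

    def __init__(self, probe: Callable[[], int]):
        self.probe = probe          # callable returning current memory in bytes
        self.curve: List[int] = []  # one sample per hook invocation

    def sample(self, *_hook_args) -> None:
        # Works as both a forward pre-hook and a forward hook; returning None
        # leaves the module's inputs and outputs unchanged.
        self.curve.append(self.probe())

    def attach(self, model) -> list:
        """Register hooks on leaf modules and return the removable handles."""
        handles = []
        for module in model.modules():
            if next(module.children(), None) is None:  # leaf module
                handles.append(module.register_forward_pre_hook(self.sample))
                handles.append(module.register_forward_hook(self.sample))
        return handles

    def dump(self, path: str) -> None:
        """Write the recorded memory curve to a JSON file."""
        with open(path, "w") as f:
            json.dump(self.curve, f)


def cuda_memory_allocated() -> int:
    # Imported lazily so the tracer class has no hard dependency on torch.
    import torch
    return torch.cuda.memory_allocated() if torch.cuda.is_available() else 0


def trace_iterations(model, batches, path, probe=cuda_memory_allocated):
    """Run the model over `batches`, then dump the memory curve to `path`."""
    tracer = MemTracer(probe)
    handles = tracer.attach(model)
    for batch in batches:
        model(batch)
    for handle in handles:
        handle.remove()  # detach the hooks once tracing is done
    tracer.dump(path)
    return tracer.curve
```

Passing the probe in as a callable keeps the tracer easy to test without a GPU; the actual tracer in the PR may be structured differently.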
Hi, the ophooks look great to me. As for your second point, we initialize the trainer hook objects in the Python script and pass them to the trainer instead of defining them in the config file. Could you tell us where you saw such usage? It might be a deprecated doc, and we can update it. |
I saw APIs in builder/builder.py
Most of the functions in this file are not used in the project. |
Noted, these were deprecated after we updated some APIs. Some code cleanup is needed. |
Meanwhile, I saw some print statements in the op hooks. I would recommend using the logger instead. A general usage of the logger looks like this:
from colossalai.logging import get_dist_logger
from colossalai.context import ParallelMode
# this will get the root Python logger
logger = get_dist_logger()
# log on all ranks
logger.info("some message")
# log only on rank 0
logger.info("some messages", ranks=[0])
# log on rank 0 and rank 1
logger.info("some messages", ranks=[0, 1])
# log on all data parallel rank 0
logger.info("some messages", ranks=[0], parallel_mode=ParallelMode.DATA)
# save the log
logger.log_to_file('./logs')
You can find the API doc here. |
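To illustrate the idea behind the `ranks` argument, here is a minimal standard-library sketch of rank-filtered logging. It is not ColossalAI's implementation: `RankFilteredLogger` is a hypothetical class, and it assumes the launcher exports a `RANK` environment variable (as `torchrun` does).

```python
import logging
import os
from typing import Optional, Sequence


class RankFilteredLogger:
    """Hypothetical wrapper: emit a record only on the selected ranks."""

    def __init__(self, name: str = "sketch", rank: Optional[int] = None):
        # Assumption: the launcher sets RANK; fall back to 0 for single-process runs.
        self.rank = int(os.environ.get("RANK", "0")) if rank is None else rank
        self._logger = logging.getLogger(name)

    def info(self, message: str, ranks: Optional[Sequence[int]] = None) -> bool:
        """Log on every rank when `ranks` is None, else only on the listed ranks.

        Returns True if this process emitted the record.
        """
        if ranks is not None and self.rank not in ranks:
            return False
        self._logger.info("[rank %d] %s", self.rank, message)
        return True
```

With this shape, a `print()` inside a hook can become a single `info(..., ranks=[0])` call, so multi-process runs do not repeat the same line once per rank.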
The MR has been polished, replacing print() with the logger. |
Awesome, thanks for the contribution :) |
* add pytorch hooks fix hpcaitech#175
* remove licenses in src code
* add gpu memory tracer
* replacing print with logger in ophooks.
fix #175