【NOMERGE】Profile load graph time #619
Conversation
image = base(
    prompt=args.prompt,
    height=args.height,
    width=args.width,
    num_inference_steps=args.n_steps,
    output_type=OUTPUT_TYPE,
).images
flow._oneflow_internal.eager.Sync()
end_time = time.time()
print(f"{end_time-start_time}s elapsed: 1st infer")
Running:
python examples/text_to_image_sdxl.py --compile_unet true --base /share_nfs/hf_models/stable-diffusion-xl-base-1.0
gives a rather odd result: load_graph itself takes about 10 s, yet the first inference still needs 54 s, while the second inference takes 4 s.
So between finishing the load of the precompiled graph and starting the first sampling iteration, some operation is costing roughly 50 s.
I suspect that when vyro keeps complaining to us that load_graph is slow, this is what they actually mean: it is not the load_graph function itself that takes long, but the time from calling load_graph to getting the first image (see the probe sketch after the log below).
Loading pipeline components...: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:01<00:00, 4.43it/s]
Compiling unet with oneflow.
1.2790741920471191s elapsed: compile
Loading from graph
2.3534035682678223s elpased: assign of_module time
0.03800535202026367s elpased: get_oneflow_graph time
2.3914899826049805s elpased: dpl_graph time
7.607182264328003s elpased: load_graph time
9.998764038085938s elapsed: unet.load_graph
Warmup with running graphs...
0%| | 0/30 [00:00<?, ?it/s] 8.821487426757812e-06s elpased: assign of_module time
0.03774738311767578s elpased: get_oneflow_graph time
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:53<00:00, 1.78s/it]
54.89510440826416s elapsed: 1st infer
Normal SDXL run...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:04<00:00, 7.40it/s]
4.648606061935425s elapsed: 2nd infer
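Not part of the PR, but a minimal probe sketch for narrowing this down, reusing the names from the snippet above (base, args, OUTPUT_TYPE, flow): running a single denoising step right after unet.load_graph makes it easier to tell a one-time setup cost apart from a per-step slowdown such as a hidden re-compilation.

import time

# Sketch only: a 1-step run right after unet.load_graph. If the ~50 s is a
# one-time cost it will show up here in full; if it is per-step (e.g. each
# UNet call re-compiling), this probe will be fast and the 30-step run slow.
start_time = time.time()
base(
    prompt=args.prompt,
    height=args.height,
    width=args.width,
    num_inference_steps=1,  # single denoising step
    output_type=OUTPUT_TYPE,
)
flow._oneflow_internal.eager.Sync()
print(f"{time.time() - start_time}s elapsed: 1-step probe after load_graph")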
54.89510440826416s elapsed: 1st infer
This looks like compilation is still happening. You can set ONEDIFF_DEBUG=1 to check whether a compile is being triggered.
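A quick way to do that (sketch only; the environment variable name is taken from the comment above and should be double-checked against the onediff docs) is to set the flag before onediff is imported, or on the command line:

import os

# Assumed debug switch (from the comment above): set it before importing
# onediff so that any graph (re)compilation during the first inference is logged.
os.environ["ONEDIFF_DEBUG"] = "1"

# Equivalent from the shell:
#   ONEDIFF_DEBUG=1 python examples/text_to_image_sdxl.py --compile_unet true \
#       --base /share_nfs/hf_models/stable-diffusion-xl-base-1.0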
Confirmed: this happens both with the new (simple) script added in this PR and with the https://github.com/siliconflow/onediff/blob/main/examples/text_to_image_sdxl_save_load.py script. Regardless of whether ..., there are a few issues here that need to be discussed and pinned down:
This is also something we observed quite recently: the first inference after loading the graph takes 40-50 seconds, but I wasn't sure what happened. I think this is a new regression, but I can't really point to the particular commit.
Sorry for that. I have confirmed that this branch introduces the bug: onediff/src/onediff/infer_compiler/utils/args_tree_util.py Lines 40 to 46 in 814053b
We will fix it soon. Update: it has been fixed by #622
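As a rough guard against this kind of regression (a sketch only, reusing base, args, OUTPUT_TYPE and flow from the snippet above; the 3x threshold is arbitrary), one could compare the first and second runs after load_graph:

import time

def timed_run(pipe, **kwargs):
    # Run the pipeline once and block until all device work has finished.
    start = time.time()
    pipe(**kwargs)
    flow._oneflow_internal.eager.Sync()
    return time.time() - start

run_kwargs = dict(
    prompt=args.prompt,
    height=args.height,
    width=args.width,
    num_inference_steps=args.n_steps,
    output_type=OUTPUT_TYPE,
)
first = timed_run(base, **run_kwargs)
second = timed_run(base, **run_kwargs)
# With #622 applied, the first run should no longer hide a re-compilation,
# so the two timings should be of the same order of magnitude.
assert first < 3 * second, f"1st infer {first:.1f}s vs 2nd {second:.1f}s"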
Fixed with #622