
【NOMERGE】Profile load graph time #619

Closed · wants to merge 9 commits into from

Conversation

doombeaker
Contributor

No description provided.

image = base(
prompt=args.prompt,
height=args.height,
width=args.width,
num_inference_steps=args.n_steps,
output_type=OUTPUT_TYPE,
).images
flow._oneflow_internal.eager.Sync()
end_time = time.time()
print(f"{end_time-start_time}s elapsed: 1st infer")
@doombeaker doombeaker (Contributor Author) commented Feb 4, 2024

Run:

python examples/text_to_image_sdxl.py --compile_unet true --base /share_nfs/hf_models/stable-diffusion-xl-base-1.0

The result is odd:

Although load_graph itself takes about 10 s, the first inference still needs 54 s to produce an image, while the second inference takes 4 s.

So after the precompiled graph is loaded, and before the first sampling iteration even starts, some operation spends roughly 50 s.

I suspect this is what vyro has actually been complaining about when they say load_graph is slow: it is not that the load_graph function itself takes long, but that the time from the load_graph call to the first generated image does.

Loading pipeline components...: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:01<00:00,  4.43it/s]
Compiling unet with oneflow.
1.2790741920471191s elapsed: compile
Loading from graph
    2.3534035682678223s elpased: assign of_module time
    0.03800535202026367s elpased: get_oneflow_graph time
  2.3914899826049805s elpased: dpl_graph time
  7.607182264328003s elpased: load_graph time
9.998764038085938s elapsed: unet.load_graph
Warmup with running graphs...
  0%|                                                                                                                                                                                      | 0/30 [00:00<?, ?it/s]
    8.821487426757812e-06s elpased: assign of_module time
    0.03774738311767578s elpased: get_oneflow_graph time
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:53<00:00,  1.78s/it]
54.89510440826416s elapsed: 1st infer
Normal SDXL run...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:04<00:00,  7.40it/s]
4.648606061935425s elapsed: 2nd infer
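The "1st infer" / "2nd infer" numbers above come from simple wall-clock timing around a synchronized pipeline call. A minimal self-contained sketch of that pattern, with run_inference as a placeholder for the actual base(...) call and device sync:

```python
import time

def run_inference():
    # Placeholder for the real pipeline call; the actual script also
    # calls flow._oneflow_internal.eager.Sync() here so that queued
    # device work finishes before the clock is read.
    time.sleep(0.05)

start_time = time.time()
run_inference()
end_time = time.time()
print(f"{end_time - start_time}s elapsed: 1st infer")
```

Without the explicit sync, asynchronous device work can still be in flight when the clock is read, which understates the true latency.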

Collaborator

54.89510440826416s elapsed: 1st infer

This looks like it is compiling.

You can set ONEDIFF_DEBUG=1 to check whether a compile is actually being triggered.


@strint strint marked this pull request as ready for review February 4, 2024 08:27
@doombeaker
Contributor Author

Confirmed: with both the new (simple) script added in this PR and the script https://github.com/siliconflow/onediff/blob/main/examples/text_to_image_sdxl_save_load.py,

regardless of whether load_graph is used, the first call always enters this self._compile call, which compiles the whole graph:

https://github.com/siliconflow/oneflow/blob/876c6d6b258e5ada564a0a71f490918317d764ab/python/oneflow/nn/graph/graph.py#L294-L295

Two things need to be discussed and pinned down:

  1. Was there ever a working case where, after load_graph, subsequent calls used the loaded graph directly and the end-to-end run genuinely skipped compilation? If so, some intermediate commit broke the feature.
  2. What exactly decides whether a graph call triggers compilation? At the graph level there is a single check, "if not self._is_compiled", but oneflow_compile adds further wrapped checks (such as dpl_graph._is_raw_deployable_module); the inconsistency between them seems to be the cause of this problem.
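The second question can be illustrated with a tiny stand-in class (hypothetical names; GraphStub is not real oneflow code): if loading a graph does not also set the same flag that the call path checks, the first call recompiles even though a graph was loaded.

```python
class GraphStub:
    """Toy model of the compile-once guard discussed above."""

    def __init__(self):
        self._is_compiled = False
        self.compile_count = 0

    def _compile(self):
        # Stands in for the expensive end-to-end graph compilation.
        self.compile_count += 1
        self._is_compiled = True

    def load_graph(self, path, mark_compiled=True):
        # A correct load must also flip the flag that __call__ checks;
        # if a wrapper keeps a separate, out-of-sync flag, the first
        # call still pays the full compile cost.
        if mark_compiled:
            self._is_compiled = True

    def __call__(self, x):
        if not self._is_compiled:
            self._compile()
        return x


good = GraphStub()
good.load_graph("unet.graph")
good(0)
print(good.compile_count)  # 0: the loaded graph is reused

bad = GraphStub()
bad.load_graph("unet.graph", mark_compiled=False)
bad(0)
print(bad.compile_count)  # 1: first call recompiles despite the load
```

The sketch only models the flag logic; the real code paths (graph-level and oneflow_compile wrapper) each have their own checks, which is exactly the inconsistency in question.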

@isidentical
Contributor

This is also something we observed quite recently: the first inference after loading the graph takes 40-50 seconds, but I wasn't sure what happened. I think this is a new regression, but I can't really point to the particular commit.

@doombeaker
Contributor Author

doombeaker commented Feb 5, 2024

> this is also something we observed quite recently, the first inference after loading the graph takes 40-50 seconds but I wasn't sure what happened. I think this is a new regression, but can't really point into the particular commit.

sorry for that. I have confirmed that this branch introduced the bug:

if count != input_count:
    logger.warning(
        f"Module {type(self._deployable_module_model.oneflow_module)} input tensor count changed from {count} to {input_count}, will compile again."
    )
    self._deployable_module_dpl_graph = None
    self._load_graph_first_run = True
    self._deployable_module_input_count = input_count

we will fix it soon.

update: it has been fixed by #622
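A small stand-in (hypothetical DeployableStub, not the real onediff class) shows one plausible way this branch bites: assuming the recorded input count is never primed when a graph is loaded, the very first call sees a "changed" count, drops the loaded graph, and forces a full recompile.

```python
class DeployableStub:
    """Toy model of the input-count check quoted above."""

    def __init__(self):
        # Assumption for this sketch: loading a graph never primes
        # the recorded input count, so it starts as None.
        self._deployable_module_input_count = None
        self._deployable_module_dpl_graph = "loaded-graph"
        self._load_graph_first_run = False

    def check_inputs(self, *args):
        input_count = len(args)
        count = self._deployable_module_input_count
        if count != input_count:
            # The branch quoted above: drops the loaded graph and
            # forces a recompile on the next run.
            self._deployable_module_dpl_graph = None
            self._load_graph_first_run = True
            self._deployable_module_input_count = input_count


stub = DeployableStub()
stub.check_inputs("latents", "timestep")  # first call after load_graph
print(stub._deployable_module_dpl_graph)  # None: the loaded graph is gone
print(stub._load_graph_first_run)         # True: the next run recompiles
```

Under that assumption the symptom matches the logs exactly: load_graph itself is fast, but the first inference pays the full compile cost anyway.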

@ccssu ccssu mentioned this pull request Feb 5, 2024
ccssu added a commit that referenced this pull request Feb 5, 2024
@strint strint added the Request-new_feature (Request for a new feature) and Rsp-milestone labels Feb 7, 2024
@strint strint added this to the v0.12.1 milestone Feb 7, 2024
@strint
Collaborator

strint commented Feb 29, 2024

Fixed with #622

@strint strint closed this Feb 29, 2024