This repository has been archived by the owner on Dec 9, 2024. It is now read-only.

Extremely low throughput of running on IBM POWER9 processor #407

Open
@jw447

Description

Hi,

I'm running the TensorFlow benchmarks on an IBM machine (POWER9 processors + V100 GPUs). I know this is not the optimal configuration, but I want to measure POWER9 performance without using the GPUs. The throughput turns out to be VERY low (~0.5 to 4 images/sec) regardless of how I tune the thread count (from 16 to 160). I'm not sure whether anyone has tried a similar setup, but I cannot find any reported performance numbers. I'm skeptical of these numbers because POWER9 seems to have a very high CPU frequency, so I would expect better throughput even without MKL.

Can anyone offer suggestions? Here is the command I'm running:

python ~/benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --data_format=NHWC --batch_size=128 --num_batches=50 --model=resnet50 --optimizer=sgd --variable_update=replicated --use_fp16=False --nodistortions --gradient_repacking=2 --datasets_use_prefetch=True --loss_type_to_report=base_loss --compute_lr_on_cpu=True --single_l2_loss_op=True --local_parameter_device=cpu --device=cpu --local_parameter_device=cpu --display_every=10 --num_intra_threads=128 --num_inter_threads=1
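
For anyone who wants to reproduce the thread-count tuning, the sweep could be scripted roughly as below. This is only a sketch: the flags are copied from the command above (with the repeated `--local_parameter_device=cpu` listed once), while the helper names `build_command` and `sweep_commands` and the specific thread counts are illustrative assumptions, chosen only to fall within the 16 to 160 range mentioned above. The script prints the commands rather than running them.

```python
import os
import shlex

# Script path taken from the command in this issue.
SCRIPT = os.path.expanduser(
    "~/benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py"
)

# Flags copied from the issue; only --num_intra_threads is varied below.
BASE_FLAGS = [
    "--data_format=NHWC",
    "--batch_size=128",
    "--num_batches=50",
    "--model=resnet50",
    "--optimizer=sgd",
    "--variable_update=replicated",
    "--use_fp16=False",
    "--nodistortions",
    "--gradient_repacking=2",
    "--datasets_use_prefetch=True",
    "--loss_type_to_report=base_loss",
    "--compute_lr_on_cpu=True",
    "--single_l2_loss_op=True",
    "--local_parameter_device=cpu",
    "--device=cpu",
    "--display_every=10",
    "--num_inter_threads=1",
]


def build_command(num_intra_threads):
    """Return the benchmark command (as an argv list) for one thread count."""
    return ["python", SCRIPT, *BASE_FLAGS,
            f"--num_intra_threads={num_intra_threads}"]


def sweep_commands(thread_counts):
    """Return one command per thread count, in the order given."""
    return [build_command(n) for n in thread_counts]


if __name__ == "__main__":
    # Hypothetical sweep values within the 16-160 range from the issue.
    for cmd in sweep_commands([16, 32, 64, 128, 160]):
        # Print only; pass cmd to subprocess.run(cmd, check=True) to execute.
        print(shlex.join(cmd))
```

Keeping every other flag fixed and changing only `--num_intra_threads` makes it easier to tell whether the thread count has any effect on the reported images/sec.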
