Description
Hi,
I'm running the TensorFlow benchmark on an IBM machine (POWER9 processor + V100 GPUs). I know this is not the optimal way to go, but I'm just trying out the performance of POWER9 without using the GPUs. It turns out the performance is VERY low (~0.5 to 4 images/sec) regardless of how I tune the number of threads (from 16 to 160). I'm not sure whether anyone has tried a similar setup, but I cannot find any reported performance numbers. I doubt these numbers are right, because POWER9 has a very high CPU frequency, even though there is no MKL.
Can anyone give me any suggestions? Here is the command I'm running:
python ~/benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --data_format=NHWC --batch_size=128 --num_batches=50 --model=resnet50 --optimizer=sgd --variable_update=replicated --use_fp16=False --nodistortions --gradient_repacking=2 --datasets_use_prefetch=True --loss_type_to_report=base_loss --compute_lr_on_cpu=True --single_l2_loss_op=True --local_parameter_device=cpu --device=cpu --local_parameter_device=cpu --display_every=10 --num_intra_threads=128 --num_inter_threads=1
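For reference, the thread tuning I describe above (16 to 160) could be reproduced with a small sweep like the sketch below. It only builds the command lines. The flags are copied from my command (the repeated --local_parameter_device=cpu is listed once), while the helper names `build_command` and `sweep_commands` and the specific thread counts are just assumptions for illustration, not part of the benchmark suite.

```python
# Minimal sketch: generate one benchmark command per intra-op thread count.
# Flags are copied from the command above; the helper names and the list of
# thread counts are illustrative assumptions.

BASE_FLAGS = [
    "--data_format=NHWC",
    "--batch_size=128",
    "--num_batches=50",
    "--model=resnet50",
    "--optimizer=sgd",
    "--variable_update=replicated",
    "--use_fp16=False",
    "--nodistortions",
    "--gradient_repacking=2",
    "--datasets_use_prefetch=True",
    "--loss_type_to_report=base_loss",
    "--compute_lr_on_cpu=True",
    "--single_l2_loss_op=True",
    "--local_parameter_device=cpu",
    "--device=cpu",
    "--display_every=10",
]


def build_command(script, intra_threads, inter_threads=1):
    """Return the argument list for one run with the given thread settings."""
    return [
        "python",
        script,
        *BASE_FLAGS,
        f"--num_intra_threads={intra_threads}",
        f"--num_inter_threads={inter_threads}",
    ]


def sweep_commands(script, thread_counts):
    """Return one shell command string per intra-op thread count."""
    return [" ".join(build_command(script, n)) for n in thread_counts]


if __name__ == "__main__":
    script = "~/benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py"
    # Thread counts chosen to span the 16 to 160 range mentioned above.
    for line in sweep_commands(script, [16, 32, 64, 128, 160]):
        print(line)
```

Each printed line can be pasted into a shell, so the images/sec reported for every thread count can be compared side by side.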