[Docs] Benchmark docs (huggingface#5360)
* first doc version

* add benchmark docs

* fix typos

* improve README

* Update docs/source/benchmarks.rst

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* fix naming and docs

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
patrickvonplaten and LysandreJik authored Jun 29, 2020
1 parent 482c917 commit 4bcc35c
Showing 10 changed files with 373 additions and 109 deletions.
54 changes: 0 additions & 54 deletions docs/source/benchmarks.md

This file was deleted.

322 changes: 322 additions & 0 deletions docs/source/benchmarks.rst

Large diffs are not rendered by default.
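
The 322 added lines of docs/source/benchmarks.rst are not shown here. Judging from the notebook changes below, the new page documents usage along these lines (a hedged reconstruction for orientation, not the verbatim file; values are illustrative):

from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments

# Sketch of the documented PyTorch benchmarking entry point; values are illustrative.
args = PyTorchBenchmarkArguments(
    models=["bert-base-uncased"],
    batch_sizes=[8],
    sequence_lengths=[8, 32, 128, 512],
)
benchmark = PyTorchBenchmark(args)
results = benchmark.run()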

8 changes: 4 additions & 4 deletions examples/benchmarking/run_benchmark_tf.py
@@ -13,15 +13,15 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
""" Benchmarking the library on inference and training in Tensorflow"""
""" Benchmarking the library on inference and training in TensorFlow"""

from transformers import HfArgumentParser, TensorflowBenchmark, TensorflowBenchmarkArguments
from transformers import HfArgumentParser, TensorFlowBenchmark, TensorFlowBenchmarkArguments


def main():
parser = HfArgumentParser(TensorflowBenchmarkArguments)
parser = HfArgumentParser(TensorFlowBenchmarkArguments)
benchmark_args = parser.parse_args_into_dataclasses()[0]
benchmark = TensorflowBenchmark(args=benchmark_args)
benchmark = TensorFlowBenchmark(args=benchmark_args)
benchmark.run()


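
For reference, the renamed classes can also be driven directly from Python rather than through the script above. A minimal sketch (argument names follow the benchmark documentation added in this PR; model name and input shapes are illustrative):

from transformers import TensorFlowBenchmark, TensorFlowBenchmarkArguments

# Illustrative configuration: which checkpoints and input shapes to benchmark.
args = TensorFlowBenchmarkArguments(
    models=["bert-base-uncased"],
    batch_sizes=[8],
    sequence_lengths=[8, 32, 128, 512],
)

benchmark = TensorFlowBenchmark(args=args)
results = benchmark.run()  # runs inference speed and memory measurements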
4 changes: 0 additions & 4 deletions examples/benchmarking/time_xla_1.csv

This file was deleted.

32 changes: 16 additions & 16 deletions notebooks/05-benchmark.ipynb
@@ -289,7 +289,7 @@
"\n",
"Being able to accurately benchmark language models on both *speed* and *required memory* is therefore very important.\n",
"\n",
"HuggingFace's Transformer library allows users to benchmark models for both Tensorflow 2 and PyTorch using the `PyTorchBenchmark` and `TensorflowBenchmark` classes.\n",
"HuggingFace's Transformer library allows users to benchmark models for both TensorFlow 2 and PyTorch using the `PyTorchBenchmark` and `TensorFlowBenchmark` classes.\n",
"\n",
"The currently available features for `PyTorchBenchmark` are summarized in the following table.\n",
"\n",
@@ -306,7 +306,7 @@
"\n",
"* *torchscript* corresponds to PyTorch's torchscript format, see [here](https://pytorch.org/docs/stable/jit.html).\n",
"\n",
"The currently available features for `TensorflowBenchmark` are summarized in the following table.\n",
"The currently available features for `TensorFlowBenchmark` are summarized in the following table.\n",
"\n",
"| | CPU | CPU + eager execution | GPU | GPU + eager execution | GPU + XLA | GPU + FP16 | TPU |\n",
":-- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |\n",
@@ -315,16 +315,16 @@
"**Speed - Train** | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ |\n",
"**Memory - Train** | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ |\n",
"\n",
"* *eager execution* means that the function is run in the eager execution environment of Tensorflow 2, see [here](https://www.tensorflow.org/guide/eager).\n",
"* *eager execution* means that the function is run in the eager execution environment of TensorFlow 2, see [here](https://www.tensorflow.org/guide/eager).\n",
"\n",
"* *XLA* stands for Tensorflow's Accelerated Linear Algebra (XLA) compiler, see [here](https://www.tensorflow.org/xla)\n",
"* *XLA* stands for TensorFlow's Accelerated Linear Algebra (XLA) compiler, see [here](https://www.tensorflow.org/xla)\n",
"\n",
"* *FP16* stands for Tensorflow's mixed-precision package and is analogous to PyTorch's FP16 feature, see [here](https://www.tensorflow.org/guide/mixed_precision).\n",
"* *FP16* stands for TensorFlow's mixed-precision package and is analogous to PyTorch's FP16 feature, see [here](https://www.tensorflow.org/guide/mixed_precision).\n",
"\n",
"***Note***: In ~1,2 weeks it will also be possible to benchmark training in Tensorflow.\n",
"***Note***: In ~1,2 weeks it will also be possible to benchmark training in TensorFlow.\n",
"\n",
"\n",
"This notebook will show the user how to use `PyTorchBenchmark` and `TensorflowBenchmark` for two different scenarios:\n",
"This notebook will show the user how to use `PyTorchBenchmark` and `TensorFlowBenchmark` for two different scenarios:\n",
"\n",
"1. **Inference - Pre-trained Model Comparison** - *A user wants to implement a pre-trained model in production for inference. She wants to compare different models on speed and required memory.*\n",
"\n",
@@ -443,7 +443,7 @@
"source": [
"Looks good! Now we import `transformers` and download the scripts `run_benchmark.py`, `run_benchmark_tf.py`, and `plot_csv_file.py` which can be found under `transformers/examples/benchmarking`.\n",
"\n",
"`run_benchmark_tf.py` and `run_benchmark.py` are very simple scripts leveraging the `PyTorchBenchmark` and `TensorflowBenchmark` classes, respectively."
"`run_benchmark_tf.py` and `run_benchmark.py` are very simple scripts leveraging the `PyTorchBenchmark` and `TensorFlowBenchmark` classes, respectively."
]
},
{
@@ -482,7 +482,7 @@
"colab_type": "text"
},
"source": [
"Information about the input arguments to the *run_benchmark* scripts can be accessed by running `!python run_benchmark.py --help` for PyTorch and `!python run_benchmark_tf.py --help` for Tensorflow."
"Information about the input arguments to the *run_benchmark* scripts can be accessed by running `!python run_benchmark.py --help` for PyTorch and `!python run_benchmark_tf.py --help` for TensorFlow."
]
},
{
@@ -1130,7 +1130,7 @@
},
"source": [
"At this point, it is important to understand how the peak memory is measured. The benchmarking tools measure the peak memory usage the same way the command `nvidia-smi` does - see [here](https://developer.nvidia.com/nvidia-system-management-interface) for more information. \n",
"In short, all memory that is allocated for a given *model identifier*, *batch size* and *sequence length* is measured in a separate process. This way it can be ensured that there is no previously unreleased memory falsely included in the measurement. One should also note that the measured memory even includes the memory allocated by the CUDA driver to load PyTorch and Tensorflow and is, therefore, higher than library-specific memory measurement function, *e.g.* this one for [PyTorch](https://pytorch.org/docs/stable/cuda.html#torch.cuda.max_memory_allocated).\n",
"In short, all memory that is allocated for a given *model identifier*, *batch size* and *sequence length* is measured in a separate process. This way it can be ensured that there is no previously unreleased memory falsely included in the measurement. One should also note that the measured memory even includes the memory allocated by the CUDA driver to load PyTorch and TensorFlow and is, therefore, higher than library-specific memory measurement function, *e.g.* this one for [PyTorch](https://pytorch.org/docs/stable/cuda.html#torch.cuda.max_memory_allocated).\n",
"\n",
"Alright, let's analyze the results. It can be noted that the models `aodiniz/bert_uncased_L-10_H-512_A-8_cord19-200616_squad2` and `deepset/roberta-base-squad2` require significantly less memory than the other three models. Besides `mrm8488/longformer-base-4096-finetuned-squadv2` all models more or less follow the same memory consumption pattern with `aodiniz/bert_uncased_L-10_H-512_A-8_cord19-200616_squad2` seemingly being able to better scale to larger sequence lengths. \n",
"`mrm8488/longformer-base-4096-finetuned-squadv2` is a *Longformer* model, which makes use of *LocalAttention* (check this blog post to learn more about local attention) so that the model scales much better to longer input sequences.\n",
@@ -1256,7 +1256,7 @@
"source": [
"Interesting! `aodiniz/bert_uncased_L-10_H-51` clearly scales better for higher batch sizes and does not even run out of memory for 512 tokens.\n",
"\n",
"For comparison, let's run the same benchmarking on Tensorflow."
"For comparison, let's run the same benchmarking on TensorFlow."
]
},
{
@@ -1341,7 +1341,7 @@
"colab_type": "text"
},
"source": [
"Let's see the same plot for Tensorflow."
"Let's see the same plot for TensorFlow."
]
},
{
@@ -1394,7 +1394,7 @@
"colab_type": "text"
},
"source": [
"The model implemented in Tensorflow requires more memory than the one implemented in PyTorch. Let's say for whatever reason we have decided to use Tensorflow instead of PyTorch. \n",
"The model implemented in TensorFlow requires more memory than the one implemented in PyTorch. Let's say for whatever reason we have decided to use TensorFlow instead of PyTorch. \n",
"\n",
"The next step is to measure the inference time of these two models. Instead of disabling time measurement with `--no_speed`, we will now disable memory measurement with `--no_memory`."
]
@@ -1499,7 +1499,7 @@
"source": [
"Ok, this took some time... time measurements take much longer than memory measurements because the forward pass is called multiple times for stable results. Timing measurements leverage Python's [timeit module](https://docs.python.org/2/library/timeit.html#timeit.Timer.repeat) and run 10 times the value given to the `--repeat` argument (defaults to 3), so in our case 30 times.\n",
"\n",
"Let's focus on the resulting plot. It becomes obvious that `aodiniz/bert_uncased_L-10_H-51` is around twice as fast as `deepset/roberta-base-squad2`. Given that the model is also more memory efficient and assuming that the model performs reasonably well, for the sake of this notebook we will settle on `aodiniz/bert_uncased_L-10_H-51`. Our model should be able to process input sequences of up to 512 tokens. Latency time of around 2 seconds might be too long though, so let's compare the time for different batch sizes and using Tensorflows XLA package for more speed."
"Let's focus on the resulting plot. It becomes obvious that `aodiniz/bert_uncased_L-10_H-51` is around twice as fast as `deepset/roberta-base-squad2`. Given that the model is also more memory efficient and assuming that the model performs reasonably well, for the sake of this notebook we will settle on `aodiniz/bert_uncased_L-10_H-51`. Our model should be able to process input sequences of up to 512 tokens. Latency time of around 2 seconds might be too long though, so let's compare the time for different batch sizes and using TensorFlows XLA package for more speed."
]
},
{
@@ -1551,7 +1551,7 @@
"colab_type": "text"
},
"source": [
"First of all, it can be noted that XLA reduces latency time by a factor of ca. 1.3 (which is more than observed for other models by Tensorflow [here](https://www.tensorflow.org/xla)). A batch size of 64 looks like a good choice. More or less half a second for the forward pass is good enough.\n",
"First of all, it can be noted that XLA reduces latency time by a factor of ca. 1.3 (which is more than observed for other models by TensorFlow [here](https://www.tensorflow.org/xla)). A batch size of 64 looks like a good choice. More or less half a second for the forward pass is good enough.\n",
"\n",
"Cool, now it should be straightforward to benchmark your favorite models. All the inference time measurements can also be done using the `run_benchmark.py` script for PyTorch."
]
@@ -2021,4 +2021,4 @@
]
}
]
}
}
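
For orientation, the notebook drives these measurements through the example scripts. A hedged sketch of the kind of command it runs (`--no_speed`/`--no_memory` appear in the notebook text above; the remaining flag names are assumed to mirror the `BenchmarkArguments` fields, and the values are illustrative):

python run_benchmark_tf.py \
    --models aodiniz/bert_uncased_L-10_H-512_A-8_cord19-200616_squad2 deepset/roberta-base-squad2 \
    --sequence_lengths 32 128 512 \
    --batch_sizes 64 \
    --use_xla \
    --no_memory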
4 changes: 2 additions & 2 deletions src/transformers/__init__.py
@@ -613,8 +613,8 @@
from .trainer_tf import TFTrainer

# Benchmarks
from .benchmark.benchmark_tf import TensorflowBenchmark
from .benchmark.benchmark_args_tf import TensorflowBenchmarkArguments
from .benchmark.benchmark_tf import TensorFlowBenchmark
from .benchmark.benchmark_args_tf import TensorFlowBenchmarkArguments


if not is_tf_available() and not is_torch_available():
2 changes: 1 addition & 1 deletion src/transformers/benchmark/benchmark_args_tf.py
@@ -30,7 +30,7 @@


@dataclass
class TensorflowBenchmarkArguments(BenchmarkArguments):
class TensorFlowBenchmarkArguments(BenchmarkArguments):
tpu_name: str = field(
default=None, metadata={"help": "Name of TPU"},
)
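
To illustrate the renamed arguments class, here is a minimal, hypothetical configuration. The field names are taken from this diff (`tpu_name` aside) and from the attributes referenced in `benchmark_utils.py` (`eager_mode`, `use_xla`); the values are illustrative only:

from transformers import TensorFlowBenchmarkArguments

# Hypothetical configuration; field names follow the dataclass shown above
# and the attributes referenced elsewhere in this commit.
args = TensorFlowBenchmarkArguments(
    models=["bert-base-uncased"],
    batch_sizes=[8],
    sequence_lengths=[128],
    eager_mode=False,  # run in graph mode
    use_xla=True,      # compile the forward pass with XLA
)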
16 changes: 8 additions & 8 deletions src/transformers/benchmark/benchmark_tf.py
@@ -38,7 +38,7 @@

if is_tf_available():
import tensorflow as tf
from .benchmark_args_tf import TensorflowBenchmarkArguments
from .benchmark_args_tf import TensorFlowBenchmarkArguments
from tensorflow.python.framework.errors_impl import ResourceExhaustedError

if is_py3nvml_available():
@@ -75,11 +75,11 @@ def random_input_ids(batch_size: int, sequence_length: int, vocab_size: int) ->
return tf.constant(values, shape=(batch_size, sequence_length), dtype=tf.int32)


class TensorflowBenchmark(Benchmark):
class TensorFlowBenchmark(Benchmark):

args: TensorflowBenchmarkArguments
args: TensorFlowBenchmarkArguments
configs: PretrainedConfig
framework: str = "Tensorflow"
framework: str = "TensorFlow"

@property
def framework_version(self):
@@ -88,7 +88,7 @@ def framework_version(self):
def _inference_speed(self, model_name: str, batch_size: int, sequence_length: int) -> float:
# initialize GPU on separate process
strategy = self.args.strategy
assert strategy is not None, "A device strategy has to be initialized before using Tensorflow."
assert strategy is not None, "A device strategy has to be initialized before using TensorFlow."
_inference = self._prepare_inference_func(model_name, batch_size, sequence_length)
return self._measure_speed(_inference)

@@ -104,7 +104,7 @@ def _inference_memory(
if self.args.is_gpu:
tf.config.experimental.set_memory_growth(self.args.gpu_list[self.args.device_idx], True)
strategy = self.args.strategy
assert strategy is not None, "A device strategy has to be initialized before using Tensorflow."
assert strategy is not None, "A device strategy has to be initialized before using TensorFlow."
_inference = self._prepare_inference_func(model_name, batch_size, sequence_length)
return self._measure_memory(_inference)

@@ -166,7 +166,7 @@ def _measure_speed(self, func) -> float:

def _measure_memory(self, func: Callable[[], None]) -> [Memory, MemorySummary]:
logger.info(
"Note that Tensorflow allocates more memory than"
"Note that TensorFlow allocates more memory than"
"it might need to speed up computation."
"The memory reported here corresponds to the memory"
"reported by `nvidia-smi`, which can vary depending"
@@ -210,7 +210,7 @@ def _measure_memory(self, func: Callable[[], None]) -> [Memory, MemorySummary]:
# cpu
if self.args.trace_memory_line_by_line:
logger.info(
"When enabling line by line tracing, the max peak memory for CPU is inaccurate in Tensorflow."
"When enabling line by line tracing, the max peak memory for CPU is inaccurate in TensorFlow."
)
memory = None
else:
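
As the notebook notes, speed measurements repeat the forward pass for stable results. A minimal sketch of the timing idea behind `_measure_speed` (not the exact library code; the helper name is hypothetical):

import timeit

def measure_speed(func, repeat=3):
    # Each repetition calls func 10 times; report the fastest repetition,
    # mirroring the "10 x --repeat" behaviour described in the notebook.
    runtimes = timeit.repeat(func, repeat=repeat, number=10)
    return min(runtimes) / 10.0  # seconds per single forward pass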
2 changes: 1 addition & 1 deletion src/transformers/benchmark/benchmark_utils.py
@@ -740,7 +740,7 @@ def environment_info(self):
info["framework"] = self.framework
if self.framework == "PyTorch":
info["use_torchscript"] = self.args.torchscript
if self.framework == "Tensorflow":
if self.framework == "TensorFlow":
info["eager_mode"] = self.args.eager_mode
info["use_xla"] = self.args.use_xla
info["framework_version"] = self.framework_version
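
The keys touched in this hunk end up in the environment summary a benchmark reports. A hedged sketch of reading it back (assuming `environment_info` is exposed as a property returning a plain dict, as the hunk header suggests; values are illustrative):

from transformers import TensorFlowBenchmark, TensorFlowBenchmarkArguments

# Illustrative configuration; only the keys visible in this diff are read below.
args = TensorFlowBenchmarkArguments(models=["bert-base-uncased"], batch_sizes=[1], sequence_lengths=[8])
benchmark = TensorFlowBenchmark(args=args)
info = benchmark.environment_info  # assumed to be a dict-like summary
print(info["framework"], info["framework_version"], info.get("eager_mode"), info.get("use_xla"))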