[Docs] Benchmark docs (huggingface#5360)
* first doc version

* add benchmark docs

* fix typos

* improve README

* Update docs/source/benchmarks.rst

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* fix naming and docs

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
patrickvonplaten and LysandreJik authored Jun 29, 2020
1 parent 482c917 commit 4bcc35c
Showing 10 changed files with 373 additions and 109 deletions.
54 changes: 0 additions & 54 deletions docs/source/benchmarks.md

This file was deleted.

322 changes: 322 additions & 0 deletions docs/source/benchmarks.rst

Large diffs are not rendered by default.
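
The 322 added lines of docs/source/benchmarks.rst are not shown here. Judging from the notebook changes below, the new page documents usage along these lines (a hedged reconstruction for orientation, not the verbatim file; values are illustrative):

from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments

# Sketch of the documented PyTorch benchmarking entry point; values are illustrative.
args = PyTorchBenchmarkArguments(
    models=["bert-base-uncased"],
    batch_sizes=[8],
    sequence_lengths=[8, 32, 128, 512],
)
benchmark = PyTorchBenchmark(args)
results = benchmark.run()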

8 changes: 4 additions & 4 deletions examples/benchmarking/run_benchmark_tf.py
@@ -13,15 +13,15 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
""" Benchmarking the library on inference and training in Tensorflow"""
""" Benchmarking the library on inference and training in TensorFlow"""

from transformers import HfArgumentParser, TensorflowBenchmark, TensorflowBenchmarkArguments
from transformers import HfArgumentParser, TensorFlowBenchmark, TensorFlowBenchmarkArguments


def main():
parser = HfArgumentParser(TensorflowBenchmarkArguments)
parser = HfArgumentParser(TensorFlowBenchmarkArguments)
benchmark_args = parser.parse_args_into_dataclasses()[0]
benchmark = TensorflowBenchmark(args=benchmark_args)
benchmark = TensorFlowBenchmark(args=benchmark_args)
benchmark.run()


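
For reference, the renamed classes can also be driven directly from Python rather than through the script above. A minimal sketch (argument names follow the benchmark documentation added in this PR; model name and input shapes are illustrative):

from transformers import TensorFlowBenchmark, TensorFlowBenchmarkArguments

# Illustrative configuration: which checkpoints and input shapes to benchmark.
args = TensorFlowBenchmarkArguments(
    models=["bert-base-uncased"],
    batch_sizes=[8],
    sequence_lengths=[8, 32, 128, 512],
)

benchmark = TensorFlowBenchmark(args=args)
results = benchmark.run()  # runs inference speed and memory measurements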
4 changes: 0 additions & 4 deletions examples/benchmarking/time_xla_1.csv

This file was deleted.

32 changes: 16 additions & 16 deletions notebooks/05-benchmark.ipynb
@@ -289,7 +289,7 @@
"\n",
"Being able to accurately benchmark language models on both *speed* and *required memory* is therefore very important.\n",
"\n",
"HuggingFace's Transformer library allows users to benchmark models for both Tensorflow 2 and PyTorch using the `PyTorchBenchmark` and `TensorflowBenchmark` classes.\n",
"HuggingFace's Transformer library allows users to benchmark models for both TensorFlow 2 and PyTorch using the `PyTorchBenchmark` and `TensorFlowBenchmark` classes.\n",
"\n",
"The currently available features for `PyTorchBenchmark` are summarized in the following table.\n",
"\n",
@@ -306,7 +306,7 @@
"\n",
"* *torchscript* corresponds to PyTorch's torchscript format, see [here](https://pytorch.org/docs/stable/jit.html).\n",
"\n",
"The currently available features for `TensorflowBenchmark` are summarized in the following table.\n",
"The currently available features for `TensorFlowBenchmark` are summarized in the following table.\n",
"\n",
"| | CPU | CPU + eager execution | GPU | GPU + eager execution | GPU + XLA | GPU + FP16 | TPU |\n",
":-- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |\n",
@@ -315,16 +315,16 @@
"**Speed - Train** | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ |\n",
"**Memory - Train** | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ |\n",
"\n",
"* *eager execution* means that the function is run in the eager execution environment of Tensorflow 2, see [here](https://www.tensorflow.org/guide/eager).\n",
"* *eager execution* means that the function is run in the eager execution environment of TensorFlow 2, see [here](https://www.tensorflow.org/guide/eager).\n",
"\n",
"* *XLA* stands for Tensorflow's Accelerated Linear Algebra (XLA) compiler, see [here](https://www.tensorflow.org/xla)\n",
"* *XLA* stands for TensorFlow's Accelerated Linear Algebra (XLA) compiler, see [here](https://www.tensorflow.org/xla)\n",
"\n",
"* *FP16* stands for Tensorflow's mixed-precision package and is analogous to PyTorch's FP16 feature, see [here](https://www.tensorflow.org/guide/mixed_precision).\n",
"* *FP16* stands for TensorFlow's mixed-precision package and is analogous to PyTorch's FP16 feature, see [here](https://www.tensorflow.org/guide/mixed_precision).\n",
"\n",
"***Note***: In ~1,2 weeks it will also be possible to benchmark training in Tensorflow.\n",
"***Note***: In ~1,2 weeks it will also be possible to benchmark training in TensorFlow.\n",
"\n",
"\n",
"This notebook will show the user how to use `PyTorchBenchmark` and `TensorflowBenchmark` for two different scenarios:\n",
"This notebook will show the user how to use `PyTorchBenchmark` and `TensorFlowBenchmark` for two different scenarios:\n",
"\n",
"1. **Inference - Pre-trained Model Comparison** - *A user wants to implement a pre-trained model in production for inference. She wants to compare different models on speed and required memory.*\n",
"\n",
@@ -443,7 +443,7 @@
"source": [
"Looks good! Now we import `transformers` and download the scripts `run_benchmark.py`, `run_benchmark_tf.py`, and `plot_csv_file.py` which can be found under `transformers/examples/benchmarking`.\n",
"\n",
"`run_benchmark_tf.py` and `run_benchmark.py` are very simple scripts leveraging the `PyTorchBenchmark` and `TensorflowBenchmark` classes, respectively."
"`run_benchmark_tf.py` and `run_benchmark.py` are very simple scripts leveraging the `PyTorchBenchmark` and `TensorFlowBenchmark` classes, respectively."
]
},
{
@@ -482,7 +482,7 @@
"colab_type": "text"
},
"source": [
"Information about the input arguments to the *run_benchmark* scripts can be accessed by running `!python run_benchmark.py --help` for PyTorch and `!python run_benchmark_tf.py --help` for Tensorflow."
"Information about the input arguments to the *run_benchmark* scripts can be accessed by running `!python run_benchmark.py --help` for PyTorch and `!python run_benchmark_tf.py --help` for TensorFlow."
]
},
{
@@ -1130,7 +1130,7 @@
},
"source": [
"At this point, it is important to understand how the peak memory is measured. The benchmarking tools measure the peak memory usage the same way the command `nvidia-smi` does - see [here](https://developer.nvidia.com/nvidia-system-management-interface) for more information. \n",
"In short, all memory that is allocated for a given *model identifier*, *batch size* and *sequence length* is measured in a separate process. This way it can be ensured that there is no previously unreleased memory falsely included in the measurement. One should also note that the measured memory even includes the memory allocated by the CUDA driver to load PyTorch and Tensorflow and is, therefore, higher than library-specific memory measurement function, *e.g.* this one for [PyTorch](https://pytorch.org/docs/stable/cuda.html#torch.cuda.max_memory_allocated).\n",
"In short, all memory that is allocated for a given *model identifier*, *batch size* and *sequence length* is measured in a separate process. This way it can be ensured that there is no previously unreleased memory falsely included in the measurement. One should also note that the measured memory even includes the memory allocated by the CUDA driver to load PyTorch and TensorFlow and is, therefore, higher than library-specific memory measurement function, *e.g.* this one for [PyTorch](https://pytorch.org/docs/stable/cuda.html#torch.cuda.max_memory_allocated).\n",
"\n",
"Alright, let's analyze the results. It can be noted that the models `aodiniz/bert_uncased_L-10_H-512_A-8_cord19-200616_squad2` and `deepset/roberta-base-squad2` require significantly less memory than the other three models. Besides `mrm8488/longformer-base-4096-finetuned-squadv2` all models more or less follow the same memory consumption pattern with `aodiniz/bert_uncased_L-10_H-512_A-8_cord19-200616_squad2` seemingly being able to better scale to larger sequence lengths. \n",
"`mrm8488/longformer-base-4096-finetuned-squadv2` is a *Longformer* model, which makes use of *LocalAttention* (check this blog post to learn more about local attention) so that the model scales much better to longer input sequences.\n",
@@ -1256,7 +1256,7 @@
"source": [
"Interesting! `aodiniz/bert_uncased_L-10_H-51` clearly scales better for higher batch sizes and does not even run out of memory for 512 tokens.\n",
"\n",
"For comparison, let's run the same benchmarking on Tensorflow."
"For comparison, let's run the same benchmarking on TensorFlow."
]
},
{
@@ -1341,7 +1341,7 @@
"colab_type": "text"
},
"source": [
"Let's see the same plot for Tensorflow."
"Let's see the same plot for TensorFlow."
]
},
{
@@ -1394,7 +1394,7 @@
"colab_type": "text"
},
"source": [
"The model implemented in Tensorflow requires more memory than the one implemented in PyTorch. Let's say for whatever reason we have decided to use Tensorflow instead of PyTorch. \n",
"The model implemented in TensorFlow requires more memory than the one implemented in PyTorch. Let's say for whatever reason we have decided to use TensorFlow instead of PyTorch. \n",
"\n",
"The next step is to measure the inference time of these two models. Instead of disabling time measurement with `--no_speed`, we will now disable memory measurement with `--no_memory`."
]
@@ -1499,7 +1499,7 @@
"source": [
"Ok, this took some time... time measurements take much longer than memory measurements because the forward pass is called multiple times for stable results. Timing measurements leverage Python's [timeit module](https://docs.python.org/2/library/timeit.html#timeit.Timer.repeat) and run 10 times the value given to the `--repeat` argument (defaults to 3), so in our case 30 times.\n",
"\n",
"Let's focus on the resulting plot. It becomes obvious that `aodiniz/bert_uncased_L-10_H-51` is around twice as fast as `deepset/roberta-base-squad2`. Given that the model is also more memory efficient and assuming that the model performs reasonably well, for the sake of this notebook we will settle on `aodiniz/bert_uncased_L-10_H-51`. Our model should be able to process input sequences of up to 512 tokens. Latency time of around 2 seconds might be too long though, so let's compare the time for different batch sizes and using Tensorflows XLA package for more speed."
"Let's focus on the resulting plot. It becomes obvious that `aodiniz/bert_uncased_L-10_H-51` is around twice as fast as `deepset/roberta-base-squad2`. Given that the model is also more memory efficient and assuming that the model performs reasonably well, for the sake of this notebook we will settle on `aodiniz/bert_uncased_L-10_H-51`. Our model should be able to process input sequences of up to 512 tokens. Latency time of around 2 seconds might be too long though, so let's compare the time for different batch sizes and using TensorFlows XLA package for more speed."
]
},
{
@@ -1551,7 +1551,7 @@
"colab_type": "text"
},
"source": [
"First of all, it can be noted that XLA reduces latency time by a factor of ca. 1.3 (which is more than observed for other models by Tensorflow [here](https://www.tensorflow.org/xla)). A batch size of 64 looks like a good choice. More or less half a second for the forward pass is good enough.\n",
"First of all, it can be noted that XLA reduces latency time by a factor of ca. 1.3 (which is more than observed for other models by TensorFlow [here](https://www.tensorflow.org/xla)). A batch size of 64 looks like a good choice. More or less half a second for the forward pass is good enough.\n",
"\n",
"Cool, now it should be straightforward to benchmark your favorite models. All the inference time measurements can also be done using the `run_benchmark.py` script for PyTorch."
]
@@ -2021,4 +2021,4 @@
]
}
]
}
}
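
For orientation, the notebook drives these measurements through the example scripts. A hedged sketch of the kind of command it runs (`--no_speed`/`--no_memory` appear in the notebook text above; the remaining flag names are assumed to mirror the `BenchmarkArguments` fields, and the values are illustrative):

python run_benchmark_tf.py \
    --models aodiniz/bert_uncased_L-10_H-512_A-8_cord19-200616_squad2 deepset/roberta-base-squad2 \
    --sequence_lengths 32 128 512 \
    --batch_sizes 64 \
    --use_xla \
    --no_memory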
4 changes: 2 additions & 2 deletions src/transformers/__init__.py
@@ -613,8 +613,8 @@
from .trainer_tf import TFTrainer

# Benchmarks
from .benchmark.benchmark_tf import TensorflowBenchmark
from .benchmark.benchmark_args_tf import TensorflowBenchmarkArguments
from .benchmark.benchmark_tf import TensorFlowBenchmark
from .benchmark.benchmark_args_tf import TensorFlowBenchmarkArguments


if not is_tf_available() and not is_torch_available():
2 changes: 1 addition & 1 deletion src/transformers/benchmark/benchmark_args_tf.py
@@ -30,7 +30,7 @@


@dataclass
class TensorflowBenchmarkArguments(BenchmarkArguments):
class TensorFlowBenchmarkArguments(BenchmarkArguments):
tpu_name: str = field(
default=None, metadata={"help": "Name of TPU"},
)
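
To illustrate the renamed arguments class, here is a minimal, hypothetical configuration. The field names are taken from this diff (`tpu_name` aside) and from the attributes referenced in `benchmark_utils.py` (`eager_mode`, `use_xla`); the values are illustrative only:

from transformers import TensorFlowBenchmarkArguments

# Hypothetical configuration; field names follow the dataclass shown above
# and the attributes referenced elsewhere in this commit.
args = TensorFlowBenchmarkArguments(
    models=["bert-base-uncased"],
    batch_sizes=[8],
    sequence_lengths=[128],
    eager_mode=False,  # run in graph mode
    use_xla=True,      # compile the forward pass with XLA
)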
16 changes: 8 additions & 8 deletions src/transformers/benchmark/benchmark_tf.py
@@ -38,7 +38,7 @@

if is_tf_available():
import tensorflow as tf
from .benchmark_args_tf import TensorflowBenchmarkArguments
from .benchmark_args_tf import TensorFlowBenchmarkArguments
from tensorflow.python.framework.errors_impl import ResourceExhaustedError

if is_py3nvml_available():
@@ -75,11 +75,11 @@ def random_input_ids(batch_size: int, sequence_length: int, vocab_size: int) ->
return tf.constant(values, shape=(batch_size, sequence_length), dtype=tf.int32)


class TensorflowBenchmark(Benchmark):
class TensorFlowBenchmark(Benchmark):

args: TensorflowBenchmarkArguments
args: TensorFlowBenchmarkArguments
configs: PretrainedConfig
framework: str = "Tensorflow"
framework: str = "TensorFlow"

@property
def framework_version(self):
@@ -88,7 +88,7 @@ def framework_version(self):
def _inference_speed(self, model_name: str, batch_size: int, sequence_length: int) -> float:
# initialize GPU on separate process
strategy = self.args.strategy
assert strategy is not None, "A device strategy has to be initialized before using Tensorflow."
assert strategy is not None, "A device strategy has to be initialized before using TensorFlow."
_inference = self._prepare_inference_func(model_name, batch_size, sequence_length)
return self._measure_speed(_inference)

@@ -104,7 +104,7 @@ def _inference_memory(
if self.args.is_gpu:
tf.config.experimental.set_memory_growth(self.args.gpu_list[self.args.device_idx], True)
strategy = self.args.strategy
assert strategy is not None, "A device strategy has to be initialized before using Tensorflow."
assert strategy is not None, "A device strategy has to be initialized before using TensorFlow."
_inference = self._prepare_inference_func(model_name, batch_size, sequence_length)
return self._measure_memory(_inference)

@@ -166,7 +166,7 @@ def _measure_speed(self, func) -> float:

def _measure_memory(self, func: Callable[[], None]) -> [Memory, MemorySummary]:
logger.info(
"Note that Tensorflow allocates more memory than"
"Note that TensorFlow allocates more memory than"
"it might need to speed up computation."
"The memory reported here corresponds to the memory"
"reported by `nvidia-smi`, which can vary depending"
@@ -210,7 +210,7 @@ def _measure_memory(self, func: Callable[[], None]) -> [Memory, MemorySummary]:
# cpu
if self.args.trace_memory_line_by_line:
logger.info(
"When enabling line by line tracing, the max peak memory for CPU is inaccurate in Tensorflow."
"When enabling line by line tracing, the max peak memory for CPU is inaccurate in TensorFlow."
)
memory = None
else:
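
As the notebook notes, speed measurements repeat the forward pass for stable results. A minimal sketch of the timing idea behind `_measure_speed` (not the exact library code; the helper name is hypothetical):

import timeit

def measure_speed(func, repeat=3):
    # Each repetition calls func 10 times; report the fastest repetition,
    # mirroring the "10 x --repeat" behaviour described in the notebook.
    runtimes = timeit.repeat(func, repeat=repeat, number=10)
    return min(runtimes) / 10.0  # seconds per single forward pass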
2 changes: 1 addition & 1 deletion src/transformers/benchmark/benchmark_utils.py
@@ -740,7 +740,7 @@ def environment_info(self):
info["framework"] = self.framework
if self.framework == "PyTorch":
info["use_torchscript"] = self.args.torchscript
if self.framework == "Tensorflow":
if self.framework == "TensorFlow":
info["eager_mode"] = self.args.eager_mode
info["use_xla"] = self.args.use_xla
info["framework_version"] = self.framework_version
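
The keys touched in this hunk end up in the environment summary a benchmark reports. A hedged sketch of reading it back (assuming `environment_info` is exposed as a property returning a plain dict, as the hunk header suggests; values are illustrative):

from transformers import TensorFlowBenchmark, TensorFlowBenchmarkArguments

# Illustrative configuration; only the keys visible in this diff are read below.
args = TensorFlowBenchmarkArguments(models=["bert-base-uncased"], batch_sizes=[1], sequence_lengths=[8])
benchmark = TensorFlowBenchmark(args=args)
info = benchmark.environment_info  # assumed to be a dict-like summary
print(info["framework"], info["framework_version"], info.get("eager_mode"), info.get("use_xla"))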