# Charformer

This repository contains the Mesh-TensorFlow implementation of Charformer: Fast Character Transformers via Gradient-based Subword Tokenization.

This implementation works with the [T5 codebase](https://github.com/google-research/text-to-text-transfer-transformer).
# Usage

Currently this codebase contains the modules/layers that can be plugged into the T5 codebase. We are working on a JAX/FLAX implementation that will later be made available in this repository. For now, the Mesh-TF implementation serves as a reference implementation.

To use the provided Charformer layers, you will need to modify `transformer.py` in https://github.com/tensorflow/mesh. The injection point for the Charformer layers is `https://github.com/tensorflow/mesh/blob/master/mesh_tensorflow/transformer/transformer.py`.
### Integration Steps

Step 1: Add the following lines to the `__init__` function of the `Unitransformer` class, along with the new constructor arguments `gradient_subwords` and `gradient_subword_layer`:

```
if self.gradient_subwords:
  tf.logging.info("Using gradient subwords..")
  self.grad_layer = [gradient_subword_layer()] * self.num_gsw_layers
```
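To see how these pieces fit together, here is a self-contained sketch of the Step 1 wiring. The `Unitransformer` below is a minimal stand-in, not the real `mesh_tensorflow` class, and `DummyGradientSubwordLayer` is a hypothetical placeholder for `charformer_layers.GradientSubwordLayerV2`; `num_gsw_layers` follows the snippet above.

```python
class Unitransformer:
    """Minimal stand-in showing how the new Step 1 arguments are threaded through."""

    def __init__(self, gradient_subwords=False, gradient_subword_layer=None,
                 num_gsw_layers=1):
        self.gradient_subwords = gradient_subwords
        self.num_gsw_layers = num_gsw_layers
        self.grad_layer = None
        if self.gradient_subwords:
            # As in the snippet above; note that list multiplication repeats
            # references to a single layer instance rather than constructing
            # num_gsw_layers independent layers.
            self.grad_layer = [gradient_subword_layer()] * self.num_gsw_layers


class DummyGradientSubwordLayer:
    """Hypothetical placeholder for charformer_layers.GradientSubwordLayerV2."""

    def call(self, context, x):
        # A real layer would downsample x along the character axis here.
        return x, context


model = Unitransformer(gradient_subwords=True,
                       gradient_subword_layer=DummyGradientSubwordLayer,
                       num_gsw_layers=2)
print(len(model.grad_layer))  # 2
```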
Step 2: Right after the positional embeddings are added, insert:

```
if self.gradient_subwords and self.grad_layer:
  tf.logging.info("Using Charformer before computing layer stack.")
  # tensor should be [batch x char_length x dim]
  for grad_layer in self.grad_layer:
    x, context = grad_layer.call(context, x)
```
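Shape-wise, each pass through a Charformer layer shortens the character axis. The following NumPy sketch illustrates only that tensor contract: mean-pooling stands in for the learned GBST downsampling, with the pooling rate playing the role of the `downsample_query` setting in the gin configs.

```python
import numpy as np


def downsample_chars(x, rate=2):
    """Mean-pool [batch, char_length, dim] down to [batch, char_length // rate, dim].

    Stand-in for the learned downsampling a Charformer layer performs.
    """
    batch, length, dim = x.shape
    x = x[:, : (length // rate) * rate, :]  # drop any remainder characters
    return x.reshape(batch, length // rate, rate, dim).mean(axis=2)


x = np.zeros((4, 512, 64))      # batch x char_length x dim
y = downsample_chars(x, rate=2)
print(y.shape)                   # (4, 256, 64)
```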
Step 3: Create a gin config (similar to the one provided in `configs/cf_v2_d3_dv_base.gin`), which you may use in place of any other gin config in the T5 codebase.
### Reference

If you use our work, or find it helpful in some form, please consider citing our paper:
```
@misc{tay2021charformer,
      title={Charformer: Fast Character Transformers via Gradient-based Subword Tokenization},
      author={Yi Tay and Vinh Q. Tran and Sebastian Ruder and Jai Gupta and Hyung Won Chung and Dara Bahri and Zhen Qin and Simon Baumgartner and Cong Yu and Donald Metzler},
      year={2021},
      eprint={2106.12672},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
One of the committed example gin configs (`downsample_query = 2.0`):

```
# -*-Python-*-
# Example Gin Config
import charformer.lib.charformer_layers
include 'models/bi_v1.gin'

GradientSubwordLayerV2.key_value_size = %d_kv
GradientSubwordLayerV2.num_heads = %num_heads
GradientSubwordLayerV2.dropout_rate = %dropout_rate
GradientSubwordLayerV2.downsample_query = 2.0
GradientSubwordLayerV2.radius = 8
GradientSubwordLayerV2.low_rank_features = 32
GradientSubwordLayerV2.project_kv = False
GradientSubwordLayerV2.use_ffn = False
GradientSubwordLayerV2.local_gate = False
GradientSubwordLayerV2.num_memory_slots = 0
GradientSubwordLayerV2.local_attention = False
GradientSubwordLayerV2.consider_chars_as_blocks = True
GradientSubwordLayerV2.conv_type = "conv1d"

encoder/Unitransformer.gradient_subwords = True

make_layer_stack.layer_stack_cls = @charformer_layers.CharformerLayerStack

encoder/Unitransformer.gradient_subword_layer = @charformer_layers.GradientSubwordLayerV2

mesh_train_dataset_fn.pack = False
```
A second committed gin config, identical except for `downsample_query = 3.0`:

```
# -*-Python-*-
# Example Gin Config
import charformer.lib.charformer_layers
include 'models/bi_v1.gin'

GradientSubwordLayerV2.key_value_size = %d_kv
GradientSubwordLayerV2.num_heads = %num_heads
GradientSubwordLayerV2.dropout_rate = %dropout_rate
GradientSubwordLayerV2.downsample_query = 3.0
GradientSubwordLayerV2.radius = 8
GradientSubwordLayerV2.low_rank_features = 32
GradientSubwordLayerV2.project_kv = False
GradientSubwordLayerV2.use_ffn = False
GradientSubwordLayerV2.local_gate = False
GradientSubwordLayerV2.num_memory_slots = 0
GradientSubwordLayerV2.local_attention = False
GradientSubwordLayerV2.consider_chars_as_blocks = True
GradientSubwordLayerV2.conv_type = "conv1d"

encoder/Unitransformer.gradient_subwords = True

make_layer_stack.layer_stack_cls = @charformer_layers.CharformerLayerStack

encoder/Unitransformer.gradient_subword_layer = @charformer_layers.GradientSubwordLayerV2

mesh_train_dataset_fn.pack = False
```
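For intuition about what these configs parameterize, here is an illustrative NumPy sketch of the gradient-based subword tokenization (GBST) idea: pool candidate character blocks of several sizes, then mix the block representations with softmax weights so the "tokenization" stays differentiable. This is a simplification, not the layer implemented in `charformer_layers`; the learned block scorer is replaced by a fixed random projection, and the block sizes are illustrative.

```python
import numpy as np


def gbst_sketch(x, block_sizes=(1, 2, 4), rng=None):
    """x: [length, dim] char embeddings -> [length, dim] soft-subword mixture."""
    if rng is None:
        rng = np.random.default_rng(0)
    length, dim = x.shape
    scorer = rng.standard_normal(dim)  # stand-in for a learned block scorer
    reps, scores = [], []
    for b in block_sizes:
        # Mean-pool each size-b block, then broadcast the block representation
        # back to every character position it covers.
        padded_len = -(-length // b) * b  # round length up to a multiple of b
        padded = np.zeros((padded_len, dim))
        padded[:length] = x
        pooled = padded.reshape(-1, b, dim).mean(axis=1)   # [length/b, dim]
        upsampled = np.repeat(pooled, b, axis=0)[:length]  # [length, dim]
        reps.append(upsampled)
        scores.append(upsampled @ scorer)                  # [length]
    scores = np.stack(scores, axis=-1)                     # [length, n_blocks]
    scores -= scores.max(axis=-1, keepdims=True)           # softmax stability
    weights = np.exp(scores) / np.exp(scores).sum(-1, keepdims=True)
    reps = np.stack(reps, axis=-1)                         # [length, dim, n_blocks]
    # Per-position soft mixture over block sizes: differentiable w.r.t. scores.
    return (reps * weights[:, None, :]).sum(-1)            # [length, dim]


chars = np.random.default_rng(1).standard_normal((10, 8))
out = gbst_sketch(chars)
print(out.shape)  # (10, 8)
```

Because the block choice is a softmax mixture rather than a hard segmentation, gradients flow through the scorer, which is what lets the model learn its own subword blocks end to end.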