Open source release for Charformer.
PiperOrigin-RevId: 382158961
Yi Tay authored and copybara-github committed Jun 29, 2021
1 parent 041c564 commit c61715c
Showing 5 changed files with 836 additions and 0 deletions.
56 changes: 56 additions & 0 deletions charformer/README.md
@@ -0,0 +1,56 @@
# Charformer

This repository contains the Mesh TensorFlow implementation of Charformer:
Fast Character Transformers via Gradient-based Subword Tokenization.

This implementation works with the [T5 codebase](https://github.com/google-research/text-to-text-transfer-transformer).

# Usage

Currently, this codebase contains the modules/layers that can be plugged into the T5 codebase. We are working on a JAX/FLAX implementation, which will be made available in this repository later. For now, the Mesh TensorFlow implementation serves as a reference implementation.


To use the provided Charformer layers, one needs to modify `transformer.py` in
https://github.com/tensorflow/mesh. The file to modify lives at
https://github.com/tensorflow/mesh/blob/master/mesh_tensorflow/transformer/transformer.py.

### Integration Steps

Step 1: Add the following lines to the `__init__` function of the `Unitransformer` class.

```
if self.gradient_subwords:
  tf.logging.info("Using gradient subwords..")
  self.grad_layer = [gradient_subword_layer()] * self.num_gsw_layers
```
Also add the new arguments `gradient_subwords` and `gradient_subword_layer` (plus `num_gsw_layers`, which the snippet references) to the class, as sketched below.
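
For concreteness, a minimal sketch of how these arguments could be threaded through the constructor. The signature, the defaults, and the `num_gsw_layers` handling here are assumptions for illustration, not the exact Mesh-TF code:

```
import tensorflow.compat.v1 as tf


class Unitransformer(object):
  """Sketch only: stands in for mesh_tensorflow's Unitransformer."""

  def __init__(self,
               # ... existing Mesh-TF constructor arguments elided ...
               gradient_subwords=False,
               gradient_subword_layer=None,
               num_gsw_layers=1,
               **kwargs):
    # ... existing Mesh-TF initialization elided ...
    self.gradient_subwords = gradient_subwords
    self.num_gsw_layers = num_gsw_layers
    if self.gradient_subwords:
      tf.logging.info("Using gradient subwords..")
      # gradient_subword_layer is the (gin-bound) layer class; note the
      # list repeats one instance num_gsw_layers times, as in the snippet.
      self.grad_layer = [gradient_subword_layer()] * self.num_gsw_layers
```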

Step 2: Right after the positional embeddings, add

```
if self.gradient_subwords and self.grad_layer:
  tf.logging.info("Using Charformer before computing layer stack.")
  # tensor should be [batch x char_length x dim]
  for grad_layer in self.grad_layer:
    x, context = grad_layer.call(context, x)
```
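
For orientation, a hedged sketch of where this block sits in the forward pass; the surrounding lines paraphrase Mesh-TF's `transformer.py` and may differ across versions:

```
# ... token and positional embeddings have already produced x,
# an mtf.Tensor of shape [batch, char_length, dim] ...

if self.gradient_subwords and self.grad_layer:
  tf.logging.info("Using Charformer before computing layer stack.")
  # Each Charformer layer downsamples the character sequence into
  # latent subword blocks, shortening x and updating the context.
  for grad_layer in self.grad_layer:
    x, context = grad_layer.call(context, x)

# The (now shorter) sequence proceeds through the usual layer stack.
x = self.layer_stack.call(context, x)
```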
Step 3: Create a gin config (similar to the one provided in `configs/cf_v2_d3_cv_base.gin`), which you may use in place of any other gin config in the T5 codebase.
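
As a sketch of how such a config can be consumed outside the usual T5 launch flow (this assumes the standard `gin-config` package; the search path is a placeholder that depends on where your T5 checkout keeps `models/bi_v1.gin`):

```
import gin

# Make the include'd T5 gin files (e.g. models/bi_v1.gin) resolvable.
# This path is illustrative, not a real location.
gin.add_config_file_search_path("/path/to/text-to-text-transfer-transformer/t5/models/gin")

# Parsing registers all the bindings; the config's own
# `import charformer.lib.charformer_layers` line makes the
# GradientSubwordLayerV2 configurable available.
gin.parse_config_file("charformer/configs/cf_v2_d3_cv_base.gin")
```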

### Reference

If you use our work or find it helpful, please consider citing our paper:

```
@misc{tay2021charformer,
  title={Charformer: Fast Character Transformers via Gradient-based Subword Tokenization},
  author={Yi Tay and Vinh Q. Tran and Sebastian Ruder and Jai Gupta and Hyung Won Chung and Dara Bahri and Zhen Qin and Simon Baumgartner and Cong Yu and Donald Metzler},
  year={2021},
  eprint={2106.12672},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```



26 changes: 26 additions & 0 deletions charformer/configs/cf_v2_d2_cv_base.gin
@@ -0,0 +1,26 @@
# -*-Python-*-
# Example Gin Config
import charformer.lib.charformer_layers
include 'models/bi_v1.gin'

GradientSubwordLayerV2.key_value_size = %d_kv
GradientSubwordLayerV2.num_heads = %num_heads
GradientSubwordLayerV2.dropout_rate = %dropout_rate
GradientSubwordLayerV2.downsample_query = 2.0
GradientSubwordLayerV2.radius = 8
GradientSubwordLayerV2.low_rank_features = 32
GradientSubwordLayerV2.project_kv = False
GradientSubwordLayerV2.use_ffn = False
GradientSubwordLayerV2.local_gate = False
GradientSubwordLayerV2.num_memory_slots = 0
GradientSubwordLayerV2.local_attention = False
GradientSubwordLayerV2.consider_chars_as_blocks = True
GradientSubwordLayerV2.conv_type = "conv1d"

encoder/Unitransformer.gradient_subwords = True

make_layer_stack.layer_stack_cls = @charformer_layers.CharformerLayerStack

encoder/Unitransformer.gradient_subword_layer = @charformer_layers.GradientSubwordLayerV2

mesh_train_dataset_fn.pack = False
26 changes: 26 additions & 0 deletions charformer/configs/cf_v2_d3_cv_base.gin
@@ -0,0 +1,26 @@
# -*-Python-*-
# Example Gin Config
import charformer.lib.charformer_layers
include 'models/bi_v1.gin'

GradientSubwordLayerV2.key_value_size = %d_kv
GradientSubwordLayerV2.num_heads = %num_heads
GradientSubwordLayerV2.dropout_rate = %dropout_rate
GradientSubwordLayerV2.downsample_query = 3.0
GradientSubwordLayerV2.radius = 8
GradientSubwordLayerV2.low_rank_features = 32
GradientSubwordLayerV2.project_kv = False
GradientSubwordLayerV2.use_ffn = False
GradientSubwordLayerV2.local_gate = False
GradientSubwordLayerV2.num_memory_slots = 0
GradientSubwordLayerV2.local_attention = False
GradientSubwordLayerV2.consider_chars_as_blocks = True
GradientSubwordLayerV2.conv_type = "conv1d"

encoder/Unitransformer.gradient_subwords = True

make_layer_stack.layer_stack_cls = @charformer_layers.CharformerLayerStack

encoder/Unitransformer.gradient_subword_layer = @charformer_layers.GradientSubwordLayerV2

mesh_train_dataset_fn.pack = False
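
The two example configs differ only in `downsample_query` (2.0 in `cf_v2_d2_cv_base.gin` above vs. 3.0 here). Reading that value as the sequence-length reduction rate from the Charformer paper (an assumption about the layer's semantics, not something the config states), the trade-off is roughly:

```
# Hedged back-of-envelope: sequence length entering the Transformer
# stack, assuming downsample_query is the length-reduction factor.
char_length = 1024  # illustrative character-level input length

for rate, config in [(2.0, "cf_v2_d2_cv_base.gin"),
                     (3.0, "cf_v2_d3_cv_base.gin")]:
  print("%s -> ~%d latent subword positions" % (config, int(char_length / rate)))
```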
