# Charformer

This repository contains the Mesh-TensorFlow implementation of Charformer: Fast Character Transformers via Gradient-based Subword Tokenization.

This implementation works with the [T5 codebase](https://github.com/google-research/text-to-text-transfer-transformer).
# Usage

Currently this codebase contains the modules/layers that can be plugged into the T5 codebase. We are working on a JAX/FLAX implementation that will later be made available in this repository. For now, the Mesh-TF implementation serves as a reference implementation.

To use the provided Charformer layers, you will need to modify `transformer.py` in https://github.com/tensorflow/mesh. The injection point for the Charformer layers is `https://github.com/tensorflow/mesh/blob/master/mesh_tensorflow/transformer/transformer.py`.
### Integration Steps

Step 1: Add the following lines to the `__init__` function of the `Unitransformer` class, along with the new constructor arguments `gradient_subwords` and `gradient_subword_layer`:

```
if self.gradient_subwords:
  tf.logging.info("Using gradient subwords..")
  self.grad_layer = [gradient_subword_layer()] * self.num_gsw_layers
```
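To see how these pieces fit together, here is a self-contained sketch of the Step 1 wiring. The `Unitransformer` below is a minimal stand-in, not the real `mesh_tensorflow` class, and `DummyGradientSubwordLayer` is a hypothetical placeholder for `charformer_layers.GradientSubwordLayerV2`; `num_gsw_layers` follows the snippet above.

```python
class Unitransformer:
    """Minimal stand-in showing how the new Step 1 arguments are threaded through."""

    def __init__(self, gradient_subwords=False, gradient_subword_layer=None,
                 num_gsw_layers=1):
        self.gradient_subwords = gradient_subwords
        self.num_gsw_layers = num_gsw_layers
        self.grad_layer = None
        if self.gradient_subwords:
            # As in the snippet above; note that list multiplication repeats
            # references to a single layer instance rather than constructing
            # num_gsw_layers independent layers.
            self.grad_layer = [gradient_subword_layer()] * self.num_gsw_layers


class DummyGradientSubwordLayer:
    """Hypothetical placeholder for charformer_layers.GradientSubwordLayerV2."""

    def call(self, context, x):
        # A real layer would downsample x along the character axis here.
        return x, context


model = Unitransformer(gradient_subwords=True,
                       gradient_subword_layer=DummyGradientSubwordLayer,
                       num_gsw_layers=2)
print(len(model.grad_layer))  # 2
```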
Step 2: Right after the positional embeddings are added, insert:

```
if self.gradient_subwords and self.grad_layer:
  tf.logging.info("Using Charformer before computing layer stack.")
  # tensor should be [batch x char_length x dim]
  for grad_layer in self.grad_layer:
    x, context = grad_layer.call(context, x)
```
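Shape-wise, each pass through a Charformer layer shortens the character axis. The following NumPy sketch illustrates only that tensor contract: mean-pooling stands in for the learned GBST downsampling, with the pooling rate playing the role of the `downsample_query` setting in the gin configs.

```python
import numpy as np


def downsample_chars(x, rate=2):
    """Mean-pool [batch, char_length, dim] down to [batch, char_length // rate, dim].

    Stand-in for the learned downsampling a Charformer layer performs.
    """
    batch, length, dim = x.shape
    x = x[:, : (length // rate) * rate, :]  # drop any remainder characters
    return x.reshape(batch, length // rate, rate, dim).mean(axis=2)


x = np.zeros((4, 512, 64))      # batch x char_length x dim
y = downsample_chars(x, rate=2)
print(y.shape)                   # (4, 256, 64)
```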
Step 3: Create a gin config (similar to the one provided in `configs/cf_v2_d3_dv_base.gin`), which you may use in place of any other gin config in the T5 codebase.
### Reference

If you use our work, or find it helpful in some form, please consider citing our paper:
```
@misc{tay2021charformer,
      title={Charformer: Fast Character Transformers via Gradient-based Subword Tokenization},
      author={Yi Tay and Vinh Q. Tran and Sebastian Ruder and Jai Gupta and Hyung Won Chung and Dara Bahri and Zhen Qin and Simon Baumgartner and Cong Yu and Donald Metzler},
      year={2021},
      eprint={2106.12672},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
One of the committed example gin configs (`downsample_query = 2.0`):

```
# -*-Python-*-
# Example Gin Config
import charformer.lib.charformer_layers
include 'models/bi_v1.gin'

GradientSubwordLayerV2.key_value_size = %d_kv
GradientSubwordLayerV2.num_heads = %num_heads
GradientSubwordLayerV2.dropout_rate = %dropout_rate
GradientSubwordLayerV2.downsample_query = 2.0
GradientSubwordLayerV2.radius = 8
GradientSubwordLayerV2.low_rank_features = 32
GradientSubwordLayerV2.project_kv = False
GradientSubwordLayerV2.use_ffn = False
GradientSubwordLayerV2.local_gate = False
GradientSubwordLayerV2.num_memory_slots = 0
GradientSubwordLayerV2.local_attention = False
GradientSubwordLayerV2.consider_chars_as_blocks = True
GradientSubwordLayerV2.conv_type = "conv1d"

encoder/Unitransformer.gradient_subwords = True

make_layer_stack.layer_stack_cls = @charformer_layers.CharformerLayerStack

encoder/Unitransformer.gradient_subword_layer = @charformer_layers.GradientSubwordLayerV2

mesh_train_dataset_fn.pack = False
```
A second committed gin config, identical except for `downsample_query = 3.0`:

```
# -*-Python-*-
# Example Gin Config
import charformer.lib.charformer_layers
include 'models/bi_v1.gin'

GradientSubwordLayerV2.key_value_size = %d_kv
GradientSubwordLayerV2.num_heads = %num_heads
GradientSubwordLayerV2.dropout_rate = %dropout_rate
GradientSubwordLayerV2.downsample_query = 3.0
GradientSubwordLayerV2.radius = 8
GradientSubwordLayerV2.low_rank_features = 32
GradientSubwordLayerV2.project_kv = False
GradientSubwordLayerV2.use_ffn = False
GradientSubwordLayerV2.local_gate = False
GradientSubwordLayerV2.num_memory_slots = 0
GradientSubwordLayerV2.local_attention = False
GradientSubwordLayerV2.consider_chars_as_blocks = True
GradientSubwordLayerV2.conv_type = "conv1d"

encoder/Unitransformer.gradient_subwords = True

make_layer_stack.layer_stack_cls = @charformer_layers.CharformerLayerStack

encoder/Unitransformer.gradient_subword_layer = @charformer_layers.GradientSubwordLayerV2

mesh_train_dataset_fn.pack = False
```
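For intuition about what these configs parameterize, here is an illustrative NumPy sketch of the gradient-based subword tokenization (GBST) idea: pool candidate character blocks of several sizes, then mix the block representations with softmax weights so the "tokenization" stays differentiable. This is a simplification, not the layer implemented in `charformer_layers`; the learned block scorer is replaced by a fixed random projection, and the block sizes are illustrative.

```python
import numpy as np


def gbst_sketch(x, block_sizes=(1, 2, 4), rng=None):
    """x: [length, dim] char embeddings -> [length, dim] soft-subword mixture."""
    if rng is None:
        rng = np.random.default_rng(0)
    length, dim = x.shape
    scorer = rng.standard_normal(dim)  # stand-in for a learned block scorer
    reps, scores = [], []
    for b in block_sizes:
        # Mean-pool each size-b block, then broadcast the block representation
        # back to every character position it covers.
        padded_len = -(-length // b) * b  # round length up to a multiple of b
        padded = np.zeros((padded_len, dim))
        padded[:length] = x
        pooled = padded.reshape(-1, b, dim).mean(axis=1)   # [length/b, dim]
        upsampled = np.repeat(pooled, b, axis=0)[:length]  # [length, dim]
        reps.append(upsampled)
        scores.append(upsampled @ scorer)                  # [length]
    scores = np.stack(scores, axis=-1)                     # [length, n_blocks]
    scores -= scores.max(axis=-1, keepdims=True)           # softmax stability
    weights = np.exp(scores) / np.exp(scores).sum(-1, keepdims=True)
    reps = np.stack(reps, axis=-1)                         # [length, dim, n_blocks]
    # Per-position soft mixture over block sizes: differentiable w.r.t. scores.
    return (reps * weights[:, None, :]).sum(-1)            # [length, dim]


chars = np.random.default_rng(1).standard_normal((10, 8))
out = gbst_sketch(chars)
print(out.shape)  # (10, 8)
```

Because the block choice is a softmax mixture rather than a hard segmentation, gradients flow through the scorer, which is what lets the model learn its own subword blocks end to end.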