This repository contains an implementation of ALBERT in Julia.
A simple implementation of ALBERT (A Lite BERT for Self-Supervised Learning of Language Representations), built on top of Transformers.jl.
- SOP (sentence-order prediction) loss: the original BERT creates is-not-next (negative) pairs by randomly sampling the second sentence, whereas ALBERT builds negative examples from the same two consecutive segments with their order swapped (see the sketch after this list).
- Cross-layer parameter sharing: ALBERT shares the attention and FFN (feed-forward network) parameters across layers to reduce the number of parameters (sketched below).
- Factorized embedding parameterization: ALBERT decomposes the V×D embedding matrix into a V×E lookup table and an E×D projection (a parameter-count example follows this list).
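To make the SOP objective concrete, here is a minimal sketch of how positive and negative pairs can be built from consecutive segments. The function and variable names are illustrative, not part of this repo's API:

```julia
# Illustrative SOP pair construction; `segments` holds consecutive,
# already-tokenized segments from a single document.
function sop_pairs(segments::Vector{Vector{String}})
    pairs = Tuple{Vector{String},Vector{String},Int}[]
    for i in 1:length(segments) - 1
        a, b = segments[i], segments[i + 1]
        push!(pairs, (a, b, 1))  # positive: segments in original order
        push!(pairs, (b, a, 0))  # negative: same segments, order swapped
    end
    return pairs
end
```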
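Cross-layer parameter sharing amounts to applying one transformer block repeatedly. The sketch below assumes the `Transformer(size, head, hs, ps)` block constructor from Transformers.jl's Basic module; the repo's actual wrapper lives in src/albert.jl:

```julia
using Transformers.Basic  # provides the Transformer block

# One shared block: hidden size 768, 12 heads of size 64, FFN size 3072.
const shared_layer = Transformer(768, 12, 64, 3072)

# Applying the same block L times: depth grows, the parameter count does not.
function albert_encode(x, L = 12)
    for _ in 1:L
        x = shared_layer(x)  # same weights reused at every depth
    end
    return x
end
```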
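The saving from the factorized embedding is easy to check with ALBERT-base sizes (V = 30000, D = 768, E = 128):

```julia
V, D, E = 30_000, 768, 128
bert_embedding   = V * D          # 23_040_000 parameters in one V×D matrix
albert_embedding = V * E + E * D  #  3_938_304 parameters in V×E plus E×D
```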
Pre-trained TensorFlow checkpoint files released by google-research, converted to the Julia-friendly pre-trained model format (i.e. BSON):
Version-1 of ALBERT models
Version-2 of ALBERT models
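Once converted, a checkpoint can be read back with BSON.jl; the file name below is illustrative:

```julia
using BSON

# Load the converted checkpoint; returns a Dict of weight arrays keyed by name.
weights = BSON.load("albert_base_v2.bson")
```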
src/albert.jl - contains the wrapper for the ALBERT transformer, implemented on top of Transformers.jl
src/alberttokenizer.jl - contains the ALBERT tokenizer, implemented on top of WordTokenizers.jl; it tokenizes text before it is fed into the wordpiece or sentencepiece model (see the sketch below)
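For the pre-tokenization step this file builds on, the default tokenizer from WordTokenizers.jl behaves as follows; the wordpiece/sentencepiece model then splits these tokens into subwords:

```julia
using WordTokenizers

tokens = tokenize("ALBERT is a lite BERT for self-supervised learning.")
# ["ALBERT", "is", "a", "lite", "BERT", "for", "self-supervised", "learning", "."]
```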
src/model.jl - contains the model structure of the original ALBERT model released by google-research
src/sentencepiece.jl - currently contains the wordpiece model (taken directly from Transformers.jl), which is planned to be replaced with a complete sentencepiece model
tfckpt2bsonforalbert.jl - converts a TensorFlow checkpoint file to a raw BSON file (a hypothetical invocation is sketched below)
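A hypothetical invocation of the conversion script could look like the following; the function name and argument are assumptions modelled on Transformers.jl's tfckpt2bson, not a documented API of this repo:

```julia
include("tfckpt2bsonforalbert.jl")

# Hypothetical call (names are illustrative): convert a downloaded
# TensorFlow checkpoint archive into a raw BSON file.
tfckpt2bsonforalbert("albert_base_v2.tar.gz")
```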
The code is still under development.
- file to convert a TensorFlow checkpoint to BSON
- space tokenizer (planned to be updated as needed for sentencepiece)
- SentencePiece (code can be found in WordTokenizers.jl)
- wrapper for the ALBERT transformer
- model file containing the structure of ALBERT
Most functions are under development and will be available soon.
A more polished implementation and documentation can be found here.
The demo file contains a tutorial for pretraining.