Fantastic features for UniTok 3.0!
UniDep Cache (from 2.4.3.2)
UniDep can become inefficient when unioning other depots.
The depot cache generates all samples at once.
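
The idea of generating all samples at once can be sketched as follows. This is a minimal illustration and not UniTok's implementation: `LazySamples` and `build_cache` are hypothetical names, and the assumption that an uncached source rebuilds a sample on every access is made only for the sake of the example.

```python
class LazySamples:
    """Hypothetical depot-like source that rebuilds a sample on every access."""

    def __init__(self, columns):
        self.columns = columns  # column name -> list of values
        self.size = len(next(iter(columns.values())))
        self.build_count = 0    # how many times a sample was assembled

    def __len__(self):
        return self.size

    def __getitem__(self, index):
        self.build_count += 1
        return {name: values[index] for name, values in self.columns.items()}


def build_cache(source):
    """Assemble every sample once, up front, and keep them in a list."""
    return [source[i] for i in range(len(source))]


source = LazySamples({"uid": [0, 1, 2], "age": [31, 25, 47]})
cache = build_cache(source)

# Repeated reads now hit the list and no longer rebuild the sample.
first_read = cache[1]
second_read = cache[1]
```

The trade-off in this sketch is memory for speed: every sample is held in the list, so later reads cost a single list lookup.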
UniDep Export (from 3.0.11)
Unioned or filtered depots can be exported easily.
Easier-to-use Vocab
- support `len(vocab)` to get vocab size
- support vocab iterating by `for obj in vocab`
- support `list(vocab)` to get token list
- support `vocab.i2o(index)` to get object by index, and `vocab.o2i(obj)` to get index by object
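
The protocol listed above can be illustrated with a small stand-in class. This is a sketch under stated assumptions, not UniTok's `Vocab`: the class name `SketchVocab`, its constructor, and the `append` method are hypothetical, and only `len`, iteration, `list`, `i2o`, and `o2i` mirror the features named in the list.

```python
class SketchVocab:
    """Hypothetical stand-in that mimics the listed Vocab features."""

    def __init__(self, name):
        self.name = name
        self._objects = []  # index -> object
        self._indices = {}  # object -> index

    def append(self, obj):
        """Add obj if it is new, and return its index (hypothetical helper)."""
        if obj not in self._indices:
            self._indices[obj] = len(self._objects)
            self._objects.append(obj)
        return self._indices[obj]

    def __len__(self):
        # Enables len(vocab)
        return len(self._objects)

    def __iter__(self):
        # Enables `for obj in vocab` and list(vocab)
        return iter(self._objects)

    def i2o(self, index):
        """Look up an object by its index."""
        return self._objects[index]

    def o2i(self, obj):
        """Look up an index by its object."""
        return self._indices[obj]


vocab = SketchVocab("color")
for token in ["red", "green", "red", "blue"]:
    vocab.append(token)

size = len(vocab)     # vocab size
tokens = list(vocab)  # token list, in insertion order
```

Defining `__len__` and `__iter__` is what makes the built-ins `len`, `for`, and `list` work on the object, so no extra accessor methods are needed for those three features.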
Two New Tokenizers
- NumberTok
- SeqTok
Compatible Meta
- support `print(depot)` to get detailed description of depot
- support meta upgrading