Although you can train an aitextgen model on TPUs by setting `n_tpu_cores=8` in an appropriate runtime, and the training loss does decrease, there are several blocking problems:

- The model stored in `aitextgen` does not update, even after training.
- Saving the model via `save_pretrained()` hangs, even with `xm.rendezvous()`.
- Memory leaks on the host system (especially with a large batch size).
- `fp16` doesn't work at all, and there's no training-loss decrease.
Will gladly take any suggestions/PRs to help resolve these!