Transformers

Preprocessing and Tokenization

The input text should contain one sentence per line, with one blank line between documents. The sentences are split because part of the training involves a next sentence objective, in which the model must predict whether or not two sequences of text are contiguous within the same document.
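
The input format described above can be sketched in a few lines of Python. This is a minimal illustration, not the actual preprocessing code: the function names read_documents and make_sentence_pairs are hypothetical, and the 50/50 split between contiguous and random pairs is an assumption made for the example. It parses text with one sentence per line and blank lines between documents, then builds labeled sentence pairs of the kind a next sentence objective needs.

```python
import random


def read_documents(text):
    """Split corpus text into documents, each a list of sentences.

    Expects one sentence per line and a blank line between documents.
    """
    documents, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if line:
            current.append(line)
        elif current:
            # A blank line closes the current document.
            documents.append(current)
            current = []
    if current:
        documents.append(current)
    return documents


def make_sentence_pairs(documents, rng):
    """Build (first, second, is_next) triples for a next sentence objective.

    Assumption for this sketch: about half of the pairs keep the true next
    sentence, and the rest take a random sentence from another document.
    """
    pairs = []
    for doc_index, doc in enumerate(documents):
        others = [d for j, d in enumerate(documents) if j != doc_index]
        for i in range(len(doc) - 1):
            if others and rng.random() < 0.5:
                # Negative example: second sentence comes from a different document.
                second = rng.choice(rng.choice(others))
                pairs.append((doc[i], second, False))
            else:
                # Positive example: second sentence directly follows the first.
                pairs.append((doc[i], doc[i + 1], True))
    return pairs


sample_corpus = "A one.\nA two.\nA three.\n\nB one.\nB two.\n"
docs = read_documents(sample_corpus)
pairs = make_sentence_pairs(docs, random.Random(0))
for first, second, is_next in pairs:
    print(first, "|", second, "|", is_next)
```

Without the blank lines there would be no way to tell where one document ends and the next begins, so a pair could not be labeled as coming from the same document or not.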