FemtoGPT: Minimal Generative Pretrained Transformer

FemtoGPT

Minimal Generative Pretrained Transformer

FemtoGPT is a minimal Generative Pretrained Transformer written entirely in Rust.

It does not use any external libraries for tensor operations or model training/inference. It follows the same architecture as nanoGPT, which was explained by Andrej Karpathy in his video lecture.

FemtoGPT is a useful resource for anyone who wants to learn more about how large language models work at a low level.

How It Works

FemtoGPT is a minimalistic implementation of a GPT model that relies on only a few libraries: rand/rand-distr for random number generation, serde/bincode for data serialization, and rayon for parallel computing.

However, FemtoGPT is not very efficient, because it uses naive algorithms for basic operations such as matrix multiplication.
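To make "naive" concrete, here is a minimal sketch of the textbook triple-loop matrix multiplication. It is an illustration only, not FemtoGPT's actual code: the function name `matmul`, the flat row-major layout, and the `f32` element type are all assumptions.

```rust
/// Multiplies an (m x k) matrix `a` by a (k x n) matrix `b`.
/// Both inputs are flat, row-major slices; the result is (m x n), row-major.
/// Hypothetical helper for illustration, not taken from FemtoGPT.
fn matmul(a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
    assert_eq!(a.len(), m * k, "a must hold m * k elements");
    assert_eq!(b.len(), k * n, "b must hold k * n elements");
    let mut out = vec![0.0f32; m * n];
    for i in 0..m {
        for j in 0..n {
            // Dot product of row i of `a` with column j of `b`.
            let mut acc = 0.0f32;
            for p in 0..k {
                acc += a[i * k + p] * b[p * n + j];
            }
            out[i * n + j] = acc;
        }
    }
    out
}
```

For example, `matmul(&[1.0, 2.0, 3.0, 4.0], &[5.0, 6.0, 7.0, 8.0], 2, 2, 2)` returns `[19.0, 22.0, 43.0, 50.0]`. The loop performs m * n * k multiply-adds with no blocking or vectorization, which is why this approach is simple to read but slow on large matrices.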

The gradients are verified using a gradient-checking technique, but the layer implementations may still contain errors.
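Gradient checking usually means comparing a hand-written (analytic) gradient against a numerical estimate. The sketch below shows one common variant based on central differences. The article does not say which variant FemtoGPT uses, so the function names, the step size `eps`, and the choice of `f64` are assumptions.

```rust
/// Estimates the gradient of `f` at `x` with central differences:
/// df/dx_i ~ (f(x + eps * e_i) - f(x - eps * e_i)) / (2 * eps).
/// Hypothetical helper for illustration, not taken from FemtoGPT.
fn numerical_gradient<F: Fn(&[f64]) -> f64>(f: F, x: &[f64], eps: f64) -> Vec<f64> {
    let mut probe = x.to_vec();
    let mut grad = Vec::with_capacity(x.len());
    for i in 0..x.len() {
        let original = probe[i];
        probe[i] = original + eps;
        let plus = f(&probe);
        probe[i] = original - eps;
        let minus = f(&probe);
        probe[i] = original; // restore the coordinate before moving on
        grad.push((plus - minus) / (2.0 * eps));
    }
    grad
}

/// Largest absolute difference between an analytic and a numerical gradient.
fn max_abs_diff(analytic: &[f64], numerical: &[f64]) -> f64 {
    analytic
        .iter()
        .zip(numerical)
        .map(|(a, n)| (a - n).abs())
        .fold(0.0, f64::max)
}
```

As a usage example, take f(x) = x0^2 + 3 * x0 * x1, whose analytic gradient is [2 * x0 + 3 * x1, 3 * x0]. At x = [1.5, -2.0] the analytic gradient is [-3.0, 4.5], and `max_abs_diff` against `numerical_gradient(f, &[1.5, -2.0], 1e-5)` should be very small. A large difference suggests a bug in the analytic gradient, although a passing check on a few inputs does not prove a layer is correct.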

FemtoGPT Usage

To train your own GPT model, create a file named dataset.txt and fill it with the text you want to train on. For best results, the text should contain only a small number of distinct characters.

Then run this command:

cargo run --release

The model will begin training and store its training data in the train_data directory. You can stop training at any time and resume it later.
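Because the advice above is to keep the number of distinct characters small, it can help to inspect dataset.txt before training. The following is a minimal sketch using only the standard library. Both helper names are hypothetical, and it assumes the text is tokenized per character, which this article implies but does not state.

```rust
use std::collections::BTreeSet;

/// Returns the sorted set of distinct characters in `text`.
/// Hypothetical helper for illustration, not part of FemtoGPT.
fn distinct_chars(text: &str) -> BTreeSet<char> {
    text.chars().collect()
}

/// Reads the file at `path` (for example "dataset.txt") and returns
/// the set of distinct characters it contains.
fn dataset_charset(path: &str) -> std::io::Result<BTreeSet<char>> {
    let text = std::fs::read_to_string(path)?;
    Ok(distinct_chars(&text))
}
```

Calling `dataset_charset("dataset.txt")` and looking at the size of the returned set gives a rough idea of how varied the input is. If the set contains many rare symbols, removing or normalizing them before training may be worthwhile.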

Output samples

This is the output of a 300k-parameter model after hours of training on the Shakespeare dataset:

LIS:
Tore hend shater sorerds tougeng an herdofed seng he borind,
Ound ourere sthe, a sou so tousthe ashtherd, m se a man stousshan here hat mend serthe fo witownderstesther s ars at atheno sel theas,
thisth t are sorind bour win soutinds mater horengher

This is not as good as expected, but on the positive side, the model appears to have generated words that are easy to pronounce.

The team is currently working on training a 10M-parameter model and verifying the correctness of the code.

UPDATE:

This is the result of further training on a comparable model for several hours:

What like but wore pad wo me che nogns yous dares,
As supt it nind bupart 'the reed:
And hils not es

The model demonstrates some knowledge of vocabulary and syntax!



Posted by Belmechri
