FemtoGPT: Minimal Generative Pretrained Transformer


FemtoGPT is a minimal Generative Pretrained Transformer written entirely in Rust.

It does not use any external libraries for tensor operations or model training/inference. It follows the same architecture as nanoGPT, which was explained by Andrej Karpathy in his video lecture.

FemtoGPT is a useful resource for anyone who wants to learn more about how large language models work at a low level.

How It Works

FemtoGPT is a minimalistic implementation of a GPT model that relies on only a few libraries: rand/rand-distr for random number generation, serde/bincode for data serialization, and rayon for parallel computing.
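
To make these roles concrete, here is a minimal, self-contained sketch of how the three crates might be used together. It is not femtoGPT's actual code: the Tensor struct, the initialization values, and the assumed crate setup (serde's derive feature, bincode 1.x) are all illustrative.

use rand::prelude::*;
use rand_distr::Normal;
use rayon::prelude::*;
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
struct Tensor {
    rows: usize,
    cols: usize,
    data: Vec<f32>,
}

fn main() {
    // rand/rand-distr: sample initial weights from a normal distribution
    let normal = Normal::new(0.0f32, 0.02).unwrap();
    let mut rng = rand::thread_rng();
    let t = Tensor {
        rows: 4,
        cols: 4,
        data: (0..16).map(|_| normal.sample(&mut rng)).collect(),
    };

    // rayon: apply a ReLU activation across the buffer in parallel
    let activated: Vec<f32> = t.data.par_iter().map(|x| x.max(0.0)).collect();

    // serde/bincode: serialize the tensor, e.g. for checkpointing
    let bytes = bincode::serialize(&t).unwrap();
    println!("{} values, {} serialized bytes", activated.len(), bytes.len());
}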

However, femtoGPT is not very efficient, since it uses naive algorithms for basic operations such as matrix multiplication.
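
For a sense of what "naive" means here, the sketch below shows a textbook O(n³) triple-loop matrix multiplication over flat row-major buffers. It is illustrative only; femtoGPT's actual tensor code may be organized differently.

// Naive O(n^3) matrix multiplication: a is n x m, b is m x p, result is n x p.
fn matmul(a: &[f32], b: &[f32], n: usize, m: usize, p: usize) -> Vec<f32> {
    let mut c = vec![0.0f32; n * p];
    for i in 0..n {
        for k in 0..m {
            let aik = a[i * m + k];
            for j in 0..p {
                c[i * p + j] += aik * b[k * p + j];
            }
        }
    }
    c
}

fn main() {
    // Multiplying by the 2x2 identity should return the input unchanged.
    let a = vec![1.0, 2.0, 3.0, 4.0];
    let id = vec![1.0, 0.0, 0.0, 1.0];
    assert_eq!(matmul(&a, &id, 2, 2, 2), a);
}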

The gradients are verified with a gradient-checking technique, but there may still be errors in some of the layer implementations.
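
Gradient checking compares each analytically computed gradient against a central finite-difference estimate, (f(x + h) − f(x − h)) / 2h. The helper below is a generic sketch of the idea, not femtoGPT's implementation; the function names, step size, and test function are assumptions.

// Returns the largest absolute difference between the analytic gradient
// and a central finite-difference estimate of the gradient of f at x.
fn gradient_check<F: Fn(&[f32]) -> f32>(f: F, x: &mut [f32], analytic: &[f32], h: f32) -> f32 {
    let mut max_err = 0.0f32;
    for i in 0..x.len() {
        let orig = x[i];
        x[i] = orig + h;
        let fp = f(x);
        x[i] = orig - h;
        let fm = f(x);
        x[i] = orig; // restore the parameter
        let numeric = (fp - fm) / (2.0 * h);
        max_err = max_err.max((numeric - analytic[i]).abs());
    }
    max_err
}

fn main() {
    // f(x) = sum of squares, so the exact gradient is df/dx_i = 2 * x_i.
    let mut x = vec![1.0f32, -2.0, 3.0];
    let analytic: Vec<f32> = x.iter().map(|v| 2.0 * v).collect();
    let f = |v: &[f32]| v.iter().map(|v| v * v).sum::<f32>();
    let err = gradient_check(f, &mut x, &analytic, 1e-3);
    println!("max gradient error: {err}");
}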

FemtoGPT Usage

To train your GPT model, you need to create a file named dataset.txt and fill it with your desired text. The text should contain a small number of unique characters for best results.
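
If you want to sanity-check a dataset before training, a few lines of Rust can count its unique characters. This helper is written for this post and is not part of femtoGPT:

use std::collections::BTreeSet;
use std::fs;

// Report the character-level vocabulary size of dataset.txt.
fn main() -> std::io::Result<()> {
    let text = fs::read_to_string("dataset.txt")?;
    let vocab: BTreeSet<char> = text.chars().collect();
    println!("{} unique characters in {} total", vocab.len(), text.chars().count());
    Ok(())
}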

Then run this command:

cargo run --release

The model will begin training and store its data in the train_data directory. You can pause the training and resume it at any time!

Output samples

This is the result of training a 300k-parameter model on the Shakespeare dataset for hours:

LIS:
Tore hend shater sorerds tougeng an herdofed seng he borind,
Ound ourere sthe, a sou so tousthe ashtherd, m se a man stousshan here hat mend serthe fo witownderstesther s ars at atheno sel theas,
thisth t are sorind bour win soutinds mater horengher

This is not as good as expected, but on the positive side, the model seems to have generated words that are at least easy to pronounce.

The team's current task is to train a 10M-parameter model and verify the correctness of the code.

UPDATE:

This is the result of training a comparable model for several more hours:

What like but wore pad wo me che nogns yous dares,
As supt it nind bupart 'the reed:
And hils not es

The model demonstrates some knowledge of vocabulary and syntax!

