Falcon-40B

Falcon-40B: The Most Powerful Open-source Model

Falcon-40B is a large language model (LLM) in the Falcon LLM family, with 40 billion parameters trained on 1,000B tokens of web data and curated corpora. It was developed by the Technology Innovation Institute (TII) in Abu Dhabi and open-sourced under the Apache 2.0 license. Falcon-40B features an architecture optimized for inference, with FlashAttention and multiquery attention. It outperforms other open-source LLMs such as LLaMA, StableLM, RedPajama, and MPT.

Falcon-40B Features

Best open-source model

Falcon-40B outperforms other open-source models such as LLaMA, StableLM, RedPajama, and MPT, and ranks among the top models on the Hugging Face Open LLM Leaderboard.

Optimized architecture

Falcon-40B uses FlashAttention and multiquery attention to improve inference speed and efficiency.

Permissive license

Falcon-40B is released under the Apache 2.0 license, which permits commercial use without royalties.

Multilingual capabilities

Falcon-40B supports English, German, Spanish, and French, and has limited capabilities in Italian, Portuguese, Polish, Dutch, Romanian, Czech, and Swedish.

Data quality at scale

Falcon-40B is trained on data from a pipeline that extracts high-quality content from the web using extensive filtering and deduplication.

Get Started with Falcon-40B

from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

# Hugging Face Hub identifier of the pre-trained checkpoint
model = "tiiuae/falcon-40b"

tokenizer = AutoTokenizer.from_pretrained(model)

# Build a text-generation pipeline. The weights are loaded in bfloat16 and
# device_map="auto" places them across the available devices.
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)

# Sample one continuation of the prompt, capped at 200 tokens in total
sequences = pipeline(
    "Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Girafatron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe.\nDaniel: Hello, Girafatron!\nGirafatron:",
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
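
Before running the snippet above, it can help to estimate how much memory the weights alone would need. The following is a rough sketch, not a measured figure: it assumes the nominal 40 billion parameters from the model name, and the helper `weight_memory_gb` is a hypothetical name introduced here. Activations, the key/value cache, and framework overhead are not included.

```python
# Nominal parameter count taken from the model name (assumption: the exact
# count of the released checkpoint may differ slightly).
N_PARAMS = 40e9

# Storage size of one parameter for a few common dtypes.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}


def weight_memory_gb(n_params: float, dtype: str) -> float:
    """Memory needed for the weights only, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype:<9} {weight_memory_gb(N_PARAMS, dtype):6.1f} GB")
```

Under these assumptions the bfloat16 weights used in the example come to about 80 GB, which is why the weights usually have to be spread over several devices.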

Training Details

Falcon-40B was trained on 1,000B tokens drawn from RefinedWeb, a web dataset built with high-quality filtering and deduplication, and from curated corpora added to enhance it. Some of the curated corpora were based on The Pile (Gao et al., 2020).

| Data source | Fraction | Tokens | Sources |
|---|---|---|---|
| RefinedWeb-English | 75% | 750B | massive web crawl |
| RefinedWeb-Europe | 7% | 70B | European massive web crawl |
| Books | 6% | 60B | |
| Conversations | 5% | 50B | Reddit, StackOverflow, HackerNews |
| Code | 5% | 50B | |
| Technical | 2% | 20B | arXiv, PubMed, USPTO, etc. |
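
The token counts in the data-source table follow directly from the fractions and the 1,000B-token total. A small sketch of that arithmetic is given below. The dictionary and function names are hypothetical and only serve the illustration.

```python
# Total training budget stated in the text, in billions of tokens.
TOTAL_TOKENS_B = 1000

# Percentages copied from the data-source table.
DATA_MIX_PERCENT = {
    "RefinedWeb-English": 75,
    "RefinedWeb-Europe": 7,
    "Books": 6,
    "Conversations": 5,
    "Code": 5,
    "Technical": 2,
}


def tokens_per_source(mix_percent: dict, total_b: int) -> dict:
    """Token budget in billions implied by each percentage."""
    return {name: pct * total_b // 100 for name, pct in mix_percent.items()}


budget = tokens_per_source(DATA_MIX_PERCENT, TOTAL_TOKENS_B)
for name, tokens_b in budget.items():
    print(f"{name:<20} {tokens_b:>4}B")
```

The percentages sum to 100 and the per-source budgets sum back to 1,000B, matching the table.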

RefinedWeb-Europe is made of the following languages:

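| Language | Fraction of multilingual data | Tokens |
|---|---|---|
| German | 26% | 18B |
| Spanish | 24% | 17B |
| French | 23% | 16B |
| Italian | 7% | 5B |
| Portuguese | 4% | 3B |
| Polish | 4% | 3B |
| Dutch | 4% | 3B |
| Romanian | 3% | 2B |
| Czech | 3% | 2B |
| Swedish | 2% | 1B |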

The data was tokenized with the Falcon-7B/40B tokenizer.

Training Procedure

Falcon-40B was trained on 384 A100 40GB GPUs using 3D parallelism (TP=8, PP=4, DP=12) combined with ZeRO.
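
The three parallelism degrees multiply to give the GPU count. The sketch below checks that arithmetic. The `ParallelLayout` class is a hypothetical helper written for this illustration and is not part of any training framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelLayout:
    tensor_parallel: int    # TP: GPUs that split each layer's tensors
    pipeline_parallel: int  # PP: pipeline stages that split the layer stack
    data_parallel: int      # DP: model replicas that split the batch

    @property
    def gpus_per_replica(self) -> int:
        """GPUs needed to hold one full copy of the model."""
        return self.tensor_parallel * self.pipeline_parallel

    @property
    def world_size(self) -> int:
        """Total number of GPUs across all replicas."""
        return self.gpus_per_replica * self.data_parallel


layout = ParallelLayout(tensor_parallel=8, pipeline_parallel=4, data_parallel=12)
print(layout.gpus_per_replica)  # 32
print(layout.world_size)        # 384
```

With TP=8 and PP=4, one model replica spans 32 GPUs, and 12 such replicas account for the 384 GPUs.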

Training Hyperparameters

| Hyperparameter | Value | Comment |
|---|---|---|
| Precision | bfloat16 | |
| Optimizer | AdamW | |
| Learning rate | 1.85e-4 | 4B tokens warm-up, cosine decay to 1.85e-5 |
| Weight decay | 1e-1 | |
| Z-loss | 1e-4 | |
| Batch size | 1152 | 100B tokens ramp-up |
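
The learning-rate row describes a warm-up followed by a cosine decay. A minimal sketch of such a schedule is shown below, indexed by tokens seen. Two points are assumptions not stated in the source: the warm-up is taken to be linear from zero, and the decay is taken to end at the 1,000B-token mark. The function name `learning_rate` is hypothetical.

```python
import math

PEAK_LR = 1.85e-4       # peak learning rate from the table
MIN_LR = 1.85e-5        # final learning rate from the table
WARMUP_TOKENS = 4e9     # 4B tokens of warm-up from the table
TOTAL_TOKENS = 1000e9   # assumption: decay finishes at the end of training


def learning_rate(tokens_seen: float) -> float:
    """Learning rate after `tokens_seen` training tokens."""
    if tokens_seen < WARMUP_TOKENS:
        # Assumption: linear warm-up from 0 to the peak value.
        return PEAK_LR * tokens_seen / WARMUP_TOKENS
    # Fraction of the decay phase completed, clamped to 1.
    progress = min(1.0, (tokens_seen - WARMUP_TOKENS) / (TOTAL_TOKENS - WARMUP_TOKENS))
    # Cosine factor goes from 1 (start of decay) to 0 (end of decay).
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + (PEAK_LR - MIN_LR) * cosine


for tokens in (0, 2e9, 4e9, 502e9, 1000e9):
    print(f"{tokens / 1e9:6.0f}B tokens -> lr = {learning_rate(tokens):.3e}")
```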

How long did it take to train Falcon-40B?

Training started in December 2022 and took two months.

Technical Specifications

Falcon-40B is a decoder-only model that learns to generate the next token in a sequence. It is based on the GPT-3 architecture (Brown et al., 2020), with some modifications:

  • Rotary positional embeddings (Su et al., 2021) to encode the relative positions of tokens;
  • Multiquery attention (Shazeer et al., 2019) and FlashAttention (Dao et al., 2022) to efficiently compute attention scores;
  • Parallel attention/MLP decoder blocks with two layer normalization steps to stabilize the training.

For multiquery attention, Falcon-40B uses an internal variant in which each tensor-parallel degree has its own independent key and value.
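
To make the idea of multiquery attention concrete, here is a minimal NumPy sketch in which every query head attends over a single shared key and value. It illustrates the general technique of Shazeer et al. (2019) only. It is not Falcon-40B's internal variant, it does not use FlashAttention, and the shapes and function names are chosen for illustration.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multiquery_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Causal multiquery attention.

    q: (n_heads, seq, head_dim), one set of queries per head
    k: (seq, head_dim), a single key shared by all heads
    v: (seq, head_dim), a single value shared by all heads
    """
    n_heads, seq, head_dim = q.shape
    # The shared key broadcasts over the head axis: (n_heads, seq, seq).
    scores = q @ k.T / np.sqrt(head_dim)
    # Causal mask: position i may not attend to positions j > i.
    future = np.triu(np.ones((seq, seq), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    weights = softmax(scores, axis=-1)
    # The shared value broadcasts the same way: (n_heads, seq, head_dim).
    return weights @ v


rng = np.random.default_rng(0)
q = rng.normal(size=(4, 5, 8))   # 4 query heads, 5 tokens, head_dim 8
k = rng.normal(size=(5, 8))
v = rng.normal(size=(5, 8))
out = multiquery_attention(q, k, v)
print(out.shape)  # (4, 5, 8)
```

Because the key and value are stored once rather than once per head, the cache kept during generation is much smaller than with standard multi-head attention.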

| Hyperparameter | Value | Comment |
|---|---|---|
| Layers | 60 | |
| d_model | 8192 | |
| head_dim | 64 | Reduced to optimise for FlashAttention |
| Vocabulary | 65024 | |
| Sequence length | 2048 | |
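
These values allow a rough estimate of how much the key/value cache shrinks with multiquery attention. The sketch below makes several assumptions that the source does not state outright: the number of query heads is d_model divided by head_dim, cached values are stored in bfloat16 (2 bytes), and the released model keeps 8 key/value heads, one per tensor-parallel degree with TP=8. The function name is hypothetical.

```python
LAYERS = 60
D_MODEL = 8192
HEAD_DIM = 64
BYTES_PER_VALUE = 2  # assumption: cache stored in bfloat16

# Assumption: query heads tile the model dimension exactly.
n_query_heads = D_MODEL // HEAD_DIM


def kv_cache_bytes_per_token(n_kv_heads: int) -> int:
    """Bytes cached per generated token: one key and one value per layer and KV head."""
    return 2 * LAYERS * n_kv_heads * HEAD_DIM * BYTES_PER_VALUE


# Standard multi-head attention: one key/value head per query head.
multihead = kv_cache_bytes_per_token(n_query_heads)
# Assumption: 8 key/value heads, one per tensor-parallel degree (TP=8).
multiquery_tp8 = kv_cache_bytes_per_token(8)

print(n_query_heads)                # 128
print(multihead)                    # 1966080
print(multiquery_tp8)               # 122880
print(multihead // multiquery_tp8)  # 16
```

Under these assumptions the cache per token is 16 times smaller than it would be with one key/value head per query head.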

Limitations

Falcon-40B is a multilingual model that can handle English, German, Spanish, French, and some other languages to a lesser extent. However, it is not suitable for languages outside its training data. Moreover, it may reflect the online prejudices and biases that are present in its large-scale web-based corpus.

FAQ

What is Falcon-40B and what can it do?

Falcon-40B is a 40-billion-parameter causal decoder-only model built by TII and trained on 1,000 billion tokens of RefinedWeb enhanced with curated corpora. It can generate text for tasks such as summarization, open-ended text generation, and chatbots.

How can I use Falcon-40B?

You can use Falcon-40B with the Hugging Face Transformers library. You need to install PyTorch 2.0 and import the AutoTokenizer and AutoModelForCausalLM classes from the transformers module. Then you can load the model and the tokenizer with the name “tiiuae/falcon-40b” and use the pipeline function to generate text with a given prompt.

What are the advantages of Falcon-40B?

At the time of writing, Falcon-40B is the best open-source model available: it outperforms other models such as LLaMA, StableLM, RedPajama, and MPT on the Open LLM Leaderboard. It also features an architecture optimized for inference, with FlashAttention and multiquery attention. It is available under the permissive Apache 2.0 license, which allows commercial use without royalties.

What is the difference between Falcon-40B and Falcon-40B-Instruct?

Falcon-40B is a raw, pre-trained model that can be further finetuned for specific use cases. Falcon-40B-Instruct is a version of Falcon-40B that has been finetuned on a chat dataset and can take generic instructions in a chat format.



Belmechri

I am an IT engineer, content creator, and proud father with a passion for innovation and excellence. In both my personal and professional life, I strive for excellence and am committed to finding innovative solutions to complex problems.