Falcon-40B
The Most Powerful Open-source Model
Falcon-40B is a large language model (LLM) in the Falcon LLM family, with 40 billion parameters trained on 1,000B tokens of web data and curated corpora. It was developed by the Technology Innovation Institute (TII) in Abu Dhabi and open-sourced under the Apache 2.0 license. Falcon-40B features an architecture optimized for inference, with FlashAttention and multiquery attention. According to TII, it outperforms other open-source LLMs such as LLaMA, StableLM, RedPajama, and MPT.
from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

model = "tiiuae/falcon-40b"

# Load the tokenizer, then build a text-generation pipeline in bfloat16.
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)

# Sample one continuation of the prompt with top-k sampling.
sequences = pipeline(
    "Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Giraftron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe.\nDaniel: Hello, Girafatron!\nGirafatron:",
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
Falcon-40B was trained on 1,000B tokens of RefinedWeb, a web dataset with high-quality filtering and deduplication. TII also added curated corpora to enhance the dataset, some of which were based on The Pile (Gao et al., 2020).
Data source | Fraction | Tokens | Sources |
---|---|---|---|
RefinedWeb-English | 75% | 750B | massive web crawl |
RefinedWeb-Europe | 7% | 70B | European massive web crawl |
Books | 6% | 60B | |
Conversations | 5% | 50B | Reddit, StackOverflow, HackerNews |
Code | 5% | 50B | |
Technical | 2% | 20B | arXiv, PubMed, USPTO, etc. |
RefinedWeb-Europe is made of the following languages:
Language | Fraction of multilingual data | Tokens |
---|---|---|
German | 26% | 18B |
Spanish | 24% | 17B |
French | 23% | 16B |
Italian | 7% | 5B |
Portuguese | 4% | 3B |
Polish | 4% | 3B |
Dutch | 4% | 3B |
Romanian | 3% | 2B |
Czech | 3% | 2B |
Swedish | 2% | 1B |
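As a quick consistency check on the two tables above, the following sketch recomputes the token budgets from the listed percentages. The dictionary and function names are illustrative only; the percentages and the 1,000B total are taken from the tables and text.

```python
TOTAL_TOKENS_B = 1000  # 1,000B training tokens, as stated in the text

# Percent of the full corpus per data source (first table).
SOURCES_PCT = {
    "RefinedWeb-English": 75,
    "RefinedWeb-Europe": 7,
    "Books": 6,
    "Conversations": 5,
    "Code": 5,
    "Technical": 2,
}

# Percent of RefinedWeb-Europe per language (second table).
LANGUAGES_PCT = {
    "German": 26, "Spanish": 24, "French": 23, "Italian": 7,
    "Portuguese": 4, "Polish": 4, "Dutch": 4, "Romanian": 3,
    "Czech": 3, "Swedish": 2,
}


def split_budget(total_b, shares_pct):
    """Split a token budget (in billions) according to percentage shares."""
    return {name: total_b * pct / 100 for name, pct in shares_pct.items()}


source_tokens = split_budget(TOTAL_TOKENS_B, SOURCES_PCT)
language_tokens = split_budget(source_tokens["RefinedWeb-Europe"], LANGUAGES_PCT)

for name, tokens_b in language_tokens.items():
    print(f"{name:<12} {tokens_b:5.1f}B  (rounds to {round(tokens_b)}B)")
```

The per-language figures in the table are the rounded values of these products, which is why they are whole numbers of billions.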
Hyperparameter | Value | Comment |
---|---|---|
Precision | bfloat16 | |
Optimizer | AdamW | |
Learning rate | 1.85e-4 | 4B tokens warm-up, cosine decay to 1.85e-5 |
Weight decay | 1e-1 | |
Z-loss | 1e-4 | |
Batch size | 1152 | 100B tokens ramp-up |
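The learning-rate row in the table above can be read as a schedule over tokens seen. The sketch below is one plausible reading, not TII's training code: it assumes the warm-up is linear from zero and that the cosine decay spans the rest of the 1,000B-token run. Only the peak value, the floor, and the 4B-token warm-up come from the table.

```python
import math

PEAK_LR = 1.85e-4       # from the table
MIN_LR = 1.85e-5        # cosine decay floor, from the table
WARMUP_TOKENS = 4e9     # 4B tokens warm-up, from the table
TOTAL_TOKENS = 1000e9   # assumption: decay spans the full 1,000B-token run


def learning_rate(tokens_seen):
    """Learning rate after `tokens_seen` training tokens (illustrative)."""
    if tokens_seen < WARMUP_TOKENS:
        # Assumption: warm-up is linear from 0 up to the peak.
        return PEAK_LR * tokens_seen / WARMUP_TOKENS
    # Cosine decay from PEAK_LR down to MIN_LR over the remaining tokens.
    progress = min(1.0, (tokens_seen - WARMUP_TOKENS) / (TOTAL_TOKENS - WARMUP_TOKENS))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + (PEAK_LR - MIN_LR) * cosine


for tokens in (0, 2e9, 4e9, 500e9, 1000e9):
    print(f"{tokens / 1e9:6.0f}B tokens -> lr = {learning_rate(tokens):.3e}")
```

Expressing the schedule in tokens rather than steps keeps it independent of the batch-size ramp-up, during which the number of tokens per step changes.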
How long did it take to train Falcon-40B?
Training started in December 2022 and took two months.
Falcon-40B is a decoder-only model that learns to generate the next token in a sequence. It is based on the GPT-3 architecture (Brown et al., 2020), with some modifications:
- Rotary positional embeddings (Su et al., 2021) to encode the relative positions of tokens;
- Multiquery attention (Shazeer et al., 2019) and FlashAttention (Dao et al., 2022) to efficiently compute attention scores;
- Parallel attention/MLP decoder blocks with two layer normalization steps to stabilize the training.
For multiquery attention, Falcon-40B uses an internal variant with independent keys and values for each tensor parallel degree.
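To illustrate why rotary positional embeddings encode relative positions, here is a minimal pure-Python sketch of the interleaved formulation from Su et al. (2021). It is not Falcon-40B's implementation, which may pair dimensions differently; the vectors, the `base` value, and the function names are illustrative assumptions.

```python
import math


def rotate(vec, position, base=10000.0):
    """Apply a rotary positional embedding to `vec` at `position`.

    Each consecutive pair (vec[i], vec[i + 1]) is rotated by the angle
    position * base ** (-i / d), where d is the vector length.
    """
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * cos_t - y * sin_t, x * sin_t + y * cos_t])
    return out


def dot(a, b):
    """Plain dot product, standing in for an attention score."""
    return sum(x * y for x, y in zip(a, b))


q = [0.3, -1.2, 0.8, 0.5]   # toy query vector
k = [1.0, 0.4, -0.7, 0.9]   # toy key vector

# Same offset (3 positions apart) at two different absolute positions.
score_near = dot(rotate(q, 5), rotate(k, 2))
score_far = dot(rotate(q, 105), rotate(k, 102))
print(score_near, score_far)  # equal up to floating-point error
```

Because each pair is rotated by an angle proportional to its position, the dot product between a rotated query and a rotated key depends only on the difference between their positions.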
Hyperparameter | Value | Comment |
---|---|---|
Layers | 60 | |
d_model | 8192 | |
head_dim | 64 | Reduced to optimise for FlashAttention |
Vocabulary | 65024 | |
Sequence length | 2048 | |
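The inference benefit of multiquery attention can be made concrete by estimating the key/value cache size from the values in the table above. This is a back-of-the-envelope sketch: it assumes the query heads tile d_model exactly (8192 / 64 = 128 heads) and bfloat16 storage. The source does not state the number of tensor parallel degrees, so the 8 key/value heads in the last case are purely hypothetical.

```python
LAYERS = 60          # from the table
D_MODEL = 8192       # from the table
HEAD_DIM = 64        # from the table
SEQ_LEN = 2048       # from the table
BYTES_PER_VALUE = 2  # bfloat16

# Assumption: query heads tile d_model exactly, giving 128 heads.
N_QUERY_HEADS = D_MODEL // HEAD_DIM


def kv_cache_bytes(n_kv_heads, seq_len=SEQ_LEN):
    """Bytes needed to cache keys and values for one sequence."""
    # The leading 2 accounts for storing both a key and a value per head.
    per_token = 2 * LAYERS * n_kv_heads * HEAD_DIM * BYTES_PER_VALUE
    return per_token * seq_len


multi_head = kv_cache_bytes(N_QUERY_HEADS)  # one K/V head per query head
multi_query = kv_cache_bytes(1)             # a single shared K/V head
per_rank = kv_cache_bytes(8)                # hypothetical: 8 K/V heads

for label, size in [("multi-head", multi_head),
                    ("multiquery", multi_query),
                    ("8 K/V heads (hypothetical)", per_rank)]:
    print(f"{label:<28} {size / 2**20:8.1f} MiB per sequence")
```

Under these assumptions, sharing keys and values across query heads shrinks the cache by a factor equal to the number of query heads per key/value head, which is what makes long generations cheaper to serve.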
What is Falcon-40B and what can it do?
Falcon-40B is a 40-billion-parameter causal decoder-only model built by TII and trained on 1,000 billion tokens of RefinedWeb enhanced with curated corpora. It can generate text for tasks such as summarization, open-ended text generation, and chatbots.
How can I use Falcon-40B?
You can use Falcon-40B with the Hugging Face Transformers library. You need to install PyTorch 2.0 and import the AutoTokenizer and AutoModelForCausalLM classes from the transformers module. Then you can load the model and the tokenizer with the name “tiiuae/falcon-40b” and use the pipeline function to generate text with a given prompt.
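Before loading the model, it can help to estimate how much memory the weights alone will need. The sketch below is a rough calculation, assuming the nominal 40 billion parameters. It ignores activations, the key/value cache, and framework overhead, so real requirements will be higher. The function and constant names are illustrative.

```python
N_PARAMS = 40e9  # nominal parameter count, taken from the model name

# Storage size of one parameter in each precision.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def weight_memory_gib(n_params, dtype):
    """Approximate memory for the weights alone, in GiB."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30


for dtype in BYTES_PER_PARAM:
    print(f"{dtype:<9} {weight_memory_gib(N_PARAMS, dtype):6.1f} GiB")
```

Loading in bfloat16 halves the footprint relative to float32.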
What are the advantages of Falcon-40B?
According to TII, Falcon-40B was the best open-source model available at the time of its release. It outperforms other models such as LLaMA, StableLM, RedPajama, and MPT on the OpenLLM Leaderboard. It also features an architecture optimized for inference, with FlashAttention and multiquery attention. It is made available under the permissive Apache 2.0 license, which allows commercial use without any royalties or restrictions.
What is the difference between Falcon-40B and Falcon-40B-Instruct?
Falcon-40B is a raw, pre-trained model that can be further finetuned for specific use cases. Falcon-40B-Instruct is a version of Falcon-40B that has been finetuned on a chat dataset and can take generic instructions in a chat format.