OpenLLM
An Open-Source Library for LLMs
OpenLLM is an open-source library for operating large language models. It provides a unified framework for fine-tuning, serving, deploying, and monitoring state-of-the-art models, and it is built on top of BentoML, a platform-agnostic model serving solution.
OpenLLM requires Python 3.8+ and pip on your system. To avoid package conflicts, install it in a virtual environment.
You can install OpenLLM with pip:
pip install openllm
To make sure that it is correctly installed, run:
$ openllm -h
Usage: openllm [OPTIONS] COMMAND [ARGS]...
 ██████╗ ██████╗ ███████╗███╗   ██╗██╗     ██╗     ███╗   ███╗
██╔═══██╗██╔══██╗██╔════╝████╗  ██║██║     ██║     ████╗ ████║
██║   ██║██████╔╝█████╗  ██╔██╗ ██║██║     ██║     ██╔████╔██║
██║   ██║██╔═══╝ ██╔══╝  ██║╚██╗██║██║     ██║     ██║╚██╔╝██║
╚██████╔╝██║     ███████╗██║ ╚████║███████╗███████╗██║ ╚═╝ ██║
 ╚═════╝ ╚═╝     ╚══════╝╚═╝  ╚═══╝╚══════╝╚══════╝╚═╝     ╚═╝
An open platform for operating large language models in production.
Fine-tune, serve, deploy, and monitor any LLMs with ease.
Starting an LLM Server
To launch an LLM server, use the openllm start command. To start an OPT server, for example, run:
openllm start opt
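By default, the server is reachable at http://localhost:3000, which is the endpoint used in the client examples below.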
You can use the Python client that comes with OpenLLM to communicate with the model. Open another terminal window or a Jupyter notebook and create a client object to start working with the model:
>>> import openllm
>>> client = openllm.client.HTTPClient('http://localhost:3000')
>>> client.query('Explain to me the difference between "further" and "farther"')
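On a cold start, the server may still be loading model weights when the first request arrives. The helper below is a minimal sketch (query_with_retry is a hypothetical name, not part of the OpenLLM API) that wraps the documented client.query call in a simple retry loop:

import time

import openllm


def query_with_retry(prompt: str, retries: int = 5, delay: float = 3.0) -> str:
    """Send a prompt to a local OpenLLM server, retrying while it warms up."""
    for attempt in range(retries):
        try:
            # Connect to the locally running server and send the prompt.
            client = openllm.client.HTTPClient("http://localhost:3000")
            return client.query(prompt)
        except Exception:
            # The server may still be loading model weights; back off and retry.
            if attempt == retries - 1:
                raise
            time.sleep(delay)


print(query_with_retry('Explain to me the difference between "further" and "farther"'))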
Another way to interact with the model is to use the openllm query command, which lets you send queries and receive responses directly from the terminal:
export OPENLLM_ENDPOINT=http://localhost:3000
openllm query 'Explain to me the difference between "further" and "farther"'
For OpenLLM's API specifications, visit http://localhost:3000/docs.json.
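If you prefer to inspect the API programmatically, the spec served at /docs.json is an OpenAPI document. The snippet below is a small sketch using only the Python standard library; it assumes a server is running locally on port 3000, as started above:

import json
import urllib.request

# Fetch the OpenAPI spec exposed by the running OpenLLM server.
with urllib.request.urlopen("http://localhost:3000/docs.json") as resp:
    spec = json.load(resp)

# List the available endpoints and their HTTP methods.
for path, methods in spec.get("paths", {}).items():
    print(path, "->", ", ".join(m.upper() for m in methods))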
To serve a different model variant, provide the --model-id argument, for example:
openllm start flan-t5 --model-id google/flan-t5-large
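The client and query examples shown earlier work the same way regardless of which model the server is running; only the start command changes.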
To view the models and variants supported by OpenLLM, run the openllm models command.
🧩 Supported Models
OpenLLM supports the following models. Some of them require extra dependencies; install these with the command shown in the Installation column.
| Model | Installation | Model IDs |
| --- | --- | --- |
| flan-t5 | pip install "openllm[flan-t5]" | google/flan-t5-small, google/flan-t5-base, google/flan-t5-large, google/flan-t5-xl, google/flan-t5-xxl |
| dolly-v2 | pip install openllm | databricks/dolly-v2-3b, databricks/dolly-v2-7b, databricks/dolly-v2-12b |
| chatglm | pip install "openllm[chatglm]" | thudm/chatglm-6b, thudm/chatglm-6b-int8, thudm/chatglm-6b-int4 |
| starcoder | pip install "openllm[starcoder]" | bigcode/starcoder, bigcode/starcoderbase |
| falcon | pip install "openllm[falcon]" | tiiuae/falcon-7b, tiiuae/falcon-40b, tiiuae/falcon-7b-instruct, tiiuae/falcon-40b-instruct |
| stablelm | pip install openllm | stabilityai/stablelm-tuned-alpha-3b, stabilityai/stablelm-tuned-alpha-7b, stabilityai/stablelm-base-alpha-3b, stabilityai/stablelm-base-alpha-7b |
| opt | pip install openllm | facebook/opt-125m, facebook/opt-350m, facebook/opt-1.3b, facebook/opt-2.7b, facebook/opt-6.7b, facebook/opt-66b |
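If you plan to serve several of these models, pip lets you combine extras in a single install, for example pip install "openllm[flan-t5,chatglm]".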
⚙️ Integrations
OpenLLM is more than a standalone solution; it is a modular component designed to connect seamlessly with other tools. It currently offers integrations with BentoML, Hugging Face Agents, and LangChain.
BentoML
You can use a Runner to incorporate OpenLLM models into your BentoML service. The Runner exposes a generate method that accepts a prompt string and returns the generated output, which makes it easy to plug any OpenLLM model into your existing ML pipeline.
import bentoml
import openllm
from bentoml.io import Text  # I/O descriptors for the service API

model = "opt"

# Build the default configuration for the chosen model and create a Runner.
llm_config = openllm.AutoConfig.for_model(model)
llm_runner = openllm.Runner(model, llm_config=llm_config)

svc = bentoml.Service(name="llm-opt-service", runners=[llm_runner])

@svc.api(input=Text(), output=Text())
async def prompt(input_text: str) -> str:
    # Delegate text generation to the OpenLLM runner.
    answer = await llm_runner.generate(input_text)
    return answer
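To try the service locally, save the snippet above as service.py (the file name is only an example) and run bentoml serve service:svc, then call the prompt endpoint over HTTP.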
Hugging Face Agents
Hugging Face Agents can easily integrate with OpenLLM.
import transformers
agent = transformers.HfAgent("http://localhost:3000/hf/agent") # URL that runs the OpenLLM server
agent.run("Is the following `text` positive or negative?", text="I don't like how this models is generate inputs")
The OpenLLM client also lets you interact with the running agent. You can use it to send queries to the agent:
import openllm

client = openllm.client.HTTPClient("http://localhost:3000")
client.ask_agent(
    task="Is the following `text` positive or negative?",
    text="What are you thinking about?",
)
LangChain (⏳Coming Soon!)
Soon, LangChain will let you easily use OpenLLM models with this syntax:
from langchain.llms import OpenLLM
llm = OpenLLM.for_model(model_name='flan-t5')
llm("What is the difference between a duck and a goose?")
To access an OpenLLM server that is already running elsewhere, pass its URL and server type:
from langchain.llms import OpenLLM
llm = OpenLLM.for_model(server_url='http://localhost:8000', server_type='http')
llm("What is the difference between a duck and a goose?")
🚀 Deploying to Production
To deploy your LLMs to production:
- Build a Bento: OpenLLM lets you create a Bento for any model, such as dolly-v2, with the build command:
  openllm build dolly-v2
  BentoML distributes your program as a Bento. A Bento packages your source code, models, files, artifacts, and dependencies.
- Containerize your Bento:
  bentoml containerize <name:version>
  BentoML provides a flexible and robust framework for building and deploying ML services online. For more details, see the Deploying a Bento guide.
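Once containerized, the resulting image can be run with any OCI-compatible container runtime, such as Docker, or deployed to a container platform like Kubernetes.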
🍇 Telemetry
To improve the product and user experience, OpenLLM collects usage data. We only report OpenLLM's internal API calls and protect your privacy by excluding sensitive information. We do not collect user code, model data, or stack traces. See the code for usage tracking.
To disable usage tracking, use the --do-not-track option in the CLI:
openllm [command] --do-not-track
Or set the environment variable OPENLLM_DO_NOT_TRACK=True:
export OPENLLM_DO_NOT_TRACK=True
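Setting the environment variable is convenient when running OpenLLM in containers or CI pipelines, where adding the flag to every command is impractical.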