OpenLLM: Open Source Library for LLM





OpenLLM is an open-source library for large language models. It provides a unified framework for training, deploying, and serving state-of-the-art natural language processing models. OpenLLM is built on top of BentoML, a platform-agnostic model serving solution.

OpenLLM Features

Built-in support for state-of-the-art LLMs

You can use OpenLLM to fine-tune, serve, deploy, and monitor any open-source LLM and model runtime, including StableLM, Falcon, Dolly, Flan-T5, ChatGLM, StarCoder, and more.

Flexible APIs

You can serve LLMs over a RESTful API or gRPC with a single command. You can then query them via the Web UI, the CLI, OpenLLM's Python/JavaScript clients, or any HTTP client.

Freedom to build

You can easily create your own AI apps by composing LLMs with other models and services. OpenLLM has first-class support for LangChain, BentoML and Hugging Face.

Streamline deployment

You can automatically generate Docker images for your LLM server or deploy it as a serverless endpoint via BentoCloud.

Bring your own LLM

You can fine-tune any LLM to suit your needs with LLM.tuning().

Getting Started 🚀

OpenLLM requires Python 3.8 or later and pip. To avoid package conflicts, install it in a virtual environment.

Install OpenLLM with pip:

pip install openllm

To make sure that it is correctly installed, run:

$ openllm -h

Usage: openllm [OPTIONS] COMMAND [ARGS]...

   ██████╗ ██████╗ ███████╗███╗   ██╗██╗     ██╗     ███╗   ███╗
  ██╔═══██╗██╔══██╗██╔════╝████╗  ██║██║     ██║     ████╗ ████║
  ██║   ██║██████╔╝█████╗  ██╔██╗ ██║██║     ██║     ██╔████╔██║
  ██║   ██║██╔═══╝ ██╔══╝  ██║╚██╗██║██║     ██║     ██║╚██╔╝██║
  ╚██████╔╝██║     ███████╗██║ ╚████║███████╗███████╗██║ ╚═╝ ██║
   ╚═════╝ ╚═╝     ╚══════╝╚═╝  ╚═══╝╚══════╝╚══════╝╚═╝     ╚═╝

  An open platform for operating large language models in production.
  Fine-tune, serve, deploy, and monitor any LLMs with ease.
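You can also check the installation from Python, for example inside a notebook. The standard library can report whether the openllm package is importable and whether the openllm command is on your PATH. This is a minimal sketch: check_install is a hypothetical helper name, not part of OpenLLM.

```python
import importlib.util
import shutil
from typing import Dict


def check_install(package: str = "openllm", command: str = "openllm") -> Dict[str, bool]:
    """Report whether a Python package is importable and a CLI command is on PATH.

    Hypothetical helper for illustration; it is not part of OpenLLM.
    """
    return {
        # find_spec returns None when the top-level package cannot be found
        "package": importlib.util.find_spec(package) is not None,
        # shutil.which returns None when the executable is not on PATH
        "cli": shutil.which(command) is not None,
    }


print(check_install())
```

Both values should be True after a successful pip install in the active environment.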

Starting an LLM Server

To launch an LLM server, use the openllm start command. For example, to start an OPT server, run:

openllm start opt

After this step, you can access a Web UI at http://localhost:3000 to try out the endpoints and sample input prompts.

You can use the Python client that comes with OpenLLM to communicate with the model. Open another terminal window or a Jupyter notebook and create a client object to start working with the model:

>>> import openllm
>>> client = openllm.client.HTTPClient('http://localhost:3000')
>>> client.query('Explain to me the difference between "further" and "farther"')

Another way to interact with the model is the openllm query command. It lets you send queries and get responses from the model directly from the terminal:

export OPENLLM_ENDPOINT=http://localhost:3000
openllm query 'Explain to me the difference between "further" and "farther"'
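To drive the same CLI from a Python script, you can pass OPENLLM_ENDPOINT through the subprocess environment instead of exporting it in your shell. This is a minimal sketch. It assumes the openllm CLI is on your PATH and a server is running, and the helper names are hypothetical. Passing the prompt as a list element also avoids the nested shell quoting shown above.

```python
import os
import subprocess
from typing import Dict, List, Optional

# Default server address used throughout this article
DEFAULT_ENDPOINT = "http://localhost:3000"


def query_env(endpoint: Optional[str] = None) -> Dict[str, str]:
    """Copy the current environment and set OPENLLM_ENDPOINT.

    An explicit endpoint wins; otherwise an existing OPENLLM_ENDPOINT is kept,
    falling back to DEFAULT_ENDPOINT.
    """
    env = dict(os.environ)
    env["OPENLLM_ENDPOINT"] = endpoint or env.get("OPENLLM_ENDPOINT", DEFAULT_ENDPOINT)
    return env


def query_command(prompt: str) -> List[str]:
    """Build the argument list for `openllm query <prompt>` (no shell quoting needed)."""
    return ["openllm", "query", prompt]


def run_query(prompt: str, endpoint: Optional[str] = None) -> str:
    """Run `openllm query` against the server and return its standard output."""
    result = subprocess.run(
        query_command(prompt),
        env=query_env(endpoint),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the CLI exits with an error
    )
    return result.stdout
```

With the server running, run_query('Explain to me the difference between "further" and "farther"') returns the CLI output as a string.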

For OpenLLM’s API specifications, visit http://localhost:3000/docs.json.
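Assuming /docs.json returns an OpenAPI-style JSON document with a top-level paths object, a short standard-library script can list the server's endpoints. This is a sketch, and fetch_spec and list_endpoints are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any, Dict, List

# HTTP methods that may appear as keys under an OpenAPI path item
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def fetch_spec(base_url: str = "http://localhost:3000") -> Dict[str, Any]:
    """Download and parse the JSON API specification served at /docs.json."""
    with urllib.request.urlopen(f"{base_url}/docs.json", timeout=10) as resp:
        return json.load(resp)


def list_endpoints(spec: Dict[str, Any]) -> List[str]:
    """Return sorted 'METHOD /path' strings from an OpenAPI-style spec."""
    endpoints = []
    for path, item in spec.get("paths", {}).items():
        for key in item:
            # Skip non-operation keys such as "parameters" or "summary"
            if key.lower() in HTTP_METHODS:
                endpoints.append(f"{key.upper()} {path}")
    return sorted(endpoints)
```

With the server running, print each entry of list_endpoints(fetch_spec()) to see the available routes.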

To serve a different variant of a model, pass the --model-id argument. For example:

openllm start flan-t5 --model-id google/flan-t5-large
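If you launch servers from scripts, a small helper keeps the optional --model-id flag in one place. This is a minimal sketch: start_command is a hypothetical name, and the resulting list can be passed to subprocess.run.

```python
import shlex
from typing import List, Optional


def start_command(model: str, model_id: Optional[str] = None) -> List[str]:
    """Build an `openllm start` argument list, optionally selecting a variant."""
    cmd = ["openllm", "start", model]
    if model_id is not None:
        cmd += ["--model-id", model_id]
    return cmd


print(shlex.join(start_command("flan-t5", "google/flan-t5-large")))
# prints: openllm start flan-t5 --model-id google/flan-t5-large
```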

To view the supported models and variants in OpenLLM, run the openllm models command.

🧩 Supported Models

OpenLLM supports the following models. Some of them need extra dependencies, which you can install with the commands below:

| Model | Installation | Example model ID |
|---|---|---|
| flan-t5 | pip install "openllm[flan-t5]" | google/flan-t5-small |
| dolly-v2 | pip install openllm | databricks/dolly-v2-3b |
| chatglm | pip install "openllm[chatglm]" | thudm/chatglm-6b |
| starcoder | pip install "openllm[starcoder]" | bigcode/starcoder |
| falcon | pip install "openllm[falcon]" | tiiuae/falcon-7b |
| stablelm | pip install openllm | stabilityai/stablelm-tuned-alpha-3b |
| opt | pip install openllm | facebook/opt-125m |
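To script setup for the models above, you can encode the table as a dictionary. This sketch transcribes the table as shown in this article, and the model list may have changed since. SUPPORTED_MODELS, setup_steps, and needs_extras are hypothetical names. The openllm models command remains the authoritative list.

```python
from typing import Dict, Tuple

# Install command and example model ID for each model in the table above
SUPPORTED_MODELS: Dict[str, Tuple[str, str]] = {
    "flan-t5": ('pip install "openllm[flan-t5]"', "google/flan-t5-small"),
    "dolly-v2": ("pip install openllm", "databricks/dolly-v2-3b"),
    "chatglm": ('pip install "openllm[chatglm]"', "thudm/chatglm-6b"),
    "starcoder": ('pip install "openllm[starcoder]"', "bigcode/starcoder"),
    "falcon": ('pip install "openllm[falcon]"', "tiiuae/falcon-7b"),
    "stablelm": ("pip install openllm", "stabilityai/stablelm-tuned-alpha-3b"),
    "opt": ("pip install openllm", "facebook/opt-125m"),
}


def setup_steps(model: str) -> Tuple[str, str]:
    """Return (install command, start command) for a model listed in the table."""
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"{model!r} is not in this table; run `openllm models` for the full list")
    install, model_id = SUPPORTED_MODELS[model]
    return install, f"openllm start {model} --model-id {model_id}"


def needs_extras(model: str) -> bool:
    """True if the install command uses an optional extra such as openllm[falcon]."""
    return "[" in SUPPORTED_MODELS[model][0]
```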

⚙️ Integrations

OpenLLM is more than a standalone solution. It is a modular component that connects with other tools. It currently offers integrations with BentoML and LangChain, and it also works with Hugging Face Agents.

BentoML

You can use a Runner to incorporate OpenLLM models into your BentoML service. The Runner has a generate function that accepts a prompt string and returns an output string. This makes it easy to integrate any OpenLLM model into your existing ML pipeline.

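import bentoml
import openllm
from bentoml.io import Text  # IO descriptor for plain-text input and output

model = "opt"

# Load the default configuration for the model and wrap it in a Runner
llm_config = openllm.AutoConfig.for_model(model)
llm_runner = openllm.Runner(model, llm_config=llm_config)

# Attach the runner to a BentoML service
svc = bentoml.Service(
    name="llm-opt-service", runners=[llm_runner]
)

# Text-in, text-out endpoint that forwards the prompt to the LLM
@svc.api(input=Text(), output=Text())
async def prompt(input_text: str) -> str:
    answer = await llm_runner.generate(input_text)
    return answer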

Hugging Face Agents

Hugging Face Agents can easily integrate with OpenLLM.

Warning: The Hugging Face Agent integration is under development and may change frequently. To use the most recent Hugging Face Agent API, install OpenLLM's nightly requirements with pip install -r nightly-requirements.generated.txt.

import transformers

# Point the agent at the Hugging Face agent endpoint of a running OpenLLM server
agent = transformers.HfAgent("http://localhost:3000/hf/agent")

agent.run("Is the following `text` positive or negative?", text="I don't like how this models is generate inputs")

Agent integration currently works only with starcoder. The example above was run on four T4 GPUs on an EC2 g4dn.12xlarge instance.

You can also use the OpenLLM client to send queries to a running agent:

import openllm

client = openllm.client.HTTPClient("http://localhost:3000")

# Send a task and its input to the agent served by OpenLLM
client.ask_agent(
    task="Is the following `text` positive or negative?",
    text="What are you thinking about?",
)

LangChain (⏳Coming Soon!)

Soon, LangChain will let you easily use OpenLLM models with this syntax:

from langchain.llms import OpenLLM
llm = OpenLLM.for_model(model_name='flan-t5')
llm("What is the difference between a duck and a goose?")

To connect to an existing OpenLLM server running elsewhere, pass its URL and server type:

from langchain.llms import OpenLLM
llm = OpenLLM.for_model(server_url='http://localhost:8000', server_type='http')
llm("What is the difference between a duck and a goose?")

🚀 Deploying to Production

To deploy your LLMs to production:

  1. Build a Bento: OpenLLM lets you create a Bento for any model, such as dolly-v2, with the build command:

    openllm build dolly-v2

    BentoML distributes your program as a Bento, a package that contains your source code, models, files, artifacts, and dependencies.

  2. Containerize your Bento:

    bentoml containerize <name:version>

    BentoML provides a flexible and robust framework for building and deploying ML services online. For more details, see the Deploying a Bento guide.
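You can script the two deployment steps as a pair of commands. This is a minimal sketch that assumes both CLIs are installed. The helper names and the tag my-bento:1.0 are placeholders, and this sketch insists on the explicit name:version form shown above. Pass each list to subprocess.run(cmd, check=True) in order; because check=True raises on failure, containerization runs only if the build succeeded.

```python
from typing import List


def build_command(model: str) -> List[str]:
    """Step 1: `openllm build <model>` packages the model as a Bento."""
    return ["openllm", "build", model]


def containerize_command(bento_tag: str) -> List[str]:
    """Step 2: `bentoml containerize <name:version>` builds an image from the Bento."""
    name, sep, version = bento_tag.partition(":")
    if not (name and sep and version):
        raise ValueError(f"expected a Bento tag like 'name:version', got {bento_tag!r}")
    return ["bentoml", "containerize", bento_tag]


# Placeholder tag: substitute the tag of your Bento (for example, from `bentoml list`)
pipeline = [build_command("dolly-v2"), containerize_command("my-bento:1.0")]
for cmd in pipeline:
    print(" ".join(cmd))
```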

πŸ‡Β Telemetry

To improve the product and user experience, OpenLLM collects usage data. Only internal API calls made by OpenLLM are reported, and no sensitive information is included: user code, model data, and stack traces are not collected. The usage-tracking code is available in the OpenLLM source for review.

To disable usage tracking, use the --do-not-track option in the CLI:

openllm [command] --do-not-track

Alternatively, set the environment variable OPENLLM_DO_NOT_TRACK=True:

export OPENLLM_DO_NOT_TRACK=True
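When you launch OpenLLM from Python, you can apply either opt-out to the child process only, without changing your shell. This is a sketch that assumes the openllm CLI is on your PATH, and the helper names are hypothetical. Setting the variable in a copied environment keeps the change scoped to the subprocess.

```python
import os
import subprocess
from typing import Dict, List


def no_tracking_env() -> Dict[str, str]:
    """Copy the current environment with OPENLLM_DO_NOT_TRACK set to True."""
    env = dict(os.environ)
    env["OPENLLM_DO_NOT_TRACK"] = "True"
    return env


def with_do_not_track(cmd: List[str]) -> List[str]:
    """Append the --do-not-track option to an openllm command."""
    return [*cmd, "--do-not-track"]


def start_untracked(model: str) -> subprocess.Popen:
    """Launch `openllm start <model>` with both opt-outs applied (needs the openllm CLI)."""
    return subprocess.Popen(with_do_not_track(["openllm", "start", model]), env=no_tracking_env())
```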

