OpenLLM is an open-source library for large language models. It provides a unified framework for training, deploying, and serving state-of-the-art natural language processing models. OpenLLM is built on top of BentoML, a platform-agnostic model serving solution.
OpenLLM requires Python 3.8+ and pip. To avoid package conflicts, install it inside a virtual environment.
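For example, a minimal setup with the standard library's `venv` module might look like this (the environment name `.venv` is just a convention):

```bash
# Create and activate an isolated environment for OpenLLM
python -m venv .venv
source .venv/bin/activate   # on Windows: .venv\Scripts\activate
pip install --upgrade pip
```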
You can install OpenLLM with pip:

```bash
pip install openllm
```
To make sure that it is correctly installed, run:

```bash
$ openllm -h
Usage: openllm [OPTIONS] COMMAND [ARGS]...

 ██████╗ ██████╗ ███████╗███╗   ██╗██╗     ██╗     ███╗   ███╗
██╔═══██╗██╔══██╗██╔════╝████╗  ██║██║     ██║     ████╗ ████║
██║   ██║██████╔╝█████╗  ██╔██╗ ██║██║     ██║     ██╔████╔██║
██║   ██║██╔═══╝ ██╔══╝  ██║╚██╗██║██║     ██║     ██║╚██╔╝██║
╚██████╔╝██║     ███████╗██║ ╚████║███████╗███████╗██║ ╚═╝ ██║
 ╚═════╝ ╚═╝     ╚══════╝╚═╝  ╚═══╝╚══════╝╚══════╝╚═╝     ╚═╝

An open platform for operating large language models in production.
Fine-tune, serve, deploy, and monitor any LLMs with ease.
```
Starting an LLM Server
To launch an LLM server, use the `openllm start` command. For example, to start an OPT server:

```bash
openllm start opt
```
OpenLLM ships with a Python client for communicating with the model. Open another terminal window or a Jupyter notebook and create a client object to start working with it:

```python
>>> import openllm
>>> client = openllm.client.HTTPClient('http://localhost:3000')
>>> client.query('Explain to me the difference between "further" and "farther"')
```
Another way to interact with the model is the `openllm query` command, which lets you send queries and receive responses directly from the terminal:

```bash
export OPENLLM_ENDPOINT=http://localhost:3000
openllm query 'Explain to me the difference between "further" and "farther"'
```
For OpenLLM's API specifications, visit http://localhost:3000/docs.json.
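If you prefer raw HTTP, a request along these lines should work; note that the `/v1/generate` route and the payload shape shown here are assumptions based on common OpenLLM setups, so treat `docs.json` on your server as the authoritative schema:

```bash
# Hypothetical direct call to the generation endpoint (verify against docs.json)
curl -X POST http://localhost:3000/v1/generate \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "What is the weather like today?"}'
```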
To serve a different model variant, provide the `--model-id` argument, for example:

```bash
openllm start flan-t5 --model-id google/flan-t5-large
```
To view the supported models and variants in OpenLLM, run the `openllm models` command.
🧩 Supported Models
OpenLLM currently supports the following models. Some of them require extra dependencies, which you can install with the commands listed below:
| Model | Installation | Model IDs |
| --- | --- | --- |
| flan-t5 | `pip install "openllm[flan-t5]"` | google/flan-t5-small, google/flan-t5-base, google/flan-t5-large, google/flan-t5-xl, google/flan-t5-xxl |
| dolly-v2 | `pip install openllm` | databricks/dolly-v2-3b, databricks/dolly-v2-7b, databricks/dolly-v2-12b |
| chatglm | `pip install "openllm[chatglm]"` | thudm/chatglm-6b, thudm/chatglm-6b-int8, thudm/chatglm-6b-int4 |
| starcoder | `pip install "openllm[starcoder]"` | bigcode/starcoder, bigcode/starcoderbase |
| falcon | `pip install "openllm[falcon]"` | tiiuae/falcon-7b, tiiuae/falcon-40b, tiiuae/falcon-7b-instruct, tiiuae/falcon-40b-instruct |
| stablelm | `pip install openllm` | stabilityai/stablelm-tuned-alpha-3b, stabilityai/stablelm-tuned-alpha-7b, stabilityai/stablelm-base-alpha-3b, stabilityai/stablelm-base-alpha-7b |
| opt | `pip install openllm` | facebook/opt-125m, facebook/opt-350m, facebook/opt-1.3b, facebook/opt-2.7b, facebook/opt-6.7b, facebook/opt-66b |
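If you plan to serve several of these model families, the pip extras shown above can be combined into a single install using standard pip syntax:

```bash
# Install OpenLLM together with the optional dependencies for multiple models
pip install "openllm[flan-t5,chatglm,starcoder]"
```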
⚙️ Integrations
OpenLLM is more than a standalone solution; it is a modular component designed to connect seamlessly with other tools. It currently offers integrations with BentoML, Hugging Face Agents, and LangChain.
BentoML
You can use a Runner to incorporate OpenLLM models into your BentoML service. The Runner exposes a `generate` function that accepts a prompt string and produces an output string, which makes it easy to plug any OpenLLM model into your existing ML pipeline.
```python
import bentoml
import openllm
from bentoml.io import Text  # text IO descriptors for the service API

model = "opt"

llm_config = openllm.AutoConfig.for_model(model)
llm_runner = openllm.Runner(model, llm_config=llm_config)

svc = bentoml.Service(name="llm-opt-service", runners=[llm_runner])

@svc.api(input=Text(), output=Text())
async def prompt(input_text: str) -> str:
    # Forward the prompt to the OpenLLM runner and return the generated text
    answer = await llm_runner.generate(input_text)
    return answer
```
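Assuming the snippet above is saved as `service.py` (the file name is only an example), you can serve it locally during development with the standard BentoML CLI:

```bash
# Serve the BentoML service object `svc` defined in service.py;
# --reload restarts the server when the source file changes
bentoml serve service:svc --reload
```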
Hugging Face Agents
Hugging Face Agents can easily integrate with OpenLLM.
```python
import transformers

# Point the agent at the URL where the OpenLLM server is running
agent = transformers.HfAgent("http://localhost:3000/hf/agent")

agent.run("Is the following `text` positive or negative?", text="I don't like how this model generates inputs")
```
The OpenLLM client also lets you interact with the running agent. You can use it to pose queries to the agent:

```python
import openllm

client = openllm.client.HTTPClient("http://localhost:3000")
client.ask_agent(
    task="Is the following `text` positive or negative?",
    text="What are you thinking about?",
)
```
LangChain (⏳Coming Soon!)
Soon, LangChain will let you use OpenLLM models with this syntax:

```python
from langchain.llms import OpenLLM

llm = OpenLLM.for_model(model_name='flan-t5')
llm("What is the difference between a duck and a goose?")
```
To connect to an existing OpenLLM server running elsewhere, pass its URL and server type instead:

```python
from langchain.llms import OpenLLM

llm = OpenLLM.for_model(server_url='http://localhost:8000', server_type='http')
llm("What is the difference between a duck and a goose?")
```
🚀 Deploying to Production
To deploy your LLMs to production:

- **Build a Bento**: OpenLLM lets you build a Bento for a specific model, such as dolly-v2, with the `build` command:

  ```bash
  openllm build dolly-v2
  ```

  A Bento is BentoML's unit of distribution. It packages your source code, models, files, artifacts, and dependencies.

- **Containerize your Bento**:

  ```bash
  bentoml containerize <name:version>
  ```

  BentoML provides a flexible and robust framework for building and deploying ML services online. For more details, see the Deploying a Bento guide. A sketch of the full container workflow follows this list.
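As a rough sketch of that workflow (replace `<name:version>` with the tag printed by `openllm build`; the exact serve arguments may vary between BentoML versions):

```bash
# Build an OCI-compatible image from the Bento
bentoml containerize <name:version>

# Run the image locally, exposing BentoML's default port 3000
docker run --rm -p 3000:3000 <name:version> serve --production
```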
🍇 Telemetry
To improve the product and user experience, OpenLLM collects anonymous usage data. Only OpenLLM's internal API calls are reported, and the data contains no sensitive information: no user code, model data, or stack traces are collected. See the usage tracking code for details.
To disable usage tracking, use the `--do-not-track` CLI option:

```bash
openllm [command] --do-not-track
```
Or set the environment variable `OPENLLM_DO_NOT_TRACK=True`:

```bash
export OPENLLM_DO_NOT_TRACK=True
```
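To make the opt-out persistent across shell sessions, you can add the variable to your shell profile (a bash example; adjust the file for your shell):

```bash
# Append the opt-out to ~/.bashrc so it is set for every new shell
echo 'export OPENLLM_DO_NOT_TRACK=True' >> ~/.bashrc
```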