LocalGPT: Chat with Your Local Documents

LocalGPT is a project that allows you to chat with your documents on your local device using GPT models. No data leaves your device, so it is 100% private. You can use LocalGPT to ask questions about your documents without an internet connection, using the power of LLMs. LocalGPT is built with LangChain, Vicuna-7B, and InstructorEmbeddings.

System Requirements

Python Version

LocalGPT requires Python 3.10 or higher to run. It will not work with older versions of Python.
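If you are unsure which interpreter `python` resolves to, a short check can confirm the 3.10 minimum before you install anything. This is a minimal sketch, not part of LocalGPT; the helper name `python_is_supported` is hypothetical.

```python
import sys

# Minimum version stated in the requirements above.
MIN_VERSION = (3, 10)


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version meets the minimum."""
    # Tuples compare element by element, so (3, 12) >= (3, 10) is True.
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if not python_is_supported():
        sys.exit(f"LocalGPT needs Python {MIN_VERSION[0]}.{MIN_VERSION[1]}+, "
                 f"found {sys.version.split()[0]}")
    print("Python version OK")
```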

C++ Compiler

A C++ compiler may be required to build a wheel during the pip install process. Without one, pip install may fail with an error message.

For Windows 10/11

To set up a C++ compiler on Windows 10/11, do the following:

1. Install Visual Studio 2022. Make sure to select the following components:
   - Universal Windows Platform development
   - C++ CMake tools for Windows
2. Download the MinGW installer from the MinGW website.
3. Run the installer and select the “gcc” component.
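Before running pip, you can check whether any C++ compiler is reachable on your PATH. This is a hedged sketch: the candidate list and the helper name `find_cpp_compiler` are assumptions. Note that MSVC's `cl` is normally only on PATH inside a Visual Studio Developer Command Prompt, and MinGW's `g++` only after its `bin` folder has been added to PATH.

```python
import shutil

# Assumed compiler executable names: "cl" (MSVC), "g++" (GCC/MinGW),
# "clang++" (Clang). Adjust to your toolchain.
CANDIDATE_COMPILERS = ("cl", "g++", "clang++")


def find_cpp_compiler(candidates=CANDIDATE_COMPILERS, which=shutil.which):
    """Return (name, path) for the first compiler found on PATH, or None."""
    for name in candidates:
        path = which(name)  # shutil.which returns None when not found
        if path:
            return name, path
    return None


if __name__ == "__main__":
    found = find_cpp_compiler()
    if found:
        print(f"Found C++ compiler {found[0]} at {found[1]}")
    else:
        print("No C++ compiler found on PATH; pip may fail to build wheels.")
```

The `which` parameter exists only so the lookup can be swapped out in tests.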

Environment Setup

To run the code here, you need to install all requirements first:

pip install -r requirements.txt
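After installation, a quick way to confirm that key packages are importable is `importlib.util.find_spec`, which locates a module without importing it. The module names below are assumptions about what `requirements.txt` installs; the file in your checkout is the authoritative list, and `missing_modules` is a hypothetical helper.

```python
import importlib.util

# Assumed top-level module names for a few key dependencies.
EXPECTED_MODULES = ("langchain", "chromadb", "InstructorEmbedding", "torch")


def missing_modules(names=EXPECTED_MODULES):
    """Return the module names that cannot be located in this environment."""
    # For a top-level name, find_spec returns None if it is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All expected modules found")
```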

Test dataset

This repo uses the Constitution of the United States as an example.

Instructions for ingesting your own dataset

To load your documents, you need to do two things. First, move all your files with .txt, .pdf, or .csv extensions to the SOURCE_DOCUMENTS directory. Second, change the docs_path variable to the full path of your SOURCE_DOCUMENTS directory in the load_documents() function.
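The file-selection step can be pictured as follows. This is a sketch under stated assumptions: `find_source_documents` is a hypothetical helper, not LocalGPT's `load_documents()`, and it only looks at files directly inside the folder (the real function may walk subfolders or choose a different loader per extension).

```python
from pathlib import Path

# Extensions the instructions above list as supported.
SUPPORTED_EXTENSIONS = {".txt", ".pdf", ".csv"}


def find_source_documents(docs_path):
    """Return sorted paths of supported files directly inside docs_path."""
    root = Path(docs_path)
    return sorted(
        p for p in root.iterdir()
        # Skip directories and compare extensions case-insensitively.
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

For example, `find_source_documents("/full/path/to/SOURCE_DOCUMENTS")` would list the files an ingest run is expected to pick up.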

To ingest all the data, run the following command:

python ingest.py

This builds an index in the local vector store. How long it takes depends on the size of your documents. You can add any number of documents, and they will all be stored in the local embeddings database. To start with an empty database, delete the index.
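Deleting the index amounts to removing the directory the vector store was persisted to. The sketch below assumes that directory is called `DB`; that name and the `reset_index` helper are assumptions, so point it at whatever directory your ingest run actually created.

```python
import shutil
from pathlib import Path


def reset_index(persist_dir):
    """Delete the persisted vector-store directory if it exists.

    Returns True if a directory was removed, False if there was none.
    """
    path = Path(persist_dir)
    if not path.is_dir():
        return False
    shutil.rmtree(path)  # removes the directory and everything inside it
    return True
```

For example, call `reset_index("DB")` and then run `python ingest.py` again to rebuild the index from SOURCE_DOCUMENTS.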

Note: The first time you run this, it will take longer because it needs to download the embedding model. In subsequent runs, no data leaves your local environment, and ingestion can run without an internet connection.

Ask questions about your documents, locally!

To ask a question, run a command like this:

python run_localGPT.py

Then wait for the script to prompt for your input.

> Enter a query:

Type your question and press Enter. The LLM will process the prompt and generate an answer. It will also show the 4 sources from your documents that it used as context. You can ask more questions without restarting the script; just wait for the prompt to appear again.

Note: The first time you use this script, it will download the Vicuna-7B model from the internet. After that, you can disconnect from the internet and still run inference. Your data stays in your local environment.

Type exit to finish the script.
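The prompt, answer, and exit behavior described above follow a simple read-eval loop. The sketch below is hypothetical: `answer` stands in for the retrieval-plus-LLM step and is assumed to return the answer text and a list of source snippets, while `read` and `write` default to `input` and `print` so the loop can also be driven programmatically.

```python
from typing import Callable, List, Tuple

# Hypothetical signature: question -> (answer_text, source_snippets).
AnswerFn = Callable[[str], Tuple[str, List[str]]]


def query_loop(answer: AnswerFn, read=input, write=print, num_sources: int = 4) -> int:
    """Prompt until the user types 'exit'; return the number of questions answered."""
    answered = 0
    while True:
        try:
            query = read("> Enter a query: ").strip()
        except EOFError:
            break  # treat end of input like 'exit'
        if query == "exit":
            break
        if not query:
            continue  # ignore empty input and prompt again
        text, sources = answer(query)
        write(f"Answer: {text}")
        # Show at most num_sources snippets used as context.
        for i, src in enumerate(sources[:num_sources], start=1):
            write(f"Source {i}: {src}")
        answered += 1
    return answered
```

In interactive use you would call `query_loop(my_answer_fn)`, where `my_answer_fn` is whatever wraps your retrieval chain.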

Run it on CPU

By default, the ingest.py and run_localGPT.py scripts use your GPU, which makes them run faster. If you only have a CPU, you can still run them, but they will be slower. To do this, pass --device_type cpu to both scripts.

For ingestion, run:

python ingest.py --device_type cpu

To ask a question, run:

python run_localGPT.py --device_type cpu
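One way such a flag can be wired up is with `argparse`. This is an illustrative sketch only: the `choices` list and the `cuda` default are assumptions, and the real scripts may parse arguments with a different library or accept other device names.

```python
import argparse


def parse_args(argv=None):
    """Parse a --device_type option (sketch; choices and default are assumed)."""
    parser = argparse.ArgumentParser(description="LocalGPT device selection (sketch)")
    parser.add_argument(
        "--device_type",
        default="cuda",          # GPU by default, as described above
        choices=["cuda", "cpu"], # argparse rejects any other value
        help="Device to run the models on (default: cuda)",
    )
    return parser.parse_args(argv)
```

With `argv=None`, argparse reads `sys.argv`, so `python ingest.py --device_type cpu` would yield `device_type == "cpu"`.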

How It Works

By combining LangChain with local models, you can process everything on your own machine, keeping your data private and processing fast.

- ingest.py uses LangChain tools to parse your documents and create local embeddings with InstructorEmbeddings. It then saves the results in a local vector database using the Chroma vector store.
- run_localGPT.py uses a local LLM (Vicuna-7B in this case) to understand questions and generate answers. The context for each answer is retrieved from the local vector store with a similarity search that finds the most relevant passages in your documents.
- You can replace this local LLM with any other LLM from Hugging Face, as long as the model is in the HF format.
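The similarity search step can be illustrated with a toy top-k lookup over embedding vectors. This sketch is not Chroma's implementation: it uses cosine similarity for illustration (the store's configured distance metric may differ), tiny hand-made 2-D vectors instead of real InstructorEmbeddings output, and the hypothetical helpers `cosine_similarity` and `top_k`.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float], chunks: List[Tuple[str, Sequence[float]]], k: int = 4) -> List[str]:
    """Return the texts of the k chunks whose vectors are most similar to the query."""
    ranked = sorted(chunks, key=lambda c: cosine_similarity(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

With `k=4` this mirrors the "4 sources" shown as context for each answer; the retrieved texts would then be placed into the LLM's prompt.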

Known Issue

NVIDIA driver issues:

Use the official NVIDIA page to install NVIDIA drivers.
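After installing drivers, you can check whether PyTorch actually sees a GPU and fall back to the CPU if it does not. This sketch assumes PyTorch is the GPU backend in your environment; the helper names are hypothetical. `torch.cuda.is_available()` returns False when no usable NVIDIA driver or GPU is found.

```python
def cuda_available() -> bool:
    """Return True if PyTorch is installed and can use a CUDA device."""
    try:
        import torch  # assumed to be installed in your environment
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


def pick_device() -> str:
    """Choose a value for --device_type: 'cuda' if usable, otherwise 'cpu'."""
    return "cuda" if cuda_available() else "cpu"
```

If `pick_device()` returns `"cpu"` on a machine with an NVIDIA card, the driver installation is a likely place to look.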

Project Page
