How to create a private ChatGPT that interacts with your local documents


As much as ChatGPT is convenient, it has its tradeoffs. The fact that it requires you to send your data over the internet can be a concern when it comes to privacy, especially if you’re using confidential documents. Additionally, it requires a constant internet connection, which can be an issue in areas with poor connectivity.

Fortunately, there is an alternative. You can run your own local large language model (LLM), which puts you in control of your data and privacy. In this article, we will explore how to create a private ChatGPT that interacts with your local documents, giving you a powerful tool for answering questions and generating text without having to rely on OpenAI’s servers. We will also look at PrivateGPT, a project that simplifies the process of creating a private LLM.

The private LLM structure

Let’s start with a zoomed-out view of the components you need to create a local language model that can interact with your documents. In a previous article, I did a deep dive into customizing ChatGPT with your own data and documents. Your local LLM will have a similar structure, but everything will be stored and run on your own computer:

1. Open-source LLM: These are smaller, open-source alternatives to ChatGPT that can run on your local machine. Some popular examples include Dolly, Vicuna, GPT4All, and models that run on llama.cpp. These models are trained on large amounts of text and can generate high-quality responses to user prompts.

2. Embedding model: An embedding model is used to transform text data into a numerical format that can be easily compared to other text data. This is typically done using a technique called word or sentence embeddings, which represent text as dense vectors in a high-dimensional space. These embeddings can be used to find documents that are related to the user’s prompt. The SentenceTransformers library contains a rich variety of pre-trained embedding models (see the minimal sketch after this list).

3. Vector database: A vector database is designed to store and retrieve embeddings. It can store the content of your documents in a format that can be easily compared to the user’s prompt. Faiss is a library that you can use to add vector similarity comparisons on top of other data stores. But there are also a few open-source vector databases that you can install on your computer, including Qdrant, Weaviate, and Milvus.

4. Knowledge documents: A collection of documents that contain the knowledge your LLM will use to answer your questions. Which documents you include depends on your application. For example, it could be a collection of PDF or text documents that contain your personal blog posts.

5. User interface: The user interface layer will take user prompts and display the model’s output. This can be a simple command-line interface (CLI) or a more sophisticated web application such as Streamlit. The user interface will send the user’s prompt to the application and return the model’s response to the user.
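
To make the embedding and retrieval idea concrete, here is a minimal sketch of how components 2 and 3 fit together, using the SentenceTransformers library mentioned above. The model name (all-MiniLM-L6-v2) and the example texts are illustrative choices, not requirements:

from sentence_transformers import SentenceTransformer, util

# Load a pre-trained embedding model
model = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "Vector databases store embeddings for fast similarity search.",
    "My notes on sourdough baking and hydration ratios.",
]
doc_embeddings = model.encode(documents)  # one dense vector per document

prompt = "How do vector databases work?"
prompt_embedding = model.encode(prompt)

# Cosine similarity: the higher the score, the more related the document
scores = util.cos_sim(prompt_embedding, doc_embeddings)
print(documents[int(scores.argmax())])  # prints the most relevant document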

Image: private LLM architecture

Private LLM workflow

Before you can use your local LLM, you must make a few preparations:

1. Create a list of documents that you want to use as your knowledge base

2. Break large documents into smaller chunks (around 500 words)

3. Create an embedding for each document chunk

4. Create a vector database that stores all the embeddings of the documents

If you add documents to your knowledge base in the future, you will have to update your vector database.
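
Here is a minimal sketch of those four preparation steps, assuming plain-text files in a hypothetical “knowledge” folder, the same all-MiniLM-L6-v2 embedding model as above, and Faiss (mentioned earlier) as the vector index:

from pathlib import Path

import faiss
from sentence_transformers import SentenceTransformer

def chunk(text, size=500):
    # Step 2: break a document into chunks of roughly `size` words
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

# Step 1: gather the knowledge documents
chunks = []
for path in Path("knowledge").glob("*.txt"):
    chunks.extend(chunk(path.read_text()))

# Step 3: create an embedding for each document chunk
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(chunks)

# Step 4: store all embeddings in a vector index
# (faiss.write_index can save the index to disk)
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)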

Now that our knowledge base and vector database are ready, we can review the workflow of the private LLM:

1. The user enters a prompt in the user interface.

2. The application uses the embedding model to create an embedding from the user’s prompt and sends it to the vector database.

3. The vector database returns a list of documents that are relevant to the prompt based on the similarity of their embeddings to the user’s prompt.

4. The application creates a new prompt with the user’s initial prompt and the retrieved documents as context and sends it to the local LLM.

5. The LLM produces the result along with citations from the context documents. The result is displayed in the user interface along with the sources.
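
Continuing the sketch from the preparation section, the five workflow steps can be condensed into one function. The llm argument is a stand-in for whatever local model you load, and the prompt template is just an illustration:

def answer(user_prompt, llm, k=4):
    # Steps 1-2: embed the user's prompt with the same embedding model
    prompt_embedding = model.encode([user_prompt])

    # Step 3: the vector index returns the most similar document chunks
    _, ids = index.search(prompt_embedding, k)
    context = "\n\n".join(chunks[i] for i in ids[0])

    # Step 4: build a new prompt with the retrieved chunks as context
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {user_prompt}\nAnswer:"
    )

    # Step 5: the local LLM generates the final answer
    return llm(prompt)

Note that the retrieved chunks also double as the sources shown to the user in step 5.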

Image: private LLM workflow

Open-source LLMs are much smaller than state-of-the-art models like ChatGPT and Bard, so they might not match them on every task. But augmenting these language models with your own documents makes them very powerful for tasks such as search and question answering.

PrivateGPT

By using a local language model and vector database, you can maintain control over your data and ensure privacy while still having access to powerful language processing capabilities. The process may require some technical expertise, but there are many resources available online to help you get started.

One solution is PrivateGPT, a project hosted on GitHub that brings together all the components mentioned above in an easy-to-install package. PrivateGPT includes a language model, an embedding model, a database for document embeddings, and a command-line interface. It supports several types of documents including plain text (.txt), comma-separated values (.csv), Word (.docx and .doc), PDF, Markdown (.md), HTML, EPUB, and email files (.eml and .msg).

To use PrivateGPT, you’ll need Python installed on your computer. You can start by cloning the PrivateGPT repository and installing the requirements:

git clone https://github.com/imartinez/privateGPT.git
cd privateGPT/
pip install -r requirements.txt

Next, you need to download a pre-trained language model to your computer. Create a “models” folder in the PrivateGPT directory and move the model file into it. PrivateGPT is configured by default to work with GPT4All-J (the project’s README links to the download), but it also supports llama.cpp-compatible models. Both are open-source LLM families that have been fine-tuned for instruction-following (like ChatGPT) and designed to run on consumer-grade hardware. Llama.cpp works especially well on Mac computers with M1 processors.
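
For illustration, here is roughly how the default model can be loaded through LangChain’s GPT4All wrapper. This is a sketch, not PrivateGPT’s exact code; the file name is the default GPT4All-J download:

from langchain.llms import GPT4All

# Load the default GPT4All-J model from the "models" folder
llm = GPT4All(
    model="models/ggml-gpt4all-j-v1.3-groovy.bin",
    backend="gptj",  # tells the wrapper this is a GPT-J-family model
)

print(llm("What is a vector database?"))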

Next, you have to create your knowledge base. PrivateGPT has a “source_documents” folder where you must copy all your documents. After that, you must populate your vector database with the embedding values of your documents. Fortunately, the project has a script that performs the entire process of breaking documents into chunks, creating embeddings, and storing them in the vector database:

python ingest.py

Behind the scenes, PrivateGPT uses LangChain and SentenceTransformers to break the documents into 500-token chunks and generate embeddings. It uses Chroma, a vector store that persists its data with DuckDB, to create the vector database. The result is stored in the project’s “db” folder. One thing to note is that the first run needs an internet connection to download the pre-trained embedding model. After that, all processing takes place on your own computer and you don’t need internet connectivity.
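
A rough sketch of that ingestion logic with these libraries (simplified; the real ingest.py handles many file formats and processes everything in “source_documents”, not a single file):

from langchain.document_loaders import TextLoader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma

# Load a document and split it into small, slightly overlapping chunks
documents = TextLoader("source_documents/example.txt").load()
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(documents)

# Generate embeddings with a SentenceTransformers model
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

# Store the embeddings in the "db" folder
db = Chroma.from_documents(chunks, embeddings, persist_directory="db")
db.persist()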

Depending on the number of documents that you have, creating the vector database might take several minutes. Once the preparation is finished, you can start the model with the following command:

python privateGPT.py

And then you can start talking to your local LLM with no strings attached. It will answer your questions and provide up to four sources from your knowledge base for each reply. PrivateGPT is an experimental project. It is not fast (it can take 20-30 seconds to respond) and is not optimized for every type of hardware. Its installation might also run into bugs depending on your operating system and hardware. But it is surely a preview of one of the many directions the field is taking and the powerful applications that open-source LLMs can unlock.

22 COMMENTS

  1. Thank you for providing this helpful content. I did have a question. If you are hosting your own LLM, won’t that require a substantial amount of computing resources to support the application? I.e., the CPU needed to handle multiple user chats?

  2. One of the most useful articles on how ChatGPT works. I’m a urologist taking full-stack software developer courses, looking for ways to harness the power of AI for simplified patient-doctor interaction.

  3. This makes a great deal of sense and seems practical. I’m still hesitant to pull the trigger, mainly because I’m pretty confident Microsoft and other bigger tech companies will be able to do this in a much more seamless manner in the very near future.

    • Yeah, in fact, Google announced that you would be able to query anything stored within one’s Google Drive. I expect it will be much more seamless, albeit your documents will all be available to Google and your number of queries may be limited each day or every couple of hours.

      Microsoft was also doing something like this, but I didn’t hear the explicit details. I know it is to be available for companies to better look up policy and procedure, etc. But I can’t recall if they were wanting to do natural language queries (NLQ) for the average consumer.

  4. It is great to see this content; however, for us lesser tech monkeys, I am getting errors when I copy and paste your lines in Python. Can you troubleshoot this? I am a complete noob to this, but what you mention, running it on my own machine, teaching it myself and expanding capabilities, is pure gold.

    • You’re probably having a problem with the Python version and libraries. I suggest you install Anaconda, which already has the right Python version and libraries, and then try to install PrivateGPT in Anaconda. It should be smoother.

  5. Do we need to re-train the model if more documents get added to the repository after the initial ingestion? I have a growing repository of all types of documents, such as Google Docs, Sheets, and Slides. Will this private GPT work for all those docs too?

  6. I followed your instructions, and everything is working… sort of… PrivateGPT seems to retrieve the document(s) that fit the query the best and then just returns that document, instead of interpreting it and using it to build an appropriate answer.

    • This is interesting. Can you share a bit more? Which model are you using? Also, how many documents do you have, and how long are they in general?

  7. It seems like an environment variable called “persist_directory” needs to be set. I tried setting it to db and it helped me get through to the next error message(!)

    • When you clone the PrivateGPT repo, it includes a file called example.env. You need to copy that file to .env in the same folder; this sets the required environment variables (including persist_directory) for ingest.py.

  8. I am trying to use this to review medical records and have the program answer questions about those records for civil litigation (I’m a noob). I got it running with the groovy bin model and also Vicuna 13B-q5_1. I have a powerful PC: 13th-gen Intel i9, 96GB RAM. I have put my records, about 125 pages, into the source folder as PDFs. I ask it the patient’s name, and I time its response. I’ve used both models and tried putting the data in different formats. It almost got the answer correct the first time; every time since then, I don’t get the correct response. It hiccups and loops. However, when it cites the page where it got the info, it lists the correct response in that data, but it did not properly extract and give it as an answer. Same with height and weight. I believe this is a problem with the underlying model, and that if I used a clinical language model like GatorTron it might work better, but I can’t get GatorTron to work. Should the system as I’ve described it, with the models I have installed, work better at extracting the proper answers to my simple questions about data contained in a PDF?

    • Yes. But you have to update their record in the vector DB every time the file’s content is modified.

  9. I followed the instructions in the project README and ingested the example ‘state of the union’ text file, but when I tried to run the privateGPT script, I got the following error:
    Found model file at models/ggml-gpt4all-j-v1.3-groovy.bin
    Invalid model file
    Traceback (most recent call last):
    File "C:\Users\Ian\python\privateGPT\privateGPT.py", line 83, in <module>
    main()
    File "C:\Users\Ian\python\privateGPT\privateGPT.py", line 38, in main
    llm = GPT4All(model=model_path, n_ctx=model_n_ctx, backend='gptj', n_batch=model_n_batch, callbacks=callbacks, verbose=False)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "pydantic\main.py", line 341, in pydantic.main.BaseModel.__init__
    pydantic.error_wrappers.ValidationError: 1 validation error for GPT4All
    __root__
    Unable to instantiate model (type=value_error)

    I’m running on a Windows machine – I presume that the .bin file format is correct? Or is there a specific binary version for Windows that I need to access?

  10. Many thanks!
    Two questions:
    1) Regarding the embedding model: I understand that the open-source LLM is pluggable. What about the embedding model? Where does it come from? Is it different between GPT4All and PrivateGPT, for example?
    2) If I replace the LLM model, should I ingest my documents again?

    • Hi Vitaly,
      Good questions!
      1) The embedding model used in this project is the SentenceTransformer from Hugging Face. It is independent of the LLM.
      2) If you replace the LLM, you do not need to ingest the documents again. But if you change your embedding model, you have to do so. The key is to use the same model to 1) embed the documents and store them in the vector DB and 2) embed user prompts to retrieve documents from the vector DB.
      Hope this helps!

  11. Hey,
    do any of those models handle inputs in various languages (other than English)? I wonder if the model will be able to understand knowledge documents in, say, Polish, Spanish, and English, and compile them into an answer in English.
