Diving into your personal or company documents sometimes requires a fully private RAG setup. This article provides a step-by-step guide to setting up a free, ready-to-use RAG chatbot that runs locally on your machine.
Plus, you will be able to test several models using the same configuration, like a pro!
Let's go.
This guide assumes you have already downloaded and installed LM Studio and AnythingLLM.
LM Studio:
The LM Studio interface provides a user-friendly platform for browsing models, evaluating both base and fine-tuned models, and accessing the extensive collection of models available in the Hugging Face repository.
With LM Studio, you can see a model's download statistics, choose from numerous quantization options, and observe real-time memory consumption (while a model is running) and GPU compatibility (in the model info).
To load and evaluate a local LLM within LM Studio, open the chat interface from the left panel. Among other settings, you can enable maximum GPU offload to run the model with optimised use of your computational resources.
Note that LM Studio itself has no RAG feature; you need AnythingLLM for that.
Add more context: use LM Studio with AnythingLLM:
Back in LM Studio, you will now configure the inference server that AnythingLLM will use. To do so, open the "Local Server" tab on the left-hand panel.
Set the port you want the server to listen on, or keep the default. Clicking the "Start Server" button starts a local inference server. As long as no other service is listening on the same port, it will work.
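Once the server is running, you can sanity-check it outside AnythingLLM with a plain HTTP call. Here is a minimal sketch in Python (stdlib only), assuming LM Studio's default port 1234 and its OpenAI-compatible chat completions endpoint; adjust `BASE_URL` if you picked a different port:

```python
import json
import urllib.request

# Assumption: LM Studio's local server listens on port 1234 by default
# and exposes OpenAI-compatible endpoints under /v1.
BASE_URL = "http://localhost:1234/v1"

def build_chat_request(prompt, temperature=0.7):
    """Build an OpenAI-style chat completion payload for the local server."""
    return {
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
        "temperature": temperature,
    }

def ask(prompt):
    """POST the prompt to the local server and return the model's reply."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Example (requires the server to be running):
# print(ask("Summarise what a local inference server is in one sentence."))
```

If this call returns a reply, AnythingLLM will be able to reach the same endpoint.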
Inside AnythingLLM, open the instance settings menu, then the LLM preferences menu (both on the left panel). Set the LLM provider to LMStudio, configure the base URL in the settings window, and fill in the token context window field according to your model's maximum context length.
In "Embedding Preferences" and "Vector Database", keep the default settings.
You can now embed documents directly into AnythingLLM by dragging and dropping files onto the designated area in the "My Documents" window. To find this window, go to the left panel, which gives an overview of the current workspaces and threads. Choose a thread, then click the small upload icon to open the document window.
Select the documents you want to work with, move them to the workspace, and save the vector database.
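Under the hood, saving the vector database means each document chunk is stored as an embedding vector, and your questions later retrieve the closest chunks by similarity. AnythingLLM handles all of this for you, but the retrieval step can be sketched in a few lines; the bag-of-words "embedding" below is a hypothetical stand-in for a real embedding model:

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: a bag-of-words Counter. A real RAG stack uses a
    neural embedding model, but the retrieval logic is the same."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "The vacation policy grants 25 days of paid leave per year.",
    "Quarterly revenue grew by 12 percent in the last report.",
]
print(retrieve("how many days of paid leave do I get", docs))
# The vacation-policy chunk ranks first for this query.
```

This is why embedding your documents once is enough: afterwards, every question only needs a similarity search, not a rescan of the files.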
Return to your thread and start asking questions relevant to the documents you just added.
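When you ask a question in a thread, AnythingLLM is effectively prepending the retrieved chunks to your prompt before sending it to LM Studio. A hedged sketch of that prompt assembly (illustrative only; AnythingLLM's actual prompt template may differ):

```python
def build_rag_prompt(question, retrieved_chunks):
    """Assemble a RAG prompt: retrieved context first, then the question.
    Hypothetical template -- not AnythingLLM's exact wording."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(retrieved_chunks))
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

prompt = build_rag_prompt(
    "How many vacation days do employees get?",
    ["The vacation policy grants 25 days of paid leave per year."],
)
print(prompt)
```

This grounding instruction is what keeps the model answering from your documents instead of from its training data.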
And voilà.
Feel free to direct message me on Twitter; I'd be happy to assist and provide additional #OpenSource AI software resources.
For more advanced RAG solutions, please visit our company website and drop us a line!