From the list below, the tools I mainly use are LM Studio, Ollama, Jan.ai, and Transformers. A special mention goes to Pinokio, a very good platform!

1/ LM Studio

LM Studio is a desktop application for running local LLMs on your computer. Link: https://lmstudio.ai/
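
LM Studio can also expose a local server that mimics the OpenAI API (by default at http://localhost:1234). A minimal sketch using the openai Python package, assuming the server is started and a model is loaded in the app:

from openai import OpenAI

# Point the OpenAI client at LM Studio's local server (default port 1234)
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="local-model",  # placeholder; LM Studio serves whichever model is loaded
    messages=[{"role": "user", "content": "What is an LLM?"}],
)
print(response.choices[0].message.content)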

2/ Ollama

Ollama is a tool that allows you to run open-source large language models (LLMs) locally on your machine. It supports a variety of models, including Llama 2, Code Llama, and others. It bundles model weights, configuration, and data into a single package, defined by a Modelfile. Link: https://ollama.com/
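
Once installed, you can pull and chat with a model straight from the terminal (for example, ollama run llama2). Ollama also exposes a REST API on port 11434; a minimal sketch with the requests package, assuming the model has already been pulled:

import requests

# Query Ollama's local REST API (default port 11434)
response = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama2", "prompt": "Why is the sky blue?", "stream": False},
)
print(response.json()["response"])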

3/ Hugging Face and Transformers

Hugging Face is the Docker Hub equivalent for machine learning and AI, offering an overwhelming array of open-source models. Hugging Face also provides transformers, a Python library that streamlines running an LLM locally. Here is an example of how to run Phi-2 from Microsoft:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Run everything on the GPU
torch.set_default_device("cuda")

# Download and load Phi-2 and its tokenizer
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)

# Encode a code-completion prompt
inputs = tokenizer('''def print_prime(n):
   """
   Print all primes between 1 and n
   """''', return_tensors="pt", return_attention_mask=False)

# Generate up to 200 tokens and decode the result
outputs = model.generate(**inputs, max_length=200)
text = tokenizer.batch_decode(outputs)[0]
print(text)

More info: https://huggingface.co/microsoft/phi-2

4/ LangChain

LangChain is a Python framework for building AI applications. It provides abstractions and middleware for developing your AI application on top of one of its supported models. For example, the following code asks the microsoft/DialoGPT-medium model a single question:

from langchain.llms.huggingface_pipeline import HuggingFacePipeline
from langchain.prompts import PromptTemplate

# Load a local Hugging Face model behind LangChain's pipeline wrapper
hf = HuggingFacePipeline.from_model_id(
    model_id="microsoft/DialoGPT-medium",
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 200, "pad_token_id": 50256},
)

# Build a simple prompt template
template = """Question: {question}

Answer: Let's think step by step."""
prompt = PromptTemplate.from_template(template)

# Chain the prompt into the model and run it
chain = prompt | hf

question = "What is electroencephalography?"
print(chain.invoke({"question": question}))

Link: https://www.langchain.com/

5/ Llama.cpp

Llama.cpp is a C/C++ inference engine for LLMs, optimized for Apple silicon, that runs Meta's LLaMA models (among others). Link: https://github.com/ggerganov/llama.cpp
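
Llama.cpp itself is driven from the command line, but the separate llama-cpp-python bindings wrap the same engine for Python. A minimal sketch, assuming you have a GGUF model file on disk (the path below is a placeholder):

from llama_cpp import Llama

# Load a local GGUF model; replace the path with your own download
llm = Llama(model_path="./models/llama-2-7b-chat.Q4_K_M.gguf", n_ctx=2048)

output = llm("Q: What is the capital of France? A:", max_tokens=32, stop=["Q:"])
print(output["choices"][0]["text"])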

6/ Llamafile

Developed by Mozilla, Llamafile offers a user-friendly alternative for running LLMs. It is known for its portability and its ability to package a model and runtime into a single-file executable. Link: https://github.com/Mozilla-Ocho/llamafile

7/ Jan.ai

Jan turns your computer into an AI machine by running LLMs locally. It's a privacy-focused, local-first, open-source solution. Link: https://jan.ai/

8/ llm

LLM by Simon Willison is one of the easier ways I've seen to download and use open-source LLMs locally on your own machine. While you do need Python installed to run it, you shouldn't need to touch any Python code. Install it with pip (on a Mac with Homebrew, brew install llm also works):

pip install llm

LLM defaults to using OpenAI models, but you can use plugins to run other models locally. For example, if you install the gpt4all plugin, you'll have access to additional local models from GPT4All. There are also plugins for llama, the MLC project, and MPT-30B, as well as additional remote models.

Install a plugin on the command line with llm install <plugin-name>:

llm install llm-gpt4all

To send a query to a local LLM, use the syntax:

llm -m the-model-name "Your query"

9/ GPT4All

GPT4All is an easy-to-use desktop application with an intuitive GUI. It supports local model running and offers connectivity to OpenAI with an API key. It stands out for its ability to process local documents for context, ensuring privacy. Link: https://gpt4all.io/index.html
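
Beyond the desktop app, the project also publishes Python bindings. A minimal sketch, assuming the gpt4all package is installed; the model name is one the library can download on first use:

from gpt4all import GPT4All

# Downloads the model on first use, then runs fully locally
model = GPT4All("orca-mini-3b-gguf2-q4_0.gguf")
with model.chat_session():
    print(model.generate("Name three uses of local LLMs.", max_tokens=128))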

10/ H2OGPT

h2oGPT simplifies the process of creating a private LLM. It includes a large language model, an embedding model, a database for document embeddings, a command-line interface, and a graphical user interface.

You can test it here (enter anything for the username and password): https://gpt.h2o.ai/

Link: https://github.com/h2oai/h2ogpt

11/ LocalLLM

As the name suggests, you can run local LLMs with it; it's a Google Cloud tool designed to run quantized LLMs on CPU, with no GPU required. Link: https://github.com/GoogleCloudPlatform/localllm

12/ Oobabooga

A Gradio web UI for Large Language Models. Its goal is to become the AUTOMATIC1111/stable-diffusion-webui of text generation. Link: https://github.com/oobabooga/text-generation-webui

13/ Koboldcpp

KoboldCpp is an easy-to-use AI text-generation tool for GGML and GGUF models, built on llama.cpp. You can download the latest version from: https://github.com/LostRuins/koboldcpp/releases

14/ LocalAI

LocalAI is a free, open-source OpenAI alternative. It acts as a drop-in replacement REST API compatible with the OpenAI (and Elevenlabs, Anthropic…) API specifications for local AI inferencing. It allows you to run LLMs and generate images and audio (among other things) locally or on-prem with consumer-grade hardware, supporting multiple model families. It does not require a GPU. Link: https://github.com/mudler/LocalAI
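
Because it mirrors the OpenAI REST API, most OpenAI client libraries work with LocalAI by simply changing the base URL. A minimal sketch with the openai Python package, assuming LocalAI is running on its default port 8080 and the model name matches one you configured:

from openai import OpenAI

# LocalAI listens on port 8080 by default; any API key is accepted
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="gpt-4",  # placeholder; use a model name defined in your LocalAI setup
    messages=[{"role": "user", "content": "Hello from LocalAI!"}],
)
print(response.choices[0].message.content)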

15/ EXUI

This is a simple, lightweight browser-based UI for running local inference using ExLlamaV2. Link: https://github.com/turboderp/exui

16/ vLLM

vLLM is a fast and easy-to-use library for LLM inference and serving. Installation is very easy:

pip install vllm
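
Once installed, a minimal offline-inference sketch (the model name is just an example; any supported Hugging Face causal LM will do):

from vllm import LLM, SamplingParams

# Download and load a small model, then generate with batched sampling
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["The capital of France is"], params)
print(outputs[0].outputs[0].text)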

Link: https://github.com/vllm-project/vllm

17/ MLX

MLX is an array framework for machine learning research on Apple silicon, brought to you by Apple machine learning research. You can also run local LLMs with it.
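
A minimal sketch using the separate mlx-lm package built on MLX, assuming a Mac with Apple silicon (the model name is an example from the mlx-community hub):

from mlx_lm import load, generate

# Load a quantized model converted for MLX
model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.2-4bit")
print(generate(model, tokenizer, prompt="What is MLX?", max_tokens=64))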

Link: https://github.com/ml-explore/mlx

18/ CTranslate2

CTranslate2 is a C++ and Python library for efficient inference with Transformer models. The following model types are currently supported:

  • Encoder-decoder models: Transformer base/big, M2M-100, NLLB, BART, mBART, Pegasus, T5, Whisper
  • Decoder-only models: GPT-2, GPT-J, GPT-NeoX, OPT, BLOOM, MPT, Llama, Mistral, Gemma, CodeGen, GPTBigCode, Falcon
  • Encoder-only models: BERT, DistilBERT, XLM-RoBERTa

Link: https://github.com/OpenNMT/CTranslate2
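
CTranslate2 runs models that have first been converted to its own format. A minimal generation sketch for GPT-2, assuming you have converted the model with the ct2-transformers-converter tool that ships with the library:

# First, convert the model (shell): ct2-transformers-converter --model gpt2 --output_dir gpt2_ct2
import ctranslate2
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
generator = ctranslate2.Generator("gpt2_ct2")

# CTranslate2 operates on token strings rather than ids
start_tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode("Hello, I am"))
results = generator.generate_batch([start_tokens], max_length=30)
print(tokenizer.decode(tokenizer.convert_tokens_to_ids(results[0].sequences[0])))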

19/ Pinokio

A platform for installing and running many AI applications locally, not only for hosting LLMs. Link: https://pinokio.computer/

20/ PowerInfer

PowerInfer is a CPU/GPU LLM inference engine that leverages activation locality on your device. Link: https://github.com/SJTU-IPADS/PowerInfer

21/ MLC-LLM

MLC LLM is a universal solution that allows any language model to be deployed natively on a diverse set of hardware backends and native applications. Link: https://llm.mlc.ai/

22/ txtai

txtai is an all-in-one embeddings database for semantic search, LLM orchestration, and language model workflows, and it can run and use any local LLM. Link: https://github.com/neuml/txtai
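
A minimal sketch using txtai's LLM pipeline, assuming the txtai package is installed (the model name is just an example):

from txtai.pipeline import LLM

# Load a local Hugging Face model behind a simple prompt-in, text-out interface
llm = LLM("google/flan-t5-base")
print(llm("Answer the following question. What is the capital of France?"))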

23/ RayLLM

RayLLM (formerly known as Aviary) is an LLM serving solution that makes it easy to deploy and manage a variety of open source LLMs, built on Ray Serve. Link: https://github.com/ray-project/ray-llm

24/ LlamaSharp

The C#/.NET binding of llama.cpp. It provides higher-level APIs to run inference with LLaMA models and deploy them on local devices with C#/.NET. It works on Windows, Linux, and Mac without the need to compile llama.cpp yourself. Even without a GPU, or without enough GPU memory, you can still use LLaMA models! Link: https://github.com/SciSharp/LLamaSharp

25/ LMQL

You can install LMQL locally or use the web-based Playground IDE. To use self-hosted models via Transformers or llama.cpp, you have to install LMQL locally. To install LMQL locally:

pip install lmql

# Run LMQL programs in the local playground IDE
lmql playground

More info: https://lmql.ai/

26/ AvaPLS

Ava PLS is an open-source desktop application for running language models locally on your computer. It allows you to perform various language tasks, like text generation, grammar correction, rephrasing, summarization, data extraction, and more. Link: https://avapls.com/

27/ LiteLLM

Call all LLM APIs using the OpenAI format [Bedrock, Huggingface, VertexAI, TogetherAI, Azure, OpenAI, etc.] Link: https://github.com/BerriAI/litellm
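
A minimal sketch showing the same completion call routed to two different backends, assuming an OpenAI API key in the environment and a running Ollama instance:

from litellm import completion

messages = [{"role": "user", "content": "Hello, how are you?"}]

# Hosted model via OpenAI (requires OPENAI_API_KEY)
print(completion(model="gpt-3.5-turbo", messages=messages))

# Local model via Ollama, using the exact same interface
print(completion(model="ollama/llama2", messages=messages, api_base="http://localhost:11434"))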

28/ HammerAI

Not really a tool for building LLM apps, but HammerAI provides a chat experience with an LLM on your own machine: you can chat with role-playing AI characters that run locally in your browser, 100% free and completely private. Link: https://www.hammerai.com/

29/ Bedrock/VertexAI

Amazon's (Bedrock) and Google's (Vertex AI) managed cloud solutions for hosting LLMs.

If you have anything else, please share it in the comments section, so we can test it and add it!

Please consider following me for more articles about business, data science, machine learning, and extended reality.

You can find my lists at the following links:

Statistics & Data Science: https://medium.com/@soulawalid/list/statistics-data-science-65305693779d

Business: https://medium.com/@soulawalid/list/business-1528f08575a7

Quantum Machine Learning: https://medium.com/@soulawalid/list/qml-be0b06f7a986

Extended Reality: https://medium.com/@soulawalid/list/extended-reality-bf03607b0b80

Neuromarketing: https://medium.com/@soulawalid/list/neuromarketing-8f94149e3c73

If you have any questions, you can ask me on LinkedIn, here is my profile: https://www.linkedin.com/in/oualid-soula/ Let's connect!