Of the tools on this list, I mainly use LM Studio, Ollama, Jan.ai, and Transformers. A special mention goes to Pinokio, a very good platform!
1/ LM Studio
LM Studio is a desktop application for running local LLMs on your computer. Link: https://lmstudio.ai/
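Beyond the GUI, LM Studio can also serve the loaded model through a local, OpenAI-compatible API (by default at http://localhost:1234/v1). Here is a minimal sketch of calling it from Python; it assumes the local server is running with a model loaded, the openai package is used purely as a client, and the model name is a placeholder since LM Studio serves whatever model you loaded:
from openai import OpenAI

# Point the OpenAI client at LM Studio's local server (default port 1234)
client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="local-model",  # placeholder: LM Studio uses the model you loaded
    messages=[{"role": "user", "content": "Explain tokenization in one sentence."}],
)
print(response.choices[0].message.content)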
2/ Ollama
Ollama is a tool that allows you to run open-source large language models (LLMs) locally on your machine. It supports a variety of models, including Llama 2, Code Llama, and others. It bundles model weights, configuration, and data into a single package, defined by a Modelfile. Link: https://ollama.com/
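As a quick illustration of the Modelfile idea, here is a minimal sketch; the name my-assistant and the parameter values are illustrative, not from Ollama's docs:
# Modelfile: build a custom model on top of a base model
FROM llama2
PARAMETER temperature 0.8
SYSTEM """
You are a concise assistant that answers in one short paragraph.
"""
Then create and chat with it:
ollama create my-assistant -f Modelfile
ollama run my-assistant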
3/ Hugging Face and Transformers
Hugging Face is the Docker Hub equivalent for Machine Learning and AI, offering an overwhelming array of open-source models. Hugging Face also provides transformers, a Python library that streamlines running an LLM locally. Here's an example of how to run Microsoft's Phi-2:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Use the GPU by default for all tensors
torch.set_default_device("cuda")

model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)

# Ask the model to complete a Python function from its docstring
inputs = tokenizer('''def print_prime(n):
   """
   Print all primes between 1 and n
   """''', return_tensors="pt", return_attention_mask=False)

outputs = model.generate(**inputs, max_length=200)
text = tokenizer.batch_decode(outputs)[0]
print(text)

More info: https://huggingface.co/microsoft/phi-2
4/ LangChain
LangChain is a Python framework for building AI applications. It provides abstractions and middleware to develop your AI application on top of one of its supported models. For example, the following code asks one question to the Microsoft/DialoGPT-medium model:
from langchain.llms.huggingface_pipeline import HuggingFacePipeline
from langchain.prompts import PromptTemplate

# Wrap a local Hugging Face pipeline as a LangChain-compatible LLM
hf = HuggingFacePipeline.from_model_id(
    model_id="microsoft/DialoGPT-medium",
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 200, "pad_token_id": 50256},
)

template = """Question: {question}
Answer: Let's think step by step."""
prompt = PromptTemplate.from_template(template)

# Pipe the prompt into the model to build a simple chain
chain = prompt | hf

question = "What is electroencephalography?"
print(chain.invoke({"question": question}))

Link: https://www.langchain.com/
5/ Llama.cpp
Llama.cpp is a C/C++ inference engine for LLMs. Originally focused on running Meta's Llama models efficiently on Apple silicon, it now supports many model families and platforms. Link: https://github.com/ggerganov/llama.cpp
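The engine itself is driven from the command line, but if you'd rather stay in Python, the separate llama-cpp-python package wraps it. A minimal sketch, assuming you've run pip install llama-cpp-python and downloaded a GGUF model (the file path is a placeholder):
from llama_cpp import Llama

# Load a local GGUF model file (path is a placeholder)
llm = Llama(model_path="./models/llama-2-7b-chat.Q4_K_M.gguf", n_ctx=2048)

# Generate a completion, stopping before the next "Q:" turn
output = llm("Q: Name the planets in the solar system. A:", max_tokens=128, stop=["Q:"])
print(output["choices"][0]["text"])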
6/ Llamafile
Developed by Mozilla, Llamafile offers a user-friendly alternative for running LLMs. It is known for its portability and its ability to package models as single-file executables. Link: https://github.com/Mozilla-Ocho/llamafile
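For example, after downloading one of the published llamafiles (the filename below comes from the project's README and may change between releases), running a model is just:
chmod +x llava-v1.5-7b-q4.llamafile
./llava-v1.5-7b-q4.llamafile
On Windows, you rename the file to end in .exe instead of marking it executable.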
7/ Jan.ai
Jan turns your computer into an AI machine by running LLMs locally. It's a privacy-focused, local-first, open-source solution. Link: https://jan.ai/
8/ llm
LLM by Simon Willison is one of the easier ways I've seen to download and use open-source LLMs locally on your own machine. While you do need Python installed to run it, you shouldn't need to touch any Python code. If you're on a Mac and use Homebrew, just install with:
brew install llm
Otherwise, install it with pip:
pip install llm
LLM defaults to using OpenAI models, but you can use plugins to run other models locally. For example, if you install the gpt4all plugin, you'll have access to additional local models from GPT4All. There are also plugins for Llama, the MLC project, and MPT-30B, as well as additional remote models.
Install a plugin on the command line with llm install model-name:
llm install llm-gpt4all
To send a query to a local LLM, use the syntax:
llm -m the-model-name "Your query"
9/ GPT4ALL
GPT4ALL is an easy-to-use desktop application with an intuitive GUI. It supports local model running and offers connectivity to OpenAI with an API key. It stands out for its ability to process local documents for context, ensuring privacy. Link: https://gpt4all.io/index.html
10/ H2OGPT
h2oGPT simplifies the process of creating a private LLM. It includes a large language model, an embedding model, a database for document embeddings, a command-line interface, and a graphical user interface.
You can test it here (enter anything for the username and password): https://gpt.h2o.ai/
Link: https://github.com/h2oai/h2ogpt
11/ LocalLLM
As the name suggests, you can run local LLMs with it: it's a Google Cloud Platform tool for running quantized models on CPU, no GPU required. Link: https://github.com/GoogleCloudPlatform/localllm
12/ Oobabooga
A Gradio web UI for Large Language Models. Its goal is to become the AUTOMATIC1111/stable-diffusion-webui of text generation. Link: https://github.com/oobabooga/text-generation-webui
13/ Koboldcpp
Koboldcpp is an easy-to-use AI text-generation tool for GGML and GGUF models, built on llama.cpp and shipping with its own browser UI. You can download the latest version from the following link: https://github.com/LostRuins/koboldcpp/releases
14/ LocalAI
LocalAI is a free, open-source alternative to OpenAI. It acts as a drop-in replacement REST API compatible with the OpenAI (and Elevenlabs, Anthropic…) API specifications for local AI inferencing. It allows you to run LLMs and generate images and audio (and more) locally or on-prem with consumer-grade hardware, supporting multiple model families. It does not require a GPU. Link: https://github.com/mudler/LocalAI
15/ EXUI
This is a simple, lightweight browser-based UI for running local inference using ExLlamaV2. Link: https://github.com/turboderp/exui
16/ vLLM
vLLM is a fast and easy-to-use library for LLM inference and serving. The installation is very easy:
pip install vllm
Link: https://github.com/vllm-project/vllm
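Here is a minimal offline-inference sketch using vLLM's LLM and SamplingParams classes; the model name is just a small example you can swap for any supported Hugging Face model:
from vllm import LLM, SamplingParams

# Load a small example model; any supported Hugging Face model works here
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["The capital of France is"], params)
print(outputs[0].outputs[0].text)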
17/ MLX
MLX is an array framework for machine learning research on Apple silicon, brought to you by Apple machine learning research. You can host local LLMs with it too.
Link: https://github.com/ml-explore/mlx
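For LLMs specifically, the companion mlx-lm package is the usual entry point. A minimal sketch, assuming Apple silicon, pip install mlx-lm, and a pre-converted model from the mlx-community hub (the model name is an example):
from mlx_lm import load, generate

# Load an MLX-converted model from the Hugging Face hub (example name)
model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.2-4bit")

text = generate(model, tokenizer, prompt="Why is the sky blue?", max_tokens=100)
print(text)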
18/ CTranslate2
CTranslate2 is a C++ and Python library for efficient inference with Transformer models. The following model types are currently supported:
- Encoder-decoder models: Transformer base/big, M2M-100, NLLB, BART, mBART, Pegasus, T5, Whisper
- Decoder-only models: GPT-2, GPT-J, GPT-NeoX, OPT, BLOOM, MPT, Llama, Mistral, Gemma, CodeGen, GPTBigCode, Falcon
- Encoder-only models: BERT, DistilBERT, XLM-RoBERTa
Link: https://github.com/OpenNMT/CTranslate2
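CTranslate2 runs models only after they've been converted to its own format. A minimal generation sketch, assuming you've first converted GPT-2 with the bundled converter (ct2-transformers-converter --model gpt2 --output_dir gpt2_ct2):
import ctranslate2
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
generator = ctranslate2.Generator("gpt2_ct2", device="cpu")

# CTranslate2 generates from string tokens rather than raw text
start_tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode("The meaning of life is"))
results = generator.generate_batch([start_tokens], max_length=50)
print(tokenizer.decode(results[0].sequences_ids[0]))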
19/ Pinokio
A platform that offers one-click installs for many AI tools, not just LLM hosting. Link: https://pinokio.computer/
20/ PowerInfer
PowerInfer is a CPU/GPU LLM inference engine leveraging activation locality for your device. Link: https://github.com/SJTU-IPADS/PowerInfer
21/ MLC-LLM
MLC LLM is a universal solution that allows any language model to be deployed natively on a diverse set of hardware backends and native applications. Link: https://llm.mlc.ai/
22/ TXTAI
Run and use any LLM. Link: https://github.com/neuml/txtai
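That one-liner undersells it a bit: txtai's LLM pipeline wraps Hugging Face models behind a single call. A minimal sketch, assuming pip install txtai; the model name is just an example:
from txtai.pipeline import LLM

# Load an example model; most Hugging Face text-generation models work
llm = LLM("google/flan-t5-base")
print(llm("Answer in one word: what is the capital of France?"))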
23/ RayLLM
RayLLM (formerly known as Aviary) is an LLM serving solution that makes it easy to deploy and manage a variety of open source LLMs, built on Ray Serve. Link: https://github.com/ray-project/ray-llm
24/ LlamaSharp
The C#/.NET binding of llama.cpp. It provides higher-level APIs to run inference with LLaMA models and deploy them on a local device with C#/.NET. It works on Windows, Linux, and Mac without the need to compile llama.cpp yourself. Even without a GPU, or without enough GPU memory, you can still use LLaMA models! Link: https://github.com/SciSharp/LLamaSharp
25/ LMQL
You can install LMQL locally or use the web-based Playground IDE. To use self-hosted models via Transformers or llama.cpp, you have to install LMQL locally. To install LMQL locally:
pip install lmql
# Running LMQL programs
lmql playground
More info:
- Docs: https://lmql.ai/docs/
- Playground: https://lmql.ai/playground/
26/ AvaPLS
Ava PLS is an open-source desktop application for running language models locally on your computer. It allows you to perform various language tasks, like text generation, grammar correction, rephrasing, summarization, data extraction, and more. Link: https://avapls.com/
27/ LiteLLM
Call all LLM APIs using the OpenAI format [Bedrock, Huggingface, VertexAI, TogetherAI, Azure, OpenAI, etc.] Link: https://github.com/BerriAI/litellm
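A minimal sketch of the unified completion() call, here pointed at a local Ollama server; it assumes Ollama is running on its default port 11434 with llama2 pulled, and swapping the model string retargets another provider:
from litellm import completion

# Same OpenAI-style call, routed to a local Ollama server
response = completion(
    model="ollama/llama2",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    api_base="http://localhost:11434",
)
print(response.choices[0].message.content)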
28/ HammerAI
Not really a tool for building LLM-powered apps, but HammerAI provides a chat experience using an LLM on your own machine: you can chat with role-playing AI characters that run locally in your browser, 100% free and completely private. Link: https://www.hammerai.com/
29/ Bedrock/VertexAI
Amazon Bedrock and Google Vertex AI are Amazon's and Google's managed cloud services for hosting LLMs.
If you have anything else, please share it in the comments section, so we can test it and add it!
For more articles about business, data science, machine learning, and extended reality, please consider following my lists:
Statistics & Data Science: https://medium.com/@soulawalid/list/statistics-data-science-65305693779d
Business: https://medium.com/@soulawalid/list/business-1528f08575a7
Quantum Machine Learning: https://medium.com/@soulawalid/list/qml-be0b06f7a986
Extended Reality: https://medium.com/@soulawalid/list/extended-reality-bf03607b0b80
Neuromarketing: https://medium.com/@soulawalid/list/neuromarketing-8f94149e3c73
If you have any questions, you can ask me on LinkedIn, here is my profile: https://www.linkedin.com/in/oualid-soula/ Let's connect!