Stop paying for ChatGPT with these two tools | LMStudio x AnythingLLM

Tim Carambat
22 Feb 202411:12

Summary

TLDRIn this tutorial, Timothy Carat introduces viewers to a simple method for running a powerful, locally-hosted AI chat application using LM Studio and Anything LLM. He demonstrates how to install both tools, explore popular models on LM Studio, and set up a local server for Anything LLM. By integrating these platforms, users can leverage the latest open-source models on Hugging Face for a comprehensive, private AI chat experience without monthly fees.

Takeaways

  • 🚀 Timothy Carat introduces LM Studio and Anything LLM as tools for running a capable, locally-hosted conversational AI on your personal computer.
  • 💻 Both LM Studio and Anything LLM are single-click, installable applications that are compatible with various operating systems, including Windows.
  • 🌐 LM Studio supports multiple operating systems and is particularly beneficial when used with a GPU for enhanced performance.
  • 🔗 Anything LLM is an all-in-one chat application that is fully private, open-source, and can connect to a wide range of services.
  • 🆓 The integration of LM Studio and Anything LLM provides a comprehensive, cost-free LLM experience.
  • 📱 LM Studio comes with a built-in chat client for experimenting with models, but for more capabilities, Anything LLM is recommended.
  • 🔍 Users can download various models from the Hugging Face repository through LM Studio, with options for different sizes and compatibility with the user's system.
  • 🚦 LM Studio allows for server configuration to run completions against the chosen model, including setting up a local server for model interaction.
  • 🗂️ Anything LLM can be augmented with private documents or web scraping to provide the model with additional context for more accurate responses.
  • 🔄 The combination of LM Studio and Anything LLM creates a fully private, end-to-end system for chatting and document interaction without reliance on subscription services like OpenAI.
  • 📈 The choice of model in LM Studio and Anything LLM significantly impacts the user experience, with more capable and niche models available for specific tasks.

Q & A

  • Who is the speaker in the transcript and what is his role?

    -The speaker is Timothy Carat, the founder of Implex Labs and creator of Anything LLM.

  • What are the two tools mentioned in the transcript for running a local LLM application?

    -The two tools mentioned are LM Studio and Anything LLM Desktop.

  • What is the significance of having a GPU for running these applications?

    -Having a GPU enhances the experience by allowing for faster token processing and the ability to use more powerful models through full GPU offloading.

  • Is Anything LLM open-source? What are the implications of this?

    -Yes, Anything LLM is open-source, which means that users with programming skills can contribute to its development and add their own integrations.

  • How does LM Studio help in downloading and exploring different models?

    -LM Studio provides a user-friendly interface for downloading models from the Hugging Face repository, exploring popular models, and checking their compatibility with the user's system.

  • What is the process for setting up Anything LLM Desktop?

    -After installing Anything LLM Desktop, users need to start the application, which typically lands them on a screen where they can begin interacting with the AI.

  • How does LM Studio integrate with Anything LLM?

    -LM Studio can be set up to run a local server that interacts with the Anything LLM model, allowing users to chat with the model and leverage its capabilities privately on their own machine.

  • What is the importance of embedding information in Anything LLM?

    -Embedding information enhances the model's understanding and provides it with context, leading to more accurate and relevant responses when interacting with users.

  • How does the speaker demonstrate the capabilities of the integrated system?

    -The speaker demonstrates the system by scraping a website, embedding the information into Anything LLM, and then asking questions to show how the model can provide more accurate responses with the added context.

  • What are the cost implications of using LM Studio and Anything LLM Desktop?

    -Using LM Studio and Anything LLM Desktop does not require a monthly subscription fee to OpenAI, making it a cost-effective solution for running local LLM applications.

  • What is the speaker's final recommendation for users interested in local LLM applications?

    -The speaker recommends integrating LM Studio and Anything LLM Desktop as a core part of their local LLM stack for a fully private, end-to-end chatting and document handling system without the need for external subscription services.

Outlines

00:00

🚀 Introduction to Locally Running LLMs with Timothy Carat

Timothy Carat, founder of Implex Labs, introduces viewers to a simple method for running powerful AI language models (LLMs) locally on their computers, utilizing GPU or CPU. He mentions two single-click installable applications: LM Studio and Anything LLM Desktop. The tutorial focuses on setting up Windows, but the process is similar for other operating systems. Timothy emphasizes the privacy and open-source nature of Anything LLM, highlighting its capabilities and potential for integration through contributions.

05:02

📱 Exploring LM Studio and Downloading Models

The tutorial delves into the functionalities of LM Studio, including exploring popular models like Google's Gemma and downloading them for use. It explains the process of selecting compatible models based on system specifications and downloading them, which could be time-consuming. Timothy demonstrates how to use the built-in chat client in LM Studio to interact with models and the importance of selecting the right model for optimal performance.

10:03

🤖 Integrating Anything LLM with LM Studio for Enhanced Capabilities

Timothy shows how to integrate Anything LLM with LM Studio to enhance the capabilities of the local LLM setup. He guides through the process of setting up Anything LLM, connecting it to the LM Studio inference server, and using it to chat with the model. The tutorial highlights the ability to augment the model's knowledge with private documents or web scraping, resulting in more accurate and contextually relevant responses. The integration allows for a fully private, end-to-end system for chatting and document interaction without the need for external subscription fees.

🌐 Conclusion: Empowering Local LLM Usage with LM Studio and Anything LLM

The conclusion emphasizes the ease and potential of using LM Studio and Anything LLM together for local LLM applications. It encourages viewers to explore the integration and take advantage of the open-source models available on Hugging Face. Timothy invites feedback and assures that the right model choice can significantly enhance the user experience, suggesting popular and niche models for various applications.

Mindmap

Keywords

💡Implex Labs

Implex Labs is the company founded by Timothy Carat, the speaker in the transcript. It is the developer of Anything LLM and is focused on providing AI solutions. In the context of the video, Implex Labs is the entity behind the creation of the software that allows for local running of AI models, emphasizing privacy and integration capabilities.

💡Anything LLM

Anything LLM is an all-in-one chat application that is fully private and can connect to various platforms. It is open-source, allowing users with programming skills to add integrations or contribute to its development. The application is designed to provide a comprehensive AI experience without any costs to the user.

💡LM Studio

LM Studio is a single-click installable application that supports different operating systems and is used to manage and interact with AI models. It allows users to download models from the Hugging Face repository, check compatibility with the user's system, and set up a local server for running AI completions.

💡Hugging Face Repository

The Hugging Face Repository is a platform where various AI models are published and made available for download. It is a resource for developers and users to access a wide range of models, including those used in the transcript for LM Studio and Anything LLM.

💡GPU Offloading

GPU Offloading refers to the process of using the GPU to handle computational tasks, which in the context of AI models, can significantly speed up the processing of tokens and improve the overall performance of the AI application.

💡Quantization

Quantization is the process of reducing the precision of a model's parameters to save space and computational resources. In AI models, this is often done to create smaller, faster, and more efficient models without significantly compromising on performance. The transcript mentions Q4 and Q5 models, which are quantized versions of AI models.

💡Tokenization

Tokenization in the context of AI models refers to the process of breaking down text into individual units or tokens that the model can process. It is a crucial step in natural language processing and is directly related to the speed and efficiency of AI interactions.

💡Context Window

The context window refers to the amount of text or information that an AI model can consider at one time. A larger context window allows the model to understand and generate responses based on more extensive input, which can lead to more coherent and relevant outputs.

💡Embedding

Embedding in AI refers to the process of converting text or data into numerical representations that can be understood and processed by machine learning models. In the context of the video, embedding is used to enhance the AI's understanding of information, such as web pages, to improve its responses.

💡Local AI

Local AI refers to AI models and applications that run on a user's personal devices, such as laptops or desktops, rather than relying on cloud-based services. This approach emphasizes privacy, control, and the ability to use AI tools without continuous internet connection or additional costs.

💡Open Source

Open source refers to software or tools whose source code is made available to the public, allowing for free use, modification, and distribution. In the context of the video, open source AI models like Anything LLM enable users to customize and integrate the AI into their systems without restrictions or costs.

Highlights

Timothy Carat introduces himself as the founder of Implex Labs and creator of Anything LLM.

The presentation aims to show the easiest way to run a capable, locally-executing, fully private AI chat application like Anything LLM on a personal computer.

Two single-click installable applications are used for this process: LM Studio and Anything LLM Desktop.

LM Studio supports multiple operating systems, with the demonstration focusing on the Windows version due to GPU availability.

Anything LLM is an all-in-one chat application that is fully private and can connect to various platforms, offering a lot of features for free.

Anything LLM is fully open-source, allowing for community contributions and integrations.

The tutorial begins with installing LM Studio and Anything LLM on a Windows machine.

LM Studio's interface includes an exploring page that showcases popular models like Google's Gemma.

Downloading models from the Hugging Face repository can take a significant amount of time, depending on the model size and internet speed.

LM Studio provides options for GPU offloading, allowing for faster token processing with compatible graphics cards.

LM Studio includes a simple chat client for experimenting with models, but it has limited functionality.

Anything LLM is downloaded and set up to work with LM Studio, providing a more powerful and feature-rich chat experience.

LM Studio's local server tab is used to configure and run a server for model completions, which is essential for multi-model support.

Anything LLM's workspace is created, and the model's context window and token limit are set up for optimal interaction.

LM Studio allows for scraping websites to provide additional context to the AI model, enhancing its understanding and responses.

The integration of LM Studio and Anything LLM creates a fully private, end-to-end system for chatting and document interaction without the need for external subscription services.

The choice of model used in the setup will significantly impact the user experience, with more capable and niche models available for specific tasks.

Transcripts

00:00

hey there my name is Timothy carat

00:01

founder of implex labs and creator of

00:03

anything llm and today I actually want

00:05

to show you possibly the easiest way to

00:08

get a very extremely capable locally

00:12

running fully rag like talk to anything

00:16

with any llm application running on

00:19

honestly your laptop a desktop if you

00:22

have something with the GPU this will be

00:24

a way better experience if all you have

00:26

is a CPU this is still possible and

00:28

we're going to use two tools

00:30

both of which are a single-click

00:31

installable application and one of them

00:34

is LM studio and the other is of course

00:37

anything LM desktop right now I'm on LM

00:40

studio. a they have three different

00:42

operating systems they support we're

00:44

going to use the windows one today

00:46

because that's the machine that I have a

00:49

GPU for and I'll show you how to set it

00:50

up how the chat normally works and then

00:52

how to connect it to anything LM to

00:54

really unlock a lot of its capabilities

00:57

if you aren't familiar with anything llm

00:59

anything llm is is an all-in-one chat

01:01

with anything desktop application it's

01:03

fully private it can connect to pretty

01:06

much anything and you get a whole lot

01:08

for actually free anything in LM is also

01:11

fully open source so if you are capable

01:13

of programming or have an integration

01:14

you want to add you can actually do it

01:16

here and we're happy to accept

01:18

contributions so what we're going to do

01:20

now is we're going to switch over to my

01:22

Windows machine and I'm going to show

01:24

you how to use LM studio with anything

01:27

LM and walking through both of the

01:29

products so that you can really get

01:31

honestly like the most comprehensive llm

01:34

experience and pay nothing for it okay

01:36

so here we are on my Windows desktop and

01:39

of course the first thing we're going to

01:40

want to do is Click LM Studio for

01:43

Windows this is version

01:46

0.216 whatever version you might be on

01:48

things may change a little bit but in

01:50

general this tutorial should be accurate

01:52

you're going to want to go to use

01:54

anything.com go to download anything LM

01:56

for desktop and select your appropriate

01:58

operating system once you have these two

02:00

programs installed you are actually 50%

02:04

done with the entire process that's how

02:06

quick this was let me get LM Studio

02:08

installed and running and we'll show you

02:10

what that looks like so you've probably

02:11

installed LM Studio by now you click the

02:13

icon on your desktop and you usually get

02:15

dropped on this screen I don't work for

02:17

LM studio so I'm just going to show you

02:19

kind of some of the capabilities that

02:20

are relevant to this integration and

02:22

really unlocking any llm you use they

02:25

kind of land you on this exploring page

02:27

and this exploring page is great it

02:28

shows you basically some of the more

02:30

popular models that exist uh like

02:32

Google's Gemma just dropped and it's

02:34

already live that's really awesome if

02:36

you go down here into if you click on

02:38

the bottom you'll see I've actually

02:40

already downloaded some models cuz this

02:42

takes time downloading the models will

02:45

probably take you the longest time out

02:46

of this entire operation I went ahead

02:48

and downloaded the mistal 7B instruct

02:51

the Q4 means 4bit quantized model now

02:55

I'm using a Q4 model honestly Q4 is kind

02:59

of the lowest end you should really go

03:00

for Q5 is really really great Q8 if you

03:04

want to um if you actually go and look

03:07

up any model on LM Studio like for

03:10

example let's look up mistol as you can

03:12

see there's a whole bunch of models here

03:14

for mistol there's a whole bunch of

03:15

different types these are all coming

03:17

from the hugging face repository and

03:20

there's a whole bunch of different types

03:21

that you can find here published by

03:23

bunch of different people you can see

03:25

that you know how many times this one

03:27

has been downloaded this is a very

03:29

popular model and once you click on it

03:31

you'll likely get some options now LM

03:33

studio will tell you if the model is

03:35

compatible with your GPU or your system

03:39

this is pretty accurate I've found that

03:41

sometimes it doesn't quite work um one

03:43

thing you'll be interested in is full

03:44

GPU offloading exactly what it sounds

03:47

like using the GPU as much as you can

03:49

you'll get way faster tokens something

03:52

honestly on the speed level of a chat

03:54

GPT if you're working with a small

03:56

enough model or have a big enough

03:58

graphics card I have 12 gigs of vram

04:01

available and you can see there's all

04:02

these Q4 models again you probably want

04:05

to stick with the Q5 models at least uh

04:08

for the best experience versus size as

04:12

you can see the Q8 is quite Hefty 7.7

04:15

gigs which even if you have fast

04:17

internet won't matter because it takes

04:19

forever to download something from

04:21

hugging face if you want to get working

04:23

on this in the day you might want to

04:24

start to download now for the sake of

04:26

this video I've already downloaded a

04:28

model so now that we have a model

04:30

downloaded we're going to want to try to

04:32

chat with it LM Studio actually comes

04:34

with a chat client inside of it it's

04:37

very very simplistic though and it's

04:39

really just for experimenting with

04:41

models we're going to want to go to this

04:43

chat bubble icon and you can see that we

04:45

have a thread already started and I'm

04:47

going to want to pick the one model that

04:49

I have available and you'll see this

04:51

loading bar continue There are some

04:53

system prompts that you can preset for

04:56

the model I have GPU offloading enabled

04:59

and I've set it to Max already and as

05:02

you can see I have Nvidia Cuda already

05:04

going there are some tools there are

05:06

some other things that you can mess with

05:08

but in general that's really all you

05:10

need to do so let's test the chat and

05:12

let's just say hello how are you and you

05:15

get the pretty standard response from

05:17

any AI model and you even get some

05:19

really cool metrics down here like time

05:21

to First token was 1.21 seconds I mean

05:24

really really kind of cool showing the

05:26

GPU layers that are there however you

05:29

really can't get much out of this right

05:32

here if you wanted to add a document

05:34

you'd have to copy paste it into the

05:36

entire user prompt there's really just a

05:38

lot more that can be done here to

05:40

Leverage The Power of this local llm

05:42

that I have running even though it's a

05:45

quite small one so to really kind of

05:47

Express how powerful these models can be

05:50

for your own local use we're going to

05:52

use anything llm now I've already

05:54

downloaded anything llm let me show you

05:56

how to get that running and how to get

05:57

to LM Studio to work work with anything

06:00

llm just booted up anything llm after

06:03

installing it and you'll usually land on

06:05

a screen like this let's get started we

06:07

already know who we're looking for here

06:09

LM studio and you'll see it asks for two

06:11

pieces of information a token context

06:14

window which is a property of your model

06:16

that you'd already be familiar with and

06:18

then the LM Studio base URL if we open

06:21

up LM studio and go to this local server

06:24

tab on the side this is a really really

06:27

cool part of LM Studio this doesn't work

06:29

with multimodel support So once you have

06:32

a model selected that's the model that

06:34

you are going to be using so here we're

06:36

going to select the exact same model but

06:39

we're going to start a server to run

06:42

completions against this model so the

06:44

way that we do that is we can configure

06:46

the server Port usually it's 1 2 3 4 but

06:49

you can change it to whatever you want

06:51

you probably want to turn off cores

06:53

allow request queuing so you can keep

06:55

sending requests over and over and they

06:57

don't just fail you want to enable log

06:59

buing and prompt formatting these are

07:01

all just kind of debugging tools on the

07:03

right side you are going to still want

07:05

to make sure that you have GPU

07:06

offloading allowed if that is

07:08

appropriate but other than that you just

07:10

click Start server and you'll see that

07:12

we get some logs saved here now to

07:14

connect the LM Studio inference server

07:17

to anything llm you just want to copy

07:20

this string right here up to the V1 part

07:23

and then you're going to want to open

07:24

anything ilm paste that into here I know

07:28

that my models Max to token window is

07:31

496 I'll click next embedding preference

07:34

we don't really even need one we can

07:36

just use the anything LM built in EMB

07:38

better which is free and private same

07:40

for the vector database all of this is

07:42

going to be running on machines that I

07:45

own and then of course we can skip the

07:47

survey and let's make a our first

07:49

workspace and we'll just call it

07:51

anything llm we don't have any documents

07:54

or anything like that so if we were to

07:55

send a chat asking the model about

07:57

anything llm will'll either get get a

07:59

refusal response or it will just make

08:02

something up so let's ask what is

08:05

anything llm and if you go to LM Studio

08:07

during any part you can actually see

08:10

that we sent the requests to the model

08:12

and it is now streaming the response

08:15

first token has been generated

08:17

continuing to stream when anything llm

08:19

does receive that first token stream

08:22

this is when we will uh start to show it

08:25

on our side and you can see that we get

08:27

a response it just kind of pops up

08:28

instantly uh which was very quick but it

08:31

is totally wrong and it is wrong because

08:33

we actually don't have any context to

08:37

give the model on what anything llm

08:39

actually is now we can augment the lm's

08:43

ability to know about our private

08:45

documents by clicking and adding them

08:48

here or I can just go and scrape a

08:50

website so I'm going to go and scrape

08:51

the use.com homepage cuz that should

08:54

give us enough information and you'll

08:56

see that we've scraped the page so now

08:58

it's time to embed it and we'll just run

09:00

that embedding and now our llm should be

09:04

smarter so let's ask the same question

09:06

again but this time knowing that it has

09:09

information that could be

09:13

useful and now you can see that we've

09:15

again just been given a response that

09:17

says anything LM is an AI business

09:19

intelligence tool to form humanlike text

09:22

messages based on prompt it offers llm

09:24

support as well as a variety of

09:25

Enterprise models this is definitely

09:28

much more accur it but we also tell you

09:30

where this information came from and you

09:32

can see that it cited the use.com

09:35

website this is what the actual chunks

09:37

that were used uh to formulate this

09:40

response and so now actually we have a

09:42

very coherent machine we can embed and

09:45

modify create different threads we can

09:47

do a whole bunch of stuff from within

09:49

anything llm but the core piece of

09:51

infrastructure the llm itself we have

09:54

running on LM Studio on a machine that

09:57

we own so now we have a fully private

09:59

endtoend kind of system for chatting

10:02

with documents privately using the

10:04

latest and greatest models that are open

10:06

source and available on hugging face so

10:08

hopefully this tutorial for how to

10:09

integrate LM studio and anything llm

10:12

desktop was helpful for you and unlocks

10:15

probably a whole bunch of potential for

10:16

your local llm usage tools like LM

10:19

studio oama and local AI make running a

10:22

local llm no longer a very technical

10:25

task and you can see that with tools

10:27

that provide an interface like LM Studio

10:29

pair that with another more powerful

10:31

tool built for chatting exclusively like

10:34

anything llm on your desktop and now you

10:36

can have this entire experience and not

10:39

have to pay open AI 20 bucks a month and

10:41

again I do want to iterate that the

10:43

model that you use will determine

10:45

ultimately your experience with chatting

10:47

now there are more capable models there

10:49

are more Niche models for programming so

10:52

be careful and know about the model that

10:54

you're choosing or just choose some of

10:55

the ones that are more popular like

10:57

llama 2 or mistol and you'll honestly be

11:00

great hopefully LM Studio Plus anything

11:02

llm desktop just become a core part of

11:05

your local llm stack and we're happy to

11:07

be a part of it and hear your feedback

11:09

we'll put the links in the description

11:11

and have fun