Stop paying for ChatGPT with these two tools | LMStudio x AnythingLLM

Tim Carambat
22 Feb 2024 · 11:12

Summary

TLDR: In this video, Timothy Carambat, founder of Mintplex Labs, shows how easy it is to run a highly capable LLM application locally. Using two tools, LM Studio and AnythingLLM Desktop, users get the best experience on a laptop or desktop with a GPU (though CPU-only works too). The video walks through installing and setting up both programs on Windows, demonstrates downloading and using different models through LM Studio, and shows how to integrate it with AnythingLLM for a more complete LLM experience. It also highlights that AnythingLLM is open source and encourages users to contribute and build custom integrations.

Takeaways

  • 😀 Timothy Carambat is the founder of Mintplex Labs and the creator of AnythingLLM.
  • 🚀 AnythingLLM and LM Studio are two one-click-install applications for running highly capable conversational AI locally.
  • 🖥️ Windows is supported, and a GPU is recommended for the best experience, though CPU-only machines work too.
  • 🔧 AnythingLLM is an all-in-one, fully private desktop chat application that can connect to almost anything and is completely open source.
  • 📦 LM Studio makes it easy to download and manage different AI models through a simple interface, inviting exploration and experimentation.
  • 🔑 LM Studio's built-in chat client is handy for quickly testing models, but its features are fairly basic.
  • ⚙️ Enabling LM Studio's local server lets you integrate its models with AnythingLLM for more advanced chat features.
  • 📈 AnythingLLM lets users add and embed documents or web content to improve the AI's contextual understanding.
  • 🔍 AnythingLLM's embedding features enable more precise information retrieval and answer generation.
  • 💡 The tutorial shows how to set up and use a powerful chat AI locally without paying OpenAI.

Q & A

  • Who is the founder of Mintplex Labs?

    -Timothy Carambat is the founder of Mintplex Labs.

  • Which application did Timothy Carambat create?

    -Timothy Carambat created the AnythingLLM application.

  • Which operating systems does LM Studio support?

    -LM Studio supports three different operating systems; only Windows is demonstrated in the video.

  • What are the benefits of using LM Studio and AnythingLLM Desktop?

    -Together they make it easy to run a highly capable LLM application locally, completely free of charge.

  • What is AnythingLLM?

    -AnythingLLM is an all-in-one desktop chat application that is fully private, can connect to almost anything, and is completely open source.

  • How are models downloaded in LM Studio?

    -In LM Studio, users click a model and select the build appropriate for their system to download it. Downloads can take a while, depending on the model's size and the user's network speed.

  • What does LM Studio's GPU offloading feature do?

    -GPU offloading lets LM Studio use the GPU as much as possible, which speeds up token generation and gives faster response times.

  • How do you start using a model in LM Studio?

    -Users select a model and start a local server to run it. This typically involves configuring the server port, enabling debugging tools such as logging and prompt formatting, and making sure GPU offloading is allowed.

  • How do you connect LM Studio's inference server to AnythingLLM?

    -Copy LM Studio's local server URL and paste it into AnythingLLM's settings, then enter the model's maximum token window size (see the sketch below).
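    LM Studio's local server speaks an OpenAI-compatible API, so you can sanity-check it before pointing AnythingLLM at it. Here is a minimal sketch using Python's `requests` package, assuming the default port 1234 and a model already loaded; this is illustrative, not part of either app:

```python
import requests

# This is the same base URL you paste into AnythingLLM's LM Studio settings.
BASE_URL = "http://localhost:1234/v1"

resp = requests.post(
    f"{BASE_URL}/chat/completions",
    json={
        # LM Studio serves whichever model is currently loaded,
        # so no model name is required here.
        "messages": [{"role": "user", "content": "Hello, how are you?"}],
        "temperature": 0.7,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

    If this prints a reply, the same base URL will work in AnythingLLM.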

  • How do you improve the model's understanding in AnythingLLM?

    -Users can add documents or scrape website content. That material is embedded and retrieved as context, letting the model give more accurate and useful answers (a toy version of this loop is sketched below).
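    Under the hood, "embedding" means converting document chunks and the user's question into vectors, then retrieving the closest chunks as context. A toy illustration of that loop (not AnythingLLM's actual code) using the `sentence-transformers` library and an in-memory list as the "vector database":

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# A small local embedding model; AnythingLLM's built-in embedder is
# likewise local, so nothing leaves your machine.
model = SentenceTransformer("all-MiniLM-L6-v2")

chunks = [
    "AnythingLLM is an all-in-one desktop chat application.",
    "LM Studio serves local models over an OpenAI-compatible API.",
]
chunk_vecs = model.encode(chunks, normalize_embeddings=True)

query_vec = model.encode(["What is AnythingLLM?"], normalize_embeddings=True)[0]

# With normalized vectors, cosine similarity is just a dot product.
scores = chunk_vecs @ query_vec
best_chunk = chunks[int(np.argmax(scores))]
print("context handed to the LLM:", best_chunk)
```

    The best-scoring chunks are prepended to the prompt, which is why the model can suddenly answer questions about your documents.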

  • What is the point of using LM Studio and AnythingLLM together?

    -To create a fully private, end-to-end system for chatting and working with documents on a local machine, using the latest open-source models, without paying any subscription fees.

Outlines

00:00

🚀 Introduction

Timothy Carambat introduces himself, his company Mintplex Labs, and its product AnythingLLM. He presents a simple way to run a powerful, ChatGPT-like application on a local device, stressing that a GPU gives a much better experience while CPU-only machines still work. He introduces two one-click-install applications, LM Studio and AnythingLLM Desktop, and highlights AnythingLLM's comprehensiveness, privacy, and open-source nature.

05:02

📱 Using LM Studio with AnythingLLM Desktop

Timothy shows how to use LM Studio and AnythingLLM on a Windows machine. He walks through downloading and installing both programs and explains LM Studio's features, including how to download and use different models. He notes the importance of GPU acceleration and shows how to set up and run a model in LM Studio. He then covers integrating LM Studio with AnythingLLM and enhancing the model's understanding by adding context and documents.

10:03

🌐 Integration and wrap-up

In the final segment, Timothy summarizes how LM Studio and AnythingLLM Desktop fit together and how the combination gives users a powerful local LLM toolkit. He highlights the advantages of open-source models and local AI tools, including avoiding a monthly OpenAI fee. He notes that choosing the right model matters for the chat experience and encourages users to try popular models such as Llama 2 or Mistral. Finally, he invites feedback and points to the links in the video description.


Keywords

💡Mintplex Labs

Mintplex Labs is the company mentioned in the video, founded by Timothy Carambat. It is relevant as the organization behind AnythingLLM; Timothy notes in the video that LM Studio is a separate product he does not work on.

💡AnythingLLM

AnythingLLM is an all-in-one desktop application that lets users interact with a variety of language models. It is open source, meaning users are free to modify and extend it. In the video, AnythingLLM is used to demonstrate chatting with a locally running LLM.

💡LM Studio

LM Studio is a tool, available for multiple operating systems, that lets users download and run different language models. It includes a built-in chat client for testing and experimenting with models.

💡GPU

A GPU (graphics processing unit) is hardware specialized for graphics and heavy parallel computation. In the video, the GPU is the key component for speeding up language-model inference, particularly when running Q4 or Q5 quantized models.

💡Model downloads

In the video, downloading a model means pulling a language model from LM Studio's catalog (sourced from the Hugging Face repository) onto the local machine. It is a key step in setting up a local LLM environment, and a model's size directly affects download time.

💡Privacy

Privacy is a central theme of the video: AnythingLLM and LM Studio let users process data entirely on their own computer instead of sending it to the cloud, keeping user data private.

💡Open source

Open source means a program's source code is publicly available to inspect and modify. In the video, AnythingLLM is fully open source, so users can program against it and build their own integrations.

💡Running locally

Running locally means running an application or service on the user's own computer rather than on a remote server. The video shows LM Studio and AnythingLLM running locally on a Windows desktop.

💡Chat client

The chat client is a built-in LM Studio feature that lets users interact with and test downloaded language models. It provides a simple interface for sending messages and receiving the model's responses.

💡Model compatibility

Model compatibility refers to whether a given language model can run properly on the user's hardware and software. In the video, LM Studio indicates whether a selected model is compatible with the user's GPU or system.

💡GPU offloading

GPU offloading is a technique that moves compute-intensive work from the CPU to the GPU for better performance. In the video, enabling GPU offloading lets the model make full use of the GPU's compute, speeding up responses. A rough illustration of the same mechanism follows.
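LM Studio runs GGUF models via llama.cpp, where offloading is expressed as a number of transformer layers kept on the GPU. A minimal sketch of the same knob through the `llama-cpp-python` bindings (the model path here is hypothetical):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./mistral-7b-instruct.Q5_K_M.gguf",  # hypothetical local file
    n_gpu_layers=-1,  # -1 offloads every layer to the GPU
    n_ctx=4096,       # context window, matching what you tell AnythingLLM
)

out = llm("Q: What does GPU offloading do? A:", max_tokens=64)
print(out["choices"][0]["text"])
```

LM Studio's GPU offload slider adjusts this same layer count for you.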

Highlights

Introduces Mintplex Labs founder Timothy Carambat and his product AnythingLLM.

Shows the easiest way to run a powerful, ChatGPT-like chat application locally.

Recommends two tools, LM Studio and AnythingLLM Desktop, both of which install with one click.

LM Studio supports three different operating systems; this demo uses Windows.

AnythingLLM is a fully private chat application that can connect to almost anything and is completely open source.

Demonstrates using LM Studio with AnythingLLM on a Windows machine.

Installing and running LM Studio is quick and simple.

LM Studio includes a simple built-in chat client for testing models.

Explains how to download and choose models in LM Studio and how to check model/GPU compatibility.

Shows LM Studio's chat features and performance metrics.

Explains how to configure LM Studio's local server to run a model.

Shows how to connect LM Studio's inference server to AnythingLLM.

Demonstrates enhancing the LLM's understanding and responses by adding documents and web content.

Emphasizes the privacy of open-source models with LM Studio and AnythingLLM Desktop, with no subscription fees.

Encourages users to adopt LM Studio and AnythingLLM Desktop as the core of their local LLM stack.

Provides links for downloading and trying LM Studio and AnythingLLM.

Transcripts

00:00

Hey there, my name is Timothy Carambat, founder of Mintplex Labs and creator of AnythingLLM. Today I want to show you possibly the easiest way to get an extremely capable, locally running, fully RAG, talk-to-anything-with-any-LLM application running on, honestly, your laptop or a desktop. If you have something with a GPU, this will be a way better experience; if all you have is a CPU, this is still possible. We're going to use two tools, both of which are single-click installable applications: one of them is LM Studio, and the other is, of course, AnythingLLM Desktop.

00:37

Right now I'm on lmstudio.ai. They support three different operating systems; we're going to use the Windows one today, because that's the machine I have a GPU for. I'll show you how to set it up, how the chat normally works, and then how to connect it to AnythingLLM to really unlock a lot of its capabilities.

00:57

If you aren't familiar with AnythingLLM: AnythingLLM is an all-in-one chat-with-anything desktop application. It's fully private, it can connect to pretty much anything, and you get a whole lot for free. AnythingLLM is also fully open source, so if you are capable of programming, or have an integration you want to add, you can actually do it here, and we're happy to accept contributions.

01:18

So what we're going to do now is switch over to my Windows machine, and I'm going to show you how to use LM Studio with AnythingLLM, walking through both of the products, so that you can get honestly the most comprehensive LLM experience and pay nothing for it.

01:36

Okay, so here we are on my Windows desktop, and of course the first thing we're going to want to do is click "LM Studio for Windows". This is version 0.2.16; whatever version you might be on, things may change a little bit, but in general this tutorial should be accurate. You're going to want to go to useanything.com, go to "Download AnythingLLM for desktop", and select your appropriate operating system. Once you have these two programs installed, you are actually 50% done with the entire process; that's how quick this was. Let me get LM Studio installed and running, and we'll show you what that looks like.

02:10

So, you've probably installed LM Studio by now. You click the icon on your desktop, and you usually get dropped on this screen. I don't work for LM Studio, so I'm just going to show you some of the capabilities that are relevant to this integration and to really unlocking any LLM you use. They land you on this exploring page, and this exploring page is great: it shows you some of the more popular models that exist. Google's Gemma just dropped, for example, and it's already live, which is really awesome.

02:36

If you click on the bottom, you'll see I've actually already downloaded some models, because this takes time; downloading the models will probably take you the longest out of this entire operation. I went ahead and downloaded the Mistral 7B Instruct; the "Q4" means a 4-bit quantized model. Now, I'm using a Q4 model, but honestly Q4 is kind of the lowest end you should really go for; Q5 is really, really great, and Q8 if you want.

03:04

If you actually go and look up any model on LM Studio, for example let's look up Mistral, you can see there are a whole bunch of models here for Mistral, a whole bunch of different types. These are all coming from the Hugging Face repository, and there are a whole bunch of different variants you can find here, published by a bunch of different people. You can see how many times this one has been downloaded; this is a very popular model. And once you click on it, you'll likely get some options.

03:33

Now, LM Studio will tell you if the model is compatible with your GPU or your system. This is pretty accurate, though I've found that sometimes it doesn't quite work. One thing you'll be interested in is full GPU offloading, which is exactly what it sounds like: using the GPU as much as you can. You'll get way faster tokens, honestly something on the speed level of ChatGPT, if you're working with a small enough model or have a big enough graphics card. I have 12 gigs of VRAM available, and you can see there are all these Q4 models; again, you probably want to stick with at least the Q5 models for the best experience versus size. As you can see, the Q8 is quite hefty at 7.7 gigs, and even fast internet won't matter, because it takes forever to download something from Hugging Face. If you want to get working on this today, you might want to start the download now. For the sake of this video, I've already downloaded a model.

04:28

So, now that we have a model downloaded, we're going to want to try to chat with it. LM Studio actually comes with a chat client inside of it; it's very, very simplistic, though, and it's really just for experimenting with models. We're going to go to this chat bubble icon, and you can see that we have a thread already started. I'm going to pick the one model that I have available, and you'll see this loading bar continue. There are some system prompts that you can preset for the model; I have GPU offloading enabled and set to max already, and as you can see, I have NVIDIA CUDA already going. There are some tools and some other things that you can mess with, but in general that's really all you need to do. So let's test the chat and just say "hello, how are you", and you get the pretty standard response from any AI model. You even get some really cool metrics down here, like time to first token: 1.21 seconds. Really kind of cool, and it shows the GPU layers in use.

05:29

However, you really can't get much out of this right here. If you wanted to add a document, you'd have to copy-paste it into the entire user prompt. There's really just a lot more that can be done to leverage the power of this local LLM I have running, even though it's quite a small one. So, to really express how powerful these models can be for your own local use, we're going to use AnythingLLM. I've already downloaded AnythingLLM; let me show you how to get that running and how to get LM Studio to work with AnythingLLM.

06:00

I just booted up AnythingLLM after installing it, and you'll usually land on a screen like this. Let's get started. We already know who we're looking for here: LM Studio. You'll see it asks for two pieces of information: a token context window, which is a property of your model that you'd already be familiar with, and then the LM Studio base URL.

06:21

If we open up LM Studio and go to this local server tab on the side, this is a really, really cool part of LM Studio. It doesn't support multi-model serving, so once you have a model selected, that's the model you are going to be using. Here we're going to select the exact same model, but we're going to start a server to run completions against this model. The way we do that: we can configure the server port, which is usually 1234, but you can change it to whatever you want. You probably want to turn off CORS and allow request queuing, so you can keep sending requests over and over and they don't just fail. You want to enable logging and prompt formatting; these are all just kind of debugging tools. On the right side, you're still going to want to make sure that you have GPU offloading allowed, if that is appropriate. Other than that, you just click "Start Server", and you'll see that we get some logs here.
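(For reference: once "Start Server" is clicked, LM Studio serves an OpenAI-compatible API. A quick way to confirm it is up, assuming the default port 1234; this sketch is not part of the video:)

```python
import requests

# Lists the model(s) currently loaded in LM Studio's local server.
resp = requests.get("http://localhost:1234/v1/models", timeout=10)
resp.raise_for_status()
for model in resp.json()["data"]:
    print(model["id"])
```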

07:14

Now, to connect the LM Studio inference server to AnythingLLM, you just want to copy this string right here, up to the "/v1" part, and then open AnythingLLM and paste it in here. I know that my model's max token window is 4096, so I'll click next. For embedding preference we don't really even need one; we can just use the AnythingLLM built-in embedder, which is free and private. Same for the vector database. All of this is going to be running on machines that I own. Then of course we can skip the survey, and let's make our first workspace; we'll just call it "anything llm".
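(For reference: AnythingLLM talks to this server the same way any OpenAI client would. A minimal sketch, not AnythingLLM's actual code, of streaming a reply and timing the first token, assuming the default port and the `openai` Python package:)

```python
import time
from openai import OpenAI

# The local server ignores the API key, but the client requires one.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

start = time.monotonic()
stream = client.chat.completions.create(
    model="local-model",  # placeholder; LM Studio uses the loaded model
    messages=[{"role": "user", "content": "What is AnythingLLM?"}],
    stream=True,
)
for i, chunk in enumerate(stream):
    if i == 0:
        print(f"[time to first token: {time.monotonic() - start:.2f}s]")
    print(chunk.choices[0].delta.content or "", end="", flush=True)
```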

07:54

We don't have any documents or anything like that, so if we were to send a chat asking the model about AnythingLLM, we'll either get a refusal response or it will just make something up. So let's ask "what is anything llm". If you go to LM Studio during any part of this, you can actually see that we sent the request to the model and it is now streaming the response: the first token has been generated, and it continues to stream. When AnythingLLM receives that first token of the stream, that's when we start to show it on our side. You can see that we get a response; it just kind of pops up instantly, which was very quick, but it is totally wrong. And it is wrong because we actually don't have any context to give the model on what AnythingLLM actually is.

08:39

Now, we can augment the LLM's knowledge of our private documents by clicking and adding them here, or I can just go and scrape a website. So I'm going to scrape the useanything.com homepage, because that should give us enough information, and you'll see that we've scraped the page. Now it's time to embed it; we'll just run that embedding, and now our LLM should be smarter. So let's ask the same question again, this time knowing that it has information that could be useful.

09:13

And now you can see that we've been given a response saying that AnythingLLM is an AI business-intelligence tool that forms human-like text messages based on prompts, offering LLM support as well as a variety of enterprise models. This is definitely much more accurate, but we also tell you where this information came from: you can see that it cited the useanything.com website, and these are the actual chunks that were used to formulate this response. So now we actually have a very coherent machine. We can embed and modify, create different threads, and do a whole bunch of stuff from within AnythingLLM, but the core piece of infrastructure, the LLM itself, we have running in LM Studio on a machine that we own. So now we have a fully private, end-to-end system for chatting with documents, privately, using the latest and greatest open-source models available on Hugging Face.

10:08

So, hopefully this tutorial on how to integrate LM Studio and AnythingLLM Desktop was helpful for you and unlocks a whole bunch of potential for your local LLM usage. Tools like LM Studio, Ollama, and LocalAI make running a local LLM no longer a very technical task. Pair a tool that provides an interface, like LM Studio, with another, more powerful tool built exclusively for chatting, like AnythingLLM on your desktop, and now you can have this entire experience and not have to pay OpenAI 20 bucks a month. Again, I do want to reiterate that the model you use will ultimately determine your experience with chatting. There are more capable models, and there are more niche models for programming, so be careful and know about the model you're choosing, or just choose one of the more popular ones like Llama 2 or Mistral and you'll honestly be great. Hopefully LM Studio plus AnythingLLM Desktop become a core part of your local LLM stack; we're happy to be a part of it and to hear your feedback. We'll put the links in the description. Have fun!