Stop paying for ChatGPT with these two tools | LMStudio x AnythingLLM
Summary
TL;DR: In this tutorial, Timothy Carambat introduces viewers to a simple method for running a powerful, locally hosted AI chat application using LM Studio and Anything LLM. He demonstrates how to install both tools, explore popular models in LM Studio, and connect LM Studio's local server to Anything LLM. By integrating these platforms, users can leverage the latest open-source models from Hugging Face for a comprehensive, private AI chat experience without monthly fees.
Takeaways
- 🚀 Timothy Carambat introduces LM Studio and Anything LLM as tools for running a capable, locally hosted conversational AI on your personal computer.
- 💻 Both LM Studio and Anything LLM are single-click installable applications compatible with various operating systems, including Windows.
- 🌐 LM Studio supports multiple operating systems and is particularly beneficial when used with a GPU for enhanced performance.
- 🔗 Anything LLM is an all-in-one chat application that is fully private, open-source, and can connect to a wide range of services.
- 🆓 The integration of LM Studio and Anything LLM provides a comprehensive, cost-free LLM experience.
- 📱 LM Studio comes with a built-in chat client for experimenting with models, but for more capabilities, Anything LLM is recommended.
- 🔍 Users can download various models from the Hugging Face repository through LM Studio, with options for different sizes and compatibility with the user's system.
- 🚦 LM Studio allows for server configuration to run completions against the chosen model, including setting up a local server for model interaction.
- 🗂️ Anything LLM can be augmented with private documents or web scraping to provide the model with additional context for more accurate responses.
- 🔄 The combination of LM Studio and Anything LLM creates a fully private, end-to-end system for chatting and document interaction without reliance on subscription services like OpenAI.
- 📈 The choice of model in LM Studio and Anything LLM significantly impacts the user experience, with more capable and niche models available for specific tasks.
Q & A
Who is the speaker in the transcript and what is his role?
-The speaker is Timothy Carambat, the founder of Mintplex Labs and creator of Anything LLM.
What are the two tools mentioned in the transcript for running a local LLM application?
-The two tools mentioned are LM Studio and Anything LLM Desktop.
What is the significance of having a GPU for running these applications?
-Having a GPU enhances the experience by allowing for faster token processing and the ability to use more powerful models through full GPU offloading.
Is Anything LLM open-source? What are the implications of this?
-Yes, Anything LLM is open-source, which means that users with programming skills can contribute to its development and add their own integrations.
How does LM Studio help in downloading and exploring different models?
-LM Studio provides a user-friendly interface for downloading models from the Hugging Face repository, exploring popular models, and checking their compatibility with the user's system.
What is the process for setting up Anything LLM Desktop?
-After installing Anything LLM Desktop, users need to start the application, which typically lands them on a screen where they can begin interacting with the AI.
How does LM Studio integrate with Anything LLM?
-LM Studio can be set up to run a local server that interacts with the Anything LLM model, allowing users to chat with the model and leverage its capabilities privately on their own machine.
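The connection works because LM Studio's local server exposes an OpenAI-compatible HTTP API (at `http://localhost:1234/v1` by default), which Anything LLM consumes as its LLM provider. As a minimal sketch, you can talk to that same endpoint directly; this assumes the default port and a model already loaded in LM Studio:

```python
import json
import urllib.request

BASE_URL = "http://localhost:1234/v1"  # LM Studio's default local server address


def build_chat_request(prompt, temperature=0.7):
    """Build an OpenAI-style chat-completion payload for the local server."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "stream": False,
    }


def chat(prompt):
    """Send the payload to LM Studio and return the assistant's reply text."""
    payload = build_chat_request(prompt)
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Requires LM Studio's server to be running with a model selected.
    print(chat("Hello, how are you?"))
```

Anything LLM does the equivalent of this under the hood once you paste the base URL into its LM Studio provider settings.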
What is the importance of embedding information in Anything LLM?
-Embedding information enhances the model's understanding and provides it with context, leading to more accurate and relevant responses when interacting with users.
How does the speaker demonstrate the capabilities of the integrated system?
-The speaker demonstrates the system by scraping a website, embedding the information into Anything LLM, and then asking questions to show how the model can provide more accurate responses with the added context.
What are the cost implications of using LM Studio and Anything LLM Desktop?
-Using LM Studio and Anything LLM Desktop does not require a monthly subscription fee to OpenAI, making it a cost-effective solution for running local LLM applications.
What is the speaker's final recommendation for users interested in local LLM applications?
-The speaker recommends integrating LM Studio and Anything LLM Desktop as a core part of their local LLM stack for a fully private, end-to-end chatting and document handling system without the need for external subscription services.
Outlines
🚀 Introduction to Locally Running LLMs with Timothy Carambat
Timothy Carambat, founder of Mintplex Labs, introduces viewers to a simple method for running powerful AI language models (LLMs) locally, on either a GPU or a CPU. He mentions two single-click installable applications: LM Studio and Anything LLM Desktop. The tutorial demonstrates setup on Windows, but the process is similar on other operating systems. Timothy emphasizes the privacy and open-source nature of Anything LLM, highlighting its capabilities and its openness to community-contributed integrations.
📱 Exploring LM Studio and Downloading Models
The tutorial delves into the functionalities of LM Studio, including exploring popular models like Google's Gemma and downloading them for use. It explains the process of selecting compatible models based on system specifications and downloading them, which could be time-consuming. Timothy demonstrates how to use the built-in chat client in LM Studio to interact with models and the importance of selecting the right model for optimal performance.
🤖 Integrating Anything LLM with LM Studio for Enhanced Capabilities
Timothy shows how to integrate Anything LLM with LM Studio to enhance the capabilities of the local LLM setup. He guides through the process of setting up Anything LLM, connecting it to the LM Studio inference server, and using it to chat with the model. The tutorial highlights the ability to augment the model's knowledge with private documents or web scraping, resulting in more accurate and contextually relevant responses. The integration allows for a fully private, end-to-end system for chatting and document interaction without the need for external subscription fees.
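The embedding-and-retrieval step described above can be sketched in miniature: document chunks are stored as vectors, the query is embedded the same way, and the closest chunks are handed to the model as extra context. The chunk texts and numbers below are invented for illustration; real embedders, like the one built into Anything LLM, produce vectors with hundreds of dimensions:

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


# Hypothetical chunk embeddings (a real vector database stores thousands).
chunks = {
    "AnythingLLM is an all-in-one private chat application.": [0.9, 0.1, 0.2],
    "LM Studio downloads models from Hugging Face.": [0.2, 0.8, 0.3],
}


def retrieve(query_vec, k=1):
    """Return the k chunk texts most similar to the query embedding."""
    ranked = sorted(chunks, key=lambda c: cosine(chunks[c], query_vec), reverse=True)
    return ranked[:k]


# A query embedding near the first chunk's vector retrieves that chunk,
# which is then prepended to the prompt as context for the model.
context = retrieve([0.88, 0.15, 0.25])
```

This is why the model answers correctly only after the website is scraped and embedded: without stored chunks to retrieve, there is no context to ground the response.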
🌐 Conclusion: Empowering Local LLM Usage with LM Studio and Anything LLM
The conclusion emphasizes the ease and potential of using LM Studio and Anything LLM together for local LLM applications. It encourages viewers to explore the integration and take advantage of the open-source models available on Hugging Face. Timothy invites feedback and assures that the right model choice can significantly enhance the user experience, suggesting popular and niche models for various applications.
Keywords
💡Implex Labs
💡Anything LLM
💡LM Studio
💡Hugging Face Repository
💡GPU Offloading
💡Quantization
💡Tokenization
💡Context Window
💡Embedding
💡Local AI
💡Open Source
Highlights
Timothy Carambat introduces himself as the founder of Mintplex Labs and creator of Anything LLM.
The presentation aims to show the easiest way to run a capable, locally-executing, fully private AI chat application like Anything LLM on a personal computer.
Two single-click installable applications are used for this process: LM Studio and Anything LLM Desktop.
LM Studio supports multiple operating systems, with the demonstration focusing on the Windows version due to GPU availability.
Anything LLM is an all-in-one chat application that is fully private and can connect to various platforms, offering a lot of features for free.
Anything LLM is fully open-source, allowing for community contributions and integrations.
The tutorial begins with installing LM Studio and Anything LLM on a Windows machine.
LM Studio's interface includes an exploring page that showcases popular models like Google's Gemma.
Downloading models from the Hugging Face repository can take a significant amount of time, depending on the model size and internet speed.
LM Studio provides options for GPU offloading, allowing for faster token processing with compatible graphics cards.
LM Studio includes a simple chat client for experimenting with models, but it has limited functionality.
Anything LLM is downloaded and set up to work with LM Studio, providing a more powerful and feature-rich chat experience.
LM Studio's local server tab is used to configure and run a server for model completions; the server hosts only the one model currently selected.
Anything LLM's workspace is created, and the model's context window and token limit are set up for optimal interaction.
Anything LLM allows for scraping websites to provide additional context to the AI model, enhancing its understanding and responses.
The integration of LM Studio and Anything LLM creates a fully private, end-to-end system for chatting and document interaction without the need for external subscription services.
The choice of model used in the setup will significantly impact the user experience, with more capable and niche models available for specific tasks.
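When connecting Anything LLM to a model, you supply the model's token context window (4096 in the demonstration). A rough way to sanity-check whether a prompt plus retrieved chunks will fit is the common ~4-characters-per-token heuristic; this is an approximation, not a real tokenizer:

```python
def estimate_tokens(text):
    """Very rough token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)


def fits_context(prompt, context_chunks, window=4096, reserve=512):
    """Check whether prompt + context leave `reserve` tokens for the reply."""
    used = estimate_tokens(prompt) + sum(estimate_tokens(c) for c in context_chunks)
    return used <= window - reserve


# Example: a short prompt plus two small chunks easily fits a 4096-token window.
ok = fits_context("What is AnythingLLM?", ["chunk one " * 50, "chunk two " * 50])
```

Chat applications like Anything LLM perform this kind of budgeting with the model's actual tokenizer, trimming retrieved context so the request never exceeds the window you configured.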
Transcripts
hey there my name is Timothy Carambat
founder of Mintplex Labs and creator of
anything llm and today I actually want
to show you possibly the easiest way to
get a very extremely capable locally
running fully rag like talk to anything
with any llm application running on
honestly your laptop a desktop if you
have something with the GPU this will be
a way better experience if all you have
is a CPU this is still possible and
we're going to use two tools
both of which are a single-click
installable application and one of them
is LM studio and the other is of course
anything llm desktop right now I'm on
lmstudio.ai they have three different
operating systems they support we're
going to use the windows one today
because that's the machine that I have a
GPU for and I'll show you how to set it
up how the chat normally works and then
how to connect it to anything LM to
really unlock a lot of its capabilities
if you aren't familiar with anything llm
anything llm is an all-in-one chat
with anything desktop application it's
fully private it can connect to pretty
much anything and you get a whole lot
for actually free anything llm is also
fully open source so if you are capable
of programming or have an integration
you want to add you can actually do it
here and we're happy to accept
contributions so what we're going to do
now is we're going to switch over to my
Windows machine and I'm going to show
you how to use LM studio with anything
LM and walking through both of the
products so that you can really get
honestly like the most comprehensive llm
experience and pay nothing for it okay
so here we are on my Windows desktop and
of course the first thing we're going to
want to do is Click LM Studio for
Windows this is version
0.2.16 whatever version you might be on
things may change a little bit but in
general this tutorial should be accurate
you're going to want to go to
useanything.com go to download anything llm
for desktop and select your appropriate
operating system once you have these two
programs installed you are actually 50%
done with the entire process that's how
quick this was let me get LM Studio
installed and running and we'll show you
what that looks like so you've probably
installed LM Studio by now you click the
icon on your desktop and you usually get
dropped on this screen I don't work for
LM studio so I'm just going to show you
kind of some of the capabilities that
are relevant to this integration and
really unlocking any llm you use they
kind of land you on this exploring page
and this exploring page is great it
shows you basically some of the more
popular models that exist uh like
Google's Gemma just dropped and it's
already live that's really awesome if
you go down here into if you click on
the bottom you'll see I've actually
already downloaded some models cuz this
takes time downloading the models will
probably take you the longest time out
of this entire operation I went ahead
and downloaded the Mistral 7B instruct
the Q4 means 4bit quantized model now
I'm using a Q4 model honestly Q4 is kind
of the lowest end you should really go
for Q5 is really really great Q8 if you
want to um if you actually go and look
up any model on LM Studio like for
example let's look up Mistral as you can
see there's a whole bunch of models here
for Mistral there's a whole bunch of
different types these are all coming
from the hugging face repository and
there's a whole bunch of different types
that you can find here published by
bunch of different people you can see
that you know how many times this one
has been downloaded this is a very
popular model and once you click on it
you'll likely get some options now LM
studio will tell you if the model is
compatible with your GPU or your system
this is pretty accurate I've found that
sometimes it doesn't quite work um one
thing you'll be interested in is full
GPU offloading exactly what it sounds
like using the GPU as much as you can
you'll get way faster tokens something
honestly on the speed level of a chat
GPT if you're working with a small
enough model or have a big enough
graphics card I have 12 gigs of vram
available and you can see there's all
these Q4 models again you probably want
to stick with the Q5 models at least uh
for the best experience versus size as
you can see the Q8 is quite Hefty 7.7
gigs which even if you have fast
internet won't matter because it takes
forever to download something from
hugging face if you want to get working
on this in the day you might want to
start to download now for the sake of
this video I've already downloaded a
model so now that we have a model
downloaded we're going to want to try to
chat with it LM Studio actually comes
with a chat client inside of it it's
very very simplistic though and it's
really just for experimenting with
models we're going to want to go to this
chat bubble icon and you can see that we
have a thread already started and I'm
going to want to pick the one model that
I have available and you'll see this
loading bar continue There are some
system prompts that you can preset for
the model I have GPU offloading enabled
and I've set it to Max already and as
you can see I have Nvidia Cuda already
going there are some tools there are
some other things that you can mess with
but in general that's really all you
need to do so let's test the chat and
let's just say hello how are you and you
get the pretty standard response from
any AI model and you even get some
really cool metrics down here like time
to First token was 1.21 seconds I mean
really really kind of cool showing the
GPU layers that are there however you
really can't get much out of this right
here if you wanted to add a document
you'd have to copy paste it into the
entire user prompt there's really just a
lot more that can be done here to
Leverage The Power of this local llm
that I have running even though it's a
quite small one so to really kind of
Express how powerful these models can be
for your own local use we're going to
use anything llm now I've already
downloaded anything llm let me show you
how to get that running and how to get
to LM Studio to work with anything
llm just booted up anything llm after
installing it and you'll usually land on
a screen like this let's get started we
already know who we're looking for here
LM studio and you'll see it asks for two
pieces of information a token context
window which is a property of your model
that you'd already be familiar with and
then the LM Studio base URL if we open
up LM studio and go to this local server
tab on the side this is a really really
cool part of LM Studio this doesn't work
with multimodel support So once you have
a model selected that's the model that
you are going to be using so here we're
going to select the exact same model but
we're going to start a server to run
completions against this model so the
way that we do that is we can configure
the server Port usually it's 1234 but
you can change it to whatever you want
you probably want to turn off CORS
allow request queuing so you can keep
sending requests over and over and they
don't just fail you want to enable
logging and prompt formatting these are
all just kind of debugging tools on the
right side you are going to still want
to make sure that you have GPU
offloading allowed if that is
appropriate but other than that you just
click Start server and you'll see that
we get some logs saved here now to
connect the LM Studio inference server
to anything llm you just want to copy
this string right here up to the V1 part
and then you're going to want to open
anything llm paste that into here I know
that my model's max token window is
4096 I'll click next embedding preference
we don't really even need one we can
just use the anything llm built-in
embedder which is free and private same
for the vector database all of this is
going to be running on machines that I
own and then of course we can skip the
survey and let's make our first
workspace and we'll just call it
anything llm we don't have any documents
or anything like that so if we were to
send a chat asking the model about
anything llm we'll either get a
refusal response or it will just make
something up so let's ask what is
anything llm and if you go to LM Studio
during any part you can actually see
that we sent the requests to the model
and it is now streaming the response
first token has been generated
continuing to stream when anything llm
does receive that first token stream
this is when we will uh start to show it
on our side and you can see that we get
a response it just kind of pops up
instantly uh which was very quick but it
is totally wrong and it is wrong because
we actually don't have any context to
give the model on what anything llm
actually is now we can augment the lm's
ability to know about our private
documents by clicking and adding them
here or I can just go and scrape a
website so I'm going to go and scrape
the useanything.com homepage cuz that should
give us enough information and you'll
see that we've scraped the page so now
it's time to embed it and we'll just run
that embedding and now our llm should be
smarter so let's ask the same question
again but this time knowing that it has
information that could be
useful and now you can see that we've
again just been given a response that
says anything LM is an AI business
intelligence tool to form humanlike text
messages based on prompt it offers llm
support as well as a variety of
Enterprise models this is definitely
much more accurate but we also tell you
where this information came from and you
can see that it cited the useanything.com
website these are the actual chunks
that were used uh to formulate this
response and so now actually we have a
very coherent machine we can embed and
modify create different threads we can
do a whole bunch of stuff from within
anything llm but the core piece of
infrastructure the llm itself we have
running on LM Studio on a machine that
we own so now we have a fully private
end-to-end kind of system for chatting
with documents privately using the
latest and greatest models that are open
source and available on hugging face so
hopefully this tutorial for how to
integrate LM studio and anything llm
desktop was helpful for you and unlocks
probably a whole bunch of potential for
your local llm usage tools like LM
studio Ollama and LocalAI make running a
local llm no longer a very technical
task and you can see that with tools
that provide an interface like LM Studio
pair that with another more powerful
tool built for chatting exclusively like
anything llm on your desktop and now you
can have this entire experience and not
have to pay open AI 20 bucks a month and
again I do want to iterate that the
model that you use will determine
ultimately your experience with chatting
now there are more capable models there
are more Niche models for programming so
be careful and know about the model that
you're choosing or just choose some of
the ones that are more popular like
llama 2 or Mistral and you'll honestly be
great hopefully LM Studio Plus anything
llm desktop just become a core part of
your local llm stack and we're happy to
be a part of it and hear your feedback
we'll put the links in the description
and have fun