Running Hugging Face GGUF Models Locally with Ollama
Summary
TLDR In this video, viewers learn how to run Hugging Face models in GGUF format on their own machine using the Ollama tool. The presenter starts by searching the Hugging Face website for a model with 7 billion parameters and picks one called MistralLite. He then shows how to download the model with the Hugging Face CLI, specifying a local directory for the download. Next, he uses Ollama to create a model file that points at the downloaded model, and demonstrates running it with the `ollama run` command. He notes that Ollama also ships with many built-in models, such as Mistral AI's, and points viewers to a previous video for more on that.
Takeaways
- 📚 GGUF is a file format used for storing large language models.
- 🔍 Searching the Hugging Face website for GGUF turns up more than 1,000 models.
- 📈 7-billion-parameter models are generally a better fit for running on a local machine.
- 🔄 Models come at different quantization levels, trading quality against size and speed.
- 📝 The Hugging Face Hub CLI can download exactly the model file you need.
- 📂 Specify a local directory for the download to avoid taking up excessive space.
- 🚀 Ollama lets you run large language models on your own machine.
- 📝 You create a model file, similar to a Dockerfile but for large language models.
- 📁 The model file specifies the location of the GGUF file so that Ollama can find and use it.
- ⏱️ The `ollama create` command builds the model in just a few seconds.
- 💻 The `ollama run` command runs the model so you can see its output.
- 📊 A tool such as asitop can monitor system resource usage while the model runs.
- 🌐 Ollama also ships with several built-in models, such as Mistral AI's, which are worth exploring.
Q & A
GGUF is a file format used for storing what?
-GGUF is a file format used for storing large language models.
Where can we find GGUF models to download?
-GGUF models can be downloaded from the Hugging Face website.
What is the key quality to look for when choosing a GGUF model?
-A model that strikes a medium balance between quality, size, and speed is recommended.
What does the Hugging Face Hub CLI let us do?
-The Hugging Face Hub CLI lets us download GGUF models directly.
How can we specify where the model will be downloaded to?
-The download location is specified with the `--local-dir downloads` option.
What does the message shown during the download about using HF transfer mean?
-The message points to an optional package called hf_transfer that speeds up downloads, but the standard Hugging Face CLI method is perfectly adequate.
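As a sketch of that faster path (it is not demonstrated in the video; the package name and environment variable below come from Hugging Face's documentation, and the repo and file names are assumed from TheBloke's MistralLite GGUF page):

```sh
# Optional faster backend: install the Rust-based transfer library.
pip install hf_transfer

# Enable it for a single download via an environment variable.
HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download \
  TheBloke/MistralLite-7B-GGUF mistrallite.Q4_K_M.gguf --local-dir downloads
```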
What tool can we use to run LLMs on our own machine?
-The tool for running LLMs on our own machine is called Ollama.
What do we need to create before running the model on the machine?
-We need to create a model file, which is like a Dockerfile but for LLMs.
How can we run the model once it has been created?
-The model is run with the command `ollama run model_name`.
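For example, assuming a model created under the name mistrallite (the name is whatever you passed to `ollama create`), both interactive and one-shot usage look like this:

```sh
# Start an interactive chat session with the model.
ollama run mistrallite

# Or pass the prompt directly for a one-shot answer.
ollama run mistrallite "What is Grafana?"
```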
What tool does the presenter use to monitor his machine's vital signs while running models?
-The tool used to monitor the machine is asitop.
What do the high GPU percentage and the roughly three-gigabyte rise in RAM usage indicate?
-They indicate that the model makes heavy use of the GPU while running, which also drives up RAM usage.
What is the main advantage of using GGUF models with LLMs?
-The main advantage is the ability to run any of the thousands of GGUF models locally.
Outlines
📚 Introduction to GGUF and Hugging Face Models
The video begins with an introduction to the GGUF file format, which is used for storing large language models. The host guides viewers to the Hugging Face website to search for GGUF models, suggesting they filter the search with '7B' for 7-billion-parameter models. They recommend sorting by recently updated and selecting a model with a good balance of quality and performance, such as TheBloke's MistralLite. The process of downloading the model using the Hugging Face Hub CLI is demonstrated, including specifying the repository and file name and choosing a local directory for the download.
Keywords
💡GGUF file format
💡Hugging Face
💡7 billion parameter models
💡Quantization
💡Hugging Face Hub CLI
💡Local directory
💡Ollama
💡Model file
💡Grafana
💡Apple silicon
💡Mistral AI
Highlights
GGUF is a file format used for storing large language models.
The video demonstrates how to run Hugging Face models in GGUF format on a local machine using Ollama.
Over 1,000 GGUF models are available on the Hugging Face website, with options to filter by parameter size, such as 7B for 7 billion parameters.
Models are sorted by recently updated to surface the latest versions.
The MistralLite model by TheBloke is selected for its balance between quality and efficiency.
Different versions of the model are available, with trade-offs between size, speed, and quality.
The Q4_K_M version of MistralLite is recommended for medium, balanced quality.
Models can be downloaded using the Hugging Face Hub CLI.
The huggingface-hub dependency is included in the Poetry pyproject.toml file (see the sketch after this list).
Downloading a specific model file avoids using excessive space on all the files.
The hf_transfer package is available for faster downloads, but the plain CLI method is considered sufficient.
The MistralLite GGUF file is 4.1 GB and ends up in the downloads folder.
Ollama is a tool for running large language models on Mac and Linux, with Windows support coming soon.
A model file, similar to a Dockerfile, is created to specify the location of the GGUF file.
The `ollama create` command is used to create a model named mistrallite.
The model is run with the `ollama run` command followed by the model name.
asitop is used to monitor the machine's GPU and RAM usage during model execution.
Ollama also comes with built-in models, such as Mistral AI's.
The video concludes by showcasing the ability to run any of the thousand-plus GGUF models locally.
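A minimal sketch of that dependency setup, assuming Poetry's standard workflow (the video only shows the pyproject.toml briefly, so the version constraint in the comment is an assumption):

```sh
# Add the Hugging Face Hub client to the Poetry project. This appends a line
# such as `huggingface-hub = "^0.20"` (version assumed) under
# [tool.poetry.dependencies] in pyproject.toml.
poetry add huggingface-hub
```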
Transcripts
GGUF is a file format used for storing large language models, and in this video we're going to learn how to run Hugging Face models in GGUF format on our machine using Ollama. So let's head over to the Hugging Face website, click in the search bar at the top, and type in "gguf". You can see it comes back with models that mention this, and there are more than 1,000 of those, but I find that for running models on a local machine it's generally better to get the 7-billion-parameter ones. So let's add "7B" at the end, and you can see now we're down to just under 400 models, which is still quite a few to choose from. On the search page, let's sort by recently updated so we get the latest ones, then scroll down and find one we can use. TheBloke has been creating loads of these, so let's pick one that he's been working on: MistralLite. If we click on that, we get a page with lots of information about the model, and if we go all the way down we can see that there are actually lots of different versions of this model that TheBloke has created. The ones at the top tend to have been quantized much more, so they will be smaller and run quicker but generally with lower quality; as you go down, the quality goes up, but the size and the time it takes to run go up as well. There's one in the middle, Q4_K_M, that seems to be a good one: it says it has medium, balanced quality and is recommended, so let's pick that one. If we click on it, we get taken through to a page describing that particular file.
Now we can download these models using the Hugging Face Hub CLI. So if we come over to the terminal and have a look at my Poetry pyproject.toml file, you can see in the middle I've got the huggingface-hub dependency. Let's start writing this: we're going to say `poetry run huggingface-cli download`, and then you need to put in the name of the repository, so we'll go back, copy the repository name from that page, and paste it in. The next thing we need is the name of the file itself, so let's go back to the page, get that as well, and paste it in. If you don't pass in the file name, it will try to download all the files, which will take up a lot of space, so you probably want to make sure you download just one file. Once we've done that, we tell it where to download to with `--local-dir downloads`, and then we say don't use any symlinks, and we run it. It comes up with a message telling you that you should use hf_transfer for faster downloads, but I have actually had a look at that, and I find the method we're using is perfectly fine, so I wouldn't necessarily recommend against using the plain Hugging Face CLI. Let's speed this up a bit, because we've all got things to do; you can see it takes about a minute to download (I've got this machine connected over ethernet). Afterwards, we can have a look at our downloads folder, and you can see in there we've got our MistralLite GGUF file, and it's 4.1 GB.
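Put together, the download command looks roughly like this; the repository and file names are assumptions based on TheBloke's MistralLite GGUF page rather than values read off the screen:

```sh
# Download a single GGUF file (not the whole repository) into ./downloads,
# writing a real file rather than a symlink into the HF cache.
poetry run huggingface-cli download \
  TheBloke/MistralLite-7B-GGUF \
  mistrallite.Q4_K_M.gguf \
  --local-dir downloads \
  --local-dir-use-symlinks False
```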
Now we're going to learn how to run this model on our machine using a tool called Ollama. Ollama is a tool that lets you run LLMs on your own machine; at the moment it works on Mac and Linux, but Windows support is coming soon. We're going to have to create something called a model file, which I kind of think of as like a Dockerfile but for LLMs. In there we can say FROM, and then we need to put the location of the GGUF file, so that's ./downloads/ followed by the MistralLite GGUF file name, and then we'll close that.
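As a minimal sketch, assuming the downloaded file is named mistrallite.Q4_K_M.gguf, the model file needs only a single FROM line:

```sh
# Write a one-line model file pointing Ollama at the downloaded GGUF file.
cat > Modelfile <<'EOF'
FROM ./downloads/mistrallite.Q4_K_M.gguf
EOF
```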
What we can do then is call the `ollama create` command: we give the model a name, mistrallite, and point the command at the location of the model file. It says it's created it, and it only takes a few seconds, and then we've got our model, which we can see if we type `ollama list`; as you can see in the middle there, two seconds ago, we've got our mistrallite:latest model. We can then run this model using the `ollama run` command, so we'll say `ollama run mistrallite` and ask, "what is Grafana?", and you can see it gives us quite a big explanation of the Grafana visualization tool.
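In command form, that sequence is roughly the following, assuming the model file and names from the sketches above:

```sh
# Build an Ollama model named "mistrallite" from the model file.
ollama create mistrallite -f Modelfile

# Confirm it appears in the local model list.
ollama list

# Chat with it.
ollama run mistrallite
```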
Now I want to conclude by showing you a tool that lets you see what's going on when you're running these models. The one I'm using for Apple silicon is called asitop, but if you look on its page (I'll include the link below), there are other ones for different operating systems. So we're going to split the terminal into two and run `sudo asitop`, and then what we're going to do is rerun the previous command asking what Grafana is. As you can see at the top, there's my machine's spec, and while this is running the GPU is at 99% and my RAM usage is increasing by about three gigabytes or so, and then it goes back down once it's finished.
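For reference, a plausible setup for asitop (the install step is an assumption, as the video only shows it being launched; it needs sudo because it reads Apple's powermetrics):

```sh
# Install the Apple-silicon monitor (assumed install path; not shown in the video).
pip install asitop

# asitop wraps the built-in powermetrics utility, which requires root.
sudo asitop
```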
So being able to use GGUF models is quite a neat feature of Ollama, and it means you can take any of those thousand-plus GGUF models and run them locally. But Ollama also comes with many built-in models; one of those is Mistral AI's, and if you want to learn more about that, check out this video up here, and I'll see you in the next one.