Hugging Face GGUF Models locally with Ollama

Learn Data with Mark
20 Oct 2023 · 04:55

Summary

TLDR: In this video, viewers learn how to run Hugging Face models in GGUF format on their own machines using the Ollama tool. The presenter starts by searching the Hugging Face website for a model with 7 billion parameters, then picks one called Mistral Lite. He shows how to download the model using the Hugging Face CLI, specifying a local directory for the download. He then uses Ollama to create a Modelfile that points at the downloaded model, and demonstrates running it with the `ollama run` command. He notes that Ollama also ships with many built-in models, such as Mistral AI's, and points viewers to a previous video for more information.

Takeaways

  • 📚 GGUF is a file format used for storing large language models.
  • 🔍 Searching the Hugging Face website for GGUF-format models turns up more than 1,000 of them.
  • 📈 7-billion-parameter models are usually a better fit for running on your own machine.
  • 🔄 Models come at different quantization levels, so you can trade quality against size and speed.
  • 📝 The Hugging Face Hub CLI downloads the model you need.
  • 📂 Specify a local directory when downloading to avoid taking up too much space.
  • 🚀 The Ollama tool runs large language models on your local machine.
  • 📝 Create a Modelfile, similar to a Dockerfile but for large language models.
  • 📁 The Modelfile specifies the location of the GGUF file so that Ollama can find and use it.
  • ⏱️ The `ollama create` command builds the model in just a few seconds.
  • 💻 The `ollama run` command runs the model so you can see its output.
  • 📊 Tools such as asitop can monitor system resource usage while the model runs.
  • 🌐 Ollama also ships with several built-in models, such as Mistral AI's, which are worth exploring.

Q & A

  • GGUF is a file format used for storing what?

    -GGUF is a file format used for storing large language models.

  • Where can we find GGUF models to download?

    -GGUF models can be downloaded from the Hugging Face website.

  • What should we look for when choosing a GGUF model?

    -Look for a model that strikes a medium balance between quality, size, and speed.

  • What does the Hugging Face Hub CLI let us do?

    -The Hugging Face Hub CLI lets us download GGUF models directly.

  • How can we specify where the model will be downloaded to?

    -We can specify the download location with the `--local-dir downloads` option.

  • What does the download-time message about using HF transfer for faster downloads mean?

    -The message points to an alternative called HF transfer that can speed up downloads, but the current approach through the Hugging Face CLI is perfectly adequate.

  • What tool can we use to run LLMs on our own machine?

    -The tool for running LLMs on our own machine is called Ollama.

  • What do we need to create before running the model on the machine?

    -We need to create a Modelfile, which is like a Dockerfile but for language models.

  • How can we run the model after creating it?

    -We can run the model with the `ollama run model_name` command.

  • What tool does the presenter use to monitor his machine's vitals while running models?

    -The tool used to monitor the machine's vitals is asitop.

  • What does the GPU running at its highest percentage and RAM usage increasing by about three gigabytes indicate?

    -It indicates that the model makes heavy use of the GPU while running, which also drives up RAM usage.

  • What is the main benefit of Ollama's support for GGUF models?

    -The main benefit is the ability to run any of the thousands of GGUF models locally.

Outlines

00:00

📚 Introduction to GGUF and Hugging Face Models

The video begins with an introduction to the GGUF file format, which is used for storing large language models. The host guides viewers to the Hugging Face website to search for GGUF models, suggesting they filter the search by '7B' for 7-billion-parameter models. He recommends sorting by recently updated and selecting a model with a good balance of quality and performance, such as TheBloke's 'Mistral Lite'. The process of downloading the model using the Hugging Face Hub CLI is then demonstrated, including specifying the repository and file name and choosing a local directory for the download.
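
A minimal sketch of that download step (the repository and file names are my best reconstruction of what appears on screen; substitute whichever model and quantization you choose, and note that in the video the command is invoked via `poetry run`):

```bash
# Download a single GGUF file from the Hugging Face Hub into ./downloads.
# Naming one file avoids pulling every quantization variant in the repo.
huggingface-cli download TheBloke/MistralLite-7B-GGUF \
  mistrallite.Q4_K_M.gguf \
  --local-dir downloads \
  --local-dir-use-symlinks False
```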

Keywords

💡GGUF file format

GGUF is a file format used for storing large language models. In the video, it is the format in which the language models are available for download from the Hugging Face website. The script mentions that GGUF files store models with varying levels of quantization, which affects their size and performance.

💡Hugging Face

Hugging Face is a company that provides a platform for developers to share, discover, and use machine learning models, particularly for natural language processing. The video script discusses navigating the Hugging Face website to find and download GGUF-format models, highlighting its role in accessing language models.

💡7 billion parameter models

These are language models with 7 billion parameters, the variables a model learns from its training data. The video suggests that 7-billion-parameter models are a sweet spot for local use: capable enough to be useful while still small enough to run well on a typical machine.

💡Quantization

Quantization in the context of machine learning models refers to the process of reducing the precision of the model's parameters to use fewer bits, which makes the models smaller and faster but can slightly reduce their accuracy. The video script discusses different versions of a model with varying levels of quantization.
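
For orientation, TheBloke's GGUF repositories typically offer a ladder of variants along these lines (the names follow the llama.cpp quantization convention, the descriptions paraphrase his model cards, and exact sizes vary by model):

  • Q2_K: smallest and fastest, with the largest quality loss
  • Q4_K_M: medium size, balanced quality, usually marked as recommended (the video's pick, 4.1 GB here)
  • Q5_K_M: larger, with low quality loss
  • Q8_0: near-original quality, but the biggest and slowest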

💡Hugging Face Hub CLI

Hugging Face Hub CLI is a command-line interface tool that allows users to interact with the Hugging Face Hub, a platform for sharing and discovering machine learning models. In the video, it is used to download the selected GGUF model from the repository.
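
In the video the CLI comes from a Poetry-managed project; a sketch of setting it up that way, assuming a Poetry project already exists (the `huggingface-hub` package provides the `huggingface-cli` entry point):

```bash
# Add the Hub client as a project dependency, then invoke its CLI
# through Poetry so the project's virtual environment is used.
poetry add huggingface-hub
poetry run huggingface-cli download --help
```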

💡Local directory

A local directory refers to a folder on the user's own computer where files can be stored and accessed. In the script, the local directory 'downloads' is specified as the location where the downloaded GGF model file will be saved.

💡Ollama

Ollama is the tool shown in the video for running large language models (LLMs) on the user's own machine. At the time of recording it supported Mac and Linux, with Windows support in development. The script details the process of using Ollama to run the downloaded GGUF model.
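
A sketch of the Ollama commands from this part of the video (the model name `mistral-lite` and the prompt are reconstructions; use whatever names you prefer):

```bash
# Register the downloaded GGUF weights under a model name,
# using the Modelfile described below.
ollama create mistral-lite -f Modelfile

# Confirm the model exists, then query it interactively.
ollama list
ollama run mistral-lite "What is Grafana?"
```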

💡Model file

In the context of the video, a model file (Modelfile) is a configuration file used by Ollama to define how a particular language model should be run on the user's machine. It is compared to a Dockerfile, but for LLMs, specifying the location of the GGUF file and other settings.
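
A minimal Modelfile matching the video's setup (the GGUF file name is my best reading of the audio; point `FROM` at wherever your downloaded file actually lives):

```
# Modelfile: tells Ollama which weights to load, Dockerfile-style.
FROM ./downloads/mistrallite.Q4_K_M.gguf
```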

💡Grafana

Grafana is a visualization tool used for monitoring and observing systems. In the script, the question 'what is Grafana' serves as the example prompt given to the model once it is running.

💡Apple silicon

Apple silicon refers to the custom-designed processors developed by Apple for their devices, such as the M1 chip. The video uses a tool called 'asitop' on Apple silicon to monitor the machine's resource usage while the language model runs.
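
asitop is distributed on PyPI and needs elevated privileges to read the power metrics, so a minimal sketch of the monitoring setup looks like this (assuming Python and pip are available):

```bash
# Install the Apple-silicon monitor, then run it with sudo in one
# terminal pane while the model runs in another.
pip install asitop
sudo asitop
```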

💡Mistral AI

Mistral AI's model is one of the built-in models available in Ollama, as mentioned in the video. It is an example of a model that can be run using Ollama, showcasing the tool's capability to execute different language models locally.
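
For comparison with the GGUF workflow, a built-in model needs no download or Modelfile; a sketch, assuming the `mistral` tag in the Ollama library:

```bash
# Pull Mistral AI's model from the Ollama library and query it.
ollama pull mistral
ollama run mistral "What is Grafana?"
```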

Highlights

GGUF is a file format used for storing large language models.

The video demonstrates how to run Hugging Face models in GGUF format on a machine using Ollama.

Over 1,000 GGUF models are available on the Hugging Face website, with options to filter by parameter size, like 7B for 7 billion parameters.

Models are sorted by recently updated to surface the latest versions.

The 'Mistral Lite' model by TheBloke is selected for its balance between quality and efficiency.

Different versions of the model are available, with trade-offs between size, speed, and quality.

The Q4_K_M version of 'Mistral Lite' is recommended for medium, balanced quality.

Models can be downloaded using the Hugging Face Hub CLI.

The huggingface-hub dependency is included in the Poetry pyproject.toml file.

Downloading a specific model file avoids the excessive space needed for all the files.

HF transfer is available for faster downloads, but the CLI method is considered sufficient.

The 'Mistral Lite' GGUF file is 4.1 GB and can be found in the downloads folder.

Ollama is a tool for running large language models on Mac and Linux, with Windows support coming soon.

A Modelfile, similar to a Dockerfile, is created to specify the location of the GGUF file.

The `ollama create` command builds a model named 'mistral-lite' from the Modelfile.

The model is run using the `ollama run` command followed by the model name.

The tool 'asitop' is used to monitor the machine's GPU and RAM usage during model execution.

Ollama also comes with built-in models, like Mistral AI's.

The video concludes by showcasing the ability to run any of the thousand-plus GGUF models locally.

Transcripts

00:00

GGUF is a file format used for storing large language models, and in this video we're going to learn how to run Hugging Face models in GGUF format on our machine using Ollama. So let's head over to the Hugging Face website, click in the search bar at the top, and type in GGUF. You can see it comes back with models which mention this, and there are more than 1,000 of those, but I find that generally, for running models on a machine, it's better to get the 7-billion-parameter ones. So let's add in a 7B at the end, and you can see now we're down to just under 400 models, which is still quite a few to choose from.

00:41

On the search page, let's sort, say, by recently updated so we'll get the latest ones, and we'll scroll down and find one that we can use. TheBloke has been creating loads of these, so let's pick one that he's been working on: we'll pick Mistral Lite. If we click on that, we get this page, and it has lots of information about the model. If we go all the way down, we can see that there are actually lots of different versions of this model that have been created by TheBloke. The ones at the top tend to have been quantized much more, so they will be smaller and run quicker, but generally with less quality; as you go down, the quality goes up, but the size and the time it takes to run go up as well. There's one in the middle, Q4_K_M, that seems to be a good one: it says it's got medium, balanced quality and is recommended, so let's pick that one. If we click on that, we get taken through to a page describing that particular model.

01:36

Now, we can download these models using the Hugging Face Hub CLI. If we come over to the terminal, we're going to have a look at my Poetry pyproject.toml file, and you can see there in the middle I've got the huggingface-hub dependency. Let's start writing this: we're going to say poetry run huggingface-cli download, and then you need to put in the name of the repository, so we'll just come back and copy the repository from that page, go back, and paste it in. The next thing we need is the name of the file itself, so let's go back to the page, get that as well, and paste that in. Now, if you don't pass in the file, it will try to download all the files, which will take up a lot of space, so you probably want to make sure you just have one file. Once we've done that, we're going to tell it where to download it, so we'll say local directory 'downloads', and then we'll say don't use any symlinks, and then we're going to run it.

02:23

It comes up with this message telling you that you should use HF transfer for faster downloads, but I have actually had a look at that, and I find the method that we're using is perfectly fine, so I wouldn't necessarily recommend against using this Hugging Face CLI. Let's speed this up a bit, because we've all got things to do, and you can see it takes about a minute to download (I have got this connected on Ethernet). Afterwards, we can have a look at our downloads folder, and you can see in there we've got our Mistral Lite GGUF file, and it's 4.1 GB.

02:55

Now we're going to learn how to run this model on our machine using a tool called Ollama. Ollama is a tool that lets you run LLMs on your own machine; at the moment it works on Mac and Linux, but Windows support is coming soon. We're going to have to create something called a Modelfile, which I kind of think of as like a Dockerfile, but for LLMs. In there we can say FROM, and then we need to put the location of the GGUF file, so it's ./downloads and then the Mistral Lite GGUF file. We'll close that, and then we can call the ollama create command: we say what name we want to give our model, we'll say mistral-lite, and then point it at the location of the file, and it will say it's created it. It only takes a few seconds, and then we've got our model, which we can see if we type in ollama list; as you can see in the middle there, two seconds ago, we've got our mistral-lite:latest model.

03:45

We can then run this model using the ollama run command, so we'll say ollama run mistral-lite and ask 'what is Grafana?', and you can see it gives us a big explanation of the Grafana visualization tool.

03:57

Now, I want to conclude by showing you a tool that lets you see what's going on when you're running these models. The one that I'm using for Apple silicon is called asitop, but if you look on that page (I'll include the link below), there are other ones for different operating systems. So we're going to split the terminal into two, and I'm going to do sudo asitop, and then we're going to rerun the previous command where it was asking 'what is Grafana'. As you can see at the top, there's my machine's spec, and while this is running, the GPU is at 99% and my RAM usage is increasing by about three gigabytes or so, and then it goes back down once it's finished.

04:37

Being able to use GGUF models is quite a neat feature of Ollama, and it means you can take any of those other thousand GGUF models and run them locally. But Ollama also comes with many built-in models; one of those is Mistral AI, and if you want to learn more about that, check out this video up here, and I'll see you in the next one.
