Hugging Face GGUF Models locally with Ollama

Learn Data with Mark
20 Oct 2023 · 04:55

Summary

TLDR: In this video, viewers learn how to run Hugging Face models in GGUF format on their own machines using the Ollama tool. The presenter starts by searching the Hugging Face website for a model with 7 billion parameters, then picks one called Mistral Lite. He shows how to download the model using the Hugging Face CLI, specifying a local directory for the download. He then uses Ollama to create a Modelfile that points at the downloaded model, and demonstrates running it with the `ollama run` command. He notes that Ollama also ships with many built-in models, such as Mistral AI's, and points viewers to a previous video for more information.

Takeaways

  • 📚 GGUF is a file format used for storing large language models.
  • 🔍 Searching the Hugging Face website for GGUF-format models turns up more than 1,000 of them.
  • 📈 7-billion-parameter models are usually a better fit for running on your own machine.
  • 🔄 Models come at different quantization levels, so you can trade quality against size and speed.
  • 📝 The Hugging Face Hub CLI downloads the model you need.
  • 📂 Specify a local directory when downloading to avoid taking up too much space.
  • 🚀 The Ollama tool runs large language models on your local machine.
  • 📝 Create a Modelfile, similar to a Dockerfile but for large language models.
  • 📁 The Modelfile specifies the location of the GGUF file so that Ollama can find and use it.
  • ⏱️ The `ollama create` command builds the model in just a few seconds.
  • 💻 The `ollama run` command runs the model so you can see its output.
  • 📊 Tools such as asitop can monitor system resource usage while the model runs.
  • 🌐 Ollama also ships with several built-in models, such as Mistral AI's, which are worth exploring.

Q & A

  • GGUF is a file format used for storing what?

    -GGUF is a file format used for storing large language models.

  • Where can we find GGUF models to download?

    -GGUF models can be downloaded from the Hugging Face website.

  • What should we look for when choosing a GGUF model?

    -Look for a model that strikes a medium balance between quality, size, and speed.

  • What does the Hugging Face Hub CLI let us do?

    -The Hugging Face Hub CLI lets us download GGUF models directly.

  • How can we specify where the model will be downloaded to?

    -We can specify the download location with the `--local-dir downloads` option.

  • What does the download-time message about using HF transfer for faster downloads mean?

    -The message points to an alternative called HF transfer that can speed up downloads, but the current approach through the Hugging Face CLI is perfectly adequate.

  • What tool can we use to run LLMs on our own machine?

    -The tool for running LLMs on our own machine is called Ollama.

  • What do we need to create before running the model on the machine?

    -We need to create a Modelfile, which is like a Dockerfile but for language models.

  • How can we run the model after creating it?

    -We can run the model with the `ollama run model_name` command.

  • What tool does the presenter use to monitor his machine's vitals while running models?

    -The tool used to monitor the machine's vitals is asitop.

  • What does the GPU running at its highest percentage and RAM usage increasing by about three gigabytes indicate?

    -It indicates that the model makes heavy use of the GPU while running, which also drives up RAM usage.

  • What is the main benefit of Ollama's support for GGUF models?

    -The main benefit is the ability to run any of the thousands of GGUF models locally.

Outlines

00:00

📚 Introduction to GGUF and Hugging Face Models

The video begins with an introduction to the GGUF file format, which is used for storing large language models. The host guides viewers to the Hugging Face website to search for GGUF models, suggesting they filter the search by '7B' for 7-billion-parameter models. He recommends sorting by recently updated and selecting a model with a good balance of quality and performance, such as TheBloke's 'Mistral Lite'. The process of downloading the model using the Hugging Face Hub CLI is then demonstrated, including specifying the repository and file name and choosing a local directory for the download.
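
A minimal sketch of that download step (the repository and file names are my best reconstruction of what appears on screen; substitute whichever model and quantization you choose, and note that in the video the command is invoked via `poetry run`):

```bash
# Download a single GGUF file from the Hugging Face Hub into ./downloads.
# Naming one file avoids pulling every quantization variant in the repo.
huggingface-cli download TheBloke/MistralLite-7B-GGUF \
  mistrallite.Q4_K_M.gguf \
  --local-dir downloads \
  --local-dir-use-symlinks False
```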

Keywords

💡GGUF file format

GGUF is a file format used for storing large language models. In the video, it is the format in which the language models are available for download from the Hugging Face website. The script mentions that GGUF files store models with varying levels of quantization, which affects their size and performance.

💡Hugging Face

Hugging Face is a company that provides a platform for developers to share, discover, and use machine learning models, particularly for natural language processing. The video script discusses navigating the Hugging Face website to find and download GGUF-format models, highlighting its role in accessing language models.

💡7 billion parameter models

These are language models with 7 billion parameters, the variables a model learns from its training data. The video suggests that 7-billion-parameter models are a sweet spot for local use: capable enough to be useful while still small enough to run well on a typical machine.

💡Quantization

Quantization in the context of machine learning models refers to the process of reducing the precision of the model's parameters to use fewer bits, which makes the models smaller and faster but can slightly reduce their accuracy. The video script discusses different versions of a model with varying levels of quantization.
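
For orientation, TheBloke's GGUF repositories typically offer a ladder of variants along these lines (the names follow the llama.cpp quantization convention, the descriptions paraphrase his model cards, and exact sizes vary by model):

  • Q2_K: smallest and fastest, with the largest quality loss
  • Q4_K_M: medium size, balanced quality, usually marked as recommended (the video's pick, 4.1 GB here)
  • Q5_K_M: larger, with low quality loss
  • Q8_0: near-original quality, but the biggest and slowest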

💡Hugging Face Hub CLI

Hugging Face Hub CLI is a command-line interface tool that allows users to interact with the Hugging Face Hub, a platform for sharing and discovering machine learning models. In the video, it is used to download the selected GGUF model from the repository.
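
In the video the CLI comes from a Poetry-managed project; a sketch of setting it up that way, assuming a Poetry project already exists (the `huggingface-hub` package provides the `huggingface-cli` entry point):

```bash
# Add the Hub client as a project dependency, then invoke its CLI
# through Poetry so the project's virtual environment is used.
poetry add huggingface-hub
poetry run huggingface-cli download --help
```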

💡Local directory

A local directory refers to a folder on the user's own computer where files can be stored and accessed. In the script, the local directory 'downloads' is specified as the location where the downloaded GGF model file will be saved.

💡Ollama

Ollama is the tool shown in the video for running large language models (LLMs) on the user's own machine. At the time of recording it supported Mac and Linux, with Windows support in development. The script details the process of using Ollama to run the downloaded GGUF model.
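
A sketch of the Ollama commands from this part of the video (the model name `mistral-lite` and the prompt are reconstructions; use whatever names you prefer):

```bash
# Register the downloaded GGUF weights under a model name,
# using the Modelfile described below.
ollama create mistral-lite -f Modelfile

# Confirm the model exists, then query it interactively.
ollama list
ollama run mistral-lite "What is Grafana?"
```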

💡Model file

In the context of the video, a model file (Modelfile) is a configuration file used by Ollama to define how a particular language model should be run on the user's machine. It is compared to a Dockerfile, but for LLMs, specifying the location of the GGUF file and other settings.
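
A minimal Modelfile matching the video's setup (the GGUF file name is my best reading of the audio; point `FROM` at wherever your downloaded file actually lives):

```
# Modelfile: tells Ollama which weights to load, Dockerfile-style.
FROM ./downloads/mistrallite.Q4_K_M.gguf
```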

💡Grafana

Grafana is a visualization tool used for monitoring and observing systems. In the script, the question 'what is Grafana' serves as the example prompt given to the model once it is running.

💡Apple silicon

Apple silicon refers to the custom-designed processors developed by Apple for their devices, such as the M1 chip. The video uses a tool called 'asitop' on Apple silicon to monitor the machine's resource usage while the language model runs.
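
asitop is distributed on PyPI and needs elevated privileges to read the power metrics, so a minimal sketch of the monitoring setup looks like this (assuming Python and pip are available):

```bash
# Install the Apple-silicon monitor, then run it with sudo in one
# terminal pane while the model runs in another.
pip install asitop
sudo asitop
```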

💡Mistral AI

Mistral AI's model is one of the built-in models available in Ollama, as mentioned in the video. It is an example of a model that can be run using Ollama, showcasing the tool's capability to execute different language models locally.
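
For comparison with the GGUF workflow, a built-in model needs no download or Modelfile; a sketch, assuming the `mistral` tag in the Ollama library:

```bash
# Pull Mistral AI's model from the Ollama library and query it.
ollama pull mistral
ollama run mistral "What is Grafana?"
```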

Highlights

GGUF is a file format used for storing large language models.

The video demonstrates how to run Hugging Face models in GGUF format on a machine using Ollama.

Over 1,000 GGUF models are available on the Hugging Face website, with options to filter by parameter size, like 7B for 7 billion parameters.

Models are sorted by recently updated to surface the latest versions.

The 'Mistral Lite' model by TheBloke is selected for its balance between quality and efficiency.

Different versions of the model are available, with trade-offs between size, speed, and quality.

The Q4_K_M version of 'Mistral Lite' is recommended for medium, balanced quality.

Models can be downloaded using the Hugging Face Hub CLI.

The huggingface-hub dependency is included in the Poetry pyproject.toml file.

Downloading a specific model file avoids the excessive space needed for all the files.

HF transfer is available for faster downloads, but the CLI method is considered sufficient.

The 'Mistral Lite' GGUF file is 4.1 GB and can be found in the downloads folder.

Ollama is a tool for running large language models on Mac and Linux, with Windows support coming soon.

A Modelfile, similar to a Dockerfile, is created to specify the location of the GGUF file.

The `ollama create` command builds a model named 'mistral-lite' from the Modelfile.

The model is run using the `ollama run` command followed by the model name.

The tool 'asitop' is used to monitor the machine's GPU and RAM usage during model execution.

Ollama also comes with built-in models, like Mistral AI's.

The video concludes by showcasing the ability to run any of the thousand-plus GGUF models locally.

Transcripts

00:00

GGUF is a file format used for storing large language models, and in this video we're going to learn how to run Hugging Face models in GGUF format on our machine using Ollama. So let's head over to the Hugging Face website, click in the search bar at the top, and type in GGUF. You can see it comes back with models which mention this, and there are more than 1,000 of those, but I find that generally, for running models on a machine, it's better to get the 7-billion-parameter ones. So let's add in a 7B at the end, and you can see now we're down to just under 400 models, which is still quite a few to choose from.

00:41

On the search page, let's sort, say, by recently updated so we'll get the latest ones, and we'll scroll down and find one that we can use. TheBloke has been creating loads of these, so let's pick one that he's been working on: we'll pick Mistral Lite. If we click on that, we get this page, and it has lots of information about the model. If we go all the way down, we can see that there are actually lots of different versions of this model that have been created by TheBloke. The ones at the top tend to have been quantized much more, so they will be smaller and run quicker, but generally with less quality; as you go down, the quality goes up, but the size and the time it takes to run go up as well. There's one in the middle, Q4_K_M, that seems to be a good one: it says it's got medium, balanced quality and is recommended, so let's pick that one. If we click on that, we get taken through to a page describing that particular model.

01:36

Now, we can download these models using the Hugging Face Hub CLI. If we come over to the terminal, we're going to have a look at my Poetry pyproject.toml file, and you can see there in the middle I've got the huggingface-hub dependency. Let's start writing this: we're going to say poetry run huggingface-cli download, and then you need to put in the name of the repository, so we'll just come back and copy the repository from that page, go back, and paste it in. The next thing we need is the name of the file itself, so let's go back to the page, get that as well, and paste that in. Now, if you don't pass in the file, it will try to download all the files, which will take up a lot of space, so you probably want to make sure you just have one file. Once we've done that, we're going to tell it where to download it, so we'll say local directory 'downloads', and then we'll say don't use any symlinks, and then we're going to run it.

02:23

It comes up with this message telling you that you should use HF transfer for faster downloads, but I have actually had a look at that, and I find the method that we're using is perfectly fine, so I wouldn't necessarily recommend against using this Hugging Face CLI. Let's speed this up a bit, because we've all got things to do, and you can see it takes about a minute to download (I have got this connected on Ethernet). Afterwards, we can have a look at our downloads folder, and you can see in there we've got our Mistral Lite GGUF file, and it's 4.1 GB.

02:55

Now we're going to learn how to run this model on our machine using a tool called Ollama. Ollama is a tool that lets you run LLMs on your own machine; at the moment it works on Mac and Linux, but Windows support is coming soon. We're going to have to create something called a Modelfile, which I kind of think of as like a Dockerfile, but for LLMs. In there we can say FROM, and then we need to put the location of the GGUF file, so it's ./downloads and then the Mistral Lite GGUF file. We'll close that, and then we can call the ollama create command: we say what name we want to give our model, we'll say mistral-lite, and then point it at the location of the file, and it will say it's created it. It only takes a few seconds, and then we've got our model, which we can see if we type in ollama list; as you can see in the middle there, two seconds ago, we've got our mistral-lite:latest model.

03:45

We can then run this model using the ollama run command, so we'll say ollama run mistral-lite and ask 'what is Grafana?', and you can see it gives us a big explanation of the Grafana visualization tool.

03:57

Now, I want to conclude by showing you a tool that lets you see what's going on when you're running these models. The one that I'm using for Apple silicon is called asitop, but if you look on that page (I'll include the link below), there are other ones for different operating systems. So we're going to split the terminal into two, and I'm going to do sudo asitop, and then we're going to rerun the previous command where it was asking 'what is Grafana'. As you can see at the top, there's my machine's spec, and while this is running, the GPU is at 99% and my RAM usage is increasing by about three gigabytes or so, and then it goes back down once it's finished.

04:37

Being able to use GGUF models is quite a neat feature of Ollama, and it means you can take any of those other thousand GGUF models and run them locally. But Ollama also comes with many built-in models; one of those is Mistral AI, and if you want to learn more about that, check out this video up here, and I'll see you in the next one.
