Run your own AI (but private)

NetworkChuck

12 Mar 2024 · 22:13

Summary

TLDR: The video discusses the concept of private AI, showcasing how to set up a personal AI model on your computer, ensuring data privacy and security. The host explores the potential of private AI in enhancing job efficiency, especially in sectors with strict privacy requirements. Highlighting VMware's role in enabling private AI through their comprehensive solutions, the video emphasizes the ease of fine-tuning AI models with the right tools and infrastructure. The host also demonstrates the integration of personal knowledge bases with AI for tailored assistance, illustrating the transformative impact of private AI on personal and professional lives.

Takeaways

🌟 The concept of private AI is highlighted, emphasizing running AI models locally on one's own computer, ensuring data privacy and security.
🚀 The ease and speed of setting up a private AI model on personal devices are discussed, with the process being free and taking only about five minutes.
📚 The video introduces the idea of connecting personal knowledge bases, such as notes and documents, to a private AI for personalized assistance.
💡 Private AI can be particularly beneficial for jobs where using public AI models is restricted due to privacy and security concerns.
🌐 The role of companies like VMware in enabling on-premises AI solutions is explored, showcasing their contribution to the advancement of private AI.
🔍 The script mentions Hugging Face's platform, which hosts a community and a vast collection of AI models, many of which are free and open for use.
💻 The process of downloading and using a powerful AI model like Llama 2 is described, illustrating the accessibility of advanced AI technology to individuals.
📈 The training of AI models is discussed, including the extensive data and resources required, highlighting the capabilities of entities like Meta AI in pre-training models.
🛠️ The concept of fine-tuning AI models with proprietary data is introduced, explaining how it can be done with relatively few resources compared to initial training.
🔗 The combination of VMware's infrastructure with NVIDIA's AI tools is presented as a comprehensive solution for companies to implement private AI.
🎁 The video concludes with a quiz for viewers, offering an incentive of free coffee for those who achieve a high score, reinforcing the engagement and educational aspect of the content.

Q & A

What is the primary advantage of running a private AI model on your own computer?
-The primary advantage is that it ensures data privacy and security as the AI model runs locally without sharing data with external companies or servers.
How long does it typically take to set up and run your own AI model on your laptop?
-It is mentioned that setting up and running your own AI model on a laptop computer can be achieved in about five minutes, making it a relatively quick and easy process.
What is Hugging Face and how does it relate to AI models?
-Hugging Face is a website and a community dedicated to providing and sharing a vast array of AI models. It hosts over 505,000 AI models, many of which are open and free to use and pre-trained.
What is an LLM and what is its significance?
-LLM stands for Large Language Model, which is an artificial intelligence model pre-trained on a large dataset of text. These models are significant because they can understand and generate human-like text, making them useful for various applications such as chatbots, text summarization, and more.
How was the Llama 2 model trained and what does its training entail?
-The Llama 2 model was trained by Meta (Facebook) using over 2 trillion tokens of data from publicly available sources and over a million human-annotated examples. The training process involved a super cluster of over 6,000 GPUs and took 1.7 million GPU hours, with an estimated cost of around $20 million.
What is the role of Ollama in running private AI models?
-Ollama is a tool that allows users to run various LLMs, including Llama 2 and its uncensored versions, on their local machines. It simplifies the process of installing and utilizing these models without the need for extensive technical setup.
Why is fine-tuning an AI model important for businesses and individuals?
-Fine-tuning an AI model is important because it allows businesses and individuals to tailor the AI to their specific needs and data. This means the AI can understand and provide insights relevant to the user's unique context, such as company-specific knowledge bases or personal documents.
What is the difference between pre-training an AI model and fine-tuning it?
-Pre-training an AI model involves training the model on a large dataset to make it capable of understanding and generating human-like text. Fine-tuning, on the other hand, involves further training the pre-trained model on a smaller, more specific dataset to adapt it to a particular use case or to improve its performance in a specific domain.
How does VMware's private AI solution differ from other AI solutions?
-VMware's private AI solution provides an all-in-one package that includes the necessary hardware, software, and tools for fine-tuning and deploying AI models. It offers a robust infrastructure along with partnerships with companies like Nvidia and Intel, giving users a choice of tools and technologies to suit their needs.
What is RAG and how does it enhance the capabilities of an LLM?
-RAG stands for Retrieval-Augmented Generation. It is a technique that allows an LLM to consult a database or knowledge base for accurate information before generating a response. This enhances the LLM's ability to provide correct and relevant answers based on specific data that it may not have been fine-tuned on.
What is the significance of the quiz mentioned at the end of the video?
-The quiz is a test of the viewer's understanding of the video content. It provides an interactive way for viewers to engage with the material and assess their knowledge. The first five people to score 100% on the quiz receive free coffee from Network Chuck Coffee, adding an incentive for viewers to pay close attention to the video.

Outlines

00:00

🌐 Introducing Private AI and its Benefits

The speaker introduces the concept of private AI, emphasizing its advantages over cloud-based AI models. They highlight the importance of data privacy and the ability to run AI models locally on one's computer, without the need for an internet connection. The speaker also mentions the ease and speed of setting up private AI, and teases a more advanced demonstration involving connecting personal documents and knowledge bases to the AI for personalized assistance. The discussion extends to the professional applications of private AI, especially in jobs where the use of public AI models is restricted due to privacy and security concerns. The role of companies like VMware in enabling on-premises AI deployment is also acknowledged.

05:01

💻 Setting Up Private AI and WSL Installation

The speaker provides a step-by-step guide on setting up private AI on a personal computer. They discuss the process of installing Windows Subsystem for Linux (WSL) on Windows, which simplifies the setup of a Linux environment. The speaker then demonstrates the installation of a tool called Ollama and how it enables the running of various large language models (LLMs), including Llama 2. The segment also covers the importance of having an Nvidia GPU for enhanced performance and provides a brief comparison of running LLMs on different platforms, such as a CPU-only Linux virtual machine and Macs with M1, M2, or M3 processors.
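
Once a model has been pulled with Ollama, it can also be queried from a script instead of the interactive prompt. The following is a minimal sketch, not something shown in the video. It assumes the Ollama server is running locally on its default port (11434) and that the model is named llama2; the helper names build_request and ask are hypothetical.

```python
import json
import urllib.request

# Assumption: a local Ollama server is listening on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="llama2"):
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, model="llama2"):
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, ask("What is a pug?") would return the model's answer as a string. Nothing leaves the machine, since the request goes to localhost.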

10:02

📚 Fine-Tuning AI Models and VMware's Role

The speaker delves into the concept of fine-tuning AI models, explaining how it allows customization of models with proprietary data for specific use cases. They discuss the resource-intensive process of pre-training models, as exemplified by Meta's training of their LLM, and contrast it with the more accessible fine-tuning process. The speaker then introduces VMware's contribution to private AI, highlighting their private AI offering in partnership with Nvidia. This includes tools and infrastructure that simplify the fine-tuning process, making it more accessible for companies to develop customized AI solutions.

15:02

🔍 Implementing RAG for Personalized AI Experience

The speaker explores the use of RAG (Retrieval-Augmented Generation) for enhancing AI models with personalized data. They demonstrate how RAG can connect an AI model to a database of personal notes and journal entries, enabling the model to provide accurate and personalized responses. The speaker also discusses the potential of fine-tuning and RAG in enhancing the utility of AI in various scenarios, such as troubleshooting company code or creating customer-facing chatbots. The segment concludes with an overview of how VMware and its partners, including Nvidia and Intel, provide the necessary tools and infrastructure for deploying customized AI models.

20:04

🎥 Live Demonstration of Private GPT with Personal Knowledge Base

The speaker presents a live demonstration of a private GPT model connected to a personal knowledge base. They walk through the process of uploading a VMware article and asking the AI model questions based on the content. The speaker then shares their experience of connecting their personal journals to the AI and asking questions about their past experiences. The segment highlights the potential and ease of use of such personalized AI models, while also acknowledging the complexity and effort required to set up such a system independently, as opposed to using a comprehensive solution like VMware's private AI.

Keywords

💡Private AI

Private AI refers to artificial intelligence systems that are run locally on a user's personal computer, ensuring that the data being processed remains private and is not shared with external entities. In the context of the video, the host is emphasizing the benefits of running a private AI model, similar to ChatGPT but fully local, on one's own machine, which allows for greater control over personal data and its usage. This concept is central to the video's theme of self-empowerment through technology.

💡ChatGPT

ChatGPT is an AI model developed by OpenAI, known for its ability to generate human-like text based on the input it receives. In the video, the host discusses an alternative to ChatGPT that can be run privately on one's own computer, highlighting the benefits of having control over where and how the AI model operates. The mention of ChatGPT serves to draw a comparison with the private AI solution being proposed.

💡Data Privacy

Data privacy refers to the protection of personal information from unauthorized access, use, or disclosure. It is a significant concern for individuals and organizations alike. In the video, the host underscores the importance of data privacy by advocating for the use of private AI, which keeps data local and secure. This concept is crucial to the video's message about the responsible use of AI technology.

💡VMware

VMware is a software company that provides virtualization and cloud computing services. In the context of the video, VMware is highlighted as a sponsor and as a company enabling the possibility of running private AI models within one's own data center. The host discusses how VMware's solutions can facilitate the integration of AI technologies in a secure and controlled environment, which is essential for companies that prioritize data privacy and compliance.

💡On-Prem

On-Prem (short for on-premises) refers to the practice of hosting services, applications, or data on one's own physical servers, rather than using cloud-based services. In the video, the host talks about the benefits of running AI models on-premises, emphasizing the control and security it offers over data and operations. This concept aligns with the video's theme of self-reliance and data sovereignty.

💡Fine Tuning

Fine tuning in the context of AI refers to the process of further training a pre-trained AI model with new data to make it better suited for specific tasks or to improve its performance. In the video, the host discusses the concept of fine tuning AI models, such as the Llama model, with proprietary data to make them more relevant and useful for individual or company-specific needs. This process is key to adapting AI models to unique applications and enhancing their utility.
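
The video notes that fine-tuning changes only a small percentage of a model's parameters. One way to get a feel for that scale is to count the trainable parameters of a low-rank adapter, a technique the summary does not name and which is used here purely as an illustration. The layer count and hidden size are those of Llama 2 7B; the adapter rank and the number of adapted matrices per layer are assumptions.

```python
# Illustrative low-rank adapter sizing. The rank and the choice of which
# matrices to adapt are assumptions, not values from the video.
TOTAL_PARAMS = 7_000_000_000     # "7B" model, rounded
NUM_LAYERS = 32                  # Llama 2 7B transformer layers
HIDDEN_SIZE = 4096               # Llama 2 7B hidden dimension
ADAPTED_MATRICES_PER_LAYER = 2   # hypothetical: two attention projections
RANK = 8                         # hypothetical adapter rank


def adapter_params(layers, matrices_per_layer, hidden, rank):
    # Each adapted hidden x hidden matrix gets two small factors:
    # one of shape (hidden x rank) and one of shape (rank x hidden).
    return layers * matrices_per_layer * 2 * hidden * rank


trainable = adapter_params(NUM_LAYERS, ADAPTED_MATRICES_PER_LAYER, HIDDEN_SIZE, RANK)
fraction = trainable / TOTAL_PARAMS
print(f"trainable parameters: {trainable:,}")
print(f"share of the full model: {fraction:.4%}")
```

Under these assumptions, roughly four million parameters are trained while the other seven billion stay frozen, which is why fine-tuning can fit on far less hardware than pre-training.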

💡LLM (Large Language Model)

A Large Language Model (LLM) is a type of AI model that processes and generates text using deep learning techniques. These models are trained on vast amounts of text data and can perform various language-related tasks. In the video, the host mentions LLMs like Llama 2 and ChatGPT, emphasizing their capabilities and the potential for customization through fine tuning. The concept of LLMs is central to the discussion of private AI and its applications.

💡Hugging Face

Hugging Face is an open-source community and platform that provides a wide range of AI models, including large language models. In the video, the host visits the Hugging Face website to demonstrate the variety of AI models available for use, highlighting the vast number of options and the potential for individuals to find models that suit their needs. This concept showcases the collaborative nature of AI development and the availability of resources for private AI implementation.

💡Data Freshness

Data freshness refers to the currency or recency of data, indicating that the information is up-to-date and relevant. In the context of AI training, data freshness is crucial for ensuring that models learn from the most recent examples and trends. The video discusses the importance of data freshness in training AI models like Llama, which was trained on data up to July 2023.

💡Super Cluster

A super cluster refers to a large-scale computing infrastructure consisting of many interconnected processors or servers working together to perform complex tasks. In the context of AI, a super cluster is often used for training large language models due to the immense computational power required. The video mentions a super cluster of over 6,000 GPUs used to train the Llama model, illustrating the scale of resources needed for such endeavors.
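
The training figures quoted in the video (over 6,000 GPUs, 1.7 million GPU hours, roughly $20 million) can be turned into a rough sense of scale with a little arithmetic. This is a back-of-the-envelope sketch only: it assumes every GPU ran concurrently for the whole job, which the video does not state, and it treats "over 6,000" as exactly 6,000.

```python
GPU_HOURS = 1_700_000            # figure quoted in the video
GPUS = 6_000                     # "over 6,000 GPUs", taken as a lower bound
ESTIMATED_COST_USD = 20_000_000  # estimate quoted in the video

# Wall-clock time if all GPUs worked in parallel the whole time (assumption).
hours_if_parallel = GPU_HOURS / GPUS
days_if_parallel = hours_if_parallel / 24

# Implied average cost of one GPU hour.
cost_per_gpu_hour = ESTIMATED_COST_USD / GPU_HOURS

print(f"wall-clock time: {hours_if_parallel:.0f} hours ({days_if_parallel:.1f} days)")
print(f"implied cost: ${cost_per_gpu_hour:.2f} per GPU hour")
```

Even under the most favorable assumption, that is well over a week of a 6,000-GPU cluster, which is the investment a user inherits for free when downloading the pre-trained model.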

💡WSL (Windows Subsystem for Linux)

WSL, or Windows Subsystem for Linux, is a compatibility layer developed by Microsoft that allows Linux binary executables to run natively on Windows. In the video, the host mentions WSL as a solution for running Linux-based AI models on a Windows computer, demonstrating how technology integration can facilitate the use of advanced tools across different platforms.

💡RAG (Retrieval-Augmented Generation)

Retrieval-Augmented Generation (RAG) is a machine learning technique that combines the ability to retrieve relevant information from a database with the capability to generate new text based on that retrieved information. In the video, RAG is presented as a tool that allows a private AI model to consult a database before generating a response, ensuring accuracy and personalization in the answers. This concept is integral to the video's exploration of how AI can be tailored to individual needs and data.
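
The retrieve-then-generate flow described here can be sketched in a few lines. This is a toy illustration, not the setup used in the video: real RAG systems typically use embedding models and a vector database, whereas this sketch ranks notes by simple word-count cosine similarity. The notes, function names, and prompt wording are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical notes standing in for a personal knowledge base.
NOTES = [
    "The latest internal release is vSphere 8 Update 2.",
    "Help desk procedure: reset passwords through the self-service portal.",
    "Journal: brewed a new coffee blend and took the pug for a walk.",
]


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two word-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, docs, k=1):
    # Retrieval step: rank the documents by similarity to the question.
    q = Counter(tokenize(question))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(question, docs):
    # Augmentation step: put the retrieved text in front of the question.
    context = "\n".join(f"- {d}" for d in retrieve(question, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


print(build_prompt("What is the latest version of vSphere?", NOTES))
```

The resulting prompt would then be sent to the local LLM, so the model answers from the retrieved note rather than from whatever it memorized during pre-training.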

Highlights

The transcript discusses the concept of private AI, which is an AI model running locally on a personal computer, ensuring data privacy and security.

The presenter shares a personal experience of setting up a private AI and emphasizes its ease of use, taking only about five minutes to run on a laptop.

An introduction to Hugging Face's platform, which hosts a community dedicated to sharing and providing AI models, with over 505,000 models available.

The mention of Llama, an LLM (Large Language Model) developed by Meta (Facebook), which was trained on over 2 trillion tokens of data and a million human-annotated examples.

The presenter demonstrates the process of downloading and using a pre-trained AI model, emphasizing the power and capabilities of such models.

The discussion of how private AI can benefit professionals who are restricted from using public AI models due to privacy and security concerns in their jobs.

VMware's role in enabling private AI is highlighted, showing how companies can run their own AI models on-premises within their data centers.

The presenter guides the audience on how to install a tool called Ollama, which facilitates the running of various LLMs on macOS and Linux, and on Windows through the Windows Subsystem for Linux (WSL).

The importance of GPUs in running AI models efficiently is discussed, with the presenter comparing a GPU-backed setup against a slower CPU-only Linux virtual machine and noting that Macs with M1, M2, or M3 processors also work well.

The concept of fine-tuning AI models with proprietary data is introduced, allowing companies to train their AI models on their own data for specific use cases.

VMware's Private AI solution, in partnership with Nvidia, is presented as a comprehensive package that simplifies the process of fine-tuning AI models for companies.

The presenter explains the process of fine-tuning an LLM with a small dataset, changing only a small percentage of the model's parameters, making it a feasible task without massive resources.

The use of RAG (Retrieval-Augmented Generation) is described, which allows an LLM to consult a database of information before generating a response, ensuring accuracy.

Nvidia AI Enterprise and its tools for deploying custom LLMs are mentioned, emphasizing the ease of setup and customization provided by the platform.

Intel's partnership with VMware is highlighted, offering data scientists tools for analytics, generative AI, deep learning, and classic ML for private AI implementation.

The transcript concludes with a quiz for viewers, offering free coffee to the first five people who score 100%, encouraging engagement and knowledge retention.

Transcripts

00:00

I'm running something called private AI. It's kind of like ChatGPT,

00:03

except it's not. Everything about it is running right here on my computer.

00:07

Am I even connected to the internet?

00:08

This is private, contained, and my data isn't being shared with some random

00:12

company. So in this video I want to do two things. First,

00:15

I want to show you how to set this up.

00:16

It is ridiculously easy and fast to run your own AI on your laptop computer or

00:21

whatever. This is free, it's amazing.

00:23

It'll take you about five minutes and if you stick around until the end,

00:26

I want to show you something even crazier, a bit more advanced.

00:28

I'll show you how you can connect your knowledge base, your notes,

00:31

your documents,

00:32

your journal entries to your own private GPT and then ask it questions

00:37

about your stuff. And then second,

00:38

I want to talk about how private AI is helping us in the area we need help most:

00:42

our jobs. You may not know this,

00:44

but not everyone can use ChatGPT or something like it at their job.

00:47

Their companies won't let them mainly because of privacy and security reasons,

00:51

but if they could run their own private AI, that's a different story.

00:54

That's a whole different ballgame, and VMware is a big reason this is possible.

00:58

They're the sponsor of this video and they're enabling some amazing things that

01:01

companies can do on-prem in their own data center to run their own AI.

01:05

And it's not just the cloud man, it's like in your data center.

01:07

The stuff they're doing is crazy. We're going to talk about it here in a bit,

01:10

but tell you what, go ahead and do this. There's a link in the description.

01:13

Just go ahead and open it and take a little glimpse at what they're doing.

01:16

We're going to dive deeper,

01:16

so just go ahead and have it open right in your second monitor or something or

01:20

on the side or minimize. I don't know what you're doing.

01:22

I dunno how many monitors you have. You have three, actually, Bob,

01:25

I can see. Before we get started, I have to show you this.

01:27

You can run your own private AI that's kind of uncensored. Watch this.

01:34

So yeah, please don't do this to destroy me. Also,

01:37

make sure you're paying attention at the end of this video,

01:39

I'm doing a quiz and if you're one of the first five people to get a hundred

01:42

percent on this quiz, you're getting some free coffee. NetworkChuck Coffee.

01:46

So take some notes, study up. Let's do this.

01:51

Now real quick, before we install a private local AI model on your computer,

01:55

what does it even mean? What's an AI model? At its core,

01:58

an AI model is simply an artificial intelligence pre-trained on data we

02:02

provided. One you may have heard of is OpenAI's ChatGPT,

02:05

but it's not the only one out there. Let's take a field trip.

02:08

We're going to go to a website called huggingface.co.

02:11

Just an incredible brand name. I love it so much.

02:14

This is an entire community dedicated to providing and sharing AI models and

02:18

there are a ton. You're about to have your mind blown. Ready?

02:21

I'm going to click on models up here. Do you see that number? 505,000 AI models.

02:26

Many of these are open and free for you to use and pre-trained,

02:30

which is kind of a crazy thing. Let me show you this.

02:32

We're going to search for a model named Llama 2,

02:35

one of the most popular models out there. We'll do Llama 2 7B. Again,

02:39

I love the branding.

02:40

Llama 2 is an AI model known as an LLM, or large language model.

02:45

OpenAI's ChatGPT is also an LLM. Now this LLM,

02:48

this pre-trained AI model, was made by Meta,

02:51

AKA Facebook, and what they did to pre-train

02:54

this model is kind of insane, and the fact that we're about to download this and

02:58

use it, even crazier. Check this out: if you scroll down just a little bit,

03:01

here we go. Training data.

03:03

It was trained on over 2 trillion tokens of data from publicly available

03:07

sources. Instruction datasets: over a million human-annotated examples.

03:11

Data freshness: we're talking July 2023. I love that term.

03:15

Data freshness and getting the data was just step one.

03:18

Step two is insane because this is where the training happens.

03:21

Meta, to train this model, put together what's called a super cluster.

03:25

It already sounds cool, right? This sucker is over 6,000 GPUs.

03:29

It took 1.7 million GPU hours to train this model and it's estimated it

03:34

cost around $20 million to train it, and now Meta is just like,

03:39

here you go kid. Download this incredibly powerful thing.

03:43

I don't want to call it a being yet. I'm not ready for that,

03:46

but this intelligent source of information that you can just download on your

03:50

laptop and ask it questions,

03:51

no internet required and this is just one of the many models we could download.

03:55

They have special models like text to speech, image to image.

03:58

They even have uncensored ones. They have an uncensored version of Llama 2.

04:02

This guy George Sung,

04:04

took this model and fine tuned it with a pretty hefty GPU,

04:08

took him 19 hours, and made it to where you could pretty much ask this thing

04:11

anything you wanted. Whatever question comes to mind,

04:14

it's not going to hold back. Okay,

04:16

so how do we get this fine-tuned model onto your computer? Well,

04:19

actually I should warn you, this involves quite a bit of llamas,

04:22

more than you would expect. Our journey starts at a tool called Ollama.

04:26

Let's go ahead and take a field trip out there real quick.

04:28

We'll go to ollama.ai. All we'll have to do is install this little guy, Mr.

04:32

Ollama,

04:32

and then we can run a ton of different LLMs: Llama 2, Code Llama (told you, lots

04:37

of llamas), and there's others that are pretty fun, like Llama 2 Uncensored or

04:41

Mistral. I'll show you in a second. But first, what do we install Ollama on?

04:46

We can see right down here that we have it available on macOS and Linux,

04:49

but, oh bummer, Windows is coming soon.

04:52

It's okay because we've got WSL, the Windows Subsystem for Linux,

04:56

which is now really easy to set up.

04:58

So we'll go ahead and click on download right here. For macOS,

05:01

you'll just simply download this and install it like one of your regular

05:04

applications. For Linux, we'll click on this.

05:07

We get a fun curl command that we'll copy and paste. Now, because we're going to

05:09

install WSL on Windows, this will be the same step. So macOS folks,

05:15

go ahead and just run that installer. Linux and Windows folks, let's keep going.

05:19

Now, if you're on Windows,

05:20

all you have to do now to get WSL installed is launch your Windows terminal.

05:23

Just go to your search bar and search for terminal and with one command it'll

05:27

just happen (it used to be so much harder), which is wsl --install.

05:32

It'll go through a few steps. It'll install Ubuntu as default.

05:35

I'll go ahead and let that do that. And boom, just like that.

05:39

I've got Ubuntu 22.04.3 LTS installed and I'm actually inside of it right

05:44

now. So now at this point, Linux and Windows folks, we converged.

05:47

We're on the same path. Let's install Ollama.

05:49

I'm going to copy that curl command that Ollama gave us,

05:52

jump back into my terminal, paste that in there and press enter.

05:55

Fingers crossed, everything should be great. Like the way it is right now,

05:59

it'll ask for my sudo password and that was it. Ollama is now installed.

06:04

Now this will directly apply to Linux people and Windows people.

06:07

See right here where it says Nvidia GPU installed. If you have that,

06:10

you're going to have a better time than other people who don't have that.

06:13

I'll show you here in a second. If you don't have it, that's fine.

06:15

We'll keep going. Now let's run an LLM. We'll start with Llama 2.

06:18

So we'll simply type in ollama run,

06:22

and then we'll pick one, llama2, and that's it. Ready,

06:26

set go. It's going to pull the manifest.

06:28

It'll then start pulling down and downloading Llama 2.

06:31

And I want you to just realize this, that powerful Llama 2 pre-training,

06:34

we talked about all the money and hours spent. That's how big it is.

06:38

This is the 7 billion parameter model, or the 7B.

06:42

It's pretty powerful and we're about to literally have this in the palm of our

06:45

hands in like 3, 2, 1. Oh, I thought I had it. Anyways,

06:49

it's almost done. And boom, it's done.

06:52

We've got a nice success message right here and it's ready for us.

06:56

We can ask it anything. Let's try: what is a pug?

06:59

Now the reason this is going so fast, just like a side note,

07:01

is that I'm running a GPU and AI models love GPUs.

07:05

So lemme just show you real quick.

07:06

I did install Ollama on a Linux virtual machine and I'll just demo the

07:10

performance for you real quick. By the way, if you're running a Mac with an M1,

07:13

M2, or M3 processor, it actually works great. I forgot to install it.

07:17

I got to install it real quick and I'll ask it that same question.

07:19

What is a pug? It's going to take a minute, it'll still work,

07:22

but it's going to be slower on CPUs and there it goes. It didn't take too long,

07:25

but notice it is a bit slower.

07:27

Now if you're running WSL and you know you have an Nvidia GPU and it didn't show up,

07:31

I'll show you in a minute how you can get those drivers installed. But anyways,

07:34

just sit back for a minute,

07:35

sip your coffee and think about how powerful this is.

07:38

The tinfoil hat version of me stinking loves this because let's say

07:43

the zombie apocalypse happens, right? The grid goes down, things are crazy,

07:47

but as long as I have my laptop and a solar panel,

07:51

I still have AI and it can help me survive the zombie apocalypse.

07:55

Let's actually see how that would work. It gives me next steps.

07:58

I could have it help me with the water filtration system. This is just cool,

08:01

right? It's amazing. But can I show you something funny?

08:04

You may have caught this earlier. Who is NetworkChuck?

08:09

What? Dude, I've always wanted to be Rick Grimes.

08:14

That is so fun, but seriously, it kind of hallucinated there.

08:17

It didn't have the correct information.

08:19

It's so funny how it mixed the zombie apocalypse prompt with me.

08:23

I love that so much. Let's try a different model. I'll say bye.

08:27

I'll try a really fun one called Mistral. And by the way,

08:30

if you want to know which ones you can run with Ollama, which LLMs,

08:33

they get a page for their models right here and all the ones you can run,

08:36

including Llama 2 Uncensored, Wizard Math.

08:39

I might give that to my kids actually. Let's see what it says.

08:41

Now who is Network Chuck?

08:45

Now my name is not Chuck Davis and my YouTube channel is not called Network

08:50

Chuck on Tech.

08:50

So clearly the data this thing was trained on is either not up to date or just

08:54

plain wrong. So now the question is cool,

08:57

we've got this local private ai, this LLM, that's super powerful,

09:02

but how do we teach it the correct information for us?

09:05

How can I teach it to know that I'm NetworkChuck, Chuck Keith, not Chuck Davis,

09:08

and my channel is called Network Chuck.

09:09

Or maybe I'm a business and I want it to know more than just what's publicly

09:13

available? Because sure, right now if you downloaded this LLM,

09:16

you could probably use it in your job,

09:17

but you can only go so far without it knowing more about your job. For example,

09:22

maybe you're on a help desk.

09:23

Imagine if you could take your help desk's knowledge base, your IT procedures,

09:27

your documentation. Not only that,

09:29

but maybe you have a database of closed tickets, open tickets.

09:31

If you could take all that data and feed it to this LLM and then ask it

09:35

questions about all of that, that would be crazy.

09:38

Or maybe you wanted to help troubleshoot code that your company's written.

09:41

You could even make this LLM public-facing for your customers.

09:44

You feed information about your product and the customer could interact with

09:47

that chatbot you make, maybe.

09:49

This is all possible with a process called fine tuning where we can train

09:53

this AI on our own proprietary secret private stuff about our

09:58

company or maybe our lives or whatever you want to use it for,

10:00

whatever your use case is,

10:01

and this is fantastic because maybe before you couldn't use a public LLM because

10:05

you weren't allowed to share your company's data with that LLM,

10:08

whether it's compliance reasons or you just simply didn't want to share that

10:10

data because it's secret. Whatever the case,

10:12

it's possible now because this AI is private,

10:15

it's local and whatever data you feed to it,

10:18

it's going to stay right there in your company. It's not leaving the door.

10:20

That idea just makes me so excited because I think it is the future of AI and

10:24

how companies and individuals will approach it. It's going to be more private.

10:28

Back to our question though, fine tuning, that sounds cool.

10:31

Training an AI on your own data, but how does that work?

10:34

Because as we saw before with pre-training a model with Meta,

10:38

it took them 6,000 GPUs over 1.7 million GPU hours.

10:42

Do we have to have this massive data center to make this happen? No.

10:46

Check this out, and this is such a fun example. VMware, they asked ChatGPT,

10:50

what's the latest version of VMware vSphere?

10:52

Now the latest ChatGPT knew about was vSphere 7.0,

10:55

but that wasn't helpful to VMware because the latest version they were working

10:58

on, which hadn't been released yet

10:59

and so wasn't public knowledge, was vSphere 8 Update 2.

11:02

And they wanted information like this internal information not yet released to

11:06

the public.

11:07

They wanted this to be available to their internal team so they could ask

11:10

something like ChatGPT, "Hey, what's the latest version of vSphere?"

11:14

And it could answer correctly.

11:15

So to do what VMware is trying to do, to fine-tune a model or train it on new

11:19

data, it does require a lot. First of all,

11:22

you would need some hardware servers with GPUs.

11:24

Then you would also need a bunch of tools and libraries and SDKs like PyTorch

11:29

and TensorFlow, pandas, NumPy, scikit-learn, Transformers and fastai.

11:33

The list goes on.

11:34

You need lots of tools and resources in order to fine tune an LLM.

11:37

That's why I'm a massive fan of what VMware is doing right here.

11:40

They have something called VMware Private AI with NVIDIA.

11:44

The gajillion things I just listed off, they include in one package,

11:49

one combo meal, a recipe of AI fine-tuning goodness.

11:53

So as a company it becomes a bit easier to do this stuff yourself locally.

11:57

For the system engineer you have on staff who knows VMware and loves it,

12:00

they could do this stuff,

12:01

they could implement this and the data scientists they have on staff that will

12:04

actually do some of the fine tuning, all the tools are right there.

12:07

So here's what it looks like to fine tune and we're going to kind of peek behind

12:10

the curtain at what a data scientist actually does.

12:12

So first we have the infrastructure and we start here in vSphere, VMware.

12:17

Now if you don't know what vSphere is or VMware, think virtual machines,

12:20

you got one big physical server. The hardware, the stuff you can feel,

12:23

touch and smell. If you haven't smelled a server, I dunno what you're doing.

12:26

And instead of installing one operating system on it, like Windows or Linux,

12:29

you install VMware's ESXi,

12:31

which will then allow you to virtualize or create a bunch of additional virtual

12:35

computers. So instead of one computer,

12:37

you've got a bunch of computers all using the same hardware resources.

12:40

And that's what we have right here. One of those virtual computers,

12:43

a virtual machine.

12:44

This by the way is one of their special deep learning VMs that has all the tools

12:49

I mentioned and many, many more pre-installed, ready to go.

12:53

Everything a data scientist could love.

12:55

It's kind of like a surgeon walking in to do some surgery and like their doctor

12:59

assistants or whatever have prepared all their tools.

13:01

It's all in the tray, laid out nice and neat for the surgeon.

13:04

All he has to do is walk in and just go, "Scalpel."

13:08

That's what we're doing here for the data scientist.

13:10

Now talking more about hardware,

13:11

this guy has a couple of NVIDIA GPUs assigned to it, or passed through to it, through

13:16

a technology called PCIe passthrough. These are some beefy GPUs.

13:20

And notice they are vGPU, for virtual GPU, similar to what you do with the CPU,

13:25

cutting up the CPU and assigning some of that to a virtual CPU on a virtual

13:29

machine. So here we are in the data scientist's world. This is a Jupyter notebook,

13:33

a common tool used by a data scientist,

13:35

and what you're going to see here is a lot of code that they're using to prepare

13:37

the data,

13:38

specifically the data that they're going to train or fine tune the existing

13:42

model on. Now we're not going to dive deep on that,

13:44

but I do want you to see this, check this out.

13:45

A lot of this code is all about getting the data ready. So in VMware's case,

13:48

it might be a bunch of the knowledge base product documentation and they're

13:51

getting it ready to be fed to the LLM. And here's what I wanted you to see.

13:55

Here's the dataset that we're training this model on. We're fine tuning.

13:59

We only have 9,800 examples that we're giving it or 9,800 new prompts or

14:04

pieces of data. And that data might look like this,

14:06

like a simple question or a prompt and then we feed it the correct answer and

14:11

that's how we essentially train AI. But again,

14:14

we're only giving it 9,800 examples,

14:16

which is not a lot at all and is extremely small compared to how the

14:20

model was originally trained.
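
To make the "prompt plus correct answer" idea concrete, here is a minimal sketch of what such a dataset could look like on disk. The two records, the `### Question` / `### Answer` template, and the helper names are all hypothetical; the actual dataset and format shown in the notebook are not reproduced here.

```python
import json

# Hypothetical examples; the real 9,800-example dataset is not shown here.
examples = [
    {"prompt": "What is the latest version of VMware vSphere?",
     "answer": "vSphere 8 Update 2."},
    {"prompt": "What does ESXi do?",
     "answer": "It is the hypervisor that runs virtual machines on a physical server."},
]

def to_training_text(example: dict) -> str:
    """Join a prompt and its correct answer into one training string.

    The template is an assumption; real projects pick their own format.
    """
    return f"### Question:\n{example['prompt']}\n### Answer:\n{example['answer']}"

def to_jsonl(records: list[dict]) -> str:
    """Serialize records as JSON Lines, one training example per line."""
    return "\n".join(json.dumps({"text": to_training_text(r)}) for r in records)

jsonl = to_jsonl(examples)
print(jsonl)
```

Each line of the output is one self-contained example, which is a common way to hand question-and-answer pairs to a fine-tuning script.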

14:22

And I point that out to say that we're not going to need a ton of hardware or a

14:25

ton of resources to fine tune this model.

14:28

We won't need the 6,000 GPUs Meta needed to originally create this model.

14:32

We're just adding to it,

14:33

changing some things or fine tuning it to what our use case is and looking at

14:37

what actually will be changed when we run this and we train it,

14:41

we're only changing 65 million parameters, which sounds like a lot, right?

14:46

But not in the grand scheme of things of like a 7 billion parameter model.

14:49

We're only changing 0.93% of the model.
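
The percentage is easy to check yourself. This small sketch uses the two figures quoted in the video and treats 7 billion as the nominal model size, which is an approximation.

```python
# Figures quoted in the video; 7 billion is the nominal (rounded) model size.
TRAINABLE_PARAMS = 65_000_000
TOTAL_PARAMS = 7_000_000_000

trainable_percent = 100 * TRAINABLE_PARAMS / TOTAL_PARAMS
print(f"{trainable_percent:.2f}% of parameters are updated")
```

The other 99% or so of the model stays frozen, which is why the job fits on a couple of GPUs instead of a data center.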

14:52

And then we can actually run our fine tuning,

14:54

which uses a specific fine-tuning technique called prompt tuning, where we

14:58

simply feed it additional prompts with answers to change how it'll react to

15:02

people asking it questions.

15:03

This process will take three to four minutes to fine tune it because again,

15:06

we're not changing a lot and that is just so super powerful and I think VMware

15:10

is leading the charge with private AI.

15:12

VMware and Nvidia take all the guesswork out of getting things set up to fine

15:17

tune an LLM. They've got deep learning VMs,

15:19

which are insane VMs that come pre-installed with everything you could want,

15:23

everything a data scientist would need to fine-tune an LLM.

15:26

Then NVIDIA has an entire suite of tools centered around their GPUs,

15:29

taking advantage of some really exciting things to help you fine-tune your LLMs.

15:33

Now there's one thing I didn't talk about because I wanted to save it for last,

15:36

for right now. It's this right here, this vector database,

15:39

PostgreSQL box here.

15:42

This is something called RAG, and it's what we're about to do with our own

15:46

personal GPT here in a bit. Retrieval-augmented generation. So, scenario:

15:51

let's say you have a database of product information, internal docs,

15:54

whatever it is, and you haven't fine tuned your LLM on this just yet.

15:58

So it doesn't know about it. You don't have to do that with RAG.

16:01

You can connect your LLM to this database of information,

16:05

this knowledge base and give it these instructions.

16:08

Say, "Whenever I ask you a question about any of the things in this database,

16:11

before you answer, consult the database,

16:13

go look at it and make sure what you're saying is accurate."

16:16

We're not retraining the LLM, we're just saying, Hey, before you answer,

16:20

go check real quick in this database to make sure it's accurate to make sure you

16:23

got your stuff right. Isn't that cool? So yes,

16:25

fine tuning is cool and training an LLM on your own data is awesome,

16:29

but in between those moments of fine tuning,

16:31

you can have RAG set up where it can consult your database,

16:34

your internal documentation and give correct answers based on what you have in

16:38

that database. That is so stinking cool.
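
Here is a toy sketch of that "consult the database before you answer" flow. It swaps in plain word counts and cosine similarity for the learned embeddings and vector database a real setup would use, and the documents, function names, and prompt wording are all hypothetical. The point is only the shape of the pipeline: retrieve the most relevant text, then put it in front of the question.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge-base entries standing in for internal docs.
DOCUMENTS = [
    "The latest internal release is vSphere 8 Update 2.",
    "Deep learning VMs ship with PyTorch and TensorFlow pre-installed.",
    "Help desk tickets are closed after the customer confirms the fix.",
]

def vectorize(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (real systems use a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the question."""
    q = vectorize(question)
    return sorted(docs, key=lambda d: cosine(q, vectorize(d)), reverse=True)[:k]

def build_prompt(question: str, docs: list[str]) -> str:
    """Prepend the retrieved context so the LLM answers from it, with no retraining."""
    context = "\n".join(retrieve(question, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

prompt = build_prompt("What is the latest release of vSphere?", DOCUMENTS)
print(prompt)
```

The resulting prompt would then be sent to whatever local LLM you are running. The model's weights never change; only the text it sees does.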

16:40

So with VMware Private AI Foundation with NVIDIA,

16:43

they have those tools baked right in to where it just kind of works for what

16:47

would otherwise be a very complex setup. And by the way, this whole RAG thing,

16:51

like I said earlier, we're about to do this,

16:53

I actually connected a lot of my notes and journal entries to a private GPT

16:58

using RAG and I was able to talk with it about me asking it about my

17:03

journal entries and answering questions about my past. That's so powerful. Now,

17:07

before we move on,

17:08

I just want to highlight the fact that NVIDIA, with their NVIDIA AI Enterprise,

17:12

gives you some amazing and fantastic tools to pull the LLM of your choice and

17:17

then fine tune and customize and deploy that LLM. It's all built in right here.

17:21

So VMware Cloud Foundation,

17:22

they provide the robust infrastructure and NVIDIA provides all the amazing AI

17:26

tools you need to develop and deploy these custom LLMs.

17:29

Now it's not just Nvidia, they're partnering with Intel as well.

17:31

So VMware is covering all the tools that admins care about.

17:34

And then for the data scientists, this is for you.

17:36

Intel's got your back: data analytics,

17:38

generative AI and deep learning tools and some classic ML or machine learning.

17:42

And they're also working with IBM, all you IBM fans. You can do this too. Again,

17:46

VMware has the admin's back. But for the data scientist, Watson,

17:49

one of the first AI things I ever heard about, Red Hat, and OpenShift,

17:52

and I love this because what VMware is doing is all about choice.

17:55

If you want to run your own local private AI, you can.

17:58

You're not just stuck with one of the big guys out there and you can choose to

18:00

run it with Nvidia and VMware, Intel and VMware, IBM and VMware.

18:04

You got options. So there's nothing stopping you.

18:06

Now for the bonus section of this video, and that's how to run your

18:09

own private GPT with your own knowledge base. Now, fair warning,

18:14

it is a bit more advanced, but if you stick with me,

18:16

you should be able to get this up and running. So take one more sip of coffee.

18:20

Let's get this going. Now, first of all, this will not be using Ollama.

18:23

This will be a separate project called PrivateGPT. Now, disclaimer:

18:26

this is kind of hard to do, unlike VMware Private AI,

18:29

where they do it all for you;

18:30

it's a complete solution for companies to run their own private local AI.

18:34

What I'm about to show you is not that at all. No affiliation with VMware.

18:37

It's a free side project

18:39

you can try just to get a little taste of what running your own private GPT with

18:44

RAG tastes like. Did I do that right? I don't know.

18:47

Now Iván Martínez has a great doc on how to install this. It's a lot,

18:51

but you can do it. And if you just want a quick start,

18:53

he does have a few lines of code for Linux and Mac users. Fair warning,

18:57

this is CPU only. You can't really take advantage of RAG without a GPU,

19:00

which is what I wanted to do. So here's my very specific scenario.

19:03

I've got a Windows PC with an NVIDIA 4090. How do I run this

19:06

Linux-based project? WSL. And I'm so thankful to this guy, Emilien Lancelot.

19:11

He put an entire guide together of how to set this up.

19:14

I'm not going to walk you through every step because he already did that, link

19:17

below, but I seriously need to buy this guy a coffee. How do I do that?

19:20

I don't know. Emilien, if you're watching this, reach out to me.

19:22

I'll send you some coffee. So anyways,

19:24

I went through every step from installing all the prereqs to installing NVIDIA

19:27

drivers and using Poetry to handle dependencies, and Poetry is pretty cool.

19:31

I landed here.

19:32

I've got a private, local, working PrivateGPT that I can access through my web

19:36

browser and it's using my GPU, which is pretty cool. Now,

19:38

first I tried a simple document upload.

19:40

I got this VMware article that details a lot of what we talked about in this

19:43

video. I uploaded it and I started asking it questions about this article.

19:46

I tried something specific like show me something about VMware AI market growth.

19:50

Bam, it figured it out, it told me. Then I'm like,

19:52

what's the coolest thing about VMware Private AI?

19:55

It told me. I'm sitting here chatting with a document, but then I'm like,

19:58

let's try something bigger. I want to chat with my journals.

20:00

I've got a ton of journals in Markdown format and I want to ask it questions

20:03

about me. Now this specific step is not covered in the article.

20:06

So here's how you do it. First,

20:07

you'll want to grab your folder of whatever documents you want to ask questions

20:10

about and throw it onto your machine.

20:12

So I copied it over to my WSL machine and then I ingested it with this command. Once

20:16

complete, I ran PrivateGPT again, and

20:18

here's all my documents and I'm ready to ask it questions.
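
If you're curious what "ingesting" a folder of notes involves conceptually, here is a minimal sketch. It is not PrivateGPT's ingest code or command; the function name, the `journals` folder, and the fixed 500-character chunk size are all hypothetical. It only shows the usual first step: read each Markdown file and cut it into chunks that could later be embedded and stored.

```python
from pathlib import Path

def load_markdown_chunks(folder: str, chunk_chars: int = 500) -> list[dict]:
    """Read every .md file under `folder` and split it into fixed-size text chunks."""
    chunks = []
    for path in sorted(Path(folder).rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        for start in range(0, len(text), chunk_chars):
            chunks.append({"source": path.name, "text": text[start:start + chunk_chars]})
    return chunks

if __name__ == "__main__":
    # "journals" is a hypothetical folder name; point it at your own notes.
    for chunk in load_markdown_chunks("journals")[:3]:
        print(chunk["source"], len(chunk["text"]))
```

Keeping the source file name with each chunk is what lets a RAG setup tell you which note an answer came from.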

20:21

So let's test this out. I'm going to ask it, what did I do in Takayama?

20:26

So I went to Japan in November of 2023. Let's see if it can search my notes,

20:31

figure out when that was and what I did.

20:36

That's awesome. Oh my goodness.

20:41

Let's see, what did I eat in Tokyo?

20:45

How cool is that? Oh my gosh, that's so fun. Now, it's not perfect,

20:49

but I can see the potential here. That's insane. I love this so much.

20:53

Private AI is the future and that's why we're seeing VMware bring products like

20:57

this to companies to run their own private local AI and then make it pretty

21:01

easy. If you actually did that private GPT thing, that little side project,

21:04

there's a lot to it. Lots of tools you have to install, it's kind of a pain.

21:07

But with VMware,

21:08

they kind of cover everything like that deep learning VM they offer as part of

21:11

their solution. It's got all the tools ready to go, pre-baked. Again,

21:15

you're like a surgeon just walking in saying scalpel.

21:17

You got all this stuff right there. So if you want to bring AI to your company,

21:20

check out VMware Private AI, link below, and thank you to VMware by Broadcom for

21:24

sponsoring this video. You made it to the end of the video. Time for a quiz.

21:28

This quiz will test the knowledge you've gained in this video and the first five

21:32

people to get a hundred percent on this quiz will get free coffee from Network

21:36

Chuck Coffee. So here's how you take the quiz right now.

21:38

Check the description of this video and click on this link.

21:41

If you're not currently signed into the academy, go ahead and get signed in.

21:43

If you're not a member, go ahead and click on sign up. It's free.

21:47

Once you're signed in,

21:48

it will take you to your dashboard showing you all the stuff you have access to

21:51

with your free academy account. But to get right back to that quiz,

21:54

go back to the YouTube video,

21:55

click on that link once more and it should take you right to it.

21:58

Go ahead and click on start now and start your quiz. Here's a little preview.

22:03

That's it. The first five to get a hundred percent get free coffee.

22:06

If you're one of the five,

22:06

you'll know because you'll receive an email with free coffee.

22:09

You got to be quick, you got to be smart. I'll see you guys in the next video.
