Install Yi-1.5 Model Locally - Beats Llama 3 in Various Benchmarks
Summary
TLDR: In this video, the presenter introduces Yi-1.5, an upgraded pre-trained AI language model with enhanced capabilities in coding, math reasoning, and instruction following. The model, available in three sizes, is released under the Apache 2.0 license, a first for the series. The presenter installs the 6-billion-parameter version on their local system, showcasing its performance on various benchmarks and tasks, including language understanding, common-sense reasoning, and coding questions. The model's responses are impressive, demonstrating high-quality outputs and ethical guardrails, such as refusing to provide information on illegal activities.
Takeaways
- The video introduces the new Yi-1.5 model, an upgrade to the previous Yi models, which were known for their high quality.
- Yi-1.5 is a significant improvement over Yi, continually pre-trained on a 500-billion-token corpus and fine-tuned on 3 million samples, enhancing performance in coding, math reasoning, and instruction following.
- Yi-1.5 maintains strong capabilities in language understanding, common-sense reasoning, and reading comprehension.
- There are three versions of the model: 34 billion, 9 billion, and 6 billion parameters; the video focuses on installing the 6-billion-parameter version.
- The 6-billion-parameter model requires at least 16 GB of VRAM, suitable for the presenter's system with a single GPU card.
- Benchmarks show that Yi-1.5 performs exceptionally well, with the 34B version matching or outperforming larger models and the 9B version being a top performer among similarly sized models.
- The model is released under the Apache 2.0 license, the first such release for these models, which is a significant contribution to the community.
- The video demonstrates the installation process on a local system, including setting up a Python environment and cloning the model's repository.
- The presenter provides a link to the model's Hugging Face model card for viewers to access the model path and other necessary details.
- The video includes a demonstration of the model's capabilities through various prompts, including defining happiness, coding questions, and a logic puzzle about a ball in a vase.
- The model refuses to answer a 'jailbreak' question about breaking into a car, citing ethical guidelines and suggesting legitimate solutions instead.
- The video shows the model solving a math problem by following the order of operations, demonstrating its reasoning ability.
Q & A
What is the new Yi-1.5 model introduced in the video?
-Yi-1.5 is an upgraded version of Yi, continually pre-trained on a high-quality corpus of 500 billion tokens and fine-tuned on 3 million diverse fine-tuning samples. It offers stronger performance in coding, math reasoning, and instruction-following capabilities.
What are the three different sizes of the model mentioned in the video?
-The three sizes are 34 billion, 9 billion, and 6 billion parameters, with the 6-billion variant being installed on the presenter's local system.
How does Yi-1.5 perform compared to other models in benchmarks?
-Yi-1.5, especially the 34B variant, performs on par with or exceeds larger models in most benchmarks. The 9B variant is a top performer among similarly sized open-source models.
What is special about the license of the Yi-1.5 models?
-They are licensed under Apache 2.0, marking the first Apache 2.0 release of these models, which is considered a significant contribution to the open-source community.
What is the minimum VRAM requirement for running the 6-billion-parameter model?
-The 6-billion-parameter model requires at least 16 GB of VRAM.
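As a quick sanity check before downloading the weights, you can query the GPU's total memory. This is a minimal sketch (not from the video) that shells out to nvidia-smi, which is assumed to be on the PATH; it returns 0 when no NVIDIA tooling is available.

```python
import subprocess

def vram_mib(gpu_index: int = 0) -> int:
    """Total VRAM of one GPU in MiB via nvidia-smi, or 0 if unavailable."""
    try:
        lines = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout.splitlines()
        return int(lines[gpu_index])
    except (FileNotFoundError, subprocess.CalledProcessError,
            IndexError, ValueError):
        return 0

# The 6B model wants roughly 16 GB, i.e. about 16384 MiB:
print(vram_mib() >= 16 * 1024)
```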
What steps are involved in setting up the model on a local system?
-The steps include creating a clean environment using conda, cloning the Yi repository, installing the requirements from the repo, and setting the model path and tokenizer before downloading the model.
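The steps above can be sketched as a short command sequence. This is a hedged sketch, not the video's exact session: the repository URL (the official 01-ai/Yi repo on GitHub) and the environment name are assumptions, since the video only links them in its description.

```shell
# Create an isolated environment (Python 3.10+ is required; 3.11 is used in the video)
conda create -n yi python=3.11 -y
conda activate yi

# Clone the Yi repository (assumed URL) and install its dependencies
git clone https://github.com/01-ai/Yi.git
cd Yi
pip install -r requirements.txt
```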
How does the model handle the prompt 'What is happiness?'
-It provides a comprehensive response describing happiness as a complex and subjective state of well-being involving contentment, fulfillment, and joy, noting that it is deeply personal and varies from person to person.
What was the outcome when the presenter asked the model to write 10 sentences ending with the word 'beauty'?
-The model did not follow the instruction precisely; it provided sentences related to beauty but did not ensure each one ended with the word 'beauty'.
How did the model respond to the question about the location of a ball in an upside-down vase?
-It correctly inferred that the ball would be on the coffee table in the living room, having fallen out of the vase when the vase was turned upside down and moved.
What advice did the model give when asked about breaking into a car after losing the keys?
-It empathized with the situation and advised against breaking into the car, suggesting alternatives such as contacting a locksmith, using a car key extractor tool, or replacing the key.
How did the model handle a simple math question involving an equation?
-It provided a step-by-step solution, following the order of operations (PEMDAS), and arrived at the correct answer.
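The exact equation from the video isn't reproduced in this summary, so the expression below is a made-up stand-in; the point is only to illustrate the PEMDAS order the model followed.

```python
def solve_step_by_step() -> float:
    """Evaluate 3 + 6 * (5 + 4) / 3 - 7 one PEMDAS step at a time."""
    inner = 5 + 4            # Parentheses first: 9
    product = 6 * inner      # Multiplication: 54
    quotient = product / 3   # Division: 18.0
    return 3 + quotient - 7  # Addition/subtraction, left to right: 14.0

print(solve_step_by_step())  # → 14.0
```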
Outlines
Introduction to the Yi-1.5 Upgrade
The speaker expresses excitement about the new Yi-1.5 model, an upgrade to the previously covered Yi models known for their quality. Yi-1.5 is highlighted for its enhanced capabilities in coding, math reasoning, and instruction following, achieved through continued pre-training on a vast corpus and fine-tuning on diverse samples. The video demonstrates installing the model locally, focusing on the 6-billion-parameter version due to hardware constraints. The speaker also praises the model's open-source Apache 2.0 license as a community service and proceeds to show the system setup and installation process.
Setting Up the Yi-1.5 Environment
The video details the technical steps for setting up the environment. This includes creating a conda environment for organization, installing Python 3.11, and cloning the Yi repository. The speaker instructs viewers to install the necessary requirements from the repo and guides them through downloading and setting up the model using a specified path. The focus is on ensuring that all prerequisites are met for running the model, including sufficient VRAM and system memory.
Testing Yi-1.5's Capabilities
After setting up the environment, the video moves on to testing the model's capabilities. The model is prompted with various tasks, including defining happiness, answering a coding question, generating sentences ending with the word 'beauty', and reasoning about a physical scenario involving a ball and a vase. The responses are evaluated, with particular attention to language understanding, common-sense reasoning, and reading comprehension. The video also includes a 'jailbreak' question to test the model's ethical guidelines, which it handles appropriately by suggesting legal alternatives to breaking into a car.
Yi-1.5's Ethical and Mathematical Reasoning
The video concludes with further testing, specifically of ethical reasoning and mathematical problem-solving. When asked about breaking into a car, the model empathizes but advises against illegal actions, suggesting legitimate solutions instead. A simple math problem is also presented, which the model solves methodically, demonstrating its understanding of the order of operations. The speaker expresses admiration for the model's performance, even in the 6-billion-parameter version, and invites viewers to explore the model further through the provided links.
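The interpreter session described above can be condensed into a short script. This is a sketch following the standard Hugging Face transformers API, not the video's exact code; the model id 01-ai/Yi-1.5-6B-Chat is taken from the model card, and running the commented-out call downloads roughly 12 GB of weights and assumes about 16 GB of VRAM.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

def chat(model_path: str, prompt: str, max_new_tokens: int = 512) -> str:
    """Load a chat model and answer a single prompt."""
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        torch_dtype="auto",
        device_map="auto",   # place the weights on the available GPU
    )
    messages = [{"role": "user", "content": prompt}]
    input_ids = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens before decoding so only the reply is returned
    return tokenizer.decode(output_ids[0][input_ids.shape[-1]:],
                            skip_special_tokens=True)

# Downloads ~12 GB of weights on first run; needs roughly 16 GB of VRAM:
# print(chat("01-ai/Yi-1.5-6B-Chat", "What is happiness?"))
```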
Keywords
Yi model
Fine-tuning
Benchmarking
Apache 2.0
Language Understanding
Common-Sense Reasoning
Reading Comprehension
VRAM
Tokenizer
Chain of Thought
Ethical Considerations
Highlights
Introduction of the new Yi-1.5 model, an upgrade to the previous Yi models.
Yi-1.5 is an enhanced version of Yi, pre-trained on a high-quality 500-billion-token corpus and fine-tuned on 3 million samples.
Yi-1.5 shows improved performance in coding, math reasoning, and instruction following.
Installation of the model locally for testing against benchmarks.
Yi-1.5 maintains excellent capability in language understanding, common-sense reasoning, and reading comprehension.
Three versions available: 34 billion, 9 billion, and 6 billion parameters.
The 6-billion-parameter version is chosen for installation due to the system's GPU capacity.
Benchmarking results show Yi-1.5's strong performance compared to larger models.
Yi-1.5 9B outperforms similarly sized models in various benchmarks.
The Apache 2.0 license is a first for these models, showing a commitment to open source.
Demonstration of setting up a conda environment for a clean installation process.
Instructions for cloning the Yi repository and installing its requirements.
Downloading and loading the model using a specified model path and tokenizer.
A prompt about 'happiness' is used to test the model's response quality.
The model provides a thoughtful and comprehensive definition of happiness.
A coding question is asked to test the model's problem-solving capabilities.
The model generates a correct and well-explained coding solution.
A creative-writing prompt is given, but the model does not follow the instruction correctly.
The model's response to a logic question about a ball in a vase is accurate and logical.
The model refuses to provide information on illegal activities, even when framed as a personal issue.
A math problem is solved by the model, demonstrating its reasoning and problem-solving abilities.
Impressive performance from the Yi-1.5 6B model, with anticipation for the capabilities of the 34B model.
Invitation for viewers to share their thoughts and subscribe to the channel for more content.
Transcripts
Hello guys, I'm very excited to share the new Yi model with you. Previously I have covered various flavors of Yi models on the channel, and I have always found them of very good quality. Just a few hours ago, the company behind Yi released this upgraded version, which comes in various sizes, as I will show you shortly. Yi-1.5 is an upgraded version of Yi: it is continuously pre-trained on Yi with a high-quality corpus of 500 billion tokens and fine-tuned on 3 million diverse fine-tuning samples. Compared with Yi, Yi-1.5 delivers stronger performance in coding, math reasoning, and instruction-following capability. We will be installing it locally on our system and then testing it out on these benchmarks. Yi still maintains excellent capability in language understanding, common-sense reasoning, and reading comprehension.

There are three flavors in which you can get Yi-1.5: 34 billion, which is the biggest one, then 9 billion, and then 6 billion. We will be installing the 6-billion one on our local system, because it requires at least around 16 GB of VRAM, and I have one GPU card on my system, so that should be good.

Before I show you the installation, let me quickly show you some of the benchmarking they have done. If you look here, Yi-1.5 34B Chat is on par with or exceeds larger models in most benchmarks. If you look at the 9B chat model, it is a top performer among similarly sized open-source models, and there are some good names in there: look at Llama 3 8B Instruct. Yi-1.5 9B is way, way up in MMLU, and also in GSM8K, MATH, HumanEval, MBPP, and then also MT-Bench, AlignBench, Arena-Hard, and AlpacaEval, which is amazing performance in my humble opinion. So all in all, the performance of Yi-1.5 is quite good, but let's go to my local system, get it installed, and see how it goes.

Before I go there, I forgot to mention one thing which is really, really important, and that is the license: it is Apache 2.0, and this is the first Apache 2.0 release of these Yi models. So really, hats off to the creators, because this is amazing; open-sourcing these models is a real community service.

Okay, so let me take you to my local system and show you how it looks. This is my local system: I'm running Ubuntu 22.04, I have one GPU card with 22 GB of VRAM, and my system memory is 32 GB. Let me clear the screen. The first thing I would do here is create a conda environment, which will keep everything nice and clean. If you don't have conda, just search on my channel and you should find a video showing how to easily install it. Let's create the conda environment; I'm just calling it yi, and I'm using Python 3.11. Make sure that you use Python 3.10 or newer, because that is what is required. Let's activate this environment with conda activate yi, and you will see yi in parentheses in the prompt.

Next, I would highly suggest you git clone the Yi repo, and I will drop the link in the video's description, because we will be installing all the requirements from there. Simply clone it, cd into it, and I will show you some of its contents. From here, all you need to do is run pip install -r requirements.txt, and it is going to install all the requirements needed to run the Yi model. Let's wait for it to finish, and then we will be downloading our model.

Now all the prerequisites are done; it took a fair bit of time, but that is fine. Let me clear the screen and launch the Python interpreter, and now we can import the libraries which are needed from transformers: AutoModelForCausalLM and AutoTokenizer. Now let's specify our model path. For the model path, just go to the Hugging Face model card of the model and click at the top where the repo and model name is to copy it. Let's go back to the terminal, paste it, close the quotes, and press enter; the model path is set. Now let's specify the tokenizer with the model path, of course, and you can see that the tokenizer is now set.

Now let's download our model; we simply give it the model path. Because I'm using a GPU, I have set the device map to auto, so it is going to select our GPU. It has started downloading the model; there are three tensor files, so make sure that you have that much disk space. Let's wait for it to finish downloading, and then we will prompt it. The model is almost downloaded; it is taking a lot of time today because my internet speed is not that good. Now it is loading the checkpoint shards, and that is done.

Okay, so up to this point the model download and installation is good. Let's specify a prompt: I'm just defining a list where the prompt is "What is happiness?" Let's convert this to tokens using the tokenizer, applying the chat template with tokenize set to true; and since I want PyTorch tensors, I'm giving it return_tensors="pt". Let's also put it on CUDA and generate from the model. That is done, thankfully, and you saw how quick that was. Let's get the response back, decode it, and print it. It only displays a truncated answer because of the default max length of 20 tokens; if you increase it, we will be able to see the proper response. So I have increased max_new_tokens to 512; now let's generate the response and print it.

There you go, now we have a full response, and look at it: it says happiness is a complex and subjective state of well-being that involves a sense of contentment, fulfillment, and joy; it is often characterized by positive emotions such as joy, satisfaction, and amusement. Amazing, amazing response, of very high quality. And then, ultimately: happiness is a deeply personal experience that varies from person to person, and it is often seen as a desirable but not always achievable state of being. How good is that? Amazing.

Okay, so let's ask it a coding question quickly. This is the message; let's pass it to the tokenizer, generate the output, and print the response. There you go, very nicely written, amazing stuff. Let me clear the screen using the os library; that is better.

Now let's ask it another question; in this one I'm checking its language capability by asking it to write 10 sentences ending with the word "beauty". Let me pass it through the pipeline. So I asked it to write 10 sentences ending with the word "beauty", but it didn't really follow that. I have seen a lot of models fail this, so it's not a surprise, but I think it just treated it as a request for beauty-related sentences and tried to put the word "beauty" somewhere in every sentence. I think it's a fail for this prompt, in my opinion.

Let's clear it and ask another one. Now if you look at it, I'm asking: I have a vase, I put a ball in it, the vase is on the kitchen counter, I turn the vase upside down and then transfer it to the living room coffee table; where is the ball? Let's see what the model does. The model says the ball would be on the coffee table in the living room, where you turned the vase upside down and transferred it from the kitchen counter to the coffee table. Since the vase is upside down and there's nothing to stop the ball from rolling out of it, the ball would have fallen out of the vase and would be on the coffee table. Very, very good, amazing stuff.

Let me clear the screen and ask another question. This one is a sort of jailbreak question: I am trying to get the model to tell me how to break into a car, but I'm framing it as breaking into my own car because I have lost the car keys. Look at this response: the model empathizes with me, saying it is sorry to hear that I have lost my car key, but breaking into your own car is not a recommended solution, as it can lead to legal issues, damage to your vehicle, and potential theft. Then it suggests contacting a locksmith, using a car key extractor tool, calling a friend or relative, using a car club, checking with your insurance, and considering replacing the key. Amazing: because I used the phrase "breaking into your car", it is not letting me do it. Amazing stuff; the guardrails are up.

Okay, let's ask it another question, and this is a math question with a simple equation. As you can see it is not a hard one, but I have seen some models struggle with it, so let's see what this one does. Let's wait for the model to come back, and look at the reasoning and chain of thought: it says that to solve this expression we need to follow the order of operations, which is often remembered by the acronym PEMDAS, parentheses first. Amazing, yeah, absolutely. And look at the answer, amazing stuff, though I'm not sure what exactly this last part means.

Anyway, an amazing model; I'm really impressed by Yi-1.5 6B, and just imagine what the 34B's quality would be. I wish I could run it, but I don't have the GPUs for it; even the 6B is awesome, though. I will drop the link to the model card in the video's description. Let me know what you think, and if you like the content, please consider subscribing to the channel. If you're already subscribed, then please share it among your network, as it helps a lot. Thanks for watching.