37% Better Output with 15 Lines of Code - Llama 3 8B (Ollama) & 70B (Groq)
Summary
TL;DR: The video details an approach to improving the output of AI language models when users ask vague questions. The creator demonstrates a system that uses a query-rewriting function to produce more detailed and informative responses: by incorporating relevant context from the two previous messages, it generates a more specific query, which retrieves more accurate and useful information from the documents. The creator also discusses using the Llama 3 8B (on Ollama) and 70B (on Groq) models and shares their excitement about the potential of these models. The video concludes with an evaluation of the rewriting function's effectiveness, showing an improvement of around 30-50% in response quality.
Takeaways
- 📈 The speaker developed a solution to improve the handling of vague queries by rewriting them to include more context from previous messages.
- 🔍 The AI model, Llama 3, was trained on 15 trillion tokens, which is a significant amount of data equivalent to around two billion books.
- 💡 The rewritten query function was designed to preserve the core intent of the original query while making it more specific and informative.
- ✅ The speaker demonstrated the effectiveness of the rewritten query by comparing the responses from the AI model with and without the rewritten query.
- 📚 The use of JSON was emphasized for structured output, ensuring a deterministic format for the rewritten query.
- 🤖 The Ollama chat function was updated to include a query-rewriting step for all user inputs after the first, enhancing context retrieval.
- 🚀 The speaker tested the solution using the Llama 3 70B model (via Groq), noting that it produced better rewritten queries and more detailed responses.
- 📝 The speaker mentioned that the query-rewriting function improved response quality by about 30-50%, as judged by having GPT-4 and Claude 3 Opus compare paired responses with and without the rewrite.
- 🎓 The video includes a sponsorship for Brilliant.org, a learning platform for math, programming, AI, and data analysis.
- 🔧 The speaker provided a detailed step-by-step explanation of the code and logic behind the rewritten query function.
- 🌟 The speaker expressed satisfaction with the improvements made to the Ollama chat function and encouraged viewers to explore and learn from the code.
Q & A
What was the problem the speaker initially wanted to solve?
-The speaker wanted to solve the issue of vague questions not pulling relevant context from documents, which led to less informative responses.
How many tokens was Meta's AI, Llama 3, trained on?
-Llama 3 was trained on 15 trillion tokens.
What does the speaker mean by 'Rewritten query'?
-A 'Rewritten query' is a modified version of the user's original query that incorporates relevant context from the conversation history to make it more specific and informative for retrieving relevant context.
What improvements were made in Llama 3 compared to its predecessor?
-The improvements in Llama 3 include increased training data and code size, support for non-English languages, and an improved tokenizer.
How does the speaker's solution handle vague questions?
-The speaker's solution rewrites vague questions by adding more context, which helps in retrieving relevant information from documents even when the original query is not specific.
What is the role of JSON in the speaker's solution?
-JSON is used to structure the output from the solution, ensuring a deterministic and well-organized format for the rewritten queries and responses.
How does the speaker's solution improve responses from the AI model?
-The solution improves responses by rephrasing and expanding vague queries into more specific ones that can pull relevant context from documents, leading to more informative answers.
What is the significance of the 70B model in the speaker's project?
-The 70B model is a larger, more capable version of Llama 3 that the speaker uses, via Groq, to test the effectiveness of the rewritten-query solution at a larger scale.
What is the 'get relevant context' function in the speaker's project?
-The 'get relevant context' function retrieves relevant information from the knowledge vault based on the rewritten query, which is more specific and informative due to the solution's processing.
How does the speaker evaluate the effectiveness of the rewritten query?
-The speaker evaluates the effectiveness by comparing responses with and without the rewritten query, using GPT-4 to assess which response is better, and conducting multiple tests to get an average improvement percentage.
What is the estimated time it would take for a human to read the equivalent amount of books that Llama 3 was trained on?
-Assuming a human reads one book per week, it would take around 16,500 to 30,000 years to read the equivalent amount of books that Llama 3 was trained on, which is based on 15 trillion tokens.
Outlines
🚀 Introduction to the AI Query Optimization Project
The speaker introduces a problem they aimed to solve regarding AI query handling. They explain their process of feeding information into an AI system, asking questions, and receiving answers. The issue arises when a vague question is asked, and the AI fails to pull relevant context from the documents. The speaker then demonstrates their solution, which involves rewriting queries to provide more context and improve the AI's responses. They also mention testing the solution on different AI models and express satisfaction with the results.
📝 Step-by-Step Explanation of Query Rewriting Process
The speaker provides a detailed walkthrough of how they approached rewriting queries. They discuss the structure of the prompt used for the AI model, emphasizing the importance of using conversation history to improve the query. The process involves receiving user input, parsing it into a dictionary, extracting the original query, constructing a prompt for the AI model, and generating a rewritten query. The rewritten query is then used to retrieve relevant context from a knowledge vault, which is a significant improvement over the original user query.
🔍 Testing and Updates to the AI System
The speaker shares their experience testing the query-rewriting solution and mentions updates made to their GitHub repository. They discuss the use of a different model, Llama 3 70B running on Groq, and the benefits of using JSON for structured output. The speaker also covers improvements to the system, such as switching to an Ollama embeddings model and allowing users to select models from the terminal. They express excitement about the potential of the Llama 3 70B model and its ability to provide better answers.
🎓 Conclusion and Future Plans
The speaker concludes by summarizing the benefits of using rewritten queries and the effectiveness of the Llama 3 70B model. They mention conducting tests comparing the quality of responses with and without the rewritten-query feature, which showed an improvement of about 30-50%. The speaker thanks the audience for their support, encourages them to star the GitHub repository, and hints at future videos involving more work with Groq and Llama 3 70B, pending resolution of rate-limit issues.
Keywords
💡RAG system
💡Tokens
💡Vague question
💡Rewritten query
💡Ollama
💡Llama 3 Model
💡Contextual understanding
💡JSON
💡Brilliant.org
💡Groq and Llama 3 70B
💡Rate limit
Highlights
The speaker introduces a problem related to handling vague questions in an AI system and presents a solution to improve the system's responses.
The AI system is demonstrated with a question about Meta's AI, Llama 3, and its training on 15 trillion tokens.
A solution is implemented to rewrite vague queries to provide more context and specificity, leading to better responses from the AI.
The speaker shows the AI's improved ability to answer vague questions by demonstrating a rewritten query that fetches relevant context from documents.
The process of rewriting queries is detailed, explaining how it preserves the core intent while expanding on the original query for more specificity.
The use of JSON is highlighted for its role in structuring the output and ensuring a deterministic format for the rewritten queries.
The speaker discusses the Ollama chat function and how it is updated to include the new query-rewriting feature.
The speaker provides a step-by-step explanation of how the query rewriting process works within the AMA chat function.
A sponsor, Brilliant.org, is introduced for those interested in learning Python and computer science, offering interactive lessons in math, programming, AI, and data analysis.
The speaker shares the GitHub repository link for those interested in the project and its updates.
An update to the system using Groq and the Llama 3 70B model is mentioned, with a demonstration of its capabilities.
The speaker discusses the improved performance of the rewritten query function, with an estimated 30-50% better response compared to the original query.
A humorous comparison is made between the amount of data Llama 3 was trained on and the equivalent amount of human reading required to achieve similar understanding.
The speaker expresses excitement about the potential of Llama 3 and encourages viewers to explore new ideas for using embeddings and the get relevant context function.
The speaker thanks the audience for their support and invites them to give a star on GitHub if they enjoyed the content.
An upcoming video on Sunday is teased, which will likely feature more on Groq and Llama 3 70B, subject to rate-limit conditions.
The speaker concludes by emphasizing the importance of learning from the project and looking forward to future interactions.
Transcripts
Today I'm going to start by showing you the problem I wanted to solve. I'll show you how I tried to solve it, whether it was a success, and then I'm going to explain it so you can understand it and start using this too. So yeah, let's just get started.

Okay, so what you see here is my RAG system fired up, so we can start asking questions about my documents. I fed in some information about Meta's AI, Llama 3, and asked the question: how many tokens was Llama 3 trained on? We have the context that is pulled from the document, and we use that context to answer: Llama 3 was pretrained on 15 trillion tokens. So far so good, right?

And here comes my problem. It's not a big problem if you know what you're doing, but what happens when I say "what does that mean?", a very vague question? You can see we don't pull anything from our documents, which means we don't have any relevant context for this question. This is the problem I wanted to take a look at today: how can we improve this? So I'm just going to show you how I implemented a solution and how it works.
So let's fire up the second version, the one that contains my solution. We're going to ask the same question: how many tokens was Llama 3 trained on? This is running on the 8B Llama 3 model on Ollama, so it's totally local. And you can see: Llama 3 was trained on over 15 trillion tokens, so pretty much exactly the same answer as before. What if we say "what does that mean?", a very vague question again? What I implemented is this rewritten query: we take our original query and try to rewrite it. "Can you provide more details about the improvements made in Llama 3 compared to its predecessor: increased training data, code size, support for non-English languages, and how does the tokenizer..." and so on. You can see we added much more context to our query just by putting it through the solution I'm going to show you, and now we get context pulled from the documents even though our query was essentially the same. And you can see we get a pretty good answer here; I'm not going to read it, but you can pause and read it if you want to.

So yeah, I'm pretty happy with how this worked out. It is of course not a perfect solution, but for me it has improved the responses a bit, at least on this very small model. I haven't tried it too much; we're going to try it on the 70B model later in this video. For now I'm pretty happy with it, so I think we're just going to head over and try to explain how this works, because a lot of you enjoyed that in the previous video: going a bit deeper into the code and explaining the logic. So yeah, let's do that.
But first: if you're one of those who wants to learn more about Python and computer science, you should really pay attention to today's sponsor, Brilliant. Have you ever wondered how to make sense of vast amounts of data? Or maybe you're eager to learn coding but don't know where to start? Brilliant.org, the sponsor of today's video, is the perfect place to learn these skills. Brilliant is a learning platform designed to be uniquely effective. Their interactive lessons in math, programming, AI, and data analysis are created by a team of award-winning teachers, professionals, and researchers. If you're looking to build a foundation in probability to better understand the likelihood of events, the course Introduction to Probability is a great place to start. You work with real data sets from sources like Starbucks, Twitter, and Spotify, learning to parse and visualize massive data sets to make them easier to interpret. And for those ready to level up their programming skills, the Creative Coding course is a must: you'll get familiar with Python and start building programs on day one, learning essential coding elements like loops, variables, nesting, and conditionals. What sets Brilliant apart is that it helps you build critical thinking skills through problem solving, not just memorizing, so while you're gaining knowledge on specific topics, you're also becoming a better thinker overall. To try everything Brilliant has to offer for free for 30 days, visit brilliant.org/AllAboutAI or just click the link in the description below; you will also get 20% off an annual premium subscription. A big thanks to Brilliant for sponsoring this video. Now let's go back to the project.
Okay, so you can see from the code here: these lines, plus a few lines further down in our Ollama chat function, were pretty much all I added to try to solve this problem, if you can even call it a problem. I'm going to explain how this works, and not just quickly; I'm going to go into a bit of detail. You can see we have a pretty long prompt here, so I'm going to blow it up so you can see it better, and then we're going to go through, step by step, how this actually works. Hopefully you can learn something from it.

I want to start by explaining how I thought about the prompt we use for this. Basically, I'm just going to go through it and explain it. You can see: "Rewrite the following query by incorporating relevant context from the conversation history." So we are actually using bits of our conversation history, the two previous messages, to try to improve our query. "The rewritten query should preserve the core intent and meaning of the original query; expand and clarify the query to make it more specific and informative for retrieving relevant context; avoid introducing new topics or queries that deviate from the original query; and never answer the original query, but instead focus on rephrasing and expanding it into a new query. Return only the rewritten query text, without any additional formatting or explanations." Then we pass in our context, which is the two previous messages, then our original query from the user input, and then we want our rewritten query as the output. That is how I set this prompt up. Of course the prompt is important, but we are also getting some help from JSON to get the structured output we want, and that is what I want to explain in this step-by-step process.
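For reference, here is roughly what that prompt looks like when assembled in Python. This is a minimal sketch reconstructed from the wording read out above; the helper name build_rewrite_prompt and the variable names are illustrative, not taken from the repo:

```python
# Sketch of the query-rewriting prompt described above (names are illustrative).
def build_rewrite_prompt(context: str, user_input: str) -> str:
    return f"""Rewrite the following query by incorporating relevant context from the conversation history.
The rewritten query should:
- Preserve the core intent and meaning of the original query
- Expand and clarify the query to make it more specific and informative for retrieving relevant context
- Avoid introducing new topics or queries that deviate from the original query
- Never answer the original query, but instead focus on rephrasing and expanding it into a new query
Return only the rewritten query text, without any additional formatting or explanations.

Conversation History:
{context}

Original query: [{user_input}]

Rewritten query:"""
```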
Okay, so let's start with step one: receive the user input as JSON. The function receives a JSON string containing the user's original query, for example a "Query" of "what does that mean". Where this gets put into the rewrite query function is the Ollama chat function here. If we take a look, I set this up so that the first query we make to our RAG system does not get a rewritten query, because I found out that was pretty stupid; we don't need it. But from the second query on, everything we put in gets rewritten. You can see here is our function, and we pass in this JSON, which comes from the user input here.

In step two we parse the JSON into a dictionary: the JSON string is converted to a Python dictionary using json.loads. So this could, for example, be a user input equal to a dictionary with a "Query" key whose value is "what does this mean". Then we move on to step three, extracting the original query from that Python dictionary: we want to grab the query, so the user input is now equal to "what does that mean", because we grabbed it from the Python dictionary up here.
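In code, steps one through three amount to only a few lines. A minimal sketch, assuming the input arrives as a JSON string with a "Query" key as described:

```python
import json

# Step 1: the function receives the user's input as a JSON string.
user_input_json = '{"Query": "what does that mean"}'

# Step 2: parse the JSON string into a Python dictionary with json.loads.
user_input = json.loads(user_input_json)

# Step 3: extract the original query from the dictionary.
original_query = user_input["Query"]
print(original_query)  # -> what does that mean
```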
The next step, step four, is preparing the prompt for the AI model: a prompt is constructed that includes the conversation history and the instructions for rewriting the query. We already took a look at that up above, so we know how this prompt works. In step five we call our AI model, in this case Ollama running Llama 3, with the prepared prompt, and the model generates a rewritten version of the query. Step six is extracting the rewritten query from the model's response. If you take a look at the code, that happens here: we feed in our prompt, we get this JSON dump with the rewritten query out, and we pass in the rewritten query from the model's response. And that brings us to step seven: return the rewritten query as JSON. A new JSON string is constructed containing the rewritten query and returned to the calling function. This could, for example, be a rewritten query like we saw down here, where the value might be "what does it mean that Llama 3 has been trained on 15 trillion tokens".
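Putting steps four through seven together, the whole rewrite function might look like the sketch below, assuming the ollama Python package and the llama3 model tag; the names are illustrative rather than the repo's exact identifiers, and build_rewrite_prompt is the helper sketched earlier:

```python
import json
import ollama

def rewrite_query(user_input_json: str, conversation_history: list[dict]) -> str:
    """Rewrite a vague query using the two previous messages as context (steps 1-7)."""
    # Steps 1-3: parse the JSON input and extract the original query.
    original_query = json.loads(user_input_json)["Query"]

    # Step 4: build the prompt from the two previous messages plus the rewrite instructions.
    context = "\n".join(
        f"{msg['role']}: {msg['content']}" for msg in conversation_history[-2:]
    )
    prompt = build_rewrite_prompt(context, original_query)  # sketched earlier

    # Step 5: call the model (here: Ollama running Llama 3) with the prepared prompt.
    response = ollama.generate(model="llama3", prompt=prompt)

    # Step 6: extract the rewritten query text from the model's response.
    rewritten_query = response["response"].strip()

    # Step 7: wrap the rewritten query in a new JSON string and return it.
    return json.dumps({"Rewritten Query": rewritten_query})
```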
That means we're ready for our final step, which is to feed this rewritten query into the get relevant context function down in our Ollama chat function. You can see the rewritten query here, and it is fed back into get relevant context. If we go down here, you can see we feed the rewritten query into the get relevant context function and skip the original user query altogether: the original user query is not fed into get relevant context at all, we only pass in the rewritten query. So the rewritten query is passed to the get relevant context function, which retrieves relevant context from the knowledge vault based on that rewritten query. That is how I set this up: the original user query is never taken into consideration, even though we print it; that is just to compare the two side by side, just for fun, I guess.

So yeah, that is how I set this up, and so far I've been pretty happy with it. I hope it was okay to follow how this works. It really helps to use JSON here, because that gives us a more deterministic output: we always get this very structured form. I tried not using JSON, but that was not a great success; you can try that if you want to, but for me this has been working pretty okay.
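Inside the chat loop, the integration just described might look like this. get_relevant_context is the function named in the video; the surrounding structure (the history list, the first-turn check) is reconstructed from the explanation, not copied from the repo:

```python
import json

def ollama_chat(user_query: str, conversation_history: list[dict]) -> None:
    # The very first query is not rewritten: there is no history to draw on yet.
    if conversation_history:
        rewritten_json = rewrite_query(
            json.dumps({"Query": user_query}), conversation_history
        )
        rewritten_query = json.loads(rewritten_json)["Rewritten Query"]
        print("Original query: ", user_query)      # printed only for comparison
        print("Rewritten query:", rewritten_query)
    else:
        rewritten_query = user_query

    # Only the rewritten query is used for retrieval; the original is never passed in.
    context = get_relevant_context(rewritten_query)
    # ...build the final prompt from context + query and call the model as usual.
```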
This code is an extension of the GitHub repo you can see on the screen here, the super easy 100% local Ollama RAG. We made some updates: we are using the Dolphin Llama 3 model now, we changed our embeddings model, so we are actually using an Ollama embeddings model, and that has been working out pretty well, and we have a few other updates, like being able to pick our model from the terminal. These were just some issues raised on GitHub that I have implemented. Of course, this is just a starting layout, so you can do whatever you want with it. You can find the link in the description; I'm probably going to put a video up explaining all the updates to the code, and the code should be up now, so you can start playing around with it. I hope you enjoyed it.
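For reference, generating embeddings through Ollama and retrieving the most similar chunks looks roughly like this. A minimal sketch: the embedding model name mxbai-embed-large is an assumption for illustration, since the video doesn't name the exact model used:

```python
import numpy as np
import ollama

# Embed each document chunk once and keep the vectors as the "knowledge vault".
chunks = [
    "Llama 3 was pretrained on over 15 trillion tokens.",
    "Llama 3 uses an improved tokenizer.",  # one string per document chunk
]
vault_embeddings = np.array([
    ollama.embeddings(model="mxbai-embed-large", prompt=chunk)["embedding"]
    for chunk in chunks
])

def get_relevant_context(query: str, top_k: int = 3) -> list[str]:
    """Return the chunks whose embeddings are most cosine-similar to the query."""
    q = np.array(ollama.embeddings(model="mxbai-embed-large", prompt=query)["embedding"])
    scores = vault_embeddings @ q / (
        np.linalg.norm(vault_embeddings, axis=1) * np.linalg.norm(q)
    )
    return [chunks[i] for i in scores.argsort()[::-1][:top_k]]
```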
To finish this video, I created a local RAG version using Groq and the Llama 3 70B model. I was supposed to do a video today using Groq and Llama 3 70B, but I had so many issues with the rate limit that I had to skip it; that might be for Sunday, we will see. But let's finish up this video by testing this with Groq and the Llama 3 70B model. This is basically exactly the same setup, but I found that the rewritten queries were a bit better.
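Swapping the local model out for Groq is mostly a matter of changing the client call. A minimal sketch using the official groq Python package; llama3-70b-8192 was Groq's Llama 3 70B model ID at the time, and the helper name is illustrative:

```python
import os
from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

def groq_generate(prompt: str) -> str:
    """Drop-in replacement for the local ollama.generate call."""
    response = client.chat.completions.create(
        model="llama3-70b-8192",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```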
So let's try the same questions: how many tokens was Llama 3 trained on? You can see it's pretty fast, considering what we're running. Okay, so let's do "what does that mean" and take a look at the rewritten query: what does it mean that Llama 3 was trained on an enormous dataset, the equivalent of around two billion books, 15 trillion tokens, and what is the impact on the model's ability? You can see this is a much better rewritten query; this is good. So let's see the answer here. "Here is a breakdown of what it means: 15 trillion tokens refers to the massive amount of data used... T stands for trillion... tokens are individual units of text, such as words and characters... the model was trained on a huge dataset." Wow, this is good, right? We got all of this just by asking "what does this mean", so you can see how good this rewritten query actually is, and of course, the better the model we use, the better the answer we get. "In summary, Llama 3 is a highly advanced language model trained on an enormous dataset, with a focus on scalability and high-quality data."

Let's do: "wow, that's crazy, how many books must a human read to be this smart?" That's a bad question, but look at the rewrite: what's the equivalent amount of human reading, in terms of the number of books, that would be required to achieve the same level of understanding and knowledge as Llama 3, trained on 15 trillion tokens of data? Again, a very good rewritten query if you ask me; what a question. And it goes into it: to read 330,000 to 600,000 books would take around 16,500 to 30,000 years assuming one book per week, or around 15,000 years assuming two books per week; of course, this is a rough estimate meant to be humorous. This model is so good. So you can see we would have to read around 600,000 books to be this smart. I think this shows how good the rewritten query is, and how good the 70B model is, so I'm really excited about Llama 3. I hope you found this enjoyable and learned something from it; that's the most important thing, the result doesn't matter too much. But maybe this gives you some new ideas for how you can use embeddings to improve things, or how you can use the get relevant context function for other tasks.
So, I guess a lot of you are wondering where I got the 30% better response figure from. What I did was take one response without the rewrite query function and a second response with the rewrite query function, and I asked GPT-4 to compare them. I asked it many times, and most of the time it rated response two, the one with the rewrite query, between 30 and 50% better than response one. I did the same with Opus, and there it always landed at 30 to 40% better than response one. So yeah, that is where I got the 37% from.
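That pairwise comparison can be scripted. Below is a minimal sketch of such a judging loop using the OpenAI client; the judging prompt wording and the helper name are my reconstruction of what was described, not the exact script used in the video:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def judge_improvement(response_one: str, response_two: str) -> str:
    """Ask GPT-4 how much better response two (with query rewriting) is than response one."""
    judge_prompt = (
        "Compare the two responses below to the same user question.\n\n"
        f"Response 1 (no query rewriting):\n{response_one}\n\n"
        f"Response 2 (with query rewriting):\n{response_two}\n\n"
        "Roughly how much better is Response 2 than Response 1, as a percentage?"
    )
    result = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": judge_prompt}],
    )
    return result.choices[0].message.content

# Repeat over many question pairs and average the judged percentages.
```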
I just want to say a big thank you for the support lately; it's been awesome. Give the GitHub repo a star if you enjoyed this. Other than that, come back on Sunday: I'm probably going to do more with Groq and Llama 3 70B if the rate limit is okay. Have a great week, and I'll see you again on Sunday.