AI Agent Automatically Codes WITH TOOLS - SWE-Agent Tutorial ("Devin Clone")
Summary
TLDR: The video introduces a new coding assistant called SWE-Agent, developed by a team at Princeton. It specializes in fixing real-world bugs on GitHub by analyzing issue URLs, replicating the issue, and submitting a fix as a pull request. Using GPT-4, SWE-Agent has achieved performance close to that of Devin, another AI coding agent. The project stands out for its language-model-centric commands and feedback format, which simplify code browsing, editing, and execution. It also includes features like a linter, a custom file viewer, and a directory string-search command. The video provides a detailed demonstration of the installation process and showcases the agent's ability to resolve an issue from its own repository.
Takeaways
- Introduction of a new coding assistant, the SWE-Agent, developed by a team at Princeton, specializing in software engineering language models.
- The SWE-Agent has quickly gained popularity, with 3,500 stars on GitHub just days after its release.
- SWE-Agent's unique capability to fix real-world bugs and issues on GitHub by analyzing the issue URL, replicating the issue, and submitting a fix as a PR.
- Comparative performance of SWE-Agent using GPT-4, showing a 12.29% success rate on the SWE-bench test, nearly matching the performance of Devin.
- The project's innovation lies in its simple language-model-centric commands and feedback format, facilitating easier code browsing, editing, and execution.
- SWE-Agent includes a linter that ensures syntactic correctness before code edits are applied, and a custom file viewer for the model.
- The agent features a file editor with scrolling and search capabilities, essentially providing an LLM with its own custom IDE.
- A full directory string-search command is supplied for efficient codebase navigation and match listing.
- Easy installation process with Docker and Miniconda setup, streamlined for user convenience.
- The SWE-Agent comes with a pre-configured conda environment, reducing the complexity of environment and dependency management.
- A live demonstration by one of the authors showcases the end-to-end process of resolving a GitHub issue using the SWE-Agent.
Q & A
What is the SWE Agent and how does it differ from other coding assistants?
-The SWE Agent is a special coding assistant developed by a team at Princeton. It stands out from other coding assistants because it specializes in fixing real-world bugs and issues on GitHub. Given a GitHub issue URL, it can understand the problem, replicate the issue, fix it, and submit the fix as a Pull Request (PR). It has been designed with language model-centric commands and a feedback format to efficiently browse, view, edit, and execute code files within a repository.
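For concreteness, the invocation demonstrated later in the video follows this general shape (a minimal sketch: the issue URL is a placeholder, and the model name and config filename are assumptions based on the repository's defaults at the time):

    python run.py \
        --model_name gpt4 \
        --data_path https://github.com/owner/repo/issues/123 \
        --config_file config/default_from_url.yaml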
How does the SWE Agent perform in comparison to other models like Devin?
-The SWE Agent has shown impressive performance, nearly matching that of Devin. On the SWE-bench test, Devin achieved a 13.84% success rate, while the SWE Agent, using GPT-4 and despite being newly released, scored 12.29%. This demonstrates the agent's capability to effectively address issues in software engineering tasks.
What features does the SWE Agent have that facilitate its operation?
-The SWE Agent includes several features such as a built-in linter that checks code syntax before edits are made, a custom file viewer that displays up to 100 lines at a time for better comprehension, a file editor with scrolling and search commands, and a full directory string searching command. These features collectively provide the language model with an environment similar to a custom IDE, enhancing its ability to understand and work with large codebases.
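To make the viewer concrete, here is a minimal Python sketch of a windowed viewer of the kind described; it is illustrative only, not SWE Agent's actual implementation (the 100-line window comes from the description above; the function names are hypothetical):

    # Illustrative sketch only -- not SWE Agent's code. Shows a fixed-size
    # window of a file so the model never sees more than WINDOW lines per turn.
    WINDOW = 100

    def view_file(path: str, start: int = 1) -> str:
        """Return up to WINDOW numbered lines of path, starting at line start."""
        with open(path) as f:
            lines = f.readlines()
        end = min(start - 1 + WINDOW, len(lines))
        header = f"[File: {path} ({len(lines)} lines total)] (lines {start}-{end})"
        body = "".join(f"{n}: {line}" for n, line in enumerate(lines[start - 1:end], start))
        return header + "\n" + body

    def scroll_down(path: str, current_start: int) -> str:
        """Advance the window by one full page."""
        return view_file(path, current_start + WINDOW)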
How can one install and set up the SWE Agent for use?
-To install the SWE Agent, one needs to install Docker and Miniconda first. Then, users should clone the SWE Agent repository, create a conda environment using the provided environment.yml file, activate the environment, and run the setup script to build the Docker image. After these steps, users can run the SWE Agent using the provided command in the project's directory.
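Condensed from that walkthrough, the setup looks like this in a terminal (the clone URL assumes the princeton-nlp organization shown in the video; Docker must already be running):

    git clone https://github.com/princeton-nlp/SWE-agent.git
    cd SWE-agent
    conda env create -f environment.yml
    conda activate swe-agent
    ./setup.sh    # builds the Docker image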
What was the issue that the SWE Agent attempted to solve in its own repository?
-The issue that the SWE Agent attempted to solve in its own repository was a KeyError on 'base_commit'. The error was traced to the 'reset' method in the swe_env.py file, where 'base_commit' was not present in 'self.record' before the line that read it. The SWE Agent managed to locate the issue and suggested adding a check to ensure 'base_commit' is set before the problematic line.
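Purely as an illustration of the kind of guard the agent proposed (the class below and the fallback value are hypothetical, not the actual patch):

    # Hypothetical illustration of the suggested check -- not the real patch.
    class Env:
        def __init__(self, record: dict):
            self.record = record

        def reset(self) -> str:
            # Ensure 'base_commit' exists before any line that reads it,
            # so self.record["base_commit"] cannot raise a KeyError.
            if "base_commit" not in self.record:
                self.record["base_commit"] = "unknown"  # placeholder fallback
            return self.record["base_commit"]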
How does the SWE Agent handle large codebases?
-The SWE Agent is designed to handle large codebases by using a combination of a custom file viewer, which displays a limited number of lines at a time, and a file editor with search capabilities. This approach allows the language model to focus on specific parts of the codebase without being overwhelmed by its size. Additionally, the SWE Agent uses language model-centric commands and feedback to navigate and understand the codebase more effectively.
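As a rough sketch of what such a directory string search might look like (illustrative Python, not SWE Agent's implementation; the succinct per-file summary and the empty-output message mirror behavior described elsewhere on this page):

    import os

    def search_dir(term: str, root: str = ".") -> None:
        """Print a per-file match count for term under root, not every hit."""
        hits = {}
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if not name.endswith(".py"):  # limit to Python files for brevity
                    continue
                path = os.path.join(dirpath, name)
                try:
                    with open(path, errors="ignore") as f:
                        count = sum(line.count(term) for line in f)
                except OSError:
                    continue
                if count:
                    hits[path] = count
        if not hits:
            print("Your command ran successfully and did not produce any output")
        for path, count in sorted(hits.items()):
            print(f"{path}: {count} matches")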
What is the significance of the SWE Agent's ability to understand and fix GitHub issues?
-The SWE Agent's ability to understand and fix GitHub issues is significant because it demonstrates the growing capability of AI in software engineering tasks. By automatically identifying, reproducing, and fixing bugs, the SWE Agent can save developers time and effort, potentially reducing the occurrence of human error and improving the overall quality and efficiency of software development.
How does the SWE Agent ensure syntactically correct code edits?
-The SWE Agent ensures syntactically correct code edits by incorporating a linter that runs every time an edit command is issued. The edit command will not proceed if the code is not syntactically correct, thus maintaining code quality and preventing the introduction of new errors.
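A minimal sketch of such a gate, using Python's ast module as a stand-in for the project's linter (illustrative only):

    import ast

    def apply_edit(path: str, new_source: str) -> bool:
        """Write new_source to path only if it still parses as valid Python."""
        try:
            ast.parse(new_source)
        except SyntaxError as err:
            print(f"Edit rejected, file unchanged: line {err.lineno}: {err.msg}")
            return False
        with open(path, "w") as f:
            f.write(new_source)
        return True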
What is the role of the environment variables in using the SWE Agent?
-Environment variables play a crucial role in configuring the SWE Agent. Users need to provide their GitHub personal access token, and optionally their OpenAI, Anthropic, and Together API keys. These tokens are required for the SWE Agent to access the resources and services it needs to perform its tasks, such as browsing and editing code on GitHub.
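In the demo these go in a keys.cfg file at the repository root; a representative layout is below (the exact key names are an assumption from memory -- check the project's README for the canonical template):

    GITHUB_TOKEN: 'your-github-personal-access-token'   # required
    OPENAI_API_KEY: 'your-openai-key'                   # optional
    ANTHROPIC_API_KEY: 'your-anthropic-key'             # optional
    TOGETHER_API_KEY: 'your-together-key'               # optional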
What is the cost limit feature in the SWE Agent and how does it work?
-The cost limit feature in the SWE Agent is a setting that lets users define a maximum spend per run when using the GPT-4 model. This is important for managing expenses, since API-based models bill per token. In the example shown, the default cost limit was $2; users can raise this limit (for instance to $10) to fit their budget.
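If memory serves, the limit is exposed as a flag on run.py; a hedged example of raising it to $10 (the flag name is an assumption -- verify with python run.py --help):

    python run.py \
        --model_name gpt4 \
        --data_path https://github.com/owner/repo/issues/123 \
        --config_file config/default_from_url.yaml \
        --per_instance_cost_limit 10.00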
What are some potential future improvements for the SWE Agent?
-Potential future improvements for the SWE Agent could include allowing the use of local models to eliminate costs, enhancing the model's ability to understand and navigate even larger and more complex codebases, and possibly integrating with more development tools and platforms to streamline the software development process further.
How does the SWE Agent handle the process of solving an issue from start to finish?
-The SWE Agent handles the process of solving an issue by first reproducing the bug, then searching the repository for the relevant code, analyzing and identifying the problem, generating and applying an edit to fix the issue, retesting the code to confirm the fix, and finally submitting the solution as a PR. This end-to-end process is designed to mimic the workflow of a human developer, providing an automated and efficient approach to resolving software issues.
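Schematically, that workflow reduces to the loop below; every helper is a stub standing in for real agent behavior (illustration only, though runnable as-is):

    # Schematic of the end-to-end loop described above; all helpers are stubs.
    def reproduce(issue_url: str) -> bool:
        return True  # stub: run a reproduction script, True if bug confirmed

    def locate(issue_url: str) -> str:
        return "path/to/file.py"  # stub: search the repo for relevant code

    def edit_and_verify(path: str) -> bool:
        return True  # stub: apply a linted edit, then re-run the reproduction

    def solve(issue_url: str) -> str:
        if not reproduce(issue_url):
            return "could not reproduce the reported bug"
        target = locate(issue_url)
        if not edit_and_verify(target):
            return "fix failed verification"
        return f"fix for {target} submitted as a PR"

    print(solve("https://github.com/owner/repo/issues/123"))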
Outlines
Introduction to the SWE Agent
The video introduces a new coding assistant called SWE Agent, developed by a team at Princeton. It is described as a special tool due to its ability to perform nearly as well as Devin, another well-known coding assistant. The SWE Agent has quickly gained popularity, amassing 3,500 stars on GitHub within days of its release. The project specializes in fixing real-world bugs and issues on GitHub by analyzing the issue URL, replicating the issue, and submitting a fix as a pull request (PR). The SWE Agent's performance is highlighted by its results on the SWE-bench test, where it scored 12.29%, nearly matching Devin's 13.84%. The project's success is attributed to its design of simple language-model-centric commands and feedback formats, which facilitate easier code browsing, editing, and execution for the language model.
Features and Installation of SWE Agent
The SWE Agent comes with several notable features, including a built-in linter that ensures syntactic correctness before code edits are applied, a custom file viewer that displays 100 lines at a time for optimal performance, and a file editor with scrolling and search capabilities. Additionally, the tool includes a full directory string-search command and returns explicit feedback when commands produce no output. The installation process for SWE Agent is straightforward, requiring Docker and Miniconda. The project provides a Docker image and a conda environment file (environment.yml), simplifying setup. The video creator encounters a Miniconda-related error on Apple silicon but resolves it by switching to a different environment (Lightning.ai).
Live Demonstration of SWE Agent
The video concludes with a live demonstration of the SWE Agent in action. Carlos, one of the project's authors, shows how the agent resolves an issue from a GitHub repository. The SWE Agent reproduces the bug, identifies the problem in the code, and applies a fix. The edited code is tested, and the issue is resolved without breaking any existing tests. The demonstration highlights the agent's ability to understand and manipulate code effectively. The video creator expresses excitement about the advancements in AI coding helpers and encourages viewers to like and subscribe for more content.
Keywords
coding assistant
GitHub
GPT-4
bug fixing
language model
performance benchmark
open source
Docker
Miniconda
IDE (Integrated Development Environment)
pull request (PR)
Highlights
Introduction of a new coding assistant called SWE-Agent, developed by a team at Princeton, specializing in fixing real-world bugs on GitHub.
SWE-Agent automates the process of bug fixing by taking a GitHub issue URL, replicating the issue, fixing it, and submitting a PR.
Comparison of SWE-Agent's performance with Devin, highlighting its competitive edge with a 12.29% benchmark score using GPT-4.
SWE-Agent utilizes language model-centric commands and feedback formats to navigate and manipulate code repositories effectively.
The implementation of a custom file viewer and editor within SWE-Agent to handle code files efficiently, showing just 100 lines at a time.
Integration of a linter in SWE-Agent to ensure syntactical correctness before any code edits are applied.
Use of Universal Ctags in another project, Aider, cited as a benchmark for effective large-codebase navigation.
Step-by-step installation guide for SWE-Agent using Docker and Miniconda, emphasizing the ease of setup.
Troubleshooting installation issues on macOS with Apple Silicon, indicating compatibility challenges.
Adoption of an alternative platform, Lightning.ai, to overcome installation hurdles, showcasing flexibility in setup environments.
Demonstration of SWE-Agent fixing an issue within its own repository, illustrating self-referential debugging.
Introduction of a cost control feature in SWE-Agent, allowing users to set a spending limit for operations.
Prospective integration of local model support in future versions of SWE-Agent to eliminate operation costs.
A full end-to-end demo by one of SWE-Agent's authors, resolving a GitHub issue and preparing a fix, highlighting practical application.
Successful resolution of a coding issue by SWE-Agent, confirmed by SWE-bench tests that ensure the fix does not break existing functionality.
Transcripts
we have a brand new coding assistant it
feels like every day we're getting a new
one but this one is special this is
called SWE-agent and it is out of a
team at Princeton and it describes
itself as agent computer interfaces
enable software engineering language
models what does that actually mean so
what makes this special and it is
absolutely blowing up it's already at 3
and a half thousand stars and it was
just released a few days ago but what
makes this special is the fact that it
performs nearly as good as Devin so what
this project specializes in is fixing
real-world bugs and issues on GitHub so
you basically just give it a GitHub
issue URL it finds out what's going on
replicates the issue fixes it and
submits a fix as a PR it is really
impressive and check this out look at
this performance so this is the SWE-bench
test performance and everybody saw
that Devin had a 13.84% which again
it was being compared against kind of
just core models and not actually like
multi-agent frameworks and but that
aside it performed really well
13.84% now with SWE-agent using GPT-4
open source
12.29% and again it just came out so very
very impressive and nearly as good as
Devin already and here's an example of
what it looks like back and forth so you
basically give it a GitHub issue
and it says our reproduction script
confirms the issue reported min and max
are not being converted to R so then it
searches the files for anything related
to R finds the issues then the
responsible file is likely rcode.py
we should open and inspect it it does
that then it makes the necessary changes
it is so cool and here's really why this
project is so special we accomplished
these results by designing simple
language-model-centric commands and
feedback format to make it easier for
the LM to browse the repository view
edit and execute code files now that is
something that many projects don't do
very well if you have an existing
codebase a large codebase it is very
difficult for a language model to
understand the entire codebase and even
understand parts of it because each part
is interconnected with other parts of a
large codebase the only project I've
really seen do this very well is Aider
and that's because it uses Universal
Ctags which is essentially a way to
give computers a really easy way to
search through large code bases so here
are the features that it has they added
a linter that runs when an edit command
is issued we do not let the edit command
go through if the code isn't
syntactically correct so that's
fantastic we Supply the agent with a
special built file viewer instead of
just having a cat file so they actually
have a custom file viewer for the model
we found that this viewer works best
when displaying just 100 lines in each
turn very interesting so I'm not always
convinced that providing a language
model with just a snippet of code is
enough for it to understand the broader
context of the code but apparently it's
doing it pretty well the file editor
that we built has commands for scrolling
up and down and for performing a search
within the file so you're basically
giving an LLM its own custom IDE and
that's kind of cool as a concept we
Supply the agent with a special built
full directory string searching command
we found that it's important for this
tool to succinctly list the matches and
when commands have an empty output we
return the message saying your command
ran successfully and did not produce any
output and yeah you can install it right
away so I'm going to show you how to
install it and then I'm going to show
you a demo so the first thing you're
going to need to do is install Docker
and we've been doing that a bunch lately
so go ahead click on that link you're
going to open up this page docs.docker.com/engine/install you're going
to find the relevant Docker desktop app
for your operating system download it
and install it and when it's up and
running you're going to see this little
Docker icon in your taskbar then you
need to install miniconda so something
we use quite often on this channel and
if you don't already have it click this
link and you can download the installer
right here so Windows Mac OS and Linux
so download the relevant one for your
operating system once again install it
restart your terminal if you have to
next open up visual studio code we're
going to click this button in the top
right to toggle the panel which opens up
our terminal then we're going to CD to
the desktop or wherever you like to
store your new projects then switch back
to the GitHub repository we're going to
look for this green code button we're
going to click it and then we're going
to click this copy URL to clipboard
button right there copying the GitHub
URL switch back to VS Code and we're
going to type git clone and then the SWE-agent
URL hit enter okay now we're going
to cd into it so cd SWE-agent now
here's something cool which this project
did that I really like it actually comes
with a conda environment so it actually
will just set up the conda environment
for us so let's do that so if we switch
back we need to just type conda env
create -f environment.yml and this
environment.yml has the definition of
our environment that's necessary so it
just reduces the amount of guess work so
go ahead and click enter all right now
that that's done we're going to
highlight this right here to activate
the environment copy paste conda
activate swe-agent hit enter okay so now
it's activated we can see so right there
next we need to run the setup script so
./setup.sh hit enter and this is going to
build the docker image so again between
Docker and conda kind of coming out of
the box with this project I really
appreciate how easy they're making this
it really reduces the headache of python
environment management package
management dependencies Etc so to the
authors of this project thank you thank
you thank you I hope more projects do
this all right well here's something
funny I know I said that it was going to
be a lot easier because it comes with
conda and Docker already ready to go but
I was wrong I can't get past this error
right here something with miniconda
something having to do with my Mac OS
being on Apple silicon I've tried a few
things and I don't know how to fix it so
if you do know how to fix this drop a
comment below but what I did is I
switched over to Lightning.ai now
this video is not sponsored by
Lightning.ai but it just made it so much
easier it comes with Docker
pre-installed it comes with conda and I
simply followed the same steps and now
the docker image is created okay so now
it's done so all those previous steps
still work just follow those and now I'm
starting from here within lightning okay
then it says to create a keys file so
we're going to do that right click over
here new file keys.cfg hit enter then
we're going to paste in these four
environment variables so we have the
GitHub token which is required we have
the OpenAI key the Anthropic key and
the Together key all of which are
optional we're going to be using OpenAI
today so for the GitHub token so I got
my GitHub personal access token and
there's instructions for how to do it
just Google it it's a few steps but it's
pretty straightforward paste it in here
just like so then I grabbed my OpenAI
API key and pasted it in just like so
all right so we have both of those and
we should be ready to go now okay so
I think I actually put this Keys file in
the wrong place so I'm going to go ahead
and drag it into SWE-agent so now it is
in the main part of the SWE-agent folder
just like that now apparently to get it
running we just have to run this command
so python run.py we provided a model
name we have the data path which is the
issue that we're looking to fix and the
config file which is already set up for
us so I'm going to copy this command I
paste it in the command right here just
so I can edit it easily and we're going
to try something very meta now there's
actually an issue on swe agent here it
is so run replay. piy misses record base
commit etc etc I'm going to give it an
issue from its own repository so I'm
going to come up here I'm going to copy
the issue switch back to lightning I'm
going to highlight this issue right here
switch it out for this swe agent issue
highlight the whole thing cut come down
here paste and then hit enter and let's
see if this works okay and environment
initialized loaded data set beginning
task trying to clone from non-mirror
now I really want to plug Groq into
this so it's super fast but for now
we're going to have to deal with GPT-4
speeds okay we got a warning that seems
okay initializing agent settings for
container I mean everything seems to be
working you're going to solve this issue
on your own your terminal session has
started and you're in the repository etc
etc thoughts the issue indicates that
there is a KeyError for base_commit
okay that is correct so it got the issue
well here's the code and remember it has
about a 12% chance of solving the issue
but the fact that it's getting this far
is just super impressive so now that
we've located the swe_env.py file let's
open it to inspect the code and
understand how it works now the file is
quite large 668 lines in total the
error message from the issue points to a
key error okay let's use the goto
command so that's the command that is
built into SWE-agent and we're moving
to line 300 so there it is okay we
scrolled to line 300 but have not yet
reached the reset method so now let's
try 500 so it's basically trying to
search for that code we've scrolled
further but we still have not reached it
let's keep scrolling we have reached the
end but have not found the reset
method so now it scrolls back up to 400
it's just looking for it but if for some
reason it can't find it interesting so
it's basically scrolling up and down the
file just like any engineer would and
it's trying to find this Method All
right we have found the reset method
starting on line 108 yes good to fix
this we need to ensure that base_commit
is included in self.record before this
line is executed one way to do this
would be to add a check Etc so I think
this is the updated code okay the edit
has been successfully applied but
there's a duplicate setting and it's
fixing it I mean this is so cool all
right oh cost limit exceeded interesting
so there must be a setting somewhere
where you can actually limit the cost of
GPT-4 and so it already exceeded $2 so
fine thank you for stopping it there but
it seems like it was well on its way to
solving this issue and yep right there
we can actually set the cost limit so
that's really cool so it is set by
default to $2 we could set it at $10 if
we want and so on so very very cool I
want to power this with a local model
because then it won't cost anything so
there doesn't seem to be a very
straightforward way to use a local model
but I suspect that with a little bit of
effort we could get it to work and I bet
that they're going to allow it in future
versions since this has only been out
for a couple days so overall very cool
project now let me show you a full demo
so I'm about to show you a full demo end
to end by one of the authors solving
a GitHub issue and preparing a fix for
it hey my name is Carlos and today I'm
going to show you an example of SWE-agent
resolving an issue from GitHub so
we'll be looking at this issue from
SymPy which is an instance in SWE-bench
and we see that the user is reporting
this problem where this Matrix operation
col_insert is producing some unexpected
output so it looks like a
straightforward issue we'll copy this
GitHub URL and send this over to the SWE-agent
run script and once that's
going we can uh we we wait for about a
minute or two
but we can look at an example that ran a
bit earlier so here we have sweet agent
trying to resolve this issue and it
starts off by reproducing the the bug
that's reported which is always a good
first step so it copies the code from
that issue into a new file called
reproduce bug and after running that we
see that we have the same results that
are uh reported in the issue with this
problem being present here at the bottom
so now that we've confirmed that the
issue is a problem is still a problem
we can search the uh search the
repository for this col_insert function
to see where it might be defined and the
model thinks that it is defined in this
common.py file so we open this common.py
file in the file
editor and we can look at the different
functions that are present and we
identify the _eval_col_insert as being a
particular function of Interest so we
scroll that into view down on line
81 and after analyzing the code a little
bit the model realizes that there's a
problem with the indexing for uh those
the values in this Matrix operation so
we generate an edit which is then
applied again to this function which can
be seen after here between lines 87
through 89 and we go back to our
reproduction code to run that again and
see how the output has changed and we
see here that the output is actually uh
represents the expected result so it
looks like uh the issue is resolved and
and we clean up our workspace by
removing that file and finally submit
what we think is the right solution so
that produces this diff that we can
evaluate with SWE-bench and after testing
on SWE-bench we find that this submission
passes the initial test and it doesn't
break any of the existing tests so we
can mark it resolved all right so that's it
this is incredible I'm so excited to see
all of the progress on AI coding helpers
if you liked this video please consider
giving a like and subscribe and I'll see
you in the next one