AI Agent Automatically Codes WITH TOOLS - SWE-Agent Tutorial ("Devin Clone")
Summary
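TLDR: The video introduces SWE-agent, a new coding assistant from a team at Princeton University that focuses on fixing real-world bugs on GitHub. Running on GPT-4, SWE-agent takes a GitHub issue URL, reproduces the issue, fixes it, and submits the fix as a PR. It provides a code editor, a file viewer, and related tools, and it simplifies installation with Docker and conda. Despite a few technical hurdles along the way, SWE-agent shows great potential for resolving programming issues.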
Takeaways
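- 🌟 Introduces SWE-agent, a new coding assistant from a team at Princeton University that focuses on fixing real-world bugs on GitHub.
- 🚀 SWE-agent drew a lot of attention quickly, reaching about 3,500 GitHub stars within days of its release.
- 🔍 Given a GitHub issue URL, SWE-agent reproduces the issue, fixes it, and submits the fix as a PR.
- 📈 On the SWE-bench benchmark, SWE-agent with GPT-4 comes close to Devin (12.29% versus the 13.84% quoted in the video).
- 🛠️ SWE-agent uses simple language-model-centric commands and feedback formats that make it easier for the model to browse a repository and to view, edit, and execute code files.
- 📋 Features include a linter that runs whenever an edit command is issued, a special-built file viewer, a file editor, and a full-directory string search command.
- 🔗 Installation is simplified through Docker and a bundled conda environment, which reduces Python environment and dependency headaches.
- 🎥 The video includes a full demo in which Carlos, one of the authors, uses SWE-agent to resolve a GitHub issue and prepare a fix.
- 🔎 When working on a GitHub issue, SWE-agent copies the code from the issue to reproduce the bug, locates the responsible code, and applies a fix.
- 📊 In the demo, the fix submitted by SWE-agent passes the SWE-bench tests, which supports the validity of the solution.
- 💡 SWE-agent does not yet offer a straightforward way to use local models, but the presenter expects future versions may add one.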
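The linter takeaway above can be sketched in a few lines. This is not SWE-agent's implementation: it assumes the edited file is Python, uses the built-in `compile` as a stand-in for a real linter, and `apply_edit` is a hypothetical helper name. The only point is that an edit is kept when the resulting file still parses and is rejected otherwise.

```python
from typing import List, Tuple


def apply_edit(lines: List[str], start: int, end: int,
               replacement: List[str]) -> Tuple[List[str], str]:
    """Replace lines start..end (1-indexed, inclusive) only if the result parses.

    Hypothetical helper: compile() stands in for the linter described in the video.
    """
    candidate = lines[:start - 1] + replacement + lines[end:]
    source = "\n".join(candidate) + "\n"
    try:
        compile(source, "<edited file>", "exec")
    except SyntaxError as exc:
        # The edit does not go through; the original lines are returned unchanged.
        return lines, f"Edit rejected: {exc.msg} (line {exc.lineno})"
    return candidate, "Edit applied."


if __name__ == "__main__":
    original = ["def add(a, b):", "    return a + b"]
    print(apply_edit(original, 2, 2, ["    return a - b"])[1])
    print(apply_edit(original, 2, 2, ["    return a +"])[1])
```

Returning a message with the rejection gives the model feedback it can act on in its next turn, rather than silently leaving a broken file behind.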
Q & A
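What is SWE-agent?
-SWE-agent is a new coding assistant developed by a team at Princeton University that focuses on fixing real-world bugs and issues on GitHub.
How does SWE-agent work?
-You give SWE-agent a GitHub issue URL; it works out what is going on, reproduces the issue, fixes it, and submits the fix as a PR.
How well does SWE-agent perform?
-Using GPT-4, SWE-agent resolves 12.29% of the issues in the SWE-bench test, close to the 13.84% the video quotes for Devin.
How does SWE-agent cope with large codebases?
-SWE-agent uses simple language-model-centric commands and feedback formats that make it easier for the LM to browse the repository and to view, edit, and execute code files.
What features does SWE-agent have?
-SWE-agent adds a linter that checks syntax on every edit, a special-built file viewer, a file editor, and a full-directory string search command. It also ships with a conda environment definition, which simplifies Python environment and dependency management.
How do you install SWE-agent?
-Install Docker and Miniconda, clone the SWE-agent GitHub repository from the VS Code terminal, then follow the repository instructions to create the conda environment and run the setup script.
How did SWE-agent do on an issue from its own repository?
-SWE-agent correctly identified the key error for the base commit described in the issue, located the reset method, and began applying a fix, but the run stopped when it hit the default $2 cost limit before a fix was submitted.
What is SWE-agent's process for resolving a GitHub issue?
-SWE-agent first copies the code from the issue to reproduce the bug, then searches the codebase for the relevant function, analyzes the code and generates a fix, reruns the reproduction code to verify the fix, and finally submits the fix.
What problems came up while running SWE-agent?
-The presenter hit a Miniconda error on macOS with Apple silicon during setup, and the run on a GitHub issue was cut short by the cost limit.
Can SWE-agent run with a local model?
-There is currently no straightforward way to use a local model, but the presenter suspects it could work with some effort and expects future versions to support it.
How is SWE-agent's cost limit set?
-The cost limit defaults to $2 and can be raised if needed, for example to $10.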
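The cost limit described in the last answer can be pictured with a minimal sketch. The names `CostTracker` and `CostLimitExceeded` are hypothetical and are not taken from SWE-agent; the sketch only assumes that each model call has a known dollar cost and that the run stops once the running total passes the limit ($2 by default, as in the video).

```python
class CostLimitExceeded(Exception):
    """Raised when the accumulated spend passes the configured limit."""


class CostTracker:
    """Hypothetical per-run budget: add up call costs and stop past the limit."""

    def __init__(self, limit_usd: float = 2.0) -> None:
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def record(self, call_cost_usd: float) -> None:
        # Accumulate first, then check, so the call that crosses the limit is counted.
        self.spent_usd += call_cost_usd
        if self.spent_usd > self.limit_usd:
            raise CostLimitExceeded(
                f"spent ${self.spent_usd:.2f}, limit is ${self.limit_usd:.2f}"
            )


if __name__ == "__main__":
    tracker = CostTracker(limit_usd=2.0)
    try:
        for cost in (0.5, 1.0, 0.75):  # made-up per-call costs for illustration
            tracker.record(cost)
    except CostLimitExceeded as exc:
        print("Cost limit exceeded:", exc)
```

Raising the limit is then a matter of constructing the tracker with a larger value, for example `CostTracker(limit_usd=10.0)`.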
Outlines
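🤖 Introducing the new coding assistant SWE-agent
This section introduces SWE-agent, a new coding assistant developed by a team at Princeton University. It focuses on fixing real-world bugs and issues on GitHub: the user provides a GitHub issue URL, and SWE-agent works out the problem, reproduces it, fixes it, and submits the fix as a PR. The project drew a lot of attention within days of release. Using GPT-4, it scores 12.29% on the SWE-bench test, close to the 13.84% quoted for Devin. SWE-agent achieves this with simple language-model-centric commands and feedback formats that make it easier for the language model to browse a repository and to view, edit, and execute code files.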
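Two of the feedback conventions mentioned in the video are a directory search that lists its matches succinctly and a fixed message for commands that print nothing. The sketch below illustrates both under stated assumptions: `search_dir` and `format_observation` are hypothetical names, the one-line-per-file summary is an illustrative choice, and none of this is SWE-agent's code.

```python
import os

# Message wording follows what the presenter reads out in the video.
EMPTY_OUTPUT_MESSAGE = "Your command ran successfully and did not produce any output."


def search_dir(term: str, root: str = ".") -> str:
    """Search every file under root for term and return one summary line per file."""
    counts = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    hits = sum(1 for line in handle if term in line)
            except OSError:
                continue  # unreadable file: skip it
            if hits:
                counts[os.path.relpath(path, root)] = hits
    if not counts:
        return f'No matches found for "{term}" in {root}'
    total = sum(counts.values())
    lines = [f'Found {total} matches for "{term}" in {root}:']
    lines += [f"{path} ({hits} matches)" for path, hits in counts.items()]
    return "\n".join(lines)


def format_observation(output: str) -> str:
    """Replace empty command output with an explicit confirmation message."""
    return output if output.strip() else EMPTY_OUTPUT_MESSAGE


if __name__ == "__main__":
    print(format_observation(search_dir("reset", ".")))
    print(format_observation(""))
```

Summarizing per file keeps the observation short, so a search across a large repository does not flood the model's context with every matching line.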
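🛠 Installing and using SWE-agent
This section walks through installing and using SWE-agent. First install Docker and Miniconda, then clone the SWE-agent project from GitHub. Create and activate the conda environment, then run the setup script to build the Docker image. After hitting a Miniconda error on macOS with Apple silicon, the presenter switches to a Lightning environment, which comes with Docker and conda preinstalled. Next, create the keys file (keys.cfg) and enter a GitHub token and an OpenAI API key. Finally, start SWE-agent with python run.py and point it at an issue from its own repository to show the workflow and capabilities.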
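The setup steps above can be collected into a small helper that prints the commands to paste into a terminal. Several details are assumptions rather than facts from the video: the repository URL is not stated there, and the flag spellings `--model_name`, `--data_path`, and `--config_file` are inferred from the names the presenter reads out ("model name", "data path", "config file"), so they should be checked against the project README. The script only prints commands and does not run anything.

```python
import shlex
from typing import List

# Assumption: the repository location is not given in the video.
REPO_URL = "https://github.com/princeton-nlp/SWE-agent.git"


def setup_commands(repo_url: str = REPO_URL) -> List[List[str]]:
    """Commands for the install steps, in the order the video runs them."""
    return [
        ["git", "clone", repo_url],
        ["cd", "SWE-agent"],
        ["conda", "env", "create", "-f", "environment.yml"],
        ["conda", "activate", "swe-agent"],
        ["./setup.sh"],  # builds the Docker image
    ]


def run_command(model_name: str, issue_url: str, config_file: str) -> List[str]:
    """Build the run.py invocation; flag names are inferred, not confirmed."""
    return [
        "python", "run.py",
        "--model_name", model_name,
        "--data_path", issue_url,
        "--config_file", config_file,
    ]


if __name__ == "__main__":
    for argv in setup_commands():
        print(shlex.join(argv))
    # Placeholders: fill in the values shown in the project README.
    print(shlex.join(run_command("<model-name>", "<github-issue-url>", "<config-file>")))
```

Keeping the commands as argument lists rather than one long string makes it easy to swap in a different issue URL, which is all the presenter changes before running the agent.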
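🌟 Live demo and issue resolution with SWE-agent
In this section Carlos, one of the authors, demonstrates SWE-agent resolving a real GitHub issue. He picks an issue from the SymPy project in which the matrix operation col_insert produces unexpected output, and hands it to SWE-agent. SWE-agent first copies the code from the issue to reproduce the bug and confirms it, then searches the codebase for the relevant function definition, analyzes it, locates the indexing problem, and generates and applies a fix. It reruns the reproduction code to verify the fix and finally submits the solution. The section closes with the presenter's excitement about the progress in AI coding assistants.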
Keywords
💡Coding assistant
💡GitHub
💡Language model
💡Code repair
💡Docker
💡Miniconda
💡Code editor
💡File viewer
💡Command line
💡Environment variables
💡Issue tracking
Highlights
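Introduces SWE-agent, a new coding assistant developed by a team at Princeton University.
SWE-agent focuses on fixing real-world bugs and issues on GitHub.
SWE-agent reached 3,500 stars within days of release.
SWE-agent performs nearly as well as Devin, an established system.
On the SWE-bench test, SWE-agent with GPT-4 resolves 12.29% of issues.
SWE-agent uses simple language-model-centric commands and feedback formats that make it easier for the language model to browse a repository and to view, edit, and execute code files.
SWE-agent adds a linter that runs when an edit command is issued; edits that are not syntactically correct do not go through.
SWE-agent provides a special-built file viewer that shows 100 lines per turn.
SWE-agent's file editor has commands for scrolling and for searching within a file.
SWE-agent has a full-directory string search command.
SWE-agent can be installed right away; Docker and a bundled conda environment simplify environment setup and management.
The presenter thanks the project team for simplifying setup with Docker and conda, which reduces the hassle of Python environment management.
On an issue from its own repository, SWE-agent located the relevant code and started a fix before the run hit the cost limit.
SWE-agent scrolls and jumps through large files while working on an issue, for example a 668-line file in the video.
SWE-agent notices and corrects its own mistakes during a fix (for example, a duplicated line after an edit) and enforces a cost limit.
The presenter is excited about the progress of AI coding assistants and asks viewers to like and subscribe.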
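The 100-line file viewer from the highlights can be sketched as a window over a list of lines with goto and scroll commands. `FileViewer` is a hypothetical class, and placing the requested line at the top of the window is a simplifying assumption; SWE-agent's own viewer may position and format its output differently.

```python
from typing import List, Tuple


class FileViewer:
    """Hypothetical windowed viewer: show a fixed number of lines per turn."""

    def __init__(self, lines: List[str], window: int = 100) -> None:
        self.lines = lines
        self.window = window
        self.top = 0  # 0-indexed position of the first visible line

    def _clamp(self, top: int) -> int:
        # Keep the window inside the file.
        max_top = max(0, len(self.lines) - self.window)
        return min(max(0, top), max_top)

    def goto(self, line_number: int) -> None:
        """Move the window so that line_number (1-indexed) is the first visible line."""
        self.top = self._clamp(line_number - 1)

    def scroll_down(self) -> None:
        self.top = self._clamp(self.top + self.window)

    def scroll_up(self) -> None:
        self.top = self._clamp(self.top - self.window)

    def visible_range(self) -> Tuple[int, int]:
        """Return the 1-indexed, inclusive range of lines currently shown."""
        last = min(len(self.lines), self.top + self.window)
        return self.top + 1, last

    def render(self) -> str:
        first, last = self.visible_range()
        header = f"[{len(self.lines)} lines total; showing {first}-{last}]"
        body = [f"{n}:{text}" for n, text in enumerate(self.lines[first - 1:last], start=first)]
        return "\n".join([header] + body)


if __name__ == "__main__":
    viewer = FileViewer([f"line {i}" for i in range(1, 669)])  # 668 lines, as in the video
    viewer.goto(300)
    print(viewer.render().splitlines()[0])
```

The header line tells the model how large the file is and where it currently sits, which is the information it needs to decide whether to scroll, jump, or search.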
Transcripts
we have a brand new coding assistant it
feels like every day we're getting a new
one but this one is special this is
called SWE-agent and it is out of a
team at Princeton and it describes
itself as agent computer interfaces
enable software engineering language
models what does that actually mean so
what makes this special and it is
absolutely blowing up it's already at 3
and a half thousand stars and it was
just released a few days ago but what
makes this special is the fact that it
performs nearly as good as Devin so what
this project specializes in is fixing
real-world bugs and issues on GitHub so
you basically just give it a GitHub
issue URL it finds out what's going on
replicates the issue fixes it and
submits a fix as a PR it is really
impressive and check this out look at
this performance so this is the SWE-bench
test performance and everybody saw
that Devin had a 13.84% which again
it was being compared against kind of
just core models and not actually like
multi-agent frameworks but that
aside it performed really well
13.84% now with SWE-agent using GPT-4
open source
12.29% and again it just came out so very
very impressive and nearly as good as
Devin already and here's an example of
what it looks like back and forth so you
basically give it a GitHub issue
and it says our reproduction script
confirms the issue reported Min and Max
are not being converted to R so then it
searches the files for anything related
to R finds the issues then the
responsible file is likely rcode.py
we should open and inspect it it does
that then it makes the necessary changes
it is so cool and here's really why this
project is so special we accomplished
these results by designing simple
language-model-centric commands and
feedback format to make it easier for
the LM to browse the repository view
edit and execute code files now that is
something that many projects don't do
very well if you have an existing
codebase a large codebase it is very
difficult for a language model to
understand the entire codebase and even
understand parts of it because each part
is interconnected with other parts of a
large codebase the only project I've
really seen do this very well is Aider a-i-
d-e-r and that's because it uses Universal
Ctags which is essentially a way to
give computers a really easy way to
search through large code bases so here
are the features that it has they added
a linter that runs when an edit command
is issued we do not let the edit command
go through if the code isn't
syntactically correct so that's
fantastic we supply the agent with a
special-built file viewer instead of
just having a cat file so they actually
have a custom file viewer for the model
we found that this viewer works best
when displaying just 100 lines in each
turn very interesting so I'm not always
convinced that providing a language
model with just a snippet of code is
enough for it to understand the broader
context of the code but apparently it's
doing it pretty well the file editor
that we built has commands for scrolling
up and down and for performing a search
within the file so you're basically
giving an LLM its own custom IDE and
that's kind of cool as a concept we
supply the agent with a special-built
full directory string searching command
we found that it's important for this
tool to succinctly list the matches and
when commands have an empty output we
return the message saying your command
ran successfully and did not produce any
output and yeah you can install it right
away so I'm going to show you how to
install it and then I'm going to show
you a demo so the first thing you're
going to need to do is install Docker
and we've been doing that a bunch lately
so go ahead click on that link you're
going to open up this page
docs.docker.com/engine/install you're going
to find the relevant Docker Desktop app
for your operating system download it
and install it and when it's up and
running you're going to see this little
Docker icon in your taskbar then you
need to install Miniconda so something
we use quite often on this channel and
if you don't already have it click this
link and you can download the installer
right here so Windows macOS and Linux
so download the relevant one for your
operating system once again install it
restart your terminal if you have to
next open up Visual Studio Code we're
going to click this button in the top
right to toggle the panel which opens up
our terminal then we're going to cd to
the desktop or wherever you like to
store your new projects then switch back
to the GitHub repository we're going to
look for this green code button we're
going to click it and then we're going
to click this copy URL to clipboard
button right there copying the GitHub
URL switch back to VS Code and we're
going to type git clone and then that SWE-
agent URL hit enter okay now we're going
to cd into it so cd SWE-agent now
here's something cool which this project
did that I really like it actually comes
with a conda environment so it actually
will just set up the conda environment
for us so let's do that so if we switch
back we need to just type conda env
create -f environment.yml and this
environment.yml has the definition of
our environment that's necessary so it
just reduces the amount of guesswork so
go ahead and hit enter all right now
that that's done we're going to
highlight this right here to activate
the environment copy paste conda
activate swe-agent hit enter okay so now
it's activated we can see so right there
next we need to run the setup script so
./setup.sh hit enter and this is going to
build the Docker image so again between
Docker and conda kind of coming out of
the box with this project I really
appreciate how easy they're making this
it really reduces the headache of Python
environment management package
management dependencies Etc so to the
authors of this project thank you thank
you thank you I hope more projects do
this all right well here's something
funny I know I said that it was going to
be a lot easier because it comes with
conda and Docker already ready to go but
I was wrong I can't get past this error
right here something with Miniconda
something having to do with my macOS
being on Apple silicon I've tried a few
things and I don't know how to fix it so
if you do know how to fix this drop a
comment below but what I did is I
switched over to lightning.ai now
this video is not sponsored by
lightning.ai but it just made it so much
easier it comes with Docker
pre-installed it comes with conda and I
simply followed the same steps and now
the Docker image is created okay so now
it's done so all those previous steps
still work just follow those and now I'm
starting from here within lightning okay
then it says to create a keys file so
we're going to do that right click over
here new file keys.cfg hit enter then
we're going to paste in these four
environment variables so we have the
GitHub token which is required we have
the OpenAI key the Anthropic key and
the Together key all of which are
optional we're going to be using OpenAI
today so for the GitHub token so I got
my GitHub personal access token and
there's instructions for how to do it
just Google it it's a few steps but it's
pretty straightforward paste it in here
just like so then I grabbed my OpenAI
API key and pasted it in just like so
all right so we have both of those and
we should be ready to go now okay so
I think I actually put this keys file in
the wrong place so I'm going to go ahead
and drag it into SWE-agent so now it is
in the main part of the SWE-agent folder
just like that now apparently to get it
running we just have to run this command
so python run.py we provided a model
name we have the data path which is the
issue that we're looking to fix and the
config file which is already set up for
us so I'm going to copy this command I
paste it in the command right here just
so I can edit it easily and we're going
to try something very meta now there's
actually an issue on SWE-agent here it
is so run_replay.py misses record base_commit
etc etc I'm going to give it an
issue from its own repository so I'm
going to come up here I'm going to copy
the issue switch back to lightning I'm
going to highlight this issue right here
switch it out for this SWE-agent issue
highlight the whole thing cut come down
here paste and then hit enter and let's
see if this works okay and environment
initialized loaded data set beginning
task trying to clone from non-mirror
now I really want to plug Groq into
this so it's super fast but for now
we're going to have to deal with GPT-4
speeds okay we got a warning that seems
okay initializing agent settings for
container I mean everything seems to be
working you're going to solve this issue
on your own your terminal session has
started and you're in the repository etc
etc thoughts the issue indicates that
there is a KeyError for base_commit
okay that is correct so it got the issue
well here's the code and remember it has
about a 12% chance of solving the issue
but the fact that it's getting this far
is just super impressive so now that
we've located the swe_env.py file let's
open it to inspect the code and
understand how it works now the file is
quite large 668 lines in total the
error message from the issue points to a
key error okay let's use the goto
command so that's the command that is
built into SWE-agent and we're moving
to line 300 so there it is okay we
scrolled to line 300 but have not yet
reached the reset method so now let's
try 500 so it's basically trying to
search for that code we've scrolled
further but we still have not reached it
let's keep scrolling we have reached the
end but have not found the reset
method so now it scrolls back up to 400
it's just looking for it but if for some
reason it can't find it interesting so
it's basically scrolling up and down the
file just like any engineer would and
it's trying to find this method all
right we have found the reset method
starting on line 108 yes good to fix
this we need to ensure that base_commit
is included in self.record before this
line is executed one way to do this
would be to add a check Etc so I think
this is the updated code okay the edit
has been successfully applied but
there's a duplicate setting and it's
fixing it I mean this is so cool all
right oh cost limit exceeded interesting
so there must be a setting somewhere
where you can actually limit the cost of
GPT-4 and so it already exceeded $2 so
fine thank you for stopping it there but
it seems like it was well on its way to
solving this issue and yep right there
we can actually set the cost limit so
that's really cool so it is set by
default to $2 we could set it at $10 if
we want and so on so very very cool I
want to power this with a local model
because then it won't cost anything so
there doesn't seem to be a very
straightforward way to use a local model
but I suspect that with a little bit of
effort we could get it to work and I bet
that they're going to allow it in future
versions since this has only been out
for a couple days so overall very cool
project now let me show you a full demo
so I'm about to show you a full demo end
to end by one of the authors solving
a GitHub issue and preparing a fix for
it hey my name is Carlos and today I'm
going to show you an example of SWE-
agent resolving an issue from GitHub so
we'll be looking at this issue from
SymPy which is an instance in SWE-bench
and we see that the user is reporting
this problem where this matrix operation
col_insert is producing some unexpected
output so it looks like a
straightforward issue we'll copy this
GitHub URL and send this over to the SWE-
agent run script and once that's
going we can wait for about a
minute or two
but we can look at an example that ran a
bit earlier so here we have SWE-agent
trying to resolve this issue and it
starts off by reproducing the bug
that's reported which is always a good
first step so it copies the code from
that issue into a new file called
reproduce bug and after running that we
see that we have the same results that
are uh reported in the issue with this
problem being present here at the bottom
so now that we've confirmed that the
issue is still a problem
we can search the
repository for this col_insert function
to see where it might be defined and the
model thinks that it is defined in this
common.py file so we open this common.py
file in the file
editor and we can look at the different
functions that are present and we
identify the _eval_col_insert as being a
particular function of Interest so we
scroll that into view down on line
81 and after analyzing the code a little
bit the model realizes that there's a
problem with the indexing for
the values in this matrix operation so
we generate an edit which is then
applied again to this function which can
be seen after here between lines 87
through 89 and we go back to our
reproduction code to run that again and
see how the output has changed and we
see here that the output actually
represents the expected result so it
looks like the issue is resolved and
we clean up our workspace by
removing that file and finally submit
what we think is the right solution so
that produces this diff that we can
evaluate with SWE-bench and after testing
on SWE-bench we find that this submission
passes the initial test and it doesn't
break any of the existing tests so we
can mark it as resolved all right so that's it
this is incredible I'm so excited to see
all of the progress on AI coding helpers
if you liked this video please consider
giving a like and subscribe and I'll see
you in the next one