AI Agent Automatically Codes WITH TOOLS - SWE-Agent Tutorial ("Devin Clone")

Matthew Berman
5 Apr 2024 · 13:59

Summary

TLDR: The video introduces SWE-agent, a new coding assistant developed by a team at Princeton that specializes in fixing real-world bugs on GitHub. Running on GPT-4, SWE-agent takes a GitHub issue URL, replicates the problem, fixes it, and submits the fix as a PR. It comes with a purpose-built code editor and file viewer, and its installation is streamlined through Docker and conda. Despite a few technical snags along the way, SWE-agent shows enormous potential for solving programming problems.

Takeaways

  • 🌟 Introduces SWE-agent, a brand-new coding assistant developed by a team at Princeton that specializes in fixing real-world bugs on GitHub.
  • 🚀 SWE-agent attracted broad attention and thousands of GitHub stars within days of release, a sign that its performance and potential are being recognized.
  • 🔍 Given a GitHub issue URL, SWE-agent replicates the issue, fixes it, and submits the fix as a PR, demonstrating strong problem-solving ability.
  • 📈 Compared with existing coding assistants, SWE-agent's benchmark performance comes close to Devin's, showing it can resolve issues efficiently.
  • 🛠️ SWE-agent designs simple, language-model-centric commands and feedback formats that make it easier for the LM to browse a codebase and view, edit, and execute code files.
  • 📋 Its features include a linter that runs whenever an edit command is issued, a purpose-built file viewer and file editor, and a full-directory string-search command.
  • 🔗 Installation is streamlined through Docker and a bundled conda environment, which cuts down on Python environment and dependency headaches.
  • 🎥 The video includes a complete demo in which Carlos, one of the authors, uses SWE-agent to resolve a GitHub issue and prepare a fix.
  • 🔎 When tackling a GitHub issue, SWE-agent reproduces the code from the issue, locates the faulty code, and applies an effective fix.
  • 📊 In testing, the fix SWE-agent submitted passed the SWE-bench checks, confirming that its solution works.
  • 💡 SWE-agent does not yet support local models, but future versions may add that capability and further improve the experience.

Q & A

  • What is SWE-agent?

    -SWE-agent is a new coding assistant developed by a team at Princeton that specializes in fixing real-world bugs and issues on GitHub.

  • How does SWE-agent work?

    -You give SWE-agent a GitHub issue URL; it works out what is wrong, replicates the issue, fixes it, and submits the fix as a PR.

  • How well does SWE-agent perform?

    -Very well: running on GPT-4, it resolves 12.29% of issues on the SWE-bench benchmark, close to Devin's performance.

  • How does SWE-agent make sense of large codebases?

    -It uses simple, language-model-centric commands and feedback formats that make it easier for the LM to browse the codebase and view, edit, and execute code files.

  • What features does SWE-agent have?

    -It adds a linter that runs on every edit, a purpose-built file viewer, a file editor, and a full-directory string-search command. It also ships with a conda environment that simplifies Python environment and dependency management.

  • How do you install SWE-agent?

    -Install Docker and Miniconda first, then clone the SWE-agent GitHub repository (the video uses VS Code), set up the conda environment, and run the setup script as described in the repo; see the command sketch after this list.

  • How did SWE-agent do on an issue from its own repository?

    -SWE-agent located the offending code in its own repository and applied an edit, although the run stopped when it hit the default cost limit, showing it can find and start fixing real code problems.

  • What is SWE-agent's workflow for resolving a GitHub issue?

    -It first copies the code from the issue to reproduce the bug, then searches the codebase for the relevant function, analyzes the code and generates a fix, reruns the reproduction code to verify the fix, and finally submits it as a PR.

  • What problems came up while running SWE-agent?

    -The presenter hit a Miniconda compatibility problem on Apple silicon, and a run against a GitHub issue was cut short by the cost limit.

  • Is it possible to run SWE-agent with a local model?

    -SWE-agent currently relies on hosted models. There is no straightforward way to use a local model yet, but future versions may support it with some effort and optimization.

  • How is SWE-agent's cost limit set?

    -It defaults to $2 per run, but can be raised if needed, to $10 for example.
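
For reference, here is roughly what the installation flow described above looks like as commands. This is a sketch based on the steps shown in the video and the repository README at the time; the repo URL, script names, and environment name may have changed in later releases:

    # assumes Docker Desktop and Miniconda are already installed
    git clone https://github.com/princeton-nlp/SWE-agent.git
    cd SWE-agent
    conda env create -f environment.yml   # bundled conda environment definition
    conda activate swe-agent
    ./setup.sh                            # builds the Docker image the agent runs in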

Outlines

00:00

🤖 Introducing SWE-agent, a brand-new coding assistant

This segment introduces SWE-agent, a brand-new coding assistant developed by a team at Princeton. The assistant specializes in fixing real-world bugs and issues on GitHub: given a GitHub issue URL, SWE-agent automatically identifies the problem, replicates it, fixes it, and submits the fix as a PR. The tool attracted a great deal of attention and praise within days of release, and its performance approaches Devin's: the open-source agent, running on GPT-4, scores close to Devin on the SWE-bench benchmark. SWE-agent achieves this by designing simple, language-model-centric commands and feedback formats that make it easier for the language model to browse a codebase and view, edit, and execute code files.

05:00

🛠 Installing and running SWE-agent

This segment walks through installing and running SWE-agent. First install Docker and Miniconda, then clone the SWE-agent project from GitHub. Create and activate the conda environment, then run the setup script to build the Docker image. After hitting a Miniconda installation problem, the presenter switches to a Lightning.ai environment, which comes with Docker and conda preinstalled. Next, create a keys file containing a GitHub token and an OpenAI API key. Finally, launch SWE-agent with python run.py and point it at an issue in its own repository, which demonstrates the agent's workflow and capabilities.
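
As a concrete reference, the keys file and launch command described here look roughly like this. Treat it as a sketch: the key names follow what the video shows for keys.cfg, the run.py flags follow the repository README at the time, and the issue URL is a placeholder, not a real issue:

    # keys.cfg, placed at the SWE-agent root (GitHub token required; model keys optional)
    GITHUB_TOKEN: 'ghp_...'
    OPENAI_API_KEY: 'sk-...'

    # launch the agent against a GitHub issue URL
    python run.py \
        --model_name gpt4 \
        --data_path https://github.com/owner/repo/issues/123 \
        --config_file config/default_from_url.yaml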

10:01

🌟 Live demo: SWE-agent resolving a real issue

In this segment Carlos, one of the authors, demonstrates SWE-agent resolving a real GitHub issue. He picks an issue from sympy about a matrix operation and has SWE-agent repair it. SWE-agent first copies the code from the issue to reproduce it and confirms the problem exists, then searches the codebase for the relevant function definition, analyzes the code to find the fault, and generates and applies a fix. It reruns the reproduction code to verify the fix and finally submits the solution. The demo shows off SWE-agent's capabilities and the presenter's excitement about progress in AI coding assistants.
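
A run like the demo's, with the cost cap discussed in the video raised from its default, would look something like the sketch below. The --per_instance_cost_limit flag name follows the repository README at the time, and the issue number is a placeholder:

    python run.py \
        --model_name gpt4 \
        --data_path https://github.com/sympy/sympy/issues/<issue-number> \
        --config_file config/default_from_url.yaml \
        --per_instance_cost_limit 10.00   # default budget is $2 per run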

Keywords

💡Coding assistant

A coding assistant is a software tool that uses artificial intelligence to help developers write, debug, and optimize code. The video introduces SWE-agent, a new coding assistant that automatically fixes bugs by analyzing issues reported on GitHub.

💡GitHub

GitHub is a hosting platform for open-source and private software projects that uses Git for version control. In the video, GitHub is the platform where issues and code are shared; SWE-agent analyzes GitHub issues to locate and fix code defects.

💡Language model

A language model is an AI model that can understand and generate human language. In the video, SWE-agent uses a language model, GPT-4, to understand and work with programming languages, which is what enables it to fix code automatically.

💡Bug fixing

Bug fixing is the process of modifying and improving existing code to correct errors. One of SWE-agent's core functions is automatically fixing code problems reported on GitHub by analyzing the issue, locating the cause, and proposing a solution.

💡Docker

Docker is an open-source container engine that lets developers package an application and its dependencies into a portable container for fast deployment. In the video, Docker must be installed before using SWE-agent, since it provides the environment the agent runs in.

💡Miniconda

Miniconda is a minimal version of Anaconda, a Python distribution for scientific computing that bundles many data-science and machine-learning libraries. In the video, Miniconda manages the Python environment and packages, giving SWE-agent a clean environment to run in.

💡Code editor

A code editor is a tool for writing and modifying source code. SWE-agent has a purpose-built editor with scrolling and search commands, letting the language model browse and edit code much as a developer would in an IDE.

💡File viewer

A file viewer is a tool for inspecting file contents. SWE-agent's purpose-built viewer shows only 100 lines at a time, which helps the model process and understand the surrounding code more effectively.
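
To make the windowing idea concrete, here is a toy shell equivalent. It is purely illustrative, not SWE-agent's actual implementation: view prints a 100-line window of a file starting at a given line, much like the agent's viewer and goto command, and grep approximates the full-directory string search:

    # show a 100-line window of FILE starting at START_LINE (defaults to 1)
    view() { sed -n "${2:-1},$(( ${2:-1} + 99 ))p" "$1"; }

    view sweagent/environment/swe_env.py 300   # roughly what 'goto 300' shows the model
    grep -rn "col_insert" .                    # list matches with file and line number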

💡Command line

The command line is a text interface where users type commands to interact with the system. In the video, it is used for the installation steps: installing Docker and Miniconda, activating the Python environment, and so on.

💡Environment variables

Environment variables are system settings that can affect how a program runs. In the video, the user supplies values such as a GitHub token and an OpenAI key so that SWE-agent can access GitHub and call the OpenAI API.

💡Issue tracking

Issue tracking covers recording, categorizing, and resolving software defects during development. SWE-agent works from GitHub issues to track down errors in code and attempts to resolve them automatically.

Highlights

Introduces SWE-agent, a new coding assistant developed by a team at Princeton.

SWE-agent specializes in fixing real-world bugs and issues on GitHub.

SWE-agent gained 3,500 stars within days of release.

SWE-agent performs nearly as well as Devin, a system with an established reputation.

SWE-bench testing shows SWE-agent with GPT-4 resolves 12.29% of issues.

By designing simple, language-model-centric commands and feedback formats, SWE-agent makes it easier for the language model to browse a codebase and view, edit, and execute code files.

SWE-agent runs a linter whenever an edit command is issued, ensuring the code stays syntactically correct.

SWE-agent provides a purpose-built file viewer that displays 100 lines at a time.

SWE-agent's file editor can scroll and search within a file.

SWE-agent includes a full-directory string-search command.

SWE-agent can be installed right away; it ships with Docker and conda environments that simplify setup and management.

The presenter thanks the project team for the Docker and conda setup, which removes much of the pain of Python environment management.

Set loose on an issue in its own repository, SWE-agent located the offending code and applied an edit before hitting its cost limit.

While working through an issue, SWE-agent can quickly scroll through and search large files.

While fixing an issue, SWE-agent checks its own work and keeps within a cost limit.

The presenter is excited about progress in AI coding assistants and encourages viewers to like and subscribe.

Transcripts

00:00

we have a brand new coding assistant it

00:02

feels like every day we're getting a new

00:04

one but this one is special this is

00:07

called SWE-agent and it is out of a

00:10

team at Princeton and it describes

00:13

itself as agent-computer interfaces

00:15

enable software engineering language

00:17

models what does that actually mean so

00:20

what makes this special and it is

00:22

absolutely blowing up it's already at 3

00:24

and a half thousand stars and it was

00:25

just released a few days ago but what

00:28

makes this special is the fact that it

00:30

performs nearly as well as Devin so what

00:33

this project specializes in is fixing

00:36

real-world bugs and issues on GitHub so

00:39

you basically just give it a GitHub

00:41

issue URL it finds out what's going on

00:44

replicates the issue fixes it and

00:47

submits a fix as a PR it is really

00:50

impressive and check this out look at

00:53

this performance so this is the SWE-bench

00:55

test performance and everybody saw

00:58

that Devin had a 13.84% which again

01:03

it was being compared against kind of

01:05

just core models and not actually like

01:07

multi-agent frameworks but that

01:10

aside it performed really well

01:12

13.84% now with SWE-agent using GPT-4

01:17

open source

01:18

12.29% and again it just came out so very

01:22

very impressive and nearly as good as

01:24

Devin already and here's an example of

01:27

what it looks like back and forth so you

01:29

basically give it a GitHub issue

01:31

and it says our reproduction script

01:33

confirms the issue reported Min and Max

01:35

are not being converted to R so then it

01:37

searches the files for anything related

01:39

to R finds the issues then the

01:42

responsible file is likely rcode.py

01:44

we should open and inspect it it does

01:46

that then it makes the necessary changes

01:49

it is so cool and here's really why this

01:52

project is so special we accomplished

01:55

these results by designing simple

01:57

language model centric commands and

01:59

feedback format to make it easier for

02:01

the LM to browse the repository view

02:03

edit and execute code files now that is

02:06

something that many projects don't do

02:08

very well if you have an existing

02:10

codebase a large codebase it is very

02:13

difficult for a language model to

02:16

understand the entire codebase and even

02:18

understand parts of it because each part

02:20

is interconnected with other parts of a

02:23

large codebase the only project I've

02:25

really seen do this very well is Aider

02:27

A-I-D-E-R and that's because it uses Universal

02:30

Ctags which is essentially a way to

02:32

give computers a really easy way to

02:34

search through large code bases so here

02:37

are the features that it has they added

02:38

a linter that runs when an edit command

02:41

is issued we do not let the edit command

02:43

go through if the code isn't

02:44

syntactically correct so that's

02:46

fantastic we supply the agent with a

02:48

special built file viewer instead of

02:51

just having it cat a file so they actually

02:53

have a custom file viewer for the model

02:57

we found that this viewer works best

02:58

when displaying just 100 lines in each

03:01

turn very interesting so I'm not always

03:03

convinced that providing a language

03:05

model with just a snippet of code is

03:07

enough for it to understand the broader

03:09

context of the code but apparently it's

03:11

doing it pretty well the file editor

03:13

that we built has commands for scrolling

03:14

up and down and for performing a search

03:16

within the file so you're basically

03:18

giving an LLM its own custom IDE and

03:21

that's kind of cool as a concept we

03:24

supply the agent with a special built

03:26

full directory string searching command

03:28

we found that it's important for this

03:30

tool to succinctly list the matches and

03:33

when commands have an empty output we

03:35

return the message saying your command

03:36

ran successfully and did not produce any

03:38

output and yeah you can install it right

03:41

away so I'm going to show you how to

03:43

install it and then I'm going to show

03:44

you a demo so the first thing you're

03:45

going to need to do is install Docker

03:47

and we've been doing that a bunch lately

03:48

so go ahead click on that link you're

03:50

going to open up this page

03:52

docs.docker.com/engine/install you're going

03:55

to find the relevant Docker Desktop app

03:58

for your operating system download it

04:00

and install it and when it's up and

04:02

running you're going to see this little

04:04

Docker icon in your taskbar then you

04:07

need to install miniconda so something

04:09

we use quite often on this channel and

04:11

if you don't already have it click this

04:13

link and you can download the installer

04:15

right here so Windows Mac OS and Linux

04:18

so download the relevant one for your

04:20

operating system once again install it

04:23

restart your terminal if you have to

04:25

next open up Visual Studio Code we're

04:27

going to click this button in the top

04:28

right to toggle the panel which opens up

04:30

our terminal then we're going to cd to

04:32

the desktop or wherever you like to

04:35

store your new projects then switch back

04:37

to the GitHub repository we're going to

04:40

look for this green code button we're

04:41

going to click it and then we're going

04:42

to click this copy URL to clipboard

04:45

button right there copying the GitHub

04:46

URL switch back to VS Code and we're

04:48

going to type git clone and then the SWE-agent

04:51

URL hit enter okay now we're going

04:54

to cd into it so cd SWE-agent now

04:58

here's something cool which this project

05:00

did that I really like it actually comes

05:01

with a conda environment so it actually

05:03

will just set up the conda environment

05:05

for us so let's do that so if we switch

05:07

back we need to just type conda env

05:10

create -f environment.yml and this

05:13

environment.yml has the definition of

05:15

our environment that's necessary so it

05:18

just reduces the amount of guesswork so

05:19

go ahead and click enter all right now

05:21

that that's done we're going to

05:23

highlight this right here to activate

05:25

the environment copy paste conda

05:27

activate swe-agent hit enter okay so now

05:31

it's activated we can see so right there

05:33

next we need to run the setup script so

05:37

./setup.sh hit enter and this is going to

05:39

build the Docker image so again between

05:42

Docker and conda kind of coming out of

05:44

the box with this project I really

05:47

appreciate how easy they're making this

05:49

it really reduces the headache of Python

05:51

environment management package

05:52

management dependencies etc so to the

05:56

authors of this project thank you thank

05:58

you thank you I hope more projects do

06:00

this all right well here's something

06:03

funny I know I said that it was going to

06:05

be a lot easier because it comes with

06:07

conda and Docker already ready to go but

06:10

I was wrong I can't get past this error

06:13

right here something with Miniconda

06:15

something having to do with my Mac OS

06:18

being on Apple silicon I've tried a few

06:20

things and I don't know how to fix it so

06:22

if you do know how to fix this drop a

06:24

comment below but what I did is I

06:27

switched over to Lightning.ai now

06:29

this video is not sponsored by

06:31

Lightning.ai but it just made it so much

06:33

easier it comes with Docker

06:35

pre-installed it comes with conda and I

06:37

simply followed the same steps and now

06:40

the Docker image is created okay so now

06:42

it's done so all those previous steps

06:44

still work just follow those and now I'm

06:46

starting from here within Lightning okay

06:49

then it says to create a keys file so

06:50

we're going to do that right click over

06:52

here new file keys.cfg hit enter then

06:57

we're going to paste in these four

06:58

environment variables so we have the

07:00

GitHub token which is required we have

07:02

the OpenAI key the Anthropic key and

07:04

the Together key all of which are

07:05

optional we're going to be using OpenAI

07:07

today so for the GitHub token so I got

07:12

my GitHub personal access token and

07:15

there's instructions for how to do it

07:16

just Google it it's a few steps but it's

07:19

pretty straightforward paste it in here

07:21

just like so then I grabbed my OpenAI

07:24

API key and pasted it in just like so

07:26

all right so we have both of those and

07:28

we should be ready to go now okay so

07:30

I think I actually put this keys file in

07:33

the wrong place so I'm going to go ahead

07:34

and drag it into SWE-agent so now it is

07:37

in the main part of the SWE-agent folder

07:40

just like that now apparently to get it

07:42

running we just have to run this command

07:44

so python run.py we provided a model

07:46

name we have the data path which is the

07:49

issue that we're looking to fix and the

07:52

config file which is already set up for

07:54

us so I'm going to copy this command I

07:57

paste it in the command right here just

07:58

so I can edit it easily and we're going

08:00

to try something very meta now there's

08:02

actually an issue on SWE-agent here it

08:05

is so run_replay.py misses record base

08:09

commit etc etc I'm going to give it an

08:12

issue from its own repository so I'm

08:14

going to come up here I'm going to copy

08:15

the issue switch back to Lightning I'm

08:18

going to highlight this issue right here

08:20

switch it out for this SWE-agent issue

08:23

highlight the whole thing cut come down

08:25

here paste and then hit enter and let's

08:27

see if this works okay and environment

08:30

initialized loaded data set beginning

08:32

task trying to clone from non-mirror

08:35

now I really want to plug Groq into

08:37

this so it's super fast but for now

08:39

we're going to have to deal with GPT-4

08:41

speeds okay we got a warning that seems

08:44

okay initializing agent settings for

08:46

container I mean everything seems to be

08:49

working you're going to solve this issue

08:51

on your own your terminal session has

08:52

started and you're in the repository etc

08:55

etc thoughts the issue indicates that

08:57

there is a KeyError for base commit

09:00

okay that is correct so it got the issue

09:02

well here's the code and remember it has

09:05

about a 12% chance of solving the issue

09:08

but the fact that it's getting this far

09:10

is just super impressive so now that

09:13

we've located the swe_env.py file let's

09:16

open it to inspect the code and

09:18

understand how it works now the file is

09:21

quite large 668 lines in total the

09:24

error message from the issue points to a

09:26

KeyError okay let's use the goto

09:28

command so that's the command that is

09:31

built into SWE-agent and we're moving

09:33

to line 300 so there it is okay we

09:36

scrolled to line 300 but have not yet

09:38

reached the reset method so now let's

09:41

try 500 so it's basically trying to

09:43

search for that code we've scrolled

09:44

further but we still have not reached it

09:46

let's keep scrolling we have reached the

09:48

end but have not found the reset

09:51

method so now it scrolls back up to 400

09:53

it's just looking for it but if for some

09:55

reason it can't find it interesting so

09:57

it's basically scrolling up and down the

09:58

file just like any engineer would and

10:00

it's trying to find this method all

10:03

right we have found the reset method

10:04

starting on line 108 yes good to fix

10:08

this we need to ensure that base commit

10:10

is included in self.record before this

10:12

line is executed one way to do this

10:13

would be to add a check etc so I think

10:17

this is the updated code okay the edit

10:19

has been successfully applied but

10:21

there's a duplicate setting and it's

10:23

fixing it I mean this is so cool all

10:25

right oh cost limit exceeded interesting

10:30

so there must be a setting somewhere

10:32

where you can actually limit the cost of

10:35

GPT-4 and so it already exceeded $2 so

10:38

fine thank you for stopping it there but

10:41

it seems like it was well on its way to

10:43

solving this issue and yep right there

10:45

we can actually set the cost limit so

10:48

that's really cool so it is set by

10:50

default to $2 we could set it at $10 if

10:53

we want and so on so very very cool I

10:56

want to power this with a local model

10:58

because then it won't cost anything so

11:01

there doesn't seem to be a very

11:03

straightforward way to use a local model

11:05

but I suspect that with a little bit of

11:08

effort we could get it to work and I bet

11:10

that they're going to allow it in future

11:12

versions since this has only been out

11:14

for a couple days so overall very cool

11:17

project now let me show you a full demo

11:20

so I'm about to show you a full demo end

11:22

to end by one of the authors solving

11:25

a GitHub issue and preparing a fix for

11:28

it hey my name is Carlos and today I'm

11:30

going to show you an example of SWE-agent

11:32

resolving an issue from GitHub so

11:34

we'll be looking at this issue from

11:35

sympy which is an instance in SWE-bench

11:37

and we see that the user is reporting

11:40

this problem where this matrix operation

11:43

col_insert is producing some unexpected

11:45

output so it looks like a

11:46

straightforward issue we'll copy this

11:48

GitHub URL and send this over to the SWE-agent

11:51

run script and once that's

11:54

going we can wait for about a

11:58

minute or two

11:59

but we can look at an example that ran a

12:01

bit earlier so here we have SWE-agent

12:05

trying to resolve this issue and it

12:06

starts off by reproducing the bug

12:08

that's reported which is always a good

12:10

first step so it copies the code from

12:12

that issue into a new file called

12:15

reproduce bug and after running that we

12:17

see that we have the same results that

12:19

are reported in the issue with this

12:22

problem being present here at the bottom

12:25

so now that we've confirmed that the

12:27

issue is still a problem

12:30

we can search the

12:33

repository for this col_insert function

12:36

to see where it might be defined and the

12:37

model thinks that it is defined in this

12:39

common.py file so we open this

12:43

common.py file in the file

12:44

editor and we can look at the different

12:47

functions that are present and we

12:49

identify the _eval_col_insert as being a

12:51

particular function of interest so we

12:53

scroll that into view down on line

12:56

81 and after analyzing the code a little

13:00

bit the model realizes that there's a

13:02

problem with the indexing for

13:05

the values in this matrix operation so

13:08

we generate an edit which is then

13:09

applied again to this function which can

13:11

be seen after here between lines 87

13:14

through 89 and we go back to our

13:17

reproduction code to run that again and

13:19

see how the output has changed and we

13:22

see here that the output actually

13:24

represents the expected result so it

13:26

looks like the issue is resolved and

13:29

we clean up our workspace by

13:30

removing that file and finally submit

13:33

what we think is the right solution so

13:34

that produces this diff that we can

13:37

evaluate with SWE-bench and after testing

13:39

on SWE-bench we find that this submission

13:42

passes the initial test and it doesn't

13:44

break any of the existing tests so we

13:46

can mark it resolved all right so that's it

13:49

this is incredible I'm so excited to see

13:52

all of the progress on AI coding helpers

13:54

if you liked this video please consider

13:56

giving a like and subscribe and I'll see

13:58

you in the next one