AI Agent Automatically Codes WITH TOOLS - SWE-Agent Tutorial ("Devin Clone")

Matthew Berman
5 Apr 2024 · 13:59

Summary

TLDR: The video introduces a new coding assistant called SWE-Agent, developed by a team at Princeton. It specializes in fixing real-world bugs on GitHub: given an issue URL, it analyzes the problem, replicates it, and submits a fix as a pull request. Using GPT-4, SWE-Agent scores close to Devin on the SWE-bench benchmark. The project stands out for its simple, language-model-centric commands and feedback format, which make code browsing, editing, and execution easier for the model. It also includes a linter, a custom file viewer, and a directory string-search command. The video walks through the installation process and shows the agent working on an issue from its own repository.

Takeaways

  • 🚀 Introduction of a new coding assistant, the SWE Agent, developed by a team at Princeton, specializing in software engineering language models.
  • 🌟 The SWE Agent has quickly gained popularity, with 3,500 stars on GitHub just days after its release.
  • 🔍 SWE Agent's unique capability to fix real-world bugs and issues on GitHub by analyzing the issue URL, replicating the issue, and submitting a fix as a PR.
  • 📈 Comparative performance of SWE Agent using GPT-4, showing a 12.29% success rate on the SWE-bench test, nearly matching the performance of Devin.
  • 🛠️ The project's innovation lies in its simple language model-centric commands and feedback format, facilitating easier code browsing, editing, and execution.
  • 🔧 SWE Agent includes a linter that ensures syntactical correctness before code edits are applied, and a custom file viewer for the model.
  • 📚 The agent features a file editor with scrolling and search capabilities, essentially providing an LLM with its own custom IDE.
  • 🔎 A full directory string searching command is supplied for efficient codebase navigation and match listing.
  • 📋 Easy installation process with Docker and Miniconda setup, streamlined for user convenience.
  • 💻 The SWE Agent comes with a pre-configured conda environment, reducing the complexity of environment and dependency management.
  • 🔄 A live demonstration by one of the authors showcases the end-to-end process of resolving a GitHub issue using the SWE Agent.

Q & A

  • What is the SWE Agent and how does it differ from other coding assistants?

    -The SWE Agent is a special coding assistant developed by a team at Princeton. It stands out from other coding assistants because it specializes in fixing real-world bugs and issues on GitHub. Given a GitHub issue URL, it can understand the problem, replicate the issue, fix it, and submit the fix as a Pull Request (PR). It has been designed with language model-centric commands and a feedback format to efficiently browse, view, edit, and execute code files within a repository.
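
    As a rough illustration (the flags follow the repository's README at the time of the video and may have changed since; the issue URL below is a placeholder), pointing the agent at an issue looks like this:

        python run.py \
            --model_name gpt4 \
            --data_path https://github.com/owner/repo/issues/123 \
            --config_file config/default_from_url.yaml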

  • How does the SWE Agent perform in comparison to other models like Devin?

    -The SWE Agent has shown impressive performance, nearly matching that of Devin. In the SWE-bench test, Devin achieved a 13.84% success rate, while the SWE Agent, using GPT-4 and despite being newly released, scored 12.29%. This demonstrates the agent's capability to effectively address issues in software engineering tasks.

  • What features does the SWE Agent have that facilitate its operation?

    -The SWE Agent includes several features such as a built-in linter that checks code syntax before edits are made, a custom file viewer that displays up to 100 lines at a time for better comprehension, a file editor with scrolling and search commands, and a full directory string searching command. These features collectively provide the language model with an environment similar to a custom IDE, enhancing its ability to understand and work with large codebases.
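
    To make the interaction style concrete, a single agent turn might issue commands like the ones below. The command names follow the interface the authors describe (open, goto, scroll, search, edit); the file path and line numbers are illustrative only:

        open path/to/file.py          # custom viewer shows roughly 100 lines per turn
        goto 300                      # jump the view window to a specific line
        scroll_down                   # page through the open file
        search_dir "base_commit"      # succinctly list matches across the repository
        edit 108:110                  # propose replacement lines; rejected if the linter flags a syntax error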

  • How can one install and set up the SWE Agent for use?

    -To install the SWE Agent, one needs to install Docker and Miniconda first. Then, users should clone the SWE Agent repository, create a conda environment using the provided environment.yml file, activate the environment, and run the setup script to build the Docker image. After these steps, users can run the SWE Agent using the provided command in the project's directory.
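
    Condensed, the steps shown in the video are the following (assuming Docker and Miniconda are already installed; the repository URL and environment name are as shown on screen):

        git clone https://github.com/princeton-nlp/SWE-agent.git
        cd SWE-agent
        conda env create -f environment.yml   # creates the pre-configured environment
        conda activate swe-agent
        ./setup.sh                             # builds the Docker image the agent runs in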

  • What was the issue that the SWE Agent attempted to solve in its own repository?

    -The issue the SWE Agent attempted to solve in its own repository was a KeyError on 'base_commit'. The error occurs in the 'reset' method of the environment file (swe_env.py), where 'base_commit' is read from 'self.record' without being guaranteed to be present. The SWE Agent located the relevant code and suggested adding a check that 'base_commit' is included in 'self.record' before the problematic line is executed.
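
    The full patch is not shown in the video, but the guard the agent proposes amounts to checking for the key before it is read. A minimal, self-contained sketch (the function and parameter names here are hypothetical, not the actual fix merged into the repository):

        def ensure_base_commit(record: dict, fallback_commit: str) -> str:
            """Return record['base_commit'], filling it in from a fallback if the key is missing."""
            # Hypothetical helper illustrating the suggested check, not SWE-Agent's real code
            if "base_commit" not in record:
                record["base_commit"] = fallback_commit
            return record["base_commit"]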

  • How does the SWE Agent handle large codebases?

    -The SWE Agent is designed to handle large codebases by using a combination of a custom file viewer, which displays a limited number of lines at a time, and a file editor with search capabilities. This approach allows the language model to focus on specific parts of the codebase without being overwhelmed by its size. Additionally, the SWE Agent uses language model-centric commands and feedback to navigate and understand the codebase more effectively.
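
    The "limited number of lines at a time" idea can be approximated in a few lines of Python. This is a simplified sketch of a windowed viewer, not the project's actual implementation:

        def view_window(path: str, start_line: int = 1, window: int = 100) -> str:
            """Return up to `window` numbered lines of a file, starting at `start_line`."""
            with open(path, "r", encoding="utf-8") as f:
                lines = f.readlines()
            end = min(start_line - 1 + window, len(lines))
            numbered = "".join(f"{i}: {line}" for i, line in enumerate(lines[start_line - 1:end], start=start_line))
            return f"[File: {path} ({len(lines)} lines total), showing {start_line}-{end}]\n" + numbered

    Scrolling then just means calling the viewer again with a shifted start_line, which is essentially what the agent's scroll and goto commands do.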

  • What is the significance of the SWE Agent's ability to understand and fix GitHub issues?

    -The SWE Agent's ability to understand and fix GitHub issues is significant because it demonstrates the growing capability of AI in software engineering tasks. By automatically identifying, reproducing, and fixing bugs, the SWE Agent can save developers time and effort, potentially reducing the occurrence of human error and improving the overall quality and efficiency of software development.

  • How does the SWE Agent ensure syntactically correct code edits?

    -The SWE Agent ensures syntactically correct code edits by incorporating a linter that runs every time an edit command is issued. The edit command will not proceed if the code is not syntactically correct, thus maintaining code quality and preventing the introduction of new errors.
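
    Conceptually, this works like a syntax gate in front of the write. A minimal Python approximation (the real agent integrates an actual linter and a richer edit protocol) could look like:

        import ast

        def apply_edit(path: str, new_source: str) -> bool:
            """Write the proposed file contents only if they parse cleanly; otherwise reject the edit."""
            try:
                ast.parse(new_source)            # syntax check stands in for the real linter
            except SyntaxError:
                return False                     # edit rejected; the agent must produce a corrected one
            with open(path, "w", encoding="utf-8") as f:
                f.write(new_source)
            return True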

  • What is the role of the environment variables in using the SWE Agent?

    -Environment variables play a crucial role in configuring the SWE Agent. Users need to provide their GitHub personal access token and, optionally, their OpenAI, Anthropic, and Together API keys. These tokens are required for the SWE Agent to access the resources and services it needs to perform its tasks, such as browsing and editing code on GitHub.
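
    These values go into a small keys.cfg file at the root of the project. The exact variable names should be copied from the repository's own template; roughly, the file has this shape (all values below are placeholders):

        GITHUB_TOKEN: 'your GitHub personal access token (required)'
        OPENAI_API_KEY: 'your OpenAI key (optional)'
        ANTHROPIC_API_KEY: 'your Anthropic key (optional)'
        TOGETHER_API_KEY: 'your Together key (optional)'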

  • What is the cost limit feature in the SWE Agent and how does it work?

    -The cost limit feature in the SWE Agent is a setting that allows users to define a maximum cost for using the GPT-4 model. This is important for managing expenses, as the use of AI models can incur costs. In the example provided, the default cost limit was set at $2, but users can adjust this limit according to their needs, up to a certain extent, to ensure that the Agent's operations do not exceed their budget.
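
    In the repository this is exposed as a command-line option; the flag name below matches the docs at the time of the video but may have changed, so double-check against the current README:

        python run.py \
            --model_name gpt4 \
            --data_path https://github.com/owner/repo/issues/123 \
            --config_file config/default_from_url.yaml \
            --per_instance_cost_limit 10.00   # raise the default $2 cap to $10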

  • What are some potential future improvements for the SWE Agent?

    -Potential future improvements for the SWE Agent could include allowing the use of local models to eliminate costs, enhancing the model's ability to understand and navigate even larger and more complex codebases, and possibly integrating with more development tools and platforms to streamline the software development process further.

  • How does the SWE Agent handle the process of solving an issue from start to finish?

    -The SWE Agent handles the process of solving an issue by first reproducing the bug, then searching the repository for the relevant code, analyzing and identifying the problem, generating and applying an edit to fix the issue, retesting the code to confirm the fix, and finally submitting the solution as a PR. This end-to-end process is designed to mimic the workflow of a human developer, providing an automated and efficient approach to resolving software issues.
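
    Expressed as a skeleton in Python, the loop looks roughly like this. It is purely illustrative: the steps are passed in as callables because the real agent carries them out through its language-model-driven command interface rather than a fixed script:

        from typing import Callable, Optional

        def resolve_issue(
            issue_url: str,
            reproduce: Callable[[str], bool],   # write and run a repro script; True = bug confirmed
            locate: Callable[[str], str],       # search the repo and return the suspect file/region
            edit: Callable[[str], str],         # produce a linter-checked patch for that region
            retest: Callable[[str], bool],      # re-run the repro and existing tests; True = all green
            submit: Callable[[str], str],       # open a pull request and return its URL
        ) -> Optional[str]:
            if not reproduce(issue_url):
                return None                      # cannot reproduce the bug; stop early
            target = locate(issue_url)
            patch = edit(target)
            if not retest(patch):
                return None                      # the fix did not hold, or it broke existing tests
            return submit(patch)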

Outlines

00:00

🚀 Introduction to the SWE Agent

The video introduces a new coding assistant called SWE Agent, developed by a team at Princeton. It is described as special because it performs nearly as well as Devin, another well-known AI coding agent. The SWE Agent has quickly gained popularity, amassing 3,500 stars on GitHub within days of its release. The project specializes in fixing real-world bugs and issues on GitHub by analyzing the issue URL, replicating the issue, and submitting a fix as a pull request (PR). Its performance is highlighted by its results on the SWE-bench test, where it scored 12.29%, nearly matching Devin's 13.84%. The project's success is attributed to its design of simple, language-model-centric commands and feedback formats, which make code browsing, editing, and execution easier for the language model.

05:00

🛠️ Features and Installation of SWE Agent

The SWE Agent comes with several notable features, including a built-in linter that ensures syntactical correctness before code edits are applied, a custom file viewer that displays 100 lines at a time for optimal performance, and a file editor with scrolling and search capabilities. Additionally, the tool includes a full directory string-searching command and returns an explicit message when a command produces no output. The installation process for SWE Agent is straightforward, requiring Docker and Miniconda. The project provides a Docker image and a conda environment file (environment.yml), simplifying setup. However, the video creator encounters a Miniconda-related error on Apple Silicon and works around it by switching to a different platform (Lightning.ai).

10:01

🔍 Live Demonstration of SWE Agent

The video concludes with a live demonstration of the SWE Agent in action. Carlos, one of the project's authors, shows how the agent resolves an issue from the SymPy repository (an SWE-bench instance). The SWE Agent reproduces the bug, identifies the problem in the code, and applies a fix. The edited code is retested, and the issue is resolved without breaking any existing tests. The demonstration highlights the agent's ability to understand and manipulate code effectively. The video creator expresses excitement about the advancements in AI coding helpers and encourages viewers to like and subscribe for more content.

Keywords

💡coding assistant

A coding assistant is an AI-powered tool designed to aid in software development by automating tasks such as code writing, debugging, and issue resolution. In the context of the video, the 'swe-agent' is introduced as a new and special coding assistant developed by a team at Princeton, showcasing its ability to understand and fix real-world bugs on GitHub.

💡GitHub

GitHub is a web-based platform that provides version control and collaboration features for developers using Git. It allows users to host and review code, manage projects, and build software. In the video, GitHub is used as a platform where the 'swe-agent' identifies issues and submits pull requests with bug fixes.

💡GPT-4

GPT-4 is the fourth iteration of the Generative Pre-trained Transformer, a language prediction model developed by OpenAI. It is known for its advanced capabilities in understanding and generating human-like text, making it a powerful tool for natural language processing tasks. In the video, the 'swe-agent' uses GPT-4 to achieve a high success rate in fixing software bugs.

💡bug fixing

Bug fixing is the process of identifying, diagnosing, and correcting errors or faults in software code that cause it to behave in unexpected or undesired ways. The video highlights the 'swe-agent's' ability to automate this process by understanding the context of a GitHub issue and applying the necessary code changes to resolve it.

💡language model

A language model is a type of machine learning model that is trained to understand and generate human language. It is used in various applications, including natural language processing and understanding, text generation, and more. In the video, the 'swe-agent' utilizes a language model to interpret and manipulate code within GitHub repositories.

💡performance benchmark

A performance benchmark is a standard or criterion against which the performance of a system or component can be compared and evaluated. It is often used to assess the effectiveness and efficiency of software or hardware. In the video, the 'swe-agent' is compared to other models in a benchmark test to measure its ability to fix software bugs.

💡open source

Open source refers to a type of software licensing where the source code is made publicly available for anyone to view, use, modify, and distribute. This encourages collaboration and transparency within the software development community. SWE-Agent itself is released as open source on GitHub, which allows wider accessibility and potential contributions from the community; the GPT-4 model it calls, by contrast, is a proprietary OpenAI service.

💡Docker

Docker is a platform that enables developers to develop, deploy, and run applications inside containers. Containers are lightweight, portable, and self-sufficient, including everything needed to run an application. In the video, Docker is used to streamline the setup and deployment of the 'swe-agent', making it easier for users to get started with the tool.

💡miniconda

Miniconda is a free minimal installer for Conda, a popular package and environment management system for Python and other programming languages. It is used to manage dependencies and create isolated environments for different projects. In the video, miniconda is required as part of the setup process for the 'swe-agent', showcasing its role in managing Python environments.

💡IDE (Integrated Development Environment)

An Integrated Development Environment (IDE) is a software application that provides a comprehensive set of tools for software development. This typically includes a source code editor, build automation tools, and a debugger. In the context of the video, the 'swe-agent' is likened to having its own custom IDE, with features that allow it to edit and execute code files.

💡pull request (PR)

A pull request (PR) is a feature in version control systems, like Git, that allows developers to propose changes to a project's codebase. It initiates a review process where other contributors can examine, discuss, and approve or request changes to the proposed code before it is merged into the main project. In the video, the 'swe-agent' is capable of submitting pull requests to fix issues on GitHub, automating part of the collaborative development process.

Highlights

Introduction of a new coding assistant called SWE-Agent, developed by a team at Princeton, specializing in fixing real-world bugs on GitHub.

SWE-Agent automates the process of bug fixing by taking a GitHub issue URL, replicating the issue, fixing it, and submitting a PR.

Comparison of SWE-Agent's performance with Devin, highlighting its competitive edge with a 12.29% benchmark score using GPT-4.

SWE-Agent utilizes language model-centric commands and feedback formats to navigate and manipulate code repositories effectively.

The implementation of a custom file viewer and editor within SWE-Agent to handle code files efficiently, showing just 100 lines at a time.

Integration of a linter in SWE-Agent to ensure syntactical correctness before any code edits are applied.

Use of universal ctags in another project, Aider, cited as an example of effective navigation of large codebases.

Step-by-step installation guide for SWE-Agent using Docker and Miniconda, emphasizing the ease of setup.

Troubleshooting installation issues on macOS with Apple Silicon, indicating compatibility challenges.

Adoption of alternative platforms like Lightning.ai to overcome installation hurdles, showcasing flexibility in setup environments.

Demonstration of SWE-Agent fixing an issue within its own repository, illustrating self-referential debugging.

Introduction of a cost control feature in SWE-Agent, allowing users to set a spending limit for operations.

Prospective integration of local model support in future versions of SWE-Agent to eliminate operation costs.

A full end-to-end demo by one of SWE-Agent's authors, resolving a GitHub issue and preparing a fix, highlighting practical application.

Successful resolution of a coding issue by SWE-Agent, confirmed by SWE-bench tests that ensure the fix does not break existing functionality.

Transcripts

00:00

we have a brand new coding assistant it

00:02

feels like every day we're getting a new

00:04

one but this one is special this is

00:07

called SWE-agent and it is out of a

00:10

team at Princeton and it describes

00:13

itself as agent computer interfaces

00:15

enable software engineering language

00:17

models what does that actually mean so

00:20

what makes this special and it is

00:22

absolutely blowing up it's already at 3

00:24

and a half thousand stars and it was

00:25

just released a few days ago but what

00:28

makes this special is the fact that it

00:30

performs nearly as good as Devin so what

00:33

this project specializes in is fixing

00:36

realworld bugs and issues on GitHub so

00:39

you basically just give it a GitHub

00:41

issue URL it finds out what's going on

00:44

replicates the issue fixes it and

00:47

submits a fix as a PR it is really

00:50

impressive and check this out look at

00:53

this performance so this is the swe

00:55

bench test performance and everybody saw

00:58

that Devin had a 13.84% which again

01:03

it was being compared against kind of

01:05

just core models and not actually like

01:07

multi-agent Frameworks and but that

01:10

aside it performed really well

01:12

13.84% now with SWE-agent using GPT-4

01:17

open source

01:18

12.29 and again it just came out so very

01:22

very impressive and nearly as good as

01:24

Devin already and here's an example of

01:27

what it looks like back and forth so you

01:29

basically give give it a GitHub issue

01:31

and it says our reproduction script

01:33

confirms the issue reported min and max

01:35

are not being converted to R so then it

01:37

searches the files for anything related

01:39

to R finds the issues then the

01:42

responsible file is likely R code.py

01:44

we should open and inspect it it does

01:46

that then it makes the necessary changes

01:49

it is so cool and here's really why this

01:52

project is so special we accomplished

01:55

these results by designing simple

01:57

language model Centric commands and

01:59

feedback format to make it easier for

02:01

the LM to browse the repository view

02:03

edit and execute code files now that is

02:06

something that many projects don't do

02:08

very well if you have an existing

02:10

codebase a large codebase it is very

02:13

difficult for a language model to

02:16

understand the entire codebase and even

02:18

understand parts of it because each part

02:20

is interconnected with other parts of a

02:23

large codebase the only project I've

02:25

really seen do this very well is Aider, a-i-

02:27

d-e-r, and that's because it uses universal

02:30

ctags which is essentially a way to

02:32

give computers a really easy way to

02:34

search through large code bases so here

02:37

are the features that it has they added

02:38

a linter that runs when an edit command

02:41

is issued we do not let the edit command

02:43

go through if the code isn't

02:44

syntactically correct so that's

02:46

fantastic we Supply the agent with a

02:48

special built file viewer instead of

02:51

just having a cat file so they actually

02:53

have a custom file viewer for the model

02:57

we found that this viewer works best

02:58

when displaying just 100 lines in each

03:01

turn very interesting so I'm not always

03:03

convinced that providing a language

03:05

model with just a snippet of code is

03:07

enough for it to understand the broader

03:09

context of the code but apparently it's

03:11

doing it pretty well the file editor

03:13

that we built has commands for scrolling

03:14

up and down and for performing a search

03:16

within the file so you're basically

03:18

giving an llm its own custom IDE and

03:21

that's kind of cool as a concept we

03:24

Supply the agent with a special built

03:26

full directory string searching command

03:28

we found that it's important for this

03:30

tool to succinctly list the matches and

03:33

when commands have an empty output we

03:35

return the message saying your command

03:36

ran successfully and did not produce any

03:38

output and yeah you can install it right

03:41

away so I'm going to show you how to

03:43

install it and then I'm going to show

03:44

you a demo so the first thing you're

03:45

going to need to do is install Docker

03:47

and we've been doing that a bunch lately

03:48

so go ahead click on that link you're

03:50

going to open up this page docs.

03:52

docker.com/engine/install you're going

03:55

to find the relevant Docker desktop app

03:58

for your operating system download it

04:00

and install it and when it's up and

04:02

running you're going to see this little

04:04

Docker icon in your taskbar then you

04:07

need to install miniconda so something

04:09

we use quite often on this channel and

04:11

if you don't already have it click this

04:13

link and you can download the installer

04:15

right here so Windows Mac OS and Linux

04:18

so download the relevant one for your

04:20

operating system once again install it

04:23

restart your terminal if you have to

04:25

next open up visual studio code we're

04:27

going to click this button in the top

04:28

right to toggle the panel which opens up

04:30

our terminal then we're going to CD to

04:32

the desktop or wherever you like to

04:35

store your new projects then switch back

04:37

to the GitHub repository we're going to

04:40

look for this green code button we're

04:41

going to click it and then we're going

04:42

to click this copy URL to clipboard

04:45

button right there copying the GitHub

04:46

URL switch back to VS Code and we're

04:48

going to type git clone and then the SWE-

04:51

agent URL hit enter okay now we're going

04:54

to CD into it so cd swe-agent now

04:58

here's something cool which this project

05:00

did that I really like it actually comes

05:01

with a conda environment so it actually

05:03

will just set up the conda environment

05:05

for us so let's do that so if we switch

05:07

back we need to just type conda env

05:10

create -f environment.yml and this

05:13

environment.yml has the definition of

05:15

our environment that's necessary so it

05:18

just reduces the amount of guess work so

05:19

go ahead and click enter all right now

05:21

that that's done we're going to

05:23

highlight this right here to activate

05:25

the environment copy paste conda

05:27

activate swe-agent hit enter okay so now

05:31

it's activated we can see so right there

05:33

next we need to run the setup script so ./

05:37

setup.sh hit enter and this is going to

05:39

build the docker image so again between

05:42

Docker and conda kind of coming out of

05:44

the box with this project I really

05:47

appreciate how easy they're making this

05:49

it really reduces the headache of python

05:51

environment management package

05:52

management dependencies Etc so to the

05:56

authors of this project thank you thank

05:58

you thank you I hope more projects do

06:00

this all right well here's something

06:03

funny I know I said that it was going to

06:05

be a lot easier because it comes with

06:07

cond and Docker already ready to go but

06:10

I was wrong I can't get past this error

06:13

right here something with miniconda

06:15

something having to do with my Mac OS

06:18

being on Apple silicon I've tried a few

06:20

things and I don't know how to fix it so

06:22

if you do know how to fix this drop a

06:24

comment below but what I did is I

06:27

switched over to Lightning.ai now now

06:29

this video is not sponsored by

06:31

Lightning.ai but it just made it so much

06:33

easier it comes with Docker

06:35

pre-installed it comes with conda and I

06:37

simply followed the same steps and now

06:40

the docker image is created okay so now

06:42

it's done so all those previous steps

06:44

still work just follow those and now I'm

06:46

starting from here within lightning okay

06:49

then it says to create a keys file so

06:50

we're going to do that right click over

06:52

here new file keys.cfg hit enter then

06:57

we're going to paste in these four

06:58

environment variables so we have the

07:00

GitHub token which is required we have

07:02

the open AI key the anthropic key and

07:04

the together key all of which are

07:05

optional we're going to be using open AI

07:07

today so for the GitHub token so I got

07:12

my GitHub personal access token and

07:15

there's instructions for how to do it

07:16

just Google it it's a few steps but it's

07:19

pretty straightforward paste it in here

07:21

just like so then I grabbed my open AI

07:24

API key and pasted it in just like so

07:26

all right so we have both of those and

07:28

we should be ready to to go now okay so

07:30

I think I actually put this Keys file in

07:33

the wrong place so I'm going to go ahead

07:34

and drag it into swe agents so now it is

07:37

in the main part of the swe agent folder

07:40

just like that now apparently to get it

07:42

running we just have to run this command

07:44

so python run.py we provided a model

07:46

name we have the data path which is the

07:49

issue that we're looking to fix and the

07:52

config file which is already set up for

07:54

us so I'm going to copy this command I

07:57

paste it in the command right here just

07:58

so I can edit it easily and we're going

08:00

to try something very meta now there's

08:02

actually an issue on swe agent here it

08:05

is so run_replay.py misses record base

08:09

commit etc etc I'm going to give it an

08:12

issue from its own repository so I'm

08:14

going to come up here I'm going to copy

08:15

the issue switch back to lightning I'm

08:18

going to highlight this issue right here

08:20

switch it out for this swe agent issue

08:23

highlight the whole thing cut come down

08:25

here paste and then hit enter and let's

08:27

see if this works okay and environment

08:30

initialized loaded data set beginning

08:32

task trying to clone from non-mirror

08:35

now I really want to plug Groq into

08:37

this so it's super fast but for now

08:39

we're going to have to deal with GPT-4

08:41

speeds okay we got a warning that seems

08:44

okay initializing agent settings for

08:46

container I mean everything seems to be

08:49

working you're going to solve this issue

08:51

on your own your terminal session has

08:52

started and you're in the repository etc

08:55

etc thoughts the issue indicates that

08:57

there is a key error for for base commit

09:00

okay that is correct so it got the issue

09:02

well here's the code and remember it has

09:05

about a 12% chance of solving the issue

09:08

but the fact that it's getting this far

09:10

is just super impressive so now that

09:13

we've located the swe_env.py file let's

09:16

open it to inspect the code and

09:18

understand how it works now the file is

09:21

quite large 668 lines in total the

09:24

error message from the issue points to a

09:26

key error okay let's use the goto

09:28

command so that's the command that is

09:31

built into swe agents and we're moving

09:33

to line 300 so there it is okay we

09:36

scrolled to line 300 but have not yet

09:38

reached the reset method so now let's

09:41

try 500 so it's basically trying to

09:43

search for that code we've scrolled

09:44

further but we still have not reached it

09:46

let's keep scrolling we have reached the

09:48

end but not but have not found the reset

09:51

method so now it Scrolls back up to 400

09:53

it's just looking for it but if for some

09:55

reason it can't find it interesting so

09:57

it's basically scrolling up and down the

09:58

file just like any engineer would and

10:00

it's trying to find this Method All

10:03

right we have found the reset method

10:04

starting on line 108 yes good to fix

10:08

this we need to ensure that base commit

10:10

is included in self. record before this

10:12

line is executed one way to do this

10:13

would be to add a check Etc so I think

10:17

this is the updated code okay the edit

10:19

has been successfully applied but

10:21

there's a duplicate setting and it's

10:23

fixing it I mean this is so cool all

10:25

right o cost limit exceeded interesting

10:30

so there must be a setting somewhere

10:32

where you can actually limit the cost of

10:35

GPT-4 and so it already exceeded $2 so

10:38

fine thank you for stopping it there but

10:41

it seems like it was well on its way to

10:43

solving this issue and yep right there

10:45

we can actually set the cost limit so

10:48

that's really cool so it is set by

10:50

default to $2 we could set it at $10 if

10:53

we want and so on so very very cool I

10:56

want to power this with a local model

10:58

because then it won't cost anything so

11:01

there doesn't seem to be a very

11:03

straightforward way to use a local model

11:05

but I suspect that with a little bit of

11:08

effort we could get it to work and I bet

11:10

that they're going to allow it in future

11:12

versions since this has only been out

11:14

for a couple days so overall very cool

11:17

project now let me show you a full demo

11:20

so I'm about to show you a full demo end

11:22

to end by one of the authors solving

11:25

a GitHub issue and preparing a fix for

11:28

it hey my name is Carlos and today I'm

11:30

going to show you an example of SWE-

11:32

agent resolving an issue from GitHub so

11:34

we'll be looking at this issue from

11:35

SymPy which is an instance in SWE-bench

11:37

and we see that the user is reporting at

11:40

this problem where this Matrix operation

11:43

col_insert is producing some unexpected

11:45

output so it looks like a

11:46

straightforward issue we'll copy this

11:48

GitHub URL and send this over to the SWE-

11:51

agent um run script and once that's

11:54

going we can uh we we wait for about a

11:58

minute or two

11:59

but we can look at an example that ran a

12:01

bit earlier so here we have SWE-agent

12:05

trying to resolve this issue and it

12:06

starts off by reproducing the the bug

12:08

that's reported which is always a good

12:10

first step so it copies the code from

12:12

that issue into a new file called

12:15

reproduce bug and after running that we

12:17

see that we have the same results that

12:19

are uh reported in the issue with this

12:22

problem being present here at the bottom

12:25

so now that we've confirmed that the

12:27

issue is a problem is still a problem

12:30

we can search the uh search the the

12:33

repository for this col_insert function

12:36

to see where it might be defined and the

12:37

model thinks that it is defined in this

12:39

common.py file so we open this common.

12:43

py file in the file

12:44

editor and we can look at the different

12:47

functions that are present and we

12:49

identify the eval_col_insert as being a

12:51

particular function of Interest so we

12:53

scroll that into view down on line

12:56

81 and after analyzing the code a little

13:00

bit the model realizes that there's a

13:02

problem with the indexing for uh those

13:05

the values in this Matrix operation so

13:08

we generate an edit which is then

13:09

applied again to this function which can

13:11

be seen after here between lines 87

13:14

through 89 and we go back to our

13:17

reproduction code to run that again and

13:19

see how the output has changed and we

13:22

see here that the output is actually uh

13:24

represents the expected result so it

13:26

looks like uh the issue is resolved and

13:29

and we clean up our workspace by

13:30

removing that file and finally submit

13:33

what we think is the right solution so

13:34

that produces this diff that we can

13:37

evaluate with SWE-bench and after testing

13:39

on SWE-bench we find that this submission

13:42

passes the initial test and it doesn't

13:44

break any of the existing tests so we

13:46

can mark it resolved all right so that's it

13:49

this is incredible I'm so excited to see

13:52

all of the progress on AI coding helpers

13:54

if you liked this video please consider

13:56

giving a like And subscribe and I'll see

13:58

you in the next one
