OpenAI just announced Codex, their agentic coding product. The actual coding agent feels very unique. And not only that, they released a model that was post-trained to be even better than o3 at coding. So I’m going to go over it all. Let’s get into it.

So let me show you the interface first, because that’s the most unique part, in my opinion. It feels very ChatGPT, and it isn’t a VS Code plugin or a VS Code fork. This is native in the cloud. It feels much more akin to a Devin. So you connect your GitHub repo right here. You have the different branches right there. You can either ask questions of your codebase or assign it a task to code.

Now, the thing that I really like, which I think a lot of other coding agents don’t have yet, is multiple agents going out and coding in parallel. Now of course, that can be difficult if all of these agents are working on the same branch. There’s going to be potential conflicts, but that’s what Git is for. So each time you launch a task, it’s going to appear right here. You can see it says "starting container." Every single task that you launch will be its own container, will have its own environment, will have its own keys. And so you can really think of it as a truly isolated environment for every single task. And the cool thing is, every single task is essentially a fresh start with that environment. It downloads the code for the first time. It runs the setup commands for the first time.

And so once a task is complete, and you can see here "worked for 2 minutes and 58 seconds," everything is in the traditional ChatGPT chat interface. I really appreciate how they took a new view on this, and it’s not just coding. This is more like vibe coding than traditional coding. So the task is: "I want to keep this codebase maintainable and bug-free. Read through the code and propose tasks that would help me with this goal." So here’s one: avoid mutable default arguments. And it’s giving a little code segment that it’s fixing. Down at the bottom, you can see that you can request changes or
ask follow-up questions. Basically just vibe coding back and forth with your coding agent. And here is the suggested task right there. And you can see, all you have to do is hover over it. It tells you what it’s going to change. You click Code, and then it will actually make those changes. Here’s another one. And for this last one, if you click into actually edit the task, you can see here’s the task of what it’s going to do. You can add something to the end. You can edit it as much as you want, and then go ahead and just click Code.

All right. Now you’re going to see it done through the Codex CLI. So similarly, you just describe a task, give it a branch, and then get going. When you click into it, here is a console. It should feel very familiar, but of course you have the chat interface on the left side. By the way, let’s just pause for a second and appreciate how far OpenAI has come on design. I know everything is very simple with their design, but look at this little animated icon that runs as it’s running things for you. So you can see a little guy looking around, and it changes back to the kind of traditional code console icon. And so yeah, little touches like this I really appreciate.

So at the end of a long task like this, you could see it thought for 3 minutes 13 seconds. You get a summary. You get the code diffs on the right side. You get the tests that ran automatically. And if you hover over, you can see the test status along with any debug codes. Here’s all the files that were changed. And right up here in the top right, you can see a Push button, so you can push the code to GitHub.

All right. So, some initial thoughts. One, they rolled out their own custom model for this, built on top of o3. This model, called codex-1, used reinforcement learning end to end to make it especially good at coding tasks. And they call this out in the livestream: they didn’t focus on the benchmarks. They focused on real-world coding tasks. And one other thing that they said in the livestream is that Codex, the actual coding
agent product, doesn’t just wrap one of their pre-existing models in some scaffolding. They actually developed this model specifically for that coding environment. And that seems like a shot against things like Cursor and Windsurf.

But let’s talk about Windsurf for a second. It has been rumored, strongly rumored, nearly confirmed, that they bought Windsurf for $3 billion. And so just the same week, they’re releasing their own coding agent product. So it’s going to be interesting to see how this strategy unfolds. They kind of spoke negatively about that kind of coding framework, where you just have an agent wrapped around one of the existing models that they provide. And then just yesterday, Windsurf released their own model. So it’s really interesting. Windsurf is heading towards OpenAI, OpenAI is heading towards Windsurf, and then they got acquired by OpenAI. So we’ll see how this all plays out.

Let me pause for a second to show you another amazing AI product, the sponsor of today’s video, Recraft. Recraft is an incredible image generation and editing tool built for creators and teams. Recraft gives you control over the entire design process and is used by three million users and teams at companies like Netflix and ASA. And I’ve talked about Recraft before, but they are rolling out two brand new features, infinite style library and style mixing, both available to the public right now. The infinite style library allows you to browse through different visual styles that you can apply to your images easily. From photorealism to illustration, you can search by theme or object and apply them instantly. The second feature, style mixing, lets users blend multiple styles together just by adjusting their relative weight. A really cool and creative way to make your images unique. This allows for fully custom visuals while keeping the brand consistency intact. Try the new Recraft features today. They’re offering my viewers $11 off any plan. Use code Matthew 11. I’ll drop the link in the
description below. Thanks again to Recraft. Now back to the video.

All right. I want to play this segment where Greg Brockman describes his vision, and OpenAI’s vision, for the future of coding, and agents in particular. And remember, Greg Brockman is a co-founder of OpenAI. So listen to this, because it’s fascinating.

One of the things that I find really exciting about how Codex works is it has very nonhuman strengths and weaknesses. And so it really means that you get much more out of it if you start thinking of it as not just a static tool that you just use without having to build expertise. But if you really optimize your codebase around what it can do, and honestly, most of what Codex benefits from is just what is good software engineering practice, in terms of modular codebases with good tests and things like that, you’re able to just move so fast. And we’ve seen that happen internally with many people at OpenAI.

All right. So what he’s saying is essentially, when an engineer at production scale is working with AI and learns the AI’s weaknesses and strengths, and kind of works around that, designs the codebase around that, that’s when you’re going to get the most out of these coding agents. And that’s really interesting, and something I’ve been talking about for a while. I think codebase best practices are going to be changing. I think codebases are going to be changing, coding languages are going to be changing, because more and more code is going to be written by AI, and we need to optimize the language for AI to actually do so. And next he’s going to talk about a mini version of the model, which can be used in Codex CLI (that’s the command-line interface, for using it locally), and apparently sign-in with ChatGPT is coming. So let’s listen.

We also are continuing to develop Codex CLI, which again is a local agent that runs on your laptop. We’re releasing a mini model today, and we’re also going to be releasing
sign-in with ChatGPT to make it easier to get up and running.

Yeah, he kind of just glossed over that announcement. Sign-in with ChatGPT. That’s a huge announcement.

Now if you think about it, there’s two different form factors that we’ve talked about. There’s the local, synchronous, on-your-computer version, but there’s also what Codex is: asynchronous, in the cloud, running on its own computer. And we think that the future is going to be these two systems coming together.

And that is the Windsurf acquisition right there. So Windsurf is completely local. Codex is in the cloud. He’s kind of giving you the playbook: yeah, that’s why we acquired them. This is not a competing product. And I kind of agree. There’s a use case to use coding agents locally on your computer and have all the files stored locally. And there’s also a use case for doing it in the cloud. And whichever company brings together the best of both worlds is going to win.

All right. So Codex is rolling out, of course, to the top-tier users first: ChatGPT Pro, Enterprise, and Team users today, with support for Plus and Edu coming soon. And this is really interesting: task completion typically takes between 1 and 30 minutes, depending on complexity. And that is fascinating, because even with vibe coding, the longest task usually only takes a few minutes.

Now let’s look at some performance benchmarks. Here is SWE-bench Verified. This is accuracy over number of attempts, and codex-1 beats o3-high across the board until they just about converge at eight attempts. Here is OpenAI’s internal SWE tasks: o1-high down at 11%, and look at that, codex-1 at 75%, compared to o3-high at 70% and o4-mini-high at 67%. So they really did train the best coding model amongst their family of models. Now of course, I want to put it up against Gemini 2.5 Pro. Maybe we’ll do that.

So Codex Mini, the mini version of the Codex model, is available through the API, priced at $1.50 per million input tokens and $6 per million output tokens, with a 75% prompt caching discount. So
that’s it. Let me know what you think. How does it compare to Windsurf? How does it compare to Cursor, or Replit, or all of the other awesome vibe coding tools out there?
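
As a rough aid for the Codex Mini pricing quoted in the video, here is a minimal Python sketch of what a single request might cost. The three rates are the ones stated in the video. Everything else is an assumption made for illustration: the function name `estimate_cost`, the token counts, and in particular the reading of the 75% prompt caching discount as "cached input tokens are billed at 25% of the normal input rate." Actual billing rules may differ.

```python
# Rates as quoted in the video (USD per one million tokens).
INPUT_PRICE_PER_M = 1.50
OUTPUT_PRICE_PER_M = 6.00
# Assumption: the 75% discount applies only to cached input tokens.
CACHE_DISCOUNT = 0.75


def estimate_cost(input_tokens: int, output_tokens: int, cached_input_tokens: int = 0) -> float:
    """Estimate the USD cost of one request (hypothetical helper, not an official API)."""
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")

    uncached_tokens = input_tokens - cached_input_tokens
    cached_price_per_m = INPUT_PRICE_PER_M * (1 - CACHE_DISCOUNT)

    total = (
        uncached_tokens * INPUT_PRICE_PER_M
        + cached_input_tokens * cached_price_per_m
        + output_tokens * OUTPUT_PRICE_PER_M
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Made-up example: 200k input tokens (150k of them cached) and 20k output tokens.
    print(f"${estimate_cost(200_000, 20_000, cached_input_tokens=150_000):.4f}")
```

Under that reading of the discount, output tokens dominate the bill for long agentic tasks, while repeatedly sending the same codebase context gets much cheaper once it is cached.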