TRANSCRIPT. So OpenAI's biggest partner, Microsoft AI, have just revealed what they're going to be releasing in 2025, and it's truly fascinating, so take a look before I dive into all the key details: we have prototypes that we've been working on that have near-infinite memory, and so it just doesn't forget, which is truly transformative. I mean, you talk about inflection points; memory is clearly an inflection point, because it means that it's worth you investing the time. So that capability alone, which I expect to come online in 2025, is going to be truly transformative.

If you aren't familiar with who that was, that was Mustafa Suleyman. He's currently the head of Microsoft AI, and it's well noted that Microsoft AI work closely with OpenAI to produce products and services, so it's quite likely that we're going to be getting some form of OpenAI model that basically has infinite memory.

One of the things about context windows that actually managed to surpass my knowledge, even as someone who pays attention to the AI community, was this: Google Research published a paper that proposes a way to help language models handle much longer pieces of text without needing massive amounts of memory or slowing down too much. The paper is called Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention. Basically, it's a paper about infinite context windows, and essentially how they've done it is by changing what usual models do. Current models attend to all the parts of the text they're processing, but that becomes very difficult when the text is very long; imagine reading a book but trying to remember every single word you've read, eventually it becomes overwhelming. So they invented this new method, called Infini-attention, which (or a variation of which) is likely to be deployed widely next year, and it acts like a smart
notepad that summarizes what the model has read so far and keeps only the essential points. This allows the model to hold on to older information without storing every single detail, just like making a summary of a long story in order to remember the main points. Infini-attention blends immediate memory, which is what the model is working with right now, with this long-term summarized memory. It's as if the model can use both the short-term memory for quick details and the long-term memory for past important events. It's efficient, and really cool: the memory doesn't grow endlessly; it's kept within limits by updating with new information in a way that the old important stuff doesn't get forgotten; it's essentially stored more compactly. Overall, I think everyone understands why this is going to be so impactful.

Now, if we actually managed to get true infinite context windows, this would be absolutely incredible. With AI systems in 2025 that have an infinite context window and memory, we could maintain context of every previous interaction with a user forever. You could develop genuine long-term relationships with AI, with it remembering every single conversation and all shared context. It could track your personal growth and the evolution of your ideas over years. It could ingest and reason about entire libraries of human knowledge simultaneously, and it could maintain context of entire codebases, documentation, bug reports, and user feedback for massive systems. This would be particularly crazy, and if it does happen in 2025, I think it's going to be a real game changer for those in the AI space.

This is another area where Eric Schmidt also talks about long context windows, the implications of infinite context windows, and what they're going to enable in the future: the context window is the prompt that you ask, so, you know, study John F. Kennedy or something. But in fact that context window can have a million words, and this year people are inventing a context
window that is infinitely long. And this is very important, because it means that you can take the answer from the system, feed it back in, and ask it another question. So, I want a recipe; let's say I want a recipe to make a drug or something. You ask what's the first step, and it says buy these materials. So then you say, okay, I've bought these materials, now what's my next step, and it says buy a mixing pan, and then the next step is how long do I mix it for. You see, it's a recipe; that's called chain-of-thought reasoning, and it generalizes really well. We should be able, in five years for example, to produce thousand-step recipes to solve really important problems in science, in medicine, in materials science, in climate change.

Now here is another clip where we get to see Mustafa Suleyman talking about why memory is going to be so transformative next year: memory. We're going to nail memory; I'm really confident. 2025: memory is done. Permanent memory. I mean, if you think about it, we already have memory on the web; we retrieve from the web all the time, quite accurately now. Copilot has really good citations; it's up to date as of 15 minutes ago; it knows what's happened in the news, on the web, and so on. So we're just kind of compressing that to do it for your personal knowledge graph, and then you can add in your own documents, and your email and calendar and stuff like that. So memory is going to completely transform these experiences, because it's sort of frustrating to have a meaningful conversation, or go on an interesting exploration around some idea, and then come back three or four or five sessions later and it's like, let's start again; we've completely forgotten what we talked about. So I think that's going to be a big shift as well, because not only does it lower the barrier to entry to you expressing a creative idea, but those things don't get forgotten too. So you can do this ambiguous cross-reference back to
something that you... what was that thing I said, like, three weeks ago? And it's sort of like having a second brain, in that it's an extension of your mind.

Now, I was doing a bit more research with regards to what is going to happen in the years 2025 and beyond, and there are two main things that I think are going to happen that are pretty crazy. Number one is, of course, recursive self-improvement. Now, this is something that I don't know if I genuinely believe, but I don't want to doubt someone who's literally the head of Microsoft AI. Recursive self-improvement is basically where we get AIs that are completely self-improving, which means that a smart AI could make a smarter AI, which makes an even smarter AI, and so forth, and apparently this is going to happen before 2030: recursive self-improvement. It could edit its own code in order to get better; it could self-improve. Or it would have autonomy: it could act independently of your direct command, essentially, or you give it a very general command and it goes off and does all sorts of sub-actions that are super complicated, like maybe even invent a new product and create a website for it, and then set up a drop-ship for it, and then go and market it and take all the income and then do the accounts, and so on. I mean, I think that's kind of plausible in, say, three to five years; before 2030 I think we'll definitely have that, and it might well be much, much sooner.

I found that fascinating, because at the end there he says it could be much, much sooner. Now, I'm a little bit more skeptical than that, because that would basically mean we have AI systems that are improving increasingly rapidly. But then again, the pace of AI is incredible, and I do remember that recently we did have a major breakthrough with OpenAI's o1 series, which basically means that things are heating up again. Now, of course, with OpenAI's o1 series, there is something that I do want to tell you all about for 2025. This is going to
be something that is going to be the main theme, which is of course agents. But with agents, I think they're going to be released in a very specific way, because agents are really tricky. I want to show you guys this clip right here, because when I showed it to you around four months ago, a lot of people were quite confused, and in this video I'm going to explain, with a small snippet from a research paper, why agents are really, really tricky, and why it looks like it might just be a little bit longer before real agents, and by real agents I just mean agents that can perform actions over a long time frame, are going to be here.

In terms of the reliability, it's still pretty hard to get these models to follow instructions with subtlety and nuance over extended periods of time. I think that they can do it, and there are a lot of cherry-picked examples that are impressive on Twitter and so on, but to really get them to do it consistently in novel environments is pretty hard, and I think it's going to take not one but two orders of magnitude more computation for training the models; so not GPT-5, but more like GPT-6-scale models. So I think we're talking about two years before we have systems that can really do this.

Now, if we look at benchmarks for frontier agent-user interaction, this is the τ-bench benchmark, where they look at how agents perform in real-world domains, where agents are going to be used for real-world actions. What was interesting to me was that this is, I wouldn't say a reverse scaling law, but it's the kind of graph that you don't see when it comes to discussing AI capability. What you're looking at is a pretty high error rate. On the left you can see which models are currently being evaluated: frontier models like Claude 3.5 Sonnet, GPT-4o, and various Mistral models. Now, what you're seeing for pass^1, pass^2, pass^3, and pass^4 is basically
how many times a model gets something right in a row. So on the first try, the model gets it right 46% of the time; if the model has to get it right twice in a row, it succeeds 32% of the time; three times in a row, 26% of the time; and the performance just consistently degrades the more consecutive successes you require.

What this means in practice is that if we want to actually use these models in production for various tasks, we're going to have to get the reliability of these kinds of agents up to something like a 90% success rate, which means we're going to have to at least double or even triple this performance to get anywhere close to something that works. Because if we don't, every time individuals use these programs, they're going to have an increasing number of frustrating experiences; these kinds of AI systems right now, even with Claude 3.5 Sonnet, just don't seem to be reliable at all. Maybe there's going to be a new foundation model for agents; maybe there's going to be a new way to train them. But if the performance consistently degrades with the number of consecutive tries you require, it's just something that we cannot use in production, and reliability is something that we do need.

And of course, Dario Amodei actually spoke about this a few months ago, where he said it's probably going to take until around 2026 before we get real-world reliable agents that are autonomous and doing a lot of things. If you want an agent to act in the world, usually that acting requires you to engage in a series of actions, right? You talk to a chatbot, it only answers, and maybe there's a little follow-up. But with agents, you might need to take a bunch of actions, see what happens in the world or with a human, and then take more actions. So you need to do a long sequence of things, and for that long sequence of things to actually work, the error rate on each of the individual things has to be pretty low. Right? If I'm a robot, and I'm like, okay, I'm going to pick up this thing
and walk over there, and I'm going to pick up that, you know, I'm building a house or something, there are probably thousands of actions that go into that. So all of this is to say the models need to get more reliable, because the individual steps need to have very low error rates, and I think part of that will come from scale; we need another generation or two of scale before the agents will really work.

So let me know what you think about 2025 and infinite memory, because I think it's going to be a real game changer.
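The reliability arithmetic in those last two clips can be made concrete. Below is a minimal sketch: `task_success` shows Amodei's point that a long sequence of steps compounds per-step error (with independent steps, a task of n actions succeeds with probability p^n), and `pass_hat_k` is a simplified model of the pass^k consistency idea from the τ-bench discussion, where a task with single-try success rate p survives k consecutive tries with probability p^k. The numbers used are illustrative, not actual benchmark results.

```python
def task_success(p_step: float, n_steps: int) -> float:
    """Probability a task succeeds when every one of n independent steps
    must succeed, each with probability p_step."""
    return p_step ** n_steps

def pass_hat_k(single_try_rates, k: int) -> float:
    """Simplified pass^k: average over tasks of the probability of
    succeeding k independent tries in a row (p^k per task). Falls as k
    grows even when single-try performance looks respectable."""
    return sum(p ** k for p in single_try_rates) / len(single_try_rates)

if __name__ == "__main__":
    # A 99%-reliable step still fails often over a 100-step task...
    print(f"p=0.99,  n=100 -> {task_success(0.99, 100):.2f}")   # ~0.37
    # ...while 99.9% per step keeps the whole task around 90%.
    print(f"p=0.999, n=100 -> {task_success(0.999, 100):.2f}")  # ~0.90
    # Illustrative per-task success rates (NOT real tau-bench numbers):
    rates = [0.9, 0.6, 0.4, 0.2]
    for k in range(1, 5):
        print(f"pass^{k} = {pass_hat_k(rates, k):.2f}")
```

The takeaway matches the graph described above: consistency metrics decay geometrically with k, so small per-step reliability gains matter far more for agents than for single-shot chatbots.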
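And coming back to the Infini-attention paper from earlier: the "smart notepad" idea can be sketched in a few lines. This is a toy, single-head NumPy illustration of the mechanism as the paper describes it (ordinary softmax attention within a segment, a fixed-size compressive memory retrieved via linear attention, a gated blend of the two, then a memory update), not the actual implementation; the gate is a learned parameter in the real model and is just a constant here.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def elu_plus_one(x):
    # sigma(x) = ELU(x) + 1 keeps activations positive for linear attention
    return np.where(x > 0, x + 1.0, np.exp(x))

class InfiniAttention:
    """Toy single-head sketch: local softmax attention per segment, plus a
    fixed-size compressive memory carrying information across segments."""

    def __init__(self, d, gate=0.0):
        self.d = d
        self.M = np.zeros((d, d))   # compressive memory: size never grows
        self.z = np.zeros(d)        # normalization accumulator
        self.gate = gate            # scalar mixing gate (learned in the paper)

    def forward_segment(self, Q, K, V):
        # 1) ordinary softmax attention, but only over this segment
        local = softmax(Q @ K.T / np.sqrt(self.d)) @ V
        # 2) retrieve older context from memory via linear attention;
        #    cost is independent of how much history has been absorbed
        sq = elu_plus_one(Q)
        mem = (sq @ self.M) / (sq @ self.z + 1e-6)[:, None]
        # 3) gated blend of long-term (memory) and short-term (local) paths
        g = 1.0 / (1.0 + np.exp(-self.gate))
        out = g * mem + (1.0 - g) * local
        # 4) fold this segment's keys/values into memory for future segments
        sk = elu_plus_one(K)
        self.M += sk.T @ V
        self.z += sk.sum(axis=0)
        return out
```

The key property, and the reason the transcript calls it a notepad, is step 4: however many segments stream through, `M` stays a d-by-d matrix, so memory cost is constant while older context remains retrievable.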