An important contribution toward potential solutions to existential risk (x-risk) and a prosperous future of life for humanity and all living things on our planet, and beyond.


25 Oct 2024 · Future of Life Institute Podcast

Andrea Miotti joins the podcast to discuss "A Narrow Path" — a roadmap to safe, transformative AI. We talk about our current inability to precisely predict future AI capabilities, the dangers of self-improving and unbounded AI systems, how humanity might coordinate globally to ensure safe AI development, and what a mature science of intelligence would look like. Here's the document we discuss in the episode: https://www.narrowpath.co

Timestamps:
00:00 A Narrow Path
06:10 Can we predict future AI capabilities?
11:10 Risks from current AI development
17:56 The benefits of narrow AI
22:30 Against self-improving AI
28:00 Cybersecurity at AI companies
33:55 Unbounded AI
39:31 Global coordination on AI safety
49:43 Monitoring training runs
01:00:20 Benefits of cooperation
01:04:58 A science of intelligence
01:25:36 How you can help

A Narrow Path

Welcome to the Future of Life Institute Podcast. My name is Gus Docker, and I'm here with Andrea Miotti, who is the director and founder of Control AI, a nonprofit working to reduce catastrophic risks from artificial intelligence. Andrea, welcome to the podcast.

Thank you very much, thank you for having me.

Fantastic. You have a new report called A Narrow Path, which is about exactly this topic: how to prevent catastrophic risk. What is the narrow path?

What we lay out in this report is a plan for humanity to survive superintelligence and to thrive and flourish beyond that. There's one key threat that we deal with: the development of artificial superintelligence, AI that is smarter than humans across a general range of tasks, and in many cases could be smarter than all humans combined. For some people this could be far away, but increasingly there's a concern that it might be coming quite soon. There are definitely a handful of companies investing tens of billions of dollars, soon hundreds of billions, to make this happen. And the thing is, if we develop superintelligence right now, then, as many experts have warned us, including newly minted Nobel laureates, CEOs of the very companies developing this technology, and world leaders like the previous UK prime minister Rishi Sunak, we face an extinction risk. We need to chart a different path, and that is what we try to do here: to chart a path where humanity keeps the benefits of AI as a tool, a tool to empower humans, to provide economic growth and applications that help us, but not as a successor species, not as an entity more powerful than us that overpowers us.

I guess the recommendations in this report depend on when we think we will have something like AGI, or even more advanced systems like superintelligence. You give an estimate of within 3 to 15 years. What is the ground for that estimate?

The ground for that estimate is a combination of general estimates from people in the field. We've recently had Sam Altman, CEO of OpenAI, estimating something that is roughly three-plus years to superintelligence in a recent blog post. And 15 years is a timescale where, in case the current scaling paradigm doesn't work that well or slows down, there will still be a lot of algorithmic improvements and other advances giving us gains during that period. But ultimately I don't think the exact timeframe matters that much. We expect this technology might be developed relatively soon, especially soon on a government's timescale. We know governments are not that quick at many things; even a five-to-ten-year timeframe is a challenging deadline when you're facing extinction. The reality is that this is coming, we don't know exactly when, and predicting the future is hard, but we need to plan, and we need to plan now. If it comes in three years we're going to be in real trouble, and even if it comes in 15 years we're still going to be in real trouble unless we plan. The timing matters, but not in a deep sense. What matters most is that we have an unsolved problem of how to align and control these systems, how to understand these systems, and if that problem isn't solved before we develop the advanced systems, then we might be in trouble.

Maybe you could tell us how much we currently understand about AI systems, and how the understanding you would like to see before we develop superintelligence differs from the understanding we have?
As you said, a big issue with AI, especially these advanced, powerful AI systems, is that in many ways we understand them very little. We are approaching this without the scientific foundations that we did have for other large-scale transformative technologies. We have no science of intelligence that lets us make predictions: if you put in this amount of computing power and this amount of data, what will you get? And we don't even have a metrology, a measurement science, of intelligence. We don't even have a yardstick to say: clearly GPT-4 is as smart as two human beings, or as smart as ten mice. We do have a deep intuition as humans: we look at a rock and think that's probably around zero intelligence; a flower is clearly more than a rock, it can have some limited interaction with its environment; a mouse is more than a flower. We are seeing AI systems that can do very complex things, but are they, to borrow Yann LeCun's framing, at cat level, at dog level, at human level? Without a measurement science we're just making guesses, and unanchored guesses. And the reality is that intelligence is not magical. Intelligence is just the ability to accomplish goals; it's a physical phenomenon like any other phenomenon in our physical universe. We were in this situation with other things in science before. A clear measure of temperature wasn't developed until just a few centuries ago, and it took people doing empirical experimentation and drawing up scales to actually measure this clearly tangible phenomenon that we didn't have real measures for. Once we had those measures, we could make clear predictions, draw clear lines, understand different levels, and so on. We will need the same for intelligence: to be able to actually predict in advance, not just after deployment, or sometimes never, but before we build it, how powerful a system is going to be, what it's going to be capable of doing, and what it's not going to be capable of doing. These are the foundations needed to control such a powerful technology: to know in advance, to predict what it's going to do.

Can we predict future AI capabilities?
Regular listeners to this show will know about scaling laws, and these laws seem to be something that we can use to predict AI capabilities in advance. In fact, you just mentioned them as something where, when we scale up compute, we expect to see certain capabilities arise. Why are these scaling laws not enough for us to have a good understanding of how AI capabilities emerge?

Scaling laws are definitely helpful, but they're ultimately empirical observations; they're not fundamental laws grounded in theory like we would have in physics or other more mature disciplines. The proof is in the fact that people aware of scaling laws still cannot predict in advance what their models will be able to do. They can roughly see that, given one more order of magnitude of computing power, you will get a more powerful system, but they're not able to predict exactly that at this level the AI is going to be able to code at this level of human performance, or at this level the AI will start to persuade humans. There are no such precise predictions being made, and these are the kinds of things we could get if we had a deeper theory of intelligence and an actual measurement of intelligence. Compute is a very good proxy, kind of the best we have; given that we don't have the actual measure, the actual yardstick, for intelligence, we use compute as one of the main proxies to approximate it. But it's still a proxy, and we can't always rely on it. There are many ways systems can improve even while keeping compute constant, things like algorithmic improvements: just switching to the Chinchilla scaling laws compared to the previous scaling laws led to massive improvements in performance given the same amount of computing power. Or things like OpenAI's recent model o1, which uses a lot more inference compute rather than training compute to essentially post-process its outputs and make them better; that's not really well captured by a training-compute proxy. So it's a very useful tool, but we need to go beyond it; we need a defense-in-depth approach to AI policy that doesn't just rely on compute thresholds.
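To make the "empirical observation, not fundamental law" point concrete, here is a minimal sketch of how a scaling law of the kind discussed above is typically fit: a power law relating training compute to held-out loss, extrapolated one order of magnitude beyond the data. The data points and constants are invented for illustration; nothing here comes from A Narrow Path.

```python
import numpy as np

# Hypothetical (training FLOPs, held-out loss) pairs from past runs.
# Purely illustrative numbers, not real measurements.
compute = np.array([1e20, 1e21, 1e22, 1e23, 1e24])
loss = np.array([3.20, 2.75, 2.40, 2.12, 1.90])

# Fit a power law loss ≈ a * C^(-b) by linear regression in log-log space.
slope, intercept = np.polyfit(np.log10(compute), np.log10(loss), 1)
a, b = 10 ** intercept, -slope

# Extrapolate one order of magnitude past the largest observed run.
next_compute = 1e25
predicted_loss = a * next_compute ** (-b)
print(f"loss ≈ {a:.2f} * C^(-{b:.4f})")
print(f"predicted loss at {next_compute:.0e} FLOPs: {predicted_loss:.2f}")
```

The catch, as Miotti notes, is that even a good fit like this predicts the loss curve, not which capabilities appear at a given loss; that is exactly the gap a real measurement science of intelligence would have to close.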
And that is actually what you lay out in A Narrow Path. Before we get too deep into this interview, I would love for you to lay out the three phases of A Narrow Path: what you would like the world to do in each phase.

Absolutely. As we covered so far, we focus on one specific threat model: superintelligence, AI smarter than any human, or smarter than most humans combined. This is the explicit goal of some companies, it's the explicit goal of people developing this technology, and if we don't put safeguards in place, the default outcome of that development is human extinction. So what do we do about that? We chart a path of three phases: phase zero, phase one, phase two. Faced with this threat, the first thing we need to do is stop dying, stop losing. If we develop this right now, it's game over. Until we have the safeguards in place, until we know how to control it, we just cannot afford to develop superintelligence. Everybody loses if we do; it's not going to be the victory of one country over another, or one company over another. Humanity loses, and the only winner is the superintelligence. So in that phase the goal is safety: preventing the development of superintelligence for at least 20 years. Then, beyond that, if we do succeed at building our defenses and building beneficial AI tools while preventing superintelligence, we will face another issue, which is international competition, inter-company competition, rogue actors trying to develop this technology. We need an international system that is stable, so that if we get these measures implemented they don't just collapse under competition and pressure. This is why the goal of phase one is stability: building an international AI oversight system that is stable and beneficial for all participants. And finally, once we are safe, once we're not dead and not going to die anytime soon, and we have a stable international system, a way to prevent rogue actors and a way to keep major AI players together rather than escalating into an arms race on AI, then we can look into building a flourishing future. The goal of phase two is flourishing: develop transformative AI, not superintelligence, but transformative AI as a tool to benefit humanity and to achieve all of the economic growth and scientific potential that we want from AI and can get from AI if we get all of these things right.

Risks from current AI development

This is an alternative plan compared to the plan of the leaders of the AI corporations. Why isn't what they're doing going to work? For example, why won't reinforcement learning from human feedback, which is the currently dominant technique, be enough to keep us safe and to keep AI models aligned to our values? When I use ChatGPT, it seems relatively helpful and aligned to me. Why won't that just continue to be the case?

In two parts. Specifically about reinforcement learning from human feedback: that's a plan that even the companies themselves do not expect will scale to superintelligence; they're quite upfront about it. I'm also pretty happy with the answers I get when I use ChatGPT, but what these companies are building is not more chatbots. If we only had chatbots we would have some things to worry about, but not at this scale. The reality is that what the development is going towards, and what is already being built, is generalist agents: AIs that take actions in the real world autonomously, on their own, already connected to tools, connected to the internet, and so on. And what RLHF ultimately is, is essentially training the model with a stick and some reward. It's like having an animal that you don't really understand: whenever it does something nice you give it a candy, and whenever it does something you don't like you hit it with a stick. Not very nice, but that's how RLHF works. It doesn't tell you much about how the system internally represents the information it's learning, it doesn't tell you whether the system will just go and do something else when it's in a different environment in the real world that you couldn't predict, and ultimately it doesn't even address a lot of the fundamental issues that come with scaling to superintelligence, things like using AIs to improve other AIs, or having these AIs act autonomously in the real world, which will lead to the AI itself changing, to the next generation of AI changing. It's a very limited way to deal with that. But at a bigger-picture level, I'm not sure I've even seen a plan from the companies.
like rhf but I don’t think the companies are quite public at least with their plans if they have them at all the you know the plan seems to be let’s continue building super intelligence and let’s see what happens and you know we will dedicate some of our budget to doing some Safety Research but deployment comes first and you know release comes first anyways you know even then even like even if one of them figures out the way to keep the systems controlled how are they going to make sure that the other companies don’t just deploy the dangerous system right like that’s not not a single company issue to solve it’s not even a single country issue to solve you know maybe you can get all the companies in one country to follow the solution to this problem that has been found no not found yet but found in the future what about another country where they’re just developing the same thing how do you make them do that how do you make sure and this is why we need a plan that just that goes also just beyond the technical solution technical solution is part of the problem and we need it urgently especially on this tight deadline but there are a lot of governance and policy questions at the national and Global level that are just you know left undone and we need to do them fast we do seem to be in this tragic situation where there is say an AGI Corporation developing more and more advanced systems and in response and another Corporation arises where their claim is we will do this more responsible we want we want raise ahead and then you you suddenly get this proliferation of companies competing with each other where each is is created in response to the to the other ones and not acting responsibly yeah it’s quite ironic it’s a big irony it seems yeah you you even have the ones that start having you know safe in the name to and the come to tell you well if it safe in the name it’s it’s going to be fine right yeah yeah you’re referring to Il ver safe super intelligence Incorporated yeah exactly all right so phase zero is about safety and there the plan is to prevent the development of super intelligence for 20 years where does that number come from why why 20 years why not five years why not 50 years this number is essentially a forcing function it’s a way like it’s very very often in policy you know one ends up going or like you know vague ideas or ideas that are difficult to operationalize like put first of all putting a number forces you to check and this is why we put a number to check okay if I actually assess all of these measures and they all get you know are all in place do I predict do I expect this actually stops it for 20 years or not it lets you do the exercise more like a you know solving a mathematical proof exercise rather than having a general feeling of whether things are going to help on the margin or not then you know why 20 specifically it takes a long time for government to do things so some people might think that 20 is too little I definitely know a few some people might think that 20 is too much I think generally people that know how how long it takes to build institutions and build New Foundations for science I think generally are on the on the side of it might even be too little but if you might you can do can kind of speedrun the whole process and set up new international institutions very robust safety policies at a national level in all major players achieve Global coordination on this and build the foundations for you know scientific foundations for building safe 
But I think we should account, number one, for the fact that these things are hard and take time, and we cannot afford to miss the deadline; and second, for the planning fallacy: it takes a long time and it's quite complicated to get agreement on a lot of things. It's possible, we have done it before, but it's complicated, and I think 20 years gives us a good window. Also, this doesn't mean stopping other AI development. In these 20 years we will have massive transformation. A society with no superintelligence is still a society where, at current rates, the vast majority of labor will be automated; it's a society where we will have a completely different education environment and a completely different work environment in the next five to ten years, even without superintelligence.

The benefits of narrow AI

That's a very important point, and I think it's worth stressing: what Control AI, your organization, is against is not all AI development, it is this development specifically of superintelligence. So this is not about shutting down all AI development; it is about this specific issue.

Exactly.

And it seems, at least in principle, that we should not develop superintelligence for as long as it takes to solve the problems we need to solve: the problems of understanding these systems and aligning these systems. But I can see the point in having these 20 years, where you're not developing such a system, as something for people to coalesce around, and I take it that this is the reason for 20 years specifically.

Yes, exactly.

The first objection that comes up when you propose something like this is that setting up international institutions that will control what everyone on Earth can do, specifically that they can't develop superintelligence, is a level of authoritarian control that we simply can't accept. And so, weighing that risk, the risk of authoritarian control over technological development, against the risk of superintelligence, some people might come down on the side of: we need to prevent this control from being implemented.

Well, I think sometimes people get a bit ideological on these issues, also because they forget that we live in a world where you cannot just build a nuke in your backyard, and for good reason, and that has nothing to do with international totalitarianism or anything similar. You cannot build biological weapons, you cannot build a nuke in your backyard, and we're very happy that you cannot do that, because otherwise we would live in a very unstable world. The world is made of trade-offs. I'm not going to lie and say there are no trade-offs: obviously there is a trade-off between speed and safety, sometimes there is a trade-off between security and proliferation, and we need to make mature decisions, as countries, as individuals, and as civilizations, to take these trade-offs in a way that maximizes our collective gain, our collective benefit. The reality is that if we just rush ahead right now and anybody on the planet can develop superintelligence, we will die, or we're quite likely to die. I believe we will die; some people are skeptical. But the fundamental reality is that if we have the system set up like that, it's utterly unstable: it's not going to be possible to prevent all the risks that come from it, and so we need to find a different path.
unstable it’s not going going to be possible to prevent all risk that come from it and so we need to find a different path we we have we have found well with the nuclear weapons again we it’s thanks to the efforts that countries like the US and you know scientists on both sides of the Iron Curtain did that we have a stable nuclear International System it’s it has a lot of flaws right but we’re still alive today you know if we there could be a different word where there was a massive nuclear war after World War II and you know me and you and everybody else listening wouldn’t wouldn’t be around here today to to talk about this stuff so we did we did manage to do it with other extinction level Technologies we also as a civilization if you know if everything goes well and we succeed we will keep building bigger and bigger Technologies that’s great but also that comes with more and more responsibility the bigger the technology you build the more the downside risks the more the you know blast radius of it is going to increase And if every time we build a technology we just distribute it everywhere with no safeguards to everybody all the time this is a you know a recipe for a civilization that doesn’t doesn’t make it at some point you just need one or two or very few actors to screw it up for everybody else and you know if we are to become a great civilization even greater than where we are right now and explore the stars and you know live healthy and long lives you know much at much larger scale than we do right now we need to Grapple with this fundamental questions we need to be able to deal with technologies that affect everybody on the planet right now it’s it’s difficult for me to imagine a technology that could be as transformative as super intelligence but I can imagine a period say after we develop super intelligence where we suddenly are faced with a bunch of new technologies and some of these Technologies could be destructive so it’s it seems like a very good idea to have set up a system for for handling such Technologies at an international level you sketch out what the conditions for achieving safety would look like and Against self-improving AI I I would like us to to go through each of them because these are these are actions you could take while developing AI that will be especially dangerous and the first one of these is that we do not want AIS to be improving AIS do you think this is to some extent already happening and how do we prevent it from advancing further yeah so this one is a crucial one kind of to give the the full picture the reason for this one is is that whatever safeguards you put in place whatever you know limits you put in place on AI development if AIS can improve themselves or can be used to improve AIS it’s very easy for them to now break out of this limits of of this bounds that you put in place and that’s going to nullify any defenses that you’ve you’ve put in place so like it’s a we put it as a necessary condition to have a safety regime that works because this just trivializes kind of any defense you have in place if you let it happen some amount of this is happening now in in our policy we focus we worked a lot to try to find a measure that is you know really Surgical and affect the most dangerous forms of AIS improving AIS while also leaving a lot of you know beneficial normal software untouched so some sometimes you need to make trade-offs but when you can find a precise way it’s even better so for doing this we we coined the idea of found systems so 
there’s a sometimes in in in common parland we use AI to talk about a lot of different things but there’s a kind of two kinds of AI that are intuitively quite different there’s normal software that humans like me or you write down and then there’s this in some ways quite strange form of AIS which are a lot of things like chpt or you know mod large language models that are not really designed or are written by humans but they’re kind of grown they’re they’re they’re found via mathematical optimization this is why we call them found systems and this you know is one of the reasons why we are so inscrutable for us in many reasons it’s because we don’t just design them we don’t don’t just draw them out and a plan for them or we just find them via mathematical optimization and also these are the ones that are most concerning and most most dangerous because we don’t have good ways to understand them and and bound them to put like understand in advance what’s the what’s the limits or what they will be able to do and you know what are the what’s going on inside them and how can we know limit in which directions they they grow and so for no a improving eyes we focus on no found systems so no AI grown AIS improving other grown AIS why if we have ai improving AI at machine speed there is you know very little way for humans to actually keep up with what’s going on they’re much faster us we don’t even understand them right now imagine you know in an in an intelligence explosion where they keep improving themselves or improving other AIS and that’s you know a recipe for a disaster and for having an uncontrolled intelligence explosion that we just cannot follow while we’re happy with you know we want to leave a lot of normal applications like humans improving AI are like humans using handwritten code to optimize other handwritten code with the way that we we spell out the policy you know we find a way to to leave that untouched so what would be excluded here would be for example a researcher at open using a language model to help him generate code and then using that code to improve their systems but would you also exclude say Nvidia using AI for chip design uh in this case yes we don’t cover using AI for chip design for the case of the opening employe that’s quite interesting so no obviously laws are lws are an exercise in trying to you know bring Precision but you can go infinitely deep with legal definitions that there’s always going to be edge cases so the kind of the the clear case that we want to prevent is having you know gp4 considerably help with the development of GPT 5 that’s not what we want to see happen what we are clearly not covering is is things like using auter AI to record this meeting and transcribe it and then review it or you know to record an AI develop a meeting and trans describe it and review it of course the reality is that things are always on a spectrum right now we already see companies You know despite like I think s Alman said at some point that recursive self-improvement is something is so dangerous open eye would never do it yet in practice we do see that in companies right now like anthropic and like opening eye they they do say that they use their own current systems to speed up their machine learning Engineers for the development of the future systems and again this is a spectrum it’s always difficult to draw to draw a line obviously the more this is done and the more this is delegated to the machine that we don’t understand the more this is dangerous and so no we need 
So we need a policy that says: full recursive self-improvement by machines is not allowed, you should have protocols in place to prevent it, and you should teach your employees that this is something very dangerous that you shouldn't do. And then, for the edge cases, we have courts and we have discussions to work out exactly how they apply.

Yeah, that makes sense.

Cybersecurity at AI companies

Another development that could be dangerous is if you allow systems to break out of their environments. Maybe you could describe how this could happen. And isn't this a case where this isn't even in the interest of the AGI corporations themselves, so perhaps there's some room for agreement here?

This is another one of these conditions that is really foundational, because, similarly to AIs improving AIs, whatever limits you put in place to constrain the power and danger of AI systems, if the AI is capable of just breaking out of its virtual or physical environment, it's going to nullify all of your countermeasures. If the AI can just open the door of its room, leave, and do something else, then your room is useless; if your prisoner can just open the jail cell, the jail cell is useless. Ironically, while I was developing this policy I thought this could happen quite soon, but I did not expect that in September something like it would essentially be reported in the o1 model card. During OpenAI's testing of o1, the model did something similar to this. There's contention about whether it was exactly breaking out or not, and there's always going to be contention, which is why we need clear rules set by third parties, and third-party inspectors to decide and adjudicate, not just the companies themselves. But in this case the model was given a computer security challenge, a capture-the-flag challenge, to solve. The challenge environment was broken in a certain way, and the model essentially decided: well, I still want to solve it. So it found a way to gain root access to the virtual machine where the challenge was running, start a new environment, and then solve the challenge in that new environment. It broke out of one level of boundary, not all of them; it didn't leave the server, but it broke out of one level of boundary that was set up for it, to solve a challenge that was otherwise impossible. This is very concerning: if you have a system that is this capable and you put safeguards in place, the system can just find a way to break out of them, leave, and nullify all of your safeguards. And indeed, it shouldn't be in the interest of the companies to let this happen, but again there are trade-offs: the more raw power and raw access you give to your AI systems, the more in the short term you might feel you have an economic incentive, because it just does whatever it thinks is best, it has access to everything, it can solve all of your problems. The other side of that is that with completely free access and no restrictions, the risks are really severe and you nullify all of the other safety measures. So we need a common-sense approach: no AIs that can break out of their environment, and build their environments in a way that doesn't make this possible.
And the burden of proof should be on the company to show that they're not developing these types of models intentionally; if they do find this ability, they find a way to remove or constrain it, and if they cannot, then the model shouldn't be around.

The case you mentioned from the o1 model card is like a miniature version of what we might expect with more powerful models. You can imagine something like a hedge fund deploying a model that's trading, giving it a bunch of access to their systems, and the model beginning to do something that is not within the bounds of what they expected, trading more money, or breaking laws, or something like that. That's a much larger scale, and it doesn't seem out of the question to me. And I also think that both professionals and the broad public will begin to give these systems more and more access. It will probably be tempting to give these systems access to your email, to your calendar. Even in those fairly limited circumstances, with that access, what could a model be capable of doing? Could a model buy more compute power for itself on Amazon, or something like that? But do you worry less about this, because again it might be in the interest of the companies themselves to prevent it? I mean, this isn't in the interest of me as a consumer, for example, and even the people who run the hedge fund would face legal trouble because their model escaped from the boundaries it was believed to be operating within.

Well, I think this is precisely why we need to have these as rules, not just as general incentives. In theory everybody has an incentive not to go extinct from AI, and this risk is even recognized by the companies developing some of the most powerful models, yet this is still happening. So we can't just rely on incentives. We could also make incentives stronger: there is a strong case for liability, because liability aligns the incentives of the developer with preventing these kinds of things from happening in the first place, otherwise they face penalties. But the approach here is to say: here are some conditions that we actually need to enforce. We can't just rely on the goodwill of companies. Even if all companies are well intentioned, competitive pressures, rogue actors, and accidents will happen unless there are protocols, rules, and ways in place to mitigate this. We need to make it actually compulsory, and, as has happened with many other industries, this will quickly improve the situation and make sure that these things actually don't happen.

Unbounded AI

What is an unbounded AI, and why is that dangerous?

Another one of our conditions is no unbounded AI, and I know the name of this one is a little bit arcane, but I will try to make it clear. The idea is that in almost any high-risk engineering sector, we actually know in advance, before even building a system, think of a nuclear power plant, or a plane that needs to fly and not kill a bunch of people, how it will behave: the developers and builders sketch out a blueprint with a lot of assumptions and calculations on exactly how the thing will fail, what the safety margins are, and what amount of pressure it can withstand.
In the case of a bridge, you need to know in advance, and you can know in advance if you do your calculations, how many cars it can withstand before it collapses. In many ways we expect this of everything we deal with, but AI is very strange: it's an industry that has developed without these basic, normal, common-sense engineering practices, where developers don't even know before finishing the training run what their AIs can do, and in many ways don't even design them. We've talked about how they're grown rather than designed, and capabilities are discovered long after the fact; even after the model has been tested and evaluated for a week, maybe one year later you still discover new capabilities and new ways to do things. That's an essentially untenable way to deal with safety. If this were the approach to safety for bridges or nuclear power plants, bridges wouldn't stand and power plants would explode all the time. What we need instead is AI where, before deployment, a developer can say: given these conditions I have in mind, I expect this system will be able to do this, but not that. It's quite easy for some systems. For example, if you have a CNN, a convolutional neural network, to scan for cancer, it's quite easy to say as a developer: look, this model is only trained on cancer pictures, only trained on images, and it can only output text; I'm pretty confident it's not going to be able to hack other systems, it's not going to be able to generate or execute code, I'm not giving it access to anything that can execute code, and it's not going to be able to self-improve. Or let's go for a bigger system like AlphaFold. That's a very powerful system, a marvel of science and engineering, but it can only encode proteins. You can be quite confident, maybe not 100% certain, that's always hard, but you can give a very confident estimate that this system is bounded to the protein-folding domain: it's going to output protein structures, it's not going to output images, it's not going to output other things, especially if we don't give it access to tools that could let it do that.

So another way of stating this difference might be to say that unbounded systems are general systems, as opposed to narrow systems like AlphaFold, where you get some form of safety built in, because yes, AlphaFold is superhuman in the domain of protein folding, but it's not superhuman in other domains.

I actually do think that it is possible to bound general systems; it just hasn't been done. In principle it's definitely possible, maybe not in the full limit; again, with full superintelligence, actual godlike AI, how are you going to predict its bounds? But with better understanding, with better interpretability, or with new ways to design and train fully general systems, you could still give certain bounds.
In practice, what we ask for is that for any capability of concern, any capability that is considered dangerous or is illegal in a given jurisdiction, a classic example being that increasingly most jurisdictions will be concerned about AI enabling the development of biological weapons, or assisting with classified information about nuclear weapons, if you as a developer have a way to demonstrate that your fully general system will not generate this information, that's great. We don't need to be prescriptive; companies and developers should find ways to innovate to meet this, and we can leave it to them. If you have an interpretability technique that lets you do this, great. Or a simple way could be: look, I can guarantee my model has not been trained on any chemical data, there's just no way it can talk about chemistry, it's only going to talk about other things, and there's nothing in the training data it could derive this from just by learning about the English language, and so on. That's enough proof. But we do need to start having these things, because otherwise, if we don't bound our systems, we also cannot bound the ways in which they can fail or act, and again that breaks all of the other assumptions we can make about them, because we don't even know the full extent of what they will be capable of doing.
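As a toy illustration of what "bounded by construction" can look like in software (my sketch, not anything specified in A Narrow Path): a narrow classifier wrapper whose only possible outputs are a fixed set of labels, with no free-form text, no tool access, and no code-execution path. The label set and the stub model are hypothetical placeholders.

```python
from dataclasses import dataclass

# Hypothetical label set for a narrow medical-imaging classifier.
LABELS = ("no_finding", "benign", "malignant")

@dataclass(frozen=True)
class BoundedResult:
    """The only thing this system can ever emit: one label and a confidence."""
    label: str
    confidence: float

class BoundedClassifier:
    """A narrow, bounded interface: image bytes in, one of three labels out.

    There is deliberately no free-form text output, no tool access, no network
    access, and no way to execute generated code, so the failure modes are
    bounded by construction.
    """

    def __init__(self, model):
        self._model = model  # assumed to return one probability per label

    def classify(self, image_bytes: bytes) -> BoundedResult:
        probs = self._model(image_bytes)          # e.g. [0.1, 0.7, 0.2]
        best = max(range(len(LABELS)), key=lambda i: probs[i])
        return BoundedResult(label=LABELS[best], confidence=float(probs[best]))

# Usage with a stub standing in for a real CNN:
clf = BoundedClassifier(lambda img: [0.1, 0.7, 0.2])
print(clf.classify(b"...image bytes..."))  # BoundedResult(label='benign', confidence=0.7)
```

Contrast this with a general agent wired to a browser, a code interpreter, and email: its output space, and therefore its failure space, is effectively unbounded, which is the situation the condition above is meant to rule out for dangerous capabilities.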
Global coordination on AI safety

So I think here we arrive at phase one, which is about stability, and this is about how to implement a system that actually upholds the prohibitions on systems that are dangerous and that we would like to avoid. You have a number of conditions for stability as well, and perhaps we could walk through those in a similar fashion. The first one is non-proliferation, and I take it that this is about not wanting AI systems spreading across the globe in an unregulated way. How would you explain non-proliferation?

Absolutely. Non-proliferation is a common idea in arms control and with other powerful technologies like nuclear. Ultimately, if you have a technology that can be dangerous, then the more people and the more countries have access to it, the more something can go wrong; you're multiplying the downside risk by the number of actors that can use it. So you need to find a way for the fundamentally dangerous systems not to proliferate: the more there are, the harder it is to govern them and to monitor them. And if they proliferate, it makes it really difficult for agreements to stay in place, because countries start to feel: well, I've finally decided to implement safety measures and limit the development of AI in a certain way, but all of these other countries are rushing ahead, getting access to it, and developing beyond these safeguards, so why am I limiting myself? Competitive pressure builds up over time and leads to skirting the safety measures rather than following them.

Would non-proliferation set bounds on open-sourcing AI?

Well, I think the open-sourcing question is quite an interesting one, first of all because it's very often a misused term. There are very few models, especially among the most powerful, that actually have an open-source license or actually follow the principles of open source. They generally just release the weights, so I usually like to call it open weights, which is not the same as having the source code or being able to replicate it at home; good luck finding all the compute that Meta has, and so on. But ultimately, the reality is that for sufficiently powerful systems, we cannot have them spread across the planet and ungoverned. If you accept the premise that some systems will lead to catastrophic risks, and you have eight billion people on the planet, and those eight billion people include terrorists, sociopaths, people who just want to hurt other people, and rogue states, then the moment they have access to those systems you need a way to remove that access, otherwise you face a threat. So at a certain level of capability and a certain level of danger, it is untenable to just have proliferation. One sad thing is that I actually do understand the position of many people who, in good faith, are concerned about overreach of international measures and about global surveillance. But the sad thing is that the more we proliferate powerful AI, the more the only way to have a stable future system is with more and more surveillance. If you have nuke-level technology in every house, the world either has much more surveillance than we have right now, or we die in a nuclear war. So we should rather seek not to proliferate now, and have measures on AI roughly as strong as what we have on nukes, measures that don't invade our privacy and our lives. Had we let nukes proliferate, the only way out would have been much more invasive measures. Those might still be worth it, because not dying is a pretty nice deal, but I would rather have a future where we keep a lot of our privacy, dangerous technology is not proliferated, and we can guard it in a few areas.

Which type of international structure would we need in order to prevent proliferation?

The structure that we propose is quite similar to what we have in place for nuclear, because there are actually a lot of similarities between AI and nuclear, some differences, but a lot of similarities. A few of them: they are both extremely powerful technologies that are inherently dual use. You can use AI in a lot of narrow, civilian, beneficial applications; you can also use AI to create technology as powerful as nukes, if not more so, just as with nuclear weapons versus civilian nuclear power. And AI, despite being software, is still quite reliant on a physical resource, a bit like uranium for nukes; for AI it's compute. There are various inputs to AI, but the most important one, and definitely the most governable one, is compute, computing power. This is not just digital: GPUs are physical, big machines, big supercomputers that sit in large-scale data centers. They're fairly easy to monitor, they're physical, large, and expensive, and there is a very thin supply chain that produces them globally, so they're quite easy to track. This maps quite closely to uranium and plutonium for nukes.
they’re just sitting there and the the way they can go wrong is by human me use or accident with AI especially with AI that many companies try to build we are dealing with agents we’re dealing with essentially entities that we should model that are much easier to model as essentially adversaries or like a fleet of human operators but on a computer or you know a fleet of analysts but on a computer rather than just a kind of a static weapon you know they take actions on their own they buil to take action on their own and they can model you like an adversary would you know so they’re much closer to an adversary of a foreign Force rather than just a weapon this is a key difference but yeah in terms of international structure uh what we propose is very similar to what exists for nuclear so we have we propose an international agency that is akin to the IAA so the international Atomic agency that would at the same time have monitoring and verification mechanisms to kind of monitor the the stock pule compute monitor that the safety conditions are being enforced across countries make sure the countries have licensing regime to themselves monitor these safety conditions and let countries also monitor each other you know always trust but but veryify it’s very important that each other are not violating these commitments and I I think this is absolutely possible as we’ve done it with nukes wouldn’t this run into kind of the classical problem of international law which is that we don’t have strong enforcement mechanisms and it’s difficult for for countries to agree so how for example would would the US and China agree on on these kind of international structures yeah so it’s worth noting that the Ia does exist and it works pretty well so like we have succeeded in some areas in some pretty dicey areas it’s not easy you know this is why this is a narrow path think the reality is that there are a lot of Puffs in front of us this is also has been what Hinton this week said in his Nobel Prize response that we are at a beercation point in history where we are facing this extinction level threat coming up very soon in the next few years we will need to figure out whether we have plans to deal with it or not so most of the paths are going to lead to you know paths where we don’t try to cooperate or we try to cooperate and fail or something goes wrong they’re going to lead to a bad outcome but we need to tr try to go for the good outcome and we have achieved it before again you know the Ia exists for nuclear weapons the biological weapons convention has prohibited biological weapons worldwide and is you know sometimes there are defections but you know all in all we have a very limited number of biological weapons being actually used human cloning was a a case where it wasn’t know very strong and immediate ban on a technology that would have been very very economically and militarily and strategically VI valuable for countries and this has been you know adopted in the US in China in Russia and so on so we we do need to try and the important thing to understand and this is why I think some other approaches proposed in this field are either naive or disingenuous is that the alternative to cooperation is war and war is nuclear war you know we are in a world with powerful you know nuclear States and we know we either find a way to cooperate and to prevent the development of super intelligence together or the alternative is that you know one country we need to force the others to prevent the development of super 
And forcing a nuclear power is quite tricky, so we should test the cooperation route, always trust but verify, or sometimes not even trust, just verify, before going for the war route. I do not believe there is a route where one player simply takes over all of the others and yet there is no conflict; trying to take over other countries leads to conflict, that's the definition of takeover, that's how international geopolitics works.

Monitoring training runs

How do you trust but verify? Specifically, how do you verify? Do we have interesting technical measures or solutions for seeing which training runs are underway in different countries, and so on?

That's a very good question. The good news is that we increasingly have more of those technical solutions. Before getting to them, I do want to stress that the first step is having the processes and institutions in place. We shouldn't feel bottlenecked by the technology. The technology helps, but the foundation for using it and deploying these better and better monitoring technologies is actual political agreement and the decision that the policy is worth pursuing. Generally, despite everything, if major governments, especially governments like the US, decide to do something, it gets done. The main thing is actually deciding to do it, and then we will find a way, from old-timey in-person monitoring by inspectors arriving with a suit and a bag, to very sophisticated new technology. The good news is that we also have more and more sophisticated new technologies. There are a few approaches. Some are more legacy but can still be used: data centers are large, they consume a lot of energy, and they produce a variety of signatures that are verifiable. We also propose in our plan a specific approach of limiting the total FLOPs, the total amount of computing power, per data center. It's a pretty high limit, but it is there to ensure that if somebody wants to illegally break out of the limits, we have the equivalent of a nuclear breakout time that we can calculate in advance: they would need to smuggle more GPUs into the data center, and it would still take them a certain number of years before being able to train such a larger model. And we can do this right now, with existing technology, just by having an enforced limit on the total size of a data center, not just on the total size of a single training run.
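Here is a back-of-the-envelope sketch of the breakout-time idea described above. Every number below is an assumption invented for illustration, not a figure from A Narrow Path: a licensed per-data-center compute cap, a prohibited training-run size, and an assumed covert hardware acquisition rate, from which you can estimate how many years an actor would need to complete an illicit run.

```python
# All numbers below are illustrative assumptions, not values from A Narrow Path.

SECONDS_PER_YEAR = 365 * 24 * 3600

def breakout_time_years(
    prohibited_training_flop: float,   # size of the prohibited training run, in FLOP
    licensed_cluster_flops: float,     # sustained FLOP/s allowed per data center
    smuggled_gpu_flops: float,         # sustained FLOP/s of one covert accelerator
    smuggled_gpus_per_year: float,     # assumed covert acquisition rate
) -> float:
    """Years until the licensed cap plus covertly added hardware could finish
    a prohibited training run, integrating compute as capacity grows."""
    done = 0.0
    years = 0.0
    step = 1.0 / 52  # advance in one-week increments
    while done < prohibited_training_flop and years < 100:
        capacity = licensed_cluster_flops + smuggled_gpus_per_year * years * smuggled_gpu_flops
        done += capacity * step * SECONDS_PER_YEAR
        years += step
    return years

print(breakout_time_years(
    prohibited_training_flop=1e27,   # hypothetical prohibited run
    licensed_cluster_flops=1e18,     # hypothetical per-data-center cap
    smuggled_gpu_flops=1e15,         # hypothetical per-accelerator throughput
    smuggled_gpus_per_year=200,      # hypothetical smuggling rate
))  # roughly 13-14 years under these made-up assumptions
```

The point of an enforced per-data-center cap is precisely that this number stays comfortably large, so inspectors have years, not weeks, in which to detect and respond to a breakout attempt.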
In the near future, there are proposals being worked on right now, with prototypes in progress. One of them is "guaranteed chips", which people like Yoshua Bengio and others are working on: on-chip monitoring mechanisms. This would be the gold standard for mutual verification, where on the chip itself you can verify whether a training run is authorized, whether the entity doing the training run still has a valid license, the amount of computing power being used for a training run, and a variety of other specifications you might want to verify. In many ways that would give much more visibility than we even have on nuclear, so once you have this kind of mechanism it's an easier job than the one we do with nuclear.

And it might be possible to get these hardware mechanisms, because, as you mentioned, the supply chain is extremely concentrated, with ASML, TSMC, and Nvidia as the main players, and therefore it seems that if those players could be convinced this would be a good thing, it might be possible. Do you worry about distributed training runs? Say, instead of one training run in a data center with a million GPUs, you spread it out over a hundred different data centers and thereby conceal that you're actually training an enormous model.

That's obviously going to be one threat model from actors trying to defect. It does lead to a lot of inefficiencies at the moment: a big factor in training runs right now is interconnect speed, how quickly information flows between nearby GPUs, and here you're doing it over the internet, spread out potentially globally across jurisdictions, so you will have challenges there. But this is why we eventually need an international oversight system, where some countries get together, decide to implement this, and then monitor and sanction countries that do not follow the system. We need a carrot, we need to find ways to incentivize joining the system, but also, if there are defections, as in any other area of international law, they should be prevented, sanctioned, and limited; a company trying to circumvent these measures should be sanctioned, and there should be incentives against doing these kinds of things. This is the kind of problem that, with a robust monitoring regime, even without on-chip mechanisms, is largely solvable; it's addressed by these approaches.

What if it turns out that using inference-time compute is a really efficient way to get capability or performance gains? You mentioned this earlier in the episode, but when you use ChatGPT and it's thinking about something, that's inference-time compute. It's not entirely public how that works, but it's somehow reflecting on its own output and getting to a better result. What if it turns out that techniques like those are a better use of compute than using compute to train bigger models? Does that invalidate the whole regime of compute governance with these hardware mechanisms?

I think it challenges it, but it doesn't invalidate it, and this is precisely why we draw out those safety conditions in phase zero that go beyond just compute thresholds. As we discussed earlier, compute is a very good proxy, but it's just one proxy. Ideally we would have, and we should build, clear metrics of intelligence and the ability to predict over time, whether you use more or less inference-time compute. Also, the safety conditions and all of the phases were drawn up in a paradigm-agnostic way. While we all now have in mind the current dominant paradigm of scaling LLMs plus RL plus other things, we should have systems and institutions that are robust enough to deal with paradigm shifts too: scaling might stop, but there are going to be other breakthroughs.
This is a problem we need to solve at some point: if it's not by scaling, there might be breakthroughs in RL that push us forward, and other things like that. So this is the defense-in-depth approach: we have some measures that limit the general intelligence of AI systems via compute, this is why we have a licensing regime, and some measures that are just fundamentally necessary, like no AIs improving AIs, no unbounded AIs, no AIs that can break out of their environment. And we need to complement the compute approach by starting to build this understanding of intelligence at a theoretical and empirical level. One proposal that we have in our licensing regime is that we should test AIs and see how they perform against remote workers, and exceeding the performance of remote workers should be treated as equivalent to crossing a compute threshold. This is to capture, for example, a breakthrough in algorithmic efficiency that makes systems much more generally intelligent while still being below the compute threshold. We should have these other ways, these fail-safes, to capture that.

That sounds almost like a natural benchmark, something that will emerge in the wild. You have these services online where you can make contracts with remote workers, and if it turns out that those remote workers begin being outcompeted by AI remote workers, then you have a definite increase in capabilities, whether that comes from another enormous model or from better use of inference-time compute or whatever. Would you formalize that into a benchmark, or would you collect the data after the fact from these platforms online, or from whatever source?

I think governments should start working on formalizing these benchmarks. This is exactly how we built other sciences, starting from zero. Take temperature: we started with people putting their left hand into a pot of cold water and their right hand into a pot of hot water and noting down the feeling of difference. You need to start somewhere and then bootstrap up to a robust measurement science, a robust metrology, and we should just start doing this with AI. Governments are luckily building up their capacity with things like the AI safety and evaluation institutes; that's a great place to start developing this. We should involve many more economists to do rigorous empirical studies. We see a lot of things like "AI passed an IQ test" or "AI passed an exam"; that's great, but we need to do more, we need to do it in a formalized way, we need a big investment into this, we need to understand exactly at which levels AIs are. That's going to be a good fail-safe to combine with compute-based measures, to catch exactly this kind of natural metric. Because ultimately we know that we humans, as general intelligences, can develop AI. Roughly, this is the big intuition behind AGI and superintelligence: if you have an AI that is as competent as a human, and it can do AI research and AI activities as well as a human, you are facing a very rapid acceleration into more and more powerful systems. We know we're capable of doing this, and we know that machines are much faster than us.
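As a minimal sketch of the remote-worker comparison described above: give the AI and contracted remote workers the same real tasks, grade both sets of outputs, and treat the AI matching or beating the human baseline as a capability trigger, analogous to crossing a compute threshold. The grading functions and the margin are illustrative assumptions, not a specification from A Narrow Path.

```python
# Sketch of a "grounded benchmark": AI vs. remote-worker baseline on identical tasks.
from typing import Callable, List

def success_rate(outputs: List[str], graders: List[Callable[[str], bool]]) -> float:
    """Fraction of tasks whose submitted work the grader accepts."""
    return sum(grade(out) for out, grade in zip(outputs, graders)) / len(graders)

def exceeds_human_baseline(ai_outputs: List[str],
                           human_outputs: List[str],
                           graders: List[Callable[[str], bool]],
                           margin: float = 0.0) -> bool:
    """Capability trigger: AI success rate >= human success rate + margin."""
    return success_rate(ai_outputs, graders) >= success_rate(human_outputs, graders) + margin
```

The point of the sketch is only that the comparison is anchored to real work products graded the same way for humans and AIs, rather than to a multiple-choice score.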
Once we cross the threshold of being as generally capable as a human, we are in the danger territory, so we should start actually measuring this yardstick. Yes, it's hard; yes, the first tries are not going to be perfect, but this is how to start.

Benefits of cooperation. If we return to the issue of international cooperation, from the perspective of a country deciding whether to sign up for some international agreement, how are they making that decision? They're thinking about benefits and costs, right? And you write about how there should be benefits to cooperating with these international agreements and structures. Say you are Nigeria, for example, and you're deciding whether to sign up. You might think: maybe my economy can grow faster if I don't sign up, because these models can be very powerful. What can be offered on the other side? Which types of benefits can be offered for cooperating with these international structures?

Yes, so a fundamental and important point of an international structure is to also have these benefits for cooperation. On one hand, there should be a clear incentive, and it should be made clear that if we don't cooperate we are facing an extinction-level threat; this is global security, and global security is national security, you are at risk as well. But it's good to add to this essentially negative incentive a positive one: we should have some of the most risky, but also potentially most beneficial, AI development done in an international agency modeled on the idea of a CERN for AI. In this plan we call it GUARD, the Global Unit for AI Research and Development, in which national-security-level protocols are put in place that enable more advanced AI research than is allowed to be done in private companies, and the benefits of this, once they're proven safe, are shared with signatory countries. So on one hand you have the nonproliferation and monitoring regime via the IAEA for AI, the international AI safety commission; on the other hand you have an outlet to channel this. Rather than devolving into competitive pressures, as you say, with one country moving ahead on its own, trying to develop its own AI and risking breaching the superintelligence line accidentally or willingly, you have a joint effort to build powerful transformative technology that will then be given, proven safe of course, to signatory countries, to address challenges like fundamental scientific problems, automating certain amounts of labor, and so on.

Wouldn't you expect many of the advances happening in this CERN-for-AI international research lab to be dual use, in the sense that if you gain some knowledge, if you find out how to do something in a fundamentally better way, that can often be used for both good and bad? So I would worry, if I'm Nigeria thinking about whether to sign up, would I actually receive these benefits, would I be provided with the benefits of this CERN for AI?

Well, this is why, even if we do solve AI control, it is still going to be an issue: AI misuse will not go away. In a similar fashion, a controllable transformative AI is going to look in many ways like AlphaFold; you can imagine a more powerful AlphaFold for most of chemistry and biology.
While it's not going to be an adversary on its own, and it's not going to be a generalist agent, in the wrong hands it is going to enable the development of novel pathogens, enable the development of synthetic biological agents that could kill millions of people on the planet. So you will still face the issue of nonproliferation and misuse; that is just in the nature of a powerful technology. But at least we will have solved the problem of losing control to the system itself. This is why we do need an international agency with very strong monitoring mechanisms, very strong safety protocols, national-security-level security, where only research that has been proven safe, and is safe by design, can be released outside, or can be released via more specialized channels, for example only to the governments of the participating countries, but not just for open use. Because indeed this is going to be very powerful stuff, and it can still be weaponized even if it's not a threat on its own.

A science of intelligence. Let's talk about phase two, which is about flourishing. This sounds a little more cheery, perhaps, than talking about safety and security. The main, or at least the first, point that you argue in this phase is that we need to develop a science of intelligence, what you might call a mature science of intelligence. What does that entail?

To build things that are reliable and controlled, we need to understand them, predict them, and measure them. This is how humanity has mastered a lot of domains; this is how we have safe passenger planes that bring us into the air every day, this is how we extract enormous amounts of energy from nuclear power plants, and this is how we're going to master AI, if we do succeed. The situation at the moment is that, as we've talked about before, a simple solution to the superintelligence problem would be if we could just know the line at which too much intelligence leads to superintelligence. We could draw a line there, apply a safety factor, as is common in safety engineering, and stop before it: get all of the intelligence from AI systems that we need and can harness safely, without crossing the line into systems that can overpower us. We have a really hard time drawing this line right now because we don't have a science of intelligence, and we don't have a measurement science of intelligence; we cannot directly measure this. We should build it. That's going to be the key to understanding many more things about the universe, but also, fundamentally, to building AI systems where we can predict in advance how powerful they will be, exactly what they will be able to do, exactly what they will not be able to do, and how they can fail, so that we can make them fail gracefully. Think of when you build a nuclear power plant: you are required by law to show that it is not going to just collapse and explode if there is too much rainfall in your area. We can do this because we understand physics, we understand nuclear physics, and we have strong foundations in safety engineering. We can do this for a bridge carrying cars; we should get to the same level with intelligence, and thus with AI. So the first recommendation is: let's build the science of intelligence. This is going to unlock a lot of benefits, but we need to do the hard work of building it, starting from empirical measurements across AIs and then building a general theory of how all of this works.
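A toy illustration of the safety-factor idea mentioned above, purely as a sketch: if we could estimate the capability level at which systems become able to overpower us, we would not build up to that estimate; we would stay well below it by a chosen factor, as in other branches of safety engineering. Both numbers below are illustrative placeholders, not values from the report, and the premise of the whole section is that we cannot yet measure this scale at all.

```python
# Toy safety-factor sketch on a hypothetical intelligence scale.
estimated_danger_level = 100.0  # hypothetical units; we cannot measure this today
safety_factor = 10.0            # conservative margin, like load factors for bridges

allowed_level = estimated_danger_level / safety_factor  # stop well before the line

def within_safe_envelope(measured_capability: float) -> bool:
    """True if a measured capability stays inside the de-rated envelope."""
    return measured_capability <= allowed_level
```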
And how is that different from what people are trying to do today in computer science, in psychology, in cognitive science, and so on? This seems like something that existing academics would be very interested in; perhaps they would see it as a real breakthrough, even if we are not there yet. How is this different from what people are already trying to do?

So I think, especially in machine learning, there has been a bit of an inversion of priorities. There has been a continued chase of brittle benchmarks, just to get capabilities to go up, without actually trying to understand the deeper principles. There have been some good approaches recently, like the book Principles of Deep Learning Theory, which tries to find the physical foundations of how deep learning works, but we essentially just need much more of that, much more of this rigorous scientific approach and of rigorous, scientific empirical testing that we can bootstrap on to build this science, aimed at understanding how intelligence works and how to measure it, rather than just making the line go up on the next brittle eval that gets gamed very quickly. The modern history of machine learning in the past ten years has been benchmarks being made in quite a crude and simple fashion and then being gamed and smashed over and over. There is talk right now of saturation of benchmarks like MMLU and so on; saturation meaning AIs are sometimes training even on the content of the benchmark, being directly optimized to beat the benchmark, and these benchmarks don't tell us much. This is why, as we talked about before, we should have some grounded measures. By grounded I just mean anchored in reality, things we can test against real phenomena: let's actually measure the performance of remote workers and compare AIs on exactly the same tasks, not a toy benchmark, not a list of multiple-choice questions, which is the common standard now. Let's run the AI on the same tasks, see how it does, see exactly where it fails, and bootstrap and build from that. I think that's totally possible, and this would be a very exciting scientific renaissance that could come from investing a lot more in these grounded approaches.

Yeah, Dan Hendrycks, who is a previous guest on this podcast and the creator of MMLU and MATH and some of these benchmarks, is right now trying to create the most difficult kind of test for AIs possible, something he calls Humanity's Last Exam or something like that, just to give listeners a sense of how models have broken through these benchmarks and how we need new ones that are much more difficult in order to keep up with AI development.

And I think in this case part of the issue is this focus on always making tests harder for AIs, while ultimately what we're interested in is not how hard something is for an AI; that tells us very little, it can only tell us how one AI compares relative to another AI. What we really care about, and this is the approach presented here, is how AIs compare to us, or how AIs compare to other intelligences, because we want to draw these kinds of lines.
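As a minimal sketch of what "benchmark saturation" means in practice, mentioned above: once the best reported scores sit within a small margin of the ceiling, the benchmark no longer discriminates between systems. The scores below are made up for illustration, not real MMLU results.

```python
# Sketch: detecting benchmark saturation from a series of best reported scores.
def is_saturated(best_scores_over_time, max_score=100.0, margin=5.0):
    """Saturated if the latest best score is within `margin` points of the ceiling."""
    return max_score - best_scores_over_time[-1] <= margin

illustrative_scores = [44.0, 61.0, 75.0, 86.5, 95.5]  # made-up values
print(is_saturated(illustrative_scores))  # True: the test has stopped discriminating
```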
This is why, while we obviously need more benchmarks of every kind, what we especially like are these grounded benchmarks. Don't just tell us whether Claude is two points above ChatGPT, where you don't really know what that means in the real world; tell us whether Claude can automate a salesperson, 80%, 90%. Those are ultimately the kinds of questions that policymakers need answers to, that we need answers to, to plan our society around and to understand when we're crossing that line.

It's a very ambitious project, trying to build a fully general science of intelligence. I guess such a science would be able to compare the intelligence of a squirrel with a human, with a current AI model, with a future AI model, and so on, so it's very ambitious. You mentioned that this is how we've made progress in other domains, and I'm somewhat skeptical here. If you look at the history of innovation, it seems that perhaps you get tinkering and engineering, then you get some product brought to market, then it fails or turns out to be somewhat unsafe, and then you iterate on that product and you get something better. So you get product and engineering before you get a grand theory; here I'm thinking perhaps of the steam engine or similar examples. It seems like you don't have the grand theory before you have the product, so why would that be the case for AI?

In the case of AI, it is possible to build the product before having the grand theory; we're doing it right now. The issue with that is that we have no way to make it safe. With other technologies, it's a tragedy, but it doesn't end human history to have one factory explode and then learn from it, or one plane crash and then learn from it. The reality is that when you're dealing with a technology whose blast radius is all of humanity, you're not going to have retries; you're going to have very few, and you will have an enormous amount of damage before them. So we cannot afford to do that. But do we have a lot of ways to empirically test? I don't think this is a dichotomy; I think it's a false dichotomy. As we've just discussed, we can do a lot of empirical testing and bootstrapping from simpler problems to more complex problems. Let's just test right now how systems compare to humans across tasks. We used to have simpler measures, without having a grand theory: the Turing test was the gold standard, a commonly accepted metric of when AI crosses into "real AI", comparable to humans, for the better part of a century, essentially. We've broken past that. The first step to start understanding the phenomenon is to take the goalpost seriously and not just keep shifting it. We should have a big moment of realization: yes, we had a perhaps crude but better-than-nothing measure, the Turing test, and we've broken past it; we have things like exams that we put humans through, and AIs break those too. Let's try to figure out what's going on, let's get some signal from this, let's make better, grounded tests, and let's bootstrap our theory from that.
Yeah, it's interesting. The conclusion that some people have taken away from AI, in some sense, passing the Turing test and passing very difficult exams is that these were never good tests and exams to begin with. So it can sometimes be difficult to think about how we should update our beliefs in the face of this new evidence. But if we had a science of intelligence like you describe, I think we would have much more clarity; if we had a science of intelligence like we have a science of physics, for example, there would be very little room for disagreement, or at least much less.

Yeah, and going back to the point about iteration: with nuclear, imagine what would have happened if we did not have an understanding of nuclear physics when we started building nuclear weapons. That's a technology with a much bigger blast radius, and we still made mistakes, we still made quite dangerous calculations. There was the famous episode of the makers of the first atomic bombs trying to calculate whether the detonation would ignite the atmosphere and not being completely sure; luckily it didn't, and they could only do those calculations because they had a theory. Without a theory you cannot even do the calculations, you're just in the dark. We've had examples of detonations whose yield ended up being far larger than what the makers expected, that almost killed the pilot or led to massive fallout around them, and this is even with the theory. In AI we are working like that: with a technology that will have a bigger blast radius than that, and without a theory. That would have been untenable with nuclear weapons.

Now, what we could get if we had a mature science of intelligence is some specification of what these systems can do and guarantees about what these systems cannot do. Maybe you could explain the advantages there, and how realistic it is to get a science so precise that you can guarantee things about AI systems.

Yeah, so this is another key component of making transformative AI safe and controlled, safe by design: having a way to specify what we actually want this AI to do and what we want it not to do. This approach is quite similar to things we already have in the realm of formal specifications; formal methods are used in other areas of computer science, in nuclear engineering, and so on. The challenge is: how do you tell the AI that this is a no-go and this is not a no-go, for all possible scenarios? That is not impossible; we do it, it's costly, it takes time, but we've made massive progress in this area in the past decades on things that would previously have taken decades of work. Formally proving systems is famously hard, but it's not impossible, and we've made massive breakthroughs there.

Wait, what examples do we have of systems where we have a formal proof that they aren't buggy, for example? What would be examples of software like that?

This approach has been used in some areas, especially military applications; there is some military helicopter software that is approached like this, and there is some nuclear software that works like this.
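To give a flavor of what formally proving a property looks like at a very small scale, here is a minimal sketch using the z3 SMT solver's Python bindings. It proves a trivial invariant of a toy bounded counter by asking the solver for a counterexample and finding none; this is only an illustration of the general formal-methods idea, not the specification language being developed at ARIA or anything proposed in A Narrow Path, and the toy system itself is an assumption made up for the example.

```python
# pip install z3-solver
from z3 import Int, If, And, Not, Solver, unsat

x = Int("x")      # current state of a toy bounded counter
cap = Int("cap")  # the bound we never want to exceed

# One step of the system: increment, but clamp at the cap.
x_next = If(x + 1 <= cap, x + 1, cap)

# Invariant: the state stays within [0, cap] before and after the step.
pre = And(0 <= x, x <= cap)
post = And(0 <= x_next, x_next <= cap)

s = Solver()
# Ask the solver for a counterexample: a state satisfying the precondition
# whose successor violates the postcondition.
s.add(pre, Not(post))

if s.check() == unsat:
    print("Proved: the step preserves the invariant for every integer state.")
else:
    print("Counterexample:", s.model())
```

Real verified systems prove far richer properties over far larger codebases, which is exactly why the scaling-up of such tools matters here.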
We also don't need to fully formally prove everything; the first step is to approximate it: formally proving some core components, some core parts of the system, or formally proving it within certain assumptions. We go back to the earlier point about boundedness: if we only give the AI this amount of compute, and we put it in an environment where it doesn't have access to the internet, then given those assumptions we can show that it's not going to be able to do crazy things. Maybe it blows up a little bit, but it's not going to take over the world. That's entirely possible, and we're making massive progress on this, partly thanks to AI. Without having to develop AGI or superintelligence, there have been very impressive breakthroughs by DeepMind with AlphaProof recently, where they've been using a combination of systems to accelerate theorem proving, and similar things would also help here. There has been work by Davidad and Yoshua Bengio at ARIA, the UK's innovation agency, building a specification language for AI systems. We can already do this kind of work without waiting for a full general theory of intelligence, which is a big effort that requires some of the best minds on the planet. This is a call to action: we need great mathematicians and great theoretical computer scientists; it's a great challenge to work on and very important for humanity.

And perhaps something where we should devote more computing power and more AIs to help with automated theorem proving and automated verification of software. If you could get a system like AlphaFold, but specifically for proving statements about software, that would be a great advance, I think.

Yes, that's possible, and it would also stay in the realm of narrow AI systems, or bounded AI systems, where we know it's only doing that and not something else, and we can have it be very powerful and help us solve these problems.

So if we get the science of intelligence, if we go through the phases that you've described, of safety, of stability, and of flourishing, then we are faced with a situation in which we can begin to automate cognitive labor and physical labor. This might be somewhat beyond the scope of A Narrow Path, but what do you foresee then? It's not as if all challenges are solved, right? Automating physical and cognitive labor would completely change society. Do you think that even if we have all of these steps secured, we would still face great challenges?

Yeah, we will face great challenges, but we would be out of the extinction threat that comes from superintelligence. We obviously don't have all the answers, humanity doesn't have all the answers, and we should think really hard about those questions. As you said, if we do succeed in implementing these phases, and we succeed in not having superintelligence but instead building controlled transformative AI, AI that is a tool for us to automate anything we want, there will also be areas where we should decide as a society that we do not want to automate, that we don't want to delegate to machines. That's an unsolved societal question: what are those areas, and how far do we go? Do we want politics automated? Do we want our leaders to still be human or not? I think we should.
But the real answer is that we should develop processes where we make collective decisions together, decisions that are justified and deliberated together; that is the spirit of democracy, and that is how to get these answers. And there will still be a lot of unsolved challenges. One big one is concentration of resources: by default, this is a technology that concentrates power. Whoever controls the development of this transformative technology, and can at some point automate large amounts of labor, will have enormous power, enormous economic power and military power at their disposal. This is why we set up GUARD in phase one: to make sure there is a credible commitment to distributing these benefits to others as well, and that there is not a single pole running away with it all. We will still have to solve the questions of how we distribute this wealth, how we allocate it, and whether we allocate it well. We'll keep facing many of the challenges we face right now, just amplified, in a radically different future, a radically positive future, but still one filled with challenges. Sometimes proponents of AI and of superintelligence talk about how, once we achieve superintelligence and it's aligned, we achieve post-scarcity. In some ways I think this is a bit of a misnomer. In some ways we are post-scarcity in a lot of areas right now, especially compared to our ancestors; in other ways scarcity will never go away. We will always have to make trade-offs; there is only a finite number of atoms in the universe. In a very deep sense we will never escape scarcity, because we live in a physical universe.

But as you say, we're kind of approaching post-scarcity in some domains, and we will perhaps approach post-scarcity in other domains as well.

Exactly, and we will still face trade-offs between things like explore versus exploit. We will still need to make decisions: do we invest a lot of this surplus into exploring the stars, settling new planets, and setting up new things, or should we invest it to make everybody immediately extremely wealthy, with access to any material goods they want? Those questions are not going to go away; some things never go away. This is another reason why superintelligence as a dream is sometimes quite utopian in an ungrounded sense, because there are things that are just logical impossibilities, or moral impossibilities. Some people want other people to suffer: is alignment satisfying their desire for others to suffer? That will conflict with other people's desires. Some people want positional goods: they want to be the best at something, and only one person can be the best at something at any given point. How do you satisfy that? Human values are complex; we don't even know whether there are universal, full human values, and if there are, we should discover them, but we're not there yet. A lot of these questions are not going to be answered by more technology; they're going to have to be answered by more humans tackling them and trying to find trade-offs and compromises. We will never get rid of compromises, and we will probably not satisfy all preferences fully. In fact, if I'm not mistaken, I think we have some impossibility theorems stating that we cannot satisfy all preferences at the same time.
How you can help. As a final question, I would like us to explore how people can get involved. If you're in politics, or if you're a technical person in computer science, or a mathematician, perhaps lay out for us how you can get involved and what you see as the most fruitful areas.

In A Narrow Path we set out one path that we think can help humanity make it across to the other side, survive superintelligence, and thrive. A lot of questions are unanswered, a lot of things will need to be strengthened, tested, and adapted to specific contexts, and a lot of technical challenges will need to be solved, which are really exciting. One of them: we will need much more work, and many more bright minds, on things like on-chip verification mechanisms. If you are interested in that, look into the current projects or think about setting up your own; this is going to be a growing area, and one we very much need for robust governance. People who work on benchmarks: do work on grounded benchmarks. We would love to see governments, economists, and computer scientists working together to make benchmarks grounded in reality, comparing the actual economic performance of humans with the performance of AIs, so we have clear metrics of what they can do. And for people in policy: if you are concerned about these risks and you're thinking about measures like this in your country, please get in touch, and also get in touch if you think there are ways to strengthen these measures. This is a very complex problem where nobody has a complete solution, and our goal with A Narrow Path was to go from zero to one: we didn't find a plan to deal with extinction risk from AI globally, so we made one. Now we need all of you to make it better, to find ways to implement it, adapt it, and transform it into real laws that can pass in your country, in your jurisdiction, and make this into a reality. So if you have any ideas on that, please do get in touch via the hello email address at narrowpath.co; that's the address we use for all feedback. Share it, share criticism, share feedback; it helps a lot.

Fantastic, thanks for talking with me. Thank you so much.
