Streamed live on 6 Jun 2024. Artificial intelligence has advanced rapidly in recent years. If this rise continues, it could be a matter of time until AI approaches, or surpasses, human capability at a wide range of tasks. Many AI industry leaders think this may occur in just a few years. What will happen if they are right? Roman Yampolskiy (University of Louisville) will discuss the question of the controllability of superhuman AI. The implications of his results for AI development, AI governance, and society will then be discussed in a panel with philosopher Simon Friederich (Rijksuniversiteit Groningen), Dutch parliamentarians Jesse Six Dijkstra (NSC), Queeny Rajkowski (VVD) and Marieke Koekkoek (Volt), policy officer Lisa Gotoh (Ministry of Foreign Affairs), and AI PhD candidate Tim Bakker (UvA). The future of AI will become a determining factor of our century. If you want to understand future AI's enormous consequences for the Netherlands and the world, this is an event not to miss!
TRANSCRIPT.
Welcome, everyone, to an evening of exploring the future of AI and asking ourselves the hard question: is that future too much to handle? I have the privilege of being the host this evening. My name is Martin G; I'm a director of the Argumentation Factory. Some of you may have seen me here before, because last year we had a similar event with Professor Stuart Russell. Just to see a show of hands, who was there? Ah, okay. Well, if you thought that was depressing, prepare for this evening, because compared to the speaker of today, Dr Roman Yampolskiy, Stuart Russell is an optimist. Just a quote from your recent book, or the press release I should say: "We are facing an almost guaranteed event with the potential to cause an existential catastrophe. The outcome could be prosperity or extinction, and the fate of the universe hangs in the balance." That is a dire warning, I would say. In a few moments we'll hear your talk, and thereafter we'll have some questions and answers with the audience, followed by a panel discussion with three parliamentarians, two researchers, and one civil servant, which almost sounds like the start of a joke. After that there will also be a Q&A with the audience, followed by drinks, and I've heard the first one is on the house.

But now, over to the first act of the evening: Dr Roman Yampolskiy. He's an associate professor in the Department of Computer Engineering and Computer Science at the Speed School of Engineering at the University of Louisville; never heard of it, but I heard it's a great place. He is one of the founding fathers of AI safety research, the founder and current director of the Cyber Security Lab at the university, and the writer of many books, most recently one with the ominous title AI: Unexplainable, Unpredictable, Uncontrollable, which sort of summarizes the gist, I guess, of what you're about to say. So without further ado, the floor is yours. [Applause]

Right, thank you very much. Can you hear me? Can you hear me now? Yes, excellent. Where are my slides? Where am I looking? Let's see... aha, excellent. So thank you for this wonderful introduction; I really appreciate it. I have not been called a father of anything other than my kids ever before, so that's nice. If you have any questions after this short 30-minute talk, you can follow me for more information: I post a lot on Twitter/X, I post on Facebook. Just don't follow me home; that's very important. If you attended Stuart Russell's talk, you probably already know a lot about AI, and some of the things I'll tell you will not be as novel for you, but hopefully I still have something new to offer.

About a year ago, something happened: a new AI model was released into the world, and unlike previous AIs, for the first time ever it made some people think that maybe it was showing sparks of generality. It wasn't just a chess-playing program or a car trying to drive itself; it was capable of doing many things in many domains, some things it was not explicitly trained to do. Here you can see how well it did on many standardized exams: law exams, medical exams, college exams. It did really well on most of those things. It wasn't perfect in some subjects, mythology and things like that, but on science and engineering it was pretty impressive. So I think we need to try to understand what that means in terms of current capabilities and in terms of upcoming capabilities for those systems.
Historically, every time we tried to make AI do something, we had to start from scratch: if I create a program to play tic-tac-toe, it does nothing for my ability to have a program play chess. With systems which are more general, if we can find a scaling approach, a scaling architecture, it means we can just add more compute, add more data, and get additional capabilities out of the system: the so-called scaling hypothesis. And it seems that maybe we are actually in possession of such a model. Some people disagree, but it seems that maybe we have successfully created something capable of becoming much more powerful with every level of scaling in terms of compute. We had GPT-1, which no one noticed; GPT-2, which was impressive to some scholars; then GPT-3 hit; GPT-4 was of course the latest model, and now we hear that GPT-5 is being trained.

So the interesting question is: how soon before AGI? I don't have any insider information on that, but I'll share with you what some other people think. CEOs of major companies like DeepMind and Anthropic have said that maybe we're two years away. If you look at prediction markets, and those are of course markets, they fluctuate: last time I did a screen grab it was about two years away; I think more recently it's been three years; maybe today it's one, I haven't checked. But they all seem to be in agreement that something big might be happening soon.

Ray Kurzweil, a famous futurist, uses his ability to map computational capacities onto different types of systems, like living brains, the brains of animals and humans. For the last decade or so I've been showing this slide to different audiences, and before, it was all about the future: I was saying that by 2023 we'll have systems capable of simulating one human brain. Now it's 2024, that point has passed, and this is the year we had a very powerful AI system released. His next prediction is that by 2045 we'll have enough compute to simulate all the brains, all 8 billion of us, a system which would be superintelligent. And maybe by 2029 he expects a true AGI, a system as capable as any human.
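As a rough back-of-the-envelope check (my own illustration, not part of the talk), we can ask what growth rate those two dates imply, assuming brain-equivalent compute grows exponentially:

```python
import math

# Back-of-the-envelope check (illustrative, not from the talk):
# the slide puts compute for ~1 human brain at 2023 and compute for
# all ~8 billion brains at 2045. Assuming smooth exponential growth,
# what doubling time does that timeline imply?
brains = 8e9                    # rough world population
doublings = math.log2(brains)   # a factor of 8e9 is about 2**32.9
years = 2045 - 2023

print(f"Required doublings: {doublings:.1f}")
print(f"Implied doubling time: {years / doublings:.2f} years")
# ~0.67 years: compute would have to double roughly every 8 months,
# noticeably faster than the classic ~2-year Moore's-law cadence.
```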
So what can we expect from those systems? What properties, what capabilities, can we predict a superintelligent system would have? One, of course, is that it's going to be smart; that's not so innovative. What does that mean? If you look at narrow systems, they outperform the best humans in their domain: if it's a chess-playing program, it's better than the human world champion; if it's any other limited domain, games like Jeopardy, same idea. We expect superintelligent systems to be smarter than any human in all domains, including science and engineering, so they would be able to take over development of future generations of AI systems, hardware for those systems, and all the other scientific research.

We expect those systems to be very complex; that's not surprising either. This here is a picture of an instrument panel for an airplane, kind of like a GUI interface to the autopilot and all the other parts of the airplane. It's confusing enough that even experienced pilots have a hard time tracking some of it, but it's even worse for the source code behind the autopilot: with years and years of experience, some pilots are still not sure what's going on or how to communicate with those systems, and we have started to see accidents happen as a result.

Those systems are also very fast. Not in the same sense as computers are fast, where our computers are quick; I'm talking about ultra-fast extreme events, where the system can change your whole environment so quickly you don't have time to react, and then it might return it to the original state, or maybe not. So here you have a stock market flash crash: in a matter of milliseconds, billions of dollars of value is wiped out, and maybe brought back. Actually, two days ago I was flying to give a talk, and the US stock market had, not a flash crash, but a crash where a lot of big companies lost 99% of their value, and I missed it. So that's a very sad part of my talk.

So of course, given all that, we would never be silly enough to surrender control to AI, right? But we actually did, many years ago, before it even became smart: software controls most of the stock market, power plants, the electrical grid, nuclear response. We had already surrendered much of the control before we even got to AGI.

My background is in cybersecurity, so I'm very interested in how AI and malevolent software will interact. We're starting to see systems capable of producing very believable deepfakes, not just pictures but video with audio, which makes it possible to automate social engineering attacks at a completely unprecedented scale. If before, to send a single spear-phishing email, I had to study your social media and tailor it just right, now I can send millions of such tailored requests, including this multimedia content. And studies show that everyone clicks on those links, everyone: even people with training in cybersecurity end up clicking on something from their family member or boss, especially if it's a believable video representation.

Historically, a lot of funding for AI work came from military organizations. Today, industry is probably the number one funder of AI research in general, but there is still military research in AI, and they of course have an interest in automating the killing of humans. So while there is useful work on rescue robotics and things of that nature, a lot of work goes into autonomous drones and killer soldiers.

Now, I don't want you to get scared; there is a lot of positive in this technology. AI is awesome. It can help us with research; it can help us get free labor, free stuff, physical and cognitive, worth trillions of dollars; we can cure diseases; we can live forever. It can also do unknown unknowns, things I cannot even imagine it is capable of doing. Unfortunately, that's the last happy slide I have, because for every single one of those very positive things there is a negative counterpart. Free labor means technological unemployment, and if it can do every job, all jobs go. And just like with the previous slide, there are still unknown unknowns, things which we cannot predict it can do; we cannot prepare for them, because we're not smart enough to anticipate those possibilities.

The good news is there are a lot of people who are now concerned about this problem and taking it seriously: a lot of famous people, rich people, leaders of companies, thousands of scholars, in fact many computer scientists, Turing Award winners, who have stated that they equate risk from AI with that of nuclear weapons, meaning existential risk. What are they all concerned about? There are many problems, but they basically think that what we are creating is very different from how humans think and what humans want. If you think about all possible minds, the space of possible designs for a brain-like architecture, artificial neural networks, alien minds, animal minds, alien robot minds, the space is infinite. All of us, as diverse as we are, different cultures, different languages, have very similar architecture in terms of our DNA, in terms of our brain design, in terms of room temperature preferences and the foods we like. So all of us are that pale blue dot right there.
We are pulling a somewhat random mind out of that space of possible minds. We're not designing it specifically; we are kind of making it grow and teach itself, and so there is a good chance it will have very different preferences from the rest of us.

Luckily, as I said, there is some response starting to take place. There are national institutes now being formed in the US and other countries; there are many research labs, and all the big AI labs have safety teams, or had safety teams. There are unprecedented amounts of interest in the topic, there is now funding available, you can get invited to give a talk on this topic; all of this was not the case 10 years ago, when it was pure science fiction. So in some ways we're doing really well. The problem is that all of that is still a tiny percentage of the resources allocated for development of those systems. And if you remember the scaling hypothesis: to develop a more capable system, you just need to add more funds, more money for compute, whereas we have no idea how to do safety, and if you give someone 10 times the money, they don't output 10 times the safety. So the gap between capabilities research and safety research keeps increasing.

We don't even fully understand what the problem is; there is not even general agreement on what it is we're trying to solve. If you look at the history of AI, from the '60s and '70s, a lot of it was ethics work under different terms, computer ethics, robot ethics; we tried to teach machines to kind of do what humans do. Once we got closer to seeing results and thinking we might get human-level performance, not just narrow tool AI, the names also changed: we started calling it friendly AI, the control problem; I call it AI safety and security engineering, but there are many others; value alignment; human-compatible is what Stuart Russell uses. Without having to agree on what the name is, let's just say it's the problem of control, whatever, and let's try to understand how well we can do with it.

One thing we do in computer science, before investing resources into solving a problem, is try to understand whether the problem is solvable. If it's not, then you have different tools for approximate solutions. But we don't know if the problem of indefinitely controlling superintelligent machines is actually solvable. It could be; maybe it's partially solvable; maybe it's undecidable; maybe it's unsolvable. What do experts think about it? Here I have a sampling of three different experts: an outsider AI safety researcher, a former head of the superintelligence alignment team at a large AI lab, and a father of deep learning, a Turing Award winner. And they somewhat disagree: the answers range from "it's trivially solvable" to "not in this timeline" to "I have no idea," and that last one is the answer from the Turing Award-winning computer scientist. So we don't really know how hard the problem is.

What I'm trying to do in my work is think about what tools I would need to solve that problem. I need to be able to understand how the system works, be able to get an explanation of that and comprehend that explanation, be able to predict what the system is going to do, communicate with the system in an unambiguous way, verify that the code behind the system is exactly what the software engineer intended, and probably some other tools. So how well are we doing in terms of access to those tools?

In terms of explainability and comprehension: the only true explanation for how a system makes its decisions is the model itself. The model is too large; it's not surveyable; you cannot just read the weights off it. So the alternative is to get a simplified explanation, the top 10 reasons you got denied credit. I mean, it's useful information, but it doesn't give you the full picture. So you either get something you cannot comprehend, or you get a lossy compression.
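To make the "lossy compression" point concrete, here is a minimal sketch in the spirit of LIME-style local surrogates (my own illustration; the feature names, toy model, and numbers are all hypothetical, not from the talk): fit a simple linear model around one decision of a complex model and report the top few reasons. Whatever the list says, most of what the real model computes is thrown away.

```python
import numpy as np

rng = np.random.default_rng(0)

def complex_model(X):
    """Stand-in for an unsurveyable model (toy nonlinear function)."""
    return np.tanh(X[:, 0] * X[:, 1]) + np.sin(X[:, 2]) - 0.5 * X[:, 3] ** 2

# One applicant's feature vector (hypothetical names and values).
features = ["income", "debt_ratio", "age", "num_accounts", "tenure"]
x0 = np.array([0.4, -0.8, 1.1, 0.3, -0.2])

# Sample small perturbations around x0 and fit a local linear surrogate.
X = x0 + 0.1 * rng.normal(size=(500, len(x0)))
y = complex_model(X)
A = np.hstack([X, np.ones((len(X), 1))])          # add intercept column
coefs, *_ = np.linalg.lstsq(A, y, rcond=None)

# Report the "top reasons": the largest local effects by magnitude.
order = np.argsort(-np.abs(coefs[:-1]))
for i in order[:3]:
    print(f"{features[i]}: local weight {coefs[i]:+.3f}")
# The top-3 list is readable, but it is a lossy compression: it says
# nothing about the interactions and curvature the real model uses.
```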
With predictability there are also problems. You can predict the terminal goal of the system: if you're playing chess, the system may be interested in winning the game, and it probably will win the game. But you don't know specifically how it's going to get there, what moves the system is going to make; if you could predict that, you would be playing at that level of chess yourself. So we are also limited in terms of our prediction.

There are limits to verifying mathematical proofs and software. For mission-critical software, we know how to verify, with pretty high mathematical guarantees, static code: things which are deterministic, where we know what they are supposed to be doing and what the edge cases are. That's possible. We don't know how to verify software which continues to learn, interacts with users, self-modifies, and works across multiple domains. And even if we figured out that difficult problem, at the end we still have a single verifier which we use, be it a human, a team of humans, or another piece of software, to verify the proof. But who verifies the verifier? You have this infinite regress of verifiers, where more compute gets you a higher likelihood of success. But at the end of the day, if a system makes a billion decisions a minute and you only make one error in a billion, you're not going to be error-free for very long.
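Here is that last claim as a quick worked calculation (my sketch; the one-in-a-billion error rate and billion-decisions-per-minute figures are the ones from the talk):

```python
import math

# Worked version of the claim above: one error per billion decisions,
# a billion decisions per minute. How long until the first error?
p_error = 1e-9      # per-decision error probability (from the talk)
per_minute = 1e9    # decisions per minute (from the talk)

p_clean_minute = (1 - p_error) ** per_minute
print(f"P(an error-free minute) = {p_clean_minute:.3f}")   # ~0.368, i.e. 1/e

# Minutes until the cumulative chance of at least one error passes 50%.
median_minutes = math.log(0.5) / math.log(p_clean_minute)
print(f"Median time to first error: {median_minutes:.2f} minutes")  # ~0.7
```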
There are many different proposals for government regulation of those systems, but they usually lag somewhat behind, catching up to the technology, and it's very difficult to come up with good ways to do something about a system capable of killing everyone. Fines don't work. I've heard proposals for insurance; that also seems limited in its impact. So while I strongly encourage all such efforts, I don't think we'll have a policy solution to a technical problem.

In case you like such impossibility results, we have a paper with about 50 of them published, so you can read a lot more about tools I wish I had but probably won't have. It doesn't mean that I need all those tools; sometimes you can use a screwdriver instead of a hammer. But it seems like there are a lot of impossibility results in this space, and that leads us to the likelihood that control itself may be impossible, depending on how we define control. There is direct control, where you give orders to the system; this is the famous genie problem, right, you get three wishes and then your second wish is to undo the first one. The alternative is you go: okay, this system is smarter than me, it knows me well, it has a model of me, I'll just trust it to make decisions for me; it's an ideal advisor system. You may be happy with it, but you're definitely not in control at that point. So we have this trade-off between either making systems more capable or retaining control over them, and it seems like for many definitions of control we're not going to get something we would be happy with. We were able to publish this both in academic journals and in popular magazines, so I think this idea that we cannot indefinitely control superintelligent machines is not so novel. Alan Turing, the founding father of computation and AI research, said as much, and many more modern scholars agree. Elon Musk agrees. Ray Kurzweil, in his book, has a chapter about safety issues, and he definitely doesn't think we'll be able to directly control smarter machines.

There are a lot of ideas being proposed; quite a few books have been published. Maybe you've seen Bostrom's book; Superintelligence was a New York Times bestseller. I've had a few collections with different scholars where they talked about different proposals from economics, policy, law, different domains, in terms of what can be done. My more recent book summarizes some of the impossibility results I have talked about here. If you don't want to buy the book, all these papers are available for free; you can just get them: unexplainability, unpredictability, and so on.

But what is the state of the art here? This is a survey I did on my social media, of a bunch of nerds who follow me on Twitter and Facebook, so they know the topic, but they may not all be working in the area. There are different answers, different possibilities; only a third think that we will be able to control superintelligent machines. I'll ask you right now: raise your hand if you think we'll be able to indefinitely control superintelligent machines. I kind of anchored you a little, but last time, in a group of 250 people, there were zero hands. This time I have three hands, with anchoring at 30%. Cool, you are the only hope.

So what is the outcome then? The so-called p(doom), the probability that things will go really, really poorly. I get in trouble for this number a lot, but I want you to notice that there are no dates in it; it is a prediction about an infinite time interval, so it's not so crazy. If you ask me about a year or two, my numbers are very different. But here's what a lot of people think: there is a 20 to 30% chance that it will not end well, depending on your definition of not ending well. Those are top scholars, leaders of safety groups, leaders of AI labs, CEOs of those companies. So this is the state-of-the-art human assessment of how difficult that problem is going to be; the average for AI safety researchers from different conferences is at about 30%. I recently did an interview with Lex Fridman, and I told him I'm very concerned, I have a high p(doom), and he said, well, next time I'll talk to some optimists who think it's only 20%. This is insane. Anything above Yann LeCun's number is insane when the stake is all of humanity dying; we should not be at those numbers.

What we are trying to do is build a perpetual motion machine in software, a perpetual safety device. Not only do I need to get GPT-5 to be bug-free; I need GPT-5, 6, 7, 20, 30 to be bug-free. The main difference between superintelligence safety and cybersecurity is that with cybersecurity you get many opportunities to try again: I can give you a new password if yours is hacked, I can reissue credit cards. But if existential risk happens to materialize, you don't get a second chance. So we get one chance to get every future iteration of that system working in the real world, with malevolent actors, with insider threats, with self-learning, with all the things we talked about in terms of impossibility results, not happening even once.
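A minimal sketch of why "every future iteration" is such a strong requirement (the 99% per-generation figure is my illustrative assumption, not a number from the talk):

```python
# Sketch of the compounding "perpetual safety" requirement. Assume each
# model generation is independently 99% likely to be safe (an illustrative
# number, not a real estimate): the chance that every generation is safe
# decays geometrically, and one failure anywhere is unrecoverable.
p_safe = 0.99

for n in (1, 5, 10, 30, 100):
    print(f"P(all {n:>3} generations safe) = {p_safe ** n:.3f}")
# 0.990, 0.951, 0.904, 0.740, 0.366 - safety has to hold every single time.
```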
Given the predicted timelines, the likelihood of these events happening, and the negative outcomes, it seems that it would be a good idea to maybe slow down. I mean in terms of creating general AI, superintelligent systems, not narrow systems, which are incredibly useful and with which we've had great success solving real-world problems like protein folding. When I mention it, people usually respond by saying, well, if we don't do it, then our competitors will get there first. And it's a valid response when we talk about narrow systems; they are very useful for competing in domains such as military and economic competition. But if we are talking about the next level, we are talking about superintelligence out of control, it doesn't matter who creates it first; the outcome is going to be the same. So while it's important not to allow competitors to gain an advantage, it's more important to make sure that the outcome is not worse than what we are trying to prevent. So, how am I doing on time? Perfect. Let's answer some easy questions.