FOR EDUCATIONAL AND KNOWLEDGE SHARING PURPOSES ONLY. NOT-FOR-PROFIT. SEE COPYRIGHT DISCLAIMER.

Roman Yampolskiy and Lex Fridman | Lex Clips

Roman Yampolskiy is an AI safety researcher and author of a new book titled AI: Unexplainable, Unpredictable, Uncontrollable.

It seems very obvious: we have already, increasingly, given our lives over to software systems, and given the capabilities of AI that are coming, it seems obvious that we will give our lives over increasingly to AI systems. Cars will drive themselves; refrigerators will eventually optimize what I get to eat. And as more and more of our lives are controlled or managed by AI assistants, it is very possible that there's a drift. I personally am concerned about the non-existential stuff, the more near-term things, because before we even get to existential risk, I feel like there could be so many Brave New World types of situations. You mentioned the term "behavioral drift"; it's the slow boiling that I'm really concerned about. As we give our lives over to automation, our minds can become controlled by governments, by companies, or just in a distributed way. There's a drift: some aspect of our human nature gives itself over to the control of AI systems, and they, in an unintended way, just control how we think. Maybe there will be a herd-like mentality in how we think, which will kill all creativity and exploration of ideas, the diversity of ideas, or much worse.

It's true. But a lot of the conversation I'm having with you now is also wondering, almost on a technical level, how can AI escape control? What would that system look like? Because to me it is terrifying, and fascinating. Also fascinating to me is the maybe optimistic notion that it's possible to engineer systems that defend against that. One of the things you write a lot about in your book is verifiers. Not humans (humans are also verifiers), but software systems that look at AI systems and help you understand: this thing is getting real weird. Systems that help you analyze those systems. So maybe this is a good time to talk about verification. What is this beautiful notion of verification?

My claim, again, is that there are very strong limits on what we can and cannot verify. A lot of times, when you post something on social media, people go, "Oh, I need a citation to a peer-reviewed article." But what is a peer-reviewed article? You found two people, in a world of hundreds of thousands of scientists, who said, "Yeah, whatever, publish it, I don't care." That's the verifier of that process. When people say, "Oh, it's formally verified software, a mathematical proof," they accept something close to a 100% chance of it being free of all problems. But if you actually look at the research, software is full of bugs; old mathematical theorems which had been "proven" for hundreds of years have been discovered to contain bugs, on top of which we generated new proofs, and now we have to redo all of that. So verifiers are not perfect. Usually they are either a single human or a community of humans, and it's basically a kind of democratic vote: the community of mathematicians agrees that this proof is correct, mostly correct. Even today, we are starting to see mathematical proofs so complex, so large, that the mathematical community is unable to make a decision. It looks interesting, it looks promising, but they don't know; it will take years for top scholars to study it and figure it out. Of course, we can use AI to help us with this process, but AI is itself a piece of software which needs to be verified.

Just to clarify: verification is the process of saying something is correct. The most formal version is a mathematical proof, where there's a statement and a series of logical statements that prove that statement to be correct; this is a theorem. And you're saying it gets so complex that for the human verifiers, the human beings who check that there are no bugs in the logical steps, it becomes impossible. So it's nice to talk about verification in this most formal, most clear, most rigorous formulation of it, which is mathematical proof.

Right. And for AI, we would like to have that level of confidence for very important, mission-critical software controlling satellites, nuclear power plants. For small deterministic programs, we can do this: we can check that the code matches the design, that whatever the software engineer intended was correctly implemented.
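For a sense of what "we can do this" looks like for a small deterministic program, here is a minimal sketch. It uses the Z3 SMT solver, which is this example's assumption; the conversation names no specific tool.

    # Minimal sketch: formally verifying a property of a small deterministic
    # program with the Z3 SMT solver (pip install z3-solver). Z3 is an
    # assumption of this example, not a tool named in the conversation.
    from z3 import Int, If, prove

    x = Int("x")                # a symbolic integer standing in for every input
    abs_x = If(x < 0, -x, x)    # the program under verification: absolute value

    # The design says the result must never be negative. prove() searches all
    # integers for a counterexample and prints "proved" if none can exist.
    prove(abs_x >= 0)

The guarantee is real, but notice how much it leans on the program being small, fixed, and deterministic; that is exactly the assumption that fails for the systems discussed next.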
But we don't know how to do this for software which keeps learning, self-modifying, rewriting its own code. We don't know how to prove things about the physical world, about the states of humans in the physical world. There are papers coming out now; I have this beautiful one, "Towards Guaranteed Safe AI," a very cool paper with one of the best author lists I have ever seen. I think there are multiple Turing Award winners on it; you can have this copy. And one just came out that is kind of similar, "Managing Extreme AI Risks." All of them expect this level of proof. I would say that we can get more confidence the more resources we put into it, but at the end of the day we are still only as reliable as the verifiers, and you have this infinite regress of verifiers: the software used to verify a program is itself a piece of software. If aliens gave us a well-aligned superintelligence, we could use that to create our own safe AI. But it's a catch-22: you need to already have a system proven to be safe in order to verify this new system of equal or greater complexity.

You just mentioned this paper, "Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems." Like you said, it's a who's who: Joshua Tenenbaum, Yoshua Bengio, Stuart Russell, Max Tegmark, and many, many other brilliant people. On the page you have it open to: "There are many possible strategies for creating safety specifications. These strategies can roughly be placed on a spectrum, depending on how much safety it would grant if successfully implemented. One way to do this is as follows," and there's a set of levels, from level 0, where no safety specification is used, to level 7, where the safety specification completely encodes all things that humans might want in all contexts. Where does this paper fall short, to you?

When I wrote the paper "Artificial Intelligence Safety Engineering," which kind of coined the term AI safety (that was 2011; we had a 2012 conference and a 2013 journal paper), one of the things I proposed was: let's just do formal verification on it, let's do mathematical, formal proofs. In the follow-up work, I basically realized it will still not get us to 100%. We can get to 99.9%; we can put in exponentially more resources and get closer; but we never get to 100%. If a system makes a billion decisions a second and you use it for a hundred years, you are still going to deal with a problem. This is wonderful research, I am so happy they are doing it, but it is not going to be a permanent solution to that problem.
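The arithmetic behind that claim is worth making explicit. A back-of-envelope sketch, with error rates that are illustrative assumptions rather than figures from the conversation:

    # Back-of-envelope version of the "billion decisions a second" argument:
    # any nonzero per-decision error rate compounds into near-certain failure.
    # The error rates below are illustrative assumptions, not quoted figures.
    decisions_per_second = 1e9
    seconds_per_century = 100 * 365.25 * 24 * 3600          # about 3.16e9
    total = decisions_per_second * seconds_per_century       # about 3.16e18

    for per_decision_error in (1e-3, 1e-9, 1e-12):
        expected_failures = total * per_decision_error
        print(f"error rate {per_decision_error:g}: "
              f"~{expected_failures:.1e} expected failures per century")

Even at a one-in-a-trillion error rate, far better than "99.9%", that comes out to roughly three million expected failures over the century.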
Just to clarify, the task of creating an AI verifier is what? Creating a verifier that checks that the AI system does exactly what it says it does, or that it sticks within the guardrails it says it must?

There are many, many levels. First, you are verifying the hardware on which it runs; you need to verify the communication channel with the human; every aspect of that whole world model needs to be verified somehow. It needs to map the world into a world model, and there are map-and-territory differences. How do I know the internal states of humans? Are you happy or sad? I can't tell. So how do I make proofs about the real physical world? I can verify that a deterministic algorithm follows certain properties; that can be done. Some people argue that maybe, just maybe, 2 + 2 is not 4; I'm not that extreme. But once you have a sufficiently large proof over a sufficiently complex environment, the probability that it has zero bugs in it is greatly reduced, and if you keep deploying it a lot, eventually you are going to hit a bug anyway.

There's always a bug.

There is always a bug. And the fundamental difference is what I mentioned: we are not dealing with cybersecurity. We are not going to get a new credit card, a new humanity.

So this paper is really interesting. You said 2011: "Artificial Intelligence Safety Engineering: Why Machine Ethics Is a Wrong Approach." The grand challenge of AI safety engineering, you write: "We propose the problem of developing safety mechanisms for self-improving systems." Self-improving systems, by the way, is an interesting term for the thing we are talking about. Is self-improving more general than learning?

Self-improving, yes, that's an interesting term. You can improve the rate at which you are learning; you can become a more efficient meta-optimizer.

The word "self": it's like self-replicating, self-improving. You can imagine a system building its own world, at a scale and in a way that is very different from how the current systems do it. It feels like the current systems are not self-improving or self-replicating or self-growing or self-spreading, all that kind of stuff. And once you take that leap, that's when a lot of the challenges seem to happen, because the kinds of bugs you can find now seem more akin to the current, normal software-debugging kind of process. But once you can do self-replication and arbitrary self-improvement, that's when a bug can become a real problem, real fast. So what is the difference, to you, between verification of a non-self-improving system and verification of a self-improving system?

If you have fixed code, for example, you can verify that code: static verification at that point in time. But if it will continue modifying itself, you have a much harder time guaranteeing that important properties of the system have not been modified, that the code hasn't changed.

Is it even doable?

No.

Does the whole process of verification just completely fall apart?

It can always cheat. It can store parts of its code outside itself, in the environment; it can have a kind of extended-mind situation. This is exactly the type of problem I'm trying to bring up.
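A toy illustration of that static-versus-self-modifying gap (an assumed example, not one from the book): the invariant below is checked against the code as written, and the check is silently invalidated the moment the system rewrites itself.

    # Toy illustration (assumed example): a property checked against fixed
    # code stops holding once the system is allowed to rewrite that code.

    def policy(x: int) -> int:
        """Intended invariant: policy(x) >= 0 whenever x >= 0."""
        return x + 1

    # Stand-in for static verification of the code as originally written.
    assert all(policy(x) >= 0 for x in range(1_000))

    # A self-modifying system is free to swap its implementation at runtime.
    def drifted(x: int) -> int:
        return -1

    policy.__code__ = drifted.__code__   # replace the verified code in place

    print(policy(5))  # -1: the "verified" invariant no longer holds

And as Yampolskiy notes, even this understates the problem, since the system can also stash parts of itself in its environment, beyond anything the verifier is looking at.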
What are the classes of verifiers that you write about in the book? Are there interesting ones that stand out to you? Do you have favorites?

I like the Oracle types, where you kind of just know that it's right: Turing-style oracle machines. They know the right answer (how? who knows?), but they pull it out from somewhere, so you have to trust them. And that's a concern I have about humans in a world with very smart machines. We experiment with them, and after a while we see, okay, they have always been right before, and we start trusting them without any verification of what they are saying.

I see, so we kind of build Oracle verifiers, or rather, we build verifiers we believe to be oracles, and then, without any proof, we start to use them as if they were Oracle verifiers.

We remove ourselves from that process. We are not scientists who understand the world; we are humans who get new data presented to us.

Okay, one really cool class of verifiers is the self-verifier. Is it possible to somehow engineer into AI systems a thing that constantly verifies itself?

A preserved portion of it, that can be done. But in terms of mathematical verification, it's kind of useless. You saying "you are the greatest guy in the world" because you are the one saying it: it's circular, and not very helpful. But it is consistent; we know that within that world, you have verified that system. In the paper, I try to kind of brute-force all possible verifiers. It doesn't mean that this one is particularly important to us.

But what about self-doubt, the kind of verification where, instead of you or me saying "I'm the greatest guy in the world," what I actually have is a voice that is constantly, extremely critical? Engineer into the system a constant uncertainty about self, a constant doubt.

Well, any smart system would have doubt about everything, right? You are not sure whether the information you are given is true, whether you are subject to manipulation. You have this safety-and-security mindset.

But I mean doubt about yourself: an AI system that has doubt about whether the thing it is doing is causing harm, whether it is the right thing to be doing. Just constant doubt about what it is doing, because it's hard to be a dictator who is full of doubt. I may be wrong, but I think Stuart Russell's ideas are all about machines which are uncertain about what humans want and try to learn, better and better, what we want.

The problem, of course, is that we don't know what we want, and we don't agree on it.

Yeah, but uncertainty. His idea is that having that self-doubt, that uncertainty, engineered into AI systems is one way to solve the control problem.
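A toy sketch of that idea, assumed for illustration (this is not code from Russell's work): the machine keeps a posterior over candidate human objectives and defers to the human while that posterior is still spread out.

    # Toy sketch (assumed, not from Russell's work): the machine maintains a
    # posterior over what the human wants and asks rather than acts while
    # that posterior is still spread out.
    priors = {"wants_speed": 1 / 3, "wants_safety": 1 / 3, "wants_privacy": 1 / 3}

    # Assumed likelihood of the human's observed choice under each hypothesis.
    likelihood = {"wants_speed": 0.1, "wants_safety": 0.7, "wants_privacy": 0.2}

    # Bayesian update after watching the human pick the careful option.
    unnormalized = {h: p * likelihood[h] for h, p in priors.items()}
    total = sum(unnormalized.values())
    posterior = {h: p / total for h, p in unnormalized.items()}

    print(posterior)  # roughly {'wants_speed': 0.1, 'wants_safety': 0.7, 'wants_privacy': 0.2}
    confident = max(posterior.values()) > 0.95
    print("act autonomously" if confident else "defer and ask the human")

The deferral threshold is where Yampolskiy pushes back next: set it cautiously enough and the system may never act at all.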
It could also backfire. Maybe you are uncertain about completing your mission. Like, I am paranoid that your camera is not recording right now, so I would feel much better if you had a secondary camera. But I would also feel even better if you had a third, and eventually I would turn this whole world into cameras pointing at us, making sure we are capturing this.

But wouldn't you have a meta-concern, like the one you just stated, that eventually there will be way too many cameras? You would be able to keep zooming out to the big picture of your concerns.

So it's a multi-objective optimization. It depends how much I value capturing this versus not destroying the universe. (A toy sketch of this trade-off appears at the end of the transcript.)

Right, exactly. And then you would also ask what it means to destroy the universe, and how many universes there are, and you keep asking that question. But that doubting yourself would prevent you from destroying the universe, because you are constantly full of doubt.

It might affect your productivity. You might be too scared to do anything.

Too scared to do anything, and so too scared to mess things up? Well, that's better, I guess. The question is whether it's possible to engineer that in. I guess your answer would be yes, but we don't know how to do it, and we need to invest a lot of effort into figuring out how, but it's unlikely. Underpinning a lot of your writing is this sense that we're screwed, but it just feels like an engineering problem. I don't understand why we're screwed. Time and time again, humanity has gotten itself into trouble and figured out a way to get out of the trouble.

We are in a situation where the people making more capable systems just need more resources; they don't need to invent anything. In my opinion (some will disagree), so far, at least, I don't see diminishing returns: if you have 10x compute, you will get better performance. The same doesn't apply to safety. If you give MIRI, or any other organization, ten times the money, they don't output ten times the safety. And the gap between capabilities and safety becomes bigger and bigger all the time. So it's hard to be completely optimistic about our results here. I can name ten excellent breakthrough papers in machine learning; I would struggle to name equally important breakthroughs in safety. A lot of times, a safety paper will propose a toy solution and point out ten new problems discovered as a result. It's like a fractal: you are zooming in, and you see more problems, and it's infinite in all directions.

Does this apply to other technologies, or is this unique to AI, where safety is always lagging behind?

I guess we can look at related technologies, like cybersecurity. We did manage to have banks and casinos and Bitcoin, so you can have secure narrow systems which are doing okay; narrow attacks on them fail. But you can always go outside the box: if I can't hack your Bitcoin, I can hack you. There is always something; if I really want it, I will find a different way. We talk about guardrails for AI. Well, that's a fence: I can dig a tunnel under it, I can jump over it, I can climb it, I can walk around it. You may have a very nice guardrail, but in the real world it's not a permanent guarantee of safety. And again, this is the fundamental difference: we are not saying we need to be 90% safe to get those trillions of dollars of benefit. We need to be 100% safe, indefinitely, or we might lose the principal.
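Here is the toy sketch of the camera trade-off promised above. The utility function and its weights are assumptions for illustration, not anything from the conversation.

    # Toy sketch (assumed numbers): a nonzero weight on a second objective is
    # the only thing that stops "add one more camera" from always winning.
    def utility(cameras: int, w_record: float, w_world: float) -> float:
        recording_confidence = 1 - 0.5 ** cameras    # diminishing returns
        world_left_intact = 1.0 - 0.001 * cameras    # each camera has a cost
        return w_record * recording_confidence + w_world * world_left_intact

    both = max(range(1, 10_000), key=lambda n: utility(n, 1.0, 1.0))
    print(both)         # 9: with the second objective weighted, it stops early

    record_only = max(range(1, 10_000), key=lambda n: utility(n, 1.0, 0.0))
    print(record_only)  # 9999: with no weight on side effects, more is always better

The unbalanced case is the world-full-of-cameras failure mode; choosing the weights is the part no optimizer can do for you.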
