Each iteration of ChatGPT has demonstrated remarkable step-function capabilities. But what's next? Ilya Sutskever, Co-Founder & Chief Scientist at OpenAI, joins Sarah Guo and Elad Gil to discuss the origins of OpenAI as a capped-profit company, early emergent behaviors of GPT models, the token scarcity issue, the next frontiers of AI research, his argument for working on AI safety now, and the premise of Superalignment. Plus, how do we define digital life?

Ilya Sutskever is Co-Founder and Chief Scientist of OpenAI. He leads research at OpenAI and is one of the architects behind the GPT models. He co-leads OpenAI's new "Superalignment" project, which aims to solve the alignment of superintelligence within 4 years. Prior to OpenAI, Ilya was co-inventor of AlexNet and Sequence to Sequence Learning. He earned his Ph.D. in Computer Science from the University of Toronto.

00:00 – Early Days of AI Research
06:49 – Origins of OpenAI & Capped-Profit Structure
13:54 – Emergent Behaviors of GPT Models
18:05 – Model Scale Over Time & Reliability
23:51 – Roles & Boundaries of Open Source in the AI Ecosystem
28:38 – Comparing AI Systems to Biological & Human Intelligence
32:56 – Definition of Digital Life
35:11 – Superalignment & Creating Pro-Human AI
41:20 – Accelerating & Decelerating Forces
TRANSCRIPT
Early Days of AI Research
OpenAI, a company that we all know now but that only a year ago was 100 people, is changing the world. Their research is leading the charge to AGI. Since ChatGPT captured consumer attention last November, they have shown no signs of slowing down. This week, Elad and I sit down with Ilya Sutskever, co-founder and chief scientist at OpenAI, to discuss the state of AI research, where we'll hit limits, the future of AGI, and what it's going to take to reach superalignment.

Ilya, welcome to No Priors.

Thank you, it's good to be here.

Let's start with the beginning. Pre-AlexNet, nothing in deep learning was really working, and given that environment, you took a very unique bet. What motivated you to go in this direction?

Indeed, in those dark ages, AI was not an area where people had hope, and people were not accustomed to any kind of success at all. Because there hadn't been any success, there was a lot of debate, and there were different schools of thought that had different arguments about how machine learning and AI should be. You had people who were into knowledge representation, from good old-fashioned AI; you had people who were Bayesian and liked Bayesian non-parametric methods; you had people who liked graphical models; and you had the people who liked neural networks. Those people were marginalized, because neural networks had the property that you can't prove mathematical theorems about them, and if you can't prove theorems about something, it means that your research isn't good. That's how it had been. But the reason I gravitated to neural networks from the beginning is that it felt like those were small little brains, and who cares if you can't prove any theorems about them, because we are training small little brains, and maybe they'll do something one day.

The reason we were able to do AlexNet is a combination of three factors. The first factor is that this was shortly after GPUs started to be used in machine learning. People had an intuition that this was a good thing to do, but it wasn't like today, where people know exactly what GPUs are for; it was more like, let's play with those cool, fast computers and see what we can do with them. It was an especially good fit for neural networks, so that definitely helped us. The second is that I was very fortunate to realize that the reason the neural networks of the time weren't good is that they were too small. If you try to solve a vision task with a neural network which has a thousand neurons, what can it do? It can't do anything, no matter how good your learning algorithm is. But if you have a much larger neural network, you'll do something unprecedented.

What gave you the intuition to think that was the case? Because at the time it was reasonably contrarian to think that, even though, to your point, a lot of the human brain, or different biological neural circuits, in some sense work that way. What gave you that intuition early on that this was a good direction?

Looking at the brain. All of those things follow very easily if you allow yourself to accept an idea that is reasonably well accepted now, but that back then people still debated and hadn't really accepted or internalized: that maybe an artificial neuron, in some sense, is not that different from a biological neuron. So whatever you imagine animals do with their brains, you could perhaps assemble an artificial neural network of similar size, and maybe, if you train it, it will do something similar. That leads you to start to imagine the computation being done by the neural network. You can almost think it through: if you have a high-resolution image and you have one neuron for a large group of pixels, what can that neuron do? Not much. But if you have a lot of neurons, then they can actually compute something.

So it was considerations like this, plus a technical realization. The technical realization is this: suppose you have a large training set that specifies the behavior of the neural network, and the training set is large enough to constrain the large neural network sufficiently, and furthermore you have the algorithm to find that neural network. Because what we do is turn the training set into a neural network which satisfies the training set, neural network training can almost be seen as solving a neural equation, where every data point is an equation and every parameter is a variable.

So it was multiple things: the realization that a bigger neural network could do something unprecedented, and the realization that you need a large data set together with the compute to solve the neural equation. That's where gradient descent comes in. But it wasn't gradient descent per se; gradient descent had been around for a long time. It was certain technical insights about how to make it work, because back then the prevailing belief was that you can't train those neural nets at all, that it's all hopeless. So it wasn't just about the size. Even if someone did think, gosh, it would be cool to try a big neural net, they didn't have the technical ability to turn that idea into reality. You needed not only to code the neural net, you needed to do a bunch of things right, and only then would it work.
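To make the "neural equation" picture concrete, here is a minimal sketch (not anything from OpenAI; the tiny two-layer network, the sin(x) toy dataset, and all hyperparameters are invented for illustration). Each data point contributes one equation f_theta(x_i) = y_i, the parameters are the variables, and gradient descent is the solver:

```python
import numpy as np

# "Neural equation" framing: each data point (x_i, y_i) contributes one
# equation f_theta(x_i) = y_i; the parameters theta are the variables, and
# gradient descent is the solver that drives the residuals toward zero.
rng = np.random.default_rng(0)

# Toy dataset: 256 "equations" constraining the network to fit sin(x).
X = rng.uniform(-3, 3, size=(256, 1))
Y = np.sin(X)

# A small two-layer network; theta = (W1, b1, W2, b2) are the variables.
W1 = rng.normal(0, 1.0, (1, 32)); b1 = np.zeros(32)
W2 = rng.normal(0, 0.1, (32, 1)); b2 = np.zeros(1)

lr = 0.05
for step in range(2000):
    H = np.tanh(X @ W1 + b1)        # hidden activations
    pred = H @ W2 + b2              # f_theta(x_i) for every data point at once
    resid = pred - Y                # how badly each "equation" is violated
    loss = (resid ** 2).mean()

    # Backpropagation: gradients of the mean squared residual w.r.t. theta.
    g = 2 * resid / len(X)
    gW2 = H.T @ g; gb2 = g.sum(0)
    gH = (g @ W2.T) * (1 - H ** 2)  # tanh'(z) = 1 - tanh(z)^2
    gW1 = X.T @ gH; gb1 = gH.sum(0)

    # One gradient-descent step: nudge the variables toward satisfying the equations.
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

print(f"mean squared residual after training: {loss:.4f}")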
And then another fortunate thing is that the person with whom I worked, Alex Krizhevsky, discovered that he really loved GPUs, and he was perhaps one of the first people to master writing really performant code for GPUs. That's why we were able to squeeze a lot of performance out of two GPUs and produce something unprecedented.

So, to sum up, it was multiple things: the idea that a big neural network, in this case a vision neural network, a convolutional neural network with many layers, one much, much bigger than anything that had ever been done before, could do something very unprecedented, because the brain can see, the brain is a large neural network, and we can see quickly, so our neurons don't have a lot of time; then the compute that was needed; and the technical know-how that we could in fact train such neural networks, which was not at all widely distributed. Most people in machine learning would not have been able to train such a neural network even if they wanted to.
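For a sense of the scale he is describing, torchvision ships a modern re-implementation of the AlexNet architecture; a quick sketch of loading it and counting its parameters (the parameter count is that of torchvision's variant, which closely follows, but is not byte-for-byte, the 2012 original):

```python
import torch
from torchvision.models import alexnet

# torchvision's re-implementation of AlexNet: a deep convolutional network,
# far larger than the vision models that preceded it, made trainable in 2012
# by squeezing performance out of two GPUs.
model = alexnet(weights=None)
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")  # roughly 61M in this variant

# One forward pass on a dummy batch of ImageNet-sized images.
logits = model(torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 1000]), ImageNet's 1000 classes
```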
Did you have a particular goal from a size perspective, whether biologically inspired or from wherever that number might come, or did you just go as large as you could?

Definitely as large as we could go, because keep in mind, we had a certain amount of compute which we could usefully consume, and then the question was what it could do.

Origins of OpenAI & Capped-Profit Structure

Maybe if we think about the origin of OpenAI and the goals of the organization: what was the original goal, and how has that evolved over time?

The goal did not evolve over time; the tactics evolved over time.
The goal of OpenAI from the very beginning has been to make sure that artificial general intelligence, by which we mean autonomous systems, AI that can actually do most of the jobs and activities and tasks that people do, benefits all of humanity. That was the goal from the beginning. The initial thinking was that maybe the best way to do it is by just open-sourcing a lot of technology, and we also attempted to do it as a nonprofit. It seemed very sensible: this is the goal, and a nonprofit is the way to do it. What changed? At some point at OpenAI we realized, and we were perhaps among the earliest to realize, that to make progress in AI for real, you need a lot of compute. Now, what does "a lot" mean? The appetite for compute is truly endless, as is now clearly seen. But we realized that we would need a lot, and a nonprofit wouldn't be the way to get there; you wouldn't be able to build a large cluster as a nonprofit. That's why we converted into this unusual structure called capped-profit. To my knowledge, we are the only capped-profit company in the world. The idea is that investors put in some money, but even if the company does incredibly well, they don't get more than some multiplier on top of their original investment. The reason that makes sense (there are arguments one could make against it as well) is this: if you believe that the technology we are building, AGI, could potentially be so capable as to do every single task that people do, does that mean it might unemploy everyone? Well, I don't know, but it's not impossible. And if that's the case, it makes a lot of sense for the company that builds such a technology not to be incentivized to make infinite profits. I don't know if it will literally play out this way, because of competition in AI; there will be multiple companies, and I think that will have some unforeseen implications for the argument I'm making. But that was the thinking.

I remember visiting the offices back when you were housed at YC, or cohabited some space there, and at the time there was a suite of different efforts: there were robotic arms being manipulated, and there was some video-game-related work, which was really cutting edge. How did you think about how the research agenda evolved, and what really drove it down this path of Transformer-based models and other forms of learning?

Our thinking has been evolving over the years since we started OpenAI. In the first year, we indeed did some of the more conventional machine learning work. I say conventional, but because the world has changed so much, a lot of things which were known to everyone in 2016 or 2017 are completely and utterly forgotten; it's almost like the Stone Age. In that Stone Age, the world of machine learning looked very different. It was dramatically more academic: the goals, values, and objectives were much more academic. They were about discovering small bits of knowledge, sharing them with other researchers, and getting scientific recognition as a result. That's a very valid goal, and it's very understandable; I've been doing AI for 20 years now, and more than half of my time in AI was spent in that framework. So what do you do? You write papers, you share your small discoveries.

There were two realizations. The first realization is that, at a high level, that doesn't seem like the way to go for a dramatic impact. Why? Because if you imagine how an AGI should look, it has to be some kind of big engineering project that uses a lot of compute. Even if you don't know how to build it, or what it should look like, you know that this is the ideal you want to strive towards, so you want to move towards larger projects as opposed to small projects. So we attempted our first large project, where we trained a neural network to play a real-time strategy game as well as the best humans: the Dota 2 project. It was driven by two people, Jakub Pachocki and Greg Brockman; they really drove this project and made it a success. It was our first attempt at a large project, but it wasn't quite the right formula for us, because the neural networks were a little bit too small, and it was a narrow domain, just a game. I mean, it's cool to play a game.

We kept looking, and at some point we realized: hey, if you train a large neural network, a very, very large Transformer, to predict text better and better, something very surprising will happen. This realization arrived a little bit gradually. We were exploring generative models; we were exploring ideas around next-word prediction, ideas also related to compression. Then the Transformer came out, and we got really excited: this is the greatest thing, we're going to do Transformers now, it's clearly superior to anything that came before it. We started doing Transformers. We did GPT-1, and GPT-1 started to show very interesting signs of life, and that led us to GPT-2, and then ultimately GPT-3. GPT-3 really opened everyone else's eyes as well: hey, this thing has a lot of traction. There is one specific formula right now that everyone is following, and this formula is: train a larger and larger Transformer on more and more data.
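A toy version of that formula, sketched as the next-token-prediction objective GPT-style models are trained with. Everything here is a simplification for illustration: the sizes are tiny and made up, the tokens are random stand-ins for text, positional embeddings are omitted, and a generic encoder stack with a causal mask stands in for a real GPT decoder:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Tiny stand-in for "train a Transformer to predict text": hypothetical sizes.
vocab, d_model, seq, batch = 100, 64, 32, 8

embed = nn.Embedding(vocab, d_model)
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
trunk = nn.TransformerEncoder(layer, num_layers=2)
head = nn.Linear(d_model, vocab)

tokens = torch.randint(0, vocab, (batch, seq))  # stand-in for real text

# Causal mask: position t may only attend to positions <= t.
causal = nn.Transformer.generate_square_subsequent_mask(seq)

h = trunk(embed(tokens), mask=causal)
logits = head(h)  # per-position distribution over the next token

# Cross-entropy between the prediction at position t and the token at t+1.
loss = F.cross_entropy(logits[:, :-1].reshape(-1, vocab),
                       tokens[:, 1:].reshape(-1))
loss.backward()  # one step of "predict text better and better"
print(f"next-token loss: {loss.item():.3f}")
```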
Emergent Behaviors of GPT Models

For me, the big wake-up moment, to your point, was the GPT-2 to GPT-3 transition, where you saw such a big step function in capabilities. And then obviously, with GPT-4, OpenAI published some really interesting research around the different domains of knowledge, or domains of expertise, or chain of thought, or other things that the models can suddenly do in an emergent form. What was the most surprising thing for you in terms of emergent behavior in these models over time?

You know, it's very hard to answer that question. It's very hard to answer because I'm too close; I've seen it progress every step of the way.
So as much as I'd like to, I find it very hard to answer that question. If I had to pick one, I think maybe the most surprising thing for me is that the whole thing works at all. I'm not sure I know how to convey what I have in mind here, because if you see a lot of neural networks doing amazing things, well, obviously neural networks are the thing that works. But I have personally witnessed what it's like to live in a world, for many years, where the neural networks did not work at all, and to contrast that with where we are today. Just the fact that they work, and that they do these amazing things. If I had to pick one, the most surprising thing would be the fact that when I speak to it, I feel understood.

Yeah, there's a really good saying from, I'm trying to remember, maybe Arthur C. Clarke or one of the sci-fi authors, which effectively says that advanced technology is sometimes indistinguishable from magic.

Yeah, I'm fully in this camp.

It definitely feels like there are some magical moments with some of these models. Now, is there a way you decide internally, given all the different capabilities you could pursue, how to continually choose the set of big projects? You've described that centralization and committing to certain research directions at scale is really important to OpenAI's success. Given the breadth of opportunity now, what's the process for deciding what's worth working on?

I think there is some combination of bottom-up and top-down. We have some top-down ideas that we believe should work, but we're not 100% sure, so we still need to have good top-down ideas, and there is a lot of bottom-up exploration guided by those top-down ideas as well. Their combination is what informs us as to what to do next.

And if you think about those ideas, in either direction, top-down or bottom-up: clearly you have this dominant continue-to-scale-Transformers direction. Do you explore additional architectural directions, or is that just not relevant?

It's certainly possible that various improvements can be found. I think improvements can be found in all kinds of places, both small improvements and large improvements. The way to think about it is that the current thing being done keeps getting better as you keep increasing the amount of compute and data that you put into it; we have that property that the bigger you make it, the better it gets. It is also the case that different things get better by different amounts as you keep scaling them up. So not only do you, of course, want to scale up what we're doing; you also want to keep scaling up the best thing possible.
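The "bigger gets better" property he describes has an empirical form in the scaling-law literature (power laws of the kind reported by Kaplan et al., 2020). A toy illustration, with coefficients invented purely for demonstration:

```python
import numpy as np

# Hypothetical power-law scaling curve, loss = a * C^(-b). The constants a
# and b are made up here, but the smooth, predictable decline of loss with
# compute is the property that makes "keep scaling it up" a viable strategy.
a, b = 10.0, 0.05
compute = np.logspace(0, 8, num=9)  # arbitrary units of training compute

for c, l in zip(compute, a * compute ** -b):
    print(f"compute={c:10.0e}  predicted loss={l:.3f}")
```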
What do you think is improving most from a capability perspective in the current generation of scale? You probably don't need to predict, because you can see it internally.

The best way for me to answer this question is to point to the models that are publicly available. You can see how they compare from this year to last year, and the difference is quite significant. You can look at the difference between GPT-3 and GPT-3.5, and then ChatGPT, GPT-4, GPT-4 with vision, and you can just see for yourself. It's easy to forget where things used to be, but certainly the big way in which things are changing is that these models are becoming more and more reliable. Before, they were only very partly there; right now they are mostly there, but there are still gaps. In the future, perhaps, these models will be there even more: you could trust their answers, they'll be more reliable, and they'll be able to do more tasks, in general, across the board.
And then another thing they will do is have deeper insight. As we train them, they gain more and more insight into the true nature of the human world, and their insight will continue to deepen.

Model Scale Over Time & Reliability

I was just going to ask how that relates to model scale over time, because a lot of people are really struck by the capabilities of the very large-scale models and their emergent behavior in terms of understanding the world. And then, in parallel, as people incorporate some of these things into products, which is a very different type of path, they often start worrying about inference costs going up with the scale of the model, and therefore they look for smaller models that are fine-tuned. But then, of course, you may lose some of the capabilities around insight and the ability to reason. So I'm curious about your thinking on how all this evolves over the coming years.
I would actually point out that the main thing that's lost when you switch to the smaller models is reliability. I would argue that at this point it is reliability that's the biggest bottleneck to these models being truly useful.

How are you defining reliability?

It's that when you ask a question that's not much harder than other questions the model succeeds at, you have a very high degree of confidence that it will continue to succeed. I'll give you an example. Let's suppose I want to learn about some historical thing, and I ask, tell me, what is the prevailing opinion about this and about that, and I keep asking questions, and let's suppose it answered 20 of my questions correctly. I really don't want the 21st answer to contain a gross mistake. That's what I mean by reliability. Or let's suppose I upload some documents, some financial documents, and say: I want you to do some analysis and reach some conclusion, and I want to take action on the basis of that conclusion. It's not a super hard task, and these models clearly succeed at this task most of the time. But because they don't succeed all the time, and it's a consequential decision, I actually can't trust the model any of those times, and I have to verify the answer somehow. That's how I define reliability. It's very similar to the self-driving situation: if you have a self-driving car and it does things mostly well, that's not good enough. The situation is not as extreme as with a self-driving car, but that's what I mean by reliability.

My perception of reliability is that, to your point, it goes up with model scale, but it also goes up if you tune for specific use cases or instances or data sets, so there is that trade-off in terms of size versus specialized fine-tuning versus reliability.

Certainly, people who care about a specific application have every incentive to get the smallest model working well enough. I think that's true, it's undeniable. Anyone who cares about a specific application will want the smallest model for it; that's self-evident. I do think, though, that as models continue to get larger and better, they will unlock new and unprecedentedly valuable applications. So the small models will have their niche for the less interesting applications, which are still very useful, and then the bigger models will be delivering on the new applications. Let's pick an example: consider the task of producing good legal advice. It's really valuable if you can really trust the answer. Maybe you need a much bigger model for it, but it justifies the cost.

There's been a lot of investment this year at the 7B size in particular, but also 13B and 34B. Do you think continued research at those scales is wasted?

No, of course not. I think that in the medium term, medium by AI timescale anyway, there will be an ecosystem; there will be different uses for different model sizes. There will be plenty of people for whom the best 7B model is good enough, and they'll be very happy with it, and then there will be plenty of very exciting and amazing applications for which it won't be enough. The big models will be better than the small models, but not all applications will justify the cost of a large model.
Roles & Boundaries of Open Source in the AI Ecosystem

What do you think the role of open source is in this ecosystem?

Well, open source is complicated. I'll describe my mental picture. I think that in the near term, open source is simply helping companies produce useful things. Let's see: why would one want to use an open-source model instead of a closed-source model hosted by some other company? I think it's very valid to want to be the final decider on the exact way your model is used, to make the decision about exactly how you want the model to be used and which use cases you wish to support. I think there's going to be a lot of demand for open-source models, and I think there will be quite a few companies that will use them; I'd imagine that will be the case in the near term. In the long run, I think the situation with open-source models will become more complicated, and I'm not sure what the right answer is there. Right now it's a little bit difficult to imagine, so we need to put on our futurist hat. It's not too hard to get into a sci-fi mode when you remember that we are talking to computers and they understand us. But so far, these computers, these models, are actually not very competent; they can't do tasks at all. I do think there will come a day when the level of capability of models will be very high. At the end of the day, intelligence is power.
Right now, the main impact of these models, at least the popular impact, is primarily around entertainment and simple question answering. You talk to a model and it's so cool; you produce some images; you have a conversation; maybe you have some question and it answers it. But that's very different from completing some large and complicated task. What about a model which could autonomously start and build a large tech company? I think if such models were open source, they would have difficult-to-predict consequences. We are quite far from such models right now, and by quite far I mean by AI timescale, so this is not what we're talking about today. But the day will come when you have models which can do science autonomously, which can deliver on big science projects, and then it becomes more complicated whether it is desirable for models of such power to be open-sourced. I think the argument there is a lot less clear-cut, a lot less straightforward, compared to the current level of models, which are very useful; I think it's fantastic that the current level of models has been built. So maybe I answered a slightly bigger question than "what is the role of open-source models": what's the deal with open source? And the deal is that up to a certain capability it's great, but it's not difficult to imagine a sufficiently powerful model being built for which the benefits of open-sourcing become a lot less obvious.

Is there a signal for you that we've reached that level, or that we're approaching it? What's the boundary?

I think figuring out this boundary very well is an urgent research project. One thing that helps is that the closed-source models are more capable than the open-source models, so the closed-source models can be studied first. You'd gain some experience with a generation of closed-source models, and you'd conclude: oh, these models' capabilities are fine, no big deal there. Then, a couple of years later, the open-source models catch up. And maybe a day will come when we'll say, whoa, these closed-source models are getting a little too drastic, and then some other approach is needed.

If we have our futurist hat on, thinking about a several-year timeline, what are the limits you see, if any, in the near term in scaling? Is it data or token scarcity, the cost of compute, architectural issues?

The most near-term limit to scaling is obviously data. This is well known, and some research is required to address it. Without going into the details, I'll just say that the data limit can be overcome, and progress will continue.
Comparing AI Systems to Biological & Human Intelligence

One question I've heard people debate a little bit is the degree to which Transformer-based models can be applied to the full set of areas that you'd need for AGI. If you look at the human brain, for example, you do have reasonably specialized systems, or neural networks: specialized systems for the visual cortex versus areas of higher thought, areas for empathy, or other aspects of everything from personality to processing. Do you think Transformer architectures are the main thing that will just keep going and get us there, or do you think we'll need other architectures over time?

I understand precisely what you're saying, and I have two answers to this question. The first is that, in my opinion, the best way to think about the question of architecture is not in terms of a binary "is it enough", but in terms of how much effort, what the cost of using this particular architecture will be. At this point, I don't think anyone doubts that the Transformer architecture can do amazing things, but maybe something else, maybe some modification, could have some compute-efficiency benefits. So it's better to think about it in terms of compute efficiency rather than in terms of whether it can get there at all; I think at this point the answer is obviously yes.

As for the question about the human brain and its brain regions, I actually think the situation there is subtle and deceptive, for the following reasons. What I believe you're alluding to is the fact that the human brain has known regions: it has a speech perception region, a speech production region, an image region, a face region, all these regions, and it looks like it's specialized.
28:38
interesting sometimes there are cases where very young children have severe cases of epilepsy at a young age and the
28:46
only way they figure out how to treat such children is by removing half of their
28:52
brain because it happened at such a young age these children grow grow up to
28:57
be pretty functional adults and they have all the same brain regions but they are somehow compressed onto one
29:04
hemisphere so maybe some you know information processing efficiency is
29:09
lost it’s a very traumatic thing to experience but somehow all these brain regions rearrange themselves there is
29:14
another experiment where that which was done maybe 30 or 40 years ago on ferrets
29:20
so the ferret is a small animal it’s a pretty mean experiment they took the optic nerve of the feret which comes
29:25
from its eye and attached it to its auditory cortex
29:31
so now the inputs from the eye starts to map to the speech processing area of the brain and then they recorded different
29:38
neurons after it had a few days of learning to C and they found neurons in the auditory cortex which were very
29:44
similar to the visual cortex or vice versa it was either they mapped the eye to the ear to the auditory cortex or the
29:51
ear to the visual cortex but something like this has happened these are fairly well-known ideas in AI that the cortex
29:58
of humans and animals are extremely uniform and so that further supports the a like you just need one big uniform
30:04
architecture so yeah in general it seems like every biological system is reasonably lazy in terms of taking one
30:10
system and then reproducing it and then reusing it in different ways and that’s true of everything from DNA in coding you know there’s 20 amino acids and
30:16
protein sequences and so everything is made out of the same 20 amino acids on through to uh to your point sort of how
30:22
you think about tissue architectures so it’s remarkable that that carries over into the digital world as well depending on the you use I mean the way I see it
30:29
is that this is an indication that from a technological point of view we are very much on the right track because you
30:35
have all these interesting analogies between human intelligence and biological intelligence and artificial
30:40
intelligence we’ve got artificial neurons biological neurons unified brain architecture for
30:47
biological intelligence unified neural network architecture for artificial intelligence at what point do you think
30:52
we should start thinking about these systems in digital life I can answer that question I think that will happen
30:58
when those systems become reliable in such a way as to be very autonomous
31:04
right now those systems are clearly not autonomous they’re inching there but they’re not and that makes them a lot
31:11
less useful too because you can’t ask it hey like do my homework or do my taxes or you see what I mean so the usefulness
31:17
is greatly limited as the usefulness increases they will indeed become more like artificial life which is also makes
31:24
it more I would argue um trepidacious right like if you imagine
31:29
actual artificial life with brains that are smarter than humans go gosh that’s
31:35
like that seems pretty Monumental why is your uh definition based on autonomy
31:40
because you know if you often look at the definition of biological life it has to do with reproductive
31:45
capability plus I guess some form of autonomy right like a virus isn’t really necessarily considered alive much of the
31:51
time right but a bacteria is and you could imagine situations where you have
31:56
um a symbiotic relation a ships or other things where something can’t really quite function autonomously but it’s still considered a life form so I’m a
32:02
little bit curious about autonomy being the definition versus some of these other aspects well I mean definitions
32:07
are chosen for our convenience and it’s a matter of debate in my opinion
32:12
technology already has the reproduction the reproductive function right and if you look at for examp I don’t know if
32:17
you seen those images of the evolution of cell phones and then smartphones over the past 25 years you got this like what
32:24
almost looks like an evolutionary tree or the evolution of cars over the past Century so technology is already reproducing using the minds of people
32:31
who copy ideas from previous generation of technology so I claim that the reproduction is already there the
32:37
autonomy piece I claim is not and indeed I also agree that there is no autonomous
32:42
reproduction but that would be like can you imagine if you have like autonomously reproducing AIS I actually
32:48
think that that is pretty dramatic and I would say quite a scary thing if you
32:54
have an autonomous reproducing AI if it’s is also very capable should we talk about uh super alignment yeah very much
Definition of Digital Life
Can you define it? And then, you know, we were talking about where the boundary is for when you feel we need to begin to worry about these capabilities being in open source. What is superalignment, and why invest in it now?

The answer to your question really depends on where you think AI is headed. Just try to imagine, look into the future, which is of course a very difficult thing to do, but let's try to do it anyway. Where do we think things will be in five years, or in ten years? Progress has been really stunning over the past few years. Maybe it will be a little bit slower, but still, if you extrapolate this kind of progress, you'll be in a very, very different place in five years, let alone ten. It doesn't seem at all implausible that we will have computers, data centers, that are much smarter than people. And by smarter, I don't mean just having more memory or more knowledge; I mean having deeper insight into the same subjects that we people are studying and looking into. It means learning even faster than people. What could such AIs do? I don't know. Certainly, if such an AI were the basis of some artificial life, well, how do you even think about it? If you have some very powerful data center that's also alive, in a sense, that's what you're talking about. And when I imagine this world, my reaction is: gosh, it's very unpredictable what's going to happen. But there is a bare minimum which we can articulate: if such very, very intelligent, superintelligent data centers are being built at all, we want those data centers to hold warm and positive feelings towards people, towards humanity, because this is going to be nonhuman life, in a sense; it could potentially be that. So I would want any instance of such a superintelligence to have warm feelings towards humanity.
And so this is what we're doing with the superalignment project. We're saying: if you just allow yourself, if you just accept that the progress we've seen may continue, maybe slower, but it will continue, then you can start doing productive work today to build the science, so that we will be able to handle the problem of controlling such future superintelligences, of imprinting onto them a strong desire to be nice and kind to people. Because those data centers will be really quite powerful; there will probably be many of them; the world will be very complicated. But somehow, to the extent that they are autonomous, to the extent that they are agents, to the extent that they are beings, I want them to be pro-social, pro-human. That's the goal.

What do you think is the likelihood of that goal? Some of it feels like an outcome you can hopefully affect, but are we likely to have pro-social AIs that we are friends with, individually or as a species?

Well, I think the friendship part is not necessary; the friendship piece, I think, is optional. But I do think we want to have very pro-social AI. I think it's possible; I don't think it's guaranteed, but I think it's possible. And the possibility of that will increase insofar as more and more people allow themselves to look into the future, the five-to-ten-year future, and just ask themselves: what do you expect AI to be able to do then, and how capable do you expect it to be then? I think that with each passing year, if indeed AI continues to improve, and as people get to experience it, because right now you're talking and making arguments, but if you actually get to experience it: oh gosh, the AI from last year, which was really helpful, this year's puts it to shame. And then, one year later, it's starting to do science, the AI software engineer is starting to get really quite good. I think that creates a lot more desire in people for what you just described, for the future superintelligence to indeed be very pro-social. I think there are going to be a lot of disagreements, a lot of political questions. But I think that as people see AI actually getting better, as people experience it, the desire for the pro-social superintelligence, the humanity-loving superintelligence, as much as that can be done, will increase. And on the scientific problem: right now it's still an area where not that many people are working. AIs are getting powerful enough that you can really start studying it productively; we'll have some very exciting research to share soon. But I would say that's the big-picture situation here. It really boils down to this: look at what you've experienced with AI up until now, and ask yourself, is it slowing down? Will it slow down next year? We will see, and we'll experience it again and again, and I think what needs to be done will keep becoming clearer.
Accelerating & Decelerating Forces

Do you think we're just on an accelerative path? Because fundamentally, if you look at certain technology waves, they tend to inflect and then accelerate versus decelerate, and it really feels like we're in an acceleration phase right now rather than a deceleration phase.

It is indeed the case that we are in an acceleration phase right now. It's hard to say; multiple forces will come into play. Some forces are accelerating forces, and some forces are decelerating. For example, cost and scale are a decelerating force. The fact that our data is finite is a decelerating force, to some degree at least; I don't want to overstate it.

Yeah, it's within an asymptote, right? At some point you hit it. It's the standard S-curve, a sigmoid.

Well, with the data in particular, I just think it won't be an issue, because we'll figure out something else. But then you might argue that the size of the engineering project, just the complexity of management, is a decelerating force. On the other hand, the amount of investment is an accelerating force; the amount of interest from people, from engineers and scientists, is an accelerating force. And I think there is one other accelerating force, and that is the fact that biological evolution has been able to figure it out, and the fact that, up until this point, progress in AI has had this weird property: it's been very hard to execute on, but in some sense it's also been more straightforward than one might have expected. In some sense, and I don't know much physics, but my understanding is that if you want to make progress in quantum physics or something, you need to be really intelligent and spend many years in grad school studying how these things work, whereas with AI, people come in, get up to speed quickly, and start making contributions quickly. The flavor is somehow different; there's a lot of give to this particular area of research, and I think this is also an accelerating force. How it will all play out remains to be seen. It may be that the scale required and the engineering complexity will start to make the rate of progress slow down; it will still continue, but maybe not as fast as before. Or maybe the forces coming together to push it will be such that it stays this fast for a few more years before it starts to slow down, if at all. That would be my articulation here.

Ilya, this has been a great conversation. Thanks for joining us.

Thank you so much for the conversation, I really enjoyed it.

Find us on Twitter at @NoPriorsPod. Subscribe to our YouTube channel if you want to see our faces. Follow the show on Apple Podcasts, Spotify, or wherever you listen; that way you get a new episode every week. And sign up for emails, or find transcripts for every episode, at no-priors.com.