
In 2025, we’ll make an £18m grant to establish a new organisation to develop advanced AI systems with provable + quantifiable safety guarantees.

Complete this short EOI if you’d like to be involved, as an affiliated entity, organisational co-founder, or in a technical/executive/advisory capacity: https://www.aria.org.uk/media/cjcp200d/aria-ta2-eoi.pdf

ARIA is a non-departmental public body, sponsored by the Department for Science, Innovation + Technology.

From climate change to AI, society faces challenges and opportunities that can be uniquely addressed by science + technology.

Who we are

Created by an Act of Parliament, and sponsored by the Department for Science, Innovation, and Technology, ARIA funds breakthrough R&D in underexplored areas to catalyse new paths to prosperity for the UK and the world.

What we do

We empower scientists and engineers to pursue research that is too speculative, too hard, or too interdisciplinary to pursue elsewhere. ARIA’s programmes are shaped and led by our Programme Directors, scientific and technical leaders with deep expertise and a focused, creative vision for how technology can enable a better future.

How we do it

Our independence allows us to pursue bold, long-term outcomes but that is only made possible through robust accountability. As a public body, we are accountable to Parliament and our Board of Directors plays a critical role in scrutinising our strategy. We also engage broadly and seek open feedback to shape our research, baking in ethics and governance considerations from the start to ensure we pursue breakthroughs responsibly.

TRANSCRIPT

1
00:00:08.315 –> 00:00:08.805
Welcome.

2
00:00:09.065 –> 00:00:10.885
I’m excited that we’re all here and chatting.

3
00:00:11.155 –> 00:00:13.525
Just to set the stage a little bit upfront.

4
00:00:13.745 –> 00:00:16.405
So this will be a conversation about the technical area two

5
00:00:16.585 –> 00:00:20.325
of the Safeguarded AI program. In a nutshell, in technical area

6
00:00:20.465 –> 00:00:23.645
two, I will be making a single 18 million pound grant

7
00:00:23.705 –> 00:00:26.285
to a new organization to spearhead the R&D

8
00:00:26.285 –> 00:00:29.525
of the machine learning elements that form a part

9
00:00:29.525 –> 00:00:30.685
of the Safeguarded AI program.

10
00:00:31.145 –> 00:00:33.485
And because we wanna reach a broader audience of,

11
00:00:33.485 –> 00:00:36.125
of outstanding talent who might wanna be part of and,

12
00:00:36.125 –> 00:00:37.565
and sort of champion this effort.

13
00:00:37.785 –> 00:00:39.845
The goal for this conversation is to share

14
00:00:39.945 –> 00:00:42.565
and make more accessible our thinking on the purpose

15
00:00:42.745 –> 00:00:44.005
and the scope of TA two,

16
00:00:44.185 –> 00:00:46.125
how we’re gonna be funding this, et cetera.

17
00:00:46.185 –> 00:00:50.005
So as a sort of solicitation preview, if this is something

18
00:00:50.005 –> 00:00:51.005
that sparks your interest,

19
00:00:51.305 –> 00:00:53.725
you can actually submit an expression of interest.

20
00:00:53.905 –> 00:00:55.765
Um, you can find that on the Aria website.

21
00:00:55.945 –> 00:00:59.085
If you type in Safeguarded AI program, you, you land on the,

22
00:00:59.305 –> 00:01:02.205
on our page, and, and you then find the TA two EOI

23
00:01:02.515 –> 00:01:04.605
with a link to the, to the expression of interest.

24
00:01:04.945 –> 00:01:07.325
Um, we’ll probably repeat that information as well,

25
00:01:07.345 –> 00:01:10.485
and then just very briefly, so for the conversation today,

26
00:01:10.665 –> 00:01:14.765
we have davidad here, the program director; Yoshua,

27
00:01:14.865 –> 00:01:17.165
who is the scientific director for the, for the program,

28
00:01:17.545 –> 00:01:19.765
and then Adam Marblestone, the co-founder

29
00:01:19.945 –> 00:01:23.765
and CEO of Convergent Research, um, has kindly agreed

30
00:01:23.765 –> 00:01:25.645
to join us and help sort of move us

31
00:01:25.645 –> 00:01:27.325
through the conversation, um, moderate,

32
00:01:27.995 –> 00:01:30.845
make sure this is accessible and interesting to everyone.

33
00:01:30.985 –> 00:01:32.845
And then my name is Nora Ammann.

34
00:01:33.165 –> 00:01:34.325
I work together with David

35
00:01:34.465 –> 00:01:36.165
and Yoshua as a technical specialist

36
00:01:36.165 –> 00:01:37.165
for the Safeguarded AI program.

37
00:01:37.475 –> 00:01:39.805
With that context, I think I’ll just hand it over to you,

38
00:01:39.805 –> 00:01:41.365
Adam, and you help us work

39
00:01:41.365 –> 00:01:42.565
through, work through the conversation.

40
00:01:43.045 –> 00:01:45.405
Actually, one thing I wanna say, just as, as a piece

41
00:01:45.405 –> 00:01:48.565
of intro for Adam, is that, uh, Adam was the first person to

42
00:01:49.365 –> 00:01:52.125
publicly point out that, uh, Yoshua’s research agenda

43
00:01:52.145 –> 00:01:53.925
and my research agenda were going in,

44
00:01:53.945 –> 00:01:56.325
in convergent directions a little over a year ago.

45
00:01:56.425 –> 00:01:57.565
So it’s really great to have

46
00:01:57.565 –> 00:01:58.725
him in the, in the call today.

47
00:01:58.985 –> 00:02:01.165
I’m, I’m excited to be here with, with all of you.

48
00:02:01.465 –> 00:02:02.565
I’m, I’m a fan of this and,

49
00:02:02.565 –> 00:02:04.165
and looking forward to, to learning more.

50
00:02:04.265 –> 00:02:06.565
So, so, davidad, why don’t, why don’t you kick us off and,

51
00:02:06.565 –> 00:02:08.085
and just tell us in, in a nutshell, what,

52
00:02:08.085 –> 00:02:10.005
what is the Safeguarded AI program?

53
00:02:10.275 –> 00:02:11.885
Yeah, so briefly, we’re trying

54
00:02:11.885 –> 00:02:15.805
to look a few years ahead on the trajectory of AI to a point

55
00:02:15.805 –> 00:02:18.845
where the underlying foundation models have such strong

56
00:02:18.845 –> 00:02:21.525
general capabilities, including in areas like science

57
00:02:21.545 –> 00:02:22.885
and engineering and management,

58
00:02:22.905 –> 00:02:25.285
but also in areas like persuasion

59
00:02:25.425 –> 00:02:28.365
and cyber attacks that people who make decisions about this,

60
00:02:28.545 –> 00:02:32.365
uh, determine that this future AI system is too dangerous

61
00:02:32.385 –> 00:02:34.485
to put in direct contact with the internet,

62
00:02:34.485 –> 00:02:37.125
or even to put in direct contact with any human users

63
00:02:37.265 –> 00:02:39.685
who might be vulnerable to being convinced

64
00:02:39.685 –> 00:02:41.285
of some false things, causing them

65
00:02:41.285 –> 00:02:43.965
to participate in some unpredictable and dangerous plans.

66
00:02:44.145 –> 00:02:46.005
And yet, we wouldn’t just wanna put a pause on

67
00:02:46.325 –> 00:02:47.405
progress when we get to that point.

68
00:02:47.465 –> 00:02:49.765
We wanna have a plan for how we can use

69
00:02:49.865 –> 00:02:51.685
and harness those superhuman capabilities

70
00:02:51.945 –> 00:02:53.645
to help develop new medical treatments

71
00:02:53.705 –> 00:02:56.285
and develop new software and hardware that’s more efficient

72
00:02:56.305 –> 00:02:58.365
and secure and effective in various ways,

73
00:02:58.365 –> 00:02:59.965
develop new physical technologies

74
00:03:00.105 –> 00:03:03.445
and control systems, allocate resources to balance

75
00:03:03.445 –> 00:03:06.085
between multi-stakeholder objectives in various contexts,

76
00:03:06.085 –> 00:03:07.405
lower the cost of manufacturing

77
00:03:07.745 –> 00:03:11.045
and deploying advanced technologies in, in rapidly changing,

78
00:03:11.045 –> 00:03:12.045
in uncertain environments.

79
00:03:12.045 –> 00:03:14.765
And ultimately to sort of defend the attack surfaces

80
00:03:14.795 –> 00:03:18.565
that some rogue AI might eventually try to exploit

81
00:03:18.665 –> 00:03:21.165
and to improve the overall resilience of human societies.

82
00:03:21.385 –> 00:03:22.605
So that’s sort of the question

83
00:03:22.605 –> 00:03:24.285
that Safeguarded AI is trying to answer.

84
00:03:24.465 –> 00:03:25.845
Uh, what is a possible way

85
00:03:25.845 –> 00:03:28.565
that we might use such advanced AI technology,

86
00:03:28.695 –> 00:03:31.365
maybe fewer years in the future than you might think,

87
00:03:31.505 –> 00:03:34.285
to actually derive benefits both for economics and security?

88
00:03:34.505 –> 00:03:36.365
Um, I should say by way of introduction,

89
00:03:36.365 –> 00:03:38.885
that at Aria we have a kind of rich taxonomy

90
00:03:38.885 –> 00:03:39.925
of areas of interest.

91
00:03:39.945 –> 00:03:41.325
So at the top level, we have

92
00:03:41.325 –> 00:03:43.005
what we call opportunity spaces,

93
00:03:43.215 –> 00:03:46.565
which are the topical areas defined by each program director

94
00:03:46.565 –> 00:03:49.965
that are more general and broad than the program itself,

95
00:03:50.025 –> 00:03:53.085
but more specific than any existing well established

96
00:03:53.565 –> 00:03:55.485
research community or, or subfield.

97
00:03:55.585 –> 00:03:59.485
So for example, we have the opportunity space Smarter Robot

98
00:03:59.505 –> 00:04:02.765
Bodies, which is more specific than robotics, uh,

99
00:04:02.785 –> 00:04:06.925
but less specific than the program on robot dexterity.

100
00:04:06.955 –> 00:04:08.485
This is a particular part of robot bodies.

101
00:04:08.495 –> 00:04:10.765
Those are both being run by my colleague Jenny Read.

102
00:04:10.785 –> 00:04:13.205
In my case, my opportunity space is called

103
00:04:13.275 –> 00:04:15.165
Mathematics for Safe AI.

104
00:04:15.265 –> 00:04:16.885
So more specific than AI safety,

105
00:04:16.985 –> 00:04:20.645
but less specific than the program Safeguarded AI, which is,

106
00:04:20.665 –> 00:04:23.805
uh, really quite a specific bet on a particular pattern

107
00:04:23.865 –> 00:04:25.005
of arranging components

108
00:04:25.065 –> 00:04:27.965
and interfaces that I believe will help us leverage

109
00:04:27.965 –> 00:04:30.645
mathematics to, to solve this, uh, AI safety problem

110
00:04:30.785 –> 00:04:33.965
by stating the real world tasks we want to use the AI

111
00:04:33.985 –> 00:04:36.005
for in a form that’s sufficiently rigorous,

112
00:04:36.115 –> 00:04:38.605
that a sufficiently advanced AI system should be able

113
00:04:38.605 –> 00:04:41.285
to prove quantitative guarantees on the safety

114
00:04:41.345 –> 00:04:44.965
and performance of its, of its proposed answers, outputs,

115
00:04:44.965 –> 00:04:47.005
actions that we can formally verify, uh,

116
00:04:47.245 –> 00:04:49.365
ultimately grounded in, in mathematical proof,

117
00:04:49.365 –> 00:04:51.565
which is the highest, the highest grade form

118
00:04:51.565 –> 00:04:53.845
of safety argument that, that we could possibly get.

119
00:04:53.985 –> 00:04:56.005
And then within the program Safeguarded AI,

120
00:04:56.005 –> 00:04:58.845
there’s a further subdivision into technical areas

121
00:04:59.095 –> 00:05:00.325
which we call TAs.

122
00:05:00.325 –> 00:05:02.885
So TA one, TA two, and TA three.

123
00:05:02.945 –> 00:05:05.045
And then within those technical areas, there are sub areas,

124
00:05:05.185 –> 00:05:07.885
so TA 1.1, 1.2, and so on

125
00:05:07.885 –> 00:05:10.965
or sub objectives, TA two A, TA two B.

126
00:05:11.025 –> 00:05:12.805
And then within the technical areas

127
00:05:13.065 –> 00:05:15.245
or sub areas, there are projects

128
00:05:15.345 –> 00:05:18.725
and projects are the same thing as individual contracts

129
00:05:18.725 –> 00:05:21.285
that Aria gives to particular external parties

130
00:05:21.385 –> 00:05:23.845
to actually conduct a piece of research within that area.

131
00:05:23.905 –> 00:05:26.245
So it’s also worth saying, just as a matter of course,

132
00:05:26.725 –> 00:05:29.165
Aria doesn’t conduct research; we’re a funding body.

133
00:05:29.225 –> 00:05:31.085
We fund research, so we make contracts

134
00:05:31.085 –> 00:05:33.005
with external parties, but compared

135
00:05:33.005 –> 00:05:35.805
to a typical funding body, we do have a little bit more

136
00:05:35.945 –> 00:05:39.605
of a particular kind of concrete vision about what it is

137
00:05:39.605 –> 00:05:41.805
that all the projects that we fund are trying to,

138
00:05:41.825 –> 00:05:42.845
trying to do together.

139
00:05:43.145 –> 00:05:45.805
So yeah, I’ll, I’ll leave it at that for the program. We’ll

140
00:05:45.805 –> 00:05:47.565
Get more into the technical details sort

141
00:05:47.565 –> 00:05:48.965
of in the second half of the conversation.

142
00:05:49.025 –> 00:05:51.765
But just for now, you do have a pretty specific thesis

143
00:05:51.785 –> 00:05:53.845
behind the Safeguarded AI program,

144
00:05:54.255 –> 00:05:56.925
which you mentioned in, in passing there.

145
00:05:57.145 –> 00:05:59.685
Um, which I, I feel is very different from many

146
00:05:59.685 –> 00:06:02.285
of the other approaches we see in, in AI safety, right?

147
00:06:02.285 –> 00:06:03.885
Some people are working on just sort

148
00:06:03.885 –> 00:06:05.205
of spot checks or evals.

149
00:06:05.205 –> 00:06:06.805
Let’s, let’s check it, you know, that,

150
00:06:06.805 –> 00:06:09.245
that this model doesn’t do something bad on certain cases

151
00:06:09.345 –> 00:06:11.605
and, and hope that that generalizes. Other people are

152
00:06:11.605 –> 00:06:14.285
trying to align the AI so that its, its sort of intentions

153
00:06:14.345 –> 00:06:16.085
or objectives are, are right.

154
00:06:16.555 –> 00:06:18.765
What, what is the sort of specific thesis behind this?

155
00:06:18.765 –> 00:06:19.965
There’s an idea of containment

156
00:06:19.965 –> 00:06:22.885
and there’s an idea of formal verification gatekeepers.

157
00:06:23.395 –> 00:06:24.685
What are kind of the key concepts

158
00:06:24.685 –> 00:06:26.525
that you have in this program that are different from

159
00:06:26.525 –> 00:06:28.965
typical AI safety research programs?

160
00:06:29.275 –> 00:06:31.725
Yeah, great question. So I, I would say, uh,

161
00:06:31.745 –> 00:06:34.605
what’s become fairly well understood at this point is a

162
00:06:34.605 –> 00:06:37.565
distinction between prosaic alignment, which is

163
00:06:37.565 –> 00:06:39.045
where you’re taking some existing model

164
00:06:39.065 –> 00:06:40.845
and trying to put some, some safeguards

165
00:06:40.845 –> 00:06:42.885
and some fine tuning around it so that it, uh,

166
00:06:42.885 –> 00:06:45.005
doesn’t engage in certain dangerous activities

167
00:06:45.005 –> 00:06:46.245
as much as it otherwise would.

168
00:06:46.305 –> 00:06:48.685
Versus ambitious alignment, which is where you’re trying

169
00:06:48.685 –> 00:06:50.045
to really get into the internals.

170
00:06:50.045 –> 00:06:52.045
It’s often a mechanistic interpretability

171
00:06:52.325 –> 00:06:53.365
style of approach.

172
00:06:53.545 –> 00:06:56.725
And try and ensure that even in unforeseen

173
00:06:56.745 –> 00:06:58.965
and untested situations, that the sort

174
00:06:58.965 –> 00:07:01.805
of internal incentive structures that shape the behavior

175
00:07:01.825 –> 00:07:04.205
of the system are, are robustly aligned

176
00:07:04.235 –> 00:07:07.245
with some particular way of, of orienting to, uh,

177
00:07:07.245 –> 00:07:08.885
action selection and, and objectives.

178
00:07:08.945 –> 00:07:10.165
And I would say what we’re trying

179
00:07:10.165 –> 00:07:12.085
to do here is actually kind of in between,

180
00:07:12.185 –> 00:07:14.845
and it’s a space that’s not, um, super well understood

181
00:07:14.845 –> 00:07:17.485
yet we’re targeting systems that are too dangerous

182
00:07:17.625 –> 00:07:20.845
or too capable to just rely on the kind of prosaic alignment

183
00:07:20.855 –> 00:07:24.125
where we try to squash bugs as they arise and, uh,

184
00:07:24.185 –> 00:07:26.365
and do some, some monitoring and some oversight.

185
00:07:26.505 –> 00:07:29.045
Uh, because if there’s even one failure,

186
00:07:29.225 –> 00:07:30.725
it could be a catastrophic failure

187
00:07:30.795 –> 00:07:33.245
that creates some unstoppable malware that gets out

188
00:07:33.245 –> 00:07:35.325
of control, or sort of an irreversible event.

189
00:07:35.705 –> 00:07:38.605
But that level of capability, that lower bound on

190
00:07:38.605 –> 00:07:40.325
how capable a system has to be to be

191
00:07:40.325 –> 00:07:42.845
that worried about it is not incompatible

192
00:07:43.035 –> 00:07:45.845
with it being still not yet so capable

193
00:07:45.875 –> 00:07:48.525
that we could not possibly hope to contain it

194
00:07:48.525 –> 00:07:50.645
because it might discover some new laws of physics

195
00:07:51.105 –> 00:07:54.005
and find some way to escape through the quantum fields that,

196
00:07:54.005 –> 00:07:56.165
uh, we didn’t consider when we designed the hardware,

197
00:07:56.225 –> 00:07:59.605
or find some way of, of defeating, uh, formal verification

198
00:07:59.705 –> 00:08:01.765
by discovering a new mathematical principle

199
00:08:01.765 –> 00:08:03.605
that makes ZFC inconsistent or something.

200
00:08:03.605 –> 00:08:06.965
So there is a level, you know, way beyond several orders

201
00:08:06.985 –> 00:08:09.125
of magnitude beyond the current frontier, uh,

202
00:08:09.175 –> 00:08:10.685
where it might make sense

203
00:08:10.685 –> 00:08:13.285
to ask those questions about is there any, any way

204
00:08:13.285 –> 00:08:16.405
of possibly safely running this at all, even in the most

205
00:08:16.925 –> 00:08:18.125
rigorous containment measures.

206
00:08:18.185 –> 00:08:19.685
But we’re not targeting that either.

207
00:08:19.785 –> 00:08:21.525
So we’re targeting this kind of middle ground

208
00:08:21.695 –> 00:08:24.565
where it would be possible to contain this thing and run it.

209
00:08:24.565 –> 00:08:26.845
And then the question becomes, what could you do with it

210
00:08:26.845 –> 00:08:29.565
that’s useful if you can’t talk to it directly,

211
00:08:29.565 –> 00:08:31.725
because that would give it too much causal power.

212
00:08:31.985 –> 00:08:36.405
We want to use the recent advances in machine learning in

213
00:08:36.405 –> 00:08:38.965
order to tackle the very ambitious questions

214
00:08:38.965 –> 00:08:40.045
that davidad talked about.

215
00:08:40.185 –> 00:08:42.725
So let me try to clarify what that means.

216
00:08:43.275 –> 00:08:48.005
Instead of focusing on agency as sort of the starting point,

217
00:08:48.265 –> 00:08:49.845
we wanna focus on an AI

218
00:08:49.845 –> 00:08:52.365
that actually understands the domain in which it is

219
00:08:52.365 –> 00:08:55.565
operating as a model of the world in the same way

220
00:08:55.635 –> 00:08:58.405
that scientists build an understanding of the world

221
00:08:58.625 –> 00:09:01.045
and then reason with that knowledge

222
00:09:01.065 –> 00:09:04.285
and all the pieces of it which are interpretable so

223
00:09:04.285 –> 00:09:05.405
as to come to conclusions.

224
00:09:05.745 –> 00:09:08.005
And so that’s the, the verification aspect

225
00:09:08.075 –> 00:09:10.045
that davidad talked about is done

226
00:09:10.045 –> 00:09:14.565
by the machine learning itself in the sense that we train it

227
00:09:14.745 –> 00:09:17.205
to create pieces of knowledge

228
00:09:17.205 –> 00:09:19.565
and reasoning paths that can be verified.

229
00:09:19.715 –> 00:09:21.565
Otherwise, it’s not a good training.

230
00:09:21.865 –> 00:09:24.085
And verification is something that can be done in,

231
00:09:24.085 –> 00:09:27.925
in various ways, but it doesn’t need as much intelligence.

232
00:09:28.185 –> 00:09:29.605
The little pieces of knowledge

233
00:09:29.605 –> 00:09:32.685
that are put together can be verified to be consistent

234
00:09:32.685 –> 00:09:35.605
with each other using non-AI pieces

235
00:09:35.745 –> 00:09:38.565
or even like simple math in various ways.

236
00:09:38.745 –> 00:09:42.085
And so this is a very different starting point from the

237
00:09:42.085 –> 00:09:44.445
current approach where we are right away trying

238
00:09:44.445 –> 00:09:45.685
to build an agent

239
00:09:45.905 –> 00:09:48.485
and we’re not sure that, you know, there is not some monster

240
00:09:48.635 –> 00:09:51.005
that may hide behind this agent.

241
00:09:51.315 –> 00:09:54.525
Instead, the way I like to think about it is we’re trying

242
00:09:54.525 –> 00:09:56.205
to imitate the process of

243
00:09:56.305 –> 00:09:59.845
how scientists take decisions based on an understanding

244
00:09:59.845 –> 00:10:03.725
of the world that they can, you know, explain and justify,

245
00:10:03.945 –> 00:10:08.605
but it is the power of machine learning that

246
00:10:08.605 –> 00:10:11.965
that allows us to find these reasoning steps

247
00:10:12.425 –> 00:10:14.805
and, you know, pieces of knowledge that are coherent

248
00:10:14.805 –> 00:10:16.085
with each other and so on.

249
00:10:16.145 –> 00:10:19.285
Uh, so it doesn’t necessarily mean we start from scratch,

250
00:10:19.435 –> 00:10:21.205
like AI, you know, zero.

251
00:10:21.545 –> 00:10:23.285
We actually wanna leverage the advances

252
00:10:23.285 –> 00:10:24.805
that have happened in the last few years,

253
00:10:24.825 –> 00:10:26.605
and that is the reason why we need people

254
00:10:27.185 –> 00:10:29.045
who have a good understanding

255
00:10:29.585 –> 00:10:32.085
of the advances in machine learning over the last

256
00:10:32.145 –> 00:10:33.245
few years. Well,

257
00:10:33.245 –> 00:10:34.725
That’s a great segue because my,

258
00:10:34.785 –> 00:10:36.205
my next question will be about what is,

259
00:10:36.205 –> 00:10:37.445
what is this technical area two,

260
00:10:37.545 –> 00:10:38.805
you know, what is, what is the goal?

261
00:10:38.825 –> 00:10:40.525
And I understand that has to do with machine learning,

262
00:10:40.525 –> 00:10:42.365
but when I, what I’m understanding from you here is

263
00:10:42.365 –> 00:10:44.605
that there, there’s this key element of a world model.

264
00:10:44.825 –> 00:10:47.285
The AI is not just acting in the open world,

265
00:10:47.345 –> 00:10:48.405
it has some world model,

266
00:10:48.435 –> 00:10:50.645
there’s machine learning maybe involved in creating

267
00:10:50.645 –> 00:10:52.005
that world model in the first place.

268
00:10:52.115 –> 00:10:54.365
There’s machine learning involved in proving

269
00:10:54.505 –> 00:10:56.325
or bounding properties of

270
00:10:56.325 –> 00:10:57.725
what can happen in that world model.

271
00:10:57.875 –> 00:10:59.245
Then there’s the AI itself,

272
00:10:59.245 –> 00:11:00.525
which is maybe the dangerous one,

273
00:11:00.665 –> 00:11:02.965
and maybe machine learning shows up in a few other places.

274
00:11:03.585 –> 00:11:05.325
So tell us about TA two.

275
00:11:05.705 –> 00:11:08.685
TA two is the machine learning Yeah. Kind of core of this.

276
00:11:08.995 –> 00:11:10.885
What is it? What does TA two actually have

277
00:11:10.885 –> 00:11:12.365
to accomplish and build?

278
00:11:12.905 –> 00:11:14.765
So, TA two is where we’re looking

279
00:11:14.865 –> 00:11:17.005
to develop the new machine learning capabilities

280
00:11:17.235 –> 00:11:18.485
that are gonna be necessary

281
00:11:18.585 –> 00:11:20.525
to fulfill the Safeguarded AI vision.

282
00:11:20.905 –> 00:11:24.605
Um, and we’re gonna do this by seeding a new organization

283
00:11:24.795 –> 00:11:27.445
with an 18 million pound grant to do some of the research

284
00:11:27.555 –> 00:11:29.085
that, uh, we need for, uh,

285
00:11:29.105 –> 00:11:30.485
for new machine learning capabilities.

286
00:11:30.625 –> 00:11:32.285
So to get more into the technical details of that,

287
00:11:32.285 –> 00:11:35.165
there’s basically like four technical objectives within TA

288
00:11:35.225 –> 00:11:36.405
two, machine learning capabilities

289
00:11:36.405 –> 00:11:37.645
that we don’t have right now that we need.

290
00:11:37.705 –> 00:11:41.045
The first one of those, like TA two A is machine learning

291
00:11:41.045 –> 00:11:44.085
systems that help teams of humans to construct world models

292
00:11:44.475 –> 00:11:46.325
that are in a kind of formal language,

293
00:11:46.405 –> 00:11:48.085
a little bit like a probabilistic programming language

294
00:11:48.275 –> 00:11:51.045
that are auditable and interpretable and explainable,

295
00:11:51.045 –> 00:11:52.805
but that are not hard coded the way

296
00:11:52.805 –> 00:11:54.845
that a human would like try to hardcode their knowledge.

297
00:11:54.845 –> 00:11:55.885
That’s coming from machine learning,

298
00:11:55.945 –> 00:11:58.645
but in a formal language, not not just a neural network.

299
00:11:58.715 –> 00:12:00.405
It’s sort of the neural network outputs,

300
00:12:00.405 –> 00:12:01.605
something that people can understand.

301
00:12:01.845 –> 00:12:05.085
TA two B is then where we’re gonna say, let’s assume

302
00:12:05.085 –> 00:12:06.845
that we have a world model like that in a language

303
00:12:06.845 –> 00:12:08.205
that’s like a probabilistic programming language.

304
00:12:08.305 –> 00:12:10.085
How do we make it tractable to do inference

305
00:12:10.345 –> 00:12:12.645
and actually query like, what’s going

306
00:12:12.645 –> 00:12:14.805
to happen if we take this action in this world model?

307
00:12:14.945 –> 00:12:17.005
And, and what are the probabilities of different outcomes?

308
00:12:17.025 –> 00:12:18.805
Or really we’re thinking about imprecise probability.

309
00:12:18.805 –> 00:12:20.885
So it’s like, what upper bounds can we put on the

310
00:12:20.885 –> 00:12:23.325
probabilities of different outcomes with some kind of, uh,

311
00:12:23.545 –> 00:12:25.685
non-asymptotic convergence guarantee where we know

312
00:12:25.685 –> 00:12:27.885
that we’re getting an answer that is accurate

313
00:12:27.885 –> 00:12:29.445
to some extent that we can quantify.
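
To make the TA two A and TA two B ideas a bit more concrete, here is a minimal sketch, in ordinary Python, of the flavour being described: a toy world model written as a small, explicit, auditable program, queried for an upper bound on the probability of a bad outcome rather than a single point estimate. Everything in it (the scenario, the names, the numbers) is invented for illustration and is not part of the programme.

```python
# Illustrative sketch only: a toy "world model" as an explicit, auditable
# probabilistic program, plus a query that returns an upper bound on the
# probability of a bad outcome. All names and numbers are invented.

# Epistemic uncertainty as an interval (imprecise probability): we only
# assume the component failure rate lies somewhere in this range.
FAILURE_RATE_INTERVAL = (0.01, 0.03)

# Aleatoric uncertainty with a known distribution.
DEMAND_DIST = {"low": 0.6, "high": 0.4}


def overload_probability(failure_rate: float) -> float:
    """P(overload) for a fixed failure rate, by exhaustive enumeration."""
    p_bad = 0.0
    for demand, p_demand in DEMAND_DIST.items():
        for fails, p_fail in ((True, failure_rate), (False, 1.0 - failure_rate)):
            # Toy dynamics: overload occurs iff demand is high AND the component fails.
            if demand == "high" and fails:
                p_bad += p_demand * p_fail
    return p_bad


# Imprecise-probability query: this toy model is monotone in the failure rate,
# so evaluating the interval endpoints gives a sound upper bound here.
upper_bound = max(overload_probability(r) for r in FAILURE_RATE_INTERVAL)
print(f"P(overload) <= {upper_bound:.4f}")  # a conservative, checkable answer
```

The point of the sketch is only that the model and the query are small, explicit objects that a human team or a simple checker can audit, in contrast to a raw neural network, and that the answer comes back as a bound rather than an unqualified estimate.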

314
00:12:29.625 –> 00:12:32.925
Um, TA two C then is where we sort of take that, uh, uh,

315
00:12:32.925 –> 00:12:35.805
technology and turn it towards making proofs of safety.

316
00:12:35.985 –> 00:12:39.365
So taking an overall system that includes neural networks

317
00:12:39.365 –> 00:12:40.765
that are, are trying to take actions,

318
00:12:40.765 –> 00:12:42.645
that also includes runtime verification,

319
00:12:42.655 –> 00:12:44.165
doing reachability analysis,

320
00:12:44.185 –> 00:12:45.445
and we’re trying to say, uh,

321
00:12:45.545 –> 00:12:48.205
can we get a quantitative bound on the probability

322
00:12:48.545 –> 00:12:51.285
of some safety condition being violated over time

323
00:12:51.425 –> 00:12:52.845
as the system is being run.
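
As a very elementary illustration of how per-step guarantees could compose into a bound over a whole run (the programme itself may rely on much more refined arguments than this), suppose a runtime verifier certifies that each individual step violates the safety condition with probability at most $\varepsilon$. A simple union bound over a horizon of $T$ steps then gives

$$\Pr\big(\exists\, t \le T : \text{unsafe}_t\big) \;\le\; \sum_{t=1}^{T} \Pr(\text{unsafe}_t) \;\le\; T\,\varepsilon,$$

so a per-step certificate of, say, $\varepsilon = 10^{-9}$ yields an end-to-end bound of at most $10^{-3}$ over a million steps. This is only meant to show what "a quantitative bound on the probability of a safety condition being violated over time" can look like in its crudest form.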

324
00:12:52.945 –> 00:12:55.885
And then finally the last one, which is TA 2D, is

325
00:12:55.885 –> 00:12:58.965
where we’re gonna try to train a system inside of, uh,

326
00:12:59.105 –> 00:13:02.205
inside of a kind of containment, uh, vessel

327
00:13:02.295 –> 00:13:04.725
where we do autonomous AI R&D that would

328
00:13:04.725 –> 00:13:07.525
otherwise be kind of dangerous if it were developing an

329
00:13:07.525 –> 00:13:09.765
AGI that was potentially even more capable in itself.

330
00:13:09.865 –> 00:13:11.005
So we’re gonna try to avoid that,

331
00:13:11.025 –> 00:13:13.125
but instead, say if we had systems

332
00:13:13.155 –> 00:13:14.765
that could do autonomous AI R&D,

333
00:13:14.765 –> 00:13:17.005
how could we use them to do R&D

334
00:13:17.005 –> 00:13:19.525
of special-purpose AI agents,

335
00:13:19.525 –> 00:13:22.165
which we would then verify with the techniques in TA two B

336
00:13:22.165 –> 00:13:24.085
and TA two C as, uh,

337
00:13:24.085 –> 00:13:26.085
satisfying our quantitative safety requirements.
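
One rough way to picture how TA two C and TA two D could fit together is a "gatekeeper" loop: an untrusted but highly capable proposer suggests candidate special-purpose agents or policies, and an independent, much simpler verifier only releases the ones whose certified risk bound clears a pre-agreed budget. The sketch below is purely schematic; the function and field names are placeholders invented for this illustration, not programme interfaces.

```python
# Schematic "gatekeeper" loop. Placeholder names only, not a real TA2 interface.
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Certificate:
    risk_upper_bound: float  # certified upper bound on P(safety violation)
    proof_checked: bool      # did an independent proof checker accept it?


def gatekeep(propose: Callable[[], object],
             verify: Callable[[object], Certificate],
             risk_budget: float,
             max_attempts: int = 10) -> Optional[object]:
    """Release a candidate only if its certified risk bound fits the budget."""
    for _ in range(max_attempts):
        candidate = propose()      # untrusted, possibly very capable proposer
        cert = verify(candidate)   # trusted, comparatively simple checking step
        if cert.proof_checked and cert.risk_upper_bound <= risk_budget:
            return candidate       # deployable under the stated world model
    return None                    # nothing met the bar; fail closed
```

The asymmetry is the design choice being gestured at in the transcript: proposing a good candidate may take enormous capability, but checking its certificate should be comparatively simple and not require trusting the proposer.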

338
00:13:26.225 –> 00:13:28.005
But essentially you’re, you’re sort of building on things.

339
00:13:28.005 –> 00:13:29.645
We’re seeing, you know, AlphaZero,

340
00:13:29.895 –> 00:13:31.765
AlphaProof, neural theorem proving.

341
00:13:32.135 –> 00:13:34.365
Bunch of those elements come in, bunch

342
00:13:34.365 –> 00:13:36.045
of other machine learning elements come in,

343
00:13:36.445 –> 00:13:38.325
collaborative authoring of text or programs.

344
00:13:38.585 –> 00:13:39.605
All of these elements, very

345
00:13:39.605 –> 00:13:40.845
cross-disciplinary have to come together.

346
00:13:41.625 –> 00:13:42.925
Um, and then you’re adding on top

347
00:13:42.925 –> 00:13:44.005
of this a very unique context

348
00:13:44.125 –> 00:13:46.605
where maybe you have some special security requirements.

349
00:13:47.185 –> 00:13:50.365
Um, you don’t want this thing to be just dumping the proof

350
00:13:50.595 –> 00:13:52.525
that it wrote on the internet or something like that.

351
00:13:52.585 –> 00:13:54.245
You want it to have limited communication,

352
00:13:54.825 –> 00:13:56.005
uh, with the outside world.

353
00:13:56.155 –> 00:13:58.285
Okay, great. So, so, so I think we understand there’s some

354
00:13:58.285 –> 00:14:00.885
very challenging machine learning problems, uh,

355
00:14:00.885 –> 00:14:04.165
that TA two has to tackle building on current trends, right?

356
00:14:04.225 –> 00:14:06.925
But sort of really adding in an unprecedented number

357
00:14:06.925 –> 00:14:09.365
of different layers and complexities and scales.

358
00:14:09.625 –> 00:14:11.885
How is Aria going to actually fund TA two

359
00:14:11.945 –> 00:14:15.005
and try to catalyze an organization actually existing

360
00:14:15.115 –> 00:14:16.405
that can solve these problems?

361
00:14:16.675 –> 00:14:20.085
Yeah, so procedurally though, from the Aria end of things,

362
00:14:20.085 –> 00:14:23.005
so basically there are two phases to how we want to fund this.

363
00:14:23.015 –> 00:14:25.445
There will be a shorter sort of phase one

364
00:14:25.855 –> 00:14:29.605
where we wanna fund just like a handful of teams, um,

365
00:14:29.605 –> 00:14:32.325
that might be interested in, in founding this organization

366
00:14:32.705 –> 00:14:35.405
to have some months to basically develop a full proposal,

367
00:14:35.585 –> 00:14:39.165
and then we select in phase two, one of those teams

368
00:14:39.545 –> 00:14:42.285
to get sort of a full large grant, um,

369
00:14:42.285 –> 00:14:43.525
to set up this organization.

370
00:14:43.785 –> 00:14:46.725
And the point, also the plan for a lot of

371
00:14:46.805 –> 00:14:48.805
what I think we are going to talk about in a second, is

372
00:14:48.805 –> 00:14:51.365
that we don’t know all the answers to like,

373
00:14:51.365 –> 00:14:52.845
what this organization should look like.

374
00:14:53.225 –> 00:14:56.605
Um, instead we basically have thought a bunch about

375
00:14:57.035 –> 00:14:59.045
what are the key questions

376
00:14:59.065 –> 00:15:00.565
or sort of the key considerations

377
00:15:00.565 –> 00:15:02.925
that such a founding team would have

378
00:15:02.925 –> 00:15:03.925
to think carefully about,

379
00:15:04.305 –> 00:15:06.405
and then we’re sort of excited for, for those people to,

380
00:15:06.425 –> 00:15:08.245
to figure out the details of that. Mm-hmm.

381
00:15:08.545 –> 00:15:11.005
So you, you don’t know how to solve and,

382
00:15:11.005 –> 00:15:12.405
and, you know, design the organization.

383
00:15:12.405 –> 00:15:14.045
You need very talented organizational

384
00:15:14.245 –> 00:15:15.725
builders as well to mm-hmm.

385
00:15:15.805 –> 00:15:17.365
To actually come in and figure this out

386
00:15:17.545 –> 00:15:19.205
and build something, not only for the sort

387
00:15:19.205 –> 00:15:21.405
of founding period where Aria funds it,

388
00:15:21.405 –> 00:15:22.845
but for what it becomes in the future

389
00:15:22.905 –> 00:15:26.045
and everything, um, a potentially very pivotal organization in

390
00:15:26.045 –> 00:15:28.925
the AI safety, security capability space.

391
00:15:29.225 –> 00:15:30.765
But you, you do have a pretty clear thesis

392
00:15:30.795 –> 00:15:32.445
that there is a new organization, right?

393
00:15:32.445 –> 00:15:35.725
This is not something that existing organizations can do.

394
00:15:35.865 –> 00:15:37.765
Can you just say quickly a little bit about that?

395
00:15:38.105 –> 00:15:39.525
Why does it need to be a new organization?

396
00:15:39.525 –> 00:15:40.885
Why are we talking about people proposing

397
00:15:40.885 –> 00:15:42.725
to found something new versus just say, Hey,

398
00:15:42.725 –> 00:15:44.045
we have one that’s, it’s great.

399
00:15:44.445 –> 00:15:47.285
I I just need to do more research within it.

400
00:15:47.745 –> 00:15:50.685
So we have some particular, uh, kind

401
00:15:50.685 –> 00:15:54.285
of governance attributes that we, we want to, uh,

402
00:15:54.385 –> 00:15:57.845
be confident of in supporting the research in this area.

403
00:15:57.845 –> 00:16:00.125
Because although it’s framed

404
00:16:00.125 –> 00:16:03.085
and motivated by, uh, questions of safety, which are,

405
00:16:03.085 –> 00:16:04.765
which are about avoiding harm

406
00:16:04.765 –> 00:16:07.245
to the public from a technical perspective, uh,

407
00:16:07.505 –> 00:16:09.365
the technical success in these areas

408
00:16:09.485 –> 00:16:10.725
wouldn’t lead to safety.

409
00:16:10.785 –> 00:16:12.245
It would lead to controllability.

410
00:16:12.465 –> 00:16:15.165
So the human overseers who are participating in,

411
00:16:15.265 –> 00:16:17.045
in this process of, of helping to specify

412
00:16:17.225 –> 00:16:20.245
and audit, uh, what safety is defined as

413
00:16:20.275 –> 00:16:22.605
what the properties are that we want the systems behavior

414
00:16:22.605 –> 00:16:25.365
to have, uh, need to be really trustworthy, and, and,

415
00:16:25.365 –> 00:16:28.045
or they need to, in fact, be meta-trustworthy about

416
00:16:28.045 –> 00:16:30.525
incorporating the right kinds of other stakeholders in

417
00:16:30.525 –> 00:16:32.045
that process of, of developing

418
00:16:32.045 –> 00:16:33.605
and auditing safety definitions.

419
00:16:33.825 –> 00:16:35.485
In other words, when you have capability,

420
00:16:35.555 –> 00:16:38.645
including the kind of scientific understanding of the world,

421
00:16:38.645 –> 00:16:40.725
and ability to answer questions with that, uh,

422
00:16:40.795 –> 00:16:43.685
that can be misused, uh, in dangerous ways.

423
00:16:43.985 –> 00:16:46.805
And so hence governance, uh, becomes important,

424
00:16:47.675 –> 00:16:48.675
Right? So,

425
00:16:48.675 –> 00:16:52.505
uh, we want to ensure that the safety

426
00:16:53.025 –> 00:16:54.825
critical decisions that might be made

427
00:16:54.845 –> 00:16:58.345
by an organization which has stewardship over these new

428
00:16:58.375 –> 00:17:02.785
technological capabilities wouldn’t be systematically biased

429
00:17:03.015 –> 00:17:04.465
towards decisions

430
00:17:04.465 –> 00:17:07.905
that result in getting more net profit sooner.

431
00:17:08.245 –> 00:17:10.985
And so we wanna make sure that the incentive structure

432
00:17:11.085 –> 00:17:13.825
and the decision making structure are robust to that kind

433
00:17:13.825 –> 00:17:15.065
of systematic phenomenon.

434
00:17:15.125 –> 00:17:19.065
And we just, uh, are speculating, almost conjecturing,

435
00:17:19.255 –> 00:17:21.225
that in order to meet that requirement,

436
00:17:21.285 –> 00:17:23.785
it would be necessary to create a new organization.

437
00:17:24.055 –> 00:17:25.345
Okay. Great. And, and, and what,

438
00:17:25.345 –> 00:17:27.305
what will be Aria’s role in the organization?

439
00:17:27.525 –> 00:17:28.865
So let’s be very clear about this.

440
00:17:29.145 –> 00:17:31.105
Aria is not seeking to found a new organization.

441
00:17:31.205 –> 00:17:33.425
In fact, this organization could be established

442
00:17:33.425 –> 00:17:36.105
by an existing entity provided that all of the governance

443
00:17:36.205 –> 00:17:37.985
and capability requirements are met.

444
00:17:38.245 –> 00:17:41.025
But we are seeking an organization with characteristics

445
00:17:41.175 –> 00:17:43.785
that, uh, that we don’t see to the level that we,

446
00:17:43.785 –> 00:17:46.065
that we want, um, in the UK right now.

447
00:17:46.285 –> 00:17:50.285
So we have earmarked 18 million pounds to give a grant.

448
00:17:50.715 –> 00:17:52.765
This would be completely non-dilutive.

449
00:17:52.825 –> 00:17:55.085
It would be in exchange for, um,

450
00:17:55.435 –> 00:17:58.445
that organization spending some of its human resources

451
00:17:58.505 –> 00:18:02.845
and time and compute resources pursuing, um, the objectives

452
00:18:02.915 –> 00:18:04.125
that we’ve laid out in,

453
00:18:04.125 –> 00:18:05.885
in technical area two of the program thesis.

454
00:18:06.225 –> 00:18:09.885
So the role that Aria would have would, wouldn’t be really

455
00:18:10.115 –> 00:18:12.845
that different from the role that we have in,

456
00:18:12.945 –> 00:18:14.725
in our other kind of grantees,

457
00:18:14.725 –> 00:18:16.645
except this would be a particularly large grant.

458
00:18:16.785 –> 00:18:20.085
And we would be more discerning about the organizational

459
00:18:20.085 –> 00:18:22.765
characteristics of the grantee than we would with, um,

460
00:18:22.765 –> 00:18:23.965
with a, with a smaller and,

461
00:18:23.965 –> 00:18:26.125
and less, um, crucially significant grant.

462
00:18:26.425 –> 00:18:29.925
And, and what about, uh, your own roles individually, um,

463
00:18:30.025 –> 00:18:31.365
in sort of influencing

464
00:18:31.365 –> 00:18:34.885
or steering what happens here as program directors and,

465
00:18:34.905 –> 00:18:36.645
and, uh, scientific directors and, and

466
00:18:36.645 –> 00:18:37.645
So on? Yeah, so as

467
00:18:37.645 –> 00:18:40.725
the program director, I am taking responsibility

468
00:18:40.945 –> 00:18:42.765
for making the grant, uh,

469
00:18:42.765 –> 00:18:44.565
to the organization that gets selected.

470
00:18:45.105 –> 00:18:47.965
Um, there will also be this, this earlier kind of phase one

471
00:18:48.215 –> 00:18:50.565
where, uh, we’ll have more than one,

472
00:18:50.565 –> 00:18:53.365
hopefully more than one credible, uh, team that we will fund

473
00:18:53.465 –> 00:18:55.205
to develop their full proposal

474
00:18:55.265 –> 00:18:57.565
and sort of red team the governance aspects and so on.

475
00:18:57.565 –> 00:18:59.365
And I’ll be making, I’ll be taking responsibility

476
00:18:59.365 –> 00:19:01.365
for making this selection of who, uh,

477
00:19:01.365 –> 00:19:02.805
those shortlisted groups are.

478
00:19:03.205 –> 00:19:07.365
I also will be available, um, to, to sort of give input,

479
00:19:07.365 –> 00:19:09.125
particularly on the, on the technical side

480
00:19:09.185 –> 00:19:10.765
and also on the governance side and,

481
00:19:10.785 –> 00:19:13.325
and sort of, you know, give my perspective on strategy.

482
00:19:13.585 –> 00:19:15.765
But I’m not looking to be a board member.

483
00:19:16.025 –> 00:19:18.205
I’m not, certainly not looking to be, uh, one

484
00:19:18.205 –> 00:19:19.725
of the founders or, or have a

485
00:19:19.885 –> 00:19:20.925
position with this organization.

486
00:19:21.035 –> 00:19:24.725
I’ll be, uh, responsible for Aria’s, uh, contribution of,

487
00:19:24.725 –> 00:19:26.005
of non-dilutive funding to it.

488
00:19:26.325 –> 00:19:29.045
I can maybe add, uh, on my end, uh,

489
00:19:29.785 –> 00:19:31.165
as a scientific director.

490
00:19:31.625 –> 00:19:35.805
Um, I will advise on, on, on this project, uh, on the scope,

491
00:19:36.065 –> 00:19:38.285
the, the scientific direction of the program.

492
00:19:38.705 –> 00:19:43.605
And I will provide support to the TA two, uh, teams

493
00:19:43.785 –> 00:19:47.045
and, and, and the chosen team at the end in, you know,

494
00:19:47.045 –> 00:19:50.005
conversations, uh, advice about research priorities.

495
00:19:50.505 –> 00:19:53.125
I’m very interested in those directions.

496
00:19:53.195 –> 00:19:54.925
That is why I accepted this role.

497
00:19:55.245 –> 00:19:57.805
I have opinions that I will share,

498
00:19:57.905 –> 00:19:59.845
but you know, I, they, they will,

499
00:19:59.995 –> 00:20:01.445
they will lead their own project.

500
00:20:01.965 –> 00:20:05.765
I am motivated by this direction

501
00:20:05.795 –> 00:20:09.605
because I think that we’re not currently

502
00:20:09.755 –> 00:20:12.685
with the leading AI labs, uh, sufficiently taking

503
00:20:13.225 –> 00:20:14.485
safety seriously

504
00:20:14.865 –> 00:20:19.005
and seriously means being quantitative about estimating

505
00:20:19.005 –> 00:20:21.445
risks and mitigating them in a way

506
00:20:21.475 –> 00:20:25.965
that is not just geared at the short term dangers with,

507
00:20:25.965 –> 00:20:29.165
with current AIs, but is by construction going

508
00:20:29.165 –> 00:20:33.285
to be holding up when we, we pass the threshold of, uh,

509
00:20:33.295 –> 00:20:35.125
human level and, and beyond.

510
00:20:35.485 –> 00:20:38.245
I think we need a lot more exploration.

511
00:20:38.385 –> 00:20:41.445
So we, we want those TA two creators

512
00:20:41.665 –> 00:20:45.805
to be innovating in this very important challenge.

513
00:20:46.265 –> 00:20:50.365
And, um, I’ve been involved in machine learning

514
00:20:50.365 –> 00:20:52.485
and especially deep learning for many decades,

515
00:20:52.975 –> 00:20:56.525
especially on the probabilistic machine learning side

516
00:20:56.535 –> 00:20:57.605
using neural nets.

517
00:20:57.605 –> 00:20:59.845
And I hope this can be useful, but we’ll see.

518
00:20:59.975 –> 00:21:03.085
We’ll see where in what direction those teams want to go.

519
00:21:03.745 –> 00:21:05.885
And, and your role as a technical specialist?

520
00:21:05.885 –> 00:21:09.285
So in, in my role, um, this is mostly focused on trying

521
00:21:09.285 –> 00:21:12.525
to help operationalize and sort of run the solicitation, um,

522
00:21:12.705 –> 00:21:15.285
and sort of selection process, including for TA two.

523
00:21:15.585 –> 00:21:18.165
Um, and then also to some extent, sort of, um,

524
00:21:18.165 –> 00:21:20.325
throughout the, the, the process that the grant runs,

525
00:21:20.595 –> 00:21:24.365
support the R&D creators, um, contribute perspectives and,

526
00:21:24.365 –> 00:21:26.445
and, and considerations to, to those decisions.

527
00:21:26.655 –> 00:21:29.605
Great. So let’s, let’s dive a bit into more on

528
00:21:29.605 –> 00:21:30.845
this question of the organization.

529
00:21:30.845 –> 00:21:33.125
Sort of what are, what are the ideas, considerations

530
00:21:33.155 –> 00:21:34.965
that the people who are proposing

531
00:21:34.985 –> 00:21:36.925
and founding this organization are

532
00:21:36.925 –> 00:21:37.965
going to have to grapple with?

533
00:21:37.965 –> 00:21:40.485
What are the sort of key strategic considerations at the

534
00:21:40.485 –> 00:21:42.645
highest level when thinking about setting up an

535
00:21:42.645 –> 00:21:44.285
organization with this purpose?

536
00:21:44.825 –> 00:21:46.925
At the top level, I would say what are, you know,

537
00:21:46.925 –> 00:21:49.125
what are the things we want, uh, from this organization?

538
00:21:49.185 –> 00:21:51.565
We, we want, uh, well, liveness

539
00:21:51.565 –> 00:21:53.925
and safety is how one might put it, um, from a,

540
00:21:53.925 –> 00:21:55.325
from a formal methods point of view,

541
00:21:55.325 –> 00:21:56.485
but we, we want performance.

542
00:21:56.545 –> 00:21:58.165
Uh, and, and we also want security.

543
00:21:58.225 –> 00:22:01.245
So there are two, uh, performance objectives, maybe

544
00:22:01.725 –> 00:22:03.085
to break it down to one level more,

545
00:22:03.175 –> 00:22:05.285
which is the research quality.

546
00:22:05.465 –> 00:22:07.885
You know, we, we need, uh, really top people

547
00:22:07.945 –> 00:22:09.045
to, to be working on this.

548
00:22:09.045 –> 00:22:12.005
People who have a kind of interdisciplinary background,

549
00:22:12.005 –> 00:22:13.725
but also a lot of, uh, a lot of depth.

550
00:22:13.905 –> 00:22:17.125
We, so we need, we need, uh, uh, credible expertise, um,

551
00:22:17.315 –> 00:22:20.845
that can, that can really, uh, tackle these, uh, uh,

552
00:22:20.845 –> 00:22:23.325
these challenges in a way that will give confidence

553
00:22:23.325 –> 00:22:25.565
to users in safety critical contexts.

554
00:22:26.065 –> 00:22:27.845
And also just do it, do it well.

555
00:22:27.945 –> 00:22:29.325
The other aspect being speed.

556
00:22:29.465 –> 00:22:31.445
So we, we wanna get the, the, these,

557
00:22:31.585 –> 00:22:33.125
the research objectives, uh,

558
00:22:33.425 –> 00:22:35.085
um, achieved as soon as possible.

559
00:22:35.265 –> 00:22:39.245
So we, we also need people who have experience in research,

560
00:22:39.245 –> 00:22:42.085
project management and sort of, uh, organizing R&D

561
00:22:42.085 –> 00:22:44.365
teams that are doing things that have never been done

562
00:22:44.365 –> 00:22:47.245
before, but in a way that’s also not purely, uh, kind

563
00:22:47.245 –> 00:22:50.125
of curiosity driven, exploratory, basic research, but,

564
00:22:50.145 –> 00:22:52.325
but really more like fundamental development

565
00:22:52.425 –> 00:22:53.645
of, of new capabilities.

566
00:22:53.825 –> 00:22:55.925
And then the second part, that’s kind of the safety

567
00:22:55.985 –> 00:22:57.005
or security part.

568
00:22:57.185 –> 00:22:59.485
One side of that which we’ve already gone into is

569
00:22:59.485 –> 00:23:00.765
organizational governance.

570
00:23:00.795 –> 00:23:02.645
That when you’re making decisions

571
00:23:02.645 –> 00:23:05.885
that could have negative externalities on public safety,

572
00:23:05.975 –> 00:23:08.965
which could include really large experiments internally,

573
00:23:09.145 –> 00:23:12.565
but especially includes publication of algorithms or,

574
00:23:12.665 –> 00:23:14.245
or publication of model weights,

575
00:23:14.245 –> 00:23:16.645
or granting access to particular users

576
00:23:16.795 –> 00:23:18.565
with particular restrictions in place.

577
00:23:18.785 –> 00:23:20.445
The people who make those decisions need

578
00:23:20.445 –> 00:23:23.165
to reliably be considering their, their positive

579
00:23:23.165 –> 00:23:24.245
and negative externalities.

580
00:23:24.585 –> 00:23:26.365
And it’s a relatively small part of

581
00:23:26.365 –> 00:23:27.445
that to ask who they are.

582
00:23:27.505 –> 00:23:28.725
Yet we want it to be diverse.

583
00:23:28.745 –> 00:23:30.485
We wanna have representation from public sector

584
00:23:30.505 –> 00:23:32.925
and private sector and civil society and academia.

585
00:23:33.475 –> 00:23:35.645
There’s some criteria like that about who they are,

586
00:23:35.645 –> 00:23:38.805
but it’s mostly about, um, the structural features

587
00:23:38.805 –> 00:23:40.285
of their incentives and the process

588
00:23:40.355 –> 00:23:41.885
that they’ll be using to make decisions.

589
00:23:41.945 –> 00:23:44.725
And then the final one, uh, which is also related to

590
00:23:44.725 –> 00:23:47.005
that kind of safety question, is security,

591
00:23:47.005 –> 00:23:48.605
particularly cybersecurity,

592
00:23:48.705 –> 00:23:50.365
but also other dimensions of security.

593
00:23:50.365 –> 00:23:53.125
Because even if everyone in the organization is subject

594
00:23:53.185 –> 00:23:55.165
to a very sound, uh, uh,

595
00:23:55.185 –> 00:23:58.645
and, uh, uh, deliberative, um, decision making process,

596
00:23:58.785 –> 00:24:01.245
if someone else from outside the organization, from, uh,

597
00:24:01.245 –> 00:24:03.805
even potentially from a nation state adversary can

598
00:24:03.855 –> 00:24:05.285
exfiltrate the, the models

599
00:24:05.385 –> 00:24:08.005
or the research progress, uh, that’s still a, a risk.

600
00:24:08.225 –> 00:24:10.765
So we need to also ensure that everything is,

601
00:24:10.825 –> 00:24:12.605
is appropriately secured, um,

602
00:24:12.605 –> 00:24:14.445
given it its potential level of impact.

603
00:24:14.675 –> 00:24:16.565
Okay. So sort of, there’s, there’s a,

604
00:24:16.565 –> 00:24:18.365
there’s a governance, top level governance aspect,

605
00:24:18.365 –> 00:24:19.405
there’s a security aspect.

606
00:24:19.405 –> 00:24:22.365
Part of what I’m hearing is a significant aspect

607
00:24:22.365 –> 00:24:23.605
where there’s, there’s both kind

608
00:24:23.605 –> 00:24:26.405
of very deep conceptual creative research happening here.

609
00:24:26.625 –> 00:24:29.085
The exact framework or choice of how you formalize

610
00:24:29.105 –> 00:24:31.445
and how you prove and how you verify, um,

611
00:24:31.505 –> 00:24:32.845
how you construct these world models.

612
00:24:32.845 –> 00:24:35.405
There’s a lot that’s new scientifically in this.

613
00:24:35.715 –> 00:24:37.085
It’s not just straightforward engineering,

614
00:24:37.305 –> 00:24:40.085
but there’s also parts of it, like on the cybersecurity

615
00:24:40.145 –> 00:24:43.285
or on actually scaling up so that the, the AI models

616
00:24:43.285 –> 00:24:46.125
that you have to assist in all this research and, and,

617
00:24:46.225 –> 00:24:47.485
and be part of these systems.

618
00:24:47.865 –> 00:24:50.845
You need a lot of engineering scale compute

619
00:24:51.305 –> 00:24:54.045
and, you know, kind of industry grade expertise on

620
00:24:54.045 –> 00:24:56.725
cybersecurity and, uh, model scaling and,

621
00:24:57.025 –> 00:24:58.685
and compute infrastructure and all that.

622
00:24:58.955 –> 00:25:00.525
Exactly. Yeah. Great.

623
00:25:00.905 –> 00:25:02.645
How does that break down into some of the sort

624
00:25:02.645 –> 00:25:03.925
of, I guess, key dimensions?

625
00:25:04.145 –> 00:25:05.685
Um, the, the teams are gonna have

626
00:25:05.685 –> 00:25:07.325
to make specific design choices

627
00:25:07.745 –> 00:25:09.085
around a few different dimensions.

628
00:25:09.125 –> 00:25:10.725
I mean, maybe Nora you can say, what are the sort

629
00:25:10.725 –> 00:25:13.605
of the dimensions that you see discrete choices having

630
00:25:13.605 –> 00:25:16.485
to be made when someone proposes an organization? Yeah, so

631
00:25:16.485 –> 00:25:19.325
This, this basically comes back to what are the,

632
00:25:19.345 –> 00:25:20.725
the key questions we want to think,

633
00:25:20.785 –> 00:25:23.205
we want the founding team to think really carefully about.

634
00:25:23.545 –> 00:25:26.045
And we can go through them sort of, uh, one by one.

635
00:25:26.065 –> 00:25:28.565
But maybe just as a brief overview, first off,

636
00:25:28.695 –> 00:25:30.165
we’re like interested in thinking about

637
00:25:30.165 –> 00:25:32.085
what should the entity structure be

638
00:25:32.085 –> 00:25:33.325
of this, this new organization?

639
00:25:33.585 –> 00:25:36.165
What’s the economic model, both sort of in the short run,

640
00:25:36.165 –> 00:25:38.605
in the longer term, what are the governance mechanisms?

641
00:25:38.605 –> 00:25:40.525
We’ve already touched upon that to some extent,

642
00:25:40.545 –> 00:25:41.965
and the sort of security

643
00:25:42.225 –> 00:25:43.765
and the sort of comprehensive security.

644
00:25:44.185 –> 00:25:46.685
How are you gonna recruit the sort of talent, uh, to,

645
00:25:46.745 –> 00:25:48.525
to actually pursue that mission successfully?

646
00:25:48.825 –> 00:25:50.685
And then how do you sort of manage that talent

647
00:25:50.745 –> 00:25:52.605
or sort of manage the organization throughout? So

648
00:25:52.605 –> 00:25:54.365
Let’s, let’s talk a little bit about this question, sort

649
00:25:54.365 –> 00:25:56.845
of the, the entity structure, the, the economic model.

650
00:25:56.965 –> 00:25:59.045
I mean, you mentioned Aria’s relation to this is sort

651
00:25:59.045 –> 00:26:01.285
of a non-dilutive mission oriented funding.

652
00:26:01.545 –> 00:26:04.005
Um, what about the org, the organization overall?

653
00:26:04.245 –> 00:26:06.645
I mean, what, what types of organizational kind

654
00:26:06.645 –> 00:26:08.685
of missions economic models are possible?

655
00:26:09.085 –> 00:26:12.725
I think we’re relatively agnostic about legal structures.

656
00:26:12.945 –> 00:26:17.045
The one kind of legal structure that, uh, we don’t think is,

657
00:26:17.265 –> 00:26:19.325
is easily compatible with, uh,

658
00:26:19.385 –> 00:26:22.005
the governance objective is pure, uh,

659
00:26:22.015 –> 00:26:25.325
for-profit in which the directors have a fiduciary duty to

660
00:26:25.925 –> 00:26:27.685
maximize, um, the shareholder returns.

661
00:26:27.865 –> 00:26:30.765
But there are versions like, uh, the, uh,

662
00:26:30.765 –> 00:26:33.605
public benefit corporation in the US, which translates to,

663
00:26:33.705 –> 00:26:36.765
uh, in the UK the closest analog is a community interest

664
00:26:36.765 –> 00:26:39.765
company, where they, where they don’t necessarily need

665
00:26:39.765 –> 00:26:41.405
to optimize the bottom line.

666
00:26:41.505 –> 00:26:43.045
And then through charters

667
00:26:43.125 –> 00:26:45.525
and bylaws, other types of considerations can,

668
00:26:45.545 –> 00:26:47.805
can become a core part of the decision making structure.

669
00:26:47.815 –> 00:26:50.885
There are also structures like trusts, um,

670
00:26:51.305 –> 00:26:52.685
or, uh, there are, uh,

671
00:26:53.065 –> 00:26:55.725
or structures like charities where there is a,

672
00:26:55.765 –> 00:26:59.685
a much more clear, um, differentiation between the

673
00:27:00.205 –> 00:27:02.245
intended beneficiaries of the structure

674
00:27:02.545 –> 00:27:04.365
and the, uh, contributors

675
00:27:04.365 –> 00:27:06.205
or the financial, um, stakeholders.

676
00:27:06.425 –> 00:27:09.405
And then there is also, uh, the structure of kind of doing,

677
00:27:09.625 –> 00:27:11.325
uh, a nested, you know, a nonprofit

678
00:27:11.465 –> 00:27:14.485
or a charity that, um, has a for-profit subsidiary

679
00:27:14.585 –> 00:27:17.845
and the for-profit, um, sort of raises funding with a,

680
00:27:17.845 –> 00:27:21.645
with a capped profit or, uh, with some other kind of, uh,

681
00:27:21.645 –> 00:27:23.685
disclosure that, that makes it clear that

682
00:27:23.685 –> 00:27:25.045
because of their control

683
00:27:25.145 –> 00:27:27.565
by an overarching nonprofit organization,

684
00:27:27.565 –> 00:27:30.685
they’re not optimizing primarily for, for the return, um,

685
00:27:30.705 –> 00:27:31.925
to the for-profit investors.

686
00:27:32.325 –> 00:27:34.485
I guess, uh, some more things to say along these lines,

687
00:27:34.665 –> 00:27:36.485
we are looking for, we’re looking

688
00:27:36.485 –> 00:27:40.325
to fund an entity which is, uh, legally and economically

689
00:27:40.385 –> 00:27:42.485
and physically located in the United Kingdom.

690
00:27:42.585 –> 00:27:45.805
That’s sort of part of the, the case, uh, to be, to be made

691
00:27:45.865 –> 00:27:48.725
for this, uh, this public funding to be non-dilutive.

692
00:27:48.725 –> 00:27:52.205
We are, um, uh, trying to catalyze the usage of

693
00:27:52.205 –> 00:27:55.285
what I think is actually a world class talent pool

694
00:27:55.285 –> 00:27:58.245
that we have in the UK for doing this kind of work into,

695
00:27:58.395 –> 00:28:00.725
into this, uh, this, this new kind of structure.

696
00:28:00.975 –> 00:28:03.965
We’re also looking to catalyze other, uh, funders.

697
00:28:04.025 –> 00:28:07.485
So we’re hoping that, uh, proposers will come to us, if not

698
00:28:07.515 –> 00:28:10.365
with other funders backing them, then with a plan, uh,

699
00:28:10.385 –> 00:28:12.045
and a, a pitch that they could use,

700
00:28:12.345 –> 00:28:13.565
uh, over the course of the program.

701
00:28:14.045 –> 00:28:16.765
’cause by the end of the program, we do want this entity to,

702
00:28:16.865 –> 00:28:18.165
to have a, a life

703
00:28:18.165 –> 00:28:20.405
after the program, um, not just

704
00:28:20.405 –> 00:28:21.925
to be dependent on Aria’s funding.

705
00:28:22.225 –> 00:28:25.285
Um, so, uh, by the end of the program, we would expect, uh,

706
00:28:25.505 –> 00:28:28.285
uh, external funding to be raised in some form, whether

707
00:28:28.315 –> 00:28:31.325
that be to a for-profit subsidiary or as donations

708
00:28:31.345 –> 00:28:34.565
or as additional public funding from other public sector

709
00:28:34.565 –> 00:28:36.485
sources or others in the program.

710
00:28:36.705 –> 00:28:39.885
We actually have, uh, technical area three, which is,

711
00:28:39.985 –> 00:28:41.525
we call it applications,

712
00:28:41.545 –> 00:28:44.765
but you could also think of it as customers where we are,

713
00:28:44.865 –> 00:28:47.365
uh, funding groups that could make use

714
00:28:47.505 –> 00:28:51.285
of the technical area two and one capabilities to go

715
00:28:51.305 –> 00:28:52.565
and start thinking about

716
00:28:52.865 –> 00:28:56.605
how they would formalize their real world problems

717
00:28:56.785 –> 00:28:59.885
in such a way that they could be amenable to, uh,

718
00:28:59.885 –> 00:29:02.125
being solved by TA two AI systems

719
00:29:02.475 –> 00:29:04.365
with a quantitative guarantee about

720
00:29:04.365 –> 00:29:05.525
both safety and performance.

721
00:29:05.705 –> 00:29:09.245
So part of what the TA two entity should consider is

722
00:29:09.315 –> 00:29:12.445
what kind of economic model do they want to apply?

723
00:29:12.455 –> 00:29:15.925
We’re not gonna be imposing any particular economic, uh,

724
00:29:16.115 –> 00:29:17.445
licensing arrangement,

725
00:29:17.785 –> 00:29:22.085
but there’s one way of looking at it, which is, um,

726
00:29:22.185 –> 00:29:25.165
you know, a TA two organization gets customers who come

727
00:29:25.165 –> 00:29:27.125
with a particular problem and they, uh, work together

728
00:29:27.125 –> 00:29:28.325
to co-develop a solution.

729
00:29:28.625 –> 00:29:31.285
And then that solution is kind of operated as a service,

730
00:29:31.425 –> 00:29:33.845
and there’s some service oriented economic model

731
00:29:33.845 –> 00:29:35.165
around each different domain.

732
00:29:35.225 –> 00:29:37.405
Or another way of looking at it is that, um,

733
00:29:37.465 –> 00:29:39.725
the TA two AI is sort of a creator

734
00:29:39.785 –> 00:29:41.485
of intellectual property in its own right.

735
00:29:41.515 –> 00:29:43.045
It’s creating the world model

736
00:29:43.145 –> 00:29:44.725
and the solutions to particular tasks

737
00:29:44.725 –> 00:29:45.765
within that world model.

738
00:29:45.825 –> 00:29:48.125
And then those solutions can be licensed

739
00:29:48.265 –> 00:29:50.205
as intellectual property to, uh, uh,

740
00:29:50.205 –> 00:29:51.645
for-profits in different industries.

741
00:29:51.875 –> 00:29:53.965
Okay, great. So that’s, that’s sort of some of the space

742
00:29:53.965 –> 00:29:54.965
around the economic model.

743
00:29:55.035 –> 00:29:56.525
What about the governance mechanisms

744
00:29:56.525 –> 00:29:58.845
and the security considerations? Yeah,

745
00:29:58.905 –> 00:30:00.525
We already touched on some of that.

746
00:30:00.745 –> 00:30:04.565
Um, I think the way I think about this essentially is

747
00:30:04.565 –> 00:30:07.845
that we want to make sure that consequential decisions

748
00:30:07.875 –> 00:30:09.205
that are taken

749
00:30:09.205 –> 00:30:11.325
within this organization, be

750
00:30:11.325 –> 00:30:12.925
that about development decisions

751
00:30:12.925 –> 00:30:14.885
or deployment decisions, that they

752
00:30:14.885 –> 00:30:18.205
appropriately account for both positive

753
00:30:18.265 –> 00:30:19.565
and negative externalities.

754
00:30:19.985 –> 00:30:22.165
The governance mechanism question is basically,

755
00:30:22.425 –> 00:30:24.645
how do we set up this organization such

756
00:30:24.645 –> 00:30:26.405
that we can do that reliably?

757
00:30:26.645 –> 00:30:30.125
I think ultimately this looks like really pushing sort

758
00:30:30.125 –> 00:30:33.005
of the state of the art of robust corporate governance.

759
00:30:33.195 –> 00:30:36.485
Some, you know, some aspects of, of that might be

760
00:30:36.715 –> 00:30:39.165
what is the board structure, which includes things like

761
00:30:39.305 –> 00:30:41.045
who is on the board, how do we decide

762
00:30:41.065 –> 00:30:43.965
who is on the board, but also, like, what are the powers

763
00:30:43.965 –> 00:30:45.165
that this board has, um,

764
00:30:45.165 –> 00:30:47.525
what decisions should they be involved in, et cetera.

765
00:30:47.675 –> 00:30:49.245
Another question here is like,

766
00:30:49.245 –> 00:30:50.485
what’s the compensation structure?

767
00:30:50.485 –> 00:30:53.565
Like, how do people working on, on this, uh,

768
00:30:53.865 –> 00:30:57.205
get compensated in a way that doesn’t mess with the, these,

769
00:30:57.205 –> 00:30:58.685
the decision making process and,

770
00:30:58.685 –> 00:31:02.085
and it allows the organization to continue, um, taking the,

771
00:31:02.085 –> 00:31:04.165
these externalities into account appropriately.

772
00:31:04.785 –> 00:31:06.245
Um, and maybe sort of,

773
00:31:06.245 –> 00:31:07.565
and then there’s, there’s a bunch more.

774
00:31:07.585 –> 00:31:10.125
One could, one could bring up here auditing mechanisms,

775
00:31:10.495 –> 00:31:12.725
other sorts of corporate mechanisms.

776
00:31:12.825 –> 00:31:14.605
Um, um, and then,

777
00:31:14.625 –> 00:31:16.165
and then the second part, again,

778
00:31:16.185 –> 00:31:19.005
we have mentioned this briefly, uh, if successful,

779
00:31:19.145 –> 00:31:21.005
the capabilities that will be, um,

780
00:31:21.115 –> 00:31:23.325
developed here are significant.

781
00:31:23.665 –> 00:31:25.565
Um, they could be misused.

782
00:31:25.825 –> 00:31:27.645
So it’s important that the security

783
00:31:27.985 –> 00:31:30.285
of this organization is really up to standard,

784
00:31:30.635 –> 00:31:33.245
potentially even pushing sort of the frontier of

785
00:31:33.245 –> 00:31:35.045
what we sort of see here by default,

786
00:31:35.145 –> 00:31:37.485
that’s definitely the case in the cyber security domain,

787
00:31:37.485 –> 00:31:40.885
but also sort of more comprehensively, making sure that sort

788
00:31:40.885 –> 00:31:43.965
of sensitive model weights or solutions don’t leak

789
00:31:43.985 –> 00:31:48.485
and don’t get sort of irreversibly enhanced in ways that

790
00:31:48.485 –> 00:31:50.365
are not sort of subject to the governance mechanism

791
00:31:50.745 –> 00:31:52.085
of the organization itself.

792
00:31:52.185 –> 00:31:55.045
That’s great. So that’s definitely a very, uh, juicy set

793
00:31:55.045 –> 00:31:58.205
of challenges for an organization builder in terms

794
00:31:58.205 –> 00:32:00.685
of recruiting top scientific talent,

795
00:32:00.695 –> 00:32:02.565
especially to something like this.

796
00:32:02.745 –> 00:32:04.325
And, and managing the talent.

797
00:32:04.445 –> 00:32:06.005
I mean, what, what do you think are the key considerations

798
00:32:06.005 –> 00:32:08.565
there to sort of have leading lights in machine learning

799
00:32:08.565 –> 00:32:11.605
and other fields excited to be working there?

800
00:32:12.065 –> 00:32:14.725
In my opinion, what we need here are people

801
00:32:14.725 –> 00:32:16.245
that have two characteristics.

802
00:32:17.145 –> 00:32:20.325
One is the technical competence in machine learning,

803
00:32:20.585 –> 00:32:22.445
and to some extent in AI safety

804
00:32:23.025 –> 00:32:27.005
or some of the mathematical areas like, you know,

805
00:32:27.285 –> 00:32:29.965
verification and, and formal, uh, methods.

806
00:32:30.505 –> 00:32:34.245
And on the other hand, the second characteristic is, uh,

807
00:32:34.305 –> 00:32:37.005
the right motivations, uh, by the way that is connected

808
00:32:37.005 –> 00:32:38.205
to the security question, right?

809
00:32:38.225 –> 00:32:41.805
We wanna see people working on this who are not doing it

810
00:32:41.825 –> 00:32:43.525
for just, you know, just another job.

811
00:32:43.705 –> 00:32:45.245
But because they, they believe

812
00:32:45.945 –> 00:32:47.605
in the importance of this effort.

813
00:32:48.185 –> 00:32:52.405
Um, they want to see AI be used beneficially

814
00:32:52.585 –> 00:32:54.005
for humanity and society,

815
00:32:54.425 –> 00:32:56.765
and are like really emotionally involved

816
00:32:57.185 –> 00:32:58.645
in making that happen.

817
00:32:59.105 –> 00:33:01.405
So, you know, that that’s, uh,

818
00:33:01.785 –> 00:33:03.645
that’s an intersection of two sets.

819
00:33:04.065 –> 00:33:06.245
So, you know, it’s a smaller set, but,

820
00:33:06.305 –> 00:33:07.685
but it, it is really important.

821
00:33:08.185 –> 00:33:11.605
The good news is there’s a shift right now in the machine

822
00:33:11.725 –> 00:33:12.965
learning community as more

823
00:33:12.965 –> 00:33:16.205
and more people are becoming aware of the potential

824
00:33:16.225 –> 00:33:19.485
for catastrophic risks with future advances in AI.

825
00:33:20.065 –> 00:33:22.725
So there’s more and more publications and more

826
00:33:22.725 –> 00:33:24.605
and more interest in, in those discussions.

827
00:33:24.825 –> 00:33:27.325
But what’s interesting here is with this program,

828
00:33:27.855 –> 00:33:30.845
we’re giving agency to people who feel concerned

829
00:33:30.995 –> 00:33:33.765
and who have expertise that they could put to good use.

830
00:33:34.305 –> 00:33:36.725
That’s great. So, so maybe, maybe we can use that

831
00:33:36.725 –> 00:33:39.325
as a segue to, to also talk about the, the kind of long,

832
00:33:39.325 –> 00:33:42.365
longer term, bigger picture opportunity around this,

833
00:33:42.515 –> 00:33:43.685
this type of organization.

834
00:33:43.835 –> 00:33:45.645
There’s an economic model.

835
00:33:45.765 –> 00:33:47.165
I mean, this is gonna, if this really works,

836
00:33:47.235 –> 00:33:50.525
potentially has use cases in a kind of vast range of industries

837
00:33:50.525 –> 00:33:54.765
and swaths of society, there is a kind of strategic, uh,

838
00:33:54.865 –> 00:33:57.565
civilizational component to AI safety, you know,

839
00:33:57.915 –> 00:34:00.485
even if this organization has all sorts

840
00:34:00.485 –> 00:34:03.285
of technical success, but no one else adopts, um,

841
00:34:03.665 –> 00:34:05.325
the methods that it develops, that’s a problem.

842
00:34:05.345 –> 00:34:08.245
So it has a huge mission at some level to promulgate

843
00:34:08.465 –> 00:34:12.525
or somehow catalyze other people also using these methods

844
00:34:13.065 –> 00:34:14.445
around safety and security.

845
00:34:14.745 –> 00:34:17.405
And, you know, you mentioned it’s, it’s in the UK, uh,

846
00:34:17.405 –> 00:34:19.445
there’s also a huge sort of international component, right?

847
00:34:19.445 –> 00:34:21.245
So, so maybe we can go through those in turn.

848
00:34:21.705 –> 00:34:24.925
So, uh, some of the application areas

849
00:34:25.075 –> 00:34:28.205
that I think are amenable to solutions

850
00:34:28.315 –> 00:34:29.725
with quantitative guarantees

851
00:34:29.725 –> 00:34:32.325
that are mostly automated include, uh,

852
00:34:32.585 –> 00:34:36.645
one is optimizing like the energy usage of telecom networks,

853
00:34:36.645 –> 00:34:39.365
particularly the, the cellular radio, uh,

854
00:34:39.705 –> 00:34:44.165
access points which consume, uh, energy on, on the scale of,

855
00:34:44.225 –> 00:34:45.325
uh, uh, you know, uh,

856
00:34:45.485 –> 00:34:47.685
a billion pounds a year in the UK alone. Right now,

857
00:34:47.685 –> 00:34:50.845
it’s common practice to reduce the capacity,

858
00:34:50.845 –> 00:34:54.085
basically reduce the number of radios that are online

859
00:34:54.145 –> 00:34:56.525
during a period, overnight when, uh,

860
00:34:56.525 –> 00:34:58.245
there’s usually less demand,

861
00:34:58.385 –> 00:35:01.045
in order to save the energy of, of running those elements.

862
00:35:01.065 –> 00:35:02.285
It takes 10 or 15 minutes

863
00:35:02.345 –> 00:35:03.925
to switch them on and off completely.

864
00:35:04.065 –> 00:35:07.325
So it’s not trivial to just sort of on demand, you know,

865
00:35:07.365 –> 00:35:09.525
turn on exactly as many elements as you need.

866
00:35:09.545 –> 00:35:12.845
But right now, um, it’s a very coarse kind of adaptation.

867
00:35:12.945 –> 00:35:14.685
And I think with, uh, AI control,

868
00:35:14.685 –> 00:35:17.365
we could get a much more fine-grained, more efficient, uh,

869
00:35:17.365 –> 00:35:19.045
energy reduction for telecoms,

870
00:35:19.045 –> 00:35:21.325
but the operators need to be very confident

871
00:35:21.325 –> 00:35:24.285
that any such system would have a very low probability of,

872
00:35:24.345 –> 00:35:27.925
of ending up, uh, not having enough capacity to meet demand

873
00:35:27.925 –> 00:35:29.205
because it had made the decision

874
00:35:29.205 –> 00:35:30.525
to switch, switch things off.

875
00:35:30.785 –> 00:35:32.085
Um, so there’s, there is a need

876
00:35:32.085 –> 00:35:34.445
for quantitative guarantees in order to use AI.
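
To make the kind of guarantee being described concrete, here is a minimal sketch in Python. It is not part of the programme materials: the per-radio capacity, the overnight demand model and the risk budget are all hypothetical placeholders. The idea is simply to keep the fewest radio units online subject to a quantified bound on the probability that demand exceeds capacity.

```python
# Minimal sketch (hypothetical numbers throughout): choose the smallest
# number of radio units to keep online overnight such that the modelled
# probability of demand exceeding capacity stays below a risk budget.
from statistics import NormalDist

CAPACITY_PER_RADIO = 100.0                    # assumed capacity per radio unit
demand = NormalDist(mu=1500.0, sigma=200.0)   # assumed overnight demand model
RISK_BUDGET = 1e-4                            # required bound on P(shortfall)

def min_radios_online(demand_model, capacity_per_radio, risk_budget, max_radios=50):
    """Return the smallest n, and its shortfall probability, such that
    P(demand > n * capacity_per_radio) <= risk_budget."""
    for n in range(1, max_radios + 1):
        shortfall_prob = 1.0 - demand_model.cdf(n * capacity_per_radio)
        if shortfall_prob <= risk_budget:
            return n, shortfall_prob
    raise ValueError("risk budget not achievable with max_radios units")

n, p = min_radios_online(demand, CAPACITY_PER_RADIO, RISK_BUDGET)
print(f"keep {n} radios online; P(shortfall) ≈ {p:.2e}")
```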

877
00:35:34.445 –> 00:35:37.085
There’s another application which is somewhat similar, uh,

878
00:35:37.145 –> 00:35:39.765
and has a similar scale of impact is, um,

879
00:35:39.875 –> 00:35:41.045
balancing the supply

880
00:35:41.105 –> 00:35:44.725
and demand of, uh, of power on the electrical grid.

881
00:35:45.025 –> 00:35:48.165
Uh, particularly as our supply of power becomes more

882
00:35:48.165 –> 00:35:51.565
of a renewable heavy mix, it’s less able to respond

883
00:35:51.585 –> 00:35:54.365
to demand on demand, as it were, in a,

884
00:35:54.365 –> 00:35:55.565
in a really reliable way,

885
00:35:55.565 –> 00:35:57.205
because it depends on the weather conditions.

886
00:35:57.225 –> 00:35:59.565
You know, the capacity is not just controlled

887
00:35:59.585 –> 00:36:02.045
by someone in a control room adjusting the, the amount

888
00:36:02.045 –> 00:36:03.165
of gas going through a turbine.

889
00:36:03.425 –> 00:36:05.725
So at the same time, uh, we’re starting

890
00:36:05.725 –> 00:36:08.685
to get more energy storage resources on the grid, uh,

891
00:36:08.685 –> 00:36:10.165
which can help with this problem of kind

892
00:36:10.165 –> 00:36:12.965
of matching the weather determined supply

893
00:36:13.185 –> 00:36:15.045
to the consumer determined demand.

894
00:36:15.265 –> 00:36:17.885
But then that actually just introduces even more decisions

895
00:36:17.885 –> 00:36:19.085
that that need to be made.

896
00:36:19.195 –> 00:36:20.965
It’s not just about, uh, when

897
00:36:20.985 –> 00:36:23.445
to release energy from the storage resources,

898
00:36:23.445 –> 00:36:24.885
but also when to charge them.

899
00:36:24.905 –> 00:36:27.005
So when to offtake energy from the grid

900
00:36:27.005 –> 00:36:28.885
and just put it into one of these storage devices.

901
00:36:29.065 –> 00:36:30.605
So it becomes a dynamic programming problem

902
00:36:30.605 –> 00:36:33.685
where one really needs to look ahead to future time periods

903
00:36:33.685 –> 00:36:35.165
and what the predicted weather will be

904
00:36:35.165 –> 00:36:36.685
and what the predicted demand will be.

905
00:36:37.065 –> 00:36:40.405
Um, and the uncertainty here is probabilistic,

906
00:36:40.405 –> 00:36:42.925
but it’s also not kind of, not merely probabilistic.

907
00:36:42.925 –> 00:36:45.405
We don’t wanna make such strong assumptions as

908
00:36:45.555 –> 00:36:48.765
that we have the, the exact probabilities, um, nailed down

909
00:36:48.765 –> 00:36:50.965
of what future demand and weather will look like.

910
00:36:51.345 –> 00:36:53.525
Uh, so that’s a, I think a, a really good fit.

911
00:36:53.525 –> 00:36:55.085
And of course, it’s also safety critical.

912
00:36:55.085 –> 00:36:56.885
It’s a, it’s a, it’s a huge, huge problem.

913
00:36:57.065 –> 00:36:59.125
If at any point, the total, uh,

914
00:36:59.125 –> 00:37:00.925
supply is less than the total demand

915
00:37:00.985 –> 00:37:02.645
by more than a small margin,

916
00:37:02.745 –> 00:37:05.165
and that margin for error is also actually getting smaller

917
00:37:05.625 –> 00:37:08.485
as less of the grid supply becomes coupled

918
00:37:08.485 –> 00:37:11.965
to like the physical rotational inertia of the gas turbines,

919
00:37:11.965 –> 00:37:13.725
which actually can take up a little bit of slack.

920
00:37:13.985 –> 00:37:16.165
Um, so as we get more renewable heavy, we need

921
00:37:16.165 –> 00:37:18.125
to be more precise in the way that we balance the grid.
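
As a rough illustration of what such a lookahead decision could look like, here is a tiny receding-horizon search over charge/hold/discharge actions that is robust over a small set of renewable-supply scenarios rather than a single probability distribution, echoing the "not merely probabilistic" point above. The horizon, scenarios and numbers are hypothetical placeholders, and real dispatch would involve far richer models.

```python
# Illustrative sketch only (hypothetical numbers, toy dynamics, no efficiency
# losses): a tiny receding-horizon search that picks a battery schedule whose
# *worst-case* shortfall across a set of supply scenarios is smallest, rather
# than assuming one exact probability distribution over the weather.
from itertools import product

HORIZON = 4                              # future time steps considered
ACTIONS = [-50.0, 0.0, 50.0]             # discharge, hold, charge (MW, 1h steps)
BATTERY_MAX = 200.0                      # toy storage capacity (MWh)
DEMAND = [900.0, 950.0, 1000.0, 980.0]   # assumed demand forecast per step (MW)
SUPPLY_SCENARIOS = [                     # a few possible renewable outcomes (MW)
    [950.0, 900.0, 940.0, 1000.0],
    [1000.0, 980.0, 900.0, 950.0],
    [900.0, 870.0, 920.0, 990.0],
]

def worst_case_shortfall(schedule):
    """Largest single-step shortfall (demand minus net supply) over scenarios."""
    worst = 0.0
    for supply in SUPPLY_SCENARIOS:
        stored = BATTERY_MAX / 2
        for t, action in enumerate(schedule):
            new_stored = min(max(stored + action, 0.0), BATTERY_MAX)
            delta = new_stored - stored        # >0: battery absorbed energy
            stored = new_stored
            net_supply = supply[t] - delta     # charging subtracts, discharging adds
            worst = max(worst, DEMAND[t] - net_supply)
    return worst

# Enumerate all schedules over the horizon; only the first action of the best
# schedule would actually be executed before re-planning at the next step.
best = min(product(ACTIONS, repeat=HORIZON), key=worst_case_shortfall)
print("first action (MW):", best[0], "| worst-case shortfall (MW):",
      worst_case_shortfall(best))
```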

922
00:37:18.685 –> 00:37:20.845
A completely different type of application I’m excited about

923
00:37:20.985 –> 00:37:24.005
is, um, automating the manufacturing of biopharmaceuticals.

924
00:37:24.185 –> 00:37:26.965
Um, in particular, uh, monoclonal antibodies

925
00:37:27.105 –> 00:37:29.565
as monoclonal antibodies are a really versatile

926
00:37:29.785 –> 00:37:33.045
and powerful class of drug with applications from malaria

927
00:37:33.345 –> 00:37:36.565
to cancer treatments, to, uh, respiratory viruses,

928
00:37:36.585 –> 00:37:39.045
to autoimmune diseases and, and so many more.

929
00:37:39.065 –> 00:37:41.845
And they’re basically the general, uh, toolkit

930
00:37:42.105 –> 00:37:45.045
for enlisting the human, uh, native immune system in,

931
00:37:45.065 –> 00:37:47.485
in fighting, uh, disease of, of almost any kind.

932
00:37:47.545 –> 00:37:50.125
And the main reason they don’t see more clinical use is

933
00:37:50.125 –> 00:37:51.725
that they’re very expensive to produce,

934
00:37:51.745 –> 00:37:53.525
and particularly the more customized ones

935
00:37:53.525 –> 00:37:54.605
are very expensive to produce.

936
00:37:54.625 –> 00:37:56.885
And that’s because the manufacturing process for,

937
00:37:56.905 –> 00:37:59.765
for a monoclonal antibody includes lots of steps

938
00:37:59.915 –> 00:38:03.245
with bioreactors that need to be carefully monitored and,

939
00:38:03.305 –> 00:38:05.645
and controlled, um, in real time

940
00:38:05.825 –> 00:38:09.245
by highly skilled biologists in order to ensure, um,

941
00:38:09.255 –> 00:38:10.325
consistent quality

942
00:38:10.625 –> 00:38:13.765
and safety of the, of the products, uh, that come out.

943
00:38:13.865 –> 00:38:17.085
So if we could automate that process of biomanufacturing

944
00:38:17.105 –> 00:38:18.605
for monoclonal antibodies, uh,

945
00:38:18.605 –> 00:38:20.925
and even beyond that, automate the process of

946
00:38:21.965 –> 00:38:23.245
constructing the manufacturing process

947
00:38:23.425 –> 00:38:26.165
for new customized monoclonal antibodies, um,

948
00:38:26.395 –> 00:38:29.285
that could be a pretty big deal, um, for pharmaceuticals.

949
00:38:29.395 –> 00:38:31.165
Another health application, um,

950
00:38:31.305 –> 00:38:33.525
is optimizing the design of clinical trials.

951
00:38:33.745 –> 00:38:36.525
So often the, the statistical approaches that one uses

952
00:38:36.525 –> 00:38:40.005
to get enough statistical power to demonstrate a, a target

953
00:38:40.105 –> 00:38:43.005
for safety or efficacy are around a factor of two

954
00:38:43.005 –> 00:38:46.565
or three more work than would ideally have been needed

955
00:38:46.585 –> 00:38:48.125
to get the same level of evidence.
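
For a sense of what "enough statistical power" means for trial size, here is a minimal sketch of the standard fixed-design calculation for comparing two means. The effect size and standard deviation are hypothetical placeholders, and the factor-of-two-or-three saving referred to above would have to come from more efficient designs, which this simple formula does not capture.

```python
# Minimal sketch of the textbook fixed-design sample-size formula for a
# two-arm comparison of means (hypothetical effect size and SD). More
# efficient designs would aim to reach the same evidence with fewer
# participants than this formula requires.
from math import ceil
from statistics import NormalDist

def n_per_arm(delta, sigma, alpha=0.05, power=0.8):
    """Participants per arm to detect a mean difference `delta`, with common
    standard deviation `sigma`, at two-sided significance level `alpha`."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 * (sigma / delta) ** 2)

print(n_per_arm(delta=5.0, sigma=12.0))             # ~91 per arm at 80% power
print(n_per_arm(delta=5.0, sigma=12.0, power=0.9))  # more power needs more people
```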

956
00:38:48.185 –> 00:38:50.285
Of course, we can’t trust today’s language models

957
00:38:50.345 –> 00:38:53.725
to not just hallucinate that this clinical trial design is,

958
00:38:53.745 –> 00:38:56.205
is well founded, and in fact even give a a,

959
00:38:56.445 –> 00:38:57.605
a quite compelling

960
00:38:57.605 –> 00:38:59.845
and convincing explanation of why it’s well founded.

961
00:39:00.225 –> 00:39:02.045
Um, and that turns out not to be true.

962
00:39:02.265 –> 00:39:04.085
And of course, the people, um, the

963
00:39:04.105 –> 00:39:05.805
people building the organization,

964
00:39:05.805 –> 00:39:07.245
they may come up with completely different ones.

965
00:39:07.245 –> 00:39:08.925
But it sounds like these are ones where, you know,

966
00:39:08.925 –> 00:39:10.205
making the world model is sort

967
00:39:10.205 –> 00:39:11.645
of pushing the boundaries, right?

968
00:39:11.795 –> 00:39:14.525
It’s not just a complete simulated world.

969
00:39:14.605 –> 00:39:15.965
I mean, the power grid depends on the

970
00:39:15.965 –> 00:39:17.085
weather or something like that.

971
00:39:17.305 –> 00:39:19.525
Or the clinical trial case depends on, you know,

972
00:39:19.525 –> 00:39:20.525
patient recruiting

973
00:39:20.525 –> 00:39:22.845
or certain aspects of human behavior or human error.

974
00:39:23.025 –> 00:39:26.045
So we need to incorporate, uh, kinds of uncertainty that,

975
00:39:26.045 –> 00:39:28.725
that are not seen in a purely cyber application,

976
00:39:28.985 –> 00:39:32.165
But it’s not like you’re trying to model all

977
00:39:32.165 –> 00:39:33.485
of society or something like that.

978
00:39:33.485 –> 00:39:34.765
There’s sort of discrete applications.

979
00:39:34.765 –> 00:39:36.525
People will pay for the quantitative guarantees,

980
00:39:37.055 –> 00:39:38.055
Right? And it’s, and

981
00:39:38.055 –> 00:39:39.965
it’s important to clarify that when we say

982
00:39:39.965 –> 00:39:42.445
that we have a world model that’s adequate for,

983
00:39:42.545 –> 00:39:45.485
for safeguarding an application that doesn’t imply

984
00:39:45.485 –> 00:39:47.885
that this world model can make, uh,

985
00:39:48.155 –> 00:39:50.485
precise predictions about anything that might happen.

986
00:39:50.595 –> 00:39:52.525
When you give that query to the world model, it’ll,

987
00:39:52.525 –> 00:39:55.365
it’ll give a very large interval, uh, a very large range

988
00:39:55.365 –> 00:39:57.245
of possibilities and not even specify

989
00:39:57.245 –> 00:39:58.805
probabilities within that range.

990
00:39:58.835 –> 00:40:01.965
This sort of agnosticism, um, can be part of the, part

991
00:40:01.965 –> 00:40:03.045
of the, um, model.

992
00:40:03.345 –> 00:40:05.805
It just needs to be confident enough about enough things

993
00:40:06.435 –> 00:40:09.685
that one of the consequences that could be deduced is

994
00:40:09.685 –> 00:40:13.565
that if you deploy this particular narrow domain AI system,

995
00:40:13.775 –> 00:40:16.045
which could internally also be doing a lot

996
00:40:16.045 –> 00:40:17.405
of probabilistic reasoning and,

997
00:40:17.405 –> 00:40:20.165
and runtime verification, that system is gonna be able

998
00:40:20.165 –> 00:40:22.445
to cope with whatever does arise in a way

999
00:40:22.445 –> 00:40:24.725
that avoids an unacceptable risk. Well, let’s

1000
00:40:24.725 –> 00:40:25.925
Just talk about that for a second also,

1001
00:40:25.925 –> 00:40:27.525
because this is the first time you’ve mentioned this idea

1002
00:40:27.525 –> 00:40:28.845
of runtime verification also.

1003
00:40:29.005 –> 00:40:30.725
’cause Yeah, let, let’s talk about runtime verification.

1004
00:40:30.725 –> 00:40:33.005
So, so suppose that this thing is in control,

1005
00:40:33.005 –> 00:40:35.525
it’s trying to control the power grid at this top level,

1006
00:40:35.525 –> 00:40:38.005
you’ve been able to put quantitative guarantees on it.

1007
00:40:38.005 –> 00:40:39.885
You know, let’s, let’s say it, it puts up,

1008
00:40:39.885 –> 00:40:42.245
puts out something and, and the, the system flags.

1009
00:40:42.515 –> 00:40:43.845
This is not, this is not

1010
00:40:43.845 –> 00:40:45.645
how we wanna control the power grid.

1011
00:40:45.645 –> 00:40:47.525
This is unacceptably risky. Mm-hmm.

1012
00:40:47.605 –> 00:40:48.765
What, what, what does it do?

1013
00:40:48.875 –> 00:40:51.685
Yeah, so the, the baseline, uh, idea that I have here,

1014
00:40:51.775 –> 00:40:53.165
which is again subject

1015
00:40:53.305 –> 00:40:56.285
to potential innovation is called the black box

1016
00:40:56.285 –> 00:40:57.325
simplex architecture.

1017
00:40:57.325 –> 00:40:59.685
So it’s also not my idea, um, it’s from the literature,

1018
00:40:59.825 –> 00:41:03.565
and the architecture is you have a, uh, advanced controller,

1019
00:41:03.655 –> 00:41:06.285
which is not actually formally verified.

1020
00:41:06.425 –> 00:41:08.685
Uh, it could be just a black box neural network,

1021
00:41:08.695 –> 00:41:11.325
which hasn’t even really been inspected or,

1022
00:41:11.325 –> 00:41:13.005
or mechanistically interpreted in any way.

1023
00:41:13.195 –> 00:41:16.245
Then you have a backup controller, which is verified,

1024
00:41:16.425 –> 00:41:19.765
and, uh, the property that it is verified

1025
00:41:19.765 –> 00:41:23.965
to satisfy is that, from a particular region

1026
00:41:23.985 –> 00:41:26.365
of state space, which is called the recoverable region,

1027
00:41:26.665 –> 00:41:30.485
the backup controller can recover, uh, to a stable part

1028
00:41:30.485 –> 00:41:32.685
of state space where the safety property remains true.

1029
00:41:33.065 –> 00:41:34.845
If you’re within this recoverable space,

1030
00:41:34.905 –> 00:41:37.365
the backup controller can keep you in the safe zone

1031
00:41:37.475 –> 00:41:39.485
with no guarantee about how much it will cost

1032
00:41:39.585 –> 00:41:42.085
or what other objectives are achieved or performance.

1033
00:41:42.505 –> 00:41:45.085
So the idea is then the backup controller has a simple

1034
00:41:45.085 –> 00:41:47.165
enough task that it can be a simple enough system

1035
00:41:47.435 –> 00:41:49.645
that it can be formally verified to satisfy

1036
00:41:49.715 –> 00:41:51.085
that criterion. You’re

1037
00:41:51.085 –> 00:41:53.205
Trying to save costs, you’re trying to get more bang

1038
00:41:53.205 –> 00:41:55.005
for your buck out of, out of the power grid,

1039
00:41:55.005 –> 00:41:57.725
but then it might say, okay, wait, hold on, we just need

1040
00:41:57.725 –> 00:41:59.245
to go back to something very simple.

1041
00:41:59.305 –> 00:42:01.365
But that might be expensive in some other way,

1042
00:42:01.365 –> 00:42:02.725
but it wouldn’t, it wouldn’t be catastrophic.

1043
00:42:02.795 –> 00:42:04.605
Yeah. So if at runtime,

1044
00:42:04.985 –> 00:42:08.605
the advanced controller proposes an action which the runtime

1045
00:42:09.005 –> 00:42:13.245
verifier cannot prove maintains the recoverable

1046
00:42:13.885 –> 00:42:16.605
property, so keeps you in the recoverable set for the,

1047
00:42:16.625 –> 00:42:17.965
for a certain time horizon,

1048
00:42:18.235 –> 00:42:20.325
then there is a system called a gatekeeper,

1049
00:42:20.415 –> 00:42:22.605
which could decide to switch and,

1050
00:42:22.625 –> 00:42:24.725
and basically engage the backup controller

1051
00:42:24.725 –> 00:42:26.525
instead under those conditions. Great.
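
Here is a minimal sketch of the switching logic just described, with every component stubbed out by hypothetical placeholders: the "advanced controller" stands in for an unverified black-box policy, the "runtime verifier" is a hand-written check where a real system would produce a formal certificate, and the toy dynamics are invented purely for illustration.

```python
# A minimal sketch, with hypothetical placeholder components, of the
# switching logic in the black box simplex architecture described above:
# an unverified advanced controller proposes an action; a runtime check
# asks whether the action keeps the state in the recoverable set over a
# short horizon; a gatekeeper falls back to the verified backup controller
# whenever that check fails.

RECOVERABLE_FREQ = (49.5, 50.5)   # toy "recoverable set": grid frequency band (Hz)

def advanced_controller(state):
    """Stand-in for an unverified, performance-seeking controller."""
    return {"power_adjust_mw": -80.0}          # aggressive action, may be unsafe

def backup_controller(state):
    """Stand-in for the verified, conservative backup controller."""
    return {"power_adjust_mw": 0.0}            # do the simple, provably safe thing

def simulate(state, action, horizon=3):
    """Toy state update used by the runtime check (placeholder dynamics)."""
    freq = state["freq_hz"]
    for _ in range(horizon):
        freq += 0.005 * action["power_adjust_mw"]
    return {"freq_hz": freq}

def runtime_verifier_ok(state, action, horizon=3):
    """Placeholder for a proof that `action` keeps us in the recoverable set.
    A real system would produce a checkable certificate, not just a bool."""
    lo, hi = RECOVERABLE_FREQ
    return lo <= simulate(state, action, horizon)["freq_hz"] <= hi

def gatekeeper(state):
    proposal = advanced_controller(state)
    if runtime_verifier_ok(state, proposal):
        return proposal                         # performance path
    return backup_controller(state)             # verified fallback

print(gatekeeper({"freq_hz": 50.0}))  # advanced action drives freq to 48.8 -> fallback
```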

1052
00:42:26.525 –> 00:42:28.165
So let’s, let’s talk a little bit more broadly.

1053
00:42:28.465 –> 00:42:30.925
So those are some specific applications, maybe ways of,

1054
00:42:31.025 –> 00:42:34.085
of bootstrapping this to be used in more, more sectors,

1055
00:42:34.085 –> 00:42:36.605
more real world usage, relevant

1056
00:42:36.605 –> 00:42:38.205
for the economic model for the organization.

1057
00:42:38.645 –> 00:42:40.925
Probably. Let’s talk a bit more broadly about the sort

1058
00:42:40.925 –> 00:42:42.885
of safety at the civilizational level.

1059
00:42:43.465 –> 00:42:46.125
What’s going to happen with AI? What is the role of this?

1060
00:42:47.005 –> 00:42:49.485
I mean, one, one thing I’m, I’m excited to add,

1061
00:42:49.485 –> 00:42:51.525
David told some story here about, hey,

1062
00:42:51.525 –> 00:42:55.045
there’s actually a bunch of, like, economic sort of benefit

1063
00:42:55.195 –> 00:42:56.365
that is on the table

1064
00:42:56.435 –> 00:42:59.325
that we could get while maintaining, um, you know,

1065
00:42:59.325 –> 00:43:01.085
certain bounds on, on the risks we are

1066
00:43:01.205 –> 00:43:02.325
actually societally willing to accept.

1067
00:43:02.525 –> 00:43:05.285
I think one important aspect of that is

1068
00:43:05.285 –> 00:43:09.085
that we can also use these same capabilities on problems on

1069
00:43:09.085 –> 00:43:12.405
tasks that actually then really are able to boost sort

1070
00:43:12.405 –> 00:43:13.805
of civilizational resilience

1071
00:43:14.065 –> 00:43:17.005
and make us sort of across the board more resilient

1072
00:43:17.105 –> 00:43:18.805
to whether that’s like, you know,

1073
00:43:19.025 –> 00:43:20.285
misuse from various actors,

1074
00:43:20.315 –> 00:43:22.325
including eventually potentially sort

1075
00:43:22.325 –> 00:43:24.845
of misuse from rogue AI actors, which is sort

1076
00:43:24.845 –> 00:43:28.005
of another way in which it addresses like, parts of sort

1077
00:43:28.005 –> 00:43:29.325
of AI safety concerns.

1078
00:43:29.425 –> 00:43:31.485
You know, there’s a bunch of different examples here,

1079
00:43:31.585 –> 00:43:34.125
but like maybe sort of building up like one,

1080
00:43:34.265 –> 00:43:36.925
one application here could definitely be in the, the domain

1081
00:43:36.925 –> 00:43:38.325
of cyber risk and cyber attacks.

1082
00:43:38.465 –> 00:43:43.085
So, sort of re-implementing, um, legacy code to be

1083
00:43:43.685 –> 00:43:46.805
formally verified to be bug free, uh, you could sort

1084
00:43:46.805 –> 00:43:49.685
of imagine that to like really reduce the attack vector

1085
00:43:50.185 –> 00:43:52.885
of cyber attacks, sort of very, very radically.

1086
00:43:53.025 –> 00:43:54.285
Um, and that would sort

1087
00:43:54.285 –> 00:43:56.845
of very meaningfully change even just like the game

1088
00:43:56.845 –> 00:43:58.445
theoretic setup, um, here.

1089
00:43:58.865 –> 00:44:01.845
And then, you know, you could potentially sort of extend

1090
00:44:01.845 –> 00:44:04.725
that to other problems, um, formally verified hardware

1091
00:44:04.725 –> 00:44:07.005
that we like, have really sort of good understanding

1092
00:44:07.025 –> 00:44:09.845
and like, um, firm sort of verification about

1093
00:44:09.875 –> 00:44:13.845
what pieces of hardware do and don’t do on chips

1094
00:44:13.845 –> 00:44:17.085
or in data centers, or sort of maybe even, like,

1095
00:44:17.085 –> 00:44:19.405
benchtop DNA synthesizers, like we don’t want them

1096
00:44:19.405 –> 00:44:20.725
to get hacked, et cetera.

1097
00:44:21.145 –> 00:44:23.805
So can I, can I just ask a few clarifying questions

1098
00:44:23.805 –> 00:44:25.765
and one sort of broader question. But so, why

1099
00:44:26.195 –> 00:44:27.725
with the cybersecurity application?

1100
00:44:27.765 –> 00:44:29.565
I mean, why, why isn’t that just sort of the bread

1101
00:44:29.565 –> 00:44:32.485
and butter obvious, you know, big application

1102
00:44:32.485 –> 00:44:33.565
to go after right away?

1103
00:44:33.585 –> 00:44:37.045
Is, is, is that, is the idea here that yes, that’s possible,

1104
00:44:37.045 –> 00:44:39.845
but this organization, you know, wouldn’t just focus on that

1105
00:44:39.845 –> 00:44:42.445
because it’s trying to get into these other, more,

1106
00:44:42.555 –> 00:44:44.845
more non-software world models?

1107
00:44:45.185 –> 00:44:47.245
Um, or, or, you know, would, would it make sense

1108
00:44:47.245 –> 00:44:49.445
for this organization to just go,

1109
00:44:49.545 –> 00:44:52.365
go all in in some sense on securing, uh,

1110
00:44:53.205 –> 00:44:56.805
Remember we are after difficult fundamental research

1111
00:44:57.205 –> 00:45:00.605
challenges, and the issue is if you start an organization

1112
00:45:01.045 –> 00:45:02.965
focusing on the application X mm-hmm.

1113
00:45:03.235 –> 00:45:05.405
Yeah, it’s difficult to do both at the same time, right?

1114
00:45:05.465 –> 00:45:08.645
So we, we want the energy in TA two

1115
00:45:08.985 –> 00:45:11.925
to be on these hard methodological questions

1116
00:45:12.225 –> 00:45:13.485
And that, and that will take you, even,

1117
00:45:13.485 –> 00:45:16.205
even if some organization would use some of these methods

1118
00:45:16.345 –> 00:45:19.205
to massively boost cybersecurity, which is great.

1119
00:45:19.505 –> 00:45:21.405
You wanna say, okay, well what’s beyond that?

1120
00:45:21.405 –> 00:45:22.765
What about if we have humans in the loop?

1121
00:45:22.765 –> 00:45:24.165
What if we have biology in the loop?

1122
00:45:24.475 –> 00:45:25.965
Yeah. And I, I think it’s, it’s,

1123
00:45:25.985 –> 00:45:27.885
it would be relatively straightforward.

1124
00:45:28.225 –> 00:45:30.565
Uh, and, and indeed there are groups, uh,

1125
00:45:30.715 –> 00:45:33.085
that are already working on pipelines for

1126
00:45:33.655 –> 00:45:35.845
using foundation models to, uh,

1127
00:45:35.945 –> 00:45:37.725
to create formally verified software.

1128
00:45:37.825 –> 00:45:40.125
And then the risk would become, if, if

1129
00:45:40.125 –> 00:45:41.125
that’s a demonstration

1130
00:45:41.125 –> 00:45:42.725
or if that’s a key proof point for

1131
00:45:43.605 –> 00:45:46.805
safeguarded AI more broadly, uh, the criticism would be,

1132
00:45:46.835 –> 00:45:49.885
well, that’s all, that’s all fine for, for tech, you know,

1133
00:45:49.905 –> 00:45:52.485
for software, but we wanna use AI

1134
00:45:52.485 –> 00:45:54.245
to solve the really hard problems in,

1135
00:45:54.265 –> 00:45:55.525
in climate and in health.

1136
00:45:55.745 –> 00:45:59.045
So I’m, I’m sort of trying to, uh, preempt that criticism

1137
00:45:59.185 –> 00:46:01.845
by actually prioritizing problems that have relevance

1138
00:46:01.845 –> 00:46:03.885
for climate and health from the outset. Uh, let’s

1139
00:46:04.125 –> 00:46:05.285
Just talk a little bit more about the kind

1140
00:46:05.285 –> 00:46:06.325
of international framework.

1141
00:46:06.325 –> 00:46:08.925
So the organization might also collaborate internationally,

1142
00:46:09.145 –> 00:46:10.405
um, in some meaningful way.

1143
00:46:10.425 –> 00:46:11.885
And, and there’s also, I mean, there’s sort

1144
00:46:11.885 –> 00:46:12.725
of this looming question,

1145
00:46:12.725 –> 00:46:13.925
maybe it’s a little bit depressing.

1146
00:46:13.925 –> 00:46:16.245
If you think, well, this great organization is gonna build

1147
00:46:16.245 –> 00:46:18.965
all this capability for, for, uh, gatekeeping, uh,

1148
00:46:19.365 –> 00:46:21.005
safeguarding these powerful AI systems,

1149
00:46:21.225 –> 00:46:23.525
but then somebody else just doesn’t use it.

1150
00:46:23.905 –> 00:46:26.285
What’s the kind of landscape of how this would spread,

1151
00:46:26.305 –> 00:46:28.565
how this would collaborate across different institutions

1152
00:46:28.945 –> 00:46:29.945
As context,

1153
00:46:30.225 –> 00:46:34.925
We are currently in a competitive race towards more

1154
00:46:34.925 –> 00:46:36.885
and more advanced AI, which could become dangerous.

1155
00:46:37.145 –> 00:46:40.765
And, and the competition is something that, you know,

1156
00:46:40.765 –> 00:46:45.125
drives innovation, but also has a tendency of, uh,

1157
00:46:45.525 –> 00:46:48.005
ignoring risk externalities, you know,

1158
00:46:48.015 –> 00:46:51.245
makes people maybe biased towards optimism

1159
00:46:51.505 –> 00:46:52.685
and willing to take risks

1160
00:46:52.955 –> 00:46:54.445
that they would otherwise not take.

1161
00:46:54.705 –> 00:46:58.485
So if we wanna really go through, you know, AGI,

1162
00:46:58.625 –> 00:47:02.365
as you know, as societies, democratic societies, we need

1163
00:47:02.365 –> 00:47:06.365
to find ways to coordinate, uh, to minimize these effects.

1164
00:47:06.875 –> 00:47:10.165
That means more collaboration between

1165
00:47:10.705 –> 00:47:14.325
the organizations involved in, in, in all this, including

1166
00:47:14.475 –> 00:47:17.045
between the countries involved in all this,

1167
00:47:17.135 –> 00:47:19.285
especially if those countries share values.

1168
00:47:19.705 –> 00:47:23.845
The idea, you know, the hope is that with this program

1169
00:47:24.065 –> 00:47:27.085
and TA two, it’s not just a particular research project.

1170
00:47:27.715 –> 00:47:30.165
It’s, uh, you know, one of the pieces

1171
00:47:30.705 –> 00:47:34.205
of future international collaboration on AI safety,

1172
00:47:34.855 –> 00:47:37.405
which hopefully will involve many other players.

1173
00:47:37.625 –> 00:47:39.405
We, we don’t know which path is gonna work.

1174
00:47:39.465 –> 00:47:41.565
So we need many groups in the world.

1175
00:47:41.925 –> 00:47:43.605
I mean, humanity needs many groups in the world

1176
00:47:43.905 –> 00:47:45.245
to explore these kinds of questions,

1177
00:47:45.785 –> 00:47:49.925
and we need those groups to collaborate more than compete

1178
00:47:50.265 –> 00:47:51.285
as much as possible.

1179
00:47:51.745 –> 00:47:56.485
So, so yes, we, we absolutely, uh, you know, wanna give

1180
00:47:56.595 –> 00:48:00.045
that flexibility of collaboration globally to

1181
00:48:00.585 –> 00:48:02.885
the organization that’s gonna be funded, of course,

1182
00:48:03.315 –> 00:48:06.565
with the usual governance, uh, goals

1183
00:48:06.665 –> 00:48:09.885
and making sure that the collaborations continue to align

1184
00:48:10.315 –> 00:48:12.245
with, uh, the missions we, we discussed.

1185
00:48:12.465 –> 00:48:14.485
And that means partnerships, uh,

1186
00:48:14.795 –> 00:48:18.285
that can involve sharing information, uh, in, in a way

1187
00:48:18.285 –> 00:48:19.765
that is aligned with all the security

1188
00:48:20.105 –> 00:48:24.525
and goals of, of working to create a better future,

1189
00:48:25.265 –> 00:48:28.045
uh, with international counterparts, uh,

1190
00:48:28.055 –> 00:48:29.885
maybe even international organizations

1191
00:48:29.885 –> 00:48:31.045
that will be created in the future.

1192
00:48:31.385 –> 00:48:33.125
So one way to think about this, uh,

1193
00:48:33.145 –> 00:48:35.565
is something like the International Space Station,

1194
00:48:35.565 –> 00:48:38.685
where you have a number of national organizations

1195
00:48:39.205 –> 00:48:42.165
collaborating on a very, very difficult scientific challenge

1196
00:48:42.865 –> 00:48:46.485
and sharing to some extent in a way that we coordinate

1197
00:48:47.185 –> 00:48:49.565
and we advance more efficiently than, you know,

1198
00:48:49.585 –> 00:48:52.005
if each country was doing their own thing and,

1199
00:48:52.005 –> 00:48:54.765
and then competing in a, in a way that could be dangerous.

1200
00:48:54.825 –> 00:48:57.565
For example, militarily, these analogies carry

1201
00:48:57.625 –> 00:48:59.205
to advanced AI.

1202
00:48:59.665 –> 00:49:01.805
We, we have all of that in the back of our minds.

1203
00:49:02.225 –> 00:49:04.565
It also seems like, uh, sort of at a technical level,

1204
00:49:04.715 –> 00:49:07.245
this idea of being able to have formal verification

1205
00:49:07.245 –> 00:49:09.285
or quantitative guarantees, some

1206
00:49:09.285 –> 00:49:11.485
of the other techniques you talked about could sort

1207
00:49:11.485 –> 00:49:14.245
of facilitate some of these cooperation aspects, right?

1208
00:49:14.245 –> 00:49:15.965
It’s, if, if I just say, Hey, you have to believe me,

1209
00:49:16.025 –> 00:49:19.125
my model, it’s aligned, uh, with someone

1210
00:49:19.225 –> 00:49:21.805
or something that’s, that’s not so, so, so believable.

1211
00:49:21.825 –> 00:49:24.965
But if I say, hey, you know, here is this zero knowledge,

1212
00:49:24.965 –> 00:49:26.605
you know, proof that it’s doing the right thing

1213
00:49:26.605 –> 00:49:27.965
or something like that, your,

1214
00:49:27.965 –> 00:49:29.845
your AI can test whether my AI

1215
00:49:29.845 –> 00:49:30.965
is doing the right thing. The

1216
00:49:30.965 –> 00:49:33.245
Point about zero knowledge is, um, is really important,

1217
00:49:33.505 –> 00:49:36.045
um, because in a lot of, uh,

1218
00:49:36.045 –> 00:49:38.925
international assurance contexts today and,

1219
00:49:39.065 –> 00:49:40.685
and over the course of the 20th century,

1220
00:49:40.865 –> 00:49:43.245
the primary tension has been between,

1221
00:49:43.505 –> 00:49:44.805
um, confidence building.

1222
00:49:45.265 –> 00:49:48.085
Uh, and on the other hand, uh, the facilitation

1223
00:49:48.265 –> 00:49:49.885
of espionage, uh,

1224
00:49:49.945 –> 00:49:52.805
by leaking additional information about capabilities

1225
00:49:52.825 –> 00:49:55.965
and context as part of that confidence building in, in,

1226
00:49:55.965 –> 00:49:58.405
in a pretty, you know, uh, inextricable way.

1227
00:49:58.705 –> 00:50:01.925
And many of the techniques, um, have involved, basically

1228
00:50:02.265 –> 00:50:06.045
how do we deploy, uh, kinds of sensors, kinds

1229
00:50:06.045 –> 00:50:07.525
of remote sensing, which

1230
00:50:07.525 –> 00:50:10.565
because of their kind of physical relationship to the,

1231
00:50:10.565 –> 00:50:12.805
the phenomenon that we’re trying to, to get assurance about,

1232
00:50:13.025 –> 00:50:15.925
reveal as little as possible of the unnecessary information,

1233
00:50:15.945 –> 00:50:18.445
but all of the necessary information to get that confidence.

1234
00:50:18.445 –> 00:50:19.965
When we’re, we’re talking about AI,

1235
00:50:19.975 –> 00:50:21.685
we’re talking about computer technology,

1236
00:50:21.825 –> 00:50:23.845
on the one hand there’s a bit of a disadvantage

1237
00:50:23.845 –> 00:50:27.005
because computations don’t really leave physical traces

1238
00:50:27.085 –> 00:50:30.005
particularly that distinguish them from other computations

1239
00:50:30.005 –> 00:50:31.045
of a similar size.

1240
00:50:31.185 –> 00:50:34.245
So we can’t rely as much on that kind of physical, uh,

1241
00:50:34.565 –> 00:50:37.045
surveillance approach, remote sensing approach to assurance.

1242
00:50:37.225 –> 00:50:39.485
But on the other hand, actually the upsides are

1243
00:50:39.485 –> 00:50:40.565
way bigger than that downside.

1244
00:50:40.565 –> 00:50:42.445
The upside is we could, on the device,

1245
00:50:42.445 –> 00:50:45.125
on the computing device actually use cryptographic

1246
00:50:45.285 –> 00:50:47.885
techniques, um, which include zero knowledge proofs,

1247
00:50:47.885 –> 00:50:49.205
both non-interactive and,

1248
00:50:49.225 –> 00:50:50.565
and actually interactive zero

1249
00:50:50.565 –> 00:50:51.925
knowledge proofs could come into this.

1250
00:50:51.925 –> 00:50:54.525
And what a zero knowledge proof means, if you don’t

1251
00:50:54.525 –> 00:50:57.645
know that term, is sort of, uh, just like it sounds, um,

1252
00:50:57.925 –> 00:51:01.205
a proof that, uh, doesn’t give you any bits of information,

1253
00:51:01.705 –> 00:51:03.445
uh, as the, the recipient

1254
00:51:03.445 –> 00:51:06.165
or the verifier of the proof, except the one bit

1255
00:51:06.165 –> 00:51:08.125
of information, that the property that

1256
00:51:08.125 –> 00:51:09.485
was proven to you is true.

1257
00:51:09.665 –> 00:51:12.365
So this is a very versatile technique, uh,

1258
00:51:12.365 –> 00:51:15.365
that we can use to produce all kinds of assurance mechanisms

1259
00:51:15.365 –> 00:51:17.965
that certain properties are true about AI systems

1260
00:51:17.965 –> 00:51:20.765
that are being operated by a party with, uh,

1261
00:51:20.765 –> 00:51:21.965
with no mutual trust.

1262
00:51:22.435 –> 00:51:25.685
That, uh, as long as we can, uh, be confident

1263
00:51:25.685 –> 00:51:29.085
that there are not other systems of, of similar

1264
00:51:29.185 –> 00:51:31.245
or greater size that we don’t know about

1265
00:51:31.245 –> 00:51:33.525
because of the, uh, kind of dynamics

1266
00:51:33.525 –> 00:51:34.965
of the semiconductor supply chain,

1267
00:51:35.085 –> 00:51:37.005
I think there’s quite a bit of hope about that as well.

1268
00:51:37.185 –> 00:51:38.965
So this could be really a basis if we can,

1269
00:51:39.065 –> 00:51:42.525
if we can formalize the property, the safety properties

1270
00:51:42.525 –> 00:51:45.925
that we want to agree on between a certain set of countries

1271
00:51:45.925 –> 00:51:48.325
and then maybe a certain broader set of countries, uh,

1272
00:51:48.445 –> 00:51:50.765
a smaller set of agreed upon safety properties.

1273
00:51:50.865 –> 00:51:53.365
If they’re formal, then we can verify them in zero knowledge

1274
00:51:53.425 –> 00:51:55.445
and, and that could be very powerful for assurance.
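
To make the "one bit of information" idea concrete, here is a toy Schnorr-style proof of knowledge of a discrete logarithm, made non-interactive with the Fiat-Shamir heuristic. The group parameters are deliberately tiny and insecure, and proving formal safety properties about an AI system would require far richer proof systems than this; the sketch only shows the shape of "verifier is convinced, but learns nothing beyond the statement".

```python
# A toy sketch of a Schnorr-style non-interactive proof of knowledge of a
# discrete logarithm. The verifier learns that the prover knows x with
# y = g^x mod p, and nothing else. Parameters are tiny and insecure; this
# is only to illustrate the zero knowledge idea, not a deployable scheme.
import hashlib
import secrets

P, Q, G = 23, 11, 2          # toy group: G has prime order Q in Z_P*

def challenge(*ints):
    data = b"|".join(str(i).encode() for i in ints)
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q

def prove(x):
    """Prover knows secret x; outputs public key y and proof (t, s)."""
    y = pow(G, x, P)
    r = secrets.randbelow(Q)
    t = pow(G, r, P)                  # commitment
    c = challenge(G, y, t)            # Fiat-Shamir challenge
    s = (r + c * x) % Q               # response
    return y, (t, s)

def verify(y, proof):
    t, s = proof
    c = challenge(G, y, t)
    return pow(G, s, P) == (t * pow(y, c, P)) % P

secret_x = 7
y, proof = prove(secret_x)
print(verify(y, proof))               # True: convinced without learning x
```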

1275
00:51:55.945 –> 00:51:57.765
Is there a component of also verifying the hardware

1276
00:51:57.765 –> 00:52:00.045
that does the zero knowledge proofs, or, yeah.

1277
00:52:00.045 –> 00:52:02.845
Well, when, when you receive a zero knowledge proof, uh,

1278
00:52:03.105 –> 00:52:06.045
you, you can actually, you know, be confident regardless

1279
00:52:06.105 –> 00:52:08.285
of whether the hardware that, uh, produced,

1280
00:52:08.285 –> 00:52:10.565
the proof is compromised, as long as the, as long

1281
00:52:10.565 –> 00:52:11.925
as you’re confident that the hardware

1282
00:52:11.925 –> 00:52:13.725
that’s verifying the proof is not compromised,

1283
00:52:13.745 –> 00:52:14.725
and you can have redundant

1284
00:52:14.725 –> 00:52:16.365
implementations to reduce that risk.

1285
00:52:16.625 –> 00:52:18.885
So one kind of international paradigm

1286
00:52:18.885 –> 00:52:21.565
that might be considered is that there is a, a group

1287
00:52:21.565 –> 00:52:24.485
of countries that all have their own implementation

1288
00:52:24.485 –> 00:52:27.285
of a verifier, and they all check the zero knowledge proofs

1289
00:52:27.395 –> 00:52:28.525
from every other country.

1290
00:52:28.625 –> 00:52:30.445
And so you get sort of an n squared, um,

1291
00:52:30.565 –> 00:52:31.845
verification complexity,

1292
00:52:31.845 –> 00:52:33.845
but you really reduce the risk of a proof that, um,

1293
00:52:33.845 –> 00:52:36.045
somehow exploits the verification hardware,

1294
00:52:36.065 –> 00:52:39.125
but the verification of the inference hardware

1295
00:52:39.145 –> 00:52:41.365
or of the training hardware, those are kind

1296
00:52:41.365 –> 00:52:44.925
of important steps toward a world in which we can be

1297
00:52:44.925 –> 00:52:47.965
confident about what all of the large AI systems

1298
00:52:47.995 –> 00:52:49.765
that are being inferenced or trained are.

1299
00:52:49.865 –> 00:52:52.445
And, and so if we have a verification that on the device,

1300
00:52:52.665 –> 00:52:56.045
it, it says, if you are above a, a certain, um,

1301
00:52:56.115 –> 00:52:58.245
size threshold in parameters like the total number

1302
00:52:58.245 –> 00:53:00.805
of flops in a computation or the total number of parameters

1303
00:53:00.805 –> 00:53:03.805
or data under certain conditions, um, we wanna verify

1304
00:53:03.805 –> 00:53:05.045
that this hardware will not,

1305
00:53:05.045 –> 00:53:07.845
will not run such a computation without, uh,

1306
00:53:07.845 –> 00:53:10.645
doing some additional thing like running an eval

1307
00:53:10.645 –> 00:53:11.965
for some dangerous capabilities.

1308
00:53:12.105 –> 00:53:14.805
And if those evals then turn up dangerous capabilities,

1309
00:53:14.835 –> 00:53:16.725
then we wanna say you can’t inference

1310
00:53:16.725 –> 00:53:19.405
that model without doing some further additional thing, uh,

1311
00:53:19.555 –> 00:53:22.605
such as formally verifying, uh, some safety properties

1312
00:53:22.625 –> 00:53:24.525
and putting a bound on the transfer entropy

1313
00:53:24.595 –> 00:53:25.685
that the model on the

1314
00:53:25.685 –> 00:53:27.565
inside can communicate to the outside world.
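
Purely as an illustration of the policy shape being described, here is a sketch with hypothetical thresholds and stubbed-out checks; in a real mechanism each boolean below would be backed by measurement, attestation and, ideally, verifiable proofs rather than self-reported flags.

```python
# Purely illustrative sketch of the on-device policy described above, with
# hypothetical thresholds and stubbed-out checks. Not a real attestation
# mechanism: in practice each predicate would come from measurement and
# verifiable evidence, not booleans.
from dataclasses import dataclass

FLOP_THRESHOLD = 1e25            # hypothetical training-compute threshold
PARAM_THRESHOLD = 1e11           # hypothetical parameter-count threshold
MAX_TRANSFER_ENTROPY_BITS = 1e6  # hypothetical bound on outward communication

@dataclass
class ModelJob:
    training_flops: float
    parameters: float
    eval_run: bool                    # dangerous-capability eval was performed
    eval_found_danger: bool           # the eval flagged dangerous capabilities
    safety_properties_verified: bool  # formally verified safety properties
    transfer_entropy_bound_bits: float

def inference_allowed(job: ModelJob) -> bool:
    above_threshold = (job.training_flops > FLOP_THRESHOLD
                       or job.parameters > PARAM_THRESHOLD)
    if not above_threshold:
        return True                   # below the threshold: no extra conditions
    if not job.eval_run:
        return False                  # large model: evals are a precondition
    if not job.eval_found_danger:
        return True                   # evals came back clean
    # Dangerous capabilities were found: require the further conditions.
    return (job.safety_properties_verified
            and job.transfer_entropy_bound_bits <= MAX_TRANSFER_ENTROPY_BITS)

print(inference_allowed(ModelJob(3e25, 2e11, True, True, True, 5e5)))   # True
print(inference_allowed(ModelJob(3e25, 2e11, True, True, False, 5e5)))  # False
```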

1315
00:53:28.185 –> 00:53:29.765
So maybe, just briefly,

1316
00:53:29.865 –> 00:53:31.925
but let, let’s, let’s imagine we’re six months in,

1317
00:53:31.925 –> 00:53:34.605
we’re at a, you know, quarterly meeting of, of the, the sort

1318
00:53:34.605 –> 00:53:37.565
of technical, technical leads in this new organization.

1319
00:53:37.625 –> 00:53:39.325
You have people working on machine learning

1320
00:53:39.325 –> 00:53:42.205
for the world modeling, uh, the verification and,

1321
00:53:42.225 –> 00:53:45.605
and guarantees, other parts building the specifications.

1322
00:53:45.675 –> 00:53:46.605
What are the sort of key

1323
00:53:46.605 –> 00:53:48.125
capabilities that they’re developing?

1324
00:53:48.195 –> 00:53:50.125
What are the key challenges you envision?

1325
00:53:50.155 –> 00:53:51.885
What, what would it actually take to,

1326
00:53:51.885 –> 00:53:53.445
to build this organization, to solve these?

1327
00:53:53.505 –> 00:53:55.325
All of what we’ve been doing has been assuming

1328
00:53:55.325 –> 00:53:56.845
that we can solve these technical challenges.

1329
00:53:57.385 –> 00:54:00.325
So I would say the first, um, the first part

1330
00:54:00.325 –> 00:54:03.805
of the workflow, which is not necessarily the first research

1331
00:54:04.085 –> 00:54:06.685
priority, but it, but it may well be, is, um, helping

1332
00:54:06.745 –> 00:54:08.725
to construct world models and specifications.

1333
00:54:08.945 –> 00:54:12.645
And we could approach that even with present models, uh,

1334
00:54:12.715 –> 00:54:13.725
just with scaffolding

1335
00:54:13.725 –> 00:54:15.725
and prompting to, uh, sort of learn, even

1336
00:54:15.725 –> 00:54:18.045
with present probabilistic programming languages.

1337
00:54:18.105 –> 00:54:21.245
Uh, there’s a paper called From Word Models to World Models

1338
00:54:21.395 –> 00:54:23.485
that took a, a very first step in that direction

1339
00:54:23.515 –> 00:54:25.765
with GPT-3.5, uh, and,

1340
00:54:25.765 –> 00:54:28.085
and with the Church probabilistic programming language,

1341
00:54:28.085 –> 00:54:30.565
surprisingly good results given that they, uh,

1342
00:54:30.825 –> 00:54:33.605
did no prompting, no scaffolding, no fine tuning, really.

1343
00:54:33.845 –> 00:54:35.245
I mean, very, very simple prompting.

1344
00:54:35.275 –> 00:54:36.965
Just a few examples in the prompt,

1345
00:54:37.105 –> 00:54:39.045
And just so I understand, so probabilistic programming,

1346
00:54:39.045 –> 00:54:41.485
this is a way that you could specify these world

1347
00:54:41.485 –> 00:54:42.565
models, right?

1348
00:54:42.665 –> 00:54:45.285
As some kind of programs, um, you know,

1349
00:54:45.285 –> 00:54:47.445
if the power grid does this, then there’s this probability

1350
00:54:47.475 –> 00:54:49.005
that this next thing will happen.

1351
00:54:49.585 –> 00:54:50.885
Um, you wanna build up a program

1352
00:54:50.955 –> 00:54:52.565
that describes all that, right?
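
As a minimal sketch of that idea, here is a toy generative "world model" written in plain Python (a dedicated probabilistic programming language such as Church or Gen would be the real tool), with a crude Monte Carlo query over it. All of the structure and probabilities are hypothetical placeholders.

```python
# A minimal sketch, in plain Python rather than a real probabilistic
# programming language, of a world model as a generative program: "if the
# grid does this, there's this probability that that happens". The
# probabilities and structure are hypothetical placeholders.
import random

def grid_world_model(radios_switched_off):
    """One sampled trajectory of a toy overnight grid/telecom fragment."""
    windy_night = random.random() < 0.4                  # prior on weather
    demand_spike = random.random() < (0.05 if windy_night else 0.15)
    capacity_margin = 30.0 - 2.0 * radios_switched_off   # toy relationship
    if demand_spike:
        capacity_margin -= 25.0
    shortfall = capacity_margin < 0.0
    return {"windy_night": windy_night,
            "demand_spike": demand_spike,
            "shortfall": shortfall}

def probability_of(event, model, *args, samples=100_000):
    """Crude Monte Carlo query against the generative world model."""
    hits = sum(event(model(*args)) for _ in range(samples))
    return hits / samples

p = probability_of(lambda world: world["shortfall"], grid_world_model, 5)
print(f"P(shortfall | 5 radios off) ≈ {p:.3f}")
```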

1353
00:54:52.725 –> 00:54:55.685
I, I would say, you know, hey, um, I want a power grid

1354
00:54:55.685 –> 00:54:57.605
that has this much wind power, et cetera.

1355
00:54:57.835 –> 00:55:00.565
Yeah. Sort of provide access to the, to the spreadsheets.

1356
00:55:00.565 –> 00:55:02.325
And there’s a multimodal aspect

1357
00:55:02.325 –> 00:55:05.285
because a lot of the technical data is actually on paper.

1358
00:55:05.465 –> 00:55:07.445
So scanning in the technical data as images

1359
00:55:07.545 –> 00:55:09.605
and then feeding it to a multimodal model

1360
00:55:09.605 –> 00:55:11.205
and saying, you know, please, please write down

1361
00:55:11.985 –> 00:55:14.525
what’s going on in this picture in a, in a formal language.

1362
00:55:14.545 –> 00:55:16.445
And then there’s a scaffolding aspect to that,

1363
00:55:16.445 –> 00:55:19.045
because of course, the first time you, you, you do one

1364
00:55:19.045 –> 00:55:20.525
of these things, you’re probably gonna get it wrong.

1365
00:55:20.545 –> 00:55:21.645
So you need to go back and,

1366
00:55:21.665 –> 00:55:24.005
and revise, uh, at, at the end of the day, of course,

1367
00:55:24.005 –> 00:55:26.045
humans should revise and do a sign off.

1368
00:55:26.305 –> 00:55:29.005
Um, but there’s quite a lot, I think, to be gained, um,

1369
00:55:29.665 –> 00:55:31.765
by just having agents interact in,

1370
00:55:31.825 –> 00:55:32.925
in ways that humans might.

1371
00:55:33.065 –> 00:55:35.605
And so a new agent can come along

1372
00:55:35.605 –> 00:55:37.645
and criticize. This is work that’s being done.

1373
00:55:37.775 –> 00:55:40.325
Again, in the software domain, in the pure cyber domain,

1374
00:55:40.325 –> 00:55:42.605
when we’re talking about writing Python code, there’s a lot

1375
00:55:42.605 –> 00:55:45.485
of work being done on how to, how to scaffold agents to act

1376
00:55:45.485 –> 00:55:48.085
as code reviewers and to give constructive criticism and,

1377
00:55:48.085 –> 00:55:50.205
and revise things and, and thereby autonomously

1378
00:55:50.545 –> 00:55:51.545
And, uh, Yoshua.

1379
00:55:51.705 –> 00:55:53.525
Um, when you think about building the world model,

1380
00:55:53.625 –> 00:55:55.565
are you also thinking maybe there’s something

1381
00:55:55.565 –> 00:55:58.005
that writes probabilistic programming code for you?

1382
00:55:58.025 –> 00:55:59.085
Or are you thinking about a different

1383
00:55:59.085 –> 00:56:00.165
way of building the world models

1384
00:56:00.385 –> 00:56:01.385
Or? So the way

1385
00:56:01.385 –> 00:56:04.725
I think about this is leveraging the advances in

1386
00:56:04.725 –> 00:56:07.485
generative machine learning to produce the pieces

1387
00:56:07.485 –> 00:56:09.805
of the world model, train this thing so

1388
00:56:09.805 –> 00:56:12.445
that it produces pieces that are coherent

1389
00:56:12.445 –> 00:56:14.285
with each other, but is also able to generate

1390
00:56:14.795 –> 00:56:18.445
alternative interpretations for things we’re not sure of.

1391
00:56:18.845 –> 00:56:20.445
A lot hinges on

1392
00:56:20.905 –> 00:56:22.645
how do you define the language in which

1393
00:56:23.025 –> 00:56:24.205
the knowledge is expressed.

1394
00:56:24.505 –> 00:56:27.365
And as we said earlier, we want it to be interpretable,

1395
00:56:27.465 –> 00:56:30.085
so close to natural language, maybe close to mathematics

1396
00:56:30.185 –> 00:56:33.405
and logic, and able to refer to concepts

1397
00:56:33.405 –> 00:56:35.285
that we use in science, for example.

1398
00:56:35.425 –> 00:56:38.405
Um, so I think there’s, there’s a lot of, uh, thinking to go

1399
00:56:38.555 –> 00:56:42.885
through here in terms of what is the right formalism,

1400
00:56:42.885 –> 00:56:44.445
for example, we’d like this language

1401
00:56:44.505 –> 00:56:46.085
to capture not just logic,

1402
00:56:46.145 –> 00:56:48.885
but also things like causality and interventions.

1403
00:56:49.185 –> 00:56:53.005
If, you know, it does math, you need axioms; there’s lots

1404
00:56:53.005 –> 00:56:54.565
of questions here that, you know,

1405
00:56:54.565 –> 00:56:55.605
choices that need to be made.

1406
00:56:55.605 –> 00:56:57.765
And then on the machine learning side, what sorts

1407
00:56:57.765 –> 00:56:59.125
of techniques already exist

1408
00:56:59.265 –> 00:57:02.565
or need to be improved to scale to problems

1409
00:57:02.585 –> 00:57:04.925
of the sort we wanna do in TA three, for example.

1410
00:57:05.145 –> 00:57:07.485
So there are machine learning methods to generate,

1411
00:57:07.875 –> 00:57:09.605
like Bayesian hypotheses, for example.

1412
00:57:09.905 –> 00:57:12.845
How do we scale them up, uh, to the size

1413
00:57:12.845 –> 00:57:14.485
of problems like, you know, a power grid?

1414
00:57:14.605 –> 00:57:17.765
I think there’s a lot of, uh, questions there that need

1415
00:57:17.765 –> 00:57:19.485
to be thought about. Yeah.

1416
00:57:19.485 –> 00:57:22.045
And on the language side, um, we’re, we’re starting

1417
00:57:22.105 –> 00:57:23.885
to build these kinds of formalisms in

1418
00:57:23.885 –> 00:57:25.205
technical area one already.

1419
00:57:25.425 –> 00:57:27.845
Um, people started on that about a month ago.

1420
00:57:28.025 –> 00:57:30.565
But it’s sort of a question of how do we, uh,

1421
00:57:30.635 –> 00:57:32.085
once we have new formalisms

1422
00:57:32.085 –> 00:57:33.925
and languages, the machine learning question is

1423
00:57:33.925 –> 00:57:36.685
how do we adapt agents to, uh, to be fluent

1424
00:57:36.685 –> 00:57:38.125
and competent in those languages?

1425
00:57:38.265 –> 00:57:39.685
Uh, primarily through fine tuning,

1426
00:57:39.745 –> 00:57:41.765
but also perhaps reinforcement learning from

1427
00:57:41.815 –> 00:57:42.925
persistent feedback.
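
One very rough sketch of what "becoming fluent in a new formalism" could look like in practice is a rejection-sampling data loop: sample candidate expressions, keep only those an automatic checker accepts, and use the survivors as fine-tuning data. The generator and checker below are stand-ins, not the programme's actual method.

    import random

    # Stand-ins: `toy_generate` plays the role of a pretrained model prompted with
    # the TA1 formalism, and `toy_check` plays the role of its parser/type checker.
    CANDIDATES = ["rate(demand) <= 1.2", "always(load < capacity)", "load << capacity)"]

    def toy_generate():
        return random.choice(CANDIDATES)

    def toy_check(expr):
        # Placeholder well-formedness check: balanced parentheses only.
        return expr.count("(") == expr.count(")")

    finetune_set = []
    for _ in range(100):
        candidate = toy_generate()
        if toy_check(candidate):          # only checker-approved outputs survive
            finetune_set.append(candidate)

    print(len(finetune_set), "accepted examples for the next fine-tuning round")
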

1428
00:57:43.065 –> 00:57:45.165
But at some level there’s this sort of symbolic,

1429
00:57:45.165 –> 00:57:46.405
there’s a symbolic language.

1430
00:57:46.405 –> 00:57:49.125
The world model is not just weights, it’s That’s right.

1431
00:57:49.125 –> 00:57:50.525
That’s a language. Yeah, that’s right.

1432
00:57:50.525 –> 00:57:52.005
That’s a, that’s a crucial component. It

1433
00:57:52.005 –> 00:57:54.525
Might incorporate components that have parameters

1434
00:57:54.525 –> 00:57:55.645
that are learned from data,

1435
00:57:55.945 –> 00:57:58.845
but those parameters should be, uh, kind of labeled

1436
00:57:58.995 –> 00:58:00.565
with a human understandable name,

1437
00:58:00.735 –> 00:58:01.735
Right? So, so one way

1438
00:58:01.735 –> 00:58:02.925
I like to think about this is

1439
00:58:02.925 –> 00:58:04.205
that the concepts

1440
00:58:04.205 –> 00:58:06.565
that are being manipulated can be verbalized

1441
00:58:06.945 –> 00:58:09.325
and, you know, formalized the relationships

1442
00:58:09.325 –> 00:58:11.445
between those concepts as much as possible.

1443
00:58:11.445 –> 00:58:12.645
They should also be interpretable.

1444
00:58:12.645 –> 00:58:15.925
But sometimes there is no simple short formula

1445
00:58:16.075 –> 00:58:20.445
that explains a, a complicated relationship

1446
00:58:20.445 –> 00:58:22.285
between some small number of variables.

1447
00:58:22.345 –> 00:58:24.325
And there, uh, we’d like the way

1448
00:58:24.325 –> 00:58:27.045
we train the machine learning to be such that it wants

1449
00:58:27.065 –> 00:58:30.765
to generate as simple as possible, uh, an explanation

1450
00:58:30.765 –> 00:58:33.045
that humans can take, take out independently

1451
00:58:33.045 –> 00:58:34.245
of the rest of the world model.

1452
00:58:34.465 –> 00:58:37.205
And, you know, like try to understand it, plot

1453
00:58:37.305 –> 00:58:38.925
how it looks like, and so on, and,

1454
00:58:38.925 –> 00:58:40.965
and see that it, it conforms to their intuition.

1455
00:58:41.265 –> 00:58:43.925
So part of, part of this will be formulating the languages

1456
00:58:43.925 –> 00:58:46.205
with the ta One part, I think it’s fair

1457
00:58:46.205 –> 00:58:47.205
to say the world modeling part

1458
00:58:47.205 –> 00:58:49.965
of this has some very new aspects

1459
00:58:49.965 –> 00:58:51.365
of machine learning in terms of

1460
00:58:51.365 –> 00:58:53.125
how do you translate into that space.

1461
00:58:53.755 –> 00:58:55.805
Well, there, there’s also the interactive aspect.

1462
00:58:56.145 –> 00:58:58.765
We, you know, we mentioned generative models, you know,

1463
00:58:58.785 –> 00:59:01.285
the language, but there’s also like humans in the loop here.

1464
00:59:01.345 –> 00:59:03.565
So the, that’s one reason

1465
00:59:03.585 –> 00:59:05.405
for being interpretable in that language.

1466
00:59:05.505 –> 00:59:07.725
But, but another is that we discussed earlier,

1467
00:59:08.185 –> 00:59:09.965
the uncertainty can be something

1468
00:59:09.965 –> 00:59:11.565
that’s can sometimes be resolved

1469
00:59:11.565 –> 00:59:12.965
by having an interaction with humans.

1470
00:59:13.385 –> 00:59:14.525
So they have to at least be sort

1471
00:59:14.525 –> 00:59:17.045
of the peer reviewer at some bottom layer.

1472
00:59:17.075 –> 00:59:18.125
Okay? So there’s machine learning

1473
00:59:18.145 –> 00:59:21.125
and constructing the world model itself, helping humans

1474
00:59:21.305 –> 00:59:23.005
to construct and audit the world model.

1475
00:59:23.625 –> 00:59:25.285
And another piece of that is, is sort

1476
00:59:25.285 –> 00:59:26.805
of constructing the specifications

1477
00:59:27.145 –> 00:59:29.805
and, uh, specifications are a lot like world models.

1478
00:59:30.345 –> 00:59:31.605
Um, but instead

1479
00:59:31.605 –> 00:59:33.525
of being purely descriptive, they’re normative.

1480
00:59:33.785 –> 00:59:37.125
So they’re, they’re, they’re specifying, uh, the dynamics of

1481
00:59:37.355 –> 00:59:40.965
what, uh, our judgment, uh, collectively

1482
00:59:40.965 –> 00:59:44.485
as the human overseers would be, um, as certain, uh,

1483
00:59:44.485 –> 00:59:46.525
factual things change in the underlying,

1484
00:59:46.525 –> 00:59:47.645
uh, descriptive model.

1485
00:59:48.265 –> 00:59:51.285
And those specifications also should be auditable

1486
00:59:51.425 –> 00:59:53.485
and audited, um, by, by humans.

1487
00:59:54.025 –> 00:59:56.325
And there are also additional sources of,

1488
00:59:56.345 –> 00:59:57.565
of data that can be brought in.

1489
00:59:57.785 –> 00:59:59.765
So like Yasha was saying, for the world models,

1490
00:59:59.865 –> 01:00:02.885
we can bring in scientific literature, we can bring in, uh,

1491
01:00:03.275 –> 01:00:06.325
data, uh, in addition to both the human, uh,

1492
01:00:06.325 –> 01:00:09.205
understanding human concepts, uh, human dialogue

1493
01:00:09.465 –> 01:00:12.405
and the machine’s own kind of artificial intuition.

1494
01:00:12.865 –> 01:00:16.645
Um, on the specification side, uh, we can’t really bring in,

1495
01:00:16.785 –> 01:00:20.005
uh, data, but what we can, what we can bring in is, uh,

1496
01:00:20.845 –> 01:00:23.965
a machine model of human surprise, uh, and,

1497
01:00:23.985 –> 01:00:25.565
and sort of, uh, human disapproval.

1498
01:00:25.865 –> 01:00:29.165
So we can actually optimize for trajectories that, uh,

1499
01:00:29.165 –> 01:00:30.845
according to the current version

1500
01:00:30.845 –> 01:00:33.765
of the specification are okay, uh,

1501
01:00:33.765 –> 01:00:35.445
but which the machine predicts

1502
01:00:35.445 –> 01:00:38.045
that a human would find either surprising or,

1503
01:00:38.265 –> 01:00:39.925
or, uh, revolting.

1504
01:00:40.225 –> 01:00:42.285
Um, and then sort of surface, okay,

1505
01:00:42.505 –> 01:00:44.925
it seems like you might have a gap in your specification.

1506
01:00:45.305 –> 01:00:48.005
Um, and then even go on to propose, you know, here are some

1507
01:00:48.005 –> 01:00:51.125
of the, the key variables that cause me to think

1508
01:00:51.125 –> 01:00:52.925
that you wouldn’t like this trajectory

1509
01:00:52.995 –> 01:00:55.045
that your current specification approves of.

1510
01:00:55.215 –> 01:00:56.845
Would you like to add this new

1511
01:00:57.095 –> 01:00:58.725
constraint to the specification?
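
As a purely illustrative sketch of that gap-surfacing loop (all functions here are made-up stand-ins): sample trajectories that the current specification approves of, score them with a learned model of human surprise or disapproval, and surface the worst offenders together with the variables that drive the score.

    import random

    def sample_trajectory():
        # Stand-in for rolling out the world model.
        return {"frequency_hz": random.gauss(50.0, 0.2),
                "voltage_kv": random.gauss(400.0, 15.0)}

    def current_spec_ok(traj):
        # Stand-in for checking the current (incomplete) specification.
        return abs(traj["frequency_hz"] - 50.0) < 0.5

    def predicted_disapproval(traj):
        # Stand-in for a learned model of human disapproval; this proxy dislikes
        # large voltage excursions the spec currently says nothing about.
        return abs(traj["voltage_kv"] - 400.0) / 400.0

    candidates = [t for t in (sample_trajectory() for _ in range(1000))
                  if current_spec_ok(t)]
    flagged = sorted(candidates, key=predicted_disapproval, reverse=True)[:3]
    for t in flagged:
        print("Spec approves, but predicted disapproval is high:", t,
              "- consider adding a constraint on voltage_kv?")
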

1512
01:00:59.025 –> 01:01:00.885
Um, so there’s that collaborative process as well.

1513
01:01:00.885 –> 01:01:02.965
That’s also part of TA two A.

1514
01:01:03.445 –> 01:01:05.565
’cause those are very similar, um, uh, types

1515
01:01:05.565 –> 01:01:07.565
of collaboration, but it needs to be a,

1516
01:01:07.645 –> 01:01:09.285
a separate fine-tuning. I would

1517
01:01:09.315 –> 01:01:11.285
Also add that ideally

1518
01:01:11.595 –> 01:01:15.525
what the machine learning component does here is also model

1519
01:01:15.585 –> 01:01:18.405
the uncertainty about yes, the specification, right?

1520
01:01:18.465 –> 01:01:21.645
So the humans might not be sufficiently clear

1521
01:01:22.105 –> 01:01:23.925
and maybe they’re not even able to express

1522
01:01:24.075 –> 01:01:26.085
with full clarity what they want.

1523
01:01:26.085 –> 01:01:28.285
Think about, you know, laws, we write laws,

1524
01:01:28.305 –> 01:01:30.085
but they’re not like a hundred percent clear

1525
01:01:30.345 –> 01:01:31.685
for all kinds of reasons.

1526
01:01:32.385 –> 01:01:34.205
Uh, but a lot of it has to do

1527
01:01:34.205 –> 01:01:36.445
with natural language, which is ambiguous.

1528
01:01:36.625 –> 01:01:39.125
And, and so we would like the AI ideally

1529
01:01:39.345 –> 01:01:42.285
to either help resolve those uncertainties,

1530
01:01:42.465 –> 01:01:43.565
as David was saying,

1531
01:01:43.665 –> 01:01:45.605
or maintain the uncertainty if we’re not able

1532
01:01:45.605 –> 01:01:49.525
to resolve them, so that later when the AI proposes

1533
01:01:49.525 –> 01:01:51.965
particular things to do, we take into account

1534
01:01:51.965 –> 01:01:54.445
that uncertainty to be playing on the conservative side.

1535
01:01:54.545 –> 01:01:56.925
If we’re not sure that something would be considered

1536
01:01:57.205 –> 01:01:59.645
unacceptable by humans, then we just don’t do it, right?
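
A tiny sketch, under stated assumptions rather than as the programme's design, of that conservative rule: keep several candidate readings of an ambiguous requirement, with weights, and only allow an action if the probability mass of readings that would call it unacceptable stays inside a small risk budget.

    candidate_specs = [
        {"max_load_fraction": 0.90, "weight": 0.5},   # strict reading
        {"max_load_fraction": 0.95, "weight": 0.3},   # moderate reading
        {"max_load_fraction": 1.00, "weight": 0.2},   # permissive reading
    ]

    def acceptable_under(spec, predicted_load_fraction):
        return predicted_load_fraction <= spec["max_load_fraction"]

    def allowed(predicted_load_fraction, risk_budget=0.05):
        # Probability mass of readings that would call the action unacceptable.
        p_unacceptable = sum(s["weight"] for s in candidate_specs
                             if not acceptable_under(s, predicted_load_fraction))
        return p_unacceptable <= risk_budget

    print(allowed(0.88))   # True: fine under every reading
    print(allowed(0.97))   # False: likely readings would call this unacceptable
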

1537
01:01:59.645 –> 01:02:02.725
Mm-hmm. And, and there are like interesting questions about

1538
01:02:02.835 –> 01:02:06.245
that uncertainty, and mathematics can come in here

1539
01:02:06.625 –> 01:02:09.205
to give us guarantees that we will not go

1540
01:02:09.205 –> 01:02:11.765
through a threshold of, uh, well,

1541
01:02:11.865 –> 01:02:14.405
it might be dangerous given the uncertainty.

1542
01:02:14.625 –> 01:02:16.285
So there’s sort of a collaborative authoring

1543
01:02:16.285 –> 01:02:17.445
process for the world model.

1544
01:02:17.715 –> 01:02:20.045
There’s sort of collaborative authoring of,

1545
01:02:20.065 –> 01:02:21.325
of these specifications.

1546
01:02:21.665 –> 01:02:23.045
Yep. And these are both in sort

1547
01:02:23.045 –> 01:02:25.125
of unprecedentedly complicated situations.

1548
01:02:25.165 –> 01:02:26.925
I mean, you can talk about specifications for, you know,

1549
01:02:27.085 –> 01:02:28.325
formally verified software.

1550
01:02:28.465 –> 01:02:30.165
That’s one thing, but you’re also talking about

1551
01:02:30.365 –> 01:02:32.405
specifications in this relative to this world model,

1552
01:02:32.405 –> 01:02:35.005
which might be this complicated messier thing, okay.

1553
01:02:35.005 –> 01:02:37.205
Translating back and forth into these languages

1554
01:02:37.205 –> 01:02:38.565
that are kinda maybe generalizations

1555
01:02:38.565 –> 01:02:40.125
of probabilistic programming in some way,

1556
01:02:40.145 –> 01:02:42.845
or sort of broader languages, um, interacting with humans.

1557
01:02:42.985 –> 01:02:44.725
So, so that there’s gonna be a presentation at the,

1558
01:02:44.725 –> 01:02:46.805
at the group meeting about how that’s progressing.

1559
01:02:47.545 –> 01:02:49.645
Um, uh, what are some of the other kind of pillars

1560
01:02:49.645 –> 01:02:51.125
of technical work that that have to happen

1561
01:02:51.125 –> 01:02:52.365
and what are their key challenges? And

1562
01:02:52.465 –> 01:02:54.205
So that, that next big piece is, uh,

1563
01:02:54.635 –> 01:02:57.605
what we call in the thesis coherent reasoning ML, um,

1564
01:02:57.735 –> 01:03:00.525
which is, is basically, uh, approximate, um,

1565
01:03:00.675 –> 01:03:02.165
amortized Bayesian inference.

1566
01:03:02.305 –> 01:03:05.125
Uh, but we want to do that with, uh, search

1567
01:03:05.125 –> 01:03:07.565
and learning so that the, the bitter lesson doesn’t,

1568
01:03:07.585 –> 01:03:08.645
uh, rule it out somehow.

1569
01:03:09.105 –> 01:03:12.965
Uh, so, uh, so we want to, to use, um, uh, state

1570
01:03:12.965 –> 01:03:14.085
of the art machine learning techniques

1571
01:03:14.085 –> 01:03:15.365
and state of the art hardware and that,

1572
01:03:15.365 –> 01:03:18.925
and then also incorporate search, um, one, uh, kind

1573
01:03:18.925 –> 01:03:20.405
of work in this direction, uh,

1574
01:03:20.875 –> 01:03:22.285
that I was just looking at recently.

1575
01:03:22.355 –> 01:03:25.205
It’s a little, uh, uh, a little, uh, older, um,

1576
01:03:25.625 –> 01:03:27.005
but less than a year old still.

1577
01:03:27.305 –> 01:03:32.285
Um, it’s, uh, Bayes3D, um, which is, uh, from, uh,

1578
01:03:32.285 –> 01:03:34.565
Joshua Tenenbaum and Vikash Mansinghka’s group.

1579
01:03:34.585 –> 01:03:36.005
And they, they’ve, uh, sort

1580
01:03:36.005 –> 01:03:38.925
of used literally just a combination of search and learning

1581
01:03:39.145 –> 01:03:41.805
and, uh, and, and rendering, uh, 3D graphics

1582
01:03:41.985 –> 01:03:43.565
to do an approximate Bayesian

1583
01:03:43.565 –> 01:03:44.965
inversion of the rendering process.

1584
01:03:45.145 –> 01:03:47.285
So you, you give it, uh, uh, camera images

1585
01:03:47.505 –> 01:03:50.885
and it, uh, it estimates the Bayesian posterior of

1586
01:03:50.915 –> 01:03:52.725
what is happening in the 3D scene

1587
01:03:52.725 –> 01:03:54.845
that the camera’s looking at and the camera position.

1588
01:03:54.945 –> 01:03:56.965
So that’s, that’s an instance. Can, can

1589
01:03:56.965 –> 01:03:58.565
You remind me how this part sort of fits in?

1590
01:03:58.565 –> 01:04:00.285
So I think I, I understood, yeah, we, we want

1591
01:04:00.285 –> 01:04:01.485
to translate maybe a bunch

1592
01:04:01.485 –> 01:04:03.445
of information we have about some complicated system.

1593
01:04:03.445 –> 01:04:05.525
Maybe it’s the power grid or something else into these kind

1594
01:04:05.525 –> 01:04:08.725
of probabilistic sort of somewhat symbolic kind

1595
01:04:08.725 –> 01:04:09.765
of modeling languages.

1596
01:04:09.985 –> 01:04:11.965
Um, what is, what is, what is the role of,

1597
01:04:12.145 –> 01:04:14.485
of this reasoning that you’re training for?

1598
01:04:14.485 –> 01:04:16.365
What is, what is it trying to deduce or,

1599
01:04:16.435 –> 01:04:19.365
Yeah, so, so I would say that the, not the only purpose,

1600
01:04:19.425 –> 01:04:22.045
but the primary purpose that probably illustrates it best is

1601
01:04:22.065 –> 01:04:24.605
if we think about this runtime verification aspect,

1602
01:04:24.605 –> 01:04:27.325
which is you’re at runtime, you’re receiving some stream

1603
01:04:27.325 –> 01:04:29.885
of observations from remote sensing devices,

1604
01:04:30.265 –> 01:04:33.165
and the advanced controller, um, this, uh, kind

1605
01:04:33.165 –> 01:04:35.005
of opaque neural network has proposed a,

1606
01:04:35.245 –> 01:04:38.165
a very clever next action as, as the runtime verifier,

1607
01:04:38.305 –> 01:04:40.845
you need to determine whether this action is gonna take us

1608
01:04:40.905 –> 01:04:43.245
out of the recoverable set with low enough probability.

1609
01:04:43.425 –> 01:04:44.885
And, uh, so in order to do that,

1610
01:04:44.945 –> 01:04:47.965
you don’t just need the dynamics of the underlying system,

1611
01:04:47.965 –> 01:04:50.725
which can include latent variables, hidden, hidden states

1612
01:04:50.725 –> 01:04:52.885
of the, of the network that you can’t directly observe.

1613
01:04:52.945 –> 01:04:56.405
You also need to, to, to be able to determine what your, uh,

1614
01:04:56.405 –> 01:04:59.605
current belief about those latent variables should be,

1615
01:04:59.655 –> 01:05:02.765
given only the observations, um, that you can sense.

1616
01:05:03.305 –> 01:05:05.685
And so that, that’s really the key place where this kind

1617
01:05:05.805 –> 01:05:07.245
of approximate Bayesian inference needs

1618
01:05:07.305 –> 01:05:08.305
To come in. Okay. So now I understand,

1619
01:05:08.305 –> 01:05:08.645
you know,

1620
01:05:08.645 –> 01:05:10.805
why you’re mentioning this sort of Bayes3D thing of sort of,

1621
01:05:10.805 –> 01:05:13.125
you, you have some information that, that that’s data.

1622
01:05:13.345 –> 01:05:15.445
You have some dynamics, but you also need to say, okay,

1623
01:05:15.445 –> 01:05:16.645
well what’s my belief about

1624
01:05:16.715 –> 01:05:17.965
what the actual state the power

1625
01:05:17.965 –> 01:05:19.925
grid is in or something right now? Exactly.

1626
01:05:20.235 –> 01:05:21.645
Okay. Yeah. And that, that’s a,

1627
01:05:21.645 –> 01:05:24.125
that’s a probabilistic inference problem, which

1628
01:05:24.735 –> 01:05:26.485
could be intractable.

1629
01:05:26.905 –> 01:05:28.605
And, you know, you may have to search

1630
01:05:28.605 –> 01:05:30.125
through a large number of possibilities.

1631
01:05:30.665 –> 01:05:32.565
So this is where, again,

1632
01:05:32.675 –> 01:05:34.845
like the modern advances in machine learning come in,

1633
01:05:35.805 –> 01:05:37.885
because they can help us find, you know,

1634
01:05:37.885 –> 01:05:39.165
the most relevant, uh,

1635
01:05:39.345 –> 01:05:40.885
The most promising hypothesis.

1636
01:05:41.035 –> 01:05:42.085
Exactly. Exactly.
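
An illustrative toy of the inference step being described (everything here is a stand-in): maintain a belief over a hidden quantity, here a single "true load" on one line, given a noisy observation, by weighting samples from a proposal that a learned model could supply, then read off the posterior mean and the probability of an unsafe condition.

    import math, random

    def proposal():
        # Stand-in for an amortized/learned proposal ("artificial intuition"),
        # here coinciding with a prior belief of Normal(900, 100) on the load.
        return random.gauss(900.0, 100.0)

    def likelihood(hidden_load, observed_load):
        # Noisy sensor model: observation ~ Normal(hidden_load, 20).
        return math.exp(-((observed_load - hidden_load) ** 2) / (2 * 20.0 ** 2))

    observed = 955.0
    particles = [proposal() for _ in range(5000)]
    weights = [likelihood(h, observed) for h in particles]
    total = sum(weights)
    posterior_mean = sum(h * w for h, w in zip(particles, weights)) / total
    p_over_capacity = sum(w for h, w in zip(particles, weights) if h > 1000.0) / total
    print("posterior mean load:", round(posterior_mean, 1),
          " P(load > capacity):", round(p_over_capacity, 4))

The real challenge, as discussed, is doing this kind of computation at the scale of a power grid, which is where better proposals from machine learning and smarter search both matter.
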

1637
01:05:42.475 –> 01:05:43.885
Okay. So first, so yeah, we’re,

1638
01:05:43.885 –> 01:05:45.125
we’re authoring in the language,

1639
01:05:45.125 –> 01:05:46.405
we’re inferring latent variables.

1640
01:05:46.405 –> 01:05:49.045
What are some of the other sort of pillars of, of research?

1641
01:05:49.315 –> 01:05:51.805
Yeah, so I mean, another piece is, is, uh, is,

1642
01:05:51.825 –> 01:05:53.845
is actually generating a competent policy,

1643
01:05:54.215 –> 01:05:56.925
which is a little bit like just a reinforcement learning

1644
01:05:56.925 –> 01:05:59.125
problem, neural, neural deep reinforcement learning.

1645
01:05:59.125 –> 01:06:01.725
This is sort of to synthesize the advanced controller,

1646
01:06:01.905 –> 01:06:05.285
but it’s, uh, it’s got a little bit of, of an additional,

1647
01:06:05.465 –> 01:06:06.525
uh, twist, which is

1648
01:06:06.525 –> 01:06:08.845
that the advanced controller will be deployed inside

1649
01:06:08.845 –> 01:06:11.765
of the safeguarding framework, where if it gets too close

1650
01:06:11.765 –> 01:06:13.405
to the edge of the recoverable zone,

1651
01:06:13.515 –> 01:06:14.965
then um, it’ll be switched.

1652
01:06:15.135 –> 01:06:16.725
It’ll be switched off or switched over

1653
01:06:16.725 –> 01:06:17.805
to a backup controller.

1654
01:06:17.825 –> 01:06:20.525
So training it not just to achieve the performance goals,

1655
01:06:20.625 –> 01:06:24.725
but to do so in a way that, um, that, uh, remains legibly,

1656
01:06:25.025 –> 01:06:28.845
uh, kind of verifiably at runtime within the safe zone
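
As a schematic sketch of that safeguarding loop, with toy dynamics and hypothetical names: the advanced controller proposes an action, a crude Monte-Carlo stand-in for the runtime verifier estimates the probability of leaving the recoverable set under the world model, and the backup controller takes over when that estimate is too high. (In the actual framework the bound would be certified rather than sampled, and the backup policy itself pre-verified.)

    import random

    RECOVERABLE_MAX = 1.0
    RISK_BOUND = 0.01          # illustrative threshold only

    def world_model_step(state, action):
        return state + action + random.gauss(0.0, 0.05)   # toy stochastic dynamics

    def risk_of(state, action, rollouts=2000):
        bad = sum(world_model_step(state, action) > RECOVERABLE_MAX
                  for _ in range(rollouts))
        return bad / rollouts

    def advanced_controller(state):
        return 0.2             # stand-in for an opaque, performance-seeking policy

    def backup_controller(state):
        return -0.1            # stand-in for a simple, conservative fallback

    state = 0.7
    proposed = advanced_controller(state)
    action = proposed if risk_of(state, proposed) <= RISK_BOUND else backup_controller(state)
    state = world_model_step(state, action)
    print("took", "advanced" if action == proposed else "backup",
          "action; new state:", round(state, 3))
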

1657
01:06:29.145 –> 01:06:30.965
In the realm of reinforcement learning, which is

1658
01:06:30.965 –> 01:06:32.245
of course, a very broad field,

1659
01:06:32.355 –> 01:06:34.405
like now touching on model

1660
01:06:34.405 –> 01:06:35.765
based reinforcement learning, right?

1661
01:06:35.765 –> 01:06:36.965
Right. So it’s not a policy

1662
01:06:36.965 –> 01:06:38.325
that’s learned directly from the data,

1663
01:06:38.325 –> 01:06:39.365
because that’s very dangerous.

1664
01:06:39.365 –> 01:06:40.805
We don’t understand, you know, you know

1665
01:06:40.865 –> 01:06:42.005
why it’s doing what it’s doing.

1666
01:06:42.005 –> 01:06:45.165
That’s where we are now. Uh, instead it’s a policy that has

1667
01:06:45.185 –> 01:06:46.485
to be consistent,

1668
01:06:46.825 –> 01:06:48.805
and that’s what, you know, the step about verification

1669
01:06:49.035 –> 01:06:51.485
with the world model, which is also something, you know,

1670
01:06:51.545 –> 01:06:53.245
we are building, which is interpretable and, and,

1671
01:06:53.245 –> 01:06:54.325
and, and kind of verified.

1672
01:06:54.465 –> 01:06:57.325
And so that, that consistency, it, it,

1673
01:06:57.345 –> 01:06:59.045
is what’s gonna give, give us some guarantees.

1674
01:06:59.045 –> 01:07:00.445
Personally, I’m not even sure.

1675
01:07:00.505 –> 01:07:03.685
We need to stay strictly within the reinforcement learning

1676
01:07:03.915 –> 01:07:07.165
framework, which also comes with some dangers.

1677
01:07:07.385 –> 01:07:10.485
Uh, you know, think about things like reward hacking,

1678
01:07:10.485 –> 01:07:11.725
reward tampering and so on

1679
01:07:11.725 –> 01:07:13.805
that researchers in AI safety are worried about.

1680
01:07:13.805 –> 01:07:16.485
There may be other ways that are safer to come up

1681
01:07:16.485 –> 01:07:20.005
with a policy, and we would like to encourage the TA two team

1682
01:07:20.025 –> 01:07:21.245
to think about those questions.

1683
01:07:21.465 –> 01:07:24.845
And then, uh, another piece is, uh, the safety guarantees.

1684
01:07:25.265 –> 01:07:29.205
So, um, in a formal verification story,

1685
01:07:29.545 –> 01:07:32.645
the safety guarantees come from, uh, some kind of, uh,

1686
01:07:32.655 –> 01:07:36.085
proof object, a sort of DAG of

1687
01:07:36.645 –> 01:07:38.285
axiomatically justified, uh,

1688
01:07:38.395 –> 01:07:41.485
reasoning steps about the expected values of quantities,

1689
01:07:41.485 –> 01:07:44.805
doing case analysis, doing well-founded induction, um,

1690
01:07:45.015 –> 01:07:49.445
using, uh, tools like barrier certificates or Lyapunov functions

1691
01:07:49.465 –> 01:07:53.445
or even neural, um, neural reach-avoid supermartingales, so

1692
01:07:53.445 –> 01:07:55.965
that there are actually, uh, neural networks that are part

1693
01:07:55.965 –> 01:07:58.285
of the safety proof that are not part of the policy.

1694
01:07:58.505 –> 01:07:59.965
Um, this is something that started to,

1695
01:08:00.025 –> 01:08:01.805
to come into the field in the last couple years.

1696
01:08:01.815 –> 01:08:05.445
There are these additional artifacts which are, uh, part

1697
01:08:05.445 –> 01:08:07.525
of the safety argument that we would like

1698
01:08:07.525 –> 01:08:09.645
to also automate the production of, in addition to,

1699
01:08:09.665 –> 01:08:10.965
to producing the safe system.

1700
01:08:11.025 –> 01:08:12.205
That’s TA two C.
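
One textbook ingredient that could sit behind such a certificate, shown here only as a toy: if a function V is nonnegative, satisfies E[V(next state) | state] <= V(state) everywhere (a supermartingale), and is at least LAMBDA on the unsafe set, then Ville's inequality gives P(ever reaching the unsafe set from x0) <= V(x0) / LAMBDA. The system, certificate, and Monte-Carlo spot-check below are all toys; a real TA2c artifact would be a machine-checkable proof of these conditions, possibly with a neural V.

    import random

    LAMBDA = 100.0

    def step(x):
        # Toy stochastic dynamics: contraction with multiplicative noise.
        return 0.5 * x * (1.0 + random.gauss(0.0, 0.1))

    def V(x):
        # Candidate certificate; the unsafe set here is {|x| >= 1}, where V = LAMBDA.
        return LAMBDA * min(x * x, 1.0)

    def decrease_holds(x, samples=20000):
        expected_next = sum(V(step(x)) for _ in range(samples)) / samples
        return expected_next <= V(x) + 1e-2   # small tolerance for Monte-Carlo noise

    grid = [0.05 * i for i in range(1, 20)]   # spot-check, not a proof
    print("decrease condition holds on grid:", all(decrease_holds(x) for x in grid))
    print("implied bound on P(ever unsafe) from x0 = 0.3:", V(0.3) / LAMBDA)
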

1701
01:08:12.385 –> 01:08:14.045
One of the aspects here that, uh,

1702
01:08:14.315 –> 01:08:17.285
I’ve been thinking a lot about recently is in order

1703
01:08:17.305 –> 01:08:20.485
to get safety guarantees, it’s okay,

1704
01:08:21.045 –> 01:08:22.085
although less desirable,

1705
01:08:22.225 –> 01:08:23.925
but sometimes that’s the only choice we have

1706
01:08:23.945 –> 01:08:25.125
to go for bounds.

1707
01:08:25.275 –> 01:08:27.565
Like, in other words, you know, there’s gonna be a trade off

1708
01:08:27.565 –> 01:08:30.485
between how certain we are that something is, is safe,

1709
01:08:30.945 –> 01:08:34.605
and you know how much we’re willing to lose in terms

1710
01:08:34.625 –> 01:08:36.045
of the user objective.

1711
01:08:36.145 –> 01:08:40.725
So we’ve already seen the distinction between the backup, uh,

1712
01:08:40.745 –> 01:08:42.485
policy and, and, and the main policy.

1713
01:08:42.665 –> 01:08:45.405
But, but you know, you could have other similar trade offs

1714
01:08:45.535 –> 01:08:48.645
where you’re going, when do you decide that something is,

1715
01:08:48.945 –> 01:08:50.605
is sufficiently safe?

1716
01:08:51.015 –> 01:08:54.725
Maybe the only thing you can prove is a bound on, you know,

1717
01:08:54.785 –> 01:08:56.405
the probability that bad things could happen.

1718
01:08:56.475 –> 01:08:59.285
There’s, there’s interesting work coming from

1719
01:08:59.925 –> 01:09:03.005
statistics machine learning, other areas of applied math

1720
01:09:03.075 –> 01:09:05.885
that can be brought to bear, like including learning theory

1721
01:09:05.985 –> 01:09:08.445
by the way, to help construct these bounds

1722
01:09:08.445 –> 01:09:11.805
that then can be computed at runtime to say, okay,

1723
01:09:11.805 –> 01:09:14.525
this is, this is a level of risk which is acceptable.

1724
01:09:14.675 –> 01:09:17.645
Just like, you know, if you are operating a nuclear plant,

1725
01:09:18.025 –> 01:09:20.845
you know, there’s going to be runtime calculations

1726
01:09:20.875 –> 01:09:22.885
that are model based of the risk.

1727
01:09:23.065 –> 01:09:25.725
It might not be zero, but it should be pretty small so

1728
01:09:25.725 –> 01:09:27.525
that it’s acceptable socially speaking.

1729
01:09:27.795 –> 01:09:29.485
Okay, great. You need machine learning to,

1730
01:09:29.485 –> 01:09:30.645
to write the proofs, as you said.

1731
01:09:30.645 –> 01:09:32.885
And so in, so in the case of just this formally verified

1732
01:09:33.125 –> 01:09:34.605
software, like the software operating system,

1733
01:09:34.865 –> 01:09:36.325
you have certain mathematical proofs.

1734
01:09:36.325 –> 01:09:38.405
The software will never, you know, have the memory,

1735
01:09:38.585 –> 01:09:39.805
you know, interact with the

1736
01:09:39.805 –> 01:09:41.165
processor in this particular way.

1737
01:09:41.265 –> 01:09:42.845
Um, you have mathematical proofs of this,

1738
01:09:43.065 –> 01:09:44.845
but it was humans that wrote those proofs

1739
01:09:44.845 –> 01:09:46.565
by looking at the source code Yeah.

1740
01:09:46.665 –> 01:09:49.405
And actually co-developing the source code and the proof.

1741
01:09:49.405 –> 01:09:52.525
Yeah. Okay. And so, so now you would have AI not, not

1742
01:09:52.525 –> 01:09:54.845
looking at the underlying, you know, agent

1743
01:09:54.845 –> 01:09:57.005
or autonomous AI that that, that you’re worried about,

1744
01:09:57.005 –> 01:09:58.245
but looking at this world model

1745
01:09:58.245 –> 01:09:59.885
and looking at the, the types of actions

1746
01:09:59.885 –> 01:10:01.805
and specifications that can be proposed in the world model,

1747
01:10:01.805 –> 01:10:04.445
you would say it could help write a proof

1748
01:10:04.795 –> 01:10:06.765
that can then be verified, um,

1749
01:10:06.995 –> 01:10:08.445
that the specifications would hold.

1750
01:10:08.745 –> 01:10:09.925
So if, if it was just software,

1751
01:10:09.925 –> 01:10:11.925
it would literally just be you write a proof, you know,

1752
01:10:11.925 –> 01:10:13.805
in Isabelle or some other language, right?

1753
01:10:13.825 –> 01:10:15.725
Um, that you can verify automatically

1754
01:10:15.945 –> 01:10:17.125
in, in what you’re talking about.

1755
01:10:17.125 –> 01:10:19.805
You started bringing in probability and imprecise probability.

1756
01:10:19.985 –> 01:10:22.085
Why, why, why is there something more complicated than just

1757
01:10:22.085 –> 01:10:24.605
writing a, a proof that then just gets checked that,

1758
01:10:24.605 –> 01:10:26.365
you know, the specification always holds?

1759
01:10:26.635 –> 01:10:28.605
Yeah. So, fundamentally, it’s

1760
01:10:28.605 –> 01:10:32.565
because, uh, there are, uh, critical factors, uh,

1761
01:10:33.245 –> 01:10:36.445
critical variables in cyber physical systems that, um,

1762
01:10:36.865 –> 01:10:38.125
we can neither control

1763
01:10:38.425 –> 01:10:41.365
nor measure a priori, uh, and, and fundamentally that,

1764
01:10:41.385 –> 01:10:43.925
you know, that that comes down to thermal randomness,

1765
01:10:43.925 –> 01:10:46.045
which fundamentally comes down to quantum randomness

1766
01:10:46.065 –> 01:10:49.005
and, uh, the physical world is uncertain. Um, whereas,

1767
01:10:49.185 –> 01:10:51.485
But also lack of measurement, as you said, like there’s

1768
01:10:51.485 –> 01:10:53.205
so many things in the state of the universe

1769
01:10:53.205 –> 01:10:54.365
that we don’t have access to.

1770
01:10:54.365 –> 01:10:56.485
Yes. So there will be uncertainty in our predictions

1771
01:10:56.505 –> 01:10:58.525
or the AI predictions, and we have to handle that.

1772
01:10:58.905 –> 01:11:00.365
So will it, will it write a proof

1773
01:11:00.915 –> 01:11:03.605
that just makes reference to certain probabilities of,

1774
01:11:03.705 –> 01:11:04.765
of certain things happening?

1775
01:11:04.785 –> 01:11:07.125
Or is there, is there not really writing a proof at all,

1776
01:11:07.145 –> 01:11:08.805
but rather coming up with some other way

1777
01:11:08.805 –> 01:11:09.965
of constraining the system?

1778
01:11:10.065 –> 01:11:12.325
What’s the output of the sort of proof part of machine

1779
01:11:12.445 –> 01:11:13.445
Learning? This is an

1780
01:11:13.445 –> 01:11:15.845
area where we’re, we’re, uh, we’re pretty,

1781
01:11:15.905 –> 01:11:18.005
pretty agnostic in a certain sense.

1782
01:11:18.105 –> 01:11:20.485
Um, it, Yoshua has a particular direction

1783
01:11:20.485 –> 01:11:21.805
that he’s, he is excited about.

1784
01:11:21.845 –> 01:11:23.445
I have a particular direction I’m excited about.

1785
01:11:23.665 –> 01:11:26.045
We would be even more excited if the people who come

1786
01:11:26.045 –> 01:11:28.685
and create the TA two organization have even a third way

1787
01:11:28.715 –> 01:11:30.405
that they might think about approaching this.

1788
01:11:30.865 –> 01:11:32.525
Um, but, uh, but,

1789
01:11:32.585 –> 01:11:36.685
but my, uh, my current best idea, it, it involves, uh, a,

1790
01:11:36.885 –> 01:11:39.925
a logic that’s based on placing upper bounds on the expected

1791
01:11:39.925 –> 01:11:41.125
values of quantities.

1792
01:11:41.385 –> 01:11:43.725
And then there are sort of deduction rules that are similar

1793
01:11:43.825 –> 01:11:45.085
to logical deduction rules,

1794
01:11:45.385 –> 01:11:46.885
but they’re, they’re quantitative in nature.
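
A hypothetical, stripped-down illustration of what such quantitative deduction rules could look like; the judgments and rules below are generic textbook facts (linearity of expectation, Markov's inequality for nonnegative quantities), not the actual logic under design.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class UpperBound:
        quantity: str      # a name for a nonnegative random quantity
        bound: float       # asserted judgment: E[quantity] <= bound

    def sum_rule(a: UpperBound, b: UpperBound) -> UpperBound:
        # Linearity of expectation: E[X + Y] <= bound_X + bound_Y.
        return UpperBound(f"({a.quantity} + {b.quantity})", a.bound + b.bound)

    def scale_rule(a: UpperBound, c: float) -> UpperBound:
        # For c >= 0: E[c * X] = c * E[X] <= c * bound_X.
        assert c >= 0
        return UpperBound(f"{c} * {a.quantity}", c * a.bound)

    def markov_rule(a: UpperBound, threshold: float) -> UpperBound:
        # Markov's inequality for nonnegative X: P(X >= t) <= E[X] / t,
        # expressed as a bound on the expectation of an indicator.
        return UpperBound(f"1[{a.quantity} >= {threshold}]", a.bound / threshold)

    damage = UpperBound("expected_outage_minutes", 2.0)
    print(markov_rule(damage, 60.0))                 # P(outage >= 60 min) <= 2/60
    print(sum_rule(damage, scale_rule(damage, 3.0)))
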

1795
01:11:47.605 –> 01:11:49.245
I wanted to first connect

1796
01:11:49.245 –> 01:11:51.125
what we’ve been discussing about the world model

1797
01:11:51.155 –> 01:11:53.325
with this whole thing about, you know, statements

1798
01:11:53.515 –> 01:11:55.365
that can be, uh, verified.

1799
01:11:55.465 –> 01:11:57.445
It may not be obvious for everyone,

1800
01:11:57.505 –> 01:11:59.685
but once you have a world model, even one

1801
01:11:59.685 –> 01:12:02.005
that has uncertainty in it, in principle,

1802
01:12:02.345 –> 01:12:06.925
you can calculate any probability for like future events,

1803
01:12:06.935 –> 01:12:10.245
given some context, uh, in a unique way.

1804
01:12:10.445 –> 01:12:11.805
I mean, there’s some conditions for this,

1805
01:12:11.905 –> 01:12:16.725
but, uh, essentially, uh, it’s then just logic, right?
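
A small self-contained sketch of that point, with an entirely made-up toy model: given a world model, the probability of a future event given some context is, in principle, just a computation. Here it is done by crude rejection sampling; the hard part in practice is doing it tractably and soundly at scale.

    import random

    def world_model():
        temperature = random.gauss(20.0, 8.0)                  # degrees C
        demand = 800.0 + 12.0 * max(temperature - 18.0, 0.0) + random.gauss(0.0, 40.0)
        overload = demand > 1000.0
        return temperature, overload

    def p_overload_given(hot_day_threshold=30.0, samples=200_000):
        hits = total = 0
        for _ in range(samples):
            temperature, overload = world_model()
            if temperature > hot_day_threshold:                # condition on context
                total += 1
                hits += overload
        return hits / total if total else float("nan")

    print("P(overload | temperature > 30C) ~", round(p_overload_given(), 3))
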

1806
01:12:16.945 –> 01:12:20.245
You can combine the pieces now connecting

1807
01:12:20.245 –> 01:12:21.285
to proofs and so on.

1808
01:12:21.705 –> 01:12:24.885
Uh, the problem is there may be a huge number of ways

1809
01:12:24.885 –> 01:12:26.405
of connecting the pieces, right?

1810
01:12:27.225 –> 01:12:31.165
Um, if you think about what is the main advance

1811
01:12:31.165 –> 01:12:33.525
that we’ve made in machine learning over the last few

1812
01:12:33.525 –> 01:12:37.085
decades, uh, we’ve essentially built machines

1813
01:12:37.085 –> 01:12:38.245
with very good intuition.

1814
01:12:38.865 –> 01:12:41.165
Mm-hmm. And that is essential here.

1815
01:12:41.545 –> 01:12:44.925
Uh, why, because there’s so many possible proofs

1816
01:12:44.985 –> 01:12:47.845
or even so many possible things we could wanna prove

1817
01:12:48.035 –> 01:12:50.725
that might be interesting and useful for taking a decision.

1818
01:12:51.065 –> 01:12:54.645
And so the intuition machines that are really good

1819
01:12:54.645 –> 01:12:56.365
that we have, they, they can make mistakes,

1820
01:12:56.745 –> 01:12:58.485
but, you know, instead of a random search,

1821
01:12:58.545 –> 01:13:00.965
we now have machines that can propose things that with a,

1822
01:13:00.965 –> 01:13:02.485
you know, a reasonable proportion

1823
01:13:02.485 –> 01:13:04.725
of the time might be very good conjectures

1824
01:13:04.985 –> 01:13:06.245
and very good proofs.

1825
01:13:06.585 –> 01:13:08.405
That’s where, you know, we need verification

1826
01:13:08.405 –> 01:13:11.325
because these neural nets are gonna make mistakes sometimes.
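
A deliberately tiny analogy for that division of labour, untrusted intuition plus trusted verification: a fallible proposer guesses certificates (here, just a nontrivial factor of a number), and a simple sound checker accepts only correct ones, so mistaken proposals cost time but never correctness. The names and numbers are arbitrary.

    import random

    def propose_factor(n):
        # Stand-in for learned intuition: fast, often useful, sometimes wrong.
        return random.randint(2, max(2, int(n ** 0.5) + 1))

    def check_factor(n, d):
        # Trusted checker: cheap and sound by construction.
        return 1 < d < n and n % d == 0

    n = 3_837_523 * 2          # composite by construction
    for _ in range(100_000):
        d = propose_factor(n)
        if check_factor(n, d):
            print("verified certificate:", d, "divides", n)
            break
    else:
        print("no verified certificate found in this budget")
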

1827
01:13:11.745 –> 01:13:14.365
Um, so putting these two things together, the power

1828
01:13:14.385 –> 01:13:18.765
of mathematics and like logic with the power of intuition,

1829
01:13:19.145 –> 01:13:21.445
all of this applied to a world model

1830
01:13:21.445 –> 01:13:23.925
that can have uncertainty, especially about, you know,

1831
01:13:23.945 –> 01:13:25.245
the safety specification.

1832
01:13:25.305 –> 01:13:26.925
You know, what is acceptable or not.

1833
01:13:27.025 –> 01:13:29.045
All of those pieces come together here. Yeah.

1834
01:13:29.045 –> 01:13:31.125
And I, I wanna mention the, uh, the,

1835
01:13:31.265 –> 01:13:33.045
the often misunderstood, but,

1836
01:13:33.045 –> 01:13:36.205
but extremely important bitter lesson of, of Rich Sutton.

1837
01:13:36.465 –> 01:13:39.405
Um, which, uh, it, it, you know, is often misconstrued

1838
01:13:39.405 –> 01:13:41.725
as saying, uh, the only thing that’s going

1839
01:13:41.725 –> 01:13:43.045
to matter is neural networks

1840
01:13:43.145 –> 01:13:45.485
and all of that symbolic stuff, uh, too bad.

1841
01:13:45.505 –> 01:13:48.045
But what it actually says is, uh, you know, the only thing

1842
01:13:48.045 –> 01:13:50.485
that’s going to matter is things that scale with,

1843
01:13:50.545 –> 01:13:51.725
uh, parallel compute.

1844
01:13:51.985 –> 01:13:54.405
And, uh, apparently the only things that scale

1845
01:13:54.405 –> 01:13:57.365
with parallel compute, he says, are learning and search.

1846
01:13:57.625 –> 01:13:59.565
And we’re, we’re just now since the release

1847
01:13:59.805 –> 01:14:02.165
of o1, seeing a, a kind of shift in the discourse,

1848
01:14:02.405 –> 01:14:03.725
everyone’s talking about test-time compute.

1849
01:14:03.725 –> 01:14:06.365
Test-time compute is really the search

1850
01:14:06.435 –> 01:14:07.685
part of that equation.

1851
01:14:07.775 –> 01:14:09.645
We’re starting to see the search come in,

1852
01:14:09.785 –> 01:14:12.565
but it’s still not quite a formal search where, uh,

1853
01:14:12.565 –> 01:14:14.725
when the search completes, you get a certificate of,

1854
01:14:14.745 –> 01:14:16.405
of correctness or of uniqueness.

1855
01:14:16.585 –> 01:14:18.125
So that, that’s part of what we’re trying to do.

1856
01:14:18.365 –> 01:14:20.725
I also wanna mention uniqueness is a,

1857
01:14:20.745 –> 01:14:23.365
is a really interesting property that’s sort of undervalued,

1858
01:14:23.365 –> 01:14:25.125
you know, the benefits of formal verification,

1859
01:14:25.385 –> 01:14:27.205
the obvious one is correctness,

1860
01:14:27.305 –> 01:14:29.765
but if we have a really smart machine, if it happens

1861
01:14:29.825 –> 01:14:31.605
to be aligned by default,

1862
01:14:31.705 –> 01:14:33.405
we might get a correct answer anyway.

1863
01:14:33.545 –> 01:14:35.405
So that’s not giving us actually correctness.

1864
01:14:35.405 –> 01:14:37.885
It gives us the confidence of correctness so

1865
01:14:37.885 –> 01:14:39.125
that we can trust it, and so

1866
01:14:39.125 –> 01:14:41.765
that we can have a good multi-stakeholder assessment of risk

1867
01:14:41.985 –> 01:14:43.685
before going ahead with a solution

1868
01:14:43.685 –> 01:14:44.965
that might have been correct anyway.

1869
01:14:44.965 –> 01:14:46.725
So that’s the second layer of benefit.

1870
01:14:47.105 –> 01:14:49.605
But uniqueness is this third layer of benefit

1871
01:14:49.855 –> 01:14:52.885
where if we have sufficiently constrained

1872
01:14:52.905 –> 01:14:55.765
and defined the problem such that there’s a unique solution,

1873
01:14:55.825 –> 01:14:58.205
and the proof is actually a proof of correctness

1874
01:14:58.265 –> 01:15:00.965
and of uniqueness, what that means is that the system

1875
01:15:01.145 –> 01:15:04.725
inside the containment vessel, uh, doesn’t have the power

1876
01:15:04.745 –> 01:15:07.805
to choose other than to decline to answer.

1877
01:15:07.945 –> 01:15:09.645
So it can either give us the unique answer

1878
01:15:09.745 –> 01:15:10.925
or, or it could say nothing.

1879
01:15:11.505 –> 01:15:14.005
Uh, maybe we can, each of you can give a kind

1880
01:15:14.005 –> 01:15:15.725
of quick any concluding thoughts?

1881
01:15:15.755 –> 01:15:17.125
Well, I can start, you know, there,

1882
01:15:17.135 –> 01:15:18.765
there are many other things in my life

1883
01:15:19.035 –> 01:15:20.845
that are taking my attention,

1884
01:15:21.145 –> 01:15:24.565
but this particular program is striking, uh,

1885
01:15:24.845 –> 01:15:27.525
a very important direction to,

1886
01:15:27.665 –> 01:15:32.085
to bring us towards greater safety with future advanced AI

1887
01:15:32.275 –> 01:15:34.725
that I personally think is very, very, uh,

1888
01:15:34.725 –> 01:15:35.845
promising and important.

1889
01:15:36.185 –> 01:15:39.085
And keeping in mind that we don’t know the answers

1890
01:15:39.105 –> 01:15:42.445
to these questions, David and I have our own like ideas

1891
01:15:42.585 –> 01:15:45.285
and so on, but the whole point of, you know, why we need

1892
01:15:45.285 –> 01:15:46.725
so much money is also

1893
01:15:46.725 –> 01:15:48.645
because there’s a lot of research to be done,

1894
01:15:49.025 –> 01:15:53.725
and we are eager to collaborate with whoever’s gonna,

1895
01:15:53.825 –> 01:15:54.965
you know, embark on this

1896
01:15:55.305 –> 01:15:58.245
and with other entities around the world who are, you know,

1897
01:15:58.245 –> 01:16:02.925
share similar, uh, objectives of fairly rigorous safety

1898
01:16:03.505 –> 01:16:05.205
by design AI systems.

1899
01:16:05.625 –> 01:16:08.405
So, so that, that, I, I just wanna share my excitement

1900
01:16:08.545 –> 01:16:10.525
and enthusiasm for, uh, this project.

1901
01:16:10.705 –> 01:16:13.525
Thanks very much. Um, I guess my,

1902
01:16:13.705 –> 01:16:17.765
my concluding remark is sometimes people say that, uh,

1903
01:16:18.065 –> 01:16:21.485
we have no idea how to solve safety

1904
01:16:21.665 –> 01:16:23.045
for superintelligent systems.

1905
01:16:23.305 –> 01:16:27.005
And, and sometimes people say there is no known engineering

1906
01:16:27.365 –> 01:16:29.845
solution for AI safety for superintelligent systems.

1907
01:16:30.105 –> 01:16:31.125
Uh, and, uh,

1908
01:16:31.265 –> 01:16:32.445
and I think, I think the,

1909
01:16:32.505 –> 01:16:34.525
the former is wrong, the latter is right.

1910
01:16:34.745 –> 01:16:35.845
Um, the former is wrong.

1911
01:16:35.865 –> 01:16:38.605
We have some ideas if our, if we’re correct,

1912
01:16:38.665 –> 01:16:41.685
if our hypotheses are correct, if even some subset

1913
01:16:41.705 –> 01:16:42.765
of our hypotheses or

1914
01:16:42.765 –> 01:16:44.965
or conjectures are correct, this could turn into an

1915
01:16:45.125 –> 01:16:46.965
engineering solution for how to secure, um,

1916
01:16:46.965 –> 01:16:49.085
superintelligent systems and uh,

1917
01:16:49.225 –> 01:16:50.925
and we just need to do more science

1918
01:16:50.945 –> 01:16:52.685
and engineering, uh, to get to that point.

1919
01:16:52.705 –> 01:16:54.685
But we really do have a lot of ideas, um,

1920
01:16:54.685 –> 01:16:58.205
that are actionable and ripe for, um, the right people to,

1921
01:16:58.345 –> 01:17:00.605
uh, to do that research and development on.

1922
01:17:00.755 –> 01:17:03.685
Yeah. Well, we wanted to talk a bunch about TA two.

1923
01:17:03.795 –> 01:17:04.885
This is a funding call

1924
01:17:04.885 –> 01:17:06.925
that will come out sometime next year.

1925
01:17:07.225 –> 01:17:10.005
So this was a bit of a, a preview of that, you know, if

1926
01:17:10.005 –> 01:17:12.285
that is work that sounds interesting to you.

1927
01:17:12.785 –> 01:17:15.245
Um, uh, you know, keep your eyes out

1928
01:17:15.245 –> 01:17:16.565
for when the funding call is live,

1929
01:17:16.625 –> 01:17:18.645
but one particular way you can keep your eyes out for

1930
01:17:18.645 –> 01:17:21.365
that is by, you know, submitting an expression of interest.

1931
01:17:21.505 –> 01:17:23.565
Um, then we have you on our radar

1932
01:17:23.585 –> 01:17:26.085
and we are able to like at least sort of, um, share

1933
01:17:26.085 –> 01:17:27.165
with you when, when the,

1934
01:17:27.505 –> 01:17:29.645
the actual funding call comes out sometime next year.

1935
01:17:29.705 –> 01:17:31.325
And yeah, you find that on the website.

1936
01:17:31.545 –> 01:17:33.525
Um, I’m sure we’ll link it somewhere close

1937
01:17:33.545 –> 01:17:35.645
to this recording at the time we put it out.

1938
01:17:35.945 –> 01:17:38.405
And it’s very easy. It’ll only take you five minutes

1939
01:17:38.465 –> 01:17:40.005
to submit your expression of interest.

1940
01:17:40.025 –> 01:17:41.165
So, so just do it now.

1941
01:17:41.895 –> 01:17:43.285
Great. Well thank, thanks everyone.

1942
01:17:43.435 –> 01:17:44.885
That was a fascinating conversation

1943
01:17:44.945 –> 01:17:47.085
and it’s a very exciting program. Thanks Adam.

1944
01:17:47.085 –> 01:17:47.525
Thanks Adam.

FOR EDUCATIONAL AND KNOWLEDGE SHARING PURPOSES ONLY. NOT-FOR-PROFIT. SEE COPYRIGHT DISCLAIMER.