“The concern is that the machines become smarter, develop feelings of superiority and then decide that they don’t want to be turned off. Right, that’s the concern. At what point do they say actually I’m in charge?”

FOR EDUCATIONAL AND KNOWLEDGE SHARING PURPOSES ONLY. NOT-FOR-PROFIT. SEE COPYRIGHT DISCLAIMER.

As is the case with most other software, artificial intelligence (AI) is vulnerable to hacking. A hacker, who is part of an international effort to draw attention to the shortcomings of the biggest tech companies, is stress-testing, or “jailbreaking,” the language models at Microsoft, ChatGPT and Google, according to a recent report from the Financial Times. Two weeks ago, Russian hackers used AI for a cyber-attack on major London hospitals, according to the former chief executive of the National Cyber Security Centre. Hospitals declared a critical incident after the ransomware attack, which affected blood transfusions and test results. On this week’s AI Decoded, the BBC’s Christian Fraser explores the security implications of businesses that are turning to AI to improve their systems.

TRANSCRIPT. welcome to AI decoded it’s that time of the week when we dive deep into the most eye-catching stories in the world of artificial intelligence this week cyber security how long would it take an advanced hacker to break into the most complex most powerful AI models that we built 30 minutes as it happens this week the Ft carried an exclusive interview with the mysterious faceless hacker known as plenny the prompter he is stress testing or jailbreaking the large language models that Microsoft Chat GPT and Google have been building an ethical online Warrior who is part of the international effort to draw attention to the shortcomings of the biggest tech companies who he argues are putting profit before safety and if you think that is a risk somewhere in the future think again just two weeks ago the Russian cyber criminals killing used sophisticated AI tools to find their way into the NHS computers tens of thousands of patients here in the UK had names dates of birth and private information published on the dart web when the testing firm cinis refused to pay the ransom the hackers encrypted Vital Information which made the it systems at two NHS hospitals useless so how worried should we be what are the implications for those businesses who are turning to AI to improve their systems tonight we’re bringing together a former hacker Conor Lehi now the CEO of AI Safety Company conjecture darl Flack is here CEO of cyber security company blockfish and as ever to help guide us through it our resident expert and presenter Stephanie hair let me start with you Stephanie plyy the prompter is what we call a white hat a friendly hacker yep the good guys in the shadows who are stress testing the system what have they been able to do so they’re able to get large language models to do things that they should not be able to do so these are Co with guard rails they’re CED coded with rules and jailbreaking a model is imagine what you would do if you had a sort of teenager at home who was trying to play with a game and see if they could break it break with the system lots of Engineers have grown up trained that way hackers love to do it they almost can’t help but doing it and I I quite like plenty the prompter argument that he’s actually doing work for free that these companies should be doing themselves to bring attention to the public so we’re talking about writing malware we’re talking about showing how SC uh scammers could create scripts that make people click on links which is a way that you can inject all sorts of very nasty code into people’s hospitals as we’ve just seen which is a very soft target very ripe Target but it’s schools its individuals and it’s only growing this is only a growing problem and so we just had last month the AI safety Institute here in the United Kingdom published a groundbreaking report brand new showing that every single major llm large language model can be broken jail yeah it’s quite alarming if you like the color red which I have worn tonight to you know show and solidarity with our our researchers here in this country they are trying to put out a warning to the public for all those companies that are saying I’ve got a bit of fomo bit of fear of missing out I really feel like I need to get in with generative AI which I am hearing with every single client group I speak with everybody’s got real anxiety about missing out what we’re hearing from the AI safety Institute and from the national cyber security Center here in the United Kingdom is that a lot of this technology we should really consider to be in beta it’s not really ready so it’s fun if you’re approaching it from an engineering perspective but not if you are a CEO or general counsel who needs to protect your risk but Conor these companies they have red team hackers who were employed by the company to go in there and try and find a way through the system why are the white hats guys like you able to do it and their red teams are not well so in the hacker business you usually talk about blue teams and red teams so red teams are the people who try and break things blue teams are the people who try to defend and and the way things usually are is red team always wins red team always wins blue team can just minimize the harms and with AI systems in particular what we’re seeing by people like py and a lot of other people like me my former and my old friends back you know in the in the you know Discord online days um is that AI systems are very immature when it comes to safety a lot of the stuff that we do for cyber security and safety simply does not apply to AI systems in the same way in other forms of software when a hacker finds a vulnerability you will have your programmers go look at the code of the system fix the problem and then deploy a patch that solves the problem the problem is is that AI systems are not like normal software they’re not written line by line with code it’s more like they’re like grown they’re more like artificial like huge piles of numbers that can do great things but we don’t really know how to patch them we could do a little bit you know openi and all these other companies invested a lot of money into trying to tweak these things but as you know peny and other people have shown it’s wildly ineffective at this point in time so darl that this jailbreaking started about a year ago and and the attacks as as Connor is describing have evolved so it’s a constant C game of c mouse this what’s the inherent problem with that if you’re a company hoping to integrate it into your systems I think Ely it’s always going to be an unknown risk um you are giving up your data your information potentially your intellectual property your brand and your reputation and you’re putting it into a system that you are hoping is going to protect it and secure it and so organizations need to be aware that the risks that they’re taking around that and um like we say organizations don’t yet know what those risks are and because it’s a new technology there’s always going to be a bit of an arms race there’s going to be the Defenders against the attackers has been said and so looking at that arms race everything comes down to how quickly you want to jump to these platforms and how much data you want to give them as we’ve seen from the um uh Institute it’s very much uh early stages beta stages and so the risks of things going wrong are quite high at the moment so um taking pilot approaches with synthesized data with anonymized data would be a good start to test it to make sure that you’re getting the types of outcomes that you’re EXP expecting but if you jump straight in right now with all of your data trying to be sort of aead of the curve then you’ve got to be aware that that risk you’re taking could come to fruition what’s really fun about this article is it gives a bit of a heads up that California’s legislature is going to be voting in August on a bill that will require any company in the state so that’s open AI That’s Google and that’s meta to ensure that they don’t develop models with a hazardous capability and plenty the prompter says all AI models will fit that criteria I I just talking um um about the NHS model Connor um what what is it what is it that is weak in the system and and and does that mean that all our systems given what you’ve just described are at risk of this kind of of of malign activity the short answer is absolutely yes it is a complete disaster it is a complete mess none of our methods are adequate whatsoever this is not ready for safe production whatsoever across the board to give you a a a feeling of this there are systems that are as close to unhackable as possible these are usually used in military contexts or like nuclear power plants and stuff and it’s called formally verified software and to give you a feeling for this developing even like a relatively simple piece of software let’s say the embedded software and like a helicopter or something which not simple but like you know it’s something that usually takes like maybe a couple years to develop developing this kind of thing can take over a decade and can cost millions or even hundreds of millions of dollars and this is using techniques that we understand the best tools the best researchers were tools that we’ve developed over many many decades AI as it ex today has only existed really like for like a couple of years maybe a decade so our understanding of what our AI models do internally is extremely inadequate compared to our understanding of other types of software systems and even with other types of software systems our safety is like very shaky that that terrifies me I I was because I wrote a question down here Daryl earlier and I thought well if we’re integrating this technology into our critical infrastructure systems then this becomes really serious it’s interesting Conor used the term about a nuclear power plant but maybe that’s not far-fetched maybe maybe our critical infrastructure is at risk I think with the Advent of new technology um legisl is always somewhat behind and so organizations will start to deploy it to save time to speed up operations and if it’s deployed too quickly in some of those critical National infrastructure environments then there’s always the risk that it will go wrong um and as Connor was saying um years and years are spent to assure products to assure platforms to assure software so that when they are placed into uh critical National infrastructure they perform as expected just just really quickly Stephanie are we talking AI developing AI to attack these systems absolutely so you can use AI both that becomes ever more sophisticated potentially the question is going to be when it’s it surpasses us we won’t be able to necessarily even keep up with that war that would be happening between the machines okay you think that’s scary coming up the former Twitter boss Jack dorsy warns that within the next 5 10 years we won’t know what is real anymore so sophisticated will the AI generated content be that it will feel he said says like you’re in a simulation will those technological advancements bring us all closer to Singularity the point in our Evolution when we share this planet with entities smarter than us we will discuss that after the break welcome back to Aid decoded earlier this year during a primary election in New Hampshire thousands of Voters were sent a message by Joe Biden you know the value of voting Democratic when our votes count it’s important that you save your vote vot for the November election we’ll need your help in electing Democrats up and down the ticket voting this Tuesday only enables the Republicans in their quest to re elect Donald Trump again it’s persuasive but anyone carefully listening to that Robo call might detect some stilted language the emphasis and pauses in the wrong places which would tell you there’s something wrong with it in fact it had been created by a political consultant who’d use generative AI to influence the vote it shows what is possible and it raises some really big issues this week the former Twitter CEO and co-founder Jack dorsy warned that in 5 to 10 years from now it will be impossible to differentiate what is real from what is fake don’t trust verify so even this talk everything I’ve said don’t trust me you have to experience it yourself and you have to learn yourself and this is going to be so critical as we enter this time in the next 5 years to 10 years because the way that images are created uh deep fakes videos you will not you will literally not know what is real and what is fake it’ll be almost impossible to tell it will feel like you’re in a simulation because everything will look manufactured everything will look produced and it’s very important that you shift your mindset or you attempt to shift your mindset to verifying the things that you feel you need through your experience through your intuition because all these devices in your in your bags in your pockets they’re all floating from that image of the neuron I sent you and because all these things are on your phone now you’re not you’re not building those connections in your brain anymore this is one of those stories where you just wonder how Mr dorsy would like all of us to only verify by experiencing like that is not a scalable model for a tech CEO to be recommending so it’s quite weird how can I know what is being reported for instance in China I’m not there I’m here in London I’m Rec the whole point of news is that we need a system of contributors who put forward evidence that other people peerreview and then it’s dis it’s discussed as truth I mean this is taking us back to philosophy what is reality or the study of history how do we treat sources or even the law what is evidence what is not what is admissible in court what is not telling everybody that we just have to you know go with our intuition doesn’t feel great can you protect a democracy Connor if you can’t differentiate what is real and what is fake I mean yeah definitely not easily that’s for sure a lot of what makes a society possible is that we can agree on something being true or not if we can’t fundamentally agree on what is true or not then and especially if the things that we believe are true or not is maybe selected by people or things that do not have our best interests at heart whether it’s to sell us something or to make us angry or to make us do some political thing then you can’t form a state you can’t form a civilization in the limit it’s kind of interesting darl that he’s putting the onus on us because I would say if you can’t differentiate then that is an existential risk for them for the social media companies why would you ever tune in if if it’s just bringing you fake news absolutely I mean to to try and put the burden on the uh consumer or the user is moving the uh portion of blame to the wrong place um it’s the tech companies that are providing the technology they should have the intelligence behind that technology to be able to identify if something is true or if it’s false or not ultimately the the user at the end is the last line of defense if we’re having to use our inition then the system has failed to get to that point what about watermarking I thought open AI were going to Watermark everything that their systems produce watermarking is often trotted out as this sort of um solution but everybody agrees it’s also not a silver bullet so let’s discuss that why everybody’s going to have to come up with um some sort of water marking that they would all agree upon and use this becomes really interesting for things like blockchain ways that You’ be able to do even like a chain of evidence how do you verify there’s there’s cyber security and cryptographic principles that we could use in this that problem is is it doesn’t scale it’s really really intensive so you know what are we going to do every time we check a post on social media we’re going to have to do some sort of decryption to see if it’s real like that’s what doesn’t work and you’re right they constantly these tech companies put the onus on the users to protect their data and now to decide what’s true or not based on feelings weird you know thing we thing we keep returning to in this program and is and and in mind of what we’ve just been discussing here is how long it’s going to take before we reach this thing known as Singularity what what do I mean by that Singularity is the term given to the point in human history when the machines become smarter than those programming them so the point where they don’t really need us is that really likely to happen or is it just a concept created by those who are making money from the technology of the future there’s no better panel to discuss it with so let let’s begin with you Connor are you are you a believer in Singularity or are you a skeptic I see no reason from any scientific evidence that we have so far that it shouldn’t be possible humans are the dumbest species that can create industrial civilization we were the first ones to evolve we were the first draft there’s no reason we couldn’t build AI systems that use you know that are faster you know our CPUs our computers you know switch about a million times faster than our neurons do so even if we just get a computer smart as a human right now let’s say it runs a thousand times faster than the human well that means it can do two years of thinking per day two years of labor per day I mean that if that exists already like and we know that this is like technically possible look there are there are so many facets that we call intelligence two of which computers have already mastered one is memory and the other is is calculus or how to calculate right what about reasoning and emotional intelligence is that innate or is that learned I mean it’s a bit of both but I mean both of that you know you open up your brain there’s a bunch of goop in there made out of neurons that are sending electric signals around sure it’s a pretty complicated system the brain is one of the most complicated things in the universe but that doesn’t mean it’s not something we can do we already now have systems like GPT 4 is much better than me at writing Shakespearean poetry it’s uh much it know has read much more books than I have it’s really good at understanding social situations and analyzing emotions it’s not perfect no sure but neither am I neither are you the concern Daryl is that the machines become smarter develop feelings of superiority and then decide that they don’t want to be turned off right that’s that’s the concern at what point do they say actually I’m in charge and I think that’s why it’s really important to set the guard rails now to understand the principles of how we should be using Ai and the use cases where it is valuable and there there will be places where it’s incredibly valuable to have that speed of processing that U ability to scan lots of data and to find solutions to problems that we’ve struggled with for hundreds of years but then equally there’s things that humans are very good at um and the empathy side of things the looking after each other and that closeness and uh intimacy to be able to have that um maybe that we have to look at a different future where the workload side of the world is carried out by Ai and we have to look at other things to keep ourselves busy and to have a view to what our top level on maslin’s hierarchy of self-actuation should be I was having a philosophical conversation with someone the other day about this I’m sure lots of people at home do as well about whether I mean you know machines can’t feel touch or the emotion of Love or hate they can’t react to a flower opening or a sunset or actually can they because that is all learned behavior well there’s even this question of is intelligence also about embodiment so we exist in a physical body that has you know we sense things but also I don’t have to physically touch you to sense how you’re feeling I’m reading you and can machines do that really you know they can’t physically exist in the world unless they’re a robot which it does that has sensors and cameras Etc it doesn’t know what a feeling is because feeling are biochemical Productions in a human brain or if you have pets certain animals um perhaps all animals and even plants we’re still learning even about the natural world so there’s a lot of this discussion you know what is thinking what is intelligence that isn’t just about machines or even humans it’s about what it means to be alive do you always feel that we never have enough time to get to everything in this program it just sort of flies by doesn’t it every week I could talk for another two hours about this um thanks to Connor thank you Connor for joining us again it’s always great to have you on the program thank you darl I hope you’ll come back and thank you also to Stephanie just a bit of housekeeping before we go we will be off air next week for the election but Aid coded will be back a week on Thursday from The Wimbledon Championship uh priia is going to be going to look at how they are using AI to improve the whole experience of tennis and not just tennis see I didn’t even mention v um do join us for that

FOR EDUCATIONAL AND KNOWLEDGE SHARING PURPOSES ONLY. NOT-FOR-PROFIT. SEE COPYRIGHT DISCLAIMER.