Max Tegmark says we are training AI models not to say harmful things rather than not to want harmful things, which is like training a serial killer not to reveal their murderous desires pic.twitter.com/YagTCPNtyo
— Tsarathustra (@tsarnick) December 17, 2024
When we all look at this scorecard, which we all will, what's the purpose behind it, and what's the takeaway?

You know the U.S. News & World Report college ranking that comes out every year? That really inspired us here, because it's something which incentivizes all the colleges to do better and see what they can improve. The reason that these professors gave lower grades wasn't just because they're curmudgeons. We picked professors precisely because we wanted people who have no financial conflicts of interest with the companies. You don't want to let students grade themselves, right?

The current harms: that's stuff like making sure that your large language model, your chatbot, doesn't say harmful things, doesn't teach a terrorist how to make bioweapons or encourage kids to commit suicide, stuff like that. And I commend the companies for having spent some small fraction of their money on trying to really tackle those things. The biggest success so far in the AI safety work by the companies has been training these large language models to not say harmful things, as opposed to not wanting harmful things. So that's a little bit like if you train a serial killer to never say anything that reveals his murderous desires. Problem solved, right?

This works pretty well now, because if you're just using a chatbot, that's all you care about. But very soon we're going to start seeing these systems acting much more out in the world, where their goals will matter more. If they're running your bank account for you, or if they're operating autonomous vehicles or more things that are physically embodied, then it matters what goals these systems have. And the sad fact is we have no clue. It's not like we built them, like we do with cars. We kind of grew them: we train on a lot of data and they learn stuff. So this is why the grades are so low on the existential side, precisely because nobody really has any convincing plan for how we would make smarter-than-human stuff [safe].

The message I want my students at MIT to take if they get a low grade is, you know, go back and study harder. And I'm very much hoping that the companies that just didn't have a lot of people working on this stuff, you know, will...

I want to say something positive also here, though, so we don't get too gloomy, because it's important to remember that if we get this right, there's an incredible upside. And it's actually very obvious how we can get it right, just from a policy perspective. All other technologies in the United States, all other industries, have some kind of safety standards, right? If you want to launch a new airplane, you go to the FAA and you get it safety approved. If you want to launch a new medicine, you go to the FDA and you get your clinical trial approved. Even if you want to open a sandwich shop, you have to have the health inspector come check first. The only industry that is completely unregulated right now, which has no safety standards, is AI.

So if you just flip that switch and say, okay, we're going to treat AI like all the other industries, then the problem basically gets solved. Instead of the way it is now, where companies are racing to build maybe uncontrollable AGI as fast as possible and we're just hoping someone is going to figure out how to control it, the companies now have an incentive to figure out how to make it controllable first, because otherwise they can't release it. Imagine if you, John, walked into the FDA and said, "It's inevitable that my company, John Biotech, is going to release this new drug next year. I just hope you at the FDA can figure out how to make it safe first." They would laugh you out of the office. But this is exactly how the AI industry operates right now.