
FOR EDUCATIONAL AND KNOWLEDGE SHARING PURPOSES ONLY. NOT-FOR-PROFIT. SEE COPYRIGHT DISCLAIMER.


Transcript. 20:57 DeepMind

[Applause]. >>DEMIS HASSABIS: Thanks, Sundar. It’s so great to be here. Ever since I was a kid, playing chess for the England Junior Team, I’ve been thinking about the nature of intelligence. I was captivated by the idea of a computer that could think like a person. It’s ultimately why I became a programmer and studied neuroscience. I co-founded DeepMind in 2010 with the goal of one day building AGI: artificial general intelligence, a system that has human-level cognitive capabilities. I’ve always believed that if we could build this technology responsibly, its impact would be truly profound and it could benefit humanity in incredible ways. Last year, we reached a milestone on that path when we formed Google DeepMind, combining AI talent from across the company into one super unit. Since then, we’ve built AI systems that can do an amazing range of things, from turning language and vision into action for robots, navigating complex virtual environments, solving Olympiad-level math problems, and even discovering thousands of new materials. Just last week, we announced our next-generation AlphaFold model. It can predict the structure and interactions of nearly all of life’s molecules, including how proteins interact with strands of DNA and RNA. This will accelerate vitally important biological and medical research, from disease understanding to drug discovery. And all of this was made possible with the best infrastructure for the AI era, including our highly optimized tensor processing units. At the center of our efforts is our Gemini model. It’s built from the ground up to be natively multimodal, because that’s how we interact with and understand the world around us. We’ve built a variety of models for different use cases. We’ve seen how powerful Gemini 1.5 Pro is, but we also know from user feedback that some applications need lower latency and a lower cost to serve. So today we’re introducing Gemini 1.5 Flash. [Cheers and Applause]. Flash is a lighter-weight model compared to Pro. It’s designed to be fast and cost-efficient to serve at scale, while still featuring multimodal reasoning capabilities and breakthrough long context. Flash is optimized for tasks where low latency and efficiency matter most. Starting today, you can use 1.5 Flash and 1.5 Pro with up to one million tokens in Google AI Studio and Vertex AI. And developers can sign up to try two million tokens. We’re so excited to see what all of you will create with it. And you’ll hear a little more about Flash later on from Josh. We’re very excited by the progress we’ve made so far with our family of Gemini models. But we’re always striving to push the state of the art even further. At any one time we have many different models in training. And we use our very large and powerful ones to help teach and train our production-ready models. Together with user feedback, this cutting-edge research will help us build amazing new products for billions of people. For example, in December, we shared a glimpse into the future of how people would interact with multimodal AI, and how this would end up powering a new set of transformative experiences. Today, we have some exciting new progress to share about the future of AI assistants that we’re calling Project Astra. [Cheers and Applause]. For a long time, we’ve wanted to build a universal AI agent that can be truly helpful in everyday life. Our work making this vision a reality goes back many years. It’s why we made Gemini multimodal from the very beginning.
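To ground the Gemini 1.5 Flash availability mentioned above, here is a minimal sketch of a long-context request using the Google AI Python SDK. The package setup, the "gemini-1.5-flash" model identifier, and the file name are assumptions based on how the SDK is commonly used, not details given in the keynote.

```python
# Minimal sketch (not from the keynote): calling Gemini 1.5 Flash via the
# Google AI Python SDK. Assumes the `google-generativeai` package is installed
# and a GOOGLE_API_KEY environment variable is set.
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

# Flash targets low latency and low cost to serve while keeping long context.
model = genai.GenerativeModel("gemini-1.5-flash")

# A long-context use case: summarize a large document in a single request.
# "keynote_transcript.txt" is a placeholder file name.
with open("keynote_transcript.txt") as f:
    transcript = f.read()

response = model.generate_content(
    ["Summarize the key announcements in this transcript:", transcript]
)
print(response.text)
```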
An agent like this has to understand and respond to our complex and dynamic world just like we do. It would need to take in and remember what it sees so it can understand context and take action. And it would have to be proactive, teachable and personal, so you can talk to it naturally, without lag or delay. While we’ve made great strides in developing AI systems that can understand multimodal information, getting response time down to something conversational is a difficult engineering challenge. Building on our Gemini model, we’ve developed agents that can process information faster by continuously encoding video frames, combining the video and speech input into a timeline of events, and caching this for efficient recall. We’ve also enhanced how they sound, with a wider range of intonations. These agents better understand the context you’re in, and can respond quickly in conversation, making the pace and quality of interaction feel much more natural. Here’s a video of our prototype, which you’ll see has two parts. Each part was captured in a single take, in real time. >> Okay. Let’s do some tests. Tell me when you see something that makes sound. >> I see a speaker, which makes sound. >> What is that part of the speaker called? >> That is the tweeter. It produces high-frequency sounds. >> Give me a creative alliteration about these. >> Creative crayons color cheerfully. They certainly craft colorful creations. >> What does that part of the code do? >> This code defines encryption and decryption functions. It seems to use AES-CBC encryption to encode and decode data based on a key and an initialization vector (IV). >> That’s right. What neighborhood do you think I’m in? >> This appears to be the King’s Cross area of London. It is known for its railway station and transportation connections. >> Do you remember where you saw my glasses? >> Yes, I do. Your glasses were on the desk near a red apple. [Applause]. >> What can I add here to make this system faster? >> Adding a cache between the server and database could improve speed. >> What does this remind you of? >> Schrödinger’s cat. >> All right. Give me a band name for this duo. >> Golden Stripes. >> Nice. Thanks, Gemini. [Applause]. >>DEMIS HASSABIS: I think you’ll agree it’s amazing to see how far AI has come, especially when it comes to spatial understanding, video processing and memory. It’s easy to envisage a future where you can have an expert assistant by your side through your phone or new exciting form factors like glasses. Some of these agent capabilities will come to Google products like the Gemini app later this year. For those of you onsite today, you can try out a live demo version of this experience in the AI Sandbox area. [Cheers and Applause]. Next, let’s take a look at how our innovations are helping people bring new creative ideas to life. Today, we’re introducing a series of updates across our generative media tools, with new models covering image, music and video. Over the past year, we’ve been enhancing quality, improving safety and increasing access. To help tell this story, here’s Doug. [Applause]. >>DOUG ECK: Thanks, Demis. Over the past few months, we’ve been working hard to build a new image generation model from the ground up, with stronger evaluations, extensive red teaming, and state-of-the-art watermarking with SynthID. Today, I’m so excited to introduce Imagen 3. It’s our most capable image generation model yet. Imagen 3 is more photorealistic. You can literally count the whiskers on its snout.
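As a purely illustrative sketch of the pipeline Demis describes above (continuously encoding video frames, merging video and speech into a single timeline of events, and caching that timeline for recall), the toy Python class below shows the general shape of such a system. Every name here is hypothetical, the "encodings" are plain text stand-ins for learned embeddings, and the keyword-matching recall is a deliberate simplification; this is not DeepMind's implementation.

```python
# Toy sketch only: frame/speech events merged into one cached timeline,
# queried for recall (e.g. "where did you see my glasses?").
from dataclasses import dataclass
from collections import deque
from typing import Deque, List

@dataclass
class Event:
    timestamp: float   # seconds since the session started
    modality: str      # "video" or "speech"
    description: str   # stand-in for a real learned frame/speech encoding

class TimelineAgent:
    def __init__(self, max_events: int = 10_000):
        # Bounded cache so recall stays fast even during long sessions.
        self.timeline: Deque[Event] = deque(maxlen=max_events)

    def ingest_frame(self, timestamp: float, caption: str) -> None:
        # A real system would run a learned encoder here, not store a caption.
        self.timeline.append(Event(timestamp, "video", caption))

    def ingest_speech(self, timestamp: float, utterance: str) -> None:
        self.timeline.append(Event(timestamp, "speech", utterance))

    def recall(self, query: str) -> List[Event]:
        # Toy keyword match; a real agent would use embedding similarity.
        terms = query.lower().split()
        return [e for e in self.timeline
                if any(t in e.description.lower() for t in terms)]

agent = TimelineAgent()
agent.ingest_frame(12.0, "glasses on the desk near a red apple")
agent.ingest_speech(14.5, "tell me when you see something that makes sound")
agent.ingest_frame(15.0, "a speaker on the shelf")

for event in agent.recall("where are my glasses"):
    print(f"{event.timestamp:>6.1f}s [{event.modality}] {event.description}")
```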
With richer details, like the incredible sunlight in this shot, and fewer visual artifacts or distorted images. It understands prompts written the way people write. The more creative and detailed you are, the better. And Imagen 3 remembers to incorporate small details like the ‘wildflowers’ or ‘a small blue bird’ in this longer prompt. Plus, this is our best model yet for rendering text, which has been a challenge for image generation models. In side-by-side comparisons, independent evaluators preferred Imagen 3 over other popular image generation models. In sum, Imagen 3 is our highest-quality image generation model so far. You can sign up today to try Imagen 3 in ImageFX, part of our suite of AI tools at labs.google, and it will be coming soon to developers and enterprise customers in Vertex AI. Another area full of creative possibility is generative music. I’ve been working in this space for over 20 years, and this has been by far the most exciting year of my career. We’re exploring ways of working with artists to expand their creativity with AI. Together with YouTube, we’ve been building Music AI Sandbox, a suite of professional music AI tools that can create new instrumental sections from scratch, transfer styles between tracks, and more. To help us design and test them, we’ve been working closely with incredible musicians, songwriters and producers. Some of them even made entirely new songs in ways that would not have been possible without these tools. Let’s hear from some of the artists we’ve been working with. >> I’m going to put this right back into the Music AI tool. The same Boom, boom, bam, boom, boom. What happens if Haiti meets Brazil? Dude, I have no clue what’s about to be spat out. This is what excites me. Da da See see see. As a hip hop producer, we dug in the crates. We playin’ these vinyls, and the part where there’s no vocal, we pull it, we sample it, and we create an entire song around that. So right now we digging in the infinite crate. It’s endless. Where I found the AI really useful for me is this way to, like, fill in the sparser sort of elements of my loops. Okay. Let’s try bongos. We’re going to put viola. We’re going to put rhythmic clapping, and we’re going to see what happens there. Oh, and it makes it sound, ironically, at the end of the day, a little more human. So then this is entirely Google’s loops right here. These are Gloops. So it’s like having, like, this weird friend that’s just like, try this, try that. And then you’re like, Oh, okay. Yeah. No, that’s pretty dope. (indistinct noises) >> The tools are capable of speeding up the process of what’s in my head, getting it out. You’re able to move at light speed with your creativity. This is amazing. That right there. [Applause]. >>DEMIS HASSABIS: I think this really shows what’s possible when we work with the artist community on the future of music. You can find some brand new songs from these acclaimed artists and songwriters on their YouTube channels now. There’s one more area I’m really excited to share with you. Our teams have made some incredible progress in generative video. Today, I’m excited to announce our newest, most capable generative video model, called Veo. [Cheers and Applause]. Veo creates high-quality, 1080p videos from text, image and video prompts. It can capture the details of your instructions in different visual and cinematic styles. You can prompt for things like aerial shots of a landscape or a time lapse, and further edit your videos using additional prompts.
You can use Veo in our new experimental tool called VideoFX. We’re exploring features like storyboarding and generating longer scenes. Veo gives you unprecedented creative control. Techniques for generating static images have come a long way. But generating video is a different challenge altogether. Not only is it important to understand where an object or subject should be in space, it needs to maintain this consistency over time, just like the car in this video. Veo builds upon years of our pioneering generative video model work, including GQN, Phenaki, WALT, VideoPoet, Lumiere and much more. We combined the best of these architectures and techniques to improve consistency, quality and output resolution. To see what Veo can do, we put it in the hands of an amazing filmmaker. Let’s take a look. >>DONALD GLOVER: Well, I’ve been interested in AI for a couple of years now. We got in contact with some of the people at Google, and they had been working on something of their own. So we’re all meeting here at Gilga Farms to make a short film. >>KORY MATHEWSON: The core technology is Google DeepMind’s generative video model that has been trained to convert input text into output video. [Laughter]. >>DONALD GLOVER: It looks good. >>KORY MATHEWSON: We are able to bring ideas to life that were otherwise not possible. We can visualize things on a time scale that’s 10 or 100 times faster than before. >>MATTHIEU KIM LORRAIN: When you’re shooting, you can’t really iterate as much as you wish. And so we’ve been hearing the feedback that it allows for more optionality, more iteration, more improvisation. >>DONALD GLOVER: But that’s what’s cool about it. It’s like you can make a mistake faster. That’s all you really want at the end of the day, at least in art, is just to make mistakes fast. >>KORY MATHEWSON: So, using Gemini’s multimodal capabilities to optimize the model’s training process, Veo is better able to capture the nuance from prompts. This includes cinematic techniques and visual effects, giving you total creative control. >>DONALD GLOVER: Everybody’s going to become a director, and everybody should be a director. Because at the heart of all of this is just storytelling. The closer we are to being able to tell each other our stories, the more we will understand each other. >>KORY MATHEWSON: These models are really enabling us to be more creative and to share that creativity with each other. [Cheers and Applause]. >>DEMIS HASSABIS: Over the coming weeks, some of these features will be available to select creators through VideoFX at labs.google, and the waitlist is open now. Of course, these advances in generative video go beyond the beautiful visuals you’ve seen today. By teaching future AI models how to solve problems creatively, or in effect simulate the physics of our world, we can build more useful systems that help people communicate in new ways, and thereby advance the frontiers of AI. When we first began this journey to build AI more than 15 years ago, we knew that one day it would change everything. Now that time is here. And we continue to be amazed by the progress we see and inspired by the advances still to come, on the path to AGI. Thanks, and back to you, Sundar.
