AI #94: Not Now, Google…
ZVI MOWSHOWITZ. DEC 12, 2024.
At this point, we can confidently say that no, capabilities are not hitting a wall. Capacity density, how much you can pack into a given space, is way up and rising rapidly, and we are starting to figure out how to use it.
Not only did we get o1 and o1 pro and also Sora and other upgrades from OpenAI, we also got Gemini 1206 and then Gemini Flash 2.0 and the agent Jules (am I the only one who keeps reading this as Jarvis?) and Deep Research, and Veo, and Imagen 3, and Genie 2, all from Google. Meta’s Llama 3.3 dropped, claiming their 70B is now as good as the old 405B, and basically no one noticed.
This morning I saw Cursor now offers ‘agent mode.’ And hey there, Devin. And Palisade found that a little work made agents a lot more effective.
And OpenAI is partnering with Anduril on defense projects. Nothing to see here.
There’s a ton of other stuff, too, and not only because this was, for me, a 9-day week.
Tomorrow I will post about the o1 Model Card, then next week I will follow up on what Apollo found regarding potential model scheming. I plan to get to Google Flash after that, which should give people time to try it out. For now, this post won’t cover any of that.
I have questions for OpenAI regarding the model card, and asked them for comment, but press inquiries has not yet responded. If anyone there can help, please reach out to me or give them a nudge. I am very concerned about the failures of communication here, and the potential failures to follow the preparedness framework.
Table of Contents
Previously this week: o1 turns Pro.
- Table of Contents.
- Language Models Offer Mundane Utility. Cursor gets an agent mode.
- A Good Book. The quest for an e-reader that helps us read books the right way.
- Language Models Don’t Offer Mundane Utility. Some are not easily impressed.
- o1 Pro Versus Claude. Why not both? An o1 (a1?) built on top of Sonnet, please.
- AGI Claimed Internally. A bold, and I strongly believe incorrect, claim at OpenAI.
- Ask Claude. How to get the most out of your conversations.
- Huh, Upgrades. Canvas, Grok Aurora, Gemini 1206, Llama 3.3.
- All Access Pass. Context continues to be that which is scarce.
- Fun With Image Generation. Sora, if you can access it. Veo, Imagen 3, Genie 2.
- Deepfaketown and Botpocalypse Soon. Threats of increasing quantity not quality.
- They Took Our Jobs. Attempt at a less unrealistic economic projection.
- Get Involved. EU AI office, Apollo Research, Conjecture.
- Introducing. Devin, starting at $500/month, no reports of anyone paying yet.
- In Other AI News. The rapid rise in capacity density.
- OpenlyEvil AI. OpenAI partners with Anduril Industries for defense technology.
- Quiet Speculations. Escape it all. Maybe go to Thailand? No one would care.
- Scale That Wall. Having the model and not releasing it is, if anything, scarier.
- The Quest for Tripwire Capability Thresholds. Holden Karnofsky helps frame.
- The Quest for Sane Regulations. For now it remains all about talking the talk.
- Republican Congressman Kean Brings the Fire. He sat down and wrote a letter.
- CERN for AI. Miles Brundage makes the case for CERN for AI, sketches details.
- The Week in Audio. Scott Aaronson on Win-Win.
- Rhetorical Innovation. Yes, of course the AIs will have ‘sociopathic tendencies.’
- Model Evaluations Are Lower Bounds. A little work made the agents better.
- Aligning a Smarter Than Human Intelligence is Difficult. Anthropic gets news.
- I’ll Allow It. We are still in the era where it pays to make systematic errors.
- Frontier AI Systems Have Surpassed the Self-Replicating Red Line. Says paper.
- People Are Worried About AI Killing Everyone. Chart of p(doom).
- Key Person Who Might Be Worried About AI Killing Everyone. David Sacks.
- Other People Are Not As Worried About AI Killing Everyone. Bad modeling.
- Not Feeling the AGI. If AGI wasn’t ever going to be a thing, I’d build AI too.
- Fight For Your Right. Always remember to back up your Sims.
- The Lighter Side. This is your comms department.