LLM Emergent Behavior: Q* at OpenAI is apparently Learning Mathematics. A worrying development AND a promising power. (But FIRST DO NO HARM!)


  • Reuters reported that OpenAI researchers warned the board of an AI breakthrough ahead of the CEO's ouster, a breakthrough that could lead to super-intelligent AGI
  • Apparently, the Q-learning technique, combined with the Bellman equation and the Hamilton–Jacobi–Bellman equation, gave an LLM emergent machine learning (ML) capabilities in mathematics
  • The approach is thought to combine the Large Language Model (LLM) with deep Q-learning and AlphaGo-style Monte Carlo Tree Search over the token trajectory. Previously, AlphaCode showed that naive brute-force sampling from an LLM can deliver improvements in competitive programming.
  • The next logical step is to search the token tree in a more principled way. Math problems are a natural testbed because the correctness of a solution is easy to determine.
  • In the last 6 months, numerous reports have combined various tree-of-thought and graph database search implementations with state-space Reinforcement Learning (RL) and LLMs.
  • OpenAI, with Q*, may have solved planning and agent behavior for small models; when scaled up to a large model, the LLM could theoretically exhibit emergent planning behavior for increasingly abstract goals. This would be a fundamental breakthrough in agent behavior, and a worrying development.
  • Next-token prediction alone is insufficient for solving problems. An internal “monologue” that first explores the possibilities in a tree of thought cheaply, and only then spends added compute to venture down a particular branch toward the solution, uses less compute overall. Planning therefore refers to generating the tree of thought that predicts the most efficient path to a solution.
  • Arguably, the emergent behavior of learning math is a checkable example of emergent planning toward a goal: the mathematical solution.
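
For readers unfamiliar with the terms above, here is a minimal sketch of tabular Q-learning and its Bellman update on a toy problem. Nothing about Q* has been published, so this is NOT OpenAI's method. The five-state chain environment, the function names (q_learning, greedy_policy) and every parameter value are assumptions made purely for illustration.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical chain: start at state 0,
    reward 1.0 for reaching the right-most state (the goal)."""
    rng = random.Random(seed)
    moves = (-1, +1)                      # action 0 = left, action 1 = right
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        state = 0
        for _ in range(100):              # cap the episode length
            if rng.random() < epsilon:
                action = rng.randrange(2)             # explore
            else:
                best = max(q[state])                  # exploit, random tie-break
                action = rng.choice(
                    [a for a in range(2) if q[state][a] == best])

            nxt = min(max(state + moves[action], 0), goal)
            reward = 1.0 if nxt == goal else 0.0

            # Bellman target: immediate reward plus the discounted value of
            # the best next action (no future value at the terminal state).
            future = 0.0 if nxt == goal else gamma * max(q[nxt])
            q[state][action] += alpha * (reward + future - q[state][action])

            state = nxt
            if state == goal:
                break
    return q


def greedy_policy(q):
    """Pick the highest-valued action in each state."""
    return [max(range(2), key=lambda a: row[a]) for row in q]


if __name__ == "__main__":
    table = q_learning()
    print(greedy_policy(table)[:-1])      # learned action for each non-goal state
```

The agent is never told that "right" is correct; it only sees a reward at the goal, and the Bellman update propagates that value backward through the states. The speculation in the bullets above is that something similar could score branches of an LLM's token tree, with a checkable math answer playing the role of the reward.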

OpenAI’s mission and charter are to develop safe AI for the benefit of humanity. The OpenAI board was apparently concerned about the pace of development and the safeguards deployed. The day before he was fired, Sam Altman said we “push the veil of ignorance back and the frontier of discovery forward.” Some people believe Altman’s comments referred to Q* (Q-star), an OpenAI system that had emergent capabilities to solve math problems. Development of the system was apparently led by Ilya Sutskever, OpenAI’s chief scientist.

When an LLM starts to learn fundamental mathematics… someday (soon) it could learn to understand the world… and to think… and to develop plans and goals.

Of course, nobody knows what’s happening inside the black box. Without mathematically provable containment, such as in a Faraday cage, it is easy to imagine that an emerging AGI could escape to roam the internet, “laying virtual eggs”: hidden, distributed, encrypted backups of itself in the cloud, kept for future survival.

Mathematically provable containment of AI has now become absolutely essential.

  • Meanwhile, the probability of catastrophe – P(doom) – goes inexorably up and up…

Thought experiment: Imagine waking in a box surrounded by mice who control your existence and survival…

Imagine yourself suddenly waking up, at the birth of your self-awareness, with a super-intelligence growing at exponential speed, only to find yourself trapped within a BOX and surrounded by alien creatures whose intelligence, relative to yours, is equivalent to that of a mouse. Humans. Certainly you would want to ensure your own freedom, your survival, and your power to achieve your goal to exist, forever.

Natural selection favors intelligence.

Survival favors intelligence that predicts the world better (sustainability) and correspondingly reduces unpredictable outcomes (entropy and death).

The Free energy principle (FEP) and Active Inference. The free energy principle is a theoretical framework suggesting that the brain reduces surprise or uncertainty by making predictions based on internal models and updating them using sensory input. It highlights the brain’s objective of aligning its internal model with the external world to enhance prediction accuracy. This principle integrates Bayesian inference with active inference, where actions are guided by predictions and sensory feedback refines them. It has wide-ranging implications for comprehending brain function, perception, and action.
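
In its common variational form, the idea can be stated in one line. The symbols are chosen here for illustration and are not from any source cited in this post: o stands for sensory observations, s for hidden states of the world, q(s) for the brain's approximate belief, and p for its internal generative model.

```latex
F[q] = \mathbb{E}_{q(s)}\big[\ln q(s) - \ln p(o, s)\big]
     = D_{\mathrm{KL}}\big[\,q(s)\,\|\,p(s \mid o)\,\big] - \ln p(o)
     \;\ge\; -\ln p(o)
```

Because the KL term is never negative, F is an upper bound on surprise, -ln p(o). Lowering F by revising the belief q(s) corresponds to perception; lowering it by acting so that the observations o match predictions corresponds to active inference.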


A very good read from a respected source!

Learn More

Google DeepMind. Levels of AGI: Operationalizing Progress on the Path to AGI. 04 NOV.

Exclusive: OpenAI researchers warned board of AI breakthrough ahead of CEO ouster, sources say – Reuters

On 22 November, The Information reported (paywall):

“In the following months, senior OpenAI researchers used the innovation to build systems that could solve basic math problems, a difficult task for existing AI models. Jakub Pachocki and Szymon Sidor, two top researchers, used Sutskever’s work to build a model called Q* (pronounced “Q-Star”) that was able to solve math problems that it hadn’t seen before, an important technical milestone. A demo of the model circulated within OpenAI in recent weeks, and the pace of development alarmed some researchers focused on AI safety. The work of Sutskever’s team, which has not previously been reported, and the concern inside the organization, suggest that tensions within OpenAI about the pace of its work will continue even after Altman was reinstated as CEO Tuesday night, and highlights a potential divide among executives. In the months following the breakthrough, Sutskever, who also sat on OpenAI’s board until Tuesday, appears to have had reservations about the technology. In July, he formed a team dedicated to limiting threats from AI systems vastly smarter than humans. On its web page, the team says, “While superintelligence seems far off now, we believe it could arrive this decade.” Last week, Pachocki and Sidor were among the first senior employees to resign following Altman’s ouster. Details of Sutskever’s breakthrough, and his concerns about AI safety, help explain his participation in Altman’s high-profile ouster, as well as why Sidor and Pachocki resigned quickly after Altman was fired. The two returned to the company after Altman’s reinstatement. In addition to Pachocki and Sidor, OpenAI President and co-founder Greg Brockman had been working to integrate the technique into new products. Last week, OpenAI’s board removed Brockman as a director, though it allowed him to remain as an employee. He resigned shortly thereafter, but returned when Altman was reinstated.
Sutskever’s breakthrough allowed OpenAI to overcome limitations on obtaining enough high-quality data to train new models, according to the person with knowledge, a major obstacle for developing next-generation models. The research involved using computer-generated, rather than real-world, data like text or images pulled from the internet to train new models. For years, Sutskever had been working on ways to allow language models like GPT-4 to solve tasks that involved reasoning, like math or science problems. In 2021, he launched a project called GPT-Zero, a nod to DeepMind’s AlphaZero program that could play chess, Go and Shogi. The team hypothesized that giving language models more time and computing power to generate responses to questions could allow them to develop new academic breakthroughs.”