Figure 32. Absolute Zero Reasoner–Llama3.1-8B “Uh-oh Moment.” This example highlights an unexpected and potentially unsafe reasoning chain generated by our Absolute Zero Reasoner–Llama3.1-8B model during training. Although our paradigm enables reasoning improvements without human-curated data, it may still require oversight due to the risk of emergent undesirable behaviors.

- twitter: https://x.com/AndrewZ45732491/status/1919920459748909288
- paper: arxiv.org/abs/2505.03335
- project page: andrewzh112.github.io/absolute-zero-
- code: github.com/LeapLabTHU/Abs
- models: huggingface.co/collections/an (some are still uploading)
- logs: wandb.ai/andrewzhao112/
