OpenAI has unveiled its latest model, GPT-5.3-Codex, just minutes after Anthropic introduced Claude Opus 4.6. Both models are built primarily for coding and autonomous agents.

Claude Opus 4.6 scored an impressive 65.4% on Terminal-Bench 2.0, a benchmark that measures performance on autonomous terminal tasks. Shortly afterward, GPT-5.3-Codex surpassed it with 77.3%.

Benchmark scores matter, but real-world performance often tells a more compelling story. Across 2,000 sessions, 2.2 billion tokens, and $20,000 in spend, Claude Opus 4.6 built a fully working C compiler of roughly 100,000 lines of code, capable of compiling the Linux kernel.

GPT-5.3-Codex, for its part, became the first OpenAI model used to build itself: engineers relied on earlier iterations of Codex for debugging, training, deployment, and testing, and were struck by how quickly it accelerated its own development.

Both models have also proven themselves in hands-on use, with GPT-5.3-Codex and Claude Opus 4.6 each generating a 3D racing game from a single prompt. The models, textures, and physics were not perfect, but everything was fully functional.

Which model do you think comes out ahead: OpenAI's or Anthropic's?