Teaching AI Agents to Self-Learn Without Datasets

A study published in January 2026 introduces a framework that lets AI agents learn without human-curated datasets, a notable shift in how AI systems can improve themselves. The research, shared on Hugging Face, centers on Dr. Zero, a framework for self-evolving search agents. Instead of training on external datasets, the agents train by interacting with one another, with no human intervention required for either training or validation.

The process starts with two agents, a Proposer and a Solver, both initialized from a compact Qwen2.5-3B or Qwen2.5-7B model. The two then improve each other in a repeating cycle: the Proposer uses search to generate questions, and the Solver learns by searching a knowledge base for the answers. As training progresses, the Proposer produces increasingly complex and nuanced questions, and the Solver sharpens its reasoning by answering them. The result is a self-sustaining learning loop that continually adapts and refines itself.
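To make the shape of this feedback loop concrete, here is a deliberately tiny simulation. It is a toy sketch, not Dr. Zero's implementation: there are no language models and no reinforcement learning. The knowledge base `KB`, the `search` helper, the `Proposer`/`Solver` classes, the `skill` counter, and `co_evolve` are all hypothetical stand-ins. A question's difficulty is modeled as the number of search hops needed to answer it; the Proposer escalates difficulty whenever the Solver succeeds, and the Solver "trains" (gains one hop of depth) whenever it fails.

```python
import random
from dataclasses import dataclass

# Hypothetical toy knowledge base: each entry links an entity to the next one,
# so answering an n-hop question means following n links ("searches").
KB = {"A": "B", "B": "C", "C": "D", "D": "E", "E": "F"}


def search(kb, entity, hops):
    """Follow `hops` links from `entity`; return None if the chain runs out."""
    for _ in range(hops):
        if entity not in kb:
            return None
        entity = kb[entity]
    return entity


@dataclass
class Question:
    start: str
    hops: int
    answer: str


class Proposer:
    """Toy Proposer: only emits questions whose answer it verified via search."""

    def __init__(self, kb, hops=1):
        self.kb, self.hops = kb, hops

    def propose(self, rng):
        starts = [e for e in self.kb if search(self.kb, e, self.hops) is not None]
        start = rng.choice(starts)
        return Question(start, self.hops, search(self.kb, start, self.hops))


class Solver:
    """Toy Solver: can chain at most `skill` searches per question."""

    def __init__(self, kb, skill=1):
        self.kb, self.skill = kb, skill

    def answer(self, q):
        if q.hops > self.skill:
            return None  # question is too deep for the current Solver
        return search(self.kb, q.start, q.hops)


def co_evolve(kb, rounds, seed=0):
    """Run the loop; return a history of (question hops, solver skill, solved)."""
    rng = random.Random(seed)
    max_hops = len(kb)  # longest chain available in this toy KB
    proposer, solver = Proposer(kb), Solver(kb)
    history = []
    for _ in range(rounds):
        q = proposer.propose(rng)
        solved = solver.answer(q) == q.answer
        history.append((q.hops, solver.skill, solved))
        if solved:
            # Solver succeeded: the Proposer escalates to deeper questions.
            proposer.hops = min(proposer.hops + 1, max_hops)
        else:
            # Solver failed: a toy "training step" extends its reasoning depth.
            solver.skill += 1
    return history


if __name__ == "__main__":
    for step in co_evolve(KB, rounds=10):
        print(step)
```

Running it shows the two agents ratcheting each other upward: question depth and solver skill alternate in increasing until the toy knowledge base runs out of longer chains. In the real system both roles are language models updated by reinforcement learning, so this captures only the curriculum dynamic, not the training method.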

Dr. Zero's self-learning mechanism addresses key limitations of traditional AI training. Earlier methods depend on pre-curated questions and datasets; here the agents generate their own queries and responses, which makes the approach more flexible and easier to scale. The system also introduces the Hop-Grouped Relative Policy Optimization (HRPO) algorithm, which optimizes the agents' learning by grouping queries according to their complexity and reduces the computational cost of generating multiple responses.
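The article does not give HRPO's formulas, so the following is only a hedged sketch of the general idea of a group-relative advantage. It assumes that, as the name suggests, queries are grouped by hop count, and that each query's reward is compared against the mean and standard deviation of its own hop group (in the style of group-relative policy optimization). Under that assumption, one sample per query is enough to form a baseline, instead of generating several responses to the same query. The function name `hop_grouped_advantages` and the `(hops, reward)` input format are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def hop_grouped_advantages(samples, eps=1e-6):
    """Assumed HRPO-style advantages.

    samples: list of (hops, reward) pairs, one per generated query.
    Each reward is normalized against the other samples with the same hop
    count, so queries of similar complexity share a baseline.
    """
    groups = defaultdict(list)
    for hops, reward in samples:
        groups[hops].append(reward)

    # Per-group baseline (mean) and scale (population standard deviation).
    stats = {h: (mean(rs), pstdev(rs)) for h, rs in groups.items()}

    advantages = []
    for hops, reward in samples:
        mu, sigma = stats[hops]
        # eps avoids division by zero when every reward in a group is equal.
        advantages.append((reward - mu) / (sigma + eps))
    return advantages


if __name__ == "__main__":
    batch = [(1, 1.0), (1, 0.0), (2, 0.5), (2, 0.5), (3, 0.2)]
    print(hop_grouped_advantages(batch))
```

In this example the two 1-hop queries receive advantages of roughly +1 and -1, while the 2-hop and 3-hop groups, whose rewards do not vary, receive 0. A 3-hop query is therefore never penalized just because it scored lower than an easier 1-hop query.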

The research highlights how access to an external search engine helps the Proposer generate more sophisticated questions, which in turn forces the Solver to refine its problem-solving strategies. The whole system is designed to evolve iteratively with minimal human oversight, and it aims to reduce reliance on expensive dataset creation and validation.

This approach could significantly change how AI search agents and other machine learning models are developed, offering a cheaper, more scalable, and more efficient path to AI self-improvement. For the broader AI market, it may push competitors toward more autonomous systems that need fewer resources to train and refine, potentially making AI development faster and more accessible.
