SEA-Eval
A Benchmark for Evaluating Self-Evolving Agents Beyond Episodic Assessment
Evaluating Agent Self-Evolution: Learning from Repetition, Transfer across Similar Tasks, and Retention through Interference
arXiv
12 Task Groups
4 Eval Settings
5 Agents
36 Total Variants