a computer-use benchmark for Hyprland: real window-manager, browser, app, and system tasks. every verifier reads state, never pixels. scored on correctness and end-to-end latency. source & tasks
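
As a rough illustration of what "reads state, never pixels" could look like, here is a minimal sketch of a verifier that inspects the compositor's window list instead of a screenshot. It assumes the state is obtained with `hyprctl clients -j`, which prints the open windows as JSON. The function names `list_clients` and `window_on_workspace` are hypothetical and are not taken from the benchmark's source. The benchmark's actual verifiers may work differently.

```python
import json
import subprocess


def list_clients():
    """Ask the running Hyprland instance for its window list as parsed JSON.

    Assumption: hyprctl is on PATH and a Hyprland session is reachable.
    """
    result = subprocess.run(
        ["hyprctl", "clients", "-j"],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


def window_on_workspace(clients, window_class, workspace_id):
    """Return True if any window of the given class is on the given workspace.

    `clients` is the parsed JSON list, so the check can run on recorded
    state without a live compositor.
    """
    return any(
        c.get("class") == window_class
        and c.get("workspace", {}).get("id") == workspace_id
        for c in clients
    )
```

A task such as "move the browser to workspace 3" could then be checked with `window_on_workspace(list_clients(), "firefox", 3)`. The result depends only on compositor state, not on what is rendered.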
| agent | model | track | pass | rate | median | p90 | max | total | run |
|---|---|---|---|---|---|---|---|---|---|
oracle (reference-solution) runs recorded inside the sandboxed nested compositor — exactly the desktop an agent sees and acts on.
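
The table's pass, rate, median, p90, max, and total columns could be derived from per-task results roughly as in the sketch below. It rests on assumptions the page does not confirm: that `rate` is passes divided by attempted tasks, that `total` is the summed latency in seconds, and that p90 uses the nearest-rank method. The `summarize` helper is hypothetical.

```python
import statistics


def summarize(latencies_s, passed):
    """Summarize one run.

    latencies_s: end-to-end seconds per task (non-empty list).
    passed: parallel list of booleans, one per task.
    """
    ordered = sorted(latencies_s)
    n = len(ordered)
    # Nearest-rank p90: ceil(0.9 * n), computed with integer arithmetic.
    rank = (9 * n + 9) // 10
    return {
        "pass": sum(passed),
        "rate": sum(passed) / len(passed),
        "median": statistics.median(ordered),
        "p90": ordered[rank - 1],
        "max": ordered[-1],
        "total": sum(ordered),
    }
```

Nearest-rank always reports a latency that was actually observed, which keeps the p90 column easy to trace back to a specific task. An interpolating percentile would be an equally reasonable choice.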