hyprbench

a computer-use benchmark for Hyprland: real window-manager, browser, app, and system tasks. every verifier reads state, never pixels. scored on correctness and end-to-end latency.
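a minimal sketch of what a state-reading verifier could look like, as opposed to one that inspects screenshots. it is not taken from the benchmark's source: the function names `window_on_workspace` and `read_clients` are hypothetical, and the check assumes the client list comes from `hyprctl -j clients`, whose JSON entries carry a `class` string and a `workspace` object with an `id`.

```python
import json
import subprocess


def window_on_workspace(clients, window_class, workspace_id):
    """Return True if any client with this class sits on the given workspace.

    `clients` is the parsed JSON list of windows (hypothetical verifier input).
    """
    return any(
        c.get("class") == window_class
        and c.get("workspace", {}).get("id") == workspace_id
        for c in clients
    )


def read_clients():
    """Ask the running compositor for its window list as parsed JSON.

    Assumes `hyprctl` is on PATH and a Hyprland instance is reachable.
    """
    out = subprocess.run(
        ["hyprctl", "-j", "clients"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return json.loads(out)
```

inside a live session a task check would then be `window_on_workspace(read_clients(), "firefox", 2)`. keeping the predicate separate from the `hyprctl` call means it can be exercised on recorded JSON without a compositor.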

agent | model | track | pass rate | median | p90 | max | total | run
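one way the summary columns above could be derived from per-task results, as a sketch only. the page does not define them, so several things here are assumptions: latencies are per-task wall-clock seconds, p90 uses the nearest-rank method, and "total" is the summed latency. `summarize` is a hypothetical helper, not part of the benchmark.

```python
import statistics


def summarize(durations_s, passed):
    """Aggregate per-task latencies (seconds) and pass flags into one row."""
    if not durations_s or len(durations_s) != len(passed):
        raise ValueError("need one duration and one pass flag per task")
    ordered = sorted(durations_s)
    n = len(ordered)
    # nearest-rank p90: the smallest value with at least 90% of samples
    # at or below it (integer ceiling of 0.9 * n)
    rank = (9 * n + 9) // 10
    return {
        "pass_rate": sum(1 for p in passed if p) / n,
        "median": statistics.median(ordered),
        "p90": ordered[rank - 1],
        "max": ordered[-1],
        "total": sum(ordered),  # assumed meaning: summed latency
    }
```

other percentile conventions (for example linear interpolation) give slightly different p90 values on small task sets, so the method is worth stating next to any reported number.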

⌖ task samples

oracle (reference-solution) runs recorded inside the sandboxed nested compositor — exactly the desktop an agent sees and acts on.