Paper Copilot: Tracking the Evolution of Peer Review in AI Conferences
Abstract
Submission volumes are rising rapidly, and venues differ in their policies, data formats, and release schedules. As a result, signals of progress are fragmented across sources, and key stages (rebuttal, discussion, final decision) are easily missed, making analysis difficult. We present Paper Copilot, a system and scalable peer-review archive that aggregates data from official venue sites, OpenReview, and opt-in forms into a single standardized, versioned, timestamped record. This enables consistent tracking of trends over time and comparison across venues, institutions, and countries. Using the archive for ICLR 2024/2025, we observe larger post-rebuttal score changes for higher-tier papers, reviewer agreement that dips during active discussion and converges by the end, and, in 2025, a sharper mean-score-driven assignment of tiers with lower decision uncertainty than expected at that scale. We also set out simple ethical guidelines: transparent sourcing and consent, privacy protection, and limits on use for closed venues. Together, these contributions provide a clear, reusable foundation for tracking AI/ML progress and, with this data, enable validation, benchmarking, and studies that would otherwise be difficult to run.