By Michael Cui, Chenxin Dai, Yixuan Xu, Fei Fang
Overview
Paper–reviewer assignment sits at the heart of the conference peer-review process. When done well, it ensures that submissions are evaluated by reviewers who are both qualified and willing to engage.
As conferences continue to grow in size, however, assignment algorithms face new challenges. Scale has increased dramatically, and with it the risk of strategic or coordinated behavior. Modern assignment methods must therefore do more than simply maximize reviewer–paper similarity: they must also be robust to manipulation, promote diversity among reviewers, and at the same time remain computationally efficient.
For AAAI 2026, we designed a new paper-reviewer assignment algorithm with these goals in mind.
Problem Setting and Default Algorithm Used in AAAI 2025
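In the standard setting:
- There is a set of papers $\mathcal{P}$ with $n = |\mathcal{P}|$ papers and a set of reviewers $\mathcal{R}$ with $m = |\mathcal{R}|$ reviewers.
- Each paper must be assigned exactly $k$ reviewers.
- Each reviewer can review at most $\ell$ papers.
- Assignments are represented by a binary matrix $A \in \{0,1\}^{n \times m}$, where:
  - $A_{p,r} = 1$ means reviewer $r$ is assigned to paper $p$.
- A similarity matrix $S \in \mathbb{R}^{n \times m}$ measures how suitable a reviewer is for a paper. Each entry $S_{p,r}$ represents the predicted quality of reviewer $r$'s review of paper $p$.
- The overall quality of an assignment is defined as the sum of similarities over all assigned paper–reviewer pairs:

$$\mathrm{Quality}(A) = \sum_{p \in \mathcal{P}} \sum_{r \in \mathcal{R}} S_{p,r} A_{p,r}.$$

A commonly used paper assignment algorithm, which was also used in AAAI 2025, solves the following linear program. It maximizes assignment quality subject to workload constraints:

$$\begin{aligned} \max_{A} \quad & \sum_{p \in \mathcal{P}} \sum_{r \in \mathcal{R}} S_{p,r} A_{p,r} \\ \text{s.t.} \quad & \sum_{r \in \mathcal{R}} A_{p,r} = k \quad \forall p \in \mathcal{P}, \\ & \sum_{p \in \mathcal{P}} A_{p,r} \le \ell \quad \forall r \in \mathcal{R}, \\ & 0 \le A_{p,r} \le 1 \quad \forall p \in \mathcal{P},\ r \in \mathcal{R}. \end{aligned}$$

The constraint matrix is totally unimodular, so an optimal vertex solution is integral. Relaxing $A_{p,r} \in \{0,1\}$ to $[0,1]$ therefore still yields a valid binary assignment.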
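This problem is a bipartite b-matching, so an LP solver returns an integral assignment. For intuition, a toy instance can also be solved by enumerating every feasible assignment and keeping the best one. The sketch below does that as a stand-in for an LP solver. The similarity values and the helper name `max_quality_assignment` are hypothetical, and exhaustive search is only practical for a handful of papers:

```python
from itertools import combinations, product


def max_quality_assignment(S, k, ell):
    """Exhaustively search all assignments of a tiny instance (stand-in for an LP solver).

    S[p][r] is the similarity of reviewer r to paper p. Each paper gets exactly
    k reviewers and each reviewer gets at most ell papers. Returns the best
    quality and, for each paper, the tuple of assigned reviewer indices.
    """
    n, m = len(S), len(S[0])
    best_quality, best_assignment = float("-inf"), None
    # Each paper chooses a k-subset of reviewers ...
    per_paper_choices = [list(combinations(range(m), k)) for _ in range(n)]
    for assignment in product(*per_paper_choices):
        # ... and the joint choice is feasible only if no reviewer exceeds ell papers.
        loads = [0] * m
        for reviewers in assignment:
            for r in reviewers:
                loads[r] += 1
        if max(loads) > ell:
            continue
        quality = sum(S[p][r] for p, reviewers in enumerate(assignment) for r in reviewers)
        if quality > best_quality:
            best_quality, best_assignment = quality, assignment
    return best_quality, best_assignment


# Hypothetical similarities: 3 papers (rows) x 3 reviewers (columns).
S = [
    [0.9, 0.8, 0.1],
    [0.7, 0.9, 0.2],
    [0.6, 0.5, 0.4],
]
quality, assignment = max_quality_assignment(S, k=2, ell=2)
print(round(quality, 6), assignment)  # 3.8 ((0, 1), (1, 2), (0, 2))
```

In this instance every reviewer must take exactly two papers. The optimum therefore trades a strong pairing (reviewer 1 on paper 2) for a weaker one elsewhere, which greedy per-paper choices would miss.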
Assignment Process for AAAI 2026
Two-Phase Matching
As with prior years, AAAI 2026 employed a two-phase reviewer assignment process designed to balance review quality, efficiency, and scalability.
In Phase 1, we matched 22,495 papers to a pool of 24,854 reviewers, assigning each paper:
- one Senior Program Committee (SPC) member,
- one reciprocal Program Committee (PC) member, and
- two non-reciprocal PC members.
Papers that received overwhelmingly negative feedback at this stage could be rejected early, allowing the review process to focus attention on more competitive submissions.
In Phase 2, additional reviewers were assigned as needed. The number and seniority of these reviewers depended on outcomes from Phase 1, including review quality, reviewer availability, and the need for additional expertise.
Overall, the two-phase structure helps filter out weaker submissions early, enables more efficient use of reviewer resources, and provides valuable flexibility in how reviewers are assigned as the process evolves.
Similarity Score Computation
The similarity matrix $S$ was computed from two sources: content-based scores and bids. The content-based scores were computed using a text similarity model that compares the paper’s text with the reviewer’s past work on OpenReview. They were then normalized to lie within $[0, 1]$.
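Reviewer bids were incorporated into the similarity score using the following transformation:

$$S_{p,r} = T_{p,r}^{\,b_{p,r}},$$

where $T_{p,r}$ is the normalized content-based score and $b_{p,r}$ is the bid score of reviewer $r$ for paper $p$.
Bid scores of 20, 1, 0.67, 0.4, and 0.25 correspond to the categories not willing, not entered, in a pinch, willing, and eager, respectively. Because $T_{p,r} \in [0,1]$, this formulation keeps scores in $[0,1]$. An exponent above 1 (not willing) shrinks the score and penalizes the assignment. An exponent below 1 (in a pinch, willing, eager) raises the score and amplifies assignments the reviewer expressed interest in.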
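Below is a minimal sketch of the bid adjustment. It assumes the transformation raises the normalized text score to the power of the bid score, which is our reading of the mapping described above. The function name `bid_adjusted_score` is hypothetical:

```python
# Bid scores from the text, used here as exponents (assumed form: S = T ** b).
BID_EXPONENT = {
    "not willing": 20,
    "not entered": 1,
    "in a pinch": 0.67,
    "willing": 0.4,
    "eager": 0.25,
}


def bid_adjusted_score(text_score: float, bid: str) -> float:
    """Raise a text similarity in [0, 1] to the bid exponent.

    Exponents > 1 shrink the score (penalty) and exponents < 1 raise it (bonus);
    either way the result stays in [0, 1].
    """
    if not 0.0 <= text_score <= 1.0:
        raise ValueError("text_score must be normalized to [0, 1]")
    return text_score ** BID_EXPONENT[bid]


for bid in BID_EXPONENT:
    print(f"{bid:>12}: {bid_adjusted_score(0.6, bid):.3f}")
```

A power transform is a natural fit here because it is monotone in the text score. Bids therefore reorder candidates only through their relative strength, never by pushing scores outside $[0, 1]$.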
For Phase 2 of the assignment, we found that the OpenReview text similarity model does not fully capture subject-area information. We therefore used a modified similarity score that also incorporates paper and reviewer subject areas, with subject-area information scraped from OpenReview.
A More Robust Assignment Algorithm used in AAAI 2026
Because of the growing need for an assignment algorithm that is robust to manipulative bids, and the larger scale of AAAI 2026, we adopted a new assignment algorithm (Cui et al., 2026). It builds on a randomized assignment method that maximizes a concave, perturbed similarity objective subject to standard load and conflict constraints (Xu et al., 2024). In addition, the algorithm incorporates soft constraints that explicitly encode several desiderata aimed at improving robustness, described below.
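The paper assignment problem is formulated as a fractional optimization program similar to the default algorithm above. Here, however, each $A_{p,r} \in [0, Q]$ represents the probability of assigning reviewer $r$ to paper $p$, with $Q$ upper-bounding the marginal assignment probabilities.
A concave, nondecreasing perturbation function $f$ is applied to encourage randomized assignments. Because $f$ is concave, spreading probability across several well-matched reviewers earns more objective value than concentrating it on a single reviewer.
The optimization maximizes a combination of similarity-based matching quality and additional soft objectives, subject to the same load constraints as the default program:

$$\max_{A,\, z} \;\; \sum_{p \in \mathcal{P}} \sum_{r \in \mathcal{R}} S_{p,r}\, f(A_{p,r}) \;+\; \sum_{i} g_i(A, z_i)$$

where the first term captures reviewer–paper similarity and each $g_i$ represents a soft objective (e.g., diversity or anti-collusion) with auxiliary variables $z_i$. Other engineering techniques (e.g., piecewise-linear approximations) were used to ensure that the program runs sufficiently fast.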
Overall, the formulation subsumes standard similarity-based matching while naturally extending it to support randomization, diversity, and anti-collusion objectives within a single optimization framework.
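To see why a concave $f$ encourages randomization, consider one paper that needs one reviewer and has two nearly equally qualified candidates. The sketch compares the perturbed similarity term for a deterministic assignment and an evenly split one. It assumes a quadratic perturbation $f(x) = x - \beta x^2$ with $\beta = 0.5$, chosen purely for illustration. This $f$ is concave, and nondecreasing on $[0, 1]$ when $\beta \le 0.5$. The exact perturbation function used for AAAI 2026 is not specified in this post, and the soft objectives $g_i$ are omitted:

```python
def f(x: float, beta: float = 0.5) -> float:
    """Hypothetical quadratic perturbation: concave; nondecreasing on [0, 1] if beta <= 0.5."""
    return x - beta * x * x


def perturbed_quality(similarities, probs, beta=0.5):
    """Sum of S[p][r] * f(A[p][r]) over all paper-reviewer pairs (soft objectives omitted)."""
    return sum(
        s * f(a, beta)
        for s_row, a_row in zip(similarities, probs)
        for s, a in zip(s_row, a_row)
    )


# One paper, two reviewers with similar expertise; the paper needs exactly one reviewer.
S = [[1.0, 0.95]]
deterministic = [[1.0, 0.0]]  # always assign the best reviewer
randomized = [[0.5, 0.5]]     # each reviewer is assigned with probability 0.5

print(perturbed_quality(S, deterministic))
print(perturbed_quality(S, randomized))
print(perturbed_quality(S, randomized) > perturbed_quality(S, deterministic))  # True
```

Without the perturbation ($f(x) = x$), the deterministic choice wins (1.0 vs. 0.975 expected similarity). With it, the split assignment scores 0.73125 vs. 0.5. This illustrates how a small sacrifice in expected similarity buys randomness.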
The soft objectives target three desiderata:
- Reviewer diversity: encourages each paper to be reviewed by individuals from different geographic regions. This helps reduce correlated biases and groupthink, improves the breadth of feedback, and makes the overall matching less susceptible to coordinated behavior.
- Coauthorship penalty: discourages assigning reviewers with prior collaborations to the same paper, even when no formal conflict of interest exists. Past collaborators often share perspectives and incentives, which can undermine the independence of reviews.
- Bid-based 2-cycle penalty: reduces reciprocal reviewing arrangements in which two reviewers bid positively on, and are assigned to, each other’s papers. Such arrangements incentivize strategic bidding and pose a clear risk to the integrity of the review process.
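The two collusion-related quantities targeted by these penalties can be counted on a deterministic assignment. These correspond to the "Coauthors" and "2-cycles" rows in the results below. This is a minimal sketch with hypothetical data structures and function names. In the actual algorithm these are soft terms inside the optimization, not post-hoc counts. Counting coauthor pairs once per paper is also an assumption:

```python
from itertools import combinations


def count_coauthor_pairs(assignment, coauthors):
    """Count, per paper, pairs of assigned reviewers who are past coauthors.

    assignment: dict paper -> set of reviewers; coauthors: set of frozenset({r1, r2}).
    """
    return sum(
        1
        for reviewers in assignment.values()
        for r1, r2 in combinations(sorted(reviewers), 2)
        if frozenset((r1, r2)) in coauthors
    )


def count_bid_2cycles(assignment, positive_bids, authored):
    """Count unordered reviewer pairs {r1, r2} where each one is assigned to, and bid
    positively on, a paper authored by the other.

    positive_bids: set of (paper, reviewer); authored: dict reviewer -> set of papers.
    """
    # Directed edge reviewer -> author for every assigned paper the reviewer bid on.
    edges = set()
    for paper, reviewers in assignment.items():
        for r in reviewers:
            if (paper, r) not in positive_bids:
                continue
            for author, papers in authored.items():
                if paper in papers and author != r:
                    edges.add((r, author))
    # A 2-cycle is a pair of opposite edges; count each unordered pair once.
    return sum(1 for r1, r2 in edges if (r2, r1) in edges and r1 < r2)


# Toy data: a and b bid on and review each other's papers; a and c are past coauthors.
assignment = {"P1": {"b"}, "P2": {"a", "c"}, "P3": {"a", "c"}}
authored = {"a": {"P1"}, "b": {"P2"}}
positive_bids = {("P1", "b"), ("P2", "a")}
coauthors = {frozenset({"a", "c"})}

print(count_coauthor_pairs(assignment, coauthors))  # 2
print(count_bid_2cycles(assignment, positive_bids, authored))  # 1
```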
Results
Each phase of the assignment completed in under 30 minutes, demonstrating that the additional robustness constraints can be incorporated at scale without compromising operational feasibility.
Phase 1 Results
| Metric | Ours | Default |
|---|---|---|
| Relative Quality | 0.972 | 1.000 |
| Coauthors | 158 | 1028 |
| 2-cycles | 0 | 950 |
| Diversity | 0.747 | 0.555 |
In Phase 1, the new algorithm retained 97.2% of the quality of the quality-maximizing default assignment. It also completely eliminated bid-based 2-cycles (950 under the default) and reduced coauthor pairs from 1,028 to 158. At the same time, reviewer diversity increased by over 34% (from 0.555 to 0.747). Together, these results indicate a substantial reduction in collusion risk with only a modest trade-off in similarity.
Similar trends were observed in Phase 2.
Phase 2 Results
| Metric | Ours | Default |
|---|---|---|
| Relative Quality | 0.976 | 1.000 |
| Coauthors | 240 | 637 |
| 2-cycles | 0 | 65 |
| Diversity | 0.662 | 0.627 |
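The headline percentages can be recomputed directly from the two tables. This small script uses values copied from those tables to derive, for each phase, the quality given up, the reduction in coauthor pairs, and the relative diversity gain:

```python
# (ours, default) values copied from the Phase 1 and Phase 2 tables.
RESULTS = {
    "Phase 1": {"quality": (0.972, 1.000), "coauthors": (158, 1028), "diversity": (0.747, 0.555)},
    "Phase 2": {"quality": (0.976, 1.000), "coauthors": (240, 637), "diversity": (0.662, 0.627)},
}


def summarize(metrics):
    """Relative changes of the robust assignment versus the default one."""
    ours_q, default_q = metrics["quality"]
    ours_c, default_c = metrics["coauthors"]
    ours_d, default_d = metrics["diversity"]
    return {
        "quality_loss": 1 - ours_q / default_q,        # fraction of default quality given up
        "coauthor_reduction": 1 - ours_c / default_c,  # fraction of coauthor pairs removed
        "diversity_gain": ours_d / default_d - 1,      # relative increase in diversity
    }


for phase, metrics in RESULTS.items():
    summary = summarize(metrics)
    print(phase, {name: f"{value:.1%}" for name, value in summary.items()})
```

Phase 1 works out to a 2.8% quality loss, an 84.6% reduction in coauthor pairs, and a 34.6% diversity gain. Phase 2 gives 2.4%, 62.3%, and 5.6%, so the Phase 2 diversity improvement is considerably smaller.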
Score Analysis
In Phase 1, we used similarity scores directly output by OpenReview and incorporated reviewer bids. Compared to the default assignment, our algorithm produced scores with a slightly lower mean but higher variance, reflecting a broader exploration of feasible high-quality assignments under robustness constraints.


(Figures: similarity score distributions for Phase 1)
In Phase 2, we modified how aggregate scores were computed. In addition to OpenReview similarities, we incorporated paper and reviewer subject-area information. Despite this change, the overall score distribution trends remained consistent with Phase 1.


(Figures: similarity score distributions for Phase 2)
Reviewer Load
Reviewer load distributions under the new algorithm closely matched those of the default assignment, indicating that robustness improvements did not come at the cost of uneven or excessive reviewer workloads.


Bids Analysis

In Phase 1, willing and eager bids accounted for roughly 30% of all bids.
An important question is whether bidding continues to matter after introducing strong robustness constraints. To examine this, we computed, for each reviewer, the fraction of their assigned papers that they had bid on. Among 16,010 reviewers with at least one positive bid, the median fraction was 1.0. This means that at least half of these reviewers received only papers they explicitly bid on.
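The per-reviewer ratio and its median can be computed as in the minimal sketch below. The data structures and the helper name `bid_ratios` are hypothetical. Skipping reviewers with no assigned papers is also an assumption, made to avoid dividing by zero:

```python
from statistics import median


def bid_ratios(assigned, positive_bids):
    """Fraction of each reviewer's assigned papers that the reviewer bid on positively.

    assigned: dict reviewer -> set of assigned papers.
    positive_bids: dict reviewer -> set of papers the reviewer bid on positively.
    Only reviewers with at least one positive bid and one assignment are included.
    """
    ratios = {}
    for reviewer, papers in assigned.items():
        bids = positive_bids.get(reviewer, set())
        if not bids or not papers:
            continue
        ratios[reviewer] = len(papers & bids) / len(papers)
    return ratios


assigned = {"r1": {"p1", "p2"}, "r2": {"p3"}, "r3": {"p4", "p5"}, "r4": {"p6"}}
positive_bids = {"r1": {"p1", "p2", "p9"}, "r2": {"p3"}, "r3": {"p4"}}

ratios = bid_ratios(assigned, positive_bids)
print(ratios)                    # {'r1': 1.0, 'r2': 1.0, 'r3': 0.5}
print(median(ratios.values()))  # 1.0
```

In this toy example, two of the three eligible reviewers receive only papers they bid on. The median is therefore 1.0 even though not every reviewer is fully matched to their bids.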

(Figures: bid ratio distributions)
We also examined the distribution of bids by subject area using Phase 2 data, since subject-area information was available only in Phase 2. The distribution of bid categories is roughly the same across subject areas.

Subject Area Analysis
Subject-area analyses were conducted using Phase 2 data, as subject-area information was available only in that phase. Most submissions fell into Machine Learning and Computer Vision.


(Figures: subject area counts and score distributions)
While average scores were broadly similar across areas, subject areas with more submissions tended to achieve slightly higher scores. This is expected, as larger areas typically have a denser pool of suitable reviewers.
Hyperparameters
Another natural question is how much each individual hyperparameter affects the matching. The table below reports ablations on the PC dataset, where each column sets one hyperparameter to zero. Removing the diversity reward yields the largest gain in quality, but this is an almost negligible 0.8% increase. Removing it also saves about 7 minutes of runtime. This is most likely because the diversity reward adds a dense term to the program that affects all assignment variables.
| Metric | All params | pen_2cycle=0.0 | pen_coauthor=0.0 | reward_div=0.0 |
|---|---|---|---|---|
| Quality | 1.000 | 1.001 | 1.000 | 1.008 |
| Time (s) | 1880.464 | 1848.894 | 1805.746 | 1479.585 |
| Diversity | 0.746 | 0.746 | 0.747 | 0.581 |
| Seniority | 1.0 | 1.0 | 1.0 | 1.0 |
| Coauthor Pairs | 115 | 104 | 188 | 185 |
| 2-cycles | 0 | 197 | 0 | 0 |
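Each soft objective mainly protects its own metric. Removing the diversity reward lowers diversity from 0.746 to 0.581, and it also raises coauthor pairs to 185. Removing the 2-cycle penalty raises the number of 2-cycles from 0 to 197. Removing the coauthor penalty raises coauthor pairs from 115 to 188.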
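The "0.8%" and "about 7 minutes" figures follow directly from the ablation table. This small script works from the copied values and assumes that the Time row is reported in seconds, which the runtime difference quoted above suggests:

```python
# Values copied from the ablation table (PC dataset); Time assumed to be in seconds.
ABLATION = {
    "All params": {"quality": 1.000, "time_s": 1880.464, "diversity": 0.746},
    "pen_2cycle=0.0": {"quality": 1.001, "time_s": 1848.894, "diversity": 0.746},
    "pen_coauthor=0.0": {"quality": 1.000, "time_s": 1805.746, "diversity": 0.747},
    "reward_div=0.0": {"quality": 1.008, "time_s": 1479.585, "diversity": 0.581},
}


def quality_gain(config: str) -> float:
    """Relative quality change versus the run with all hyperparameters enabled."""
    return ABLATION[config]["quality"] / ABLATION["All params"]["quality"] - 1


def time_saved_minutes(config: str) -> float:
    """Runtime saved versus the run with all hyperparameters enabled, in minutes."""
    return (ABLATION["All params"]["time_s"] - ABLATION[config]["time_s"]) / 60


for config in ABLATION:
    if config == "All params":
        continue
    print(
        f"{config:>17}: quality {quality_gain(config):+.1%}, "
        f"time saved {time_saved_minutes(config):.1f} min, "
        f"diversity {ABLATION[config]['diversity']:.3f}"
    )
```

For `reward_div=0.0` this gives a +0.8% quality change and a saving of about 6.7 minutes. The other two ablations save roughly half a minute and just over a minute.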
Concluding Remarks
In this post, we introduced the paper assignment algorithm used for AAAI 2026 and explained how we implemented it. Our new algorithm substantially improves the robustness of large-scale paper–reviewer assignments while retaining nearly all of the assignment quality achieved by standard methods. It eliminates clear forms of strategic behavior and increases reviewer diversity. For future conferences, we encourage reviewers to submit their bids and to provide more information about their past work. This helps the algorithm compute more accurate similarity and subject-area scores, which in turn improves the matching.
Acknowledgement
We thank Matthew Taylor, Chad Jenkins, and Kevin Leyton-Brown for their valuable input and feedback.
References
- Michael Cui, Chenxin Dai, Yixuan Even Xu, and Fei Fang. A Unified Framework for Scalable and Robust Paper Assignment. arXiv preprint, 2026. https://arxiv.org/abs/2601.14402
- Yixuan Even Xu, Steven Jecmen, Zimeng Song, and Fei Fang. A One-Size-Fits-All Approach to Improving Randomness in Paper Assignment. arXiv preprint, 2024. https://arxiv.org/abs/2310.05995
