Here we evaluate how each of the models has done compared to observed values.

## Metrics

**Error**: Error is the raw difference between a predicted point value (\(\hat{y}_{i}\)) and observed value (\(y_{i}\)): \(e_{i} = \hat{y}_{i} - y_{i}\). Although we generally summarize errors in a fashion that results in non-negative values only (*e.g.*, via squaring), the raw error provides information about prediction bias (if the models tend to over- or under-predict in a systematic fashion).

**RMSE**: Root Mean Square Error (RMSE) is used to evaluate the absolute accuracy of the point estimates of a forecast. Across \(N\) predicted values in a time series, \(RMSE = \sqrt{\frac{\sum_{i=1}^{N}{e_{i}^2}}{N}}\).

**Coverage**: Coverage is used to evaluate the uncertainty in a forecast and is the percentage of observations which fell within the confidence interval of the prediction. Because we use a 90% CI, coverage would ideally be equal to 0.90. If coverage is higher than 0.90, the forecast’s intervals are too wide; if it’s lower than 0.90, the forecast intervals are too narrow.

## How have the models done recently?

This figure shows the raw error as a function of lead time (how far in advance the prediction was made) for three forecast dates (colors of lines) across each combination of model (column) and species (row) for a selection of historically common species and the total rodents, for the control (unmanipulated) plots only.

## How would have the models done historically?

Here, we evaluate the models’ abilities to hindcast the historic data. This figure shows coverage (left column) and RMSE (right column) for each of a selection of historically common species and total rodents on the control (unmanipulated) plots only. The x-axis in each plot corresponds to the different models, including the ensemble. Each model’s raw metric values are displayed for all included hindcasts (with jittering to show overlapping points) in addition to a summarizing quartile box-whisker (ends of the whiskers show the exteme values, box shows the 25%-75% range, horizontal line shows the median). The 90% coverage target is indicated by the horizontal dotted line.