Bayesian Aggregation

This page provides some background on the Bayesian poll aggregation models I use on this site for the 2024-25 Australian Federal election. At the moment I am using three different Bayesian models:

  • A Gaussian Random Walk (GRW), which models the election by assuming (1) voting intention on any single day varies only a little from the day before, (2) the polls are a noisy indication of population voting intention, and (3) individual pollsters have systematic biases (house effects), but collectively across all pollsters the polls are unbiased.

  • A Gaussian Random Walk that is Left Anchored (GRWLA) to the outcome of the previous election. Otherwise, this model makes the same assumptions as the GRW above.

  • A Gaussian Process (GP) model, which assumes that the hidden voting intention on a day on which a poll was conducted is closely correlated with the hidden voting intention on nearby poll days, and less correlated with that on poll days further away in time. The covariance between poll days is specified by a squared exponential (also called exponentiated quadratic) kernel. In this model I use a length-scale of 40 days, so that polls 40 days apart have a correlation of around 0.61, polls 80 days apart have a correlation of around 0.14, and polls 120 days apart have a correlation of around 0.01. Like the above models, the GP model also assumes that individual pollsters have systematic biases, but collectively across all pollsters the polls are unbiased.
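The correlation figures quoted for the GP model can be checked directly. The sketch below assumes the common unit-amplitude parameterisation k(d) = exp(-d² / (2ℓ²)), where d is the gap in days and ℓ is the length-scale; with unit amplitude the covariance equals the correlation. The function name squared_exponential is hypothetical and is not taken from bayes_tools.py.

```python
import math


def squared_exponential(distance_days: float, length_scale: float = 40.0) -> float:
    """Correlation implied by a unit-amplitude squared exponential kernel.

    Assumed form: k(d) = exp(-d**2 / (2 * length_scale**2)).
    """
    return math.exp(-(distance_days ** 2) / (2.0 * length_scale ** 2))


# Correlation between poll days separated by 40, 80 and 120 days (length-scale 40).
for gap in (40, 80, 120):
    print(f"{gap:>3} days apart: correlation ~ {squared_exponential(gap):.2f}")
```

This prints correlations of about 0.61, 0.14 and 0.01, matching the figures above: exp(-0.5), exp(-2) and exp(-4.5) respectively. A longer length-scale would let polls further apart inform each other more strongly, giving a smoother trend line.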

These models are implemented in Python, and the code is available in my GitHub repository. If you want to look at the models closely, see bayes_tools.py and the notebook _poll_agg.ipynb.

The GRW and GRWLA modelling is based on the work of Simon Jackman in Bayesian Analysis for the Social Sciences (2009).
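To make the GRW and GRWLA assumptions concrete, here is a minimal forward simulation of the data-generating process they describe. It is a sketch, not the fitting code: the real models perform Bayesian inference in bayes_tools.py, whereas this only generates synthetic data. The function simulate_grw, its parameter names and all default values are hypothetical and illustrative. Centring the house effects so they sum to zero is one common way to encode "collectively unbiased". Passing a start value mimics the left anchor of the GRWLA.

```python
import random


def simulate_grw(n_days, pollsters, start=None, innovation_sd=0.25,
                 house_sd=1.0, poll_sd=1.5, seed=0):
    """Forward-simulate a hypothetical Gaussian random walk poll model.

    All parameter values are illustrative assumptions, not fitted estimates.
    If `start` is given, day 0 is fixed to it (the GRWLA-style left anchor).
    """
    rng = random.Random(seed)
    # Day 0: anchored to a supplied value, or drawn from a vague prior.
    level = start if start is not None else rng.gauss(50.0, 5.0)
    intention = [level]
    for _ in range(n_days - 1):
        # (1) Voting intention changes only a little from one day to the next.
        level += rng.gauss(0.0, innovation_sd)
        intention.append(level)
    # (3) Each pollster has a house effect; centre them so they sum to zero.
    raw = {p: rng.gauss(0.0, house_sd) for p in pollsters}
    mean_raw = sum(raw.values()) / len(raw)
    house = {p: e - mean_raw for p, e in raw.items()}
    # (2) A poll = hidden intention + house effect + sampling noise (weekly polls).
    polls = []
    for day in range(0, n_days, 7):
        for p in pollsters:
            polls.append((day, p, intention[day] + house[p] + rng.gauss(0.0, poll_sd)))
    return intention, house, polls


# Illustrative use: three hypothetical pollsters, anchored at a placeholder 50.0.
intention, house, polls = simulate_grw(100, ["A", "B", "C"], start=50.0)
```

Fitting reverses this process: given only the polls, the model infers the hidden daily intention and the house effects, with the sum-to-zero constraint stopping the house effects from absorbing the overall level.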