Active Learning Basics with Gaussian Process Regression

This narrative visualization demonstrates how active learning can be performed using Gaussian process (GP) regression models. The "true" data in this demo has been generated by sampling from a 1-D sine function with added Gaussian noise. A Jupyter notebook of the data generator (in Python) can be found here.
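The notebook itself is not reproduced here. As a rough sketch, a generator of this kind might look like the following. The function name `generate_sine_data`, the evenly spaced grid on [0, 2π], the sample count, and the noise level are all assumptions for illustration, not values taken from the original notebook.

```python
import math
import random


def generate_sine_data(n_points=50, x_min=0.0, x_max=2 * math.pi,
                       noise_std=0.1, seed=0):
    """Return noisy observations y = sin(x) + eps, with eps ~ N(0, noise_std^2).

    Hypothetical helper: every default here is an illustrative assumption,
    not a value from the original notebook.
    """
    rng = random.Random(seed)  # seeded so the "true" data is reproducible
    # Evenly spaced inputs (assumption; the notebook may sample x differently).
    xs = [x_min + (x_max - x_min) * i / (n_points - 1) for i in range(n_points)]
    # Underlying sine signal plus independent Gaussian noise at each input.
    ys = [math.sin(x) + rng.gauss(0.0, noise_std) for x in xs]
    return xs, ys


xs, ys = generate_sine_data()
```

Fixing the seed keeps the noisy "ground truth" identical across runs, which matters when the same data set is replayed step by step in a visualization.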

Active learning is a popular method for choosing new training data points when the number of observations that can be collected is limited. After an initial GP model is trained, the model is retrained sequentially: at each step, the point in the search space expected to most reduce the model's overall uncertainty (i.e., maximize information gain) is identified and added to the training set. Note that the GP model hyperparameters are held fixed across (re)training for the purposes of this demo. Also note that we are not attempting to optimize the response function (via an acquisition function), as one would in Bayesian optimization.
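A minimal pure-Python sketch of such a loop follows. It assumes a zero-mean GP prior with an RBF kernel whose hyperparameters (`length_scale`, `signal_var`, `noise_var`, hypothetical names with illustrative defaults) are fixed and never re-fit. It uses uncertainty sampling: each step labels the pool point with the largest posterior variance σ²(x). For a GP, the information gained by observing a noisy y at x is ½·log(1 + σ²(x)/σ_n²), where σ_n² is the noise variance; this increases with σ²(x), so the greedy rule maximizes single-step information gain. This is one common selection rule, not necessarily the one used in the visualization, and all helper functions are hypothetical.

```python
import math


def rbf(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential (RBF) kernel between two scalar inputs."""
    return signal_var * math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(A):
    """Lower-triangular L with A = L L^T (A must be symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_lower(L, b):
    """Solve L z = b by forward substitution."""
    z = []
    for i in range(len(L)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def solve_upper_t(L, z):
    """Solve L^T x = z by back substitution (L is lower-triangular)."""
    n = len(L)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_posterior(x_train, y_train, x_query, length_scale, signal_var, noise_var):
    """Posterior mean and latent-function variance of a zero-mean GP."""
    n = len(x_train)
    K = [[rbf(x_train[i], x_train[j], length_scale, signal_var)
          + (noise_var if i == j else 0.0) for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, y_train))  # K^{-1} y
    means, variances = [], []
    for xq in x_query:
        k_star = [rbf(xq, xt, length_scale, signal_var) for xt in x_train]
        v = solve_lower(L, k_star)  # L^{-1} k_*
        means.append(sum(k * a for k, a in zip(k_star, alpha)))
        # k(x, x) - k_*^T K^{-1} k_*, clamped at 0 against round-off
        variances.append(max(signal_var - sum(vi * vi for vi in v), 0.0))
    return means, variances


def active_learning(observe, pool, initial_idx, n_iter,
                    length_scale=1.0, signal_var=1.0, noise_var=0.01):
    """Greedy uncertainty sampling with fixed (never re-fit) GP hyperparameters."""
    labeled = list(initial_idx)
    y_obs = [observe(pool[i]) for i in labeled]
    for _ in range(n_iter):
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break
        # "Retrain": condition the GP on every point labeled so far.
        _, var = gp_posterior([pool[i] for i in labeled], y_obs,
                              [pool[i] for i in unlabeled],
                              length_scale, signal_var, noise_var)
        # Query the unlabeled point with the largest posterior variance.
        best = unlabeled[max(range(len(unlabeled)), key=lambda j: var[j])]
        labeled.append(best)
        y_obs.append(observe(pool[best]))
    return labeled, y_obs


# Usage: 41 candidate inputs on [0, 2*pi], noiseless sin(x) as a stand-in observer.
pool = [2 * math.pi * i / 40 for i in range(41)]
labeled, y_obs = active_learning(math.sin, pool, initial_idx=[20], n_iter=5)
print([round(pool[i], 2) for i in labeled])
```

Because the GP posterior variance depends only on the input locations and the kernel, not on the observed y values, holding the hyperparameters fixed means the sequence of queried points is determined entirely by the candidate pool and the kernel. Re-fitting hyperparameters after each step would let the observed data influence which point is chosen next.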

Project for UIUC's CS 416 Data Visualization course during the Summer 2021 semester.

Author: Brian Yoo