Analytical example with greedy kernel points

Step-by-step usage of greedy kernel points on an analytical example.

See GreedyKernelPoints for a description of the method.

We start with the supervised dataset of three points in \(\mathbb{R}^2\)

\[\begin{split}X = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 2 & 1 \end{pmatrix}\end{split}\]

with supervision

\[\begin{split}y^{(1)} = \begin{pmatrix} 0 \\ 1 \\ 5 \end{pmatrix} .\end{split}\]

With the LinearKernel with output_scale \(= 1\) and constant \(= 0\) and using the GreedyKernelPoints algorithm with zero regularisation (\(\lambda = 0\)), we seek a coreset of size two without duplicates.

The feature Gramian is

\[\begin{split}K^{(11)} &= X^T X \\ &= \begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & 1 \\ 2 & 1 & 5 \end{pmatrix} .\end{split}\]

Following the notation in GreedyKernelPoints, let \(\mathcal{D}^{(2)} = \{(x_i, y_i)\}_{i=1}^m\) be the set of points already selected for the coreset. Then, \(X^{(2)} \in \mathbb{R}^{m \times 2}\) is the selected data, \(y^{(2)} \in \mathbb{R}^m\) is the vector of responses, \(K^{(12)}\) is the cross-matrix of kernel evaluations (a subset of columns of the feature Gramian), and \(K^{(22)}\) is the kernel matrix on \(\mathcal{D}^{(2)}\) (a subset of rows and columns of the feature Gramian). With zero regularisation, the predictions for a given coreset are given by \(z = K^{(12)} {K^{(22)}}^{-1} y^{(2)}\).

First iteration

At the first iteration, there are no data points already in the coreset, so we consider the three candidate coresets, each of size one.

Candidate (0)

\[\begin{split}X^{(2)} &= \begin{pmatrix} 1 & 0 \end{pmatrix} ; \\ y^{(2)} &= \begin{pmatrix} 0 \end{pmatrix} ; \\ K^{(12)} &= \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix} ; \\ K^{(22)} &= \begin{pmatrix} 1 \end{pmatrix} .\end{split}\]

Then, the inverse of the kernel matrix is

\[{K^{(22)}}^{-1} = \begin{pmatrix} 1 \end{pmatrix} ,\]

and the prediction is

\[\begin{split}z = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} ,\end{split}\]

so the loss is

\[\begin{split}L &= \left\| y^{(1)} - z \right\|^2 \\ &= 0^2 + 1^2 + 5^2 \\ &= 26 .\end{split}\]

Candidate (1)

\[\begin{split}X^{(2)} &= \begin{pmatrix} 0 & 1 \end{pmatrix} ; \\ y^{(2)} &= \begin{pmatrix} 0 \end{pmatrix} ; \\ K^{(12)} &= \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} ; \\ K^{(22)} &= \begin{pmatrix} 1 \end{pmatrix} .\end{split}\]

Then, the inverse of the kernel matrix is

\[{K^{(22)}}^{-1} = \begin{pmatrix} 1 \end{pmatrix} ,\]

and the prediction is

\[\begin{split}z = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} ,\end{split}\]

so the loss is

\[\begin{split}L &= 0^2 + 0^2 + 4^2 \\ &= 16 .\end{split}\]

Candidate (2)

\[\begin{split}X^{(2)} &= \begin{pmatrix} 2 & 1 \end{pmatrix} ; \\ y^{(2)} &= \begin{pmatrix} 5 \end{pmatrix} ; \\ K^{(12)} &= \begin{pmatrix} 2 \\ 1 \\ 5 \end{pmatrix} ; \\ K^{(22)} &= \begin{pmatrix} 5 \end{pmatrix} .\end{split}\]

Then, the inverse of the kernel matrix is

\[{K^{(22)}}^{-1} = \begin{pmatrix} \frac{1}{5} \end{pmatrix} ,\]

and the prediction is

\[\begin{split}z = \begin{pmatrix} 2 \\ 1 \\ 5 \end{pmatrix} ,\end{split}\]

so the loss is

\[\begin{split}L &= 2^2 + 0^2 + 0^2 \\ &= 4 .\end{split}\]

Selection

Index 2 has the lowest loss, so joins the coreset.

Second iteration

We consider two candidate coresets of size two, each containing data corresponding to index 2 as the first element with another index in the second element.

Candidate (2 0)

\[\begin{split}X^{(2)} &= \begin{pmatrix} 2 & 1 \\ 1 & 0 \end{pmatrix} ; \\ y^{(2)} &= \begin{pmatrix} 5 \\ 0 \end{pmatrix} ; \\ K^{(12)} &= \begin{pmatrix} 2 & 1 \\ 1 & 0 \\ 5 & 2 \end{pmatrix} ; \\ K^{(22)} &= \begin{pmatrix} 5 & 2 \\ 2 & 1 \end{pmatrix} .\end{split}\]

Then, the inverse of the kernel matrix is

\[\begin{split}{K^{(22)}}^{-1} = \begin{pmatrix} 1 & -2 \\ -2 & 5 \end{pmatrix} ,\end{split}\]

and the prediction is

\[\begin{split}z &= K^{(12)} {K^{(22)}}^{-1} y^{(2)} \\ &= \begin{pmatrix} 2 & 1 \\ 1 & 0 \\ 5 & 2 \end{pmatrix} \begin{pmatrix} 1 & -2 \\ -2 & 5 \end{pmatrix} \begin{pmatrix} 5 \\ 0 \end{pmatrix} \\ &= \begin{pmatrix} 2 & 1 \\ 1 & 0 \\ 5 & 2 \end{pmatrix} \begin{pmatrix} 5 \\ -10 \end{pmatrix} \\ &= \begin{pmatrix} 0 \\ 5 \\ 5 \end{pmatrix} ,\end{split}\]

so the loss is

\[\begin{split}L &= 0^2 + 4^2 + 0^2 \\ &= 16 .\end{split}\]

Candidate (2 1)

\[\begin{split}X^{(2)} &= \begin{pmatrix} 2 & 1 \\ 0 & 1 \end{pmatrix} ; \\ y^{(2)} &= \begin{pmatrix} 5 \\ 1 \end{pmatrix} ; \\ K^{(12)} &= \begin{pmatrix} 2 & 0 \\ 1 & 1 \\ 5 & 1 \end{pmatrix} ; \\ K^{(22)} &= \begin{pmatrix} 5 & 1 \\ 1 & 1 \end{pmatrix} .\end{split}\]

Then, the inverse of the kernel matrix is

\[\begin{split}{K^{(22)}}^{-1} = \frac{1}{4} \begin{pmatrix} 1 & -1 \\ -1 & 5 \end{pmatrix} ,\end{split}\]

and the prediction is

\[\begin{split}z &= K^{(12)} {K^{(22)}}^{-1} y^{(2)} \\ &= \frac{1}{4} \begin{pmatrix} 2 & 0 \\ 1 & 1 \\ 5 & 1 \end{pmatrix} \begin{pmatrix} 1 & -1 \\ -1 & 5 \end{pmatrix} \begin{pmatrix} 5 \\ 1 \end{pmatrix} \\ &= \begin{pmatrix} 2 & 0 \\ 1 & 1 \\ 5 & 1 \end{pmatrix} \begin{pmatrix} 4 \\ 0 \end{pmatrix} \\ &= \begin{pmatrix} 2 \\ 1 \\ 5 \end{pmatrix} ,\end{split}\]

so the loss is

\[\begin{split}L &= 2^2 + 0^2 + 0^2 \\ &= 4 .\end{split}\]

Final selection

The second candidate has the lower loss, so the final coreset consists of indices \(\begin{pmatrix} 2 & 1 \end{pmatrix}\). In terms of original data, this can be expressed as

\[\begin{split}\hat{X} &= \begin{pmatrix} 2 & 1 \\ 0 & 1 \end{pmatrix} ; \\ \hat{y} &= \begin{pmatrix} 5 \\ 1 \end{pmatrix} .\end{split}\]
examples.greedy_kernel_points_analytic.main()[source]

Run the greedy kernel points analytical example.

Return type:

Coresubset[SupervisedData]

Returns:

Object containing coreset indices and materialised coreset.