Kernels#
Classes and associated functionality to use kernel functions.
A kernel is a non-negative, real-valued integrable function that can take two inputs,
x and y, and returns a value that decreases as x and y move further away
in space from each other. Note that further here may account for cyclic behaviour in
the data, for example.
In this library, we often use kernels as a smoothing tool: given a dataset of distinct points, we can reconstruct the underlying data generating distribution through smoothing of the data with kernels.
Some kernels are parameterizable and may represent other well known kernels when given
appropriate parameter values. For example, the SquaredExponentialKernel,
which if parameterized with an output_scale of
\(\frac{1}{\sqrt{2\pi} \,*\, \text{length_scale}}\), yields the well known
Gaussian kernel.
A Kernel must implement a
compute_elementwise() method, which returns the floating
point value after evaluating the kernel on two floats, x and y. Additional
methods, such as Kernel.grad_x_elementwise(), can optionally be overridden to
improve performance. The canonical example is when a suitable closed-form representation
of a higher-order gradient can be used to avoid the expense of performing automatic
differentiation.
Such an example can be seen in SteinKernel, where the analytic forms of
Kernel.divergence_x_grad_y() are significantly cheaper to compute that the
automatic differentiated default.
- coreax.kernel.median_heuristic(x)[source]#
Compute the median heuristic for setting kernel bandwidth.
Analysis of the performance of the median heuristic can be found in [garreau2018median].
- class coreax.kernel.Kernel[source]#
Abstract base class for kernels.
- compute(x, y)[source]#
Evaluate the kernel on input data
xandy.- The ‘data’ can be any of:
floating numbers (so a single data-point in 1-dimension)
zero-dimensional arrays (so a single data-point in 1-dimension)
a vector (a single-point in multiple dimensions)
array (multiple vectors).
Evaluation is always vectorised.
- Parameters:
- Return type:
- Returns:
Kernel evaluations between points in
xandy. Ifx=y, then this is the Gram matrix corresponding to the RKHS inner product.
- abstract compute_elementwise(x, y)[source]#
Evaluate the kernel on individual input vectors
xandy, not-vectorised.Vectorisation only becomes relevant in terms of computational speed when we have multiple
xory.
- grad_x(x, y)[source]#
Evaluate the gradient (Jacobian) of the kernel function w.r.t.
x.- The function is vectorised, so
xorycan be any of: floating numbers (so a single data-point in 1-dimension)
zero-dimensional arrays (so a single data-point in 1-dimension)
a vector (a single-point in multiple dimensions)
array (multiple vectors).
- The function is vectorised, so
- grad_y(x, y)[source]#
Evaluate the gradient (Jacobian) of the kernel function w.r.t.
y.- The function is vectorised, so
xorycan be any of: floating numbers (so a single data-point in 1-dimension)
zero-dimensional arrays (so a single data-point in 1-dimension)
a vector (a single-point in multiple dimensions)
array (multiple vectors).
- The function is vectorised, so
- grad_x_elementwise(x, y)[source]#
Evaluate the element-wise gradient of the kernel function w.r.t.
x.The gradient (Jacobian) of the kernel function w.r.t.
x.Only accepts single vectors
xandy, i.e. not arrays.coreax.kernel.Kernel.grad_x()provides a vectorised version of this method for arrays.
- grad_y_elementwise(x, y)[source]#
Evaluate the element-wise gradient of the kernel function w.r.t.
y.The gradient (Jacobian) of the kernel function w.r.t.
y.Only accepts single vectors
xandy, i.e. not arrays.coreax.kernel.Kernel.grad_y()provides a vectorised version of this method for arrays.
- divergence_x_grad_y(x, y)[source]#
Evaluate the divergence operator w.r.t.
xof Jacobian w.r.t.y.\(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\). This function is vectorised, so it accepts vectors or arrays.
This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).
- divergence_x_grad_y_elementwise(x, y)[source]#
Evaluate the element-wise divergence w.r.t.
xof Jacobian w.r.t.y.\(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\). Only accepts vectors
xandy. A vectorised version for arrays is computed indivergence_x_grad_y().This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).
- gramian_row_mean(x, *, block_size=None, unroll=1)[source]#
Compute the (blocked) row-mean of the kernel’s Gramian matrix.
A convenience method for calling
compute_mean(). Equivalent to the callcompute_mean(x, x, axis=0, block_size=block_size, unroll=unroll).- Parameters:
x (
Union[Array,ndarray,bool_,number,bool,int,float,complex,Data]) – Data matrix, \(n \times d\)block_size (
Union[int,None,tuple[Optional[int],Optional[int]]]) – Block size parameter passed tocompute_mean()unroll (
Union[int,bool,tuple[Union[int,bool],Union[int,bool]]]) – Unroll parameter passed tocompute_mean()
- Return type:
- Returns:
Gramian ‘row/column-mean’, \(\frac{1}{n}\sum_{i=1}^{n} G_{ij}\).
- compute_mean(x, y, axis=None, *, block_size=None, unroll=1)[source]#
Compute the (blocked) mean of the matrix \(K_{ij} = k(x_i, y_j)\).
The \(n \times m\) kernel matrix \(K_{ij} = k(x_i, y_j)\), where
xandyare respectively \(n \times d\) and \(m \times d\) (weighted) data matrices, has the following (weighted) means:mean (
axis=None) \(\frac{1}{n m}\sum_{i,j=1}^{n, m} K_{ij}\)row-mean (
axis=0) \(\frac{1}{n}\sum_{i=1}^{n} K_{ij}\)column-mean (
axis=1) \(\frac{1}{m}\sum_{j=1}^{m} K_{ij}\)
If
xandyare of typeData, their weights are used to compute the weighted mean as defined injax.numpy.average().Note
The conventional ‘mean’ is a scalar, the ‘row-mean’ is an \(m\)-vector, while the ‘column-mean’ is an \(n\)-vector.
To avoid materializing the entire matrix (memory cost \(\mathcal{O}(n m)\)), we accumulate the mean over blocks (memory cost \(\mathcal{O}(B_x B_y)\), where
B_xandB_yare user-specified block-sizes for blocking thexandyparameters respectively.Note
The data
xand/oryare padded with zero-valued and zero-weighted data points, whenB_xand/orB_yare non-integer divisors ofnand/orm. Padding does not alter the result, but does provide the block shape stability required byjax.lax.scan()(used for block iteration).- Parameters:
x (
Union[Array,ndarray,bool_,number,bool,int,float,complex,Data]) – Data matrix, \(n \times d\)y (
Union[Array,ndarray,bool_,number,bool,int,float,complex,Data]) – Data matrix, \(m \times d\)axis (
Optional[int]) – Which axis of the kernel matrix to compute the mean over; a value of None computes the mean over both axesblock_size (
Union[int,None,tuple[Optional[int],Optional[int]]]) – Size of matrix blocks to process; a value ofNonesets \(B_x = n\) and \(B_y = m\), effectively disabling the block accumulation; an integer valueBsets \(B_y = B_x = B\); a tuple allows different sizes to be specified forB_xandB_y; to reduce overheads, it is often sensible to select the largest block size that does not exhaust the available memory resourcesunroll (
Union[int,bool,tuple[Union[int,bool],Union[int,bool]]]) – Unrolling parameter for the outer and innerjax.lax.scan()calls, allows for trade-offs between compilation and runtime cost; consult the JAX docs for further information
- Return type:
- Returns:
The (weighted) mean of the kernel matrix \(K_{ij}\)
- class coreax.kernel.PairedKernel(first_kernel, second_kernel)[source]#
Abstract base class for kernels that compose/wrap two kernels.
- class coreax.kernel.AdditiveKernel(first_kernel, second_kernel)[source]#
Define a kernel which is a summation of two kernels.
Given kernel functions \(k:\mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\) and \(l:\mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\), define the additive kernel \(p:\mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\) where \(p(x,y) := k(x,y) + l(x,y)\)
- compute_elementwise(x, y)[source]#
Evaluate the kernel on individual input vectors
xandy, not-vectorised.Vectorisation only becomes relevant in terms of computational speed when we have multiple
xory.
- grad_x_elementwise(x, y)[source]#
Evaluate the element-wise gradient of the kernel function w.r.t.
x.The gradient (Jacobian) of the kernel function w.r.t.
x.Only accepts single vectors
xandy, i.e. not arrays.coreax.kernel.Kernel.grad_x()provides a vectorised version of this method for arrays.
- grad_y_elementwise(x, y)[source]#
Evaluate the element-wise gradient of the kernel function w.r.t.
y.The gradient (Jacobian) of the kernel function w.r.t.
y.Only accepts single vectors
xandy, i.e. not arrays.coreax.kernel.Kernel.grad_y()provides a vectorised version of this method for arrays.
- divergence_x_grad_y_elementwise(x, y)[source]#
Evaluate the element-wise divergence w.r.t.
xof Jacobian w.r.t.y.\(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\). Only accepts vectors
xandy. A vectorised version for arrays is computed indivergence_x_grad_y().This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).
- class coreax.kernel.ProductKernel(first_kernel, second_kernel)[source]#
Define a kernel which is a product of two kernels.
Given kernel functions \(k:\mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\) and \(l:\mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\), define the product kernel \(p:\mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\) where \(p(x,y) = k(x,y)l(x,y)\)
- compute_elementwise(x, y)[source]#
Evaluate the kernel on individual input vectors
xandy, not-vectorised.Vectorisation only becomes relevant in terms of computational speed when we have multiple
xory.
- grad_x_elementwise(x, y)[source]#
Evaluate the element-wise gradient of the kernel function w.r.t.
x.The gradient (Jacobian) of the kernel function w.r.t.
x.Only accepts single vectors
xandy, i.e. not arrays.coreax.kernel.Kernel.grad_x()provides a vectorised version of this method for arrays.
- grad_y_elementwise(x, y)[source]#
Evaluate the element-wise gradient of the kernel function w.r.t.
y.The gradient (Jacobian) of the kernel function w.r.t.
y.Only accepts single vectors
xandy, i.e. not arrays.coreax.kernel.Kernel.grad_y()provides a vectorised version of this method for arrays.
- divergence_x_grad_y_elementwise(x, y)[source]#
Evaluate the element-wise divergence w.r.t.
xof Jacobian w.r.t.y.\(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\). Only accepts vectors
xandy. A vectorised version for arrays is computed indivergence_x_grad_y().This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).
- class coreax.kernel.LinearKernel(output_scale=1.0, constant=0.0)[source]#
Define a linear kernel.
Given \(\rho`=\)’output_scale` and \(c =\)’constant’, the linear kernel is defined as \(k: \mathbb{R}^d\times \mathbb{R}^d \to \mathbb{R}\), \(k(x, y) = \rho x^Ty + c\).
- Parameters:
- compute_elementwise(x, y)[source]#
Evaluate the kernel on individual input vectors
xandy, not-vectorised.Vectorisation only becomes relevant in terms of computational speed when we have multiple
xory.
- grad_x_elementwise(x, y)[source]#
Evaluate the element-wise gradient of the kernel function w.r.t.
x.The gradient (Jacobian) of the kernel function w.r.t.
x.Only accepts single vectors
xandy, i.e. not arrays.coreax.kernel.Kernel.grad_x()provides a vectorised version of this method for arrays.
- grad_y_elementwise(x, y)[source]#
Evaluate the element-wise gradient of the kernel function w.r.t.
y.The gradient (Jacobian) of the kernel function w.r.t.
y.Only accepts single vectors
xandy, i.e. not arrays.coreax.kernel.Kernel.grad_y()provides a vectorised version of this method for arrays.
- divergence_x_grad_y_elementwise(x, y)[source]#
Evaluate the element-wise divergence w.r.t.
xof Jacobian w.r.t.y.\(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\). Only accepts vectors
xandy. A vectorised version for arrays is computed indivergence_x_grad_y().This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).
- class coreax.kernel.PolynomialKernel(output_scale=1.0, constant=0.0, degree=2)[source]#
Define a polynomial kernel.
Given \(\rho =``output_scale\), \(c =\)’constant’, and \(d=\)’degree’, the polynomial kernel is defined as \(k: \mathbb{R}^d\times \mathbb{R}^d \to \mathbb{R}\), \(k(x, y) = \rho (x^Ty + c)^d\).
- Parameters:
- compute_elementwise(x, y)[source]#
Evaluate the kernel on individual input vectors
xandy, not-vectorised.Vectorisation only becomes relevant in terms of computational speed when we have multiple
xory.
- grad_x_elementwise(x, y)[source]#
Evaluate the element-wise gradient of the kernel function w.r.t.
x.The gradient (Jacobian) of the kernel function w.r.t.
x.Only accepts single vectors
xandy, i.e. not arrays.coreax.kernel.Kernel.grad_x()provides a vectorised version of this method for arrays.
- grad_y_elementwise(x, y)[source]#
Evaluate the element-wise gradient of the kernel function w.r.t.
y.The gradient (Jacobian) of the kernel function w.r.t.
y.Only accepts single vectors
xandy, i.e. not arrays.coreax.kernel.Kernel.grad_y()provides a vectorised version of this method for arrays.
- divergence_x_grad_y_elementwise(x, y)[source]#
Evaluate the element-wise divergence w.r.t.
xof Jacobian w.r.t.y.\(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\). Only accepts vectors
xandy. A vectorised version for arrays is computed indivergence_x_grad_y().This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).
- class coreax.kernel.SquaredExponentialKernel(length_scale=1.0, output_scale=1.0)[source]#
Define a squared exponential kernel.
Given \(\lambda =``length_scale\) and \(\rho =``output_scale\), the squared exponential kernel is defined as \(k: \mathbb{R}^d\times \mathbb{R}^d \to \mathbb{R}\), \(k(x, y) = \rho * \exp(-\frac{||x-y||^2}{2 \lambda^2})\) where \(||\cdot||\) is the usual \(L_2\)-norm.
- Parameters:
- compute_elementwise(x, y)[source]#
Evaluate the kernel on individual input vectors
xandy, not-vectorised.Vectorisation only becomes relevant in terms of computational speed when we have multiple
xory.
- grad_x_elementwise(x, y)[source]#
Evaluate the element-wise gradient of the kernel function w.r.t.
x.The gradient (Jacobian) of the kernel function w.r.t.
x.Only accepts single vectors
xandy, i.e. not arrays.coreax.kernel.Kernel.grad_x()provides a vectorised version of this method for arrays.
- grad_y_elementwise(x, y)[source]#
Evaluate the element-wise gradient of the kernel function w.r.t.
y.The gradient (Jacobian) of the kernel function w.r.t.
y.Only accepts single vectors
xandy, i.e. not arrays.coreax.kernel.Kernel.grad_y()provides a vectorised version of this method for arrays.
- divergence_x_grad_y_elementwise(x, y)[source]#
Evaluate the element-wise divergence w.r.t.
xof Jacobian w.r.t.y.\(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\). Only accepts vectors
xandy. A vectorised version for arrays is computed indivergence_x_grad_y().This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).
- class coreax.kernel.ExponentialKernel(length_scale=1.0, output_scale=1.0)[source]#
Define an exponential kernel.
Given \(\lambda =``length_scale\) and \(\rho =``output_scale\), the exponential kernel is defined as \(k: \mathbb{R}^d\times \mathbb{R}^d \to \mathbb{R}\), \(k(x, y) = \rho * \exp(-\frac{||x-y||}{2 \lambda^2})\) where \(||\cdot||\) is the usual \(L_2\)-norm.
Note
Note that the Exponential kernel is not differentiable when \(x=y\).
- Parameters:
- compute_elementwise(x, y)[source]#
Evaluate the kernel on individual input vectors
xandy, not-vectorised.Vectorisation only becomes relevant in terms of computational speed when we have multiple
xory.
- grad_x_elementwise(x, y)[source]#
Evaluate the element-wise gradient of the kernel function w.r.t.
x.The gradient (Jacobian) of the kernel function w.r.t.
x.Only accepts single vectors
xandy, i.e. not arrays.coreax.kernel.Kernel.grad_x()provides a vectorised version of this method for arrays.
- grad_y_elementwise(x, y)[source]#
Evaluate the element-wise gradient of the kernel function w.r.t.
y.The gradient (Jacobian) of the kernel function w.r.t.
y.Only accepts single vectors
xandy, i.e. not arrays.coreax.kernel.Kernel.grad_y()provides a vectorised version of this method for arrays.
- divergence_x_grad_y_elementwise(x, y)[source]#
Evaluate the element-wise divergence w.r.t.
xof Jacobian w.r.t.y.\(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\). Only accepts vectors
xandy. A vectorised version for arrays is computed indivergence_x_grad_y().This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).
- class coreax.kernel.RationalQuadraticKernel(length_scale=1.0, output_scale=1.0, relative_weighting=1.0)[source]#
Define a rational quadratic kernel.
Given \(\lambda =``length_scale\), \(\rho =``output_scale\), and \(\alpha =``relative_weighting\), the rational quadratic kernel is defined as \(k: \mathbb{R}^d\times \mathbb{R}^d \to \mathbb{R}\), \(k(x, y) = \rho * (1 + \frac{||x-y||^2}{2 \alpha \lambda^2})^{-\alpha}\) where \(||\cdot||\) is the usual \(L_2\)-norm.
- Parameters:
length_scale (
float) – Kernel smoothing/bandwidth parameter, \(\lambda\)output_scale (
float) – Kernel normalisation constant, \(\rho\)relative_weighting (
float) – Parameter controlling the relative weighting of large-scale and small-scale variations, \(\alpha\). As \(alpha \to \infty\) the rational quadratic kernel is identical to the squared exponential kernel.
- compute_elementwise(x, y)[source]#
Evaluate the kernel on individual input vectors
xandy, not-vectorised.Vectorisation only becomes relevant in terms of computational speed when we have multiple
xory.
- grad_x_elementwise(x, y)[source]#
Evaluate the element-wise gradient of the kernel function w.r.t.
x.The gradient (Jacobian) of the kernel function w.r.t.
x.Only accepts single vectors
xandy, i.e. not arrays.coreax.kernel.Kernel.grad_x()provides a vectorised version of this method for arrays.
- grad_y_elementwise(x, y)[source]#
Evaluate the element-wise gradient of the kernel function w.r.t.
y.The gradient (Jacobian) of the kernel function w.r.t.
y.Only accepts single vectors
xandy, i.e. not arrays.coreax.kernel.Kernel.grad_y()provides a vectorised version of this method for arrays.
- divergence_x_grad_y_elementwise(x, y)[source]#
Evaluate the element-wise divergence w.r.t.
xof Jacobian w.r.t.y.\(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\). Only accepts vectors
xandy. A vectorised version for arrays is computed indivergence_x_grad_y().This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).
- class coreax.kernel.PeriodicKernel(length_scale=1.0, output_scale=1.0, periodicity=1.0)[source]#
Define a periodic kernel.
Given \(\lambda =``length_scale\), \(\rho =``output_scale\), and \(\p =``periodicity\), the periodic kernel is defined as \(k: \mathbb{R}^d\times \mathbb{R}^d \to \mathbb{R}\), \(k(x, y) = \rho * \exp(\frac{-2 \sin^2(\pi ||x-y||/p)}{\lambda^2})\) where \(||\cdot||\) is the usual \(L_2\)-norm.
Note
Note that the Periodic kernel is not differentiable when \(x=y\).
- Parameters:
- compute_elementwise(x, y)[source]#
Evaluate the kernel on individual input vectors
xandy, not-vectorised.Vectorisation only becomes relevant in terms of computational speed when we have multiple
xory.
- grad_x_elementwise(x, y)[source]#
Evaluate the element-wise gradient of the kernel function w.r.t.
x.The gradient (Jacobian) of the kernel function w.r.t.
x.Only accepts single vectors
xandy, i.e. not arrays.coreax.kernel.Kernel.grad_x()provides a vectorised version of this method for arrays.
- grad_y_elementwise(x, y)[source]#
Evaluate the element-wise gradient of the kernel function w.r.t.
y.The gradient (Jacobian) of the kernel function w.r.t.
y.Only accepts single vectors
xandy, i.e. not arrays.coreax.kernel.Kernel.grad_y()provides a vectorised version of this method for arrays.
- divergence_x_grad_y_elementwise(x, y)[source]#
Evaluate the element-wise divergence w.r.t.
xof Jacobian w.r.t.y.\(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\). Only accepts vectors
xandy. A vectorised version for arrays is computed indivergence_x_grad_y().This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).
- class coreax.kernel.LocallyPeriodicKernel(periodic_length_scale=1.0, periodic_output_scale=1.0, periodicity=1.0, squared_exponential_length_scale=1.0, squared_exponential_output_scale=1.0)[source]#
Define a locally periodic kernel.
The periodic kernel is defined as \(k: \mathbb{R}^d\times \mathbb{R}^d \to \mathbb{R}\), \(k(x, y) = r(x,y)l(x,y)\) where \(r\) is the periodic kernel and \(l\) is the squared exponential kernel.
Note
Note that the Periodic kernel is not differentiable when \(x=y\).
- Parameters:
periodic_length_scale (
float) – Periodic kernel smoothing/bandwidth parameterperiodic_output_scale (
float) – Periodic kernel normalisation constantperiodicity (
float) – Parameter controlling the periodicity of the Periodic kernelsquared_exponential_length_scale (
float) – SquaredExponential kernel smoothing/bandwidth parameter]squared_exponential_output_scale (
float) – SquaredExponential Kernel normalisation constant
- class coreax.kernel.LaplacianKernel(length_scale=1.0, output_scale=1.0)[source]#
Define a Laplacian kernel.
Given \(\lambda =``length_scale\) and \(\rho =``output_scale\), the Laplacian kernel is defined as \(k: \mathbb{R}^d\times \mathbb{R}^d \to \mathbb{R}\), \(k(x, y) = \rho * \exp(-\frac{||x-y||_1}{2 \lambda^2})\) where \(||\cdot||_1\) is the \(L_1\)-norm.
- Parameters:
- compute_elementwise(x, y)[source]#
Evaluate the kernel on individual input vectors
xandy, not-vectorised.Vectorisation only becomes relevant in terms of computational speed when we have multiple
xory.
- grad_x_elementwise(x, y)[source]#
Evaluate the element-wise gradient of the kernel function w.r.t.
x.The gradient (Jacobian) of the kernel function w.r.t.
x.Only accepts single vectors
xandy, i.e. not arrays.coreax.kernel.Kernel.grad_x()provides a vectorised version of this method for arrays.
- grad_y_elementwise(x, y)[source]#
Evaluate the element-wise gradient of the kernel function w.r.t.
y.The gradient (Jacobian) of the kernel function w.r.t.
y.Only accepts single vectors
xandy, i.e. not arrays.coreax.kernel.Kernel.grad_y()provides a vectorised version of this method for arrays.
- divergence_x_grad_y_elementwise(x, y)[source]#
Evaluate the element-wise divergence w.r.t.
xof Jacobian w.r.t.y.\(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\). Only accepts vectors
xandy. A vectorised version for arrays is computed indivergence_x_grad_y().This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).
- class coreax.kernel.PCIMQKernel(length_scale=1.0, output_scale=1.0)[source]#
Define a pre-conditioned inverse multi-quadric (PCIMQ) kernel.
Given \(\lambda =``length_scale\) and \(\rho =``output_scale\), the PCIMQ kernel is defined as \(k: \mathbb{R}^d\times \mathbb{R}^d \to \mathbb{R}\), \(k(x, y) = \frac{\rho}{\sqrt{1 + \frac{||x-y||^2}{2 \lambda^2}}} where :math:`||\cdot||\) is the usual \(L_2\)-norm.
- Parameters:
- compute_elementwise(x, y)[source]#
Evaluate the kernel on individual input vectors
xandy, not-vectorised.Vectorisation only becomes relevant in terms of computational speed when we have multiple
xory.
- grad_x_elementwise(x, y)[source]#
Evaluate the element-wise gradient of the kernel function w.r.t.
x.The gradient (Jacobian) of the kernel function w.r.t.
x.Only accepts single vectors
xandy, i.e. not arrays.coreax.kernel.Kernel.grad_x()provides a vectorised version of this method for arrays.
- grad_y_elementwise(x, y)[source]#
Evaluate the element-wise gradient of the kernel function w.r.t.
y.The gradient (Jacobian) of the kernel function w.r.t.
y.Only accepts single vectors
xandy, i.e. not arrays.coreax.kernel.Kernel.grad_y()provides a vectorised version of this method for arrays.
- divergence_x_grad_y_elementwise(x, y)[source]#
Evaluate the element-wise divergence w.r.t.
xof Jacobian w.r.t.y.\(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\). Only accepts vectors
xandy. A vectorised version for arrays is computed indivergence_x_grad_y().This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).
- class coreax.kernel.CompositeKernel(base_kernel)[source]#
Abstract base class for kernels that compose/wrap one kernel.
- Parameters:
base_kernel (
Kernel) – kernel to be wrapped/used in composition
- class coreax.kernel.SteinKernel(base_kernel, score_function)[source]#
Define the Stein kernel, i.e. the application of the Stein operator.
\[\mathcal{A}_\mathbb{P}(g(\mathbf{x})) := \nabla_\mathbf{x} g(\mathbf{x}) + g(\mathbf{x}) \nabla_\mathbf{x} \log f_X(\mathbf{x})^\intercal\]w.r.t. probability measure \(\mathbb{P}\) to the base kernel \(k(\mathbf{x}, \mathbf{y})\). Here, differentiable vector-valued \(g: \mathbb{R}^d \to \mathbb{R}^d\), and \(\nabla_\mathbf{x} \log f_X(\mathbf{x})\) is the score function of measure \(\mathbb{P}\).
\(\mathbb{P}\) is assumed to admit a density function \(f_X\) w.r.t. d-dimensional Lebesgue measure. The score function is assumed to be Lipschitz.
The key property of a Stein operator is zero expectation under \(\mathbb{P}\), i.e. \(\mathbb{E}_\mathbb{P}[\mathcal{A}_\mathbb{P} f(\mathbf{x})]\), for positive differentiable \(f_X\).
The Stein kernel for base kernel \(k(\mathbf{x}, \mathbf{y})\) is defined as
\[k_\mathbb{P}(\mathbf{x}, \mathbf{y}) = \nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y}) + \nabla_\mathbf{x} \log f_X(\mathbf{x}) \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y}) + \nabla_\mathbf{y} \log f_X(\mathbf{y}) \cdot \nabla_\mathbf{x} k(\mathbf{x}, \mathbf{y}) + (\nabla_\mathbf{x} \log f_X(\mathbf{x}) \cdot \nabla_\mathbf{y} \log f_X(\mathbf{y})) k(\mathbf{x}, \mathbf{y}).\]This kernel requires a ‘base’ kernel to evaluate. The base kernel can be any other implemented subclass of the Kernel abstract base class; even another Stein kernel.
The score function \(\nabla_\mathbf{x} \log f_X: \mathbb{R}^d \to \mathbb{R}^d\) can be any suitable Lipschitz score function, e.g. one that is learned from score matching (
ScoreMatching), computed explicitly from a density function, or known analytically.- Parameters: