Coresets
Module for defining coreset data structures.
- class coreax.coreset.Coreset(nodes, pre_coreset_data)
Bases:
Module, Generic[_Data]
Data structure for representing a coreset.
A coreset is a reduced set of \(\hat{n}\) (potentially weighted) data points, \(\hat{X} := \{(\hat{x}_i, \hat{w}_i)\}_{i=1}^{\hat{n}}\), that, in some sense, best represent the “important” properties of a larger set of \(n > \hat{n}\) (potentially weighted) data points \(X := \{(x_i, w_i)\}_{i=1}^n\).
\(\hat{x}_i, x_i \in \Omega\) represent the data points/nodes and \(\hat{w}_i, w_i \in \mathbb{R}\) represent the associated weights.
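As a rough illustration of the definition above, the sketch below builds a small weighted dataset and a hand-picked coreset whose weighted mean matches that of the full data. It uses plain NumPy arrays rather than the coreax API; the variable names and the helper `weighted_mean` are assumptions made for this example only.

```python
import numpy as np

# Original dataset X: n = 6 one-dimensional points with uniform weights.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
w = np.full(x.shape, 1.0 / x.size)

# Hand-picked coreset of size n_hat = 2. For a general coreset the nodes
# need not be members of X, as is the case here.
x_hat = np.array([0.5, 4.5])
w_hat = np.array([0.5, 0.5])


def weighted_mean(points, weights):
    """Return the weighted mean of 1-D points, normalising the weights."""
    return float(np.sum(weights * points) / np.sum(weights))


# The "important" property preserved in this toy case is the mean.
full_mean = weighted_mean(x, w)
coreset_mean = weighted_mean(x_hat, w_hat)
```

Which properties count as "important" depends on the application; matching the mean is only one simple choice.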
- Parameters:
nodes (Data) – The (weighted) coreset nodes, \(\hat{x}_i\); once instantiated, the nodes should only be accessed via Coresubset.coreset().
pre_coreset_data (Data) – The dataset \(X\) used to construct the coreset.
- property coreset: _Data
Materialised coreset.
- solve_weights(solver, **solver_kwargs)
Return a copy of ‘self’ with weights solved by ‘solver’.
- Return type:
Self
- Parameters:
solver (WeightsOptimiser[_Data])
- class coreax.coreset.Coresubset(nodes, pre_coreset_data)
Bases:
Coreset[Data], Generic[_Data]
Data structure for representing a coresubset.
A coresubset is a Coreset, with the additional condition that the coreset data points/nodes must be a subset of the original data points/nodes, such that
\[\hat{x}_i = x_i, \forall i \in I, I \subset \{1, \dots, n\}, \text{card}(I) = \hat{n}.\]
Thus, a coresubset, unlike a coreset, ensures that feasibility constraints on the support of the measure are maintained [litterer2012recombination].
In coresubsets, the dataset reduction can be implicit (setting weights/nodes to zero for all \(i \notin I\)) or explicit (removing entries from the weight/node arrays). The implicit approach is useful when input/output array shape stability is required (e.g. for some JAX transformations); the explicit approach is more similar to a standard coreset.
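The difference between the two reductions can be sketched with plain NumPy arrays. This is not the coreax implementation: the array names and the boolean `mask` are assumptions for illustration, and only the weights (not the nodes) are zeroed in the implicit form.

```python
import numpy as np

# Original data: n = 5 points in 2-D with uniform weights.
x = np.arange(10.0).reshape(5, 2)
w = np.full(5, 0.2)

# Index set I with card(I) = 3.
indices = np.array([0, 2, 4])

# Explicit reduction: drop every entry outside I, so the arrays shrink.
x_explicit = x[indices]
w_explicit = w[indices]

# Implicit reduction: keep the original shapes and zero the weights
# for all i not in I.
mask = np.zeros(5, dtype=bool)
mask[indices] = True
w_implicit = np.where(mask, w, 0.0)

# Both forms give the same weighted sum of nodes.
sum_explicit = w_explicit @ x_explicit
sum_implicit = w_implicit @ x
```

The implicit form keeps `w_implicit` the same shape as `w` whatever the size of the index set, which is the shape stability the paragraph above refers to.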
- Parameters:
nodes (Data) – The (weighted) coresubset node indices, \(I\); the materialised coresubset nodes should only be accessed via Coresubset.coreset().
pre_coreset_data (Data) – The dataset \(X\) used to construct the coreset.
- property coreset: _Data
Materialise the coresubset from the indices and original data.
- property unweighted_indices: Shaped[Array, 'n']
Unweighted Coresubset indices - attribute access helper.
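The idea of storing only indices and materialising the nodes on access can be sketched with a small stand-in class. `ToyCoresubset` is hypothetical and written for this example; it is not the coreax class, it ignores weights, and it uses NumPy arrays in place of the library's Data type.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ToyCoresubset:
    """Hypothetical stand-in that stores indices I and the original data X."""

    indices: np.ndarray           # shape (n_hat,), integer indices into the data
    pre_coreset_data: np.ndarray  # shape (n, d), the original dataset

    @property
    def coreset(self) -> np.ndarray:
        """Materialise the selected rows from the indices on access."""
        return self.pre_coreset_data[self.indices]


data = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
toy = ToyCoresubset(indices=np.array([1, 3]), pre_coreset_data=data)

# Rows 1 and 3 of the original data form the materialised coresubset.
materialised = toy.coreset
```

Keeping the indices rather than copies of the rows means the original dataset remains the single source of the node values.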