Coresets
Module for defining coreset data structures.
- class coreax.coreset.AbstractCoreset
Bases:
Module, Generic[_TPointsData_co, _TOriginalData_co]
Abstract base class for coresets.
A coreset is a reduced set of \(\hat{n}\) (potentially weighted) data points, \(\hat{X} := \{(\hat{x}_i, \hat{w}_i)\}_{i=1}^{\hat{n}}\), that, in some sense, best represent the “important” properties of a larger set of \(n > \hat{n}\) (potentially weighted) data points \(X := \{(x_i, w_i)\}_{i=1}^n\).
\(\hat{x}_i, x_i \in \Omega\) represent the data points/nodes and \(\hat{w}_i, w_i \in \mathbb{R}\) represent the associated weights.
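As a minimal sketch of this definition, the following plain-Python example builds a small weighted point set and a hand-picked weighted coreset, then compares one "important" property, the weighted mean. The helper `weighted_mean` and the variable names are hypothetical and chosen for illustration only; Python lists stand in for arrays, and nothing here is part of the coreax API.

```python
# Illustrative only: plain Python lists stand in for arrays; not the coreax API.


def weighted_mean(points, weights):
    """Return sum(w_i * x_i) / sum(w_i) for 1-D points."""
    total_weight = sum(weights)
    return sum(w * x for x, w in zip(points, weights)) / total_weight


# Original data X = {(x_i, w_i)} with n = 6 and uniform weights.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
w = [1.0] * len(x)

# A hand-picked weighted coreset with n_hat = 2 points (an assumption made
# for this example, not the output of any coreax solver).
x_hat = [1.0, 4.0]
w_hat = [3.0, 3.0]

full_mean = weighted_mean(x, w)
coreset_mean = weighted_mean(x_hat, w_hat)
print(full_mean, coreset_mean)
```

Here the two-point coreset reproduces the weighted mean of the six original points; which properties count as "important" depends on the application.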
- abstract property points: _TPointsData_co
The coreset points.
- abstract property pre_coreset_data: _TOriginalData_co
The original data that this coreset is based on.
- abstract solve_weights(solver, **solver_kwargs)
Return a copy of ‘self’ with weights solved by ‘solver’.
- Return type:
Self
- Parameters:
solver (WeightsOptimiser[Data])
- compute_metric(metric, **metric_kwargs)
Return metric-distance between self.pre_coreset_data and self.coreset.
- property coreset: _TPointsData_co
Deprecated alias for .points.
- class coreax.coreset.PseudoCoreset(nodes, pre_coreset_data)
Bases:
AbstractCoreset[Data, _TOriginalData_co], Generic[_TOriginalData_co]
Data structure for representing a pseudo-coreset.
The points of a pseudo-coreset are not necessarily points in the original dataset.
- Parameters:
nodes (Data) – The (weighted) coreset nodes, \(I\); these can be accessed via Coresubset.points().
pre_coreset_data (Any) – The dataset \(X\) used to construct the coreset.
- classmethod build(nodes, pre_coreset_data)
Construct a PseudoCoreset from Data or raw Arrays.
- Parameters:
nodes (Union[Data, Array]) – The (weighted) coreset nodes, \(I\); these can be accessed via Coresubset.points(). jax.Array instances are automatically converted into Data.
pre_coreset_data (Union[Any, Array, tuple[Array, Array]]) – The dataset \(X\) used to construct the coreset. jax.Array instances are automatically converted into Data. tuple[jax.Array, jax.Array] is automatically converted into SupervisedData.
- Return type:
Union[PseudoCoreset[Data], PseudoCoreset[SupervisedData], PseudoCoreset[Any]]
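The conversion rules described for pre_coreset_data can be sketched as a small dispatch function. This is only an illustration of the documented behaviour, not the library's implementation: `DataStandIn`, `SupervisedDataStandIn`, and `convert_pre_coreset_data` are hypothetical names, and a Python list stands in for a jax.Array.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class DataStandIn:
    """Hypothetical stand-in for Data (not the coreax class)."""

    data: list


@dataclass
class SupervisedDataStandIn:
    """Hypothetical stand-in for SupervisedData (not the coreax class)."""

    data: list
    supervision: list


def convert_pre_coreset_data(pre_coreset_data: Any) -> Any:
    """Apply the documented rules, with a list playing the role of an array."""
    # A pair of "arrays" is treated as (features, targets).
    if isinstance(pre_coreset_data, tuple) and len(pre_coreset_data) == 2:
        features, targets = pre_coreset_data
        return SupervisedDataStandIn(features, targets)
    # A single "array" is wrapped as unsupervised data.
    if isinstance(pre_coreset_data, list):
        return DataStandIn(pre_coreset_data)
    # Anything else is passed through unchanged.
    return pre_coreset_data


plain = convert_pre_coreset_data([1.0, 2.0, 3.0])
supervised = convert_pre_coreset_data(([1.0, 2.0], [0, 1]))
already_wrapped = convert_pre_coreset_data(plain)
```

The three branches correspond to the three members of the documented return type union.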
- property pre_coreset_data
The original data that this coreset is based on.
- solve_weights(solver, **solver_kwargs)
Return a copy of ‘self’ with weights solved by ‘solver’.
- Return type:
Self
- Parameters:
solver (WeightsOptimiser[Data])
- class coreax.coreset.Coreset(nodes, pre_coreset_data)
Bases:
PseudoCoreset
Deprecated - split into AbstractCoreset and PseudoCoreset.
- Parameters:
nodes (Data)
pre_coreset_data (_TOriginalData_co)
- class coreax.coreset.Coresubset(indices, pre_coreset_data)
Bases:
AbstractCoreset[_TOriginalData_co, _TOriginalData_co], Generic[_TOriginalData_co]
Data structure for representing a coresubset.
A coresubset is a Coreset, with the additional condition that the coreset data points/nodes must be a subset of the original data points/nodes, such that
\[\hat{x}_i = x_i, \forall i \in I, I \subset \{1, \dots, n\}, \text{card}(I) = \hat{n}.\]
Thus, a coresubset, unlike a coreset, ensures that feasibility constraints on the support of the measure are maintained [litterer2012recombination].
In coresubsets, the dataset reduction can be implicit (setting weights/nodes to zero for all \(i \notin I\)) or explicit (removing entries from the weight/node arrays). The implicit approach is useful when input/output array shape stability is required (e.g. for some JAX transformations); the explicit approach is more similar to a standard coreset.
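The difference between the two reductions can be sketched in plain Python. This is an illustration under stated assumptions rather than coreax code: lists stand in for arrays, the index set and weights are made up, and `weighted_sum` is a hypothetical helper.

```python
# Illustrative only: lists stand in for arrays; names are hypothetical.
x = [10.0, 20.0, 30.0, 40.0, 50.0]  # original nodes, n = 5
index_set = [1, 3]                  # I, with card(I) = n_hat = 2
weights_on_index_set = [0.5, 0.5]   # weights attached to the selected nodes

# Explicit reduction: drop every entry with i not in I (length n_hat).
explicit_nodes = [x[i] for i in index_set]
explicit_weights = list(weights_on_index_set)

# Implicit reduction: keep length n and zero the weight wherever i is not in I.
implicit_weights = [0.0] * len(x)
for i, w_i in zip(index_set, weights_on_index_set):
    implicit_weights[i] = w_i


def weighted_sum(nodes, weights):
    """Return sum(w_i * x_i)."""
    return sum(w * v for v, w in zip(nodes, weights))


explicit_total = weighted_sum(explicit_nodes, explicit_weights)
implicit_total = weighted_sum(x, implicit_weights)
```

Both forms give the same weighted sum; the implicit form keeps the original length `n`, which is the shape stability the paragraph above refers to.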
- Parameters:
indices (Data) – The (weighted) coresubset node indices, \(I\); the materialised coresubset nodes should only be accessed via Coresubset.points().
pre_coreset_data (Any) – The dataset \(X\) used to construct the coreset.
- classmethod build(indices, pre_coreset_data)
Construct a Coresubset from Data or raw Arrays.
- Parameters:
indices (Union[Data, Array]) – The (weighted) coresubset node indices, \(I\); the materialised coresubset nodes should only be accessed via Coresubset.points(). jax.Array instances are automatically converted into Data.
pre_coreset_data (Union[Any, Array, tuple[Array, Array]]) – The dataset \(X\) used to construct the coreset. jax.Array instances are automatically converted into Data. tuple[jax.Array, jax.Array] is automatically converted into SupervisedData.
- Return type:
Union[Coresubset[Data], Coresubset[SupervisedData], Coresubset[Any]]
- property points: _TOriginalData_co
Materialise the coresubset from the indices and original data.
- property unweighted_indices: Shaped[Array, 'n']
Unweighted Coresubset indices - attribute access helper.
- property pre_coreset_data
The original data that this coreset is based on.
- solve_weights(solver, **solver_kwargs)
Return a copy of ‘self’ with weights solved by ‘solver’.
- Return type:
Self
- Parameters:
solver (WeightsOptimiser[Data])