Coreax (v0.4.0)
Coreax is a library for coreset algorithms, written in JAX for fast execution and GPU support.
Setup
Before installing Coreax, make sure JAX is installed, choosing the JAX build that suits your system.
Install JAX, noting that there are (currently) different setup paths for CPU and GPU use.
Install Coreax:
$ python3 -m pip install coreax
Optionally, install additional dependencies required to run the examples:
$ python3 -m pip install "coreax[test]"
Should the installation fail, try again using stable pinned package versions. Note that these versions may be rather outdated, although we endeavour to avoid versions with known vulnerabilities. To install Coreax:
$ python3 -m pip install --no-dependencies -r requirements.txt
To run the examples, use requirements-test.txt instead.
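Once the steps above are done, a quick check of what is actually installed can save some debugging. The snippet below is a minimal sketch using only the Python standard library; the helper names `installed_version` and `report` are illustrative and are not part of Coreax or JAX.

```python
from importlib import metadata
from typing import Dict, Optional, Sequence


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of ``package``, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report(packages: Sequence[str] = ("jax", "jaxlib", "coreax")) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version (None when missing)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    # Print one line per package so a missing install is easy to spot.
    for name, version in report().items():
        print(f"{name}: {version or 'not installed'}")
```

If `jax` is reported as missing, install it first and then repeat the Coreax installation.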
Contents
API Reference
- Approximations
- Coresets
- Data
- Kernels
  - median_heuristic()
  - ScalarValuedKernel
  - UniCompositeKernel
  - PowerKernel
  - DuoCompositeKernel
  - AdditiveKernel
  - ProductKernel
  - LinearKernel
  - PolynomialKernel
  - ExponentialKernel
  - LaplacianKernel
  - SquaredExponentialKernel
  - PCIMQKernel
  - RationalQuadraticKernel
  - MaternKernel
  - PeriodicKernel
  - LocallyPeriodicKernel
  - PoissonKernel
  - SteinKernel
- Least Squares
- Metrics
- Networks
- Score Matching
- Solvers
  - Solver
  - CoresubsetSolver
  - RefinementSolver
  - ExplicitSizeSolver
  - PaddingInvariantSolver
  - CompositeSolver
  - MapReduce
  - RandomSample
  - HerdingState
  - KernelHerding
  - KernelThinning
  - SteinThinning
  - RPCholeskyState
  - RPCholesky
  - GreedyKernelPointsState
  - GreedyKernelPoints
  - RecombinationSolver
  - CaratheodoryRecombination
  - TreeRecombination
- Utility Functions
- Weights
Release Cycle
We anticipate two release types: feature releases and security releases. Security releases will be issued as needed in accordance with the security policy. Feature releases will be issued as appropriate, dependent on the feature pipeline and development priorities.
Coming Soon
Some features coming soon include:
- Coordinate bootstrapping for high-dimensional data.
- Other coreset-style algorithms, including kernel thinning and recombination, as a means of reducing a large dataset whilst maintaining properties of the underlying distribution.
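To make the idea of reducing a large dataset whilst maintaining properties of the underlying distribution more concrete, the following is a small pure-Python sketch of greedy kernel herding, one simple coreset-style method. It does not use Coreax or JAX and is not how Coreax implements anything. The function names, the squared exponential kernel with a unit length scale, and the choice to select points without replacement are all assumptions made for illustration.

```python
import math
import random


def squared_exponential(x, y, length_scale=1.0):
    """Squared exponential kernel exp(-||x - y||^2 / (2 * length_scale^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * length_scale**2))


def kernel_herding(data, coreset_size, kernel=squared_exponential):
    """Greedily select indices of ``data`` whose kernel mean tracks the full set."""
    n = len(data)
    gram = [[kernel(x, y) for y in data] for x in data]
    # Empirical kernel mean embedding evaluated at every data point.
    mean_embedding = [sum(row) / n for row in gram]
    selected, chosen = [], set()
    running_sum = [0.0] * n  # sum of k(x_i, x_j) over already selected j
    for t in range(coreset_size):
        # Herding objective: close to the data mean, far from earlier picks.
        best = max(
            (i for i in range(n) if i not in chosen),
            key=lambda i: mean_embedding[i] - running_sum[i] / (t + 1),
        )
        selected.append(best)
        chosen.add(best)
        for i in range(n):
            running_sum[i] += gram[i][best]
    return selected


def squared_mmd(data, subset, kernel=squared_exponential):
    """Biased estimate of the squared maximum mean discrepancy between two sets."""
    def mean_k(a, b):
        return sum(kernel(x, y) for x in a for y in b) / (len(a) * len(b))
    return mean_k(data, data) - 2.0 * mean_k(data, subset) + mean_k(subset, subset)


if __name__ == "__main__":
    rng = random.Random(0)
    points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(60)]
    indices = kernel_herding(points, coreset_size=10)
    coreset = [points[i] for i in indices]
    print("selected indices:", indices)
    print("squared MMD to full data:", squared_mmd(points, coreset))
```

The squared MMD printed at the end is one way to quantify how well the selected subset preserves the distribution of the full dataset; smaller values indicate a closer match.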