High Performance Kernels for the Fast Fourier Transform

Hpk provides a library for computing the FFT in one or more dimensions of either real or complex data in half, single, or double precision.


  • Modern C++ – Hpk is designed for C++ developers. Functions and types are designed so that most common errors are caught at compile time, and FFT compute objects are immutable, thread-safe, fully initialized at construction, and managed by smart pointers.

  • Accuracy – Hpk is typically more accurate than vendor-supplied FFT libraries (e.g., 1.3X on Sapphire Rapids). For details, see our papers below.

  • Performance – The performance of Hpk is generally higher than that of vendor-tuned libraries (e.g., 1.3X on Sapphire Rapids and over 2X on Graviton3E). For details, see our papers below.

  • Python – The Python interface supports NumPy, JAX, PyTorch, and TensorFlow. Both accuracy and performance are superior to those of SciPy (e.g., 1.1X float32 accuracy, 1.2X float64 accuracy, and over 2X performance). For details, see our most recent paper below.


This document corresponds to release 0.4.0 of the Hpk library. Hpk uses semantic versioning, i.e., Major.Minor.Patch, where Major is incremented for incompatible API changes, Minor for backward-compatible additions to functionality, and Patch for backward-compatible bug fixes or performance enhancements.
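As a rough illustration of the versioning rule above, the following Python sketch checks whether an installed release should satisfy code written against an earlier one. The names `parse_version` and `is_compatible` are hypothetical helpers, not part of Hpk, and the sketch assumes plain `Major.Minor.Patch` strings with no pre-release suffix.

```python
def parse_version(text):
    """Split a 'Major.Minor.Patch' string into a tuple of three ints."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def is_compatible(installed, required):
    """Return True if `installed` should satisfy code written for `required`.

    Assumes the rule stated above: only a Major increment breaks the API,
    so the Major numbers must match and the installed release must not be
    older than the required one.
    """
    have = parse_version(installed)
    need = parse_version(required)
    return have[0] == need[0] and have >= need


if __name__ == "__main__":
    print(is_compatible("0.4.0", "0.3.2"))  # same Major, newer Minor
    print(is_compatible("1.0.0", "0.4.0"))  # different Major
```

Comparing tuples of integers rather than raw strings keeps the ordering numeric, so that, for example, Minor 10 sorts after Minor 9.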

The latest version of the documentation is available online at:


The documentation for your local installation may be found in the subdirectory share/doc/hpk/html. For example, if the core-devel package has been installed in /opt/libhpk0, the associated documentation can be accessed in a web browser using the URL:



High Performance Kernels LLC. 2024. “Accuracy and Performance of FFT Software Libraries”

Paul Caprioli and Robby Jenkins. 2023. “High Performance Kernels for FFT via Modern C++”
   Slides (updated 2024)


This software is currently available for Linux/x86_64 on hardware supporting AVX2 (and, optionally, AVX512) and for Linux/aarch64 on hardware supporting SVE with 256-bit vectors.
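One way to check whether a Linux machine advertises these instruction sets is to read the feature flags in `/proc/cpuinfo`. The sketch below is not part of Hpk, and its function names are hypothetical. It assumes that `avx512f` (the AVX-512 foundation flag) is an adequate indicator of AVX512 support, which may not match the exact subsets Hpk uses. The `sve` flag also says nothing about vector length, so it cannot confirm the 256-bit requirement by itself.

```python
def cpu_flags(cpuinfo_text):
    """Collect the feature flags listed in /proc/cpuinfo-style text.

    x86_64 kernels label the line 'flags'; aarch64 kernels label it 'Features'.
    """
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in ("flags", "features"):
            flags.update(value.split())
    return flags


def supported_features(cpuinfo_text):
    """Report which of the instruction sets named above are advertised."""
    flags = cpu_flags(cpuinfo_text)
    return {
        "avx2": "avx2" in flags,
        "avx512": "avx512f" in flags,  # assumption: foundation flag only
        "sve": "sve" in flags,         # does not reveal the vector length
    }


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            print(supported_features(handle.read()))
    except OSError:
        print("/proc/cpuinfo is not readable on this system")
```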

Note the minimum GNU C library versions listed below. For example, Debian 11 (and later) is a good choice for x86_64, as are Debian 12 and Amazon Linux 2023 for aarch64.
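To compare a system's GNU C library against a required minimum, a small Python sketch such as the following may help. It relies on the standard library's `platform.libc_ver()`. The helper names are hypothetical, and the minimum is left as a parameter: take the actual value from the table for your architecture. The version strings in the usage lines are example values only.

```python
import platform


def version_tuple(text):
    """Convert a dotted version such as '2.31' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def glibc_at_least(minimum, found=None):
    """Return True if the glibc version is at least `minimum`.

    `found` defaults to the version reported by platform.libc_ver(); it can
    be passed explicitly to check a version string from another machine.
    Returns False when the C library is not glibc or cannot be identified.
    """
    if found is None:
        lib, found = platform.libc_ver()
        if lib != "glibc" or not found:
            return False
    return version_tuple(found) >= version_tuple(minimum)


if __name__ == "__main__":
    print(platform.libc_ver())                    # what this machine reports
    print(glibc_at_least("2.31", found="2.36"))   # example values only
```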





  • Intel/AMD 64 – C++ Download Page

  • ARM 64 – C++ Download Page

Please see Getting Started for installation instructions and a description of the package files.

The Python Download Page has Linux/x86_64 wheels for Python 3.11 and later.
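Before downloading a wheel, it can be worth confirming that the interpreter matches these constraints. The sketch below uses only the Python standard library; `wheel_platform_ok` is a hypothetical helper and not part of the Hpk Python package.

```python
import platform
import sys


def wheel_platform_ok(version=None, system=None, machine=None):
    """Return True for Python 3.11 or later on Linux/x86_64.

    Each argument defaults to the value detected for the running
    interpreter, and can be overridden to check another environment.
    """
    if version is None:
        version = sys.version_info
    if system is None:
        system = platform.system()
    if machine is None:
        machine = platform.machine()
    return (
        tuple(version)[:2] >= (3, 11)
        and system == "Linux"
        and machine == "x86_64"
    )


if __name__ == "__main__":
    print(wheel_platform_ok())
```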

Contact Info

Our email address is: