Introduction#

Background, features, and getting started

Background#

The Discrete Fourier Transform (DFT) transforms a sequence of $N$ complex numbers in the time domain

$\mathbf{x} = \langle x_0, x_1, \dots , x_{N-1} \rangle$

into another sequence of $N$ complex numbers in the frequency domain

$\mathcal{F}(\mathbf{x}) = \mathbf{X} = \langle X_0, X_1, \dots , X_{N-1} \rangle$

such that

$X_k = \alpha \sum_{n=0}^{N-1} x_n \exp\!\left(\frac{-2\pi i}{N}kn\right)$

where $\alpha$ is a real-valued scaling factor and $i = \sqrt{-1}$ .

The transform above is also known as the forward DFT. The backward DFT goes the other way, transforming a sequence of complex numbers in the frequency domain into another sequence in the time domain

$\mathcal{F}^{\,-1}(\mathbf{X}) = \mathbf{x} = \langle x_0, x_1, \dots , x_{N-1} \rangle$

such that

$x_k = \beta \sum_{n=0}^{N-1} X_n \exp\!\left(\frac{+2\pi i}{N}kn\right)$

where $\beta$ is the backward scaling factor.

If the two scaling factors are chosen such that $\alpha \cdot \beta \cdot N = 1$ , then the forward and backward DFTs are inverse functions of each other. That is,

$\mathbf{x} = \mathcal{F}^{\,-1}\big(\mathcal{F}( \mathbf{x} )\big) \quad \mathrm{and} \quad \mathbf{X} = \mathcal{F}\big(\mathcal{F}^{\,-1}( \mathbf{X} )\big) \mathrm{.}$

So, common choices are $\alpha = 1,\, \beta = {}^1\!{/\!}_N$ and $\alpha = \beta = {}^1\!{/\!}_{\sqrt{N}}$ , though the scale factor is sometimes selected either statically or dynamically for the practical purpose of preventing overflow.

The Fast Fourier Transform (FFT) is an algorithm for computing the DFT that has log-linear complexity, i.e., $\mathcal{O}(N \log N)$ .

Hpk Features#

Hpk provides functions for computing forward and backward FFTs with unit scaling $(\alpha = 1,\, \beta = 1)$ and with arbitrary real-valued scaling factors.

One can choose that the output data overwrite the input data by using an Inplace FFT, or, by using an Ooplace FFT, the output data can be written to memory that is disjoint from the input’s. In the latter case, the input data is left unchanged.

The following data representations and layouts are supported:

Complex time, Complex frequency
- One-dimensional data contiguous in memory
- One-dimensional strided data (padding between points)
- Multidimensional FFT with each row contiguous in memory
Real time, Complex frequency
- One-dimensional data contiguous in memory
- Multidimensional FFT with each row contiguous in memory

There is no predefined limit on the number of dimensions in a multidimensional FFT. Also, the layout can have padding between rows, slabs, hyperslabs, etc. Only the points within each row need be contiguous. Batching is also supported, and padding is permitted between batches. The programming interface is discussed in the API Design section.

For the real time domain case, the frequency domain has conjugate symmetry and so only about half of the complex points are represented in the frequency data layout. The Real Time Domain section has the details.

Getting Started#

Hpk is currently available for Linux on x86_64 and aarch64. The former requires hardware supporting the AVX2 instruction set, with AVX512 optional. The latter requires hardware supporting SVE with a vector length of 256 bits.

Half precision (IEEE Std. 754-2019 binary16) on x86_64 requires hardware with the AVX512_FP16 instruction set.

For developers, a compiler supporting the C++17 standard is required. Developing for half precision requires a compiler that supports the _Float16 floating point data type.

Installation#

The library is divided into packages. The core package is required by everyone, as is at least one of the instruction set specific packages. For x86_64 (e.g., Intel’s Xeon), the packages avx2 and avx512 are available; for ARM’s AArch64 (e.g., Amazon’s Graviton3), sve256.

Each basic package is separated into two, e.g., core and core-devel. Developers will want to install both, but users of their applications will only need to install the package lacking the -devel suffix.

Package (`x86_64`)	Content description
`core`	core shared library
`core-devel`	include, lib symlink, pkgconfig, cmake, examples, and html
`avx2`	AVX2 shared libs for single and double precision
`avx2-devel`	lib symlinks, pkgconfig, and cmake files for avx2
`avx512`	AVX512 shared libs for half, single, and double precision
`avx512-devel`	lib symlinks, pkgconfig, and cmake files for avx512
`omp`	shared library utilizing OpenMP thread-level parallelism
`omp-devel`	lib symlinks, pkgconfig, and cmake files for omp

Package (`aarch64`)	Content description
`core`	core shared library
`core-devel`	include, lib symlink, pkgconfig, cmake, examples, and html
`sve256`	SVE256 shared libs for half, single, and double precision
`sve256-devel`	lib symlinks, pkgconfig, and cmake files for sve256
`omp`	shared library utilizing OpenMP thread-level parallelism
`omp-devel`	lib symlinks, pkgconfig, and cmake files for omp

Debian users with root privilege can install into /opt/libhpk0 as follows:

sudo dpkg -i libhpk0-core_*_amd64.deb   libhpk0-core-devel_*_amd64.deb   \
             libhpk0-avx2_*_amd64.deb   libhpk0-avx2-devel_*_amd64.deb   \
             libhpk0-avx512_*_amd64.deb libhpk0-avx512-devel_*_amd64.deb \
             libhpk0-omp_*_amd64.deb    libhpk0-omp-devel_*_amd64.deb

Note that the Debian packages do not run any scripts (such as preinstall, etc.). The x86_64 packages use the name amd64 for historical reasons.

Amazon Linux users with root privilege on Graviton3 or 3E instances can install into /opt/libhpk0 as follows:

sudo rpm -i libhpk0-core-0*.aarch64.rpm   libhpk0-core-devel-*.aarch64.rpm   \
            libhpk0-sve256-0*.aarch64.rpm libhpk0-sve256-devel-*.aarch64.rpm \
            libhpk0-omp-0*.aarch64.rpm    libhpk0-omp-devel-*.aarch64.rpm

Alternatively, tar archive files can be extracted into the directory of one’s choosing, and, of course, this method works on any Linux distribution. The tar files are available in compressed format, either with lzip or gzip. Lzip is preferred; the files are smaller and have better integrity checking. For example, to list the contents of an lzipped tar file, one can use:

tar -t --lzip -f filename.tar.lz

To extract the same file into the directory /opt/libhpk0, one can use:

tar -x --lzip -f filename.tar.lz -C /opt/libhpk0

Note that, before running the previous command, you must make the directory /opt/libhpk0 if it does not already exist.

If you’ve downloaded the file having suffix .sha512, you can check the SHA512 message digest of the files before installing as follows:

sha512sum -c --ignore-missing libhpk*.sha512

And then, if everything is OK,

for f in libhpk*.tar.lz; do tar -x --lzip -f $f -C /opt/libhpk0; done

By the way, if you’ve downloaded the sha512 files from the same place you got the installation files themselves, then you’re not accomplishing very much.

Building an example AVX2 application#

First, install Hpk. Let’s suppose you have root privilege and install in /opt/libhpk0. In the following instructions, we will specify this installation directory, but you should change it as necessary.

Now in your home directory (or anywhere you like) create a file named example.cpp with the following contents:

#include <complex>
#include <vector>
#include <iostream>
#include <hpk/fft/makeFactory.hpp>

int main() {
    auto factory = hpk::fft::makeFactory<float>();
    auto fft = factory->makeInplace({4});  // Four points
    std::vector<std::complex<float>> v = {7.0f, 0.0f, 0.0f, 0.0f};
    std::cout << "Time:";
    for (const auto& point : v) std::cout << ' ' << point;
    std::cout << '\n';
    fft->forward(v.data());  // Compute the FFT!
    std::cout << "Freq:";
    for (const auto& point : v) std::cout << ' ' << point;
    std::cout << '\n';
}

Using a compiler directly#

To build the executable myexample and run it, type the following two commands:

c++ -std=c++17 -DHPK_HAVE_FFT_AVX2_FP32 -I/opt/libhpk0/include   \
    -L/opt/libhpk0/lib -Wl,-rpath=/opt/libhpk0/lib  example.cpp  \
    -lhpk_fft_avx2_fp32 -lhpk_core -ldl -o myexample
./myexample

Using cmake, make, and a compiler#

In the same directory as example.cpp, create a file named CMakeLists.txt with the following contents:

cmake_minimum_required(VERSION 3.18)
project(MyProject LANGUAGES CXX)
set(CMAKE_CXX_STANDARD 17)

find_package(Hpk CONFIG)

add_executable(myexample example.cpp)
target_link_libraries(myexample PRIVATE hpk::fft_avx2_fp32 ${CMAKE_DL_LIBS})

Now, to create the subdirectory mybuild and initialize the build system, then build myexample, and then run it, type the following three commands:

cmake -D Hpk_DIR=/opt/libhpk0/lib/cmake/hpk -B mybuild
cmake --build mybuild
mybuild/myexample

You should change the directory path above to match your local installation. The find_package() looks for the file HpkConfig.cmake in the directory specified by Hpk_DIR. In particular, Linux/AArch64 typically uses lib64 instead of lib. (And, of course, you want to link with hpk::fft_sve256_fp32, not avx2.)

Using meson, ninja, and a compiler#

In the same directory as example.cpp, create a file named meson.build with the following contents:

project('MyProject', 'cpp', default_options: ['cpp_std=c++17'])

hpk_fft_fp32 = dependency('hpk_fft_avx2_fp32')
dl = dependency('dl')  # Only needed for glibc < 2.34

executable('myexample', 'example.cpp', dependencies: [hpk_fft_fp32, dl])

Now, to create the subdirectory mybuild and initialize the build system, then build myexample, and then run it, type the following three commands:

meson setup --pkg-config-path=/opt/libhpk0/lib/pkgconfig mybuild
meson compile -C mybuild
mybuild/myexample

You should change the directory path above to match your local installation. The dependency() function looks for the file hpk_fft_avx2_fp32.pc in the directory specified by --pkg-config-path. In particular, Linux/AArch64 typically uses lib64 instead of lib, and the dependency is on 'hpk_fft_sve256_fp32', not 'hpk_fft_avx2_fp32'.