Introduction#
Background, features, and getting started
Background#
The Discrete Fourier Transform (DFT) transforms a sequence of complex numbers in the time domain
into another sequence of complex numbers in the frequency domain
such that
where is a real-valued scaling factor and .
The transform above is also known as the forward DFT. The backward DFT goes the other way, transforming a sequence of complex numbers in the frequency domain into another sequence in the time domain
such that
where is the backward scaling factor.
If the two scaling factors are chosen such that , then the forward and backward DFTs are inverse functions of each other. That is,
So, common choices are and , though the scale factor is sometimes selected either statically or dynamically for the practical purpose of preventing overflow.
The Fast Fourier Transform (FFT) is an algorithm for computing the DFT that has log-linear complexity, i.e., .
Hpk Features#
Hpk provides functions for computing forward and backward FFTs with unit scaling and with arbitrary real-valued scaling factors.
One can choose that the output data overwrite the input data by using an Inplace FFT, or, by using an Ooplace FFT, the output data can be written to memory that is disjoint from the input’s. In the latter case, the input data is left unchanged.
The following data representations and layouts are supported:
- Complex time, Complex frequency
One-dimensional data contiguous in memory
One-dimensional strided data (padding between points)
Multidimensional FFT with each row contiguous in memory
- Real time, Complex frequency
One-dimensional data contiguous in memory
Multidimensional FFT with each row contiguous in memory
There is no predefined limit on the number of dimensions in a multidimensional FFT. Also, the layout can have padding between rows, slabs, hyperslabs, etc. Only the points within each row need be contiguous. Batching is also supported, and padding is permitted between batches. The programming interface is discussed in the API Design section.
For the real time domain case, the frequency domain has conjugate symmetry and so only about half of the complex points are represented in the frequency data layout. The Real Time Domain section has the details.
Getting Started#
Hpk is currently available for Linux on x86_64
and aarch64
.
The former requires hardware supporting the AVX2 instruction set, with AVX512
optional.
The latter requires hardware supporting SVE with a vector length of 256 bits.
Half precision (IEEE Std. 754-2019 binary16) on x86_64
requires hardware with
the AVX512_FP16 instruction set.
For developers, a compiler supporting the C++17 standard is required.
Developing for half precision requires a compiler that supports the _Float16
floating point data type.
Installation#
The library is divided into packages.
The core package is required by everyone, as is at least one of the instruction
set specific packages.
For x86_64 (e.g., Intel’s Xeon), the packages avx2
and avx512
are
available; for ARM’s AArch64 (e.g., Amazon’s Graviton3), sve256
.
Each basic package is separated into two, e.g., core
and core-devel
.
Developers will want to install both, but users of their applications will only
need to install the package lacking the -devel
suffix.
Package ( |
Content description |
---|---|
|
core shared library |
|
include, examples, html, lib symlink, and CMake files |
|
AVX2 shared libs for single and double precision |
|
lib symlinks and CMake config target files for avx2 |
|
AVX512 shared libs for half, single, and double precision |
|
lib symlinks and CMake config target files for avx512 |
|
shared library utilizing OpenMP thread-level parallelism |
|
lib symlinks and CMake config target files for omp |
Package ( |
Content description |
---|---|
|
core shared library |
|
include, examples, html, lib symlink, and CMake files |
|
SVE256 shared libs for half, single, and double precision |
|
lib symlinks and CMake config target files for sve256 |
|
shared library utilizing OpenMP thread-level parallelism |
|
lib symlinks and CMake config target files for omp |
Debian users with root privilege can install into /opt/libhpk0
as follows:
sudo dpkg -i libhpk0-core_*_amd64.deb libhpk0-core-devel_*_amd64.deb \
libhpk0-avx2_*_amd64.deb libhpk0-avx2-devel_*_amd64.deb \
libhpk0-avx512_*_amd64.deb libhpk0-avx512-devel_*_amd64.deb \
libhpk0-omp_*_amd64.deb libhpk0-omp-devel_*_amd64.deb
Note that the Debian packages do not run any scripts (such as preinstall, etc.).
The x86_64
packages use the name amd64
for historical reasons.
Amazon Linux users with root privilege on Graviton3 or 3E instances can install
into /opt/libhpk0
as follows:
sudo rpm -i libhpk0-core-0*.aarch64.rpm libhpk0-core-devel-*.aarch64.rpm \
libhpk0-sve256-0*.aarch64.rpm libhpk0-sve256-devel-*.aarch64.rpm \
libhpk0-omp-0*.aarch64.rpm libhpk0-omp-devel-*.aarch64.rpm
Alternatively, tar archive files can be extracted into the directory of one’s choosing, and, of course, this method works on any Linux distribution. The tar files are available in compressed format, either with lzip or gzip. Lzip is preferred; the files are smaller and have better integrity checking. For example, to list the contents of an lzipped tar file, one can use:
tar -t --lzip -f filename.tar.lz
To extract the same file into the directory /opt/libhpk0
, one can use:
tar -x --lzip -f filename.tar.lz -C /opt/libhpk0
Note that, before running the previous command, you must make the directory
/opt/libhpk0
if it does not already exist.
If you’ve downloaded the file having suffix .sha512
, you can check the
SHA512 message digest of the files before installing as follows:
sha512sum -c --ignore-missing libhpk*.sha512
And then, if everything is OK,
for f in libhpk*.tar.lz; do tar -x --lzip -f $f -C /opt/libhpk0; done
By the way, if you’ve downloaded the sha512
files from the same place you got
the installation files themselves, then you’re not accomplishing very much.
Building an example AVX2 application#
First, install Hpk. Let’s suppose you have root privilege and install in
/opt/libhpk0
.
In the following instructions, we will specify this installation directory,
but you should change it as necessary.
Now in your home directory (or anywhere you like) create a file named
example.cpp
with the following contents:
#include <complex>
#include <vector>
#include <iostream>
#include <hpk/fft/makeFactory.hpp>
int main() {
auto factory = hpk::fft::makeFactory<float>();
auto fft = factory->makeInplace({4}); // Four points
std::vector<std::complex<float>> v = {7.0f, 0.0f, 0.0f, 0.0f};
std::cout << "Time:";
for (const auto& point : v) std::cout << ' ' << point;
std::cout << '\n';
fft->forward(v.data()); // Compute the FFT!
std::cout << "Freq:";
for (const auto& point : v) std::cout << ' ' << point;
std::cout << '\n';
}
Using a compiler directly#
To build the executable myexample
and run it, type the following two
commands:
c++ -std=c++17 -DHPK_HAVE_FFT_AVX2_FP32 -I/opt/libhpk0/include \
-L/opt/libhpk0/lib -Wl,-rpath=/opt/libhpk0/lib example.cpp \
-lhpk_fft_avx2_fp32 -lhpk_core -ldl -o myexample
./myexample
Using cmake, make, and a compiler#
In the same directory as example.cpp
, create a file named CMakeLists.txt
with the following contents:
cmake_minimum_required(VERSION 3.18)
project(MyProject LANGUAGES CXX)
set(CMAKE_CXX_STANDARD 17)
find_package(Hpk CONFIG)
add_executable(myexample example.cpp)
target_link_libraries(myexample PRIVATE hpk::fft_avx2_fp32)
Now, to create the subdirectory mybuild
and initialize the build system, then
build myexample
, and then run it, type the following three commands:
cmake -S . -B mybuild -D Hpk_DIR=/opt/libhpk0/lib/cmake/hpk
cmake --build mybuild
mybuild/myexample
You should change the directory path above to match your local installation.
The find_package()
looks for the file HpkConfig.cmake
in the directory
specified by Hpk_DIR
.
In particular, Linux/AArch64 typically uses lib64
instead of lib
.
(And, of course, you want to link with hpk::fft_sve256_fp32
, not avx2
.)