
G6K - GPU Tensor

G6K is an open-source C++ and Python (2) library that implements several Sieve algorithms to be used in more advanced lattice reduction tasks. It follows the stateful machine framework from:

Martin R. Albrecht and Léo Ducas and Gottfried Herold and Elena Kirshanova and Eamonn W. Postlethwaite and Marc Stevens, The General Sieve Kernel and New Records in Lattice Reduction.

The main source is available at fplll/g6k.

This fork extends the G6K implementation with GPU-accelerated sieves, in particular using Tensor Cores, and is accompanied by the work:

Léo Ducas, Marc Stevens, Wessel van Woerden, Advanced Lattice Sieving on GPUs, with Tensor Cores, Eurocrypt 2021 (eprint).

Note that this fork is based on a fairly old commit of the main repository.

The CPU-only version of the BDGL-like sieve has been integrated into the main g6k repository, with further improvements, and we aim for long-term maintenance there. The GPU implementation has been made public in this repository, but with a lower commitment to quality, documentation and maintenance. Nevertheless, feel free to create issues in this repository.

Building the library

The code has only been tested on the NVIDIA Turing generation, and might not work on more recent GPUs.

You will need the current master of FPyLLL and a recent version of the CUDA Toolkit. See bootstrap.sh, which creates all dependencies from scratch except for the CUDA Toolkit:

./bootstrap.sh                # once only: creates local python env, builds fplll, fpylll and G6K
source g6k-env/bin/activate   # for every new shell: activates local python env
./rebuild.sh -f -y            # whenever you want to rebuild G6K

Otherwise, with fplll and fpylll already installed, you can build the G6K Cython extension in place like so:

pip install Cython
pip install -r requirements.txt
./rebuild.sh -f -y

Remove the -f option to compile faster (fewer optimisations). The -y option significantly reduces the memory footprint, but disables the standard CPU-only sieves. See rebuild.sh for more options.
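For example, based on the options described above, two illustrative variants are (see rebuild.sh itself for the authoritative behaviour of each flag):

./rebuild.sh -y               # quicker, less optimised build; GPU sieves only
./rebuild.sh -f               # optimised build that keeps the standard CPU-only sieves, at the cost of a larger memory footprint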

Code examples

You can run a single svp-challenge instance on multiple cores and multiple GPUs, for example:

./svp_challenge.py 100 --threads 4 --gpus 1 --verbose

This will run a svp-challenge in dimension 100 using 4 CPU threads and a single GPU.

For more details on the parameters used for the SVP records, see Section 7.2 of the paper or runchal2.sh.

BDGL-sieve

The BDGL-like GPU sieve can be enabled by running:

./svp_challenge.py 100 --threads 4 --gpus 1 --gpu_bucketer bdgl --verbose
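
As in the earlier example, the BDGL bucketer combines with multiple CPU threads and GPUs; the dimension, thread count and GPU count below are purely illustrative:

./svp_challenge.py 110 --threads 8 --gpus 2 --gpu_bucketer bdgl --verbose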

Acknowledgements

This project was supported through the European Union PROMETHEUS project (Horizon 2020 Research and Innovation Program, grant 780701), the ERC-StG-ARTICULATE project (no. 947821), and the ERC-AdG-ALGSTRONGCRYPTO project (no. 740972).
