We evaluate an analytical workload in a confidential computing environment, combining DuckDB with two technologies: modular columnar encryption in Parquet files (data at rest) and the newest version of the Intel SGX Trusted Execution Environment (TEE), which provides a hardware enclave where data in flight can be (more) securely decrypted and processed. Our first finding is that the "performance tax" for such confidential analytical processing is acceptable compared to not using these technologies. We ultimately run TPC-H SF30 with under 2x overhead compared to non-encrypted, non-enclave execution; specifically, we show that columnar compression and encryption combine well. Our second finding is a set of dos and don'ts for tuning DuckDB to work effectively in this environment. There are various performance hazards: potentially 5x higher cache miss costs due to memory encryption inside the enclave, NUMA penalties, and a highly elevated cost of swapping pages in and out of the enclave, which can also be triggered indirectly by using a non-SGX-aware malloc library.
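One intuition for why columnar compression and encryption combine well: well-encrypted data looks random and therefore does not compress, so compression has to happen before encryption, and it then also shrinks the number of bytes that must later be decrypted. Parquet modular encryption follows this order by encrypting pages after compression. The sketch below is a minimal illustration under stated assumptions, not the authors' method: it uses zlib for compression and a hypothetical toy XOR keystream (`toy_keystream_encrypt`, derived from SHA-256) as a stand-in for a real cipher. It is not secure and is not what Parquet uses (the specification uses AES-GCM / AES-GCM-CTR); the column contents and key are made up.

```python
import hashlib
import zlib


def toy_keystream_encrypt(data: bytes, key: bytes) -> bytes:
    """XOR data with a SHA-256-based counter keystream (hypothetical helper).

    Toy stand-in for a real cipher: NOT secure, used only because its
    output looks random. Applying it twice with the same key decrypts.
    """
    out = bytearray(len(data))
    for block_start in range(0, len(data), 32):
        # One 32-byte keystream block per 32 bytes of input.
        counter = (block_start // 32).to_bytes(8, "little")
        keystream = hashlib.sha256(key + counter).digest()
        chunk = data[block_start:block_start + 32]
        for i, b in enumerate(chunk):
            out[block_start + i] = b ^ keystream[i]
    return bytes(out)


# A low-cardinality column, typical of analytical data (made-up values).
column = b"".join(b"SHIPPED," if i % 4 else b"PENDING," for i in range(10_000))
key = b"hypothetical-demo-key"

# Order 1: compress, then encrypt (the order Parquet uses per page).
compress_then_encrypt = toy_keystream_encrypt(zlib.compress(column), key)

# Order 2: encrypt, then try to compress the random-looking ciphertext.
encrypt_then_compress = zlib.compress(toy_keystream_encrypt(column, key))

print("plaintext bytes:            ", len(column))
print("compress-then-encrypt bytes:", len(compress_then_encrypt))
print("encrypt-then-compress bytes:", len(encrypt_then_compress))
```

With compress-then-encrypt, a reader only decrypts the small compressed page and then decompresses it; with encrypt-then-compress, compression finds nothing to exploit, so every plaintext byte must be stored and decrypted.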

DuckDB Labs
doi.org/10.1145/3662010.3663447
20th International Workshop on Data Management on New Hardware, DaMoN 2024
Centrum Wiskunde & Informatica, Amsterdam (CWI), The Netherlands

Battiston, I., Felius, L., Ansmink, S., Kuiper, L., & Boncz, P. (2024). DuckDB-SGX2: The Good, The Bad and The Ugly within Confidential Analytical Query Processing. In 20th International Workshop on Data Management on New Hardware, DaMoN 2024 (pp. 14:0–14:5). doi:10.1145/3662010.3663447