The hash trie data structure is a common part in standard collection libraries of JVM programming languages such as Clojure and Scala. It enables fast immutable implementations of maps, sets, and vectors, but it requires considerably more memory than an equivalent array-based data structure. This hinders the scalability of functional programs and the further adoption of this otherwise attractive style of programming. In this paper we present a product family of hash tries. We generate Java source code to specialize them using knowledge of JVM object memory layout. The number of possible specializations is exponential. The optimization challenge is thus to find a minimal set of variants which lead to a maximal loss in memory footprint on any given data. Using a set of experiments we measured the distribution of internal tree node sizes in hash tries. We used the results as a guidance to decide which variants of the family to generate and which variants should be left to the generic implementation. A preliminary validating experiment on the implementation of sets and maps shows that this technique leads to a median decrease of 55% in memory footprint for maps (and 78% for sets), while still maintaining comparable performance. Our combination of data analysis and code specialization proved to be effective.
Software (theme 1)
Domain Specific Languages: A Big Future for Small Programs
ACM International Conference on Generative Programming and Component Engineering
Published paper: DOI: 10.1145/2658761.2658763
Software Analysis and Transformation

Steindorfer, M.J, & Vinju, J.J. (2014). Code Specialization for Memory Efficient Hash Tries. In Proceedings of ACM International Conference on Generative Programming and Component Engineering 2014 (GPCE 0). ACM.