Data on the web, in digital libraries, and in scientific repositories continues to grow at an increasing rate. Distribution is a key approach to coping with this data explosion. However, existing solutions are mostly based on architectures with a single point of failure. In this paper, we present Armada, a model for a database architecture to handle large data volumes. Armada assumes autonomy of sites, allowing for a decentralised setup in which systems can largely work independently. Furthermore, a novel administration schema in Armada, based on lineage trails, allows for flexible adaptation to the (query) workload in highly dynamic environments. The lineage trails capture the metadata and its history. They form the basis for directing updates to the proper sites, for breaking queries into multi-stage plans, and for providing a reference point for site consistency. The lineage trails are managed in a purely distributed fashion: each Armada site is responsible for their persistence and long-term availability. They provide a minimal but sufficient basis for handling all distributed query processing tasks. The analysis of the Armada reference architecture outlines a path for innovative research at many levels of a DBMS. By challenging many conventional database assumptions and theories, it will eventually allow large databases to continue to grow and stay flexible.
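The abstract does not say how a lineage trail is represented, so the following is only an illustrative sketch of the idea that a recorded history of metadata can direct updates to the proper site and break a query into stages. It assumes, purely for illustration, that a trail is an append-only list of key-range assignments in which later entries override earlier ones. The names `Step`, `LineageTrail`, `record`, `route`, and `plan`, as well as the site labels, are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One trail entry (assumed form): key range [low, high) handed to a site."""
    low: int
    high: int
    site: str


@dataclass
class LineageTrail:
    """Append-only history of range assignments; later steps override earlier ones."""
    steps: list = field(default_factory=list)

    def record(self, low, high, site):
        # History is never rewritten, only extended.
        self.steps.append(Step(low, high, site))

    def route(self, key):
        # Walk the history backwards so the most recent assignment wins.
        for step in reversed(self.steps):
            if step.low <= key < step.high:
                return step.site
        raise KeyError(f"no site covers key {key}")

    def plan(self, low, high):
        """Sites a range query over [low, high) must visit, in key order."""
        # Responsibility can only change at a recorded range boundary,
        # so probing one key per segment between boundaries is enough.
        cuts = {low, high}
        for step in self.steps:
            cuts.update(c for c in (step.low, step.high) if low < c < high)
        ordered = sorted(cuts)
        sites = []
        for start in ordered[:-1]:
            site = self.route(start)
            if site not in sites:
                sites.append(site)
        return sites


# Hypothetical history: site-A starts with everything, then hands off
# the upper half to site-B, which later hands off its upper half to site-C.
trail = LineageTrail()
trail.record(0, 100, "site-A")
trail.record(50, 100, "site-B")
trail.record(75, 100, "site-C")

update_target = trail.route(60)    # site that should receive an update for key 60
query_stages = trail.plan(40, 90)  # sites a query over [40, 90) would touch
```

In this toy model the trail is the only metadata needed for routing: an update consults `route`, while a range query uses `plan` to obtain one stage per responsible site. How Armada actually encodes, distributes, and persists its trails is described in the paper itself, not here.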

Datenbanksysteme in Business, Technologie und Web
Database Architectures

Groffen, F. E., Kersten, M. L., & Manegold, S. (2007). Armada: A Reference Model for an Evolving Database System. In Proceedings of Datenbanksysteme in Business, Technologie und Web. RWTH.