XArch: archiving scientific and reference data

Database archiving makes it possible to retrieve old versions of a database and to run temporal queries over the history of its data. We demonstrate XArch, a management system for maintaining, populating, and querying archives of hierarchical data. XArch is based on a nested merge approach that efficiently stores multiple versions of hierarchical data in a compact archive. Because elements from all versions are merged into one data structure, any specific version can be retrieved from the archive in a single pass over the data, and the history of an object can be tracked efficiently. XArch implements this approach and extends it in two important ways. First, merging large hierarchical data sets requires that elements be sorted according to their key values. We developed an efficient algorithm for sorting hierarchical data in secondary storage and modified the nested merge algorithm accordingly. Second, we designed and implemented a declarative query language that lets users both view data from particular versions and track the history of objects. We demonstrate the system using molecular biology and demographic reference data as examples.
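The nested merge idea can be illustrated with a minimal in-memory sketch. This is not the XArch implementation: the `Node` class, the functions `merge`, `retrieve`, and `history`, and the sample records are all hypothetical names and data chosen for illustration. The sketch assumes each version is a nested dictionary whose dictionary keys play the role of element keys and whose leaves are strings. It also stores an explicit version set on every node, whereas a compact archive would avoid repeating version information that a child shares with its parent.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One archived element (hypothetical structure, for illustration only)."""
    versions: set = field(default_factory=set)    # versions in which the element exists
    children: dict = field(default_factory=dict)  # element key -> child Node
    values: dict = field(default_factory=dict)    # leaf text -> versions holding that text


def merge(node, data, version):
    """Merge one version of the data into the archive, matching elements by key."""
    node.versions.add(version)
    if isinstance(data, dict):
        # Visit children in key order, mirroring the sorted-by-key requirement.
        for key in sorted(data):
            merge(node.children.setdefault(key, Node()), data[key], version)
    else:
        node.values.setdefault(data, set()).add(version)


def retrieve(node, version):
    """Rebuild a single version in one pass over the archive."""
    for value, versions in node.values.items():
        if version in versions:
            return value
    return {key: retrieve(child, version)
            for key, child in node.children.items()
            if version in child.versions}


def history(node, path):
    """Return the sorted list of versions in which the element at `path` exists."""
    for key in path:
        node = node.children[key]
    return sorted(node.versions)


# Made-up sample data: two versions of a small keyed data set.
v1 = {"p1": {"name": "Ann", "city": "Oslo"}}
v2 = {"p1": {"name": "Ann", "city": "Rome"},
      "p2": {"name": "Bob", "city": "Oslo"}}

archive = Node()
merge(archive, v1, 1)
merge(archive, v2, 2)
```

In this sketch, an element that is unchanged across versions is stored once and only its version set grows, while a changed leaf keeps each distinct value alongside the versions that held it. Calling `retrieve(archive, 1)` walks the merged structure once and keeps only the parts whose version sets contain 1.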
