Designing and implementing the DTD Inference Engine for the I-Wiz project
ABSTRACT: DTD (Document Type Definition) inference for XML is an active research area. The implementation of the DTD Inference Engine (DIE) is also part of the I-Wiz project, an ongoing XML-based database integration project at the University of Florida. This thesis addresses the research and implementation issues underlying DTD inference. We investigate DTD inference theoretically in the context of context-free languages and present our design and implementation of a DTD Inference Engine. Our theoretical study clarifies several concepts that the research literature has used intuitively and without definition. We define the essential concepts: the kernel derivation tree (KDT) and sound, tight, and closure DTDs. We introduce two theorems stating the relationship of multiple kernel derivation trees to a single grammar and of multiple grammars to a single derivation tree. We conclude that a finite language has a finite number of KDTs and an infinite language has an infinite number of KDTs. We show that closure DTDs may not exist for some source XML documents. We state the rules we chose for DTD inference and give our rationale for choosing them. We design a new architecture for the DTD Inference Engine and a three-dimensional linked-list data structure for representing the inferred DTD internally. We give the algorithms of the engine and analyze their complexity. Our DIE has three features not found elsewhere: factorization reduction, a mechanism for handling multiple documents, and incremental maintenance. Finally, we describe several possibilities for extending this work.
Thesis, Dissertation, English, 2000
State University System of Florida, [Florida], 2000