An Alignment Model for Extracting English-Korean Translations of Term Constituents

Technical terms are linguistic representations of a domain concept, and their constituents are components used to represent the concept. Technical terms are usually multi-word terms and their meanings can be inferred from their constituents. Therefore, term constituents are essential for understanding the designated meaning of technical terms. However, there are several problems in finding the correct meanings of technical terms with their term constituents. First, because a term constituent is usually a morphological unit rather than a conceptual unit in the case of Korean technical terms, we need to first identify conceptual units by chunking term constituents. Second, conceptual units are sometimes homonyms or synonyms. Moreover their meanings show domain dependency. It is therefore necessary to give information about conceptual units and their possible meanings, including homonyms, synonyms, and domain dependency, so that natural language applications can properly handle technical terms. In this paper, we propose a term constituent alignment algorithm that extracts such information from bilingual technical term pairs. Our algorithm recognizes conceptual units and their meanings by finding English term constituents and their corresponding Korean term constituents for given English-Korean term pairs. Our experimental results indicate that this method can effectively find conceptual units and their meanings with about 6% alignment error rate (AER) on manually analyzed experimental data and about 14% AER on automatically analyzed experimental data.