My dissertation

Degree awarded by the University of Queensland, 2001: Doctor of Philosophy, in the Field of English.

My long-suffering supervisors were Roland Sussex, Rodney Huddleston and (sometimes) Tsuneko Nakazawa.

Determiners and Number in English contrasted with Japanese, as exemplified in Machine Translation

Abstract

The fact that concepts are grammaticalized differently in different languages is a major problem for translation, especially for machine translation. Two major examples of this are syntactic number and the use of (in)definite articles (a, some, the). In languages such as English, nouns are marked for number, and the choice of article (or of no article) must be made for every noun phrase. In contrast, in languages such as Japanese, number distinctions are not normally made, and there are no articles. This means that whenever a noun phrase is translated from Japanese to English, even if the denotation is perfectly understood and a good translation equivalent is found, generating the noun phrase still requires two difficult choices: whether the head noun should be singular or plural, and which article, if any, should be generated.

This thesis proposes a semantic representation and a series of three heuristic algorithms that make possible the appropriate generation of articles and number when translating from Japanese to English. The semantic representation provides a tractable set of features to represent (1) the referential use of a noun phrase, as referential, generic, ascriptive or idiomatic; (2) the interpretation of the noun phrase's referent as either a countable individual or a mass, with seven detailed subtypes; and (3) the definiteness of the noun phrase, as definite, indefinite, definite and extensive, or possessed. The three algorithms automatically acquire values for these features from the analysis of the Japanese text and the lexical properties of the English translation equivalents, and then use them to generate English. The first algorithm determines the referential use of Japanese noun phrases, based on a defeasible hierarchy of pragmatic rules that are applied top-down, from the clause to the noun phrase. The second algorithm determines the appropriate interpretation for English noun phrases, while the third determines which determiner, if any, should be generated. These algorithms use rules based on the different referential uses of the noun phrase.
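As a rough illustration of the three feature groups named in the abstract, the representation could be sketched as a small record type. This is only a sketch under stated assumptions: the class and field names are hypothetical and are not taken from the thesis or its implementation, and the seven detailed subtypes of the countable/mass interpretation are left out because the abstract does not list them.

```python
from dataclasses import dataclass
from enum import Enum


class ReferentialUse(Enum):
    """Feature (1): the referential use of the noun phrase."""
    REFERENTIAL = "referential"
    GENERIC = "generic"
    ASCRIPTIVE = "ascriptive"
    IDIOMATIC = "idiomatic"


class Interpretation(Enum):
    """Feature (2): interpretation of the referent.

    The abstract mentions seven detailed subtypes but does not name them,
    so only the two top-level values are modelled here.
    """
    INDIVIDUAL = "countable individual"
    MASS = "mass"


class Definiteness(Enum):
    """Feature (3): the definiteness of the noun phrase."""
    DEFINITE = "definite"
    INDEFINITE = "indefinite"
    DEFINITE_EXTENSIVE = "definite and extensive"
    POSSESSED = "possessed"


@dataclass(frozen=True)
class NounPhraseFeatures:
    """Hypothetical container for the three features of one noun phrase."""
    referential_use: ReferentialUse
    interpretation: Interpretation
    definiteness: Definiteness


# Hypothetical example: a referential, countable, indefinite noun phrase.
example = NounPhraseFeatures(
    referential_use=ReferentialUse.REFERENTIAL,
    interpretation=Interpretation.INDIVIDUAL,
    definiteness=Definiteness.INDEFINITE,
)
print(example)
```

In the terms of the abstract, the three algorithms would fill in these fields in order (referential use first, then interpretation, then the determiner choice that depends on definiteness); the actual rules are described in the thesis and are not reproduced here.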

The proposed algorithms are implemented in a Japanese-to-English machine translation system, with the detailed lexical information entered into its lexicon. Using the algorithms raises the percentage of noun phrases generated with correct articles and number from 65% to 85%.
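The reported figures can also be read as a reduction in errors. The short calculation below uses only the two percentages stated above; the variable names are illustrative.

```python
# Percentages of noun phrases with correct articles and number, as reported.
correct_before = 65
correct_after = 85

# Absolute gain in percentage points.
gain_points = correct_after - correct_before

# Corresponding error rates (noun phrases with an article or number error).
error_before = 100 - correct_before
error_after = 100 - correct_after

# Relative reduction in the error rate.
relative_error_reduction = (error_before - error_after) / error_before

print(gain_points)                         # 20
print(error_before, error_after)           # 35 15
print(round(relative_error_reduction, 3))  # 0.571
```

That is, a gain of 20 percentage points corresponds to the error rate falling from 35% to 15%, a relative reduction of roughly 57%.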



Francis Bond <bond@cslab.kecl.ntt.co.jp>
Machine Translation Research Group
NTT Communication Science Laboratories
2-4 Hikari-dai, Seika-cho, Soraku-gun, Kyoto, JAPAN, 619-0237
Tel: 0774-93-5313 (+81); Fax: 0774-93-5345 (+81)

Last modified: Thu Apr 25 15:27:04 JST 2002