Australian Linguistics Society
1998

Interpretation of Japanese `spatial' nouns
in Japanese-to-English Machine Translation

Francis Bond, NTT Communication Science Laboratories, Kyoto,
bond@cslab.kecl.ntt.co.jp


Abstract:

Spatial nouns (kuukan meishi) express a relation, prototypically spatial, between their dependent and the entity they modify. Syntactically they take a noun phrase or clause as an adnominal complement, and the resulting phrase (including postposition) modifies another noun phrase or clause.

Japanese has few postpositions compared to the number of English prepositions; instead, almost all spatial relations are expressed using spatial nouns, much like English constructions such as `on top of' and `in front of'.

In this paper I contrast the use of spatial nouns in Japanese with the use of English prepositions, and show the importance for interpretation of considering the meanings of both elements being related. Based on the contrastive analysis, an algorithm is outlined for automatically translating from Japanese to English, and some examples given of its successes and failures.

   
1. Introduction

In this paper I contrast the use of Japanese postpositions with that of English prepositions, and show why spatial nouns are so common in Japanese. I then introduce an algorithm for selecting prepositions when automatically translating from Japanese to English based on the contrastive analysis.

   
2. Japanese Spatial Nouns

Japanese spatial nouns are common nouns, with no special syntactic properties. Semantically, however, spatial nouns take two arguments: their dependent (the internal argument) and their governor (the external argument). Either argument can be an event or an object.

The dependent argument can be a noun phrase, normally marked with the adnominal case-marker no, or a sentence. The resulting constituent can modify either another noun phrase, in which case it will itself characteristically be marked with the adnominal case-marker, or a tensed clause, in which case it will typically be marked with one of the locative semantic markers de or ni. In this respect spatial nouns behave like any other common noun.

(1) NP1 SN-no NP2
(2) NP1 SN-mk Cl1
(3) Cl1 SN-mk Cl2
(4) Cl1 SN-no NP2

Note that many other nouns also pattern like this. Martin (1988: 664-740) calls them postadnominals, as they typically appear with some adnominal (pre)modification. This distribution is a straightforward consequence of their semantic structure: they need a dependent to be their internal semantic argument. Martin (1988) estimates that there may be as many as 1,000 postadnominals; this paper discusses only a subset of them, the spatial nouns (kuukan meishi), that is, those nouns that include, as one of their uses, the predication of spatial relationships.

In particular I will focus on spatial nouns that encode relative position, both vertical and horizontal, because they are widely used to encode not only spatial relations, but also temporal and other relations. The spatial nouns in question are listed below:

vertical:
ue ``up'', naka ``middle'', shita ``down''
horizontal:
mae ``before'', ato ``after'', ushiro ``behind''

There are many other spatial nouns, such as those that encode surrounding space (mawari ``around'', tonari ``next to'' and others). These are less often extended to non-spatial uses and so are relatively straightforward to translate, although the choice of translation is not always obvious (Tanaka 1996).

   
3. Japanese contrasted with English

In this section I contrast Japanese and English. I first discuss the differences between Japanese postpositions and English prepositions, with one main difference being that there are fewer Japanese postpositions than English prepositions. I then show how spatial nouns are used to encode relations that are characteristically encoded by prepositions in English.

3.1 Postpositions and prepositions

Japanese postpositions and English prepositions are both examples of adpositions: a grammatically distinct closed class of words that are typically used to represent spatial relations and/or semantic roles, usually with no inflectional contrast, which characteristically take NP complements and function as dependents of verbs, nouns and adjectives (Huddleston 1988: 123-124).

For Japanese, I will discuss only the case-markers and semantic markers (kaku-joshi), setting aside adverbial markers. (I refer to the Japanese postpositions as markers, as I do not treat them as the heads of the phrases they appear in.) There are three case-markers: -ga ``nominative'', -o ``accusative'' and -ni ``dative''; and eight semantic markers: -ni ``locative/goal'', -e ``locative-goal'', -de ``locative/instrumental'', -to ``comitative'', -kara ``source'', -made ``goal'', -yori ``source/comparative'' and -no ``adnominal''. Note that the mapping from semantic marker to case-role is not straightforward; the preceding glosses do not imply a commitment to a simple set of cases (see Bond & Shirai 1997 for more discussion). Note also that there is a good deal of squish between the case-markers and the semantic markers: -ni in particular is variously classed as only a case-marker, only a semantic marker, or both (Ono 1996).

Arguments marked by the three case markers correspond roughly to subject, object and indirect object in English. Arguments marked by the semantic markers can be complements in clause structure, but are more typically adjuncts. These are functions normally carried out by prepositional phrases in English, and so it is worthwhile to compare Japanese postpositions and English prepositions.

The most striking difference is in number: Japanese has eleven postpositions, while English has over 50 simple prepositions (Quirk et al. 1985: 665-667). Therefore, many distinctions made in English with prepositions must be made in some other way in Japanese.

In the restricted field of spatial expressions, Japanese makes one distinction not generally made in English. Two semantic markers are used for static locations: -ni, which is used to mark position (either static or goal), and -de, which is used to mark the place at which an activity occurs. English, in turn, is well known to distinguish between three locative relations: at the intersection of two lines (0 dimensional): at; on a line or surface (1 or 2 dimensional): on; or enclosed in a bounded space (2 or 3 dimensional): in (Fillmore 1997, Quirk et al. 1985).

Mismatches such as this between the relations encoded in Japanese and English are well known to cause problems for machine translation.

The choice of English preposition depends on how its complement is conceived. This is generally based on its dimensionality but is, in many cases, more a matter of conventional usage: for example, one typically rides on a bus rather than in it, even though a bus forms an enclosing space. Variation is possible, of course, and carries meaning: if I say I was riding in a bus, it brings into focus the fact that I was inside it.

3.2 Spatial nouns

Postadnominals in Japanese, such as spatial nouns, are used to extend the range of meanings that can be conveyed, in a similar way to the nouns that appear in English `complex prepositions' such as in front of. Because there are so few postpositions in Japanese, spatial nouns, and other nouns that pattern with them, are used frequently.

Many such combinations as in front of must, somehow, be stored in the lexicon, because of both possibly idiomatic semantics and idiosyncratic syntax (front takes no article). Japanese spatial nouns can also be thought of as forming a unit with their postpositions and the adnominal marker of the preceding noun phrase, the whole behaving like a single postposition. There is disagreement as to whether such combinations should be stored as a single unit (a complex preposition) or handled by some kind of lexical rule. Again there appears to be a continuum here, from fully separable expressions all the way to fully lexicalised ones (such as because of, originally by cause of).

I will now examine individually the spatial nouns listed in Section 2, considering spatial, temporal and other uses.

3.2.1 Vertical:

  • ue ``up'' is primarily used to mark the spatial relationship of being vertically above something, or greater in amount than something. Japanese makes no distinction between whether the two entities in question are touching or not. When ue is modified by a clause, it typically means ``in addition to''.
  • naka ``middle'' is primarily used to mark the state where one entity is enclosed by another. It is also used to mark one duration as being enclosed within another. Another common use is to mark a range of entities from which one or more is to be chosen.
  • shita ``down'' is primarily used to mark the spatial relationship of being vertically below, or less in amount than, something.

3.2.2 Horizontal:

  • mae ``before'' is primarily used to mark the spatial relationship of being in front of something. It is also commonly used to show something temporally preceding something else.
  • ato/ushiro ``after'' is primarily used to mark the spatial relationship of being behind something. It is also commonly used to show something temporally succeeding something else. The same Chinese character (kanji) is normally pronounced differently for the temporal use ato and the locative use ushiro.

Japanese makes the same distinctions between intrinsic directions and extrinsic relations as English does. Tanaka (1996) has shown in a series of experiments that there is not much difference between Japanese and English speakers in the use of intrinsic and extrinsic spatial expressions. Therefore, although these expressions are problematic for many artificial intelligence applications, no special processing is needed for machine translation, at least between Japanese and English.

4. Machine Translation

The choice of prepositions is recognized as one of the hardest problems for machine translation (Durand 1993).

Prepositions are normally generated from two sources, either from a predicate's subcategorization frame when it is specified (for example, on in depend on), or by transfer from a source language preposition.

The generally accepted method for transfer is to define a set of semantic relations fine enough to disambiguate the prepositions used in the languages of interest. These can then be used to build both lexicons and transfer rules. A good example of this is Trujillo (1995), who does this for spatial prepositions in English and Spanish, also showing that his hierarchy can handle Hungarian, and thus has some claim to being multi-lingual. Nuebel (1996) uses case roles (abstract relations such as temp_loc) to translate from German and Japanese to English in the VERBMOBIL system.

For languages with very different structure, a successful approach has been to rewrite the source language into simpler structures before the transfer stage (Shirai et al. 1993). In this rewriting stage, complex expressions such as complex prepositions are grouped together into a single constituent, which will then be translated as a unit.

The approach I introduce here is an extension of the transfer based approach. First I describe the algorithm to select an English locative preposition (at, on or in), then I show how this can be used, along with the knowledge of the semantic field, to translate spatial nouns.

4.1 An algorithm for selecting English Locative Prepositions

Many natural language processing applications treat the choice of at, in or on as fully lexicalized, that is, as depending on the lexical class of the noun heading the dependent noun phrase (e.g., Trujillo 1995). This is a reasonable first approximation, but is unsatisfactory for two reasons. First, it does not capture well known generalizations: fields and meadows are in, roads and highways are on, and so on. Second, it does not offer a good mechanism to generate variation in preposition use: I can meet you at the church, where the church is merely thought of as a point in space, or in the church, where the church is thought of as an enclosure.

I propose a more flexible method, where the preposition is selected according to the semantic category of the head noun. For nouns with multiple senses, a different sense being chosen may trigger the choice of a different preposition.

The semantic categories used are from Goi-Taikei, a Japanese lexicon, which has a semantic hierarchy of 2,800 nodes, originally developed for Japanese analysis, but also used to mark senses in a Japanese to English transfer dictionary. 300 nodes in the hierarchy are marked, principally under the nodes for shape, residence, topography, place and location [of activity]. Each node is marked with one of the three prepositions IN/ON/AT.

When a word appears in a locative expression where the English preposition is underspecified, the following processing takes place. If the word has only one marked sense, then that sense is used to select the preposition. If there is more than one, then the marking from the deepest node in the hierarchy (the most detailed sense) is chosen. If more than one node has the same depth, then at is chosen if it is a candidate; otherwise in is chosen.
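The selection procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ALT-J/E implementation: the MarkedSense record, its field names and the category labels in the usage note are hypothetical, and the case where tied nodes carry the same marking (which the description above leaves open) is resolved by returning that shared marking.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MarkedSense:
    category: str  # hypothetical label, not an actual Goi-Taikei node name
    depth: int     # depth of the node in the semantic hierarchy
    marking: str   # "AT", "IN" or "ON"


def select_preposition(senses: List[MarkedSense]) -> Optional[str]:
    """Return the preposition type for a word's marked senses, or None if it has none."""
    if not senses:
        return None
    if len(senses) == 1:
        # Only one marked sense: use its marking directly.
        return senses[0].marking
    # Otherwise prefer the deepest (most detailed) node.
    deepest = max(sense.depth for sense in senses)
    candidates = {sense.marking for sense in senses if sense.depth == deepest}
    if len(candidates) == 1:
        # Assumption: tied nodes that agree simply yield their shared marking.
        return next(iter(candidates))
    # Tie between different markings: at if it is a candidate, otherwise in.
    return "AT" if "AT" in candidates else "IN"
```

For instance, a word with a hypothetical sense marked ON at depth 4 and another marked IN at depth 6 would be assigned IN, since the deeper node wins.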

The algorithm can be dynamically modified as follows: if one of the senses is made salient by the context, then its depth is increased. For example, locative adjuncts marked with the semantic marker -de make the location [of activity] category salient. This allows us to distinguish between the following:

(5) watashi-wa ofisu-ni imasu
     I-TOP office-LOC1 be
     I am in the office
(6) watashi-wa ofisu-de hatarakimasu
     I-TOP office-LOC2 work
     I work at an office
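The contrast between (5) and (6) can be reproduced with a small sketch of the salience adjustment. Everything concrete here is assumed for illustration: the depths and category labels given to ofisu, the size of the depth increase (boost), and the SALIENT_BY_MARKER mapping are hypothetical, as the text does not say by how much the depth is raised.

```python
from typing import List, Optional, Tuple

# Hypothetical mapping from a Japanese marker to the category it makes salient.
SALIENT_BY_MARKER = {"de": "location_of_activity"}

# Hypothetical marked senses for ofisu: (category, depth, marking).
OFFICE_SENSES = [
    ("building", 6, "IN"),
    ("location_of_activity", 5, "AT"),
]


def select_with_salience(
    senses: List[Tuple[str, int, str]],
    salient: Optional[str] = None,
    boost: int = 2,  # assumed amount; the text does not specify one
) -> Optional[str]:
    """Pick AT/IN/ON from the deepest sense after boosting the salient category."""
    if not senses:
        return None
    adjusted = [
        (depth + (boost if category == salient else 0), marking)
        for category, depth, marking in senses
    ]
    deepest = max(depth for depth, _ in adjusted)
    tied = {marking for depth, marking in adjusted if depth == deepest}
    if len(tied) == 1:
        return tied.pop()
    # Tie between different markings: at if it is a candidate, otherwise in.
    return "AT" if "AT" in tied else "IN"


# (5) ofisu-ni: nothing is made salient, so the deeper building sense wins.
in_office = select_with_salience(OFFICE_SENSES, SALIENT_BY_MARKER.get("ni"))
# (6) ofisu-de: the location [of activity] sense is boosted past it.
at_office = select_with_salience(OFFICE_SENSES, SALIENT_BY_MARKER.get("de"))
```

Under these assumed depths, in_office comes out as IN and at_office as AT, matching the English translations in (5) and (6).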

4.2 Translating Spatial Nouns

Finally I introduce the algorithm for translating spatial nouns. The hard work is done in choosing which English preposition will be appropriate, and in determining the semantic field. Choosing between at, in or on was described in the previous section; I will now describe how the semantic field is determined.

I consider three types and a remainder: locative, temporal, quantitative, and other. If the internal argument is a clause, then the semantic field will typically be temporal, so that is chosen as a default. If the external argument is a clause, and the case-role of the noun phrase headed by the spatial noun is known, then the case role determines the semantic field, similarly to the use of abstract relations by Nuebel (1996). Finally, if the case role is unknown, then the semantic category of the internal argument is used to determine the semantic field: if the category is a place or location then the field is locative, if it is a time or event then temporal, if it is an amount then quantitative, and otherwise other.
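As a rough illustration, the decision order described in this paragraph might be written as follows. The category and case-role labels, and the two lookup tables, are hypothetical stand-ins: the paper does not list the actual case roles or Goi-Taikei categories involved.

```python
from typing import Optional

# Hypothetical case-role labels mapped to semantic fields.
CASE_ROLE_TO_FIELD = {
    "location": "locative",
    "time": "temporal",
    "quantity": "quantitative",
}

# Hypothetical semantic-category labels for the internal argument.
CATEGORY_TO_FIELD = {
    "place": "locative",
    "location": "locative",
    "time": "temporal",
    "event": "temporal",
    "amount": "quantitative",
}


def semantic_field(
    internal_is_clause: bool,
    internal_category: Optional[str] = None,
    external_is_clause: bool = False,
    case_role: Optional[str] = None,
) -> str:
    """Return locative, temporal, quantitative or other for a spatial noun."""
    # A clausal internal argument defaults to the temporal field.
    if internal_is_clause:
        return "temporal"
    # A known case role decides when the external argument is a clause.
    if external_is_clause and case_role in CASE_ROLE_TO_FIELD:
        return CASE_ROLE_TO_FIELD[case_role]
    # Otherwise fall back on the category of the internal argument.
    return CATEGORY_TO_FIELD.get(internal_category, "other")
```

A noun-phrase dependent with the assumed category place and no known case role would therefore be treated as locative, while an unlisted category falls through to other.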

When both the preposition-type (AT/IN/ON) and the semantic field are known, the spatial nouns can be translated as follows:

If the preposition-type governing the spatial noun is one of AT, IN or ON (in this case within and inside are treated the same as IN, and on top of is treated as ON), then the spatial noun is translated using the form given in the row corresponding to the preposition-type. If another preposition (PP) is expected, such as from or to, then it is combined with the translation of the spatial noun as shown in the table.

Blank elements in the table (shown as --) should not normally occur; if they do, then the form in the locative column is generated as a default.

 
Table 1: Translation by semantic field

                                Semantic Field
Spatial Noun  Preposition Type  Locative           Temporal     Quantitative  Other
ue            ON                on *               --           above         on
              IN, AT            on top of *        --           above         on
              PP                PP the top of *    --           PP above      PP on
naka          IN                IN *               during       --            among **
              ON, AT            inside *           during       --            among **
              PP                PP inside of *     during       --            PP among **
shita         IN, ON, AT        under *            --           below         under
              PP                PP under *         --           PP below      PP under
mae           IN, ON, AT        in front of *      before *     --            --
              PP                PP in front of *   PP before *  --            --
ato/ushiro    IN, ON, AT        behind *           after *      --            --
              PP                PP before *        PP after *   --            --

* the dependent noun phrase is interpreted as definite.
** the dependent noun phrase is interpreted as unbounded.

Note that this involves special processing for ON with ue ``above'' and for IN with naka ``middle'', which is a direct result of their different dimensionalities: it is natural to be above a line or surface, and in the middle of a bounded space.
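The table lookup, including the locative default for blank cells and the combination with an expected preposition, could be sketched as below. This is a minimal sketch under several assumptions: only the rows for ue, shita and mae are transcribed, the * and ** annotations on the dependent noun phrase are left out, and the function name and the {pp} placeholder convention are inventions for the example.

```python
from typing import Optional

FIELDS = ("locative", "temporal", "quantitative", "other")

# Part of Table 1. None marks a blank cell; "{pp}" stands for the expected
# preposition (the PP rows of the table).
IOA = ("IN", "ON", "AT")
TABLE = {
    "ue": {
        "ON": ("on", None, "above", "on"),
        "IN": ("on top of", None, "above", "on"),
        "AT": ("on top of", None, "above", "on"),
        "PP": ("{pp} the top of", None, "{pp} above", "{pp} on"),
    },
    "shita": {
        **{t: ("under", None, "below", "under") for t in IOA},
        "PP": ("{pp} under", None, "{pp} below", "{pp} under"),
    },
    "mae": {
        **{t: ("in front of", "before", None, None) for t in IOA},
        "PP": ("{pp} in front of", "{pp} before", None, None),
    },
}


def translate_spatial_noun(
    noun: str, prep_type: str, field: str, pp: Optional[str] = None
) -> str:
    """Look up the English form for a spatial noun from the partial table."""
    # Use the PP row when another preposition (such as from or to) is expected.
    row = TABLE[noun]["PP" if pp else prep_type]
    form = row[FIELDS.index(field)]
    if form is None:
        # Blank cell: default to the form in the locative column.
        form = row[0]
    return form.format(pp=pp) if pp else form
```

With this partial table, ue under ON in the locative field gives on, while ue under IN or AT gives on top of, which is the special treatment of ON noted above; shita with an expected from gives from under.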

The algorithm described above has been implemented in the Japanese-to-English machine translation system ALT-J/E (Ikehara et al. 1991) and is currently being evaluated. Preliminary results show it to be largely successful, although the method of choosing the semantic field needs to be refined. In addition, other dimensions of choice, such as between above, over and on top of or under and below for spatial relationships still need to be considered.

   
5. Conclusion

In this paper I have introduced an algorithm for selecting prepositions when automatically translating from Japanese to English, based on a contrastive analysis of Japanese postpositions and English prepositions.

Acknowledgments

I would like to thank Christoph Neumann, Kentaro Ogura and Kyonghee Paik for their comments on this paper, and Toshiaki Nebashi for his invaluable help in the implementation. The author is currently also enrolled part time as a doctoral candidate at the University of Queensland's Center for Language Teaching & Research.

References
