Interpretation of Japanese
in Japanese-to-English Machine Translation
Francis Bond, NTT Communication Science
Spatial nouns (kuukan meishi) express a relation, prototypically
spatial, between their dependent and the entity they modify.
Syntactically they take a noun phrase or clause as an adnominal
complement, and the resulting phrase (including postposition)
modifies another noun phrase or clause.
Japanese has few postpositions compared to the number of English
prepositions; instead, almost all spatial relations are expressed
using spatial nouns, similarly to English constructions such as
`on top of' and `in front of'.
In this paper I contrast the use of spatial nouns in Japanese with
the use of English prepositions, and show the importance for
interpretation of considering the meanings of both elements being
related. Based on the contrastive analysis, an algorithm is outlined
for automatically translating from Japanese to English, and some
examples given of its successes and failures.
In this paper I contrast the use of Japanese postpositions with that
of English prepositions, and show why spatial nouns are so common in
Japanese. I then introduce an algorithm for selecting prepositions
when automatically translating from Japanese to English based on the
contrastive analysis.
2. Japanese Spatial Nouns
Japanese spatial nouns are common nouns, with no special syntactic
properties. Semantically, however, spatial nouns take two arguments:
their dependent (the internal argument) and their governor (the
external argument). Either argument can be an event or an object.
The dependent argument can be a noun phrase, normally marked with the
adnominal case-marker no, or a sentence. The resulting
constituent can modify either another noun phrase, in which case it
will itself characteristically be marked with the adnominal
case-marker, or a tensed clause, in which case it will typically be marked
with one of the locative semantic markers de or ni. In this respect
spatial nouns behave like any other common noun.
(1) NP1 SN-no NP2
(2) NP1 SN-mk Cl1
(3) Cl1 SN-mk Cl2
(4) Cl1 SN-no NP2
Note that there are many nouns that also pattern like this;
Martin (1988: 664-740) calls them postadnominals, as they
typically appear with some adnominal (pre)modification. This
distribution is a straightforward consequence of their semantic
structure: they need a dependent to be their internal semantic
argument. Martin (1988) estimates that there may be as
many as 1,000 postadnominals; this paper will only discuss a subset of
them: spatial nouns (kuukan meishi), that is, those nouns which
include, as one of their uses, the predication of spatial relations.
In particular I will focus on spatial nouns that encode relative
position, both vertical and horizontal, because they are widely used
to encode not only spatial relations, but also temporal and other
relations. The spatial nouns in question are listed below:
- ue ``up'', naka ``middle'', shita ``down''
- mae ``before'', ato/ushiro ``after''
There are many other spatial nouns, such as those which encode
surrounding space (mawari ``around'', tonari ``next to'' and
others). These are relatively straightforward to translate, as
they are rarely extended to non-spatial uses, although the choice of
translation is not always obvious (Tanaka 1996).
3. Japanese contrasted with English
In this section I contrast Japanese and English. I first discuss the
differences between Japanese postpositions and English prepositions,
with one main difference being that there are fewer Japanese
postpositions than English prepositions. I then show how spatial
nouns are used to encode relations that are characteristically encoded
by prepositions in English.
Japanese postpositions and English prepositions are both examples of
adpositions: a grammatically distinct closed class of words that
are typically used to represent spatial relations and/or semantic
roles, usually with no inflectional contrast, which characteristically
take NP complements and function as dependents of verbs, nouns and
adjectives (Huddleston 1988: 123-124).
For Japanese, I will only discuss semantic markers and case-markers
(kaku-joshi), setting aside adverbial markers (I refer to the
Japanese postpositions as markers, as I do not treat them as the heads
of the phrases they appear in). There are three case-markers:
-ga ``nominative'', -o ``accusative'' and -ni ``dative''; and
eight semantic markers: -ni ``locative/goal'',
-e ``locative-goal'', -de ``locative/instrumental'',
-to ``comitative'', -kara ``source'', -made ``goal'',
-yori ``source/comparative'' and -no ``adnominal''. Note that
the mapping from semantic marker to case-role is not straightforward;
the preceding glosses do not imply a commitment to a simple set of cases
(see Bond & Shirai (1997) for more discussion). Note also that there
is a good deal of squish between the case and semantic markers:
-ni in particular is variously classed as only a case-marker, only
a semantic marker or both (Ono 1996).
Arguments marked by the three case markers correspond roughly to
subject, object and indirect object in English. Arguments marked by
the semantic markers can be complements in clause structure, but are
more typically adjuncts. These are functions normally carried out by
prepositional phrases in English, and so it is worthwhile to compare
Japanese postpositions and English prepositions.
The most striking difference is in number: Japanese has eleven
postpositions, while English has over 50 simple prepositions
(Quirk et al. 1985: 665-667). Therefore, many distinctions made in
English with prepositions must be made in some other way in
Japanese.
In the restricted field of spatial expressions, Japanese makes one
distinction not generally made in English. Two semantic markers are
used for static locations: -ni which is used to mark position
(either static or goal), and -de which is used to mark the place
at which an activity occurs.
It is well known that English distinguishes between three locative
relations: at the intersection of two lines (0 dimensional): at;
on a line or surface (1 or 2 dimensional): on; or enclosed in a
bounded space (2 or 3 dimensional): in (Fillmore 1997, Quirk et al. 1985).
Mismatches such as this between the relations encoded in Japanese and
English are well known to cause problems for machine translation.
The choice of English preposition depends on how its complement is
conceived. This is generally based on its dimensionality, but is, in
many cases, more a matter of conventional usage: for example, one
typically rides on a bus rather than in it, even
though a bus forms an enclosing space. Variation is possible, of
course, and carries meaning: if I say I was riding in a bus, it
brings into focus the fact that I was inside it.
Postadnominals in Japanese, such as spatial nouns, are used to extend
the range of meanings that can be conveyed, in a similar way to the
nouns that appear in English `complex prepositions' such as in
front of. Because there are so few postpositions in Japanese,
spatial nouns, and other nouns that pattern with them, are used
Many such combinations as in front of must, somehow, be stored
in the lexicon, both because of possibly idiomatic semantics and
idiosyncratic syntax (front takes no article). Japanese spatial
nouns can also be thought of as forming a unit with their
postpositions and the adnominal marker of the preceding noun phrase,
the whole behaving like a single postposition. There is disagreement
as to whether such combinations should be stored as a single unit (a
complex preposition) or as some kind of lexical rule. Again there
appears to be a continuum here, from fully separable expressions, all
the way to fully lexicalised ones (such as because of,
originally by cause of).
I will now examine individually the spatial nouns listed in
Section 2, considering spatial, temporal and other uses.
- ue ``up'' is primarily used to mark the spatial relationship
of being vertically above something, or greater in amount than
something. Japanese makes no distinction between whether the two
entities in question are touching or not. When ue is modified
by a clause, it typically means ``in addition to''.
- naka ``middle'' is primarily used to mark the state where
one entity is enclosed by another. It is also used to mark one
duration as being enclosed within another. Another common use is to
mark a range of entities from which one or more is to be chosen.
- shita ``down'' is primarily used to mark the spatial
relationship of being vertically below, or less in amount than,
something.
- mae ``before'' is primarily used to mark the spatial
relationship of being in front of something. It is also commonly
used to show something temporally preceding something else.
- ato/ushiro ``after'' is primarily used to mark the spatial
relationship of being behind something. It is also commonly
used to show something temporally succeeding something else. The
same Chinese character (kanji) is normally pronounced differently
for the temporal use ato and the locative use ushiro.
Japanese makes the same distinctions between intrinsic directions and
extrinsic relations as English does. Tanaka (1996) has shown,
in a series of experiments, that there is not much difference between
Japanese and English speakers in the use of intrinsic and extrinsic
spatial expressions. Therefore, although they are problematic for
many artificial intelligence applications, no special processing is
needed for machine translation, at least between Japanese and English.
The choice of prepositions is recognized as one of the hardest
problems for machine translation (Durand 1993).
Prepositions are normally generated from two sources, either from a
predicate's subcategorization frame when it is specified (for example,
on in depend on), or by transfer from a source language
The generally accepted method for transfer is to define a set of
semantic relations fine enough to disambiguate the prepositions used
in the languages of interest. These can then be used to build both
lexicons and transfer rules. A good example of this is
Trujillo (1995), who does this for spatial prepositions in
English and Spanish, also showing that his hierarchy can handle
Hungarian, and thus has some claim to being multilingual.
Nuebel (1996) uses case roles (abstract relations such as
temp_loc) to translate from German and Japanese to English in
the VERBMOBIL system.
For languages with very different structure, a successful approach has
been to rewrite the source language into simpler structures before the
transfer stage (Shirai et al. 1993). In this rewriting stage, complex
expressions such as complex prepositions are grouped together into a
single constituent, which will then be translated as a unit.
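The grouping step can be pictured with a small Python sketch. It is only an illustration of the general idea, not the rewriting procedure of Shirai et al. (1993): the list of multi-token units, the romanized tokens and the function name `group_units` are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

# Hypothetical multi-token expressions that should be handled as one unit
# (adnominal marker + spatial noun + semantic marker).
COMPLEX_UNITS: Tuple[Tuple[str, ...], ...] = (
    ("no", "mae", "ni"),
    ("no", "ue", "ni"),
)


def group_units(tokens: Sequence[str]) -> List[str]:
    """Merge every listed multi-token expression into a single token."""
    grouped: List[str] = []
    i = 0
    while i < len(tokens):
        # Try longer units first so that the longest match wins.
        for unit in sorted(COMPLEX_UNITS, key=len, reverse=True):
            if tuple(tokens[i:i + len(unit)]) == unit:
                grouped.append("_".join(unit))
                i += len(unit)
                break
        else:
            # No unit starts here: keep the token unchanged.
            grouped.append(tokens[i])
            i += 1
    return grouped
```

With these assumed units, `group_units(["eki", "no", "mae", "ni", "iru"])` yields a three-token sequence whose middle element stands for the whole complex expression, which a later transfer stage could then translate as a unit.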
The approach I introduce here is an extension of the transfer based
approach. First I describe the algorithm to select an English
locative preposition (at, on or in), then I show how
this can be used, along with the knowledge of the semantic field, to
translate spatial nouns.
Many natural language processing applications treat the choice of
at, in or on as fully lexicalized; that is, it will
depend on the lexical class of the noun heading the dependent noun
phrase (e.g., Trujillo 1995). This is a reasonable first
approximation, but is unsatisfactory for two reasons. First, it does
not capture well known generalizations: fields and meadows are
in, roads and highways are on, and so on. Second, it does
not offer a good mechanism to generate variation in preposition use: I
can meet you at the church, where the church is merely thought of
as a point in space, or in the church, where the church is
thought of as an enclosure.
I propose a more flexible method, where the preposition is selected
according to the semantic category of the head noun. For nouns with
multiple senses, a different sense being chosen may trigger the choice
of a different preposition.
The semantic categories used are from Goi-Taikei, a Japanese lexicon,
which has a semantic hierarchy of 2,800 nodes, originally developed
for Japanese analysis, but also used to mark senses in a Japanese to
English transfer dictionary. 300 nodes in the hierarchy are marked,
principally under the nodes for shape, residence,
topography, place and location [of activity]. Each
node is marked with one of the three prepositions IN/ON/AT.
When a word appears in a locative expression where the English
preposition is underspecified, the following processing takes place.
If the word has only one marked sense, then it is used to select the
preposition. If there is more than one, then the marking from the
deepest node in the hierarchy (the most detailed sense) is chosen. If
there is more than one node of equal depth, then at is chosen if
it is a candidate, otherwise in.
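The selection procedure just described can be sketched in Python as follows. This is a minimal illustration, not the ALT-J/E implementation: the `MarkedSense` record, its field names and the depth values used in any example are hypothetical. Two points are assumptions that go beyond the text: a word with no marked sense returns `None`, and equally deep nodes that all carry the same mark simply yield that mark.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class MarkedSense:
    """One sense of a word whose semantic category carries a preposition mark."""
    category: str     # hypothetical category label
    depth: int        # depth of the category node in the semantic hierarchy
    preposition: str  # "in", "on" or "at"


def select_preposition(senses: Sequence[MarkedSense]) -> Optional[str]:
    """Pick a locative preposition from the marked senses of a head noun."""
    if not senses:
        return None  # assumption: the text does not cover unmarked words
    if len(senses) == 1:
        return senses[0].preposition
    # Several marked senses: keep only those on the deepest node(s).
    deepest = max(s.depth for s in senses)
    candidates = {s.preposition for s in senses if s.depth == deepest}
    if len(candidates) == 1:
        return next(iter(candidates))  # assumption: agreeing marks are kept
    # Conflicting marks at equal depth: prefer "at", otherwise use "in".
    return "at" if "at" in candidates else "in"
```

Keeping the depth on each sense, rather than resolving it once per word, leaves room for context to change which sense counts as the most detailed.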
The algorithm can be dynamically modified as follows: if one of the
senses is made salient by the context, then its depth is increased. For
example, locative adjuncts marked with the semantic marker -de make
the location [of activity] category salient.
This allows us to distinguish between the following:
(5) watashi-wa ofisu-ni imasu
I-TOP office-LOC1 be
I am in the office
(6) watashi-wa ofisu-de hatarakimasu
I-TOP office-LOC2 work
I work at an office
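The contrast between (5) and (6) can be reproduced with a short Python sketch of the salience adjustment. All concrete values are assumptions chosen for illustration: the two senses listed for ofisu, their depths, and the size of the depth increase (`BOOST`) are hypothetical, since the text only says that the depth of a salient sense is increased.

```python
from typing import Dict, Tuple

# Hypothetical marked senses of "ofisu": category -> (depth, preposition).
OFISU: Dict[str, Tuple[int, str]] = {
    "place": (5, "in"),
    "location [of activity]": (4, "at"),
}

# Which category each marker makes salient; only -de is given in the text.
SALIENT_CATEGORY: Dict[str, str] = {"de": "location [of activity]"}

BOOST = 2  # assumed size of the depth increase for a salient sense


def preposition_for(senses: Dict[str, Tuple[int, str]], marker: str) -> str:
    """Choose in/on/at, raising the depth of the sense made salient by marker."""
    salient = SALIENT_CATEGORY.get(marker)
    scored = [
        (depth + (BOOST if category == salient else 0), prep)
        for category, (depth, prep) in senses.items()
    ]
    top = max(depth for depth, _ in scored)
    candidates = {prep for depth, prep in scored if depth == top}
    if len(candidates) == 1:
        return candidates.pop()
    # Conflicting marks at equal depth: prefer "at", otherwise use "in".
    return "at" if "at" in candidates else "in"
```

Under these assumed values, `preposition_for(OFISU, "ni")` selects the deeper place sense, as in (5), while `preposition_for(OFISU, "de")` lets the location [of activity] sense win, as in (6).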
Finally I introduce the algorithm for translating spatial nouns. The
hard work is done in choosing which English preposition will be
appropriate, and determining the semantic field. Choosing between
at, in or on was described in the previous section;
I will now describe how the semantic field is determined.
I consider three types and a remainder: locative,
temporal, quantitative, and other. If the
internal argument is a clause, then the semantic field will typically
be temporal, so that is chosen as a default. If the external argument is
a clause, and the case-role of the noun phrase headed by the spatial
noun is known, then the case role determines the semantic field,
similarly to the use of abstract relations by Nuebel (1996). Finally, if the case role is
unknown, then the semantic category of the internal argument is used
to determine the semantic field: if the category is a place
or location then the field is locative; if time or
event then temporal; if amount
then quantitative; otherwise other.
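The decision sequence for the semantic field can be written as a short Python function. This is a sketch under stated assumptions: the function and parameter names are invented here, and the mapping from case roles to fields (`ROLE_TO_FIELD`) is hypothetical apart from temp_loc, which the text mentions as an example of an abstract relation. A case role that is not in the mapping is treated as unknown.

```python
from typing import Dict, Optional

# Hypothetical mapping from case roles to semantic fields.
ROLE_TO_FIELD: Dict[str, str] = {
    "temp_loc": "temporal",
    "spatial_loc": "locative",  # assumed role name, for illustration only
}

# Semantic category of the internal argument -> semantic field.
CATEGORY_TO_FIELD: Dict[str, str] = {
    "place": "locative",
    "location": "locative",
    "time": "temporal",
    "event": "temporal",
    "amount": "quantitative",
}


def semantic_field(
    internal_is_clause: bool,
    external_is_clause: bool,
    case_role: Optional[str],
    internal_category: Optional[str],
) -> str:
    """Return locative, temporal, quantitative or other for a spatial noun."""
    # A clausal internal argument defaults to a temporal reading.
    if internal_is_clause:
        return "temporal"
    # A known case role on a phrase modifying a clause decides the field.
    if external_is_clause and case_role in ROLE_TO_FIELD:
        return ROLE_TO_FIELD[case_role]
    # Otherwise fall back on the category of the internal argument.
    return CATEGORY_TO_FIELD.get(internal_category, "other")
```

The order of the tests mirrors the order in which the conditions are stated, so the category of the internal argument is only consulted when neither of the earlier cues applies.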
When both the preposition-type (AT/IN/ON) and the semantic field are
known, the spatial nouns can be translated as follows:
If the preposition-type governing the spatial noun is one of AT, IN or
ON (in this case within and inside are treated the same as
IN, and on top of is treated as ON), then it is translated using
the form given in the line corresponding to the preposition type.
If there is another preposition expected (PP) (such as from or
to) then it is combined with the translation of the spatial noun as
shown in the table.
Blank elements in the table should normally not occur; if they do,
then the same form as the locative column is generated as a default.
Table: Translation by semantic field
  on top of *
  PP the top of *
  PP inside of *
  PP among **
  IN, ON, AT
  PP under *
  IN, ON, AT
  in front of *
  PP in front of *
  PP before *
  IN, ON, AT
  PP before *
  PP after *
* the dependent noun phrase is interpreted as definite.
** the dependent noun phrase is interpreted as unbounded.
Note that this involves special processing for ON with ue ``above''
and for IN with naka ``middle'', which is a direct result of their
different dimensionalities: it is natural to be above a line or
surface, and in the middle of a bounded space.
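The table lookup, the combination with another expected preposition, and the locative default for blank cells can be sketched in Python as below. The table fragment is illustrative only: its cells are assumptions and do not reproduce the full table, and the sketch leaves out the separate rows for the AT/IN/ON preposition types. The function name `translate_spatial_noun` is likewise hypothetical.

```python
from typing import Dict, Optional

# Illustrative fragment: spatial noun -> semantic field -> English form.
# A missing field plays the role of a blank cell in the table.
TABLE: Dict[str, Dict[str, str]] = {
    "mae": {"locative": "in front of", "temporal": "before"},
    "shita": {"locative": "under"},
}


def translate_spatial_noun(
    noun: str, field: str, other_prep: Optional[str] = None
) -> str:
    """Look up the English form, defaulting to the locative column."""
    row = TABLE[noun]
    # Blank cell: generate the same form as the locative column.
    form = row.get(field) or row["locative"]
    # Another expected preposition (PP), such as "from", precedes the form.
    return f"{other_prep} {form}" if other_prep else form
```

For example, with this fragment `translate_spatial_noun("mae", "temporal")` gives the temporal form, while `translate_spatial_noun("shita", "quantitative")` falls back on the locative cell because no quantitative form is listed.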
The algorithm described above has been implemented in the
Japanese-to-English machine translation system ALT-J/E
(Ikehara et al. 1991) and is
currently being evaluated. Preliminary
results show it to be largely successful, although the method of
choosing the semantic field needs to be refined. In addition, other
dimensions of choice, such as between above, over and
on top of or under and below for spatial
relationships still need to be considered.
In this paper I introduced an algorithm for selecting prepositions when
automatically translating from Japanese to English based on a
contrastive analysis of Japanese postpositions and English
prepositions.
I would like to thank Christoph Neumann, Kentaro Ogura and Kyonghee
Paik for their comments on this paper, and Toshiaki Nebashi for his
invaluable help in the implementation. The author is currently also
enrolled part time as a doctoral candidate at the University of
Queensland's Center for Language Teaching & Research.
- Bond, Francis, & Satoshi Shirai.
Practical and efficient organization of a large valency dictionary.
In Workshop on Multilingual Information Processing -- Natural
Language Processing Pacific Rim Symposium '97: NLPRS-97, Phuket.
- Durand, Jacques.
On the translation of prepositions in multilingual MT.
In Linguistic Issues in Machine Translation, ed. by Frank Van
Eynde, 138-159. Pinter Publishers.
- Fillmore, Charles J.
Lectures on Deixis.
Number 65 in CSLI lecture notes. CSLI.
- Huddleston.
English grammar: an outline.
Cambridge: Cambridge University Press.
- Ikehara, Satoru, Masahiro Miyazaki, Satoshi
Shirai, Akio Yokoo, Hiromi Nakaiwa, Kentaro Ogura, Yoshifumi Ooyama
& Yoshihiko Hayashi.
Goi-Taikei --- A Japanese Lexicon.
Tokyo: Iwanami Shoten. 5 volumes.
- Ikehara, Satoru, Satoshi Shirai, Akio Yokoo, &
Toward an MT system without pre-editing - effects of new methods
In Third Machine Translation Summit: MT Summit III, 101-106,
- Martin, Samuel E.
A Reference Grammar of Japanese.
- Nuebel.
Knowledge sources for the disambiguation of prepositions in machine
- Ono.
Syntactic behaviour of case and adverbial particles in Japanese.
Australian Journal of Linguistics 16.81-129.
- Quirk, Randolph, Sidney Greenbaum, Geoffrey Leech, &
A Comprehensive Grammar of the English Language.
- Shirai, Satoshi, Satoru Ikehara, & Tsukasa Kawaoka.
Effects of automatic rewriting of source language within a Japanese
to English MT system.
In Fifth International Conference on Theoretical and
Methodological Issues in Machine Translation: TMI-93, 226-239.
- Tanaka.
Comparing Japanese and English in the area of surrounding
In REPORTS of the Keio Institute of Cultural and Linguistic
Studies, number 28, 99-127. Japan: Keio University.
- Trujillo, Arturo.
Towards a Cross-Linguistically Valid Classification
of Spatial Prepositions.
Machine Translation 10.93-141.