Technical Report LCL9405

Authors: Uzzi Ornan


In recent years, various standards agencies have made many efforts to establish a common script for preparing bibliographies, catalogues, and historical or geographical texts in non-European languages. Such a script is not intended to supplant the national systems, but rather to provide a modern means of international communication for both humans and machines. It usually consists of Roman characters with some additional, mainly diacritical, signs. It is essential for languages whose alphabetic script omits vital components of a word, such as regular Hebrew or Arabic. Regular texts in these languages have no signs for many vowels, and a double consonant is written as a single letter. In addition, many "particles", such as the article and certain prepositions and connectives, are written with no space before the following word. Morphological analysis, and hence syntactic and semantic understanding, thus becomes a very hard task, and any attempt at, for example, automatic translation is impractical until a solution is found. A good method for Romanization of such a script may also serve as an interim phase for these purposes, so we have more than one motive for establishing such a method.
The two basic methods for achieving such a script are transliteration and transcription; sometimes a mixture of both is used. In the present paper we examine the concepts on which these methods are based and suggest a third approach to converting languages written in their traditional form into other representations. Our suggestion is based on the concept of phonemic symbols as a representation of the theoretical structure of words, while the phonetic values of each sign (or cluster of signs) are handled by "reading rules" (or realization rules) defined for each language separately.
We shall specify the advantages and disadvantages of the first two methods and show how our approach preserves the advantages of each accepted method while avoiding their disadvantages. We illustrate it mainly with the Hebrew language.
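The split between phonemic symbols and per-language reading rules can be illustrated with a small sketch. The rule set below is not taken from this report: it is a deliberately simplified assumption covering only two well-known facts of Hebrew pronunciation (b, k, p are read as v, x, f after a vowel, and a doubled consonant is read as a single, unspirantized one). The function name `realize` and the sample spellings are hypothetical.

```python
# Toy "reading rules" for a phonemic Roman spelling of Hebrew.
# ASSUMPTION: this rule set is illustrative only, not the report's rule set.

VOWELS = set("aeiou")
SPIRANT = {"b": "v", "k": "x", "p": "f"}  # realization after a vowel


def realize(phonemic: str) -> str:
    """Map a phonemic spelling to an approximate pronunciation."""
    out = []
    i = 0
    while i < len(phonemic):
        ch = phonemic[i]
        nxt = phonemic[i + 1] if i + 1 < len(phonemic) else ""
        if ch not in VOWELS and nxt == ch:
            # Doubled consonant: read once, never spirantized.
            out.append(ch)
            i += 2
            continue
        if ch in SPIRANT and i > 0 and phonemic[i - 1] in VOWELS:
            # Single b/k/p after a vowel is read as a fricative.
            out.append(SPIRANT[ch])
        else:
            out.append(ch)
        i += 1
    return "".join(out)


if __name__ == "__main__":
    for word in ("katab", "dibber", "sapar"):
        print(word, "->", realize(word))
```

In this sketch the phonemic spelling keeps one symbol per phoneme, so the word structure stays visible to a morphological analyzer, while pronunciation is derived separately by rules that another language could replace with its own.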

