QALAM: A Convention For Morphological Arabic-Latin-Arabic Transliteration

==============================================================================
Qalam: A Convention for Morphological
Arabic-Latin-Arabic Transliteration

Abdelsalam Heddaya (heddaya@cs.bu.edu)

with contributions from

Walid Hamdy (hamdy@lids.mit.edu)
M. Hashem Sherif (mhs@homxa.att.com)

Created: 1985.11
Modified: 1986-1989 often
Modified: 1990.01
Modified: 1990.12.21
Modified: 1990.12.31; accepted LAiLA upper case convention,
added punctuation, ,
and
Modified: 1991.01.23; added a couple of sentences.
Modified: 1991.01.31; decided for
Modified: 1991.08.22; cleaned up acknowledgements
Modified: 1992.01.13; changed back to

DRAFT—DRAFT—DRAFT

0. Introduction
—————

Qalam is an Arabic-Latin-Arabic transliteration system that maps
between the Arabic script and the Latin script, as embodied in the
ASCII (American Standard Code for Information Interchange) character
set. The goal of
the Qalam system is to transliterate Arabic script for computer
communication by those literate in the language. The main
consideration in the design of Qalam is suitability for
transliteration, as well as reverse transliteration, both manually by
humans and automatically by computers. Qalam also includes several
Arabic script letters used to transliterate other languages *into*
Arabic script. Finally, Qalam aims to serve all Arabic script
languages, such as Farsi, Urdu, and Ottoman.

Qalam is a morphological system in the sense that Arabic script
words are transliterated based on spelling and diacritics (the marks
that represent vowels in Arabic), rather than on phonetics. This
makes it easy to deduce the Arabic script word from its
transliteration (i.e., to transliterate the word back into Arabic
script). The pronunciation of words, however, can still be deduced
from the transliteration, because the (optional) inclusion of
diacritic marks makes the transliterated word more pronounceable.

We describe Qalam's mapping between Arabic letters and diacritics
and ASCII characters. Each Arabic letter or diacritic maps into (and
back from) one or two ASCII characters. The choice is made in order
to approximate the Arabic pronunciation as closely as possible, while
maintaining the one-to-one morphological correspondence needed for
unambiguous reverse transliteration into Arabic script.

Arabic script letters that do not correspond to Latin sounds are
represented with upper-case letters or with two-character sequences.
Thus, Qalam uses upper-case ASCII characters to denote Arabic letters
that are different from those denoted by the corresponding lower-case
characters. This convention deviates from the common practice of
inserting a dot beneath the letter or a dash above it.

We give below the list of transliterations for Arabic letters and
diacritics, followed by an example and a description of the rules of
transliteration.

1. Character Mappings:
———————-
1.1. Letters:
————-

hamza            '
'alef            aa       zayn    z        qaaf    q
baa'             b        syn     s        kaaf    k
taa'             t        shyn    sh       laam    l
thaa'            th       Saad    S        mym     m
jym              j        Daad    D        nuwn    n
Haa'             H        Taa'    T        haa'    h
khaa'            kh       Zaa'    Z        waaw    w
daal             d        `ayn    `        yaa'    y
dhaal            dh       ghayn   gh
raa'             r        faa'    f

taa' marbuwTah   t or h
haa' marbuwTah   h

'alef maqSuwrah  ae
hamzat alwaSl    e
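
As an illustration only, the letter table above can be held in a lookup
table and used to split a transliterated word back into letter codes.
The sketch below is not part of Qalam itself: the names QALAM_LETTERS
and tokenize are hypothetical, the longest-match-first strategy is an
assumption, and the table omits the diacritics and the marbuwTah forms
(whose codes "t" and "h" coincide with other letters).

```python
# Qalam letter codes from section 1.1, mapped to the letter names used there.
# Hypothetical helper table; diacritics and marbuwTah forms are left out.
QALAM_LETTERS = {
    "'": "hamza", "aa": "'alef", "b": "baa'", "t": "taa'", "th": "thaa'",
    "j": "jym", "H": "Haa'", "kh": "khaa'", "d": "daal", "dh": "dhaal",
    "r": "raa'", "z": "zayn", "s": "syn", "sh": "shyn", "S": "Saad",
    "D": "Daad", "T": "Taa'", "Z": "Zaa'", "`": "`ayn", "gh": "ghayn",
    "f": "faa'", "q": "qaaf", "k": "kaaf", "l": "laam", "m": "mym",
    "n": "nuwn", "h": "haa'", "w": "waaw", "y": "yaa'",
    "ae": "'alef maqSuwrah", "e": "hamzat alwaSl",
}


def tokenize(word):
    """Split a Qalam word into letter codes, trying two-character codes first.

    Assumption: a two-character code always wins over two one-character
    codes. Raises ValueError on anything outside the letter table.
    """
    tokens = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if len(pair) == 2 and pair in QALAM_LETTERS:
            tokens.append(pair)
            i += 2
        elif word[i] in QALAM_LETTERS:
            tokens.append(word[i])
            i += 1
        else:
            raise ValueError("not a Qalam letter code: %r" % word[i])
    return tokens
```

For example, tokenize("shyn") yields the codes sh, y, n, and each code
can then be looked up in QALAM_LETTERS. Note that case matters: "S"
(Saad) and "s" (syn) are distinct keys.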

1.2. Transliteration Letters:
—————————–

These are characters used in the Arabic script to represent or
transliterate letters from other languages such as
English, French, German, etc.

Egyptian sound g (= Arabic script with bar
or dots, pronounced
or )
English “v” sound v (= Arabic script with
three dots)
English “p” sound p (= Arabic script with
three dots)

1.3. Diacritics :
————————–

fatHah    a
kasrah    i
Dammah    u
shaddah   double previous letter
maddah    ~aa
sukuwn    -
tanwyn    N
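
A minimal sketch of how the diacritic codes might be applied when
encoding, assuming each letter arrives as a Qalam letter code together
with the names of its diacritics. The function names and the input
format are hypothetical. Two further assumptions: a shaddah on a
two-character code doubles the whole code, and the maddah is left out
because the table gives it only as the combined form "~aa".

```python
# Diacritic codes from section 1.3 (shaddah and maddah handled separately).
DIACRITIC_CODES = {
    "fatHah": "a",
    "kasrah": "i",
    "Dammah": "u",
    "sukuwn": "-",
    "tanwyn": "N",
}


def encode_letter(letter_code, diacritics=()):
    """Return the Qalam text for one letter plus its diacritics.

    shaddah doubles the letter code; every other diacritic is appended
    after the letter in the order given.
    """
    out = letter_code * 2 if "shaddah" in diacritics else letter_code
    for name in diacritics:
        if name != "shaddah":
            out += DIACRITIC_CODES[name]
    return out


def encode_word(letters):
    """Encode a sequence of (letter_code, diacritic_names) pairs."""
    return "".join(encode_letter(code, marks) for code, marks in letters)
```

With this sketch, encode_word([("r", ("fatHah",)), ("b", ("shaddah",
"kasrah"))]) produces "rabbi", the spelling used in the example of
section 2.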

1.4. Punctuation:
—————-

question mark ?
double quotes <>
single quotes
,

2. Examples:
———–

The Qalam transliteration of the first in the ,
called goes as follows:

bismi ellaahi elraHmaani elraHym

'alHamdu lillaahi rabbi el`aalamyn *
alraHmaani elraHym *
maaliki yawmi eldyn *
'iyaaka na`budu wa'iyaaka nasta`yn *
'ihdinaa elSiraaTa elmustaqym *
SiraaTa alladhyna 'an`amta `alayhim *
ghayri elmaghDuwbi `alayhim *
walaa alDaalyn *

3. Qalam Rules and Conventions:
——————————-

Transliterate a word by following its Arabic script spelling letter by
letter, as well as any available diacritics (i.e., or
), and substituting the specified Latin script. The only
frequent exception is the in the definite article (i.e.,
), which is better to write as if it is a ,
or (, or ) as the case may be.

Diacritics are optional unless they are necessary to disambiguate
the original Arabic script spelling. For example, the verb
may be written , because the ambiguity does not affect the
original Arabic script spelling. On the other hand,

may stand
for a as in the word or for a followed by a
as in , in which case the between the and
the is necessary.

The with a transliterates to if the
is above, and to if it is below. That is, it is treated as if it
is simply a with a or .

The definite article (equivalent to “the” in English) should
not be separated from the rest of the word by a hyphen; e.g.
, meaning “the sun.” Write the even if it is
silent–. This is a case where literal transliteration is
given precedence over phonetic transliteration to make reverse
transliteration easy.

Observe word boundaries in the original Arabic, e.g.
is wrong, but is right.

Arabic has no capitalization, and hence Arabic script
transliterated by Qalam uses capitals to stand for letters that are
different from those denoted by the corresponding lower case character.

As a convention, we quote transliterated Arabic script text
embedded in another script with Arabic script quotation marks and vice
versa.

4. Technical Discussion:
————————

We would like to argue that Qalam is a superior code for communicating
Arabic script text over data networks between heterogeneous computers.
Qalam possesses the characteristics required of a good communication
code: unambiguity, compactness, and simplicity of coding/decoding.

(((Compatibility, Human readability, Code efficiency. Existing
codes.)))

Qalam’s goals include supporting automatic transliteration by
computers, as well as manual transliteration for typing in Arabic
script using Latin script available on ASCII terminals. This permits
computers that support the Arabic script directly to hide the
transliterated text from the user. Thus, a personal computer user,
for example, should be able to type a message in Arabic script, and
have the machine transliterate it for submission to
soc.culture.lebanon. Conversely, when this user receives an Arabic
script message from soc.culture.lebanon, the computer would transliterate it
back into Arabic script for display. The above scenario should hold
equally true for text that mixes Latin and Arabic scripts.
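
The round trip described above might look roughly as follows on a
present-day machine. This is only a sketch under stated assumptions:
it uses Unicode code points for the Arabic script (an encoding this
document does not mention), it covers a small hand-picked subset of
the tables in section 1, and the names QALAM_TO_ARABIC, to_arabic and
to_qalam are hypothetical. Shaddah handling and mixed Latin/Arabic
text are not attempted.

```python
# Small illustrative subset: Qalam code -> Unicode Arabic character.
# Unicode is an assumption of this sketch, not part of Qalam.
QALAM_TO_ARABIC = {
    "k": "\u0643",   # kaaf
    "t": "\u062A",   # taa'
    "aa": "\u0627",  # 'alef
    "b": "\u0628",   # baa'
    "s": "\u0633",   # syn
    "sh": "\u0634",  # shyn
    "m": "\u0645",   # mym
    "a": "\u064E",   # fatHah
    "u": "\u064F",   # Dammah
    "i": "\u0650",   # kasrah
}
ARABIC_TO_QALAM = {arabic: code for code, arabic in QALAM_TO_ARABIC.items()}


def to_arabic(text):
    """Qalam -> Arabic script, trying two-character codes first."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in QALAM_TO_ARABIC:
            out.append(QALAM_TO_ARABIC[pair])
            i += 2
        elif text[i] in QALAM_TO_ARABIC:
            out.append(QALAM_TO_ARABIC[text[i]])
            i += 1
        elif text[i].isspace():
            out.append(text[i])  # keep word boundaries as they are
            i += 1
        else:
            raise ValueError("code outside the subset: %r" % text[i])
    return "".join(out)


def to_qalam(text):
    """Arabic script -> Qalam, one character at a time."""
    out = []
    for ch in text:
        if ch.isspace():
            out.append(ch)
        elif ch in ARABIC_TO_QALAM:
            out.append(ARABIC_TO_QALAM[ch])
        else:
            raise ValueError("character outside the subset: %r" % ch)
    return "".join(out)
```

Within this subset, to_qalam(to_arabic("kitaab")) returns "kitaab",
which is the property the scenario relies on: the user sees only
Arabic script, and only the ASCII form travels over the network.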

5. Bugs:
——–

The , should be distinct from the and
both must differ from the . Qalam doesn’t provide for
transliterating the written as a vertical bar shaped
diacritic, as in archaic spellings of the . The only way to
distinguish digraphs such as from the identically
transliterated followed by , is to force the inclusion of
a diacritic vowel between the two letters. Qalam needs a method to do
so without including the vowel, since it’s not always available in the
original Arabic script text.
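
The digraph problem can be made concrete by listing every way a
transliterated string splits into codes. The sketch below is only an
illustration: the function name segmentations is hypothetical, and the
pair "s" followed by "h" versus "sh" is chosen from the table in
section 1.1 as one instance of the ambiguity.

```python
def segmentations(word, codes):
    """Return every split of `word` into one- or two-character codes."""
    if not word:
        return [[]]
    results = []
    for size in (1, 2):
        head = word[:size]
        if len(head) == size and head in codes:
            for rest in segmentations(word[size:], codes):
                results.append([head] + rest)
    return results


# Illustrative subset of codes: syn, haa', shyn, and the fatHah diacritic.
CODES = {"s", "h", "sh", "a"}

ambiguous = segmentations("sh", CODES)   # two readings
resolved = segmentations("sah", CODES)   # a vowel in between leaves one
```

Here "sh" has two readings (the two letters s and h, or the single
letter sh), while "sah" has only one. This is why a vowel between the
two letters removes the ambiguity, and why a separator that does not
require a vowel would be preferable.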

6. Acknowledgements:
——————–

Nayel el-Shafei provided the initial impetus for this work by
researching the various transliteration systems in use in the US, and
publishing the results on egypt-net in July 1985. C.I. Browne
(cib%a@lanl.gov) provided, in August 1988, useful comments about the
placement of “.” (no longer in use by Qalam) and pointed out that
was missing in an earlier draft of Qalam. Ali Mili, of the
University of Tunis, commented on an early version of Qalam.

Stavros Macrakis pointed out the absence of a convention for and the old form of that appears as a vertical bar
diacritic (e.g., in the ). The first problem has been
corrected, but the second remains. In winter 1990/91, a debate
surfaced on USENET about transliterating Arabic text. One particular
proposal, called LAiLA, convinced us to use upper-case Latin letters
instead of special characters.


==============================================================================
