
Arabic diacritics

Arabic script has numerous diacritics, which include consonant pointing known as iʻjām (إِعْجَام), and supplementary diacritics known as tashkīl (تَشْكِيل). The latter include the vowel marks termed ḥarakāt (حَرَكَات; singular: حَرَكَة, ḥarakah).

Early written Arabic used only rasm (in black). Later, Arabic added i‘jām diacritics (examples in red) so that letters such as these five, ـيـ ,ـنـ ,ـثـ ,ـتـ ,ـبـ (b, t, th, n, y), could be distinguished. Ḥarakāt diacritics (examples in blue), which are used in the Qur'an but not in most written Arabic, indicate short vowels, long consonants, and some other vocalizations.

The Arabic script is a modified abjad, where short consonants and long vowels are represented by letters but short vowels and consonant length are not generally indicated in writing. Tashkīl is optional and supplies the missing vowels and consonant length. Modern Arabic is always written with the i‘jām (consonant pointing), but only religious texts, children's books and works for learners are written with the full tashkīl (vowel guides and consonant length). It is, however, not uncommon for authors to add diacritics to a word or letter when the grammatical case or the meaning would otherwise be ambiguous. In addition, classical works and historic documents presented to the general public are often rendered with the full tashkīl, to compensate for the gap in understanding resulting from stylistic changes over the centuries.
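Because tashkīl is optional while i‘jām is not, software that compares vocalised and unvocalised text often strips the former and keeps the latter. The following is a minimal Python sketch of that idea. The function name `strip_tashkil` and the chosen code points (U+064B to U+0652 plus the dagger alif U+0670) are assumptions made for illustration, not an exhaustive inventory of Arabic marks.

```python
import re

# Assumed set of tashkīl marks: U+064B..U+0652 (tanwīn, short vowels,
# shaddah, sukūn) plus U+0670 (dagger alif). Other marks are left alone.
TASHKIL = re.compile("[\u064B-\u0652\u0670]")

def strip_tashkil(text: str) -> str:
    """Remove optional tashkīl marks; letters and their i'jām dots stay."""
    return TASHKIL.sub("", text)

# mudarrisah ('teacher', female) and madrasah ('school'), fully vocalised
mudarrisah = "\u0645\u064F\u062F\u064E\u0631\u0651\u0650\u0633\u064E\u0629"
madrasah = "\u0645\u064E\u062F\u0652\u0631\u064E\u0633\u064E\u0629"

print(strip_tashkil(mudarrisah))
print(strip_tashkil(mudarrisah) == strip_tashkil(madrasah))
```

Both words reduce to the same five letters once the marks are removed, which is the kind of ambiguity that authors resolve by adding diacritics selectively.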

Tashkil (marks used as phonetic guides)

The literal meaning of تَشْكِيل tashkīl is 'forming'. As normal Arabic text does not provide enough information about the correct pronunciation, the main purpose of tashkīl (and ḥarakāt) is to provide a phonetic guide or aid, that is, to show the correct pronunciation to children who are learning to read and to foreign learners.

The bulk of Arabic script is written without ḥarakāt (or short vowels). However, they are commonly used in texts that demand strict adherence to exact pronunciation. This is true, primarily, of the Qur'an ٱلْقُرْآن (al-Qurʾān) and poetry. It is also quite common to add ḥarakāt to hadiths ٱلْحَدِيث (al-ḥadīth; plural: al-ʾaḥādīth) and the Bible. Another use is in children's literature. Moreover, ḥarakāt are used in ordinary texts in individual words when an ambiguity of pronunciation cannot easily be resolved from context alone. Arabic dictionaries with vowel marks provide information about the correct pronunciation to both native and foreign Arabic speakers. In art and calligraphy, ḥarakāt might be used simply because their writing is considered aesthetically pleasing.

An example of fully vocalised (vowelised or vowelled) Arabic, from the Bismillah:

بِسْمِ ٱللَّٰهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ
bismi -llāhi r-raḥmāni r-raḥīmi
In the name of God, the All-Merciful, the Especially-Merciful.

Some Arabic textbooks for foreigners now use ḥarakāt as a phonetic guide to make learning to read Arabic easier. The other method used in textbooks is phonetic romanisation of unvocalised texts. Fully vocalised Arabic texts (i.e. Arabic texts with ḥarakāt/diacritics) are sought after by learners of Arabic. Some online bilingual dictionaries also provide ḥarakāt as a phonetic guide, much as English dictionaries provide a transcription.

Harakat (short vowel marks)

The ḥarakāt حَرَكَات, which literally means 'motions', are the short vowel marks. There is some ambiguity as to which tashkīl are also ḥarakāt; the tanwīn, for example, are markers for both vowels and consonants.

Fatḥah

ـَ

The fatḥah فَتْحَة is a small diagonal line placed above a letter, and represents a short /a/ (like the /a/ sound in the English word "cat"). The word fatḥah itself (فَتْحَة) means opening and refers to the opening of the mouth when producing an /a/. For example, with dāl (henceforth, the base consonant in the following examples): دَ /da/.

When a fatḥah is placed before a plain letter ا (alif) (i.e. one having no hamza or vowel of its own), it represents a long /aː/ (close to the sound of "a" in the English word "dad", with an open front vowel /æː/, not back /ɑː/ as in "father"). For example: دَا /daː/. The fatḥah is not usually written in such cases. When a fatḥah is placed before the letter ⟨ي⟩ (yā’), it creates an /aj/ (as in "lie"); and when placed before the letter ⟨و⟩ (wāw), it creates an /aw/ (as in "cow").

Although a fatḥah paired with a plain letter creates an open front vowel (/a/), often realized as near-open (/æ/), the standard also allows for variations, especially under certain surrounding conditions. Usually, the more central (/ä/) or back (/ɑ/) pronunciation occurs when the word features a nearby back consonant, such as the emphatics, qāf, or rā’. Other vowels also take on a similar "back" quality in the presence of such consonants, though not as drastically as fatḥah.[1][2][3]

Kasrah

ـِ

A similar diagonal line below a letter is called a kasrah كَسْرَة and designates a short /i/ (as in "me", "be") and its allophones [i, ɪ, e, e̞, ɛ] (as in "Tim", "sit"). For example: دِ /di/.[4]

When a kasrah is placed before a plain letter ي (yā’), it represents a long /iː/ (as in the English word "steed"). For example: دِي /diː/. The kasrah is usually not written in such cases, but if yā’ is pronounced as a diphthong /aj/, fatḥah should be written on the preceding consonant to avoid mispronunciation. The word kasrah means 'breaking'.[1]

Ḍammah

ـُ

The ḍammah ضَمَّة is a small curl-like diacritic placed above a letter to represent a short /u/ (as in "duke", shorter "you") and its allophones [u, ʊ, o, o̞, ɔ] (as in "put", or "bull"). For example: دُ /du/.[4]

When a ḍammah is placed before a plain letter و (wāw), it represents a long /uː/ (like the 'oo' sound in the English word "swoop"). For example: دُو /duː/. The ḍammah is usually not written in such cases, but if wāw is pronounced as a diphthong /aw/, fatḥah should be written on the preceding consonant to avoid mispronunciation.[1]

The word ḍammah (ضَمَّة) in this context means rounding, since it is the only rounded vowel in the vowel inventory of Arabic.
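The interaction between the three short-vowel marks and a following plain alif, yā’ or wāw can be summarised as a small lookup. The sketch below is illustrative only: the function name `vowel_value` is hypothetical, and it covers just the combinations described above, ignoring allophones and the backing effect of emphatic consonants.

```python
# Code points for the three short-vowel marks and the three letters
# that turn them into long vowels or diphthongs.
FATHAH, DAMMAH, KASRAH = "\u064E", "\u064F", "\u0650"
ALIF, WAW, YA = "\u0627", "\u0648", "\u064A"
LENGTH = "\u02D0"  # IPA length mark

SHORT = {FATHAH: "a", KASRAH: "i", DAMMAH: "u"}
COMBINED = {
    (FATHAH, ALIF): "a" + LENGTH,  # long a
    (KASRAH, YA): "i" + LENGTH,    # long i
    (DAMMAH, WAW): "u" + LENGTH,   # long u
    (FATHAH, YA): "aj",            # diphthong
    (FATHAH, WAW): "aw",           # diphthong
}

def vowel_value(mark: str, following: str = "") -> str:
    """Return the vowel written by a ḥarakah and the plain letter after it."""
    return COMBINED.get((mark, following), SHORT[mark])

print(vowel_value(FATHAH))       # short vowel on its own
print(vowel_value(KASRAH, YA))   # long vowel
print(vowel_value(FATHAH, WAW))  # diphthong
```

Any pairing not listed in `COMBINED` falls back to the short vowel, which is a simplification rather than a statement about Arabic phonology.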

Alif Khanjariyah

ــٰ

The superscript (or dagger) alif أَلِف خَنْجَرِيَّة (alif khanjarīyah) is written as a short vertical stroke on top of a consonant. It indicates a long /aː/ sound for which alif is normally not written. For example: هَٰذَا (hādhā) or رَحْمَٰن (raḥmān).

The dagger alif occurs in only a few words, but they include some common ones; it is seldom written, however, even in fully vocalised texts. Most keyboards do not have dagger alif. The word Allah الله (Allāh) is usually produced automatically by entering alif lām lām hāʾ. The word consists of alif + ligature of doubled lām with a shaddah and a dagger alif above lām.

Maddah

ـٓآ

The maddah مَدَّة is a tilde-shaped diacritic, which can only appear on top of an alif (آ) and indicates a glottal stop /ʔ/ followed by a long /aː/.

In theory, the same sequence /ʔaː/ could also be represented by two alifs, as in *أَا, where a hamza above the first alif represents the /ʔ/ while the second alif represents the /aː/. However, consecutive alifs are never used in the Arabic orthography. Instead, this sequence must always be written as a single alif with a maddah above it, the combination known as an alif maddah. For example: قُرْآن /qurˈʔaːn/.
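In Unicode, the alif maddah exists both as a single precomposed character (U+0622) and as a plain alif (U+0627) followed by a combining maddah (U+0653), and the two are canonically equivalent. The short Python check below illustrates this with the standard `unicodedata` module; the variable names are chosen for this example only.

```python
import unicodedata

ALIF = "\u0627"         # plain alif
MADDAH = "\u0653"       # combining maddah above
ALIF_MADDAH = "\u0622"  # precomposed alif with maddah above

# NFC composes the two-character sequence into the single character;
# NFD splits the single character back into base letter + mark.
composed = unicodedata.normalize("NFC", ALIF + MADDAH)
decomposed = unicodedata.normalize("NFD", ALIF_MADDAH)

print(composed == ALIF_MADDAH)
print(decomposed == ALIF + MADDAH)
```

Normalising text to one form before comparison avoids treating the two spellings of the same alif maddah as different strings.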

Alif waslah

ٱ

The waṣlah وَصْلَة, alif waṣlah أَلِف وَصْلَة or hamzat waṣl هَمْزَة وَصْل looks like a small letter ṣād on top of an alif ٱ (also indicated by an alif ا without a hamzah). It means that the alif is not pronounced when its word does not begin a sentence. For example: بِٱسْمِ (bismi), but ٱمْشُوا۟ (imshū, not mshū). This is because no Arabic word can start with a vowel-less consonant. If the second letter from the waṣlah has a kasrah, the alif waṣlah makes the sound /i/. However, when the second letter from it has a ḍammah, it makes the sound /u/.

It occurs only in the beginning of words, but it can occur after prepositions and the definite article. It is commonly found in imperative verbs, the perfective aspect of verb stems VII to X and their verbal nouns (maṣdar). The alif of the definite article is considered a waṣlah.

It occurs in phrases and sentences (connected speech, not isolated/dictionary forms):

  • To replace the elided hamza whose alif-seat has assimilated to the previous vowel. For example: فِي ٱلْيَمَن or في اليمن (fi l-Yaman) ‘in Yemen’.
  • In hamza-initial imperative forms following a vowel, especially following the conjunction و (wa-) ‘and’. For example: قُمْ وَٱشْرَبِ ٱلْمَاءَ (qum wa-shrab-i l-mā’) ‘rise and then drink the water’.

Like the superscript alif, it is not written in fully vocalized scripts, except for sacred texts, like the Quran and Arabized Bible.
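Since the same phrase may be written with alif waṣlah (U+0671) in sacred texts and with a plain alif (U+0627) elsewhere, text matching usually needs to fold one into the other. The following Python sketch shows one way to do that; `fold_for_search` is a hypothetical helper, and the set of marks it strips is an assumption carried over from common practice, not a rule stated above.

```python
import re

ALIF_WASLAH, PLAIN_ALIF = "\u0671", "\u0627"
# Assumed set of tashkīl marks: U+064B..U+0652 plus dagger alif U+0670.
MARKS = re.compile("[\u064B-\u0652\u0670]")

def fold_for_search(text: str) -> str:
    """Drop tashkīl and replace alif waṣlah with a plain alif."""
    return MARKS.sub("", text).replace(ALIF_WASLAH, PLAIN_ALIF)

# 'fi l-Yaman' written with tashkīl and alif waṣlah, and written plainly
vocalised = "\u0641\u0650\u064A \u0671\u0644\u0652\u064A\u064E\u0645\u064E\u0646"
plain = "\u0641\u064A \u0627\u0644\u064A\u0645\u0646"

print(fold_for_search(vocalised) == plain)
```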

Sukūn

ـْـ

The sukūn سُكُونْ is a circle-shaped diacritic placed above a letter ( ْ). It indicates that the consonant to which it is attached is not followed by a vowel, i.e., zero-vowel.

It is a necessary symbol for writing consonant-vowel-consonant syllables, which are very common in Arabic. For example: دَدْ (dad).

The sukūn may also be used to help represent a diphthong. A fatḥah followed by the letter ي (yā’) with a sukūn over it (ـَيْ) indicates the diphthong ay (IPA /aj/). A fatḥah followed by the letter و (wāw) with a sukūn (ـَوْ) indicates /aw/.

ـۡـ

The sukūn may also have an alternative form, the small high head of ḥāʾ (U+06E1 ۡ ), particularly in some Qurans. Other shapes may exist as well (for example, like a small comma above ⟨ʼ⟩ or like a circumflex ⟨ˆ⟩ in nastaʿlīq).[5]
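Because the two sukūn shapes are separate code points (U+0652 for the ordinary circle and U+06E1 for the Quranic form), a program that looks for vowel-less consonants may want to treat them alike. The sketch below assumes that such unification is acceptable for the task at hand; the helper names are hypothetical.

```python
SUKUN = "\u0652"          # ordinary circle-shaped sukūn
QURANIC_SUKUN = "\u06E1"  # small high head of ḥāʾ, used in some Qurans

def unify_sukun(text: str) -> str:
    """Map the Quranic sukūn shape onto the ordinary sukūn."""
    return text.replace(QURANIC_SUKUN, SUKUN)

def has_sukun(cluster: str) -> bool:
    """True if a letter cluster carries a sukūn in either form."""
    return SUKUN in unify_sukun(cluster)

# 'dad' written with each of the two sukūn shapes
dad_ordinary = "\u062F\u064E\u062F\u0652"
dad_quranic = "\u062F\u064E\u062F\u06E1"

print(unify_sukun(dad_quranic) == dad_ordinary)
print(has_sukun("\u062F\u064E"))
```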

Tanwin (final postnasalized or long vowels)

ـٌ‎  ـٍ‎  ـً

The three vowel diacritics may be doubled at the end of a word to indicate that the vowel is followed by the consonant n. They may or may not be considered ḥarakāt and are known as tanwīn تَنْوِين, or nunation. The signs indicate, from left to right, -un, -in, -an.

These endings are used as non-pausal grammatical indefinite case endings in Literary Arabic or classical Arabic (triptotes only). In a vocalised text, they may be written even if they are not pronounced (see pausa). See i‘rāb for more details. In many spoken Arabic dialects, the endings are absent. Many Arabic textbooks introduce standard Arabic without these endings. The grammatical endings may not be written in some vocalized Arabic texts, as knowledge of i‘rāb varies from country to country, and there is a trend towards simplifying Arabic grammar.

The sign ـً is most commonly written in combination with ـًا (alif), ةً (tā’ marbūṭah), أً (alif hamzah) or stand-alone ءً (hamzah). Alif should always be written (except for words ending in tā’ marbūṭah, hamzah or diptotes) even if -an is not. Grammatical cases and tanwīn endings in indefinite triptote forms: nominative -un, accusative -an, genitive -in.
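As a rough illustration, the three tanwīn signs can be read off the end of a vocalised word and paired with the case each marks in indefinite triptotes (nominative -un, genitive -in, accusative -an). The Python sketch below is a simplification under stated assumptions: `tanwin_ending` is a hypothetical helper, and it only inspects the last two characters so that a fatḥatān written before a final alif is still found.

```python
# Tanwīn code points mapped to (ending, case in indefinite triptotes).
TANWIN = {
    "\u064C": ("un", "nominative"),  # ḍammatān
    "\u064D": ("in", "genitive"),    # kasratān
    "\u064B": ("an", "accusative"),  # fatḥatān
}

def tanwin_ending(word: str):
    """Return (ending, case) if the word ends in a tanwīn sign, else None."""
    # Look at the last two characters: fatḥatān may sit before a final alif.
    for ch in reversed(word[-2:]):
        if ch in TANWIN:
            return TANWIN[ch]
    return None

kitabun = "\u0643\u0650\u062A\u064E\u0627\u0628\u064C"        # kitābun
kitaban = "\u0643\u0650\u062A\u064E\u0627\u0628\u064B\u0627"  # kitāban
unmarked = "\u0643\u062A\u0627\u0628"                         # no tashkīl

print(tanwin_ending(kitabun))
print(tanwin_ending(kitaban))
print(tanwin_ending(unmarked))
```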

Shaddah (consonant gemination mark)

ـّـ

The shadda or shaddah شَدَّة (shaddah), or tashdid تَشْدِيد (tashdīd), is a diacritic shaped like a small written Latin "w".

It is used to indicate gemination (consonant doubling or extra length), which is phonemic in Arabic. It is written above the consonant which is to be doubled. It is the only ḥarakah that is commonly used in ordinary spelling to avoid ambiguity. For example: دّ /dd/; madrasah مَدْرَسَة ('school') vs. mudarrisah مُدَرِّسَة ('teacher', female).
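For processing that needs each consonant spelled out, a geminate can be expanded into two copies of the letter, the first carrying a sukūn. The sketch below is one possible approach rather than an established convention. The name `expand_shaddah` is hypothetical, and the pattern allows for a vowel mark stored either after the shaddah (usual typing order) or before it (as Unicode normalisation may reorder them).

```python
import re

SHADDAH, SUKUN = "\u0651", "\u0652"
# A letter (U+0621..U+064A), an optional vowel or tanwīn mark
# (U+064B..U+0650), then the shaddah.
GEMINATE = re.compile("([\u0621-\u064A])([\u064B-\u0650]?)" + SHADDAH)

def expand_shaddah(text: str) -> str:
    """Rewrite 'letter + shaddah' as 'letter + sukūn + letter'."""
    return GEMINATE.sub("\\1" + SUKUN + "\\1\\2", text)

# mudarrisah with the shaddah typed before the kasrah
typed = "\u0645\u064F\u062F\u064E\u0631\u0651\u0650\u0633\u064E\u0629"
# the same word with the kasrah stored before the shaddah
reordered = "\u0645\u064F\u062F\u064E\u0631\u0650\u0651\u0633\u064E\u0629"

print(expand_shaddah(typed) == expand_shaddah(reordered))
```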

I‘jām (phonetic distinctions of consonants)

 
7th-century kufic script without any ḥarakāt or i‘jām.

The i‘jām إِعْجَام (sometimes also called nuqaṭ)[6] are the diacritic points that distinguish various consonants that have the same form (rasm), such as ـبـ /b/ ب, ـتـ /t/ ت, ـثـ /θ/ ث, ـنـ /n/ ن, and ـيـ /j/ ي. Typically i‘jām are not considered diacritics but part of the letter.

Early manuscripts of the Qur’ān did not use diacritics either for vowels or to distinguish the different values of the rasm. Vowel pointing was introduced first, as a red dot placed above, below, or beside the rasm, and later consonant pointing was introduced, as thin, short black single or multiple dashes placed above or below the rasm. These i‘jām became black dots about the same time as the ḥarakāt became small black letters or strokes.

Typically, Egyptians do not use dots under final yā’ ي, which then looks exactly like alif maqṣūrah ى in handwriting and in print. This practice is also used in copies of the muṣḥaf (Qurʾān) scribed by ‘Uthman Ṭāhā. The same unification of yā’ and alif maqṣūrah has happened in Persian, resulting in what the Unicode Standard calls "Arabic Letter Farsi Yeh", which looks exactly the same as yā’ in initial and medial forms, but exactly the same as alif maqṣūrah in final and isolated forms: یـ  ـیـ  ـی.

 
Isolated kāf with ‘alāmātu-l-ihmāl and without top stroke next to initial kāf with top stroke.
سۡ سۜ سۣ سٚ ڛ
Several ways of writing /s/.

At the time when the i‘jām was optional, unpointed letters were ambiguous. To clarify that a letter would lack i‘jām in pointed text (i.e. ح /ħ/, د /d/, ر /r/, س /s/, ص /sˤ/, ط /tˤ/, ع /ʕ/, ل /l/, ه /h/), the letter could be marked with a small v- or seagull-shaped diacritic above, a superscript semicircle (crescent), a subscript dot (except in the case of ح; three dots were used with س), or a subscript miniature of the letter itself. A superscript stroke known as jarrah, resembling a long fatḥah, was used for a contracted (assimilated) sīn. Thus ڛ سۣ سۡ سٚ were all used to indicate that the letter in question was truly س and not ش.[7] These signs, collectively known as ‘alāmātu-l-ihmāl, are still occasionally used in modern Arabic calligraphy, either for their original purpose (i.e. marking letters without i‘jām), or often as purely decorative space-fillers. The small ک above the kāf in its final and isolated forms ك  ـك was originally an ‘alāmatu-l-ihmāl that became a permanent part of the letter. Previously this sign could also appear above the medial form of kāf, when that letter was written without the stroke on its ascender. When kāf was written without that stroke, it could be mistaken for lām; thus kāf was distinguished with a superscript kāf or a small superscript hamza (nabrah), and lām with a superscript l-a-m (lām-alif-mīm).[8]

Hamza (glottal stop semi-consonant)

ئ  ؤ  إ  أ ء

Although normally a diacritic is not considered a letter of the alphabet, the hamza هَمْزة (hamzah, glottal stop) often stands as a separate letter in writing, is written in unpointed texts and is not considered a tashkīl. It may appear as a letter by itself or as a diacritic over or under an alif, wāw, or yā’.

Which letter is to be used to support the hamzah depends on the quality of the adjacent vowels:

  • If the glottal stop occurs at the beginning of the word, it is always indicated by hamza on an alif: above if the following vowel is /a/ or /u/ and below if it is /i/.
  • If the glottal stop occurs in the middle of the word, hamzah above alif is used only if it is not preceded or followed by /i/ or /u/:
    • If /i/ is before or after the glottal stop, a yāʼ with a hamzah is used (the two dots which are usually beneath the yāʾ disappear in this case): ئ.
    • Otherwise, if /u/ is before or after the glottal stop, a wāw with a hamzah is used: ؤ.
  • If the glottal stop occurs at the end of the word (ignoring any grammatical suffixes) and follows a short vowel, it is written above alif, wāw, or yā’, the same as for a medial case; otherwise (i.e. if it follows a long vowel, diphthong or consonant) it is written on the line.
  • Two alifs in succession are never allowed: /ʔaː/ is written with alif maddah آ and /aːʔ/ is written with a free hamzah on the line اء.
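The seat-selection rules listed above can be sketched as a small decision function. This is a deliberately simplified reading of those rules only, not a full account of hamzah orthography. The function name `hamza_seat` and its conventions are hypothetical: vowels are passed as "a", "i", "u", long vowels as a doubled letter such as "aa", and `None` stands for a neighbouring consonant.

```python
ALIF_ABOVE, ALIF_BELOW = "\u0623", "\u0625"  # hamzah above / below alif
WAW_SEAT, YA_SEAT, ON_LINE = "\u0624", "\u0626", "\u0621"

def _quality(vowel):
    """Vowel quality ('a', 'i' or 'u') of a short or long vowel, or None."""
    return vowel[0] if vowel else None

def hamza_seat(position, before=None, after=None):
    """Pick the hamzah seat from the vowels before and after it."""
    if position == "initial":
        # Always on alif: below for /i/, above for /a/ or /u/.
        return ALIF_BELOW if _quality(after) == "i" else ALIF_ABOVE
    if position == "medial":
        qualities = (_quality(before), _quality(after))
        if "i" in qualities:
            return YA_SEAT
        if "u" in qualities:
            return WAW_SEAT
        return ALIF_ABOVE
    if position == "final":
        # Seated only after a short vowel; otherwise written on the line.
        if before in ("a", "i", "u"):
            return {"a": ALIF_ABOVE, "i": YA_SEAT, "u": WAW_SEAT}[before]
        return ON_LINE
    raise ValueError("position must be 'initial', 'medial' or 'final'")

print(hamza_seat("initial", after="i"))
print(hamza_seat("medial", before="u", after="aa"))
print(hamza_seat("final", before="aa"))
```

The alif maddah case (/ʔaː/ written as آ) is left out of this sketch and would need separate handling.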

Consider the following words: أَخ /ʔax/ ("brother"), إسْماعِيل /ʔismaːʕiːl/ ("Ismael"), أُمّ /ʔumm/ ("mother"). All three of the above words "begin" with a vowel opening the syllable, and in each case, alif is used to designate the initial glottal stop (the actual beginning). But if we consider middle syllables "beginning" with a vowel: نَشْأة /naʃʔa/ ("origin"), أَفْئِدة /ʔafʔida/ ("hearts", with the /ʔi/ syllable; singular فُؤاد /fuʔaːd/), رُؤُوس /ruʔuːs/ ("heads", singular رَأْس /raʔs/), the situation is different, as noted above. See the comprehensive article on hamzah for more details.

Tone markers

Historically, the Arabic script has been adopted and used by many tonal languages; examples include Xiao'erjing for Mandarin Chinese and the Ajami script adopted for writing various languages of Western Africa. However, one of the shortcomings of the Arabic script, especially in comparison to Latin-derived scripts or other indigenous writing systems, was that it did not have a way of indicating tones.

However, in the adoption of the Arabic script for the Rohingya language, known as Rohingya Fonna, three tone markers have been developed and used in manuscripts. These tone markers form part of the standardized and accepted orthographic convention of Rohingya. This is the only known instance of tone markers within the Arabic script.[9][10]

Tone markers act as "modifiers" of vowel diacritics. In simpler words, they are "diacritics for the diacritics". They are written "outside" of the word, meaning that they are written above the vowel diacritic if the diacritic is written above the word, and they are written below the diacritic if the diacritic is written below the word. They are only ever written where there are vowel diacritics. This is important to note, as without the diacritic present, there is no way to distinguish between tone markers and i‘jām, i.e. the dots that are used for the purpose of phonetic distinctions of consonants.

◌࣪ / ◌࣭

The Hārbāy as it is called in Rohingya, is a single dot that's placed on top of Fatḥah and Ḍammah, or curly Fatḥah and curly Ḍammah (vowel diacritics unique to Rohingya), or their respective Fatḥatan and Ḍammatan versions, and it's placed underneath Kasrah or curly Kasrah, or their respective Kasratan version. (e.g. دً࣪ / دٌ࣪ / دࣨ࣪ / دٍ࣭) This tone marker indicates a short high tone (/˥/).[9][10]

◌࣫ / ◌࣮

The Ṭelā as it is called in Rohingya, is two dots that are placed on top of Fatḥah and Ḍammah, or curly Fatḥah and curly Ḍammah, or their respective Fatḥatan and Ḍammatan versions, and it's placed underneath Kasrah or curly Kasrah, or their respective Kasratan version. (e.g. دَ࣫ / دُ࣫ / دِ࣮) This tone marker indicates a long falling tone (/˥˩/).[9][10]

◌࣬ / ◌࣯

The Ṭāna as it is called in Rohingya, is a fish-like looping line that is placed on top of Fatḥah and Ḍammah, or curly Fatḥah and curly Ḍammah, or their respective Fatḥatan and Ḍammatan versions, and it's placed underneath Kasrah or curly Kasrah, or their respective Kasratan version. (e.g. دࣤ࣬ / دࣥ࣬ / دࣦ࣯) This tone marker indicates a long rising tone (/˨˦/).[9][10]
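The placement rule shared by all three markers (above a vowel diacritic that sits above the letter, below one that sits below it) can be sketched in a few lines of Python. The code points used here, U+08EA to U+08EF for the tone marks and U+08E4 to U+08E9 for the curly vowel signs, are taken from the Unicode Arabic Extended-A block as the editor understands it and should be checked against the code charts. The function name `add_tone` and the lower-case tone keys are hypothetical.

```python
# Tone name -> (form written above, form written below).
TONES = {
    "harbay": ("\u08EA", "\u08ED"),  # one dot
    "tela": ("\u08EB", "\u08EE"),    # two dots
    "tana": ("\u08EC", "\u08EF"),    # loop
}

# Vowel diacritics written above the letter: fatḥah, ḍammah, fatḥatān,
# ḍammatān and their curly counterparts (assumed code points).
VOWELS_ABOVE = set("\u064E\u064F\u064B\u064C\u08E4\u08E5\u08E7\u08E8")
# Vowel diacritics written below: kasrah, kasratān and curly counterparts.
VOWELS_BELOW = set("\u0650\u064D\u08E6\u08E9")

def add_tone(cluster: str, tone: str) -> str:
    """Append the tone marker form that matches the final vowel diacritic."""
    above, below = TONES[tone]
    last = cluster[-1:]
    if last in VOWELS_ABOVE:
        return cluster + above
    if last in VOWELS_BELOW:
        return cluster + below
    # Tone markers are only written where a vowel diacritic is present.
    raise ValueError("tone markers need a vowel diacritic to attach to")

print(add_tone("\u062F\u064E", "tela"))    # dāl + fatḥah + two dots above
print(add_tone("\u062F\u0650", "harbay"))  # dāl + kasrah + one dot below
```

Refusing to attach a marker to a bare letter mirrors the point made above that, without a vowel diacritic, a tone dot could not be told apart from i‘jām.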

History

 
Evolution of early Arabic calligraphy (9th–11th century). The Basmala was taken as an example, from kufic Qur’ān manuscripts. (1) Early 9th century, script with no dots or diacritic marks;
(2) and (3) 9th–10th century under the Abbasid dynasty, Abu al-Aswad's system established red dots with each arrangement or position indicating a different short vowel; later, a second black-dot system was used to differentiate between letters like fā’ and qāf;
(4) 11th century, in al-Farāhīdī's system (the system we know today) dots were changed into shapes resembling the letters to transcribe the corresponding long vowels.

According to tradition, the first to commission a system of ḥarakāt was Ali, who appointed Abu al-Aswad al-Du'ali for the task. Abu al-Aswad devised a system of dots to signal the three short vowels (along with their respective allophones) of Arabic. This system of dots predates the i‘jām, the dots used to distinguish between different consonants.

Abu al-Aswad's system

Abu al-Aswad's system of ḥarakāt was different from the system we know today. The system used red dots with each arrangement or position indicating a different short vowel.

A dot above a letter indicated the vowel a, a dot below indicated the vowel i, a dot on the side of a letter stood for the vowel u, and two dots stood for the tanwīn.

However, the early manuscripts of the Qur'an did not use the vowel signs for every letter requiring them, but only for letters where they were necessary for a correct reading.

Al Farahidi's system

The precursor to the system we know today is al-Farāhīdī's system. Al-Farāhīdī found that the task of writing using two different colours was tedious and impractical. Another complication was that the i‘jām had been introduced by then, which, while they were short strokes rather than the round dots seen today, meant that without a colour distinction the two could become confused.

Accordingly, he replaced the ḥarakāt with small superscript letters: small alif, yā’, and wāw for the short vowels corresponding to the long vowels written with those letters, a small s(h)īn for shaddah (geminate), a small khā’ for khafīf (short consonant; no longer used). His system is essentially the one we know today.[11]

Automatic diacritization

The process of automatically restoring diacritical marks is called diacritization or diacritic restoration. It is useful to avoid ambiguity in applications such as Arabic machine translation, text-to-speech, and information retrieval. Automatic diacritization algorithms have been developed.[12][13] For Modern Standard Arabic, the state-of-the-art algorithm has a word error rate (WER) of 4.79%. The most common mistakes involve proper nouns and case endings.[14] Similar algorithms exist for other varieties of Arabic.[15]
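To make the word error rate concrete, the sketch below computes it as the share of words whose diacritized form differs from a reference. This is one common way of defining the metric and is an assumption here; the exact evaluation protocol behind the figure quoted above is not described in this article. The function name `word_error_rate` is hypothetical, and the two texts are assumed to have the same tokenisation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Fraction of words whose diacritized form differs from the reference."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if len(ref_words) != len(hyp_words):
        raise ValueError("texts must contain the same number of words")
    if not ref_words:
        return 0.0
    errors = sum(r != h for r, h in zip(ref_words, hyp_words))
    return errors / len(ref_words)

madrasah = "\u0645\u064E\u062F\u0652\u0631\u064E\u0633\u064E\u0629"
mudarrisah = "\u0645\u064F\u062F\u064E\u0631\u0651\u0650\u0633\u064E\u0629"
kitabun = "\u0643\u0650\u062A\u064E\u0627\u0628\u064C"

reference = madrasah + " " + kitabun
hypothesis = mudarrisah + " " + kitabun  # first word restored wrongly

print(word_error_rate(reference, hypothesis))
```

A single wrong mark counts the whole word as an error, which is why mistakes in case endings weigh heavily on this metric.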

See also

  • Arabic alphabet:
    • I‘rāb (إِعْرَاب), the case system of Arabic
    • Rasm (رَسْم), the basic system of Arabic consonants
    • Tajwīd (تَجْوِيد), the phonetic rules of recitation of Qur'an in Arabic
  • Hebrew:
    • Hebrew diacritics, the Hebrew equivalent
    • Niqqud, the Hebrew equivalent of ḥarakāt
    • Dagesh, the Hebrew diacritic similar to Arabic i‘jām and shaddah

References

  1. ^ a b c Karin C. Ryding, "A Reference Grammar of Modern Standard Arabic", Cambridge University Press, 2005, pgs. 25-34, specifically “Chapter 2, Section 4: Vowels”
  2. ^ Anatole Lyovin, Brett Kessler, William Ronald Leben, "An Introduction to the Languages of the World", "5.6 Sketch of Modern Standard Arabic", Oxford University Press, 2017, pg. 255, Edition 2, specifically “5.6.2.2 Vowels”
  3. ^ Amine Bouchentouf, Arabic For Dummies®, John Wiley & Sons, 2018, 3rd Edition, specifically section "All About Vowels"
  4. ^ a b "Introduction to Written Arabic". University of Victoria, Canada.
  5. ^ "Arabic character notes". r12a.
  6. ^ Ibn Warraq (2002). Ibn Warraq (ed.). . Translated by Ibn Warraq. New York: Prometheus. p. 64. ISBN 1-57392-945-X. Archived from the original on 11 April 2019. Retrieved 9 April 2019.
  7. ^ Gacek, Adam (2009). "Unpointed letters". Arabic Manuscripts: A Vademecum for Readers. BRILL. p. 286. ISBN 978-90-04-17036-0.
  8. ^ Gacek, Adam (1989). "Technical Practices and Recommendations Recorded by Classical and Post-Classical Arabic Scholars Concerning the Copying and Correction of Manuscripts" (PDF). In Déroche, François (ed.). Les manuscrits du Moyen-Orient: essais de codicologie et de paléographie. Actes du colloque d'Istanbul (Istanbul 26–29 mai 1986). p. 57 (§ 8. Diacritical marks and vowelisation).
  9. ^ a b c d Priest, Lorna A.; Hosken, Martin (10 August 2010). "Proposal to add Arabic script characters for African and Asian languages" (PDF). The Unicode Consortium. Archived (PDF) from the original on 8 October 2022. Retrieved 5 May 2023.
  10. ^ a b c d Pandey, Anshuman (27 October 2015). "Proposal to encode the Hanifi Rohingya script in Unicode" (PDF). The Unicode Consortium. Archived (PDF) from the original on 12 December 2019. Retrieved 5 May 2023.
  11. ^ Versteegh, C. H. M. (1997). The Arabic Language. Columbia University Press. pp. 56ff. ISBN 978-0-231-11152-2.
  12. ^ Azmi, Aqil M.; Almajed, Reham S. (2013-10-10). "A survey of automatic Arabic diacritization techniques". Natural Language Engineering. 21 (3): 477–495. doi:10.1017/S1351324913000284. ISSN 1351-3249. S2CID 31560671.
  13. ^ Almanea, Manar (2021). "Automatic Methods and Neural Networks in Arabic Texts Diacritization: A Comprehensive Survey". IEEE Access. 9: 145012–145032. doi:10.1109/ACCESS.2021.3122977. ISSN 2169-3536. S2CID 240011970.
  14. ^ Thompson, Brian; Alshehri, Ali (2021-09-28). "Improving Arabic Diacritization by Learning to Diacritize and Translate". arXiv:2109.14150 [cs.CL].
  15. ^ Masmoudi, Abir; Aloulou, Chafik; Abdellahi, Abdel Ghader Sidi; Belguith, Lamia Hadrich (2021-08-08). "Automatic diacritization of Tunisian dialect text using SMT model". International Journal of Speech Technology. 25: 89–104. doi:10.1007/s10772-021-09864-6. ISSN 1572-8110. S2CID 238782966.
  • Alexis Neme and Sébastien Paumier (2019), "Restoring Arabic vowels through omission-tolerant dictionary lookup", Lang Resources & Evaluation, Vol. 53, pp. 1-65

arabic, diacritics, arabic, script, numerous, diacritics, which, include, consonant, pointing, known, iʻjām, ام, supplementary, diacritics, known, tashkīl, يل, latter, include, vowel, marks, termed, ḥarakāt, ات, singular, ḥarakah, early, written, arabic, used,. Arabic script has numerous diacritics which include consonant pointing known as iʻjam إ ع ج ام and supplementary diacritics known as tashkil ت ش ك يل The latter include the vowel marks termed ḥarakat ح ر ك ات singular ح ر ك ة ḥarakah Early written Arabic used only rasm in black Later Arabic added i jam diacritics examples in red so that letters such as these five ـيـ ـنـ ـثـ ـتـ ـبـ b t th n y could be distinguished Ḥarakat diacritics examples in blue which is used in the Qur an but not in most written Arabic indicate short vowels long consonants and some other vocalizations The Arabic script is a modified abjad where short consonants and long vowels are represented by letters but short vowels and consonant length are not generally indicated in writing Tashkil is optional to represent missing vowels and consonant length Modern Arabic is always written with the i jam consonant pointing but only religious texts children s books and works for learners are written with the full tashkil vowel guides and consonant length It is however not uncommon for authors to add diacritics to a word or letter when the grammatical case or the meaning is deemed otherwise ambiguous In addition classical works and historic documents rendered to the general public are often rendered with the full tashkil to compensate for the gap in understanding resulting from stylistic changes over the centuries Contents 1 Tashkil marks used as phonetic guides 1 1 Harakat short vowel marks 1 1 1 Fatḥah 1 1 2 Kasrah 1 1 3 Ḍammah 1 1 4 Alif Khanjariyah 1 2 Maddah 1 3 Alif waslah 1 4 Sukun 1 5 Tanwin final postnasalized or long vowels 1 6 Shaddah consonant gemination mark 2 I jam phonetic distinctions of consonants 3 Hamza glottal stop semi 
consonant 4 Tone markers 5 History 5 1 Abu al Aswad s system 5 2 Al Farahidi s system 6 Automatic diacritization 7 See also 8 ReferencesTashkil marks used as phonetic guides editThe literal meaning of ت ش ك يل tashkil is forming As the normal Arabic text does not provide enough information about the correct pronunciation the main purpose of tashkil and ḥarakat is to provide a phonetic guide or a phonetic aid i e show the correct pronunciation for children who are learning to read or foreign learners The bulk of Arabic script is written without ḥarakat or short vowels However they are commonly used in texts that demand strict adherence to exact pronunciation This is true primarily of the Qur an ٱل ق ر آن al Qurʾan and poetry It is also quite common to add ḥarakat to hadiths ٱل ح د يث al ḥadith plural al ḥadith and the Bible Another use is in children s literature Moreover ḥarakat are used in ordinary texts in individual words when an ambiguity of pronunciation cannot easily be resolved from context alone Arabic dictionaries with vowel marks provide information about the correct pronunciation to both native and foreign Arabic speakers In art and calligraphy ḥarakat might be used simply because their writing is considered aesthetically pleasing An example of a fully vocalised vowelised or vowelled Arabic from the Bismillah ب س م ٱلل ه ٱلر ح م ن ٱلر ح يم bismi llahi r raḥmani r raḥimiIn the name of God the All Merciful the Especially Merciful Some Arabic textbooks for foreigners now use ḥarakat as a phonetic guide to make learning reading Arabic easier The other method used in textbooks is phonetic romanisation of unvocalised texts Fully vocalised Arabic texts i e Arabic texts with ḥarakat diacritics are sought after by learners of Arabic Some online bilingual dictionaries also provide ḥarakat as a phonetic guide similarly to English dictionaries providing transcription Harakat short vowel marks edit The ḥarakat ح ر ك ات which literally means motions are the short 
vowel marks There is some ambiguity as to which tashkil are also ḥarakat the tanwin for example are markers for both vowels and consonants Fatḥah edit ـ The fatḥah ف ت ح ة is a small diagonal line placed above a letter and represents a short a like the a sound in the English word cat The word fatḥah itself ف ت ح ة means opening and refers to the opening of the mouth when producing an a For example with dal henceforth the base consonant in the following examples د da When a fatḥah is placed before a plain letter ا alif i e one having no hamza or vowel of its own it represents a long aː close to the sound of a in the English word dad with an open front vowel aeː not back ɑː as in father For example د ا daː The fatḥah is not usually written in such cases When a fathah is placed before the letter ﻱ ya it creates an aj as in lie and when placed before the letter و waw it creates an aw as in cow Although paired with a plain letter creates an open front vowel a often realized as near open ae the standard also allows for variations especially under certain surrounding conditions Usually in order to have the more central a or back ɑ pronunciation the word features a nearby back consonant such as the emphatics as well as qaf or ra A similar back quality is undergone by other vowels as well in the presence of such consonants however not as drastically realized as in the case of fatḥah 1 2 3 Kasrah edit ـ A similar diagonal line below a letter is called a kasrah ك س ر ة and designates a short i as in me be and its allophones i ɪ e e ɛ as in Tim sit For example د di 4 When a kasrah is placed before a plain letter ﻱ ya it represents a long iː as in the English word steed For example د ي diː The kasrah is usually not written in such cases but if ya is pronounced as a diphthong aj fatḥah should be written on the preceding consonant to avoid mispronunciation The word kasrah means breaking 1 Ḍammah edit ـ The ḍammah ض م ة is a small curl like diacritic placed above a letter to 
represent a short u as in duke shorter you and its allophones u ʊ o o ɔ as in put or bull For example د du 4 When a ḍammah is placed before a plain letter و waw it represents a long uː like the oo sound in the English word swoop For example د و duː The ḍammah is usually not written in such cases but if waw is pronounced as a diphthong aw fatḥah should be written on the preceding consonant to avoid mispronunciation 1 The word ḍammah ض م ة in this context means rounding since it is the only rounded vowel in the vowel inventory of Arabic Alif Khanjariyah edit ــ The superscript or dagger alif أ ل ف خ ن ج ر ي ة alif khanjariyah is written as short vertical stroke on top of a consonant It indicates a long aː sound for which alif is normally not written For example ه ذ ا hadha or ر ح م ن raḥman The dagger alif occurs in only a few words but they include some common ones it is seldom written however even in fully vocalised texts Most keyboards do not have dagger alif The word Allah الله Allah is usually produced automatically by entering alif lam lam haʾ The word consists of alif ligature of doubled lam with a shaddah and a dagger alif above lam Maddah edit Not to be confused with Tilde This section does not cite any sources Please help improve this section by adding citations to reliable sources Unsourced material may be challenged and removed April 2023 template removal help ـ آ The maddah م د ة is a tilde shaped diacritic which can only appear on top of an alif آ and indicates a glottal stop ʔ followed by a long aː In theory the same sequence ʔaː could also be represented by two alifs as in أ ا where a hamza above the first alif represents the ʔ while the second alif represents the aː However consecutive alifs are never used in the Arabic orthography Instead this sequence must always be written as a single alif with a maddah above it the combination known as an alif maddah For example ق ر آن qurˈʔaːn Alif waslah edit Main article Wasla diacritic ٱ The waṣlah و ص ل ة 
alif waṣlah (أَلِف وَصْلَة), or hamzat waṣl (هَمْزَة وَصْل) looks like a small letter ṣād on top of an alif, ٱ (it is also indicated by an alif ا without a hamzah). It means that the alif is not pronounced when its word does not begin a sentence. For example: بِٱسْمِ (bismi), but ٱمْشُوا (imshū, not mshū). This is because no Arabic word can start with a vowel-less consonant. If the second letter from the waṣlah has a kasrah, the alif waṣlah makes the sound /i/. However, when the second letter from it has a ḍammah, it makes the sound /u/.

It occurs only at the beginning of words, but it can occur after prepositions and the definite article. It is commonly found in imperative verbs, the perfective aspect of verb stems VII to X, and their verbal nouns (maṣdar). The alif of the definite article is considered a waṣlah.

It occurs in phrases and sentences (connected speech, not isolated dictionary forms) in two situations. First, it replaces the elided hamza whose alif seat has assimilated to the previous vowel. For example: فِي ٱلْيَمَن or في اليمن (fi l-Yaman), "in Yemen". Second, it appears in hamza-initial imperative forms following a vowel, especially following the conjunction و (wa-), "and". For example: قُمْ وَٱشْرَبِ ٱلْمَاء (qum wa-shrab-i l-māʾ), "rise and then drink the water".

Like the superscript alif, it is not written in fully vocalized scripts, except for sacred texts such as the Quran and the Arabized Bible.

Sukun

ـْـ

The sukūn (سُكُون) is a circle-shaped diacritic placed above a letter. It indicates that the consonant to which it is attached is not followed by a vowel, i.e. zero vowel.

It is a necessary symbol for writing consonant-vowel-consonant syllables, which are very common in Arabic. For example: دَدْ (dad).

The sukūn may also be used to help represent a diphthong. A fatḥah followed by the letter ي (yāʼ) with a sukūn over it (ـَيْ) indicates the diphthong ay (IPA /aj/). A fatḥah followed by the letter و (wāw) with a sukūn (ـَوْ) indicates /aw/.

ـۡـ

The sukūn may also have an alternative form, the small high head of ḥāʾ (U+06E1), particularly in some Qurans. Other shapes may exist as well (for example, like
a small comma above ⟨ʼ⟩ or like a circumflex ⟨ˆ⟩ in nastaʿlīq). [5]

Tanwin (final postnasalized or long vowels)

Main article: Nunation

ـٌ ـٍ ـً

The three vowel diacritics may be doubled at the end of a word to indicate that the vowel is followed by the consonant n. They may or may not be considered ḥarakāt and are known as tanwīn (تَنْوِين), or nunation. The signs ـٌ, ـٍ and ـً indicate -un, -in and -an respectively.

These endings are used as non-pausal grammatical indefinite case endings in Literary Arabic or classical Arabic (triptotes only). In a vocalised text, they may be written even if they are not pronounced (see pausa). See i‘rāb for more details. In many spoken Arabic dialects, the endings are absent. Many Arabic textbooks introduce standard Arabic without these endings. The grammatical endings may not be written in some vocalized Arabic texts, as knowledge of i‘rāb varies from country to country, and there is a trend towards simplifying Arabic grammar.

The sign ـً is most commonly written in combination with ـًا (alif), ةً (tāʼ marbūṭah), أً (alif hamzah) or stand-alone ءً (hamzah). Alif should always be written (except for words ending in tāʼ marbūṭah, hamzah or diptotes) even if -an is not.

Grammatical cases and tanwīn endings in indefinite triptote forms:

-un: nominative case;
-an: accusative case, also serves as an adverbial marker;
-in: genitive case.

Shaddah (consonant gemination mark)

Main article: Shadda

ـّـ

The shadda or shaddah (شَدَّة), or tashdīd (تَشْدِيد), is a diacritic shaped like a small written Latin "w".

It is used to indicate gemination (consonant doubling or extra length), which is phonemic in Arabic. It is written above the consonant which is to be doubled. It is the only ḥarakah that is commonly used in ordinary spelling to avoid ambiguity. For example: دّ /dd/; madrasah مَدْرَسَة ("school") vs. mudarrisah مُدَرِّسَة ("teacher", female).

I‘jām (phonetic distinctions of consonants)

[Image: 7th-century kufic script without any ḥarakāt or i‘jām.]

The i‘jām (إِعْجَام; sometimes also called nuqaṭ) [6] are the diacritic points that
distinguish various consonants that have the same form (rasm), such as ـبـ /b/ ب, ـتـ /t/ ت, ـثـ /θ/ ث, ـنـ /n/ ن, and ـيـ /j/ ي. Typically i‘jām are not considered diacritics but part of the letter.

Early manuscripts of the Qur'an did not use diacritics either for vowels or to distinguish the different values of the rasm. Vowel pointing was introduced first, as a red dot placed above, below, or beside the rasm, and later consonant pointing was introduced, as thin, short black single or multiple dashes placed above or below the rasm. These i‘jām became black dots about the same time as the ḥarakāt became small black letters or strokes.

Typically, Egyptians do not use dots under final yāʼ (ي), which looks exactly like alif maqṣūrah (ى) in handwriting and in print. This practice is also used in copies of the muṣḥaf (Qurʾān) scribed by ʻUthmān Ṭāhā. The same unification of yāʼ and alif maqṣūrah has happened in Persian, resulting in what the Unicode Standard calls "Arabic Letter Farsi Yeh", which looks exactly the same as yāʼ in initial and medial forms, but exactly the same as alif maqṣūrah in final and isolated forms: یـ ـیـ ـی.

[Image: Isolated kāf with ‘alāmātu-l-ihmāl and without top stroke, next to initial kāf with top stroke.]

س س س س ڛ
Several ways of writing s.

At the time when the i‘jām was optional, unpointed letters were ambiguous. To clarify that a letter would lack i‘jām in pointed text, i.e. ح /ħ/, د /d/, ر /r/, س /s/, ص /sˤ/, ط /tˤ/, ع /ʕ/, ل /l/, ه /h/, the letter could be marked with a small v- or seagull-shaped diacritic above, a superscript semicircle (crescent), a subscript dot (except in the case of ح; three dots were used with س), or a subscript miniature of the letter itself. A superscript stroke known as jarrah, resembling a long fatḥah, was used for a contracted (assimilated) sīn. Thus ڛ س س س were all used to indicate that the letter in question was truly س and not ش. [7] These signs, collectively known as ‘alāmātu-l-ihmāl, are still occasionally used in modern Arabic calligraphy, either for their original purpose (i.e. marking letters without
i‘jām), or often as purely decorative space-fillers. The small ک above the kāf in its final and isolated forms ك ـك was originally an ‘alāmatu-l-ihmāl that became a permanent part of the letter. Previously this sign could also appear above the medial form of kāf, when that letter was written without the stroke on its ascender. When kāf was written without that stroke, it could be mistaken for lām; thus kāf was distinguished with a superscript kāf or a small superscript hamza (nabrah), and lām with a superscript l-a-m (lām-alif-mīm). [8]

Hamza (glottal stop semi-consonant)

Main article: Hamza

ئ ؤ إ أ ء

Although normally a diacritic is not considered a letter of the alphabet, the hamza (هَمْزة, hamzah, glottal stop) often stands as a separate letter in writing, is written in unpointed texts, and is not considered a tashkīl. It may appear as a letter by itself or as a diacritic over or under an alif, wāw, or yāʼ.

Which letter is to be used to support the hamzah depends on the quality of the adjacent vowels:

If the glottal stop occurs at the beginning of the word, it is always indicated by a hamza on an alif: above if the following vowel is /a/ or /u/, and below if it is /i/.

If the glottal stop occurs in the middle of the word, hamzah above alif is used only if it is not preceded or followed by /i/ or /u/. If /i/ is before or after the glottal stop, a yāʼ with a hamzah is used (the two dots which are usually beneath the yāʼ disappear in this case): ئ. Otherwise, if /u/ is before or after the glottal stop, a wāw with a hamzah is used: ؤ.

If the glottal stop occurs at the end of the word (ignoring any grammatical suffixes) and follows a short vowel, it is written above alif, wāw, or yāʼ, the same as for a medial case; otherwise it is written on the line (i.e. if it follows a long vowel, diphthong or consonant).

Two alifs in succession are never allowed: /ʔaː/ is written with alif maddah آ and /aːʔ/ is written with a free hamzah on the line اء.

Consider the following words: أَخ /ʔax/ ("brother"), إسْماعِيل /ʔismaːʕiːl/ ("Ismael"), أُمّ /ʔumm/ ("mother"). All three of above words
"begin" with a vowel opening the syllable, and in each case alif is used to designate the initial glottal stop (the actual beginning). But if we consider middle syllables "beginning" with a vowel: نَشْأة /naʃʔa/ ("origin"), أَفْئِدة /ʔafʔida/ ("hearts"; notice the /ʔi/ syllable; singular فُؤاد /fuʔaːd/), رُؤُوس /ruʔuːs/ ("heads", singular رَأْس /raʔs/), the situation is different, as noted above. See the comprehensive article on hamzah for more details.

Tone markers

Historically, Arabic script has been adopted and used by many tonal languages; examples include Xiao'erjing for Mandarin Chinese, as well as Ajami script adopted for writing various languages of Western Africa. However, one of the shortcomings of Arabic, especially in comparison to Latin-derived scripts or other indigenous writing systems, was that Arabic did not have a way of indicating tones.

However, in the adoption of the Arabic script for the Rohingya language, known as Rohingya Fonna, three tone markers have been developed and used in manuscripts. These tone markers form part of the standardized and accepted orthographic convention of Rohingya. This is the only known instance of tone markers within the Arabic script. [9][10]

Tone markers act as "modifiers" of vowel diacritics. In simpler words, they are "diacritics for the diacritics". They are written "outside" of the word, meaning that they are written above the vowel diacritic if the diacritic is written above the word, and below the diacritic if the diacritic is written below the word. They are only ever written where there are vowel diacritics. This is important to note, as without the diacritic present there is no way to distinguish between tone markers and i‘jām, i.e. the dots that are used for phonetic distinctions of consonants.

The Harbay, as it is called in Rohingya, is a single dot that is placed on top of Fatḥah and Ḍammah, or curly Fatḥah and curly Ḍammah (vowel diacritics unique to Rohingya), or their respective Fatḥatan and Ḍammatan versions, and it is placed underneath Kasrah or curly Kasrah
or their respective Kasratan version (e.g. د د د د). This tone marker indicates a short high tone. [9][10]

The Ṭela, as it is called in Rohingya, is two dots that are placed on top of Fatḥah and Ḍammah, or curly Fatḥah and curly Ḍammah, or their respective Fatḥatan and Ḍammatan versions, and it is placed underneath Kasrah or curly Kasrah or their respective Kasratan version (e.g. د د د). This tone marker indicates a long falling tone. [9][10]

The Ṭana, as it is called in Rohingya, is a fish-like looping line that is placed on top of Fatḥah and Ḍammah, or curly Fatḥah and curly Ḍammah, or their respective Fatḥatan and Ḍammatan versions, and it is placed underneath Kasrah or curly Kasrah or their respective Kasratan version (e.g. د د د). This tone marker indicates a long rising tone. [9][10]

History

[Image: Evolution of early Arabic calligraphy (9th–11th century). The Basmala was taken as an example, from kufic Qur'an manuscripts. (1) Early 9th century: script with no dots or diacritic marks (see image of early Basmala Kufic). (2) and (3) 9th–10th century, under the Abbasid dynasty: Abu al-Aswad's system established red dots, with each arrangement or position indicating a different short vowel; later, a second black-dot system was used to differentiate between letters like fāʼ and qāf (see image of middle Kufic). (4) 11th century: in al-Farahidi's system (the system we know today), dots were changed into shapes resembling the letters to transcribe the corresponding long vowels (see image of modern Kufic in Qur'an).]

According to tradition, the first to commission a system of ḥarakāt was Ali, who appointed Abu al-Aswad al-Du'ali for the task. Abu al-Aswad devised a system of dots to signal the three short vowels (along with their respective allophones) of Arabic. This system of dots predates the i‘jām, the dots used to distinguish between different consonants.

[Images: Early Basmala Kufic; Middle Kufic; Modern Kufic in Qur'an.]

Abu al-Aswad's system

Abu al-Aswad's system of ḥarakāt was different from the system we know today. The system used red
dots, with each arrangement or position indicating a different short vowel.

A dot above a letter indicated the vowel a, a dot below indicated the vowel i, a dot on the side of a letter stood for the vowel u, and two dots stood for the tanwīn.

However, the early manuscripts of the Qur'an did not use the vowel signs for every letter requiring them, but only for letters where they were necessary for a correct reading.

Al-Farahidi's system

The precursor to the system we know today is al-Farahidi's system. Al-Farahidi found that the task of writing using two different colours was tedious and impractical. Another complication was that the i‘jām had been introduced by then, which, while they were short strokes rather than the round dots seen today, meant that without a colour distinction the two could become confused.

Accordingly, he replaced the ḥarakāt with small superscript letters: small alif, yāʼ, and wāw for the short vowels corresponding to the long vowels written with those letters, a small s(h)īn for shaddah (geminate), and a small khāʼ for khafīf (short consonant; no longer used). His system is essentially the one we know today. [11]

Automatic diacritization

The process of automatically restoring diacritical marks is called diacritization or diacritic restoration. It is useful to avoid ambiguity in applications such as Arabic machine translation, text-to-speech, and information retrieval. Automatic diacritization algorithms have been developed. [12][13] For Modern Standard Arabic, the state-of-the-art algorithm has a word error rate (WER) of 4.79%. The most common mistakes are proper nouns and case endings. [14] Similar algorithms exist for other varieties of Arabic. [15]

See also

Arabic alphabet
I‘rāb (إِعْرَاب), the case system of Arabic
Rasm (رَسْم), the basic system of Arabic consonants
Tajwīd (تَجْوِيد), the phonetic rules of recitation of the Qur'an in Arabic
Hebrew: Hebrew diacritics, the Hebrew equivalent
Niqqud, the Hebrew equivalent of ḥarakāt
Dagesh, the Hebrew diacritic similar to Arabic i‘jām and
shaddah

References

1. Karin C. Ryding, A Reference Grammar of Modern Standard Arabic, Cambridge University Press, 2005, pp. 25–34, specifically Chapter 2, Section 4: Vowels.
2. Anatole Lyovin, Brett Kessler, William Ronald Leben, An Introduction to the Languages of the World, "5.6 Sketch of Modern Standard Arabic", Oxford University Press, 2017, p. 255, 2nd edition, specifically "5.6.2.2 Vowels".
3. Amine Bouchentouf, Arabic For Dummies, John Wiley & Sons, 2018, 3rd edition, specifically section "All About Vowels".
4. "Introduction to Written Arabic". University of Victoria, Canada.
5. "Arabic character notes". r12a.
6. Ibn Warraq (2002). Ibn Warraq (ed.). What the Koran Really Says: Language, Text & Commentary. Translated by Ibn Warraq. New York: Prometheus. p. 64. ISBN 1-57392-945-X. Archived from the original on 11 April 2019. Retrieved 9 April 2019.
7. Gacek, Adam (2009). "Unpointed letters". Arabic Manuscripts: A Vademecum for Readers. BRILL. p. 286. ISBN 978-90-04-17036-0.
8. Gacek, Adam (1989). "Technical Practices and Recommendations Recorded by Classical and Post-Classical Arabic Scholars Concerning the Copying and Correction of Manuscripts" (PDF). In Déroche, François (ed.). Les manuscrits du Moyen-Orient: essais de codicologie et de paléographie. Actes du colloque d'Istanbul (Istanbul, 26–29 mai 1986). p. 57 (§ 8. Diacritical marks and vowelisation).
9. Priest, Lorna A.; Hosken, Martin (10 August 2010). "Proposal to add Arabic script characters for African and Asian languages" (PDF). The Unicode Consortium. Archived (PDF) from the original on 8 October 2022. Retrieved 5 May 2023.
10. Pandey, Anshuman (27 October 2015). "Proposal to encode the Hanifi Rohingya script in Unicode" (PDF). The Unicode Consortium. Archived (PDF) from the original on 12 December 2019. Retrieved 5 May 2023.
11. Versteegh, C. H. M. (1997). The Arabic Language. Columbia University Press. pp. 56ff. ISBN 978-0-231-11152-2.
12. Azmi, Aqil M.; Almajed, Reham S. (2013-10-10). "A survey of automatic Arabic diacritization techniques". Natural Language Engineering. 21 (3): 477–495. doi:10.1017/S1351324913000284. ISSN 1351-3249. S2CID 31560671.
13. Almanea,
Manar (2021). "Automatic Methods and Neural Networks in Arabic Texts Diacritization: A Comprehensive Survey". IEEE Access. 9: 145012–145032. doi:10.1109/ACCESS.2021.3122977. ISSN 2169-3536. S2CID 240011970.
14. Thompson, Brian; Alshehri, Ali (2021-09-28). "Improving Arabic Diacritization by Learning to Diacritize and Translate". arXiv:2109.14150 [cs.CL].
15. Masmoudi, Abir; Aloulou, Chafik; Abdellahi, Abdel Ghader Sidi; Belguith, Lamia Hadrich (2021-08-08). "Automatic diacritization of Tunisian dialect text using SMT model". International Journal of Speech Technology. 25: 89–104. doi:10.1007/s10772-021-09864-6. ISSN 1572-8110. S2CID 238782966.

Alexis Neme and Sébastien Paumier (2019). "Restoring Arabic vowels through omission-tolerant dictionary lookup". Lang Resources & Evaluation. Vol. 53, pp. 1–65.

Retrieved from https://en.wikipedia.org/w/index.php?title=Arabic_diacritics&oldid=1179600583
