lisanarab.com — the world's most comprehensive integration of a sacred text corpus with classical lexicography and phonosemantic theory.
The Quranic Corpus
Aralex provides a morphological analysis of every word in the Quran: 130,030 morpheme segments across 77,429 words in 6,236 verses. The analysis goes down to the morpheme level rather than stopping at the root, with a 99.9% root extraction success rate covering about 1,900 unique roots.
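As a quick sanity check, the averages implied by these figures can be worked out directly. The sketch below uses only the three counts quoted above; the variable names are illustrative and not part of Aralex.

```python
# Counts quoted in the text above.
SEGMENTS = 130_030
WORDS = 77_429
VERSES = 6_236

# Average number of morpheme segments per word.
segments_per_word = SEGMENTS / WORDS
# Average number of words per verse.
words_per_verse = WORDS / VERSES

print(f"{segments_per_word:.2f} segments per word")
print(f"{words_per_verse:.2f} words per verse")
```

The result is roughly 1.7 segments per word and about 12 words per verse, so a typical word carries at least one affix or clitic in addition to its stem.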
Why the Quranic Corpus Matters Linguistically
The Quran is the most authoritative corpus of Classical Arabic (610–632 CE). It can be dated precisely to 7th-century Hijazi Arabic, and its text has remained exceptionally stable through oral and written transmission and the standardised Uthmani orthography (codified c. 650 CE).
Its stylistic richness covers narrative, legislative, theological, exhortative, and argumentative discourse — making it the ideal corpus for studying the full range of Classical Arabic expression.
Segment-Level Granularity
Each Quranic word is broken into its constituent morphemes. For example, the word بِسْمِ (in the name of) is analysed as:
- بِ — Preposition (prefix meaning "with, in, by")
- سْمِ — Noun, genitive case (root سمو, lemma اسْم)
This granularity makes it possible to distinguish root letters from grammatical affixes, track root-pattern combinations (أوزان), identify phonological changes, and analyse morphological productivity statistically.
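One way to picture segment-level data is as a small record per morpheme. The following is a minimal sketch under stated assumptions: the `Segment` type, its field names, and the `PREP`/`NOUN` tags are hypothetical and do not describe Aralex's actual schema. Only the two segments and their root, lemma, and case come from the example above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Segment:
    """One morpheme of a word (hypothetical schema, for illustration only)."""
    form: str                    # surface form, with diacritics
    pos: str                     # coarse part-of-speech tag (assumed tag set)
    role: str                    # "prefix", "stem", or "suffix"
    root: Optional[str] = None   # only stems carry a root
    lemma: Optional[str] = None
    case: Optional[str] = None


# The two segments given in the text for the word meaning "in the name of".
bismi: List[Segment] = [
    Segment(form="بِ", pos="PREP", role="prefix"),
    Segment(form="سْمِ", pos="NOUN", role="stem",
            root="سمو", lemma="اسْم", case="genitive"),
]


def surface(segments: List[Segment]) -> str:
    """Rebuild the written word by concatenating segment forms."""
    return "".join(s.form for s in segments)


def roots(segments: List[Segment]) -> List[str]:
    """Collect the roots carried by the word's segments."""
    return [s.root for s in segments if s.root is not None]


def affixes(segments: List[Segment]) -> List[Segment]:
    """Everything that is not the stem, i.e. the grammatical material."""
    return [s for s in segments if s.role != "stem"]


print(surface(bismi), roots(bismi), [a.pos for a in affixes(bismi)])
```

Keeping the root on the stem segment only is what lets root letters be separated from affix letters without re-parsing the word.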
Phonosemantic Theory
Arabic phonosemantic theory (الاشتقاق الأكبر, "the greater derivation") proposes that individual Arabic letters carry inherent semantic values that contribute to word meanings. The framework rests on three classical pillars:
- Al-Khalil ibn Ahmad al-Farahidi (718–786) — pioneered systematic organisation of Arabic letters by articulation points
- Ibn Jinni (932–1002) — formalised the theory of مناسبة الحروف للمعاني (appropriateness of letters to meanings)
- Ibn Faris (d. 1004) — developed الدلالة المحورية (axial semantics)
Modern Systematic Mappings
The most comprehensive contemporary framework (Hassan Abbas, 1998) maps all 28 Arabic letters to semantic fields organised by articulatory features:
- Gutturals (ء، ه، ع، ح، غ، خ) — convey depth, profundity, metaphysical concepts
- Labials (ب، م، ف، و) — correlate with enclosure, gathering, containment
- Dentals & Linguals (ط، د، ت، ث، ذ، ل، ن) — associate with precision, cutting, definition
For example, the letter ق (qaf) maps to hardness, intensity, constriction. The letter ر (ra') maps to vibration, movement, repetition.
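The groupings above can be written down as a simple lookup table. This is a sketch only: the dictionary layout and the function names are assumptions made for illustration, and the table holds just the 17 letters and three fields listed in this section, not the full 28-letter mapping.

```python
# Articulatory groups and their proposed semantic fields, as listed above.
ARTICULATORY_GROUPS = {
    "guttural": ("ءهعحغخ", "depth, profundity, metaphysical concepts"),
    "labial": ("بمفو", "enclosure, gathering, containment"),
    "dental_lingual": ("طدتثذلن", "precision, cutting, definition"),
}

# Individual letter mappings quoted as examples in the text.
LETTER_FIELDS = {
    "ق": "hardness, intensity, constriction",
    "ر": "vibration, movement, repetition",
}


def group_of(letter):
    """Return the group name for a letter, or None if it is not listed here."""
    for name, (letters, _field) in ARTICULATORY_GROUPS.items():
        if letter in letters:
            return name
    return None


def root_profile(root):
    """Pair each letter of a root with its articulatory group (or None)."""
    return [(letter, group_of(letter)) for letter in root]


listed = sum(len(letters) for letters, _ in ARTICULATORY_GROUPS.values())
print(listed, "letters listed;", root_profile("كتب"))
```

Letters that fall outside the three listed groups, such as ك, come back as `None`, which makes the gaps in a partial table explicit rather than silently guessing a field.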
Quantitative Validation
Aralex enables quantitative testing of phonosemantic claims against the roughly 1,900 Quranic roots. For any hypothesis, such as Hassan Abbas's claim that ك (kaf) signifies containment, enclosure, and grasping, we extract all roots containing the letter, measure their semantic overlap with the proposed field, and calculate statistical significance.
Confirming roots such as كتب (writing = grasping ideas), ملك (kingship = grasping authority), and مسك (holding = physical grasping) support the hypothesis, while counter-examples such as ذكر (remembrance) refine it. This is the first platform that moves phonosemantics from anecdotal evidence to corpus-based validation.
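The three steps (extract, measure overlap, test significance) can be sketched on a toy scale. Everything beyond the four example roots is an assumption: the function names are hypothetical, the baseline rate of 0.25 is invented purely for illustration, and a one-sided binomial test is just one plausible choice of statistic, not necessarily the one Aralex uses. Four roots are far too few for a real conclusion; the point is the shape of the calculation.

```python
from math import comb

# Judgements taken from the examples in the text: does the root's meaning
# fall inside the proposed "containment, enclosure, grasping" field?
JUDGEMENTS = {"كتب": True, "ملك": True, "مسك": True, "ذكر": False}

# Toy root list: the four roots above plus سمو, which does not contain kaf.
ROOTS = list(JUDGEMENTS) + ["سمو"]

# ASSUMPTION: share of all roots that would match the field by chance.
BASELINE_RATE = 0.25


def roots_containing(roots, letter):
    """Step 1: keep only the roots that contain the letter under test."""
    return [r for r in roots if letter in r]


def binomial_tail(k, n, p0):
    """Step 3: P(X >= k) for X ~ Binomial(n, p0), a one-sided p-value."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


kaf_roots = roots_containing(ROOTS, "ك")
hits = sum(JUDGEMENTS[r] for r in kaf_roots)   # Step 2: overlap with the field
p_value = binomial_tail(hits, len(kaf_roots), BASELINE_RATE)

print(f"{hits}/{len(kaf_roots)} roots match; one-sided p = {p_value:.4f}")
```

With the full root inventory, the baseline rate would be estimated from roots that do not contain the letter rather than assumed.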
Six Classical Arabic Dictionaries
30,310 entries spanning more than 500 years of scholarship, searchable simultaneously (the date given for each dictionary is its author's year of death):
- Kitab al-Ain (الخليل بن أحمد الفراهيدي, 786 CE) — 2,707 entries, the oldest comprehensive Arabic dictionary
- Al-Sihah (الجوهري, 1003 CE) — 5,594 entries
- Maqayis al-Lugha (ابن فارس, 1004 CE) — 4,794 entries, focus on semantic roots
- Al-Muhkam (ابن سيده, 1066 CE) — 6,584 entries, largest scope
- Al-Mufradat fi Gharib al-Quran (الراغب الأصفهاني, 1108 CE) — 1,602 entries, specialised in Quranic terminology
- Lisan al-Arab (ابن منظور, 1311 CE) — 9,029 entries, the most comprehensive classical Arabic dictionary
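The headline figures can be checked against the per-dictionary numbers in the list. The sketch below simply re-adds the entry counts and subtracts the earliest date from the latest; the dictionary keys are the titles as given above.

```python
# (date, entry count) for each dictionary, copied from the list above.
DICTIONARIES = {
    "Kitab al-Ain": (786, 2_707),
    "Al-Sihah": (1003, 5_594),
    "Maqayis al-Lugha": (1004, 4_794),
    "Al-Muhkam": (1066, 6_584),
    "Al-Mufradat fi Gharib al-Quran": (1108, 1_602),
    "Lisan al-Arab": (1311, 9_029),
}

total_entries = sum(count for _date, count in DICTIONARIES.values())
dates = [date for date, _count in DICTIONARIES.values()]
span_years = max(dates) - min(dates)

print(f"{total_entries:,} entries over {span_years} years")
```

The counts add up to the stated 30,310 entries, and the dates span 525 years, consistent with "500+".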
Triple Validation
For any Arabic root, Aralex enables validation across three layers simultaneously:
- Dictionary layer — classical scholarly definitions from six authoritative sources
- Corpus layer — actual Quranic usage patterns, frequencies, and contextual semantics
- Phonosemantic layer — letter-meaning explanations and statistical validation
When all three layers align, confidence in semantic analysis is high. When they diverge, the anomalies trigger new research questions.
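The alignment check can be pictured as a small decision rule over the three layers. This is a hypothetical sketch: the `assess` function, the boolean "supports the proposed meaning" inputs, and the status labels are assumptions for illustration, since the text does not say how Aralex scores each layer.

```python
LAYERS = ("dictionary", "corpus", "phonosemantic")


def assess(support):
    """Summarise whether the three layers agree on a proposed meaning.

    `support` maps each layer name to True if that layer backs the meaning.
    Returns a status label and the list of dissenting layers.
    """
    agreeing = [layer for layer in LAYERS if support[layer]]
    dissenting = [layer for layer in LAYERS if not support[layer]]
    if len(agreeing) == len(LAYERS):
        status = "aligned"        # high confidence
    elif not agreeing:
        status = "unsupported"    # no layer backs the meaning
    else:
        status = "divergent"      # anomaly worth investigating
    return status, dissenting


print(assess({"dictionary": True, "corpus": True, "phonosemantic": True}))
print(assess({"dictionary": True, "corpus": True, "phonosemantic": False}))
```

Returning the dissenting layers alongside the status keeps the anomaly visible, so a divergent result points directly at the layer that prompts the follow-up question.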
Technology
Aralex is built on Haskell/Servant (backend) and PureScript/Halogen (frontend) with four optimised SQLite databases. It offers sub-100ms response times, full right-to-left (RTL) support, and Arabic typography set in the KFGQPC Uthmanic HAFS font for authentic Quranic rendering.