Linguistics

lisanarab.com — the world's most comprehensive integration of a sacred text corpus with classical lexicography and phonosemantic theory.

The Quranic Corpus

Aralex provides complete morphological analysis of every word in the Quran: 130,030 morpheme segments across 77,429 words in 6,236 verses. Each word is analysed at the morpheme level rather than only at the root level, and root extraction succeeds in 99.9% of cases, covering roughly 1,900 unique roots.

Why the Quranic Corpus Matters Linguistically

The Quran is widely regarded as the most authoritative corpus of Classical Arabic (610–632 CE). It can be dated precisely to 7th-century Hijazi Arabic, and its text has been transmitted with exceptional stability, both orally and in writing, in the standardised Uthmani script orthography (codified c. 650 CE).

Its stylistic richness covers narrative, legislative, theological, exhortative, and argumentative discourse — making it the ideal corpus for studying the full range of Classical Arabic expression.

Segment-Level Granularity

Each Quranic word is broken into its constituent morphemes. For example, the word بِسْمِ (in the name of) is analysed as:

بِ — Preposition (prefix meaning "with, in, by")
سْمِ — Noun, genitive case (root سمو, lemma اسْم)

This granularity makes it possible to distinguish root letters from grammatical affixes, track root-pattern combinations (أوزان), identify phonological changes, and measure morphological productivity statistically.
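As a rough illustration of what a segment-level record might look like, the sketch below models the بِسْمِ example in Python. The `Segment` class, its field names, and the `split_affixes` helper are assumptions made for illustration, not Aralex's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Segment:
    """One morpheme segment of a Quranic word (hypothetical record layout)."""
    form: str                     # surface text of the morpheme
    kind: str                     # "prefix", "stem" or "suffix"
    pos: str                      # part-of-speech label
    root: Optional[str] = None    # set only on root-bearing stems
    lemma: Optional[str] = None   # dictionary form, when the segment has one


# The word analysed above: a prepositional prefix followed by a noun stem.
bismi = [
    Segment(form="بِ", kind="prefix", pos="preposition"),
    Segment(form="سْمِ", kind="stem", pos="noun", root="سمو", lemma="اسْم"),
]


def split_affixes(segments):
    """Separate root-bearing stems from grammatical affixes."""
    stems = [s for s in segments if s.root is not None]
    affixes = [s for s in segments if s.root is None]
    return stems, affixes


stems, affixes = split_affixes(bismi)
roots = [s.root for s in stems]   # the roots carried by this word
```

Keeping the root on the stem segment only is one way to let later steps count roots without being misled by prefixes and suffixes.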

Phonosemantic Theory

Arabic phonosemantic theory (الاشتقاق الأكبر) proposes that individual Arabic letters carry inherent semantic values that contribute to word meanings. The framework rests on classical pillars:

Al-Khalil ibn Ahmad al-Farahidi (718–786) — pioneered systematic organisation of Arabic letters by articulation points
Ibn Jinni (932–1002) — formalised the theory of مناسبة الحروف للمعاني (appropriateness of letters to meanings)
Ibn Faris (d. 1004) — developed الدلالة المحورية (axial semantics)

Modern Systematic Mappings

The most comprehensive contemporary framework (Hassan Abbas, 1998) maps all 28 Arabic letters to semantic fields organised by articulatory features:

Gutturals (ء، ه، ع، ح، غ، خ) — convey depth, profundity, metaphysical concepts
Labials (ب، م، ف، و) — correlate with enclosure, gathering, containment
Dentals & Linguals (ط، د، ت، ث، ذ، ل، ن) — associate with precision, cutting, definition

For example, the letter ق (qaf) maps to hardness, intensity, constriction. The letter ر (ra') maps to vibration, movement, repetition.
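A mapping like this can be written down as a small lookup table. The sketch below encodes only the three articulatory classes and two letter fields quoted above, so letters outside those groups come back as `None`. The names `CLASSES`, `LETTER_FIELDS`, `classify`, and `profile` are hypothetical, and the full 28-letter mapping is not reproduced here.

```python
# Articulatory classes and their proposed semantic fields, as listed above.
CLASSES = {
    "guttural": (["ء", "ه", "ع", "ح", "غ", "خ"],
                 "depth, profundity, metaphysical concepts"),
    "labial": (["ب", "م", "ف", "و"],
               "enclosure, gathering, containment"),
    "dental_lingual": (["ط", "د", "ت", "ث", "ذ", "ل", "ن"],
                       "precision, cutting, definition"),
}

# Individual letter fields quoted in the text.
LETTER_FIELDS = {
    "ق": "hardness, intensity, constriction",
    "ر": "vibration, movement, repetition",
}


def classify(letter):
    """Return the articulatory class of a letter, or None if it is not listed."""
    for name, (letters, _field) in CLASSES.items():
        if letter in letters:
            return name
    return None


def profile(root):
    """Pair each letter of a root with its articulatory class."""
    return [(letter, classify(letter)) for letter in root]


katab_profile = profile("كتب")
```

For the root كتب, the first letter falls outside the three listed classes, while ت and ب are classified as dental/lingual and labial respectively.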

Quantitative Validation

Aralex enables quantitative testing of phonosemantic claims against all ~1,900 Quranic roots. For any hypothesis, such as Hassan Abbas's claim that ك (kaf) signifies containment, enclosure, and grasping, we extract all roots containing the letter, measure their semantic overlap with the proposed field, and calculate statistical significance.

Confirming roots such as كتب (writing = grasping ideas), ملك (kingship = grasping authority), and مسك (holding = physical grasping) support the hypothesis, while counter-examples such as ذكر (remembrance) refine it. To our knowledge, this is the first platform that moves phonosemantics from anecdotal evidence to corpus-based validation.
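The three steps (extract, measure overlap, test significance) can be sketched as follows. This is a minimal illustration under stated assumptions, not Aralex's method: the significance test is a one-sided hypergeometric tail chosen for this sketch, the five-root inventory is a toy drawn from roots mentioned on this page, and treating سمو as outside the "grasping" field is an assumption made only for the example.

```python
from math import comb


def roots_with_letter(roots, letter):
    """Step 1: extract every root that contains the letter."""
    return [r for r in roots if letter in r]


def overlap(candidates, in_field):
    """Step 2: count candidates judged to fall in the proposed semantic field."""
    hits = sum(1 for r in candidates if r in in_field)
    return hits, len(candidates)


def hypergeom_tail(N, K, n, k):
    """Step 3: P(X >= k) when n of N roots are drawn and K of the N lie in the field."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Toy inventory (assumption): four kaf roots from the text plus one root without kaf.
roots = ["كتب", "ملك", "مسك", "ذكر", "سمو"]
in_field = {"كتب", "ملك", "مسك"}   # labelled "grasping" per the examples above

candidates = roots_with_letter(roots, "ك")
hits, total = overlap(candidates, in_field)
p_value = hypergeom_tail(N=len(roots), K=len(in_field), n=total, k=hits)
```

On this toy set three of the four kaf roots fall in the field, yet the tail probability is 0.4. A handful of confirming roots is therefore far from significant, which is why the test needs the full root inventory.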

Six Classical Arabic Dictionaries

30,310 entries spanning 500+ years of scholarship, searchable simultaneously:

Kitab al-Ain (الخليل بن أحمد الفراهيدي, 786 CE) — 2,707 entries, the oldest comprehensive Arabic dictionary
Al-Sihah (الجوهري, 1003 CE) — 5,594 entries
Maqayis al-Lugha (ابن فارس, 1004 CE) — 4,794 entries, focus on semantic roots
Al-Muhkam (ابن سيده, 1066 CE) — 6,584 entries, largest scope
Al-Mufradat fi Gharib al-Quran (الراغب الأصفهاني, 1108 CE) — 1,602 entries, specialised in Quranic terminology
Lisan al-Arab (ابن منظور, 1311 CE) — 9,029 entries, the most comprehensive classical Arabic dictionary
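The headline figures can be checked directly against the per-dictionary counts. The short sketch below does that arithmetic using the years and entry counts listed above. The variable names are illustrative only.

```python
# (title, year CE as listed above, number of entries)
DICTIONARIES = [
    ("Kitab al-Ain", 786, 2707),
    ("Al-Sihah", 1003, 5594),
    ("Maqayis al-Lugha", 1004, 4794),
    ("Al-Muhkam", 1066, 6584),
    ("Al-Mufradat fi Gharib al-Quran", 1108, 1602),
    ("Lisan al-Arab", 1311, 9029),
]

total_entries = sum(count for _, _, count in DICTIONARIES)
years = [year for _, year, _ in DICTIONARIES]
span_years = max(years) - min(years)
largest = max(DICTIONARIES, key=lambda d: d[2])[0]

print(total_entries, span_years, largest)
```

The counts sum to 30,310 and the listed dates span 525 years, consistent with the "500+ years" figure. Lisan al-Arab contributes the most entries.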

Triple Validation

For any Arabic root, Aralex enables validation across three layers simultaneously:

Dictionary layer — classical scholarly definitions from six authoritative sources
Corpus layer — actual Quranic usage patterns, frequencies, and contextual semantics
Phonosemantic layer — letter-meaning explanations and statistical validation

When all three layers align, confidence in semantic analysis is high. When they diverge, the anomalies trigger new research questions.
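One simple way to express this decision rule is to record, for a given root and proposed meaning, whether each layer supports it, and then report which layers dissent. The function, the verdict labels, and the example values below are hypothetical and only illustrate the logic described above.

```python
LAYERS = ("dictionary", "corpus", "phonosemantic")


def assess(evidence):
    """Return (verdict, dissenting_layers) for one root and one proposed meaning.

    `evidence` maps each layer name to True if that layer supports the meaning.
    """
    dissenting = [layer for layer in LAYERS if not evidence[layer]]
    verdict = "high confidence" if not dissenting else "investigate"
    return verdict, dissenting


# All three layers agree.
aligned = assess({"dictionary": True, "corpus": True, "phonosemantic": True})

# Illustrative values: the phonosemantic layer disagrees with the other two.
divergent = assess({"dictionary": True, "corpus": True, "phonosemantic": False})
```

Returning the dissenting layers, rather than a bare yes or no, keeps the anomaly visible so it can be followed up as a research question.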

Technology

Aralex is built on Haskell/Servant (backend) and PureScript/Halogen (frontend), backed by four optimised SQLite databases. It offers sub-100ms response times, full right-to-left (RTL) support, and Arabic typography set in the KFGQPC Uthmanic Script HAFS font for authentic Quranic rendering.
