Linguistics

Quranic corpus analysis, classical Arabic dictionaries, and phonosemantic research — lisanarab.com

lisanarab.com — the world's most comprehensive integration of a sacred text corpus with classical lexicography and phonosemantic theory.

The Quranic Corpus

Aralex provides a complete morphological analysis of every word in the Quran: 130,030 morpheme segments across 77,429 words in 6,236 verses. Each word is analysed at the morpheme level rather than only at the root level, with a 99.9% root extraction success rate covering about 1,900 unique roots.

Why the Quranic Corpus Matters Linguistically

The Quran is the most authoritative corpus of Classical Arabic (610–632 CE). It can be dated precisely to 7th-century Hijazi Arabic, and its text has remained exceptionally stable through oral and written transmission and through the standardised Uthmani orthography (codified around 650 CE).

Its stylistic richness covers narrative, legislative, theological, exhortative, and argumentative discourse — making it the ideal corpus for studying the full range of Classical Arabic expression.

Segment-Level Granularity

Each Quranic word is broken into its constituent morphemes. For example, the word بِسْمِ (in the name of) is analysed as:

This granularity makes it possible to distinguish root letters from grammatical affixes, track root-pattern combinations (أوزان), identify phonological changes, and analyse morphological productivity statistically.
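As a minimal sketch of what segment-level storage could look like, the Python below models a word as a list of morpheme segments. The class and field names (`Segment`, `Word`, `kind`, `pos`) are illustrative assumptions, not Aralex's actual schema, and the segmentation of بِسْمِ is the conventional one (a preposition prefix plus a noun stem with root س م و) rather than output copied from the platform.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Segment:
    """One morpheme segment of a word (field names are hypothetical)."""
    form: str                   # surface form of the segment
    kind: str                   # "prefix", "stem", or "suffix"
    pos: str                    # part-of-speech tag, e.g. "P" or "N"
    root: Optional[str] = None  # root letters, set only on stems that have a root


@dataclass(frozen=True)
class Word:
    location: str               # "sura:verse:word", e.g. "1:1:1"
    segments: List[Segment]

    def surface(self) -> str:
        # Concatenating the segment forms reproduces the written word.
        return "".join(s.form for s in self.segments)

    def root(self) -> Optional[str]:
        # The root is read from the stem; affixes never carry one.
        for s in self.segments:
            if s.kind == "stem" and s.root:
                return s.root
        return None

    def affixes(self) -> List[Segment]:
        # Everything that is not the stem counts as a grammatical affix.
        return [s for s in self.segments if s.kind != "stem"]


# Conventional two-segment analysis of the first word of the Quran
# (an assumption for illustration, not Aralex output).
bismi = Word(
    location="1:1:1",
    segments=[
        Segment(form="بِ", kind="prefix", pos="P"),
        Segment(form="سْمِ", kind="stem", pos="N", root="سمو"),
    ],
)

print(bismi.surface(), bismi.root(), [a.pos for a in bismi.affixes()])
```

Keeping the root on the stem segment, rather than on the word, is one way to separate root letters from affixes without a second lookup table.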

Phonosemantic Theory

Arabic phonosemantic theory (الاشتقاق الأكبر) proposes that individual Arabic letters carry inherent semantic values that contribute to word meanings. The framework rests on classical pillars:

Modern Systematic Mappings

The most comprehensive contemporary framework (Hassan Abbas, 1998) maps all 28 Arabic letters to semantic fields organised by articulatory features:

For example, the letter ق (qaf) maps to hardness, intensity, and constriction, while the letter ر (ra') maps to vibration, movement, and repetition.

Quantitative Validation

Aralex enables quantitative testing of phonosemantic claims against all of the roughly 1,900 Quranic roots. For any hypothesis, such as Hassan Abbas's claim that ك (kaf) signifies containment, enclosure, and grasping, we extract all roots containing the letter, measure their semantic overlap with the proposed field, and calculate statistical significance.

Roots such as كتب (writing = grasping ideas), ملك (kingship = grasping authority), and مسك (holding = physical grasping) support the hypothesis, while counter-examples such as ذكر (remembrance) help refine it. This is the first platform that moves phonosemantics from anecdotal evidence to corpus-based validation.
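The three steps described above (extract roots containing the letter, measure overlap with the proposed field, calculate significance) can be sketched as follows. This is only one possible reading: the source does not say which statistic Aralex uses, so the one-sided hypergeometric test here is an assumption, as are the function names. The field labels in `toy_labels` are a hand-made toy sample for illustration, not real annotation data.

```python
from math import comb
from typing import Dict, Set


def roots_with_letter(roots: Set[str], letter: str) -> Set[str]:
    """Step 1: all roots that contain the letter in any position."""
    return {r for r in roots if letter in r}


def hypergeom_tail(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) when n roots are drawn without replacement from N roots,
    K of which belong to the semantic field (one-sided exact test)."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def evaluate_letter(field_labels: Dict[str, bool], letter: str) -> dict:
    """Steps 2 and 3: count the overlap and compute its p-value."""
    roots = set(field_labels)
    with_letter = roots_with_letter(roots, letter)
    N = len(roots)                                  # all roots in the sample
    K = sum(field_labels.values())                  # roots labelled as in the field
    n = len(with_letter)                            # roots containing the letter
    k = sum(field_labels[r] for r in with_letter)   # overlap of the two
    return {"N": N, "K": K, "n": n, "k": k, "p_value": hypergeom_tail(N, K, n, k)}


# Toy sample: True means "labelled as belonging to the grasping/containment field".
# The first four roots appear in the text; the rest are hypothetical filler labels.
toy_labels = {
    "كتب": True, "ملك": True, "مسك": True, "ذكر": False,
    "قبض": True, "نور": False, "بحر": False,
    "علم": False, "رحم": False, "سمو": False,
}

result = evaluate_letter(toy_labels, "ك")
print(result)
```

On a sample this small the p-value is far from conventional significance thresholds, which is the point of running the test over the full root inventory rather than a handful of favourable examples.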

Six Classical Arabic Dictionaries

30,310 entries spanning 500+ years of scholarship, searchable simultaneously:

Triple Validation

For any Arabic root, Aralex enables validation across three layers simultaneously:

When all three layers align, confidence in semantic analysis is high. When they diverge, the anomalies trigger new research questions.
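The align-or-diverge check can be sketched in a few lines. The layer names used here (`quranic_usage`, `dictionaries`, `phonosemantics`) are assumptions inferred from the platform description, and the semantic tags are hypothetical placeholders, not Aralex data.

```python
from typing import Dict, Set


def compare_layers(layers: Dict[str, Set[str]]) -> dict:
    """Report the tags shared by every layer and the layer pairs with no overlap."""
    names = sorted(layers)
    shared = set.intersection(*(layers[name] for name in names))
    disagreements = [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if not (layers[a] & layers[b])
    ]
    return {"aligned": bool(shared), "shared": shared, "disagreements": disagreements}


# Hypothetical tags for one root whose layers agree ...
aligned_root = compare_layers({
    "quranic_usage": {"writing", "prescribing"},
    "dictionaries": {"writing", "binding"},
    "phonosemantics": {"grasping", "writing"},
})

# ... and for one where the phonosemantic prediction stands apart.
divergent_root = compare_layers({
    "quranic_usage": {"remembrance"},
    "dictionaries": {"remembrance", "mention"},
    "phonosemantics": {"grasping"},
})

print(aligned_root["aligned"], divergent_root["disagreements"])
```

Listing the disagreeing pairs, rather than returning a single yes/no flag, shows which layer is the outlier and therefore where a follow-up question would start.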

Technology

The platform is built on Haskell/Servant (backend) and PureScript/Halogen (frontend) with four optimised SQLite databases. It offers sub-100 ms response times, full right-to-left (RTL) support, and Arabic typography set in the KFGQPC Uthmanic HAFS font for authentic Quranic rendering.
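To illustrate how several SQLite databases can be queried together by root, here is a small sketch using Python's standard `sqlite3` module with in-memory databases. The production backend is Haskell, so this is not the platform's code; the schema names (`words`, `lexicon.entries`), the dictionary labels, and the definitions are all hypothetical placeholders.

```python
import sqlite3

# One connection, with a second database attached under the schema name "lexicon".
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS lexicon")

# Hypothetical schemas: corpus words in the main database, dictionary entries in the other.
conn.execute("CREATE TABLE main.words (location TEXT PRIMARY KEY, form TEXT, root TEXT)")
conn.execute("CREATE TABLE lexicon.entries (dictionary TEXT, root TEXT, definition TEXT)")

# Indexing the root column keeps lookups by root fast as the tables grow.
conn.execute("CREATE INDEX main.idx_words_root ON words(root)")
conn.execute("CREATE INDEX lexicon.idx_entries_root ON entries(root)")

conn.execute("INSERT INTO main.words VALUES (?, ?, ?)", ("1:1:1", "بِسْمِ", "سمو"))
conn.executemany(
    "INSERT INTO lexicon.entries VALUES (?, ?, ?)",
    [
        ("dictionary_a", "سمو", "placeholder definition 1"),
        ("dictionary_b", "سمو", "placeholder definition 2"),
    ],
)
conn.commit()


def lookup(connection: sqlite3.Connection, root: str):
    """Return corpus locations and dictionary entries for one root."""
    locations = connection.execute(
        "SELECT location FROM main.words WHERE root = ? ORDER BY location", (root,)
    ).fetchall()
    definitions = connection.execute(
        "SELECT dictionary, definition FROM lexicon.entries WHERE root = ? ORDER BY dictionary",
        (root,),
    ).fetchall()
    return [row[0] for row in locations], definitions


print(lookup(conn, "سمو"))
```

Attaching databases to a single connection lets one request read from the corpus and the dictionaries without opening a separate connection for each file.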

Visit lisanarab.com →