INDO-EUROPEAN LANGUAGES

Indo-European is the name of a family of languages that by 1000 BC were spoken over most of Europe and in much of Southwest and South Asia; from the second half of the 15th century the Indo-European tongues have spread to most other inhabited parts of the world. The term Indo-Hittite is used by scholars who believe that Hittite and the other Anatolian languages (see below) are not just one branch of Indo-European but rather a branch coordinate with all the rest put together; thus, Indo-Hittite has been used for a family consisting of Indo-European proper plus Anatolian. As long as this view is neither definitively proved nor disproved, it is convenient to keep the traditional use of the term Indo-European.

Languages of the family.

The well-attested languages of the Indo-European family fall fairly neatly into the 10 main branches listed below; these are arranged according to the age of their oldest sizable texts.

Anatolian.

Now extinct, Anatolian was spoken during the 1st and 2nd millennia BC in what is presently Asian Turkey and northern Syria. By far the best known of its members is Hittite, the official language of the Hittite empire, which flourished in the 2nd millennium. Very few Hittite texts were known before 1906, and their interpretation as Indo-European was not generally accepted until after 1915; the integration of Hittite data into Indo-European comparative grammar has, therefore, been one of the principal developments of Indo-European studies in the 20th century. The oldest Hittite texts date from the 17th century BC, the latest from the 13th. For more information, see below Anatolian languages.

Indo-Iranian.

Indo-Iranian comprises two main subbranches, Indo-Aryan (Indic) and Iranian. Indo-Aryan languages have been spoken in what is now northern and central India and Pakistan since before 1000 BC. Aside from a very poorly known dialect spoken in or near northern Iraq during the 2nd millennium BC, the oldest record of an Indo-Aryan language is the Vedic Sanskrit of the Rigveda (Rgveda), the oldest of the sacred scriptures of India, dating roughly from 1000 BC. Examples of modern Indo-Aryan languages are Hindi, Bengali, Sinhalese (spoken in Sri Lanka), and the many dialects of Romany, the language of the Gypsies (Rom). Iranian languages were spoken in the 1st millennium BC in present-day Iran and Afghanistan and also in the steppes to the north, from modern Hungary to East (Chinese) Turkistan. The only well-known ancient varieties are Avestan, the sacred language of the Zoroastrians (Parsees), and Old Persian, the official language of Darius I (ruled 522-486 BC) and Xerxes I (486-465 BC) and their successors. Some modern Iranian languages are Persian (Farsi), Pashto (Afghan), Kurdish, and Ossetic. For more information, see below Indo-Iranian languages.

Greek.

Greek, despite its numerous dialects, has been a single language throughout its history. It has been spoken in Greece since at least 1600 BC, and, in all probability, since the end of the 3rd millennium. The earliest texts may be the Linear B tablets, some of which may date from as far back as 1400 BC (the date is disputed), and some of which certainly date to 1200 BC. This material, very sparse and difficult to interpret, was deciphered as Greek in 1952, though some scholars dispute the finding. The Homeric epics--the Iliad and the Odyssey--composed for the most part in the 8th century BC, are the oldest texts of any bulk. For more information, see below Greek language.

Italic.

The principal language of the Italic group is Latin, originally the speech of the city of Rome and the ancestor of the modern Romance languages: Italian, Romanian, Spanish, Portuguese, French, etc. The earliest Latin inscriptions apparently date from the 6th century BC, with literature beginning in the 3rd century. Scholars are not in agreement as to how many other ancient languages of Italy and Sicily belong in the same branch as Latin. For more information on Latin, the languages derived from it, and the other languages that belong to or are sometimes included in the Italic branch of Indo-European, see below Italic languages; Romance languages.

Germanic.

In the middle of the 1st millennium BC, Germanic tribes lived in southern Scandinavia and northern Germany. Their expansions and migrations from the 2nd century BC onward are largely recorded in history. The oldest Germanic language of which much is known is the Gothic of the 4th century AD. Other languages include English, German, Dutch, Danish, Swedish, Norwegian, and Icelandic. For more information on the Germanic languages, see below Germanic languages; English language.

Armenian.

Armenian, like the Greek tongue, is a single language. Speakers of Armenian are recorded as being in what now constitutes eastern Turkey and Armenia as early as the 6th century BC, but the oldest Armenian texts date from the 5th century AD. For more information, see below Armenian language.

Tocharian.

Tocharian, now extinct, was spoken in present-day Chinese Turkistan in the 1st millennium AD. Two distinct languages are known, labelled A (Turfanian) and B (Kuchean); many scholars consider Tocharian A and B to be two dialects of the same language. One group of travel permits for caravans can be dated to the early 7th century, and it appears that other texts date from the same or from neighbouring centuries. These languages became known to scholars only in the first decade of the 20th century; they have been less important for Indo-European studies than has Hittite, partly because their testimony about the Indo-European parent language is obscured by 2,000 more years of change and partly because Tocharian testimony fits fairly well with that of the previously known non-Anatolian languages. For more information, see below Tocharian language.

Celtic.

The Celtic language was spoken in the last centuries before the Christian Era over a wide area of Europe, from Spain and Britain to the Balkans, with one group (the Galatians) even in Asia Minor. Very little of the Celtic of that time and the ensuing centuries has survived, and this branch is known almost entirely from the Insular Celtic languages--Irish, Welsh, and others--spoken in and near the British Isles, as recorded from the 8th century AD onward. For further information, see below Celtic languages.

Balto-Slavic.

The grouping of Baltic and Slavic into a single branch is somewhat controversial, but the exclusively shared features outweigh the divergences. At the beginning of the Christian Era, Baltic and Slavic tribes occupied a large area of eastern Europe, east of the Germanic tribes and north of the Iranians, including much of present-day Poland and what was formerly the western Soviet Union--namely, Belarus, Ukraine, and westernmost Russia. The Slavic area was in all likelihood relatively small, perhaps centred in what is now southern Poland. But in the 5th century AD the Slavs began expanding in all directions, until today the Slavic languages are spoken over the greater part of eastern Europe and northern Asia. The Baltic-speaking area, however, has contracted, so that Baltic languages are presently confined to Lithuania and Latvia. The earliest Slavic texts, written in a dialect called Old Church Slavonic, date from the 9th century AD; the oldest substantial material in Baltic comes from the end of the 14th century, and the oldest connected texts from the 16th century. For more information, see below Baltic languages; Slavic languages.

Albanian.

Albanian, the language of the present-day republic of Albania, is known from the 15th century AD. It presumably continues one of the very poorly attested ancient Indo-European languages of the Balkan peninsula, but which one is not clear. For more information, see below Albanian language.

Etc.

In addition to the tongues just listed, there are several poorly documented extinct languages of which enough is known to be sure that they were Indo-European and that they did not belong in any of the branches enumerated above (e.g., Phrygian, Macedonian). Of a few, too little is known to be sure whether they were Indo-European or not (e.g., Ligurian).

Establishment of the family.

Shared characteristics.

The chief reason for grouping the Indo-European languages together is that they share a number of items of basic vocabulary, including grammatical affixes, whose shapes in the different languages can be related to one another by statable phonetic rules. Especially important are the shared patterns of alternation of sounds. Thus the agreement of Sanskrit ás-ti, Latin es-t, and Gothic is-t, all meaning "is," is greatly strengthened by the identical reduction of the root to s- in the plural in all three languages: Sanskrit s-ánti, Latin s-unt, Gothic s-ind "they are." Agreements in pure structure, totally divorced from phonetic substance, are, at best, of dubious value in proving membership in the Indo-European family.

Table 1 gives examples of typical vocabulary items widely shared within the Indo-European family that have been decisive in establishing the family. A blank indicates that the language in question does not use the item in accordance with the given meaning or that its word for that meaning is unknown. Similarities in grammatical endings are shown in Table 2 by samples of noun declension and verb inflection in some of the more archaic languages that have retained the inflectional endings of Indo-European in relatively unchanged form. Note that Old Lithuanian -į and -ų were nasalized vowels, representing a continuation from the earlier forms *-in and *-un. (The asterisk marks a form that is not actually found in any document or living dialect but is reconstructed as having once existed in the prehistory of the language.) The statable phonetic rules referred to earlier are not always obvious without careful observation. Note that the English dental consonants t, d, and th do not correspond in a straightforward manner to the Greek dental sounds t, d, and th; that is, English t does not occur where Greek t appears, nor English d where Greek has d. But the relationships between the sounds are not random either--English t does not correspond to Greek t in one word, to d in a second, and to th in a third, according to no discernible pattern. Rather, where Greek has initial t, English has th, as in "that" and "three"; where Greek has d, English has t, as in "tree," "two," and "ten"; and where Greek has th, English has d, as in "daughter." Note also that phonetic similarity as such is not needed to establish relationship. Thus, many of the Armenian words in Table 1 look quite different from the related words in other Indo-European languages, but here too regular rules of correspondence can be found; e.g., Greek initial p corresponds to Armenian h or zero (a lack of consonant) in the words meaning "fire," "father," "foot," "five."

Linguistic studies of the family.

The ancient Greeks and Romans readily perceived that their languages were related to each other, and, as other European languages became objects of scholarly attention in the late Middle Ages and the Renaissance, many of these were seen to be more similar to Latin and Greek than, for example, to Hebrew or Hungarian. But an accurate idea of the true bounds of the Indo-European family became possible only when, in the 16th century, Europeans began to learn Sanskrit. The massive similarities between Sanskrit and Latin and Greek were noted early, but the first person to make the correct inference and state it conspicuously was the English Orientalist and jurist Sir William Jones, who in 1786 said in his presidential address to the Asiatic Society that Sanskrit bore to both Greek and Latin a stronger affinity, both in the roots of verbs, and in the forms of grammar, than could possibly have been produced by accident; so strong, indeed, that no philologer could examine them all three without believing them to have sprung from some common source, which, perhaps, no longer exists. There is a similar reason, though not quite so forcible, for supposing that both the Gothick [i.e., Germanic] and the Celtick, though blended with a very different idiom, had the same origin with the Sanscrit; and the old Persian might be added to the same family . . . . The detailed evidence on which Jones based his conclusion was not presented until the 19th century. In 1816 Franz Bopp, the German philologist, presented his Über das Conjugationssystem der Sanskritsprache in Vergleichung mit jenem der griechischen, lateinischen, persischen und germanischen Sprache ("On the system of conjugation of the Sanskrit language, in comparison with those of Greek, Latin, Persian, and Germanic"), in which the relation of these five languages was demonstrated on the basis of a detailed comparison of verb morphology (structure). 
Two years later there appeared the "Undersøgelse om det gamle Nordiske eller Islandske Sprogs Oprindelse" ("Investigation on the Origin of the Old Norse or Icelandic Language"), by the Danish philologist Rasmus Rask, originally written in 1814. This work demonstrated methodically the relation of Germanic to Latin, Greek, Slavic, and Baltic. In 1822 the second edition of the first volume of Jacob Grimm's Deutsche Grammatik ("Germanic Grammar") was published; in this grammar were discussed the peculiar Indo-European vowel alternations called Ablaut by Grimm (e.g., English "sing, sang, sung"; or Greek peíth-o "I persuade," pé-poith-a "I am persuaded," é-pith-on "I persuaded"). In addition, Grimm tried to find the principle behind the correspondences of Germanic stop and spirant consonants (the first made with complete stoppage of the breath, and the second made with constriction of the breath but not complete stoppage) to the consonants of other Indo-European languages. The sound changes implied by these correspondences have become known as "Grimm's Law." Examples of it include the stop consonant p in Latin pater corresponding to the spirant consonant f in "father," and the correspondences between English and Greek t, d, and th discussed above. Bopp demonstrated in 1838 that the Celtic languages were Indo-European, as had been asserted by Jones. In 1850 the German philologist August Schleicher did the same for Albanian, and in 1877 another German philologist, Heinrich Hübschmann, showed that Armenian was an independent branch of Indo-European, rather than a member of the Iranian subbranch. Since then, the Indo-European family has been enlarged by the discovery of Tocharian and of Hittite and other Anatolian languages, and by the recognition, with the aid of Hittite, that Lycian, known and partly deciphered already in the 19th century, belongs to the Anatolian branch of Indo-European. 
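The regularity of the consonant correspondences covered by Grimm's Law can be illustrated with a short Python sketch. It encodes only the four initial correspondences mentioned in the text (Greek t, d, th and Latin p against English th, t, d, f). The function names, the plain romanized spellings, and the choice of cognate pairs are assumptions made for illustration, and the check is spelling-based rather than phonetic.

```python
# Initial-consonant correspondences taken from the examples in the text:
# Greek or Latin sound on the left, expected English sound on the right.
CORRESPONDENCES = {"th": "d", "t": "th", "d": "t", "p": "f"}


def expected_english_initial(word):
    """Return the English initial predicted for a romanized Greek or Latin word.

    Returns None when the word's initial sound is not covered by the table.
    """
    # Try longer keys first so that Greek "th" is not misread as plain "t".
    for source in sorted(CORRESPONDENCES, key=len, reverse=True):
        if word.startswith(source):
            return CORRESPONDENCES[source]
    return None


def english_initial(word):
    """Return the initial sound of an English word, treating "th" as one unit."""
    return "th" if word.startswith("th") else word[0]


def check_cognates(pairs):
    """Report, for each (source, English) pair, whether the correspondence holds."""
    results = []
    for source, english in pairs:
        predicted = expected_english_initial(source)
        results.append((source, english, predicted == english_initial(english)))
    return results


# Illustrative cognate pairs in simplified romanization (hypothetical spellings).
COGNATES = [
    ("treis", "three"),
    ("duo", "two"),
    ("deka", "ten"),
    ("thugater", "daughter"),
    ("pater", "father"),
]

for source, english, ok in check_cognates(COGNATES):
    print(source, english, "regular" if ok else "irregular")
```

A table lookup of this kind captures the central point of the comparative method: the match is judged by whether the sounds stand in the expected relation, not by whether they look alike.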
The Indo-European character of Tocharian was announced by the German scholars Emil Sieg and Wilhelm Siegling in 1908. The Norwegian orientalist Jørgen Alexander Knudtzon recognized Hittite as Indo-European on the basis of two letters found in Egypt (translated in Die zwei Arzawa-Briefe, 1902; "The Two Arzawa Letters"), but his views were not generally accepted until 1915, when Bedřich Hrozný published the first report of his own decipherment of the much more copious material that had meanwhile been found in the ruins of the Hittite capital itself.
The first full comparative grammar of the major Indo-European languages was Bopp's Vergleichende Grammatik des Sanskrit, Zend, Griechischen, Lateinischen, Litthauischen, Altslawischen, Gotischen und Deutschen (1833-52; "Comparative Grammar of Sanskrit, Zend, Greek, Latin, Lithuanian, Old Slavic, Gothic, and German"). But this and August Schleicher's shorter Compendium der vergleichenden Grammatik der indogermanischen Sprachen (1861-62; "Compendium of the Comparative Grammar of the Indo-European Languages") were rendered obsolete by the major breakthrough of the 1870s, when scholars realized that sound correspondences are not merely rules of thumb that do not have to be strictly observed, and that apparent exceptions to sound laws can often be accounted for by stating them more accurately or by reconstructing additional different sounds in the parent language. The difference between Gothic d in fadar "father" and þ in broþar "brother," for example, both corresponding to t in Sanskrit, Greek, and Latin, proved to be correlated with the original position of the accent, a discovery known as Verner's Law (named for the Danish linguist Karl Verner). Thus, d appears when the preceding syllable was originally unaccented (fadar : Greek patér-, Sanskrit pitár-), and þ occurs when the preceding syllable was originally accented (broþar : Greek phrā́tēr "member of a clan," Sanskrit bhrā́tar-).
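The conditioning stated by Verner's Law can be written as a one-line rule. The Python sketch below is a minimal illustration that covers only the case described in the text (a word-internal t and its Gothic outcome). The function name and the convention of passing 0-based syllable indexes are assumptions chosen for the example.

```python
def gothic_reflex_of_t(accented_syllable, syllable_with_t):
    """Predict the Gothic outcome of a word-internal t under Verner's Law.

    Both arguments are 0-based syllable indexes in the pre-Germanic form:
    accented_syllable is where the original accent fell, and
    syllable_with_t is the syllable that begins with the t in question.
    """
    if syllable_with_t < 1:
        raise ValueError("the rule as stated needs a syllable before the t")
    preceding_was_accented = accented_syllable == syllable_with_t - 1
    # Accented preceding syllable: the spirant is kept; otherwise d appears.
    return "þ" if preceding_was_accented else "d"


# Sanskrit pi-tár- "father": accent on syllable 1, t begins syllable 1.
print(gothic_reflex_of_t(accented_syllable=1, syllable_with_t=1))  # as in fadar
# Sanskrit bhrā́-tar- "brother": accent on syllable 0, t begins syllable 1.
print(gothic_reflex_of_t(accented_syllable=0, syllable_with_t=1))  # as in broþar
```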
The knowledge and opinions that had accumulated by the end of the 19th century are largely incorporated in the German linguist Karl Brugmann's Grundriss der vergleichenden Grammatik der indogermanischen Sprachen (2nd ed., 1897-1916; "Outline of Comparative Indo-European Grammar"), which remains the latest full-scale treatment of the family.

The parent language.

By comparing the recorded Indo-European languages, especially the most ancient ones, much of the parent language from which they are descended can be reconstructed. This reconstructed parent language is sometimes called simply "Indo-European," but in this article the term Proto-Indo-European is preferred.

Phonology.

In Proto-Indo-European there were at least 11 stop consonants. In the following grid these sounds are arranged according to the place in the mouth where the stoppage was made and the activity of the vocal cords during and immediately after the stoppage:

              voiceless   voiced   voiced aspirate
labial        p           (b)      bh
dental        t           d        dh
palatal       ḱ           ǵ        ǵh
labiovelar    kʷ          gʷ       gʷh

Labial denotes a sound made with the lips; dental, with the tip of the tongue against the back of the teeth. The palatals were probably made by contact between the upper surface of the tongue and the hard palate (the roof of the mouth), like Hungarian ty and gy in atya and Magyar. The labiovelars were probably made by contact between the upper surface of the tongue and the soft palate (the area behind the hard palate), with a concomitant rounding of the lips. Voiceless designates sounds made without vibration of the vocal cords; voiced sounds are pronounced with vibration of the vocal cords. The exact pronunciation of the "voiced aspirates" is uncertain. There may also have been a voiced labial stop, b, but correspondences pointing to this are few, and rarely extend beyond immediately neighbouring languages. Correspondences that some scholars take as evidence for a set of plain velar consonants (made with the back of the tongue touching the soft palate), k, g, gh, are partly, perhaps entirely, the result of special developments of labiovelars and palatals in specific positions. The evidence for a set of voiceless aspirated stops ph, th, ḱh, kh, kʷh is extremely weak. (Aspirated consonants are sounds accompanied by a puff of breath.) There was one sibilant consonant, s, with a voiced alternant, z, that occurred automatically next to voiced stops. The existence of a second apical spirant, þ (presumed pronunciation like that of th in English "thin"), is extremely uncertain. Most scholars now agree that the parent language had one or more additional stop or spirant consonants, for which the label laryngeal is used.
These consonants, however, have mostly disappeared or have become identical with other sounds in the recorded Indo-European languages, so that their former existence had to be deduced mainly from their effects on neighbouring sounds. Hence, the laryngeal sounds were not suspected until 1878, and even then they were rejected by most scholars until after 1927, when Kuryłowicz showed that Hittite often has h (perhaps a velar spirant like the ch in German ach) in places where a "laryngeal" had been posited on the evidence of the other Indo-European languages. There is still considerable disagreement about how many "laryngeals" there were, what they sounded like, what traces they left, and how best to symbolize them. Probably there were three or four, which can be written H1, H2, H3 (and H4), and probably some or all of them were palatal or (labio-)velar spirants. The principal traces they left outside Anatolian are in the quality and length of neighbouring vowels, H2 (and H4) changing a neighbouring e to a, and H3 changing it to o, while all laryngeals lengthened a preceding vowel. In Anatolian, H2 and H3 remained as h, at least in some positions; H4 is tentatively set up to account for words with a that lack h in Hittite. When laryngeals between consonants disappeared, a vowel sometimes remained, as in Greek stasis, Sanskrit sthitis, Old English stede "a standing (place)" from Proto-Indo-European *stH2tis. Scholars who do not posit "laryngeals" reconstruct a separate Proto-Indo-European vowel (called schwa indogermanicum) to account for these correspondences. Finally, there were the nasal sounds n and m, the liquids l and r, and the semivowels y and w. When y and w occurred between consonants, they were replaced by the vowels i and u. The nasals and liquids functioning as nuclei of syllables in this position (like the final sounds of English "bottom," "button," "bottle," "butter") are traditionally written n̥, m̥, l̥, r̥.
Some scholars dispense with these diacritical marks and with the distinction between syllabic i and u and nonsyllabic y and w, but this obscures certain distinctions, such as that between -wn̥- in *ḱwn̥su "among dogs," Sanskrit shvasu, and -un- in *tund- "shove," Sanskrit tundate.
The vowel system of Proto-Indo-European was dominated by a pattern of alternation called ablaut. The alternant (called a grade) that occurs in a given syllable of a given form is only partly predictable from the shape of the rest of the word. The basic vowel of the system was e ("normal grade"), and the changes it could undergo were loss (zero-grade), change to o (o-grade), lengthening to ē (lengthened grade), and lengthening plus change to ō (lengthened o-grade). The stem ped- "foot," for example, appears as such in Latin ped-is (normal grade) "of a foot," as -bd- in Avestan fra-bd-a- (zero-grade) "fore-foot," as pod- in Greek pod-es (o-grade) "feet," as *pēd- in Latin pēs (lengthened grade) "foot" in the nominative singular, and as *pōd- in English "foot" (lengthened o-grade). Ablauting forms whose basic vowel is a, o, ē, ā, or ō in the recorded languages (e.g., Greek ag- "lead," op- "see," stā- "stand") are now believed to have had e preceded or followed by laryngeal in the parent language; e.g., *H2eǵ- "lead," *H3ekʷ- "see," *steH2- "stand." It is uncertain whether there were additional o and a vowels besides those arising by ablaut and from e next to a laryngeal. The vowels i and u did not participate in ablaut alternations, but rather functioned primarily as the syllabic realizations of the consonants y and w, as in *leykʷ- "leave," zero-grade *likʷ-, like *derḱ- "see," zero-grade *dr̥ḱ-. Long ī and ū in the recorded languages derive, at least in part, from sequences of i or u plus laryngeal; e.g., Latin vīvus "alive" from *gʷiHwós.
Thus the parent language had at least the following vowels:

        front     back
high    i         u
mid     e, ē      o, ō

(In forming front vowels, the highest point of the tongue is in the front of the mouth; for back vowels, that point is in the back. High vowels are those in which the tongue is highest--closest to the roof of the mouth; mid vowels are made with the tongue between the extremes of high and low.) Of these vowels, i and u really functioned as consonants, and ē, o, ō were all conditioned alternants of e. But as noted above there may also have been ī, ū, a, and a second o.
The accent just before the breakup of the parent language was apparently mainly one of pitch rather than stress. Each full word had one accented syllable, presumably pronounced on a higher pitch than the others.
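The ablaut grades and the vowel-colouring effects of the laryngeals described in this section lend themselves to a small Python sketch. It is only an illustration under stated assumptions: roots are treated as a simple onset + e + coda string, the laryngeals are written H1 to H4, and the function names are invented for the example. Later developments in the daughter languages (such as the voicing that turns zero-grade pd into Avestan -bd-, or the vowel left behind by a laryngeal between consonants) are not modelled.

```python
import re

# The five ablaut grades listed in the text, keyed by name.
GRADE_VOWELS = {
    "normal": "e",
    "zero": "",
    "o": "o",
    "lengthened": "ē",
    "lengthened o": "ō",
}


def ablaut_grades(onset, coda):
    """Return every grade of a root of the shape onset + e + coda."""
    return {grade: onset + vowel + coda for grade, vowel in GRADE_VOWELS.items()}


# Vowel colouring by laryngeal number (assumed notation H1..H4):
# H1 leaves e unchanged, H2 and H4 give a, H3 gives o.
COLOUR = {"1": "e", "2": "a", "3": "o", "4": "a"}
LONG = {"e": "ē", "a": "ā", "o": "ō"}


def apply_laryngeals(form):
    """Colour e next to a laryngeal and drop the laryngeal.

    A laryngeal that follows the vowel also lengthens it.
    """
    # e followed by a laryngeal: coloured and lengthened.
    form = re.sub(r"eH([1-4])", lambda m: LONG[COLOUR[m.group(1)]], form)
    # e preceded by a laryngeal: coloured only.
    form = re.sub(r"H([1-4])e", lambda m: COLOUR[m.group(1)], form)
    return form


print(ablaut_grades("p", "d"))
print(apply_laryngeals("H2eg"), apply_laryngeals("H3ek"), apply_laryngeals("steH2"))
```

Applied to the root for "foot," the first function yields the five shapes cited above (ped, pd, pod, pēd, pōd); the second turns the laryngeal reconstructions for "lead," "see," and "stand" into the vowel qualities seen in the Greek forms.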

Morphology and syntax.

The Proto-Indo-European verb had three aspects: imperfective, perfective, and stative. Aspect refers to the nature of an action as described by the speaker; e.g., an event occurring once, an event recurring repeatedly, a continuing process, or a state. The difference between English simple and "progressive" verb forms is largely one of aspect; e.g., "John wrote a letter yesterday" (implying that he finished it) versus "John was writing a letter yesterday" (describing an ongoing process, with no implication as to whether it was finished or not). The Anatolian languages lack a dimension of aspect, and it is not yet clear what the earlier system underlying both Anatolian and the rest of Indo-European was. The imperfective aspect, traditionally called present, was used for repeated actions and for ongoing processes or states; e.g., *sti-steH2- "stand up more than once, be in the process of standing up," *weǵh-e- "be in the process of conveying," *es- "be." The perfective aspect, traditionally called aorist, expressed a single, completed occurrence of an action or process; e.g., *steH2- "stand up, come to a stop," *weǵh-s- "convey." The stative aspect, traditionally called perfect, described states of the subject; e.g., *woyd- "know," *ste-stoH2- "be in a standing position." Verb roots were by themselves either perfective (like *steH2- "stand") or imperfective (like *weǵh- "convey," *es- "be"). This basic aspect, however, could be reversed by aspect markers; e.g., reduplication for imperfective, as in *sti-steH2- (reduplication is the repetition of a word or part of a word), and -s- for perfective, as in *weǵh-s-. The stative aspect was always marked by the o-grade of the root in the indicative singular (as in *woyd- "know"), and usually also by reduplication (as in *ste-stoH2-); it had personal endings different from those of the other two aspects.
From one aspect of a given verb the shape and even the existence of the other two aspects could not be predicted; for example, *es- "be" had only the imperfective aspect. Ways of forming imperfectives were especially numerous and often involved, in addition to their imperfective aspectual meaning, some other notion, such as performing the action habitually or repeatedly (iterative), or causing someone else to perform it (causative). One root could thus have several imperfective stems; so to the root *er- "move" there were at least a causative form, *r̥-new- "set in motion," and an iterative form, *r̥-sḱe- "go repeatedly." The Proto-Indo-European verb was also inflected for mood, by which the speaker could indicate whether he was making statements or inquiries about matters of fact; making predictions, surmises, or wishes about the future or about unreal but imagined situations; or giving commands. Compare English "If John is home now (he is eating lunch)" with the verb "is" in the indicative mood, discussing a matter of fact, with "If John were home now (he would be eating lunch)" with the verb "were" in the subjunctive mood, describing an unreal situation. There were two Proto-Indo-European suffixes expressing mood: -e- alternating with -o- for the subjunctive, corresponding roughly in meaning to the English auxiliaries "shall" and "will," and -yeH1- alternating with -iH1- for the optative, corresponding roughly to English "should" and "would." Verbs without one of these two suffixes were marked for mood and tense by their personal endings. These personal endings basically expressed the person and number of the verb's subject, as in Latin amo "I love," amas "you (singular) love," amat "he or she loves," amamus "we love," and so on.
In the imperfective and perfective aspects there were two sets of endings, distinguishing two voices: active, in which typically the subject was not affected by the action, and mediopassive, in which typically the subject was affected, directly or indirectly. Thus Sanskrit active yajati and mediopassive yajate both mean "he sacrifices," but the former is said of a priest who performs a sacrifice for the benefit of another, while the latter is said of a layman who hires a priest to perform a sacrifice for him. In the stative aspect there was no distinction of voice. (Voice indicates the relationship of the action expressed by the verb to the subject of the statement.) To mark mood and tense, verbs in the imperfective aspect that did not have a mood suffix had three sets of personal endings in both active and mediopassive voices: imperative, primary, and secondary. Verbs with imperative endings belonged to the imperative mood (used for commands); e.g., *s-dhí "be," *és-tu "let him be." Verbs with primary endings were marked as non-past in tense and indicative in mood; e.g., *és-ti "he is." (Indicative mood signifies objective statements and questions.) Verbs with secondary endings were unmarked for tense and mood, but were most typically used as past indicatives (e.g., *gʷhén-t "he slew") and to fill out gaps in the imperative paradigm (e.g., *s-té "be" in the plural, *gʷhn-té "ye slew; slay" in the plural). To mark such forms unambiguously as past indicatives, an augment, usually consisting of the vowel e, could be prefixed; e.g., *é-gʷhen-t "he slew," *ēst (= *é-es-t) "he was." Verbs in the perfective aspect without a mood suffix did not occur with primary endings, and so lacked a non-past indicative tense. Verbs in the stative aspect apparently lacked a distinction between primary and secondary endings, so that a form like *wóyd-e "he knows" meant also "he knew."
The inflectional categories of the noun were case, number, and gender.
Eight cases can be reconstructed: nominative, for the subject of a verb; accusative, for the direct object; genitive, for the relations expressed by English "of"; dative, corresponding to the English preposition "to," as in "give a prize to the winner"; locative, corresponding to "at," "in"; ablative, "from"; instrumental, "with"; and vocative, used for the person being addressed. For examples of some of these see Table 2. Besides singular and plural number, there was a dual number for referring to two items. Each noun belonged to one of three genders: masculine, to which belonged most nouns designating male creatures; feminine, to which belonged most names of female creatures; and neuter, to which belonged only a few words for individual adult living creatures. The gender of nouns not designating living creatures was only partly predictable from their meaning. Adjectives were nouns that varied in gender according to the gender of another noun with which they were in agreement, or, if used by themselves, according to the sex of the entity to which they referred; thus, Latin bonus sermo "good speech" (masculine), bona aetas "good age" (feminine), bonum cor "good heart" (neuter), or bonus "a good man," bona "a good woman," bonum "a good thing." The neuter of an adjective was identical with the masculine except for having different endings in nominative and accusative cases. Feminine gender was either completely identical with the masculine or derived from it by means of a suffix, the two commonest being *-eH2- and *-iH2- (*-yeH2-). Demonstrative, interrogative, relative, and indefinite pronouns were inflected like adjectives, with some special endings. Personal pronouns were inflected very differently. They lacked the category of gender, and marked number and case (in part) not by endings but by different stems, as is still seen in English singular nominative "I"; oblique "my," "me"; plural nominative "we"; plural oblique "our," "us."
(The oblique is any case other than nominative or vocative.) Some notable features of Proto-Indo-European syntax are: the non-ergative case system, that is, the subject of an intransitive verb is in the same case as the subject (rather than the object) of a transitive verb; concord (agreement) in case, number, and gender between adjective and noun; and use of singular verbs with neuter plural subjects, as in Greek panta rhei "all things flow," with the same verb as ho potamos rhei "the river (masculine) flows," contrasting with hoi potamoi rheousi "the rivers flow" (indicating that neuter plurals were originally collectives and grammatically singular).

Lexicon and culture.

Much less is known about the parent language's vocabulary than about its phonology and grammar. Sounds and grammatical categories do not easily disappear or undergo radical change in so many daughter languages that their former existence can no longer be detected. It is relatively easy, however, for an individual word to disappear or shift meaning in so many daughter languages that its existence or meaning in the parent language cannot be confidently inferred. Hence, from the linguistic evidence alone, scholars can never say that Proto-Indo-European lacked a word for any particular concept; they can only state the probability that certain items did exist, and from these items make inferences about the culture and location in time and space of the speakers of Proto-Indo-European. Thus it is supposed that the Proto-Indo-European community knew and talked about dogs (*kwón-), horses (*ékwo-), sheep (*Héwi-), and almost certainly cows (*gʷów-) and pigs (*suH-). Probably all these animals were domesticated. At least one cereal grain was known (*yewo-), and at least one metal (*H eyos or *H eyos). There were vehicles (*wogho-) with wheels (*kʷekʷlo-), pulled by teams joined by yokes (*yugo-). Honey was known, and probably formed the basis of an alcoholic drink (*melit-, *medhu) related to the English "mead." Numerals up through 100 (*kmtóm) were in use. All this suggests a people with a well-developed Neolithic (characterized by simple agriculture and polished stone tools) or even Chalcolithic (copper- or bronze-using) technology.

Location and date.

Linguists have not found a reliable and precise way to determine from linguistic evidence alone the date at which any set of related languages must have begun diverging.
The best that can be done is to estimate the degree of difference between the languages in question, taking into account all that is known about them, and then compare this estimate with the estimated degrees of difference within families of languages--such as the Romance family--whose actual time of divergence is approximately known. Using this sort of "dead reckoning," it can be said that the earliest attested Indo-European languages--Anatolian, Indo-Iranian, and Greek--are different enough that the parent language must have been split into several distinct languages well before 2000 BC, but similar enough that the first split into separate languages is not likely to have been much earlier than 3000 BC, and may have been later. For further progress the linguistic findings must be correlated with those of archaeologists and paleontologists to see if there was a population group within Eurasia that was relatively small and homogeneous before 3000 BC and that underwent considerable expansion and fragmentation beginning about 3000 BC--give or take a few centuries--such that some of its fragments can be ancestral to components of the cultures of the speakers of the various recorded Indo-European languages. The culture of this population group in the centuries around 3000 BC must also correspond to what can be inferred for Proto-Indo-European from the linguistic data. At present the archaeological evidence seems to find such a group in the Kurgan culture of the south Russian steppe, east of the Dnepr (Dnieper) River, north of the Caucasus, and west of the Urals. According to the Lithuanian-American archaeologist Marija Gimbutas, in Indo-European and Indo-Europeans (1970), this culture began spreading west c. 4000-3500 BC (Kurgan II), and began to occupy a really wide area stretching from eastern central Europe to northern Iran c. 3500-3000 BC (Kurgan III). 
Allowing a few centuries for the speech of widely separated bands to diverge to the point of becoming distinct languages, this agrees tolerably well with the date suggested by the linguistic evidence for breakup of the parent language. So far the Kurgan culture has been traced back to the 5th millennium BC; its earlier antecedents are still unknown. Remote relationship of Indo-European to the Uralic languages is very likely. Geographically, the earliest reconstructible locations of the two families are contiguous; lexically, there are strong resemblances in a number of basic words or word parts, including personal, demonstrative, interrogative, and relative pronouns, personal endings of verbs, the accusative case ending -m, and such words as those for "water" and "name"; typologically, the families are fairly similar (e.g., both have many suffixes, but few or no prefixes or infixes--elements inserted within words). The resemblances, however, are too few to permit the reconstruction of a common "Indo-Uralic" parent language; the two families must have separated several thousand years before the breakup of Indo-European. If Indo-European is related to other language families--e.g., to Hamito-Semitic (Afro-Asiatic) or Caucasian--it must have diverged from them much earlier than from Uralic, because the number of cogent resemblances is much smaller. There is no evidence that Indo-European originated by fusion of components from two or more distinct language families.

Characteristic developments of Indo-European languages.

As Proto-Indo-European was splitting into the dialects that were to become the first generation of daughter languages, different innovations spread over different territories. Indo-Iranian, Balto-Slavic, Armenian, and Albanian agree in changing the palatal stops *k, *g, and *gh into spirants (s, sh, th) or affricates; e.g., Sanskrit ashri- "sharp edge," Old Church Slavonic ostru "sharp," Armenian aseln "needle," Albanian athëtë "bitter" beside Greek ákros "tip," Latin acidus "biting," all from a basic element *Hek- "sharp, pointed." (Spirants, also called fricatives, are sounds produced with audible friction as a result of the air stream passing through a narrow, but unstopped, passage in the mouth; e.g., English s, f, v. Affricates are sounds that begin as stops, with complete stoppage of the air stream, but are released as spirants, or fricatives; e.g., the ch in "church," the j in "jam.") The languages that change the palatal stops to spirants or affricates are not separated from one another by any recorded languages that preserve the palatals as stops; it is therefore inferred that the change to affricates (whence later spirants) occurred just once, and spread over a cohesive dialect area of Proto-Indo-European. Of the languages that share this change, however, Balto-Slavic shares with Germanic (including English) an m in certain case endings where other Indo-European languages, including Indo-Iranian, Armenian, and Albanian, have bh or a sound regularly developed from bh. Examples of the m ending include English "the-m" and Old Church Slavonic te-mu "to those ones"; the bh and related sounds (ph, v, b) are illustrated in the following: Sanskrit té-bhyas "to those ones," Armenian noro-vk' "with new ones," Albanian male-ve "to mountains," Greek ókhes-phin "with chariots," Latin omni-bus "for all."
Because Balto-Slavic and Germanic are neighbours, it is inferred that m replaced bh in these case endings just once in the parent language, and that the area over which this innovation spread only partly overlapped the area that adopted affricated pronunciation of the palatals. This pattern is general for changes dating from the time the parent language was breaking up into distinct languages. Each of the resulting languages shares some innovations with some of its neighbours, but only rarely do different innovations shared by two or more branches of Indo-European cover exactly the same territory. Once the dialects had become differentiated enough to be distinct languages--probably by 2000 BC, at least in most cases--each largely went its own way, and agreements in developments since then are due either to borrowing across language boundaries (as in the notable convergences between Modern Greek, Albanian, Romanian, and the southernmost Slavic languages) or to parallel but independent workings out of the same base material.
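
The partial overlap of the two innovations described above can be made concrete with a small sketch. The Python below is only an illustration and not part of the comparative method itself: the variable names and the helper compare_isoglosses are hypothetical, and the branch lists are simply those named in the text.

```python
# Branches showing each innovation, as listed in the text above.
palatal_to_spirant = {"Indo-Iranian", "Balto-Slavic", "Armenian", "Albanian"}
m_case_endings = {"Balto-Slavic", "Germanic"}


def compare_isoglosses(first, second):
    """Split the branches into those sharing both innovations,
    only the first, and only the second."""
    return {
        "both": first & second,         # set intersection
        "first_only": first - second,   # set difference
        "second_only": second - first,
    }


result = compare_isoglosses(palatal_to_spirant, m_case_endings)
for label, branches in result.items():
    print(label, sorted(branches))
```

Only Balto-Slavic falls inside both areas, while each innovation also covers branches that the other does not reach; this is the sense in which the two territories "only partly overlapped."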

Changes in phonology.

In phonology, the most striking changes have been loss or reduction in many languages of final or unaccented syllables, and loss in several languages of certain consonants between vowels, often followed by contraction of the resulting vowel sequence. Thus words in modern Indo-European languages are often much shorter than their Proto-Indo-European ancestors; e.g., English "four," Armenian c'ork', colloquial Persian car "four" from *kʷetwóres; French vit (pronounced vi) "lives" from *gʷíHweti; Russian dvesti "two hundred" from *duwoy kmtoy.

Changes in morphology.

Because much of the marking of Proto-Indo-European inflectional categories was done in final syllables, loss and reduction of these syllables have often had serious grammatical consequences. In the noun, loss of endings has generally led to loss or great reduction of the case and gender systems, while ways have generally been found to salvage the distinction between singular and plural. In Modern Persian, for example, where all final syllables have been lost, the old case and gender distinctions have disappeared also, but plural number is still regularly marked, either with -an (originally the genitive plural ending of some nouns) or with -ha (of obscure origin). In the verb, where more endings originally had two syllables, loss of final syllables has had less serious consequences for morphology. Even here, however, some languages, including English, have totally or almost totally given up the marking of subject by personal endings. Compare English "I, we, you, they love" and "he, she loves" with the Spanish conjugation for "love"--amo, amas, ama, amamos, amáis, aman--or the Russian version--ljubljú, ljúbish, ljúbit, ljúbim, ljúbite, ljúbjat. Changes in noun inflection have generally involved simplification. Almost everywhere the dual number has been lost; in many languages the noun genders have been reduced from three to two (as in French, Swedish, Lithuanian, and Hindi), or lost entirely (as in English, Armenian, and Bengali). Only Slavic has complicated the gender system, by imposing on the inherited distinctions contrasts of animate versus inanimate or of personal versus nonpersonal. Everywhere except in the oldest Indo-Iranian languages the original eight Indo-European cases have suffered reduction. Proto-Germanic had only six cases, the functions of ablative (place from which) and locative (place in which) being taken over by constructions of preposition plus the dative case.
In Modern English these are reduced to two cases in nouns, a general case that does duty for the vocative, nominative, dative, and accusative ("Henry, did Bill give John the letter?"), and a possessive case continuing the old genitive ("Bill's letter"). In languages such as French and Welsh, nouns are no longer inflected for case at all. In some languages, to be sure, nouns have begun fusing with words placed directly after the nouns to create new case systems, coexisting with relics of the old. Thus, Old Lithuanian had in addition to seven inherited cases an illative (place into), made by adding -n(a) to the accusative (peklosna "into hell"), an allative (place to, toward), made by adding -p(i) to the genitive (Jesausp "to Jesus"), and an adessive (place at which), made by adding -p(i) to the locative (Joniep "in John"). Changes in the verb have been more complex. Besides loss or merger of old categories, many new forms have been created and many old forms have acquired new values. In Ancient Greek the focus of the stative aspect (perfect) has largely shifted from the present state ("he is dead") to the previous event that led to this state ("he has died"). As a result, the perfect came to mean the same as the perfective past (aorist), and has therefore disappeared from Modern Greek. New forms created in Ancient Greek include future and future perfect tenses, based on the desiderative present forms (such as "he wants to walk") of the parent language. In Germanic the principal new creation was the weak past tense (ending in a t or d), such as English "loved," "thought," German liebte, dachte, made by combining the verb stem with a past tense of the Germanic verb for "do." (The strong past tense formed by vowel alternations, like "sing," "sang," "run," "ran," comes from the Proto-Indo-European stative aspect.) In some languages participles (verbal adjectives) have come to function as finite verbs.
Thus in Hindi mard stri-ko dekhta "the man sees the woman," dekhta "sees" is etymologically a participle "seeing," agreeing in number and gender with the subject mard "man." In the past tense, mard-ne stri dekhi "the man saw the woman," the verb dekhi is etymologically a past passive participle "seen," agreeing in gender and number with the object stri "woman," and the subject is marked with an instrumental ending.

Vocabulary changes.

Changes in vocabulary have been even greater than those in sounds and grammar. Words in modern Indo-European languages have several sources. They may be recognizable loanwords, such as English "skunk," "chain," and "inch" (from Algonkian, French, and Latin, respectively); they may have been formed within the history or prehistory of the language itself, such as English "radar" and "rightness"; they may be of obscure origin, such as English "drink," which is common Germanic but has no cognates outside Germanic, or "boy," which is peculiar to English and Frisian; or they may be inherited words that have changed meaning, such as English "merry" from Proto-Indo-European *mrghu- "short." Only a small fraction of the vocabulary can be traced back to words that can confidently be asserted to have existed in the parent language with approximately their present meaning. The same is true, albeit in a lesser degree, even for the oldest recorded Indo-European languages. None has more than a few hundred words and roots that are clearly inherited from the parent language without essential change of meaning. Table 1 gives examples of words widely retained with little change. Typically they include pronouns; nouns, verbs, and adjectives of relatively simple and ubiquitous meaning; numerals; and simple adverbs and prepositions.

Non-Indo-European influence on the family.

Indo-European languages, like all languages, have always been subject to influence from neighbouring languages, both related and unrelated. Influence of non-Indo-European languages on the sounds and grammar of Proto-Indo-European is not demonstrable, partly because there is no direct evidence about the languages that were in contact with Indo-European before 3000 BC. It can be surmised, however, that some words are loans; e.g., *pelekus "ax," a word for an object likely to be imported or learned of from neighbours with superior technology, and which is not analyzable into a known Indo-European root plus a known Indo-European suffix. When Indo-European languages have been carried within historic times into areas occupied by speakers of other languages, they have generally taken over a number of loanwords, as with English and Spanish in the Americas or Dutch in South Africa. Aside from the special case of the pidgin and creole languages, however, there has been very little effect on sounds and grammar. These have been significantly affected within historic times only when an Indo-European language has been spoken in prolonged close contact with non-Indo-European speakers, as with Ossetic (an Iranian language) in the Caucasus, or when its speakers have been very strongly influenced culturally by speakers of a non-Indo-European language, as with Persian, in which Arabic plays much the same role as Latin does in English. In prehistoric times most branches of Indo-European were carried into territories presumably or certainly occupied by speakers of non-Indo-European languages, and it is reasonable to suppose that these languages had some effect on the speech of the newcomers. For the lexicon, this is indeed demonstrable in Hittite and Greek, at least. It is much less clear, however, that these non-Indo-European languages affected significantly the sounds and grammar of the Indo-European languages that replaced them. 
Perhaps the best case is India, where certain grammatical features shared by Indo-European and Dravidian languages appear to have spread from Dravidian to Indo-European rather than vice versa. For most other branches of Indo-European languages any attempt to claim prehistoric influence of non-Indo-European languages on sounds and grammar is rendered almost impossible because of ignorance of the non-Indo-European languages with which they might have been in contact. (W.C.)

Anatolian languages

The term Anatolian languages in its most comprehensive use includes both the Indo-European and non-Indo-European languages spoken in Anatolia (Asia Minor) before the Greco-Roman period. The Anatolian languages are known only from texts of the 2nd and 1st millennia BC; the earliest evidence is that of the so-called Cappadocian tablets (19th-18th century BC). The term Asianic is sometimes used as an alternative designation for the Anatolian languages, but, since the discovery in 1915 that Hittite, the main Anatolian language, is an Indo-European language, there has been a tendency to use Asianic in a more restricted sense for the non-Indo-European languages that existed in Anatolia before the entry of the Indo-Europeans. These are called substratum languages. Hattic (or Hattian), also misleadingly called Proto-Hittite, is the best known substratum language. It is completely unrelated to Hittite and its sister languages as well as to Hurrian, a language also spoken in Anatolia. The Anatolian group of Indo-European languages consists of Hittite, Palaic, Luwian, Hieroglyphic Luwian, Lydian, and Lycian. Hittite, Palaic, and Luwian are known from 2nd-millennium cuneiform texts found in the excavations in Bogazköy-Hattusa since 1905; Hieroglyphic Luwian is found on scattered inscriptions and seals from Anatolia (mainly the southern area) and northern Syria dating mainly from later times (i.e., between c. 1200 and 700 BC, although there are earlier examples from the empire period, c. 1400-c. 1190 BC). Lydian and Lycian are known from texts in alphabetic script from c. 600 to 200 BC. It seems fairly reasonable to add the Carian language of southwest Anatolia to this list as well as other less well documented languages like Sidetic. More to the east, in the Caucasus region centring around Lake Van, Hurrian of the 3rd and 2nd millennia BC was replaced in the 1st millennium BC by the related Urartian language. Both of these languages are definitely non-Indo-European.

Historical background of ancient Anatolia.

It is customarily assumed that the Indo-Europeans entered Anatolia around or shortly after 2000 BC, although there are no specific archaeological data that might enable scholars to specify the period of entry or the route the invaders followed. On the basis of the agricultural terminology used in Hittite, it has been suggested that the entry into Anatolia was not a warlike invasion of predominantly male groups. If such had been the case, the influence of substratum languages would have been likely, but, on the contrary, the word stems used are definitely Indo-European. The differences in the terminology used in other Indo-European subgroups indicate that the "Anatolians" seceded from the parent group at an early date, before the common agricultural nomenclature came into being. On the other hand, Hittite shares the Indo-European notion of the hereafter, pictured as a pasture land with grazing cattle "for which the dead king sets out." There is a tendency among linguists to postulate an eastern route of entry into Anatolia by way of the Caucasus, because certain grammatical features--e.g., the loss of the feminine gender--might be explained as having been caused by prolonged contacts with Caucasian languages. It is likely that the Indo-European forebears of the later speakers of Hittite, Palaic, Luwian, and Lydian entered Anatolia together, following a common route, because the Anatolian languages share a considerable number of losses as well as innovations that presuppose a long common past. In the central parts of Anatolia, within the bend of the Halys River (modern Turkish, Kizil Irmak), and in the northern regions, Hittite and Palaic were profoundly influenced by Hattic as a substratum language. The Hattian culture also changed the political and religious concepts of the newcomers, and a clear cultural dependency of the Indo-Europeans on the older Hattian population is evident. 
Some scholars have stressed the likelihood that farther to the south the Luwians might have been conversant with a different substratum. In view of the absence of textual evidence, and because knowledge of the Luwian vocabulary is rather restricted, it is perhaps not surprising that this possible substratum element escapes definition. (For the history of Anatolia in the 2nd and 1st millennia BC, see TURKEY AND ANCIENT ANATOLIA: Ancient Anatolia.) The most important invaders of Anatolia in the "Dark Age" (after 1190 BC) were the Phrygians. Their language is definitely Indo-European, but it bears no relationship to the Anatolian subgroup. Rather, it seems akin to Thracian, Illyrian, or possibly Greek. Greek, in the second half of the 1st millennium BC, and, later, Latin, from the 2nd century onward, entered central Anatolia as languages of a ruling caste. Much earlier--beginning in Mycenaean times--the west coast had attracted Greek settlers. In the first half of the 1st millennium, the southern and northern shores also attracted Greek-speaking peoples. To the east in the Caucasus region, other Indo-Europeans, the Armenian-speaking invaders, penetrated into the former Urartian territory well before the beginning of the Persian period, probably in the 7th and 6th centuries BC. During Persian times, a Persian ruling caste entered eastern and also northeastern Anatolia and was still clearly recognizable in the Hellenistic and Roman periods (e.g., in Bithynia, Pontus, Cappadocia, and Commagene). Late data on names and scattered remarks made by Fathers of the Church indicate that until late Roman and perhaps even Byzantine times, some Anatolian dialects remained in use in certain isolated parts of the interior.

Classification of the languages.

Research on the Anatolian languages began in 1821 with the Lycian language and passed an initially fruitful phase in the 1880s with work on Hieroglyphic Hittite (nowadays referred to as Hieroglyphic Luwian). In 1902 the Norwegian Assyriologist Jørgen Alexander Knudtzon's study on the Arzawa letters was published; these were two letters exchanged between a king of Arzawa and Pharaoh Amenhotep III that had been found in the Amarna archive. They were written in the Hittite language in cuneiform writing. In 1915 research reached a climax with the interpretation of Cuneiform Hittite by the Czech Orientalist Bedrich Hrozný. In all four of these highlights, the discovery that the texts in question were Indo-European was either clearly expressed or more discreetly implied. This conclusion was based on both the nominal (noun) declension and the verbal conjugation: the languages had a nominative ending in -s, the accusative in -n, verbal endings like -ti and -nti for the 3rd person singular and plural of the present tense, and an imperative form like estu "let it be." These features were deemed to be sufficient proof of their Indo-European origin. Study of the Anatolian subgroup of Indo-European thus began with Lycian, the last Anatolian offshoot in the temporal sequence, then passed the intermediary stage of Hieroglyphic Luwian, and reached the 2nd-millennium Hittite language in 20th-century research. For the relationship between members of the Anatolian subgroup, see Figure 2. The non-Indo-European Hurrian and Urartian languages are related to one another, but modern research indicates that Urartian should not be considered as a direct continuation of Hurrian.

HISTORY AND DEVELOPMENT

Languages using cuneiform writing and Anatolian hieroglyphs.

Hattic.

The Hattic language appears as hattili in Hittite cuneiform texts. Called Proto-Hittite by some, it was the language of the linguistic substratum inside the Halys River bend and in more northerly regions. Apparently the Indo-European newcomers of Hittite stock were named with the same designation as their predecessors. All the Hattic material preserved by Hittite scribes belongs to the religious sphere of life: rituals (e.g., connected with the erection of a new building), incantations, antiphons, litanies, and myths. Among the Hattic interpolations in Hittite texts, there are some to which a Hittite translation has been added. It is impossible to ascertain the length of time that the Hattians had been present in Anatolia before the Indo-Europeans entered the country, but it seems certain that during the Hittite New Empire (c. 1400-c. 1190 BC) Hattic was a dead language. Hattic studies began in 1922 with the work of the German Assyriologist Emil Forrer. In 1935, Hans G. Güterbock, a German-born Orientalist, published a large group of texts containing Hattic material and in so doing completed the publication of the Hattic texts stemming from the Winckler excavations (1905-12). Important studies on the subject have continued to appear since then.

Hittite.

The Hittite language is known from the approximately 25,000 tablets or fragments of tablets preserved in the archives of Bogazköy-Hattusa, excavated by German archaeologists beginning in 1905. In Hittite cuneiform texts, the language is referred to as nesili (nasili) "language of Nesa," or nesumnili "language of the Neshite." Earlier Hittite linguistic material may be found in the indigenous proper names and a few loanwords from the local dialect that are recorded in the Cappadocian tablets (the commercial correspondence in Assyrian of Assyrian colonists living in Anatolia, especially in the emporium at Kültepe, near modern Kayseri, between c. 1900 and 1720 BC). The data from Kültepe are sometimes referred to as "Kaneshite" (from Kanesh, the old name of Kültepe); this is obviously the modern equivalent of the word kanisumnili "language of the Kaneshite" found in a Hittite text. It is possible, or even likely, that Kanesh and Nesa do, in fact, refer to the same entity. Hittite tablets from places outside of the Hittite capital are rare; only stray examples have been found--e.g., in Tarsus, Alalakh, Ugarit, and Amarna. These findings attest to the growth of a great Hittite empire, especially between c. 1400 and c. 1190 BC. Old Hittite, the written embodiment of the earliest Indo-European language that has been discovered so far, is known from some tablets preserved in an "old ductus" type of handwriting that was typical of copies from the Old Kingdom period (c. 1700-1500 BC). The "Dark Age" between c. 1500 and c. 1400 BC is sometimes referred to as the period of the so-called Middle Hittite language. Most of the Old and Middle Hittite texts, however, are preserved in copies from the later empire period. The archives of Bogazköy-Hattusa have been found in various places in the citadel, in the Great Temple complex, and in the "House on the Slope." 
Although the majority of the texts are concerned with religious subjects (oracle texts, hymns, prayers, myths, rituals, and festival texts), these archives also contain material of historical, political, administrative, literary, and legal character. The cuneiform adopted by the Hittite scribes is a variant of a writing system of Mesopotamian origin that closely resembles the ductus and shapes prevalent in tablets of the 17th century BC (layer VII) from Alalakh (modern Atsana in southeastern Turkey). It is possible that the cuneiform script might have been introduced as a result of the Hittites inducing Syrian scribes to transfer their activities to the Hittite capital during the early part of the Old Kingdom, shortly after 1650 BC. It has also been posited, with good reason, that the newly acquired script was first used to write Akkadian and was only later employed for Hittite as well. In addition to the genres enumerated above, the "scholarly literature" deserves to be mentioned. This consists of the material considered by the scribes to be essential for their training; it includes word lists, omens, and ritual prescriptions, all reflecting an encyclopaedic approach aimed at complete coverage of the subjects concerned. The Sumerian texts found in these archives belong to this class of literature. For treaties and correspondence with foreign powers, Akkadian was used as the diplomatic language of that period. Therefore, both Sumerian and Akkadian formed part of the curriculum of the qualified scribes, these languages belonging to the "eight languages" found in the Hittite archives. In actual fact, the first decipherer of Hittite was the Norwegian scholar J.A. Knudtzon, who pointed out in 1902 that the language of the so-called Arzawa letters (i.e., Hittite)--found in the Amarna archive--had an apparent affinity with Indo-European. Because the cuneiform script had already been deciphered, Knudtzon, and Bedrich Hrozný after him, were able to "read" their texts. 
Thus their discovery consisted more in the interpretation than in the actual decipherment of the written material. The first series of German excavations, lasting from 1905 to 1912, produced about 10,000 tablets. It was work on this corpus that familiarized Hrozný with the contents of these tablets and led him to his epoch-making discovery that Hittite was indeed Indo-European (1915). (See also WRITING: Cuneiform.)

Palaic.

Palaic, which appears as Palaumnili "language of the Palaite" in Hittite cuneiform texts, was the language of the region of Pala (probably Blaëne in the Greek period), in northwest Anatolia. During the Old Hittite kingdom, Pala, Luwiya, and Hattusa formed the three major provinces of the Anatolian part of the Hittite territory. From the intermediary "Dark Age" onward, Kaska nomads made their influence felt in northern Anatolia, and this resulted in a decline of importance for this region. The Indo-European character of Palaic was first advocated by Emil Forrer (1922). Part of the text material is preserved on tablets in "old ductus." The knowledge of the limited vocabulary leaves much to be desired, but parallels--especially in the inflection of the noun, the forms of the demonstrative, relative, and enclitic pronouns, and the verbal endings--vouch for a close relationship to Hittite and Luwian.

Luwian.

Luwian (or Luvian), the language of Anatolia's southern coast, is known from texts stemming from three major periods: (1) the Hittite New Empire (c. 1400-c. 1190 BC); (2) the period of the Neo-Hittite states (c. 1190-c. 700 BC); (3) the period of the Lycian monumental inscriptions (c. 400-200 BC). In addition to the various time periods, there is also a variation in writing system--Mesopotamian cuneiform, Anatolian hieroglyphs, and an alphabet derived from a Greek source--and dialectal differentiation. There are indications that as early as the 15th and 14th centuries BC, there was a West Luwian dialect (the precursor of alphabetic Lycian) and an East Luwian dialect (the forerunner of the later Hieroglyphic Luwian of the Neo-Hittite states). Both of these differed from the Luwian found in the archives of Bogazköy-Hattusa, which was possibly a central dialect. As in the case of Palaic, the pioneering work on Luwian written in cuneiform was done by Emil Forrer (1922). Following this work, new text materials were published in 1953, closely followed by both grammatical and vocabulary studies as well as a standard dictionary of Cuneiform Luwian (1959). The Anatolian hieroglyphic system has a long history, with its logographic beginnings dating back to early Hittite stamp seals of the 18th and 17th centuries BC; the youngest texts seem to date from the last quarter of the 8th century BC. The geographical range of the inscriptions is great, stretching from Sipylus and Karabel in the extreme west to Alaca Hüyük and Bogazköy-Hattusa in the north, Malatya, Samsat, and Tell Ahmar (Til Barsib) in the east, and Hama and ar-Rastan in the south. During the "Dark Age" of the 16th and 15th centuries BC, the early writing grew into a fully developed writing system with logograms (word-signs), syllabic values, and auxiliary signs. 
During the New Empire, the script was already in use for a multitude of purposes (rock inscriptions, seals, and wooden tablets for everyday use in the temple and the army). Whether an example of the empire period such as the Aleppo inscription already reflects the Luwian language is a moot question but seems likely. It is certain that the later inscriptions of the Neo-Hittite states were in Luwian. The first attempts to decipher Hieroglyphic Luwian, made by the British archaeologist Archibald H. Sayce, were fortunate in some fundamental details, but it was not until the 1930s that systematic and mutually stimulating research by scholars of several countries led to the establishment of a number of syllabic values for the characters as well as to a correct analysis of the sentence structure of the inscriptions. In his publication of the (bilingual) Hittite royal seals (in 1940, 1942), Hans G. Güterbock bridged the gap between the inscriptions of the empire period and the late Neo-Hittite states; the seals found in the French excavations at Ugarit (in northern Syria) served a similar purpose. The most important recent finding was the discovery in 1947 by Helmuth T. Bossert, a German archaeologist, of the Karatepe bilingual inscriptions, written in Phoenician and Hieroglyphic Luwian. On many points the Luwian vocabulary is still an enigma. The unity between the various Luwian dialects and the close relationship of Luwian to the other members of the Anatolian subgroup, however, are secured by several linguistic parallels, especially in the singular inflection of the noun, the forms of certain pronouns, the verbal endings, and a number of lexical (vocabulary) correspondences.

Hurrian.

In earlier stages of research, the terms Mitanni language and Subarian were used as designations for Hurrian. In Hittite cuneiform texts, hurlili "language of the Hurrian" is used. In the last centuries of the 3rd millennium BC, Hurrians were already present in the Mardin region, which, from a geographical point of view, belongs to the North Mesopotamian plain. In Mesopotamian texts (from the time of the Akkad dynasty) some Hurrian personal names and glosses have been found. The customary assumption is that this non-Semitic and also non-Indo-European ethnic group had come from the Armenian mountains. During the beginning of the 2nd millennium BC, the Hurrians apparently spread over larger parts of southeast Anatolia and northern Mesopotamia. Still later, during the intermediary "Dark Age," they are supposed to have infiltrated into Cilicia and the adjacent Taurus and Antitaurus regions (Kizzuwatna in 2nd millennium texts). Before the middle of the 2nd millennium BC, an Indo-Aryan ruling caste wielded some type of authority over parts of Hurrian territory. Some names and words in ancient Near Eastern texts bear witness to their presence. Among these words are a group of technical terms related to the training of horses that found its way into Hittite treatises on that subject; they are most important from a historical point of view. After Sumerian, Akkadian, Hattic, Palaic, and Luwian, Hurrian and these Indo-Aryan glosses constitute the sixth and seventh additional languages of the Hittite archives. Hurrian texts have been found in Urkish (Mardin region, c. 2300 BC), Mari (on the middle Euphrates, 18th century BC), Amarna (Egypt, c. 1400 BC), Bogazköy-Hattusa (Empire period), and Ugarit (on the coastline of northern Syria, 14th century). Amarna yielded the most important Hurrian document, a political letter sent to Pharaoh Amenhotep III. 
From Mari came a small number of religious texts; from Bogazköy-Hattusa, literary and religious texts; and from Ugarit, vocabularies belonging to the more "scholarly literature" described above and Hurrian religious texts in Ugaritic alphabetic script. Hurrian personal names, found in texts from many sites (Bogazköy-Hattusa, Alalakh, Ugarit, and especially Nuzu), constitute a second linguistic source of major importance. The research on Hurrian started in the 1890s with simultaneous contributions by several scholars. Subsequently, Bedrich Hrozný (1920) and Emil Forrer (1919, 1922) discovered the presence of Hurrian material in the Bogazköy-Hattusa archives.

Urartian.

The terms Chaldean and Vannic have also been used as designations for Urartian during earlier stages of research. Urartian is not a late dialect of Hurrian but a separate language, although both stem from a common parent. During the 9th through 6th centuries BC, Urartian was used in northeastern Anatolia as the official language of the state of Urartu, which centred around the district of Lake Van but also extended over the Transcaucasian regions of modern Russia and into northwestern Iran and at times even into parts of North Syria. The Urartian texts are written in a variant of the Neo-Assyrian script and consist mostly of monumental inscriptions (annals, votive inscriptions related to building and irrigation activities), some small inscriptions on helmets and shields dedicated in the temple, and a few economic cuneiform tablets. Two bilingual inscriptions in Urartian and Assyrian that apparently correspond very closely provided the key to the understanding of the language; the stylistic resemblances to Assyrian texts of the same period guided the further interpretation. Archibald H. Sayce was the first scholar to devote his attention to Urartian in the 1880s and 1890s and continued his activities until 1932. More important were the philological contributions of the German historian Carl F. Lehmann-Haupt between 1892 and 1935. The first reliable description of Urartian grammar was published by the German Orientalist Johannes Friedrich (1933). Next to the Urartian texts in cuneiform writing, there also existed an indigenous hieroglyphic script that is still undeciphered and is too meagrely represented to warrant a serious attempt.

Dialects.

The six modern Iranian languages discussed above are the only ones that have an established literary tradition. They are not, however, homogeneous, each having its own dialect divisions. No definitive dialect classification has yet been made, nor indeed has any attempt at systematic classification of the whole range of Iranian languages won wide acceptance. The usual practice, followed here, is simply to list the main languages in groups of varying size, arranged on a roughly geographic basis. There are two main dialects of Ossetic: the eastern, known as Iron, and the western, known as Digor (Digoron). Of these, Digor is the more archaic, Iron words being often a syllable shorter than their Digor counterparts--e.g., Digor madä, Iron mad "mother." Iron is spoken by the majority of Ossetic speakers and is the basis of the literary language. Chosen in the 19th century for the translation of the Bible, it is still the official language today. Little is known of the other Ossetic dialects. A small amount of the Ossetic dialect of Tual in the south, which differs little from Iron, was published in Georgian script at the beginning of the 19th century. Yaghnabi is still spoken by a small number of people southeast of Samarkand, Uzbekistan. It has two main dialects, eastern and western, which differ only slightly. The characteristic difference is between a western t sound and an eastern s sound from an older θ sound (as th in English "thin")--e.g., western met, eastern mes "day," beside Sogdian meθ (Christian Sogdian myθ). Dialects of the Shughni group are spoken in the Pamirs. Closely related to this group is Yazgulami. A period of a Yazgulami-Shughni common language (protolanguage) has been postulated by some scholars, after which it separated first into Yazgulami and Common Shughni; and then Common Shughni gradually divided into Sarikoli, Oroshori-Bartangi, Roshani-Khufi, and Bajuvi-Shughni.
Sarikoli, the easternmost of these dialects, is spoken in northwestern China. Speakers of Wakhi number 10,000 or so in the region of the upper Pyandzh (Panj) River. Vakhan (Wakhan), the Persian name for the region in which Wakhi is spoken, is based on the local name Wux, a Wakhi development of *Waxsu, the old name of the Oxus River (modern Amu Darya). (An asterisk denotes a hypothetical, unattested, reconstructed form or word.) The Wakhi language is remarkably distinct from its neighbours and has many archaic features. Around the bend of the Amu Darya and in the valley of the Varduj River to the southeast, a few people speak dialects of the Sanglechi-Ishkashmi group. This group is clearly distinguished from its neighbours but is closely related to the other languages of the Pamirs. Some 6,000 people speak dialects of the Yidgha-Munji group. Monjan is a very remote valley located in northern Afghanistan, and it is separated by a mountain pass from the Sanglechi-speaking region. Yidgha is spoken in the valley of the Lutkho River and in the nearby city of Chitral, a region now in Pakistan. Yidgha-Munji is most closely related to Pashto. The existence of two dialectal groups within Pashto has long been known. Thus, the word Pashto represents a southwestern dialect form (pasto), in contrast to a northeastern (paxto). According to one hypothesis, Pashto literature, which exists certainly from the 17th century and possibly from the 11th, was created among the northeastern tribes. Two minor dialects, Waziri and Wanetsi, have some features of special interest. Although spoken in a few villages in Afghanistan, two languages have features closely associating them with Western Iranian. These are Parachi, spoken in the Hindu Kush north of Kabul, and Ormuri, found in two dialects, one in the Lowgar River valley south of Kabul and the other in Kaniguram in Waziristan. Farther south is the wholly West Iranian language Balochi, mentioned above. 
Despite the vast area over which Balochi is spoken, its numerous dialects are all mutually intelligible. The most recent study of the Balochi dialects divides them into six groups: Eastern Hill dialects; Rakhshani dialects including that of Mary; Sarawani; Kechi; Lotuni; and the coastal dialects. Of these, Rakhshani is the most widely spoken and is used for broadcasting both in Pakistan and in Afghanistan, but the coastal dialects have the greatest prestige and the most extensive literature. In the southeastern corner of Iran, Balochi gradually gives way to the Bashkardi dialects. In central Iran the influence of Modern Persian is everywhere strongly felt, and it is often difficult to distinguish between dialects of Modern Persian, Persian with dialectal traits, and closely related languages. In the cities of Yazd and Kerman the Parsis speak the old Gabri dialect, whereas the Muslims speak Persian. Among other central dialects are Natanzi, Soi, Khunsari, Gazi (near Esfahan), Sivandi (northeast of Shiraz), Vafsi, and Ashtiyani, to name but a few. Semnani, spoken east of Tehran, forms a transitional stage between the central dialects and the Caspian dialects. The latter are divided into two groups, Gilaki and Mazandarani (Tabari). Also closely related is Talishi, spoken on the west coast of the Caspian Sea on both sides of the border with Azerbaijan. To this northwestern group belong the so-called southern Tati dialects spoken south and southwest of Qazvin, as well as the scarcely known dialects of Harzan and Galinqaya spoken northwest of Tabriz. The name Tati is usually applied to the dialects spoken in Russian Dagestan and northeastern Azerbaijan. They differ little from Modern Persian. Of the several dialects of Fars province, only Lari, southeast of Shiraz, is notably distinctive. Kumzari in Oman and the Lur dialects of the southwest also differ little from Persian. 
There are many dialects of Kurdish, the widely spoken West Iranian language that is thought to occupy a dialectal position intermediate between Balochi and Persian. Three main dialect groups can be distinguished--northern, central, and southern. A systematic study has been made of the dialects of Iraq, which include 'Aqrah (Akre), 'Amadiyah, Dahuk, Shaykhan, and Zakhu in the northern group, and Irbil (Arbil), Bingird, Pishdar (Pizhdar), Sulaymaniyah (Suleimaniye), and Warmawah in the central group. The Central Mukri dialect is spoken in the extreme west of Iran, south of Lake Urmia. Gorani is spoken in several dialects, mainly in the Zagros Mountains, and it is strongly influenced by the surrounding Kurdish dialects. The Gorani dialect of Hawraman, Hawrami, is notable for its many archaic features. Closely related to Gorani is Zaza (Dimli), which is spoken west of Iran.

Historical survey of the Iranian languages.

The Iranian protolanguage and its development.

By the time Iranian begins to be attested in the 6th century BC, the language is already found differentiated into several distinct languages. Scholars have reconstructed the sound system and some of the grammatical features of Common Old Iranian, the protolanguage that preceded these dialects. The phonological system that underlay Common Old Iranian was by and large maintained everywhere throughout the Iranian-speaking world. It consisted of the following distinctive consonant sounds: Unfamiliar symbols are taken from the International Phonetic Alphabet, or are conventional transcriptions (e.g., š for the sh sound in "ship," ž for the zh sound in "azure," c for ch in "church," and j for j in "jam"). The voiced fricatives (i.e., the first three consonants represented in the fourth column--γ, β, and ð), which are produced with vibrating vocal cords and local friction, may be regarded as variants of the voiced stops (e.g., g, b, d); but they are characteristic of Iranian languages generally and especially of the eastern Iranian languages. In addition to these sounds Old Persian had another sibilant sound, often transcribed as ç or ss, which developed from the cluster θr (pronounced as the thr in "three"). In Middle Persian it fell together with the s sound. The most noticeable alteration of the old sound system is the introduction in some languages of additional series of consonants under the influence of neighbouring languages. Thus, Ossetic has a series of ejective sounds (uttered with a simultaneous glottal stop) on the pattern of the unrelated Caucasian languages; and a number of Iranian languages have a retroflex series (produced with the tongue tip curled up toward the roof of the mouth) as a result of contact with Indo-Aryan languages. Some of the differences between Iranian languages arose as a result of different developments of the earlier sounds.
Thus, the Indo-European sounds k, g, and gh resulted in Indo-Iranian sh, z, and zh, which in turn became s, z, and z, respectively, in Avestan but θ, d, and d in Old Persian. Hence, Indo-European *kmtó- "hundred" became Indo-Iranian *shatá-, attested by Old Indo-Aryan shatá-, and then Avestan sata-, but Old Persian θata-. Nevertheless, θ and d as well as s and z belong to the basic pattern, the difference being merely distributional. The main source of differentiation is in the variation of consonant cluster development and that of groups of consonants and semivowels. Here again it is mainly a question of distributional differences. Thus, the Indo-European group *ku̯ became Indo-Iranian *shu̯, retained in Old Indo-Aryan in the spelling shv of the standard transcription. Indo-Iranian *shu̯ developed variously in Iranian: s in Old Persian, sp in Avestan and Median, sh (written shsh) in Khotanese, and s in Wakhi. These developments can be seen in the following forms of the Indo-European word *eku̯o- "horse": Old Indo-Aryan áshva-, Avestan and Median aspa-, Old Persian asa-, Khotanese ashsha-, and Wakhi yas. Yet another development can be seen in Ossetic, in which the word for "mare," Avestan aspa-, appears as Digor äfsä and Iron yäfs. The vowel system of Common Old Iranian consisted of short and long varieties of a, i, and u, and a neutral vowel (similar to the a in "sofa"). This analysis assumes that the Indo-Iranian vocalic r (ṛ) had already developed to ər in Proto-Iranian, just as its long counterpart became ar. An early and general monophthongization of the diphthongs ai and au to e and o, respectively, also must be considered characteristic, although it should not be ascribed to Common Old Iranian as is sometimes done. This basic system was almost everywhere maintained, sometimes with the addition of one or two distinctive vowel sounds (phonemes).
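The divergent treatment of the cluster in the word for "horse" can be laid out as a small lookup table. The following Python sketch is illustrative only: it uses the simplified spellings of the forms cited above, and the names HORSE_REFLEXES, languages_sharing_reflex, and reflex_attested are hypothetical labels chosen for this example, not established terminology.

```python
# Reflexes of the Indo-Iranian cluster in the word for "horse", as listed
# in the text, each paired with the cited form. Spellings are the simplified
# ones used in this article; the table layout is an assumption of the sketch.
HORSE_REFLEXES = {
    "Old Persian": ("s", "asa-"),
    "Avestan": ("sp", "aspa-"),
    "Median": ("sp", "aspa-"),
    "Khotanese": ("shsh", "ashsha-"),  # the sh sound, written shsh
    "Wakhi": ("s", "yas"),
}


def languages_sharing_reflex(table):
    """Group the languages by the reflex they show."""
    groups = {}
    for language, (reflex, _form) in table.items():
        groups.setdefault(reflex, []).append(language)
    return groups


def reflex_attested(table):
    """Check that each listed reflex actually occurs in its cited form."""
    return {lang: reflex in form for lang, (reflex, form) in table.items()}


if __name__ == "__main__":
    for reflex, languages in languages_sharing_reflex(HORSE_REFLEXES).items():
        print(reflex, "->", ", ".join(languages))
```

Grouping by reflex makes the distributional point of the passage visible: Avestan and Median pattern together, whereas Old Persian and Wakhi happen to share the spelling s in this simplified transcription.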

The Old Iranian stage.

Old Persian was the language of the Achaemenid court. It is first attested in the inscriptions of Darius I (ruled 522-486 BC), of which the longest, earliest, and most important is that of Bisitun. At Bisitun are also inscribed versions of the same text in Elamite and Babylonian, and fragments of an Aramaic version on papyrus documents from Elephantine (modern Jazirat Aswan) also exist. Old Persian words and names also are to be found in large numbers as loanwords in contemporary Elamite sources and in 5th-century-BC Aramaic documents. As early as the time of Darius the Great's successor, Xerxes I (ruled 486-465 BC), the inscriptions show linguistic tendencies characteristic of the development from Old to Middle Persian. After Xerxes the production of original Old Persian inscriptions declined, probably as a result of the wider adoption of Aramaic and Elamite as the usual means of writing. With Artaxerxes III (ruled 359/358-338 BC), Old Persian inscriptions came to an end. The break is marked by Alexander's destruction of Persepolis in 330 BC. By far the largest part of attested Old Iranian is written in the language now usually called Avestan, after the Avesta, the name given to the collection of works forming the scripture of the Zoroastrians. The name itself is Middle Persian. In former times this language was called Zend, another Middle Persian word, which refers to the Middle Persian (Pahlavi) commentary on the Avesta. Because the homeland of the Avestan language was long thought to be in Bactria, it was often in the past called Bactrian. Bactrian is now used to designate a different Iranian language belonging to the Middle Iranian period. Since the beginning of the 20th century it has been generally accepted that the homeland of the Avesta was Khwarezm, which in ancient times included both Merv and Herat. Merv is now in Turkmenistan, Herat in northwestern Afghanistan. 
The oldest part of the Avesta is known as the Gathas, the poems composed by Zoroaster (Zarathustra), the founder of the Zoroastrian religion. His date is uncertain but is traditionally ascribed to the 7th to 6th century BC. The so-called Khurda Avesta ("Little Avesta") is a miscellany of texts of later date, the oldest parts of which may have been composed about 400 BC. The language of the Khurda Avesta is different in many details from that of the more archaic language of the Gathas, and it may even represent a different dialect. Many uncertainties surround the detailed interpretation of the Avesta as a result of the method of transmission. The Avesta was not recorded until after the language had ceased to be used, except by Zoroastrian priests. The present manuscripts date from the 13th century and later, although they reflect the recording of the priestly tradition in the special Avestan script during the 6th century AD.

The Middle Iranian stage.

Middle Persian, the major form of which is called Pahlavi, was the official language of the Sasanians (AD 224-651). The most important of the Middle Persian inscriptions is that of Shapur I (d. AD 272), which has parallel versions in Parthian and Greek. Middle Persian was also the language of the Manichaean and Zoroastrian books written during the 3rd to the 10th century AD. The extant literature of the Zoroastrian books is much more extensive than that of the Manichaean texts, but the latter have the advantage of having been recorded in a clear and unambiguous script. Moreover, the Middle Persian of the Zoroastrian books does not simply represent the spoken language of the writers of the 9th-century Zoroastrian texts. It is probable that they spoke early Modern Persian and that their speech often impinged upon their writing but that they strove to write the Middle Persian of several centuries earlier as it was attested in the inscriptions of the early Sasanian dynasty when Middle Persian was the koine. By contrast, in the case of Manichaean Middle Persian, some texts survive unchanged from the 3rd century AD, the time of the Persian teacher Mani himself (AD 216-274). Very little Parthian survives from the pre-Sasanian period. A large number of Parthian ostraca (inscribed pottery fragments) from the 1st century BC were discovered at Nisa near modern Ashkhabad, but they are inscribed in ideographic Aramaic (i.e., Aramaic writing that uses Aramaic words as symbols to represent Parthian words). Dating before the 3rd century are a document from Hawraman, some coin legends, and a dated grave stele. The most copious and important material in Parthian is the work of the Sasanian kings of the 3rd century, who added a Parthian version to their inscriptions--Hajjiabad, Naqsh-e Rustam (Ka'be yi Zardusht), and Paiküla. A few decades later Parthian disappeared as a result of the rise of the Sasanians and the predominance of their native tongue, Middle Persian. 
Manichaean Parthian of the 3rd century was preserved as a church language in Central Asia. The oldest surviving Sogdian documents are the so-called Ancient Letters found in a watchtower on the Chinese Great Wall, west of Tun-huang, and dated at the beginning of the 4th century AD. Most of the religious literature written in Sogdian dates from the 9th and 10th centuries. The Manichaean, Buddhist, and Christian Sogdian texts come mainly from small communities of Sogdians in the T'u-lu-p'an (Turfan) oasis and in Tun-huang. From Sogdiana itself there is only a small collection of documents from Mt. Mugh in the Zarafshan region, mainly the business correspondence of a minor Sogdian king, Dewashtich, from the time of the Arab conquest about 700. The relationship of the various forms of Sogdian to one another has not yet been sufficiently investigated, so that it is not clear whether different dialects are represented by the extant material or whether the differences can be accounted for by reference to other relevant factors, such as differences of script, period, subject, style, or social milieu. The importance of social milieu can be seen by comparing the elegant Manichaean literature directed to the court with the more vulgar language of the Christian literature directed to the lower classes. Of the Saka dialect known as Tumshuq very little has survived, and despite its evidently close relationship to the much better known Khotanese dialect, full interpretation has proved difficult. Knowledge of Khotanese is more firmly based on a substantial corpus of material, including extensive bilingual texts. Although the chronological range of the extant Khotanese material is limited to only a few centuries, probably the 7th to the 10th, a rapid development of the language is apparent. At the phonological level, most noticeable is the loss of syllables between the older and later stages of the language. 
Thus, hvatana- "Khotanese" at the oldest stage is successively weakened to hvatäna-, hvamna-, hvana-, hvam. At the morphological level, most striking is the tendency to simplify the case endings and even to replace them by analytical expressions, constructions of two or more words. Thus, Late Khotanese has raksaysa hiya rade "kings of the raksasas," whereas Old Khotanese would have raksaysänu rrunde. The Old Khotanese -änu ending is unmistakably genitive plural, but the Late Khotanese -a is merely a general oblique plural ending and has been reinforced by hiya "own," used to mean "of." Khotan was a great centre of Buddhism during the 1st millennium AD, and all the surviving literature in Khotanese is either Buddhist or coloured by Buddhism. Even in business documents and official letters the Buddhist background is usually not difficult to discern. It can scarcely be coincidental that the Buddhist literature of Khotan, flourishing so vigorously during the 10th century, ended abruptly with the Muslim conquest at the beginning of the 11th. Little survives of Bactrian and Scytho-Sarmatian. Knowledge of Bactrian is based almost entirely on a single inscription of 25 lines from Ateshkadeh-ye Sorkh Kowtal in northern Afghanistan. Even less is known of Scytho-Sarmatian. Little is also known of Old Khwarezmian; that is, Khwarezmian written in the indigenous Khwarezmian script. Apart from a few coin legends and inscriptions on silver vessels, the material that survives consists of inscriptions of the 2nd century AD from Topraq-qal'ah (Toprakkala) and of the 7th from Toqqal'ah, archaeological sites in Uzbekistan. Much more is known of Late Khwarezmian, written in the Arabic script. This material is found mainly in two Arabic works, the 13th-century fiqh work of Mukhtar az-Zahidi, called the Qunyat almunyah, and the Arabic dictionary Muqaddimat al-Adab of az-Zamakhshari (1075-1144), of which a manuscript glossed in Khwarezmian was found.

Modern Iranian.

Of the modern Iranian languages, by far the most widely spoken is Persian, which, as already indicated, developed from Middle Persian and Parthian (with elements from other Iranian languages such as Sogdian) as early as the 9th century AD. Since then, it has changed little except for acquiring an increasing proportion of loanwords, mainly from Arabic. Persian has been a literary language since the 9th century, and there is an increasing awareness of the continuity of its literary tradition with the earlier periods. As the national language of Iran in succession to Middle Persian, it has for centuries strongly influenced the other Iranian languages, especially on Iranian territory. In fact, it seems likely that, with the increase of modern methods of communication, Persian will eventually supplant entirely most of the other languages and dialects. Against this trend stand only Kurdish and Balochi, the speakers of which tend to regard their languages as an expression of their particular identities. Nevertheless, even Kurdish and Balochi have been and continue to be strongly influenced by Persian. Outside Iran the situation is rather different. In Afghanistan the first national language is Pashto, even though Persian is the official second language. Pashto became the official language by royal decree in 1936, and literary activity has been encouraged by the Pashto Tolana (Pashto Society) of Kabul. During the Soviet period both Ossetic and Tajik received official encouragement; nevertheless, both languages were displaced by the Russian language as the language of administration. Other languages also compete with Ossetic and Tajik. Though it has a large body of folk epics, Ossetic became a literary language only in the second half of the 19th century. By contrast, the neighbouring Georgian has a still flourishing ancient literary tradition dating back to the 5th century AD and has many more speakers. 
Tajik, on the other hand, has a lifeline through its close connection with Persian, but it too has been retreating before Uzbek, an unrelated language of the Turkic group.

Characteristics of the Iranian languages.

All Iranian languages show in their basic elements the characteristic features of an Indo-European language. Apart from the extensive borrowing of Arabic words in Modern Persian, the Iranian languages have scarcely been affected by unrelated languages, with the notable exception of Ossetic, which has been strongly influenced by the neighbouring Caucasian languages. Some dialects of Tajik have been very receptive to Uzbek elements. In the case of languages in contact with Indian civilization, the most noticeable non-Iranian feature often taken over is the Indo-Aryan series of retroflex sounds. These are foreign to Indo-Aryan itself, being a result of the influence of the Dravidian languages. The elaborate phonological and morphological structure of the Indo-European parent language has been progressively simplified in the development of the Iranian languages. The basic phonological structure of Common Old Iranian has on the whole been maintained, but the morphological system has continued to be simplified. There has been a constant move in almost all Iranian languages toward an analytic structure; i.e., the use of prepositions and word order rather than case endings to indicate grammatical relationships.

Phonology.

The most characteristic features of the Iranian phonological system are those that distinguish it from the Indo-Aryan system. These are the development of various fricative sounds (indicated in phonetic symbols as x, f, θ, and later γ, β, ð), and of the voiced sibilant sounds z and ž. Even in Iranian, however, these sounds did not persist universally. In western Middle Iranian the θ sound was lost, and it is rare in the modern languages. In Pashto the inherited f sound has been discarded. Balochi, except in the extreme east, is entirely without fricatives. Voiced bilabial and dental fricative sounds (β and ð) were recorded in some early manuscripts of Modern Persian, but they became b and d by the 13th century. Two negative features have also resulted in differentiation between Indo-Aryan and Iranian. One is the result of the coalescence in Proto-Iranian of aspirated and unaspirated voiced stops. Thus, Indo-European *b and *bh were maintained in contrast in Indo-Aryan as b and bh, but they fell together in Iranian as b. This resulted in an alteration of the phonological structure because the number of consonant contrasts (oppositions) was reduced. The other negative feature is the absence of the retroflex consonants from Iranian except as a later importation in contiguous regions. Other divergences in development, such as the change of an s sound to h in Iranian, brought about a difference in distribution rather than in structure because h developed also in Indo-Aryan but from Indo-Iranian *zh and *gh before front vowels (e.g., e and i). The features discussed here are illustrated in Table 6. In Old Iranian the stress lay on the next to the last syllable if it was heavy (i.e., contained a long vowel or was closed by a consonant)--otherwise on the preceding syllable. With the loss of final unstressed vowels in the development of many Iranian languages, the stress often came to be on the final syllable. End stress is characteristic of Modern Persian.
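The Old Iranian stress rule stated above can be expressed as a short procedure. The following Python sketch is a simplified illustration under stated assumptions: words are supplied already divided into syllables, a colon marks a long vowel (a notation invented for this example), only the vowels a, i, and u are recognized, and the test syllables are schematic rather than attested forms. The treatment of words of one or two syllables is likewise an assumption, since the rule as stated does not cover them.

```python
SHORT_VOWELS = "aiu"   # vowel letters recognized by this sketch
LENGTH_MARK = ":"      # hypothetical notation: "a:" stands for long a


def is_heavy(syllable):
    """A syllable is heavy if it has a long vowel or is closed by a consonant."""
    has_long_vowel = LENGTH_MARK in syllable
    closed = syllable[-1] not in SHORT_VOWELS + LENGTH_MARK
    return has_long_vowel or closed


def stressed_index(syllables):
    """Return the 0-based index of the stressed syllable.

    Stress falls on the next-to-last syllable if it is heavy,
    otherwise on the syllable before it.
    """
    if len(syllables) < 2:
        return 0  # assumption: a monosyllable carries the stress
    penult = len(syllables) - 2
    if is_heavy(syllables[penult]) or penult == 0:
        # assumption: with no earlier syllable available, stress stays put
        return penult
    return penult - 1


if __name__ == "__main__":
    for word in (["ta", "ka:", "ta"], ["ta", "kan", "ta"], ["pa", "ta", "ka", "ta"]):
        print("-".join(word), "-> stress on syllable", stressed_index(word) + 1)
```

Once final unstressed vowels are lost, a word stressed on its next-to-last syllable becomes stressed on its last, which is one way to picture the shift to end stress described in the text.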

Grammar.

In Old Persian the Indo-European inflectional system appears considerably simplified. In particular, the genitive and the dative coalesced into one case and the instrumental and ablative into another. Moreover, in the plural the nominative and accusative cases are not distinguished. This reduced system is still found in the Middle Iranian period in Old Khotanese and to a certain extent in Sogdian. Eastern Iranian is in this respect more conservative than western. By the Middle Iranian period, western Iranian had abandoned nominal (noun, adjective, pronoun) inflection altogether, as is the case with Middle and Modern Persian and with Parthian. In some languages, both western and eastern, two or, rarely, three cases survive. Ossetic is quite exceptional in maintaining an elaborate case system; it is partly a result of secondary, purely Ossetic developments. The elaborate conjugational system of the Indo-European verb followed a similar path to disintegration. In particular, the whole past tense system was given up by the Middle Iranian period. Only a few relics remain of the Indo-European system, such as the partial survival of the augment (a prefixed vowel or lengthening of the initial vowel) in the Sogdian imperfect tense. But a new past tense system developed, based on the old past participle, often combined with auxiliary verbs. Many languages distinguish between transitive and intransitive verbs in the past tense system; and in some, such as Khotanese and Pashto, even gender and number are distinguished. The present tense system was far better preserved. The dual number was in retreat in Old Iranian and is not attested later. The middle voice, a form that indicates that a person or thing both performs and is affected by the action represented, was generally abandoned by the Middle Iranian period, although middle voice inflection is well represented in Khotanese. 
With these qualifications, the endings of the present indicative (active) have been generally well preserved. A variety of imperative, subjunctive, and optative forms, partly based on inherited forms and partly the result of innovation, is found especially in the eastern languages, including Ossetic. Rigidity of word order is, on the whole, most characteristic of those languages, such as Persian, that have gone furthest in the reduction of the inherited morphological system.

Vocabulary.

The Islamic conquest of Iran during the 7th century entailed not only a change of religion but also a change of language. The sacred language of Islam was Arabic, and the proportion of Arabic words used in Persian rapidly increased until it reached something like the 40 to 50 percent of the present day. Before the introduction of the Arabic element, loanwords came mainly from other Iranian languages. Most familiar is the extensive borrowing from Median found in Old Persian. In later periods, Modern Persian borrowed words extensively from Turkish and from European languages. Persian is itself the donor language in the case of the other Iranian languages, all of which have drawn upon its vocabulary. Buddhism was similarly responsible for the large proportion of Indo-Aryan words, both Sanskrit and Prakrit, found in Sogdian and especially in Khotanese. A considerable Indian element occurs in the vocabulary of those modern Iranian languages that have been or are in contact with modern Indo-Aryan languages in the northwest, such as Lahnda and Sindhi. There the Dardic languages have also been influential. Baluchi has also borrowed from Brahui, a Dravidian language spoken in Baluchistan in Pakistan. Ossetic occupies an exceptional position. Most of its Persian and Arabic borrowings have come to it through Turkish, but more striking is the large number of words borrowed from the Caucasian languages, especially Georgian. In modern times, Ossetic continues to be influenced by Russian.

Writing systems.

Iranian languages have been written in many different scripts during their long history, although various forms of Aramaic script have been predominant. Modern Persian is written in Arabic script, which is of Aramaic origin. For writing the Persian sounds p, c, ž, and g, four letters have been added by means of diacritical marks. By the addition of further letters, this Perso-Arabic script has been adapted to write not only the other main modern Iranian languages, Pashto, Kurdish, and Baluchi, but also those minor ones that are occasionally recorded. An advantage of the use of this consonantal script is that by not defining vowel qualities it is possible to include local dialect variations to a considerable extent. Two modern Iranian languages spoken on Soviet territory are currently written in a modified version of the Russian alphabet: Tadzhik and Ossetic. Soviet scholars have, however, tended to use modified Latin alphabets to record the minor languages that have no literary tradition, such as some of the Pamir languages. Ossetic has also been written in the Georgian script. Old Persian was written with a cuneiform syllabary, the origin of which is still hotly disputed. Middle Persian, Parthian, Sogdian, and Old Khwarezmian were recorded in various forms of Aramaic script. Two forms of this script as they developed for writing Sogdian were adopted by the Uighurs. In its cursive form this script spread even further, to the Mongols and Manchus. Three other scripts are important for the remaining Middle Iranian languages: Greek script for Bactrian, Arabic script for Late Khwarezmian, and varieties of Central Asian Brahmi script of Indian origin for Khotanese and Tumshuq. The Aramaic script was not systematically adapted to the writing of Middle Iranian; and despite the introduction of a variety of diacritical marks to differentiate letters, considerable ambiguity remained. Moreover, several letters tended to coalesce in form. 
In this respect, the Pahlavi script, used for writing the Middle Persian of the Zoroastrian books, developed furthest. In it, the original 22 letters of the Aramaic alphabet have been reduced to 14, which are further confused by the use of numerous ligatures (linked letters). It was the realization that this script was inadequate to record precisely the traditional pronunciation of the sacred text of the Avesta that led the Zoroastrian priests to devise the elaborate Avestan script, which, with its 48 distinct letters formed by differentiation out of the 14 used for Pahlavi, was well suited to the task. (See also WRITING.) (R.E.E.)

Greek language

Greek is an Indo-European language whose history can be followed from the 14th century BC to the present day. Its documents cover a longer period of time (34 centuries) than those of any other Indo-European language. There is an Ancient phase, subdivided into a Mycenaean period (texts in syllabic script from the 14th to the 12th centuries BC) and Archaic and Classical periods (beginning with the adoption of the alphabet, from the 8th to the 4th centuries BC); a Hellenistic and Roman phase (4th century BC to 4th century AD); a Byzantine phase (5th-15th centuries AD); and a Modern phase. Separate transliteration tables for Classical and Modern Greek accompany this article. Some differences in transliteration result from changes in pronunciation of the Greek language; others reflect convention, as for example the χ (chi or khi), which was transliterated by the Romans as ch (because they lacked the letter ....

---------------------------------------------------------------------------

Indo-Iranian languages

The Indo-Aryan languages and the Iranian languages together constitute the Indo-Iranian language group, the easternmost major branch of the Indo-European family of languages. Indo-Aryan (Indic) languages are spoken by some 800 million persons in India, Pakistan, Sri Lanka, Nepal, Bangladesh, and other areas of the Himalayan region.
In addition, languages of the Indo-Aryan group are spoken by about 5,000,000 people in Europe, Africa, the Americas, and Oceania: the Gypsy, or Romany, dialects that are distributed about parts of Asia, the Middle East, Europe, and North America are of Indo-Aryan origin. Speakers of Iranian languages number in the tens of millions and live in areas extending from Pakistan to Iran, Afghanistan, Transcaucasia, and Central Asia. Among the Indo-European languages, only Greek (in the Linear B documents) and Hittite possess records that go back farther in time than those of Indo-Iranian. The Indo-Iranian tongues have been used as both administrative and literary languages.
Old Persian was the administrative language of the early Achaemenian dynasty dating from the 6th century BC; and an eastern Middle Indo-Aryan dialect was the language of the chancellery of the Mauryan emperor Ashoka in India in the mid-3rd century BC. As literary languages, the Indo-Iranian languages were used in the texts of some of the world's great religions: Indo-Aryan for Buddhism, Hinduism, and Jainism, and Iranian for Zoroastrian and Manichaean texts. The oldest Zoroastrian texts are in dialects included under the name Avestan. Commerce, conquest, and religion spread the influence of these languages.
Indo-Aryan languages, for example, penetrated deep into Southeast Asia; names in Indonesia and other areas and Sanskrit texts in Cambodia reflect this influence. The close relation between the Iranian and Indo-Aryan groups has never been doubted. They share characteristic features that set them apart as a subgroup of Indo-European. The long and short varieties of the Indo-European vowels e, o, and a, for example, appear as long and short a: Sanskrit manas- "mind, spirit," Avestan manah-, but Greek ménos "ardour, force." (In the following examples, a macron (¯) indicates a long vowel; a breve (˘) indicates a short vowel. The spellings used in this article for Indo-Aryan and Iranian forms are traditional transliterations for the most part. In some cases, more accurate phonetic symbols are used. These can be found in the International Phonetic Alphabet.)

In instances in which some Indo-European languages have an a sound, Indo-Iranian has i as a reflex of Indo-European sounds called laryngeals--e.g., Greek pater "father," Sanskrit pitr-, Avestan and Old Persian pitar-. After stems ending in long or short a, i, or u, an n occurs sometimes before the genitive (possessive) plural ending am (Avestan -am)--e.g., Sanskrit martyanam "of mortals, men" (from martya-); Avestan masyanam (from masya-); Old Persian martiyanam.
In addition to several other similarities in their grammatical systems, Indo-Aryan and Iranian have vocabulary items in common--e.g., such religious terms as Sanskrit yajña-, Avestan yasna- "sacrifice"; and Sanskrit hotr-, Avestan zaotar- "a certain priest"; as well as names of divinities and mythological persons, such as Sanskrit mitra-, Avestan miθra- "Mithra." Indeed, speakers of both language subgroups used the same word to refer to themselves as a people: Sanskrit arya-, Avestan airya-, Old Persian ariya- "Aryan."
The Indo-Aryan and Iranian language subgroups also differ from each other in a number of linguistic features, among them that Indo-Aryan has an i sound representing an Indo-European laryngeal sound not only in initial syllables but generally also in interior syllables; e.g., Sanskrit duhitr- "daughter" (cf. Greek thugáter). In Iranian, however, the sound is lost in this position; e.g., Avestan dugədar-, duγdar-. Similarly, the word for "deep" is Sanskrit gabhīra- (with ī for i), but Avestan jafra-.
Iranian also lost the accompanying aspiration (a puff of breath, written as h) that is retained in certain Indo-Aryan consonants; e.g., Sanskrit dha "set, make," bhr "bear," gharma- "warm," but Avestan and Old Persian da, bar, and Avestan garəma-. Further, Iranian changed stops such as p before consonants and r and v to spirants such as f: Sanskrit pra "forth," Avestan fra, Old Persian fra; Sanskrit putra- "son," Avestan puθra-, Old Persian pussa- (ss represents a sound that is also transliterated as ç). In addition, h replaced s in Iranian except before non-nasal stops (produced by releasing the breath through the mouth) and after i, u, r, k; e.g., Avestan hapta- "seven," Sanskrit sapta-; Avestan haurva- "every, all, whole," Sanskrit sarva-. Iranian also has both xš and š sounds, resulting from different Indo-European k sounds followed by s-like sounds, but Indo-Aryan has only kṣ; e.g., Avestan xšayeiti "has power, is capable," šaēiti "dwells," but Sanskrit kṣayati, kṣeti. Iranian was also relatively conservative in retaining diphthongs that were changed to simple vowels in Indo-Aryan.
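The conditioning of the Iranian change of s to h described above can be sketched as a toy rewrite rule. This is a rough illustration under loud assumptions: words are plain ASCII transliterations without diacritics, the function name and the two letter sets are invented here, and the rule ignores every other sound change, so its output is not a real Avestan form.

```python
# Assumed ASCII stand-ins for the non-nasal stops named in the rule.
NON_NASAL_STOPS = set("ptkbdg")
# Sounds after which s was not replaced by h, per the rule above.
BLOCKING_PRECEDING = set("iurk")


def iranian_s_to_h(word):
    """Apply the s > h rule to an ASCII transliteration (toy sketch)."""
    out = []
    for i, ch in enumerate(word):
        if ch != "s":
            out.append(ch)
            continue
        prev = word[i - 1] if i > 0 else ""
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if nxt in NON_NASAL_STOPS or prev in BLOCKING_PRECEDING:
            # Environment in which h did not replace s.
            out.append("s")
        else:
            out.append("h")
    return "".join(out)


print(iranian_s_to_h("sapta"))  # compare Avestan hapta-
print(iranian_s_to_h("sarva"))  # Avestan haurva- shows further changes
```

Because diacritics are dropped, this sketch cannot tell the different Sanskrit sibilants apart, and it leaves the s unchanged after i, u, r, k even though that sound has its own development in Iranian.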

Iranian differs from Indo-Aryan in grammatical features as well. The dative singular of -a-stems ends in -ai in Iranian; e.g., Avestan masyai, Old Persian cartanaiy "to do" (an original dative singular form functioning as infinitive of the verb). In Sanskrit the ending is extended with a--martyay-a. Avestan also retains the archaic pronoun forms yus, yuzəm "you" (nominative plural); in Indo-Aryan the -s- was replaced by y (yuyam) on the model of the 1st person plural--vayam "we" (Avestan vaem, Old Persian vayam). Finally, Iranian has a 3rd person pronoun di (accusative dim) that has no counterpart in Indo-Aryan but has one in Baltic.
The original location of the Indo-Iranian group was probably to the north of modern Afghanistan, in the present-day states of Tajikistan, Uzbekistan, Kyrgyzstan, Turkmenistan, and Kazakhstan, where Iranian languages are still spoken. From there, some Iranians migrated to the south and west, the Indo-Aryans to the south and east. From geographical references in the earliest Indo-Aryan literary document, the Rigveda, it is clear that the earliest settlement of Indo-Aryans was in the northwest of the Indian subcontinent.
Migration did not take place at once; there was doubtless a series of migrations. The date of entry of the Indo-Aryans into the subcontinent cannot be precisely determined, though the beginning of the 2nd millennium BC is plausible and generally accepted. There is heated controversy concerning the precise linguistic position of the language of the Indo-Iranian family first attested in Middle Eastern cuneiform texts of c. 1450-1350 BC. Some borrowed words and proper names appearing in these Hittite-Hurrian documents have been interpreted as belonging either to Indo-Iranian, to an Indic subgroup of Indo-Iranian that had not yet fully split, or to Indo-Aryan proper. Complete scholarly agreement on this issue has not been reached. The identification of the Harappan peoples of the Indus Valley, whose writing has not yet been satisfactorily deciphered, also awaits further research; with it may come a possible answer as to whether Indo-Aryans encountered these people or whether their civilization had passed by the time the Indo-Aryans arrived on the subcontinent. Whatever the answers to these problems may be, the reasons for the split of the Indo-Aryans and Iranians are not known. In the following presentation regarding Indo-Aryan documents as evidence for linguistic history, it should be borne in mind that almost all dates are approximations.

THE INDO-ARYAN LANGUAGES

Languages of the group.

Indo-Aryan languages are assigned to three major periods: Old, Middle, and New Indo-Aryan. These periods are linguistic, not strictly chronological. Old Indo-Aryan includes different dialects and linguistic states referred to in common as Sanskrit. The most archaic Old Indo-Aryan is that of sacred texts called Vedas. Classical Sanskrit is the name given to the literary language that represents a polished form of various dialects. The late Vedic dialect described by the grammarian Panini (c. 6th century BC) is also commonly called Classical Sanskrit. Middle Indo-Aryan includes both the dialects of inscriptions from the 3rd century BC to the 4th century AD and literary languages. Apabhramsha dialects represent the latest stage of Middle Indo-Aryan development.
Though all Middle Indo-Aryan languages are included under the name Prakrit, it is customary to speak of the Prakrits as excluding Apabhramsha. New Indo-Aryan is represented by such modern vernaculars as Hindi and Bengali, which began to emerge from about the 10th century AD. These too have earlier and later stages, culminating in the present-day languages. New Indo-Aryan languages accounted for about 490,000,000 speakers in India, or approximately 74 percent of the population in the early 1980s.
Considering the approximately 85,000,000 Bengali speakers in Bangladesh, approximately 63,000,000 speakers accounted for by Punjabi and Sindhi in Pakistan, and 11,000,000 Sinhalese (Sinhala) speakers in Sri Lanka (formerly Ceylon), the total number of New Indo-Aryan speakers is well over 650,000,000. According to the latest Indian census, there are 547 mother tongues of the Indo-Aryan group in use within the bounds of postpartition (1947) India. Some of these are dialects that are used by few speakers; others are official state languages having 30,000,000 or 50,000,000 speakers. The major groups of New Indo-Aryan languages are given in Table 4. Structurally and historically, Hindi and Urdu are one, although they are now official languages of different countries written in different alphabets. The term hindi (also hindvi) is known from as early as the 13th century.
The term zaban-e-urdu "language of the imperial camp" came into use in about the 17th century. In the south, Urdu was used by Muslim conquerors of the 14th century. Many of the languages in Table 4 are official state languages, the media of education up to the university level and of official transactions. Hindi, written in the Devanagari script, is the co-official language (with English) of the Republic of India and is used as a lingua franca throughout North India. It has varieties according to the mother tongue of the area; e.g., Bombay Hindi and Calcutta Hindi. Each of the major state languages has several other dialects in addition to the standard dialect adopted for official purposes. Including the various dialects down to the village level, it can be said that a chain of communication stretches across North India such that each dialect forms a link with each adjacent dialect. On the level of official languages this is not so: a Gujarati speaker will not readily understand colloquial Bengali.

Historical survey of the Indo-Aryan languages.

The points noted above regarding Indo-Aryan migration make it difficult to determine the domain of Proto-Indo-Aryan, the ancestral language of all the known Indo-Aryan tongues, if indeed there was any such single region. All that can be said with certainty is that the Indo-Aryans on the subcontinent first occupied the area comprising most of present-day Punjab (both West and East), Haryana, and the Upper Doab (Ganges-Yamuna interfluve) of Uttar Pradesh. The structure of Proto-Indo-Aryan must have been close to that of early Vedic, with dialectal variations.

Old Indo-Aryan.

The most archaic Sanskrit is that of the Vedas, of which there are four major text groups called Samhitas: the Rigveda, Atharvaveda, Samaveda, and Yajurveda.
The Yajurveda is in turn divided into two main branches, the White (Shukla) Yajurveda and the Black (Krishna) Yajurveda. The Rigveda, Atharvaveda, and Samaveda are purely metrical texts mainly used by priests in their ritual. The texts of the Black Yajurveda contain both verses used in ritual sacrifice (called mantras) and prose sections that are explanatory in nature, giving mythological explanations of sacrifices and objects used in them, together with etymologies (derivations of words). These sections are known as Brahmana portions. Each Veda also has a particular Brahmana connected with it. The early Vedic texts are pre-Buddhistic; a plausible date accepted for the composition of the Rigveda is between 1200 and 1000 BC, though the exact chronology of these early texts is difficult to establish. The prose passages of Brahmanas and of the early sutra (aphoristic texts) period may be called late Vedic. Also of the late Vedic period is the grammarian Panini, author of a treatise called Astadhyayi, who makes a distinction between the language of sacred texts (chandas) and the usual language of communication (bhasa).
Epic Sanskrit is so called because it is represented principally in the two epics, Mahabharata and Ramayana. In the latter the term samskrta "formed, polished" is encountered, probably for the first time with reference to the language. The date of composition for the core of early Epic Sanskrit is considered to be in the centuries just preceding the Christian era. Classical Sanskrit is the language of the major poetic works (kavya), drama (nataka), tales such as the Hitopadesha and Pañca-tantra, and technical treatises on grammar, philosophy, and ritual. It was used not only by the poet Kalidasa and his predecessors Bhasa, a dramatist, and Ashvaghosa, a Buddhist author, in the first centuries AD but was also continued long after Sanskrit was a commonly used mother tongue; indeed, Sanskrit is a language of learned treatises and commentaries to this day. It is also used as a lingua franca among pandits (Brahmin scholars) from different areas of India. Linguistic developments can be traced from the early Vedic of the Rigveda through the later Samhitas on to the late Vedic of Brahmana prose and sutras, culminating in the language described by Panini, which is tantamount to Classical Sanskrit. For example, the nominative plural form ending in -asas (devasas "gods") was already less frequent than -as in the Rigveda and continued to lose ground later; in Brahmana, -as (e.g., devas) is the normal form.

There are numerous other changes evident. For example, the instrumental singular form of -a-stems ends both in -a and -ena (a pronoun ending) in the Rigveda, with the latter form predominating; thus, virya "heroic might" appears once, and viryena occurs ten times (from virya- "heroic might, act"). In later Vedic -ena is the usual ending. All the early Vedic forms are expressly classed as belonging to the sacred language (chandas) by Panini. The verb also shows chronological differences. For example, the 1st person plural ending -masi (e.g., bharamasi "we bear") predominates over -mas in Rigvedic but not in the Atharvaveda; -mas becomes the normal ending later. Early Vedic distinguishes between the aorist, imperfect, and perfect tenses.
The aorist is commonly used to refer to an action that has recently taken place; the imperfect is a narrative tense referring to actions accomplished in the distant past. The perfect form of the verb originally denoted, as in Greek, a state reached; e.g., bi-bhay-a "is afraid" (root bhi). From earliest Vedic, however, this was not always the use of the perfect. Although the grammarian Panini distinguished between the three tenses noted (he said the perfect is used to denote an action beyond one's ken), the perfect and imperfect both came to be used as narrative tenses. There are also future forms of Vedic, formed with suffixes (-isya and -sya) and used from earliest times.
A future form, composed of an agent noun of the type kartr- "doer" and followed, except in the 3rd person, by forms of the verb as "be" (e.g., kartasmi [karta asmi] "I will do"), was recognized as in common use by Panini but is rare in early Vedic. Early Vedic had a category that went out of use by the late Vedic period of Brahmanas--the injunctive, which was formally a form with secondary endings lacking the augment, a prefixed vowel. The injunctive could be used to denote a general truth. A general truth can also be signified by the subjunctive, which is characterized by the vowel a affixed to the present, aorist, or perfect stem.
Later Vedic retained the injunctive only in negative commands of the type ma vadhis "do not slay." The subjunctive also diminished slowly until it was no longer used; for Panini the subjunctive belonged to sacred literature. The functions of the subjunctive were taken over by the form called optative (and the future form). Noun forms incorporated into the verb system are numerous in early Vedic. Rigvedic has forms with affixes ya and tva functioning as future passive participles (gerundives); e.g., vac-ya- "to be said," kar-tva- "to be performed, done." The Atharvaveda has, additionally, forms with -(i)tavya (hims-itavya- "to be injured") and -aniya (upa-jiv-aniya- "to be subsisted upon"). By late Vedic, the type with tva had been eliminated; Panini recognized as normal the types karya-, kartavya-, karaniya- "to be done."
In Indo-Aryan, from earliest Vedic down to New Indo-Aryan, forms called absolutives (or gerunds) are used to denote the previous of two or more actions performed (usually) by one agent: "having done . . . he did"; for example, piba nisadya "sit down (nisadya "having sat down") and drink." Rigvedic uses tvi, tva, tvaya, (t)ya to form absolutives, but these were later reduced to two: tva with a simple verb or one compounded with the negative particle, and ya with a verb compounded with a preverb (a preposition-like form).
Early Vedic also uses various case forms of action nouns in the capacity of infinitives; e.g., dative singular -tave (da-tave "to give"), genitive singular -tos (da-tos), both from a noun in -tu, which also supplies the accusative ending -tum (da-tum). There are other types in early Vedic, but the nouns in -tu are important; in late Vedic the accusative -tum and the genitive -tos (construed with ish or shak "be able, can") became the norm. According to Panini, forms in -tum and dative singular forms of action nouns are equivalent variants: bhok-tum gacchati/ bhojanaya gacchati "He is going out to eat."
That some forms fell into disuse in the course of Indo-Aryan is natural; the above represent both chronological and dialectal modifications. Such change was recognized by Indian grammarians; e.g., Patañjali, of the mid-2nd century BC, noted that perfect forms of the type ca-kr-a "you did, have done" (2nd person plural) were not in use at his time; instead, a nominal (adjective) form kr-ta-vant-as was used, consisting of the past passive participle kr-ta- and an adjectival suffix -vant. Indian grammarians also recognized the existence of different dialects. Panini noted forms used by northerners (udicya) and easterners (pracya), as well as various dialectal uses described by grammarians who preceded him. Earlier documents also afford evidence for dialect variation; e.g., the early Vedic of the Rigveda is a dialect in which the Indo-European l sound was for the most part replaced by r--pra "fill," pur-na- "full." This change accords with Iranian; e.g., Avestan pərəna- "full." These forms contrast with Latin plenus and Gothic fulls, with l. Other dialects kept l and r distinct.
There are also doublets that have both r and l in words with Indo-European r: rohita-/lohita- "red." The variant with l can be assumed to belong to an eastern dialect. This variance accords with Middle Indo-Aryan evidence and the fact that such l forms become more numerous in the tenth book (mandala) of the Rigveda, which is demonstrably more recent than the most ancient parts of the Rigveda and dates from a time when the Indo-Aryans had progressed farther east than their original location on the subcontinent.
The development of retroflex l- and lh- sounds (produced by curling the tip of the tongue upward toward the hard palate) from the retroflex sounds of d (nila- "nest" from nida-) and dh when occurring between vowels is another feature characteristic of some dialects, including the major dialect of the Rigveda. Classical Sanskrit represents a development of one or more such early Old Indo-Aryan dialects. At this stage, the archaisms noted above have been eliminated. Moreover, the accentual system of Classical Sanskrit is not the same as that of Vedic, which had a system of pitches; vowels had low, high, or circumflex (first rising, then falling) pitch, and the particular vowel of a word that received high pitch could not be predicted. In Classical Sanskrit, on the other hand, the accent was probably predictable. If the next to the last vowel was long, it received the accent; if not, the vowel preceding it was accented. The Vedic system survived at least to the time of Panini, who described it fully and did not restrict it to sacred language.

For all this simplification, Classical Sanskrit is considerably more complex than Middle Indo-Aryan. In addition to the vowels a, i, and u (in both long and short varieties), it has r and l used as vowels. Consonant clusters occur freely, except in word final position, and the system of sound modification conditioned by the context, called sandhi, is fully operative. Moreover, in its grammatical system Classical Sanskrit maintains the dual number, seven cases in addition to the vocative form (which marks the one addressed), and a complex set of alternations.
For example, to the nominative singular form agni-s "fire," correspond the genitive singular agne-s "of fire," the nominative plural agnay-as "fires," and the instrumental plural agni-bhis "with fires," with differing vowels in the second syllable. There are also separate sets of nominal (noun) and pronominal (pronoun) endings. Some nouns and adjectives inflect as pronouns; e.g., ekasmai, dative singular masculine-neuter of eka- "one." The verb system of Classical Sanskrit also maintains complex alternations. In the present tense of the type bhav-a-ti "becomes, is," the stem (bhav-a-) remains unchanged throughout the paradigm except for lengthening of the -a- to -ā- before v and m.
But other verbs have vowel alternation; e.g., as-mi "I am," s-mas "we are"; e-mi "I go," i-mas "we go"; juhomi "I pour," juhumas "we pour." A distinction is observed between active and mediopassive endings: jan-ay-a-ti "engenders" with the active ending -ti, but ja-ya-te "is born" with the mediopassive ending -te. (Mediopassive verb forms are used for the passive, reflexive, and other meanings.) Classical Sanskrit also has a rich system of nominal and verbal derivatives. Compound words are of the following kinds: copulative (dvandva) compounds such as matapitarau "mother and father" (also elliptic pitarau "parents"); the type like tat-purusa- "his man," in which the first member is equivalent to a case other than nominative; the type like bahu-vrihi "much-rice," in which the object denoted is other than that of any of the members of the compound (bahur vrihir yasya "He who has much rice"); and adverbial compounds (avyayibhava) of the type upagni (upa-agni) "near the fire." In addition, there are derivatives with affixes -tara- and -tama, such as priya-tara- "very dear" and priya-tama- "most dear" from the adjective priya-.
Pronouns have derivatives equivalent to case forms; e.g., tatra "there," yatra "where," and kutra "where?" are equivalent to locative forms such as tasmin, yasmin, and kasmin. These can also be used without a noun. Among the derivative verbal systems are the causative and the desiderative ("desire to"); the former has an affix -ay- (gam-ay-a-ti "makes to go," kar-ay-a-ti "has (someone) do") or, after roots in -a, -pay- (stha-pay-a-ti "sets in place"). The desiderative is formed with -sa- and reduplication (repetition of a part of the root)--di-drk-sa-te "desires to see" (root drsh). The desiderative also has an agent noun in -u--di-drk-s-u "who wishes to see."

Middle Indo-Aryan.

The Sanskrit word prakrta, whence the term Prakrit, is a derivative from prakrti- "original, nature."
Grammarians of the Prakrits generally consider the original from which they derive to be the Sanskrit language as described by grammarians going back to Panini. Most modern scholars consider prakrta to refer to the "natural" languages, the vernaculars, as opposed to Sanskrit, the polished language of literature and the educated (shista). There is also linguistic evidence to support this view. Several forms in the Prakrits are found in Vedic but not in Classical Sanskrit. As Classical Sanskrit is not directly derivable from any single Vedic dialect, so the Prakrits cannot be said to derive directly from Classical Sanskrit. The most archaic literary Prakrit is Pali, the language of the Buddhist canon (c. 5th century BC) and of the later stories and commentaries of Theravada Buddhism.
Pali represents essentially a western Middle Indo-Aryan dialect, though there are sufficient easternisms in the canon to have led some scholars to the view that the canon as it exists today is a recast of an original in an eastern dialect. To the Buddhist literature also belongs the Gandhari Dhammapada, the only literary text written in a dialect of the northwest. The Niya documents, official documents written in Prakrit dating from the 3rd century AD, also belong to the northwest. The earliest inscriptional Middle Indo-Aryan is that of the Ashokan inscriptions (3rd century BC). These are more or less full translations from original edicts issued in the language of the east (from the capital Pataliputra in Magadha, modern Patna in Bihar) into the languages of the areas of Ashoka's kingdom.
There are other Prakrit inscriptions up to the 4th century AD, and Sanskrit was not used inscriptionally until the first centuries AD. Literary Prakrits other than Pali were also used in independent works and in dramas along with Sanskrit. According to Prakrit grammarians, Maharastri ("From the Maharashtra Country") is the Prakrit par excellence. It is the language of kavyas (epic poems) such as the Ravanavaha (also called Setubandha) from no later than the 6th century AD.

Maharastri is also the language of lyrics in Rajashekhara's Karpura-mañjari (c. 900), the only extant drama written completely in Prakrit, and of verses recited by women in the classical drama of Kalidasa and his successors, though not earlier. The literary dialect used for conversation among higher personages other than the king and his captains in the drama is Shauraseni, while Magadhi is used by lower personages.
The language of the early Jaina canon, the final version of which was made in the 5th or 6th century AD, is called Ardhamagadhi ("Half Magadhi"); the Jainas also used another literary dialect, called Jaina Maharastri, in non-canonical works. The oldest poetic work in this dialect is Vimala Suri's Paumacariya (c. 3rd century). Of other Prakrit dialects mentioned by grammarians, Paishaci (or Bhuta-Bhasa, both meaning "Language of Demons") is noteworthy; it is said to be the language of the original Brhatkatha of Gunadhya, source of the Sanskrit book of stories Katha-saritsagara. Buddhist works were also written using a language that has been called Buddhist Hybrid Sanskrit. Among these works is the Mahavastu, the core of which is thought to date from the 2nd century BC.
This language is a Middle Indo-Aryan dialect of indeterminate origin, which steadily became more Sanskritized in prose sections of later works. The most advanced stage of Middle Indo-Aryan, Apabhramsha, was also used as a literary language. That there was literary creation in Apabhramsha by the 6th century is clear from an inscription of King Dharasena II of Valabhi, in which the King praises his father as being adept in Sanskrit, Prakrit, and Apabhramsha composition. Moreover, in the fourth act of Kalidasa's drama Vikramorvashiya there are Apabhramsha verses. Because Kalidasa probably lived in the 3rd or 4th century, literary composition in Apabhramsha is earlier still, if these verses are legitimate.
There is a great deal of later literature in Apabhramsha, for the most part Jaina works; e.g., Paumacariu of Svayambhu (8th-9th century), Harivamsha-purana of Puspadanta (10th century), Sanatkumara-cariu of Haribhadra (12th century). Middle Indo-Aryan is characterized generally by the reduction of the complexities seen in Old Indo-Aryan. The vowel system was reduced by the merger of r (and l) sounds with vowels and the change of the diphthongs ai and au to the vowel sounds e and o; e.g., Pali accha- "bear" (Sanskrit rksa-), ina- "debt" (Sanskrit rna-), uju- "straight" (Sanskrit rju-), pucchati "asks" (Sanskrit prcchati), metti- "friendship" (Sanskrit maitri-), orasa- "breast-born, legitimate" (Sanskrit aurasa-). Moreover, -aya- and -ava- commonly contracted to -e- and -o-; e.g., Pali jeti "conquers" (Sanskrit jayati), odhi- "limit" (Sanskrit avadhi-). Final consonants were deleted, with the exception of -m, which developed to an -m sound before which a vowel was shortened (Pali bhariyam "wife"; Sanskrit bharyam).
Together with the trend toward replacing variable consonant stems by unchanging stems in -a-, this change had serious consequences for the grammar. Consonant stems steadily disappeared and were transformed to stems ending in a vowel; e.g., to Sanskrit sharad- "autumn," sarit- "stream," and sarpis- "butter" correspond the Pali forms sarada-, sarita, and sappi-. Consonant clusters were also modified in Middle Indo-Aryan; e.g., Pali khetta- "field" (corresponding to Sanskrit ksetra-), Pali dakkhina- "right, south" (Sanskrit daksina-), aggi- "fire" (Sanskrit agni-), punna- "full" (Sanskrit purna-), and tanha- "thirst" (Sanskrit trsna-). The shortening of vowels before modified consonant clusters led to the use of short e and o sounds, which were unknown in Old Indo-Aryan; e.g., Pali semha- "phlegm" (Sanskrit shlesman-), ottha- "lip" (Sanskrit ostha-). The above phenomena are not restricted to Pali; they are pan-Middle Indo-Aryan.
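The vowel correspondences described above can be illustrated with a small Python sketch. It is only a toy model under stated assumptions: it treats four of the changes named in the text (ai to e, au to o, -aya- to -e-, -ava- to -o-) as plain string replacements on the diacritic-free spellings used in this article. The name mia_vowels is invented for the illustration. Because the contractions are said to apply "commonly" rather than always, a fuller treatment would need exceptions, and consonant cluster changes (as in metti- from maitri-) are not modeled.

```python
# Toy model of four Old -> Middle Indo-Aryan vowel correspondences from the text.
# Assumption: words are given in this article's diacritic-free transliteration.
RULES = [
    ("aya", "e"),  # -aya- contracts to -e-  (jayati -> jeti)
    ("ava", "o"),  # -ava- contracts to -o-  (avadhi -> odhi)
    ("ai", "e"),   # diphthong ai becomes e
    ("au", "o"),   # diphthong au becomes o  (aurasa -> orasa)
]


def mia_vowels(word: str) -> str:
    """Apply the vowel rules in order (hypothetical helper, not a real tool)."""
    for old, new in RULES:
        word = word.replace(old, new)
    return word


for sanskrit in ("jayati", "avadhi", "aurasa"):
    print(sanskrit, "->", mia_vowels(sanskrit))
```

The three sample words are the ones cited in the text; blind replacement of this kind would over-apply on other vocabulary, which is why it should be read as an illustration of the correspondences rather than a derivation procedure.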
Differences between Pali and Ashokan and other Prakrits include the retention of voiceless stops (i.e., p, t, k) between vowels in Pali and Ashokan dialects; other Middle Indo-Aryan dialects modify them. The extreme development appears in literary Maharastri, in which unaspirated stops (pronounced without an accompanying audible release, or puff of breath) other than retroflexes (t, d) and labials (p, b) were deleted, aspirated stops (pronounced with an audible puff of breath) were replaced by h, retroflexes (pronounced by curling the tongue upward toward the hard palate) became voiced, and labials were replaced by v; e.g., loa- "world" (Sanskrit loka-), loana- "eye" (Sanskrit locana-), saha- "branch" (Sanskrit shakha-), padhai "recites, reads" (Sanskrit pathati), and savaha- "curse" (Sanskrit shapatha-).
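The Maharastri treatment of stops between vowels can be sketched in Python as a rough illustration. Several points here are assumptions, not part of the article: the capital letters T and D are introduced to mark retroflex stops, because the plain spelling used here cannot tell them apart from dentals; the names tokenize and maharastri are invented; sh is rewritten as s in line with the sibilant merger reported for most Middle Indo-Aryan dialects; and the aspirated affricates ch and jh are simply left unchanged.

```python
# Toy model of the Maharastri changes to stops between vowels.
# Assumptions: capital T/D mark retroflex stops; "sh" is rewritten as s;
# the aspirated affricates ch and jh are left untouched.
VOWELS = set("aeiou")
DIGRAPHS = {"kh", "gh", "ch", "jh", "Th", "Dh", "th", "dh", "ph", "bh", "sh"}

INTERVOCALIC = {
    # unaspirated stops other than retroflexes and labials are deleted
    "k": "", "g": "", "c": "", "j": "", "t": "", "d": "",
    # aspirated stops are replaced by h
    "kh": "h", "gh": "h", "th": "h", "dh": "h", "ph": "h", "bh": "h",
    # retroflexes become voiced
    "T": "D", "Th": "Dh",
    # labials are replaced by v
    "p": "v", "b": "v",
}


def tokenize(word):
    """Split a word into sounds, reading two-letter aspirates as one unit."""
    tokens, i = [], 0
    while i < len(word):
        step = 2 if word[i:i + 2] in DIGRAPHS else 1
        tokens.append(word[i:i + step])
        i += step
    return tokens


def maharastri(word):
    """Apply the rules to every stop standing between two vowels."""
    tokens = tokenize(word)
    out = []
    for i, tok in enumerate(tokens):
        between_vowels = (
            0 < i < len(tokens) - 1
            and tokens[i - 1] in VOWELS
            and tokens[i + 1] in VOWELS
        )
        if tok == "sh":
            out.append("s")
        elif between_vowels and tok in INTERVOCALIC:
            out.append(INTERVOCALIC[tok])
        else:
            out.append(tok)
    return "".join(out)


for sanskrit in ("loka", "locana", "shakha", "paThati", "shapatha"):
    print(sanskrit, "->", maharastri(sanskrit))
```

The rules are checked against the original sounds rather than against the partly rewritten word, so that deleting one stop does not bring two further sounds together and trigger a second change. The output paDhai corresponds to the padhai of the text, with the capital letter standing in for the retroflex.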

Essentially on the same level are the dialects of Jaina texts, but in these a y glide prescribed by grammarians occurs when a consonant is elided: vayana- "face" (Sanskrit vadana-); sayala- "whole" (Sanskrit sakala-). In Shauraseni, on the other hand, voiceless stops (e.g., p, t, k) between vowels are voiced (e.g., become b, d, g, respectively); e.g., ido "hence" (Sanskrit itah); tadha "thus" (Sanskrit tatha). Though Pali and Ashokan are at an earlier level of development with respect to these changes, they share with the rest of the Middle Indo-Aryan dialects the replacement of voiced aspirated sounds between vowels by h: lahu- "light, unimportant" from laghu-; dahati "puts" (Sanskrit dadhati).
Similarly, they share the change of dy- to j: joti- "light, brilliance" (Pali jotati "shines," Sanskrit dyotate). Pali and Ashokan, however, retain a y sound, changed to j in most other Prakrits; e.g., the pronoun ya- (feminine ya-), as in Sanskrit, opposed to ja-. The deletion of stop consonants noted above resulted in vowel sequences within words that were unknown to Old Indo-Aryan. Similarly, the extent of sandhi modification was restricted in Middle Indo-Aryan.
The Middle Indo-Aryan vowels i and u do not change to y and v before dissimilar vowels in compounds; e.g., Maharastri rattiandhaa- "dark of night" (Sanskrit ratry-andhaka-). In addition, the first of two contiguous vowels in different words is subject to deletion; e.g., Pali manas'icchasi (from manasa icchasi) "you wish in your mind." In its grammatical system, Middle Indo-Aryan also reduced complexities.
The dual number no longer exists as a separate category; for Sanskrit dvabhyam "by two," Middle Indo-Aryan has dohi(m), with the ending -hi(m) equivalent to the instrumental plural -bhis of Old Indo-Aryan. Among other changes is the replacement of the dative case by the genitive except in particular usages; e.g., the use of forms corresponding to the Old Indo-Aryan dative to denote a purpose. In Middle Indo-Aryan, nominal and pronominal forms are no longer strictly segregated; e.g., Ashokan vijitamhi "in the kingdom" (also vijite) has a pronominal ending equivalent to Sanskrit -smin. In the verb system, the contrast between active (-ti) and mediopassive (-te) endings was obliterated.
Further, the Old Indo-Aryan distinction between aorist, imperfect, and perfect forms was eliminated. With few exceptions, the sigmatic aorist (an aorist form with s) provides the only productive preterite of early Middle Indo-Aryan: Ashokan ni-kkhamisu "they set out" (Sanskrit nir-a-kramisur). In later Prakrits verbally inflected preterites were generally eliminated; in their place was used the past participle.
For example, in Shauraseni devi uva-visa, maharao vi a-ado "Sit down, my queen, the king also has arrived," the past participle a-ado (Sanskrit a-gatah) agrees with maha-rao "king" (Sanskrit maha-rajah) in number and gender. If the verb is transitive, the participle agrees with the direct object, and the agent is denoted by an instrumental form: in Jaina Maharastri, tena vi savvam sittham "He has told everything," tena "by him" denotes the agent, and sittham "told" (Sanskrit shistam) agrees with the neuter singular form savvam (Sanskrit sarvam). When no object is denoted, the verb is in the neuter singular. Old Indo-Aryan used both the participial construction and the finite verb; thus to Prakrit so vi tena samam gao "He also went with him" could correspond Sanskrit so'pi tena saha gatah or so'pi tena sahagamat (saha agamat). The Middle Indo-Aryan development eliminated the latter.

Alternations of the Sanskrit type as-mi, s-mas were eliminated in Middle Indo-Aryan; the predominant type of present tense was formed from an unchanging vowel stem (Pali e-ti, e-nti "go[es]"). Nominal forms of the verb system are of the same types as Old Indo-Aryan; e.g., the Pali future passive participle katabba- (Sanskrit kartavya-) "to be done," Shauraseni karania; Ardhamagadhi, Jaina Maharastri, and Maharastri karanijja- "to be done." The infinitive is commonly formed on the present tense stem, not on the root, as in Old Indo-Aryan. Thus Pali pappotum is formed on the present pappoti; Sanskrit praptum is formed on the root prap, present tense prapnoti.
Middle Indo-Aryan shows evidence of dialectal differentiation. The earliest documents that allow one to determine roughly the dialect distribution are Ashoka's inscriptions. These represent three major dialect areas: east, as in the inscriptions of Jaugada, Dhauli, and Kalsi; west, in Girnar; and northwest, in Mansehra and Shahbazgarhi. Characteristic of the east dialect area is final -e, corresponding to -o in the west and -as in Sanskrit; in the east dialect area l also regularly corresponds to r of the west and of Sanskrit.
Moreover, in the east dialect area there is a tendency to insert a vowel within consonant clusters, while in the west and northwest one of the consonants is assimilated to the other without an intervening vowel. For example, to Sanskrit rajñas "of the king" corresponds Girnar rañño, Shahbazgarhi raño, Jaugada lajine. Northwest stands apart in retaining three spirant sounds, sh, s, s, which merge to s elsewhere. Ashoka's eastern dialect, from the Magadha country, shows an s sound for Old Indo-Aryan sh, s, s, rather than the sh sound typical of literary Magadhi. Grammatical features also show dialectal variation; e.g., the Ashokan dative singular form is -aya in the western dialects (Girnar atthaya "for the purpose of") but -aye in the east (Kalsi, Dhauli atthaye).

As noted above, the most advanced development of Middle Indo-Aryan is seen in Apabhramsha. Sound changes that are typical of Apabhramsha include the replacement of the vowel sound a by u in final syllables; e.g., karahu "you do, make," corresponding to karaha (karadha) in other Prakrits. From stems in -aya- develop forms in -au and nasalized -au (nasalization is here indicated by a tilde): bhadarau "honored one, king" (Prakrit bhattarayo), hau "I" (Ashokan hakam). Nasalization also appears in environments in which earlier m occurred between vowels; e.g., gau "village" (from gama, Sanskrit grama). Numerous other sound changes are evident, among them the development of -s(s)- between vowels into h: taho "of him" (from Prakrit tassa, Sanskrit tasya); hohinti "will be" (compare Pali hossati). Apabhramsha contractions, such as -aya- changing to -a and -iya to -i, foreshadow New Indo-Aryan, in which the development was extended; e.g., Apabhramsha paniu "water" (Old Indo-Aryan paniyam), Gujarati pani, Hindi pani. In other points Apabhramsha also presaged New Indo-Aryan.
The interest of Apabhramsha lies in the fact that contracted forms presage the New Indo-Aryan opposition of masculine, neuter, and feminine nouns; thus, Apabhramsha -au, -au, -i, Gujarati -o, -u, -i (gayo, gayu, gai "went"), Hindi -a, -i (gaya, gai). The case system of Apabhramsha is also at a more advanced level of disintegration than that of earlier Middle Indo-Aryan, with the instrumental and locative plurals being identical in form (-ahi or -ehi for -a-stems) and instrumental singular forms also being used as locatives. In the Apabhramsha verb system, present tense stems in -a predominate. Apabhramsha verb endings differ from those of other Prakrits. Most interesting is the 3rd person plural type kara-hi "they do," which coexists with karanti. The form kara-hi, corresponding to the 3rd person singular kara-i "he does," is formed on the model of the pair kara-u (1st person singular, "I do") and kara-hu (1st person plural, "we do").
Here again Apabhramsha comes close to New Indo-Aryan. Moreover, Apabhramsha has some causative formations that do not occur elsewhere in Middle Indo-Aryan but are known from New Indo-Aryan--bham-ada-i "causes to turn," Gujarati bhamare che "causes to turn round," and pais-ara-i "causes to enter," Gujarati p sare che "causes to enter, to penetrate." Also noteworthy are two syntactic usages that closely parallel those present in New Indo-Aryan. The present participle is used as a conditional; e.g., jai hau mi tena sahu tau karantu to kim asamahie sahu marantu "Even if I had performed (karantu) ascetic acts with him, would I have died without mental concentration?" in which the participles karantu and marantu have the value of conditionals. In Sanskrit the conditionals a-kar-isya-m and a-mar-isya-m are used; but in speaking Gujarati a person would say jo hu . . . karat . . . to marat, and Hindi would have the forms karta . . . marta.
The Apabhramsha gerundive in -iv(v)a or -ev(v)a can be used as an infinitive; e.g., pi-evae lagga "began to drink." This is the Gujarati construction pi-va lagyo "began to drink," in which pi-va is an inflected form of pi-vu, that is, a verbal noun (infinitive) corresponding etymologically to the Apabhramsha gerundive.

Influences on Middle Indo-Aryan.

In the mid-2nd century BC, the grammarian Patañjali explained that to speak faultlessly the language now called Sanskrit (as described by Panini) one should imitate the correct speakers (called shista "learned, educated") of Aryavarta ("Country of the Aryans").

Earlier, the grammarian Katyayana (c. 4th-3rd century BC) had noted that Panini gave lists of verb roots in order that certain Middle Indo-Aryan forms not be accepted as having been correctly derived from a Sanskrit verb root. Moreover, Patañjali noted that one should study grammar in order to learn not to use incorrect words such as helayah instead of herayah (a phrase used in calling to people) or gavi instead of gauh "cow"; gavi is a Middle Indo-Aryan word. The observations of these grammarians are considered to lend support to the view that by the 6th or 5th century BC Sanskrit as a medium of learned conversation coexisted with Middle Indo-Aryan.
Further, the Pali canon records that the Buddha enjoined his followers to use the vernaculars in communicating his teachings, and the Jaina canon identifies Ardhamagadhi as the language to be employed for communicating the teachings of Mahavira. Similarly, Ashoka used Middle Indo-Aryan, not Sanskrit, in the inscriptions he ordered written throughout his kingdom; Sanskrit does not appear on inscriptions until the early centuries AD (e.g., Rudradaman's inscription at Junagarh, c. AD 150). The coexistence of Old Indo-Aryan and Middle Indo-Aryan is to be accepted even for the time when the earliest Old Indo-Aryan texts were put to writing. Middle Indo-Aryan shows similar evidence of the influence of linguistically more advanced vernaculars on literary compositions. The Prakrits of elegant literary compositions must have been artificial, different in many respects from the vernaculars current at the time, though reflecting languages that were current at some former time.
The Old Indo-Aryan and Middle Indo-Aryan stages, then, present a picture of concurrent vernaculars with dialects and literary languages influenced by the vernaculars; it is impossible to compartmentalize the different stages as beginning and ending at any definite date. The literary languages borrowed words and suffixes from earlier languages. There are Prakritisms (i.e., forms of earlier Prakrits) in Apabhramsha; e.g., the genitive singular ending -ssa instead of -ho and 2nd person plural verb forms terminating in -ha instead of -hu. All the literary Prakrits had recourse to Sanskrit as a source for borrowing words. Words that were incorporated into the Prakrits from Sanskrit with no change in form are called samskrta-sama "identical with Sanskrit" (or tat-sama "identical with that") and are contrasted with words termed samskrta-bhava (tad-bhava) "whose origin is in Sanskrit"--that is, words that the grammarians can derive from Sanskrit by using certain rules. Another class of words, called deshya (or deshi) "belonging to the area, country," includes items that the grammarians cannot derive easily from Sanskrit and that are supposed to have been in use in particular areas from early times.
Many or most of the deshya words are indeed derivable from Sanskrit, but some are of Dravidian origin; e.g., akka "sister" (Telugu akka), atta "father's sister" (Telugu atta), appa "father" (Telugu appa), ura "village" (Telugu uru), pulli "tiger" (Telugu puli). Borrowing from Dravidian occurred also at earlier times; the Dravidians originally occupied territory much farther north than they did in Middle Indo-Aryan times. The Rgveda has such words as kunda "pitcher, pot," which is doubtless of Dravidian origin (Tamil kutam "pot").
Such borrowings become more numerous in later Sanskrit. It is not always certain that borrowing proceeded from Dravidian to Indo-Aryan, however, because Dravidian languages freely borrowed from Indo-Aryan. Thus, some scholars claim that Sanskrit katu "sharp, pungent" is from Dravidian, but others claim that it is a Middle Indo-Aryan form deriving from an earlier *krt-u "cutting" (root krt). (An asterisk [*] preceding a form indicates that it is not attested but has been reconstructed as a hypothetical form.) Whatever the judgment on any individual word, it is clear that Indo-Aryan did borrow from Dravidian, and this phenomenon is important in considering a group of sounds that sets Indo-Aryan apart from the rest of Indo-European--the retroflexes.
Without doubt the influence of Dravidian is to be considered as contributing to the extension of these sounds beyond their limited occurrence in inherited Indo-European items such as nida "nest" (from *ni-sd-o), is-ta "desired" (from *is-to), and stir-na "spread out" (from *str-no). The Munda languages (or, more generally, the Austro-Asiatic languages) are also a source of some borrowing into Indo-Aryan; e.g., Sanskrit jambala "mud" (Santali jobo). In the 8th century AD, the philosopher Kumarila mentioned not only Dravidian but also Persian and Greek as sources of foreign words.
Such borrowing can be traced back to early times. In the 6th century BC Darius counted Gandhara as a province of his kingdom, and Alexander the Great penetrated into northern India in the 4th century BC.

From Iranian come words such as that meaning "inscription, writing, script"; in the northwest inscriptions of Ashoka the word is dipi (Old Persian dipi) and Sanskrit has lipi, the form in other Ashokan versions and in Pali. Also from Persian is Sanskrit ksatrapa "satrap"--Old Persian xsassa-pavan-. Of Greek origin are such mathematical and astronomical terms as Sanskrit kendra "centre" (Greek kéntron), jamitra "diameter" (diámetron), and hora "hour" (hora). Yavana "foreigner," originally the Greek word for Ionian, is known from as early as the time of Panini. Later, Arabic words such as tashli "trigon" came into Sanskrit.

The modern Indo-Aryan stage.

The division of the Indian subcontinent into linguistic states and even into countries (Pakistan, Bangladesh, and India) is a recent phenomenon (see Table 4). Even after independence from Britain was achieved and partition had taken place, Bombay state existed until it was split into Gujarat and Maharashtra states in 1960. The division of Punjab into Punjab and Haryana states in 1966 occurred as a result of Punjabi agitation for a separate linguistic state. Before independence, under British rule (entrenched from the 18th century), there were princely states within dialect areas; under Mughal rule (16th-18th centuries), Persian was the language which was used by the court and by courts of justice and this practice continued in the latter function for a time under the British.

Though Hindi-Urdu may have been a lingua franca, the great dialectal diversity of earlier times continued. Some of the modern Indo-Aryan languages have literary traditions reaching back centuries, with enough textual continuity to distinguish Old, Middle, and Modern Bengali, Gujarati, and so on. Bengali can trace its literature back to Old Bengali carya-padas, late Buddhist verses thought to date from the 10th century; Gujarati literature dates from the 12th century (Shalibhadra's Bharateshvara-bahubali-rasa) and from a period when the areas of western Rajasthan and Gujarat are believed to have had a literary language in common, called Old Western Rajasthani. Jñaneshvara's commentary on the Bhagavadgita in Old Marathi dates from the 13th century and early Maithili from the 14th century (Jyotishvara's Varna-ratnakara), while Assamese literary work dates from the 14th and 15th centuries (Madhava Kandali's translation of the Ramayana, Shankaradeva's Vaisnavite works).
Also of the 14th century are the Kashmiri poems of Lalla (Lallavakyani), and Nepali works have also been assigned to this epoch. The work of Jagannath Das in Old Oriya dates from the 15th century. Amir Khosrow used the term hindvi in the 13th century, and he composed couplets that contained Hindi. In early times, however, other dialects were predominant in the midlands (Madhyadesha) as literary media, especially Braj Bhasa (e.g., Surdas' Sursagar, 16th century) and Awadhi (Ramcaritmanas of Tulsidas, 16th century). In the south, in Golconda (Andhra, near Hyderabad), Urdu poetry was seriously cultivated in the 17th century, and Urdu poets later came north to Delhi and Lucknow. Punjabi was used in Sikh works as early as the 16th century, and Sindhi was used in Sufi (Islamic) poetry of the 17th-19th centuries. In addition, there is evidence in late Middle Indo-Aryan works for the use of early New Indo-Aryan; e.g., provincial words and verses are cited.
The creation of linguistic states has reinforced the use of certain standard dialects for communication within a state in official transactions, teaching, and on the radio. In addition, attempts are being made to evolve standardized technical vocabularies in these languages. Dialectal diversity has not ceased, however, resulting in much bilingualism; for example, a native speaker of Braj Bhasa uses Hindi for communicating in large cities such as Delhi. Moreover, the attempt to establish a single national language other than English continues. This search has its origin in national and Hindu movements of the 19th century down to the time of Mahatma Gandhi, who promoted the use of a simplified Hindi-Urdu, called Hindustani.

The constitution of India, adopted in 1949 and in force from 1950, stressed the use of Hindi, providing for it to be the official national language after a period of 15 years during which English would continue in use. When the time came, however, Hindi could not be declared the sole national language; English remains a co-official language. Though Hindi can claim to be the lingua franca of a large population in North India, other languages such as Bengali have long and great literary traditions--including the work of Nobel Prize winner Rabindranath Tagore--and equal status as intellectual languages, so that resistance to the imposition of Hindi exists.
This resistance is even stronger in Dravidian-speaking southern India. The use of English as an official language entails problems, however, because with the use of state languages for education, the level of English competence is declining. Another danger faced is the agitation for more separate linguistic states, threatening India with linguistic fragmentation hearkening back to earlier days.

Characteristics of the modern Indo-Aryan languages.

The trends noted in Middle Indo-Aryan continue in New Indo-Aryan. The Middle Indo-Aryan vowel sequences ai and au were changed to single vowels during the development of New Indo-Aryan, final vowels were shortened and deleted, and d and dh sounds between vowels were replaced by the sounds r and rh.
The noun cases were further reduced, and the introduction of nominal (noun) forms into the verb system became more pronounced. Literary languages tend to become somewhat removed from the usual standard colloquial. Literary, or High, Hindi, for example, tends to replace some of the Perso-Arabic vocabulary with Sanskritic items, whereas literary Urdu makes great use of Perso-Arabic words. The gap is formalized in Bengali, in which a distinction is made between the highly Sanskritic language Sadhu-Bhasa and the colloquial standard called Calit-Bhasa.

Phonology.

[Note: The forms of the words given below reflect actual pronunciation, rather than being transliterated versions of the standard orthographies. For New Indo-Aryan the symbols ə, pronounced as the a in English "sofa," and a are used for the sounds earlier transcribed as a and a, respectively; e.g., Gujarati karu "I do" and maro "beat" are now written kəru and maro.
This practice permits certain contrasts to be made among sounds that are significant in the description of dialectal features. In Kashmiri words, a is short, opposed to a.] Vowels in sequence contracted in early New Indo-Aryan; e.g., Old Indo-Aryan ashiti became Middle Indo-Aryan asii, Hindi and Punjabi ssi, and Bengali asi "80." Further, ai and au sounds changed to e and o, and au to u, while iu developed into i. The diphthongs ai and au were retained well into the New Indo-Aryan period and are still pronounced in some areas; e.g., Braj Bhasa kr u "I do," kr i "he does." Middle Indo-Aryan -d- and -dh developed into the flaps r and rh; e.g., Prakrit sadia "woman's garment," Kashmiri, Lahnda, Hindi, Gujarati, Bhojpuri, Bengali, Oriya sari "sari"; and Prakrit padh- "recite, read," Sindhi p rh-nu, Lahnda prh- n, Hindi, Punjabi prh-na, Gujarati prh-vu, Marathi prh-n "study." Stress is not generally contrastive in New Indo-Aryan as it is, for example, in English (e.g., noun "éxport," verb "expórt"), though different areas have different rules for placing major emphasis on a given syllable.

For example, in Hindi, in which vowel length is pertinent, gilá "swallowed" has major stress on the last syllable, gila "wet," on the first. In Gujarati, on the other hand, vowel length is not pertinent; the stress position depends on which vowels occur in contiguous syllables and on the structure of the syllables, whether open or closed; e.g., júno "old," but dukán "store." In Bengali each syllable of a word receives about equal stress. The sounds that most clearly distinguish Indo-Aryan from the rest of Indo-European are the voiced aspirate stops (gh and the like, pronounced with an accompanying audible puff of breath) and the retroflexes (t and so on, pronounced by curling the tongue upward toward the hard palate).
In the outlying New Indo-Aryan areas, however, the sound system is reduced. Sinhalese has no aspirated stops, Assamese has no retroflexes, and Kashmiri has no voiced aspirates. The geographic position of these languages doubtless contributed to these losses: Sinhalese coexists with Tamil, Assamese is surrounded by Tibeto-Burman languages, and Kashmiri is on the border of the Iranian area. New Indo-Aryan shows evidence of early dialect distribution; this is discernible by considering sound changes proper to each group. The eastern group (Assamese, Bengali, Oriya) has three important changes. Long and short i and u merged; e.g., Assamese nila, Oriya nil{half-open back vowel} ({half-open back vowel} is similar to the o of "coffee" in some English dialects), Bengali nil "blue-black" but Sanskrit nila; Assamese dhuli, Bengali dhulo, Oriya dhuli "dust" but Hindi dhul and Sanskrit dhuli.
The vowel sound a of Middle Indo-Aryan was replaced by {half-open back vowel} in Bengali and Oriya and {back open vowel} (similar to the o of "hot" in southern British English) in Assamese in initial position and open syllables; e.g., Bengali m{half-open back vowel}ron, Oriya m{half-open back vowel}r{half-open back vowel}n, Assamese m{back open vowel}r{back open vowel}n "death"; Sindhi, m rno "mortal, death," Sinhalese mr n, Gujarati, Marathi mr n (compare Sanskrit marana-).
Moreover, in this group a vowel is affected by the quality of the vowel in a following syllable. For example, in Bengali ami kori "I do," the verb root has o followed by i in the next syllable, but tumi k{half-open back vowel}ro "you do" has an {half-open back vowel} sound; similarly, ami kini "I buy" but tumi keno. As a result of vowel assimilation also, Assamese has an {half-open back vowel} sound instead of {back open vowel} representing Middle Indo-Aryan a: Assamese x{half-open back vowel}hur, Bengali sosur "husband's father" (compare Hindi s sur, Prakrit sasura-, Sanskrit shvashura-). Assamese and Bengali are set off from Oriya. In the former two, Middle Indo-Aryan d and dh merge medially to d (then r) with a subsequent development to r in Assamese; e.g., Oriya darhi, Bengali dari, Assamese dari "beard"; Hindi, Gujarati darhi, Prakrit dadhia. Assamese is also distinguished from Bengali by several developments, among them the merger of Assamese retroflex sounds with dental sounds; e.g., Assamese ut "camel" but Bengali ut, Oriya ot{half-open back vowel}, Sindhi uthu, Lahnda, Pahari utth, and so on. Assamese also has s for earlier c and ch sounds and a z sound for j and jh; e.g., Assamese kas "glass," Bengali kac; Assamese azi "today," Oriya aji, Bengali, Hindi aj.
In addition, Assamese replaced an s sound initially by x and between vowels by h--x{half-open back vowel}hur. Particular sound changes also characterize languages of the northwest. In this group, an older voiceless stop (e.g., t) became voiced (e.g., became d) after a nasal sound; in other areas, the voiceless stop is retained: Kashmiri dand, Punjabi dnd, Sindhi dndu "tooth" (the d in Sindhi is an imploded stop; see below) but Assamese, Bengali, Hindi, Gujarati, Marathi dãt, Sinhalese dt (Sanskrit danta-). Moreover, in the northwest group a voiced stop (e.g., d) preceded by a nasal was assimilated to the latter, resulting in two nasals, which were subsequently reduced to one in some areas; in the rest of New Indo-Aryan, the vowel preceding the nasal was nasalized.
Thus, Kashmiri don "churning stick," Sindhi d nu "tribute," Punjabi dnn "fine," Lahnda dnn "force," Kumauni dan "roof" contrast with Assamese dãr "pole," Bengali dãr "oar," Hindi dãd "oppression, fine," and others; all forms derive from Old Indo-Aryan danda- "stick, staff, club, royal power, fine, punishment." In the sequence of a short vowel followed by two consonants, Pahari differs from the rest of the northwest group and agrees with the rest of New Indo-Aryan.
In the northwest this sequence either remained unchanged or the cluster was simplified without lengthening of the vowel; other languages generally simplified the cluster and lengthened the vowel: Punjabi bh tt, Sindhi bhtu, Lahnda bht, Kashmiri bat{central close vowel} "cooked rice, food" but Nepali, Kumauni, Hindi, Assamese, Bengali, Gujarati, Marathi bhat. Dardic occupies a special position. The sibilant sounds did not all merge here. For example, Kashmiri, a Dardic tongue, has surah "16" with s rather than s, as in most other Indo-Aryan languages, and sat "7" with s. Further, voiced aspirated stops merged with unaspirated stops in Dardic; e.g., Kashmiri gur "horse" but Hindi ghora; Kashmiri d{half-open back vowel}d "milk" but Hindi dudh. One major feature distinguishing Sindhi from the rest of the northwest group is the development of a series of imploded stops (also called suction stops and recursive stops), for b, d, j, and g. Implosive stops also occur in the Sindhi vicinity; for example, Kacchi has imploded b.
Another feature that distinguishes Sindhi from other northwest languages, including Kacchi, is the retention of the Middle Indo-Aryan final short vowels; e.g., Sindhi khi "eye" but Hindi ãkh (Middle Indo-Aryan akkhi-). Punjabi is distinguished from other members of the northwest group by its tonal system, having low (`), mid ({macron}), and high (´) tones. Initial voiced aspirated stops of earlier Indo-Aryan appear in Punjabi as voiceless stops with low tone on the following vowel; e.g., Punjabi kòra but Hindi ghora; Punjabi tài "2 1/2" but Hindi dhai. Non-initially, a voiced aspirate became unaspirated and the preceding vowel received high tone; thus, Punjabi dd "milk" but Hindi dudh, and Punjabi láb "profit" but Hindi labh. Gujarati, Marathi, and Konkani in the west and southwest differ from the languages of the midlands in that, as in the east, there is no contrast between long and short i and u vowels.
The i of Gujarati and Marathi vis "20" is pronounced like the ee of English "teeth," the i of Gujarati iccha and Marathi iccha "wish" like the i of "pitch," but such a difference is not contrastive, as it is in Hindi (gīlā "wet" : gilā "swallowed"). Gujarati has certain features that, in turn, set it apart from the other languages of this group. In addition to e and o sounds, it has the open vowels ɛ, ɔ; e.g., cɔthu "fourth" (Middle Indo-Aryan cauttha), bɛs-vu "to sit" (Middle Indo-Aryan baisai "sits"). Moreover, Gujarati has murmured vowels, generally developed from vowels followed by h; e.g., k h che "says" (h represents murmuring of the vowel), Old Gujarati kahai chai. Marathi and Konkani have two series of affricate sounds; e.g., č (pronounced as the ch in English "chat"; the equivalent of c in some other languages) and c (pronounced as the ts of "rats").
There was clearly mutual influence of Indo-Aryan languages at an early time, together with movement of groups of speakers (compare the position of Pahari). Thus, while Punjabi səcc "true" is the expected form comparable to Middle Indo-Aryan sacca- (Old Indo-Aryan satya-), Hindi səc "true" does not represent the expected outcome. The item səc must come from the Punjabi area.

Modern Iranian.

The discontinuity already observed between Old and Middle Iranian is even more striking between Middle and Modern Iranian.
There are no modern counterparts to Khwarezmian, Bactrian, and Saka, and there is no direct continuity in the case of any of the other Middle Iranian languages. Even Modern Persian does not represent a straightforward continuation of Middle Persian but is rather a koine (a dialect or language of a small area that becomes a common or standard language of a larger area), based mainly on Middle Persian and Parthian but including elements from other languages and dialects. Although Sogdian is known in several forms, possibly representing different dialects, none of these can be considered the direct ancestor of modern Yaghnabi, spoken at present in the valley of the Yaghnob River, a tributary of the Zeravshan. Yaghnabi, nevertheless, certainly belongs linguistically to the Sogdian family. Similarly, the languages of the Scytho-Sarmatian inscriptions may represent dialects of a language family of which Modern Ossetic is a continuation, but they do not simply represent Ossetic at an earlier date.
Only four of the many modern Iranian languages are the official languages of the state in which they are spoken. The chief of these is Persian (known in Persian as Farsi), the national language of Iran, which is spoken by about 27,000,000 people as a native language.
A dialect of Persian known as Dari is recognized, moreover, as a second language in Afghanistan. The national language of Afghanistan is the East Iranian language known as Pashto, of which there are some 9,000,000 speakers, many living in Pakistan. Tajik is spoken by at least 7,000,000 people widely spread throughout Tajikistan and the rest of Central Asia and is readily intelligible to speakers of Persian, to which it is very closely related, although it is in some respects more archaic. In addition to being the national language of Tajikistan, Tajik is important as the lingua franca of the Pamirs mountain range, a region where a remarkable variety of Iranian languages and dialects are spoken. Some 700,000 people speak Ossetic. Most of the Ossetes live in North Ossetia in Russia and South Ossetia in Georgia. Although spoken in the heart of the Caucasus Mountains, Ossetic is an East Iranian language not mutually intelligible with any other Iranian language.

Two other Iranian languages, Kurdish and Balochi (Baluchi), are spoken over a vast area, although they have not been officially accepted as the national language of an established state. Kurdish is spoken by more than 20,000,000 people living in Iran, Iraq, Turkey, Syria, and Transcaucasia (Kurdistan). More than 5,000,000 people speak Balochi as their chief language; they are spread widely over parts of eastern Iran, Pakistan, Afghanistan, and Central Asia. In Iran, Balochi speakers live mainly in Baluchistan, a region in the southeast that now forms part of a province with Sistan.
In Pakistan, Balochi speakers live mainly in the southwestern province of Balochistan; in Central Asia, they are found mainly around Mary (Merv) in southern Turkmenistan; and in Afghanistan, they are widely scattered, mainly over the southwestern portion of the country. There is a sizable Balochi colony in Oman, and many Balochi merchants have settled in the sheikhdoms of southern Arabia and along the east coast of Africa as far south as Kenya.
Linguistically, Balochi and Kurdish are both West Iranian languages. Balochi is thus much more closely related to Kurdish than it is to its close neighbour Pashto. According to the most likely theory, the present eastern location of Balochi speakers is the result of migrations from the region of the Caspian Sea during the Middle Ages.

Dialects.

In the cities of Yazd and Kerman the Parsis speak the old Gabri dialect, whereas the Muslims speak Persian. Among other central dialects are Natanzi, Soi, Khunsari, Gazi (near Esfahan), Sivandi (northeast of Shiraz), Vafsi, and Ashtiyani, to name but a few. Semnani, spoken east of Tehran, forms a transitional stage between the central dialects and the Caspian dialects. The latter are divided into two groups, Gilaki and Mazandarani (Tabari).
Also closely related is Talishi, spoken on the west coast of the Caspian Sea on both sides of the border with Azerbaijan. To this northwestern group belong the so-called southern Tati dialects spoken south and southwest of Qazvin, as well as the scarcely known dialects of Harzan and Galinqaya spoken northwest of Tabriz. The name Tati is usually applied to the dialects spoken in Russian Dagestan and northeastern Azerbaijan.
They differ little from Modern Persian. Of the several dialects of Fars province, only Lari, southeast of Shiraz, is notably distinctive. Kumzari in Oman and the Lur dialects of the southwest also differ little from Persian. There are many dialects of Kurdish, the widely spoken West Iranian language that is thought to occupy a dialectal position intermediate between Balochi and Persian. Three main dialect groups can be distinguished--northern, central, and southern.
A systematic study has been made of the dialects of Iraq, which include 'Aqrah (Akre), 'Amadiyah, Dahuk, Shaykhan, and Zakhu in the northern group, and Irbil (Arbil), Bingird, Pishdar (Pizhdar), Sulaymaniyah (Suleimaniye), and Warmawah in the central group.
The Central Mukri dialect is spoken in the extreme west of Iran, south of Lake Urmia. Gorani is spoken in several dialects, mainly in the Zagros Mountains, and it is strongly influenced by the surrounding Kurdish dialects.
The Gorani dialect of Hawraman, Hawrami, is notable for its many archaic features.
Closely related to Gorani is Zaza (Dimli), which is spoken west of Iran.

Characteristics of the Iranian languages.

All Iranian languages show in their basic elements the characteristic features of an Indo-European language. Apart from the extensive borrowing of Arabic words in Modern Persian, the Iranian languages have scarcely been affected by unrelated languages, with the notable exception of Ossetic, which has been strongly influenced by the neighbouring Caucasian languages.
Some dialects of Tajik have been very receptive to Uzbek elements. In the case of languages in contact with Indian civilization, the most noticeable non-Iranian feature often taken over is the Indo-Aryan series of retroflex sounds. These are foreign to Indo-Aryan itself, being a result of the influence of the Dravidian languages. The elaborate phonological and morphological structure of the Indo-European parent language has been progressively simplified in the development of the Iranian languages.
The basic phonological structure of Common Old Iranian has on the whole been maintained, but the morphological system has continued to be simplified. There has been a constant move in almost all Iranian languages toward an analytic structure; i.e., the use of prepositions and word order rather than case endings to indicate grammatical relationships.

Grammar.

In Old Persian the Indo-European inflectional system appears considerably simplified. In particular, the genitive and the dative coalesced into one case and the instrumental and ablative into another.
Moreover, in the plural the nominative and accusative cases are not distinguished. This reduced system is still found in the Middle Iranian period in Old Khotanese and to a certain extent in Sogdian. Eastern Iranian is in this respect more conservative than western. By the Middle Iranian period, western Iranian had abandoned nominal (noun, adjective, pronoun) inflection altogether, as is the case with Middle and Modern Persian and with Parthian. In some languages, both western and eastern, two or, rarely, three cases survive. Ossetic is quite exceptional in maintaining an elaborate case system; it is partly a result of secondary, purely Ossetic developments.

The elaborate conjugational system of the Indo-European verb followed a similar path to disintegration. In particular, the whole past tense system was given up by the Middle Iranian period. Only a few relics remain of the Indo-European system, such as the partial survival of the augment (a prefixed vowel or lengthening of the initial vowel) in the Sogdian imperfect tense. But a new past tense system developed, based on the old past participle, often combined with auxiliary verbs. Many languages distinguish between transitive and intransitive verbs in the past tense system; and in some, such as Khotanese and Pashto, even gender and number are distinguished. The present tense system was far better preserved. The dual number was in retreat in Old Iranian and is not attested later.
The middle voice, a form that indicates that a person or thing both performs and is affected by the action represented, was generally abandoned by the Middle Iranian period, although middle voice inflection is well represented in Khotanese. With these qualifications, the endings of the present indicative (active) have been generally well preserved. A variety of imperative, subjunctive, and optative forms, partly based on inherited forms and partly the result of innovation, is found especially in the eastern languages, including Ossetic. Rigidity of word order is, on the whole, most characteristic of those languages, such as Persian, that have gone furthest in the reduction of the inherited morphological system.

Vocabulary.

The Islamic conquest of Iran during the 7th century entailed not only a change of religion but also a change of language. The sacred language of Islam was Arabic, and the proportion of Arabic words used in Persian rapidly increased until it reached something like the 40 to 50 percent of the present day. Before the introduction of the Arabic element, loanwords were mainly from other Iranian languages.
Most familiar is the extensive borrowing from Median found in Old Persian. In later periods, Modern Persian borrowed words extensively from Turkish and from European languages. Persian is itself the donor language in the case of the other Iranian languages, all of which have drawn upon its vocabulary. Buddhism was similarly responsible for the large proportion of Indo-Aryan words, both Sanskrit and Prakrit, found in Sogdian and especially in Khotanese.
A considerable Indian element occurs in the vocabulary of those modern Iranian languages that have been or are in contact with modern Indo-Aryan languages in the northwest, such as Lahnda and Sindhi. There the Dardic languages have also been influential. Balochi has also borrowed from Brahui, a Dravidian language spoken in Balochistan in Pakistan. Ossetic occupies an exceptional position. Most of its Persian and Arabic borrowings have come to it through Turkish, but more striking is the large number of words borrowed from the Caucasian languages, especially Georgian. In modern times, Ossetic continues to be influenced by Russian.

Writing systems.

Iranian languages have been written in many different scripts during their long history, although various forms of Aramaic script have been predominant. Modern Persian is written in Arabic script, which is of Aramaic origin. For writing the Persian sounds p, c, z, and g, four letters have been added by means of diacritical marks. By the addition of further letters, this Perso-Arabic script has been adapted to write not only the other main modern Iranian languages, Pashto, Kurdish, and Balochi, but also those minor ones that are occasionally recorded. An advantage of this consonantal script is that, because it does not define vowel qualities, it can accommodate local dialect variations to a considerable extent.
Two modern Iranian languages spoken on former Soviet territory are currently written in a modified version of the Russian alphabet: Tajik and Ossetic. Soviet scholars have, however, tended to use modified Latin alphabets to record the minor languages that have no literary tradition, such as some of the Pamir languages. Ossetic has also been written in the Georgian script. Old Persian was written with a cuneiform syllabary, the origin of which is still hotly disputed. Middle Persian, Parthian, Sogdian, and Old Khwarezmian were recorded in various forms of Aramaic script. Two forms of this script as they developed for writing Sogdian were adopted by the Uighurs.

In its cursive form this script spread even further, to the Mongols and Manchus. Three other scripts are important for the remaining Middle Iranian languages: Greek script for Bactrian, Arabic script for Late Khwarezmian, and varieties of Central Asian Brahmi script of Indian origin for Khotanese and Tumshuq. The Aramaic script was not systematically adapted to the writing of Middle Iranian; and despite the introduction of a variety of diacritical marks to differentiate letters, considerable ambiguity remained. Moreover, several letters tended to coalesce in form. In this respect, the Pahlavi script, used for writing the Middle Persian of the Zoroastrian books, developed furthest. In it, the original 22 letters of the Aramaic alphabet have been reduced to 14, which are further confused by the use of numerous ligatures (linked letters). It was the realization that this script was inadequate to record precisely the traditional pronunciation of the sacred text of the Avesta that led the Zoroastrian priests to devise the elaborate Avestan script, which, with its 48 distinct letters formed by differentiation out of the 14 used for Pahlavi, was well suited to the task.

Iranian languages

Subgroup of the Indo-Iranian branch of the Indo-European language family. The subgroup's languages are spoken in Iran, Kurdistan, Afghanistan, Tajikistan, Pakistan, and scattered areas of the Caucasus Mountains. Old Iranian, which is closely related to Sanskrit, the ancient Indo-Aryan language, is known from the Avesta (the sacred book of Zoroastrianism) and from Old Persian cuneiform inscriptions of the Achaemenid kings. The Middle Iranian stage (3rd century BC to 8th-10th century AD) was characterized by a significant simplification of the verbal system and, in some areas, by reductions in noun inflection as well. Among the most important representatives of Middle Iranian are Parthian, Pahlavi, and Sogdian. Major modern Iranian languages are Persian; Kurdish; Pashto, an official language of Afghanistan; and Tajik, spoken in Tajikistan. All Iranian languages currently spoken show a simplification of the earlier sound systems and a preference for the use of auxiliary verbs in place of the complex verb conjugations of the earlier Iranian languages.