Parsing for Mando'a
Posted: 29 Dec 2017 11:19
So, as I've already mentioned in my introductory post, I'm working on a way to get interlinear parsing done automatically for Mando'a. (I'm using a freeware linguistics program, but since I'm not a linguist and it's all in English, it's a bit of guesswork at times, so feel free to point out anything that is amiss.) I do hope this is in the right place; I have to admit that the way some things are sorted and structured here is a bit confusing for me.
At the moment, I am mostly concerned with structuring the different parts of a word and finding the corresponding rules for these parts - and I still have no idea what I'm going to do with the beten, since it has different roles that aren't always that clearly defined. My final goal is to have a toolbox for easy construction of new words using existing ones in compounds or creating derived words after settling on a new root.
What I've come up with for now:
Affixes:
ke'- = prefix, inflectional, attaches to verbs, denotes imperative (imp)
forms: ke'- > used when the stem initial is a consonant sound (#_C)
forms: k'- > used when the stem initial is a vowel sound (#_V)
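The choice between the two forms of ke'- can be sketched in a few lines of Python. This is only a rough sketch under my own assumptions: "vowel sound" is approximated by the written letters a, e, i, o, u, the function names are made up, and the stems are placeholder shapes for illustration, not dictionary-checked verb stems.

```python
# Assumption: a "vowel sound" is approximated by these written letters.
VOWELS = set("aeiou")


def starts_with_vowel(stem: str) -> bool:
    """Return True if the stem begins with a written vowel."""
    return bool(stem) and stem[0].lower() in VOWELS


def add_imperative(stem: str) -> str:
    """Attach the imperative prefix: k'- before a vowel, ke'- otherwise."""
    prefix = "k'" if starts_with_vowel(stem) else "ke'"
    return prefix + stem


# Placeholder stems, used only to show both branches of the rule.
print(add_imperative("oya"))   # k'oya
print(add_imperative("juri"))  # ke'juri
```

The same helper would work for any prefix that has a full form and a shortened form before vowels, by passing the two forms in as parameters.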
ven'- = prefix, inflectional, attaches to verbs, denotes future tense (fut)
ru'- = prefix, inflectional, attaches to verbs, denotes past tense (pst)
-r = suffix, inflectional, attaches to verbs, denotes the infinitive (inf)
-ir = suffix, derivational, attaches to nouns, changes to verb [in inf]
forms: -ir > used when the stem ends in a final consonant (C_#)
forms: -r > used when the stem ends in a final vowel (V_#)
!Issue: the derivational suffix forms the infinitive, which means it is already inflected; I've thought of treating -i as the derivational suffix instead, however that would mean that noun = verb (ind) in all cases where a noun ends in a vowel. It isn't uncommon for a language to have noun = verb (German has some words like that, English has a lot of them ...). I have read that some of those pairs swap stress, so one might try to put that as the rule for now, even though I'd need to look at the actual Mando'a words to make sure that it is at least mostly followed.
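To make the issue with -ir more concrete, here is a small Python sketch of the alternative analysis, where -i is the derivational part and -r the infinitive. Everything here is an assumption for illustration: the gloss labels "N" and "vblz" are my own shorthand, vowels are approximated by the written letters a, e, i, o, u, and the input nouns are placeholder shapes, not checked against the dictionary.

```python
# Assumption: written vowels stand in for vowel sounds.
VOWELS = set("aeiou")


def derive_infinitive(noun: str) -> list:
    """Segment a noun-derived infinitive into (morph, gloss) pairs.

    Consonant-final noun: noun + i (verbalizer) + r (infinitive).
    Vowel-final noun: noun + r only, so the verb stem equals the noun.
    """
    if noun[-1].lower() in VOWELS:
        return [(noun, "N"), ("r", "inf")]
    return [(noun, "N"), ("i", "vblz"), ("r", "inf")]


def surface(morphs: list) -> str:
    """Join the morphs back into the written word."""
    return "".join(morph for morph, _ in morphs)


# Placeholder nouns, one per branch.
for noun in ("jurkad", "tala"):
    parts = derive_infinitive(noun)
    print(surface(parts), parts)
```

For the vowel-final case the segmentation contains no derivational morph at all, which is exactly the noun = verb situation described in the issue.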
-yc = suffix, derivational, attaches to nouns, changes to adjectives
!Issue: as far as I understand adjectives, -yc and -la are the default endings for the absolute form, and -shya and -ne would be comparative and superlative respectively; so, while an adjective is built noun+yc, the comparative and superlative would be noun+shya and noun+ne. So, either adjectives and nouns have exactly the same stem (unlike verbs, which always have a final vowel sound), or adjectives as such don't exist and -yc, -shya and -ne are actually inflections of nouns ... or am I overlooking something in the derivation of adjectives?
be'- = prefix, inflectional, attaches to nouns, denotes a possessive (poss)
forms: be'- > used when the stem initial is a consonant sound (#_C)
forms: b'- > used when the stem initial is a vowel sound (#_V)
as suffix (archaic, rare):
-b > used when the stem ends in a final vowel (V_#)
(truthfully, I'm conflicted about even including this one, partly because it is supposed to be a rare form, and partly because the implication it leaves when looking at dalab is quite uncomfortable to me; dalab could either mean "sheath" or "woman's", and while the context should make it obvious, that kind of double meaning is quite derogatory in my opinion; grammatically, that's neither here nor there, though)
-e = suffix, inflectional, attaches to nouns, denotes a plural (pl)
forms: -e > used when the stem ends in a final consonant other than t or b
forms: -'e > used when the stem ends in a final vowel that is not i
forms: -se > used when the stem ends in a final i
forms: -ise (variant: -'se) > used when the stem ends in a final t or b
!Issue: I have to look more closely at how the different forms actually behave. I already know that ad'ike is the plural of ad'ika, which either makes it irregular or means I need to figure out how to incorporate a>e in the rule set. In that regard, I am still uncertain whether -e and -se are part of the same pluralization (like English -s and the form -es for certain words) or whether -se is a different kind of plural (I think I've read an idea on tumblr that -ii, -iise is actually a way to denote 'sth. disliked' or something similar). Both seem equally possible to me.
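As a rough sketch, the plural forms listed above could be written as an ordered set of checks in Python, with ad'ika > ad'ike kept in a separate exception table until the a>e question is settled. The assumptions: vowels are approximated by the written letters a, e, i, o, u, the names are made up, and apart from ad'ika the outputs are simply what the draft rules produce, not verified plurals.

```python
# Assumption: written vowels stand in for vowel sounds.
VOWELS = set("aeiou")

# Known form that the draft rules do not cover yet (a > e).
IRREGULAR_PLURALS = {"ad'ika": "ad'ike"}


def pluralize(noun: str) -> str:
    """Apply the draft plural rules in order of specificity."""
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    last = noun[-1].lower()
    if last in ("t", "b"):
        return noun + "ise"   # final t or b
    if last == "i":
        return noun + "se"    # final i
    if last in VOWELS:
        return noun + "'e"    # any other final vowel
    return noun + "e"         # any other final consonant


# Inputs are only meant to exercise each branch of the draft rules.
for word in ("ad'ika", "dalab", "verd", "dala", "kai"):
    print(word, ">", pluralize(word))
```

Keeping the exceptions in a dictionary means the rule set can stay simple while irregular forms are collected, and entries can be removed again once a rule explains them.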
nu'- = prefix, inflectional, attaches to verbs/nouns/adjectives, denotes a negative (neg)
forms: ne'- > (I actually don't know if there is a set rule for using ne' or nu', so I'd probably leave it as a wildcard to choose between)
forms: n'- > used when the stem initial is a vowel sound (#_V)
!Issue: I've already found exceptions (nu'amyc -> which means nu can also be used before vowels)
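Since the choice between nu'-, ne'- and n'- isn't settled, one option is to let the rule return every candidate form instead of a single answer. A minimal Python sketch, assuming vowels can be approximated by the written letters a, e, i, o, u; the function name is made up, and the consonant-initial input is a placeholder shape.

```python
# Assumption: written vowels stand in for vowel sounds.
VOWELS = set("aeiou")


def negative_candidates(stem: str) -> list:
    """Return all negative forms the draft rules allow for a stem.

    Before a vowel: n'- by rule, plus nu'- because of nu'amyc.
    Before a consonant: nu'- or ne'-, with no rule to pick between them.
    """
    if stem[0].lower() in VOWELS:
        return ["n'" + stem, "nu'" + stem]
    return ["nu'" + stem, "ne'" + stem]


print(negative_candidates("amyc"))    # ["n'amyc", "nu'amyc"]
print(negative_candidates("jurkad"))  # ["nu'jurkad", "ne'jurkad"]
```

A parser can then accept any of the candidates as a negated form, while a generator would still need a human (or a later rule) to pick one.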
for pronouns: I am not clear on how to structure these. I can see that the words for the 1st Person work like C-i > C-er for the possessive form (that the archaic form is C-or seems to indicate a sound shift to me). The reflexive form seems to be -(a)st, since that's the only one I found. (I think I saw somewhere the reconstructed forms niist and mhiist, but I'm not certain of that ...) Of course, those only apply to the forms where there is a difference between them. Those I found are listed as "subject/object" > "possessive" > "reflexive"
1P sg: ni > ner > * [niist?]
2P sg/pl: gar > gar > *
3P sg: kaysh > kaysh > *
3P sg: * > * > ast
1P pl: mhi > mhor {rare archaic} / cuun > * [mhiist?]
3P pl: val > val > *
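For the parser, the list above could be stored as a small lookup table where a gap (*) is simply an empty slot. This is a Python sketch with made-up names; the uncertain reconstructed forms niist and mhiist are deliberately left out, and mhor is listed next to cuun without marking it as rare or archaic.

```python
ROLES = ("subject/object", "possessive", "reflexive")

# One row per line of the list; an empty tuple stands for "*" (no form found).
PRONOUNS = [
    ("1P sg",    ("ni",),    ("ner",),          ()),
    ("2P sg/pl", ("gar",),   ("gar",),          ()),
    ("3P sg",    ("kaysh",), ("kaysh",),        ()),
    ("3P sg",    (),         (),                ("ast",)),
    ("1P pl",    ("mhi",),   ("mhor", "cuun"),  ()),
    ("3P pl",    ("val",),   ("val",),          ()),
]


def lookup(form: str) -> list:
    """Return every (person, role) reading that a written form can have."""
    hits = []
    for label, *slots in PRONOUNS:
        for role, forms in zip(ROLES, slots):
            if form in forms:
                hits.append((label, role))
    return hits


print(lookup("gar"))   # two readings: subject/object and possessive
print(lookup("ast"))   # one reading: reflexive
```

Forms like gar come back with more than one reading, which matches the point that context has to do the disambiguation.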
I did notice that mhor is given as "ours" in the dictionary, which means either that there were separate forms for things like "mine, yours, ours" etc., or that both are used the same way. "cuun" falls completely outside the pattern. I would venture a guess that the reflexive forms for gar, kaysh and val are either simply the same as the other forms, demanding a lot from context, or they might actually get an -(a)st ending to correspond with the one form we have; both seem possible. If ni, mhi and ast have the same build-up, maybe "ast" would belong to a set like "ai > ar > ast" or something. "cuun" might be in the same vein as gar and val, getting mixed up with the inflected forms of "mhi". It is interesting that there seems to be a difference between "it" and "he/she"; it could mean there's a difference between "cuun" and the "mhi" forms as well. While it is a bit more difficult to render in English, considering that Mando'a has no genders, it could point to forms for animate and inanimate objects. Or sentient and non-sentient things. Something like that. In the same way, "cuun" and "mhi" could originally have been inclusive and exclusive 1P, which would not be out of place in a culture that has a great deal of "us against them" in it. The distinction could have fallen out of use when Mando'a became less of a language spoken to outsiders, so the exclusive form was simply no longer needed as a separate entity ...
But that's going far beyond the idea for interlinear parsing and I don't know if that has been discussed elsewhere already.
While I'm trying to keep this as 'canon' as possible, I am not too restrictive about incorporating other things, since the program I'm using gives the option to declare usages and restrictions and to mark words, variants and senses as dialects. So, if there are forms I've missed that seem important, even ones that are reconstructed rather than strictly from the original source, please point them out.