Parsing for Mando'a

Posted: 29 Dec 2017 11:19
by yatenari
So, as I've already mentioned in my introductory post, I'm working on a way to have interlinear parsing done automatically for Mando'a (I'm using a freeware linguistics program, but since I'm not a linguist and it's all in English, it's a bit of guesswork at times - so feel free to point out anything that is amiss). I do hope this is in the right place, since I have to admit that the way some things are sorted/structured here is a bit confusing for me.

At the moment, I am mostly concerned with structuring the different parts of a word and finding the corresponding rules for these parts - and I still have no idea what I'm going to do with the beten, since it has different roles that aren't always that clearly defined. My final goal is to have a toolbox for easy construction of new words using existing ones in compounds or creating derived words after settling on a new root.
What I've come up with for now:

Affixes:
ke'- = prefix, inflectional, attaches to verbs, denotes imperative (imp)
forms: ke'- > used when the stem initial is a consonant sound (#_C)
forms: k'- > used when the stem initial is a vowel sound (#_V)
ven'- = prefix, inflectional, attaches to verbs, denotes future tense (fut)
ru'- = prefix, inflectional, attaches to verbs, denotes past tense (pst)
-r = suffix, inflectional, attaches to verbs, denotes the infinitive (inf)
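
To make the allomorph rule for the verb affixes concrete, here is a minimal Python sketch. It assumes that the spelling can stand in for the sound (so "starts with a vowel sound" is approximated by "starts with a, e, i, o or u"), and the function names and the stems `tasa` and `osa` are made-up placeholders, not dictionary entries.

```python
VOWELS = set("aeiou")  # assumption: spelling approximates the vowel sounds

def attach_prefix(stem, before_consonant, before_vowel=None):
    """Pick the prefix form by the first letter of the stem (#_C vs. #_V)."""
    if before_vowel is not None and stem[0].lower() in VOWELS:
        return before_vowel + stem
    return before_consonant + stem

def imperative(stem):
    # ke'- before a consonant, k'- before a vowel
    return attach_prefix(stem, "ke'", "k'")

def future(stem):
    # no allomorphs are listed for ven'-
    return attach_prefix(stem, "ven'")

def past(stem):
    # no allomorphs are listed for ru'-
    return attach_prefix(stem, "ru'")

def infinitive(stem):
    # the infinitive simply adds -r to the verb stem
    return stem + "r"

# Placeholder stems, only there to exercise both environments.
print(imperative("tasa"))  # ke'tasa
print(imperative("osa"))   # k'osa
print(past("tasa"))        # ru'tasa
print(infinitive("tasa"))  # tasar
```

Keeping the consonant and vowel forms as arguments means the same helper could be reused for any other prefix that alternates in the same way.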

-ir = suffix, derivational, attaches to nouns, changes to verb [in inf]
forms: -ir > used when the stem ends in a final consonant (C_#)
forms: -r > used when the stem ends in a final vowel (V_#)
!Issue: the derivational suffix forms the infinitive, which means it is already inflected; I've thought of treating -i as the derivational suffix instead, however it would mean that noun = verb (ind) in all cases where a noun ends in a vowel. It isn't uncommon for a language to have noun = verb (German has some words like that, English has a lot of them ...). I have read that some of those pairs swap stress, so one might try to put that as the rule for now, even though I'd need to look at the actual Mando'a words to make sure that it is at least mostly followed.

-yc = suffix, derivational, attaches to nouns, changes to adjectives
!Issue: as far as I understand adjectives, -yc and -la are the default endings for the absolute form and -shya and -ne would be comparative and superlative respectively; so, while an adjective is built noun+yc, the comparative and superlative would be noun+shya and noun+ne. So, either adjectives and nouns have exactly the same stem (unlike verbs, which always have a final vowel sound), or adjectives as such don't exist and -yc, -shya and -ne are actually inflections of nouns ... or am I overlooking something in the derivation of adjectives?

be'- = prefix, inflectional, attaches to nouns, denotes a possessive (poss)
forms: be'- > used when the stem initial is a consonant sound (#_C)
forms: b'- > used when the stem initial is a vowel sound (#_V)
as suffix (archaic, rare):
-b > used when the stem ends on a final vowel (V_#)
(truthfully, I'm conflicted about even including this one, partly because it should be a rare form - and the implication it leaves when looking at dalab is quite uncomfortable to me; dalab could either mean "sheath" or "woman's" and while the context should make it obvious, that kind of double meaning is quite derogatory in my opinion; grammatically, that's neither here nor there, though)

-e = suffix, inflectional, attaches to nouns, denotes a plural (pl)
forms: -e > used when the stem ends in a final consonant not a t or b
forms: -'e > used when the stem ends in a final vowel that is not i
forms: -se > used when the stem ends in a final i
forms: -ise (variant: -'se) > used when the stem ends in a final t or b
!Issue: I have to look more closely at what the different forms actually are. I already know that ad'ike is the plural of ad'ika, which either makes it irregular or I need to figure out how to incorporate a>e in the rule set. In that regard, I am still uncertain if -e and -se are part of the same pluralization (like in English -s and the form -es for certain words) or if it is a different kind of plural (I think I've read that idea on Tumblr, that -ii, -iise is actually a way to denote 'sth. disliked' or something similar). Both seem equally possible to me.
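
The plural rules as listed above can be written down as a small function. This is only a sketch of the rule set in this post, not a verified description of Mando'a: it assumes spelling stands in for sound, the function name is made up, and ad'ika > ad'ike is handled as a listed exception because no a>e rule has been worked out yet.

```python
VOWELS = set("aeiou")  # assumption: spelling approximates the vowel sounds

# Known form that the rules below would get wrong (from the issue note).
IRREGULAR_PLURALS = {"ad'ika": "ad'ike"}

def pluralize(noun):
    """Apply the four plural forms in the order the rule list needs."""
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    last = noun[-1].lower()
    if last in ("t", "b"):
        return noun + "ise"   # final t or b (variant -'se not modelled)
    if last == "i":
        return noun + "se"    # final i
    if last in VOWELS:
        return noun + "'e"    # any other final vowel
    return noun + "e"         # any other final consonant

print(pluralize("vod"))      # vode
print(pluralize("aruetii"))  # aruetiise
print(pluralize("ad'ika"))   # ad'ike
```

Checking t/b and i before the general vowel and consonant cases matters, since those two rules are the exceptions to the general ones.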

nu'- = prefix, inflectional, attaches to verbs/nouns/adjectives, denotes a negative (neg)
forms: ne'- > (I actually don't know if there is a set rule for using ne' or nu', so I'd probably leave it as a wildcard to choose between)
forms: n'- > used when the stem initial is a vowel sound (#_V)
!Issue: I've already found exceptions (nu'amyc -> which means nu can also be used before vowels)
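
For the parsing side, the prefixes listed so far could be stripped off the front of a word and turned into a gloss line. The following Python sketch assumes the prefix forms and gloss labels from this post; the function names are hypothetical, and it will happily over-segment any word that merely happens to start with one of these strings, so a real parser would also check the remainder against a list of stems.

```python
# (form, gloss) pairs taken from the affix list in this post.
PREFIXES = [
    ("ke'", "imp"), ("k'", "imp"),
    ("ven'", "fut"), ("ru'", "pst"),
    ("be'", "poss"), ("b'", "poss"),
    ("nu'", "neg"), ("ne'", "neg"), ("n'", "neg"),
]

def split_prefixes(word):
    """Peel known prefixes off the front; return (prefixes, remainder)."""
    found = []
    progress = True
    while progress:
        progress = False
        for form, gloss in PREFIXES:
            # never strip a prefix if nothing would be left over
            if word.startswith(form) and len(word) > len(form):
                found.append((form, gloss))
                word = word[len(form):]
                progress = True
                break
    return found, word

def gloss_line(word):
    """Build a simple interlinear gloss, leaving the stem unglossed."""
    found, stem = split_prefixes(word)
    return "-".join([gloss for _, gloss in found] + [stem])

print(split_prefixes("nu'amyc"))  # ([("nu'", 'neg')], 'amyc')
print(gloss_line("nu'amyc"))      # neg-amyc
```

Because the loop only matches the written form, nu'amyc is segmented correctly even though nu'- stands before a vowel there.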


for pronouns: I am not clear on how to structure these. I can see that the words for the 1st Person work like C-i > C-er for the possessive form (that the archaic form is C-or seems to indicate a sound shift to me). The reflexive form seems to be -(a)st, since that's the only one I found. (I think I saw somewhere the reconstructed forms niist and mhiist, but I'm not certain of that ...) Of course, those only apply to the forms where there is a difference between them. Those I found are listed as "subject/object" > "possessive" > "reflexive"

1P sg: ni > ner > * [niist?]
2P sg/pl: gar > gar > *
3P sg: kaysh > kaysh > *
3P sg: * > * > ast
1P pl: mhi > mhor {rare archaic} / cuun > * [mhiist?]
3P pl: val > val > *
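
One way to keep this table usable for parsing is to store it with explicit gaps, so that unattested cells stay visibly empty instead of being filled by guesswork. A rough Python sketch, with made-up key names and with the bracketed reconstructions (niist, mhiist) deliberately left out:

```python
# Each cell is a list of attested forms; an empty list marks a "*" gap.
PRONOUNS = {
    "1sg":           {"subj_obj": ["ni"],    "poss": ["ner"],          "refl": []},
    "2sg/pl":        {"subj_obj": ["gar"],   "poss": ["gar"],          "refl": []},
    "3sg":           {"subj_obj": ["kaysh"], "poss": ["kaysh"],        "refl": []},
    "3sg (ast row)": {"subj_obj": [],        "poss": [],               "refl": ["ast"]},
    "1pl":           {"subj_obj": ["mhi"],   "poss": ["mhor", "cuun"], "refl": []},
    "3pl":           {"subj_obj": ["val"],   "poss": ["val"],          "refl": []},
}

def lookup(form):
    """Return every (person, role) cell in which a form is listed."""
    return [(person, role)
            for person, cells in PRONOUNS.items()
            for role, forms in cells.items()
            if form in forms]

def gaps():
    """Return every (person, role) cell that has no attested form."""
    return [(person, role)
            for person, cells in PRONOUNS.items()
            for role, forms in cells.items()
            if not forms]

print(lookup("gar"))  # [('2sg/pl', 'subj_obj'), ('2sg/pl', 'poss')]
print(len(gaps()))    # 7
```

A form like gar then comes back as ambiguous between two roles, which is exactly the part that has to be resolved from context.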

I did notice that mhor is given as "ours" in the dictionary - which either means that there were different forms for things like "mine, yours, ours" etc. or they are used the same. "cuun" falls completely outside the pattern. I would venture a guess that the reflexive forms for gar, kaysh and val are either simply the same as the others, demanding a lot from context, or they might actually get a -(a)st ending to correspond with the one form we have - both could be possible; if ni, mhi and ast have the same build-up, maybe "ast" would be rendered "ai > ar > ast" or something. "cuun" might be of the same vein as gar and val, getting mixed up with the inflected forms of "mhi". It is interesting that there seems to be a difference between "it" and "he/she" - it could mean there's a difference between "cuun" and the "mhi"-forms as well. While it is a bit more difficult to render it in English, considering that Mando'a has no genders, it could point to forms for animate and inanimate objects. Or sentient and non-sentient things. Something like that. In the same way, "cuun" and "mhi" could originally have been inclusive and exclusive 1P, which would not be too amiss in a culture that has a great deal of "we against them" in it. That distinction could have gone out of use when Mando'a became less of a language spoken to outsiders, so the exclusive form was simply no longer needed as a separate entity ...
But that's going far beyond the idea for interlinear parsing and I don't know if that has been discussed elsewhere already.

While I'm trying to keep this as 'canon' as possible, I am not too restrictive about incorporating other things, since the program I'm using gives the option to declare usages and restrictions and to mark words, variants and senses as dialects. So, if there are forms I've missed that seem important but aren't strictly in the original source but reconstructed, feel free to point them out.

Re: Parsing for Mando'a

Posted: 02 Jan 2018 11:08
by yatenari
So, I've been looking at how best to separate the different words on morpheme level. It isn't strictly necessary for the parsing to work, since Mando'a doesn't have that many classic inflections, and instead of defining derivational affixes the words can simply be added as completed words. This is the easiest approach especially for adjectives, since they seem to be the most irregular so far (unless some words that are given as adjectives in the translation are actually not adjectives in Mando'a - in some cases that seems to be the most likely conclusion, but so far I'm trying to keep it as true to the original source as possible). I have considered some roots from the different words (like *cet for something pointed, *cir for the general meaning of cold etc.) and, going from that, I'm looking at how the words differ in both form and meaning.

Some of the derivational affixes are already more or less given in the original list (-yc and -la to change primarily nouns into adjectives and 'ad and 'ika to form new nouns). Some endings to words seem more or less common, so they might have the characteristic of affixes (-an in slightly varying forms might be such an affix, -ii could be seen as one as well). And then there are parts that could be called affixes that change the meaning of the word (nu'-, but maybe also dar', ge' or sol'). The first could be considered an inflection (denoting a negative) or a derivation (denoting an opposite), so I'm still unsure on how to classify that prefix. The others are actually more words than they are affixes (since they keep their meaning they're part of compounds or contractions instead of derivations), but since they have fixed forms, one could see them as a separate class of affixes. If they're considered affixes, then they should follow a set of rules for how they attach to a given word stem. If they're simply compounds/contractions, then the word forms would be more arbitrary. For parsing, the former is better, since then only the stem and the different affixes need to be known (e.g. dar' as an affix and buir as a stem, thus dar'buir would not have to be defined to have it correctly parsed - regardless of the additional information that is solely connected to dar'buir as a word). It would make it easier if a text with new, previously undefined words should be parsed (which helps in translating it literally).
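
The dar'buir point can be illustrated with a small sketch: if dar', ge' and sol' are treated as affix-like elements and buir is a known stem, the combined word is segmented without ever being listed itself. The names below are hypothetical and the stem list is reduced to a single entry for the example.

```python
# Affix-like first elements discussed above, and a tiny stem list.
AFFIX_LIKE = ("dar'", "ge'", "sol'", "nu'")
STEMS = {"buir"}

def segment(word):
    """Split a word into labelled parts, or return None if it is unknown."""
    if word in STEMS:
        return [(word, "stem")]
    for prefix in AFFIX_LIKE:
        rest = word[len(prefix):]
        if word.startswith(prefix) and rest in STEMS:
            return [(prefix, "prefix"), (rest, "stem")]
    return None

print(segment("dar'buir"))  # [("dar'", 'prefix'), ('buir', 'stem')]
print(segment("buir"))      # [('buir', 'stem')]
print(segment("dar'xyz"))   # None
```

Any meaning that belongs only to dar'buir as a whole word would still need its own entry; the sketch only covers the literal, part-by-part reading.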

Another point that gives me a headache is the use of the beten. Sometimes it's part of the word, then it is lost in further compounds (or not) and some seem to be arbitrarily added to words - which made me actually come up with trac'yar ti betene for it (... who spots the non-SW reference?). Since the goal is to find rules for existing phenomena and not to correct them, I at least came up with a way to explain the difference between the plural -e and -'e. While in the form bavodu'e it is due to the two vowels (and the one after the ba' seems to vanish somewhere in the plural form ...), I've seen it quite often in vod'e and the like. Vode An quite obviously shows that the form vode is correct - but since a beten is an actual phoneme, there is no reason to simply dismiss vod'e as incorrect. If the beten is actually spoken and the result is not considered a completely different word, then the beten could be used as a sign that the plural is emphasized (and Vode An would either not have it because it's a song or because the emphasis is already done through an).
On the topic of plurals, there are a few archaic forms which might be interesting to look at, especially for word-building (if one wants to do things a bit more in line with the natural evolution of a language - it would also create more words, if there is an older compound and a newer one where the old one would have shifted in meaning ...). The main source is Dha Werda Verda. There is the archaic plural ending -a in verda and werda - which might imply a shift from a to e in that instance. There is also adu which is given as a plural in the translation, so there seem to be two plural inflections: -a and -u. Verda and werda both end in -e in their singular form, while adu is rendered as ad. It could mean that -a is the ending given only to nouns with a final e, or generally to nouns ending in a vowel, while -u attaches to final consonants.
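
Written out as a rule, the narrow reading of that guess looks like the sketch below. It is purely a hypothesis based on three forms: a final -e is replaced by -a, a final consonant takes -u, and nouns ending in any other vowel are left undecided. The function name is made up, and the singulars verde and werde are the -e forms implied above.

```python
VOWELS = set("aeiou")  # assumption: spelling approximates the vowel sounds

def archaic_plural(singular):
    """Hypothetical archaic plural: -e becomes -a, consonant stems add -u."""
    last = singular[-1].lower()
    if last == "e":
        return singular[:-1] + "a"   # verde > verda
    if last not in VOWELS:
        return singular + "u"        # ad > adu
    return None                      # other final vowels: no data yet

print(archaic_plural("verde"))  # verda
print(archaic_plural("werde"))  # werda
print(archaic_plural("ad"))     # adu
```

Returning None for the undecided case keeps the wider reading (-a after any vowel) open until there is a form that tests it.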

... and I noted that I missed at least two forms for my pronoun table, so I will have to look into my prior conclusions again *sigh* I really hate how sometimes things are hard to find if not having the exact right search query (and that is not even getting at alternate spellings).

Re: Parsing for Mando'a

Posted: 15 Jan 2018 17:22
by Munnodol
Looks good, can I give you another perspective? You listed quite a few affixes, but there are other roots in Mando'a that are brought together. Mando'a has a rule where new words are made from existing words, but for a lot of words there appears to be no separation, and so the word seems to stand by itself. That is not the case. I'm not saying you haven't done this, but next time you go through the dictionary look for patterns. I'm working on creating a Mando'a learning book (with the hopes of getting it published) and it is my belief that Mando'a has many more roots, stems or other morphemes that just can't function on their own.
Here's an example:

1) There is no conventional definition for the word <alii> but you see it appear in several words <alii'gai>, <aliik>, and <aliit>. Now look at their definitions. They mean "flag/colors", "Sigil/Symbol on Armor", and "family/clan" respectively. Semantically speaking, there are a lot of similarities between these words, and there is only one thing relating them: <alii>. So, this means that <alii> has something to do with symbols, mainly when representing a clan or faction. With this information, we can then deduce that <k> and <t> must also be morphemes, although their meaning so far is unknown to me, but one can be found.

What we are working with here are bound morphemes, or morphemes that can't exist on their own. One example in English is the word "submit": if you broke it apart into the morphemes "sub" and "mit", neither piece would carry a meaning by itself; they must be put together or with other words to work. <gai> in <alii'gai> is a free morpheme because it is its own word with its own meaning, but <alii> must be paired with something so that a word is created. Mando'a has quite a few bound morphemes that range from describing clans to expressing emotion, and if you want to understand word building, you need to know all these morphemes. That way, you can create new words that meet the requirements the language created.
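
The pattern search described here can be partly automated. The sketch below takes the three <alii> words from the example, finds the string they share, and lists what is left over in each; the function name is made up, and treating the beten as a plain separator is an assumption.

```python
import os.path

WORDS = ["alii'gai", "aliik", "aliit"]

def residues(candidate, words):
    """Map each word that starts with the candidate to what follows it."""
    return {
        word: word[len(candidate):].lstrip("'")  # drop a joining beten
        for word in words
        if word.startswith(candidate) and word != candidate
    }

# commonprefix compares the strings character by character.
shared = os.path.commonprefix(WORDS)
print(shared)                   # alii
print(residues(shared, WORDS))  # {"alii'gai": 'gai', 'aliik': 'k', 'aliit': 't'}
```

A residue that is itself a dictionary word (gai) points to a free morpheme, while leftovers like k and t are the candidates for bound morphemes that still need a meaning.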

Re: Parsing for Mando'a

Posted: 15 Jan 2018 21:48
by yatenari
Well, that's just it - I'm looking at affixes which attach to stems and roots. The reason why I'm looking at the affixes first (especially suffixes at this point) is because I simply can't determine which are the roots just yet. The program I'm using doesn't allow for a simple import of an Excel sheet, so by typing in every word manually, I've already noted certain words that are related through their components. However, I also noted that Mando'a does not have a single way of creating new words by attaching them to each other (something I am quite familiar with from German, where you can simply stick words together to get new ones).

Taking your example with <alii'gai>, <aliik> and <aliit>:
<alii'gai> could be a root *alii and the word <gai>, or it could actually be a contraction from *aliitgai, since the /t/ and /g/ are hard to pronounce in that. While it appears that *alii might mean some sort of symbol, it could also be that <alii'gai> means "flag, colors" in translation, but literally means "clan name" with a bit of historical shifting in it (like, a flag is originally a clan emblem and this emblem depicted the actual name, comparable to some coat of arms). That is why I'm trying to look at what endings words take and try to find some pattern there instead of looking at roots I can't yet sufficiently determine (I have made some proposed roots for my list, but for parsing they aren't needed - yet). Currently, I'm trying to group the endings to determine where the potential suffix starts and how they differ in form (for those that seem similar and thus might be allomorphs).

I am not strictly looking at affixes that have a definite meaning in themselves (like sticking -'ad to a word - <ad> is a word, so it is less an affix than a compound word, even though I might add it as a derivational affix since it can change the word class). Compare -er in English: adding it to a verb is a way to create an actor.

Btw, <submit> might not be that good an example since it's a word that's originally Latin and thus might not be the best to explain English morphemes (since, both parts originally work independently of one another as <sub> and <mittere>, becoming a compound as <submittere>). But I get what you mean ;)

Re: Parsing for Mando'a

Posted: 19 Jan 2018 13:54
by Munnodol

What I am saying is that Mando'a, while not explicitly stating it, may prefer certain morphemes over others (I use "morpheme" because it incorporates both words and affixes). alii is but one example, though I admit that the other reading is equally likely, and is most likely the better option considering consonants are dropped if compounded with a word beginning with a consonant (aran+suum=arasuum. Similar things happen with vowels.) Even so, one thing I notice is that <ii> is used a lot (though I am trying to figure out why, I think it is for marking some sort of noun) and one place it occurs is in the word <aruetii>. Now <aru'e> is most likely the root, but when <ii> is added, an additional <t> is added to break them up. So ultimately, in order to see who is right, we would have to see <alii> when paired with a word beginning with a vowel. Both our options are viable; it depends on further analysis or personal preference.
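
The consonant-drop pattern mentioned here (aran+suum=arasuum) is easy to state as a rule, and it also shows how the contraction reading *aliit+gai would come out. This is a sketch under assumptions: spelling stands in for sound, the function and parameter names are made up, the vowel case is not modelled, and whether a beten marks the dropped consonant is left as a switch because the two examples differ on that point.

```python
VOWELS = set("aeiou")  # assumption: spelling approximates the vowel sounds

def compound(first, second, mark_elision=False):
    """Join two words, dropping the final consonant of the first one
    when the second one also begins with a consonant."""
    if first[-1].lower() not in VOWELS and second[0].lower() not in VOWELS:
        joint = "'" if mark_elision else ""
        return first[:-1] + joint + second
    return first + second

print(compound("aran", "suum"))                     # arasuum
print(compound("aliit", "gai", mark_elision=True))  # alii'gai
```

Both readings of <alii'gai> give the same surface form here, which is why a compound with a vowel-initial second word would be needed to tell them apart.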

Personally, I do not worry about historical shifting, since we only have these few words and very little if any observable data to go off of. So our best bet is to take these definitions as literal, and maybe apply more as we figure out more. As for starting from the endings: yes, but how would you know the affix without the root? If we understand the base meaning of the word, we get a better understanding of the morphemes being applied to it, and not all of them have definite meanings that are stated.

Also, considering English morphemes and other rules are a combination of several languages and variations, they apply when appropriate (hence why the -ves pluralization applies largely to Germanic words like <wolf> or <knife>, while Latin words like <cactus> receive a mutually exclusive plural of their own). So the idea of <sub> and <mit> being bound in English, but free in Latin stands, because the morphological rules are for English, not Latin.