VSCA understands that your language may have digraphs, where the "t" in the sequence "th" is something other than just a "t" on its own. This is one of VSCA's strengths, but it can also cause some problems when you're rulecrafting. On this page I'll explain a few things about symbols (a symbol is either a monograph or a polygraph) that you should know.
Polygraphs can appear in three places: variable values, New fields, and Original fields. Sequences in Position and Exception fields are automatically considered polygraphs. Polygraphs in variable values get special treatment; read on to find out how.
Let's assume that your original language has a digraph, like "dh", and you want to change that into "th" at the end of a word. You'll understand that a rule like dh/th/_# will not do what you expect: this rule means "change every d or h into a t or h at the end of a word".
To tell VSCA that the "dh" and the "th" are polygraphs (digraphs, to be precise, but VSCA doesn't care what kind of n-graph you want to use), you surround them in square brackets:
[dh]/[th]/_#
You can mix polygraphs and monographs in a rule:
bdg[dh]/ptk[th]/_#
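To make the bracket notation a bit more concrete, here is a minimal Python sketch of how a field such as bdg[dh] could be read as a list of symbols. This is not VSCA's actual code; the function name parse_field is made up for illustration, and the sketch only handles plain characters and bracketed polygraphs.

```python
def parse_field(field):
    """Split a rule field such as 'bdg[dh]' into a list of symbols."""
    symbols = []
    i = 0
    while i < len(field):
        if field[i] == "[":
            # Everything up to the closing bracket is one polygraph.
            end = field.index("]", i)  # raises ValueError if the bracket is unclosed
            symbols.append(field[i + 1:end])
            i = end + 1
        else:
            # Any other character is a monograph.
            symbols.append(field[i])
            i += 1
    return symbols


print(parse_field("bdg[dh]"))  # ['b', 'd', 'g', 'dh']
print(parse_field("dh"))       # ['d', 'h']
```

The second call shows why dh/th/_# does not do what you expect: without brackets, "d" and "h" are two separate symbols.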
You can also replace a polygraph by a monograph. We already did this in our Latin-to-Spanish example. The reverse is also possible.
[ii]/i/_
i/[ii]/_
In Position fields you don't need to tell VSCA it's dealing with polygraphs. For example, the rule a/e/_i# means "change an a into an e if it appears before a word-final i".
The entire process of VSCA consists of two main phases. You might not be very interested in this, but please bear with me - understanding this will help you craft better rulesets.
During the first phase, VSCA reads your ruleset and transforms your rules and assignments into an internal format that is easier to deal with. In analogy with some programming languages, this phase is called compile time.
The second phase is called runtime - this is where your lexicon file is read word for word, and where each word is run through all soundchange rules.
We can say that the runtime phase itself consists of various steps too. The obvious first step is reading the word, and then there are two steps for each rule in the ruleset. I want to talk about these. For the record, the last step of course is outputting the (perhaps changed) word.
When a word is sent to a rule, it is first split up into symbols that make sense to the rule, and then into symbols that seem to make sense to the language. Let's take the word "awa:ka", for example, and the single rule [a:]/[i:]/_. The word will be split up into ("a", "w", "a:", "k", "a"). Now it is very easy for VSCA to take out the middle symbol and replace it with "i:".
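The splitting step can be sketched in Python like this. It is only my illustration of the idea, not how VSCA is implemented: split_word is a hypothetical name, and I am assuming that a longer polygraph wins over a shorter one when both match at the same position.

```python
def split_word(word, polygraphs):
    """Split word into symbols, treating each known polygraph as one symbol."""
    # Assumption: longer polygraphs are tried before shorter ones.
    ordered = sorted((p for p in polygraphs if len(p) > 1), key=len, reverse=True)
    symbols = []
    i = 0
    while i < len(word):
        for p in ordered:
            if word.startswith(p, i):
                symbols.append(p)
                i += len(p)
                break
        else:
            # No polygraph starts here, so the single character is a monograph.
            symbols.append(word[i])
            i += 1
    return symbols


print(split_word("awa:ka", ["a:"]))  # ['a', 'w', 'a:', 'k', 'a']
print(split_word("awa:ka", []))      # ['a', 'w', 'a', ':', 'k', 'a']
```

The second call shows what happens when nothing tells the splitter about "a:": the colon ends up as a symbol of its own.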
Now imagine you want to do a more complex vowel shift.
a/e/_
[a:]/[i:]/_
If you run the same word "awa:ka" through this, it will come out as "ewe:ke". The reason is simple to explain: for the first rule, the word is split into ("a", "w", "a", ":", "k", "a"). VSCA then takes out those "a"s and puts back a number of "e"s. By the time the word is sent to the second rule, there is no "a:" left to be replaced with "i:".
You could work around this by putting the rules in reverse order, but once rulesets get more complex, it becomes too tedious to think about the order in which rules should appear. Well, you have to think about that anyway, but it shouldn't be because of VSCA's polygraph recognition.
Another workaround is to put the entire vowel shift in one rule:
a[a:]/e[i:]/_
Now the word is nicely split up into ("a", "w", "a:", "k", "a") again, and VSCA replaces the "a"s with "e"s and the "a:"s with "i:"s.
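If you want to play with the effect of rule order, the following Python sketch mimics it. Everything here is an assumption for the sake of illustration: each rule is written as a plain dict from Original symbols to New symbols, Position fields are ignored (all three rules use a bare _), and each rule only knows the polygraphs in its own Original field.

```python
import re


def split_word(word, polygraphs):
    """Split word into symbols; known polygraphs count as one symbol."""
    longest_first = sorted((p for p in polygraphs if len(p) > 1), key=len, reverse=True)
    pattern = "|".join([re.escape(p) for p in longest_first] + ["."])
    return re.findall(pattern, word, flags=re.DOTALL)


def apply_rule(word, mapping):
    """Apply one unconditional rule, given as {original symbol: new symbol}."""
    symbols = split_word(word, mapping.keys())
    return "".join(mapping.get(s, s) for s in symbols)


def apply_ruleset(word, ruleset):
    """Run the word through every rule in order."""
    for mapping in ruleset:
        word = apply_rule(word, mapping)
    return word


two_rules = [{"a": "e"}, {"a:": "i:"}]   # a/e/_ followed by [a:]/[i:]/_
reversed_rules = two_rules[::-1]         # [a:]/[i:]/_ followed by a/e/_
one_rule = [{"a": "e", "a:": "i:"}]      # a[a:]/e[i:]/_

print(apply_ruleset("awa:ka", two_rules))       # ewe:ke
print(apply_ruleset("awa:ka", reversed_rules))  # ewi:ke
print(apply_ruleset("awa:ka", one_rule))        # ewi:ke
```

Both workarounds give the same result here; the single rule just doesn't depend on you remembering the right order.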
But there is more. As I said earlier, there is something special about polygraphs inside variable values. The key is that, since you assign a polygraph to a variable, VSCA assumes that this polygraph is something constant throughout your language. Therefore it will remember the polygraph and use it during the word-splitting step.
This is easier to explain with an example.
# Short vowels
SV=aeiou
w/v/_<SV>
Change "w" into "v" before a short vowel, right? Right. Let's take the same word again. It will be split up into ("a", "w", "a", ":", "k", "a"). VSCA sees the "w", it sees that the next symbol is an "a", "a" appears to be a short vowel, so VSCA replaces the "w" with a "v", even though it really occurred before a long vowel. The thing here is that I never told VSCA that "a:" is a long vowel. Or, more accurately, I never told VSCA that the "a" in "a:" is not the same as an "a" on its own.
# Short vowels
SV=aeiou
LV=[a:][e:][i:][o:][u:]
# ( by the way, you can also write LV=[<SV>:] )
w/v/_<SV>
Now VSCA knows you want to distinguish "a:" from "a" + ":". It sees that the "w" occurs before "a:", and "a:" is not a short vowel, so the "w" stays as it is.
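Here is the same comparison as a small Python sketch. Again, this is just my way of illustrating the behaviour and not VSCA's real code: the function names are hypothetical, the rule w/v/_<SV> is hard-coded, and the list called remembered stands in for the polygraphs that VSCA picks up from variable values.

```python
import re


def split_word(word, polygraphs):
    """Split word into symbols; known polygraphs count as one symbol."""
    longest_first = sorted((p for p in polygraphs if len(p) > 1), key=len, reverse=True)
    pattern = "|".join([re.escape(p) for p in longest_first] + ["."])
    return re.findall(pattern, word, flags=re.DOTALL)


def w_to_v_before_short_vowel(word, short_vowels, remembered):
    """Hard-coded version of w/v/_<SV>, using the remembered polygraphs to split."""
    symbols = split_word(word, remembered)
    result = []
    for i, symbol in enumerate(symbols):
        following = symbols[i + 1] if i + 1 < len(symbols) else None
        result.append("v" if symbol == "w" and following in short_vowels else symbol)
    return "".join(result)


SV = ["a", "e", "i", "o", "u"]
LV = [v + ":" for v in SV]  # mirrors LV=[<SV>:], i.e. a:, e:, i:, o:, u:

print(w_to_v_before_short_vowel("awa:ka", SV, []))  # ava:ka  (LV never defined)
print(w_to_v_before_short_vowel("awa:ka", SV, LV))  # awa:ka  (LV defined)
print(w_to_v_before_short_vowel("awaka", SV, LV))   # avaka
```

Note that the rule itself never mentions LV. Merely defining the variable is enough to change how the word is split.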