Text Generation Part Two

Prerequisites: You should have read Text Generation Part One.

Anatomy of a Phrase

Stored inside each instance of ishml.Phrase is an array of text strings, other instances of ishml.Phrase, numbers, and functions. The elements of this array are the sub-phrases that make up the phrase. When a phrase's .say() method is called, the phrase is evaluated and the phrase's text property is updated. The default evaluation is to join all the elements of the sub-phrase array together. If an array element is another phrase or a function, it is evaluated before concatenation. We can change the default evaluation by adding modifiers to the phrase's definition.
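The default evaluation can be pictured with a few lines of plain JavaScript. This is only a sketch under stated assumptions: evaluate is a hypothetical helper, not part of ISHML, and nested arrays stand in for nested phrases.

```javascript
// Hypothetical stand-in for a phrase's default evaluation; not ISHML's actual code.
function evaluate(part) {
  if (typeof part === "function") return evaluate(part()); // functions are called first
  if (Array.isArray(part)) return part.map(evaluate).join(""); // a nested "phrase"
  return String(part); // strings and numbers are used as they are
}

const names = ["Jack", "Jill", "Jamal"];
const sentence = [names, " went up the hill ", () => 2, " times."];

console.log(evaluate(sentence)); // prints "JackJillJamal went up the hill 2 times."
```

Note how the inner array is joined without separators, which is the same run-together effect seen in an unmodified phrase.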

Modifiers that are exposed through ishml.Template are referred to as prefixes since they appear at the beginning of a phrase definition. An unlimited number of prefixes may be chained together with dot notation to apply layer upon layer of modifications. Prefixes are applied from right to left. That is, the prefix closest to the left parenthesis or backtick is applied to the base phrase first. Then the next closest prefix is applied to that result, and so on until the beginning of the chain is reached. Chaining permits the creation of complex phrases from a single line of code.

Modifiers that are exposed through ishml.Phrase are referred to as suffixes since they are appended after a phrase to create a new phrase.

The JavaScript tab in Listing 1 shows examples of prefixes in use. An alphabetical list of all built-in prefixes is available in the API Reference.

Listing 1

The phrase defined for unmodifiedPhrase evaluates to JackJillJamal went to the humane society and came home with dogcatocelotemu.

Adding the .pick prefix to the inner phrases of pickPhrase causes a name and an animal to be chosen at random each time .say is called. The inner phrases of cyclePhrase use the .cycle prefix to cause names and animals to be selected in order, looping back to the beginning once the list is exhausted.

The .a prefix adds the appropriate indefinite article, a or an, to the animal selected by .pick.

Listing 2

Changing the order of prefixes often changes the final result of text generation. Examine the JavaScript in Listing 2. The example1 phrase applies the appropriate indefinite article to the list as a whole. The example2 phrase applies the indefinite article to each animal individually before creating the list. This is solely due to the order in which the prefixes are chained. The example3 phrase capitalizes the first letter of the resulting text. The example4 phrase capitalizes the first letter of each animal.
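The effect of ordering can be imitated with ordinary function composition. The sketch below does not use ISHML: withArticle and toList are hypothetical helpers, the vowel test is a naive approximation of article selection, and the comma-separated format with a final "and" is an assumption about what a list looks like.

```javascript
// Hypothetical helpers that imitate an article modifier and a list modifier.
const withArticle = (text) => (/^[aeiou]/i.test(text) ? "an " : "a ") + text;
const toList = (items) =>
  items.length < 3
    ? items.join(" and ")
    : items.slice(0, -1).join(", ") + ", and " + items[items.length - 1];

const animals = ["dog", "cat", "ocelot", "emu"];

// List first, then article: the article applies to the list as a whole.
const articleOnList = withArticle(toList(animals));
// Article first, then list: each animal gets its own article.
const articleOnEach = toList(animals.map(withArticle));

console.log(articleOnList); // prints "a dog, cat, ocelot, and emu"
console.log(articleOnEach); // prints "a dog, a cat, an ocelot, and an emu"
```

The two results use exactly the same two operations; only the order in which they are applied differs.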

Generating Random Text

Four prefixes inject randomness into phrase evaluation, each in a different way. Study Listing 3. The .roll prefix selects an element from the phrase's array purely at random, like rolling dice.

Listing 3

The .pick prefix also selects an element from the phrase's array at random, but if the element selected is the same as the previously selected element, the next element in the list is chosen instead. This prevents the same element from being selected twice in a row.
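The no-repeat rule can be sketched in plain JavaScript as follows. The makePicker function is hypothetical and is not ISHML's implementation; in particular, wrapping around to the first element when the repeated element is the last one in the list is an assumption.

```javascript
// Hypothetical sketch of a random selection that never repeats back to back.
function makePicker(items, random = Math.random) {
  let previous = -1; // index of the last selection; -1 means nothing chosen yet
  return function pick() {
    let index = Math.floor(random() * items.length);
    if (index === previous) {
      index = (index + 1) % items.length; // step to the next element (assumed to wrap)
    }
    previous = index;
    return items[index];
  };
}

const pickName = makePicker(["Jack", "Jill", "Jamal"]);
console.log(pickName(), pickName(), pickName()); // three names, no immediate repeats
```

The random source is passed in as a parameter so that the sketch can be driven by a seeded generator or by a fixed function during testing.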

The .favor prefix weights the selection so that an element is less likely to be selected the further down the list it appears. For example, in an array with 5 elements, the first element is 5 times as likely to be selected as the last element, the second element is 4 times as likely as the last, and so on. Mathematically, the chance of selecting the element at zero-based index i of an array of length n is ( n - i ) out of n * ( n + 1 ) / 2. In an array with 5 elements, this works out to 5 in 15, 4 in 15, 3 in 15, 2 in 15, and 1 in 15 for each element respectively.
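The weighting arithmetic can be checked with a short sketch. The functions favorWeights and favorIndex are hypothetical names and are not part of ISHML; they only reproduce the ( n - i ) out of n * ( n + 1 ) / 2 rule, with r standing for a random number that is at least 0 and less than 1.

```javascript
// Weight of each zero-based index i in a list of length n: n - i.
function favorWeights(n) {
  return Array.from({ length: n }, (_, i) => n - i);
}

// Map a number r (0 <= r < 1) to an index according to those weights.
function favorIndex(n, r) {
  let remaining = (r * (n * (n + 1))) / 2; // scale r to the total weight
  for (let i = 0; i < n; i++) {
    remaining -= n - i; // consume this element's share
    if (remaining < 0) return i;
  }
  return n - 1; // guard against floating-point edge cases
}

console.log(JSON.stringify(favorWeights(5))); // prints [5,4,3,2,1]
console.log(favorIndex(5, 0.2)); // prints 0
console.log(favorIndex(5, 0.95)); // prints 4
```

The weights for 5 elements sum to 15, which matches the 5 * ( 5 + 1 ) / 2 denominator.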

The .shuffle prefix shuffles the elements of the array like a deck of cards. When used in conjunction with .cycle it ensures that each alternative will be dealt out in random order before reshuffling the array when the cycle repeats.
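The deck-of-cards behavior can be approximated in plain JavaScript. This sketch assumes a Fisher-Yates shuffle, which may differ from what ISHML does internally; shuffle and makeShuffledCycle are hypothetical names.

```javascript
// Return a shuffled copy of items (Fisher-Yates); the original array is untouched.
function shuffle(items, random = Math.random) {
  const deck = [...items];
  for (let i = deck.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [deck[i], deck[j]] = [deck[j], deck[i]];
  }
  return deck;
}

// Deal every element once in random order, then reshuffle and start over.
function makeShuffledCycle(items, random = Math.random) {
  let deck = shuffle(items, random);
  let position = 0;
  return function next() {
    if (position === deck.length) {
      deck = shuffle(items, random); // cycle finished: reshuffle
      position = 0;
    }
    return deck[position++];
  };
}

const deal = makeShuffledCycle(["dog", "cat", "ocelot", "emu"]);
const firstPass = [deal(), deal(), deal(), deal()];
const secondPass = [deal(), deal(), deal(), deal()];
console.log(firstPass, secondPass); // each pass contains all four animals exactly once
```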

ISHML's random number generator is deterministic. In other words, the so-called random selection prefixes are based on a mathematical formula that mimics true randomness. This pseudorandomness is good enough for most text generation needs in most applications. Because the results are deterministic, we can actually control the outcomes of the random selection prefixes by setting a phrase's seed. Take a look at Listing 4.

Listing 4

The example1 phrase will produce different results with every click of the Click Me button.

The example2 phrase runs through the same sequence starting with taking third in the race twice and then losing. Even clicking the results link will not change the outcome. The fix is in because the phrase's seed was set to a specific value with .seed(). A seed may be any numeric value less than one, but not less than zero. Setting the seed of a phrase guarantees reproducible results. Different seeds will generate different results.

Alternatively, the seed may be set using the phrase's .say() method as seen in example3. Because the same seed is used every time this phrase's text is generated, the outcome is always the same.
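To see why a fixed seed gives reproducible results, consider the following sketch. It uses a simple linear congruential generator as a stand-in; ISHML's actual generator is not shown in this tutorial and may work differently. The names makeSeededRandom and runRace and the list of outcomes are hypothetical.

```javascript
// Stand-in pseudorandom generator seeded with a number in the range [0, 1).
function makeSeededRandom(seed) {
  if (!(seed >= 0 && seed < 1)) throw new RangeError("seed must be >= 0 and < 1");
  let state = Math.floor(seed * 4294967296); // scale the seed to a 32-bit integer
  return function random() {
    state = (1664525 * state + 1013904223) % 4294967296; // LCG step
    return state / 4294967296; // back to the range [0, 1)
  };
}

const outcomes = ["won", "placed second in", "took third in", "lost"];

// Generate a sequence of race results from a given seed.
function runRace(seed, count) {
  const random = makeSeededRandom(seed);
  return Array.from({ length: count }, () => outcomes[Math.floor(random() * outcomes.length)]);
}

console.log(runRace(0.25, 5)); // same five results on every run
console.log(runRace(0.75, 5)); // a different seed gives a different sequence
```

Every call with the same seed replays the same sequence, because the generator's state after each step depends only on the state before it.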

Pinning Things Down

Listing 5 below introduces the .pin prefix. Normally, when the cycle repeats, the .cycle prefix sends a reset command to the prefix to the right of it, and then that prefix resets itself and passes the reset command down the prefix chain. However, the .pin prefix stops the reset command from propagating further. The behavior of reshufflePhrase is to reshuffle the list every time the cycle completes a loop through the list. The addition of the .pin prefix in pinShufflePhrase stops the reset command before it reaches .shuffle, which prevents the list from being reshuffled.

Listing 5

The .pin prefix can also be combined with a random selection prefix to make a selection sticky. The stickyPickyPhrase causes an item to be picked at random the first time, and then that item continues to be selected.
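The sticky effect is similar to remembering the first result of a random choice. The sketch below imitates only that observable behavior in plain JavaScript; it does not model the reset command, and pin, rollPet, and the list of pets are hypothetical names rather than ISHML code.

```javascript
// Wrap a selection function so that its first result is kept from then on.
function pin(select) {
  let pinned;
  let chosen = false;
  return function pinnedSelect() {
    if (!chosen) {
      pinned = select(); // the only time the underlying selection runs
      chosen = true;
    }
    return pinned;
  };
}

const pets = ["dog", "cat", "ocelot", "emu"];
const rollPet = () => pets[Math.floor(Math.random() * pets.length)];
const stickyPet = pin(rollPet);

const firstPet = stickyPet();
console.log(firstPet === stickyPet()); // prints true
```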

Series

The .series prefix selects each element from the phrase's array in order, like .cycle, but once it reaches the end of the list, it returns the empty string instead of looping back to the beginning. See Listing 6.

Listing 6

Suffixes

The .series prefix is often paired with .then. It signals the start of another ishml.Phrase that will be used instead of the previous phrase once the series has reached its end. Listing 6 above shows some examples of .then in use.
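The hand-off from a series to a follow-up phrase can be pictured with a small sketch. It is plain JavaScript rather than ISHML, and makeSeries and its afterwards parameter are hypothetical: the parameter stands in for whatever phrase follows .then, and by default it yields the empty string.

```javascript
// Step through items once; after the last item, defer to the afterwards function.
function makeSeries(items, afterwards = () => "") {
  let position = 0;
  return function next() {
    return position < items.length ? items[position++] : afterwards();
  };
}

const plain = makeSeries(["ready", "set", "go"]);
const plainResults = [plain(), plain(), plain(), plain()];
console.log(JSON.stringify(plainResults)); // prints ["ready","set","go",""]

const withFollowUp = makeSeries(["ready", "set", "go"], () => "done");
const followUpResults = [withFollowUp(), withFollowUp(), withFollowUp(), withFollowUp(), withFollowUp()];
console.log(JSON.stringify(followUpResults)); // prints ["ready","set","go","done","done"]
```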

Modifiers that are methods or getters of ishml.Phrase are referred to as suffixes. Although .then is technically a suffix because it is a getter property of ishml.Phrase, it behaves a lot like a prefix because the .then property returns a prefix for prefix chaining instead of an instance of ishml.Phrase. This makes the .then suffix a bit of an oddball. Another oddball is .also, which we will cover in part three.

Aside from .then and .also, suffixes return instances of ishml.Phrase. This allows an unlimited number of suffixes to be chained together with dot notation to apply layer upon layer of modifications. Suffixes are applied from left to right. That is, the suffix closest to the right parenthesis or closing backtick is applied to the phrase first. Then the next closest suffix is applied to that result, and so on until the end of the chain is reached. If a phrase definition also has prefixes, the prefixes are applied first to create a phrase and the suffixes are applied to that phrase to create a new phrase.

Unlike prefixes, which take a hybrid approach, a suffix is either a method or a getter. Suffixes that are methods must take arguments and must include a pair of parentheses or backticks. Other suffixes are defined as getters, and parentheses and backticks must be omitted. The use of getters improves code readability since the phrases aren't cluttered with empty pairs of parentheses.

Listing 7 shows examples of suffixes in use. An alphabetical list of all built-in suffixes is available in the API Reference.

Listing 7

The example1 phrase demonstrates the .ing suffix, which turns a verb into a gerund. The en-us.js script extends ishml.Phrase to provide a library of many modern American-English suffixes for nouns and verbs.

When a phrase has both prefixes and suffixes attached, the prefixes are always applied first and then suffixes are applied to the result. This sometimes leads to unintended results like example2 in the listing above. Only the last verb has the ing ending applied to it because the list prefix was applied first. Then the list text was capitalized before finally adding ing to the end of the resulting text.

This behavior may be changed by nesting phrases. When phrases are nested, the inner phrase is evaluated before the outer phrase. The innermost phrase of example3 first applies .ing to each of the verbs. Once this phrase has been evaluated, the phrase surrounding it turns it into a comma-delimited list and then capitalizes the first letter of the result. There are no restrictions on how deeply a phrase may be nested.
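The difference between applying a suffix to the finished text and applying it inside a nested phrase can be imitated with plain functions. This sketch does not use ISHML: addIng is a deliberately naive stand-in that only appends "ing" with no spelling rules, and capitalize, toList, and the list of verbs are hypothetical.

```javascript
const addIng = (text) => text + "ing"; // naive: ignores spelling rules
const capitalize = (text) => text.charAt(0).toUpperCase() + text.slice(1);
// Assumes three or more items and a final "and" before the last item.
const toList = (items) => items.slice(0, -1).join(", ") + ", and " + items[items.length - 1];

const verbs = ["walk", "jump", "sing"];

// Suffix applied last: only the end of the finished text is affected.
const suffixLast = addIng(capitalize(toList(verbs)));
// Nested: each verb is modified before the list is built.
const nested = capitalize(toList(verbs.map(addIng)));

console.log(suffixLast); // prints "Walk, jump, and singing"
console.log(nested); // prints "Walking, jumping, and singing"
```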

Continue to part three to learn about tags and applying logic to phrases.