Parsing Tutorial Part Two

Prerequisites: You should have read Parsing Tutorial Part One.

Part two of the parsing tutorial covers rule cloning, choice rules, and semantic analysis.

Figure 1: Command. [Syntax diagram: Subject, Verb, then either Direct Object, Preposition, Indirect Object or Indirect Object, Direct Object.]

Let's look at a grammar more complex than the one covered in part one. This grammar parses commands that contain a subject, verb, direct object, and indirect object. Subject, direct object, and indirect object are each optional, but if the indirect object is present, the direct object must be present as well. Direct and indirect objects may appear in either order, but if the direct object comes first, a preposition must precede the indirect object. (We'll continue to use the same lexicon.)

//Code Listing 1
var nounPhrase=ISHML.Rule()

nounPhrase
    .snip("article").snip("adjectives").snip("noun")

nounPhrase.article
    .configure({minimum:0, filter:(definition)=>definition.part==="article" })
nounPhrase.adjectives
    .configure(
    { minimum:0, maximum:Infinity,
            filter:(definition)=>definition.part==="adjective"
    })

nounPhrase.noun.configure({filter:(definition)=>definition.part==="noun" })

nounPhrase.semantics=(interpretation)=>
{
    var {gist, remainder}=interpretation
    if (gist.article)
    {
        gist.noun.definitions=gist.noun.definitions.filter((definition)=>
        {
            return !(definition.role==="npc")
        })
        if(gist.noun.definitions.length===0){return false}
    }
    return true
}

var command=ISHML.Rule()

command.snip("subject",nounPhrase.clone()).snip("verb").snip("object")
command.subject.configure({minimum:0})
command.verb.configure({filter:(definition)=>definition.part==="verb"})
command.object.configure({minimum:0,mode:ISHML.enum.mode.any})
    .snip(1)
    .snip(2)

command.object[1].snip("directObject",nounPhrase.clone()).snip("indirectObject")
command.object[1].indirectObject.snip("preposition").snip("nounPhrase",nounPhrase.clone())
command.object[1].indirectObject
    .configure({minimum:0})
command.object[1].indirectObject.preposition
    .configure({filter:(definition)=>definition.part==="preposition"})

command.object[2].snip("indirectObject",nounPhrase.clone()).snip("directObject",nounPhrase.clone())

command.semantics=(interpretation)=>
{
    var {gist}=interpretation
    if (gist.object)
    {
        //Hoist the indirect object (and its preposition, if any) to the root.
        if(gist.object.indirectObject)
        {
            if(gist.object.indirectObject.preposition)
            {
                gist.preposition=gist.object.indirectObject.preposition
            }
            gist.indirectObject=gist.object.indirectObject.nounPhrase || gist.object.indirectObject
        }
        //Hoist the direct object to the root, then drop the emptied object node.
        gist.directObject=gist.object.directObject.nounPhrase || gist.object.directObject
        delete gist.object
    }
    return true
}
            

Rule Cloning

Refer to Code Listing 1. First, we create a nounPhrase rule that we can re-use in the subject, directObject, and indirectObject rules. You should recognize the code for the nounPhrase rule from part one.

(The version of nounPhrase shown here also defines a .semantics property, which will be explained toward the end of this tutorial.)

Now that we have a nounPhrase rule, let's put it to work in our root rule, command. We use .snip() to create three sub-rules: subject, verb, and object. The verb rule is defined just as it was in part one. However, the subject rule is created by passing in nounPhrase.clone() as the second argument of .snip(). This creates a deep copy of nounPhrase as the subject rule. Similarly, nounPhrase appears as a sub-rule of indirectObject and directObject. In each case, the sub-rule is an entirely new instance of ISHML.Rule because of the use of .clone(). The clones are identical copies of the original nounPhrase rule. Any subsequent changes to the original nounPhrase rule will not affect the cloned rules. Conversely, changes to the cloned rules will not affect the original rule.
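
The independence of a clone from its original can be illustrated without the ISHML library at all. The sketch below uses a hypothetical plain-object stand-in for a rule (makeRule and cloneRule are made-up helpers, not part of the ISHML API) and only demonstrates what a deep copy means: edits on one side are not visible on the other.

```javascript
// Hypothetical stand-in for a rule: flat options plus named sub-rules.
// This is NOT ISHML.Rule; it only illustrates deep-copy behavior.
function makeRule(options = {})
{
    return {options: {...options}, subRules: {}}
}

// Recursively copy a rule and every sub-rule beneath it.
function cloneRule(rule)
{
    var copy = makeRule(rule.options)
    for (var [name, subRule] of Object.entries(rule.subRules))
    {
        copy.subRules[name] = cloneRule(subRule)
    }
    return copy
}

var phrase = makeRule()
phrase.subRules.article = makeRule({minimum: 0})
phrase.subRules.noun = makeRule({minimum: 1})

var subject = cloneRule(phrase)

// Changing the clone leaves the original untouched...
subject.subRules.article.options.minimum = 1
// ...and changing the original leaves the clone untouched.
phrase.subRules.noun.options.minimum = 0

console.log(phrase.subRules.article.options.minimum)  // 0
console.log(subject.subRules.article.options.minimum) // 1
console.log(subject.subRules.noun.options.minimum)    // 1
```

Had subject simply been assigned the same object as phrase, both edits would have shown up on both sides, which is the situation .clone() is there to avoid.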

(It is sometimes useful to define a rule recursively by having it reference itself. In that case .clone() is not used. This advanced use of rules will be covered in part three of this tutorial.)

Choice Rules

By default all sub-rules created with .snip() are treated as a sequence of snippets when parsed. In the resulting syntax tree, all the snippets are child nodes of the parent node. However, we can change the mode that the parent rule operates under so that it treats the sub-rules as alternatives to choose among instead of a sequence.

The object rule is configured with the mode option set to ISHML.enum.mode.any, which means that the sub-rules of object are treated as alternative choices. Next, we .snip() two sub-rules, which we call 1 and 2. By convention, we list each .snip() on a separate line to visually cue that they are choice sub-rules, not sequence sub-rules.

The names we give the choice sub-rules are unimportant, because only the child nodes resulting from one of these choices, not the choice node itself, are attached to the syntax tree. Therefore, the syntax tree may contain command.object.indirectObject, but will never contain command.object[1].indirectObject or command.object[2].indirectObject.

The ISHML.enum.mode.any mode for rules allows the parser to generate alternative interpretations when provided with ambiguous input text. For example, "take ruby slipper" can be interpreted as "take the slipper to Ruby," or as "take the red slipper." To stop considering choice rules after the first alternative is generated, use ISHML.enum.mode.apt and create the choice sub-rules in priority order.
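
To make the difference between the two modes concrete, here is a toy sketch in plain JavaScript. It does not use ISHML; chooseAny, chooseApt, and the two alternative functions are hypothetical helpers that only mimic the behavior described above: one collects every alternative that matches, the other stops at the first match in priority order.

```javascript
// Toy stand-ins for the two choice sub-rules of object.
// Each returns an interpretation, or null when it does not match.
var asDirectObjectOnly = (tokens) => tokens.length === 2
    ? {directObject: {adjective: tokens[0], noun: tokens[1]}}
    : null

var asIndirectThenDirect = (tokens) => tokens.length === 2
    ? {indirectObject: {noun: tokens[0]}, directObject: {noun: tokens[1]}}
    : null

// Collect every alternative that matches (the behavior described for "any").
function chooseAny(choices, tokens)
{
    return choices
        .map((choice) => choice(tokens))
        .filter((result) => result !== null)
}

// Stop at the first alternative that matches (the behavior described for "apt").
function chooseApt(choices, tokens)
{
    for (var choice of choices)
    {
        var result = choice(tokens)
        if (result !== null) { return [result] }
    }
    return []
}

var choices = [asDirectObjectOnly, asIndirectThenDirect]
var tokens = ["ruby", "slipper"]

var anyResults = chooseAny(choices, tokens)
var aptResults = chooseApt(choices, tokens)

console.log(anyResults.length) // 2
console.log(aptResults.length) // 1
console.log(aptResults[0].directObject.adjective) // ruby
```

With the priority order used here, the first-match strategy settles on "the red slipper" reading and never produces the "to Ruby" reading, which is why the order of the choice sub-rules matters in that mode.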

Semantics

So far, we have only discussed rules as defining a syntax tree. The rules validate the input text as having the correct phrase structure, but syntax alone cannot determine whether the input is meaningful or simply well-structured nonsense.

To discard interpretations that are nonsensical, we define a rule's .semantics property with a function that returns true for meaningful or false for nonsensical. After a rule's syntactic processing is complete, the function defined by the .semantics property is called with the branch of the syntax tree currently under consideration, an instance of ISHML.Interpretation, passed as the argument. This allows us to compare nodes in the branch and render judgements about the interpretation. It is also possible to alter the structure of the branch by editing the interpretation's .gist property.

Study the definition for nounPhrase.semantics in Code Listing 1. If the article node is present in the branch, then we want to throw away all definitions in the noun node where the definition's .role property is set to "npc", since people are not referred to with an article. If, after cleaning up the definitions, there are no definitions left for the noun, then we discard the branch as nonsense and return false.
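
The effect of this check is easier to see when the function is run on its own. The sketch below copies the logic of nounPhrase.semantics into a standalone function and feeds it hand-built objects in place of real ISHML.Interpretation instances. The lexeme property, the "thing" role, and the sample words are assumptions made up for illustration; only the .gist, .definitions, and .role names come from Code Listing 1.

```javascript
// Same logic as nounPhrase.semantics in Code Listing 1, as a standalone function.
var nounPhraseSemantics = (interpretation) =>
{
    var {gist} = interpretation
    if (gist.article)
    {
        // An article rules out any reading of the noun as a person.
        gist.noun.definitions = gist.noun.definitions
            .filter((definition) => definition.role !== "npc")
        if (gist.noun.definitions.length === 0) { return false }
    }
    return true
}

// Hand-built stand-ins for interpretations (hypothetical data).
var theRuby =
{
    gist:
    {
        article: {lexeme: "the"},
        noun: {lexeme: "ruby", definitions: [{part: "noun", role: "npc"}, {part: "noun", role: "thing"}]}
    }
}
var theDorothy =
{
    gist:
    {
        article: {lexeme: "the"},
        noun: {lexeme: "dorothy", definitions: [{part: "noun", role: "npc"}]}
    }
}
var dorothy =
{
    gist:
    {
        noun: {lexeme: "dorothy", definitions: [{part: "noun", role: "npc"}]}
    }
}

console.log(nounPhraseSemantics(theRuby))          // true
console.log(theRuby.gist.noun.definitions.length)  // 1
console.log(nounPhraseSemantics(theDorothy))       // false
console.log(nounPhraseSemantics(dorothy))          // true
```

Note that the function mutates the gist it is given: after the call, "the ruby" keeps only its non-person definition, while the article-free "dorothy" is left untouched.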

The command rule also has .semantics defined. In this case we're not checking for nonsense. Instead, we're flattening out the tree structure to make it easier to work with and more consistent. Without the semantics function, the noun of the subject is pretty easily accessed: .gist.subject.noun. However, the noun of the direct object is not: .gist.object.directObject.nounPhrase.noun. The semantics function for command moves the directObject and indirectObject nodes to directly below the root node, where they become siblings of subject. The function always returns true because it makes no judgements about the meaning of the interpretation; it only restructures it. You may argue that we're stretching the meaning of the term semantics past its limits and you might be right, but in the context of the ISHML API, semantics encompasses any sort of processing that is done after syntactic analysis.
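
The flattening can be tried out on hand-built data. The sketch below restates the logic of command.semantics as a standalone function (with the indirect object check written once) and applies it to two mock gists, one for each ordering of the objects. The mock objects and their lexeme property are assumptions for illustration and are not real ISHML.Interpretation instances.

```javascript
// Same logic as command.semantics in Code Listing 1, as a standalone function.
var commandSemantics = (interpretation) =>
{
    var {gist} = interpretation
    if (gist.object)
    {
        if (gist.object.indirectObject)
        {
            if (gist.object.indirectObject.preposition)
            {
                gist.preposition = gist.object.indirectObject.preposition
            }
            gist.indirectObject = gist.object.indirectObject.nounPhrase || gist.object.indirectObject
        }
        gist.directObject = gist.object.directObject.nounPhrase || gist.object.directObject
        delete gist.object
    }
    return true
}

// Mock gist for "take slipper to ruby": direct object first, with preposition.
var directFirst =
{
    gist:
    {
        verb: {lexeme: "take"},
        object:
        {
            directObject: {noun: {lexeme: "slipper"}},
            indirectObject: {preposition: {lexeme: "to"}, nounPhrase: {noun: {lexeme: "ruby"}}}
        }
    }
}

// Mock gist for "take ruby slipper": indirect object first, no preposition.
var indirectFirst =
{
    gist:
    {
        verb: {lexeme: "take"},
        object:
        {
            indirectObject: {noun: {lexeme: "ruby"}},
            directObject: {noun: {lexeme: "slipper"}}
        }
    }
}

commandSemantics(directFirst)
commandSemantics(indirectFirst)

console.log(directFirst.gist.directObject.noun.lexeme)     // slipper
console.log(directFirst.gist.indirectObject.noun.lexeme)   // ruby
console.log(directFirst.gist.preposition.lexeme)           // to
console.log(indirectFirst.gist.indirectObject.noun.lexeme) // ruby
console.log("object" in directFirst.gist)                  // false
```

After the call, both orderings expose the same paths (.gist.directObject.noun and .gist.indirectObject.noun), so later code does not need to know which choice sub-rule matched.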

Continue to part three to learn about recursive rules.