Parsing Tutorial Part Two

Prerequisites: You should have read Parsing Tutorial Part One.

Part two of the parsing tutorial covers rule cloning, choice rules, semantic analysis, and error handling.

Figure 1: Command (diagram: Subject, Verb, then either Direct Object, Preposition, Indirect Object or Indirect Object, Direct Object)

Let's look at a grammar that is more complex than the one covered in part one. It parses commands that contain a subject, verb, direct object, and indirect object. The subject, direct object, and indirect object are each optional, but if the indirect object is present, the direct object must be present as well. The direct and indirect objects may appear in either order, but if the direct object comes first, a preposition must precede the indirect object. (We'll continue to use the same lexicon.)

//Code Listing 1
var nounPhrase=ishml.Rule()

// A noun phrase is a sequence: optional article, any number of adjectives, then a noun.
nounPhrase
    .snip("article").snip("adjectives").snip("noun")

nounPhrase.article
    .configure({minimum:0, filter:(definition)=>definition.part==="article"})
nounPhrase.adjectives
    .configure(
    {
        minimum:0, maximum:Infinity,
        filter:(definition)=>definition.part==="adjective"
    })
nounPhrase.noun
    .configure({filter:(definition)=>definition.part==="noun"})

// Semantics: people (definitions with role "npc") are not referred to with an article.
nounPhrase.semantics=(interpretation)=>
{
    var {gist}=interpretation
    if (gist.article)
    {
        gist.noun.definitions=gist.noun.definitions.filter((definition)=>
        {
            return definition.role!=="npc"
        })
        // No definitions left means the branch is nonsense; discard it.
        if (gist.noun.definitions.length===0) {return false}
    }
    return true
}
/*Note (a)*/

var command=ishml.Rule().configure({entire:true})

// A command is a sequence: optional subject, verb, optional object.
command.snip("subject",nounPhrase.clone()).snip("verb").snip("object")
command.subject.configure({minimum:0})
command.verb.configure({filter:(definition)=>definition.part==="verb"})

// object is a choice rule: its sub-rules 1 and 2 are alternatives, not a sequence.
command.object.configure({minimum:0, mode:ishml.enum.mode.any})
    .snip(1)
    .snip(2)

// Choice 1: direct object, optionally followed by a preposition and an indirect object.
command.object[1].snip("directObject",nounPhrase.clone()).snip("indirectObject")
command.object[1].indirectObject.snip("preposition").snip("nounPhrase",nounPhrase.clone())
command.object[1].indirectObject
    .configure({minimum:0})
command.object[1].indirectObject.preposition
    .configure({filter:(definition)=>definition.part==="preposition"})

// Choice 2: indirect object followed by direct object, with no preposition.
command.object[2].snip("indirectObject",nounPhrase.clone()).snip("directObject",nounPhrase.clone())

// Semantics: flatten the object branch so directObject, indirectObject,
// and preposition sit directly under the root, beside subject and verb.
command.semantics=(interpretation)=>
{
    var {gist}=interpretation
    if (gist.object)
    {
        if (gist.object.indirectObject)
        {
            if (gist.object.indirectObject.preposition)
            {
                gist.preposition=gist.object.indirectObject.preposition
            }
            // Choice 1 nests the noun phrase under indirectObject.nounPhrase;
            // in choice 2, indirectObject is itself the noun phrase.
            gist.indirectObject=gist.object.indirectObject.nounPhrase || gist.object.indirectObject
        }
        // directObject is a nounPhrase clone in both choices.
        gist.directObject=gist.object.directObject
        delete gist.object
    }
    return true
}

/* Note (b) */

Rule Cloning

Refer to Code Listing 1. First, we create a nounPhrase rule that we can re-use in the subject, directObject, and indirectObject rules. You should recognize the code for the nounPhrase rule from part one.

(The version of nounPhrase shown here also defines a .semantics property, which will be explained toward the end of this tutorial.)

Our root rule is command. We want the entire input text to match the command rule with no text left over, so we set the rule's entire property to true. With this setting, we would expect input like "take" and "take the ruby" to be valid, but "take the flower" to fail because no rule matches "flower". If entire is left at its default value of false, "take the flower" would produce a successful match of "take" with a remainder of "the flower", since the grammar accepts a verb without a subject or object.
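The effect of entire can be sketched with a toy matcher in plain JavaScript. This is not how ishml works internally; verbs, articles, nouns, and matchCommand are hypothetical stand-ins for a lexicon and grammar. They exist only to show that leftover text is reported as a remainder unless entire is true, in which case the whole match fails.

```javascript
// Hypothetical toy matcher for illustration only; it is not the ishml parser.
// Grammar: a verb, optionally followed by [article] noun.
const verbs = new Set(["take"]);
const articles = new Set(["the"]);
const nouns = new Set(["ruby", "slipper"]);

function matchCommand(text, { entire = false } = {}) {
  const tokens = text.split(/\s+/).filter(Boolean);
  if (!verbs.has(tokens[0])) return { success: false };
  let consumed = 1;
  // Try the optional noun phrase; keep it only if a noun actually matches.
  let next = consumed;
  if (articles.has(tokens[next])) next++;
  if (nouns.has(tokens[next])) consumed = next + 1;
  const remainder = tokens.slice(consumed).join(" ");
  // entire: any leftover text turns the match into a failure.
  if (entire && remainder !== "") return { success: false };
  return { success: true, remainder };
}

console.log(matchCommand("take the ruby", { entire: true }));   // { success: true, remainder: '' }
console.log(matchCommand("take the flower", { entire: true })); // { success: false }
console.log(matchCommand("take the flower"));                   // { success: true, remainder: 'the flower' }
```

Without entire, the toy grammar happily stops after "take", mirroring the behavior described above.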

Next we use .snip() to create the sub-rules of command: subject, verb, and object. The verb rule is defined just as it was in part one. However, the subject rule is created by passing nounPhrase.clone() as the second argument of .snip(). This creates a deep copy of nounPhrase as the subject rule. Clones of nounPhrase are likewise used for the directObject and indirectObject rules (in the first choice, as the nounPhrase sub-rule of indirectObject). In every case, the clones are entirely new instances of ishml.Rule, identical to the original nounPhrase rule at the moment .clone() is called. Any subsequent changes to the original nounPhrase rule will not affect the cloned rules. Conversely, changes to the cloned rules will not affect the original rule.
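The difference between cloning a rule and reusing a reference to it can be seen with a hypothetical ToyRule class. It is not ishml.Rule; it only assumes that a rule holds an options object and named sub-rules, and that a deep copy must clone every sub-rule recursively.

```javascript
// Hypothetical toy class for illustration only; ishml.Rule is not used here.
class ToyRule {
  constructor(options = {}) {
    this.options = { ...options };
    this.children = {};
  }
  snip(name, rule = new ToyRule()) {
    this.children[name] = rule;
    return this; // allow chaining, as in .snip("article").snip("noun")
  }
  // Deep copy: the options object is copied and every sub-rule is cloned
  // recursively. Function-valued options (such as filters) are copied by reference.
  clone() {
    const copy = new ToyRule(this.options);
    for (const [name, child] of Object.entries(this.children)) {
      copy.children[name] = child.clone();
    }
    return copy;
  }
}

const nounPhrase = new ToyRule().snip("article").snip("noun");
nounPhrase.children.article.options.minimum = 0;

const subject = nounPhrase.clone();
subject.children.article.options.minimum = 1; // change the clone only

console.log(nounPhrase.children.article.options.minimum); // 0
console.log(subject.children.article.options.minimum);    // 1

// Without clone(), both names refer to the same object:
const alias = nounPhrase;
alias.children.noun.options.maximum = 2;
console.log(nounPhrase.children.noun.options.maximum);    // 2
console.log(subject.children.noun.options.maximum);       // undefined (the clone is unaffected)
```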

(It should be noted that it is sometimes useful to recursively define a rule by referencing itself. In that case .clone() is not used. This advanced use of rules will be covered in part three of this tutorial.)

Choice Rules

By default all sub-rules created with .snip() are treated as a sequence of snippets when parsed. In the resulting syntax tree, all the snippets are child nodes of the parent node. However, we can change the mode that the parent rule operates under so that it treats the sub-rules as alternatives to choose among instead of a sequence.

The object rule is configured with the mode option set to ishml.enum.mode.any, which means that the sub-rules of object are treated as alternative choices. Next, we .snip() two sub-rules, which we call 1 and 2. By convention, we list each .snip() on a separate line to visually cue that they are choice sub-rules, not sequence sub-rules.

The names we give the choice sub-rules are unimportant, because only the child nodes produced by the chosen alternative, not the choice node itself, are attached to the syntax tree. Therefore, the syntax tree may contain command.object.indirectObject, but will never contain command.object[1].indirectObject or command.object[2].indirectObject.
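As a rough mental model, attaching the result of a choice rule can be sketched with plain objects. The attachChoice function and the node shapes below are hypothetical and only illustrate, for a single interpretation, that the alternative's label is discarded while its child nodes are kept.

```javascript
// Hypothetical toy model (not the ishml implementation) of how one
// interpretation's choice result is attached to the syntax tree.
function attachChoice(parentGist, ruleName, chosen) {
  // chosen.name is the label of the alternative that matched (1 or 2);
  // chosen.nodes holds the child nodes that alternative produced.
  // Only the child nodes are attached; the label itself is dropped.
  parentGist[ruleName] = { ...chosen.nodes };
  return parentGist;
}

const gist = { verb: { lexeme: "take" } };
attachChoice(gist, "object", {
  name: 2,
  nodes: {
    indirectObject: { noun: { lexeme: "ruby" } },
    directObject: { noun: { lexeme: "slipper" } },
  },
});
console.log(Object.keys(gist.object)); // [ 'indirectObject', 'directObject' ]
```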

The ishml.enum.mode.any mode allows the parser to generate alternative interpretations from ambiguous input text. For example, "take ruby slipper" can be interpreted as "take the slipper to Ruby" or as "take the red slipper." To stop considering the remaining choices once the first alternative succeeds, use ishml.enum.mode.apt instead and create the choice sub-rules in priority order.
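The contrast between the two modes can be sketched with a small stand-alone function. The choose function, the string modes "any" and "apt", and the two reading functions are hypothetical; they echo the names ishml.enum.mode.any and ishml.enum.mode.apt but are not ishml code.

```javascript
// Hypothetical sketch contrasting the two choice modes; not ishml code.
function choose(alternatives, input, mode) {
  const results = [];
  for (const alternative of alternatives) {
    const result = alternative(input);
    if (result !== null) {
      results.push(result);
      // "apt": stop after the first successful alternative, so the order
      // in which the alternatives are listed sets their priority.
      if (mode === "apt") break;
    }
  }
  // "any": every successful alternative becomes a separate interpretation.
  return results;
}

// Two toy readings of the tokens ["ruby", "slipper"].
const matchesRubySlipper = ([first, second]) => first === "ruby" && second === "slipper";
const adjectiveReading = (tokens) =>
  matchesRubySlipper(tokens) ? { directObject: "ruby slipper" } : null;
const indirectReading = (tokens) =>
  matchesRubySlipper(tokens) ? { indirectObject: "Ruby", directObject: "slipper" } : null;

const tokens = ["ruby", "slipper"];
console.log(choose([adjectiveReading, indirectReading], tokens, "any").length); // 2
console.log(choose([adjectiveReading, indirectReading], tokens, "apt").length); // 1
```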

Semantics

So far, we have only discussed rules as defining a syntax tree. The rules validate the input text as having the correct phrase structure, but syntax alone cannot determine whether the input is meaningful or merely well-structured nonsense.

To discard nonsensical interpretations, we assign a function to a rule's .semantics property that returns true if the interpretation is meaningful or false if it is nonsensical. After a rule's syntactic processing is complete, the .semantics function is called with the branch of the syntax tree currently under consideration, an instance of ishml.Interpretation, as its argument. This allows us to compare nodes in the branch and render judgements about the interpretation. It is also possible to alter the structure of the branch by editing the interpretation's .gist property.
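Conceptually, a semantics function acts as a filter over the candidate interpretations produced by syntax. The applySemantics helper and the candidate objects below are hypothetical plain-JavaScript stand-ins, not the ishml parser's internals.

```javascript
// Hypothetical sketch (not the ishml parser): after syntax has produced
// candidate interpretations, a semantics function acts as a filter.
function applySemantics(interpretations, semantics) {
  // Keep only interpretations the semantics function judges meaningful.
  // The function may also edit interpretation.gist in place.
  return interpretations.filter((interpretation) => semantics(interpretation));
}

const candidates = [
  { gist: { verb: "take", directObject: "slipper" }, remainder: "" },
  { gist: { verb: "take", directObject: "sky" }, remainder: "" },
];

// Toy judgement: the sky cannot be taken.
const kept = applySemantics(candidates, ({ gist }) => gist.directObject !== "sky");
console.log(kept.length); // 1
```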

Study the definition of nounPhrase.semantics in Code Listing 1. If the article node is present in the branch, we throw away all definitions in the noun node whose .role property is "npc", since people are not referred to with an article. If no definitions remain for the noun after this cleanup, we discard the branch as nonsense and return false.
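The same logic can be run on hand-built plain objects to see its effect. The definitions below (a "thing" role for the gem and an "npc" role for the person) are hypothetical; in a real grammar they come from the lexicon, and the interpretation would be an ishml.Interpretation.

```javascript
// Plain-object re-run of the nounPhrase.semantics logic from Code Listing 1.
// The definitions below are hypothetical; the real ones come from the lexicon.
function nounPhraseSemantics(interpretation) {
  const { gist } = interpretation;
  if (gist.article) {
    // People (role "npc") are not referred to with an article.
    gist.noun.definitions = gist.noun.definitions.filter(
      (definition) => definition.role !== "npc"
    );
    if (gist.noun.definitions.length === 0) return false; // nonsense
  }
  return true;
}

const gem = { part: "noun", role: "thing" };
const person = { part: "noun", role: "npc" };

// "the ruby": the npc reading is dropped, the gem reading survives.
const theRuby = { gist: { article: {}, noun: { definitions: [gem, person] } } };
console.log(nounPhraseSemantics(theRuby), theRuby.gist.noun.definitions.length); // true 1

// An article before a noun whose only definition is a person: discarded.
const onlyPerson = { gist: { article: {}, noun: { definitions: [person] } } };
console.log(nounPhraseSemantics(onlyPerson)); // false

// "ruby" with no article: nothing is filtered.
const bare = { gist: { noun: { definitions: [gem, person] } } };
console.log(nounPhraseSemantics(bare), bare.gist.noun.definitions.length); // true 2
```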

The command rule also defines .semantics. In this case we're not checking for nonsense. Instead, we're flattening the tree structure to make it easier to work with and more consistent. Without the semantics function, the noun of the subject is easily accessed: .gist.subject.noun. However, the noun of the direct object sits one level deeper, at .gist.object.directObject.noun, and the noun of an indirect object introduced by a preposition is deeper still: .gist.object.indirectObject.nounPhrase.noun. The semantics function for command moves the directObject, indirectObject, and preposition nodes directly below the root node, making them siblings of subject. The function always returns true because it makes no judgements about the meaning of the interpretation; it only restructures it. You may argue that we're stretching the meaning of the term semantics past its limits, and you might be right, but in the context of the ISHML API, semantics encompasses any processing that is done after syntactic analysis.
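To see the before and after shapes, here is a stand-alone version of the flattening logic applied to hand-built plain objects. The node shapes (for example the lexeme property) are hypothetical simplifications of what the parser actually produces.

```javascript
// Stand-alone version of the flattening logic from command.semantics, run on
// hand-built plain objects instead of a real ishml.Interpretation.
function flatten(interpretation) {
  const { gist } = interpretation;
  if (gist.object) {
    const { directObject, indirectObject } = gist.object;
    if (indirectObject) {
      if (indirectObject.preposition) gist.preposition = indirectObject.preposition;
      // Choice 1 wraps the noun phrase in indirectObject.nounPhrase;
      // in choice 2, indirectObject is itself the noun phrase.
      gist.indirectObject = indirectObject.nounPhrase || indirectObject;
    }
    gist.directObject = directObject;
    delete gist.object;
  }
  return true; // restructures only; never rejects
}

// Choice 1: "take slipper to Ruby"
const choice1 = {
  gist: {
    verb: { lexeme: "take" },
    object: {
      directObject: { noun: { lexeme: "slipper" } },
      indirectObject: {
        preposition: { lexeme: "to" },
        nounPhrase: { noun: { lexeme: "Ruby" } },
      },
    },
  },
};
flatten(choice1);
console.log(choice1.gist.indirectObject.noun.lexeme); // Ruby

// Choice 2: "take ruby slipper" read as indirect object + direct object
const choice2 = {
  gist: {
    verb: { lexeme: "take" },
    object: {
      indirectObject: { noun: { lexeme: "ruby" } },
      directObject: { noun: { lexeme: "slipper" } },
    },
  },
};
flatten(choice2);
console.log(choice2.gist.indirectObject.noun.lexeme); // ruby
```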

//Code Listing 2    

//Add code below after note (a) in listing 1
nounPhrase.mismatch=(interpretation)=>
{
    interpretation.gist.error=true
    interpretation.gist.errorMessage=
        `Expected end of nounPhrase. Found: "${interpretation.remainder}".`
    interpretation.valid=false
    return interpretation
}
nounPhrase.noun.mismatch=(interpretation)=>
{
    interpretation.gist.error=true
    interpretation.gist.errorMessage=
        `Expected noun. Found: "${interpretation.remainder}".`
    interpretation.valid=false
    return interpretation
}

//Add code below after note (b) in listing 1

command.mismatch=(interpretation)=>
{
    interpretation.gist.error=true
    interpretation.gist.errorMessage=
        `Expected end of command. Found: "${interpretation.remainder}".`
    interpretation.valid=false
    return interpretation
}
command.verb.mismatch=(interpretation)=>
{
    interpretation.gist.error=true
    interpretation.gist.errorMessage=
        `Expected verb. Found: "${interpretation.remainder}".`
    interpretation.valid=false
    return interpretation
}

Error Handling

If an invalid command is parsed using the command rule in Code Listing 1, the parser returns {success: false}, since the command rule is set to match the entire input. This can make it difficult to troubleshoot your grammar or to respond gracefully to invalid input. To improve error handling, we assign a custom function to a rule's .mismatch property. Study Code Listing 2. Each .mismatch function receives an ishml.Interpretation object containing the gist and the remainder for the branch of the syntax tree that the rule is attempting to generate. The only requirement is that the function return an interpretation. A .mismatch function generally sets the interpretation's .valid property to false to indicate that the interpretation is invalid. The function may also manipulate the interpretation arbitrarily to provide additional feedback for the application to use. In this example, .error and .errorMessage properties have been added to the interpretation's .gist property. This results in nested errors that mirror the rule's syntax tree.
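The four handlers in Code Listing 2 differ only in what they expect. One way to reduce the repetition is a small factory function; makeMismatch is a hypothetical helper, not part of ishml, and the usage lines in the comments assume the rule objects from the listings.

```javascript
// Hypothetical helper (not part of ishml) that builds .mismatch handlers like
// the ones in Code Listing 2, so each rule supplies only its own label.
function makeMismatch(expected) {
  return (interpretation) => {
    interpretation.gist.error = true;
    interpretation.gist.errorMessage =
      `Expected ${expected}. Found: "${interpretation.remainder}".`;
    interpretation.valid = false;
    return interpretation; // a mismatch handler must return an interpretation
  };
}

// Assumed usage with the rules from Code Listings 1 and 2:
//   command.verb.mismatch = makeMismatch("verb")
//   command.mismatch = makeMismatch("end of command")

// Stand-alone check with a plain object in place of an ishml.Interpretation:
const result = makeMismatch("verb")({ gist: {}, remainder: "flower" });
console.log(result.gist.errorMessage); // Expected verb. Found: "flower".
```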

//Code Listing 3
// lexicon is the same lexicon used in part one
var parser = ishml.Parser({ lexicon: lexicon, grammar: command })

var example1 = parser.analyze("take ruby slipper")
var example2 = parser.analyze("take ruby slipper to Ruby")
var example3 = parser.analyze("take the flower")

Open your browser's developer tools, refer to Code Listing 3, and try the following exercises:

  1. Enter console.log(example1) in the console. Did example1 parse successfully? How many interpretations were returned?
  2. Enter console.log(example2) in the console. Did example2 parse successfully? How many interpretations were returned?
  3. Enter console.log(example3) in the console. Did example3 parse successfully? Why or why not?

Answers:

  1. Yes, the value of example1.success is true. Two interpretations were returned. The first interpretation treats ruby as an adjective modifying the noun slipper. The second interpretation treats ruby as a noun functioning as an indirect object.
  2. Yes, the value of example2.success is true. One interpretation was returned. The interpretation is identical to the second interpretation in example 1.
  3. No, the value of example3.success is false. Examining example3.interpretations[0].gist and example3.interpretations[1].gist provides additional insight into why the input could not be validated.

Continue to part three to learn about recursive rules.