Prerequisites: You should have read Parsing Tutorial Part One.
Part two of the parsing tutorial covers rule cloning, choice rules, and semantic analysis.
Let's look at a grammar more complex than the one covered in part one. It parses commands that contain a subject, verb, direct object, and indirect object. Subject, direct object, and indirect object are each optional, but if the indirect object is present, the direct object must be present as well. Direct and indirect objects may appear in either order, but if the direct object is first, a preposition must precede the indirect object. (We'll continue to use the same lexicon.)
//Code Listing 1
var nounPhrase=ishml.Rule()
nounPhrase
    .snip("article").snip("adjectives").snip("noun")
nounPhrase.article
    .configure({minimum:0, filter:(definition)=>definition.part==="article" })
nounPhrase.adjectives
    .configure(
    {
        minimum:0, maximum:Infinity,
        filter:(definition)=>definition.part==="adjective"
    })
nounPhrase.noun.configure({filter:(definition)=>definition.part==="noun" })
nounPhrase.semantics=(interpretation)=>
{
    var {gist, remainder}=interpretation
    //Discard an article followed by a person: people are not referred to with an article.
    if (gist.article && gist.noun.definition.role==="npc")
    {
        return false
    }
    //Every adjective must refer to the noun of this noun phrase.
    if (gist.adjectives)
    {
        if (gist.adjectives.every(token=>token.definition.key===gist.noun.definition.key))
        {
            return true
        }
        else
        {
            return false
        }
    }
    return true
}
/*Note (a)*/
var command=ishml.Rule().configure({entire:true})
command.snip("subject",nounPhrase.clone()).snip("verb").snip("object")
command.subject.configure({minimum:0})
command.verb.configure({filter:(definition)=>definition.part==="verb"})
command.object.configure({minimum:0,mode:ishml.enum.mode.any})
    .snip(1)
    .snip(2)
//Choice 1: direct object, optionally followed by preposition and indirect object
command.object[1].snip("directObject",nounPhrase.clone()).snip("indirectObject")
command.object[1].indirectObject.snip("preposition").snip("nounPhrase",nounPhrase.clone())
command.object[1].indirectObject
    .configure({minimum:0})
command.object[1].indirectObject.preposition
    .configure({filter:(definition)=>definition.part==="preposition"})
//Choice 2: indirect object followed by direct object
command.object[2].snip("indirectObject",nounPhrase.clone()).snip("directObject",nounPhrase.clone())
command.semantics=(interpretation)=>
{
    var {gist}=interpretation
    if (gist.object)
    {
        if (gist.object.indirectObject)
        {
            if (gist.object.indirectObject.preposition)
            {
                gist.preposition=gist.object.indirectObject.preposition
            }
            gist.indirectObject=gist.object.indirectObject.nounPhrase || gist.object.indirectObject
        }
        gist.directObject=gist.object.directObject.nounPhrase || gist.object.directObject
        delete gist.object
    }
    return true
}
/* Note (b) */
Refer to Code Listing 1. First, we create a nounPhrase rule that we can re-use in the subject, directObject, and indirectObject rules. You should recognize the code for the nounPhrase rule from part one.
(The version of nounPhrase shown here also defines a .semantics property, which will be explained toward the end of this tutorial.)
Our root rule is command. We want the entire input text to match against the command rule with no remaining text left over, so we set the entire property of the rule to true. With this setting, we would expect input like "take" and "take the ruby" to be valid, but "take the flower" to fail because there is no rule that matches "flower". If entire is left at its default value, false, "take the flower" would produce a successful match of "take" and a remainder of "the flower", since the grammar accepts a verb without a subject or object.
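The effect of entire can be pictured with a toy matcher over plain strings. This is only an analogy under simplifying assumptions: matchVerb, knownVerbs, and the shape of the returned object are hypothetical and are not part of ISHML, and the toy recognizes nothing beyond a leading verb.

```javascript
// Toy analogy only. matchVerb is a hypothetical helper, not an ISHML function.
// It recognizes a leading verb and reports the words left over.
const knownVerbs = new Set(["take", "drop"])

function matchVerb(text, entire)
{
    const words = text.trim().split(/\s+/)
    if (!knownVerbs.has(words[0]))
    {
        return {success: false, remainder: text}
    }
    const remainder = words.slice(1).join(" ")
    // When entire is true, leftover text turns the match into a failure.
    if (entire && remainder !== "")
    {
        return {success: false, remainder}
    }
    return {success: true, remainder}
}

const partial = matchVerb("take the flower", false)
const strict = matchVerb("take the flower", true)
console.log(partial) // the verb matches and "the flower" is left over
console.log(strict)  // the leftover text makes the match fail
```

The same input succeeds or fails depending only on whether leftover text is tolerated, which is the distinction the entire setting controls.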
Next we create sub-rules for command. We use .snip() to create the sub-rules subject, verb, and object. The verb rule is defined just as it was in part one. However, the subject rule is created by passing in nounPhrase.clone() as the second argument of .snip(). This creates a deep copy of nounPhrase as the subject rule. Similarly, nounPhrase appears as a sub-rule of indirectObject and directObject. In each case, the result is an entirely new instance of ishml.Rule because of the use of .clone(). The clones are identical copies of the original nounPhrase rule. Any subsequent changes to the original nounPhrase rule will not affect the cloned rules. Conversely, changes to the cloned rules will not affect the original rule.
(It should be noted that it is sometimes useful to define a rule recursively by having it reference itself. In that case .clone() is not used. This advanced use of rules will be covered in part three of this tutorial.)
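The independence of a deep copy from its original can be illustrated with plain objects. The sketch below is an analogy, not ISHML's implementation of .clone(): deepCopy is a hypothetical helper, and the objects merely stand in for rules.

```javascript
// Analogy only: plain objects stand in for rules.
// deepCopy is a hypothetical helper, not part of ISHML.
function deepCopy(value)
{
    if (value === null || typeof value !== "object")
    {
        return value
    }
    const copy = Array.isArray(value) ? [] : {}
    for (const key of Object.keys(value))
    {
        copy[key] = deepCopy(value[key])
    }
    return copy
}

const original = {article: {minimum: 0}, noun: {minimum: 1}}
const shared = original            // a second name for the same object
const cloned = deepCopy(original)  // an independent copy

original.article.minimum = 1       // change the original after copying

console.log(shared.article.minimum) // the alias sees the change
console.log(cloned.article.minimum) // the deep copy keeps the old value
```

Passing a rule without cloning it behaves like shared here, while a clone behaves like cloned.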
By default, all sub-rules created with .snip() are treated as a sequence of snippets when parsed. In the resulting syntax tree, all the snippets are child nodes of the parent node. However, we can change the mode that the parent rule operates under so that it treats the sub-rules as alternatives to choose among instead of a sequence.
The object rule is configured with the mode option set to ishml.enum.mode.any, which means that the sub-rules of object are treated as alternative choices. Next, we .snip() two sub-rules, which we call 1 and 2. By convention, we list each .snip() on a separate line to visually cue that they are choice sub-rules, not sequence sub-rules.
The names we give the choice sub-rules are unimportant, because only the resulting child nodes from one of these choices, not the choice node itself, are attached to the syntax tree. Therefore, the syntax tree may contain command.object.indirectObject, but will never contain command.object[1].indirectObject or command.object[2].indirectObject.
The ishml.enum.mode.any mode for rules allows the parser to generate alternative interpretations when provided with ambiguous input text. For example, "take ruby slipper" can be interpreted as "take the slipper to Ruby," or as "take the red slipper." To stop considering choice rules after the first alternative is generated, use ishml.enum.mode.apt and create the choice sub-rules in priority order.
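The difference between collecting every alternative and stopping at the first one can be sketched with a toy model. Nothing here is ISHML code: chooseAny, chooseApt, and the alternatives array are hypothetical, and each alternative is reduced to a function that returns a paraphrase or null.

```javascript
// Toy model only. Each alternative returns a reading of the text, or null.
// The alternatives are listed in priority order.
const alternatives =
[
    (text) => text === "take ruby slipper" ? "take the red slipper" : null,
    (text) => text === "take ruby slipper" ? "take the slipper to Ruby" : null
]

// Like the "any" mode described above: try every alternative, keep all that succeed.
function chooseAny(text)
{
    return alternatives
        .map(alternative => alternative(text))
        .filter(reading => reading !== null)
}

// Like the "apt" mode described above: stop at the first alternative that succeeds.
function chooseApt(text)
{
    for (const alternative of alternatives)
    {
        const reading = alternative(text)
        if (reading !== null)
        {
            return [reading]
        }
    }
    return []
}

console.log(chooseAny("take ruby slipper")) // both readings
console.log(chooseApt("take ruby slipper")) // only the highest-priority reading
```

With the first-match strategy, the order of the alternatives decides which reading survives, which is why the choice sub-rules should be created in priority order.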
So far, we have only discussed rules as defining a syntax tree. The rules validate the input text as having the correct phrase structure, but syntax alone cannot determine if the input is meaningful or simply well-structured nonsense.
To discard interpretations that are nonsensical, we define a rule's .semantics property with a function that returns true for a meaningful interpretation or false for a nonsensical one. After a rule's syntactic processing is complete, the function defined by the .semantics property is called with the branch of the syntax tree currently under consideration, an instance of ishml.Interpretation, passed as the argument. This allows us to compare nodes in the branch and render judgements about the interpretation. It is also possible to alter the structure of the branch by editing the interpretation's .gist property.
Study the definition for nounPhrase.semantics in Code Listing 1. If the article node is present in the branch, then we want to throw away all interpretations where noun.definition.role is set to "npc", since people are not referred to with an article. We also want to make sure that all the adjectives refer to the noun of the nounPhrase, and throw away all interpretations that do not match.
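To see these two checks in isolation, we can restate them as a standalone function and call it with hand-built gist objects. This is a sketch under assumptions: checkNounPhrase is a hypothetical name, and the sample definitions (their key and role values in particular) are invented stand-ins for lexicon entries rather than the lexicon from part one.

```javascript
// Standalone sketch of the two checks described above.
// checkNounPhrase and all sample data are hypothetical.
function checkNounPhrase(gist)
{
    // People are not referred to with an article.
    if (gist.article && gist.noun.definition.role === "npc")
    {
        return false
    }
    // Every adjective must refer to the noun of this noun phrase.
    if (gist.adjectives)
    {
        return gist.adjectives.every(token => token.definition.key === gist.noun.definition.key)
    }
    return true
}

const article = {definition: {part: "article"}}
const slipperAdjective = {definition: {part: "adjective", key: "slipper"}}

const theGem = {article, noun: {definition: {part: "noun", key: "ruby", role: "thing"}}}
const thePerson = {article, noun: {definition: {part: "noun", key: "miss_ruby", role: "npc"}}}
const rubySlipper = {adjectives: [slipperAdjective], noun: {definition: {part: "noun", key: "slipper"}}}
const rubyRing = {adjectives: [slipperAdjective], noun: {definition: {part: "noun", key: "ring"}}}

console.log(checkNounPhrase(theGem))      // kept: article with a thing
console.log(checkNounPhrase(thePerson))   // discarded: article with a person
console.log(checkNounPhrase(rubySlipper)) // kept: adjective matches the noun
console.log(checkNounPhrase(rubyRing))    // discarded: adjective belongs to another noun
```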
The command rule also has .semantics defined. In this case we're not checking for nonsense. Instead, we're flattening out the tree structure to make it easier to work with and more consistent. Without the semantics function, the noun of the subject is pretty easily accessed: .gist.subject.noun. However, the noun of the direct object is not: .gist.object.directObject.nounPhrase.noun. The semantics function for command moves the directObject and indirectObject nodes to directly below the root node, where they become siblings of subject. The function always returns true because it makes no judgements about the meaning of the interpretation; it just restructures it. You may argue that we're stretching the meaning of the term semantics past its limits, and you might be right, but in the context of the ISHML API, semantics encompasses any sort of processing that is done after syntactic analysis.
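The restructuring is easier to follow on a small hand-built gist. The sketch below restates the flattening logic as a standalone function. The name flatten is hypothetical, and the sample gist uses plain strings where a real gist would hold tokens, so treat its shape as an assumption made for illustration.

```javascript
// Standalone sketch of the flattening step described above.
// flatten and the sample gist are hypothetical; strings stand in for tokens.
function flatten(gist)
{
    if (gist.object)
    {
        const {directObject, indirectObject} = gist.object
        if (indirectObject)
        {
            if (indirectObject.preposition)
            {
                gist.preposition = indirectObject.preposition
            }
            gist.indirectObject = indirectObject.nounPhrase || indirectObject
        }
        gist.directObject = directObject.nounPhrase || directObject
        delete gist.object
    }
    return gist
}

// Before: the objects are nested under gist.object.
const nested =
{
    verb: "take",
    object:
    {
        directObject: {noun: "slipper"},
        indirectObject: {preposition: "to", nounPhrase: {noun: "Ruby"}}
    }
}

// After: directObject, indirectObject, and preposition sit beside verb.
const flat = flatten(nested)
console.log(flat.directObject.noun)
console.log(flat.indirectObject.noun)
console.log(flat.preposition)
```

After flattening, both objects are reached with the same short path regardless of the order in which they appeared in the input.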
//Code Listing 2
//Add code below after note (a) in listing 1
nounPhrase.mismatch=(interpretation)=>
{
    interpretation.gist.error=true
    interpretation.gist.errorMessage=
        `Expected end of nounPhrase. Found: "${interpretation.remainder}".`
    interpretation.valid=false
    return interpretation
}
nounPhrase.noun.mismatch=(interpretation)=>
{
    interpretation.gist.error=true
    interpretation.gist.errorMessage=
        `Expected noun. Found: "${interpretation.remainder}".`
    interpretation.valid=false
    return interpretation
}
//Add code below after note (b) in listing 1
command.mismatch=(interpretation)=>
{
    interpretation.gist.error=true
    interpretation.gist.errorMessage=
        `Expected end of command. Found: "${interpretation.remainder}".`
    interpretation.valid=false
    return interpretation
}
command.verb.mismatch=(interpretation)=>
{
    interpretation.gist.error=true
    interpretation.gist.errorMessage=
        `Expected verb. Found: "${interpretation.remainder}".`
    interpretation.valid=false
    return interpretation
}
If an invalid command is parsed using the command rule in Code Listing 1, the parser will return {success: false} since the command rule is set to match the entire input. This can make it difficult to troubleshoot your grammar or respond gracefully to invalid input. To improve error handling, we define a rule's .mismatch property with a custom function. Study Code Listing 2. The .mismatch functions receive an ishml.Interpretation object containing the gist and the remainder for the branch of the syntax tree that each rule is attempting to generate. The only requirement of the function is to return an interpretation. The .mismatch functions generally set the interpretation's .valid property to false to indicate that the interpretation is invalid. The function may also arbitrarily manipulate the interpretation to provide additional feedback to be used in the application. In this example, .error and .errorMessage properties have been added to the interpretation's .gist property. This results in nested errors that mirror the rule's syntax tree.
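Because the errors are nested, an application may want to gather them into a flat list. The following is a minimal sketch, assuming only that gist nodes carry the .errorMessage property added in Code Listing 2. The helper collectErrors is hypothetical, and the sample gist is hand-built, so the shape of a real gist may differ.

```javascript
// collectErrors is a hypothetical helper, not part of ISHML.
// It walks a gist and gathers every .errorMessage it finds, with its path.
function collectErrors(node, path = "gist", found = [], seen = new WeakSet())
{
    if (node === null || typeof node !== "object" || seen.has(node))
    {
        return found
    }
    seen.add(node) // guard against circular references
    if (typeof node.errorMessage === "string")
    {
        found.push(`${path}: ${node.errorMessage}`)
    }
    for (const [key, child] of Object.entries(node))
    {
        collectErrors(child, `${path}.${key}`, found, seen)
    }
    return found
}

// Hand-built sample; a real gist may be shaped differently.
const sampleGist =
{
    error: true,
    errorMessage: 'Expected end of command. Found: "the flower".',
    verb: {part: "verb"},
    object: {error: true, errorMessage: 'Expected noun. Found: "flower".'}
}

const messages = collectErrors(sampleGist)
console.log(messages.join("\n"))
```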
//Code Listing 3
var parser = ishml.Parser({ lexicon: lexicon, grammar: command })
var example1 = parser.analyze("take ruby slipper")
var example2 = parser.analyze("take ruby slipper to Ruby")
var example3 = parser.analyze("take the flower")
Open your browser's developer tools, refer to Code Listing 3, and try the following exercises:
Enter console.log(example1) in the command console. Did example1 parse successfully? How many interpretations were returned?
Enter console.log(example2) in the command console. Did example2 parse successfully? How many interpretations were returned?
Enter console.log(example3) in the command console. Did example3 parse successfully? Why or why not?
For example1, one interpretation treats ruby as an adjective modifying the noun, slipper. The other two interpretations treat ruby as a noun (Miss Ruby, or the ruby gem) functioning as an indirect object.
example2.success is true. There are two interpretations because the second ruby may be interpreted as a person or a gem. The first ruby can only be treated as an adjective, because treating it as an indirect object would have produced only a partial interpretation. When there is at least one complete interpretation, only the complete interpretations are returned.
example3.success is false. Examining example3.interpretations[0].gist and example3.interpretations[1].gist provides additional insight into why the input could not be validated successfully.
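When working through the exercises above, a small helper can save some clicking in the console. This sketch assumes only what the exercises themselves use: a result object with a .success flag and, when present, an .interpretations array whose items have a .gist. The name summarize is hypothetical, and the two sample results are hand-built rather than actual parser output.

```javascript
// summarize is a hypothetical helper, not part of ISHML.
// It reduces a result to its success flag, interpretation count, and gists.
function summarize(result)
{
    const interpretations = Array.isArray(result.interpretations) ? result.interpretations : []
    const summary =
    {
        success: result.success === true,
        count: interpretations.length,
        gists: interpretations.map(interpretation => interpretation.gist)
    }
    return summary
}

// Hand-built stand-ins for parser results.
const failedResult = {success: false}
const passedResult =
{
    success: true,
    interpretations: [{gist: {verb: "take"}}, {gist: {verb: "take"}}]
}

console.log(summarize(failedResult))
console.log(summarize(passedResult))
```

In the browser console, the same helper could be applied to example1, example2, and example3 from Code Listing 3.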
Continue to part three to learn about recursive rules.