General

ISHML stands for Interactive Story grapH Management Library, but you can call it "Ishmael." It facilitates the creation of interactive fiction in JavaScript and is intended for client-side applications.

The ISHML library is a fluent API with straightforwardly named properties and methods, many of which are chainable.

The library exposes its methods and properties via a global variable, ISHML.

The new operator is optional. ISHML constructors always return a new object instance when called.
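One common way to make new optional in plain JavaScript is to check new.target inside the constructor. The sketch below uses a hypothetical Counter constructor to show the idea; it is not taken from the ISHML source, which may achieve the same effect differently.

```javascript
// Hypothetical constructor used only for illustration; it is not part of ISHML.
function Counter(start = 0) {
  // new.target is undefined when the function is called without `new`,
  // so re-invoke it as a constructor and return the fresh instance.
  if (!new.target) {
    return new Counter(start);
  }
  this.count = start;
}

const withNew = new Counter(1);
const withoutNew = Counter(2);

console.log(withNew instanceof Counter);    // true
console.log(withoutNew instanceof Counter); // true
console.log(withNew === withoutNew);        // false
```

Either call style yields a distinct instance, which is the behavior described above for ISHML constructors.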

The ISHML library is intended for modern browsers and relies on features of JavaScript found in the ES2016 specification. In particular, ISHML relies on modern browsers iterating an object's own enumerable properties starting with integer keys in ascending numerical order, followed by the remaining keys in the order in which they were created. (Oddly, console.log(), as implemented in browsers, still alphabetizes object properties when displaying them, which is confusing but harmless.)
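The property ordering described above can be observed with plain JavaScript. This short example uses only standard language features and arbitrary illustrative keys.

```javascript
const sample = {};
sample.b = "string key, created first";
sample[2] = "integer key";
sample.a = "string key, created third";
sample[1] = "integer key";

// Integer keys come first in ascending order, then string keys in creation order.
const keys = Object.keys(sample);
console.log(keys); // [ '1', '2', 'b', 'a' ]
```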

See tutorials for examples of use.

.Interpretation()

ishml.js, ishml-lp.js

An Interpretation represents one possible parsing of a string of characters in the context of a specific lexicon and grammar.

Constructor

Instances of ISHML.Interpretation are not intended to be created by calling the object's constructor. Instead, they are generated through calls to ISHML.Parser.analyze() or ISHML.Rule.parse().

Properties

.gist Object

The gist contains the syntax tree resulting from parsing. Parsing breaks down text into a sequence of terms. Each of these terms may then be broken down into a sequence of one or more sub-terms, recursively. This process forms a syntax tree from the text. The structure of the gist follows the structure of the grammar rule used to create it. The nodes of the syntax tree are properties named after the sub-rules that define them. These properties may in turn have other properties defined on them all the way down the syntax tree to the terminal nodes of the tree. A terminal property contains the matching token for the sub-rule. In the case where a sub-rule defines the maximum number of tokens to match to be more than one, the value stored in the terminal property is an array of tokens.
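As a rough illustration of that shape, the sketch below hand-builds what a gist might look like for the text "take the old brass lamp". The grammar (command, nounPhrase, and so on), the part field in the definitions, and the plain objects standing in for tokens are all assumptions made for this example; a real gist is produced by the parser and follows your own rules.

```javascript
// Plain objects standing in for tokens; the `part` field is hypothetical.
const makeToken = (lexeme, part) => ({ lexeme, definitions: [{ part }] });

// Possible gist under a hypothetical grammar:
// command -> verb nounPhrase, nounPhrase -> article adjectives noun.
const gist = {
  verb: makeToken("take", "verb"),
  nounPhrase: {
    article: makeToken("the", "article"),
    // A sub-rule whose maximum is greater than one stores an array of tokens.
    adjectives: [makeToken("old", "adjective"), makeToken("brass", "adjective")],
    noun: makeToken("lamp", "noun"),
  },
};

// Walk the tree and collect the lexemes found at its terminal nodes.
function lexemes(node) {
  if (Array.isArray(node)) return node.flatMap((item) => lexemes(item));
  if (typeof node.lexeme === "string") return [node.lexeme];
  return Object.values(node).flatMap((child) => lexemes(child));
}

console.log(gist.nounPhrase.noun.lexeme); // "lamp"
console.log(lexemes(gist).join(" "));     // "take the old brass lamp"
```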

.remainder ISHML.Tokenization

The remainder is a tokenization holding the tokens, if any, that were not matched during parsing. Successful parsing results in a remainder whose array of tokens has a length of zero.

See also parsing tutorial.

.Lexicon()

ishml.js, ishml-lp.js

A Lexicon stores a collection of tokens.

Constructor

ISHML.Lexicon()

Returns an instance of the Lexicon object. Use of the new operator is optional. Takes no arguments.

Used by ISHML.Parser.analyze() and ISHML.Rule.parse().

Methods

.register(lexeme...).as(definition)

Adds tokens to the lexicon.

The lexeme is the string of characters to be matched. More than one lexeme may be specified for the same definition. The same lexeme may have multiple definitions.

The definition may be a simple value or a complex object. It is an arbitrary payload to be retrieved when the lexeme is matched. A definition typically holds one or more references to objects and functions defined elsewhere in the application.

Returns the instance of ISHML.Lexicon. This method is chainable.

.search(lexeme[, options])

Searches for full and partial matches in the lexicon.

Returns an array of search results. Each search result is a plain JavaScript object with a token property containing the matching token and a remainder property containing the remaining unmatched string of characters from the lexeme argument.

The lexeme argument is a string of characters to be matched against the entries in the lexicon.

The options argument is a plain JavaScript object with the properties listed below, which override the default behavior of search.

.caseSensitive Boolean

Defaults to false. Set to true for case sensitive searches.

.full Boolean

Defaults to false for partial matching. Set to true for full matching.

A partial match is a match of the lexicon entry's full lexeme against the initial characters of the lexeme argument, but not the other way around.

A full match matches all the characters in the lexeme argument against the lexicon entry with no characters leftover.
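The difference between the two kinds of match can be sketched with two small helper functions. These helpers are hypothetical and only restate the definitions above; they ignore the caseSensitive, lax, and separator options and are not how ISHML performs matching internally.

```javascript
// Illustrative helpers only; not part of the ISHML API.
function isPartialMatch(entryLexeme, text) {
  // The whole lexicon entry must appear at the start of the text.
  return text.startsWith(entryLexeme);
}

function isFullMatch(entryLexeme, text) {
  // Every character of the text must be consumed by the entry.
  return text === entryLexeme;
}

console.log(isPartialMatch("take", "take lamp")); // true
console.log(isPartialMatch("take lamp", "take")); // false (not the other way around)
console.log(isFullMatch("take", "take lamp"));    // false
console.log(isFullMatch("take", "take"));         // true
```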

.greedy Boolean

Defaults to false. Set to true to return the longest match. Only applicable when full is set to false for partial matching.

.lax Boolean

Applies to partial matching. Defaults to false. Set to true to return partial matches even if the next character in the lexeme argument does not match the separator or end of string.

.separator RegExp

Applies to partial matching. Defaults to /\s/, whitespace. May be set to any regular expression. For no separator, set to empty string. When lax is set to false, a potential partial match will only be considered a match if the next character in the lexeme argument matches the separator or is the end of string.
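The interaction between lax and separator can be sketched as follows. The partialMatch helper is hypothetical, assumes the separator is a RegExp, and leaves the separator character in the returned remainder; the library's actual handling of these details may differ.

```javascript
// Illustrative sketch only; not the ISHML implementation.
function partialMatch(entryLexeme, text, { lax = false, separator = /\s/ } = {}) {
  if (!text.startsWith(entryLexeme)) return null;
  const rest = text.slice(entryLexeme.length);
  // A strict match must be followed by the separator or by the end of the string.
  const atBoundary = rest.length === 0 || separator.test(rest[0]);
  if (!lax && !atBoundary) return null;
  return { lexeme: entryLexeme, remainder: rest };
}

console.log(partialMatch("lamp", "lamp is lit"));                // { lexeme: 'lamp', remainder: ' is lit' }
console.log(partialMatch("lamp", "lamps are lit"));              // null
console.log(partialMatch("lamp", "lamps are lit", { lax: true })); // { lexeme: 'lamp', remainder: 's are lit' }
console.log(partialMatch("lamp", "lamp,shade", { separator: /,/ })); // { lexeme: 'lamp', remainder: ',shade' }
```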

.tokenize(text[, {options}])

Tokenizes a string according to entries in the lexicon.

Returns a plain JavaScript object with complete and partial properties.

The complete property is an array of all possible tokenizations that completely match the text argument. If there are no complete tokenizations possible, complete contains a zero length array.

In the event that the text argument could not be completely matched, the partial property contains all possible partial tokenizations of the text argument. If complete tokenization is possible, the partial property will contain a zero-length array.
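A caller will typically check complete first and fall back to partial. The sketch below uses a hand-written object standing in for the value returned by tokenize(); the token contents and the part field are illustrative assumptions.

```javascript
// Hand-written stand-in for a tokenize() result; values are illustrative.
const tokenizeResult = {
  complete: [],
  partial: [
    {
      tokens: [{ lexeme: "take", definitions: [{ part: "verb" }] }],
      remainder: "xyzzy",
    },
  ],
};

// Prefer complete tokenizations; otherwise inspect the partial ones,
// for example to report which text could not be matched.
const fullyMatched = tokenizeResult.complete.length > 0;
const candidates = fullyMatched ? tokenizeResult.complete : tokenizeResult.partial;

console.log(fullyMatched);            // false
console.log(candidates[0].remainder); // "xyzzy"
```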

The text argument is a string of characters to be matched against the entries in the lexicon.

The options argument is a plain JavaScript object with properties that override the default behavior. The option properties for tokenize() are the same as for search() above, with the addition of the two options listed below.

.fuzzy Boolean

Defaults to false. Skips over any characters in the text argument that have no matching lexeme in the lexicon by adding fuzz tokens, {lexeme:characters, definitions:[{fuzz:characters}]}, to the tokenization.

.smooth Boolean

Defaults to false. Skips over any characters in the text argument that have no matching lexeme in the lexicon and continues tokenizing to the end of text argument. Mismatched characters are lost.
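The contrast between fuzzy and smooth can be sketched with a deliberately simplified tokenizer. Everything here is an assumption for illustration: the sketch splits on whitespace and looks up whole words, whereas the library works character by character against the lexicon, and the entries map and part field are hypothetical.

```javascript
// Hypothetical word list standing in for a lexicon.
const entries = new Map([
  ["take", [{ part: "verb" }]],
  ["lamp", [{ part: "noun" }]],
]);

// Simplified, word-based sketch; not the ISHML algorithm.
function sketchTokenize(text, { fuzzy = false, smooth = false } = {}) {
  const tokens = [];
  for (const word of text.split(/\s+/).filter(Boolean)) {
    if (entries.has(word)) {
      tokens.push({ lexeme: word, definitions: entries.get(word) });
    } else if (fuzzy) {
      // Keep the unmatched characters as a fuzz token.
      tokens.push({ lexeme: word, definitions: [{ fuzz: word }] });
    } else if (smooth) {
      continue; // Drop the unmatched characters and carry on.
    } else {
      break; // Default: tokenization stops at the first mismatch.
    }
  }
  return tokens;
}

const lexemesOf = (tokens) => tokens.map((t) => t.lexeme);

console.log(lexemesOf(sketchTokenize("take xyzzy lamp")));                   // [ 'take' ]
console.log(lexemesOf(sketchTokenize("take xyzzy lamp", { fuzzy: true })));  // [ 'take', 'xyzzy', 'lamp' ]
console.log(lexemesOf(sketchTokenize("take xyzzy lamp", { smooth: true }))); // [ 'take', 'lamp' ]
```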

.unregister(lexeme[,definition])

Removes a definition from a token in the lexicon.

The lexeme argument is a string of characters identifying a token in the lexicon.

The definition argument is a JavaScript object matching the original definition under which the lexeme was registered. The function does a shallow comparison of the properties and values of the definition argument to the definition stored in the lexicon. If they are found to be equal, the definition in the lexicon is deleted.
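A shallow comparison of this kind can be sketched as below. The shallowEqual helper is hypothetical and only illustrates the general technique; the library's own comparison may differ in detail.

```javascript
// Illustrative shallow comparison; not the ISHML implementation.
function shallowEqual(a, b) {
  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  if (keysA.length !== keysB.length) return false;
  // Values are compared with ===, so nested objects must be the same reference.
  return keysA.every(
    (key) => Object.prototype.hasOwnProperty.call(b, key) && a[key] === b[key]
  );
}

const lampThing = { id: "lamp" }; // hypothetical game object

console.log(shallowEqual({ part: "noun", thing: lampThing }, { part: "noun", thing: lampThing })); // true
console.log(shallowEqual({ part: "noun", thing: { id: "lamp" } }, { part: "noun", thing: lampThing })); // false
```

Because nested values are compared by reference, a definition that points at application objects should be unregistered using those same object references rather than look-alike copies.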

If no definition argument is provided, all definitions connected with the lexeme argument are removed from the lexicon, which, in essence, deletes the token.

Returns the instance of ISHML.Lexicon. This method is chainable.

See also parsing tutorial.

.Parser()

ishml.js, ishml-lp.js

A Parser object is a recursive descent parser, with backtracking, that works with a grammar and a lexicon to provide lexical, syntactic, and semantic analysis of input. It handles ambiguously defined tokens and grammars and may potentially generate multiple interpretations from the same text.

Recursive descent parsers with backtracking are susceptible to exponential run times. Writing the grammar in such a way as to reduce backtracking can improve run times.

Constructor

ISHML.Parser({lexicon:lexicon, grammar:rule})

Returns an instance of the Parser object. Use of the new operator is optional.

The lexicon argument is an instance of the ISHML.Lexicon object that should be used for tokenizing input.

The rule argument is an instance of the ISHML.Rule object that should be used for syntactic parsing and semantic analysis.

Properties

.lexicon

The ISHML.Lexicon object that was specified in the constructor.

.grammar

The ISHML.Rule object that was specified in the constructor.

Methods

.analyze(text, options)

Tokenizes and parses the text argument and returns an array of ISHML.Interpretation objects.

The text argument is a string of characters to be analyzed.

The options argument is a plain JavaScript object with properties that affect how the text argument is tokenized.

.caseSensitive Boolean

Defaults to false. Set to true for case sensitive tokenization.

.full Boolean

Defaults to false for partial matching. Set to true for full matching.

A partial match is a match of the lexicon entry's full lexeme against the initial characters of the text being tokenized, but not the other way around.

A full match matches all the characters in the text being tokenized against the lexicon entry with no characters left over.

.fuzzy Boolean

Defaults to false. Skips over any characters in the text argument that have no matching lexeme in the lexicon by adding fuzz tokens, {lexeme:characters, definitions:[{fuzz:characters}]}, to the tokenization.

.greedy Boolean

Defaults to false. Set to true to return the longest match. Only applicable when full is set to false for partial matching.

.lax Boolean

Applies to partial matching. Defaults to false. Set to true to return partial matches even if the next character in the text does not match the separator or end of string.

.separator RegExp

Applies to partial matching. Defaults to /\s/, whitespace. May be set to any regular expression. For no separator, set to empty string. When lax is set to false, a potential partial match will only be considered a match if the next character in the text matches the separator or is the end of string.

.smooth Boolean

Defaults to false. Skips over any characters in the text argument that have no matching lexeme in the lexicon and continues tokenizing to the end of text argument. Mismatched characters are lost.

See also parsing tutorial.

.Rule()

ishml.js, ishml-lp.js

A Rule object is a grammar rule that describes the syntax tree that will result from parsing some text with the rule. Rules are, in spirit, a JavaScript adaptation of BNF notation.

Rules are built from other rules and have an object structure that resembles the syntax tree that results when the rule's .parse() method is called.

Constructor

ISHML.Rule()

Returns a new ISHML.Rule object instance. Use of the new operator is optional. Takes no arguments.

Properties

Enumerable properties are of type ISHML.Rule. They are created with the .snip() method, which forms a tree structure of rules, mirroring the intended syntax tree resulting from parsing.

The following non-enumerable properties set the rule's behavior when its .parse() method is called. These properties may be set directly or with the .configure() method.

.filter function

Filters the array of definitions associated with the token(s) to be processed when the rule's .parse() method is called. Defaults to (definition)=>true. Returning true from the filter function indicates that the definition should be kept. Returning false removes the definition from the definitions array of the token in the resulting interpretation. A token that has no definitions left after filtering is considered a non-matching token for the rule.
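The effect of a filter function can be sketched on a plain object standing in for a token. The part field and the sample definitions are assumptions for this example; in the library the filtering happens inside .parse().

```javascript
// Plain object standing in for a token with several definitions.
const lightToken = {
  lexeme: "light",
  definitions: [{ part: "verb" }, { part: "noun" }, { part: "adjective" }],
};

// A filter suitable for a terminal rule that should only accept nouns.
const nounFilter = (definition) => definition.part === "noun";

// Definitions for which the filter returns true are kept.
const keptDefinitions = lightToken.definitions.filter(nounFilter);

// With no definitions left, the token would not match the rule.
const tokenMatches = keptDefinitions.length > 0;

console.log(keptDefinitions); // [ { part: 'noun' } ]
console.log(tokenMatches);    // true
```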

.greedy Boolean

Set to true to consider the longest possible array of terms fitting the rule's criteria. Only applicable when minimum and maximum are set to different values and maximum is greater than one. Defaults to false, which generates all possible interpretations between minimum and maximum inclusively.

.keep Boolean

Includes the result of a rule's parsing in the final result of the parent rule. Defaults to true. Set to false to require the rule to parse successfully, but skip its result.

.maximum Integer

Sets the maximum number of times to repeat the rule. Defaults to 1. To allow an indefinite number of repetitions, set maximum to Infinity.

.minimum Integer

Sets the minimum number of times to repeat the rule. Defaults to 1. Set minimum to 0 to make the rule optional.
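How minimum, maximum, and greedy combine can be sketched by listing the repetition counts a rule would consider. The repetitionCounts helper and its available parameter (how many consecutive repetitions the input could supply) are hypothetical; the order in which the library actually explores alternatives is not specified here.

```javascript
// Illustrative sketch only; not the ISHML implementation.
function repetitionCounts({ minimum = 1, maximum = 1, greedy = false } = {}, available) {
  const upper = Math.min(maximum, available);
  if (upper < minimum) return []; // Not enough repetitions: the rule fails.
  if (greedy) return [upper];     // Only the longest run is considered.
  const counts = [];
  for (let n = minimum; n <= upper; n++) counts.push(n);
  return counts;                  // Every length from minimum to upper.
}

console.log(repetitionCounts({ minimum: 0, maximum: Infinity }, 3));               // [ 0, 1, 2, 3 ]
console.log(repetitionCounts({ minimum: 0, maximum: Infinity, greedy: true }, 3)); // [ 3 ]
console.log(repetitionCounts({ minimum: 2, maximum: 4 }, 1));                      // []
console.log(repetitionCounts({}, 5));                                              // [ 1 ]
```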

.mode all | any | apt

Sets parsing mode for sub-rules of a rule. Defaults to ISHML.enum.all, which treats the sub-rules as part of a sequence, each of which must parse successfully in order for the parent rule to be considered successfully parsed. The syntax trees generated by the sub-rules are appended to the node generated by the parent rule.

ISHML.enum.any treats each sub-rule as a choice. At least one sub-rule must parse successfully in order for the rule to parse successfully. If more than one choice parses successfully, multiple alternative interpretations are generated. The resulting sub-tree generated by the sub-rule has its root node removed and becomes the syntax tree generated by the parent rule.

ISHML.enum.apt treats each sub-rule as a choice. At least one choice must parse successfully in order for the rule to parse successfully. Parsing of sub-rules stops after the first successful choice is parsed and only one interpretation is generated. The resulting sub-tree generated by the sub-rule has its root node removed and becomes the syntax tree generated by the parent rule.
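The difference between the any and apt modes can be sketched with two plain functions standing in for alternative sub-rules. The names, the word lists, and the assumption that alternatives are tried in creation order are all illustrative; the sketch shows only how many interpretations result, not how syntax trees are built.

```javascript
// Hypothetical alternatives: each returns a result or null on failure.
const alternatives = {
  asNoun: (word) => (["light", "lamp"].includes(word) ? { noun: word } : null),
  asVerb: (word) => (["light", "take"].includes(word) ? { verb: word } : null),
};

// "any": try every choice and keep every success.
function parseAny(word) {
  return Object.values(alternatives)
    .map((attempt) => attempt(word))
    .filter((result) => result !== null);
}

// "apt": stop at the first choice that succeeds.
function parseApt(word) {
  for (const attempt of Object.values(alternatives)) {
    const result = attempt(word);
    if (result !== null) return [result];
  }
  return [];
}

console.log(parseAny("light").length); // 2
console.log(parseApt("light").length); // 1
console.log(parseAny("xyzzy").length); // 0
```

The apt mode trades completeness for speed: it avoids exploring alternatives that would only add ambiguity, so the order of the choices matters.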

.semantics function

Checks the rule's generated syntax tree for semantic correctness and optionally edits the syntax tree. Defaults to (interpretation)=>true, which accepts all interpretations as semantically correct. Returning false removes the interpretation from further consideration. Returning true allows the interpretation to continue processing. Optionally, you may alter the content of interpretation.gist and return the altered interpretation as an alternative to returning true.
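A semantics function might look like the sketch below, which could be assigned to a rule's .semantics property or passed through .configure(). The quantity and amount property names and the hand-built interpretation objects are assumptions for this example; the real shape of the gist depends on your grammar.

```javascript
// Hypothetical semantics function: accepts only positive whole-number quantities.
const checkQuantity = (interpretation) => {
  const amount = Number(interpretation.gist.quantity.lexeme);
  if (!Number.isInteger(amount) || amount < 1) {
    return false; // Reject this interpretation.
  }
  // Alter the gist and return the interpretation instead of true.
  interpretation.gist.amount = amount;
  return interpretation;
};

// Hand-built stand-ins for interpretations produced by parsing.
const accepted = checkQuantity({ gist: { quantity: { lexeme: "3", definitions: [] } } });
const rejected = checkQuantity({ gist: { quantity: { lexeme: "zero", definitions: [] } } });

console.log(accepted.gist.amount); // 3
console.log(rejected);             // false
```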

Methods

.clone()

Creates a deep copy of the rule.

.configure(options)

Configures behavior of rule.

The options argument is a plain JavaScript object with properties that are the same as the non-enumerable properties of ISHML.Rule.

Returns the rule. This method is chainable.

.parse(tokenization)

Parses a tokenization into one or more interpretations.

If the rule contains sub-rules, the parse method of each sub-rule is called recursively to build the syntax tree. If the rule has no sub-rules, the rule is a terminal rule and the next token(s) in the tokenization will be processed.

Returns an array of interpretations.

.snip(key [, rule])

Creates a new ISHML.Rule instance as an enumerable property of the rule.

The key argument is the name to be used for the sub-rule and may be a string or integer. If the sub-rule is to be accessed using dot notation, the requirements for dot notation must be observed when naming the key. For convenience, spaces are automatically converted to underscores.

The rule argument is the ISHML.Rule instance to be assigned to the new property. Cloning of rule is recommended, for example, command.snip("subject",nounPhrase.clone()), unless the rule is being defined recursively.

If rule is omitted, a new instance of ISHML.Rule is used.

Returns the rule. This method is chainable.

See also parsing tutorial.

.Token()

ishml.js, ishml-lp.js

A Token object is the smallest unit of text that is meaningful to an application, along with its definitions.

Constructor

Instances of ISHML.Token are not intended to be created by calling the object's constructor. Instead, they are produced (directly or indirectly) as a result of calls to ISHML.Lexicon.search(), ISHML.Lexicon.tokenize(), or ISHML.Parser.analyze().

Properties

.lexeme String

The string of characters that identifies the token.

.definitions Array

An array of definitions retrieved from the lexicon that give meaning to the lexeme. A definition may be a simple value or a complex object. It is an arbitrary payload. A definition typically holds one or more references to objects and functions defined elsewhere in the application.

See also parsing tutorial.

.Tokenization()

ishml.js, ishml-lp.js

A tokenization is an array of tokens suitable for parsing into a syntax tree.

Constructor

Instances of ISHML.Tokenization are not intended to be created by calling the object's constructor. Instead, they are produced (directly or indirectly) as a result of calls to ISHML.Lexicon.tokenize() or ISHML.Parser.analyze().

Properties

.tokens Array

The array of tokens.

.remainder String

The leftover text, if any, that occurs after a partial tokenization. A remainder with length zero indicates that the text was fully tokenized.

See also parsing tutorial.