The XML 1.0 specification defines the behavior of two different types of XML parsers: validating and non-validating. Validating parsers enforce all document constraints, including Well-Formedness Constraints (WFCs) and Validity Constraints (VCs), while non-validating parsers enforce Well-Formedness Constraints only and ignore Validity Constraints. To enforce all VCs, validating parsers typically require that a document conform to some type of schema, like a DTD. Non-validating parsers only require that a document be well-formed (that it conform to all WFCs).
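The difference can be observed with any non-validating parser. The sketch below does not use XMLParser; as an illustration only, it uses Python's standard-library xml.etree.ElementTree, which is built on the non-validating Expat parser. The two sample documents and the helper name is_accepted are made up for this example. A document that violates its own DTD is still accepted because it is well-formed, while a document with an unclosed tag is rejected.

```python
import xml.etree.ElementTree as ET

# Well-formed but invalid: the DTD says <root> must contain exactly one
# <child>, yet the document contains an undeclared <other/> element.
INVALID_BUT_WELL_FORMED = (
    "<!DOCTYPE root [<!ELEMENT root (child)><!ELEMENT child EMPTY>]>"
    "<root><other/></root>"
)

# Not well-formed: <child> is never closed.
MALFORMED = "<root><child></root>"


def is_accepted(xml_text):
    """Return True if the non-validating parser accepts the document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_accepted(INVALID_BUT_WELL_FORMED))  # validity is not checked
print(is_accepted(MALFORMED))                # well-formedness is checked
```

A validating parser would reject both documents: the second for a WFC violation and the first for a VC violation.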
The Pharo XMLParser library supports both validating and non-validating modes of operation, and it uses separate exception classes, XMLWellFormednessException and XMLValidationException, to signal violations of WFCs and VCs.
By default, XMLParser operates as a validating parser, and it supports two levels of validation: “soft” and “standard.”
With “soft” validation (the default), XMLParser enforces all entity-related VCs, checks that the name of the root element matches the name given in the DOCTYPE declaration (if a DOCTYPE declaration is present), and validates any xml:id attributes. If the document also has an internal or external DTD subset with at least one ELEMENT or ATTLIST declaration, XMLParser attempts to validate the entire document against that DTD. In other words, in this mode, validation against a DTD is attempted only if one is present.
With “standard” validation, all of the constraints enforced by “soft” validation are in effect. In addition, a DTD (with ELEMENT and ATTLIST declarations) or some other type of schema (only DTDs are presently supported) describing the structure of the document is required, and the absence of one is treated as a validation error. This is the behavior mandated by the XML 1.0 specification for validating parsers.
To get “standard” validation, set #requiresSchema: to true. Enabling #requiresSchema: also enables validation (if it wasn’t already enabled), and disabling validation (with #isValidating:) also disables #requiresSchema:.
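The way the two settings interact can be summarized with a small model. The Python sketch below is not XMLParser's code and does not call its API; the class ValidationSettings and the function dtd_action are hypothetical names that only restate the rules described above. One point is an assumption of the model rather than something stated here: turning the schema requirement off leaves validation itself unchanged.

```python
class ValidationSettings:
    """Hypothetical model of the two coupled flags (not XMLParser's code)."""

    def __init__(self):
        self._is_validating = True      # validating by default ...
        self._requires_schema = False   # ... at the "soft" level

    @property
    def is_validating(self):
        return self._is_validating

    @is_validating.setter
    def is_validating(self, value):
        self._is_validating = bool(value)
        if not self._is_validating:
            # Disabling validation also drops the schema requirement.
            self._requires_schema = False

    @property
    def requires_schema(self):
        return self._requires_schema

    @requires_schema.setter
    def requires_schema(self, value):
        self._requires_schema = bool(value)
        if self._requires_schema:
            # Requiring a schema implies validation.
            self._is_validating = True
        # Assumption: setting it to False leaves is_validating untouched.

    def level(self):
        """Return 'none', 'soft', or 'standard'."""
        if not self._is_validating:
            return "none"
        return "standard" if self._requires_schema else "soft"


def dtd_action(settings, has_element_or_attlist_declarations):
    """Return 'validate', 'skip', or 'error' for DTD-based validation."""
    if not settings.is_validating:
        return "skip"
    if has_element_or_attlist_declarations:
        return "validate"
    # No usable DTD: only "standard" validation treats this as an error.
    return "error" if settings.requires_schema else "skip"


settings = ValidationSettings()
print(settings.level(), dtd_action(settings, False))
settings.requires_schema = True
print(settings.level(), dtd_action(settings, False))
settings.is_validating = False
print(settings.level(), settings.requires_schema)
```

The model makes one consequence explicit: a document without any ELEMENT or ATTLIST declarations passes under “soft” validation but fails under “standard” validation.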
To implement the content model regular expression syntax of ELEMENT declarations, XMLParser uses a variant of the classic Thompson NFA construction with lazy, bounded conversion to DFAs.
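To make the technique concrete, here is a minimal Python sketch of the general idea: a Thompson-style NFA built from a content model, with DFA transitions computed on demand and kept in a size-limited cache. It is not XMLParser's implementation. The class name ContentModel, the parameter max_cached_transitions, and the policy of clearing the cache when it fills up are all assumptions made for this example, and the sketch handles only element names, sequences, choices, and the ?, * and + operators (no #PCDATA, EMPTY, or ANY).

```python
import re

PUNCTUATION = set("(),|*+?")


class ContentModel:
    """Sketch: Thompson NFA with lazily built, size-bounded DFA transitions."""

    def __init__(self, model, max_cached_transitions=256):
        self.eps = []    # eps[s]: states reachable from s by an epsilon move
        self.step = []   # step[s]: (element_name, target_state) or None
        self.tokens = re.findall(r"[(),|*+?]|[^\s(),|*+?]+", model)
        self.pos = 0
        start, self.accept = self._particle()
        if self.pos != len(self.tokens):
            raise ValueError("unexpected trailing input")
        self.start = self._closure({start})
        self.dfa = {}    # (state_set, element_name) -> state_set, filled lazily
        self.max_cached_transitions = max_cached_transitions

    def _new(self):
        self.eps.append([])
        self.step.append(None)
        return len(self.eps) - 1

    def _peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def _next(self):
        token = self._peek()
        self.pos += 1
        return token

    def _particle(self):
        token = self._next()
        if token == "(":
            frags, separator = [self._particle()], None
            while self._peek() in (",", "|"):
                token = self._next()
                if separator not in (None, token):
                    raise ValueError("cannot mix ',' and '|' in one group")
                separator = token
                frags.append(self._particle())
            if self._next() != ")":
                raise ValueError("expected ')'")
            if separator == "|":          # choice: branch in, merge out
                start, accept = self._new(), self._new()
                for frag_start, frag_accept in frags:
                    self.eps[start].append(frag_start)
                    self.eps[frag_accept].append(accept)
            else:                         # sequence: chain the fragments
                for (_, left), (right, _) in zip(frags, frags[1:]):
                    self.eps[left].append(right)
                start, accept = frags[0][0], frags[-1][1]
        elif token is None or token in PUNCTUATION:
            raise ValueError("expected an element name or '('")
        else:                             # a single element name
            start, accept = self._new(), self._new()
            self.step[start] = (token, accept)
        if self._peek() in ("?", "*", "+"):
            op, inner_start, inner_accept = self._next(), start, accept
            start, accept = self._new(), self._new()
            self.eps[start].append(inner_start)
            self.eps[inner_accept].append(accept)
            if op in "?*":
                self.eps[start].append(accept)             # zero occurrences
            if op in "*+":
                self.eps[inner_accept].append(inner_start)  # repetition
        return start, accept

    def _closure(self, states):
        stack, seen = list(states), set(states)
        while stack:
            for target in self.eps[stack.pop()]:
                if target not in seen:
                    seen.add(target)
                    stack.append(target)
        return frozenset(seen)

    def _move(self, current, name):
        key = (current, name)
        if key not in self.dfa:           # build this DFA transition on demand
            targets = {self.step[s][1] for s in current
                       if self.step[s] and self.step[s][0] == name}
            if len(self.dfa) >= self.max_cached_transitions:
                self.dfa.clear()          # bound memory; entries are rebuilt later
            self.dfa[key] = self._closure(targets)
        return self.dfa[key]

    def matches(self, child_names):
        current = self.start
        for name in child_names:
            current = self._move(current, name)
            if not current:
                return False
        return self.accept in current


model = ContentModel("(a, (b | c)*, d?)")
print(model.matches(["a", "b", "c", "d"]))
print(model.matches(["a", "d", "b"]))
```

Because DFA transitions are only created for the state sets and element names actually encountered, the exponential blow-up of a full subset construction is avoided, and the cache limit keeps memory use bounded at the cost of occasionally recomputing a transition.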
Feel free to contact me with any XMLParser-related questions.