Wikipedia

Overlapping markup

In markup languages and the digital humanities, overlap occurs when a document has two or more structures that interact in a non-hierarchical manner. A document with overlapping markup cannot be represented as a tree. This is also known as concurrent markup. Overlap happens, for instance, in poetry, where there may be a metrical structure of feet and lines; a linguistic structure of sentences and quotations; and a physical structure of volumes and pages and editorial annotations.[1][2]

History

 
The structural differences between multiple editions of Frankenstein have been analysed with overlapping techniques.[3]

The problem of non-hierarchical structures in documents has been recognised since 1988; resolving it against the dominant paradigm of text as a single hierarchy (an ordered hierarchy of content objects, or OHCO) was initially thought to be merely a technical issue, but has, in fact, proven much more difficult.[4] In 2008, Jeni Tennison identified markup overlap as "the main remaining problem area for markup technologists".[5] As of 2019, markup overlap remained a primary issue in the digital study of theological texts, and it is a major reason why that field retains specialised markup formats (the Open Scripture Information Standard and the Theological Markup Language) rather than the interoperable Text Encoding Initiative-based formats common to the rest of the digital humanities.[6]

Properties and types

A distinction exists between schemes that allow non-contiguous overlap and those that allow only contiguous overlap; in the strict sense, 'markup overlap' often means only the latter. Contiguous overlap can always be represented as a linear document with milestones (typically co-indexed start- and end-markers), without the need to fragment a (logical) component into multiple physical ones. Non-contiguous overlap may require document fragmentation. Another distinction in overlapping markup schemes is whether elements can overlap with other elements of the same kind (self-overlap).[2]
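Because contiguous components are simply character ranges, whether a set of them can be represented as a tree can be checked mechanically: the ranges form a single hierarchy exactly when no two of them partially overlap. The following Python sketch assumes half-open `(start, end)` offsets and uses a stack-based sweep; the `Range` type and the `find_crossing` helper are illustrative names, not part of any markup standard. The sample offsets are those of the Richard III fragment used as the running example below, with its lines joined by single spaces.

```python
from typing import NamedTuple, Optional


class Range(NamedTuple):
    """A contiguous component covering the half-open character range [start, end)."""
    name: str
    start: int
    end: int


def find_crossing(ranges: list[Range]) -> Optional[tuple[Range, Range]]:
    """Return two partially overlapping ranges, or None if the ranges nest as a tree."""
    # Visit ranges by ascending start; among equal starts, the longer range first,
    # so that an enclosing range is always seen before the ranges inside it.
    ordered = sorted(ranges, key=lambda r: (r.start, -r.end))
    open_ranges: list[Range] = []  # chain of ancestors enclosing the current position
    for r in ordered:
        # Close the ancestors that end before r begins.
        while open_ranges and open_ranges[-1].end <= r.start:
            open_ranges.pop()
        # r begins inside the innermost open range, so it must also end inside it.
        if open_ranges and r.end > open_ranges[-1].end:
            return open_ranges[-1], r
        open_ranges.append(r)
    return None


# Offsets for the Richard III fragment, with its lines joined by single spaces.
lines = [Range("line1", 0, 43), Range("line2", 44, 86),
         Range("line3", 87, 131), Range("line4", 132, 174)]
sentences = [Range("sentence1", 0, 86), Range("sentence2", 87, 104),
             Range("sentence3", 104, 174)]

print(find_crossing(lines))              # None: the lines alone form a hierarchy
print(find_crossing(lines + sentences))  # line3 and sentence3 cross
```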

A scheme may have a privileged hierarchy. Some XML-based schemes, for example, represent one hierarchy directly in the XML document tree, and represent other, overlapping, structures by another means; these are said to be non-privileged.

Schmidt (2012) identifies a tripartite classification of instances of overlap: 1. "Variation of content and structure", 2. "Overlay of multiple perspectives or markup sets", and 3. "Overlap of individual start and end tags within a single markup perspective"; additionally, some apparent instances of overlap are in fact schema definition problems, which can be resolved hierarchically. He contends that type 1 is best resolved by a system of multiple documents external to the markup, but that types 2 and 3 must be dealt with within the markup itself.

Approaches and implementations

DeRose (2004, Evaluation criteria) identifies several criteria for judging solutions to the overlap problem:

  • readability and maintainability,
  • tool support and compatibility with XML,
  • possible validation schemes, and
  • ease of processing.

Tag soup is, strictly speaking, not overlapping markup: it is malformed HTML, which is a non-overlapping language, and may be ill-defined. Some web browsers attempted to represent overlapping start and end tags with non-hierarchical Document Object Models (DOM), but this was not standardised across all browsers and was incompatible with the innately hierarchical nature of the DOM.[7][8] HTML5 defines how processors should deal with such mis-nested markup in the HTML syntax and turn it into a single hierarchy.[9] With XHTML and SGML-based HTML, however, mis-nested markup is a strict error and makes processing by standards-compliant systems impossible.[10] The HTML standard defines a paragraph concept that can overlap with other elements and can be non-contiguous.[11]
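The strictness of XML-based syntaxes can be observed with any conforming XML parser. The Python sketch below uses the standard library's `xml.etree.ElementTree`, which is an XML parser rather than an HTML5 parser: instead of repairing a mis-nested `<b>`/`<i>` pair as HTML5 error handling would, it rejects the document outright. The `is_well_formed` helper is a hypothetical name used only for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if an XML parser accepts the markup as a single hierarchy."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


properly_nested = "<p><b>bold <i>bold italic</i></b></p>"
mis_nested = "<p><b>bold <i>bold italic</b> italic</i></p>"

print(is_well_formed(properly_nested))  # True
print(is_well_formed(mis_nested))       # False: </b> arrives while <i> is still open
```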

SGML, which early versions of HTML were based on, has a feature called CONCUR that allows multiple independent hierarchies to co-exist without privileging any. DTD validation is only defined for each individual hierarchy with CONCUR. Validation across hierarchies is not defined by the standard. CONCUR cannot support self-overlap, and it interacts poorly with some of SGML's abbreviatory features. This feature has been poorly supported by tools and has seen very little actual use; using CONCUR to represent document overlap was not a recommended use case, according to a commentary by the standard's editor.[12][13]

Within hierarchical languages

There are several approaches to representing overlap in a non-overlapping language.[14] The Text Encoding Initiative, as an XML-based markup scheme, cannot directly represent overlapping markup; its guidelines suggest all four of the approaches below.[15] The Open Scripture Information Standard is another XML-based scheme, designed to mark up the Bible. It uses empty milestone elements to encode non-privileged components.[16]

To illustrate these approaches, the following examples mark up the sentences and lines of a fragment of William Shakespeare's Richard III. Where there is a privileged hierarchy, the lines are used.

Multiple documents

Multiple documents can each provide a different, internally consistent hierarchy. The advantage of this approach is that each document is simple and can be processed with existing tools; the disadvantages are that redundant content must be maintained and that cross-referencing between the different views can be difficult.[17] With multiple documents, the overlap can be analysed with data comparison and delta encoding techniques, and, in an XML context, specific XML tree differencing algorithms are available.[18][19]

Schmidt (2012, 3.5 Variation) recommends this approach for encoding multiple variants of a single text, accepting the duplication of the parts that do not vary rather than attempting to create a structure that represents all of the variation present; further, he suggests that the alignment of the versions be performed automatically, and that misalignment is rare in practice.[20]

Example, with lines marked up:

 <line>I, by attorney, bless thee from thy mother,</line>
 <line>Who prays continually for Richmond's good.</line>
 <line>So much for that.—The silent hours steal on,</line>
 <line>And flaky darkness breaks within the east.</line>

With sentences marked up:

 <sentence>I, by attorney, bless thee from thy mother, Who prays continually for Richmond's good.</sentence>
 <sentence>So much for that.</sentence><sentence>—The silent hours steal on, And flaky darkness breaks within the east.</sentence>
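Because both documents carry the same text, the overlap between them can be recovered by aligning their content. The Python sketch below assumes that each view is wrapped in a hypothetical `<text>` root with the annotated elements as its direct children; it derives character offsets for the lines and for the sentences, checks that the two views share identical content, and reports the pairs that overlap without nesting.

```python
import xml.etree.ElementTree as ET

# The two views of the Richard III fragment, each wrapped in a hypothetical <text> root.
LINE_VIEW = (
    "<text>"
    "<line>I, by attorney, bless thee from thy mother,</line> "
    "<line>Who prays continually for Richmond's good.</line> "
    "<line>So much for that.\u2014The silent hours steal on,</line> "
    "<line>And flaky darkness breaks within the east.</line>"
    "</text>"
)
SENTENCE_VIEW = (
    "<text>"
    "<sentence>I, by attorney, bless thee from thy mother, "
    "Who prays continually for Richmond's good.</sentence> "
    "<sentence>So much for that.</sentence>"
    "<sentence>\u2014The silent hours steal on, "
    "And flaky darkness breaks within the east.</sentence>"
    "</text>"
)


def spans(document: str) -> tuple[str, list[tuple[int, int]]]:
    """Return the plain text of a flat view and the (start, end) offsets of its elements."""
    root = ET.fromstring(document)
    text = root.text or ""
    offsets = []
    for child in root:
        start = len(text)
        text += "".join(child.itertext())  # the element's own content
        offsets.append((start, len(text)))
        text += child.tail or ""           # whitespace between elements
    return text, offsets


def crossing_pairs(a, b):
    """Index pairs (i, j) where a[i] and b[j] overlap but neither contains the other."""
    pairs = []
    for i, (s1, e1) in enumerate(a):
        for j, (s2, e2) in enumerate(b):
            overlap = s1 < e2 and s2 < e1
            nested = (s1 <= s2 and e2 <= e1) or (s2 <= s1 and e1 <= e2)
            if overlap and not nested:
                pairs.append((i, j))
    return pairs


line_text, line_spans = spans(LINE_VIEW)
sentence_text, sentence_spans = spans(SENTENCE_VIEW)
# The views can only be aligned if they carry exactly the same content.
assert line_text == sentence_text
print(crossing_pairs(line_spans, sentence_spans))  # [(2, 2)]: line 3 crosses sentence 3
```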

Milestones

Milestones are empty elements that mark the beginning and end of a component, typically using the XML ID mechanism to indicate which "begin" element goes with which "end" element. Milestones can be used to embed a non-privileged structure within a hierarchical language. In their basic form, they can only represent contiguous overlap. Generic XML tools can parse the milestone elements, but they do not understand their special meaning and so cannot easily process or validate the non-privileged structure.[21][22]

Milestones have the advantage that the markup for overlapping elements is located right at the relevant boundaries, like other markup, which aids maintainability and readability.[23] CLIX (DeRose 2004) is an example of such an approach.

Example:

 <line><sentence-start />I, by attorney, bless thee from thy mother,</line>
 <line>Who prays continually for Richmond's good.<sentence-end /></line>
 <line><sentence-start />So much for that.<sentence-end /><sentence-start />—The silent hours steal on,</line>
 <line>And flaky darkness breaks within the east.<sentence-end /></line>
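Because a generic XML parser sees the milestones only as empty elements, an application has to supply their meaning itself. The Python sketch below recovers the sentences from the milestone example by streaming the document in order and pairing each `<sentence-start />` with the next `<sentence-end />`. It assumes, as in the example, that sentences never overlap one another, so no IDs are needed; the `<text>` wrapper and the helper names are additions made only for this illustration.

```python
import xml.etree.ElementTree as ET
from collections.abc import Iterator

# The milestone example, wrapped in a <text> root to make it one document.
MILESTONED = (
    "<text>"
    "<line><sentence-start />I, by attorney, bless thee from thy mother,</line> "
    "<line>Who prays continually for Richmond's good.<sentence-end /></line> "
    "<line><sentence-start />So much for that.<sentence-end />"
    "<sentence-start />\u2014The silent hours steal on,</line> "
    "<line>And flaky darkness breaks within the east.<sentence-end /></line>"
    "</text>"
)


def events(element: ET.Element) -> Iterator[tuple[str, str]]:
    """Yield ('text', chunk) and ('milestone', tag) events in document order."""
    if element.tag in ("sentence-start", "sentence-end"):
        yield ("milestone", element.tag)
    if element.text:
        yield ("text", element.text)
    for child in element:
        yield from events(child)
        if child.tail:
            yield ("text", child.tail)


def sentences_from_milestones(document: str) -> list[str]:
    """Pair each start milestone with the next end milestone (no nesting assumed)."""
    text = ""
    start = None
    sentences = []
    for kind, value in events(ET.fromstring(document)):
        if kind == "text":
            text += value
        elif value == "sentence-start":
            start = len(text)
        else:
            if start is None:
                raise ValueError("end milestone without a matching start")
            sentences.append(text[start:])  # everything read since the start
            start = None
    return sentences


for sentence in sentences_from_milestones(MILESTONED):
    print(sentence)
```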

Punctuation and spaces have been identified as a type of milestone-style 'crypto-overlap' or 'pseudo-markup', as the boundaries of words, clauses, sentences and the like do not necessarily align with the formal markup boundaries hierarchically.[24][25]

It is also possible to use more complex milestones to represent non-contiguous structures. For example, TAGML's "suspend" and "resume" semantics[26] can be expressed with milestones by adding an attribute that indicates whether each milestone represents a start, suspend, resume, or end point. Re-ordering and even self-overlap can be achieved similarly, by annotating each milestone with a "next chunk" reference.
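A minimal sketch of the suspend-and-resume idea, in Python, follows. The `<ms>` element and its `ref` and `mode` attributes are a hypothetical vocabulary invented for this illustration, not TAGML syntax or any standard scheme, and the sample sentence is likewise made up. The sketch collects the chunks of each component and ignores text while that component is suspended, so a quotation interrupted by narration comes back as two chunks of one element.

```python
import xml.etree.ElementTree as ET

# Hypothetical milestones: <ms ref="..." mode="start|suspend|resume|end"/>.
DOC = (
    "<text>"
    '<ms ref="q1" mode="start"/>"I am going,"<ms ref="q1" mode="suspend"/>'
    " she said, "
    '<ms ref="q1" mode="resume"/>"and I will not return."<ms ref="q1" mode="end"/>'
    "</text>"
)


def chunks_by_component(document: str) -> dict[str, list[str]]:
    """Collect the text chunks of each component, skipping suspended stretches."""
    root = ET.fromstring(document)
    active: set[str] = set()  # components currently collecting text
    chunks: dict[str, list[str]] = {}

    def add_text(text: str) -> None:
        for ref in active:
            chunks[ref].append(text)

    if root.text:
        add_text(root.text)
    for ms in root:  # assumes the milestones are direct children of the root
        ref, mode = ms.get("ref"), ms.get("mode")
        if mode in ("start", "resume"):
            active.add(ref)
            chunks.setdefault(ref, [])
        elif mode in ("suspend", "end"):
            active.discard(ref)
        if ms.tail:
            add_text(ms.tail)
    return chunks


print(chunks_by_component(DOC))
```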

Joins

Joins are pointers within a privileged hierarchy to other components of the privileged hierarchy, which may be used to reconstruct a non-privileged component akin to following a linked list. A single non-privileged element is segmented into several partial elements within the privileged hierarchy; the partial elements themselves do not represent a single unit in the non-privileged hierarchy, which can be misleading and make processing difficult.[27][28] While this approach can support some discontiguous structures, it is not able to re-order elements.[29] A slightly different approach can, however, express re-ordering by expressing the join away from the content, at the cost of directness and maintainability.[30]

Join-based representations can introduce the possibility of cycles between elements; detecting and rejecting these adds complexity to implementations.[31]

Example:

 <line><sentence id="a">I, by attorney, bless thee from thy mother,</sentence></line>
 <line><sentence continues="a">Who prays continually for Richmond's good.</sentence></line>
 <line><sentence id="b">So much for that.</sentence><sentence id="c">—The silent hours steal on,</sentence></line>
 <line><sentence continues="c">And flaky darkness breaks within the east.</sentence></line>
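The joins example can be reassembled in a few lines of Python. This sketch assumes the `id` and `continues` attributes shown in the example, treats each `continues` as a pointer back to the fragment it extends, and follows those pointers like a linked list; the visited-set check illustrates the cycle rejection mentioned above. Rejoining fragments with a single space is a further assumption, since the whitespace between lines is not part of the sentence fragments.

```python
import xml.etree.ElementTree as ET

JOINED = (
    "<text>"
    '<line><sentence id="a">I, by attorney, bless thee from thy mother,</sentence></line> '
    '<line><sentence continues="a">Who prays continually for Richmond\'s good.</sentence></line> '
    '<line><sentence id="b">So much for that.</sentence>'
    '<sentence id="c">\u2014The silent hours steal on,</sentence></line> '
    '<line><sentence continues="c">And flaky darkness breaks within the east.</sentence></line>'
    "</text>"
)


def join_sentences(document: str) -> dict[str, str]:
    """Reassemble virtual sentences from fragments linked by 'continues' pointers."""
    fragments = list(ET.fromstring(document).iter("sentence"))
    by_id = {f.get("id"): f for f in fragments if f.get("id") is not None}

    def head_of(fragment: ET.Element) -> ET.Element:
        # Follow the pointers back to the first fragment, rejecting cycles.
        visited: set[int] = set()
        while fragment.get("continues") is not None:
            if id(fragment) in visited:
                raise ValueError("cycle among joined fragments")
            visited.add(id(fragment))
            fragment = by_id[fragment.get("continues")]
        return fragment

    parts: dict[str, list[str]] = {}
    for fragment in fragments:  # document order, so parts arrive in reading order
        parts.setdefault(head_of(fragment).get("id"), []).append(fragment.text or "")
    # Assumption: fragments are rejoined with a single space.
    return {key: " ".join(texts) for key, texts in parts.items()}


print(join_sentences(JOINED))

cyclic = ('<text><sentence id="p" continues="q">x</sentence>'
          '<sentence id="q" continues="p">y</sentence></text>')
try:
    join_sentences(cyclic)
except ValueError as error:
    print("rejected:", error)
```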

Stand-off markup

Stand-off markup is similar to using joins, except that there may be no privileged hierarchy: each part of the document is given a label (or might be referred to by an offset), and the document structure is expressed by pointing to the content from markup that 'stands off' from the content (possibly in an entirely different file) and might contain no content itself. The TEI guidelines identify the unity of the elements as a primary advantage of stand-off markup over joins. They also identify the ability to produce and distribute annotations separately from the text, possibly even by different authors applying markup to a read-only document,[32] which allows collaborative approaches to markup through a divide-and-conquer strategy.[33]

Example:

 <span id="a">I, by attorney, bless thee from thy mother,</span>
 <span id="b">Who prays continually for Richmond's good.</span>
 <span id="c">So much for that.</span><span id="d">—The silent hours steal on,</span>
 <span id="e">And flaky darkness breaks within the east.</span>
 ...
 <line contents="a" /> <line contents="b" /> <line contents="c d" /> <line contents="e" />
 <sentence contents="a b" /> <sentence contents="c" /> <sentence contents="d e" />
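The stand-off example can be resolved by first indexing the labelled spans and then mapping each `contents` list back onto the text. In this Python sketch the content and the stand-off markup are held in two separate strings (standing in for the parts before and after the `...`), wrapped in hypothetical `<text>` and `<markup>` roots. It assumes that the spans listed in each `contents` attribute are adjacent, so a component can be taken as the stretch from its first span to its last; non-adjacent spans would need to be sliced one by one.

```python
import xml.etree.ElementTree as ET

CONTENT = (
    "<text>"
    '<span id="a">I, by attorney, bless thee from thy mother,</span> '
    '<span id="b">Who prays continually for Richmond\'s good.</span> '
    '<span id="c">So much for that.</span>'
    '<span id="d">\u2014The silent hours steal on,</span> '
    '<span id="e">And flaky darkness breaks within the east.</span>'
    "</text>"
)
ANNOTATIONS = (
    "<markup>"
    '<line contents="a"/><line contents="b"/><line contents="c d"/><line contents="e"/>'
    '<sentence contents="a b"/><sentence contents="c"/><sentence contents="d e"/>'
    "</markup>"
)


def index_spans(content: str) -> tuple[str, dict[str, tuple[int, int]]]:
    """Concatenate the labelled spans and record each label's character offsets."""
    root = ET.fromstring(content)
    text = root.text or ""
    offsets = {}
    for span in root:
        start = len(text)
        text += span.text or ""
        offsets[span.get("id")] = (start, len(text))
        text += span.tail or ""  # keep the whitespace between spans
    return text, offsets


def resolve(annotations: str, text: str, offsets) -> list[tuple[str, str]]:
    """Turn each stand-off element into (tag, covered text); spans assumed adjacent."""
    resolved = []
    for element in ET.fromstring(annotations):
        labels = element.get("contents").split()
        start = offsets[labels[0]][0]
        end = offsets[labels[-1]][1]
        resolved.append((element.tag, text[start:end]))
    return resolved


text, offsets = index_spans(CONTENT)
for tag, covered in resolve(ANNOTATIONS, text, offsets):
    print(f"{tag}: {covered}")
```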

It has been claimed that separating markup and text can result in overall simplification and increased maintainability,[34] and by 2017, "[t]he current state of the art to [represent] (...) linguistically annotated data is to use a graph-based representation serialized as standoff XML as a pivot format",[35] i.e., that standoff was the most widely accepted approach to address the overlapping markup challenge.

Standoff formalisms have been the basis for an ISO standard for linguistic annotation,[36] have been successfully applied in the development of corpus management systems,[37] and (as of April 2020) were being actively developed in the TEI.[38]

Challenges

Representing overlapping markup within hierarchical languages is challenging because of redundancy, complexity, or both. In the 2000s and 2010s, standoff formalisms were generally accepted as the most promising approach,[35] but a disadvantage of standoff is that validation is very challenging.[39] Standoff formalisms are not natively supported by database management systems, so (by 2017) it was suggested to "use ... standoff XML as a pivot format (...) and relational data bases for querying."[35] In practical applications, this requires complicated architectures, labour-intensive transformation between the pivot format and the internal representation, or both. As a result, maintenance is problematic.[40] This has motivated the development of corpus management systems based on graph databases, and the use of established graph-based formalisms as pivot formats.
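The division of labour quoted above, with a pivot format for exchange and a relational database for querying, can be sketched with Python's built-in `sqlite3` module. The table layout is a hypothetical one chosen for this illustration: each stand-off annotation becomes a row holding a layer name, an index and half-open character offsets (here those of the Richard III lines and sentences, joined by single spaces), and overlap queries become self-joins on those offsets.

```python
import sqlite3

# (layer, index, start, end) for the Richard III lines and sentences,
# as half-open character offsets into the shared text.
ROWS = [
    ("line", 1, 0, 43), ("line", 2, 44, 86), ("line", 3, 87, 131), ("line", 4, 132, 174),
    ("sentence", 1, 0, 86), ("sentence", 2, 87, 104), ("sentence", 3, 104, 174),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE annotation (layer TEXT, n INTEGER, start_pos INTEGER, end_pos INTEGER)"
)
conn.executemany("INSERT INTO annotation VALUES (?, ?, ?, ?)", ROWS)

# Line/sentence pairs that overlap without either containing the other.
CROSSING = """
SELECT a.layer, a.n, b.layer, b.n
FROM annotation AS a JOIN annotation AS b
  ON a.layer = 'line' AND b.layer = 'sentence'
WHERE a.start_pos < b.end_pos AND b.start_pos < a.end_pos
  AND NOT (a.start_pos <= b.start_pos AND b.end_pos <= a.end_pos)
  AND NOT (b.start_pos <= a.start_pos AND a.end_pos <= b.end_pos)
"""
crossing = conn.execute(CROSSING).fetchall()
print(crossing)  # [('line', 3, 'sentence', 3)]
conn.close()
```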

Special-purpose languages

To implement the strategies above, either existing markup languages (such as the TEI) can be extended, or special-purpose languages can be designed. Designing an entirely new markup language gives up the tool support of existing languages in exchange for a less complicated semantic model and a more convenient syntax.

Historical formalisms

  • LMNL is a non-hierarchical markup language first described in 2002 by Jeni Tennison and Wendell Piez; it annotates ranges of a document with properties and allows self-overlap. CLIX, which originally stood for 'Canonical LMNL In XML', provides a method for representing any LMNL document in a milestone-style XML document.[41] LMNL also has another XML serialisation, xLMNL.[42]
  • MECS was developed by the University of Bergen's Wittgenstein Archive. However, it had several problems: it allowed some nonsensical documents of overlapping elements, it could not support self-overlap, and it did not have the capacity to define a DTD-like grammar.[43] The theory of General Ordered-Descendant Directed Acyclic Graphs (GODDAGs), while not strictly a markup language itself, is a general data model for non-hierarchical markup. Restricted GODDAGs were designed specifically to match the semantics of MECS; general GODDAGs may be non-contiguous and need a more powerful language.[44] TexMECS, a successor to MECS, has a formal grammar and is designed to represent every GODDAG and nothing that is not a GODDAG.[45]
  • XCONCUR (previously MuLaX) is a melding-together of XML and SGML's CONCUR, and also contains a validation language, XCONCUR-CL, and a SAX-like API.[46][47][48]
  • Marinelli, Vitali and Zacchiroli provide algorithms to convert between restricted GODDAGs, ECLIX, LMNL, parallel documents in XML, contiguous stand-off markup and TexMECS.[49]

None of these formalisms appears to be maintained any longer. The community consensus seems to be to employ standoff XML or graph-based formalisms.

Actively maintained standoff XML languages

  • GrAF-XML,[50] standoff-XML serialization of the Linguistic Annotation Framework (LAF),[36] used, e.g., for the American National Corpus[51]
  • PAULA-XML,[52] standoff-XML serialization of the data model underlying the corpus management system ANNIS and the converter suite SALT[53]
  • NAF (NLP Annotation Format / Newsreader Annotation Format),[54] standoff XML format originally developed in the NewsReader project (FP7, 2013-2015[55]), currently used by NLP tools such as FreeLing[56] (with support for English, Spanish, Portuguese, Italian, French, German, Russian, Catalan, Galician, Croatian, Slovene, etc.), and EusTagger[57] (with support for Basque, English, Spanish).
  • The Charles Harpur Critical Archive is encoded using 'multi-version documents' (MVD) to represent the variant versions of documents and as a means of indicating additions, deletions and revisions using a tactical combination of multiple documents and stand-off ranges within an underlying graph-based model. MVD is presented as an application file format, requiring specialised tools to view or edit.[58]

Standoff approaches have two parts, commonly called the "content" and the "annotations". These can be expressed in unrelated representations. Simple standoff annotations per se involve no more than a list of (location, type) pairs. Thus, in a few applications[example needed] standoff annotations are expressed in CSV, JSON or JSON-LD (e.g., Web Annotation[59]), or other representations, or in graph formalisms grounded in string URIs (see below). However, representing and validating content in such representations is much more difficult and much less common.
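As a minimal illustration of annotations reduced to (location, type) pairs, the Python sketch below stores the sentence layer of the Richard III fragment as JSON records with character offsets. The field names `start`, `end` and `type` are assumptions for this sketch rather than any particular published format. The records are meaningless without a copy of exactly the same text, and the bounds check shown is only a very shallow form of validation.

```python
import json

TEXT = " ".join([
    "I, by attorney, bless thee from thy mother,",
    "Who prays continually for Richmond's good.",
    "So much for that.\u2014The silent hours steal on,",
    "And flaky darkness breaks within the east.",
])

# One record per annotation: a location (half-open offsets) and a type.
layer = [
    {"start": 0, "end": 86, "type": "sentence"},
    {"start": 87, "end": 104, "type": "sentence"},
    {"start": 104, "end": 174, "type": "sentence"},
]
serialized = json.dumps(layer)  # what would be stored or exchanged

# A shallow check: the offsets can only be validated against the same text.
records = json.loads(serialized)
assert all(0 <= r["start"] <= r["end"] <= len(TEXT) for r in records)
for r in records:
    print(r["type"], "->", TEXT[r["start"]:r["end"]])
```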

Graph-based formalisms

Standoff markup employs a data model based on directed graphs,[60] which complicates its representation when markup information is grounded in a tree. Representing overlapping hierarchies in a graph eliminates this challenge. Standoff annotations can thus be more adequately represented as generalised directed multigraphs and use formalisms and technologies developed for this purpose, most notably those based on the Resource Description Framework (RDF).[61][62] EARMARK is an early RDF/OWL representation that encompasses General Ordered-Descendant Directed Acyclic Graphs (GODDAGs).[14]
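A simplified, GODDAG-style graph can be sketched in Python with plain dictionaries. In this illustration, which is neither EARMARK nor RDF and omits the root and internal-node structure of a full GODDAG, the text is cut at every element boundary so that each leaf segment lies wholly inside or wholly outside every element. Each element then points directly to the leaf segments it contains, so a segment can belong to elements of both hierarchies at once, even where those elements cross.

```python
from collections import defaultdict

TEXT = " ".join([
    "I, by attorney, bless thee from thy mother,",
    "Who prays continually for Richmond's good.",
    "So much for that.\u2014The silent hours steal on,",
    "And flaky darkness breaks within the east.",
])

# Elements of two independent hierarchies over the same text (half-open offsets).
ELEMENTS = {
    "line1": (0, 43), "line2": (44, 86), "line3": (87, 131), "line4": (132, 174),
    "sentence1": (0, 86), "sentence2": (87, 104), "sentence3": (104, 174),
}

# Cut the text at every element boundary, so that each leaf segment lies
# wholly inside or wholly outside every element.
cuts = sorted({0, len(TEXT)} | {pos for span in ELEMENTS.values() for pos in span})
leaves = list(zip(cuts, cuts[1:]))

# Directed edges from each element to the leaf segments it contains.
children = {
    name: [leaf for leaf in leaves if start <= leaf[0] and leaf[1] <= end]
    for name, (start, end) in ELEMENTS.items()
}
# Reverse edges: a leaf can be reached from elements of both hierarchies.
parents = defaultdict(list)
for name, leaf_list in children.items():
    for leaf in leaf_list:
        parents[leaf].append(name)

print(TEXT[104:131], "<-", parents[(104, 131)])
```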

RDF is a semantic data model that is independent of any particular linearisation, and it provides several linearisations, including an XML format (RDF/XML) that can be modelled to mirror standoff XML, a linearisation that lets RDF be expressed in XML attributes (RDFa), a JSON format (JSON-LD), and binary formats designed to facilitate querying or processing (RDF-HDT,[63] RDF-Thrift[64]). RDF is semantically equivalent to the graph-based data models underlying standoff markup, yet it does not require special-purpose technology for storing, parsing and querying. Multiple interlinked RDF files representing a document or a corpus constitute an example of Linguistic Linked Open Data.

An established technique to link arbitrary graphs with an annotated document is to use URI fragment identifiers to refer to parts of a text and/or document (see the overview under Web annotation). The Web Annotation standard provides format-specific 'selectors' as an additional means, e.g., offset-, string-match- or XPath-based selectors.[65]
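The offset- and string-match-based selector kinds can be illustrated with a small resolver. The Python sketch below models selectors as dictionaries shaped after the Web Annotation Data Model's `TextPositionSelector` (`start`, `end`) and `TextQuoteSelector` (`exact`, `prefix`, `suffix`). The matching logic is a simplified assumption (the first occurrence of prefix, quote and suffix together) rather than the full behaviour described in the specification, and offsets are counted in Python string characters; the `select` helper is a hypothetical name.

```python
TEXT = " ".join([
    "I, by attorney, bless thee from thy mother,",
    "Who prays continually for Richmond's good.",
    "So much for that.\u2014The silent hours steal on,",
    "And flaky darkness breaks within the east.",
])

# Two selectors that should identify the same sentence, "So much for that."
position_selector = {"type": "TextPositionSelector", "start": 87, "end": 104}
quote_selector = {
    "type": "TextQuoteSelector",
    "exact": "So much for that.",
    "prefix": "Richmond's good. ",
    "suffix": "\u2014The silent",
}


def select(text: str, selector: dict) -> tuple[int, int]:
    """Resolve a selector to half-open character offsets in text (simplified)."""
    if selector["type"] == "TextPositionSelector":
        return selector["start"], selector["end"]
    if selector["type"] == "TextQuoteSelector":
        prefix = selector.get("prefix", "")
        suffix = selector.get("suffix", "")
        found = text.find(prefix + selector["exact"] + suffix)
        if found < 0:
            raise LookupError("quote not found in text")
        start = found + len(prefix)
        return start, start + len(selector["exact"])
    raise ValueError(f"unsupported selector type: {selector['type']}")


start, end = select(TEXT, quote_selector)
print(start, end, TEXT[start:end])
print(select(TEXT, position_selector) == (start, end))
```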

Native RDF vocabularies capable of representing linguistic annotations include:[66]

  • Web Annotation[67]
  • NLP Interchange Format (NIF)[68]
  • LAPPS Interchange Format (LIF)[69]

Related vocabularies include

  • POWLA, an OWL2/DL serialization of PAULA-XML[70]
  • RDF-NAF, an RDF serialization of the NLP Annotation Format[71]

In early 2020, the W3C Community Group LD4LT launched an initiative to harmonize these vocabularies and to develop a consolidated RDF vocabulary for linguistic annotations on the web.[72]

Notes

  1. ^ Text Encoding Initiative.
  2. ^ a b DeRose 2004, The problem types.
  3. ^ Piez 2014.
  4. ^ Renear, Mylonas & Durand 1993.
  5. ^ Tennison 2008.
  6. ^ MoChridhe 2019.
  7. ^ Hickson 2002.
  8. ^ Sivonen 2003.
  9. ^ HTML, § 8.2.8 An introduction to error handling and strange cases in the parser.
  10. ^ Sperberg-McQueen & Huitfeldt 2000, 2.1. Non-SGML Notations.
  11. ^ HTML, § 3.2.5.4 Paragraphs.
  12. ^ Sperberg-McQueen & Huitfeldt 2000, 2.2. CONCUR.
  13. ^ DeRose 2004, SGML CONCUR.
  14. ^ a b Di Iorio, Peroni & Vitali 2009.
  15. ^ Text Encoding Initiative, § 20 Non-hierarchical Structures.
  16. ^ Durusau 2006.
  17. ^ Text Encoding Initiative, § 20.1 Multiple Encodings of the Same Information.
  18. ^ Schmidt 2009.
  19. ^ La Fontaine 2016.
  20. ^ Schmidt 2012, 4.1 Automating Variation.
  21. ^ Text Encoding Initiative, § 20.2 Boundary Marking with Empty Elements.
  22. ^ Sperberg-McQueen & Huitfeldt 2000, 2.4. Milestones.
  23. ^ DeRose 2004, TEI-style milestones.
  24. ^ Birnbaum & Thorsen 2015.
  25. ^ Haentjens Dekker & Birnbaum 2017.
  26. ^ Dekker 2018.
  27. ^ Text Encoding Initiative, § 20.3 Fragmentation and Reconstitution of Virtual Elements.
  28. ^ DeRose 2004, Segmentation.
  29. ^ Sperberg-McQueen & Huitfeldt 2000, 2.5. Fragmentation.
  30. ^ DeRose 2004, Joins.
  31. ^ Schmidt 2012, 3.4 Interlinking.
  32. ^ Text Encoding Initiative, § 20.4 Stand-off Markup.
  33. ^ Schmidt 2012, 4.2 Markup Outside the Text.
  34. ^ Eggert & Schmidt 2019, Conclusion.
  35. ^ a b c Ide et al. 2017, p.99.
  36. ^ a b "Iso 24612:2012".
  37. ^ Chiarcos et al. 2008.
  38. ^ "Standoff: Annotation microstructure · Issue #1745 · TEIC/TEI". GitHub.
  39. ^ Sperberg-McQueen & Huitfeldt 2000, 2.6. Standoff Markup.
  40. ^ DeRose 2004, Standoff markup.
  41. ^ DeRose 2004, CLIX and LMNL.
  42. ^ Piez 2012.
  43. ^ Sperberg-McQueen & Huitfeldt 2000, 2.7. MECS.
  44. ^ Sperberg-McQueen & Huitfeldt 2000.
  45. ^ Huitfeldt & Sperberg-McQueen 2003.
  46. ^ Hilbert, Schonefeld & Witt 2005.
  47. ^ Witt et al. 2007.
  48. ^ Schonefeld 2008.
  49. ^ Marinelli, Vitali & Zacchiroli 2008.
  50. ^ "ISO GrAF".
  51. ^ "Home". anc.org.
  52. ^ https://www.sfb632.uni-potsdam.de/en/paula.html
  53. ^ Zipser, Florian (2016-11-18). "Salt". corpus-tools.org. doi:10.5281/zenodo.17557. Retrieved 2022-09-11.
  54. ^ "NAF". GitHub. 30 June 2021.
  55. ^ "Building structured event indexes of large volumes of financial and economic data for decision making". Community Research and Development Information Service (CORDIS).
  56. ^ . Archived from the original on 2012-04-29. Retrieved 2020-04-06.
  57. ^ "Text Analysis | HiTZ Zentroa".
  58. ^ Eggert & Schmidt 2019.
  59. ^ "Web Annotation Data Model".
  60. ^ Ide & Suderman 2007.
  61. ^ Cassidy 2010, cassidy.
  62. ^ Chiarcos 2012, POWLA.
  63. ^ "Home". rdfhdt.org.
  64. ^ "RDF Binary using Apache Thrift".
  65. ^ "Selectors and States".
  66. ^ Cimiano, Philipp; Chiarcos, Christian; McCrae, John P.; Gracia, Jorge (2020). Linguistic Linked Data. Representation, Generation and Applications. Cham: Springer.
  67. ^ Verspoor, Karin; Livingston, Kevin (2012). "Towards Adaptation of Linguistic Annotations to Scholarly Annotation Formalisms on the Semantic Web". Proceedings of the Sixth Linguistic Annotation Workshop, Jeju, Republic of Korea: 75–84. Retrieved 6 April 2020.
  68. ^ "NLP Interchange Format (NIF) 2.0 - Overview and Documentation".
  69. ^ "LIF Overview".
  70. ^ "POWLA". January 2022.
  71. ^ "NLP Annotation Format | Background information on NAF".
  72. ^ "Towards a consolidated LOD vocabulary for linguistic annotations". GitHub. 7 September 2021.

References

  • Birnbaum, David J; Thorsen, Elise (2015). "Markup and meter: Using XML tools to teach a computer to think about versification". Proceedings of Balisage: The Markup Conference 2015. Balisage: The Markup Conference 2015. Vol. 15. Montréal. doi:10.4242/BalisageVol15.Birnbaum01. ISBN 978-1-935958-11-6.
  • Cassidy, Steve (2010). An RDF realisation of LAF in the DADA annotation server (PDF). Proceedings of ISA-5. Hong Kong. CiteSeerX 10.1.1.454.9146.
  • Chiarcos, Christian (2012). "POWLA: Modeling linguistic corpora in OWL/DL" (PDF). The Semantic Web: Research and Applications. Proceedings of the 9th Extended Semantic Web Conference (ESWC 2012, Heraklion, Crete; LNCS 7295). Lecture Notes in Computer Science. Vol. 7295. pp. 225–239. doi:10.1007/978-3-642-30284-8_22. ISBN 978-3-642-30283-1. Retrieved 2016-05-24.
  • Chiarcos, Christian; Dipper, Stefanie; Götze, Michael; Leser, Ulf; Lüdeling, Anke; Ritz, Julia; Stede, Manfred (2008). "A flexible framework for integrating annotations from different tools and tagsets". Traitement Automatique des Langues. 49 (2): 271–293.
  • Dekker, Ronald Haentjens; Bleeker, Elli; Buitendijk, Bram; Kulsdom, Astrid; Birnbaum, David J (2018). "TAGML: A markup language of many dimensions". Proceedings of Balisage: The Markup Conference 2018. Balisage: The Markup Conference 2018. Vol. 21. Rockville, MD. doi:10.4242/BalisageVol21.HaentjensDekker01. ISBN 978-1-935958-18-5.
  • DeRose, Steven (2004). Markup Overlap: A Review and a Horse. Extreme Markup Languages 2004. Montréal. CiteSeerX 10.1.1.108.9959. Retrieved 2014-10-14.
  • Di Iorio, Angelo; Peroni, Silvio; Vitali, Fabio (August 2009). "Towards markup support for full GODDAGs and beyond: the EARMARK approach". Proceedings of Balisage: The Markup Conference 2009. Balisage: The Markup Conference 2009. Vol. 3. Montréal. doi:10.4242/BalisageVol3.Peroni01. ISBN 978-0-9824344-2-0.
  • Eggert, Paul; Schmidt, Desmond A (2019). "The Charles Harpur Critical Archive: A History and Technical Report". International Journal of Digital Humanities. 1 (1). Retrieved 2019-03-25.
  • Haentjens Dekker, Ronald; Birnbaum, David J (2017). "It's more than just overlap: Text As Graph". Proceedings of Balisage: The Markup Conference 2017. Balisage: The Markup Conference 2017. Vol. 19. Montréal. doi:10.4242/BalisageVol19.Dekker01. ISBN 978-1-935958-15-4.
  • Durusau, Patrick (2006). (PDF). Archived from the original (PDF) on 2014-10-23. Retrieved 2014-10-14.
  • Hickson, Ian (2002-11-21). "Tag Soup: How UAs handle <x> <y> </x> </y>". Retrieved 2017-11-05.
  • Hilbert, Mirco; Schonefeld, Oliver; Witt, Andreas (2005). Making CONCUR work. Extreme Markup Languages 2005. Montréal. CiteSeerX 10.1.1.104.634. Retrieved 2014-10-14.
  • Huitfeldt, Claus; Sperberg-McQueen, C M (2003). . Archived from the original on 2017-02-27. Retrieved 2014-10-14.
  • Ide, Nancy; Chiarcos, Christian; Stede, Manfred; Cassidy, Steve (2017). "Designing Annotation Schemes: From Model to Representation". In Ide, Nancy; Pustejovsky, James (eds.). Handbook of Linguistic Annotation. Dordrecht: Springer. p. 99. doi:10.1007/978-94-024-0881-2_3. ISBN 978-94-024-0879-9.
  • La Fontaine, Robin (2016). "Representing Overlapping Hierarchy as Change in XML". Proceedings of Balisage: The Markup Conference 2016. Balisage: The Markup Conference 2016. Vol. 17. Montréal. doi:10.4242/BalisageVol17.LaFontaine01. ISBN 978-1-935958-13-0.
  • Marinelli, Paolo; Vitali, Fabio; Zacchiroli, Stefano (January 2008). "Towards the unification of formats for overlapping markup" (PDF). New Review of Hypermedia and Multimedia. 14 (1): 57–94. CiteSeerX 10.1.1.383.1636. doi:10.1080/13614560802316145. ISSN 1361-4568. S2CID 16909224. Retrieved 2014-10-14.
  • MoChridhe, Race J (2019-04-24). "Twenty Years of Theological Markup Languages: A Retro- and Prospective". Theological Librarianship. 12 (1). doi:10.31046/tl.v12i1.523. ISSN 1937-8904. S2CID 171582852. Retrieved 2019-07-15.
  • Piez, Wendell (August 2012). "Luminescent: parsing LMNL by XSLT upconversion". Proceedings of Balisage: The Markup Conference 2012. Balisage: The Markup Conference 2012. Vol. 8. Montréal. doi:10.4242/BalisageVol8.Piez01. ISBN 978-1-935958-04-8. Retrieved 2014-10-14.
  • Piez, Wendell (2014). Hierarchies within range space: From LMNL to OHCO. Balisage: The Markup Conference 2014. Montréal. doi:10.4242/BalisageVol13.Piez01.
  • Renear, Allen; Mylonas, Elli; Durand, David (1993-01-06). "Refining our Notion of What Text Really Is: The Problem of Overlapping Hierarchies". CiteSeerX 10.1.1.172.9017. hdl:2142/9407. Retrieved 2016-10-02.
  • Schonefeld, Oliver (August 2008). A Simple API for XCONCUR: Processing concurrent markup using an event-centric API. Balisage: The Markup Conference 2008. Montréal. doi:10.4242/BalisageVol1.Schonefeld01. Retrieved 2014-10-14.
  • Sperberg-McQueen, C M; Huitfeldt, Claus (2000). "GODDAG: A Data Structure for Overlapping Hierarchies". Lecture Notes in Computer Science. 2023 (2023): 139–160. doi:10.1007/978-3-540-39916-2_12. ISBN 978-3-540-21070-2. Retrieved 2014-10-14.
  • Schmidt, Desmond (2009). "Merging Multi-Version Texts: A Generic Solution to the Overlap Problem". Merging Multi-Version Texts: a General Solution to the Overlap Problem. Balisage: The Markup Conference 2009. Proceedings of Balisage: The Markup Conference 2009. Vol. 3. Montréal. doi:10.4242/BalisageVol3.Schmidt01. ISBN 978-0-9824344-2-0.
  • Schmidt, Desmond (2012). "The role of markup in the digital humanities". Historical Social Research. 37 (3): 125–146. doi:10.12759/hsr.37.2012.3.125-146.
  • Sivonen, Henri (2003-08-16). "Tag Soup: How Mac IE 5 and Safari handle <x> <y> </x> </y>". Retrieved 2017-11-05.
  • Ide, Nancy; Suderman, Keith (2007). GrAF: A graph-based format for linguistic annotations (PDF). Proceedings of the First Linguistic Annotation Workshop (LAW-2007, Prague, Czech Republic). pp. 1–8. CiteSeerX 10.1.1.146.4543.
  • Tennison, Jeni (2008-12-06). "Overlap, Containment and Dominance". Retrieved 2016-10-02.
  • Witt, Andreas; Schonefeld, Oliver; Rehm, Georg; Khoo, Jonathan; Evang, Kilian (2007). On the Lossless Transformation of Single-File, Multi-Layer Annotations into Multi-Rooted Trees. Extreme Markup Languages 2007. Montréal. Retrieved 2014-10-14.
  • Text Encoding Initiative Consortium (16 September 2014). "Guidelines for Electronic Text Encoding and Interchange" (5 ed.). Retrieved 2014-10-14.
  • WHATWG. "HTML Living Standard". Retrieved 2019-03-25.

overlapping, markup, markup, languages, digital, humanities, overlap, occurs, when, document, more, structures, that, interact, hierarchical, manner, document, with, overlapping, markup, cannot, represented, tree, this, also, known, concurrent, markup, overlap. In markup languages and the digital humanities overlap occurs when a document has two or more structures that interact in a non hierarchical manner A document with overlapping markup cannot be represented as a tree This is also known as concurrent markup Overlap happens for instance in poetry where there may be a metrical structure of feet and lines a linguistic structure of sentences and quotations and a physical structure of volumes and pages and editorial annotations 1 2 Contents 1 History 2 Properties and types 3 Approaches and implementations 3 1 Within hierarchical languages 3 1 1 Multiple documents 3 1 2 Milestones 3 1 3 Joins 3 1 4 Stand off markup 3 1 5 Challenges 3 2 Special purpose languages 3 2 1 Historical formalisms 3 2 2 Actively maintained standoff XML languages 3 3 Graph based formalisms 4 Notes 5 ReferencesHistory Edit The structural differences between multiple editions of Frankenstein have been analysed with overlapping techniques 3 The problem of non hierarchical structures in documents has been recognised since 1988 resolving it against the dominant paradigm of text as a single hierarchy an ordered hierarchy of content objects or OHCO was initially thought to be merely a technical issue but has in fact proven much more difficult 4 In 2008 Jeni Tennison identified markup overlap as the main remaining problem area for markup technologists 5 Markup overlap continues to be a primary issue in the digital study of theological texts in 2019 and is a major reason for the field retaining specialised markup formats the Open Scripture Information Standard and the Theological Markup Language rather than the inter operable Text Encoding Initiative based formats common to the rest of the digital 
humanities 6 Properties and types EditA distinction exists between schemes that allow non contiguous overlap and those that allow only contiguous overlap Often markup overlap strictly means the latter Contiguous overlap can always be represented as a linear document with milestones typically co indexed start and end markers without the need for fragmenting a logical component into multiple physical ones Non contiguous overlap may require document fragmentation Another distinction in overlapping markup schemes is whether elements can overlap with other elements of the same kind self overlap 2 A scheme may have a privileged hierarchy Some XML based schemes for example represent one hierarchy directly in the XML document tree and represent other overlapping structures by another means these are said to be non privileged Schmidt 2012 identifies a tripartite classification of instances of overlap 1 Variation of content and structure 2 Overlay of multiple perspectives or markup sets and 3 Overlap of individual start and end tags within a single markup perspective additionally some apparent instances of overlap are in fact schema definition problems which can be resolved hierarchically He contends that type 1 is best resolved by a system of multiple documents external to the markup but types 2 and 3 require dealing with internally Approaches and implementations EditDeRose 2004 Evaluation criteria identifies several criteria for judging solutions to the overlap problem readability and maintainability tool support and compatibility with XML possible validation schemes and ease of processing Tag soup is strictly speaking not overlapping markup it is malformed HTML which is a non overlapping language and may be ill defined Some web browsers attempted to represent overlapping start and end tags with non hierarchical Document Object Models DOM but this was not standardised across all browsers and was incompatible with the innately hierarchical nature of the DOM 7 8 HTML5 
defines how processors should deal with such mis-nested markup in the HTML syntax and turn it into a single hierarchy.[9] With XHTML and SGML-based HTML, however, mis-nested markup is a strict error and makes processing by standards-compliant systems impossible.[10] The HTML standard defines a paragraph concept that can overlap with other elements and can be non-contiguous.[11]

SGML, on which early versions of HTML were based, has a feature called CONCUR that allows multiple independent hierarchies to co-exist without privileging any of them. With CONCUR, DTD validation is defined only for each individual hierarchy; validation across hierarchies is not defined by the standard. CONCUR cannot support self-overlap, and it interacts poorly with some of SGML's abbreviatory features. The feature has been poorly supported by tools and has seen very little actual use; according to a commentary by the standard's editor, using CONCUR to represent document overlap was not a recommended use case.[12][13]

Within hierarchical languages

There are several approaches to representing overlap in a non-overlapping language.[14] The Text Encoding Initiative, as an XML-based markup scheme, cannot directly represent overlapping markup; its guidelines suggest all four of the approaches below.[15] The Open Scripture Information Standard is another XML-based scheme, designed to mark up the Bible; it uses empty milestone elements to encode non-privileged components.[16]

To illustrate these approaches, the sentences and lines of a fragment of Richard III by William Shakespeare are marked up as a running example. Where there is a privileged hierarchy, the lines are the privileged one.

Multiple documents

Multiple documents can each provide a different, internally consistent hierarchy. The advantage of this approach is that each document is simple and can be processed with existing tools; the disadvantages are that redundant content must be maintained and that cross-referencing between the different views can be difficult.[17] With multiple documents, the
overlap can be analysed with data comparison and delta encoding techniques, and in an XML context, specific XML tree differencing algorithms are available.[18][19] Schmidt (2012, 3.5 "Variation") recommends this approach for encoding multiple variants of a single text, accepting the duplication of the parts that do not vary rather than attempting to create a structure that represents all of the variation present; further, he suggests that the alignment of the versions be performed automatically, and that misalignment is rare in practice.[20]

Example, with lines marked up:

    <line>I, by attorney, bless thee from thy mother,</line>
    <line>Who prays continually for Richmond's good.</line>
    <line>So much for that. The silent hours steal on,</line>
    <line>And flaky darkness breaks within the east.</line>

With sentences marked up:

    <sentence>I, by attorney, bless thee from thy mother, Who prays continually for Richmond's good.</sentence>
    <sentence>So much for that.</sentence>
    <sentence>The silent hours steal on, And flaky darkness breaks within the east.</sentence>

Milestones

Milestones are empty elements that mark the beginning and end of a component, typically using the XML ID mechanism to indicate which start element goes with which end element. Milestones can be used to embed a non-privileged structure within a hierarchical language. In their basic form, they can only represent contiguous overlap. Generic XML tools can of course parse the milestone elements, but they do not understand their special meaning, and so cannot easily process or validate the non-privileged structure.[21][22] Milestones have the advantage that the markup for overlapping elements is located right at the relevant boundaries, like other markup, which benefits maintainability and readability.[23] CLIX (DeRose 2004) is an example of such an approach.

Example:

    <line><sentence-start/>I, by attorney, bless thee from thy mother,</line>
    <line>Who prays continually for Richmond's good.<sentence-end/></line>
    <line><sentence-start/>So much for that.<sentence-end/><sentence-start/>The silent hours steal on,</line>
    <line>And flaky darkness breaks within the east.<sentence-end/></line>

Punctuation and spaces have been identified as a type of milestone-style crypto-overlap, or pseudo-markup, because the boundaries of words, clauses, sentences and the like do not necessarily align hierarchically with the formal markup boundaries.[24][25]

It is also possible to use more complex milestones to represent non-contiguous structures. For example, TAGML's suspend-and-resume semantic[26] can be expressed with milestones, for instance by adding an attribute that indicates whether each milestone represents a start, suspend, resume or end point. Re-ordering and even self-overlap can be achieved similarly, by annotating each milestone with a "next chunk" reference.

Joins

Joins are pointers within a privileged hierarchy to other components of the privileged hierarchy, which may be used to reconstruct a non-privileged component, akin to following a linked list. A single non-privileged element is segmented into several partial elements within the privileged hierarchy; the partial elements do not themselves represent a single unit in the non-privileged hierarchy, which can be misleading and make processing difficult.[27][28] While this approach can support some discontiguous structures, it cannot re-order elements.[29] A slightly different approach can, however, express re-ordering by expressing the join away from the content, at the cost of directness and maintainability.[30] Join-based representations can introduce cycles between elements; detecting and rejecting these adds complexity to implementations.[31]

Example:

    <line><sentence id="a">I, by attorney, bless thee from thy mother,</sentence></line>
    <line><sentence continues="a">Who prays continually for Richmond's good.</sentence></line>
    <line><sentence id="b">So much for that.</sentence><sentence id="c">The silent hours steal on,</sentence></line>
    <line><sentence continues="c">And flaky darkness breaks within the east.</sentence></line>

Stand-off markup

Stand-off markup is similar to using joins, except that there may be no privileged hierarchy: each part of the document is given a label (or might be referred to by an offset), and the document structure is expressed by pointing to the content from markup that "stands off" from the content, possibly in an entirely different file, and that might contain no content itself. The TEI guidelines identify the unity of the elements as a primary advantage of stand-off markup over joins, in addition to the ability to produce and distribute annotations separately from the text, possibly even by different authors, and to apply markup to a read-only document;[32] this also allows collaborative approaches to markup through a divide-and-conquer strategy.[33]

Example:

    <span id="a">I, by attorney, bless thee from thy mother,</span>
    <span id="b">Who prays continually for Richmond's good.</span>
    <span id="c">So much for that.</span>
    <span id="d">The silent hours steal on,</span>
    <span id="e">And flaky darkness breaks within the east.</span>
    <line contents="a"/>
    <line contents="b"/>
    <line contents="c d"/>
    <line contents="e"/>
    <sentence contents="a b"/>
    <sentence contents="c"/>
    <sentence contents="d e"/>

It has been claimed that separating markup and text can result in overall simplification and increased maintainability,[34] and by 2017 "[t]he current state of the art to represent linguistically annotated data is to use a graph-based representation serialized as standoff XML as a pivot format",[35] i.e. standoff had become the most widely accepted approach to the overlapping markup challenge. Standoff formalisms have been the basis for an ISO standard for linguistic annotation,[36] they have been successfully applied in developing corpus management systems,[37] and
as of April 2020 they are actively being developed in the TEI.[38]

Challenges

Representing overlapping markup within hierarchical languages is challenging because of redundancy, complexity, or both. In the 2000s and 2010s, standoff formalisms were generally accepted as the most promising approach,[35] but a disadvantage of standoff is that validation is very challenging.[39] Standoff formalisms are not natively supported by database management systems, so by 2017 it was suggested to use standoff XML as a pivot format and relational databases for querying.[35] In practical applications this requires complicated architectures, labour-intensive transformations between the pivot format and the internal representation, or both; as a result, maintenance is problematic.[40] This has been a motivation to develop corpus management systems on the basis of graph databases, and to use established graph-based formalisms as pivot formats.

Special-purpose languages

To implement the strategies above, either existing markup languages such as the TEI can be extended, or special-purpose languages can be designed. Designing an entirely new markup language means giving up the tool support of existing languages in exchange for a simpler semantic model and a more convenient syntax.

Historical formalisms

LMNL is a non-hierarchical markup language, first described in 2002 by Jeni Tennison and Wendell Piez, that annotates ranges of a document with properties and allows self-overlap. CLIX (originally "Canonical LMNL In XML") provides a method for representing any LMNL document as a milestone-style XML document.[41] LMNL also has another XML serialisation, xLMNL.[42]

MECS was developed by the University of Bergen's Wittgenstein Archive. However, it had several problems: it allowed some nonsensical documents of overlapping elements, it could not support self-overlap, and it lacked the capacity to define a DTD-like grammar.[43]

The theory of General Ordered-Descendant Directed Acyclic Graphs
(GODDAGs), while not strictly a markup language itself, is a general data model for non-hierarchical markup. Restricted GODDAGs were designed specifically to match the semantics of MECS; general GODDAGs may be non-contiguous and need a more powerful language.[44] TexMECS is a successor to MECS that has a formal grammar and is designed to represent every GODDAG and nothing that is not a GODDAG.[45]

XCONCUR (previously MuLaX) is a melding of XML and SGML's CONCUR, and it also contains a validation language (XCONCUR-CL) and a SAX-like API.[46][47][48]

Marinelli, Vitali and Zacchiroli provide algorithms to convert between restricted GODDAGs, ECLIX, LMNL, parallel documents in XML, contiguous stand-off markup, and TexMECS.[49]

None of these formalisms seems to be maintained any longer; the community consensus seems to be to employ standoff XML or graph-based formalisms.

Actively maintained standoff XML languages

GrAF XML:[50] a standoff XML serialization of the Linguistic Annotation Framework (LAF),[36] used e.g. for the American National Corpus.[51]

PAULA XML:[52] a standoff XML serialization of the data model underlying the corpus management system ANNIS and the converter suite SALT.[53]

NAF (NLP Annotation Format, also NewsReader Annotation Format):[54] a standoff XML format originally developed in the NewsReader project (FP7, 2013-2015),[55] currently used by NLP tools such as FreeLing[56] (with support for English, Spanish, Portuguese, Italian, French, German, Russian, Catalan, Galician, Croatian, Slovene, etc.) and EusTagger[57] (with support for Basque, English and Spanish).

The Charles Harpur Critical Archive is encoded using multi-version documents (MVD) to represent the variant versions of documents and to indicate additions, deletions and revisions, using a tactical combination of multiple documents and stand-off ranges within an underlying graph-based model. MVD is presented as an application file format that requires specialised tools to view or edit.[58]

Standoff approaches have two parts, commonly called the content and the annotations.
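As a minimal sketch of these two parts, the Python below encodes the running Richard III stand-off example as plain data: the content is a single immutable string, and each annotation is a (type, start, end) character range, with the line and sentence groupings copied from the `contents` attributes in the stand-off example. Character offsets as the location scheme, and the names `Annotation`, `build_content`, `covering` and `properly_overlap`, are hypothetical choices for this illustration only; they are not the serialization or API of GrAF, PAULA, NAF or any other format named here. The sketch also tests whether the two layers could share one tree: two ranges that intersect without either containing the other are exactly the overlap a single hierarchy cannot express.

```python
from dataclasses import dataclass

# Content: the primary text segments (labels a-e in the stand-off example).
# Hypothetical example data; offsets below are computed, not hand-written.
SEGMENTS = [
    "I, by attorney, bless thee from thy mother,",   # a
    "Who prays continually for Richmond's good.",    # b
    "So much for that.",                              # c
    "The silent hours steal on,",                     # d
    "And flaky darkness breaks within the east.",     # e
]


@dataclass(frozen=True)
class Annotation:
    """A stand-off annotation: a type plus a half-open character range."""
    kind: str
    start: int
    end: int


def build_content(segments):
    """Join segments with single spaces; return the text and each segment's (start, end)."""
    text, offsets, pos = "", [], 0
    for i, seg in enumerate(segments):
        if i > 0:
            text += " "
            pos += 1
        offsets.append((pos, pos + len(seg)))
        text += seg
        pos += len(seg)
    return text, offsets


def covering(kind, offsets, indices):
    """Annotate the range from the first to the last of a run of consecutive segments."""
    return Annotation(kind, offsets[indices[0]][0], offsets[indices[-1]][1])


def properly_overlap(a, b):
    """True if a and b intersect but neither range contains the other."""
    intersect = a.start < b.end and b.start < a.end
    a_in_b = b.start <= a.start and a.end <= b.end
    b_in_a = a.start <= b.start and b.end <= a.end
    return intersect and not a_in_b and not b_in_a


TEXT, OFFSETS = build_content(SEGMENTS)

# Annotations: lines "a", "b", "c d", "e" and sentences "a b", "c", "d e".
LINES = [covering("line", OFFSETS, g) for g in ([0], [1], [2, 3], [4])]
SENTENCES = [covering("sentence", OFFSETS, g) for g in ([0, 1], [2], [3, 4])]

if __name__ == "__main__":
    for ann in LINES + SENTENCES:
        print(f"{ann.kind:8} {ann.start:3}-{ann.end:3} {TEXT[ann.start:ann.end]!r}")
    clashes = [(l, s) for l in LINES for s in SENTENCES if properly_overlap(l, s)]
    print("single tree possible:", not clashes)
```

Running it prints each annotation with the text it covers and reports `single tree possible: False`: the line "So much for that. The silent hours steal on," and the sentence "The silent hours steal on, And flaky darkness breaks within the east." share segment d without either containing the other. Because the content is never modified, either layer could be dropped or a new one added without touching the text; a real standoff format would additionally need identifiers, provenance and a serialization.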
The content and the annotations can be expressed in unrelated representations. Simple standoff annotations per se involve no more than a list of (location, type) pairs. Thus, in a few applications, standoff annotations are expressed in CSV, JSON-LD or other representations (e.g. Web Annotation[59]), or in graph formalisms grounded in string URIs (see below). However, representing and validating content in such representations is much more difficult and much less common.

Graph-based formalisms

Standoff markup employs a data model based on directed graphs,[60] which complicates its representation when markup information has to be grounded in a tree. Representing overlapping hierarchies in a graph eliminates this challenge. Standoff annotations can thus be more adequately represented as generalised directed multigraphs, using formalisms and technologies developed for this purpose, most notably those based on the Resource Description Framework (RDF).[61][62] EARMARK is an early RDF/OWL representation that encompasses General Ordered-Descendant Directed Acyclic Graphs (GODDAGs).[14]

RDF is a semantic data model that is independent of any particular linearisation, and it provides several linearisations: an XML format (RDF/XML) that can be modelled to mirror standoff XML; a linearisation that lets RDF be expressed in XML attributes (RDFa); a JSON format (JSON-LD); and binary formats designed to facilitate querying or processing (RDF-HDT,[63] RDF-Thrift[64]). RDF is semantically equivalent to the graph-based data models underlying standoff markup, and it does not require special-purpose technology for storing, parsing and querying. Multiple interlinked RDF files representing a document or a corpus constitute an example of Linguistic Linked Open Data.

An established technique to link arbitrary graphs with an annotated document is to use URI fragment identifiers to refer to parts of a text and/or document (see the overview under Web annotation). The
Web Annotation standard provides format-specific selectors as an additional means, e.g. offset-based, string-match-based or XPath-based selectors.[65]

Native RDF vocabularies capable of representing linguistic annotations include[66] Web Annotation,[67] the NLP Interchange Format (NIF)[68] and the LAPPS Interchange Format (LIF).[69] Related vocabularies include POWLA, an OWL 2 DL serialization of PAULA XML,[70] and RDF-NAF, an RDF serialization of the NLP Annotation Format.[71] In early 2020, the W3C Community Group LD4LT launched an initiative to harmonize these vocabularies and to develop a consolidated RDF vocabulary for linguistic annotations on the web.[72]

Notes

1. Text Encoding Initiative.
2. DeRose 2004, "The problem types".
3. Piez 2014.
4. Renear, Mylonas & Durand 1993.
5. Tennison 2008.
6. MoChridhe 2019.
7. Hickson 2002.
8. Sivonen 2003.
9. HTML, "8.2.8 An introduction to error handling and strange cases in the parser".
10. Sperberg-McQueen & Huitfeldt 2000, "2.1 Non-SGML Notations".
11. HTML, "3.2.5.4 Paragraphs".
12. Sperberg-McQueen & Huitfeldt 2000, "2.2 CONCUR".
13. DeRose 2004, "SGML CONCUR".
14. Di Iorio, Peroni & Vitali 2009.
15. Text Encoding Initiative, "20 Non-hierarchical Structures".
16. Durusau 2006.
17. Text Encoding Initiative, "20.1 Multiple Encodings of the Same Information".
18. Schmidt 2009.
19. La Fontaine 2016.
20. Schmidt 2012, "4.1 Automating Variation".
21. Text Encoding Initiative, "20.2 Boundary Marking with Empty Elements".
22. Sperberg-McQueen & Huitfeldt 2000, "2.4 Milestones".
23. DeRose 2004, "TEI-style milestones".
24. Birnbaum & Thorsen 2015.
25. Haentjens Dekker & Birnbaum 2017.
26. Dekker 2018.
27. Text Encoding Initiative, "20.3 Fragmentation and Reconstitution of Virtual Elements".
28. DeRose 2004, "Segmentation".
29. Sperberg-McQueen & Huitfeldt 2000, "2.5 Fragmentation".
30. DeRose 2004, "Joins".
31. Schmidt 2012, "3.4 Interlinking".
32. Text Encoding Initiative, "20.4 Stand-off Markup".
33. Schmidt 2012, "4.2 Markup Outside the Text".
34. Eggert & Schmidt 2019, "Conclusion".
35. Ide et al. 2017, p. 99.
36. ISO 24612:2012.
37. Chiarcos et al. 2008.
38. "Standoff Annotation (microstructure), Issue #1745, TEIC/TEI". GitHub.
39. Sperberg-McQueen & Huitfeldt 2000, "2.6 Standoff Markup".
40. DeRose 2004, "Standoff markup".
41. DeRose 2004, "CLIX and LMNL".
42. Piez 2012.
43. Sperberg-McQueen & Huitfeldt 2000, "2.7 MECS".
44. Sperberg-McQueen & Huitfeldt 2000.
45. Huitfeldt & Sperberg-McQueen 2003.
46. Hilbert, Schonefeld & Witt 2005.
47. Witt et al. 2007.
48. Schonefeld 2008.
49. Marinelli, Vitali & Zacchiroli 2008.
50. ISO.
51. "GrAF". Home.anc.org.
52. https://www.sfb632.uni-potsdam.de/en/paula.html
53. Zipser, Florian (2016-11-18). "Salt". corpus-tools.org. doi:10.5281/zenodo.17557. Retrieved 2022-09-11.
54. "NAF". GitHub. 30 June 2021.
55. "Building structured event indexes of large volumes of financial and economic data for decision making". Community Research and Development Information Service (CORDIS).
56. "Home". FreeLing Home Page. Archived from the original on 2012-04-29. Retrieved 2020-04-06.
57. "Text Analysis". HiTZ Zentroa.
58. Eggert & Schmidt 2019.
59. "Web Annotation Data Model".
60. Ide & Suderman 2007.
61. Cassidy 2010.
62. Chiarcos 2012, POWLA.
63. "Home". rdfhdt.org.
64. "RDF Binary using Apache Thrift".
65. "Selectors and States".
66. Cimiano, Philipp; Chiarcos, Christian; McCrae, John P.; Gracia, Jorge (2020). Linguistic Linked Data: Representation, Generation and Applications. Cham: Springer.
67. Verspoor, Karin; Livingston, Kevin (2012). "Towards Adaptation of Linguistic Annotations to Scholarly Annotation Formalisms on the Semantic Web". Proceedings of the Sixth Linguistic Annotation Workshop. Jeju, Republic of Korea. pp. 75-84. Retrieved 6 April 2020.
68. "NLP Interchange Format (NIF) 2.0: Overview and Documentation".
69. "LIF Overview".
70. "POWLA". January 2022.
71. "NLP Annotation Format: Background information on NAF".
72. "Towards a consolidated LOD vocabulary for linguistic annotations". GitHub. 7 September 2021.

References

Birnbaum, David J.; Thorsen, Elise (2015). "Markup and meter: Using XML tools to teach a computer to think about versification". Proceedings of Balisage: The Markup Conference 2015. Vol. 15. Montreal. doi:10.4242/BalisageVol15.Birnbaum01. ISBN 978-1-935958-11-6.
Cassidy, Steve (2010). "An RDF realisation of LAF in the DADA annotation server" (PDF). Proceedings of ISA-5. Hong Kong. CiteSeerX 10.1.1.454.9146.
Chiarcos, Christian (2012). "POWLA: Modeling linguistic corpora in OWL/DL" (PDF). The Semantic Web: Research and Applications. Proceedings of the 9th Extended Semantic Web Conference (ESWC 2012), Heraklion, Crete. Lecture Notes in Computer Science. Vol. 7295. pp. 225-239. doi:10.1007/978-3-642-30284-8_22. ISBN 978-3-642-30283-1. Retrieved 2016-05-24.
Chiarcos, Christian; Dipper, Stefanie; Götze, Michael; Leser, Ulf; Lüdeling, Anke; Ritz, Julia; Stede, Manfred (2008). "A flexible framework for integrating annotations from different tools and tagsets". Traitement Automatique des Langues. 49 (2): 271-293.
Dekker, Ronald Haentjens; Bleeker, Elli; Buitendijk, Bram; Kulsdom, Astrid; Birnbaum, David J. (2018). "TAGML: A markup language of many dimensions". Proceedings of Balisage: The Markup Conference 2018. Vol. 21. Rockville, MD. doi:10.4242/BalisageVol21.HaentjensDekker01. ISBN 978-1-935958-18-5.
DeRose, Steven (2004). "Markup Overlap: A Review and a Horse". Extreme Markup Languages 2004. Montreal. CiteSeerX 10.1.1.108.9959. Retrieved 2014-10-14.
Di Iorio, Angelo; Peroni, Silvio; Vitali, Fabio (August 2009). "Towards markup support for full GODDAGs and beyond: the EARMARK approach". Proceedings of Balisage: The Markup Conference 2009. Vol. 3. Montreal. doi:10.4242/BalisageVol3.Peroni01. ISBN 978-0-9824344-2-0.
Eggert, Paul; Schmidt, Desmond A. (2019). "The Charles Harpur Critical Archive: A History and Technical Report". International Journal of Digital Humanities. 1 (1). Retrieved 2019-03-25.
Haentjens Dekker, Ronald; Birnbaum, David J. (2017). "It's more than just overlap: Text As Graph". Proceedings of Balisage: The Markup Conference 2017. Vol. 19. Montreal. doi:10.4242/BalisageVol19.Dekker01. ISBN 978-1-935958-15-4.
Durusau, Patrick (2006). "OSIS Users Manual (OSIS Schema 2.1.1)" (PDF). Archived from the original (PDF) on 2014-10-23. Retrieved 2014-10-14.
Hickson, Ian (2002-11-21). "Tag Soup: How UAs handle <x> <y> </x> </y>". Retrieved 2017-11-05.
Hilbert, Mirco; Schonefeld, Oliver; Witt, Andreas (2005). "Making CONCUR work". Extreme Markup Languages 2005. Montreal. CiteSeerX 10.1.1.104.634. Retrieved 2014-10-14.
Huitfeldt, Claus; Sperberg-McQueen, C. M. (2003). "TexMECS: An experimental markup meta-language for complex documents". Archived from the original on 2017-02-27. Retrieved 2014-10-14.
Ide, Nancy; Chiarcos, Christian; Stede, Manfred; Cassidy, Steve (2017). "Designing Annotation Schemes: From Model to Representation". In Ide, Nancy; Pustejovsky, James (eds.). Handbook of Linguistic Annotation. Dordrecht: Springer. p. 99. doi:10.1007/978-94-024-0881-2_3. ISBN 978-94-024-0879-9.
La Fontaine, Robin (2016). "Representing Overlapping Hierarchy as Change in XML". Proceedings of Balisage: The Markup Conference 2016. Vol. 17. Montreal. doi:10.4242/BalisageVol17.LaFontaine01. ISBN 978-1-935958-13-0.
Marinelli, Paolo; Vitali, Fabio; Zacchiroli, Stefano (January 2008). "Towards the unification of formats for overlapping markup" (PDF). New Review of Hypermedia and Multimedia. 14 (1): 57-94. CiteSeerX 10.1.1.383.1636. doi:10.1080/13614560802316145. ISSN 1361-4568. S2CID 16909224. Retrieved 2014-10-14.
MoChridhe, Race J. (2019-04-24). "Twenty Years of Theological Markup Languages: A Retro- and Prospective". Theological Librarianship. 12 (1). doi:10.31046/tl.v12i1.523. ISSN 1937-8904. S2CID 171582852. Retrieved 2019-07-15.
Piez, Wendell (August 2012). "Luminescent: parsing LMNL by XSLT upconversion". Proceedings of Balisage: The Markup Conference 2012. Vol. 8. Montreal. doi:10.4242/BalisageVol8.Piez01. ISBN 978-1-935958-04-8. Retrieved 2014-10-14.
Piez, Wendell (2014). "Hierarchies within range space: From LMNL to OHCO". Balisage: The Markup Conference 2014. Montreal. doi:10.4242/BalisageVol13.Piez01.
Renear, Allen; Mylonas, Elli; Durand, David (1993-01-06). "Refining our Notion of What Text Really Is: The Problem of Overlapping Hierarchies". CiteSeerX 10.1.1.172.9017. hdl:2142/9407. Retrieved 2016-10-02.
Schonefeld, Oliver (August 2008). "A Simple API for XCONCUR: Processing concurrent markup using an event-centric API". Balisage: The Markup Conference 2008. Montreal. doi:10.4242/BalisageVol1.Schonefeld01. Retrieved 2014-10-14.
Sperberg-McQueen, C. M.; Huitfeldt, Claus (2000). "GODDAG: A Data Structure for Overlapping Hierarchies". Lecture Notes in Computer Science. Vol. 2023. pp. 139-160. doi:10.1007/978-3-540-39916-2_12. ISBN 978-3-540-21070-2. Retrieved 2014-10-14.
Schmidt, Desmond (2009). "Merging Multi-Version Texts: A Generic Solution to the Overlap Problem". Proceedings of Balisage: The Markup Conference 2009. Vol. 3. Montreal. doi:10.4242/BalisageVol3.Schmidt01. ISBN 978-0-9824344-2-0.
Schmidt, Desmond (2012). "The role of markup in the digital humanities". Historical Social Research. 37 (3): 125-146. doi:10.12759/hsr.37.2012.3.125-146.
Sivonen, Henri (2003-08-16). "Tag Soup: How Mac IE 5 and Safari handle <x> <y> </x> </y>". Retrieved 2017-11-05.
Ide, Nancy; Suderman, Keith (2007). "GrAF: A graph-based format for linguistic annotations" (PDF). Proceedings of the First Linguistic Annotation Workshop (LAW 2007). Prague, Czech Republic. pp. 1-8. CiteSeerX 10.1.1.146.4543.
Tennison, Jeni (2008-12-06). "Overlap, Containment and Dominance". Retrieved 2016-10-02.
Witt, Andreas; Schonefeld, Oliver; Rehm, Georg; Khoo, Jonathan; Evang, Kilian (2007). "On the Lossless Transformation of Single-File, Multi-Layer Annotations into Multi-Rooted Trees". Extreme Markup Languages 2007. Montreal. Retrieved 2014-10-14.
Text Encoding Initiative Consortium (16 September 2014). Guidelines for Electronic Text Encoding and Interchange (5th ed.). Retrieved 2014-10-14.
WHATWG. HTML Living Standard. Retrieved 2019-03-25.

Retrieved from https://en.wikipedia.org/w/index.php?title=Overlapping_markup&oldid=1127107762
