Internet-Draft | Terminology for RFCXML Evolution | July 2023
Bormann | Expires 28 January 2024
The canonical format for RFCs is called RFCXML, with the currently effective details originally documented in the RFC 799x series. This format has experienced some uncontrolled evolution since, partially caused by an unwillingness to recognize the need for overt, deliberate evolution.¶
Controlled RFCXML evolution is going to be complex. Its discussion will need agreed terminology, without which it will devolve into a Tower of Babel.¶
This note is to be removed before publishing as an RFC.¶
Status information for this document may be found at https://datatracker.ietf.org/doc/draft-bormann-rswg-terminology/.¶
Discussion of this document takes place on the rswg Working Group mailing list (mailto:[email protected]), which is archived at https://mailarchive.ietf.org/arch/browse/rswg/. Subscribe at https://www.ietf.org/mailman/listinfo/rswg/.¶
Source for this draft and an issue tracker can be found at https://github.com/cabo/rswg-terminology.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 28 January 2024.¶
Copyright (c) 2023 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.¶
The canonical format for RFCs is called RFCXML, with the currently effective details originally documented in the RFC 799x series. This format has experienced some uncontrolled evolution since, partially caused by an unwillingness to recognize the need for overt, deliberate evolution.¶
Controlled RFCXML evolution is going to be complex. Its discussion will need agreed terminology, without which it will devolve into a Tower of Babel.¶
Although this document is not an IETF Standards Track publication, it adopts the conventions for normative language to provide clarity of instructions to the implementer. The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
Ultimately, this document should turn into a definitions section of some other document. For now, we will use a mix of prose and definition styles.¶
XML does not define the meaning of its instances. Saying "this document is in XML" doesn't tell you much more about its semantics than "this document is in ASCII".¶
When we talk about the specific semantics instilled into an XML document by the RFCXML format, we will therefore always use the term RFCXML. This term can be split into several aspects:¶
<em> and <strong>. (Officially, these are also semantic markup, but as soon as a "Conventions" section says "Newly defined terms are shown in italics", that is no longer true.)¶
Semantic aspects that are not rendered are hidden semantics. E.g., the <keyword> element is not rendered at all in today's renderings; it is intended for processes outside of rendering and human consumption (e.g., search). The name attribute of <sourcecode> is rendered only in certain cases, but it can be used by sourcecode extraction processes (e.g., for CI or for re-use of pseudo-code in other contexts) in the other cases, too.¶
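To illustrate how hidden semantics can still be consumed outside rendering, here is a minimal Python sketch of a sourcecode extraction step. It assumes a simplified RFCXML fragment invented for this example and collects the text of every <sourcecode> element that carries a name attribute; unnamed elements are skipped. It uses only the standard library's xml.etree.ElementTree and is not the extraction logic of any particular tool.

```python
import xml.etree.ElementTree as ET

# Simplified, invented RFCXML fragment used only for illustration.
RFCXML = """<rfc>
  <middle>
    <section>
      <name>Example</name>
      <sourcecode name="example.cddl" type="cddl">foo = uint</sourcecode>
      <sourcecode type="abnf">bar = 1*DIGIT</sourcecode>
    </section>
  </middle>
</rfc>"""

def extract_named_sourcecode(xml_text):
    """Map each <sourcecode> name attribute to the element's text.

    Unnamed <sourcecode> elements are skipped, since the name is what an
    extraction process would use to choose a target file.
    """
    root = ET.fromstring(xml_text)
    files = {}
    for elem in root.iter("sourcecode"):
        name = elem.get("name")
        if name is not None:
            files[name] = elem.text or ""
    return files

print(extract_named_sourcecode(RFCXML))  # {'example.cddl': 'foo = uint'}
```

The name attribute never needs to appear in a rendering for this to work, which is exactly what makes it a hidden semantic.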
RFCXML currently has three rendering targets offered by the RFC Editor: TXT, HTML, and PDF. HTML and PDF are typographic renderings; TXT is a typewriter rendering. The IETF datatracker additionally offers a fourth and a fifth rendering target, which use HTML or PDF, respectively, but try to emulate the TXT rendering while doing so.¶
Some semantics is hidden in some of these renderings, but not in others. E.g., the <tt> element serves to identify text as different from normal running text, semantically similar to the way <sourcecode> and <artwork> do, but syntactically more like <em> or <strong>. Since xml2rfc 3.10.0, the semantics of <tt> are suppressed in TXT renderings, which leads to problems: not just the middle-of-the-river semantic change, but also, for new documents, the inability to express certain semantics in a way that is recognizable in TXT but not distracting in the typographic renderings. A recent poll on whether <em> should also be suppressed in TXT ended with a negative result.¶
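The effect of suppressing an element in TXT can be sketched with a toy renderer. The markers chosen here (underscores for <em>, asterisks for <strong>, double quotes for <tt>) are assumptions for illustration only; they are not claimed to be what any xml2rfc version emits. The point is that once an element's marker is suppressed, the distinction it carried can no longer be recovered from the TXT output.

```python
import xml.etree.ElementTree as ET

# Hypothetical TXT markers; assumptions for illustration, not the markers
# any xml2rfc version actually emits.
TXT_MARKERS = {"em": ("_", "_"), "strong": ("*", "*"), "tt": ('"', '"')}

def render_txt(xml_text, suppress=frozenset()):
    """Flatten one <t> paragraph with inline children into typewriter text.

    Elements named in `suppress` are rendered without markers, so the
    distinction they carry disappears from the output.
    """
    root = ET.fromstring(xml_text)
    parts = [root.text or ""]
    for child in root:
        inner = "".join(child.itertext())
        if child.tag in TXT_MARKERS and child.tag not in suppress:
            start, end = TXT_MARKERS[child.tag]
            inner = start + inner + end
        parts.append(inner)
        parts.append(child.tail or "")
    return "".join(parts)

para = "<t>Set <tt>mode</tt> to <em>strict</em>.</t>"
print(render_txt(para))                   # Set "mode" to _strict_.
print(render_txt(para, suppress={"tt"}))  # Set mode to _strict_.
```

In the second output, "mode" is indistinguishable from running text, although the typographic renderings would still show it differently.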
Note that suppression of certain semantics in certain rendering targets is fine if the semantics is ancillary. Different documents differ in their usage of certain markup semantics, and even different authors of the same document may disagree whether some usage is ancillary or essential (i.e., of semantic intent, conveying meaning): From an author's view, usage of specific markup can be for aesthetic purposes, it can increase ease of use of the document, it can help prevent a misunderstanding (which can have very different levels of likelihood to occur), or it can be essential.¶
RFCs are instances of RFCXML, specifically the publishing subset of RFCXML. As of today, these instances are immutable. Format evolution may call for a way to evolve the instances along with an evolved format specification.¶
Most RFCs are the result of a consensus process: either full IETF consensus or just the review, by a smaller group, of whether the document should be published (IAB, IRTF RG, ISE review).¶
This consensus is almost exclusively formed by review processes that involve reviewing renderings, only very rarely by looking at the RFCXML instance itself. These review processes are often extremely expensive, as they involve contributions from sought-after experts in the field. Their output constitutes much of the value of the RFC series.¶
During the review processes, the document instance is not an RFC. Specifically, the authoring subset of RFCXML is used, which has slightly different characteristics from the publishing subset. As mentioned, we sometimes also use different renderers during the authoring/reviewing process (e.g., datatracker's distinct HTML/PDF renderings), reducing the congruence of the reviewed document with what its users will see.¶
The definition of RFCXML will evolve, by adding functionality or by taking out of service elements and attributes that have been obsoleted in some way (sometimes called deprecating, but see below).¶
This is relatively straightforward for new documents.¶
Documents that have been in the authoring process and have already received expensive review generally need a transition strategy, such as translation from the format defined by an older RFCXML specification to a newer one. This transition often needs to be synchronized with tool development more than with consensus processes on the format itself, which can give tools a de-facto normative role.¶
Documents that have already been published cannot benefit from format evolution as long as their XML instances are immutable. This can be accommodated by keeping RFCXML able to process published documents: just those, not the entirety of potential instances of a previous RFCXML specification. This support would be tagged as being for backwards compatibility only. (Backwards compatibility for documents in the authoring/reviewing stage would reduce disruption.)¶
The corpus of published RFCXML-form documents is large enough that any translation process to a new RFCXML specification needs to be automated. Such automated processes can then also be made available for authoring/reviewing (xml2rfc's --v2v3 process is a well-executed example of that) or can be focused on just the finite set of documents published to a previous RFCXML specification.¶
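A minimal sketch of one automated translation step, in the spirit of (but far simpler than) xml2rfc's --v2v3 process: the RFCXML v2 <spanx> element is rewritten into the v3 inline elements <em>, <strong>, and <tt>. The style mapping below is an assumption covering only these three styles; anything else, including a missing style attribute, is left untouched and reported for human review. The "custom" style in the usage line is invented for the example.

```python
import xml.etree.ElementTree as ET

# Assumed mapping for v2 <spanx style=...>; covers only three styles and is
# not the complete logic of xml2rfc's --v2v3 process.
SPANX_TO_V3 = {"emph": "em", "strong": "strong", "verb": "tt"}

def translate_spanx(xml_text):
    """Rewrite <spanx> elements; return (new_xml, unhandled_styles)."""
    root = ET.fromstring(xml_text)
    unhandled = []
    # Materialize the list first so renaming tags cannot disturb iteration.
    for elem in list(root.iter("spanx")):
        style = elem.get("style")
        if style in SPANX_TO_V3:
            elem.tag = SPANX_TO_V3[style]
            del elem.attrib["style"]
        else:
            # Unknown or missing style: keep as-is for manual review.
            unhandled.append(style)
    return ET.tostring(root, encoding="unicode"), unhandled

v2 = ('<t>Set <spanx style="verb">mode</spanx> to '
      '<spanx style="emph">strict</spanx>, not '
      '<spanx style="custom">lax</spanx>.</t>')
out, unhandled = translate_spanx(v2)
print(out)
print(unhandled)
```

Reporting the leftovers rather than guessing keeps the automated part mechanical and leaves the judgment calls to review.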
A format change can affect the Syntax (grammar, other syntactic details not captured in the grammar), the Semantics, and/or the Rendering (possibly hiding some information in some renderings).¶
A term that was used in a non-standard way in the creation of RFCXMLv3 is "deprecation". In the RFC 799x series, it means that the deprecated feature is no longer available for publishing. It is still available during authoring/reviewing, with the understanding that these processes provide a way to perform a reviewed manual translation, or at least to review an automated translation.¶
Here, backwards compatibility often can only be ascertained by manual review: it is not sufficient that the new system does not crash on the old data; the old data MUST be useful in the sense that it would survive the same review processes. (These are generally too expensive to be redone just for an RFCXML format change.)¶
A non-backwards-compatible (NBC) change to the RFCXML format can have detectable impact on a document, e.g., by now failing its validation. Or the impact can be non-detectable, i.e., requiring human review to detect, such as a semantic change that creates a different rendering that (potentially) has a different meaning.¶
A semantic refinement allows instances of the updated RFCXML specification to express more detailed information than previously possible. E.g., the <em> element could be split into usages for term definitions, true emphasis, and other usages of italic type. It could carry hints as to how to emulate it in typewriter renderings.¶
A semantic refinement can be done in a roughly backwards-compatible way, by retaining the unrefined alternative (e.g., <em>). Giving that alternative more limited semantics (e.g., by adding an attribute with a default value) is no longer truly backwards-compatible, as it is a (usually hidden!) semantic change. Retaining the unrefined alternative without "deprecating" it will require some will-power, but many documents may not need the specific refinement (e.g., the one proposed in the example) and would be well served by retaining the unrefined alternative.¶
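The difference between retaining the unrefined alternative and giving it a default value can be sketched as follows. The usage attribute on <em>, its values, and the default are hypothetical, invented only for this illustration. The old instance is read without error in both cases, which is why the change is hidden: with the default, the term definition "widget" is silently reinterpreted as emphasis.

```python
import xml.etree.ElementTree as ET

# Hypothetical refinement: a "usage" attribute on <em> with values such as
# "term" or "emphasis". Neither the attribute nor its values exist in RFCXML.
ASSUMED_DEFAULT = "emphasis"

def em_usages(xml_text, default=None):
    """List (text, usage) for each <em>; a missing attribute gets `default`."""
    root = ET.fromstring(xml_text)
    return [("".join(e.itertext()), e.get("usage", default))
            for e in root.iter("em")]

# Written before the refinement existed: "widget" is meant as a term
# definition, "not" as true emphasis.
old = "<t>A <em>widget</em> is <em>not</em> a gadget.</t>"

# Unrefined alternative retained: no usage is claimed for either <em>.
print(em_usages(old))
# Default added: both are now read as emphasis, with no error raised.
print(em_usages(old, ASSUMED_DEFAULT))
```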
If there is a need to translate RFC instances to new format specifications, they are no longer immutable (and/or their names need to be augmented by a revision indicator, possibly with a way added to obtain the most recent revision).¶
Opening up mutability provides an opportunity to correct errors in the originally published document, such as those reported in errata.¶
Such an instance update can also be used to replace markup that is now deprecated (in the English sense) with modern markup.¶
An example of a detectable NBC change would be to allow only digits, with single spaces between them, in the updates attribute of <rfc>. Correcting this in the now-failing instances would probably be done by manual intervention, as the number of instances is too small to justify automation.¶
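A detectable NBC change of this kind amounts to a stricter validation rule. The minimal sketch below assumes the rule is exactly "digits, with single spaces between numbers" and checks candidate values of the updates attribute; values that fail are the instances that would be flagged for manual correction. The sample values are invented.

```python
import re

# Hypothetical stricter rule: numbers separated by single spaces only.
STRICT_UPDATES = re.compile(r"[0-9]+(?: [0-9]+)*")

def updates_ok(value):
    """Return True if an updates attribute value meets the stricter rule."""
    return STRICT_UPDATES.fullmatch(value) is not None

# Invented sample values; the failing ones would need manual correction.
samples = ["2119 8174", "2119, 8174", "2119  8174", "RFC2119"]
failing = [v for v in samples if not updates_ok(v)]
print(failing)  # ['2119, 8174', '2119  8174', 'RFC2119']
```

Because a validator reports every failing instance, this change is detectable in the sense used above, unlike a semantic change that only a human reviewer would notice.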
This document has no IANA actions.¶
TBD¶