Internet-Draft AECN July 2023
Workgroup: Congestion Control Working Group
Internet-Draft: draft-shi-ccwg-advanced-ecn-00
Published: July 2023
Intended Status: Standards Track
Expires: 11 January 2024
Authors: H. Shi, Ed. (Huawei)
         T. Zhou (Huawei)

Advanced Explicit Congestion Notification

Abstract

This document proposes the Advanced Explicit Congestion Notification (AECN) mechanism, which enables a host to obtain congestion information about the bottleneck of a path. The sender sets a congestion information collection command in the packet header, instructing each network device to update the congestion information field per hop. The receiver carries the updated congestion information back to the sender in the ACK. The sender then uses this richer congestion information for congestion control.

Discussion Venues

This note is to be removed before publishing as an RFC.

Discussion of this document takes place on the Congestion Control Working Group mailing list ([email protected]), which is archived at https://mailarchive.ietf.org/arch/browse/ccwg/.

Source for this draft and an issue tracker can be found at https://github.com/VMatrix1900/draft-ccwg-advanced-ecn.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 11 January 2024.

1. Introduction

Traditionally, congestion control has relied on implicit congestion detection: hosts infer congestion primarily from packet loss or variations in round-trip time. Explicit Congestion Notification (ECN) is a substantial improvement because it allows network devices to signal congestion explicitly to the endpoints before packet loss occurs. Low Latency, Low Loss, Scalable throughput (L4S) uses ECN markings to keep queuing delay low and avoid bufferbloat. However, ECN conveys only a single bit of information, which limits the granularity of the congestion information it can carry. L4S's need for more detailed congestion signals calls for an enhanced use of ECN, which could employ additional bits to represent congestion levels more precisely and give better control over delay and throughput in contemporary networks.

HPCC [I-D.draft-an-ccwg-hpcc] obtains richer congestion signals from the network by using in-band telemetry, which gathers detailed load information from each switch the packet traverses. This allows HPCC to make better-informed congestion control decisions and to converge quickly. However, HPCC uses an append mode for in-band telemetry: as the packet traverses the network, it accumulates data from each switch, so the packet grows at every hop. This growth can cause the packet to exceed the Maximum Transmission Unit (MTU), which makes the approach unsuitable for the Internet. Another caveat is that each sender needs to repeat the computation to obtain the bottleneck information, even when several senders share the same path.

This document defines Advanced ECN, which expands the 1-bit congestion notification to multiple bits and enables network devices to update the congestion information per hop. When the packet arrives at the receiver, the congestion information field reflects the congestion status of the path. Offloading the congestion information calculation to the network devices reduces the computing burden on the endpoints.

1.1. Terminology

  • ECN: Explicit Congestion Notification
  • AECN: Advanced Explicit Congestion Notification
  • HPCC: High Precision Congestion Control [I-D.draft-an-ccwg-hpcc]
  • DRE: Discounting Rate Estimator [CONGA]

1.2. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

2. Overview

Figure 1 shows an overview of the AECN procedure. First, the sender MUST mark the packet with an AECN Command and an initial Congestion Info (together called the AECN header, see Section 3). The AECN Command specifies what kind of congestion information the endpoint intends to collect from network devices. As the packet traverses the network, each router MUST update the Congestion Info field based on the AECN Command and the router's local load condition. When the packet reaches the receiver, the updated congestion information is extracted and communicated back to the sender, typically using the transport protocol's acknowledgment mechanism. The sender then uses this congestion information, which reflects the path the packet took, to adjust its sending rate.

              pkt+                     pkt+                     pkt+
         AECN Command+            AECN Command+            AECN Command+
+------+Congestion Info0+-------+Congestion Info1+-------+Congestion Info2+--------+
|Sender|===============>|Router1|===============>|Router2|===============>|Receiver|
+------+     Link-1     +-------+     Link-2     +-------+     Link-3     +--------+
  /|\                                                                         |
   |                                                                          |
   +--------------------------------------------------------------------------+
                                        ACKs
Figure 1: Overview of Advanced ECN
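
The procedure in Figure 1 can be sketched as a small simulation. This is only an illustration under stated assumptions, not part of the specification: the names AecnHeader, router_update and send_over_path are hypothetical, the AECN Command is modeled simply as the name of a combining operation, and load values are plain integers.

```python
from dataclasses import dataclass

# Hypothetical model of the flow in Figure 1. All names and the value
# representation are illustrative assumptions, not defined by this document.


@dataclass
class AecnHeader:
    command: str          # combining operation routers apply: "max" or "add"
    congestion_info: int  # set by the sender, updated at every hop


def router_update(header: AecnHeader, local_load: int) -> None:
    """Update the Congestion Info field in place from the router's local load."""
    if header.command == "max":
        header.congestion_info = max(header.congestion_info, local_load)
    elif header.command == "add":
        header.congestion_info += local_load
    else:
        raise ValueError(f"unknown AECN command: {header.command}")


def send_over_path(command: str, router_loads: list[int]) -> int:
    """Return the Congestion Info the sender sees in the ACK."""
    header = AecnHeader(command=command, congestion_info=0)  # sender marks packet
    for load in router_loads:                                # Router1, Router2, ...
        router_update(header, load)
    ack_congestion_info = header.congestion_info             # receiver echoes in ACK
    return ack_congestion_info


if __name__ == "__main__":
    print(send_over_path("max", [30, 75]))  # bottleneck-style metric
    print(send_over_path("add", [30, 75]))  # cumulative metric
```

With a "max" command the sender learns the load of the most congested hop; with an "add" command it learns a path-wide total.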

3. AECN header format and encapsulation

Figure 2 shows the format of the AECN header. The AECN header SHOULD be encapsulated in an IPv6 extension header [RFC8200], such as the Segment Routing Header (SRH) or the Hop-by-Hop Options Header.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Flags     |           Congestion Info Type                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     Congestion Info Data                      |
~                            ....                               ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 2: AECN header format

where:

Flags: An 8-bit field. Bit 7 of Flags, when set, indicates that the Congestion Info is customized and used only in a limited domain, such as a data center network. If Bit 7 is 0, the Congestion Info Type is a bitmap. The other bits are reserved.

Congestion Info Type: A 24-bit bitmap that specifies which Congestion Info Data items are present. The supported Congestion Info Data items are listed in Table 1. Note that multiple Congestion Info Data items can coexist in one packet.

Congestion Info Data: A variable-length field that carries the congestion information data. Each router MUST update this field based on its local load status.

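Table 1: Congestion Info Data
+-----+-------------------------+--------+-----------+
| Bit | Congestion Info Data    | Length | Operation |
+-----+-------------------------+--------+-----------+
| 0   | Inflight Ratio          | 8      | Max       |
| 1   | DRE                     | 8      | Max       |
| 2   | Queue Utilization Ratio | 8      | Max       |
| 3   | Queue Delay             | 8      | Add       |
| 4   | Congested Hops          | 8      | Add       |
+-----+-------------------------+--------+-----------+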
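
One possible reading of Figure 2 and Table 1 is sketched below. Several details are assumptions made only for illustration, since this document does not state them: bit 0 is taken to be the most significant bit of the 24-bit Congestion Info Type, Length 8 is read as 8 bits (one octet per item), items are laid out in increasing bit order, and "Add" saturates at 255. The function names are hypothetical, and alignment or padding inside the enclosing IPv6 extension header is not modeled.

```python
import struct

# Table 1 as a lookup: bit position -> (name, combining operation).
# ASSUMPTIONS (not stated in this document): bit 0 is the most significant
# bit of the 24-bit Congestion Info Type, each item occupies one octet,
# items follow in increasing bit order, and Add saturates at 255.
TABLE_1 = {
    0: ("Inflight Ratio", max),
    1: ("DRE", max),
    2: ("Queue Utilization Ratio", max),
    3: ("Queue Delay", lambda old, new: min(old + new, 255)),
    4: ("Congested Hops", lambda old, new: min(old + new, 255)),
}


def pack_header(flags: int, items: dict[int, int]) -> bytes:
    """Build an AECN header from a {bit: value} mapping."""
    type_bitmap = 0
    for bit in items:
        type_bitmap |= 1 << (23 - bit)
    first_word = (flags << 24) | type_bitmap  # Flags (8) + Congestion Info Type (24)
    data = bytes(items[bit] for bit in sorted(items))
    return struct.pack("!I", first_word) + data


def unpack_header(header: bytes) -> tuple[int, dict[int, int]]:
    """Return (flags, {bit: value}) parsed from an AECN header."""
    (first_word,) = struct.unpack("!I", header[:4])
    flags = first_word >> 24
    type_bitmap = first_word & 0xFFFFFF
    bits = [b for b in range(24) if type_bitmap & (1 << (23 - b))]
    return flags, {bit: header[4 + i] for i, bit in enumerate(bits)}


def router_update(header: bytes, local: dict[int, int]) -> bytes:
    """Apply the Table 1 operation for every requested item this router measures."""
    flags, items = unpack_header(header)
    for bit, carried in items.items():
        if bit in local:
            _, combine = TABLE_1[bit]
            items[bit] = combine(carried, local[bit])
    return pack_header(flags, items)


if __name__ == "__main__":
    hdr = pack_header(0, {0: 0, 3: 0})  # request Inflight Ratio and Queue Delay
    hdr = router_update(hdr, {0: 120, 3: 5})
    hdr = router_update(hdr, {0: 80, 3: 7})
    print(unpack_header(hdr), len(hdr))
```

In this sketch the header length is fixed by the bitmap chosen at the sender, so it does not change as routers update the values.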

4. Example: HPCC with AECN

HPCC calculates the inflight ratio of each link, which represents the utilization of that link, from the raw load information collected through in-band telemetry (INT). It then identifies the maximum inflight ratio along the path and uses it to adjust the sending rate. The inflight ratio of each link is calculated as follows:

txRate = (txBytes_1 - txBytes_2)/(t_1-t_2)
inflight ratio = qlen/(B*T) + txRate/B

where:

txBytes_i: total bytes transmitted on the link, sampled at timestamp t_i

qlen: link queue length

B: link bandwidth

T: baseline RTT
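
As a worked example of the two formulas above, the following sketch computes the inflight ratio of a single link. The function name and the sample numbers are made up for illustration; the units are an assumption as well (bytes and microseconds, so a 10 Gbit/s link has B = 1250 bytes per microsecond).

```python
def inflight_ratio(tx_bytes_1, t_1, tx_bytes_2, t_2, qlen, bandwidth, base_rtt):
    """Inflight ratio of one link, following the formulas above.

    Any consistent set of units works; this example assumes bytes for
    tx_bytes and qlen, microseconds for t and base_rtt, and bytes per
    microsecond for bandwidth.
    """
    tx_rate = (tx_bytes_1 - tx_bytes_2) / (t_1 - t_2)
    return qlen / (bandwidth * base_rtt) + tx_rate / bandwidth


if __name__ == "__main__":
    ratio = inflight_ratio(
        tx_bytes_1=1_625_000, t_1=2_000,  # later sample
        tx_bytes_2=1_000_000, t_2=1_000,  # earlier sample
        qlen=2_500,
        bandwidth=1_250,                  # 10 Gbit/s in bytes per microsecond
        base_rtt=8,
    )
    print(ratio)  # 0.25 (queue term) + 0.5 (rate term) = 0.75
```

A ratio above 1 would mean the link carries more inflight data than its bandwidth-delay product can absorb.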

With AECN, the routers participate in calculating the maximum inflight ratio. Each router MUST calculate the inflight ratio of its downstream link, compare it with the value carried in the AECN header, and keep the larger of the two. When the packet arrives at the endpoint, the Congestion Info field of the AECN header already contains the maximum inflight ratio along the path. The sending rate adjustment algorithm remains unchanged. Because the routers perform these calculations, the computing overhead at the endpoint is reduced. Because the value is updated in place, the packet size remains unchanged regardless of the hop count.
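
The difference between the two approaches can be illustrated with a short sketch. It assumes per-hop inflight ratios are already available as floating-point numbers and ignores how a ratio would be encoded into the Congestion Info field, which this document does not define. Both function names are hypothetical.

```python
def endpoint_max_append_mode(per_hop_ratios):
    """Append mode: one record is added per hop and the endpoint reduces them.

    Returns (maximum ratio, number of records carried at the receiver).
    """
    records = []
    for ratio in per_hop_ratios:
        records.append(ratio)  # the packet grows by one record at every hop
    return max(records), len(records)


def in_network_max_aecn(per_hop_ratios, initial=0.0):
    """AECN: each router keeps the larger of its own ratio and the carried one.

    Returns (maximum ratio, number of fields carried at the receiver).
    """
    carried = initial
    for ratio in per_hop_ratios:
        carried = max(carried, ratio)  # in-place update of a single field
    return carried, 1


if __name__ == "__main__":
    path = [0.40, 0.95, 0.70]
    print(endpoint_max_append_mode(path))
    print(in_network_max_aecn(path))
```

Both functions yield the same maximum, but the in-place variant carries a single field no matter how many hops the path has.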

5. Security Considerations

TBD.

6. IANA Considerations

TBD.

7. References

7.1. Normative References

[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, <https://www.rfc-editor.org/rfc/rfc2119>.
[RFC8174]
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, <https://www.rfc-editor.org/rfc/rfc8174>.
[RFC8200]
Deering, S. and R. Hinden, "Internet Protocol, Version 6 (IPv6) Specification", STD 86, RFC 8200, DOI 10.17487/RFC8200, <https://www.rfc-editor.org/rfc/rfc8200>.

7.2. Informative References

[CONGA]
Alizadeh, M., Edsall, T., Dharmapurikar, S., Vaidyanathan, R., Chu, K., Fingerhut, A., Lam, V., Matus, F., Pan, R., Yadav, N., and G. Varghese, "CONGA: distributed congestion-aware load balancing for datacenters", Proceedings of the 2014 ACM conference on SIGCOMM, DOI 10.1145/2619239, <https://doi.org/10.1145/2619239>.
[I-D.draft-an-ccwg-hpcc]
An, Q., Gao, J., Anubolu, S., Pan, R., Lee, J., Gafni, B., Shpigelman, Y., Tantsura, J., and G. Caspary, "HPCC++: Enhanced High Precision Congestion Control", Work in Progress, Internet-Draft, draft-an-ccwg-hpcc-00, <https://datatracker.ietf.org/doc/html/draft-an-ccwg-hpcc-00>.

Authors' Addresses

Hang Shi (editor)
Huawei
Beijing
China
Tianran Zhou
Huawei