TCP Congestion Window Validation
If the congestion window has not been fully used within one RTO, the TCP sender reduces it: a window that went unused may no longer reflect the current network conditions. Per RFC 2861, ssthresh is set to the maximum of its current value and 3/4 of the congestion window, while the congestion window is set to half the sum of the amount actually used and the current congestion window value.
In the send function tcp_write_xmit below, if any packets were actually sent (sent_pkts is non-zero), tcp_cwnd_validate is called afterwards to validate the congestion window. Its parameter is_cwnd_limited indicates whether sending was limited by the congestion window; it is determined by two sources ORed together: the assignment made inside tcp_tso_should_defer, and the check of whether the number of packets in flight has reached the congestion window.
static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
			   int push_one, gfp_t gfp)
{
	max_segs = tcp_tso_segs(sk, mss_now);
	while ((skb = tcp_send_head(sk))) {
		...
		tso_segs = tcp_init_tso_segs(skb, mss_now);
		if (tso_segs == 1) {
			...
		} else {
			if (!push_one &&
			    tcp_tso_should_defer(sk, skb, &is_cwnd_limited,
						 &is_rwnd_limited, max_segs))
				break;
		}
		...
	}
	...
	if (likely(sent_pkts)) {
		...
		is_cwnd_limited |= (tcp_packets_in_flight(tp) >= tp->snd_cwnd);
		tcp_cwnd_validate(sk, is_cwnd_limited);
		return false;
	}
The tcp_tso_should_defer function below is skipped when pushing a single packet (push_one). If the remaining congestion window is smaller than the send window, and is no larger than the packet length, the packet cannot be sent now, so the cwnd-limited flag is_cwnd_limited is set.
static bool tcp_tso_should_defer(struct sock *sk, struct sk_buff *skb,
				 bool *is_cwnd_limited, bool *is_rwnd_limited,
				 u32 max_segs)
{
	send_win = tcp_wnd_end(tp) - TCP_SKB_CB(skb)->seq;

	/* From in_flight test above, we know that cwnd > in_flight. */
	cong_win = (tp->snd_cwnd - in_flight) * tp->mss_cache;
	...
	/* Ok, it looks like it is advisable to defer.
	 * Three cases are tracked :
	 * 1) We are cwnd-limited
	 * 2) We are rwnd-limited
	 * 3) We are application limited.
	 */
	if (cong_win < send_win) {
		if (cong_win <= skb->len) {
			*is_cwnd_limited = true;
			return true;
		}
	} else {
		...
	}
	...
}
In the congestion window validation function below, on the first call max_packets_out and max_packets_seq are both still unset, and are initialized to packets_out and SND.NXT respectively. On subsequent calls they are updated only when a new send window period has begun (snd_una has reached max_packets_seq) or when more packets are outstanding than the recorded value. Thus max_packets_out records the maximum number of packets outstanding during the previous window, and max_packets_seq records the highest sequence number sent.
static void tcp_cwnd_validate(struct sock *sk, bool is_cwnd_limited)
{
	const struct tcp_congestion_ops *ca_ops = inet_csk(sk)->icsk_ca_ops;
	struct tcp_sock *tp = tcp_sk(sk);

	/* Track the maximum number of outstanding packets in each
	 * window, and remember whether we were cwnd-limited then.
	 */
	if (!before(tp->snd_una, tp->max_packets_seq) ||
	    tp->packets_out > tp->max_packets_out) {
		tp->max_packets_out = tp->packets_out;
		tp->max_packets_seq = tp->snd_nxt;
		tp->is_cwnd_limited = is_cwnd_limited;
	}
The parameter is_cwnd_limited records whether the previous send window period was limited by the congestion window. The function tcp_is_cwnd_limited checks whether the connection's sending is limited by the congestion window: true means sending is using all available network capacity; false means there is spare capacity.
In the latter case, the current number of packets in the network is recorded in snd_cwnd_used. If the kernel is configured to reset the congestion window after an idle period longer than the RTO (tcp_slow_start_after_idle is enabled), the idle time is at least the RTO, and the congestion control algorithm does not define its own cong_control handler, tcp_cwnd_application_limited is called to handle the application-limited case.
	if (tcp_is_cwnd_limited(sk)) {
		/* Network is feed fully. */
		tp->snd_cwnd_used = 0;
		tp->snd_cwnd_stamp = tcp_jiffies32;
	} else {
		/* Network starves. */
		if (tp->packets_out > tp->snd_cwnd_used)
			tp->snd_cwnd_used = tp->packets_out;

		if (sock_net(sk)->ipv4.sysctl_tcp_slow_start_after_idle &&
		    (s32)(tcp_jiffies32 - tp->snd_cwnd_stamp) >= inet_csk(sk)->icsk_rto &&
		    !ca_ops->cong_control)
			tcp_cwnd_application_limited(sk);
The following checks whether the starvation is caused by an insufficient send buffer: the congestion window has already been ruled out (this is the else branch), the send queue is empty, and the application has hit the buffer limit (SOCK_NOSPACE). If so, the TCP_CHRONO_SNDBUF_LIMITED flag is recorded.
		/* The following conditions together indicate the starvation
		 * is caused by insufficient sender buffer:
		 * 1) just sent some data (see tcp_write_xmit)
		 * 2) not cwnd limited (this else condition)
		 * 3) no more data to send (tcp_write_queue_empty())
		 * 4) application is hitting buffer limit (SOCK_NOSPACE)
		 */
		if (tcp_write_queue_empty(sk) && sk->sk_socket &&
		    test_bit(SOCK_NOSPACE, &sk->sk_socket->flags) &&
		    (1 << sk->sk_state) & (TCPF_ESTABLISHED | TCPF_CLOSE_WAIT))
			tcp_chrono_start(sk, TCP_CHRONO_SNDBUF_LIMITED);
	}
}
The kernel's notion of cwnd-limited differs slightly from RFC 2861. The RFC suggests that cwnd should not be increased if it was not fully used, which is exactly what the kernel does in congestion avoidance. In slow start, however, the kernel allows the congestion window to grow to twice the amount actually used. See the comments around tcp_is_cwnd_limited: with an initial window of 10, after sending 9 packets, once they are all ACKed the window may grow to 18. This lets rate-limited applications probe the network bandwidth more aggressively.
/* We follow the spirit of RFC2861 to validate cwnd but implement a more
* flexible approach. The RFC suggests cwnd should not be raised unless
* it was fully used previously. And that's exactly what we do in
* congestion avoidance mode. But in slow start we allow cwnd to grow
* as long as the application has used half the cwnd.
* Example :
* cwnd is 10 (IW10), but application sends 9 frames.
* We allow cwnd to reach 18 when all frames are ACKed.
* This check is safe because it's as aggressive as slow start which already
* risks 100% overshoot. The advantage is that we discourage application to
* either send more filler packets or data to artificially blow up the cwnd
* usage, and allow application-limited process to probe bw more aggressively.
*/
static inline bool tcp_is_cwnd_limited(const struct sock *sk)
{
	const struct tcp_sock *tp = tcp_sk(sk);

	/* If in slow start, ensure cwnd grows to twice what was ACKed. */
	if (tcp_in_slow_start(tp))
		return tp->snd_cwnd < 2 * tp->max_packets_out;

	return tp->is_cwnd_limited;
}
The function tcp_cwnd_application_limited below adjusts the congestion window after the network has been idle (underused) for an RTO. The adjustment is skipped during retransmission phases and when the application has recently hit its send buffer limit. It first computes the window usage as the larger of the initial window and the snd_cwnd_used value recorded in tcp_cwnd_validate; the congestion window is then set to half the sum of the old window and the usage.
/* RFC2861, slow part. Adjust cwnd, after it was not full during one rto.
* As additional protections, we do not touch cwnd in retransmission phases,
* and if application hit its sndbuf limit recently.
*/
static void tcp_cwnd_application_limited(struct sock *sk)
{
	struct tcp_sock *tp = tcp_sk(sk);

	if (inet_csk(sk)->icsk_ca_state == TCP_CA_Open &&
	    sk->sk_socket && !test_bit(SOCK_NOSPACE, &sk->sk_socket->flags)) {
		/* Limited by application or receiver window. */
		u32 init_win = tcp_init_cwnd(tp, __sk_dst_get(sk));
		u32 win_used = max(tp->snd_cwnd_used, init_win);

		if (win_used < tp->snd_cwnd) {
			tp->snd_ssthresh = tcp_current_ssthresh(sk);
			tp->snd_cwnd = (tp->snd_cwnd + win_used) >> 1;
		}
		tp->snd_cwnd_used = 0;
	}
	tp->snd_cwnd_stamp = tcp_jiffies32;
}
Kernel version: 5.0