
TCP Congestion Window Validation

程序员文章站 2022-07-13 18:06:01

If the congestion window has not been fully used within one RTO, the TCP sender reduces it: the current window may no longer reflect actual network conditions, so it should be decayed. Per RFC 2861, ssthresh is set to the maximum of its current value and 3/4 of the congestion window, while the congestion window itself is set to half of the sum of the amount actually used and the current window value.

In the send function tcp_write_xmit below, if packets were actually transmitted, i.e. sent_pkts is non-zero, tcp_cwnd_validate is called after sending to validate the congestion window. Its parameter is_cwnd_limited indicates whether transmission was limited by the congestion window. It is determined in two places: the assignment inside tcp_tso_should_defer, and the check of whether the number of packets in flight has reached the congestion window; the two results are OR-ed together.

static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle, int push_one, gfp_t gfp)
{
    max_segs = tcp_tso_segs(sk, mss_now);
    while ((skb = tcp_send_head(sk))) {
        ...
        tso_segs = tcp_init_tso_segs(skb, mss_now);

        if (tso_segs == 1) {
            ...
        } else {
            if (!push_one &&
                tcp_tso_should_defer(sk, skb, &is_cwnd_limited,
                         &is_rwnd_limited, max_segs))
                break;
        }
        ...
    }
    ...
    if (likely(sent_pkts)) {
        ...
        is_cwnd_limited |= (tcp_packets_in_flight(tp) >= tp->snd_cwnd);
        tcp_cwnd_validate(sk, is_cwnd_limited);
        return false;
    }

The tcp_tso_should_defer function below is not executed when sending a single segment. If the congestion window is smaller than the send window and is also no larger than the packet length, the current packet cannot be sent, and the cwnd-limited flag is_cwnd_limited is set.

static bool tcp_tso_should_defer(struct sock *sk, struct sk_buff *skb,
                 bool *is_cwnd_limited, bool *is_rwnd_limited, u32 max_segs)
{
    send_win = tcp_wnd_end(tp) - TCP_SKB_CB(skb)->seq;

    /* From in_flight test above, we know that cwnd > in_flight.  */
    cong_win = (tp->snd_cwnd - in_flight) * tp->mss_cache;

    ...
    /* Ok, it looks like it is advisable to defer.
     * Three cases are tracked :
     * 1) We are cwnd-limited
     * 2) We are rwnd-limited
     * 3) We are application limited.
     */
    if (cong_win < send_win) {
        if (cong_win <= skb->len) {
            *is_cwnd_limited = true;
            return true;
        }
    } else {
        ...

The congestion window validation function follows. On its first invocation, max_packets_out and max_packets_seq have not yet been assigned, so they are set to packets_out and SND.NXT respectively. On subsequent invocations they are only updated when a new send window period begins or when more packets than the recorded value have been sent. Thus max_packets_out records the maximum number of packets outstanding in the previous window, while max_packets_seq records the highest transmitted sequence number.

static void tcp_cwnd_validate(struct sock *sk, bool is_cwnd_limited)
{
    const struct tcp_congestion_ops *ca_ops = inet_csk(sk)->icsk_ca_ops;
    struct tcp_sock *tp = tcp_sk(sk);

    /* Track the maximum number of outstanding packets in each
     * window, and remember whether we were cwnd-limited then.
     */
    if (!before(tp->snd_una, tp->max_packets_seq) ||
        tp->packets_out > tp->max_packets_out) {
        tp->max_packets_out = tp->packets_out;
        tp->max_packets_seq = tp->snd_nxt;
        tp->is_cwnd_limited = is_cwnd_limited;
    }

The parameter is_cwnd_limited records whether the previous send window period was limited by the congestion window. The function tcp_is_cwnd_limited determines whether the connection's transmission is bounded by the congestion window: true means the sender is using all available network capacity; false means capacity is being left idle.

In the latter case, the current number of packets in the network is recorded in snd_cwnd_used. If the kernel is configured to reset the congestion window after an idle period longer than an RTO, i.e. tcp_slow_start_after_idle is true, the idle time is at least one RTO, and the congestion control algorithm does not provide its own cong_control handler, tcp_cwnd_application_limited is called to handle the application-limited case.

    if (tcp_is_cwnd_limited(sk)) {
        /* Network is feed fully. */
        tp->snd_cwnd_used = 0;
        tp->snd_cwnd_stamp = tcp_jiffies32;
    } else {
        /* Network starves. */
        if (tp->packets_out > tp->snd_cwnd_used)
            tp->snd_cwnd_used = tp->packets_out;

        if (sock_net(sk)->ipv4.sysctl_tcp_slow_start_after_idle &&
            (s32)(tcp_jiffies32 - tp->snd_cwnd_stamp) >= inet_csk(sk)->icsk_rto &&
            !ca_ops->cong_control)
            tcp_cwnd_application_limited(sk);

The following code checks whether the starvation was caused by an insufficient sender buffer: the congestion window has already been ruled out (this is the else branch), the send queue is empty, and the application has hit the buffer limit (SOCK_NOSPACE); in that case the TCP_CHRONO_SNDBUF_LIMITED chronograph is started.

        /* The following conditions together indicate the starvation
         * is caused by insufficient sender buffer:
         * 1) just sent some data (see tcp_write_xmit)
         * 2) not cwnd limited (this else condition)
         * 3) no more data to send (tcp_write_queue_empty())
         * 4) application is hitting buffer limit (SOCK_NOSPACE)
         */
        if (tcp_write_queue_empty(sk) && sk->sk_socket &&
            test_bit(SOCK_NOSPACE, &sk->sk_socket->flags) &&
            (1 << sk->sk_state) & (TCPF_ESTABLISHED | TCPF_CLOSE_WAIT))
            tcp_chrono_start(sk, TCP_CHRONO_SNDBUF_LIMITED);

The kernel's cwnd-limited test differs slightly from RFC 2861. The RFC suggests that cwnd should not be increased if it was not fully used, which is exactly how the kernel behaves in congestion avoidance. In slow start, however, the kernel allows the congestion window to grow to twice the amount actually used. See the comments around tcp_is_cwnd_limited: with an initial window of 10, after sending 9 packets, the window may grow to 18 once all of them are ACKed. This lets rate-limited applications probe the available bandwidth more aggressively.

/* We follow the spirit of RFC2861 to validate cwnd but implement a more
 * flexible approach. The RFC suggests cwnd should not be raised unless
 * it was fully used previously. And that's exactly what we do in
 * congestion avoidance mode. But in slow start we allow cwnd to grow
 * as long as the application has used half the cwnd.
 * Example :
 *    cwnd is 10 (IW10), but application sends 9 frames.
 *    We allow cwnd to reach 18 when all frames are ACKed.
 * This check is safe because it's as aggressive as slow start which already
 * risks 100% overshoot. The advantage is that we discourage application to
 * either send more filler packets or data to artificially blow up the cwnd
 * usage, and allow application-limited process to probe bw more aggressively.
 */
static inline bool tcp_is_cwnd_limited(const struct sock *sk)
{
    const struct tcp_sock *tp = tcp_sk(sk);

    /* If in slow start, ensure cwnd grows to twice what was ACKed. */
    if (tcp_in_slow_start(tp))
        return tp->snd_cwnd < 2 * tp->max_packets_out;

    return tp->is_cwnd_limited;
}

The function tcp_cwnd_application_limited below adjusts the congestion window after the network has been underutilized (idle) for an RTO. The adjustment is skipped during retransmission phases and when the application has recently hit its send-buffer limit. It first computes the window usage as the larger of the initial window and the snd_cwnd_used value recorded in tcp_cwnd_validate, then sets the congestion window to half the sum of the old window value and the usage value.

/* RFC2861, slow part. Adjust cwnd, after it was not full during one rto.
 * As additional protections, we do not touch cwnd in retransmission phases,
 * and if application hit its sndbuf limit recently.
 */
static void tcp_cwnd_application_limited(struct sock *sk)
{
    struct tcp_sock *tp = tcp_sk(sk);

    if (inet_csk(sk)->icsk_ca_state == TCP_CA_Open &&
        sk->sk_socket && !test_bit(SOCK_NOSPACE, &sk->sk_socket->flags)) {
        /* Limited by application or receiver window. */
        u32 init_win = tcp_init_cwnd(tp, __sk_dst_get(sk));
        u32 win_used = max(tp->snd_cwnd_used, init_win);
        if (win_used < tp->snd_cwnd) {
            tp->snd_ssthresh = tcp_current_ssthresh(sk);
            tp->snd_cwnd = (tp->snd_cwnd + win_used) >> 1;
        }
        tp->snd_cwnd_used = 0;
    }
    tp->snd_cwnd_stamp = tcp_jiffies32;
}

Kernel version: 5.0