A Record of Troubleshooting a Production Memory Problem

Programmer Article Station, 2022-07-15 08:04:40

Tools used: MAT, IDEA

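I. Spotting the problem

One microservice in production was using close to 90% of its memory and could not reclaim it, which caused frequent GC. CPU usage rose from 20% to 40% as a result.

As a stopgap we gave the machine more memory. That eased things: memory usage fell back to 50% and CPU usage dropped as well, but memory was still growing slowly.

II. Locating the problem

Step 1: suspect a memory leak. We took 2 heap dumps of the microservice on 2 consecutive days.

One ConcurrentHashMap was holding an unusually large amount of memory, about half of the entire heap. Looking at the objects inside it showed that its keys were instances of the Statistics class under dubbo.monitor.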

The dubbo source shows that dubbo.monitor keeps statistics for every interface call: which client called which method of which server, and how many times.

public class DubboMonitor implements Monitor {

    private static final Logger logger = LoggerFactory.getLogger(DubboMonitor.class);

    /**
     * The length of the array which is a container of the statistics
     */
    private static final int LENGTH = 10;

    /**
     * The timer for sending statistics
     */
    private final ScheduledExecutorService scheduledExecutorService = Executors.newScheduledThreadPool(3, new NamedThreadFactory("DubboMonitorSendTimer", true));

    /**
     * The future that can cancel the <b>scheduledExecutorService</b>
     */
    private final ScheduledFuture<?> sendFuture;

    private final Invoker<MonitorService> monitorInvoker;

    private final MonitorService monitorService;

    /**
     * The time interval for timer <b>scheduledExecutorService</b> to send data
     */
    private final long monitorInterval;

    private final ConcurrentMap<Statistics, AtomicReference<long[]>> statisticsMap = new ConcurrentHashMap<Statistics, AtomicReference<long[]>>();

    public DubboMonitor(Invoker<MonitorService> monitorInvoker, MonitorService monitorService) {
        this.monitorInvoker = monitorInvoker;
        this.monitorService = monitorService;
        this.monitorInterval = monitorInvoker.getUrl().getPositiveParameter("interval", 60000);
        // collect timer for collecting statistics data
        sendFuture = scheduledExecutorService.scheduleWithFixedDelay(() -> {
            try {
                // collect data
                send();
            } catch (Throwable t) {
                logger.error("Unexpected error occur at send statistic, cause: " + t.getMessage(), t);
            }
        }, monitorInterval, monitorInterval, TimeUnit.MILLISECONDS);
    }

    public void send() {
        logger.debug("Send statistics to monitor " + getUrl());
        String timestamp = String.valueOf(System.currentTimeMillis());
        for (Map.Entry<Statistics, AtomicReference<long[]>> entry : statisticsMap.entrySet()) {
            // get statistics data
            Statistics statistics = entry.getKey();
            AtomicReference<long[]> reference = entry.getValue();
            long[] numbers = reference.get();
            long success = numbers[0];
            long failure = numbers[1];
            long input = numbers[2];
            long output = numbers[3];
            long elapsed = numbers[4];
            long concurrent = numbers[5];
            long maxInput = numbers[6];
            long maxOutput = numbers[7];
            long maxElapsed = numbers[8];
            long maxConcurrent = numbers[9];
            String protocol = getUrl().getParameter(DEFAULT_PROTOCOL);

            // send statistics data
            URL url = statistics.getUrl()
                    .addParameters(MonitorService.TIMESTAMP, timestamp,
                            MonitorService.SUCCESS, String.valueOf(success),
                            MonitorService.FAILURE, String.valueOf(failure),
                            MonitorService.INPUT, String.valueOf(input),
                            MonitorService.OUTPUT, String.valueOf(output),
                            MonitorService.ELAPSED, String.valueOf(elapsed),
                            MonitorService.CONCURRENT, String.valueOf(concurrent),
                            MonitorService.MAX_INPUT, String.valueOf(maxInput),
                            MonitorService.MAX_OUTPUT, String.valueOf(maxOutput),
                            MonitorService.MAX_ELAPSED, String.valueOf(maxElapsed),
                            MonitorService.MAX_CONCURRENT, String.valueOf(maxConcurrent),
                            DEFAULT_PROTOCOL, protocol
                    );
            monitorService.collect(url);

            // reset
            long[] current;
            long[] update = new long[LENGTH];
            do {
                current = reference.get();
                if (current == null) {
                    update[0] = 0;
                    update[1] = 0;
                    update[2] = 0;
                    update[3] = 0;
                    update[4] = 0;
                    update[5] = 0;
                } else {
                    update[0] = current[0] - success;
                    update[1] = current[1] - failure;
                    update[2] = current[2] - input;
                    update[3] = current[3] - output;
                    update[4] = current[4] - elapsed;
                    update[5] = current[5] - concurrent;
                }
            } while (!reference.compareAndSet(current, update));
        }
    }

    @Override
    public void collect(URL url) {
        // data to collect from url
        int success = url.getParameter(MonitorService.SUCCESS, 0);
        int failure = url.getParameter(MonitorService.FAILURE, 0);
        int input = url.getParameter(MonitorService.INPUT, 0);
        int output = url.getParameter(MonitorService.OUTPUT, 0);
        int elapsed = url.getParameter(MonitorService.ELAPSED, 0);
        int concurrent = url.getParameter(MonitorService.CONCURRENT, 0);
        // init atomic reference
        Statistics statistics = new Statistics(url);
        AtomicReference<long[]> reference = statisticsMap.get(statistics);
        if (reference == null) {
            statisticsMap.putIfAbsent(statistics, new AtomicReference<long[]>());
            reference = statisticsMap.get(statistics);
        }
        // use CompareAndSet to sum
        long[] current;
        long[] update = new long[LENGTH];
        do {
            current = reference.get();
            if (current == null) {
                update[0] = success;
                update[1] = failure;
                update[2] = input;
                update[3] = output;
                update[4] = elapsed;
                update[5] = concurrent;
                update[6] = input;
                update[7] = output;
                update[8] = elapsed;
                update[9] = concurrent;
            } else {
                update[0] = current[0] + success;
                update[1] = current[1] + failure;
                update[2] = current[2] + input;
                update[3] = current[3] + output;
                update[4] = current[4] + elapsed;
                update[5] = (current[5] + concurrent) / 2;
                update[6] = current[6] > input ? current[6] : input;
                update[7] = current[7] > output ? current[7] : output;
                update[8] = current[8] > elapsed ? current[8] : elapsed;
                update[9] = current[9] > concurrent ? current[9] : concurrent;
            }
        } while (!reference.compareAndSet(current, update));
    }

    @Override
    public List<URL> lookup(URL query) {
        return monitorService.lookup(query);
    }

    @Override
    public URL getUrl() {
        return monitorInvoker.getUrl();
    }

    @Override
    public boolean isAvailable() {
        return monitorInvoker.isAvailable();
    }

    @Override
    public void destroy() {
        try {
            ExecutorUtil.cancelScheduledFuture(sendFuture);
        } catch (Throwable t) {
            logger.error("Unexpected error occur at cancel sender timer, cause: " + t.getMessage(), t);
        }
        monitorInvoker.destroy();
    }

}

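This looks fine: there is no memory leak, and it is presumably just how dubbo normally works. But using this much heap just to keep these statistics... is dubbo's design unreasonable? Do the heaps of the other microservices look like this too? And why is their memory not growing slowly?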
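The accumulation step in collect() follows a common lock-free pattern. Below is a minimal sketch of that pattern in isolation. It assumes a hypothetical CasCounterSketch class with a plain String key and only two counters instead of dubbo's ten, so it illustrates the idea rather than reproducing dubbo's code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified stand-in for DubboMonitor's statisticsMap:
// one entry per distinct key, each holding [success, failure] counters.
class CasCounterSketch {
    private final ConcurrentMap<String, AtomicReference<long[]>> stats = new ConcurrentHashMap<>();

    // Adds one sample under the given key using a compare-and-set retry loop.
    void collect(String key, long success, long failure) {
        AtomicReference<long[]> ref = stats.computeIfAbsent(key, k -> new AtomicReference<>());
        long[] current;
        long[] update;
        do {
            current = ref.get();
            update = new long[2];
            update[0] = (current == null ? 0 : current[0]) + success;
            update[1] = (current == null ? 0 : current[1]) + failure;
            // Retry if another thread replaced the array in the meantime.
        } while (!ref.compareAndSet(current, update));
    }

    // Returns the accumulated success count for a key, or 0 if the key is unknown.
    long successCount(String key) {
        AtomicReference<long[]> ref = stats.get(key);
        long[] values = ref == null ? null : ref.get();
        return values == null ? 0 : values[0];
    }

    // Number of distinct keys seen so far.
    int entryCount() {
        return stats.size();
    }
}
```

As in the send() method quoted above, nothing here ever removes an entry. Resetting counters keeps the key in the map, so the number of entries is bounded only by the number of distinct keys.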

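Step 2: dump the heaps of 2 other, healthy microservices and compare.

What a surprise! Neither of the other 2 microservices contained a ConcurrentHashMap holding an unusually large amount of memory!

Sorted by memory usage, the only dubbo-related entry was a DubboMonitor. It took up slightly more than the rest, but was nowhere near the same order of magnitude as that ConcurrentHashMap.

Opening the DubboMonitor and looking at its objects showed contents similar to what the ConcurrentHashMap stored: it also recorded dubbo monitoring information.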


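That raised 2 questions:

  1. Why is the structure of the dubbo monitoring data different between the abnormal microservice and the healthy ones?
  2. They all record dubbo monitoring information, all of these microservices are consumer-facing (2C), and the stored structures are similar, so why is the memory usage so different?

For question 1, comparing the code showed that the abnormal microservice uses Apache's dubbo 2.7.3 (the Alibaba/Apache version), while the healthy microservices use dubbo 2.8.4 under the Alibaba package name (the Dangdang version).

dubbo's versioning is rather confusing, so here is a brief overview.
dubbo was originally built by Alibaba and later donated to Apache. At the time of writing, the latest version is 2.7.6.
There is also a branch maintained by Dangdang, under the project name dubbox. Because it was developed on top of Alibaba dubbo, its packages are still named alibaba.dubbo. At the time of writing, its latest version is 2.8.4.

For question 2, a closer look at what they store revealed the difference: in the abnormal microservice, the client IPs in the dubbo monitoring data are all public IPs (presumably end-user IPs), whereas in the healthy microservices they are all private IPs (confirmed with operations to be gateway IPs). The number of user IPs and the number of gateway IPs are obviously not of the same order of magnitude, so the dubbo monitor in the abnormal microservice used a large amount of memory...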
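To make the order-of-magnitude argument concrete, here is a small sketch. It assumes a hypothetical StatKey class that stands in for dubbo's Statistics (the real class carries more fields), and the IP addresses, service name and method name are made up. It only shows that the number of map entries follows the number of distinct client IPs, not the number of calls.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical, simplified key: client IP + service + method.
final class StatKey {
    final String clientIp;
    final String service;
    final String method;

    StatKey(String clientIp, String service, String method) {
        this.clientIp = clientIp;
        this.service = service;
        this.method = method;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof StatKey)) {
            return false;
        }
        StatKey k = (StatKey) o;
        return clientIp.equals(k.clientIp) && service.equals(k.service) && method.equals(k.method);
    }

    @Override
    public int hashCode() {
        return Objects.hash(clientIp, service, method);
    }
}

class KeyCardinalitySketch {
    // Records `calls` invocations spread over `distinctClients` client IPs
    // (at most 65536) and returns how many map entries that produces.
    static int entriesAfter(int calls, int distinctClients) {
        ConcurrentMap<StatKey, long[]> stats = new ConcurrentHashMap<>();
        for (int i = 0; i < calls; i++) {
            int c = i % distinctClients;
            String ip = "10.0." + (c / 256) + "." + (c % 256); // made-up addresses
            stats.computeIfAbsent(new StatKey(ip, "DemoService", "hello"), k -> new long[10])[0]++;
        }
        return stats.size();
    }

    public static void main(String[] args) {
        System.out.println("few gateway IPs: " + entriesAfter(100_000, 4));
        System.out.println("many user IPs:   " + entriesAfter(100_000, 50_000));
    }
}
```

With the same number of calls, a handful of gateway addresses yields a handful of entries, while tens of thousands of user addresses yield tens of thousands of entries, each with its own key object and counter array.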

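Step 3: read the source. Why does one record user IPs while the other records gateway IPs?

Where does the client value come from? It should be either remote_addr or header.x-forwarded-for.

If it is taken from remote_addr it should be the gateway's IP; if it is taken from header.x-forwarded-for it should be the user's real IP.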
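The difference between the two sources can be sketched as follows. This is a simplified illustration, not the code of any particular server: ClientIpSketch, its parameters and the example addresses are all assumptions made for the example. It returns the direct peer address unless forwarded headers are enabled and the peer is a trusted proxy, in which case it walks the X-Forwarded-For list from right to left until it reaches an address that is not a trusted proxy.

```java
import java.util.function.Predicate;

class ClientIpSketch {
    // remoteAddr:         address of the directly connected peer (the gateway, behind a proxy)
    // xForwardedFor:      raw header value such as "user, proxy1, proxy2", or null
    // useForwardHeaders:  whether the header is honoured at all
    // isTrustedProxy:     hypothetical predicate deciding which addresses are our own proxies
    static String resolve(String remoteAddr, String xForwardedFor,
                          boolean useForwardHeaders, Predicate<String> isTrustedProxy) {
        if (!useForwardHeaders || xForwardedFor == null || !isTrustedProxy.test(remoteAddr)) {
            return remoteAddr;
        }
        String[] hops = xForwardedFor.split(",");
        String candidate = remoteAddr;
        // Walk from the hop closest to us towards the original client.
        for (int i = hops.length - 1; i >= 0; i--) {
            String hop = hops[i].trim();
            if (hop.isEmpty()) {
                continue;
            }
            candidate = hop;
            if (!isTrustedProxy.test(hop)) {
                break; // first address that is not one of our proxies
            }
        }
        return candidate;
    }
}
```

With forwarded headers ignored, every request appears to come from the gateway. With them honoured, each end user contributes a distinct address.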

The dubbo source shows that the MonitorFilter in the two versions does indeed follow different flows:

MonitorFilter in the Apache version

public class MonitorFilter extends ListenableFilter {

...

	@Override
    public Result invoke(Invoker<?> invoker, Invocation invocation) throws RpcException {
        if (invoker.getUrl().hasParameter(MONITOR_KEY)) {
            invocation.setAttachment(MONITOR_FILTER_START_TIME, String.valueOf(System.currentTimeMillis()));
            getConcurrent(invoker, invocation).incrementAndGet(); // count up
        }
        return invoker.invoke(invocation); // proceed invocation chain
    }
	
...

	class MonitorListener implements Listener {

        @Override
        public void onResponse(Result result, Invoker<?> invoker, Invocation invocation) {
            if (invoker.getUrl().hasParameter(MONITOR_KEY)) {
                collect(invoker, invocation, result, RpcContext.getContext().getRemoteHost(), Long.valueOf(invocation.getAttachment(MONITOR_FILTER_START_TIME)), false);
                getConcurrent(invoker, invocation).decrementAndGet(); // count down
            }
        }

        @Override
        public void onError(Throwable t, Invoker<?> invoker, Invocation invocation) {
            if (invoker.getUrl().hasParameter(MONITOR_KEY)) {
                collect(invoker, invocation, null, RpcContext.getContext().getRemoteHost(), Long.valueOf(invocation.getAttachment(MONITOR_FILTER_START_TIME)), true);
                getConcurrent(invoker, invocation).decrementAndGet(); // count down
            }
        }

...
	
}

MonitorFilter in the Dangdang version

public class MonitorFilter implements Filter {

...

	public Result invoke(Invoker<?> invoker, Invocation invocation) throws RpcException {
        if (invoker.getUrl().hasParameter(Constants.MONITOR_KEY)) {
            RpcContext context = RpcContext.getContext(); // the provider must obtain the context before invoke()
            long start = System.currentTimeMillis(); // record the start timestamp
            getConcurrent(invoker, invocation).incrementAndGet(); // count up concurrent calls
            try {
                Result result = invoker.invoke(invocation); // let the invocation chain proceed
                collect(invoker, invocation, result, context, start, false);
                return result;
            } catch (RpcException e) {
                collect(invoker, invocation, null, context, start, true);
                throw e;
            } finally {
                getConcurrent(invoker, invocation).decrementAndGet(); // count down concurrent calls
            }
        } else {
            return invoker.invoke(invocation);
        }
    }

...

}

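The main difference is when the context is obtained. The Apache version fetches the context only when collect runs, after every invoke has completed; the Dangdang version saves a reference to the context before invoke runs. Why save it in advance? Could the context change while invoke is executing? In fact the two versions also differ inside collect and in how they handle the context, but this later turned out not to be where the problem lay, so it is not explored further here.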

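Step 4: reading the code got nowhere, so out came the debugger (the truth is close).

Debug locally and trace where the IP comes from.

The thread stacks differ: the abnormal microservice has one extra stack frame, RemoteIpValve.

The source shows that this is a tomcat class used to obtain the user's IP from header.x-forwarded-for.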


public class RemoteIpValve extends ValveBase {

...

    public void invoke(Request request, Response response) throws IOException, ServletException {
        final String originalRemoteAddr = request.getRemoteAddr();
        final String originalRemoteHost = request.getRemoteHost();
        final String originalScheme = request.getScheme();
        final boolean originalSecure = request.isSecure();
        final int originalServerPort = request.getServerPort();
        final String originalProxiesHeader = request.getHeader(proxiesHeader);
        final String originalRemoteIpHeader = request.getHeader(remoteIpHeader);
        boolean isInternal = internalProxies != null &&
                internalProxies.matcher(originalRemoteAddr).matches();

        if (isInternal || (trustedProxies != null &&
                trustedProxies.matcher(originalRemoteAddr).matches())) {
            String remoteIp = null;
            // In java 6, proxiesHeaderValue should be declared as a java.util.Deque
            LinkedList<String> proxiesHeaderValue = new LinkedList<>();
            StringBuilder concatRemoteIpHeaderValue = new StringBuilder();

            for (Enumeration<String> e = request.getHeaders(remoteIpHeader); e.hasMoreElements();) {
                if (concatRemoteIpHeaderValue.length() > 0) {
                    concatRemoteIpHeaderValue.append(", ");
                }

                concatRemoteIpHeaderValue.append(e.nextElement());
            }

            String[] remoteIpHeaderValue = commaDelimitedListToStringArray(concatRemoteIpHeaderValue.toString());
            int idx;
            if (!isInternal) {
                proxiesHeaderValue.addFirst(originalRemoteAddr);
            }
            // loop on remoteIpHeaderValue to find the first trusted remote ip and to build the proxies chain
            for (idx = remoteIpHeaderValue.length - 1; idx >= 0; idx--) {
                String currentRemoteIp = remoteIpHeaderValue[idx];
                remoteIp = currentRemoteIp;
                if (internalProxies !=null && internalProxies.matcher(currentRemoteIp).matches()) {
                    // do nothing, internalProxies IPs are not appended to the
                } else if (trustedProxies != null &&
                        trustedProxies.matcher(currentRemoteIp).matches()) {
                    proxiesHeaderValue.addFirst(currentRemoteIp);
                } else {
                    idx--; // decrement idx because break statement doesn't do it
                    break;
                }
            }
            // continue to loop on remoteIpHeaderValue to build the new value of the remoteIpHeader
            LinkedList<String> newRemoteIpHeaderValue = new LinkedList<>();
            for (; idx >= 0; idx--) {
                String currentRemoteIp = remoteIpHeaderValue[idx];
                newRemoteIpHeaderValue.addFirst(currentRemoteIp);
            }
            if (remoteIp != null) {

                request.setRemoteAddr(remoteIp);
                request.setRemoteHost(remoteIp);

                if (proxiesHeaderValue.size() == 0) {
                    request.getCoyoteRequest().getMimeHeaders().removeHeader(proxiesHeader);
                } else {
                    String commaDelimitedListOfProxies = listToCommaDelimitedString(proxiesHeaderValue);
                    request.getCoyoteRequest().getMimeHeaders().setValue(proxiesHeader).setString(commaDelimitedListOfProxies);
                }
                if (newRemoteIpHeaderValue.size() == 0) {
                    request.getCoyoteRequest().getMimeHeaders().removeHeader(remoteIpHeader);
                } else {
                    String commaDelimitedRemoteIpHeaderValue = listToCommaDelimitedString(newRemoteIpHeaderValue);
                    request.getCoyoteRequest().getMimeHeaders().setValue(remoteIpHeader).setString(commaDelimitedRemoteIpHeaderValue);
                }
            }

            if (protocolHeader != null) {
                String protocolHeaderValue = request.getHeader(protocolHeader);
                if (protocolHeaderValue == null) {
                    // Don't modify the secure, scheme and serverPort attributes
                    // of the request
                } else if (isForwardedProtoHeaderValueSecure(protocolHeaderValue)) {
                    request.setSecure(true);
                    request.getCoyoteRequest().scheme().setString("https");
                    setPorts(request, httpsServerPort);
                } else {
                    request.setSecure(false);
                    request.getCoyoteRequest().scheme().setString("http");
                    setPorts(request, httpServerPort);
                }
            }

            if (log.isDebugEnabled()) {
                log.debug("Incoming request " + request.getRequestURI() + " with originalRemoteAddr '" + originalRemoteAddr
                          + "', originalRemoteHost='" + originalRemoteHost + "', originalSecure='" + originalSecure + "', originalScheme='"
                          + originalScheme + "' will be seen as newRemoteAddr='" + request.getRemoteAddr() + "', newRemoteHost='"
                          + request.getRemoteHost() + "', newScheme='" + request.getScheme() + "', newSecure='" + request.isSecure() + "'");
            }
        } else {
            if (log.isDebugEnabled()) {
                log.debug("Skip RemoteIpValve for request " + request.getRequestURI() + " with originalRemoteAddr '"
                        + request.getRemoteAddr() + "'");
            }
        }
        if (requestAttributesEnabled) {
            request.setAttribute(AccessLog.REMOTE_ADDR_ATTRIBUTE,
                    request.getRemoteAddr());
            request.setAttribute(Globals.REMOTE_ADDR_ATTRIBUTE,
                    request.getRemoteAddr());
            request.setAttribute(AccessLog.REMOTE_HOST_ATTRIBUTE,
                    request.getRemoteHost());
            request.setAttribute(AccessLog.PROTOCOL_ATTRIBUTE,
                    request.getProtocol());
            request.setAttribute(AccessLog.SERVER_PORT_ATTRIBUTE,
                    Integer.valueOf(request.getServerPort()));
        }
        try {
            getNext().invoke(request, response);
        } finally {
            request.setRemoteAddr(originalRemoteAddr);
            request.setRemoteHost(originalRemoteHost);
            request.setSecure(originalSecure);
            request.getCoyoteRequest().scheme().setString(originalScheme);
            request.setServerPort(originalServerPort);

            MimeHeaders headers = request.getCoyoteRequest().getMimeHeaders();
            if (originalProxiesHeader == null || originalProxiesHeader.length() == 0) {
                headers.removeHeader(proxiesHeader);
            } else {
                headers.setValue(proxiesHeader).setString(originalProxiesHeader);
            }

            if (originalRemoteIpHeader == null || originalRemoteIpHeader.length() == 0) {
                headers.removeHeader(remoteIpHeader);
            } else {
                headers.setValue(remoteIpHeader).setString(originalRemoteIpHeader);
            }
        }
    }

...

}

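Why does the abnormal microservice call RemoteIpValve while the healthy ones do not???

It turned out that the abnormal and healthy microservices run different springboot versions, and these handle the user IP differently! (╯‵□′)╯︵┻━┻

The abnormal microservice uses springboot 2.2.2, while the healthy ones use springboot 1.5.22.

Does springboot 2.2.2 do anything special? Yes, take a look:

The ServerProperties class does configure header.x-forwarded-for as not used by default: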

public class ServerProperties {

...

	/**
	 * Strategy for handling X-Forwarded-* headers.
	 */
	private ForwardHeadersStrategy forwardHeadersStrategy = ForwardHeadersStrategy.NONE;
	
...

}

However, the TomcatWebServerFactoryCustomizer class contains a check for whether the current platform is a cloud platform; if it is, header.x-forwarded-for is used:

public class TomcatWebServerFactoryCustomizer implements WebServerFactoryCustomizer<ConfigurableTomcatWebServerFactory>, Ordered {

...

	private boolean getOrDeduceUseForwardHeaders() {
		if (this.serverProperties.getForwardHeadersStrategy().equals(ServerProperties.ForwardHeadersStrategy.NONE)) {
			CloudPlatform platform = CloudPlatform.getActive(this.environment);
			return platform != null && platform.isUsingForwardHeaders();
		}
		return this.serverProperties.getForwardHeadersStrategy().equals(ServerProperties.ForwardHeadersStrategy.NATIVE);
	}

	private void customizeRemoteIpValve(ConfigurableTomcatWebServerFactory factory) {
		Tomcat tomcatProperties = this.serverProperties.getTomcat();
		String protocolHeader = tomcatProperties.getProtocolHeader();
		String remoteIpHeader = tomcatProperties.getRemoteIpHeader();
		// For back compatibility the valve is also enabled if protocol-header is set
		if (StringUtils.hasText(protocolHeader) || StringUtils.hasText(remoteIpHeader)
				|| getOrDeduceUseForwardHeaders()) {
			RemoteIpValve valve = new RemoteIpValve();
			valve.setProtocolHeader(StringUtils.hasLength(protocolHeader) ? protocolHeader : "X-Forwarded-Proto");
			if (StringUtils.hasLength(remoteIpHeader)) {
				valve.setRemoteIpHeader(remoteIpHeader);
			}
			// The internal proxies default to a white list of "safe" internal IP
			// addresses
			valve.setInternalProxies(tomcatProperties.getInternalProxies());
			valve.setHostHeader(tomcatProperties.getHostHeader());
			valve.setPortHeader(tomcatProperties.getPortHeader());
			valve.setProtocolHeaderHttpsValue(tomcatProperties.getProtocolHeaderHttpsValue());
			// ... so it's safe to add this valve by default.
			factory.addEngineValves(valve);
		}
	}
...

}
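The decision made by the two methods above can be restated as a small, framework-free sketch. ForwardHeaderDecisionSketch is a hypothetical class written for this article. It replaces the CloudPlatform lookup with a boolean parameter and relies on isUsingForwardHeaders() returning true for every platform, which is what the springboot source quoted in this article does.

```java
class ForwardHeaderDecisionSketch {
    enum Strategy { NATIVE, FRAMEWORK, NONE }

    // Mirrors getOrDeduceUseForwardHeaders(): NONE falls back to cloud-platform
    // detection, otherwise only NATIVE enables the container's own support.
    static boolean useForwardHeaders(Strategy strategy, boolean onRecognizedCloudPlatform) {
        if (strategy == Strategy.NONE) {
            return onRecognizedCloudPlatform;
        }
        return strategy == Strategy.NATIVE;
    }

    // Mirrors the condition that guards the RemoteIpValve in customizeRemoteIpValve().
    static boolean installsRemoteIpValve(String protocolHeader, String remoteIpHeader,
                                         Strategy strategy, boolean onRecognizedCloudPlatform) {
        return hasText(protocolHeader) || hasText(remoteIpHeader)
                || useForwardHeaders(strategy, onRecognizedCloudPlatform);
    }

    private static boolean hasText(String s) {
        return s != null && !s.trim().isEmpty();
    }

    public static void main(String[] args) {
        // Print the outcome for every strategy, on and off a recognized cloud platform.
        for (Strategy s : Strategy.values()) {
            for (boolean cloud : new boolean[] {false, true}) {
                System.out.println(s + ", cloud=" + cloud + " -> valve="
                        + installsRemoteIpValve(null, null, s, cloud));
            }
        }
    }
}
```

The case that matters for this investigation is the default one: strategy NONE with no header properties set still installs the valve as soon as a cloud platform is detected.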

springboot enumerates several common cloud platform types:


/**
 * Simple detection for well known cloud platforms. For more advanced cloud provider
 * integration consider the Spring Cloud project.
 *
 * @author Phillip Webb
 * @since 1.3.0
 * @see "https://cloud.spring.io"
 */
public enum CloudPlatform {

	/**
	 * Cloud Foundry platform.
	 */
	CLOUD_FOUNDRY {

		@Override
		public boolean isActive(Environment environment) {
			return environment.containsProperty("VCAP_APPLICATION") || environment.containsProperty("VCAP_SERVICES");
		}

	},

	/**
	 * Heroku platform.
	 */
	HEROKU {

		@Override
		public boolean isActive(Environment environment) {
			return environment.containsProperty("DYNO");
		}

	},

	/**
	 * SAP Cloud platform.
	 */
	SAP {

		@Override
		public boolean isActive(Environment environment) {
			return environment.containsProperty("HC_LANDSCAPE");
		}

	},

	/**
	 * Kubernetes platform.
	 */
	KUBERNETES {

		private static final String SERVICE_HOST_SUFFIX = "_SERVICE_HOST";

		private static final String SERVICE_PORT_SUFFIX = "_SERVICE_PORT";

		@Override
		public boolean isActive(Environment environment) {
			if (environment instanceof ConfigurableEnvironment) {
				return isActive((ConfigurableEnvironment) environment);
			}
			return false;
		}

		private boolean isActive(ConfigurableEnvironment environment) {
			PropertySource<?> environmentPropertySource = environment.getPropertySources()
					.get(StandardEnvironment.SYSTEM_ENVIRONMENT_PROPERTY_SOURCE_NAME);
			if (environmentPropertySource instanceof EnumerablePropertySource) {
				return isActive((EnumerablePropertySource<?>) environmentPropertySource);
			}
			return false;
		}

		private boolean isActive(EnumerablePropertySource<?> environmentPropertySource) {
			for (String propertyName : environmentPropertySource.getPropertyNames()) {
				if (propertyName.endsWith(SERVICE_HOST_SUFFIX)) {
					String serviceName = propertyName.substring(0,
							propertyName.length() - SERVICE_HOST_SUFFIX.length());
					if (environmentPropertySource.getProperty(serviceName + SERVICE_PORT_SUFFIX) != null) {
						return true;
					}
				}
			}
			return false;
		}

	};

	/**
	 * Determines if the platform is active (i.e. the application is running in it).
	 * @param environment the environment
	 * @return if the platform is active.
	 */
	public abstract boolean isActive(Environment environment);

	/**
	 * Returns if the platform is behind a load balancer and uses
	 * {@literal X-Forwarded-For} headers.
	 * @return if {@literal X-Forwarded-For} headers are used
	 */
	public boolean isUsingForwardHeaders() {
		return true;
	}

	/**
	 * Returns the active {@link CloudPlatform} or {@code null} if one cannot be deduced.
	 * @param environment the environment
	 * @return the {@link CloudPlatform} or {@code null}
	 */
	public static CloudPlatform getActive(Environment environment) {
		if (environment != null) {
			for (CloudPlatform cloudPlatform : values()) {
				if (cloudPlatform.isActive(environment)) {
					return cloudPlatform;
				}
			}
		}
		return null;
	}

}

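If the system environment contains both a variable ending in _SERVICE_HOST and a matching variable ending in _SERVICE_PORT, the platform is considered to be KUBERNETES. isUsingForwardHeaders returns true, so every cloud platform in CloudPlatform uses header.x-forwarded-for. The reasoning is that springboot2 assumes a cloud platform sits behind a load balancer, so X-Forwarded-For headers are used (see the comment on the isUsingForwardHeaders function).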

We happen to be running on KUBERNETES!
And the system environment happens to contain variables ending in both _SERVICE_HOST and _SERVICE_PORT!!
DUBBO_MONITOR_27_SERVICE_PRODUCTION_SERVICE_HOST
DUBBO_MONITOR_27_SERVICE_PRODUCTION_SERVICE_PORT
DUBBO_MONITOR_SERVICE_PRODUCTION_MICRO_SERVICE_SERVICE_HOST
DUBBO_MONITOR_SERVICE_PRODUCTION_MICRO_SERVICE_SERVICE_PORT
What a coincidence!!!
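The matching rule is easy to try against a plain map of environment variables. The sketch below assumes a hypothetical KubernetesDetectionSketch class that applies the same suffix rule as the KUBERNETES entry of CloudPlatform shown earlier, without any Spring types. It is meant as a way to check an environment by hand, not as springboot's implementation.

```java
import java.util.Map;

class KubernetesDetectionSketch {
    private static final String HOST_SUFFIX = "_SERVICE_HOST";
    private static final String PORT_SUFFIX = "_SERVICE_PORT";

    // True if some FOO_SERVICE_HOST variable has a matching FOO_SERVICE_PORT variable.
    static boolean looksLikeKubernetes(Map<String, String> env) {
        for (String name : env.keySet()) {
            if (name.endsWith(HOST_SUFFIX)) {
                String prefix = name.substring(0, name.length() - HOST_SUFFIX.length());
                if (env.get(prefix + PORT_SUFFIX) != null) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Check the environment of the current process.
        System.out.println(looksLikeKubernetes(System.getenv()));
    }
}
```

Any pair of variables with these suffixes is enough, so service-discovery variables such as the DUBBO_MONITOR ones listed above trigger the detection even though the application never asked for it.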


The code in springboot 1.5.22 differs slightly from springboot 2.2.2, but the logic is similar. The real difference is the set of platforms that CloudPlatform covers!!

public enum CloudPlatform {

	/**
	 * Cloud Foundry platform.
	 */
	CLOUD_FOUNDRY {

		@Override
		public boolean isActive(Environment environment) {
			return environment.containsProperty("VCAP_APPLICATION") || environment.containsProperty("VCAP_SERVICES");
		}

	},

	/**
	 * Heroku platform.
	 */
	HEROKU {

		@Override
		public boolean isActive(Environment environment) {
			return environment.containsProperty("DYNO");
		}

	};

	/**
	 * Determines if the platform is active (i.e. the application is running in it).
	 * @param environment the environment
	 * @return if the platform is active.
	 */
	public abstract boolean isActive(Environment environment);

	/**
	 * Returns if the platform is behind a load balancer and uses
	 * {@literal X-Forwarded-For} headers.
	 * @return if {@literal X-Forwarded-For} headers are used
	 */
	public boolean isUsingForwardHeaders() {
		return true;
	}

	/**
	 * Returns the active {@link CloudPlatform} or {@code null} if one cannot be deduced.
	 * @param environment the environment
	 * @return the {@link CloudPlatform} or {@code null}
	 */
	public static CloudPlatform getActive(Environment environment) {
		if (environment != null) {
			for (CloudPlatform cloudPlatform : values()) {
				if (cloudPlatform.isActive(environment)) {
					return cloudPlatform;
				}
			}
		}
		return null;
	}

}

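III. The root cause

1. How springboot is configured to obtain the user IP

  • By default, remote_addr is used

  • When the system runs on a cloud platform, header.x-forwarded-for is used by default. To turn this off, configure the following:

    server.forward-headers-strategy=FRAMEWORK
    
    // Possible values of server.forward-headers-strategy
    public enum ForwardHeadersStrategy {
    	NATIVE,  // use the container's native support: tomcat's RemoteIpValve reads header.x-forwarded-for
    	FRAMEWORK, // use Spring's own forwarded-header handling; getOrDeduceUseForwardHeaders() returns false, so this setting does not enable the RemoteIpValve
    	NONE // default (remote_addr when running normally, header.x-forwarded-for when running on a recognized cloud platform)
    }
    
  • The Header field from which the user IP is read can be specified through configuration

    server.use-forward-headers=true  # older property, deprecated in springboot 2.2 in favor of server.forward-headers-strategy
    server.tomcat.protocol-header=X-Forwarded-Proto  # this line can be omitted
    server.tomcat.remote-ip-header=x-forwarded-for
    

2. The hidden change brought by the springboot upgrade

Cloud platforms recognized by springboot 2.2.2: CLOUD_FOUNDRY, HEROKU, SAP, KUBERNETES
Cloud platforms recognized by springboot 1.5.22: CLOUD_FOUNDRY, HEROKU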

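IV. Conclusion

The abnormal microservice uses springboot 2.2.2 and runs on a cloud platform that springboot recognizes, so by default the user's real IP is taken from header.x-forwarded-for. As a result, the dubbo monitor recorded real user IPs as the client, which consumed a large amount of memory.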