Tomcat source code study: HTTP message parsing
程序员文章站
2022-07-12 16:41:25
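1. Tomcat has been used as a web server across the industry for many years. But as Java web developers, have we ever studied its source code in depth? Tomcat covers a lot of ground, so here I only discuss one aspect of it.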
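As we all know, after a client sends an HTTP request, Tomcat receives it and wraps it in a request object. But how does Tomcat actually parse it? That is the subject of this article.
2. Before looking at how the HTTP request message is parsed, let's walk through the path an HTTP request takes inside Tomcat
When Tomcat starts, the Connector component finds the corresponding ProtocolHandler (configured in server.xml; by default Tomcat configures one HTTP/1.1 protocol and one AJP protocol) to listen for and handle requests. The HTTP ProtocolHandler, the Http11Protocol class, uses the JIoEndpoint class to listen on the socket.
The code of the Acceptor, an inner class of JIoEndpoint, is as follows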
    protected class Acceptor extends AbstractEndpoint.Acceptor {
        @Override
        public void run() {
            int errorDelay = 0;
            // Loop until we receive a shutdown command
            while (running) {
                // code omitted...
                try {
                    // If we have reached max connections, wait
                    countUpOrAwaitConnection();
                    Socket socket = null;
                    try {
                        // Accept the next incoming connection from the server
                        // socket; the thread blocks here until one arrives
                        socket = serverSocketFactory.acceptSocket(serverSocket);
                    } catch (IOException ioe) {
                        countDownConnection();
                        // Introduce delay if necessary
                        errorDelay = handleExceptionWithDelay(errorDelay);
                        // re-throw
                        throw ioe;
                    }
                    // Successful accept, reset the error delay
                    errorDelay = 0;
                    // Configure the socket
                    if (running && !paused && setSocketOptions(socket)) {
                        // Hand this socket off to an appropriate processor;
                        // this is where an arriving request starts being handled
                        if (!processSocket(socket)) {
                            countDownConnection();
                            // Close socket right away
                            closeSocket(socket);
                        }
                        // code omitted...
                    }
Next comes processSocket(Socket socket), which wraps the socket and submits it to the thread pool, where it waits to be parsed and handled. The code is as follows
    protected boolean processSocket(Socket socket) {
        // Process the request from this socket
        try {
            SocketWrapper wrapper = new SocketWrapper(socket);
            wrapper.setKeepAliveLeft(getMaxKeepAliveRequests());
            wrapper.setSecure(isSSLEnabled());
            // During shutdown, executor may be null - avoid NPE
            if (!running) {
                return false;
            }
            // Wrap the socket in a SocketProcessor and submit it to the thread pool
            getExecutor().execute(new SocketProcessor(wrapper));
            // code omitted......
    }
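The hand-off pattern in processSocket can be sketched on its own. This is a minimal illustration, not Tomcat's code: the class name HandOffSketch, the String standing in for a socket, and the fixed pool size of 2 are all assumptions made for the example.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for an endpoint that hands connections to a pool.
final class HandOffSketch {
    private final ExecutorService executor = Executors.newFixedThreadPool(2);
    private volatile boolean running = true;
    // Records which connections a worker thread has handled.
    final Queue<String> handled = new ConcurrentLinkedQueue<>();

    // Same shape as processSocket: check state, wrap, submit, report success.
    boolean processSocket(String connection) {
        if (!running) {
            return false; // shutting down: the caller should close the socket
        }
        try {
            // The lambda plays the role of SocketProcessor.run().
            executor.execute(() -> handled.add(connection));
            return true;
        } catch (RejectedExecutionException e) {
            return false; // the pool refused the task
        }
    }

    // Stops accepting work and waits for submitted tasks to finish.
    void stop() {
        running = false;
        executor.shutdown();
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The accepting thread only pays for wrapping and submitting, so it can return to accept() immediately while a worker thread does the parsing.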
Execution then reaches the run method of the SocketProcessor class
    public void run() {
        boolean launch = false;
        synchronized (socket) {
            try {
                SocketState state = SocketState.OPEN;
                try {
                    // SSL handshake
                    serverSocketFactory.handshake(socket.getSocket());
                } catch (Throwable t) {
                    ExceptionUtils.handleThrowable(t);
                    if (log.isDebugEnabled()) {
                        log.debug(sm.getString("endpoint.err.handshake"), t);
                    }
                    // Tell to close the socket
                    state = SocketState.CLOSED;
                }
                // If the socket is not closed, enter the process function.
                // The if/else that follows has other paths into process as
                // well; only this one is discussed here.
                if ((state != SocketState.CLOSED)) {
                    if (status == null) {
                        state = handler.process(socket, SocketStatus.OPEN_READ);
                    } else {
                        state = handler.process(socket, status);
                    }
                }
                // code omitted....
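In the code above, handler is the protocol's connection handler, which passes the socket on to an Http11Processor. The processor's process function is defined in its parent class, as follows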
    public SocketState process(SocketWrapper socketWrapper) throws IOException {
        RequestInfo rp = request.getRequestProcessor();
        rp.setStage(org.apache.coyote.Constants.STAGE_PARSE);
        // Setting up the I/O
        setSocketWrapper(socketWrapper);
        getInputBuffer().init(socketWrapper, endpoint);
        getOutputBuffer().init(socketWrapper, endpoint);
        // Flags
        keepAlive = true;
        comet = false;
        openSocket = false;
        sendfileInProgress = false;
        readComplete = true;
        if (endpoint.getUsePolling()) {
            keptAlive = false;
        } else {
            keptAlive = socketWrapper.isKeptAlive();
        }
        if (disableKeepAlive()) {
            socketWrapper.setKeepAliveLeft(0);
        }
        while (!getErrorState().isError() && keepAlive && !comet && !isAsync()
                && upgradeInbound == null && httpUpgradeHandler == null
                && !endpoint.isPaused()) {
            // Parsing the request header
            try {
                setRequestLineReadTimeout();
                // Parse the first line of the request, for example:
                // GET /cuidiwhere/article/details/12361425 HTTP/1.1
                if (!getInputBuffer().parseRequestLine(keptAlive)) {
                    if (handleIncompleteRequestLineRead()) {
                        break;
                    }
                }
                // HTTP headers
                if (endpoint.isPaused()) {
                    response.setStatus(503);
                    setErrorState(ErrorState.CLOSE_CLEAN, null);
                } else {
                    keptAlive = true;
                    // Set this every time in case limit has been changed via JMX
                    request.getMimeHeaders().setLimit(endpoint.getMaxHeaderCount());
                    // Currently only NIO will ever return false here
                    // Parse the HTTP headers, which look like this:
                    //   Host: blog.****.net
                    //   Connection: keep-alive
                    //   Pragma: no-cache
                    //   Cache-Control: no-cache
                    //   Upgrade-Insecure-Requests: 1
                    //   User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36
                    //   Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp;
                    if (!getInputBuffer().parseHeaders()) {
                        // We've read part of the request, don't recycle it
                        // instead associate it with the socket
                        openSocket = true;
                        readComplete = false;
                        break;
                    }
                    if (!disableUploadTimeout) {
                        setSocketTimeout(connectionUploadTimeout);
                    }
                }
            } catch (IOException e) {
                if (getLog().isDebugEnabled()) {
                    getLog().debug(
                            sm.getString("http11processor.header.parse"), e);
                }
                setErrorState(ErrorState.CLOSE_NOW, e);
                break;
            } catch (Throwable t) {
                ExceptionUtils.handleThrowable(t);
                UserDataHelper.Mode logMode = userDataHelper.getNextMode();
                if (logMode != null) {
                    String message = sm.getString(
                            "http11processor.header.parse");
                    switch (logMode) {
                        case INFO_THEN_DEBUG:
                            message += sm.getString(
                                    "http11processor.fallToDebug");
                            //$FALL-THROUGH$
                        case INFO:
                            getLog().info(message);
                            break;
                        case DEBUG:
                            getLog().debug(message);
                    }
                }
                // 400 - Bad Request
                response.setStatus(400);
                setErrorState(ErrorState.CLOSE_CLEAN, t);
                getAdapter().log(request, response, 0);
            }
            if (!getErrorState().isError()) {
                // Setting up filters, and parse some request headers
                rp.setStage(org.apache.coyote.Constants.STAGE_PREPARE);
                try {
                    // Prepare the request information
                    prepareRequest();
                } catch (Throwable t) {
                    ExceptionUtils.handleThrowable(t);
                    if (getLog().isDebugEnabled()) {
                        getLog().debug(sm.getString(
                                "http11processor.request.prepare"), t);
                    }
                    // 500 - Internal Server Error
                    response.setStatus(500);
                    setErrorState(ErrorState.CLOSE_CLEAN, t);
                    getAdapter().log(request, response, 0);
                }
            }
            if (maxKeepAliveRequests == 1) {
                keepAlive = false;
            } else if (maxKeepAliveRequests > 0 &&
                    socketWrapper.decrementKeepAlive() <= 0) {
                keepAlive = false;
            }
            // Process the request in the adapter
            if (!getErrorState().isError()) {
                try {
                    rp.setStage(org.apache.coyote.Constants.STAGE_SERVICE);
                    // Handle the parsed request
                    adapter.service(request, response);
                    // Handle when the response was committed before a serious
                    // error occurred. Throwing a ServletException should both
                    // set the status to 500 and set the errorException.
                    // If we fail here, then the response is likely already
                    // committed, so we can't try and set headers.
                    if (keepAlive && !getErrorState().isError() && (
                            response.getErrorException() != null ||
                                    (!isAsync() &&
                                    statusDropsConnection(response.getStatus())))) {
                        setErrorState(ErrorState.CLOSE_CLEAN, null);
                    }
                    setCometTimeouts(socketWrapper);
                    // code omitted
The two important calls in the code above are getInputBuffer().parseRequestLine(keptAlive), which parses the request line, and getInputBuffer().parseHeaders(), which parses the request headers
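3. As we know, an HTTP request message consists of a request line, the request headers, a blank line, and an optional message body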
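To make the layout concrete, here is a small sketch that splits a sample request into its three parts. The sample request, the host example.com, and the class name HttpMessageLayout are made up for illustration; Tomcat itself works on bytes rather than on a String, as the following sections show.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper that shows where each part of a request sits.
final class HttpMessageLayout {
    // Request line, three header lines, a blank line, then a 19-byte body.
    static final String SAMPLE =
            "POST /login?from=home HTTP/1.1\r\n"
            + "Host: example.com\r\n"
            + "Content-Type: application/x-www-form-urlencoded\r\n"
            + "Content-Length: 19\r\n"
            + "\r\n"
            + "user=tom&pass=12345";

    // Everything before the first CRLF.
    static String requestLine(String raw) {
        return raw.substring(0, raw.indexOf("\r\n"));
    }

    // The lines between the request line and the blank line.
    static List<String> headerLines(String raw) {
        int firstEol = raw.indexOf("\r\n");
        int headEnd = raw.indexOf("\r\n\r\n");
        if (headEnd <= firstEol) {
            return Collections.emptyList();
        }
        return Arrays.asList(raw.substring(firstEol + 2, headEnd).split("\r\n"));
    }

    // Everything after the blank line.
    static String body(String raw) {
        return raw.substring(raw.indexOf("\r\n\r\n") + 4);
    }
}
```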
(1) getInputBuffer().parseRequestLine(boolean) parses the request line
The code is as follows
    public boolean parseRequestLine(boolean useAvailableDataOnly)
            throws IOException {
        int start = 0;
        //
        // Skipping blank lines
        //
        // Fill buf when needed, then skip every leading CR/LF byte.
        // The do-while loop removes blank lines before the request line.
        byte chr = 0;
        do {
            // Read new bytes if needed
            if (pos >= lastValid) {
                if (!fill())
                    throw new EOFException(sm.getString("iib.eof.error"));
            }
            // Set the start time once we start reading data (even if it is
            // just skipping blank lines)
            if (request.getStartTime() < 0) {
                request.setStartTime(System.currentTimeMillis());
            }
            chr = buf[pos++];
        } while ((chr == Constants.CR) || (chr == Constants.LF));
        // The loop exits on the first byte that is not CR/LF, but pos has
        // already moved past it, so step back by one
        pos--;
        // Mark the current buffer position
        start = pos;
        //
        // Reading the method name
        // Method name is always US-ASCII
        //
        boolean space = false;
        // Read the method token of the request line
        while (!space) {
            // Read new bytes if needed
            if (pos >= lastValid) {
                if (!fill())
                    throw new EOFException(sm.getString("iib.eof.error"));
            }
            // Spec says no CR or LF in method name
            if (buf[pos] == Constants.CR || buf[pos] == Constants.LF) {
                throw new IllegalArgumentException(
                        sm.getString("iib.invalidmethod"));
            }
            // Spec says single SP but it also says be tolerant of HT
            // A space or tab ends the method; store it in the request
            if (buf[pos] == Constants.SP || buf[pos] == Constants.HT) {
                space = true;
                request.method().setBytes(buf, start, pos - start);
            }
            pos++;
        }
        // Skip the whitespace after the method
        // Spec says single SP but also says be tolerant of multiple and/or HT
        while (space) {
            // Read new bytes if needed
            if (pos >= lastValid) {
                if (!fill())
                    throw new EOFException(sm.getString("iib.eof.error"));
            }
            if (buf[pos] == Constants.SP || buf[pos] == Constants.HT) {
                pos++;
            } else {
                space = false;
            }
        }
        // Mark the current buffer position
        start = pos;
        int end = 0;
        int questionPos = -1;
        //
        // Reading the URI
        //
        boolean eol = false;
        // Read the URI up to the next whitespace
        while (!space) {
            // Read new bytes if needed
            if (pos >= lastValid) {
                if (!fill())
                    throw new EOFException(sm.getString("iib.eof.error"));
            }
            // Spec says single SP but it also says be tolerant of HT
            if (buf[pos] == Constants.SP || buf[pos] == Constants.HT) {
                space = true;
                end = pos;
            } else if ((buf[pos] == Constants.CR)
                    || (buf[pos] == Constants.LF)) {
                // HTTP/0.9 style request
                eol = true;
                space = true;
                end = pos;
            } else if ((buf[pos] == Constants.QUESTION)
                    && (questionPos == -1)) {
                questionPos = pos;
            }
            pos++;
        }
        // Store the raw URI; if a '?' was seen, split it into the request
        // URI and the query string that follows the '?'
        request.unparsedURI().setBytes(buf, start, end - start);
        if (questionPos >= 0) {
            request.queryString().setBytes(buf, questionPos + 1,
                    end - questionPos - 1);
            request.requestURI().setBytes(buf, start, questionPos - start);
        } else {
            request.requestURI().setBytes(buf, start, end - start);
        }
        // Spec says single SP but also says be tolerant of multiple and/or HT
        // Skip the whitespace after the URI
        while (space) {
            // Read new bytes if needed
            if (pos >= lastValid) {
                if (!fill())
                    throw new EOFException(sm.getString("iib.eof.error"));
            }
            if (buf[pos] == Constants.SP || buf[pos] == Constants.HT) {
                pos++;
            } else {
                space = false;
            }
        }
        // Mark the current buffer position
        start = pos;
        end = 0;
        //
        // Reading the protocol
        // Protocol is always US-ASCII
        //
        // Keep reading until the end of the line is reached
        while (!eol) {
            // Read new bytes if needed
            if (pos >= lastValid) {
                if (!fill())
                    throw new EOFException(sm.getString("iib.eof.error"));
            }
            if (buf[pos] == Constants.CR) {
                end = pos;
            } else if (buf[pos] == Constants.LF) {
                if (end == 0)
                    end = pos;
                eol = true;
            }
            pos++;
        }
        // The protocol (for example HTTP/1.1) spans end - start bytes
        if ((end - start) > 0) {
            request.protocol().setBytes(buf, start, end - start);
        } else {
            request.protocol().setString("");
        }
        return true;
    }
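The code above is the whole process of parsing the request line
The rough flow is as follows
skip blank lines (CR/LF) -> read the method, which ends at whitespace -> skip whitespace -> read the URI, using ? to detect a query string -> skip whitespace -> read the last part (the protocol) until CR/LF ends the request line -> compute the number of bytes the protocol occupies from start and end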
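The flow above can be condensed into a standalone sketch. It assumes the whole request line is already in one byte array, so there is no fill() step and no error handling for malformed input; the class RequestLineSketch and its fields are hypothetical names, not Tomcat types.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified version of the request-line scan.
final class RequestLineSketch {
    final String method;
    final String uri;
    final String query;    // null when the URI has no '?'
    final String protocol;

    private RequestLineSketch(String method, String uri, String query, String protocol) {
        this.method = method;
        this.uri = uri;
        this.query = query;
        this.protocol = protocol;
    }

    static RequestLineSketch parse(String raw) {
        return parse(raw.getBytes(StandardCharsets.US_ASCII));
    }

    static RequestLineSketch parse(byte[] buf) {
        int pos = 0;
        // 1. Skip blank lines before the request line.
        while (pos < buf.length && (buf[pos] == '\r' || buf[pos] == '\n')) pos++;
        // 2. The method runs up to the first space or tab.
        int start = pos;
        while (pos < buf.length && !isBlank(buf[pos])) pos++;
        String method = ascii(buf, start, pos);
        pos = skipBlanks(buf, pos);
        // 3. The URI runs up to whitespace or end of line; remember the first '?'.
        start = pos;
        int questionPos = -1;
        while (pos < buf.length && !isBlank(buf[pos]) && buf[pos] != '\r' && buf[pos] != '\n') {
            if (buf[pos] == '?' && questionPos == -1) questionPos = pos;
            pos++;
        }
        int end = pos;
        String uri = ascii(buf, start, questionPos >= 0 ? questionPos : end);
        String query = questionPos >= 0 ? ascii(buf, questionPos + 1, end) : null;
        pos = skipBlanks(buf, pos);
        // 4. The protocol runs up to CR or LF.
        start = pos;
        while (pos < buf.length && buf[pos] != '\r' && buf[pos] != '\n') pos++;
        return new RequestLineSketch(method, uri, query, ascii(buf, start, pos));
    }

    private static boolean isBlank(byte b) {
        return b == ' ' || b == '\t';
    }

    private static int skipBlanks(byte[] buf, int pos) {
        while (pos < buf.length && isBlank(buf[pos])) pos++;
        return pos;
    }

    private static String ascii(byte[] buf, int from, int to) {
        return new String(buf, from, to - from, StandardCharsets.US_ASCII);
    }
}
```

Like the original, the sketch only records offsets while scanning and builds each value from a start/end pair once its terminator has been seen.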
(2) getInputBuffer().parseHeaders() parses the request headers
The method is as follows
    @Override
    public boolean parseHeaders() throws IOException {
        if (!parsingHeader) {
            throw new IllegalStateException(
                    sm.getString("iib.parseheaders.ise.error"));
        }
        // Each iteration reads one key: value line.
        // parseHeader() reads a single header line; the HTTP header
        // block may consist of many such lines.
        while (parseHeader()) {
            // Loop until we run out of headers
        }
        parsingHeader = false;
        end = pos;
        return true;
    }

    /**
     * Parse an HTTP header.
     *
     * @return false after reading a blank line (which indicates that the
     * HTTP header parsing is done
     */
    @SuppressWarnings("null") // headerValue cannot be null
    private boolean parseHeader() throws IOException {
        //
        // Check for blank line
        //
        byte chr = 0;
        // Skip CR bytes at the start of the line and detect the blank
        // line that ends the header block
        while (true) {
            // Read new bytes if needed
            if (pos >= lastValid) {
                if (!fill())
                    throw new EOFException(sm.getString("iib.eof.error"));
            }
            chr = buf[pos];
            if (chr == Constants.CR) {
                // Skip
            } else if (chr == Constants.LF) {
                // The previous key: value line already consumed its own
                // line break. A line break at the very start of a line is
                // therefore a blank line: all headers have been parsed and
                // whatever follows is the message body.
                pos++;
                return false;
            } else {
                break;
            }
            pos++;
        }
        // Mark the current buffer position
        int start = pos;
        //
        // Reading the header name
        // Header name is always US-ASCII
        //
        boolean colon = false;
        MessageBytes headerValue = null;
        while (!colon) {
            // Read new bytes if needed
            if (pos >= lastValid) {
                if (!fill())
                    throw new EOFException(sm.getString("iib.eof.error"));
            }
            // A colon means the bytes read so far are the key: set colon
            // to true to leave the loop, register the key in headers, and
            // keep the returned MessageBytes to hold the value later
            if (buf[pos] == Constants.COLON) {
                colon = true;
                headerValue = headers.addValue(buf, start, pos - start);
            } else if (!HTTP_TOKEN_CHAR[buf[pos]]) {
                // If a non-token header is detected, skip the line and
                // ignore the header
                skipLine(start);
                return true;
            }
            chr = buf[pos];
            // Convert upper-case letters in the key to lower case
            if ((chr >= Constants.A) && (chr <= Constants.Z)) {
                buf[pos] = (byte) (chr - Constants.LC_OFFSET);
            }
            pos++;
        }
        // Mark the current buffer position
        start = pos;
        int realPos = pos;
        //
        // Reading the header value (which can be spanned over multiple lines)
        //
        // pos is now just past the colon: key:(possible whitespace)value
        boolean eol = false;
        boolean validLine = true;
        while (validLine) {
            boolean space = true;
            // Skipping spaces
            // Remove the spaces and tabs in front of the value
            while (space) {
                // Read new bytes if needed
                if (pos >= lastValid) {
                    if (!fill())
                        throw new EOFException(sm.getString("iib.eof.error"));
                }
                if ((buf[pos] == Constants.SP) || (buf[pos] == Constants.HT)) {
                    pos++;
                } else {
                    space = false;
                }
            }
            int lastSignificantChar = realPos;
            // Reading bytes until the end of the line
            while (!eol) {
                // Read new bytes if needed
                if (pos >= lastValid) {
                    if (!fill())
                        throw new EOFException(sm.getString("iib.eof.error"));
                }
                if (buf[pos] == Constants.CR) {
                    // Skip
                } else if (buf[pos] == Constants.LF) {
                    // LF means this key: value line has been read completely
                    eol = true;
                } else if (buf[pos] == Constants.SP) {
                    buf[realPos] = buf[pos];
                    realPos++;
                } else {
                    buf[realPos] = buf[pos];
                    realPos++;
                    lastSignificantChar = realPos;
                }
                pos++;
            }
            realPos = lastSignificantChar;
            // Checking the first character of the new line. If the character
            // is a LWS, then it's a multiline header
            // Read new bytes if needed
            if (pos >= lastValid) {
                if (!fill())
                    throw new EOFException(sm.getString("iib.eof.error"));
            }
            chr = buf[pos];
            if ((chr != Constants.SP) && (chr != Constants.HT)) {
                validLine = false;
            } else {
                eol = false;
                // Copying one extra space in the buffer (since there must
                // be at least one space inserted between the lines)
                buf[realPos] = chr;
                realPos++;
            }
        }
        // After the loop, start..realPos delimits the value bytes;
        // one key: value line has now been read completely
        // Set the header value
        headerValue.setBytes(buf, start, realPos - start);
        return true;
    }
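The code above parses the key: value pairs of the request headers
The colon marks the end of the key and the line break marks the end of the value; the loop repeats to parse every key-value pair
A line break immediately followed by another line break (that is, a blank line) means header parsing is complete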
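The same three rules (colon ends the key, line break ends the value, blank line ends the block) fit in a short sketch. It is deliberately simpler than parseHeader(): it assumes the whole header block is in memory, ignores multi-line (folded) values, and returns a plain map. HeaderBlockSketch is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical, simplified header-block parser (no folded values).
final class HeaderBlockSketch {
    static Map<String, String> parse(String raw) {
        return parse(raw.getBytes(StandardCharsets.US_ASCII));
    }

    static Map<String, String> parse(byte[] buf) {
        Map<String, String> headers = new LinkedHashMap<>();
        int pos = 0;
        while (pos < buf.length) {
            // A CR at the start of a line is skipped; an LF there is the blank line.
            if (buf[pos] == '\r') {
                pos++;
                continue;
            }
            if (buf[pos] == '\n') {
                break; // end of headers, the body starts after this byte
            }
            // Key: everything up to the colon.
            int start = pos;
            while (pos < buf.length && buf[pos] != ':' && buf[pos] != '\n') pos++;
            if (pos >= buf.length || buf[pos] == '\n') {
                pos++; // line without a colon: ignore it
                continue;
            }
            String name = ascii(buf, start, pos).toLowerCase(Locale.ROOT);
            pos++; // step over the colon
            // Skip spaces and tabs in front of the value.
            while (pos < buf.length && (buf[pos] == ' ' || buf[pos] == '\t')) pos++;
            // Value: everything up to the LF; trim() drops the CR and trailing spaces.
            start = pos;
            while (pos < buf.length && buf[pos] != '\n') pos++;
            headers.put(name, ascii(buf, start, pos).trim());
            pos++; // step over the LF
        }
        return headers;
    }

    private static String ascii(byte[] buf, int from, int to) {
        return new String(buf, from, to - from, StandardCharsets.US_ASCII);
    }
}
```

Lower-casing the key mirrors the in-place conversion in parseHeader(), which is what makes later header lookups case-insensitive.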
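(3) That is how the whole message head is parsed. But how is the message body read?
For an HTTP GET, the request parameters are read directly from the part of the URI after the question mark.
But how are they read for a POST? Tomcat parses lazily: the body is parsed only when it is actually needed.
We normally obtain parameters through the request's getParameterXxx() family of methods; getInputStream() and similar methods read the message body as a raw stream instead.
The getParameterXxx() methods all end up in the parseParameters method of Request
The code is as follows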
    protected void parseParameters() {
        parametersParsed = true;
        Parameters parameters = coyoteRequest.getParameters();
        boolean success = false;
        try {
            // Set this every time in case limit has been changed via JMX
            parameters.setLimit(getConnector().getMaxParameterCount());
            // getCharacterEncoding() may have been overridden to search for
            // hidden form field containing request encoding
            String enc = getCharacterEncoding();
            boolean useBodyEncodingForURI = connector.getUseBodyEncodingForURI();
            if (enc != null) {
                parameters.setEncoding(enc);
                if (useBodyEncodingForURI) {
                    parameters.setQueryStringEncoding(enc);
                }
            } else {
                parameters.setEncoding
                    (org.apache.coyote.Constants.DEFAULT_CHARACTER_ENCODING);
                if (useBodyEncodingForURI) {
                    parameters.setQueryStringEncoding
                        (org.apache.coyote.Constants.DEFAULT_CHARACTER_ENCODING);
                }
            }
            parameters.handleQueryParameters();
            if (usingInputStream || usingReader) {
                success = true;
                return;
            }
            if (!getConnector().isParseBodyMethod(getMethod())) {
                success = true;
                return;
            }
            String contentType = getContentType();
            if (contentType == null) {
                contentType = "";
            }
            int semicolon = contentType.indexOf(';');
            if (semicolon >= 0) {
                contentType = contentType.substring(0, semicolon).trim();
            } else {
                contentType = contentType.trim();
            }
            // A multipart body (file upload) is parsed by parseParts()
            if ("multipart/form-data".equals(contentType)) {
                parseParts();
                success = true;
                return;
            }
            if (!("application/x-www-form-urlencoded".equals(contentType))) {
                success = true;
                return;
            }
            int len = getContentLength();
            if (len > 0) {
                int maxPostSize = connector.getMaxPostSize();
                if ((maxPostSize > 0) && (len > maxPostSize)) {
                    if (context.getLogger().isDebugEnabled()) {
                        context.getLogger().debug(
                                sm.getString("coyoteRequest.postTooLarge"));
                    }
                    checkSwallowInput();
                    return;
                }
                byte[] formData = null;
                if (len < CACHED_POST_LEN) {
                    if (postData == null) {
                        postData = new byte[CACHED_POST_LEN];
                    }
                    formData = postData;
                } else {
                    formData = new byte[len];
                }
                try {
                    // A form-encoded body is read by readPostBody
                    if (readPostBody(formData, len) != len) {
                        return;
                    }
                } catch (IOException e) {
                    // Client disconnect
                    if (context.getLogger().isDebugEnabled()) {
                        context.getLogger().debug(
                                sm.getString("coyoteRequest.parseParameters"), e);
                    }
                    return;
                }
                // Store the parameters in the Parameters object
                parameters.processParameters(formData, 0, len);
            } else if ("chunked".equalsIgnoreCase(
                    coyoteRequest.getHeader("transfer-encoding"))) {
                byte[] formData = null;
                try {
                    formData = readChunkedPostBody();
                } catch (IOException e) {
                    // Client disconnect or chunkedPostTooLarge error
                    if (context.getLogger().isDebugEnabled()) {
                        context.getLogger().debug(
                                sm.getString("coyoteRequest.parseParameters"), e);
                    }
                    return;
                }
                if (formData != null) {
                    parameters.processParameters(formData, 0, formData.length);
                }
            }
            success = true;
        } finally {
            if (!success) {
                parameters.setParseFailed(true);
            }
        }
    }
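The lazy behaviour can be illustrated with a small sketch: nothing is read from the body until the first getParameter call, and the parametersParsed flag ensures it happens only once. The class LazyParamsSketch, its constructor, and the parseCount counter are hypothetical; UTF-8 decoding and "first value wins" for repeated names are simplifying assumptions, not a description of Tomcat's Parameters class.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical request object that parses parameters on first use.
final class LazyParamsSketch {
    private final String queryString;
    private final InputStream body;
    private final int contentLength;
    private final Map<String, String> params = new HashMap<>();
    private boolean parametersParsed = false;
    int parseCount = 0; // exposed only so the laziness can be observed

    LazyParamsSketch(String queryString, String formBody) {
        byte[] bytes = formBody.getBytes(StandardCharsets.UTF_8);
        this.queryString = queryString;
        this.body = new ByteArrayInputStream(bytes);
        this.contentLength = bytes.length;
    }

    String getParameter(String name) {
        if (!parametersParsed) {
            parseParameters(); // first access triggers the parse
        }
        return params.get(name);
    }

    private void parseParameters() {
        parametersParsed = true;
        parseCount++;
        addPairs(queryString); // query-string parameters first
        if (contentLength <= 0) {
            return;
        }
        byte[] formData = new byte[contentLength];
        int offset = 0;
        try {
            // Same loop shape as readPostBody: read until len bytes arrived.
            while (offset < contentLength) {
                int n = body.read(formData, offset, contentLength - offset);
                if (n <= 0) {
                    return; // body shorter than announced
                }
                offset += n;
            }
        } catch (IOException e) {
            return; // treat as client disconnect
        }
        addPairs(new String(formData, StandardCharsets.UTF_8));
    }

    private void addPairs(String encoded) {
        if (encoded == null || encoded.isEmpty()) {
            return;
        }
        for (String pair : encoded.split("&")) {
            int eq = pair.indexOf('=');
            String key = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            params.putIfAbsent(decode(key), decode(value));
        }
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }
}
```

A request whose handler never asks for a parameter never pays for reading and decoding the body, which is the point of deferring the work.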
Here we only analyze the form (application/x-www-form-urlencoded) message body,
which is read by the readPostBody method
    protected int readPostBody(byte body[], int len)
            throws IOException {
        int offset = 0;
        do {
            int inputLen = getStream().read(body, offset, len - offset);
            if (inputLen <= 0) {
                return offset;
            }
            offset += inputLen;
        } while ((len - offset) > 0);
        return len;
    }
getStream returns a wrapper around the inputBuffer object,
and inputBuffer in turn holds a reference to coyoteRequest
    public void setCoyoteRequest(org.apache.coyote.Request coyoteRequest) {
        this.coyoteRequest = coyoteRequest;
        inputBuffer.setRequest(coyoteRequest);
    }
When inputBuffer reads bytes
    @Override
    public int realReadBytes(byte cbuf[], int off, int len)
            throws IOException {
        if (closed) {
            return -1;
        }
        if (coyoteRequest == null) {
            return -1;
        }
        if (state == INITIAL_STATE) {
            state = BYTE_STATE;
        }
        // The read is delegated to coyoteRequest.doRead
        int result = coyoteRequest.doRead(bb);
        return result;
    }
doRead leads right back to InternalInputBuffer, the class that contains parseHeaders() and parseRequestLine()
The code is as follows
    protected class InputStreamInputBuffer implements InputBuffer {
        /**
         * Read bytes into the specified chunk.
         */
        @Override
        public int doRead(ByteChunk chunk, Request req)
                throws IOException {
            if (pos >= lastValid) {
                // No unread bytes left: refill buf from the socket
                if (!fill())
                    return -1;
            }
            int length = lastValid - pos;
            chunk.setBytes(buf, pos, length);
            pos = lastValid;
            return (length);
        }
    }
The class above is a nested class of InternalInputBuffer
That is how the message body bytes are read from the inputStream
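The hand-over in doRead can be sketched in isolation: whatever lies between pos and lastValid is given to the caller in one go, and the buffer is refilled only when it is empty. BufferedBodySketch, its 16-byte buffer, and the readAll helper are assumptions for the example; unlike the original, the sketch copies the bytes into an output stream instead of pointing a ByteChunk at the shared buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical buffer with the same pos/lastValid bookkeeping.
final class BufferedBodySketch {
    private final InputStream in;
    private final byte[] buf = new byte[16]; // tiny on purpose, to force refills
    private int pos = 0;
    private int lastValid = 0;

    BufferedBodySketch(InputStream in) {
        this.in = in;
    }

    // Refills buf from the underlying stream; false means end of input.
    private boolean fill() throws IOException {
        int n = in.read(buf, 0, buf.length);
        if (n <= 0) {
            return false;
        }
        pos = 0;
        lastValid = n;
        return true;
    }

    // Hands over all currently buffered bytes; returns their count or -1 at end.
    int doRead(ByteArrayOutputStream out) throws IOException {
        if (pos >= lastValid && !fill()) {
            return -1;
        }
        int length = lastValid - pos;
        out.write(buf, pos, length);
        pos = lastValid; // everything buffered has been consumed
        return length;
    }

    // Convenience helper: drains a String body through the buffer.
    static String readAll(String body) {
        BufferedBodySketch buffer = new BufferedBodySketch(
                new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8)));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            while (buffer.doRead(out) >= 0) {
                // keep reading until the stream is exhausted
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```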