How Javac Works (2): The Syntax Analyzer

程序员文章站 2022-05-23 10:14:18


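The syntax analyzer (parser) assembles the token stream produced by the lexer into a more structured syntax tree. In other words, it puts individual words together into a complete sentence, and it has to work out which words form the subject, which the predicate, the object, the attributive, and so on.
The class diagram of the syntax tree and its node types is shown below:
[Figure: class diagram of the syntax tree and its node classes]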

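Every node of the syntax tree is an instance of com.sun.tools.javac.tree.JCTree. The syntax tree follows these rules:
1). Every syntax node implements an interface named xxxTree, which in turn extends the com.sun.source.tree.Tree interface. For example, IfTree represents an if statement and BinaryTree represents a binary operator expression.
2). Every syntax node is a subclass of JCTree and implements the xxxTree interface from rule 1. The class is named JCxxx: JCIf implements IfTree, JCBinary implements BinaryTree, and so on.
3). All JCxxx classes are defined as static nested classes inside JCTree.
Three attributes of JCTree deserve a short explanation:
TreeTag: every kind of syntax node is identified by a tag. In older javac versions (such as JDK 6) the tag is an int constant and each value is the previous one plus 1: the top-level node TOPLEVEL is 1 and IMPORT is TOPLEVEL + 1, i.e. 2. The newer code quoted in this article uses the JCTree.Tag enum instead (for example Tag.ANNOTATION), in which TOPLEVEL and IMPORT are enum constants.
pos: an int that stores the position (character offset) of the node in the source code. The first character of a file is at position 0, and -1 means that the node has no position.
type: the Java type of the node, such as int, float or String. The parser leaves it unset; it is filled in later, during attribution.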
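The three naming rules above can be checked against a real parse. The following is a minimal sketch that uses only the public Compiler Tree API (com.sun.source, in the jdk.compiler module) and assumes it runs on a JDK rather than a bare JRE. The class name NodeClassDemo, the in-memory file name Demo.java and the sample source are made up for illustration.

```java
import com.sun.source.tree.BinaryTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.IfTree;
import com.sun.source.tree.Tree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class NodeClassDemo {
    /** Parses one in-memory source file (parse phase only, no attribution). */
    static CompilationUnitTree parse(String source) {
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///Demo.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler()
                .getTask(null, null, null, null, null, List.of(file));
        try {
            return task.parse().iterator().next();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Lists, for every if statement and binary expression, the implementing class. */
    static List<String> describe(String source) {
        List<String> out = new ArrayList<>();
        new TreeScanner<Void, Void>() {
            @Override
            public Void visitIf(IfTree node, Void p) {
                out.add(line("IfTree", node));
                return super.visitIf(node, p);
            }

            @Override
            public Void visitBinary(BinaryTree node, Void p) {
                out.add(line("BinaryTree", node));
                return super.visitBinary(node, p);
            }
        }.scan(parse(source), null);
        return out;
    }

    /** Formats "interface -> EnclosingClass.ImplementationClass". */
    private static String line(String iface, Tree node) {
        Class<?> impl = node.getClass();
        return iface + " -> " + impl.getEnclosingClass().getSimpleName()
                + "." + impl.getSimpleName();
    }

    public static void main(String[] args) {
        describe("class Demo { int f(int a) { if (a > 0) return a + 1; return 0; } }")
                .forEach(System.out::println);
    }
}
```

The sketch only reads class names through getClass(), so it does not need access to the internal com.sun.tools.javac.tree package.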

Recall the method that parses the qualified name of a package declaration:

    public JCExpression qualident(boolean allowAnnos) {
        // The first identifier becomes a JCIdent node.
        JCExpression t = toP(F.at(token.pos).Ident(ident()));
        // Each following ".name" wraps the tree built so far in a JCFieldAccess node.
        while (token.kind == DOT) {
            int pos = token.pos;
            nextToken();
            List<JCAnnotation> tyannos = null;
            if (allowAnnos) {
                tyannos = typeAnnotationsOpt();
            }
            t = toP(F.at(pos).Select(t, ident()));
            if (tyannos != null && tyannos.nonEmpty()) {
                t = toP(F.at(tyannos.head.pos).AnnotatedType(tyannos, t));
            }
        }
        return t;
    }

The first statement of this method uses the TreeMaker instance F to build a JCIdent syntax node from the Name object returned by ident():

JCExpression t = toP(F.at(token.pos).Ident(ident()));
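The loop in qualident folds a dotted name to the left: each new segment wraps everything parsed so far. The sketch below models only that folding step. QualidentSketch is a hypothetical stand-in, tokens are plain strings, and nodes are rendered as text instead of JCIdent and JCFieldAccess objects.

```java
import java.util.List;

class QualidentSketch {
    /**
     * Folds a token list such as [com, ., example, ., demo] into nested
     * selections. Assumes well-formed input: identifiers separated by ".".
     */
    static String qualident(List<String> tokens) {
        int i = 0;
        // The first identifier plays the role of the JCIdent node.
        String t = "Ident(" + tokens.get(i++) + ")";
        while (i < tokens.size() && tokens.get(i).equals(".")) {
            i++; // consume the DOT token
            // Wrap the tree built so far, as F.at(pos).Select(t, ident()) does.
            t = "Select(" + t + ", " + tokens.get(i++) + ")";
        }
        return t;
    }

    public static void main(String[] args) {
        System.out.println(qualident(List.of("com", ".", "example", ".", "demo")));
    }
}
```

For the input above the result is Select(Select(Ident(com), example), demo): the last segment ends up at the root of the tree and the first identifier at the deepest leaf.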

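As described in the previous article, once the package node has been parsed the parser enters a while loop. It first handles import declarations, whose rules are similar to those for package: it checks whether the current token is IMPORT and, if so, applies the import grammar rule and finally builds an import syntax tree. The source is: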

protected JCTree importDeclaration() {
    int pos = token.pos;
    nextToken();                          // skip the IMPORT token
    boolean importStatic = false;
    if (token.kind == STATIC) {           // "import static ..."
        importStatic = true;
        nextToken();
    }
    JCExpression pid = toP(F.at(token.pos).Ident(ident()));
    do {
        int pos1 = token.pos;
        accept(DOT);
        if (token.kind == STAR) {         // on-demand import such as java.util.*
            pid = to(F.at(pos1).Select(pid, names.asterisk));
            nextToken();
            break;
        } else {
            pid = toP(F.at(pos1).Select(pid, ident()));
        }
    } while (token.kind == DOT);
    accept(SEMI);
    return toP(F.at(pos).Import(pid, importStatic));
}

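After skipping the IMPORT token, the method checks for the static keyword. If it is present, a flag records that this is a static import. The parser then reads the first identifier of the imported name. If the name has several levels, it keeps reading tokens and wraps the tree built so far in a JCFieldAccess node, so these nodes are nested as well. If the last token is "*", the name of the outermost JCFieldAccess is set to names.asterisk. A ";" must follow, which marks the end of the import statement. Finally, the parsed name becomes the child of a newly created JCImport node. The complete JCImport tree looks like this:
[Figure: syntax tree of a JCImport node]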
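The nesting of an import name can also be inspected through the public Compiler Tree API. This is a sketch under assumptions: it needs a JDK (jdk.compiler module), and the class ImportShapeDemo, the helper shape and the sample imports are invented for illustration. MemberSelectTree is the public interface implemented by JCFieldAccess.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.IdentifierTree;
import com.sun.source.tree.ImportTree;
import com.sun.source.tree.MemberSelectTree;
import com.sun.source.tree.Tree;
import com.sun.source.util.JavacTask;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class ImportShapeDemo {
    /** Parses one in-memory source file (parse phase only). */
    static CompilationUnitTree parse(String source) {
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///A.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler()
                .getTask(null, null, null, null, null, List.of(file));
        try {
            return task.parse().iterator().next();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Renders nested member selections, e.g. Select(Ident(java), util). */
    static String shape(Tree t) {
        if (t instanceof MemberSelectTree) {
            MemberSelectTree s = (MemberSelectTree) t;
            return "Select(" + shape(s.getExpression()) + ", " + s.getIdentifier() + ")";
        }
        if (t instanceof IdentifierTree) {
            return "Ident(" + ((IdentifierTree) t).getName() + ")";
        }
        return t.getKind().toString();
    }

    /** One line per import: an optional "static " prefix plus the nested name. */
    static List<String> describeImports(String source) {
        List<String> out = new ArrayList<>();
        for (ImportTree imp : parse(source).getImports()) {
            out.add((imp.isStatic() ? "static " : "") + shape(imp.getQualifiedIdentifier()));
        }
        return out;
    }

    public static void main(String[] args) {
        describeImports("import java.util.List;\n"
                + "import static java.util.Collections.*;\n"
                + "class A {}")
                .forEach(System.out::println);
    }
}
```

In the on-demand import the outermost selection carries the name "*", which corresponds to names.asterisk in importDeclaration.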

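After the import nodes, the parser moves on to type declarations, which can be an interface, a class or an enum. The following uses class as an example to show how a class is parsed into a syntax tree.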

    protected JCClassDecl classDeclaration(JCModifiers mods, Comment dc) {
        int pos = token.pos;
        accept(CLASS);                  // the class keyword
        Name name = ident();            // the class name

        // optional type parameters, e.g. <T>
        List<JCTypeParameter> typarams = typeParametersOpt();

        JCExpression extending = null;
        if (token.kind == EXTENDS) {    // optional superclass
            nextToken();
            extending = parseType();
        }
        List<JCExpression> implementing = List.nil();
        if (token.kind == IMPLEMENTS) { // optional interface list
            nextToken();
            implementing = typeList();
        }
        // fields, methods, initializers and nested types
        List<JCTree> defs = classOrInterfaceBody(name, false);
        JCClassDecl result = toP(F.at(pos).ClassDef(
            mods, name, typarams, extending, implementing, defs));
        attach(result, dc);
        return result;
    }

    List<JCTree> classOrInterfaceBody(Name className, boolean isInterface) {
        accept(LBRACE);
        if (token.pos <= endPosTable.errorEndPos) {
            // error recovery
            skip(false, true, false, false);
            if (token.kind == LBRACE)
                nextToken();
        }
        ListBuffer<JCTree> defs = new ListBuffer<>();
        while (token.kind != RBRACE && token.kind != EOF) {
            defs.appendList(classOrInterfaceBodyDeclaration(className, isInterface));
            if (token.pos <= endPosTable.errorEndPos) {
               // error recovery
               skip(false, true, true, false);
           }
        }
        accept(RBRACE);
        return defs.toList();
    }

    protected List<JCTree> classOrInterfaceBodyDeclaration(Name className, boolean isInterface) {
        if (token.kind == SEMI) {
            nextToken();
            return List.nil();
        } else {
            Comment dc = token.comment(CommentStyle.JAVADOC);
            int pos = token.pos;
            JCModifiers mods = modifiersOpt();
            if (token.kind == CLASS ||
                token.kind == INTERFACE ||
                token.kind == ENUM) {
                return List.of(classOrInterfaceOrEnumDeclaration(mods, dc));
            } else if (token.kind == LBRACE &&
                       (mods.flags & Flags.StandardFlags & ~Flags.STATIC) == 0 &&
                       mods.annotations.isEmpty()) {
                if (isInterface) {
                    error(token.pos, "initializer.not.allowed");
                }
                return List.of(block(pos, mods.flags));
            } else {
                pos = token.pos;
                List<JCTypeParameter> typarams = typeParametersOpt();
                // if there are type parameters but no modifiers, save the start
                // position of the method in the modifiers.
                if (typarams.nonEmpty() && mods.pos == Position.NOPOS) {
                    mods.pos = pos;
                    storeEnd(mods, pos);
                }
                List<JCAnnotation> annosAfterParams = annotationsOpt(Tag.ANNOTATION);

                if (annosAfterParams.nonEmpty()) {
                    checkAnnotationsAfterTypeParams(annosAfterParams.head.pos);
                    mods.annotations = mods.annotations.appendList(annosAfterParams);
                    if (mods.pos == Position.NOPOS)
                        mods.pos = mods.annotations.head.pos;
                }

                Token tk = token;
                pos = token.pos;
                JCExpression type;
                boolean isVoid = token.kind == VOID;
                if (isVoid) {
                    type = to(F.at(pos).TypeIdent(TypeTag.VOID));
                    nextToken();
                } else {
                    // method returns types are un-annotated types
                    type = unannotatedType();
                }
                if (token.kind == LPAREN && !isInterface && type.hasTag(IDENT)) {
                    if (isInterface || tk.name() != className)
                        error(pos, "invalid.meth.decl.ret.type.req");
                    else if (annosAfterParams.nonEmpty())
                        illegal(annosAfterParams.head.pos);
                    return List.of(methodDeclaratorRest(
                        pos, mods, null, names.init, typarams,
                        isInterface, true, dc));
                } else {
                    pos = token.pos;
                    Name name = ident();
                    if (token.kind == LPAREN) {
                        return List.of(methodDeclaratorRest(
                            pos, mods, type, name, typarams,
                            isInterface, isVoid, dc));
                    } else if (!isVoid && typarams.isEmpty()) {
                        List<JCTree> defs =
                            variableDeclaratorsRest(pos, mods, type, name, isInterface, dc,
                                                    new ListBuffer<JCTree>()).toList();
                        accept(SEMI);
                        storeEnd(defs.last(), S.prevToken().endPos);
                        return defs;
                    } else {
                        pos = token.pos;
                        List<JCTree> err = isVoid
                            ? List.of(toP(F.at(pos).MethodDef(mods, name, type, typarams,
                                List.nil(), List.nil(), null, null)))
                            : null;
                        return List.of(syntaxError(token.pos, err, "expected", LPAREN));
                    }
                }
            }
        }
    }

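The first token is CLASS, the class keyword. It is followed by a user-defined IDENTIFIER token, which is the class name. Next come the optional type parameters, which are parsed into JCTypeParameter nodes, then an optional EXTENDS clause and an optional IMPLEMENTS clause. After that the class body is parsed, including fields, methods, nested classes and so on, and the results are collected in a list. Finally, all of these child nodes are attached to the JCClassDecl node that represents the class.
Finally, let's look at an example: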
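The parts of a class declaration listed above map onto accessors of the public ClassTree interface, which JCClassDecl implements. The following sketch assumes a JDK with the jdk.compiler module. ClassShapeDemo and the sample class Box are hypothetical, and only the parse phase is run, so nothing is resolved and the compiler adds no members of its own.

```java
import com.sun.source.tree.ClassTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.tree.Tree;
import com.sun.source.tree.VariableTree;
import com.sun.source.util.JavacTask;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class ClassShapeDemo {
    /** Parses one in-memory source file (parse phase only). */
    static CompilationUnitTree parse(String source) {
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///Box.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler()
                .getTask(null, null, null, null, null, List.of(file));
        try {
            return task.parse().iterator().next();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Summarizes the first type declaration of the given source. */
    static String describe(String source) {
        ClassTree cls = (ClassTree) parse(source).getTypeDecls().get(0);
        List<String> members = new ArrayList<>();
        for (Tree member : cls.getMembers()) {
            if (member instanceof MethodTree) {
                members.add("METHOD " + ((MethodTree) member).getName());
            } else if (member instanceof VariableTree) {
                members.add("VARIABLE " + ((VariableTree) member).getName());
            } else {
                members.add(member.getKind().toString());
            }
        }
        return "name=" + cls.getSimpleName()
                + " typarams=" + cls.getTypeParameters().size()
                + " extends=" + cls.getExtendsClause()
                + " implements=" + cls.getImplementsClause().size()
                + " members=" + members;
    }

    public static void main(String[] args) {
        System.out.println(describe(
                "class Box<T> extends Base implements Runnable, Cloneable {"
                + " int x; Box() {} public void run() {} }"));
    }
}
```

The constructor shows up as a method named <init>, which matches the names.init argument passed to methodDeclaratorRest in classOrInterfaceBodyDeclaration.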

public class Yufa {
    int a;
    private int b = a + 1;
    public int getB() {
        return b;
    }
    public void setB(int b) {
        this.b = b;
    }
}

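The syntax tree for the code above is shown below. Some node types are left out of the figure, such as the JCModifiers of the variable declarations and the JCPrimitiveTypeTree that gives each variable its type. Parts of the JCMethodDecl nodes are omitted too, such as the method's JCModifiers, its return type (JCPrimitiveTypeTree) and its parameters (JCVariableDecl).
[Figure: syntax tree of the Yufa class]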
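A text rendering of the tree for the Yufa class can be produced with a TreeScanner from the public Compiler Tree API. The sketch below is one possible way to do it, assuming a JDK with the jdk.compiler module. TreeDumpDemo is a made-up name, and the output lists Tree.Kind values (CLASS, VARIABLE, METHOD and so on) rather than the JCxxx class names, including the modifier and type nodes that the figure leaves out.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.Tree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class TreeDumpDemo {
    static final String YUFA =
            "public class Yufa {\n"
            + "    int a;\n"
            + "    private int b = a + 1;\n"
            + "    public int getB() {\n"
            + "        return b;\n"
            + "    }\n"
            + "    public void setB(int b) {\n"
            + "        this.b = b;\n"
            + "    }\n"
            + "}\n";

    /** Parses one in-memory source file (parse phase only). */
    static CompilationUnitTree parse(String source) {
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///Yufa.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler()
                .getTask(null, null, null, null, null, List.of(file));
        try {
            return task.parse().iterator().next();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns one line per node: two spaces of indentation per tree level, then the kind. */
    static List<String> dump(String source) {
        List<String> lines = new ArrayList<>();
        new TreeScanner<Void, Integer>() {
            @Override
            public Void scan(Tree node, Integer depth) {
                if (node == null) {
                    return null; // absent children, e.g. a missing extends clause
                }
                lines.add("  ".repeat(depth) + node.getKind());
                return super.scan(node, depth + 1);
            }
        }.scan(parse(source), 0);
        return lines;
    }

    public static void main(String[] args) {
        dump(YUFA).forEach(System.out::println);
    }
}
```

Overriding scan(Tree, P) is enough here because every visit method of TreeScanner reaches its children through that method.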

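Once the class has been parsed, its node is added to the top-level node for the file, JCCompilationUnit. JCCompilationUnit holds the package name as its pid together with the list of JCClassDecl nodes, and with that the whole Java file has been parsed. The complete syntax tree looks like this:
[Figure: complete syntax tree rooted at JCCompilationUnit]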
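The contents of the top-level node can be read back through the public CompilationUnitTree interface, which JCCompilationUnit implements. This is a short sketch, assuming a JDK with the jdk.compiler module. UnitShapeDemo, the package name demo.syntax and the extra import are invented for the example.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.Tree;
import com.sun.source.util.JavacTask;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class UnitShapeDemo {
    /** Parses one in-memory source file (parse phase only). */
    static CompilationUnitTree parse(String source) {
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///Yufa.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler()
                .getTask(null, null, null, null, null, List.of(file));
        try {
            return task.parse().iterator().next();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Summarizes the package name, import count and type declarations of a file. */
    static String describe(String source) {
        CompilationUnitTree unit = parse(source);
        List<String> kinds = new ArrayList<>();
        for (Tree decl : unit.getTypeDecls()) {
            kinds.add(decl.getKind().toString());
        }
        return "package=" + unit.getPackageName()
                + " imports=" + unit.getImports().size()
                + " typeDecls=" + kinds;
    }

    public static void main(String[] args) {
        System.out.println(describe(
                "package demo.syntax; import java.util.List; public class Yufa { int a; }"));
    }
}
```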

One more point: all syntax nodes are created by the TreeMaker class, which implements every node construction method declared in the JCTree.Factory interface.
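The call pattern F.at(pos).Ident(...) seen throughout the parser is a small stateful factory: at stores a position and the next node created picks it up. The sketch below imitates that idea with hypothetical classes (TreeMakerSketch, Maker, Node). It is not the real TreeMaker, which lives in an internal package and builds JCTree nodes.

```java
class TreeMakerSketch {
    /** A minimal node: its text and the source position it was created at. */
    static final class Node {
        final String text;
        final int pos;

        Node(String text, int pos) {
            this.text = text;
            this.pos = pos;
        }
    }

    /** Hypothetical stand-in for TreeMaker: at(pos) sets the position for later nodes. */
    static final class Maker {
        private int pos = -1; // -1 plays the role of "no position"

        Maker at(int pos) {
            this.pos = pos;
            return this; // returning this enables F.at(pos).ident(...)
        }

        Node ident(String name) {
            return new Node(name, pos);
        }

        Node select(Node selected, String name) {
            return new Node(selected.text + "." + name, pos);
        }
    }

    /** Builds the name "java.util" from "import java.util;" with explicit offsets. */
    static Node javaUtil() {
        Maker F = new Maker();
        Node java = F.at(7).ident("java");  // 'j' is at offset 7
        return F.at(11).select(java, "util"); // the '.' is at offset 11
    }

    public static void main(String[] args) {
        Node n = javaUtil();
        System.out.println(n.text + " @" + n.pos);
    }
}
```

Because the position is shared state inside the factory, the two nodes are created in separate statements here. Building a child inside the argument list of select would overwrite the position set by the outer at call before select runs.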
