Learning how babel-parser works
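Goal
Learn how babel-parser works by working through concrete problems one at a time.
Entry function: parse
First, babel can parse code as several kinds of source. The second argument of `parse` (the options object, via `sourceType`) specifies which kind the code is. By default babel treats the code as `"script"`, i.e. it parses it without module semantics.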
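To make the defaulting concrete, here is a simplified sketch that merges caller options over a table of defaults. It is loosely modeled on babel-parser's options handling; the three options shown are only a small subset, and the loop-over-defaults approach is an assumption about how the real helper works.

```javascript
// Simplified sketch: merge caller options over defaults.
// Only a few real @babel/parser option names are included here.
const defaultOptions = {
  sourceType: "script", // also "module" or "unambiguous"
  allowUndeclaredExports: false,
  plugins: [],
};

function getOptions(opts) {
  const options = {};
  for (const key of Object.keys(defaultOptions)) {
    // Use the caller's value when given (and not null/undefined), else the default
    options[key] = opts && opts[key] != null ? opts[key] : defaultOptions[key];
  }
  return options;
}

console.log(getOptions().sourceType); // script
console.log(getOptions({ sourceType: "module" }).sourceType); // module
```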
getParser 
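Implementation:
```js
function getParser(options: ?Options, input: string): Parser {
  let cls = Parser;
  if (options?.plugins) {
    // Validate the plugin list (e.g. reject incompatible combinations)
    validatePlugins(options.plugins);
    // Build (or reuse) a Parser subclass with the requested plugin mixins applied
    cls = getParserClass(options.plugins);
  }

  return new cls(options, input);
}
```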
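`getParserClass` itself is not shown above. A minimal sketch of the idea, assuming it builds a subclass by applying each enabled plugin as a mixin (a function from a class to a subclass) and caches the result per plugin set. The `jsx`/`flow` mixins and the `describe` method are hypothetical stand-ins, and plugins are plain strings here (the real option also accepts `[name, options]` pairs).

```javascript
// Sketch: compose a parser class from plugin mixins, with a per-plugin-set cache.
class Parser {
  constructor(options, input) {
    this.options = options;
    this.input = input;
  }
  describe() {
    return ["base"];
  }
}

// Each mixin takes a superclass and returns a subclass overriding some methods
const mixinPlugins = {
  jsx: (superClass) =>
    class extends superClass {
      describe() {
        return [...super.describe(), "jsx"];
      }
    },
  flow: (superClass) =>
    class extends superClass {
      describe() {
        return [...super.describe(), "flow"];
      }
    },
};
// Fixed application order, independent of the order the user lists plugins
const mixinPluginNames = Object.keys(mixinPlugins);
const parserClassCache = {};

function getParserClass(plugins) {
  const pluginList = mixinPluginNames.filter((name) => plugins.includes(name));
  const key = pluginList.join("/");
  let cls = parserClassCache[key];
  if (!cls) {
    cls = Parser;
    for (const plugin of pluginList) cls = mixinPlugins[plugin](cls);
    parserClassCache[key] = cls;
  }
  return cls;
}

function getParser(options, input) {
  let cls = Parser;
  if (options?.plugins) cls = getParserClass(options.plugins);
  return new cls(options, input);
}

console.log(getParser({ plugins: ["flow", "jsx"] }, "").describe()); // [ 'base', 'jsx', 'flow' ]
```

Caching by the joined plugin names means every parse with the same plugin set reuses one class instead of rebuilding the mixin chain.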
Parser 
Parser is the most-derived class in a long inheritance chain. Each class in the chain is responsible for one specific aspect of parsing. Let's start reading from the root class.
BaseParser 
BaseParser is fairly simple; it mainly stores shared information (options, scope handlers, plugins, state).
```js
export default class BaseParser {
  // Set up by the Parser constructor
  options: Options;
  inModule: boolean;
  scope: ScopeHandler<*>;
  classScope: ClassScopeHandler;
  prodParam: ProductionParameterHandler;
  plugins: PluginsMap;
  filename: ?string;
  sawUnambiguousESM: boolean = false;
  ambiguousScriptDifferentAst: boolean = false;

  // Tokenizer state (position, current token, context, errors, ...)
  state: State;
  // The source text and its length, kept outside `state`
  input: string;
  length: number;

  hasPlugin(name: string): boolean {
    return this.plugins.has(name);
  }

  getPluginOption(plugin: string, name: string) {
    // Read one option of an enabled plugin (undefined if the plugin is off)
    if (this.hasPlugin(plugin)) return this.plugins.get(plugin)[name];
  }
}
```
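The `plugins` field maps each plugin name to that plugin's options, which is what `hasPlugin` and `getPluginOption` read. A sketch of building it, assuming each entry of the `plugins` option is either `"name"` or `["name", { ...options }]` and that the first occurrence of a name wins; `pluginsMap` and `BaseParserSketch` are illustrative names.

```javascript
// Sketch: turn the `plugins` option into a Map of name -> options.
function pluginsMap(plugins) {
  const map = new Map();
  for (const plugin of plugins) {
    const [name, options] = Array.isArray(plugin) ? plugin : [plugin, {}];
    // Keep only the first occurrence of each plugin name
    if (!map.has(name)) map.set(name, options || {});
  }
  return map;
}

class BaseParserSketch {
  constructor(plugins) {
    this.plugins = pluginsMap(plugins);
  }
  hasPlugin(name) {
    return this.plugins.has(name);
  }
  getPluginOption(plugin, name) {
    if (this.hasPlugin(plugin)) return this.plugins.get(plugin)[name];
  }
}

const p = new BaseParserSketch(["jsx", ["pipelineOperator", { proposal: "minimal" }]]);
console.log(p.hasPlugin("jsx")); // true
console.log(p.getPluginOption("pipelineOperator", "proposal")); // minimal
console.log(p.getPluginOption("typescript", "dts")); // undefined
```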
CommentsParser
Mainly handles parsing comments and attaching them to nodes; its approach is based on espree, the parser used by ESLint.
LocationParser 
This class mainly handles parse errors: it raises an error together with the position where the error occurred.
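A sketch of what "an error plus its position" can look like: convert the character offset into a 1-based line and 0-based column, append it to the message, and attach it to the thrown `SyntaxError`. `getLineInfo` mirrors a helper name assumed to exist in babel's location utilities, but this version only treats `\n` as a line break, and `LocationParserSketch.raise` takes a plain message string; both are simplifications.

```javascript
// Convert a character offset into { line (1-based), column (0-based) }.
// Only "\n" is treated as a line break in this sketch.
function getLineInfo(input, offset) {
  let line = 1;
  let lineStart = 0;
  for (let i = 0; i < offset; i++) {
    if (input[i] === "\n") {
      line++;
      lineStart = i + 1;
    }
  }
  return { line, column: offset - lineStart };
}

class LocationParserSketch {
  constructor(input) {
    this.input = input;
  }
  // Throw a SyntaxError whose message and fields carry the error position
  raise(pos, message) {
    const loc = getLineInfo(this.input, pos);
    const err = new SyntaxError(`${message} (${loc.line}:${loc.column})`);
    err.pos = pos;
    err.loc = loc;
    throw err;
  }
}

try {
  // Offset 20 is the ";" right after `const b` on line 2
  new LocationParserSketch("const a = 1;\nconst b;").raise(20, "Unexpected token");
} catch (e) {
  console.log(e.message); // Unexpected token (2:7)
}
```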
Tokenizer 
Lexical analysis: turns the source text into tokens.
UtilParser 
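Parsing utilities: helper functions shared by the parsing code (e.g. `expect`, `semicolon`, `isContextual`, `expectPlugin`).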
NodeUtils 
The `Node` class and helpers for creating and finishing nodes (`startNode`, `startNodeAt`, `finishNode`).
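A sketch of the node helpers, assuming a node records `start` when it is opened and `end` when it is finished, the latter taken from the end of the last consumed token (`state.lastTokEnd`, the field that also appears in `parseVar` below). `NodeUtilsSketch` is a stand-in and leaves out `loc`, ranges and comment bookkeeping.

```javascript
// Minimal AST node: type is filled in when the node is finished
class Node {
  constructor(pos) {
    this.type = "";
    this.start = pos;
    this.end = 0;
  }
}

class NodeUtilsSketch {
  constructor() {
    this.state = { start: 0, lastTokEnd: 0 };
  }
  // Open a node at the start of the current token
  startNode() {
    return new Node(this.state.start);
  }
  // Open a node at an explicitly given earlier position
  startNodeAt(pos) {
    return new Node(pos);
  }
  // Close a node at the end of the last consumed token and set its type
  finishNode(node, type) {
    node.type = type;
    node.end = this.state.lastTokEnd;
    return node;
  }
}

const u = new NodeUtilsSketch();
u.state.start = 6; // current token: `a` in "const a = 10"
const id = u.startNode();
u.state.lastTokEnd = 7; // `a` has been consumed
u.finishNode(id, "Identifier");
console.log(id); // Node { type: 'Identifier', start: 6, end: 7 }
```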
LValParser 
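Handles left-hand values (LVal): assignment targets and binding patterns, e.g. `toAssignable` and `checkLVal`.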
ExpressionParser 
Expression parsing.
StatementParser 
Statement parsing; assembles the statements into the Program node.
Parser 
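The entry point: its constructor initializes the options, scope handlers and plugins, and it defines `parse()`.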
Inheritance chain (each class extends the one listed after it):
```text
Parser
  -> StatementParser
  -> ExpressionParser
  -> LValParser
  -> NodeUtils
  -> UtilParser
  -> Tokenizer
  -> LocationParser
  -> CommentsParser
  -> BaseParser
```
parse 
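`parse` is called right after the Parser class has been instantiated by the code above (`getParser(options, input).parse()`).
```js
parse(): File {
  let paramFlags = PARAM;
  if (this.hasPlugin("topLevelAwait") && this.inModule) {
    // Allow `await` at the top level of modules when the plugin is enabled
    paramFlags |= PARAM_AWAIT;
  }
  // Enter the program-level scope
  this.scope.enter(SCOPE_PROGRAM);
  // Push the production parameters (e.g. whether `await` is allowed)
  this.prodParam.enter(paramFlags);
  // Create the File and Program nodes at the current position
  const file = this.startNode();
  const program = this.startNode();
  // Read the first token
  this.nextToken();
  file.errors = null;
  this.parseTopLevel(file, program);
  // Attach the errors recorded in state during parsing
  file.errors = this.state.errors;
  return file;
}
```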
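`paramFlags` is a bit mask. A sketch of how such flags and a stack-based handler could work, assuming babel's `PARAM`, `PARAM_YIELD` and `PARAM_AWAIT` are single-bit constants like the ones below (the exact values are an assumption). `ProdParamSketch` stands in for the real `ProductionParameterHandler`, and `topLevelFlags` is a hypothetical helper that reproduces the `topLevelAwait` check in `parse()`.

```javascript
// Production-parameter flags as bit masks (values assumed)
const PARAM = 0b0000;
const PARAM_YIELD = 0b0001;
const PARAM_AWAIT = 0b0010;

class ProdParamSketch {
  constructor() {
    this.stacks = [];
  }
  enter(flags) {
    this.stacks.push(flags);
  }
  exit() {
    this.stacks.pop();
  }
  currentFlags() {
    return this.stacks[this.stacks.length - 1];
  }
  get hasAwait() {
    return (this.currentFlags() & PARAM_AWAIT) > 0;
  }
  get hasYield() {
    return (this.currentFlags() & PARAM_YIELD) > 0;
  }
}

// Hypothetical helper reproducing the check at the top of parse()
function topLevelFlags(hasTopLevelAwaitPlugin, inModule) {
  let paramFlags = PARAM;
  if (hasTopLevelAwaitPlugin && inModule) paramFlags |= PARAM_AWAIT;
  return paramFlags;
}

const prodParam = new ProdParamSketch();
prodParam.enter(topLevelFlags(true, true)); // module code with topLevelAwait enabled
console.log(prodParam.hasAwait); // true
prodParam.enter(PARAM); // e.g. entering a plain (non-async, non-generator) function
console.log(prodParam.hasAwait); // false
prodParam.exit();
console.log(prodParam.hasAwait); // true
```

A stack fits because nested functions each push their own flags and restore the outer ones on exit.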
The function returns the File node; in other words, the entire parse is completed within this single call.
Parsing `const a = 10`
```js
parser.parse("const a = 10");
```
Let's use this simple example to follow babel's parsing process.
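For orientation, the walkthrough ends up building roughly the tree below. It is written by hand and abridged (no `loc`, `extra` or comment fields), using Babel's AST node names (`NumericLiteral` rather than ESTree's `Literal`); `walk` is a small illustrative helper that prints each node with its `[start, end)` offsets.

```javascript
// Hand-written, abridged AST for "const a = 10" (offsets are character positions)
const ast = {
  type: "File", start: 0, end: 12,
  program: {
    type: "Program", start: 0, end: 12, sourceType: "script", directives: [],
    body: [
      {
        // built by parseVarStatement
        type: "VariableDeclaration", start: 0, end: 12, kind: "const",
        declarations: [
          {
            // built by parseVar; id comes from parseVarId, init from parseMaybeAssign
            type: "VariableDeclarator", start: 6, end: 12,
            id: { type: "Identifier", start: 6, end: 7, name: "a" },
            init: { type: "NumericLiteral", start: 10, end: 12, value: 10 },
          },
        ],
      },
    ],
  },
};

// Depth-first walk collecting "Type [start, end)" lines
function walk(node, depth = 0, out = []) {
  out.push(`${"  ".repeat(depth)}${node.type} [${node.start}, ${node.end})`);
  for (const value of Object.values(node)) {
    const children = Array.isArray(value) ? value : [value];
    for (const child of children) {
      if (child && typeof child === "object" && typeof child.type === "string") {
        walk(child, depth + 1, out);
      }
    }
  }
  return out;
}

console.log(walk(ast).join("\n"));
```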
this.nextToken 
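`this.nextToken` lives in Tokenizer, so it is clearly part of lexical analysis. It mainly does the following:
Get the current context and decide whether the current characters should be skipped (whitespace such as spaces and tabs does not need to be parsed).
Record the current position in the input as the start of the token.
Check whether that position is past the end of the input (if so, an EOF token is produced).
Check whether `context.override` exists; if it does, call it, passing `this` as the argument.
Otherwise, call `getTokenFromCode`: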
 
```js
this.getTokenFromCode(this.input.codePointAt(this.state.pos));
```
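The steps above can be sketched as a tiny tokenizer. This assumes only two punctuators (`(` and `)`), string token types and no context stack, so the `context.override` branch is left out; `TokenizerSketch` and its token type names are hypothetical.

```javascript
class TokenizerSketch {
  constructor(input) {
    this.input = input;
    this.state = { pos: 0, start: 0, end: 0, type: null, value: null };
  }
  // Skip whitespace; nothing there needs to be parsed
  skipSpace() {
    while (this.state.pos < this.input.length && /\s/.test(this.input[this.state.pos])) {
      this.state.pos++;
    }
  }
  finishToken(type, value) {
    this.state.end = this.state.pos;
    this.state.type = type;
    this.state.value = value;
  }
  nextToken() {
    this.skipSpace();
    this.state.start = this.state.pos; // token start
    if (this.state.pos >= this.input.length) {
      this.finishToken("eof"); // past the end of the input
      return;
    }
    this.getTokenFromCode(this.input.codePointAt(this.state.pos));
  }
  // One big switch on the character code (only two cases here)
  getTokenFromCode(code) {
    switch (code) {
      case 40: // "("
        this.state.pos++;
        return this.finishToken("parenL");
      case 41: // ")"
        this.state.pos++;
        return this.finishToken("parenR");
      default:
        throw new SyntaxError(`Unexpected character '${String.fromCodePoint(code)}' (${this.state.pos})`);
    }
  }
}

const t = new TokenizerSketch("  ( )");
const types = [];
do {
  t.nextToken();
  types.push(t.state.type);
} while (t.state.type !== "eof");
console.log(types); // [ 'parenL', 'parenR', 'eof' ]
```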
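this.getTokenFromCode
The function is essentially one big `switch` on the character code, e.g. checking whether the current character is `(` or `)`. Here the first character is the `c` of `const`, which on its own is not a complete keyword or punctuator, so more characters have to be read.
It calls `this.readWord`, which reads forward from `this.state.pos` until it reaches the space after `const`, establishing that `const` is a complete word.
It then looks up `const` in the keyword table (`keywords`) to get the matching token type, and calls `this.finishToken(type, word)`.
This updates the `this.state` object and the tokenizer context.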
this.finishToken 
The function updates the position information in the current state, along with fields such as `type` and `value`, and then conditionally calls `updateContext`.
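The `readWord` -> keyword lookup -> `finishToken` sequence, sketched with an ASCII-only identifier check and a tiny hypothetical keyword table (babel's real table covers every keyword and uses full Unicode identifier rules). `WordReaderSketch` is an illustrative name.

```javascript
// Tiny hypothetical keyword table: word -> token type
const keywords = new Map([
  ["const", { label: "const", keyword: "const" }],
  ["var", { label: "var", keyword: "var" }],
  ["if", { label: "if", keyword: "if" }],
]);
const nameType = { label: "name" };

// ASCII-only simplification of the identifier-character test
function isIdentifierChar(ch) {
  return /[A-Za-z0-9_$]/.test(ch);
}

class WordReaderSketch {
  constructor(input) {
    this.input = input;
    this.state = { pos: 0, end: 0, type: null, value: null };
  }
  // Read forward from state.pos until a non-identifier character
  readWord() {
    const start = this.state.pos;
    while (this.state.pos < this.input.length && isIdentifierChar(this.input[this.state.pos])) {
      this.state.pos++;
    }
    const word = this.input.slice(start, this.state.pos);
    // Keywords get their own token type; anything else is a plain name
    const type = keywords.get(word) || nameType;
    this.finishToken(type, word);
  }
  // Record where the token ends plus its type and value
  finishToken(type, value) {
    this.state.end = this.state.pos;
    this.state.type = type;
    this.state.value = value;
  }
}

const r = new WordReaderSketch("const a = 10");
r.readWord();
console.log(r.state.type.label, r.state.value, r.state.end); // const const 5
```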
this.updateContext 
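The `updateContext` function mainly checks whether the token type is a keyword, and whether it is one of the keywords that need special handling.
At this point, `this.nextToken` has finished.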
this.parseTopLevel 
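`this.parseTopLevel` is defined in the StatementParser class.
`parseTopLevel` calls `this.parseBlockBody`; the rest of the function deals with names exported from the module (e.g. reporting exports that were never declared) and assembles the final File object.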
this.parseBlockBody 
This function is also simple: it assigns empty `body` and `directives` arrays to the node, then calls `this.parseBlockOrModuleBlockBody`.
this.parseBlockOrModuleBlockBody 
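This function calls `this.parseStatement` to parse each statement; here the statement to parse is the `const` declaration. It runs in a loop that keeps going until it reaches the end.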
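A sketch of the statement loop, assuming pre-split string tokens and a stub `parseStatement` that just collects tokens up to `;`. The constructor appends an `"<eof>"` sentinel, directive parsing is omitted (so `directives` stays empty), and the explicit end-of-input check keeps the loop from spinning if the expected end token never arrives. `StatementLoopSketch` is hypothetical.

```javascript
class StatementLoopSketch {
  constructor(tokens) {
    this.tokens = [...tokens, "<eof>"]; // sentinel marking the end of input
    this.pos = 0;
  }
  match(value) {
    return this.tokens[this.pos] === value;
  }
  // Stub: everything up to the next ";" (or end of input) counts as one statement
  parseStatement() {
    const first = this.pos;
    while (!this.match(";") && !this.match("<eof>")) this.pos++;
    const text = this.tokens.slice(first, this.pos).join(" ");
    if (this.match(";")) this.pos++;
    return { type: "Statement", text };
  }
  // Keep parsing statements until the end token
  parseBlockBody(node, end) {
    node.body = [];
    node.directives = []; // directive prologue handling omitted
    while (!this.match(end)) {
      if (this.match("<eof>")) throw new SyntaxError("Unexpected end of input");
      node.body.push(this.parseStatement());
    }
    this.pos++; // consume the end token
    return node;
  }
}

const program = new StatementLoopSketch(["const", "a", "=", "10", ";", "a", ";"])
  .parseBlockBody({ type: "Program" }, "<eof>");
console.log(program.body.map((s) => s.text)); // [ 'const a = 10', 'a' ]
```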
this.parseStatement 
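`parseStatement` first checks whether the statement starts with `@`; if so, it parses decorators (not a Directive). It also checks whether the statement starts with `let`, which is then handled like `var` with kind `"let"`.
Next comes a big `switch` on the token type. We need to match `const`, so let's look at that case first.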
```js
case _types2.types._const:
case _types2.types._var:
  kind = kind || this.state.value;
  // `let`/`const` may not appear in a single-statement context, e.g. `if (x) const a = 1;`
  if (context && kind !== "var") {
    this.raise(this.state.start, _location.Errors.UnexpectedLexicalDeclaration);
  }
  return this.parseVarStatement(node, kind);
```
this.parseVarStatement 
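```js
parseVarStatement(node, kind) {
  this.next(); // consume the `var`/`let`/`const` keyword
  this.parseVar(node, false, kind); // parse the list of declarators
  this.semicolon(); // expect `;` (or a place where one can be inserted automatically)
  return this.finishNode(node, "VariableDeclaration");
}
```
`this.next` updates some position fields in `state`, such as `lastTokEnd`, and then calls `this.nextToken` (described above) to read the token after `const`. After that, `this.parseVar` parses the declarations.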
this.parseVar 
```js
parseVar(node, isFor, kind) {
  const declarations = (node.declarations = []);
  const isTypescript = this.hasPlugin("typescript");
  node.kind = kind;
  for (;;) {
    const decl = this.startNode();
    this.parseVarId(decl, kind);
    if (this.eat(_types2.types.eq)) {
      // `= initializer`
      decl.init = this.parseMaybeAssign(isFor);
    } else {
      if (kind === "const" && !(this.match(_types2.types._in) || this.isContextual("of"))) {
        // `const` needs an initializer, except in `for (const x in/of ...)`;
        // TypeScript allows it (e.g. `declare const x: number;`)
        if (!isTypescript) {
          this.unexpected();
        }
      } else if (decl.id.type !== "Identifier" && !(isFor && (this.match(_types2.types._in) || this.isContextual("of")))) {
        // Destructuring patterns also need an initializer outside for-in/of
        this.raise(this.state.lastTokEnd, _location.Errors.DeclarationMissingInitializer, "Complex binding patterns");
      }
      decl.init = null;
    }
    declarations.push(this.finishNode(decl, "VariableDeclarator"));
    // `const a = 1, b = 2`: keep going while there is a comma
    if (!this.eat(_types2.types.comma)) break;
  }
  return node;
}
```
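To tie `parseStatement`, `parseVarStatement` and `parseVar` together, here is a toy recursive-descent version over pre-split string tokens. Assumptions: only identifiers and integer initializers, an optional `;`, and `let` routed into the same case (babel reaches it through its separate `let` check). The for-in/of and TypeScript exceptions for `const` are not modeled, and `VarParserSketch` is hypothetical.

```javascript
class VarParserSketch {
  constructor(tokens) {
    this.tokens = tokens;
    this.pos = 0;
  }
  get value() {
    return this.tokens[this.pos];
  }
  next() {
    this.pos++;
  }
  // Consume the current token if it equals `value`
  eat(value) {
    if (this.value !== value) return false;
    this.next();
    return true;
  }
  unexpected() {
    throw new SyntaxError(`Unexpected token ${this.value ?? "<eof>"}`);
  }
  parseStatement() {
    switch (this.value) {
      case "const":
      case "let":
      case "var":
        return this.parseVarStatement({ type: "" }, this.value);
      default:
        return this.unexpected();
    }
  }
  parseVarStatement(node, kind) {
    this.next(); // skip the keyword
    this.parseVar(node, kind);
    this.eat(";"); // simplified: no automatic-semicolon rules
    node.type = "VariableDeclaration";
    return node;
  }
  parseVar(node, kind) {
    const declarations = (node.declarations = []);
    node.kind = kind;
    for (;;) {
      if (!/^[A-Za-z_$][\w$]*$/.test(this.value ?? "")) this.unexpected();
      const decl = { type: "VariableDeclarator", id: { type: "Identifier", name: this.value } };
      this.next();
      if (this.eat("=")) {
        if (!/^\d+$/.test(this.value ?? "")) this.unexpected();
        decl.init = { type: "NumericLiteral", value: Number(this.value) };
        this.next();
      } else {
        // `const` without an initializer is an error in this sketch
        if (kind === "const") this.unexpected();
        decl.init = null;
      }
      declarations.push(decl);
      if (!this.eat(",")) break;
    }
    return node;
  }
}

const decl = new VarParserSketch(["const", "a", "=", "10", ",", "b", "=", "2", ";"]).parseStatement();
console.log(decl.declarations.map((d) => `${d.id.name}=${d.init.value}`)); // [ 'a=10', 'b=2' ]
```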
parseVarId 
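```js
parseVarId(decl, kind) {
  // Parse the binding target (identifier or destructuring pattern)
  decl.id = this.parseBindingAtom();
  // Validate it and declare it in the current scope (var vs. lexical binding)
  this.checkLVal(
    decl.id,
    kind === "var" ? _scopeflags.BIND_VAR : _scopeflags.BIND_LEXICAL,
    undefined,
    "variable declaration",
    kind !== "var",
  );
}
```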
This calls `this.parseBindingAtom`, which performs some checks internally (mostly about things such as TypeScript and `async`) and then calls `this.next` again to move past the name.
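`checkLVal` registers the name in the current scope with the binding type passed above. A single-scope sketch of why `BIND_VAR` and `BIND_LEXICAL` matter, assuming the usual JavaScript redeclaration rules; babel's scope handler keeps a stack of scopes and uses bit flags, whereas the string constants and `ScopeSketch` here are stand-ins.

```javascript
// Stand-ins for babel's binding flags
const BIND_VAR = "var";
const BIND_LEXICAL = "lexical";

class ScopeSketch {
  constructor() {
    this.var = new Set();
    this.lexical = new Set();
  }
  declareName(name, bindingType) {
    // A lexical binding clashes with anything; a var clashes only with lexical bindings
    const redeclared =
      bindingType === BIND_LEXICAL
        ? this.lexical.has(name) || this.var.has(name)
        : this.lexical.has(name);
    if (redeclared) {
      throw new SyntaxError(`Identifier '${name}' has already been declared`);
    }
    (bindingType === BIND_LEXICAL ? this.lexical : this.var).add(name);
  }
}

const scope = new ScopeSketch();
scope.declareName("a", BIND_VAR);
scope.declareName("a", BIND_VAR); // allowed, like `var a; var a;`
try {
  scope.declareName("a", BIND_LEXICAL); // like `var a; const a = 1;`
} catch (e) {
  console.log(e.message); // Identifier 'a' has already been declared
}
```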
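Parsing `a |> b`
For `a |> b`, I'll skip the steps it shares with the previous example and focus on what differs. The pipeline operator is not part of the official specification (it is still a proposal) and needs a plugin, so without the plugin the Parser reports an error. Let's see how it handles this.
Everything up to `parseBlockOrModuleBlockBody` is the same. In the big `switch` in `parseStatementContent`, no keyword matches; since the statement may be an expression, `parseExpression` is called to handle it.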
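The chain that starts at `parseExpression` ends in binary-operator parsing (`parseExprOp`), which is where `|>` is handled. A precedence-climbing sketch of that idea, assuming string tokens and precedence values modeled on babel's `binop` numbers (pipeline lowest). The missing-plugin message is modeled on babel's wording, and `ExprParserSketch` and `show` are hypothetical.

```javascript
// Binary operator precedences (values assumed to follow babel's binop numbers)
const BINOP = new Map([
  ["|>", 0],
  ["||", 1],
  ["&&", 2],
  ["+", 9],
  ["-", 9],
  ["*", 10],
  ["/", 10],
]);

class ExprParserSketch {
  constructor(tokens, plugins = []) {
    this.tokens = tokens;
    this.pos = 0;
    this.plugins = new Set(plugins);
  }
  get value() {
    return this.tokens[this.pos];
  }
  expectPlugin(name) {
    if (!this.plugins.has(name)) {
      throw new SyntaxError(`This experimental syntax requires enabling the parser plugin: '${name}'`);
    }
  }
  parse() {
    const expr = this.parseExprOp(this.parseExprAtom(), -1);
    if (this.pos < this.tokens.length) throw new SyntaxError(`Unexpected token ${this.value}`);
    return expr;
  }
  parseExprAtom() {
    const value = this.value;
    if (value === undefined || BINOP.has(value)) throw new SyntaxError(`Unexpected token ${value}`);
    this.pos++;
    return /^\d+$/.test(value)
      ? { type: "NumericLiteral", value: Number(value) }
      : { type: "Identifier", name: value };
  }
  parseExprOp(left, minPrec) {
    const operator = this.value;
    const prec = BINOP.get(operator);
    if (prec !== undefined && prec > minPrec) {
      // Reject `|>` unless the pipelineOperator plugin is enabled
      if (operator === "|>") this.expectPlugin("pipelineOperator");
      this.pos++;
      // The right operand only absorbs operators that bind tighter than this one
      const right = this.parseExprOp(this.parseExprAtom(), prec);
      const type = operator === "||" || operator === "&&" ? "LogicalExpression" : "BinaryExpression";
      // Continue with any remaining operators at the caller's level
      return this.parseExprOp({ type, left, operator, right }, minPrec);
    }
    return left;
  }
}

// Render a tree with explicit parentheses
function show(node) {
  return node.operator
    ? `(${show(node.left)} ${node.operator} ${show(node.right)})`
    : String(node.name ?? node.value);
}

console.log(show(new ExprParserSketch(["1", "+", "2", "*", "3"]).parse())); // (1 + (2 * 3))
```

Passing `prec` as the new minimum when parsing the right operand is what makes `*` bind tighter than `+` while keeping `-` left-associative.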
parseExpression 
```js
parseExpression(noIn, refExpressionErrors) {
  const startPos = this.state.start;
  const startLoc = this.state.startLoc;
  const expr = this.parseMaybeAssign(noIn, refExpressionErrors);
  // ... (comma-separated SequenceExpression handling omitted; it uses startPos/startLoc)
  return expr;
}
```
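parseMaybeAssign
```js
parseMaybeAssign(noIn, refExpressionErrors, afterLeftParse, refNeedsArrowPos) {
  // ... (yield and arrow-function bookkeeping omitted)
  let left = this.parseMaybeConditional(noIn, refExpressionErrors, refNeedsArrowPos);
  // ... (assignment operators such as `a = b` omitted)
  return left;
}
```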
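parseMaybeConditional
```js
parseMaybeConditional(noIn, refExpressionErrors, refNeedsArrowPos) {
  const startPos = this.state.start;
  const startLoc = this.state.startLoc;
  const expr = this.parseExprOps(noIn, refExpressionErrors);
  // ... (arrow-function and error checks omitted)
  // Parses `expr ? a : b` if a `?` follows; otherwise returns expr unchanged
  return this.parseConditional(expr, noIn, startPos, startLoc, refNeedsArrowPos);
}
```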
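parseExprOps
```js
parseExprOps(noIn, refExpressionErrors) {
  const startPos = this.state.start;
  const startLoc = this.state.startLoc;
  const expr = this.parseMaybeUnary(refExpressionErrors);
  // ... (arrow-function and error checks omitted)
  // -1: any binary operator may follow the first operand
  return this.parseExprOp(expr, startPos, startLoc, -1, noIn);
}
```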
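parseMaybeUnary
```js
parseMaybeUnary(refExpressionErrors) {
  // ... (`await` and prefix operators such as `!`, `typeof`, `++` omitted)
  const startPos = this.state.start;
  const startLoc = this.state.startLoc;
  let expr = this.parseExprSubscripts(refExpressionErrors);
  // ... (postfix `++`/`--` handling omitted)
  return expr;
}
```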
parseExprAtom
```js
parseExprAtom(refExpressionErrors) {
  // A `/` at the start of an atom begins a regular expression, not a division
  if (this.state.type === _types.types.slash) this.readRegexp();
  const canBeArrow = this.state.potentialArrowAt === this.state.start;
  let node;
  switch (this.state.type) {
    // ... (other cases omitted)
    case _types.types.name: {
      node = this.startNode();
      const containsEsc = this.state.containsEsc;
      const id = this.parseIdentifier();
      if (!containsEsc && id.name === "async" && this.match(_types.types._function) && !this.canInsertSemicolon()) {
        // `async function ...` used as an expression
        const last = this.state.context.length - 1;
        if (this.state.context[last] !== _context.types.functionStatement) {
          throw new Error("Internal error");
        }
        this.state.context[last] = _context.types.functionExpression;
        this.next();
        return this.parseFunction(node, undefined, true);
      } else if (canBeArrow && !containsEsc && id.name === "async" && this.match(_types.types.name) && !this.canInsertSemicolon()) {
        // `async x => ...`: a single-parameter async arrow function
        const oldMaybeInArrowParameters = this.state.maybeInArrowParameters;
        const oldMaybeInAsyncArrowHead = this.state.maybeInAsyncArrowHead;
        const oldYieldPos = this.state.yieldPos;
        const oldAwaitPos = this.state.awaitPos;
        this.state.maybeInArrowParameters = true;
        this.state.maybeInAsyncArrowHead = true;
        this.state.yieldPos = -1;
        this.state.awaitPos = -1;
        const params = [this.parseIdentifier()];
        this.expect(_types.types.arrow);
        this.checkYieldAwaitInDefaultParams();
        this.state.maybeInArrowParameters = oldMaybeInArrowParameters;
        this.state.maybeInAsyncArrowHead = oldMaybeInAsyncArrowHead;
        this.state.yieldPos = oldYieldPos;
        this.state.awaitPos = oldAwaitPos;
        this.parseArrowExpression(node, params, true);
        return node;
      }
      if (canBeArrow && this.match(_types.types.arrow) && !this.canInsertSemicolon()) {
        // `x => ...`: a single-parameter arrow function
        this.next();
        this.parseArrowExpression(node, [id], false);
        return node;
      }
      // A plain identifier, e.g. `a` in `a |> b`
      return id;
    }
    // ... (other cases omitted)
    default:
      throw this.unexpected();
  }
}
```
parseExprSubscripts
```js
parseExprSubscripts(refExpressionErrors) {
  const startPos = this.state.start;
  const startLoc = this.state.startLoc;
  const potentialArrowAt = this.state.potentialArrowAt;
  const expr = this.parseExprAtom(refExpressionErrors);
  // An arrow function parsed here is returned as-is, without subscripts
  if (expr.type === "ArrowFunctionExpression" && expr.start === potentialArrowAt) {
    return expr;
  }
  // Member access and calls: a.b, a[b], a(b)
  return this.parseSubscripts(expr, startPos, startLoc);
}
```
parseExprOp
```js
parseExprOp(left, leftStartPos, leftStartLoc, minPrec, noIn) {
  let prec = this.state.type.binop;
  if (prec != null && (!noIn || !this.match(_types.types._in))) {
    if (prec > minPrec) {
      const operator = this.state.value;
      if (operator === "|>" && this.state.inFSharpPipelineDirectBody) {
        return left;
      }
      const node = this.startNodeAt(leftStartPos, leftStartLoc);
      node.left = left;
      node.operator = operator;
      if (operator === "**" && left.type === "UnaryExpression" && (this.options.createParenthesizedExpressions || !(left.extra && left.extra.parenthesized))) {
        this.raise(left.argument.start, _location.Errors.UnexpectedTokenUnaryExponentiation);
      }
      const op = this.state.type;
      const logical = op === _types.types.logicalOR || op === _types.types.logicalAND;
      const coalesce = op === _types.types.nullishCoalescing;
      if (op === _types.types.pipeline) {
        // Without the pipelineOperator plugin, this raises a missing-plugin error
        this.expectPlugin("pipelineOperator");
        this.state.inPipeline = true;
        this.checkPipelineAtInfixOperator(left, leftStartPos);
      } else if (coalesce) {
        // `??` is parsed with the precedence of `&&`
        prec = _types.types.logicalAND.binop;
      }
    }
    // ... (excerpt truncated here)
```