[进行中]写一个sql编译器for sqlite

[toc]

background

在学习SQLite过程中，SQLite中的Virtual Database Engine, 需要解释SQL语言生成字节码，执行。SQLite自行实现了Lemon的parser。我希望了解他如何执行，可以的话，build my own.

本文目标：根据《flex与bison中文版.pdf》，学习与构建同步进行

SQLite-Lemon 介绍

sqlite每条指令有一个opcode和P1～p5操作数

$ sqlite3 ex1.db
sqlite> explain delete from tbl1 where two<20;
addr  opcode         p1    p2    p3    p4             p5  comment      
----  -------------  ----  ----  ----  -------------  --  -------------
   Init           0     12    0                    00  Start at 12  
   Null           0     1     0                    00  r[1]=NULL    
   OpenWrite      0     2     0     3              00  root=2 iDb=0; tbl1
   Rewind         0     10    0                    00               
     Column         0     1     2                    00  r[2]=tbl1.two
     Ge             3     9     2     (BINARY)       51  if r[2]>=r[3] goto 9
     Rowid          0     4     0                    00  r[4]=rowid   
     Once           0     8     0                    00               
     Delete         0     1     0     tbl1           02               
   Next           0     4     0                    01               
  Noop           0     0     0                    00               
  Halt           0     0     0                    00               
  Transaction    0     1     1     0              01  usesStmtJournal=0
  TableLock      0     2     1     tbl1           00  iDb=0 root=2 write=1
  Integer        20    3     0                    00  r[3]=20      
  Goto           0     1     0                    00

编译器和解释器

编译器几个阶段：

（词法分析+语法分析+语义分析+中间代码优化）+ 链接gcc 生成可执行文件。

Bison就是把语法/词法文件翻译成c代码。或许就是yacc名字来源。

编译器complier 和解释器interpreter区别：

翻译成机器码, 就是编译器 . 否则是解释器

complier : translate source code but doesn’t execute it. such as : gcc, clang

interpreter : translate source code and execute immediately. such as : ruby

语法分析：这样的语法图是重要的

正则表达式

-?[0-9]+"."[0-9]*

-?：0次或1次

"."：实际的.

cat|dog

'[^']*' 匹配任意不是’的符号

flex&bison基本

从一个计算器，来阐述flex、bison及协同。

词法分析

词法文件定义词法分隔规则

/*词法文件：cal.l*/
%%
"+" { printf("PLUS\n"); }
"-" { printf("PLUS\n"); }
"*" { printf("PLUS\n"); }
"/" { printf("PLUS\n"); }
"|" { printf("PLUS\n"); }
[0-9]+ { printf("NUMBER %s\n", yytext); }
\n { printf("NEWLINE\n"); }
[ \t] { }
. { printf("Mystery character %s\n", yytext); }
%%

上面的词法文件只有规则部分：

引号：flex使用”“内文本原意, ,不解释正则表达
yytext: flex每次匹配后就会设置yytext

运行:

flex cal.l 将词法分析文件翻译成 lex.yy.c gcc lex.yy.c -lfl 编译成词法分析程序。

lex.yy.c 包含了

extern char *yytext; 供外部使用(但目前看来，至少bison不会直接使用吗？)
extern yylex函数：词法分析入口。供外部调用，从输入流读取, 返回

结果：

dong@DESKTOP-2CR8MES:~/gitcodes/sql4/orange/tmp$ flex cal.l 
dong@DESKTOP-2CR8MES:~/gitcodes/sql4/orange/tmp$ gcc lex.yy.c -lfl
dong@DESKTOP-2CR8MES:~/gitcodes/sql4/orange/tmp$ ./a.out 
12+22
NUMBER 12
PLUS
NUMBER 22
NEWLINE
56/7q
NUMBER 56
PLUS
NUMBER 7
Mystery character q
NEWLINE

语法文件

/* 语法文件*/
%{
#include <stdio.h>
%}

%token NUMBER
%token ADD SUB MUL DIV ABS
%token EOL

%%
calclist: /* 空规则: 从输入开头匹配*/
    | calclist expr EOL { printf("=%d\n", $2); } 
    ;
expr: factor
    | expr ADD factor { $$ = $1 + $3; }
    | expr SUB factor { $$ = $1 - $3; }
    ;
factor: term
    | factor MUL term { $$ = $1 * $3; }
    | factor DIV term { $$ = $1 / $3; }
    ;
term: NUMBER
    | ABS term { $$=$2 > 0? $2:-$2;}
    ;
%%
int main(int argc, char** argv)
{
    yyparse();
}
void yyerror(char* s)
{
    fprintf(stderr, "error: %s\n", s);
}

对于声明和规则部分：

bison规则中每个语法符号都有一个语义值。:左边的语义值用$$表示，:右边依次为$1, $2。
从词法分析器返回记号时候，记号值总是存在yylval中。其他符号的值,在语法分析器的规则设置.
如果没有动作执行，默认 $$=$1
需要在.y中定义 yyerror，因为.c中需要调用yyerror报错

编译运行

bison -d cal.y会生成 cal.tab.c 和 cal.tab.h

主要内容：

token声明转换枚举。
extern YYSTYPE yylval;
int yyparse (void); 语法分析入口.

协同编译：

bison与flex协同：每当需要一个token，就可以调用yylex读取一部分输入获取记号,并记住当前位置，

如果flex中有return返回,下次继续从这里开始（通常情况）（{return NUMBER}）；
如果没有返回如\t,flex会立即继续。([\t] {})

flex 在包含 bison生成的头文件后，词法文件在动作中才能返回 bison中定义的token，设置bison需要的yylval值。

注意的是：yytext是flex的，是flex每次匹配到规则，字符串值记录到yytext 。

yyparse流程为：调用词法的yylex, 匹配规则, 返回bison定义的token/ 设置bison 变量yylval 等， bison语法再执行bison的动作

sql: sql.l sql.y
	bison -d sql.y
	flex sql.l
	gcc -o $@ sql.tab.c lex.yy.c -lfl

Links

sqlite bytecode 中文 flex & bison