Difference between revisions of "NAND2GO Chapter 11. Compiler Ⅱ:Code Generation"
test>Xlp |
m (1 revision imported) |
(No difference)
|
Latest revision as of 10:23, 1 August 2019
1. Context |
Convert the Jack language into VM language. | ||
2. Goal |
Write a Jack VM code generator in Java, using recursive descent. | ||
3. Measurable Effects |
The reader should be able to teach others about the application of recursive descent in code generation. The reader should know how the VM works when a subroutine is called, and be familiar with the four stacks in VM (static, local, argument, field). Our code generator passes all the test samples, generating the same VM codes as the sample compiler does. | ||
4. Outputs | 5. Process | 6. Inputs | |
A Jack code generator in Java, along with an HTTP Web API. |
Review recursive descent through examples. Review the four stacks in VM language (See chapter 8) Write our Jack code generator in Java, according to Figure 10.5 in textbook. |
Chapter 11 in textbook. Sample codes written in Jack, for test. A sample Jack Compiler, which parses sample codes into VM codes. | |
7. External Factors |
Introduction
In chapter 9, we discover that it is much easier to write programs in Jack language than in VM language. Further, we implement a tokenizer and a syntax analyzer in chapter 10. With the syntax tree, it is time now to accomplish our last task: generate VM, or virtual machine, instructions from the syntax tree in an XML document. This step is called code generation, in which the compiler is responsible for maintaining several symbol tables to map from an identifier to its address in the stack of the VM.
Let's review the three stages we talked about in the last chapter: (See Figure 10.1 in textbook)
tokenization: Extract tokens (symbols, keywords, constants, and identifiers) from the source code in Jack.
syntax analysis: Analyze the syntax based on the tokens,and generate a syntax tree if no syntax errors occur in the source file.
code generation: Generate the target code in VM language according to the syntax tree.
We focus on the last stage in this chapter.
Code Generation Specification
To generate VM codes, the code generator should maintain several symbol tables:
symbol table for local variables
symbol table for arguments
symbol table for fields
symbol table for static members
These symbol tables are implemented using hash map, a data structure that stores key-value pairs. Each entry in the symbol table stores the type and the offset of an identifier. As in chapter 10, we use the same example:
class Adder {
fi eld int x, y;
method int add() {
return x + y;
}
For the handler for the class, it maintains a symbol table for the two fi elds x and y. It is similar to the table below:
This table is created when the code generator scans the declaration field int x, y; in the syntax tree. And when it generate codes for the expression x + y, it looks them up in the table and fi nds the off set for the indentifi ers. So, it knows the off set for x is 0, and write the line push this 0 to push the value of x into the stack. It does the similarly for y, and the expression becomes
push this 0
push this 1
add
The code for the whole method becomes
push argument 0
pop pointer 0
push this 0
push this 1
add
return
The fi rst two lines are added because the argument 0 is the address of this instance for its method. We put it in the this register by pop pointer 0. We do not add these two lines for a function, which is a "static" method of the class, not of some instance of the class.
Recursive descent approach
Similar to our syntax analyzer, the code generator also generates codes using recursive descent. This time, however, this descent is even easier than it is in chapter 10, for we have already generated the syntax tree. With the tree at hand, recursive descent is no longer a obstacle. In contrast, it was more diffi cult in chapter 10 since the tree structure is not very clear in the Jack code itself, which has ambiguity in some cases (see Section 10.4).
class Adder {
fi eld int x, y;
method int add() {
return x + y;
}
}
The syntax tree is generated by the analyzer:
<class>
<keyword> class </keyword>
<identifi er> Adder </identifi er>
<symbol> { </symbol>
<classVarDec>
<keyword> fi eld </keyword>
<keyword> int </keyword>
<identifi er> x </identifi er>
<symbol> , </symbol>
<identifi er> y </identifi er>
<symbol> ; </symbol>
</classVarDec>
<subroutineDec>
<keyword> method </keyword>
<keyword> int </keyword>
<identifi er> add </identifi er>
<symbol> ( </symbol>
<parameterList>
</parameterList>
<symbol> ) </symbol>
<symbol> { </symbol>
<statements>
<returnStatement>
<keyword> return </keyword>
<expression>
<term>
<identifi er> x </identifi er>
</term>
<symbol> + </symbol>
<term>
<identifi er> y </identifi er>
</term>
<symbol> ; </symbol>
</expression>
</returnStatement>
</statements>
<symbol> } </symbol>
</subroutineDec>
<symbol> } </symbol>
</class>
Using recursion, we only focus on one level each time. When processing the class level, we inpterpret the code as follows:
<class>
<keyword> class </keyword>
<identifi er> Adder </identifi er>
<symbol> { </symbol>
<classVarDec>
to be handled recursively
</classVarDec>
<subroutineDec>
to be handled recursively
</subroutineDec>
<symbol> } </symbol>
</class>
Now, it is time to generate VM bytecodes.
Implementation using recursive descent
Our implementation is in Java, a popular object-oriented programming language. The source code is public at https://github.com/kingium/JackCompiler with a readme fi le. It is equiped with an HTTP Web API, which can be deployed in any server with Java SE installed. Besides the Web API, our module JackCompiler consists of three modules. The module JackCodeGenerator is for this chapter.
The CodeGenerator Module
The code generator traverse the syntax tree, which is an XML document. Each element in the XML document serves as an input to some handlers. This process is done recursively, from the top of the syntax tree to its bottom. The processor for class element reads in all its classVarDec elements, and constructs a symbol table for them, called a "fi eld table". Then, it calls subordinate functions to process each subroutineDec elements. For each type of subroutines (function, method, constructor), we have specifi c handlers. For example, for the procConstructor method, the Java code goes like this:
private void procConstructor(Element node) throws JackCompilerException {
ArrayList<Element> children = getChildren(node);
String returnType = children.get(1).getFirstChild().getNodeValue();
String subroutineName = children.get(2).getFirstChild().getNodeValue();
codes.add("function " + className + "." + subroutineName + " " + Integer.toString(localMap.size()));
codes.add("push constant " + Integer.toString(this.fi eldMap.size()));
codes.add("call Memory.alloc 1");
codes.add("pop pointer 0");
procSubroutineStatements(children.get(6));
}
It first gets the subroutine name and type of the return value. Then, it generates the code "function " + className + "." + subroutineName + " " + Integer.toString(localMap. size()), which is a function defi nition in VM. The class name is put before the subroutine name, separated by a dot. The following three lines of codes are memory allocation, and the code generator uses the field map to determine how much space to allocate. Finally, it calls another function to handle the statements of this subroutine, by passing it the XML element children.get(6) in the syntax tree.
Perspective
The VM language is process-oriented, while Jack, to some extent, is an object-oriented programming (OOP) language, albeit key features of OOP like derivation and polymorphism are not supported by Jack. The conversion is usually not trivial, but in our project it is not that diffi cult because of the features above that Jack lacks. To investigate how to convert an OOP language in process-oriented VM, let's see what changes we make during the code generation:
Convert Classes to Structures
A class without any subroutines in it is identical to a structure (like struct in C). Each field in the class is equivalent to a member of the structure. For a class with some subroutines, however, we deal with the subroutines in a diff erent way (See below)
Convert Methods to Functions
Each subroutine in a class can be converted to a function in VM by puting the class name before the subroutine name, separated by a dot. For example, the constructor add of the class Adder becomes Adder.add in VM.
For a function in Jack, it is directly converted to a function in VM.
For a method in Jack, a hidden parameter this is added before all other parameters. this contains the address of the current instance.
For a constructor, it is directly conoverted to a function inVM, with some codes added to allocate memory to the fi elds of the instance to construct.
Convert Method Calls to Function Calls
Suppose you have an instance of Adder called adder, and you call its adder method in Jack:
let y = adder.add(x, 1);
The code generator, seeing this statement, will look up adder in the symbol tables, and learns it is a local variable. Then, it knows add must be a method of its class. Therefore, before pushing parameters x and 1 into the stack, it will fi rst push the address of the instance itself: (Suppose adder, x nad y is local variables in addresses 0, 1 and 2 respectively)
push local 0 // address of adder
push local 1 // value of x
push constant 1 // integer constant 1
call Adder.add // call the add method
pop local 2 // pop the return value to y
Project
JackCodeGenerator
This class has a public constructor and several public methods:
(constructor) public JackCodeGenerator(Document doc)
public ArrayList<String> generate() throws JackCompilerException
This class takes an XML document (from the JackAnalyzer module) as its input, and parses it into VM codes. Here is the sample input https://github.com/kingium/JackCompiler/blob/master/test/Square.jack together with the corresponding sample output https://github.com/kingium/JackCompiler/blob/master/test/Square.vm.
Glossary
Keywords
compiler
syntax analysis
code generation
symbol table
recursive-descent
parameters
arguments
virtual machine, VM
bytecode
register
syntax
References
Description: References are divided into books, journal articles, audio recordings, videos, web pages, research reports, etc. Please select the appropriate template for reference according to your needs.
- Noam Nisan & Shimon Schocken,《The Elements of Computing Systems: Building a Modern Computer from First Principles》, 1st Edition, MIT Press, 2005, ISBN:9780262640688, ,
Additional Reading Material
- Wikipedia, https://en.wikipedia.org/w/index.php?title=LL_parser&oldid=815200886, LL parser. (2017, December 13). In Wikipedia, The Free Encyclopedia. Retrieved 03:18, December 21, 2017
- Wikipedia, https://en.wikipedia.org/w/index.php?title=Recursive_descent_parser&oldid=810700680, Recursive descent parser. (2017, November 16). In Wikipedia, The Free Encyclopedia. Retrieved 03:19, December 21, 2017
- Wikipedia, https://en.wikipedia.org/w/index.php?title=XML&oldid=814880050, XML. (2017, December 11). In Wikipedia, The Free Encyclopedia. Retrieved 03:39, December 21, 2017