NAND2GO Chapter 6. Assembler

From MIT Technology Roadmapping
Revision as of 10:50, 3 April 2019 by test>Xlp
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search
Chapter 5
Back to Content Page
Chapter 7
English version /[[{{{ Chinese version link}}} |Chinese version]]


1. Context
  • Macro level: Learn about computer hardware and software structure. Completed Chapter 1-5.
  • Meso level: Understand and grasp the foundations of how software and hardware interact. Why the assembler is important, and how quantitative change leads to qualitative change.
  • Micro level: Become familiar with the basics of assembler, and understand its implications. Understand the basics of Parser, Symbol Table and Generator, and how it continues to play important roles in later chapters. Syntax and design contract.
2. Goal
  • Macro level: Grasp the fundamental concepts of how computer hardware and software work. Build upon chapter 1-5, and begin on software aspects of a computer.
  • Meso level: Learn the foundations of assembler, and the role it plays in the software hierarchy.
  • Micro level: Understand the translation between assembly and binary instructions. Fully grasp the concepts of the Parser, Symbol Table, and Generator. Understand the role of syntax and semantics. The final goal is to fulfill a design contract from input to output.
3. Measurable Effects

Be able to define and indicate the functions of the Assembler, Parser, Symbol Table and Generator function, and how they work in conjunction. For the project, assembly code should pass testing when translated into binary code, there is only one right answer.

4. Outputs 5. Process 6. Inputs

Translate assembly code into binary codes. Write a program, use any assembly language, and produce corresponding machine code, with or without symbols.

Learn and implement the transition between source code and target code, understand its relations with previous and following chapters. This can be done with the programming method (with Python, Java or C), or the non-programming method. You may refer to the course’s Coursera site for more details, the link is in the References section.

Nand2Tetris (Nisan & Schocken, 2008b) Chapter 6 (and what you have learned from Chapters 1-5, Coursera course site, and your own studies and experiences.

7. External Factors

Introduction

First of all, congratulations on completing chapters 1 to 5, the hardware aspects of Nand2Tetris!

Beginning from Chapter 6, we will focus on the software side. If you refer to our roadmap [see lecture 06 assembler. pdf, slide 2], you will see that chapter 6 serves as a bridge between the “hardware hierarchy” and the “software hierarchy”. While the assembler is sometimes regarded as “difficult to work with” by some programmers, its underlying principles are surprisingly simple. For students without any programming background, this chapter will provide a simple yet clear introduction on assembler. Many of the contents in this chapter is based on Chapter 4. If you encounter any confusing concepts, you may refer to the key concepts and tables in Chapter 4.

This chapter is titled assembler. Assembler is the translator between binary codes and assembly codes. Binary codes are the 0s and 1s, while assembly codes refer to a special way of how we choose to represent the 0s and 1s. Assembler make 0s and 1s into assembly code. [see lecture 06 assembler.pdf, slide 4].

It is important to mention, the most critical aspect of the chapter is not only about finishing homework projects, but rather:

Understand how assembler works.

Understanding how its underlying mechanism is transferable throughout the entire Nand2Tetris course and beyond.

The second aspect bears the most importance, since the mechanism and concepts does not only apply to this course, or to programming in general, but also to many aspects of our lives.

Background

Computer operates on binary codes, a string of 0s and 1s. Yet for most of us, learning programming began with “Hello World!”, on a higher-level language. You may wonder how that works. Well, the exact mechanism may vary, but the fundamental principles are the same. The code you wrote undergo a series of translations, eventually becoming binary instructions (set of 1s and 0s). The binary set is necessary for the computer hardware, since it is the only language (yes, the numbers in fact can be viewed as a language) it can understand. On the other hand, the programming language you use, is designed to be human friendly, making programming more accessible.

But if we think further, in actual computer hardware, you are not able to find the numbers 0s and 1s on a circuit board. That is because the 0s and 1s we use in programming is an abstraction of the voltage differences between two points. We are using human concepts to represent physical phenomenon, hence, it is said to be bridging the gap between hardware and software.

Assembler is the most basic level of the software hierarchy, making the transition between assembly code and binary code. The rest of the course will show how this conceptual abstraction continues up the ladder, all way to the game of Go, 2048 and beyond. But let us start at step one.

From Binary to Assembly

In chapter 4, we have already learned about A-instruction and C-instructions, which forms the basic functions of a computer. Here, we are concerned with how do begin making the transition between binary and assembly codes, where the latter shows some resemblance of a programming language.

Why and How?

The reason we make abstractions of binary numbers is to make it easier for us to understand what a string of 0s and 1s means. More importantly, it allows us to manipulate codes more efficiently, and enables us to write programs with more complex functions.

The keyword is abstraction (Schocken, 2012). This idea exists in many different subjects, and throughout our lives. For example, you use the word “today”, to refer to a specific date. You probably do not always say the day, month and year. It is even less likely for you to express the concept “today”, in terms of the physical spatial relationship between the sun and the Earth. Hence, the concept “today” is an abstraction of the information above. By a similar token, we want to find an efficient way to indicate a memory address on a computer (A-instruction), or a special computation we want the computer to execute (C-instruction). The purpose of abstraction is to let human better understand what we are trying to do, or, as Nisan & Schocken (2008a) described in with a quote by Donald Knuth in the Chapter 4 video:

“Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.” - Donald Knuth

The program we write has to be understandable by humans, so that we can manipulate codes, learn from each other experiences, and collaborate to build more complex programs.

Symbols, are what we use to represent a sequence of numbers at the assembly level. For example, when we want to access a memory address for its information, we want to have a more intuitive way to call it. Instead of calling a value by its address, such as address 100 ( with the binary address of “0000000001100100”), we can simply represent it with a symbol “coffee”. It gives us the flexibility of referring to a memory address (whether it is storing an A or C-instruction) by either its location number (100) or symbol (coffee). It is in fact a form of translation, we are essentially referring to the same object, yet the program can translate whichever predefined name we call it, and direct it to the correct binary address using a symbol table. There are several important concepts:

Variables: a “translator” will automatically assign a memory address, and ensure the correct address is referred to every time.
Labels: use symbols to mark locations of a program
Symbol resolution: this can be explained with a symbol table.
Symbol table: the symbol table indicates the number of lines of code it is located. It services as a reference between the symbol and its memory address. In reality, some memory and instructions will require more than one memory location.
Assembler: translates assembly code into binary code, which can be executed by the computer. It is essentially a translator, converting assembly code into its equivalent binary instruction. It is only a change of format (an abstraction), but not a change of meaning.
Machine Language Specification: It refers to the correspondence between given binary code and the assembly code, along with the syntax (the format) of the assembly code. It has 4 main functions (Chapter 6, p.107), which essentially performs a translation from assembly code with symbols into binary codes.

Among all the process described above, the most challenging and important function of the assembler is symbols handling. In the following sections, an assembler for Hack is proposed.

Hack Assembly-to-Binary Translation Specification

The function of an assembler is dictated by its design contract. The specification of translation is finding the equivalents between assembly and binary codes, and assemble the target binary code.

Syntax Conventions and File Formats

In the project files, you will see two formats. The “.asm” files store Hack’s assembly codes. While the “.hack” files stores binary codes.

.hack Binary code files: since we are working with a 16-bit system, each line consists of 16 digits of 0s and 1s. You can probably tell the A-instructions from the C-instructions.

.asm Assembly code files: the lines of text can represent instructions and symbol declaration. They are to be translated into binary code.

Constants, which refers to a defined field, are written in decimal numbers. While user designed symbols can start with other characters other than a digit.

For adding comments to the lines of code, use “//” after the actual code on the same line. For example

@100 // this is a comment, only the “@100” part will be executed.

For our assembler, White Space (empty lines and extra spaces) is ignored. However, for other assemblers, they may be important, depending on how the assembler is structured. In addition, the convention is having LABELS in upper case, and variables in lower case.

Instructions

There are two main types of instructions, a-instruction for address function, and c-instruction for compute function. You may want to refer to Chapter 4 for more detail.

When we look at the binary code, the first character determines the type of instruction. If it’s A- instruction, it starts with 0, and have the rest 15 digits representing the location of the memory address in binary. For c-instructions, it starts with 1, and has three fields: comp, dest and jump. Each field has its own translation table, translating its designated assembly code into binary codes. Any given line of assembly code can be translated part by part, and reassembled into the equivalent binary code.

Symbols

There are three main sources of symbols in Hack assembly language. Predefined symbols, which is universal to any Hack assembly program in our projects. Label symbols only need to be declared once in order to be used anywhere in the program, it refers to the memory address of the next command. Symbols that are declared are known as variable symbols, which is mapped to memory where it was first encountered in a consecutive fashion.

Example

In the example, (Figure 6.2, p. 111), each line is either an a-instruction and c-instruction. Recall that each type of instruction is translated differently. The assembly code is first broken down to its fields, then each field is translated individually and reassembled into binary code.

Implementation

The hack assembler receives .asm file as input and outputs a .hack program.

The exact mechanism of translations for Hack uses 4 main modules

Parser: for input
Code module: a library of corresponding binary and assembly codes, see section 6.2.2
Symbol Table:process symbols, see section 6.2.3
Main program that drives the process

You will see the function of individual module blow. The modules serves as individual parts of a “translation machine” between two languages.

The Parser Module

The parser module is at the input end. It dissembles each line of assembly code to its individual fields and symbols, following predefined format. On page 113 of the book, a chart indicating how parser works can be found.

The Code Module

This translates the c-instruction fields, dest, comp and jump, into their corresponding binary code. You may refer the section 6.2 in the book for the exact corresponding codes.

Assembler for Programs with No Symbols

Using parser and code module, an assembler program translating assembly code without symbols can be achieved. But for programs with symbols and label commands, we will need the next steps.

Tips for completing the chapter projects with programming can be found on page 115. For non-programmers, the author recommended a pen-and-paper method. Alternatively, you can also try using Excel!

The SymbolTable Module

The symbols requires an additional layer of handling than above step, which is to translate symbols into actual memory addresses. This is done with a symbol table. Symbol table lists and maintains the corresponding symbol and their meaning (memory address in this case). In may programming languages, there is an existing symbol table, or more commonly referred to as hash table.

Assembler for Programs with Symbols

Most programs are written with symbols, since it is much efficient to work with for programmers.

Before symbols can be interpreted on the spot, they must be all tallied. Hence, for implementation, the assembler needs to first go through the entire program and identify all symbols, and register the symbols and their corresponding memory address. and conduct the translation on the second go through.

The book offers advice on procedures (page 116). However, perhaps it is most helpful if you draw out the algorithm on paper, and follows the steps in sequence.

Perspective

In this project, we learned about the structure of, and methods in building an assembler. In essence, we are connecting memory address with different representation in assembly code with symbols. The assembler enables us to use these representations, through translating these representations into the underlying binary code.

In reality, the use of assemble varies. Some assembler are embedded in programming language, while other programming languages are able to directly generate binary codes.

The main idea is about abstraction. We are able represent more with less, thus enhancing our ability to manipulate data, and form more complex operations. This step will continue onto the coming chapters.

Project

There are two options listed in the book, through a programming language, or by hand. Essentially, both methods are aiming to achieve the same purpose, turning assembly code into binary instruction.

There is also another option: try it with Excel!

Glossary

Assembly Codes

Assembler

Binary Codes

Symbols: using a number to represent a memory address. Please note that the number is expressed in decimal system (such as “100”) in assembly, but represented in binary terms in a-instructions (“1100100”).

Symbol Resolution

Symbol Table

Machine Language Specification Parser

Symbol Table

Generator

References

Description: References are divided into books, journal articles, audio recordings, videos, web pages, research reports, etc. Please select the appropriate template for reference according to your needs.

  • Noam Nisan & Shimon Schocken,《The Elements of Computing Systems: Building a Modern Computer from First Principles》, 1st Edition, MIT Press, 2005, ISBN:9780262640688, ,
  • Nand2Tetris, http://nand2tetris.org/, Nisan, N. S. & Schocken, S. (2008c). The Elements of Computing Systems: Building a Modern Computer from First Principles
  • Schocken, S., Taming complexity in large-scale system projects, Paper presented at the Proceedings of the 43rd ACM technical symposium on Computer Science Education, Raleigh, North Carolina, USA, 2012

Additional Reading Material

Nisan & Schocken (2008a) also mentioned macro assembler and macro commands on Coursera Unit 6.7, which combines multiple assembly commands into one command. You may think we are elevating up from binary code, which we are. Yet if you think more carefully, we are still doing the same work with macro assembler and macro commands.

It is always helpful to review the concepts of category theory. You will see more about how the theory applies to the following chapters. John Baez gave a lecture on “The Mathematics of Networks”, with video available on YouTube. https://www.youtube.com/watch?v=IyJP_7ucwWo



Chapter 5
Back to Content Page
Chapter 7
English version /[[{{{ Chinese version link}}} |Chinese version]]