Writing a simple Compiler on my own - Action Rules for If-Else Statements [C][Flex][Bison]

drifter1 (67)in #utopian-io • 5 years ago

[Custom Thumbnail]
All the Code of the series can be found at the Github repository: https://github.com/drifter1/compiler

Introduction

Hello it's a me again @drifter1! Today we continue with my Compiler Series, a series where we implement a complete compiler for a simple C-like language by using the C-tools Flex and Bison. In this article we will write the needed Action Rules for If-Else Statements.

More specifically, the topics that we will cover today are:

The need of a Statements-node
Action rules for "simple" if-else statements
Action rules for if-else statements that contains else if's
Running the compiler for "full_example.c"

Requirements:

Actually you need to read and understand all the topics that I covered in the series as a whole, as these articles will give you access to knowledge about:

What Compiler Design is (mainly the steps)
For which exact Language the Compiler is build for (Tokens and Grammar)
How to use Flex and Bison
How to implement a lexer and parser for the language using those tools
What the Symbol Table is and how we implement it
How we combine Flex and Bison together
How we can pass information from the Lexer to the Parser
How we define operator priorities, precedencies and associativity
What Semantic Analysis is (Attributes, SDT etc.)
How we do the so called "Scope Resolution"
How we declare types and check the type inter-compatibility for different cases, including function parameters
How we check function calls later on using a "Revisit queue", as function declarations mostly happen after functions get used (in our Language)
Intermediate Code Representations, including AST's
How we implement an AST (structure and management)
Action Rules for other cases

Difficulty:

Talking about the series in general this series can be rated:

Intermediate to Advanced

Today's topic(s) can be rated:

Medium

So, without further ado, let's now finally start with the actual Tutorial...

Actual Tutorial Content

The need of a Statements-Node

If-else statements, for statements, while statements, optional functions and even the "main" function, all mostly contain more than one statements in their "body" or "tail" as I call it sometimes in the grammar. So, there is the need of creating a new node that will store many "statement" nodes, to be able to access them later on. Knowing that statements are described recursively in our grammar, we can visualize it as follows:

-||- means that the second "Statements" does the same as the first one.

So, let's create such a structure!

The Statements AST Node

This node will store an dynamic pointer array to AST_Node structures, where each of them will of course be a "statement". To know the amount of statements that are stored in this structure, we have to also define a variable to store the statement count. The whole structure will be of a new Node-type "STATEMENTS" and will have the name "AST_Node_Statements". So, let's define an array "statements" and the integer variable "statement_count" as entries of this structure.

In code this looks as following:

typedef enum Node_Type {

    ...

    STATEMENTS,  // statements

    ...

}Node_Type;

...

typedef struct AST_Node_Statements{
	enum Node_Type type; // node type

	// statements
	struct AST_Node **statements;
	int statement_count;

}AST_Node_Statements;

...

Of course this code is inside of "ast.h".

Creation Function

The way with which the rule "statements" is build up makes it easy for us to use the creation function to add new statements to the "statements"-parent rule. So, the creation function, let's say "new_statements_node", will have 3 parameters which are:

The statements array of the previous "statements"-rule
The statement_count value of the previous "statements"-rule
The "statement" to be added

Knowing that the "statements"-rule looks as following:
statements -> statements statement | statement ;
we understand that the first two parameters are useless for the second sub-rule of "statements". For this rule we will just pass NULL and 0 as parameters!

The function declaration in "ast.h" and actual code of "ast.c" are:

ast.h:

AST_Node *new_statements_node(AST_Node **statements,

    int statement_count, AST_Node *statement);

-----------------------------------------------------------------

ast.c:

AST_Node *new_statements_node(AST_Node **statements,

    int statement_count, AST_Node *statement){

	// allocate memory
	AST_Node_Statements *v = malloc (sizeof (AST_Node_Statements));

	// set node type
	v->type = STATEMENTS;

	// first statement
	if(statements == NULL){
		statements = (AST_Node**) malloc (sizeof (AST_Node*));
		statements[0] = statement;
		statement_count = 1;
	}
	// add new statement
	else{
		statements = (AST_Node**) realloc (statements,
            (statement_count + 1) * sizeof (AST_Node*));
		statements[statement_count] = statement;
		statement_count++;
	}

	// set entries
	v->statements = statements;
	v->statement_count = statement_count;

	// return type-casted result
	return (struct AST_Node *) v;
}

You can see that there are two cases, which correspond to the second and first sub-rule of "statements". Of course I also did some changes in the traversal and print functions of the nodes. To see these changes go to GitHub, where I always upload the code of the series.

Integrate with the "statements"-rule

In the second sub-rule we just create a new node with parameter NULL, 0 and the already created node stored in "statement". Knowing that all nodes are type-casted to "AST_Node", the second sub-rule will need to type-cast the already created "statements" node of the "statements"-part and give the statements-array and statement_count of this node as a parameter of the creation function, together with the "statement" node.

So, the new rule now becomes:

statements:
  statements statement
  {
    AST_Node_Statements *temp = (AST_Node_Statements*) $1;
    $$ = new_statements_node(temp->statements, temp->statement_count, $2);
  }
  | statement
  {
    $$ = new_statements_node(NULL, 0, $1);
  }
;

Also don't forget to include the token definition of "statements":
%type <node> statements

The same stuff will have to be done for "declarations" as well, but I will leave it for when we get to function declarations!

Action Rules for "Simple" If-Else Statements

The if statement is a somewhat complicated statement as it can include a dynamic amount of else if's and an optional else part. So, we will have to use concepts we already used in previous topics like "declarations", to take all the needed information/attributes and pass them over to the actual If-node! To solve a conflict with the else ifs and else, I had to split the "if_statement" rule into two parts: one with else if's and one without. So, the whole problem can be visualized as following:

Let's first get into the "simple" case that does not require the complicated passing of Else-If nodes. Visualizing this simple case we see that:

(as you can see I put lots of visualizations today!)

So, what do we understand? Well, the condition is pretty simple as we already have the information stored in the expression. The actual if_branch is represented by the "tail"-rule, which will have to store the "statements"-node that it get's from the "statements"-rule inside of the bracelets. This tells us that the "tail" also has to be of type "node" and so we add it alongside "statements". In the same way the if_statement, optional_else and else_if rule are also of type node. And so we can already write the following non-terminal type-definitions:

%type <node> statements tail
%type <node> if_statement else_if optional_else

The "tail"-rule becomes:

tail: LBRACE statements RBRACE
{ 
    $$ = $2; /* just pass information */
    ast_traversal($2);
}
;

So, what remains now is the optional else. For the optional else we will just return the "statements"-node stored in the "tail"-part (else exists) or NULL (no else). The rule becomes:

optional_else:
  ELSE tail
  {
    /* else exists */
    $$ = $2;
  }
  | /* empty */
  {
    /* no else */
    $$ = NULL;
  }
;

Knowing what we store in each of these non-terminals, the action rule for this simple if statement looks as following:

if_statement:
  IF LPAREN expression RPAREN tail else_if optional_else
  {
    /* later */
  }
  | IF LPAREN expression RPAREN tail optional_else
  {
    $$ = new_ast_if_node($3, $5, NULL, 0, $6);
  }
;

You can see that we pass NULL and 0 for the else ifs. For the rest we just find the correct number to put after '$'. All the parameters have the correct type already and so there is no need of type-casting or creating temporary nodes..

Action Rules for If-Else Statements that contains Else If's

So, now for the more complicated case! We will be using the same idea that we used for declarations and initialization, where we had global variables to store the newly found names and values for the array initialization. In this case we will have a AST_Node pointer array to store the else-if branches (elsifs) and a counter variable for the number of else ifs (elseif_count). To add a new entry we will define a new function "add_elseif" that takes the newly found Else-if node as a parameter and adds it to the global array. We of course have two cases for that:

First entry of the array, where we will have to allocate the array for the first time
General case, where we allocate memory for the new entry and add the entry to the end of the array

So, with all that in mind the parser now contains:

%{

	...

	// for else ifs
	void add_elseif(AST_Node *elsif);
	AST_Node **elsifs;
	int elseif_count = 0;
%}

...

void add_elseif(AST_Node *elsif){
  // first entry
  if(elseif_count == 0){
    elseif_count = 1;
    elsifs = (AST_Node **) malloc(1 * sizeof(AST_Node));
    elsifs[0] = elsif;
  }
  // general case
  else{
    elseif_count++;
    elsifs = (AST_Node **) realloc(elsifs, elseif_count * sizeof(AST_Node));
    elsifs[elseif_count - 1] = elsif;
  }
}

...

All that we said until now, can be visualized as following

Using the function "add_elseif", the "else_if"-rule is pretty easy to write now! Both sub-rules do the same thing, but will have a different numeration for the non-terminals that are passed as parameters. The "else_if"-rule looks as following:

else_if:
  else_if ELSE IF LPAREN expression RPAREN tail
  {
    AST_Node *temp = new_ast_elsif_node($5, $7);
    add_elseif(temp);
  }
  | ELSE IF LPAREN expression RPAREN tail
  {
    AST_Node *temp = new_ast_elsif_node($4, $6);
    add_elseif(temp);
  }
;

Thinking about the "if_statement"-rule now, the only thing that changes is that we will use these global variables as parameters, by also re-initializing them back to 0 and NULL. So, with all that in mind the final "if_statement" rule becomes:

if_statement:
  IF LPAREN expression RPAREN tail else_if optional_else
  {
    $$ = new_ast_if_node($3, $5, elsifs, elseif_count, $7);
    elseif_count = 0;
    elsifs = NULL;
  }
  | IF LPAREN expression RPAREN tail optional_else
  {
    $$ = new_ast_if_node($3, $5, NULL, 0, $6);
  }
;

And this is it actually! Wasn't that bad right?

Running the Compiler for "full_example.c"

Before running the compiler I would like to point out that I removed many "ast_traversal" functions from the action rules, as they would just produce duplicate messages. We will only put "ast_traversal" functions for the declarations and statements rules now. Later on there will be only one traversal of the whole AST, that will be found in the "program"-rule. Of course this will be used for the machine code generation :)

So, let's run the compiler:

You can see that we get new messages in the console! I also included some explanations for what got printed out. The main outer statements of the "full_example.c" file are of course only 4! The other statement node messages correspond to the for and while statements and more specifically to the "tail" of them. In the beginning we can also see the statements node of the last if-node that appears in the for statement. Nice huh? Next time we will get into Loops ;)

RESOURCES

References:

No references, just using code that I implemented in my previous articles.

Images:

All of the images are custom-made!

Previous parts of the series

Introduction -> What is a compiler, what you have to know and what you will learn
A simple C Language -> Simplified C, comparison with C, tokens, basic structure
Lexical Analysis using Flex -> Theory, Regular Expressions, Flex, Lexer
Symbol Table (basic structure) ->Why Symbol Tables, Basic Implementation
Using Symbol Table in the Lexer -> Flex and Symbol Table combination
Syntax Analysis Theory -> Syntax Analysis, grammar types and parsing
Bison basics -> Bison tutorial actually
Creating a grammar for our Language -> Grammar and first Parser
Combine Flex and Bison -> lexer and parser combined
Passing information from Lexer to Parser -> Bug Fix, yylval variable, YYSTYPE union, Setting up the Parser, Passing information "directly" and through the symbol table (special case) with examples.
Finishing Off The Grammar/Parser (part 1) -> Adding the missing grammar rules, small fixes
Finishing Off The Grammar/Parser (part 2) -> Operator priorities, precedencies and associativity, complete example file (for testing), grammar rule visualization, finish off grammar/parser
Semantic Analysis Theory -> What is Semantic Analysis about, CSG and CSL, Attribute Grammars, Syntax Directed Translations (SDT)
Semantics Examples -> Visualization of Semantics for different rules/cases, needed attributes, what we have to implement from now on
Scope Resolution using the Symbol Table -> What we have now, scope resolution, integration, compiler output
Type Declaration and Checking -> Type declaration and type checking code, "simple" code integration
Function Semantics (part 1) -> Code for parameter definitions, function declarations and parameter compatibility checking
Function Semantics (part 2) -> Revisit queue concept, basic implementation, inserting undeclared identifiers
Abstract Syntax Tree Principle -> Intermediate code generation and representations, how to design an AST
Abstract Syntax Tree Structure -> Node, operator and value types, AST node structures, some management functions, integration with the rest of the compiler
Abstract Syntax Tree Management -> Tweaking Nodes, Node creation and AST traversal functions, "manual" testing of the AST
Action Rules for Declarations and Initializations -> Replacing many values with a Value union, Other smaller changes, Action rules for declarations, Action rules for initializations
Action Rules for Expressions -> Separating operator tokens, Action rules for expressions, running the compiler
Action Rules for Assignments and Simple Statements -> Action rules for simple statements, Action rules for assignments, running the compiler

Final words | Next up on the project

And this is actually it for today's post! I hope that I explained everything well.
Next up on this series are:

Semantic analysis (using more action rules in Bison)
Intermediate Code generation (using the AST structure for more cases)
Machine Code generation (MIPS Assembly)

Which are all topics that will need more than one article to complete. Also, note that we might also get into Optimizations later on, or could even extend the Language by adding complex datatypes (structs and unions), more rules etc.
So, see ya next time!

GitHub Account:

https://github.com/drifter1

Keep on drifting! ;)

#tutorials #programming #compiler #coding

5 years ago in #utopian-io by drifter1 (67)

$20.53

Sort:

Trending

[-]

portugalcoin (72) 5 years ago

Thank you for your contribution @drifter1.

Very good tutorial, thank you for your good work in developing these tutorials. Keep up the good work.

We look forward to seeing more tutorials of this quality.

Your contribution has been evaluated according to Utopian policies and guidelines, as well as a predefined set of questions pertaining to the category.

To view those questions and the relevant answers related to your post, click here.

Need help? Write a ticket on https://support.utopian.io/.
Chat with us on Discord.
[utopian-moderator]

$7.89

14 votes

[-]

utopian-io (71) 5 years ago

Thank you for your review, @portugalcoin! Keep up the good work!

$0.00

[-]

utopian-io (71) 5 years ago

Hey, @drifter1!

Thanks for contributing on Utopian.
We’re already looking forward to your next contribution!

Get higher incentives and support Utopian.io!
Simply set @utopian.pay as a 5% (or higher) payout beneficiary on your contribution post (via SteemPlus or Steeditor).

Want to chat? Join us on Discord https://discord.gg/h52nFrV.

Vote for Utopian Witness!

$0.00

1 vote

[-]

steemitboard (66) 5 years ago

Congratulations @drifter1! You have completed the following achievement on the Steem blockchain and have been rewarded with new badge(s) :

You received more than 3000 as payout for your posts. Your next target is to reach a total payout of 4000

_{Click here to view your Board of Honor}
_{If you no longer want to receive notifications, reply to this comment with the word STOP}

To support your work, I also upvoted your post!

Support SteemitBoard's project! Vote for its witness and get one more award!

$0.00

1 vote

[-]

steem-ua (64) 5 years ago

Hi @drifter1!

Your post was upvoted by @steem-ua, new Steem dApp, using UserAuthority for algorithmic post curation!
Your UA account score is currently 3.506 which ranks you at #6317 across all Steem accounts.
Your rank has dropped 11 places in the last three days (old rank 6306).

In our last Algorithmic Curation Round, consisting of 290 contributions, your post is ranked at #253.

Evaluation of your UA score:

You're on the right track, try to gather more followers.
The readers appreciate your great work!
Try to work on user engagement: the more people that interact with you via the comments, the higher your UA score!

Feel free to join our @steem-ua Discord server

$0.00

1 vote

STEEM 0.29

TRX 0.12

JST 0.033

BTC 63464.16

ETH 3111.33

USDT 1.00

SBD 3.98

Writing a simple Compiler on my own - Action Rules for If-Else Statements [C][Flex][Bison]

Introduction

More specifically, the topics that we will cover today are:

Requirements:

Difficulty:

Actual Tutorial Content

The need of a Statements-Node

The Statements AST Node

Creation Function

Integrate with the "statements"-rule

Action Rules for "Simple" If-Else Statements

Action Rules for If-Else Statements that contains Else If's

Running the Compiler for "full_example.c"

RESOURCES

References:

Images:

Previous parts of the series

Final words | Next up on the project

GitHub Account:

Hi @drifter1!

Evaluation of your UA score:

Coin Marketplace