Overview Of Compiler Construction Tools Information Technology Essay


A compiler is system software that translates a program written in a source language into a target language in order to create an executable program.

Then, we can call that target program to process inputs and produce outputs.

Compiler writers, like any other software developers, use a variety of modern software development environments containing a plethora of tools such as language editors, debuggers, version managers, profilers, test harnesses, and many more. In addition to these general software-development tools, other, more specialized tools have been created to help implement the various phases of a compiler more rapidly and with greater accuracy.

These tools use specialized languages for specifying and implementing specific components, and many use quite sophisticated and complex algorithms. The most successful tools are those that hide the details of the generation algorithm and produce components that can be easily integrated into the remainder of the compiler. Some commonly used compiler-construction tools are:

Parser Generators:

The second stage of compilation is parsing, in which the individual tokens produced by the lexical analyser are scanned and assembled into a parse tree, and any syntactic errors are detected. Parser generators are tools that produce syntax analysers (parse-tree generators) from input based on a context-free grammar.

The program YACC is a parser generator developed by Stephen C. Johnson at AT&T for the UNIX operating system. The name is an acronym for "Yet Another Compiler Compiler." Its main use is to generate a parser based on an analytic grammar written in a notation similar to BNF.

Historically, Yacc-generated parsers have played a significant role in the evolution of compilers for the C programming language.

Creating an input/output translator with Yacc

Yacc was traditionally available as the default parser generator on most UNIX systems. The parser generated by Yacc requires a lexical analyser; lexical-analyser generators, such as Lex or Flex, are widely available.

A translator can be constructed using Yacc in the following manner:

First of all, a file, say scan.y, containing a Yacc specification of the translator is prepared. The UNIX system command yacc scan.y transforms the file scan.y into a C program called y.tab.c using the LALR (look-ahead LR) method.

By compiling y.tab.c together with the ly library, which contains the LR parsing program (for example, with the command cc -o obj.out y.tab.c -ly), we obtain the desired object program obj.out, which performs the translation specified by the original Yacc program.

Now, if other procedures are needed, they can be compiled or loaded with y.tab.c.

A standard YACC source program has three parts:

Declarations

Translation rules

Supporting C routines
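
To make this workflow concrete, here is a minimal, hedged sketch of a Yacc specification illustrating the three parts above: a toy translator that reads a single line containing a sum of digits and prints the total. The grammar, the token name DIGIT, and the hand-written yylex are illustrative choices for this example, not requirements of Yacc.

    %{
    /* Declarations: C definitions and the tokens used by the grammar. */
    #include <stdio.h>
    #include <ctype.h>
    int yylex(void);
    void yyerror(const char *s) { fprintf(stderr, "%s\n", s); }
    %}
    %token DIGIT
    %%
    /* Translation rules: productions with semantic actions. */
    line : expr '\n'      { printf("%d\n", $1); }
         ;
    expr : expr '+' term  { $$ = $1 + $3; }
         | term
         ;
    term : DIGIT
         ;
    %%
    /* Supporting C routines: a tiny hand-written lexical analyser. */
    int yylex(void) {
        int c = getchar();
        if (c == EOF) return 0;
        if (isdigit(c)) { yylval = c - '0'; return DIGIT; }
        return c;
    }
    int main(void) { return yyparse(); }

Assuming this file is saved as scan.y, the commands described in the text would then be, for example:

    yacc scan.y                   # produces y.tab.c
    cc -o obj.out y.tab.c -ly     # link with the Yacc library
    ./obj.out                     # run the generated translator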

Scanner Generators:

Scanning is the first phase of compilation, in which the incoming stream of characters is scanned and divided into tokens.

Scanner generators automatically generate lexical analysers (which produce the stream of tokens) from specifications written as regular expressions. In early compilers, syntax analysis consumed not only a large fraction of the running time of a compiler, but also a large fraction of the intellectual effort of writing a compiler. (EQN and PIC are examples of such legacy tools.)

Scanner generators have reduced this effort to a minimum. Lex is a program that generates lexical analysers ("scanners" or "lexers").

Lex was originally written by Mike Lesk and Eric Schmidt. It is now the standard lexical-analyser generator on many UNIX systems, and a tool exhibiting its behaviour is specified as part of the POSIX standard.

Creating a lexical analyser with lex

Lex helps write programs whose control flow is directed by instances of regular expressions in the input stream. It is well suited for editor-script type transformations and for segmenting input in preparation for a parsing routine.

Lex allows one to specify a lexical analyser by specifying regular expressions to describe patterns for tokens. The input notation for the Lex tool is referred to as the Lex language and the tool itself is the Lex compiler.

An input file, which we call first.l, is written in the Lex language and describes the lexical analyser to be generated. The Lex compiler transforms first.l into a C program, in a file that is always named lex.yy.c. That file is in turn compiled by the C compiler into an executable, by default called a.out.

The C-compiler's output is a working lexical analyser that can take a stream of input characters and produce a stream of tokens.

Lex turns the user's expressions and actions (called the source) into the host general-purpose language; the generated program is named yylex. The yylex program recognizes expressions in a stream (called the input) and performs the specified action for each expression as it is detected.

A Lex program has the following form:

Declarations

Translation rules

Auxiliary functions
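
As a hedged sketch of this layout, the following small Lex specification (saved, say, as the file first.l from the text above) recognizes numbers and identifiers on standard input; the token classes printed here are illustrative choices for the example, not part of Lex itself.

    %{
    /* Declarations: C code copied verbatim into the generated lex.yy.c. */
    #include <stdio.h>
    %}
    digit   [0-9]
    letter  [A-Za-z]
    %%
    {digit}+                      { printf("NUMBER(%s)\n", yytext); }
    {letter}({letter}|{digit})*   { printf("ID(%s)\n", yytext); }
    [ \t\n]+                      { /* skip whitespace */ }
    .                             { printf("OTHER(%s)\n", yytext); }
    %%
    /* Auxiliary functions. */
    int yywrap(void) { return 1; }
    int main(void) { yylex(); return 0; }

Running lex first.l produces lex.yy.c, which the C compiler then turns into a stand-alone scanner.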

Syntax-directed translation engines:

Syntax-directed translation refers to a method of compiler implementation in which the translation of the source language is driven by the parser. In other words, the parsing process and the parse tree are used to direct semantic analysis and the translation of the source program.

There are two types of attributes we might encounter:

Synthesized

Inherited

Synthesized attributes are those that are passed up the parse tree, i.e., the attribute of the left-hand side is computed from the attributes of the right-hand side. In contrast, inherited attributes are those that are passed down the parse tree, i.e., the right-hand-side attributes are derived from the left-hand-side attributes.
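
For example (a hedged sketch in informal semantic-rule notation; the attribute names val, type and in, and the addtype routine, are illustrative names, not from the text):

    E -> E1 + T   { E.val = E1.val + T.val }                   (synthesized: computed from the children)
    D -> T L      { L.in  = T.type }                           (inherited: the declared type is passed down into L)
    L -> L1 , id  { L1.in = L.in;  addtype(id.entry, L.in) }   (inherited attribute propagated along the list)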

SDT Model

Syntax-directed translation is essentially made up of two simple concepts:

Each node of a parse tree can have a set of associated values known as attributes.

The creation and traversal of parse trees can be automated.

This can be regarded as a separate phase of a compiler or we can augment our conventional grammar with information to control the semantic analysis and translation. Such grammars are called attribute grammars.

The main function of these engines is to produce intermediate code in three-address format from the parse tree. The engine traverses the entire parse tree, and the translation is performed at each node of the tree.
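
As a rough sketch, a rule of this kind attached to a production might read as follows, where newtemp() and emit() are assumed helper routines (not from the text) that allocate a fresh temporary and append one three-address instruction:

    expr -> expr1 + term   { expr.addr = newtemp();
                             emit(expr.addr, "=", expr1.addr, "+", term.addr); }

Applied over the whole tree for an assignment such as a = b * c + d, rules like this would produce the three-address sequence:

    t1 = b * c
    t2 = t1 + d
    a  = t2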

Code-generator generators:

Code generation is the last phase of the compilation process, in which a compiler's code generator converts some intermediate representation (usually three-address code, TAC) of the source code into a form (e.g., machine code) that can be readily executed by a machine (often a computer).

The input to the code generator typically consists of a parse tree or an abstract syntax tree. The tree is converted into a linear sequence of instructions, usually in an intermediate language such as three address code. Further stages of compilation may or may not be referred to as "code generation", depending on whether they involve a significant change in the representation of the program.

These tools produce code generators from a collection of rules for translating each operation of the intermediate language into the machine language of a target machine, so that the program can then be executed by that machine.
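
The sketch below, in C, illustrates the idea with rules for two three-address operations targeting a hypothetical single-register machine; the instruction names (LD, ADD, ST), the struct layout, and the emit() helper are all assumptions made for this example.

    #include <stdio.h>
    #include <stdarg.h>

    /* Hypothetical three-address instruction; the field and operation names are illustrative. */
    enum tac_op { TAC_ADD, TAC_COPY };
    struct tac { enum tac_op op; const char *result, *arg1, *arg2; };

    static void emit(const char *fmt, ...) {   /* append one target-machine instruction */
        va_list ap;
        va_start(ap, fmt);
        vprintf(fmt, ap);
        putchar('\n');
        va_end(ap);
    }

    /* One translation rule per intermediate-language operation. */
    static void gen(const struct tac *i) {
        switch (i->op) {
        case TAC_ADD:                          /* x = y + z */
            emit("LD  R0, %s", i->arg1);
            emit("ADD R0, R0, %s", i->arg2);
            emit("ST  %s, R0", i->result);
            break;
        case TAC_COPY:                         /* x = y */
            emit("LD  R0, %s", i->arg1);
            emit("ST  %s, R0", i->result);
            break;
        }
    }

    int main(void) {
        struct tac t = { TAC_ADD, "x", "y", "z" };
        gen(&t);   /* prints the LD/ADD/ST sequence for x = y + z */
        return 0;
    }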

Data-flow analysis engines:

Data-flow analysis is a technique for gathering information about the possible set of values calculated at various points in a computer program. The information gathered is used by compilers to optimize a program.

Data-flow analysis is used to determine how data flows through a program. The kind of data-flow information gathered depends on the nature of the desired task. These analyses are computed by extracting facts directly from the syntax and symbol tables and composing facts based on syntax constructs found at other points in the code.

The main benefit of these engines is that they facilitate the gathering of information about how values are transmitted from one part of a program to another. Most data-flow analysis engines require the local property calculations to be implemented by the user of the engine.

Data-flow analysis is a key part of code optimization.
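
As a minimal sketch of one such analysis, the C program below computes live variables over a made-up three-block control-flow graph by iterating the standard equations in[B] = use[B] U (out[B] - def[B]) and out[B] = union of in[S] over successors S until a fixed point is reached; the blocks, variables, and use/def sets are invented for the example.

    #include <stdio.h>

    /* Toy control-flow graph of three blocks; variables a,b,c,d map to bits 0..3.
       The use/def sets and successor lists are invented data for illustration. */
    enum { NBLOCKS = 3 };
    static unsigned use_[NBLOCKS] = { 0x2 /* b */, 0x5 /* a,c */, 0x8 /* d */ };
    static unsigned def_[NBLOCKS] = { 0x1 /* a */, 0x8 /* d */,   0x0        };
    static int succ[NBLOCKS][2]   = { {1, -1},     {2, -1},       {-1, -1}   };

    int main(void) {
        unsigned in[NBLOCKS] = {0}, out[NBLOCKS] = {0};
        int changed = 1;
        while (changed) {                      /* iterate until nothing changes */
            changed = 0;
            for (int b = NBLOCKS - 1; b >= 0; b--) {
                unsigned o = 0;
                for (int s = 0; s < 2; s++)    /* out[B] = union of in[S] over successors */
                    if (succ[b][s] >= 0) o |= in[succ[b][s]];
                unsigned i = use_[b] | (o & ~def_[b]);   /* in[B] = use[B] U (out[B] - def[B]) */
                if (i != in[b] || o != out[b]) { in[b] = i; out[b] = o; changed = 1; }
            }
        }
        for (int b = 0; b < NBLOCKS; b++)
            printf("block %d: live-in = 0x%x, live-out = 0x%x\n", b, in[b], out[b]);
        return 0;
    }

Production engines generalize this pattern: the user supplies the local use/def facts for each block, and the engine handles the propagation to a fixed point.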

Compiler-construction toolkits:

These toolkits provide an integrated set of routines for constructing the various phases of a compiler.

Applications of Compiler Construction

Compiler construction techniques can be, and are, applied outside compiler construction in its strictest sense. Put another way, more programming can be regarded as compiler construction than one would traditionally assume. Examples are reading structured data, the rapid introduction of new formats, and general file-conversion problems.

If data has a clear structure it is generally possible to write a grammar for it. Using a parser generator, a parser can then be generated automatically. Such techniques can, for example, be applied to rapidly create 'read' routines for HTML files, PostScript files, etc.

This also facilitates the rapid introduction of new formats.

Examples of file conversion systems that have profited considerably from compiler construction techniques are TeX text formatters, which convert TeX text to DVI format, and PostScript interpreters, which convert PostScript text to instructions for a specific printer.