A compiler is a computer program (or set of programs) that translates text written in a computer language (the source language) into another computer language (the target language). The original text is usually called the source code and the output is called object code. Commonly the output has a form suitable for processing by other programs (e.g., a linker), but it may be a human-readable text file.
The most common reason for wanting to translate source code is to create an executable program. The name "compiler" is primarily used for programs that translate source code from a high-level programming language to a lower-level language (e.g., assembly language or machine language). A program that translates from a low-level language to a higher-level one is a decompiler. A program that translates between high-level languages is usually called a language translator, source-to-source translator, or language converter. A language rewriter is usually a program that translates the form of expressions without a change of language.
A compiler is likely to perform many or all of the following operations: lexical analysis, preprocessing, parsing, semantic analysis, code generation, and code optimization.
History
For many years, software for early computers was written primarily in assembly language. Higher-level programming languages were not invented until the benefit of being able to reuse software on different kinds of CPUs started to become significantly greater than the cost of writing a compiler. The very limited memory capacity of early computers also introduced many technical problems when implementing a compiler.
Towards the end of the 1950s, machine-independent programming languages were first proposed, and several experimental compilers were subsequently developed. The first compiler was written by Grace Hopper, in 1952, for the A-0 programming language. The FORTRAN team led by John Backus at IBM is generally credited with having introduced the first complete compiler, in 1957. COBOL was an early language to be compiled on multiple architectures, in 1960.
The idea of using a higher-level language quickly caught on in many application domains. Because of the expanding functionality supported by newer programming languages and the increasing complexity of computer architectures, compilers have become more and more complex.
Early compilers were written in assembly language. The first self-hosting compiler - one able to compile its own source code in a high-level language - was created for Lisp by Hart and Levin at MIT in 1962. Since the 1970s it has been common practice to implement a compiler in the language it compiles, although both Pascal and C have been popular choices for the implementation language. Building a self-hosting compiler is a bootstrapping problem: the first such compiler for a language must be compiled either by a compiler written in a different language or (as in Hart and Levin's Lisp compiler) by running the compiler in an interpreter.
Compiler output
One method used to classify compilers is by the platform on which their generated code executes. This is known as the target platform.
A native or hosted compiler is one whose output is intended to run directly on the same type of computer and operating system that the compiler itself runs on. The output of a cross compiler is designed to run on a different platform. Cross compilers are often used when developing software for embedded systems that are not intended to support a software development environment.
The output of a compiler that produces code for a virtual machine (VM) may or may not be executed on the same platform as the compiler that produced it. For this reason, such compilers are not usually classified as native or cross compilers.
Compiled Versus Interpreted Languages
Higher-level programming languages are generally divided, for convenience, into compiled languages and interpreted languages. However, there is rarely anything about a language that requires it to be exclusively compiled or exclusively interpreted. The classification usually reflects the most popular or widespread implementations of a language - for instance, BASIC is thought of as an interpreted language and C as a compiled one, despite the existence of BASIC compilers and C interpreters.
In a sense, all languages are interpreted, with "execution" being merely a special case of interpretation performed by transistors switching on a CPU. Modern trends toward just-in-time compilation and bytecode interpretation also blur the traditional categorization.
There are exceptions. Some language specifications spell out that implementations must include a compilation facility; for example, Common Lisp. Other languages have features that are very easy to implement in an interpreter but make writing a compiler much harder; for example, APL, SNOBOL4, and many scripting languages allow programs to construct arbitrary source code at runtime with ordinary string operations and then execute that code by passing it to a special evaluation function. To implement these features in a compiled language, programs must usually be shipped with a runtime library that includes a version of the compiler itself.
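To make the string-evaluation point concrete, here is a minimal Python sketch (the variable names are invented for illustration) in which a program assembles source code with ordinary string operations and then runs it through the language's built-in eval and exec facilities:

# Ordinary string operations build a piece of source code at run time...
operator = "+"
expression = "3 " + operator + " 4"        # the string "3 + 4"
print(eval(expression))                    # the evaluation function runs it -> 7

# ...and exec() accepts whole statements assembled the same way.
statement = "doubled = (" + expression + ") * 2"
exec(statement)
print(doubled)                             # -> 14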
Hardware compilation
The output of some compilers may target hardware at a very low level, for example a field-programmable gate array (FPGA) or structured application-specific integrated circuit (ASIC). Such compilers are said to be hardware compilers or synthesis tools because the programs they compile effectively control the final configuration of the hardware and how it operates; the output of the compilation is not instructions that are executed in sequence, but an interconnection of transistors or lookup tables. For example, XST is the Xilinx synthesis tool used for configuring FPGAs. Similar tools are available from Altera, Synplicity, Synopsys and other vendors.
Compiler Design
The approach taken to compiler design is affected by the complexity of the processing needed, the experience of the person(s) designing it, and the resources (people and tools) available.
A compiler for a relatively simple language written by one person might be a single, monolithic piece of software. When the source language is large and complex, and high-quality output is required, the design may be split into a number of relatively independent phases, or passes. Having separate phases means development can be parceled up into small parts and given to different people. It also becomes much easier to replace a single phase with an improved one, or to insert new phases later (e.g., additional optimizations).
The division of the compilation process into phases (or passes) was championed by the Production Quality Compiler-Compiler Project (PQCC) at Carnegie Mellon University. This project introduced the terms front end, middle end (rarely heard today), and back end.
All but the smallest of compilers have more than two phases. However, these phases are usually regarded as being part of the front end or the back end. The point at which the two ends meet is still open to debate. The front end is generally considered to be where syntactic and semantic processing takes place, along with translation to a lower level of representation (than source code).
The middle end is generally designed to perform optimizations on a form other than the source code or machine code. This source code/machine code independence is intended to enable generic optimizations to be shared between versions of the compiler supporting different languages and target processors.
The back end takes the output of the middle end. It may perform further analysis, transformations and optimizations that are specific to a particular computer. It then generates code for a particular processor and OS.
This front-end/middle/back-end approach makes it possible to combine front ends for different languages with back ends for different processors. Practical examples of this approach include the GNU Compiler Collection, LLVM, and the Amsterdam Compiler Kit, which have multiple front ends, shared analysis and multiple back ends.
One-pass versus multi-pass compilers
Classifying compilers by the number of passes has its background in the hardware resource limitations of early computers. Compiling involves performing a lot of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs, each of which made a pass over the source (or some representation of it) performing some of the required analysis and translation.
The ability to compile in a single pass has often been seen as a benefit because it simplifies the job of writing a compiler, and single-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g., Pascal).
In some cases the design of a language feature may require a compiler to perform more than one pass over the source. For instance, consider a declaration appearing on line 20 of the source which affects the translation of a statement appearing on line 10. In this case, the first pass needs to gather information about declarations appearing after the statements they affect, with the actual translation happening during a subsequent pass.
The disadvantage of compiling in a single pass is that it is not possible to perform many of the sophisticated optimizations needed to generate high-quality code. It can be difficult to count exactly how many passes an optimizing compiler makes. For instance, different phases of optimization may analyse one expression many times but analyse another expression only once.
Splitting a compiler up into small programs is a technique used by researchers interested in producing provably correct compilers. Proving the correctness of a set of small programs often requires less effort than proving the correctness of a larger, single, equivalent program.
While the typical multi-pass compiler outputs machine code from its final pass, there are several other types:
• a "source - source compiler to" a kind of compiler that takes its input as a high-level language and a high level language has outputs. For example, an automatic parallelizing compiler, often as an input will lead a high-level language program and then change the code and annotated with parallel code annotations (eg OpenMP) or the constructs (eg FORTRAN DOALL statements) language.
• Stage compilers that compile to the assembly language of a theoretical machine, like some Prolog implementations
o This Prolog machine is also known as the Warren Abstract Machine (or WAM).
o Bytecode compilers for Java, Python, and many more are also a subtype of this (see the sketch after this list).
• Just-in-time compilers, used by Smalltalk and Java systems, and also by Microsoft .NET's Common Intermediate Language (CIL)
o Applications are delivered in bytecode, which is compiled to native machine code just prior to execution.
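As a minimal illustration of the bytecode case, the following Python sketch uses only the standard compile built-in and the dis module: it compiles a one-line source string to a code object containing bytecode, shows the virtual-machine instructions, and then lets the interpreter's VM execute them.

import dis

source = "total = 3 + 2"
code_object = compile(source, "<example>", "exec")   # compile to bytecode, not machine code

dis.dis(code_object)    # print the virtual-machine instructions in the code object
exec(code_object)       # the interpreter's VM executes the bytecode
print(total)            # -> 5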
Front end
The front end analyzes the source code to build an internal representation of the program, called the intermediate representation or IR. It also manages the symbol table, a data structure mapping each symbol in the source code to associated information such as location, type and scope. This is done over several phases, which include some of the following:
1. Line reconstruction. Languages which strop their keywords or allow arbitrary spaces within identifiers require a phase before parsing which converts the input character sequence to a canonical form ready for the parser. The top-down, recursive-descent, table-driven parsers used in the 1960s typically read the source one character at a time and did not require a separate tokenizing phase. Atlas Autocode and Imp (and some implementations of ALGOL and Coral 66) are examples of stropped languages whose compilers would have a line reconstruction phase.
2. Lexical analysis breaks the source code text into small pieces called tokens. Each token is a single atomic unit of the language, for instance a keyword, identifier or symbol name. The token syntax is typically a regular language, so a finite state automaton constructed from a regular expression can be used to recognize it. This phase is also called lexing or scanning, and the software doing lexical analysis is called a lexical analyzer or scanner.
Lexical analysis
In computer science, lexical analysis is the process of converting a sequence of characters into a sequence of tokens. Programs performing lexical analysis are called lexical analyzers or lexers. A lexer is often organized as separate scanner and tokenizer functions, though the boundaries may not be clearly defined.
Lexical grammar
The specification of a programming language will include a set of rules, often expressed syntactically, specifying the set of possible character sequences that can form a token or lexeme. The whitespace characters are often ignored during lexical analysis.
Token
A token is a categorized block of text. The block of text corresponding to the token is known as a lexeme. A lexical analyzer processes lexemes to categorize them according to function, giving them meaning. This assignment of meaning is known as tokenization. A token can look like anything: English, gibberish symbols, anything; it just needs to be a useful part of the structured text.
Consider this expression in the C programming language:
sum=3+2;
Tokenized, it yields the following lexemes and token types:
sum - identifier
= - assignment operator
3 - integer literal
+ - addition operator
2 - integer literal
; - end of statement
Tokens are frequently defined by regular expressions, which are understood by a lexical analyzer generator such as lex. The lexical analyzer (either generated automatically by a tool like lex, or hand-crafted) reads in a stream of characters, identifies the lexemes in the stream, and categorizes them into tokens. This is called "tokenizing." If the lexer finds an invalid token, it will report an error.
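A minimal sketch of such a tokenizer in Python, using the standard re module (the token names and categories are chosen to match the sum=3+2; example above; a generator such as lex would instead emit table-driven C code from similar regular expressions):

import re

# Each token type is defined by a regular expression, as a lexer generator would accept.
TOKEN_SPEC = [
    ("NUMBER",     r"[0-9]+"),
    ("IDENTIFIER", r"[a-zA-Z_][a-zA-Z_0-9]*"),
    ("ASSIGN",     r"="),
    ("PLUS",       r"\+"),
    ("SEMICOLON",  r";"),
    ("SKIP",       r"[ \t]+"),   # whitespace is recognized but discarded
    ("MISMATCH",   r"."),        # anything else is a lexical error
]
MASTER_RE = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))

def tokenize(text):
    for match in MASTER_RE.finditer(text):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "MISMATCH":
            raise SyntaxError("invalid token starting at %r" % lexeme)
        if kind != "SKIP":
            yield kind, lexeme

print(list(tokenize("sum=3+2;")))
# [('IDENTIFIER', 'sum'), ('ASSIGN', '='), ('NUMBER', '3'),
#  ('PLUS', '+'), ('NUMBER', '2'), ('SEMICOLON', ';')]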
Following tokenizing is parsing. From there, the interpreted data may be loaded into data structures, for general use, interpretation, or compiling.
Consider a text describing a calculation:
46 - number_of(cows);
The lexemes here might be: "46", "-", "number_of", "(", "cows", ")" and ";". The lexical analyzer will denote lexemes "46" as 'number', "-" as 'character' and "number_of" as a separate token. Even the lexeme ";" in some languages (such as C) has some special meaning.
Scanner
The first stage, the scanner, is usually based on a finite state machine. It has encoded within it information on the possible sequences of characters that can be contained within any of the tokens it handles (individual instances of these character sequences are known as lexemes). For instance, an integer token may contain any sequence of numerical digit characters. In many cases, the first non-whitespace character can be used to deduce the kind of token that follows, and subsequent input characters are then processed one at a time until reaching a character that is not in the set of characters acceptable for that token (this is known as the maximal munch rule). In some languages the lexeme creation rules are more complicated and may involve backtracking over previously read characters.
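A hand-written scanner fragment illustrating the maximal munch rule in Python (the function name is illustrative, not taken from any particular compiler): the scanner consumes digit characters one at a time until a character arrives that cannot belong to an integer token.

def scan_integer(text, pos):
    """Maximal munch: consume digits starting at pos until a non-digit appears,
    returning the lexeme and the position just after it."""
    start = pos
    while pos < len(text) and text[pos].isdigit():
        pos += 1
    return text[start:pos], pos

# The first character ('1') tells the scanner an integer token follows; it then
# keeps reading characters until '+' stops it.
print(scan_integer("1234+56", 0))   # -> ('1234', 4)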
Tokenizer
Tokenization is the process of demarcating and possibly classifying sections of a string of input characters. The resulting tokens are then passed on to some other form of processing. The process can be considered a sub-task of parsing input.
Take, for example, the following string. Unlike a human, a computer cannot intuitively 'see' that there are 9 words; to a computer it is only a sequence of 43 characters.
The quick brown fox jumps over the lazy dog
The process of tokenization can be used to split the sentence into word tokens. Although the following example is given in XML, there are many ways to represent tokenized input:
<sentence>
<word>The</word>
<word>quick</word>
<word>brown</word>
<word>fox</word>
<word>jumps</word>
<word>over</word>
<word>the</word>
<word>lazy</word>
<word>dog</word>
</sentence>
A lexeme, however, is only a string of characters known to be of a certain kind (e.g., a string literal, a sequence of letters). In order to construct a token, the lexical analyzer needs a second stage, the evaluator, which goes over the characters of the lexeme to produce a value. The lexeme's type combined with its value is what properly constitutes a token, which can be given to a parser. (Some tokens such as parentheses do not really have values, and so the evaluator function for these can return nothing. The evaluators for integers, identifiers, and strings can be considerably more complex. Sometimes evaluators can suppress a lexeme entirely, concealing it from the parser, which is useful for whitespace and comments.)
For example, in the source code of a computer program, the string
net_worth_future = (assets - liabilities);
might be converted (with whitespace suppressed) into the following lexical token stream:
NAME "net_worth_future"
Equals
OPEN_PARENTHESIS
NAME "property"
Minus
NAME "liabilities"
CLOSE_PARENTHESIS
Semicolon
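A small Python sketch of the scanner-plus-evaluator split (illustrative only; the token names mirror the stream above): the evaluator keeps the characters of a NAME lexeme as its value, returns no value for punctuation, and suppresses whitespace so the parser never sees it.

import re

LEXEME_RE = re.compile(r"[a-zA-Z_][a-zA-Z_0-9]*|[=()\-;]|\s+")

def evaluate(lexeme):
    """Second stage: turn a lexeme into a (type, value) token, or None to suppress it."""
    if lexeme.isspace():
        return None                              # whitespace is hidden from the parser
    if lexeme[0].isalpha() or lexeme[0] == "_":
        return ("NAME", lexeme)                  # identifiers keep their characters as the value
    punctuation = {"=": "EQUALS", "(": "OPEN_PARENTHESIS", ")": "CLOSE_PARENTHESIS",
                   "-": "MINUS", ";": "SEMICOLON"}
    return (punctuation[lexeme], None)           # parentheses and the like need no value

def tokens(text):
    return [t for t in (evaluate(m.group()) for m in LEXEME_RE.finditer(text)) if t]

for token in tokens("net_worth_future = (assets - liabilities);"):
    print(token)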
Though it is possible and sometimes necessary to write a lexer by hand, lexers are often generated by automated tools. These tools typically accept regular expressions describing the tokens allowed in the input stream. Each regular expression is associated with a production in the lexical grammar of the programming language which evaluates the lexemes matching the regular expression. These tools may generate source code that can be compiled and executed, or they may construct a state table for a finite state machine (which is plugged into template code for compilation and execution).
Regular expressions compactly represent the patterns that the characters in lexemes might follow. For example, for an English-based language, a NAME token might be any English alphabetical character or an underscore, followed by any number of ASCII alphanumeric characters or underscores. This can be represented compactly by the string [a-zA-Z_][a-zA-Z_0-9]*, which means "any character a-z, A-Z or _, followed by 0 or more of a-z, A-Z, _ or 0-9".
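For instance, the same pattern can be tried directly with Python's standard re module (a small illustrative check, not part of any lexer generator):

import re

NAME_RE = re.compile(r"[a-zA-Z_][a-zA-Z_0-9]*")

print(bool(NAME_RE.fullmatch("net_worth_future")))   # True
print(bool(NAME_RE.fullmatch("3rd_quarter")))        # False - a NAME may not begin with a digit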
Regular expressions and the finite state machines they generate are not powerful enough to handle recursive patterns, such as "n opening parentheses, followed by a statement, followed by n closing parentheses." They are not capable of keeping count and verifying that n is the same on both sides - unless there is a finite set of permissible values for n. It takes a full parser to recognize such patterns in their full generality. A parser can push parentheses on a stack and then try to pop them off and see whether the stack is empty at the end.
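A minimal sketch of that stack idea in plain Python (illustrative; a real parser tracks far more than parentheses):

def parentheses_balanced(text):
    """Push each '(' on a stack and pop on ')'; balanced iff the stack ends empty."""
    stack = []
    for ch in text:
        if ch == "(":
            stack.append(ch)
        elif ch == ")":
            if not stack:
                return False          # a ')' with no matching '('
            stack.pop()
    return not stack                  # empty stack means every '(' was closed

print(parentheses_balanced("((a - b) * c)"))   # True
print(parentheses_balanced("((a - b) * c"))    # False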
The Lex programming tool and its compiler are designed to generate code for fast lexical analyzers based on a formal description of the lexical syntax. They are not generally considered sufficient for applications with a complicated set of lexical rules and severe performance requirements; for instance, the GNU Compiler Collection uses hand-written lexers.
Lexer generator
Lexical analysis can often be performed in a single pass if reading is done a character at a time. Single-pass lexers can be generated by tools such as the classic flex.
The lex/flex family of generators uses a table-driven approach which is less efficient than the directly coded approach. With the latter approach, the generator produces an engine that directly jumps to follow-up states via goto statements. Tools like re2c and Quex have been shown to produce engines that are between two and three times faster than flex-produced engines, and it is in general difficult to hand-write analyzers that perform better than the engines generated by these latter tools.
The simple utility of using a scanner generator should not be discounted, especially in the development phase, when a language specification might change daily. The ability to express lexical constructs as regular expressions facilitates the description of a lexical analyzer. Some tools offer the specification of pre- and post-conditions which are hard to program by hand. In that case, using a scanner generator may save a lot of development time.
Lexical analyzer generators
• Flex - an alternative variant of the classic lex (C/C++).
• JLex - a lexical analyzer generator for Java.
• Quex - (or 'Queχ') a fast universal lexical analyzer generator for C and C++.
• OOLEX - an object-oriented lexical analyzer generator.
• re2c
• PLY - an implementation of the lex and yacc parsing tools for Python.
3. Preprocessing. Some languages, e.g. C, require a preprocessing phase which supports macro substitution and conditional compilation. Typically the preprocessing phase occurs before syntactic or semantic analysis; e.g. in the case of C, the preprocessor manipulates lexical tokens rather than syntactic forms. However, some languages such as Scheme support macro substitutions based on syntactic forms.
4. Syntax analysis involves parsing the token sequence to identify the syntactic structure of the program. This phase typically builds a parse tree, which replaces the linear sequence of tokens with a tree structure built according to the rules of a formal grammar which define the language's syntax. The parse tree is often analyzed, augmented, and transformed by later phases in the compiler.
Parsing
In computer science and linguistics, parsing, or, more formally, syntactic analysis, is the process of analyzing a sequence of tokens to determine its grammatical structure with respect to a given (more or less) formal grammar. A parser is thus one of the components of an interpreter or compiler, where it captures the implied hierarchy of the input text and transforms it into a form suitable for further processing (often some kind of parse tree, abstract syntax tree or other hierarchical structure), normally checking for syntax errors at the same time. The parser often uses a separate lexical analyzer to create tokens from the sequence of input characters. Parsers may be programmed by hand or may be (semi-)automatically generated (in some programming language) by a tool (such as Yacc) from a grammar written in Backus-Naur form.
Parsing is also an earlier term for the diagramming of sentences of natural languages, and is still used for the diagramming of inflected languages, such as the Romance languages or Latin.
Parsers can also be constructed as executable specifications of grammars in functional programming languages. Frost, Hafiz and Callaghan built on the work of others to construct a set of higher-order functions (called parser combinators) which allow polynomial time and space complexity top-down parsers to be constructed as executable specifications of ambiguous grammars containing left-recursive productions. The X-SAIGA site has more about the algorithms and implementation details.
Human languages
In some machine translation and natural language processing systems, human languages are parsed by computer programs. Human sentences are not easily parsed by programs, as there is substantial ambiguity in the structure of human language. In order to parse natural language data, researchers must first agree on the grammar to be used. The choice of syntax is affected by both linguistic and computational concerns; for instance, some parsing systems use lexical functional grammar, but in general, parsing for grammars of this type is known to be NP-complete. Head-driven phrase structure grammar is another linguistic formalism which has been popular in the parsing community, but other research efforts have focused on less complex formalisms such as the one used in the Penn Treebank. Shallow parsing aims to find only the boundaries of major constituents such as noun phrases. Another popular strategy for avoiding linguistic controversy is dependency grammar parsing.
Most modern parsers are at least partly statistical; that is, they rely on a corpus of training data which has already been annotated (parsed by hand). This approach allows the system to gather information about the frequency with which various constructions occur in specific contexts. (See machine learning.) Approaches which have been used include straightforward PCFGs (probabilistic context-free grammars), maximum entropy, and neural nets. Most of the more successful systems use lexical statistics (that is, they consider the identities of the words involved, as well as their part of speech). However, such systems are vulnerable to overfitting and require some kind of smoothing to be effective.
Parsing algorithms for natural language cannot rely on the grammar having 'nice' properties as with manually designed grammars for programming languages. As mentioned earlier, some grammar formalisms are very difficult to parse computationally; in general, even if the desired structure is not context-free, some kind of context-free approximation to the grammar is used to perform a first pass. Algorithms which use context-free grammars often rely on some variant of the CKY algorithm, usually with some heuristic to prune away unlikely analyses and save time. (See chart parsing.) Some systems trade speed for accuracy using, for example, linear-time versions of the shift-reduce algorithm. A somewhat recent development has been parse reranking, in which the parser proposes some large number of analyses and a more complex system selects the best option.
Programming languages
The most common use of a parser is as a component of a compiler or interpreter. It parses the source code of a computer programming language to create some form of internal representation. Programming languages tend to be specified in terms of a context-free grammar because fast and efficient parsers can be written for them. Parsers are written by hand or generated by parser generators.
Context-free grammars are limited in the extent to which they can express all of the requirements of a language. Informally, the reason is that the memory of such a language is limited. The grammar cannot remember the presence of a construct over an arbitrarily long input; this is necessary for a language in which, for example, a name must be declared before it may be referenced. More powerful grammars that can express this constraint, however, cannot be parsed efficiently. Thus, it is a common strategy to create a relaxed parser for a context-free grammar which accepts a superset of the desired language constructs (that is, it accepts some invalid constructs); the unwanted constructs can be filtered out later.
Overview Of Process
The following example demonstrates the common case of parsing a computer language with two levels of grammar: lexical and syntactic.
The first stage is the token generation, or lexical analysis, by which the input character stream is split into meaningful symbols defined by a grammar of regular expressions. For example, a calculator program would look at an input such as "12*(3+4)^2" and split it into the tokens 12, *, (, 3, +, 4, ), ^, and 2, each of which is a meaningful symbol in the context of an arithmetic expression. The lexer would contain rules to tell it that the characters *, +, ^, ( and ) mark the start of a new token, so meaningless tokens like "12*" or "(3" will not be generated.
The next stage is parsing or syntactic analysis, which is checking that the tokens form an allowable expression. This is usually done with reference to a context-free grammar which recursively defines components that can make up an expression and the order in which they must appear. However, not all rules defining programming languages can be expressed by context-free grammars alone, for example type validity and proper declaration of identifiers. These rules can be formally expressed with attribute grammars.
The final phase is semantic parsing or analysis, which is working out the implications of the expression just validated and taking the appropriate action. In the case of a calculator or interpreter, the action is to evaluate the expression or program; a compiler, on the other hand, would generate some kind of code. Attribute grammars can also be used to define these actions.
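The three stages above can be seen end to end in the following Python sketch for the calculator example (a hand-written recursive-descent interpreter whose grammar covers only what "12*(3+4)^2" needs; it is an illustration of the process, not a production parser). The lexer is a regular-expression tokenizer, the parser has one function per grammar rule, and the semantic action attached to each rule simply evaluates the expression; a compiler would emit code at those points instead.

import re

TOKEN_RE = re.compile(r"\d+|[+*/^()-]|\s+")

def tokenize(text):
    tokens = [t for t in TOKEN_RE.findall(text) if not t.isspace()]
    return tokens + ["<end>"]                  # sentinel simplifies the parser

class Parser:
    """Recursive-descent parser: one method per grammar rule, with semantic
    actions that evaluate the expression as it is recognized."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def take(self, expected=None):
        token = self.tokens[self.pos]
        if expected is not None and token != expected:
            raise SyntaxError("expected %r, found %r" % (expected, token))
        self.pos += 1
        return token

    # expr -> term (('+' | '-') term)*
    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            value = value + self.term() if op == "+" else value - self.term()
        return value

    # term -> factor (('*' | '/') factor)*
    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            value = value * self.factor() if op == "*" else value / self.factor()
        return value

    # factor -> base ('^' factor)?   (right-associative exponentiation)
    def factor(self):
        value = self.base()
        if self.peek() == "^":
            self.take("^")
            value = value ** self.factor()
        return value

    # base -> NUMBER | '(' expr ')'
    def base(self):
        if self.peek() == "(":
            self.take("(")
            value = self.expr()
            self.take(")")
            return value
        token = self.take()
        if not token.isdigit():
            raise SyntaxError("expected a number or '(', found %r" % token)
        return int(token)

tokens = tokenize("12*(3+4)^2")
print(tokens)                   # ['12', '*', '(', '3', '+', '4', ')', '^', '2', '<end>']
print(Parser(tokens).expr())    # 12 * 49 = 588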
Types of parsers
The task of the parser is essentially to determine if and how the input can be derived from the start symbol of the grammar. This can be done in essentially two ways:
• Top-down parsing - Top-down parsing can be viewed as an attempt to find left-most derivations of an input stream by searching for parse trees using a top-down expansion of the given formal grammar rules. Tokens are consumed from left to right. Inclusive choice is used to accommodate ambiguity by expanding all alternative right-hand sides of grammar rules. LL parsers and recursive-descent parsers are examples of top-down parsers which cannot accommodate left-recursive productions. Although it has been believed that simple implementations of top-down parsing cannot accommodate direct and indirect left recursion and may require exponential time and space complexity while parsing ambiguous context-free grammars, more sophisticated algorithms for top-down parsing have been created by Frost, Hafiz and Callaghan which accommodate ambiguity and left recursion in polynomial time and which generate polynomial-size representations of the potentially exponential number of parse trees. Their algorithm is able to produce both left-most and right-most derivations of an input with regard to a given context-free grammar (CFG).
• Bottom-up parsing - A parser can start with the input and attempt to rewrite it to the start symbol. Intuitively, the parser attempts to locate the most basic elements, then the elements containing these, and so on. LR parsers are examples of bottom-up parsers. Another term used for this type of parser is shift-reduce parsing.
Another important distinction is whether the parser generates a leftmost derivation or a rightmost derivation (see context-free grammar). LL parsers will generate a leftmost derivation and LR parsers will generate a rightmost derivation (although usually in reverse).
5. Semantic analysis is the phase in which the compiler adds semantic information to the parse tree and builds the symbol table. This phase performs semantic checks such as type checking (checking for type errors), object binding (associating variable and function references with their definitions), and definite assignment (requiring all local variables to be initialized before use), rejecting incorrect programs or issuing warnings. Semantic analysis usually requires a complete parse tree, meaning that this phase logically follows the parsing phase and logically precedes the code generation phase, though it is often possible to fold multiple phases into one pass over the code in a compiler implementation.
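As a small illustration of one such check, the following Python sketch walks a deliberately tiny, made-up AST (three assignment statements) and reports a variable that is read before it has been assigned - the definite-assignment style of check mentioned above.

# Each statement is an ("assign", name, expression) tuple and each expression is
# either a number or a variable name -- a deliberately tiny AST for illustration.
program = [
    ("assign", "assets", 100),
    ("assign", "net", "assets"),
    ("assign", "total", "liabilities"),   # 'liabilities' was never assigned
]

def check_definite_assignment(statements):
    symbol_table = set()                  # names that have a definition so far
    errors = []
    for _, name, expression in statements:
        if isinstance(expression, str) and expression not in symbol_table:
            errors.append("variable %r used before assignment" % expression)
        symbol_table.add(name)
    return errors

print(check_definite_assignment(program))
# ["variable 'liabilities' used before assignment"]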
Back End
The term back end is sometimes confused with code generator because of the overlapping functionality of generating assembly code. Some literature uses middle end to distinguish the generic analysis and optimization phases in the back end from the machine-dependent code generators.
The main phases of the back end include the following:
1. Analysis: the gathering of program information from the intermediate representation derived from the input. Typical analyses are data flow analysis to build use-define chains, dependence analysis, alias analysis, pointer analysis, escape analysis, etc. Accurate analysis is the basis for any compiler optimization. The call graph and control flow graph are usually also built during the analysis phase.
2. Optimization: the intermediate language representation is transformed into functionally equivalent but faster (or smaller) forms. Popular optimizations are inline expansion, dead code elimination, constant propagation, loop transformation, register allocation and even automatic parallelization (a sketch of two of these follows this list).
3. Code generation: the transformed intermediate language is translated into the output language, usually the native machine language of the system. This involves resource and storage decisions, such as deciding which variables to fit into registers and memory, and the selection and scheduling of appropriate machine instructions along with their associated addressing modes (see also the Sethi-Ullman algorithm).
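A minimal sketch of two of the optimizations named above, operating on an invented three-address-style intermediate representation (the tuple format is purely illustrative): constant propagation and folding replace operations whose inputs are already known constants, and dead code elimination then drops instructions whose results are never used.

# Toy IR: (destination, operation, operand1, operand2); operands are constants or names.
ir = [
    ("a", "const", 3, None),
    ("b", "const", 4, None),
    ("c", "add", "a", "b"),        # both inputs are known constants -> foldable to 7
    ("d", "mul", "c", 2),
    ("e", "add", "a", 1),          # 'e' is never used afterwards -> dead code
    ("result", "copy", "d", None),
]

def constant_fold(instructions):
    """Propagate known constant values and fold operations whose inputs are constants."""
    known = {}
    folded = []
    for dest, op, x, y in instructions:
        x, y = known.get(x, x), known.get(y, y)
        if op == "const" or (op == "copy" and isinstance(x, int)):
            known[dest] = x
        elif isinstance(x, int) and isinstance(y, int):
            known[dest] = x + y if op == "add" else x * y
            op, x, y = "const", known[dest], None
        folded.append((dest, op, x, y))
    return folded

def eliminate_dead_code(instructions, live=("result",)):
    """Drop instructions whose destination is neither a live output nor used as an operand."""
    used = set(live)
    for _dest, _op, x, y in instructions:
        for operand in (x, y):
            if isinstance(operand, str):
                used.add(operand)
    return [instr for instr in instructions if instr[0] in used]

for instruction in eliminate_dead_code(constant_fold(ir)):
    print(instruction)
# ('result', 'copy', 14, None)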
Compiler analysis is the prerequisite for any compiler optimization, and the two work tightly together. For example, dependence analysis is crucial for loop transformation.
In addition, the scope of compiler analysis and optimization varies greatly: it may range from as small as a basic block to the procedure/function level, or even over the whole program (interprocedural optimization). Obviously, a compiler can potentially do a better job using a broader view. But that broad view is not free: large-scope analysis and optimization are very costly in terms of compilation time and memory space; this is especially true for interprocedural analysis and optimization.
Interprocedural analysis and optimization are common in modern commercial compilers from HP, IBM, SGI, Intel, Microsoft, and Sun Microsystems. The open source GCC was criticized for a long time for lacking powerful interprocedural optimizations, but it is changing in this respect. Another open source compiler with a full analysis and optimization infrastructure is Open64, which is used by many organizations for research and commercial purposes.
Because of the extra time and space needed for compiler analysis and optimization, some compilers skip them by default. Users have to use compilation options to explicitly tell the compiler which optimizations should be enabled.