Lexical Analysis
A lexical analyzer is essentially a pattern matcher. One of the earliest uses of pattern matching was in text editors, such as the Unix ed line editor; pattern matching is now also built into programming languages such as Perl and JavaScript.
A lexical analyzer (L_A) serves as the front end of the syntax analyzer (S_A). Technically, lexical analysis is syntax analysis at the lowest level of program structure.
The L_A collects characters from the input stream into logical groups and assigns internal codes (often referenced by named constants for the sake of readability) to the groupings according to their structure.
These groupings are called lexemes. The internal codes are called tokens.
Example: sum = oldsum - value / 100;

Lexeme    Token
sum       IDENT
=         ASSIGN_OP
oldsum    IDENT
-         SUBT_OP
value     IDENT
/         DIVIS_OP
100       INT_LIT
;         SEMICOLON
An L_A extracts lexemes from a given input and produces the corresponding tokens. Nowadays, however, most L_As are subprograms that produce the next lexeme and its associated token code from the input and return them to the caller (the S_A). So the only view of the input program seen by the S_A is the output of the L_A, one lexeme at a time. The L_A also skips comments and blanks outside lexemes, and inserts lexemes for user-defined names into the symbol table. Finally, the L_A detects syntactic errors in tokens, such as ill-formed floating-point literals, and reports such errors to the user.
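As a rough sketch of this caller interface, the following C program tokenizes the example statement above, returning one lexeme per call to lex(). The function name, token codes, and fixed input string are illustrative assumptions, not a standard interface.

#include <ctype.h>
#include <stdio.h>

/* Token codes; names chosen to match the example table above. */
enum { IDENT, INT_LIT, ASSIGN_OP, SUBT_OP, DIVIS_OP, SEMICOLON,
       END_OF_INPUT, ERROR };

static const char *input = "sum = oldsum - value / 100;";
static int pos = 0;            /* current position in the input  */
static char lexeme[64];        /* text of the most recent lexeme */

/* lex: skip blanks, collect the next lexeme, return its token code.
   The caller (the S_A) calls this once per lexeme. */
int lex(void) {
    int len = 0;
    while (input[pos] == ' ') pos++;          /* skip blanks outside lexemes */
    if (input[pos] == '\0') return END_OF_INPUT;

    if (isalpha((unsigned char)input[pos])) { /* user-defined name; a real   */
        while (isalnum((unsigned char)input[pos]))  /* L_A would also enter  */
            lexeme[len++] = input[pos++];           /* it in the symbol table */
        lexeme[len] = '\0';
        return IDENT;
    }
    if (isdigit((unsigned char)input[pos])) { /* integer literal */
        while (isdigit((unsigned char)input[pos]))
            lexeme[len++] = input[pos++];
        lexeme[len] = '\0';
        return INT_LIT;
    }
    lexeme[0] = input[pos];                   /* single-character lexemes */
    lexeme[1] = '\0';
    switch (input[pos++]) {
        case '=': return ASSIGN_OP;
        case '-': return SUBT_OP;
        case '/': return DIVIS_OP;
        case ';': return SEMICOLON;
        default:  return ERROR;               /* ill-formed token: report it */
    }
}

int main(void) {
    int token;
    while ((token = lex()) != END_OF_INPUT)
        printf("lexeme: %-8s token: %d\n", lexeme, token);
    return 0;
}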
There are three basic approaches to building an L_A:

1. Write a formal description of the token patterns of the language using a descriptive language related to regular expressions (see the pattern sketch after this list), and use a software tool to generate the L_A automatically. The UNIX lex program is an example of this approach.

2. Design a state transition diagram that describes the token patterns of the language and write a program that implements the diagram.

3. Design a state transition diagram that describes the token patterns of the language and hand-construct a table-driven implementation of the state diagram.
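For approach 1, the token patterns from the example above might be described with regular-expression-style patterns like the following. These exact patterns are illustrative assumptions; a real lex specification would also attach an action to each pattern.

identifier (IDENT):         [A-Za-z][A-Za-z0-9]*
integer literal (INT_LIT):  [0-9]+
assignment (ASSIGN_OP):     =
subtraction (SUBT_OP):      -
division (DIVIS_OP):        /
semicolon (SEMICOLON):      ;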
A state transition diagram is a directed graph, like the syntax graphs introduced in Chapter 3.
The nodes are labeled with state names. The arcs are labeled with the input characters that cause the transitions. An arc may also be labeled with actions the L_A must perform when the transition is taken. A state transition diagram is nothing but the so-called finite automaton, a mathematical machine. Finite automata, as you may remember, can be designed to recognize the class of languages called regular languages. Regular expressions and regular grammars are devices for describing regular languages. The tokens of a programming language form a regular language.
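As a minimal sketch of the table-driven idea in approach 3, the following C program encodes a two-token state diagram (identifiers and integer literals) as a transition table. The state names, character classes, and table contents are illustrative assumptions, and the sketch handles only letters, digits, and blanks.

#include <ctype.h>
#include <stdio.h>

enum { LETTER, DIGIT, OTHER };       /* character classes            */
enum { START, IN_ID, IN_NUM, DONE }; /* states; DONE ends the lexeme */

/* Transition table: next_state[current_state][character_class]. */
static const int next_state[3][3] = {
    /*            LETTER  DIGIT   OTHER */
    /* START  */ { IN_ID,  IN_NUM, DONE },
    /* IN_ID  */ { IN_ID,  IN_ID,  DONE },
    /* IN_NUM */ { DONE,   IN_NUM, DONE },
};

static int char_class(int c) {
    if (isalpha(c)) return LETTER;
    if (isdigit(c)) return DIGIT;
    return OTHER;
}

int main(void) {
    const char *input = "sum oldsum 100 value";
    int i = 0;
    while (input[i] != '\0') {
        if (char_class((unsigned char)input[i]) == OTHER) {
            i++;                     /* skip blanks between lexemes */
            continue;
        }
        int state = START, start = i;
        /* Walk the table until it says the current lexeme has ended. */
        while (input[i] != '\0' &&
               (state = next_state[state][char_class((unsigned char)input[i])]) != DONE)
            i++;
        printf("lexeme: %.*s\n", i - start, &input[start]);
    }
    return 0;
}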