Syntactic Analysis (Parser)

Overview

The Parser (also called Syntax Analyzer) is the second phase of compilation. It takes the flat list of tokens from the Scanner and organizes them into a hierarchical Abstract Syntax Tree (AST) according to the language grammar.

Analogy: Like diagramming a sentence in grammar class.“The cat eats fish” →

  Sentence
  ├── Subject: "The cat"
  └── Predicate
      ├── Verb: "eats"
      └── Object: "fish"

Grammar Rules

The parser follows this context-free grammar:

program     → statement*
statement   → declaration | print_stmt
declaration → "let" IDENTIFIER "=" expression ";"
print_stmt  → "print" expression ";"
expression  → addition
addition    → multiplication (("+ " | "-") multiplication)*
multiplication → primary (("*" | "/") primary)*
primary     → NUMBER | IDENTIFIER | "(" expression ")"

Operator Precedence is encoded by nesting depth:

primary (highest precedence) - literals, variables, parentheses
multiplication - * and /
addition (lowest precedence) - + and -

This ensures 3 + 4 * 2 parses as 3 + (4 * 2), not (3 + 4) * 2.

AST Node Types

The parser constructs these node types:

Expression Nodes

NumeroLiteral
Identificador
ExpresionBinaria
ExpresionAgrupada

Represents a numeric constant.

@dataclass
class NumeroLiteral(Expresion):
    token: Token  # Original token (for position info)
    valor: int    # The numeric value

Example:

42  →  NumeroLiteral(token=Token(...), valor=42)

Represents a variable reference.

@dataclass
class Identificador(Expresion):
    token: Token  # Original token
    nombre: str   # Variable name

Example:

x  →  Identificador(token=Token(...), nombre='x')

Represents an operation between two expressions.

@dataclass
class ExpresionBinaria(Expresion):
    izquierda: Expresion  # Left operand
    operador: Token       # +, -, *, /
    derecha: Expresion    # Right operand

Example:

5 + 3  →  ExpresionBinaria(
            izquierda=NumeroLiteral(5),
            operador=Token(SUMA, '+'),
            derecha=NumeroLiteral(3)
          )

Represents a parenthesized expression.

@dataclass
class ExpresionAgrupada(Expresion):
    expresion: Expresion  # The inner expression

Example:

(5 + 3)  →  ExpresionAgrupada(
              expresion=ExpresionBinaria(...)
            )

Statement Nodes

DeclaracionVariable
SentenciaPrint

Variable declaration statement.

@dataclass
class DeclaracionVariable(Sentencia):
    token_let: Token      # The 'let' keyword
    nombre: Token         # Variable name token
    expresion: Expresion  # Initializer expression

Example:

let x = 10;  →  DeclaracionVariable(
                  token_let=Token(LET, 'let'),
                  nombre=Token(IDENTIFICADOR, 'x'),
                  expresion=NumeroLiteral(10)
                )

Print statement.

@dataclass
class SentenciaPrint(Sentencia):
    token_print: Token    # The 'print' keyword
    expresion: Expresion  # Expression to print

Example:

print x;  →  SentenciaPrint(
               token_print=Token(PRINT, 'print'),
               expresion=Identificador('x')
             )

Program Node

@dataclass
class Programa:
    sentencias: List[Sentencia]  # All statements in the program

The root of the AST - contains all statements.

Parsing Algorithm

The parser uses Recursive Descent Parsing:

Initialize

Store token list
Set current position to 0
Initialize error list

Parse Program

def parsear(self) -> Programa:
    sentencias = []
    while not self._fin():
        sentencia = self._sentencia()
        if sentencia:
            sentencias.append(sentencia)
    return Programa(sentencias)

Parse Statements

Dispatch based on lookahead token:

def _sentencia(self) -> Sentencia:
    if self._coincide(TipoToken.LET):
        return self._declaracion_variable()
    if self._coincide(TipoToken.PRINT):
        return self._sentencia_print()
    raise error("Expected 'let' or 'print'")

Parse Expressions

Follow precedence levels recursively:

_expresion() → _suma_resta() → _multiplicacion_division() → _primario()

Parsing Example

Let’s trace how let x = 5 + 3 * 2; is parsed:

Statement Recognition

# Current token: LET
_sentencia():
  coincide(LET) → True
  return _declaracion_variable()

Variable Declaration

_declaracion_variable():
  token_let = Token(LET, 'let')  # Already consumed
  nombre = consumir(IDENTIFICADOR)  # 'x'
  consumir(IGUAL)  # '='
  expresion = _expresion()  # Parse right side
  consumir(PUNTO_COMA)  # ';'
  return DeclaracionVariable(token_let, nombre, expresion)

Expression Parsing

Parse 5 + 3 * 2 with correct precedence:

_expresion() → _suma_resta():
  # First, get left side
  expr = _multiplicacion_division()  # Returns NumeroLiteral(5)
  
  # See '+', so continue
  while coincide(SUMA, RESTA):
    operador = Token(SUMA, '+')
    derecha = _multiplicacion_division()  # Parse right side
    expr = ExpresionBinaria(expr, operador, derecha)
  
  return expr

Right Side (3 * 2)

_multiplicacion_division():
  # Get 3
  expr = _primario()  # NumeroLiteral(3)
  
  # See '*', so continue
  while coincide(MULTIPLICACION, DIVISION):
    operador = Token(MULTIPLICACION, '*')
    derecha = _primario()  # NumeroLiteral(2)
    expr = ExpresionBinaria(expr, operador, derecha)
  
  return ExpresionBinaria(
    izquierda=NumeroLiteral(3),
    operador=Token(MULT, '*'),
    derecha=NumeroLiteral(2)
  )

Final AST

DeclaracionVariable(
  token_let=Token(LET, 'let'),
  nombre=Token(IDENTIFICADOR, 'x'),
  expresion=ExpresionBinaria(
    izquierda=NumeroLiteral(5),
    operador=Token(SUMA, '+'),
    derecha=ExpresionBinaria(
      izquierda=NumeroLiteral(3),
      operador=Token(MULTIPLICACION, '*'),
      derecha=NumeroLiteral(2)
    )
  )
)

This correctly represents: 5 + (3 * 2)

Precedence Visualization

Correct Precedence
Parentheses Override
Same Precedence

Input: a + b * cAST:

Evaluation: a + (b * c)Why: _suma_resta() calls _multiplicacion_division() for right side, which consumes both b and c before returning.

Input: (a + b) * cAST:

      *
     / \
    +   c
   / \
  a   b

Evaluation: (a + b) * cWhy: _primario() recognizes ( and recursively calls _expresion(), creating the addition node first.

Input: a - b + cAST:

      +
     / \
    -   c
   / \
  a   b

Evaluation: (a - b) + cWhy: _suma_resta() loops left-to-right, building left-associative tree.

Error Handling

Syntax Errors

The parser reports specific, helpful error messages:

def _error(self, token: Token, mensaje: str) -> ErrorSintaxis:
    msg = f"Error de sintaxis en línea {token.linea}, columna {token.columna}: " \
          f"{mensaje}. Se encontró '{token.lexema}'"
    return ErrorSintaxis(msg)

Example Errors:

Missing Semicolon

Input: let x = 5Error:

Error de sintaxis en línea 1, columna 10: Se esperaba ';' al final de la declaración. Se encontró ''

Missing Variable Name

Input: let = 5;Error:

Error de sintaxis en línea 1, columna 5: Se esperaba nombre de variable después de 'let'. Se encontró '='

Unmatched Parenthesis

Input: let x = (5 + 3;Error:

Error de sintaxis en línea 1, columna 14: Se esperaba ')' después de la expresión. Se encontró ';'

Unexpected Token

Input: + x = 5;Error:

Error de sintaxis en línea 1, columna 1: Se esperaba 'let' o 'print'. Se encontró '+'

Error Recovery (Panic Mode)

The parser continues after errors to find multiple issues:

def _sincronizar(self):
    """Find next safe point to resume parsing"""
    self._avanzar()
    
    while not self._fin():
        # Previous token was semicolon - safe to continue
        if self._anterior().tipo == TipoToken.PUNTO_COMA:
            return
        
        # Current token starts a statement
        if self._ver_actual().tipo in (TipoToken.LET, TipoToken.PRINT):
            return
        
        self._avanzar()

Example:

let x = ;     // Error: missing expression
let y = 5;    // Still parsed correctly
print z       // Error: missing semicolon

Output:

Error de sintaxis en línea 1, columna 9: Se esperaba una expresión. Se encontró ';'
Error de sintaxis en línea 3, columna 8: Se esperaba ';' al final de print. Se encontró ''
✓ Completado: 1 sentencia parseada

Helper Methods

_coincide()
_consumir()
_verificar()
_avanzar()

Check if current token matches any of the given types, and consume if so:

def _coincide(self, *tipos: TipoToken) -> bool:
    for tipo in tipos:
        if self._verificar(tipo):
            self._avanzar()
            return True
    return False

Usage:

if self._coincide(TipoToken.SUMA, TipoToken.RESTA):
    # Current token is + or -, and has been consumed

Require a specific token type, or raise error:

def _consumir(self, tipo: TipoToken, mensaje: str) -> Token:
    if self._verificar(tipo):
        return self._avanzar()
    raise self._error(self._ver_actual(), mensaje)

Usage:

self._consumir(
    TipoToken.PUNTO_COMA,
    "Se esperaba ';' al final de la declaración"
)

Check token type without consuming:

def _verificar(self, tipo: TipoToken) -> bool:
    if self._fin():
        return False
    return self._ver_actual().tipo == tipo

Used for lookahead in decision-making.

Move to next token:

def _avanzar(self) -> Token:
    if not self._fin():
        self.actual += 1
    return self._anterior()

Returns the token that was just consumed.

Complete Parsing Example

Input Code
Token Stream
AST (Tree View)
Parser Output

let a = 5;
let b = a * 2 + 3;
print b;

[LET, ID(a), =, NUM(5), ;,
 LET, ID(b), =, ID(a), *, NUM(2), +, NUM(3), ;,
 PRINT, ID(b), ;,
 EOF]

Programa
├── DeclaracionVariable
│   ├── nombre: 'a'
│   └── expresion: NumeroLiteral(5)
├── DeclaracionVariable
│   ├── nombre: 'b'
│   └── expresion:
│       └── ExpresionBinaria(+)
│           ├── izquierda:
│           │   └── ExpresionBinaria(*)
│           │       ├── izquierda: Identificador('a')
│           │       └── derecha: NumeroLiteral(2)
│           └── derecha: NumeroLiteral(3)
└── SentenciaPrint
    └── expresion: Identificador('b')

[FASE 2] Análisis Sintáctico...
         (Verificando gramática y construyendo AST)
  ✓ Completado: 3 sentencias parseadas

Performance

Time Complexity

O(n) where n = token countEach token is visited exactly once.

Space Complexity

O(d) where d = max nesting depthRecursion stack depth = max expression nesting.

Source Code Reference

Implementation

File: compfinal.pyLines: 494-950Key Classes:

Expresion, Sentencia, Programa - AST node types
NumeroLiteral, Identificador, ExpresionBinaria, ExpresionAgrupada - Expression nodes
DeclaracionVariable, SentenciaPrint - Statement nodes
Parser - Main parser implementation

Main Methods:

parsear() - Entry point, returns Programa
_sentencia() - Parse one statement
_declaracion_variable() - Parse let statement
_sentencia_print() - Parse print statement
_expresion() → _suma_resta() → _multiplicacion_division() → _primario() - Expression parsing with precedence

Get Started

Core Concepts

Guides

Compiler Components

API Reference

Examples

Overview

Grammar Rules

AST Node Types

Expression Nodes

Statement Nodes

Program Node

Parsing Algorithm

Parsing Example

Precedence Visualization

Error Handling

Syntax Errors

Error Recovery (Panic Mode)

Helper Methods

Complete Parsing Example

Performance

Time Complexity

Space Complexity

Source Code Reference

Implementation

Next Steps

Semantic Analysis

API Reference

Get Started

Core Concepts

Guides

Compiler Components

API Reference

Examples

​Overview

​Grammar Rules

​AST Node Types

​Expression Nodes

​Statement Nodes

​Program Node

​Parsing Algorithm

​Parsing Example

​Precedence Visualization

​Error Handling

​Syntax Errors

​Error Recovery (Panic Mode)

​Helper Methods

​Complete Parsing Example

​Performance

Time Complexity

Space Complexity

​Source Code Reference

Implementation

​Next Steps

Semantic Analysis

API Reference

Overview

Grammar Rules

AST Node Types

Expression Nodes

Statement Nodes

Program Node

Parsing Algorithm

Parsing Example

Precedence Visualization

Error Handling

Syntax Errors

Error Recovery (Panic Mode)

Helper Methods

Complete Parsing Example

Performance

Source Code Reference

Next Steps