Skip to main content

Overview

The Parser (also called Syntax Analyzer) is the second phase of compilation. It takes the flat list of tokens from the Scanner and organizes them into a hierarchical Abstract Syntax Tree (AST) according to the language grammar.
Analogy: Like diagramming a sentence in grammar class.“The cat eats fish” →
  Sentence
  ├── Subject: "The cat"
  └── Predicate
      ├── Verb: "eats"
      └── Object: "fish"

Grammar Rules

The parser follows this context-free grammar:
program     → statement*
statement   → declaration | print_stmt
declaration → "let" IDENTIFIER "=" expression ";"
print_stmt  → "print" expression ";"
expression  → addition
addition    → multiplication (("+ " | "-") multiplication)*
multiplication → primary (("*" | "/") primary)*
primary     → NUMBER | IDENTIFIER | "(" expression ")"
Operator Precedence is encoded by nesting depth:
  • primary (highest precedence) - literals, variables, parentheses
  • multiplication - * and /
  • addition (lowest precedence) - + and -
This ensures 3 + 4 * 2 parses as 3 + (4 * 2), not (3 + 4) * 2.

AST Node Types

The parser constructs these node types:

Expression Nodes

Represents a numeric constant.
@dataclass
class NumeroLiteral(Expresion):
    token: Token  # Original token (for position info)
    valor: int    # The numeric value
Example:
42NumeroLiteral(token=Token(...), valor=42)

Statement Nodes

Variable declaration statement.
@dataclass
class DeclaracionVariable(Sentencia):
    token_let: Token      # The 'let' keyword
    nombre: Token         # Variable name token
    expresion: Expresion  # Initializer expression
Example:
let x = 10;  →  DeclaracionVariable(
                  token_let=Token(LET, 'let'),
                  nombre=Token(IDENTIFICADOR, 'x'),
                  expresion=NumeroLiteral(10)
                )

Program Node

@dataclass
class Programa:
    sentencias: List[Sentencia]  # All statements in the program
The root of the AST - contains all statements.

Parsing Algorithm

The parser uses Recursive Descent Parsing:
1

Initialize

  • Store token list
  • Set current position to 0
  • Initialize error list
2

Parse Program

def parsear(self) -> Programa:
    sentencias = []
    while not self._fin():
        sentencia = self._sentencia()
        if sentencia:
            sentencias.append(sentencia)
    return Programa(sentencias)
3

Parse Statements

Dispatch based on lookahead token:
def _sentencia(self) -> Sentencia:
    if self._coincide(TipoToken.LET):
        return self._declaracion_variable()
    if self._coincide(TipoToken.PRINT):
        return self._sentencia_print()
    raise error("Expected 'let' or 'print'")
4

Parse Expressions

Follow precedence levels recursively:
_expresion() → _suma_resta() → _multiplicacion_division() → _primario()

Parsing Example

Let’s trace how let x = 5 + 3 * 2; is parsed:
1

Statement Recognition

# Current token: LET
_sentencia():
  coincide(LET) → True
  return _declaracion_variable()
2

Variable Declaration

_declaracion_variable():
  token_let = Token(LET, 'let')  # Already consumed
  nombre = consumir(IDENTIFICADOR)  # 'x'
  consumir(IGUAL)  # '='
  expresion = _expresion()  # Parse right side
  consumir(PUNTO_COMA)  # ';'
  return DeclaracionVariable(token_let, nombre, expresion)
3

Expression Parsing

Parse 5 + 3 * 2 with correct precedence:
_expresion() → _suma_resta():
  # First, get left side
  expr = _multiplicacion_division()  # Returns NumeroLiteral(5)
  
  # See '+', so continue
  while coincide(SUMA, RESTA):
    operador = Token(SUMA, '+')
    derecha = _multiplicacion_division()  # Parse right side
    expr = ExpresionBinaria(expr, operador, derecha)
  
  return expr
4

Right Side (3 * 2)

_multiplicacion_division():
  # Get 3
  expr = _primario()  # NumeroLiteral(3)
  
  # See '*', so continue
  while coincide(MULTIPLICACION, DIVISION):
    operador = Token(MULTIPLICACION, '*')
    derecha = _primario()  # NumeroLiteral(2)
    expr = ExpresionBinaria(expr, operador, derecha)
  
  return ExpresionBinaria(
    izquierda=NumeroLiteral(3),
    operador=Token(MULT, '*'),
    derecha=NumeroLiteral(2)
  )
5

Final AST

DeclaracionVariable(
  token_let=Token(LET, 'let'),
  nombre=Token(IDENTIFICADOR, 'x'),
  expresion=ExpresionBinaria(
    izquierda=NumeroLiteral(5),
    operador=Token(SUMA, '+'),
    derecha=ExpresionBinaria(
      izquierda=NumeroLiteral(3),
      operador=Token(MULTIPLICACION, '*'),
      derecha=NumeroLiteral(2)
    )
  )
)
This correctly represents: 5 + (3 * 2)

Precedence Visualization

Input: a + b * cAST:
      +
     / \
    a   *
       / \
      b   c
Evaluation: a + (b * c)Why: _suma_resta() calls _multiplicacion_division() for right side, which consumes both b and c before returning.

Error Handling

Syntax Errors

The parser reports specific, helpful error messages:
def _error(self, token: Token, mensaje: str) -> ErrorSintaxis:
    msg = f"Error de sintaxis en línea {token.linea}, columna {token.columna}: " \
          f"{mensaje}. Se encontró '{token.lexema}'"
    return ErrorSintaxis(msg)
Example Errors:
Input: let x = 5Error:
Error de sintaxis en línea 1, columna 10: Se esperaba ';' al final de la declaración. Se encontró ''
Input: let = 5;Error:
Error de sintaxis en línea 1, columna 5: Se esperaba nombre de variable después de 'let'. Se encontró '='
Input: let x = (5 + 3;Error:
Error de sintaxis en línea 1, columna 14: Se esperaba ')' después de la expresión. Se encontró ';'
Input: + x = 5;Error:
Error de sintaxis en línea 1, columna 1: Se esperaba 'let' o 'print'. Se encontró '+'

Error Recovery (Panic Mode)

The parser continues after errors to find multiple issues:
def _sincronizar(self):
    """Find next safe point to resume parsing"""
    self._avanzar()
    
    while not self._fin():
        # Previous token was semicolon - safe to continue
        if self._anterior().tipo == TipoToken.PUNTO_COMA:
            return
        
        # Current token starts a statement
        if self._ver_actual().tipo in (TipoToken.LET, TipoToken.PRINT):
            return
        
        self._avanzar()
Example:
let x = ;     // Error: missing expression
let y = 5;    // Still parsed correctly
print z       // Error: missing semicolon
Output:
Error de sintaxis en línea 1, columna 9: Se esperaba una expresión. Se encontró ';'
Error de sintaxis en línea 3, columna 8: Se esperaba ';' al final de print. Se encontró ''
✓ Completado: 1 sentencia parseada

Helper Methods

Check if current token matches any of the given types, and consume if so:
def _coincide(self, *tipos: TipoToken) -> bool:
    for tipo in tipos:
        if self._verificar(tipo):
            self._avanzar()
            return True
    return False
Usage:
if self._coincide(TipoToken.SUMA, TipoToken.RESTA):
    # Current token is + or -, and has been consumed

Complete Parsing Example

let a = 5;
let b = a * 2 + 3;
print b;

Performance

Time Complexity

O(n) where n = token countEach token is visited exactly once.

Space Complexity

O(d) where d = max nesting depthRecursion stack depth = max expression nesting.

Source Code Reference

Implementation

File: compfinal.pyLines: 494-950Key Classes:
  • Expresion, Sentencia, Programa - AST node types
  • NumeroLiteral, Identificador, ExpresionBinaria, ExpresionAgrupada - Expression nodes
  • DeclaracionVariable, SentenciaPrint - Statement nodes
  • Parser - Main parser implementation
Main Methods:
  • parsear() - Entry point, returns Programa
  • _sentencia() - Parse one statement
  • _declaracion_variable() - Parse let statement
  • _sentencia_print() - Parse print statement
  • _expresion()_suma_resta()_multiplicacion_division()_primario() - Expression parsing with precedence

Next Steps

Semantic Analysis

See how the AST is validated for logical correctness

API Reference

Detailed API documentation for the Parser class