
Overview

The Scanner class (also known as a lexer or tokenizer) performs lexical analysis by reading source code character by character and converting it into a sequence of tokens. It’s the first phase of the compilation process.

Class Definition

class Scanner:
    def __init__(self, codigo_fuente: str)

Constructor Parameters

codigo_fuente (str, required)
The complete source code text to be analyzed.

Attributes

  • fuente (str): The complete source code
  • tokens (List[Token]): List of tokens found during scanning
  • inicio (int): Start position of the current token
  • actual (int): Current reading position
  • linea (int): Current line number (starts at 1)
  • columna (int): Current column number (starts at 1)
  • columna_inicio (int): Column where the current token starts
  • errores (List[str]): List of lexical errors found
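The line/column bookkeeping implied by these attributes can be sketched as follows. This is a minimal standalone helper, assuming the Scanner resets the column to 1 on each newline; avanzar is a hypothetical name, not a method of the class:

```python
def avanzar(linea: int, columna: int, caracter: str) -> tuple:
    """Return the (linea, columna) pair after consuming one character."""
    if caracter == '\n':
        return linea + 1, 1   # newline: advance linea, reset columna to 1
    return linea, columna + 1

# Consuming "let\nx" starting from (1, 1) ends at line 2, column 2.
linea, columna = 1, 1
for c in "let\nx":
    linea, columna = avanzar(linea, columna, c)
```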

Public Methods

escanear_tokens()

Scans the entire source code and returns the list of tokens.
def escanear_tokens(self) -> List[Token]
Returns List[Token]: the list of all tokens found in the source code, including a final FIN_ARCHIVO token.
Example:
scanner = Scanner("let x = 10;")
tokens = scanner.escanear_tokens()

for token in tokens:
    print(token)
# Output:
# Token(LET, 'let', línea=1, col=1)
# Token(IDENTIFICADOR, 'x', línea=1, col=5)
# Token(IGUAL, '=', línea=1, col=7)
# Token(NUMERO, '10', línea=1, col=9, valor=10)
# Token(PUNTO_COMA, ';', línea=1, col=11)
# Token(FIN_ARCHIVO, '', línea=1, col=12)

Supported Token Types

The Scanner recognizes the following token types:

Keywords

  • LET: Variable declaration keyword
  • PRINT: Print statement keyword
  • LEO: Reserved keyword (no operation)
  • DIEGO: Reserved keyword (no operation)

Literals

  • NUMERO: Integer numbers (e.g., 10, 42, 100)
  • IDENTIFICADOR: Variable names (e.g., x, suma, miVariable)

Operators

  • SUMA: Addition operator (+)
  • RESTA: Subtraction operator (-)
  • MULTIPLICACION: Multiplication operator (*)
  • DIVISION: Division operator (/)
  • IGUAL: Assignment operator (=)

Delimiters

  • PAREN_IZQ: Left parenthesis "("
  • PAREN_DER: Right parenthesis ")"
  • PUNTO_COMA: Semicolon ";"

Special

  • FIN_ARCHIVO: End of file marker
  • ERROR: Invalid token

Features

Comment Support

The Scanner supports single-line comments using //:
scanner = Scanner("let x = 5; // this is a comment")
tokens = scanner.escanear_tokens()
# The comment is ignored during tokenization
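Internally, a // comment can be skipped by consuming characters up to the end of the line. A standalone sketch of that step (saltar_comentario is a hypothetical helper, not part of the actual Scanner API):

```python
def saltar_comentario(fuente: str, actual: int) -> int:
    """If fuente[actual:] starts with '//', return the index of the
    newline that terminates the comment (or len(fuente) at end of input)."""
    if fuente[actual:actual + 2] == '//':
        while actual < len(fuente) and fuente[actual] != '\n':
            actual += 1
    return actual

# The newline itself is left for the caller, so line counting still works.
```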

Error Handling

When the Scanner encounters an invalid character, it:
  1. Adds an error message to the errores list
  2. Creates an ERROR token
  3. Continues scanning (error recovery)
scanner = Scanner("let x = @;")
tokens = scanner.escanear_tokens()

if scanner.errores:
    for error in scanner.errores:
        print(error)
    # Output: Error léxico en línea 1, columna 9: carácter inesperado '@'
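The three recovery steps can be sketched as a standalone function (hypothetical names; the real Scanner appends Token objects rather than tuples):

```python
def registrar_error(caracter: str, linea: int, columna: int,
                    errores: list, tokens: list) -> None:
    # 1. Record the error message in errores.
    errores.append(
        f"Error léxico en línea {linea}, columna {columna}: "
        f"carácter inesperado '{caracter}'"
    )
    # 2. Emit an ERROR token so downstream phases see the gap.
    tokens.append(('ERROR', caracter, linea, columna))
    # 3. Return normally so the caller keeps scanning (error recovery).

errores, tokens = [], []
registrar_error('@', 1, 9, errores, tokens)
```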

Implementation Details

Reserved Words

The Scanner maintains a dictionary of reserved words:
PALABRAS_RESERVADAS = {
    'let': TipoToken.LET,
    'print': TipoToken.PRINT,
    'leo': TipoToken.LEO,
    'diego': TipoToken.DIEGO,
}
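During identifier scanning, this dictionary lets the Scanner distinguish keywords from plain identifiers with a single lookup. A standalone sketch of the pattern, using token-name strings in place of the TipoToken enum (clasificar_lexema is a hypothetical helper):

```python
PALABRAS_RESERVADAS = {
    'let': 'LET',
    'print': 'PRINT',
    'leo': 'LEO',
    'diego': 'DIEGO',
}

def clasificar_lexema(lexema: str) -> str:
    # Fall back to IDENTIFICADOR when the lexeme is not a reserved word.
    return PALABRAS_RESERVADAS.get(lexema, 'IDENTIFICADOR')
```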

Number Recognition

Numbers are recognized as sequences of digits:
  • Only integer numbers are supported
  • Decimal numbers are not supported in this version
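The digit-run scan described above can be sketched as follows (escanear_numero is a hypothetical standalone function, not the Scanner's internal method):

```python
def escanear_numero(fuente: str, inicio: int):
    """Consume a maximal run of digits starting at inicio and return
    (lexema, valor, posicion_siguiente). Integers only: a '.' stops the scan."""
    actual = inicio
    while actual < len(fuente) and fuente[actual].isdigit():
        actual += 1
    lexema = fuente[inicio:actual]
    return lexema, int(lexema), actual
```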

Identifier Rules

Identifiers:
  • Must start with a letter (a-z, A-Z) or underscore (_)
  • Can contain letters, numbers, and underscores
  • Cannot be a reserved word
Valid identifiers: x, suma_total, miVariable, _private, contador1
Invalid identifiers: 1variable (starts with a number), let (reserved word)
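These rules can be expressed as a standalone check (es_identificador_valido is a hypothetical helper; the actual Scanner applies the same rules character by character while scanning):

```python
import re

# First character: letter or underscore; rest: letters, digits, underscores.
PATRON_IDENTIFICADOR = re.compile(r'[A-Za-z_][A-Za-z0-9_]*\Z')
PALABRAS_RESERVADAS = {'let', 'print', 'leo', 'diego'}

def es_identificador_valido(lexema: str) -> bool:
    return (bool(PATRON_IDENTIFICADOR.match(lexema))
            and lexema not in PALABRAS_RESERVADAS)
```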

Usage Example

from compfinal import Scanner, TipoToken

# Create scanner with source code
code = """
let x = 10;
let y = 20;
print x + y;
"""

scanner = Scanner(code)
tokens = scanner.escanear_tokens()

# Check for errors
if scanner.errores:
    print("Lexical errors found:")
    for error in scanner.errores:
        print(f"  - {error}")
else:
    print(f"Successfully scanned {len(tokens)} tokens")
    for token in tokens:
        if token.tipo != TipoToken.FIN_ARCHIVO:
            print(f"  {token}")

See Also