Overview
The Scanner class (also known as a lexer or tokenizer) performs lexical analysis by reading source code character by character and converting it into a sequence of tokens. It’s the first phase of the compilation process.
Class Definition
Constructor Parameters
The complete source code text to be analyzed
Attributes
- fuente (str): The complete source code
- tokens (List[Token]): List of tokens found during scanning
- inicio (int): Start position of the current token
- actual (int): Current reading position
- linea (int): Current line number (starts at 1)
- columna (int): Current column number (starts at 1)
- columna_inicio (int): Column where the current token starts
- errores (List[str]): List of lexical errors found
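The attributes above could be initialized roughly as follows. This is a minimal sketch, not the actual class: the constructor parameter name `fuente` and the fields of `Token` (`tipo`, `lexema`, `linea`, `columna`) are assumptions made for illustration, since the real definitions are not shown here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    # Hypothetical token shape; these field names are assumptions.
    tipo: str
    lexema: str
    linea: int
    columna: int


class Scanner:
    def __init__(self, fuente: str) -> None:
        self.fuente: str = fuente        # complete source code
        self.tokens: List[Token] = []    # tokens found so far
        self.inicio: int = 0             # start offset of the current token
        self.actual: int = 0             # current reading offset
        self.linea: int = 1              # line numbers start at 1
        self.columna: int = 1            # column numbers start at 1
        self.columna_inicio: int = 1     # column where the current token starts
        self.errores: List[str] = []     # lexical error messages


scanner = Scanner("let x = 10;")
```

Keeping `inicio` and `actual` as separate offsets lets the lexeme of the current token be sliced out as `fuente[inicio:actual]` once the token is complete.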
Public Methods
escanear_tokens()
Scans the entire source code and returns the list of all tokens found, including a final FIN_ARCHIVO token.
Supported Token Types
The Scanner recognizes the following token types:
Keywords
- LET: Variable declaration keyword
- PRINT: Print statement keyword
- LEO: Reserved keyword (no operation)
- DIEGO: Reserved keyword (no operation)
Literals
- NUMERO: Integer numbers (e.g., 10, 42, 100)
- IDENTIFICADOR: Variable names (e.g., x, suma, miVariable)
Operators
- SUMA: Addition operator (+)
- RESTA: Subtraction operator (-)
- MULTIPLICACION: Multiplication operator (*)
- DIVISION: Division operator (/)
- IGUAL: Assignment operator (=)
Delimiters
- PAREN_IZQ: Left parenthesis "("
- PAREN_DER: Right parenthesis ")"
- PUNTO_COMA: Semicolon ";"
Special
- FIN_ARCHIVO: End of file marker
- ERROR: Invalid token
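One plausible way to represent the token types listed above is an enumeration plus a lookup table for the single-character lexemes. The names `TipoToken` and `SIMBOLOS` are hypothetical; the actual project may define its token types differently.

```python
from enum import Enum, auto


class TipoToken(Enum):
    # Keywords
    LET = auto()
    PRINT = auto()
    LEO = auto()
    DIEGO = auto()
    # Literals
    NUMERO = auto()
    IDENTIFICADOR = auto()
    # Operators
    SUMA = auto()
    RESTA = auto()
    MULTIPLICACION = auto()
    DIVISION = auto()
    IGUAL = auto()
    # Delimiters
    PAREN_IZQ = auto()
    PAREN_DER = auto()
    PUNTO_COMA = auto()
    # Special
    FIN_ARCHIVO = auto()
    ERROR = auto()


# Single-character lexemes mapped to their token type (assumed helper table).
SIMBOLOS = {
    "+": TipoToken.SUMA,
    "-": TipoToken.RESTA,
    "*": TipoToken.MULTIPLICACION,
    "/": TipoToken.DIVISION,
    "=": TipoToken.IGUAL,
    "(": TipoToken.PAREN_IZQ,
    ")": TipoToken.PAREN_DER,
    ";": TipoToken.PUNTO_COMA,
}
```

A table like this keeps operator and delimiter handling to a single dictionary lookup instead of a long chain of comparisons.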
Features
Comment Support
The Scanner supports single-line comments that start with // and run to the end of the line.
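Comment skipping can be sketched as a small helper that advances past everything up to the next newline. The function name `saltar_comentario` and the lowercase keyword spelling in the sample source are assumptions; the sketch only illustrates the idea and is not the Scanner's own code.

```python
def saltar_comentario(fuente: str, pos: int) -> int:
    """Return the offset where a // comment starting at pos ends.

    The result points at the terminating newline (or the end of input),
    so line counting is left to the caller. If no comment starts at pos,
    pos is returned unchanged.
    """
    if fuente.startswith("//", pos):
        while pos < len(fuente) and fuente[pos] != "\n":
            pos += 1
    return pos


codigo = "let x = 10; // declare x\nprint x;"
inicio = codigo.index("//")
fin = saltar_comentario(codigo, inicio)
resto = codigo[fin:]  # text that remains to be scanned after the comment
```

Because a single / is also the division operator, the check for // has to look at two characters before deciding that a comment has started.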
Error Handling
When the Scanner encounters an invalid character, it:
- Adds an error message to the errores list
- Creates an ERROR token
- Continues scanning (error recovery)
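The three recovery steps can be illustrated with a deliberately reduced scanner that only knows the single-character symbols. The function name, the `("SIMBOLO", c)` tuple shape, and the error message wording are all assumptions made for this sketch.

```python
from typing import List, Tuple

VALIDOS = set("+-*/=();")


def escanear_simbolos(fuente: str) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Scan single-character symbols, recording errors without stopping."""
    tokens: List[Tuple[str, str]] = []
    errores: List[str] = []
    linea, columna = 1, 1
    for c in fuente:
        if c == "\n":
            linea += 1
            columna = 1
            continue
        if c in VALIDOS:
            tokens.append(("SIMBOLO", c))
        elif not c.isspace():
            # 1. record the error, 2. emit an ERROR token, 3. keep going
            errores.append(f"Line {linea}, column {columna}: unexpected character '{c}'")
            tokens.append(("ERROR", c))
        columna += 1
    return tokens, errores


tokens, errores = escanear_simbolos("( @ )")
```

Continuing after an error means one pass can report every invalid character in the file rather than only the first.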
Implementation Details
Reserved Words
The Scanner maintains a dictionary of reserved words.
Number Recognition
Numbers are recognized as sequences of digits:
- Only integer numbers are supported
- Decimal numbers are not supported in this version
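Recognizing a number as a run of digits might look like the helper below. `leer_numero` is a hypothetical name, and what the real Scanner does with the character that follows the digits (for example a decimal point) is not specified in this document, so the sketch simply hands it back to the caller.

```python
from typing import Tuple


def leer_numero(fuente: str, inicio: int) -> Tuple[str, int]:
    """Consume consecutive ASCII digits from inicio; return (lexeme, next offset)."""
    actual = inicio
    while actual < len(fuente) and fuente[actual] in "0123456789":
        actual += 1
    return fuente[inicio:actual], actual


lexema, siguiente = leer_numero("42 + 7", 0)
parte_entera, resto = leer_numero("3.14", 0)  # scanning stops at the "."
```

With no decimal support, the digits before a "." form one integer lexeme and the "." itself is left for whatever rule handles the next character.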
Identifier Rules
Identifiers:
- Must start with a letter (a-z, A-Z) or underscore (_)
- Can contain letters, digits, and underscores
- Cannot be a reserved word
Valid identifiers: x, suma_total, miVariable, _private, contador1
Invalid identifiers: 1variable (starts with a digit), let (reserved word)
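The identifier rules and the reserved-word lookup can be combined in one helper, sketched below. The dictionary contents are an assumption: the lowercase spellings are inferred from `let` being listed as a reserved word, and the names `PALABRAS_RESERVADAS` and `leer_identificador` are hypothetical.

```python
from typing import Tuple

# Assumed mapping from keyword spelling to token type name.
PALABRAS_RESERVADAS = {"let": "LET", "print": "PRINT", "leo": "LEO", "diego": "DIEGO"}


def es_inicio(c: str) -> bool:
    """True for a-z, A-Z, or underscore."""
    return c.isascii() and (c.isalpha() or c == "_")


def es_parte(c: str) -> bool:
    """True for a-z, A-Z, 0-9, or underscore."""
    return c.isascii() and (c.isalnum() or c == "_")


def leer_identificador(fuente: str, inicio: int) -> Tuple[str, str, int]:
    """Return (token type, lexeme, next offset) for the word starting at inicio."""
    if not es_inicio(fuente[inicio]):
        raise ValueError("an identifier must start with a letter or underscore")
    actual = inicio + 1
    while actual < len(fuente) and es_parte(fuente[actual]):
        actual += 1
    lexema = fuente[inicio:actual]
    # A word found in the dictionary is a keyword, never an identifier.
    tipo = PALABRAS_RESERVADAS.get(lexema, "IDENTIFICADOR")
    return tipo, lexema, actual


resultado = leer_identificador("suma_total = 1", 0)
reservada = leer_identificador("let x", 0)
```

Scanning the whole word first and only then consulting the dictionary means a name such as `letra` is still an identifier even though it begins with a keyword.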
Usage Example
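The following is a self-contained sketch of how the public interface might be used. The `Scanner` defined here is a simplified stand-in that follows the behavior described on this page (it tracks lines but not columns); the `Token` fields, the lowercase keyword spellings, and the error message format are assumptions, not the project's actual code.

```python
from dataclasses import dataclass
from typing import Callable, List

SIMBOLOS = {"+": "SUMA", "-": "RESTA", "*": "MULTIPLICACION", "/": "DIVISION",
            "=": "IGUAL", "(": "PAREN_IZQ", ")": "PAREN_DER", ";": "PUNTO_COMA"}
RESERVADAS = {"let": "LET", "print": "PRINT", "leo": "LEO", "diego": "DIEGO"}


def es_digito(c: str) -> bool:
    return c in "0123456789"


def es_letra(c: str) -> bool:
    return c.isascii() and (c.isalpha() or c == "_")


@dataclass
class Token:
    tipo: str
    lexema: str
    linea: int


class Scanner:
    """Stand-in for the documented class: same public names, simplified internals."""

    def __init__(self, fuente: str) -> None:
        self.fuente = fuente
        self.tokens: List[Token] = []
        self.errores: List[str] = []
        self.actual = 0
        self.linea = 1

    def _agregar(self, tipo: str, lexema: str) -> None:
        self.tokens.append(Token(tipo, lexema, self.linea))

    def _consumir(self, permitido: Callable[[str], bool]) -> None:
        # Advance while the next character satisfies the predicate.
        while self.actual < len(self.fuente) and permitido(self.fuente[self.actual]):
            self.actual += 1

    def escanear_tokens(self) -> List[Token]:
        f = self.fuente
        while self.actual < len(f):
            inicio = self.actual
            c = f[inicio]
            self.actual += 1
            if c == "\n":
                self.linea += 1
            elif c.isspace():
                continue
            elif f.startswith("//", inicio):          # comment: skip to end of line
                self._consumir(lambda ch: ch != "\n")
            elif c in SIMBOLOS:
                self._agregar(SIMBOLOS[c], c)
            elif es_digito(c):
                self._consumir(es_digito)
                self._agregar("NUMERO", f[inicio:self.actual])
            elif es_letra(c):
                self._consumir(lambda ch: es_letra(ch) or es_digito(ch))
                lexema = f[inicio:self.actual]
                self._agregar(RESERVADAS.get(lexema, "IDENTIFICADOR"), lexema)
            else:                                      # record the error, keep scanning
                self.errores.append(f"Line {self.linea}: unexpected character '{c}'")
                self._agregar("ERROR", c)
        self._agregar("FIN_ARCHIVO", "")
        return self.tokens


scanner = Scanner("let x = 10; // comment\nprint (x + 2) @;")
tokens = scanner.escanear_tokens()

for error in scanner.errores:
    print(error)
for token in tokens:
    print(token.linea, token.tipo, repr(token.lexema))
```

Checking `errores` after the call is the natural way to decide whether the token list is safe to hand to the parser, since scanning itself does not stop at the first invalid character.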
See Also
- Parser - Next phase: syntactic analysis
- Grammar - Token types and language specification
- Lexical Analysis - Understanding the scanning phase