SynKit Architecture
This document provides an overview of the SynKit architecture and how its components work together.
Core Design Principles
1. Source Generation Over Reflection
SynKit uses C# source generators to create lexers and parsers at compile time, providing:
- Performance: No runtime overhead from reflection
- Type Safety: Compile-time validation of grammar rules
- Debugging: Generated code can be stepped through
2. Attribute-Driven Configuration
Grammar rules and token definitions are specified using attributes:
- [Token] - Define literal tokens
- [Regex] - Define pattern-based tokens
- [Rule] - Define parser production rules
- [Left], [Right] - Specify operator precedence and associativity
3. Immutable Error Handling
Parse errors are immutable objects that can be combined and enriched:
- ParseError objects can be merged using the | operator
- Errors contain position information and expected elements
- Error contexts provide semantic information about what was being parsed
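The merge behaviour described above can be sketched roughly as follows. This is a minimal illustration, not SynKit's actual implementation: the type `SketchParseError`, its `At` factory, and the merge rule (the error furthest into the input wins; errors at the same position pool their expected elements) are all assumptions made for this example.

```csharp
using System;
using System.Collections.Immutable;

// Hypothetical stand-in for illustration only; not SynKit's ParseError type.
public sealed record SketchParseError(
    int Position,
    string? Got,
    ImmutableSortedSet<string> Expected)
{
    // Convenience factory (hypothetical) for building an error.
    public static SketchParseError At(int position, string? got, params string[] expected) =>
        new SketchParseError(position, got, ImmutableSortedSet.Create(expected));

    // Assumed merge rule: the error that got further into the input wins;
    // errors at the same position combine their expected elements.
    // Neither operand is mutated; a new (or existing) instance is returned.
    public static SketchParseError operator |(SketchParseError left, SketchParseError right)
    {
        if (left.Position != right.Position)
        {
            return left.Position > right.Position ? left : right;
        }

        return left with { Expected = left.Expected.Union(right.Expected) };
    }
}
```

With this sketch, `SketchParseError.At(4, "+", "number") | SketchParseError.At(4, "+", "(")` yields a single error at position 4 whose expected set contains both `number` and `(`, while the two original errors stay unchanged.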
Component Architecture
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│                 │    │                 │    │                 │
│   Text Layer    │    │   Lexer Layer   │    │  Parser Layer   │
│                 │    │                 │    │                 │
├─────────────────┤    ├─────────────────┤    ├─────────────────┤
│ • SourceFile    │───▶│ • ILexer<T>     │───▶│ • IParser       │
│ • Position      │    │ • Token<T>      │    │ • ParseResult<T>│
│ • Range         │    │ • TokenStream   │    │ • ParseError    │
│ • Location      │    │ • [Lexer] attr  │    │ • [Parser] attr │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         │             ┌─────────────────┐             │
         │             │                 │             │
         └────────────▶│ Reporting Layer │◀────────────┘
                       │                 │
                       ├─────────────────┤
                       │ • Diagnostics   │
                       │ • Severity      │
                       │ • Presenter     │
                       │ • Highlighter   │
                       └─────────────────┘
Data Flow
1. Lexical Analysis
Source Text ──▶ ILexer<TToken> ──▶ Token Stream
     │                │                  │
     │                ▼                  ▼
     │        [Generated Lexer]   IToken<TToken>
     │                │                  │
     ▼                │                  │
SourceFile ───────────┘                  │
Position/Range                           │
     │                                   │
     └───────────────────────────────────┘
                Error Reporting
2. Syntactic Analysis
Token Stream ──▶ IParser ──▶ ParseResult<T>
     │              │              │
     │              ▼              ├─ Ok<T>
     │     [Generated Parser]      │
     │              │              └─ Error
     │              ▼                   │
     └────▶ Error Recovery ─────────────┘
            Synchronization
Key Interfaces
ILexer<TToken>
- TToken Next() - Advance and return next token
- Position Position - Current source position
- bool IsEnd - Check if input is exhausted
IToken<TToken>
- TToken Kind - Token type/category
- string Text - Literal text content
- Range Range - Source location
- object? Value - Parsed semantic value
ParseResult<T>
- bool IsOk / IsError - Success/failure state
- Ok<T> Ok - Success result with value
- ParseError Error - Failure with detailed error info
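To show how a caller might branch on the members listed above, here is a small self-contained sketch. The types `SketchOk<T>`, `SketchError`, and `SketchResult<T>` are hypothetical stand-ins that only mirror the listed member names; the factory methods and the choice to throw when the wrong branch is accessed are assumptions, not documented SynKit behaviour.

```csharp
using System;

// Hypothetical stand-ins for illustration only; not SynKit's types.
public readonly record struct SketchOk<T>(T Value);

public sealed record SketchError(int Position, string Message);

public sealed class SketchResult<T>
{
    private readonly SketchOk<T> _ok;
    private readonly SketchError? _error;

    private SketchResult(SketchOk<T> ok, SketchError? error)
    {
        _ok = ok;
        _error = error;
    }

    public static SketchResult<T> Success(T value) =>
        new SketchResult<T>(new SketchOk<T>(value), null);

    public static SketchResult<T> Failure(SketchError error) =>
        new SketchResult<T>(default, error);

    public bool IsOk => _error is null;
    public bool IsError => _error is not null;

    // Assumption: accessing the wrong branch throws rather than returning a default.
    public SketchOk<T> Ok =>
        IsOk ? _ok : throw new InvalidOperationException("Result is an error.");

    public SketchError Error =>
        _error ?? throw new InvalidOperationException("Result is ok.");
}

public static class SketchResultDemo
{
    // Checks the state first, then reads only the branch that is valid.
    public static string Describe(SketchResult<int> result) =>
        result.IsOk
            ? $"value {result.Ok.Value}"
            : $"error at {result.Error.Position}: {result.Error.Message}";
}
```

Because the result object has no setters, it can be shared freely once created, which matches the immutability noted under Thread Safety.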
ParseError
IComparable Position- Error locationobject? Got- What was actually foundIReadOnlyDictionary<string, ParseErrorElement> Elements- Expected elements by context
Generator Architecture
Lexer Generation
- Analysis: Scan [Lexer] attributed classes
- Token Discovery: Find [Token] and [Regex] attributes on enum values
- Automata Construction: Build finite automata for token recognition
- Code Generation: Emit lexer implementation with optimized state machine
Parser Generation
- Grammar Analysis: Extract [Rule] attributed methods
- Precedence Resolution: Process [Left], [Right], [Precedence] attributes
- LR Table Construction: Build parsing tables using an LR algorithm
- Code Generation: Emit parser with table-driven parsing logic
Error Recovery
SynKit implements the following error recovery strategies:
Lexer Errors
- Invalid characters produce Error tokens
- Lexer continues from next valid position
- Position information preserved for diagnostics
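The lexer behaviour above can be illustrated with a tiny hand-written lexer. This is only a sketch of the recovery idea, not the code SynKit generates: `SketchKind`, `SketchToken`, and `SketchLexer` are hypothetical names, and the toy grammar (unsigned integers and `+`) is an assumption chosen to keep the example short.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types for illustration only; not SynKit's generated code.
public enum SketchKind { Number, Plus, Error, End }

public readonly record struct SketchToken(SketchKind Kind, string Text, int Start);

public static class SketchLexer
{
    private static bool IsDigit(char c) => c >= '0' && c <= '9';

    public static List<SketchToken> Lex(string source)
    {
        var tokens = new List<SketchToken>();
        int i = 0;

        while (i < source.Length)
        {
            char c = source[i];
            if (char.IsWhiteSpace(c))
            {
                i++;
                continue;
            }

            int start = i;
            if (IsDigit(c))
            {
                while (i < source.Length && IsDigit(source[i])) i++;
                tokens.Add(new SketchToken(SketchKind.Number, source[start..i], start));
            }
            else if (c == '+')
            {
                i++;
                tokens.Add(new SketchToken(SketchKind.Plus, "+", start));
            }
            else
            {
                // Invalid character: emit an Error token that keeps its position,
                // then resume lexing at the next character instead of aborting.
                i++;
                tokens.Add(new SketchToken(SketchKind.Error, source[start..i], start));
            }
        }

        tokens.Add(new SketchToken(SketchKind.End, "", source.Length));
        return tokens;
    }
}
```

For the input `1 + $2`, this sketch produces Number, Plus, Error (for `$` at offset 4), Number, and End, so a later stage can report the bad character and still see the tokens that follow it.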
Parser Errors
- ParseError objects accumulate expected elements
- Multiple alternative errors can be merged
- Context information helps with meaningful error messages
- Synchronization points allow recovery and continuation
Thread Safety
- Lexers: Not thread-safe (maintain internal state)
- Parsers: Not thread-safe (maintain parse stacks)
- Tokens: Immutable and thread-safe
- ParseResults: Immutable and thread-safe
- SourceFiles: Thread-safe for reading