Refactor Transformation Pipeline to Composable Architecture
Labels: arch-review, maintainability, performance, high
Priority: P1
Severity: HIGH
Epic: Architectural Improvements Q2 2026
Problem Summary
The compiler's transformation pipeline consists of 18 sequential phases hardcoded in ParserManager.ApplyLanguageAnalysisPhases(). This monolithic design makes the compiler rigid, hard to test, difficult to debug, and prevents optimization opportunities.
Current Issues
Monolithic Pipeline
- 18 transformation phases in fixed order (ParserManager.cs:39-170)
- 5,236 lines of transformation code across 19 visitor files
- No ability to skip phases or reorder transformations
- No phase-level caching or optimization
- Complex dependencies between phases not explicit
Code Evidence:
// src/compiler/ParserManager.cs:39
public static AstThing ApplyLanguageAnalysisPhases(...)
{
if (upTo >= AnalysisPhase.TreeLink)
ast = new TreeLinkageVisitor().Visit(ast);
if (upTo >= AnalysisPhase.Builtins)
ast = new BuiltinInjectorVisitor().Visit(ast);
if (upTo >= AnalysisPhase.ClassCtors)
ast = new ClassCtorInserter().Visit(ast);
// ... 15 more phases in rigid sequence
}
Problems
Maintainability: - Adding new phase requires modifying central orchestration - Phase dependencies are implicit (order-based) - Cannot easily disable experimental phases - Hard to understand phase interactions
Testing: - Cannot test phases in isolation - Must run earlier phases to test later ones - No ability to inject test data between phases - Integration tests expensive (full pipeline)
Performance: - Cannot parallelize independent phases - Must run all phases even when some are no-ops - Cannot cache intermediate results per phase - No way to skip phases for unchanged code
Debugging: - Cannot step through single phase - Hard to bisect which phase caused error - No phase-level instrumentation - Cannot dump AST between specific phases
Requirements
Phase Interface
public interface ICompilerPhase
{
string Name { get; }
IReadOnlyList<string> DependsOn { get; }
IReadOnlyList<string> ProvidedCapabilities { get; }
PhaseResult Transform(AstThing ast, PhaseContext context);
}
public record PhaseResult(
AstThing TransformedAst,
IReadOnlyList<Diagnostic> Diagnostics,
bool Success
);
public class PhaseContext
{
public ISymbolTable SymbolTable { get; set; }
public ITypeRegistry TypeRegistry { get; set; }
public Dictionary<string, object> SharedData { get; }
public bool EnableCaching { get; set; }
}
Pipeline Orchestrator
public class TransformationPipeline
{
private readonly List<ICompilerPhase> _phases = new();
private readonly Dictionary<string, AstThing> _cache = new();
public void RegisterPhase(ICompilerPhase phase)
{
// Validate dependencies exist
foreach (var dep in phase.DependsOn)
{
if (!_phases.Any(p => p.ProvidedCapabilities.Contains(dep)))
throw new InvalidOperationException(
$"Dependency '{dep}' not satisfied");
}
_phases.Add(phase);
}
public PipelineResult Execute(AstThing ast, PipelineOptions options)
{
var context = new PhaseContext();
var allDiagnostics = new List<Diagnostic>();
var currentAst = ast;
// Topologically sort phases by dependencies
var sortedPhases = TopologicalSort(_phases);
foreach (var phase in sortedPhases)
{
if (options.SkipPhases.Contains(phase.Name))
continue;
// Check cache if enabled
if (options.EnableCaching && TryGetCached(phase, currentAst, out var cached))
{
currentAst = cached;
continue;
}
var result = phase.Transform(currentAst, context);
allDiagnostics.AddRange(result.Diagnostics);
if (!result.Success && options.StopOnError)
return new PipelineResult(currentAst, allDiagnostics, false);
currentAst = result.TransformedAst;
if (options.EnableCaching)
Cache(phase, ast, currentAst);
}
return new PipelineResult(currentAst, allDiagnostics, true);
}
}
Phase Registration
public class TreeLinkagePhase : ICompilerPhase
{
public string Name => "TreeLinkage";
public IReadOnlyList<string> DependsOn => Array.Empty<string>();
public IReadOnlyList<string> ProvidedCapabilities => new[] { "TreeStructure" };
public PhaseResult Transform(AstThing ast, PhaseContext context)
{
var visitor = new TreeLinkageVisitor();
var result = visitor.Visit(ast);
return new PhaseResult(result, visitor.Diagnostics, true);
}
}
public class SymbolTablePhase : ICompilerPhase
{
public string Name => "SymbolTable";
public IReadOnlyList<string> DependsOn => new[] { "TreeStructure", "Builtins" };
public IReadOnlyList<string> ProvidedCapabilities => new[] { "Symbols" };
public PhaseResult Transform(AstThing ast, PhaseContext context)
{
var visitor = new SymbolTableBuilderVisitor();
var result = visitor.Visit(ast);
context.SymbolTable = result.SymbolTable;
return new PhaseResult(result.Ast, visitor.Diagnostics, true);
}
}
Benefits
Testing
[Test]
public void TestTypeAnnotationPhase()
{
var pipeline = new TransformationPipeline();
pipeline.RegisterPhase(new TreeLinkagePhase());
pipeline.RegisterPhase(new SymbolTablePhase());
pipeline.RegisterPhase(new TypeAnnotationPhase());
// Test only specific phase with dependencies
var result = pipeline.Execute(testAst, new PipelineOptions
{
StopAfter = "TypeAnnotation"
});
Assert.Empty(result.Diagnostics);
}
Debugging
// Dump AST after specific phases
var result = pipeline.Execute(ast, new PipelineOptions
{
DumpAfter = new[] { "SymbolTable", "TypeAnnotation" },
DumpToFile = true
});
// Step through single phase
var phase = new SymbolTablePhase();
var phaseResult = phase.Transform(ast, context);
// Inspect result
Performance
// Parallel execution of independent phases
var parallelPipeline = new ParallelTransformationPipeline();
parallelPipeline.RegisterPhase(new Phase1());
parallelPipeline.RegisterPhase(new Phase2()); // Independent of Phase1
parallelPipeline.Execute(ast); // Runs Phase1 and Phase2 in parallel
// Phase-level caching
var cachedPipeline = new TransformationPipeline();
cachedPipeline.Execute(ast, new PipelineOptions { EnableCaching = true });
// Second run uses cached results for unchanged phases
Extensibility
// Third-party can add custom phases
public class CustomAnalysisPhase : ICompilerPhase
{
public string Name => "CustomAnalysis";
public IReadOnlyList<string> DependsOn => new[] { "Symbols", "Types" };
public IReadOnlyList<string> ProvidedCapabilities => new[] { "CustomMetadata" };
public PhaseResult Transform(AstThing ast, PhaseContext context)
{
// Custom analysis logic
}
}
var pipeline = new TransformationPipeline();
pipeline.RegisterPhase(new CustomAnalysisPhase());
Implementation Plan
Phase 1: Design & Interface (Weeks 1-2)
- Design ICompilerPhase interface
- Design TransformationPipeline orchestrator
- Design PhaseContext for shared data
- Create phase dependency resolution
Phase 2: Wrap Existing Phases (Weeks 3-6)
- Create phase wrappers for existing visitors
- Declare explicit dependencies
- Migrate from ApplyLanguageAnalysisPhases
- Keep existing behavior (no changes yet)
Phase 3: Enable Features (Weeks 7-8)
- Add phase-level caching
- Add skip/stop-after options
- Add AST dumping between phases
- Add phase timing instrumentation
Phase 4: Optimization (Weeks 9-10)
- Identify independent phases
- Implement parallel execution
- Optimize phase ordering
- Performance benchmarking
Acceptance Criteria
- [ ] All 18 phases wrapped as ICompilerPhase
- [ ] Dependencies explicitly declared
- [ ] Topological sorting works correctly
- [ ] Can skip arbitrary phases
- [ ] Can stop after specific phase
- [ ] Can dump AST between phases
- [ ] Phase-level timing available
- [ ] Tests for phase isolation
- [ ] Documentation for adding new phases
Architecture Diagram
┌─────────────────────────────────────┐
│ TransformationPipeline │
├─────────────────────────────────────┤
│ • RegisterPhase(ICompilerPhase) │
│ • Execute(ast, options) │
│ • TopologicalSort() │
│ • Cache management │
└─────────────────────────────────────┘
│
├─ phase1: TreeLinkage
│ └─ depends: []
├─ phase2: Builtins
│ └─ depends: [TreeLinkage]
├─ phase3: SymbolTable
│ └─ depends: [TreeLinkage, Builtins]
├─ phase4: TypeAnnotation
│ └─ depends: [SymbolTable]
└─ ... (14 more phases)
Example Usage
// Configure pipeline
var pipeline = new TransformationPipeline();
pipeline.RegisterPhase(new TreeLinkagePhase());
pipeline.RegisterPhase(new BuiltinInjectorPhase());
pipeline.RegisterPhase(new SymbolTablePhase());
// ... register all phases
// Execute with options
var result = pipeline.Execute(ast, new PipelineOptions
{
EnableCaching = true,
StopOnError = true,
DumpAfter = new[] { "SymbolTable", "TypeAnnotation" },
SkipPhases = new[] { "ExperimentalFeature" }
});
// Inspect results
Console.WriteLine($"Success: {result.Success}");
Console.WriteLine($"Diagnostics: {result.Diagnostics.Count}");
Console.WriteLine($"Phase timings: {string.Join(", ", result.PhaseTimings)}");
References
- Architectural Review:
docs/architectural-review-2025.md- Finding #5 - LLVM Pass Manager: https://llvm.org/docs/WritingAnLLVMPass.html
- GHC Pipeline: https://gitlab.haskell.org/ghc/ghc/-/wikis/commentary/compiler/pipeline
- Related Issues: Enables #ISSUE-003 (Incremental Compilation), improves testing (#ISSUE-007)
Estimated Effort
10 weeks (2.5 months) - Weeks 1-2: Design and interface - Weeks 3-6: Wrap existing phases - Weeks 7-8: Enable features - Weeks 9-10: Optimization
Dependencies
- None (can be done independently)
Enables
- Issue #003: Incremental Compilation (phase-level caching)
- Issue #007: Better Testing (phase isolation)
- Future: Parallel compilation
- Future: Plugin architecture
Success Metrics
- All phases successfully wrapped
- Zero behavior changes from migration
- Phase isolation enables unit testing
- Pipeline configuration is flexible
- Performance neutral or improved