yoeunes / regex-parser
A powerful PCRE regex parser with lexer, AST builder, validation, ReDoS analysis, and syntax highlighting. Zero dependencies, blazing fast, and production-ready.
pkg:composer/yoeunes/regex-parser
Requires
- php: >=8.2
Requires (Dev)
- phpstan/phpstan: ^2.0
- phpstan/phpstan-phpunit: ^2.0
- phpunit/phpunit: ^11.0|^12.0
- psr/cache: ^3.0
- psr/log: ^3.0
- psr/simple-cache: ^3.0
- rector/rector: ^2.0
- symfony/config: ^7.4|^8.0
- symfony/console: ^7.4|^8.0
- symfony/dependency-injection: ^7.4|^8.0
- symfony/http-foundation: ^7.4|^8.0
- symfony/http-kernel: ^7.4|^8.0
- symfony/routing: ^7.4|^8.0
- symfony/validator: ^7.4|^8.0
Suggests
- phpstan/extension-installer: To automatically enable the PHPStan rule for regex validation.
- phpstan/phpstan: To run static analysis and detect invalid regex patterns.
- psr/cache: To share AST cache via PSR-6 pools.
- psr/simple-cache: To share AST cache via PSR-16 caches.
- rector/rector: To automatically refactor and optimize regex patterns.
This package is auto-updated.
Last update: 2025-12-10 12:30:12 UTC
README
Treat Regular Expressions as Code.
RegexParser transforms opaque PCRE strings into a structured Abstract Syntax Tree.
It brings static analysis, security auditing, and automated refactoring to PHP's most powerful yet misunderstood tool. Stop treating regexes as magic strings; start treating them as logic.
Core Capabilities
- Deep Parsing: Full support for advanced PCRE2 syntax including subroutines, conditionals, and recursion.
- Security Auditing: Detects Catastrophic Backtracking (ReDoS) risks and vulnerabilities at analysis time.
- Documentation: Automatically generates human-readable explanations, HTML visualizations, and valid sample strings.
- Transformation: Manipulate the AST to optimize or refactor patterns programmatically.
- Integration: First-class support for Symfony, PHPStan, and Rector workflows.

"Think of it as nikic/php-parser, but for regexes."
Table of Contents
- Installation
- Quick Start
- Advanced Usage
- ReDoS Analysis
- Framework & Tooling Integration
- Performance & Caching
- API Overview
- Versioning & BC Policy
- Support the Project
- Contributing
- License
Installation
composer require yoeunes/regex-parser
Requires PHP 8.2+.
Quick Start
Validate a regex
“Is this regex even valid?”
    use RegexParser\Regex;

    $regex = Regex::create();

    // Full PCRE string: /pattern/flags
    $result = $regex->validate('/^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$/i');

    if ($result->isValid()) {
        echo "OK ✅\n";
    } else {
        echo "Invalid regex: ".$result->getErrorMessage()."\n";
    }
There’s also a tolerant parser:
    $tolerant = $regex->parseTolerant('/(unclosed(');

    if ($tolerant->hasErrors()) {
        foreach ($tolerant->errors as $error) {
            echo "Error: ".$error->getMessage()."\n";
        }
    }

    // You still get a partial AST:
    $ast = $tolerant->ast;
Explain a regex
“What does this pattern actually do?”
    use RegexParser\Regex;

    $regex = Regex::create();

    echo $regex->explain('/^(?<user>[a-z0-9_]+)\.(?<domain>[a-z.]+)$/i');
Output example (simplified):
Start of string
Named group "user":
One or more of: letters, digits or underscore
Literal "."
Named group "domain":
One or more of: letters or dots
End of string
You can also generate HTML explanations for documentation or debug UIs:
$html = $regex->htmlExplain('/(foo|bar)+\d{2,4}/');
Check ReDoS safety
“Can this regex blow up my CPU?”
    use RegexParser\Regex;
    use RegexParser\ReDoS\ReDoSSeverity;

    $regex = Regex::create();

    $pattern = '/^(a+)+$/'; // classic catastrophic backtracking example

    $analysis = $regex->analyzeReDoS($pattern);

    echo "Severity: ".$analysis->severity->value.PHP_EOL;
    echo "Score: ".$analysis->score.PHP_EOL;

    if (!$analysis->isSafe()) {
        echo "Hotspot: ".($analysis->vulnerablePart ?? 'unknown').PHP_EOL;

        foreach ($analysis->recommendations as $recommendation) {
            echo "- ".$recommendation.PHP_EOL;
        }
    }

    // Quick boolean check (for CI, input validation, etc.)
    if (!$regex->isSafe($pattern, ReDoSSeverity::HIGH)) {
        throw new \RuntimeException('Regex is not safe enough for untrusted input.');
    }
Under the hood it inspects quantifiers, nested groups, backreferences and character sets using a real AST, not just regex‑on‑regex strings.
Configuration / Options
Regex::create() accepts a small, validated option array (or a RegexOptions value object via RegexOptions::fromArray()):
- max_pattern_length (int, default: Regex::DEFAULT_MAX_PATTERN_LENGTH).
- cache (null | path string | RegexParser\Cache\CacheInterface).
- redos_ignored_patterns (list of strings to skip in ReDoS analysis).
Unknown or invalid keys throw RegexParser\Exception\InvalidRegexOptionException.
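To picture what "validated option array" means in practice, here is a minimal sketch of strict option checking. It is not the library's code: `validateRegexOptions()` is a hypothetical helper, the default values are assumptions, and a plain `InvalidArgumentException` stands in for `InvalidRegexOptionException`. Only the three key names come from the list above.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: rejects unknown keys and wrongly typed values.
 * The key names mirror the option list above; everything else is assumed.
 *
 * @param array<string, mixed> $options
 * @return array<string, mixed>
 */
function validateRegexOptions(array $options): array
{
    // Assumed defaults, for illustration only.
    $defaults = [
        'max_pattern_length' => 100_000,
        'cache' => null,
        'redos_ignored_patterns' => [],
    ];

    // Any key that is not a known option is an error, not silently ignored.
    $unknown = array_diff_key($options, $defaults);
    if ($unknown !== []) {
        throw new InvalidArgumentException('Unknown option(s): '.implode(', ', array_keys($unknown)));
    }

    $merged = array_replace($defaults, $options);

    if (!is_int($merged['max_pattern_length']) || $merged['max_pattern_length'] < 1) {
        throw new InvalidArgumentException('max_pattern_length must be a positive integer.');
    }
    if (!is_array($merged['redos_ignored_patterns'])) {
        throw new InvalidArgumentException('redos_ignored_patterns must be a list of strings.');
    }

    return $merged;
}
```

Failing fast on unknown keys means a typo such as `max_patern_length` surfaces at construction time instead of silently falling back to a default.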
Advanced Usage
Parsing bare patterns vs PCRE strings
Most high‑level methods (parse, validate, analyzeReDoS) expect a full PCRE string:
$ast = $regex->parse('/pattern/ims');
If you only have the body, parsePattern() will wrap delimiters/flags for you:
$ast = $regex->parsePattern('a|b', '#', 'i');
For full control over each stage, you can also drive the lexer and parser directly:

    use RegexParser\Lexer;
    use RegexParser\Parser;

    $lexer = new Lexer();
    $parser = new Parser();

    $stream = $lexer->tokenize('a|b');
    $ast = $parser->parse($stream, flags: '', delimiter: '/', patternLength: strlen('a|b'));
Working with the AST
Every parsed regex becomes a tree of node objects under RegexParser\Node\*.
Example:
    use RegexParser\Regex;
    use RegexParser\Node\AlternationNode;
    use RegexParser\Node\LiteralNode;

    $regex = Regex::create();

    $ast = $regex->parse('/foo|bar/');
    $pattern = $ast->pattern;

    if ($pattern instanceof AlternationNode) {
        foreach ($pattern->branches as $branch) {
            foreach ($branch->children as $child) {
                if ($child instanceof LiteralNode) {
                    echo "Literal: ".$child->value.PHP_EOL;
                }
            }
        }
    }

Each node exposes:
- startPosition / endPosition: byte offsets in the original pattern
- Node-specific properties (e.g. QuantifierNode::$min, $max, $type)
Writing a custom AST visitor
For experts: the “right” way to analyse patterns is to implement your own visitor.
    namespace App\Regex;

    use RegexParser\Node\LiteralNode;
    use RegexParser\Node\QuantifierNode;
    use RegexParser\Node\RegexNode;
    use RegexParser\Node\SequenceNode;
    use RegexParser\NodeVisitor\AbstractNodeVisitor;

    /**
     * @extends AbstractNodeVisitor<int>
     */
    final class LiteralCountVisitor extends AbstractNodeVisitor
    {
        protected function defaultReturn(): int
        {
            return 0;
        }

        public function visitRegex(RegexNode $node): int
        {
            return $node->pattern->accept($this);
        }

        public function visitLiteral(LiteralNode $node): int
        {
            return 1;
        }

        // Aggregate over sequences and groups:
        public function visitSequence(SequenceNode $node): int
        {
            $sum = 0;
            foreach ($node->children as $child) {
                $sum += $child->accept($this);
            }

            return $sum;
        }

        // For nodes you don't care about, just recurse or return 0
        public function visitQuantifier(QuantifierNode $node): int
        {
            return $node->node->accept($this);
        }
    }
Usage:
    use App\Regex\LiteralCountVisitor;
    use RegexParser\Regex;

    $regex = Regex::create();
    $ast = $regex->parse('/ab(c|d)+/');

    $visitor = new LiteralCountVisitor();
    $count = $ast->accept($visitor); // e.g. 3
Because NodeVisitorInterface is templated, static analysers can infer the return type (int here).
Optimizing and recompiling patterns
You can round‑trip a pattern through AST → optimizer → compiler:
    use RegexParser\Regex;
    use RegexParser\NodeVisitor\OptimizerNodeVisitor;
    use RegexParser\NodeVisitor\CompilerNodeVisitor;

    $regex = Regex::create();
    $ast = $regex->parse('/(a|a)/');

    $optimizer = new OptimizerNodeVisitor();
    $optimizedAst = $ast->accept($optimizer);

    $compiler = new CompilerNodeVisitor();
    $optimizedPattern = $optimizedAst->accept($compiler);

    echo $optimizedPattern; // e.g. '/(a)/'
This makes it easy to implement automated refactorings (via Rector) or style rules for regexes.
Auto-Modernize Legacy Patterns
Clean up messy or legacy regexes automatically:
    use RegexParser\Regex;

    $regex = Regex::create();

    $modern = $regex->modernize('/[0-9]+\-[a-z]+\@(?:gmail)\.com/');
    echo $modern; // Outputs: /\d+-[a-z]+@gmail\.com/
What it does:
- Converts [0-9] → \d, [a-zA-Z0-9_] → \w, [\t\n\r\f\v] → \s
- Removes unnecessary escaping (e.g., \@ → @)
- Modernizes backrefs (\1 → \g{1})
- Preserves exact behavior: no functional changes
Perfect for refactoring legacy codebases or cleaning up generated patterns.
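When adopting any automated rewrite, it can be reassuring to spot-check that the old and new patterns agree on representative inputs. The sketch below does this with plain `preg_match()` and does not use the library; `findDisagreements()` and the sample subjects are hypothetical, and the two patterns are the ones from the example above.

```php
<?php

declare(strict_types=1);

/**
 * Returns the subjects on which two patterns disagree
 * (different match result or different captured text).
 *
 * @param list<string> $subjects
 * @return list<string>
 */
function findDisagreements(string $before, string $after, array $subjects): array
{
    $disagreements = [];
    foreach ($subjects as $subject) {
        $a = preg_match($before, $subject, $matchBefore);
        $b = preg_match($after, $subject, $matchAfter);
        if ($a !== $b || $matchBefore !== $matchAfter) {
            $disagreements[] = $subject;
        }
    }

    return $disagreements;
}

$before = '/[0-9]+\-[a-z]+\@(?:gmail)\.com/';
$after = '/\d+-[a-z]+@gmail\.com/';

// Hypothetical samples: matching, partially matching and non-matching inputs.
$samples = ['42-bob@gmail.com', 'x 7-al@gmail.com!', '42_bob@gmail.com', 'bob@gmail.com', ''];

$disagreements = findDisagreements($before, $after, $samples);

echo $disagreements === [] ? "Patterns agree on all samples\n" : "Patterns differ\n";
```

A check like this is only as good as its samples. For instance, under the `u` flag PHP's `\d` also matches non-ASCII digits, which `[0-9]` does not, so include such inputs if your patterns use that flag.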
Syntax Highlighting
Make complex regexes readable with automatic syntax highlighting:
    use RegexParser\Regex;

    $regex = Regex::create();

    // For console output
    echo $regex->highlightCli('/^[0-9]+(\w+)$/');
    // Outputs: ^[0-9]+(\w+)$ with ANSI colors

    // For web display
    echo $regex->highlightHtml('/^[0-9]+(\w+)$/');
    // Outputs: <span class="regex-anchor">^</span>[<span class="regex-type">\d</span>]+(<span class="regex-type">\w</span>+)$
Color Scheme:
- Meta-characters ((, ), |, [, ]): Blue - Structure
- Quantifiers (*, +, ?, {...}): Yellow - Repetition
- Escapes/Types (\d, \w, \n): Green - Special chars
- Anchors/Assertions (^, $, \b): Magenta - Boundaries
- Literals: Default - Plain text
HTML output uses <span class="regex-*"> classes for easy styling.
ReDoS Analysis
What is ReDoS?
Regular Expression Denial of Service (ReDoS) happens when a backtracking regex engine spends super-linear, often exponential, time on certain inputs. This is particularly dangerous when patterns are applied to untrusted input (HTTP requests, user forms, logs, etc.).
Classic examples:
- /(a+)+$/ on aaaaaaaaaaaaaaaa!
- /^(a|a?)+$/ on long strings
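The effect can be reproduced with PHP's own `preg_match()`, without the library. The sketch below assumes a stock PCRE build; it disables the JIT and lowers `pcre.backtrack_limit` so that the engine gives up quickly instead of burning CPU. The limit value of 10000 and the subject length of 30 are arbitrary choices for the demo.

```php
<?php

declare(strict_types=1);

// Make the demo cheap: disable the JIT and lower the backtracking budget
// so PCRE aborts early rather than exploring every path.
ini_set('pcre.jit', '0');
ini_set('pcre.backtrack_limit', '10000');

// 30 "a" characters followed by a character that makes the match fail.
$subject = str_repeat('a', 30).'!';

$result = preg_match('/^(a+)+$/', $subject);

if ($result === false && preg_last_error() === PREG_BACKTRACK_LIMIT_ERROR) {
    $outcome = 'backtrack limit exhausted';
} else {
    $outcome = $result === 1 ? 'matched' : 'no match';
}

echo $outcome.PHP_EOL;
```

Note that `preg_match()` returns `false` here rather than `0`, so code that only checks for a truthy result can mistake an aborted match for "no match". That is one reason to catch such patterns before they reach production.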
How RegexParser detects it
Instead of guessing from the pattern string, RegexParser:
- Parses the pattern into an AST.
- Walks the tree with ReDoSProfileNodeVisitor:
  - Tracks unbounded quantifiers (*, +, {m,}).
  - Detects nested unbounded quantifiers (star-height).
  - Looks at alternations to see if branches share characters.
  - Follows backreferences and subroutines.
  - Takes into account atomic groups, possessive quantifiers and PCRE control verbs (which can "shield" against backtracking).
- Aggregates the findings into a ReDoSAnalysis:
  - Overall severity (SAFE, LOW, MEDIUM, HIGH, CRITICAL, UNKNOWN).
  - A list of vulnerabilities with:
    - message,
    - severity,
    - position in pattern.
This is static analysis — it doesn’t execute the regex — so it’s safe to run in CI.
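To illustrate the star-height idea from the list above, here is a minimal sketch on a toy AST. The `Lit`, `Seq` and `Quant` classes and the `starHeight()` function are hypothetical and deliberately simplified; they are not the library's node classes or its `ReDoSProfileNodeVisitor`.

```php
<?php

declare(strict_types=1);

// Toy AST, NOT the library's node classes: just enough structure to show
// how nested unbounded quantifiers ("star height") can be measured.
final class Lit
{
    public function __construct(public string $value) {}
}

final class Seq
{
    /** @param list<Lit|Seq|Quant> $children */
    public function __construct(public array $children) {}
}

final class Quant
{
    // $max === null means "unbounded" (*, +, {m,}).
    public function __construct(public Lit|Seq|Quant $node, public int $min, public ?int $max) {}
}

/** Maximum nesting depth of unbounded quantifiers in the tree. */
function starHeight(Lit|Seq|Quant $node): int
{
    if ($node instanceof Lit) {
        return 0;
    }
    if ($node instanceof Seq) {
        $heights = array_map(starHeight(...), $node->children);

        return $heights === [] ? 0 : max($heights);
    }

    // Quantifier: only unbounded repetition adds a level.
    return starHeight($node->node) + ($node->max === null ? 1 : 0);
}

// (a+)+ : an unbounded quantifier wrapping another one.
$nested = new Quant(new Quant(new Lit('a'), 1, null), 1, null);

// a+b{2,4} : one unbounded quantifier next to a bounded one.
$flat = new Seq([
    new Quant(new Lit('a'), 1, null),
    new Quant(new Lit('b'), 2, 4),
]);

echo starHeight($nested).' '.starHeight($flat).PHP_EOL;
```

A star height of 2 or more is only a first signal. A real analyser would also weigh the other factors listed above, such as overlapping alternation branches and atomic or possessive constructs that prevent backtracking.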
Severity levels
From lowest to highest:
- SAFE: no dangerous constructs detected.
- LOW: theoretical issues, but unlikely to be exploited.
- UNKNOWN: analysis was inconclusive due to complex constructs.
- MEDIUM: potentially problematic in edge cases.
- HIGH: clear ReDoS risk; avoid on untrusted input.
- CRITICAL: classic catastrophic patterns (nested +/* etc.).
analyzeReDoS() returns a ReDoSAnalysis with the severity, score, vulnerable substring (if any), and recommendations. isSafe() simply calls analyzeReDoS() and returns true only for severities considered safe/low (or below the optional threshold you pass in).
You choose what to tolerate:
    if (!$regex->isSafe($pattern, ReDoSSeverity::HIGH)) {
        // block, warn, or open a ticket
    }
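One way to think about the threshold is as a comparison on an ordered enum. The sketch below is an assumption about the semantics, not the library's implementation: `Severity` is a hypothetical stand-in for `ReDoSSeverity`, its integer ranks mirror the "lowest to highest" list above, and "strictly below the threshold" is assumed from the description of `isSafe()`.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for the library's severity enum; the ordering
// mirrors the "lowest to highest" list above.
enum Severity: int
{
    case Safe = 0;
    case Low = 1;
    case Unknown = 2;
    case Medium = 3;
    case High = 4;
    case Critical = 5;
}

/**
 * Assumed semantics: a pattern is acceptable when its severity is
 * strictly below the threshold.
 */
function isBelowThreshold(Severity $actual, Severity $threshold): bool
{
    return $actual->value < $threshold->value;
}

// With a HIGH threshold, MEDIUM findings pass and HIGH findings do not.
var_dump(isBelowThreshold(Severity::Medium, Severity::High));
var_dump(isBelowThreshold(Severity::High, Severity::High));
```

Under these assumptions an inconclusive (`Unknown`) result passes a `Medium` threshold, so pick a stricter threshold if you prefer to treat inconclusive analyses as failures.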
Framework & Tooling Integration
Symfony
The Symfony bridge provides:
- A console command to scan your app's config for dangerous regexes.
- A cache warmer to pre-parse and pre-analyze patterns on deploy.
- Easy service wiring for Regex in your DI container.
Example (pseudo‑code):
    services:
        RegexParser\Regex:
            factory: ['RegexParser\Regex', 'create']
            arguments:
                - { cache: '%kernel.cache_dir%/regex', max_pattern_length: 100000 }
PHPStan
- The PHPStan extension hooks into string arguments of functions like preg_match, preg_replace, Symfony validators, etc.
- It can:
  - Validate regex syntax at analysis time.
  - Optionally report ReDoS risks as PHPStan errors or warnings.
Configuration is done via the provided extension.neon, with options such as:
    parameters:
        regexParser:
            ignoreParseErrors: true
            reportRedos: true
            redosThreshold: 'high'
Rector
Rector rules can use RegexParser to:
- Replace dangerous patterns with safer equivalents.
- Normalize regex style across a codebase.
- Add inline comments explaining complex patterns.
Performance & Caching
RegexParser is designed for high‑scale applications:
- Lexer uses a single PCRE state machine with offsets, not repeated substrings.
- Parser and Lexer instances are reused across calls and properly reset.
- Optional cache (filesystem or PSR‑compatible) stores parsed ASTs and ReDoS analyses.
Example:
    use RegexParser\Regex;

    $regex = Regex::create([
        'cache' => '/path/to/cache/dir', // or a PSR cache instance
        'max_pattern_length' => 100_000,
        'redos_ignored_patterns' => [
            '/^([0-9]{4}-[0-9]{2}-[0-9]{2})$/', // known safe patterns
        ],
    ]);
For Symfony, a cache warmer can parse and analyze all known patterns at deploy time so runtime costs are minimal.
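The benefit of caching comes from doing the expensive work once per distinct pattern. As a rough illustration only, the sketch below memoizes results in memory, keyed by a hash of the pattern. `PatternMemo` is hypothetical and unrelated to the library's cache classes, and `strlen()` merely stands in for parsing or analysis.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical in-memory memoizer: runs an expensive analysis once per
 * distinct pattern and reuses the stored result afterwards.
 */
final class PatternMemo
{
    /** @var array<string, mixed> */
    private array $store = [];

    public int $misses = 0;

    /** @param callable(string): mixed $analyze */
    public function get(string $pattern, callable $analyze): mixed
    {
        // Hash the pattern so keys stay short and safe for any backend.
        $key = hash('sha256', $pattern);

        if (!array_key_exists($key, $this->store)) {
            ++$this->misses;
            $this->store[$key] = $analyze($pattern);
        }

        return $this->store[$key];
    }
}

$memo = new PatternMemo();
$analyze = static fn (string $p): int => strlen($p); // stand-in for parsing/analysis

$first = $memo->get('/^(a+)+$/', $analyze);
$second = $memo->get('/^(a+)+$/', $analyze); // served from the memo

echo "misses: {$memo->misses}".PHP_EOL;
```

A filesystem or PSR-backed cache follows the same shape, with the array replaced by a persistent store so that results survive across requests.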
API Overview
Regex
    final readonly class Regex
    {
        public static function create(array $options = []): self;

        public function parse(string $regex): Node\RegexNode;
        public function parsePattern(string $pattern, string $delimiter = '/', string $flags = ''): Node\RegexNode;
        public function parseTolerant(string $regex): TolerantParseResult;
        public function validate(string $regex): ValidationResult;

        public function dump(string $regex): string;
        public function explain(string $regex): string;
        public function htmlExplain(string $regex): string;
        public function extractLiterals(string $regex): LiteralSet;

        public function analyzeReDoS(string $regex): ReDoS\ReDoSAnalysis;
        public function isSafe(string $regex, ?ReDoS\ReDoSSeverity $threshold = null): bool;

        public function getLexer(): Lexer;
        public function getParser(): Parser;
    }
Return types like ValidationResult, LiteralSet, ReDoSAnalysis are small, well‑typed value objects.
Exceptions
- Regex::create() throws InvalidRegexOptionException for unknown/invalid options.
- parse() / parsePattern() can throw LexerException, SyntaxErrorException (syntax/structure), RecursionLimitException (too deep), and ResourceLimitException (pattern too long).
- parseTolerant() wraps those errors into TolerantParseResult instead of throwing.
- validate() converts parser/lexer errors into a ValidationResult (no exception on invalid input).
- analyzeReDoS() / isSafe() share the same parsing exceptions as parse(); isSafe() is a boolean wrapper around analyzeReDoS().
Generic runtime errors (e.g., wrong argument types) are not part of the stable API surface.
Versioning & BC Policy
RegexParser follows Semantic Versioning:
- Stable for 1.x (API surface we commit to keep compatible):
  - Public methods and signatures on Regex.
  - Value objects: ValidationResult, TolerantParseResult, LiteralSet, ReDoS\ReDoSAnalysis.
  - Main exception interfaces/classes: RegexParserExceptionInterface, parser/lexer exceptions, InvalidRegexOptionException.
  - Supported option keys for Regex::create() / RegexOptions.
- Best-effort, may evolve within 1.x:
  - AST node classes and NodeVisitorInterface (new node types/visit methods can be added).
  - Built-in visitors and analysis heuristics.
If you maintain custom visitors, plan to adjust them when new nodes appear. Breaking changes beyond this policy land in 2.0.0.
Known Limitations
While this library supports a comprehensive set of PCRE2 features, some highly specific or experimental features may not be fully supported yet. For example:
- Certain Perl-specific verbs not yet standardized in PCRE2.
- Advanced Unicode features beyond basic properties and escapes.
- Experimental or platform-specific extensions.
If you encounter an unsupported feature, please open an issue with a test case.
Support the Project
If RegexParser saves you time, you can help keep it moving:
- Star the repository on GitHub
- Share it with your team or community
- Report issues or suggest features
- Contribute code or documentation
- Sponsor the work or hire me for consulting 🤝
Contributing
Contributions are welcome! Areas where help is especially useful:
- New optimizations for the optimizer visitor.
- Additional ReDoS heuristics and exploit‑string generation.
- IDE integrations (PHPStorm plugin, etc.).
- More bridges (Laravel, Laminas, …).
Please run the full test suite before submitting a PR.
License
This library is released under the MIT License.
Made with ❤️ by Younes ENNAJI