s9e / regexp-builder-command
Console command that generates a regexp from a list of strings.
pkg:composer/s9e/regexp-builder-command
Requires
- php: >=8.1
- s9e/regexp-builder: ^2.0
- symfony/console: ^6.0
Requires (Dev)
- mikey179/vfsstream: ^1.6
- phpunit/phpunit: *
README
Synopsis
build-regexp is a command line tool that generates regular expressions that match a given set of strings.
Installation
There are two ways to install this command. You can download the latest release as a PHAR:
$ wget -q https://github.com/s9e/RegexpBuilderCommand/releases/latest/download/build-regexp.phar
$ chmod +x build-regexp.phar
$ ./build-regexp.phar --version
build-regexp 1.0.0
Or you can install the command as a Composer dependency:
$ composer -q require s9e/regexp-builder-command
$ vendor/bin/build-regexp --version
build-regexp 1.0.0
Usage
Strings can be specified either directly in the command invocation or via an input file. The following shell example passes them directly as space-separated arguments:
$ ./build-regexp.phar foo bar baz
ba[rz]|foo
In the following example, we create a file with each value on its own line, then pass the name of the file via the --infile option:
$ echo -e "one\ntwo\nthree" > strings.txt
$ ./build-regexp.phar --infile strings.txt
one|t(?:hree|wo)
Alternatively, the input file can contain the list of strings as a JSON array, in which case --infile-format is set to json:
$ echo '["foo","bar"]' > strings.json
$ ./build-regexp.phar --infile strings.json --infile-format json
bar|foo
By default, the result is printed directly to the terminal. Alternatively, it can be saved to a file specified via the --outfile option. In the following example, we save the result to a file named out.txt, then check its content:
$ ./build-regexp.phar --outfile out.txt foo bar baz
$ cat out.txt
ba[rz]|foo
Presets
Several presets are available to generate regexps for different engines. They determine how the input is interpreted, as well as which characters are escaped in the output and how. The following presets are available:
- pcre and pcre2 escape non-printing characters and characters outside of low ASCII using PCRE's escape sequences \xhh and \x{hh..}. If the u flag is specified, the regexp operates on Unicode codepoints. Otherwise, it operates on bytes.
- java and re2 are functionally equivalent to pcre2 and always operate on Unicode codepoints.
- javascript escapes non-printing characters and characters outside of low ASCII as \xhh, \uhhhh, and \u{hhhhh}. If the u flag is not present, characters outside the BMP are split into surrogate pairs.
- raw does not escape any literals. If the u flag is specified, the regexp operates on Unicode codepoints. Otherwise, it operates on bytes and is not guaranteed to produce a UTF-8 string.
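When a preset operates on bytes (pcre or pcre2 without the u flag, for instance), a character outside low ASCII is seen as its UTF-8 byte sequence rather than as a single codepoint, assuming UTF-8 input. As a rough illustration only, the POSIX shell sketch below hand-encodes a codepoint between U+10000 and U+10FFFF into its four UTF-8 bytes and prints them as \xhh escapes. The helper name utf8_escape is hypothetical and is not part of build-regexp.

```shell
# Hypothetical helper (not part of build-regexp): encode a codepoint in the
# range U+10000..U+10FFFF as four UTF-8 bytes, printed as PCRE-style \xhh escapes
utf8_escape() {
  cp=$(( $1 ))
  printf '\\x%02X\\x%02X\\x%02X\\x%02X\n' \
    $(( 0xF0 | (cp >> 18) )) \
    $(( 0x80 | ((cp >> 12) & 0x3F) )) \
    $(( 0x80 | ((cp >> 6) & 0x3F) )) \
    $(( 0x80 | (cp & 0x3F) ))
}

utf8_escape 0x1F601   # \xF0\x9F\x98\x81
utf8_escape 0x1F602   # \xF0\x9F\x98\x82
```

Both characters share the first three bytes \xF0\x9F\x98 and differ only in the last one, which is why a byte-oriented regexp for them can factor out that common prefix.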
The following examples show the results of a few different presets with the Unicode characters U+1F601 (😁) and U+1F602 (😂) as input.
$ ./build-regexp.phar --preset pcre "😁" "😂"
\xF0\x9F\x98[\x81\x82]
$ ./build-regexp.phar --preset javascript "😁" "😂"
\uD83D[\uDE01\uDE02]
$ ./build-regexp.phar --preset pcre --flags u "😁" "😂"
[\x{1F601}\x{1F602}]
$ ./build-regexp.phar --preset javascript --flags u "😁" "😂"
[\u{1F601}\u{1F602}]
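Without the u flag, the javascript output above starts with \uD83D for both characters because each is split into a UTF-16 surrogate pair that shares the same high surrogate. The pairs follow the standard UTF-16 arithmetic: subtract 0x10000 from the codepoint, then add the top 10 bits of the result to 0xD800 and the bottom 10 bits to 0xDC00. The POSIX shell sketch below reproduces this for illustration; the helper name to_surrogates is hypothetical and not part of build-regexp.

```shell
# Hypothetical helper (not part of build-regexp): split a codepoint in the
# range U+10000..U+10FFFF into a UTF-16 surrogate pair, printed as \uhhhh escapes
to_surrogates() {
  v=$(( $1 - 0x10000 ))            # 20-bit offset above the BMP
  hi=$(( 0xD800 + (v >> 10) ))     # high surrogate: top 10 bits of the offset
  lo=$(( 0xDC00 + (v & 0x3FF) ))   # low surrogate: bottom 10 bits of the offset
  printf '\\u%04X\\u%04X\n' "$hi" "$lo"
}

to_surrogates 0x1F601   # \uD83D\uDE01
to_surrogates 0x1F602   # \uD83D\uDE02
```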
Maintenance
To build build-regexp.phar, download a recent release of box.phar (the Box PHAR builder), save it to the bin directory, then run composer build-phar.
See also
- https://github.com/s9e/RegexpBuilder - The library that powers this tool.
- https://github.com/devongovett/regexgen - Similar tool written in JavaScript.
- https://github.com/pemistahl/grex - Similar tool written in Rust.