readable-regexp
Regular Expressions - quick and concise, readable and composable.
Be explicit and extract common pieces
Compare a readable-regexp expression:
const num = capture.oneOf(
oneOrMore.digit, // integer
zeroOrMore.digit.exactly`.`.oneOrMore.digit // decimal
);
const regExp = match(num).exactly`,`.maybe` `.match(num).toRegExp(Flag.Global); // num is used twice here
With normal JS RegExp:
const regExp = /(\d+|\d*\.\d+), ?(\d+|\d*\.\d+)/g;
// we have to copy-paste the capture group
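For reference, the closest plain-JavaScript way to avoid the copy-paste is to interpolate a shared source string into `new RegExp`. The sketch below does not use readable-regexp at all; the names `numSource` and `pairRegExp` and the sample input are made up for illustration.

```javascript
// Shared source for "integer or decimal", written once.
// String.raw keeps the backslashes intact inside the template literal.
const numSource = String.raw`(\d+|\d*\.\d+)`;

// Reuse the same piece twice by string interpolation.
const pairRegExp = new RegExp(`${numSource}, ?${numSource}`, 'g');

// Collect both capture groups from every match in a sample string.
const pairs = [...'1,2 and 3.5, .75'.matchAll(pairRegExp)].map((m) => [m[1], m[2]]);

console.log(pairRegExp.source); // (\d+|\d*\.\d+), ?(\d+|\d*\.\d+)
console.log(pairs); // [ [ '1', '2' ], [ '3.5', '.75' ] ]
```

This works, but the pieces are plain strings, so nothing checks that each fragment is a valid pattern on its own.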
In a more complex use case, we can destructure the expression into manageable small parts:
const allowedChar = notCharIn`<>()[]\\\\` `.,;:@"` (whitespace);
const username =
oneOrMore.match(allowedChar)
.zeroOrMore(
exactly`.`
.oneOrMore.match(allowedChar)
);
const quotedString =
exactly`"`
.oneOrMore.char
.exactly`"`;
const ipv4Address =
exactly`[`
.repeat(1, 3).digit
.exactly`.`
.repeat(1, 3).digit
.exactly`.`
.repeat(1, 3).digit
.exactly`.`
.repeat(1, 3).digit
.exactly`]`;
const domainName =
oneOrMore(
oneOrMore.charIn`a-z` `A-Z` `0-9` `-`
.exactly`.`
)
.atLeast(2).charIn`a-z` `A-Z`;
const email =
lineStart
.capture.oneOf(username, quotedString)
.exactly`@`
.capture.oneOf(ipv4Address, domainName)
.lineEnd
.toRegExp();
This is far more readable and debuggable than the equivalent RegExp:
const email =
/^([^<>()[\]\\.,;:@"\s]+(?:\.[^<>()[\]\\.,;:@"\s]+)*|".+")@(\[\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\]|(?:[a-zA-Z0-9\-]+\.)+[a-zA-Z]{2,})$/;
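A quick way to see what the plain RegExp above accepts is to run it against a few inputs. This is only a sketch: the sample addresses and the `emailRegExp`, `samples` and `results` names are invented for the example, and it exercises the hand-written pattern, not the readable-regexp build.

```javascript
// The hand-written pattern from above, copied verbatim.
const emailRegExp =
  /^([^<>()[\]\\.,;:@"\s]+(?:\.[^<>()[\]\\.,;:@"\s]+)*|".+")@(\[\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\]|(?:[a-zA-Z0-9\-]+\.)+[a-zA-Z]{2,})$/;

// Hypothetical inputs: two that should match, two that should not.
const samples = [
  'john.doe@example.com',
  '"quoted name"@[192.168.0.1]',
  'no-at-sign.example.com',
  'a@b',
];

// Group 1 is the local part, group 2 is the host part.
const results = samples.map((s) => {
  const m = emailRegExp.exec(s);
  return m ? { local: m[1], host: m[2] } : null;
});

console.log(results);
// [
//   { local: 'john.doe', host: 'example.com' },
//   { local: '"quoted name"', host: '[192.168.0.1]' },
//   null,
//   null
// ]
```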
Multiple shorthands and syntax options
Without all the shorthands, an expression looks like this:
const regExp = exactly('[')
.captureAs('timestamp')(oneOrMore(not(charIn(']'))))
.exactly('] ')
.captureAs('category')(oneOrMore(word).exactly('-').oneOrMore(word))
.exactly(': ')
.captureAs('message')(oneOrMore(char))
.toRegExp('gm');
Whenever a function takes a single string literal, you can use a tagged template literal to remove the brackets:
const regExp = exactly`[`
.captureAs`timestamp`(oneOrMore(not(charIn`]`)))
.exactly`] `
.captureAs`category`(oneOrMore(word).exactly`-`.oneOrMore(word))
.exactly`: `
.captureAs`message`(oneOrMore(char))
.toRegExp`gm`;
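The reason both call styles can coexist is a general JavaScript feature: a tagged template call passes an array with a `raw` property as the first argument, so a function can tell the two forms apart. The sketch below shows the idea with a hypothetical `literalSource` helper; it is not readable-regexp's implementation and the library may handle this differently.

```javascript
// Hypothetical helper for illustration only; not part of readable-regexp.
// Accepts either literalSource('text') or literalSource`text` and returns
// the text with RegExp metacharacters escaped.
function literalSource(arg, ...values) {
  // Tagged template calls receive a strings array that carries a `raw` array.
  const isTemplateCall = Array.isArray(arg) && Array.isArray(arg.raw);
  const text = isTemplateCall
    ? arg.reduce(
        (acc, part, i) => acc + part + (i < values.length ? String(values[i]) : ''),
        ''
      )
    : String(arg);
  // Escape characters that have a special meaning in a pattern.
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const fromCall = literalSource('] ');
const fromTemplate = literalSource`] `;

console.log(fromCall === fromTemplate); // true
console.log(new RegExp(literalSource`[`).test('[')); // true
```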
When there is only one token in a quantifier or group, you can chain it with a dot (.) instead of using a bracket:
const regExp = exactly`[`
.captureAs`timestamp`.oneOrMore.not.charIn`]`
.exactly`] `
.captureAs`category`(oneOrMore.word.exactly`-`.oneOrMore.word)
.exactly`: `
.captureAs`message`.oneOrMore.char
.toRegExp`gm`;
There are shorthands for negating a character class or a lookaround:
const regExp = exactly`[`
.captureAs`timestamp`.oneOrMore.notCharIn`]`
.exactly`] `
.captureAs`category`(oneOrMore.word.exactly`-`.oneOrMore.word)
.exactly`: `
.captureAs`message`.oneOrMore.char
.toRegExp`gm`;
As you can see, most of the distracting brackets are gone, and you are left with a clean and concise expression.
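To make the goal of this expression concrete, here is a plain-JavaScript sketch of the kind of log parsing it is meant for. The pattern is a hand-written approximation using native named groups (the exact source that readable-regexp generates may differ), and the log lines are invented for the example.

```javascript
// Hand-written approximation of the expression above, using named groups.
const logRegExp = /\[(?<timestamp>[^\]]+)\] (?<category>\w+-\w+): (?<message>.+)/gm;

// Hypothetical log text with one entry per line.
const log = [
  '[2024-01-01 10:00] auth-service: login ok',
  '[2024-01-01 10:05] db-pool: connection reset',
].join('\n');

// Each match exposes the named captures on `groups`.
const entries = [...log.matchAll(logRegExp)].map((m) => ({ ...m.groups }));

console.log(entries);
// [
//   { timestamp: '2024-01-01 10:00', category: 'auth-service', message: 'login ok' },
//   { timestamp: '2024-01-01 10:05', category: 'db-pool', message: 'connection reset' }
// ]
```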
Type check, auto-complete, and runtime safeguards
Some errors can be avoided just by writing in readable-regexp:
const o = 'Ȯ'; // 0x022e
const result1 = /\u22e/.test(o);
// false
const result2 = unicode`22e`.toRegExp().test(o);
// true
// '22e' is automatically fixed to be '\u022e'
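The pitfall itself can be reproduced without the library. In a pattern without the u flag, `\u` followed by fewer than four hex digits is not a Unicode escape, so `/\u22e/` quietly matches the literal text "u22e". The sketch below shows that, along with a zero-padding fix similar in spirit to what the comment above describes; the padding code is an illustration, not the library's implementation.

```javascript
const o = '\u022e'; // 'Ȯ'

// Without the u flag the incomplete escape silently degrades to literal text.
const loose = /\u22e/;
console.log(loose.test(o)); // false
console.log(loose.test('u22e')); // true

// With the u flag the same pattern is rejected up front.
let strictError = null;
try {
  new RegExp('\\u22e', 'u');
} catch (e) {
  strictError = e;
}
console.log(strictError instanceof SyntaxError); // true

// Padding the code point to four hex digits gives a valid escape.
const padded = new RegExp('\\u' + '22e'.padStart(4, '0'));
console.log(padded.source); // \u022e
console.log(padded.test(o)); // true
```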
Some errors can be caught by TypeScript at compile time:
// @ts-expect-error - k is not a valid flag
const regExp = char.toRegExp('gki');
Some can be caught at run time:
const result1 = /(foo)\2/.test('foofoo');
// false
const result2 = capture`foo`.ref(2).toRegExp().test('foofoo');
// Error: The following backreferences are not defined: 2
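The silent failure in the first line comes from legacy JavaScript behaviour: without the u flag, `\2` in a pattern with fewer than two capture groups is read as an octal escape for the character U+0002 and not as a backreference. A plain-JavaScript sketch of the effect, independent of readable-regexp:

```javascript
// Only one capture group exists, so \2 is treated as the character U+0002.
const legacy = /(foo)\2/;
console.log(legacy.test('foofoo')); // false
console.log(legacy.test('foo\u0002')); // true

// The intended backreference to group 1 behaves as expected.
console.log(/(foo)\1/.test('foofoo')); // true

// With the u flag the undefined backreference is a SyntaxError at construction.
let backrefError = null;
try {
  new RegExp('(foo)\\2', 'u');
} catch (e) {
  backrefError = e;
}
console.log(backrefError instanceof SyntaxError); // true
```

Native RegExp only reports the problem when the u flag is set; the run-time check shown above reports it regardless of flags.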