What is Regex?
Broadly speaking, regex (or "regular expressions", as they're more properly known) is a method for finding specific characters (or patterns of characters) within a body of text. This is a tremendously powerful tool for processing large amounts of text data - especially if you're looking for specific elements that you suspect may be present (email addresses, phone numbers, and credit card numbers are all very common use cases). With a few simple tools, a dizzyingly flexible array of text extraction options is possible through regex! We'll walk through a few of the essentials to start understanding and applying regex to your next project.
The Basics
Yeah, okay - but what is it?
Regular expressions in JavaScript are RegExp
objects, and can be created either as a literal or with the RegExp constructor.
// The literal is the pattern between the forward slashes
let literalRegex = /pies?/;
// A literal can also carry flags; these come after the second slash
let literalRegexWithFlags = /pies?/igd;
// The constructor can be used with just a pattern string, à la
let constructorRegex = new RegExp('pies?');
// Flags go in an optional second argument. (Since ES6, this also works
// when the first argument is itself a regex, without throwing a TypeError.)
let constructorRegexWithFlags = new RegExp('pies?', 'igd');
In both cases, the object contains two important parts: the expression and the flags. The expression is the part that's getting matched; the flags modify the behavior of the expression. The flags can be used in any combination. The flags that I commonly find to be helpful are d
(includes match indices with your results), g
(global search), and i
(case insensitive searching) - but there are a few more, and MDN has excellent documentation for them (as usual).
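To make the effect of those three flags concrete, here is a minimal sketch. The sample string and variable names are made up for illustration, and the d flag assumes a reasonably recent JavaScript engine (it arrived with ES2022).

```javascript
// Hypothetical sample text, chosen only for illustration.
const text = 'Pie, pies, and more PIE';

// No flags: case-sensitive, and match() reports only the first hit.
const plain = text.match(/pies?/);
console.log(plain[0], plain.index); // pies 5

// i: case-insensitive, so the capitalized 'Pie' at the start now counts.
const insensitive = text.match(/pies?/i);
console.log(insensitive[0], insensitive.index); // Pie 0

// g (here combined with i): match() returns every hit as an array of strings.
const all = text.match(/pies?/gi);
console.log(all); // [ 'Pie', 'pies', 'PIE' ]

// d: the result gains an `indices` array of [start, end] pairs.
const withIndices = /pies?/d.exec(text);
console.log(withIndices.indices[0]); // [ 5, 9 ]
```

Note that with g, match() drops the per-match details such as index; if you need those for every hit, matchAll() is one option.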
Special characters
Most of the practical applications for this tool won't involve searching for a specific string, such as "Bob" or "not_real_email@gmail.com", but rather will be searching for patterns of characters. This is facilitated with a handful of different special characters.
.
The wildcard character (".") matches any character (except line breaks - though even this can be altered through the use of the optional s flag).
?
The question mark character (?) will allow for matches with optional characters. For example: let re1 = /pies?/
would match either "pie" or "pies" - the ? makes the 's' optional.
*
The Kleene star character allows for matching a character (or pattern) zero or more times. Example: let regTest = /smelly*/
could match either smell or smellyyyyyyyyyyyy.
+
The Kleene plus character allows for matching a character (or pattern) one or more times. Example: let regObj = /smelly+/
could match either smelly or smellyyyyyyyyyyyy.
{i}
Braces let you specify exactly how many times the preceding character (or pattern) must repeat. Example: let threeOfAny = /.{3}/
would match any set of three characters (abc, 123, b4%, <>!, etc).
{i,}
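Similar to using a single number to specify an exact quantity to match, this form will match i or more repetitions of the quantified character or group. Example: let fooTwoPlus = /(foo){2,}/
will match 'foofoo', 'foofoofoo', or 'foofoofoofoo', but not a lone 'foo'. The parentheses matter: without them, /foo{2,}/ quantifies only the final 'o', so it would match 'fooo' or 'foooo' instead.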
{i,j}
You might see where this one is going - this specifies a range of repetitions, from i to j inclusive. Example: let catButt = /(cat|butt){2,3}/
will match 'catcat', 'catbutt', 'buttcat', 'buttbutt', 'catcatcat', 'catcatbutt', 'catbuttcat',... but not a lone 'cat' or 'butt'. Given 'catcatcatcat', it will match only the first three repetitions, since nothing anchors the pattern to the whole string.
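The quantifiers above can be tried side by side. This is a small sketch with made-up test strings; the one detail worth watching is that a quantifier applies only to the single token (or parenthesized group) immediately before it.

```javascript
// ? : the 's' is optional, so both spellings pass.
const optional = [/pies?/.test('pie'), /pies?/.test('pies')];
console.log(optional); // [ true, true ]

// * : zero or more 'y' characters.
const star = ['smell', 'smellyyy'].map((s) => s.match(/smelly*/)[0]);
console.log(star); // [ 'smell', 'smellyyy' ]

// + : at least one 'y' is required, so 'smell' alone fails.
const plus = [/smelly+/.test('smell'), /smelly+/.test('smelly')];
console.log(plus); // [ false, true ]

// {3} : exactly three of "any character".
const exactlyThree = 'b4%!'.match(/.{3}/)[0];
console.log(exactlyThree); // b4%

// {2,} : grouping changes what gets repeated.
const grouped = /(foo){2,}/; // 'foo' repeated two or more times
const ungrouped = /foo{2,}/; // 'fo' followed by two or more 'o'
console.log(grouped.test('foo'), grouped.test('foofoofoo')); // false true
console.log(ungrouped.test('foofoofoo'), ungrouped.test('fooo')); // false true

// {2,3} : greedy, so it takes three repetitions when they are available.
const catButt = /(cat|butt){2,3}/;
const greedy = 'catcatcatcat'.match(catButt)[0];
console.log(catButt.test('cat'), greedy); // false catcatcat
```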
^
The caret character matches the beginning of a string. Note: this means the entirety of the string against which the regex is being compared, NOT word boundaries. Example: let startCap = /^pies?/i;
will match the pie
in "Pie shops need to take marketing lessons from muffin shops,", but NOT the pie
in "What's the coolest pie shop you've ever heard about? Right? They're just not hip."
$
The dollar character matches the end of a string. Just like with the caret, this is with regard to the entirety of the string against which the regex is being compared, NOT word boundaries. Example: let endCap = /pies?$/i
would capture the pie
in "West Coast Pies", but not any of the pie(s) in "West Coast Pies are the best pies! Try their 'Sur-pies' of the day."
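Here is a quick sketch of both anchors using shortened, made-up versions of the sentences above. The last two lines go slightly beyond the text: they assume the m (multiline) flag, which makes ^ and $ also match at line breaks inside the string.

```javascript
const startCap = /^pies?/i;
const endCap = /pies?$/i;

// ^ only matches at the very start of the whole string.
const startHits = [
  startCap.test('Pie shops need better marketing'),
  startCap.test('The coolest pie shop'),
];
console.log(startHits); // [ true, false ]

// $ only matches at the very end of the whole string.
const endHits = [
  endCap.test('West Coast Pies'),
  endCap.test('West Coast Pies are the best'),
];
console.log(endHits); // [ true, false ]

// Assumption beyond the text above: with the m flag, each line
// gets its own start and end.
const twoLines = 'apple tart\npie chart';
const multiline = [/^pie/.test(twoLines), /^pie/m.test(twoLines)];
console.log(multiline); // [ false, true ]
```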
\
The escape character is used for "escaping" the special properties of a character, allowing them to be used literally. This is essential for capturing characters such as periods, pipes, asterisks, question marks, and forward or backward slashes, to name a few examples. Example: let escapeCap = /pies\?/
would match "pies?" but not "pies" or "pie?" or "pie".
\b
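The word boundary special character matches the position between a word character and a non-word character (or the start or end of the string), and is surprisingly overlooked! Example: let boundaryCap = /\bpies?/
will capture the pie
in "Where's the pie?" but not the pie in "Is that a magpie?" Note that a hyphen is not a word character, so the pie in "Apple-pie" still sits on a boundary and would match.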
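A short sketch of escaping and word boundaries in practice, with made-up test strings. Two details are easy to trip over: a pattern passed to the constructor as a string needs its backslashes doubled, and \b treats a hyphen as a non-word character.

```javascript
// Escaping: \? means a literal question mark.
const escapeCap = /pies\?/;
const escaped = [escapeCap.test('pies?'), escapeCap.test('pies')];
console.log(escaped); // [ true, false ]

// In a string, the backslash itself must be escaped to reach the regex.
const fromString = new RegExp('pies\\?');
console.log(fromString.source, fromString.test('pies?')); // pies\? true

// Word boundaries: \b sits between a word and a non-word character.
const boundaryCap = /\bpies?/;
const boundaries = {
  afterSpace: boundaryCap.test("Where's the pie?"), // space before 'p'
  midWord: boundaryCap.test('magpie'),              // 'g' before 'p'
  afterHyphen: boundaryCap.test('Apple-pie'),       // '-' before 'p'
};
console.log(boundaries); // { afterSpace: true, midWord: false, afterHyphen: true }
```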
[]
This is a character class, and it serves to match any one of the characters in the brackets. Example: let groupCap = /[cbr]at/
will match cat, bat, or rat, but not mat or sat (in brat, it matches only the rat portion). Character classes can be followed by Kleene stars, Kleene pluses, ?, or brace quantifiers ({}) to modify their behavior.
\w
This matches any basic alphanumeric character or an underscore; equivalent to [a-zA-Z0-9_].
\d
This matches any digit character; equivalent to [0-9].
\s
This matches any white space; this includes spaces, tabs, line breaks, etc.
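Character classes and the shorthand classes can be sketched together like this; the input strings are invented for the example.

```javascript
// [cbr] matches exactly one of c, b, or r.
const groupCap = /[cbr]at/;
const classHits = [groupCap.test('cat'), groupCap.test('mat')];
console.log(classHits); // [ true, false ]

// Nothing anchors the pattern, so inside 'brat' it finds 'rat'.
const insideBrat = 'brat'.match(groupCap)[0];
console.log(insideBrat); // rat

// \w+ : a run of letters, digits, or underscores.
const wordRun = 'order_42 shipped'.match(/\w+/)[0];
console.log(wordRun); // order_42

// \d+ : a run of digits.
const digitRun = 'Room 101'.match(/\d+/)[0];
console.log(digitRun); // 101

// \s+ : a run of spaces, tabs, or line breaks, handy for splitting.
const pieces = 'a \t\n b'.split(/\s+/);
console.log(pieces); // [ 'a', 'b' ]
```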
Mix and match!
Those are all the basic ingredients to start creating useful regex matches - thankfully, the result is far more exciting than the sum of the parts, in this case. With a bit of practice combining the elements in different ways, the possibilities start becoming easier to imagine rather quickly. Searching for simple .com email addresses can be as easy as let emailCap = /\b(\w+)@(\w+)\.com\b/
, and phone numbers can be parsed out with a capture similar to let phoneCap = /\b(?:(\d{3})[\s.-])?(\d{3})[\s.-](\d{4})\b/
, which would match any phone number in the form of "(boundary)(optional three digit area code followed by a space, literal period, or hyphen)(any three digits)(space or literal period or hyphen)(any four digits)(boundary)". The (?: ) wrapper groups the area code with its separator without capturing it, so the two are optional together.
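As a rough sketch of putting such patterns to work, the snippet below pulls every email and phone number out of a string with matchAll(). The sample text, the g flag, and the shape of the result objects are assumptions for illustration. The patterns are deliberately simple: the email pattern uses escaped backslashes and an escaped dot, and the phone pattern keeps the area code and its separator optional as a pair.

```javascript
// Hypothetical sample text, made up for this example.
const sample = 'Reach bob@example.com or 555-867-5309; fax 867.5309.';

// The g flag is required by matchAll().
const emailCap = /\b(\w+)@(\w+)\.com\b/g;
const phoneCap = /\b(?:(\d{3})[\s.-])?(\d{3})[\s.-](\d{4})\b/g;

// Each match is an array: [fullMatch, group1, group2, ...].
const emails = [...sample.matchAll(emailCap)].map((m) => ({
  user: m[1],
  domain: m[2],
}));

const phones = [...sample.matchAll(phoneCap)].map((m) => ({
  area: m[1], // undefined when no area code is present
  prefix: m[2],
  line: m[3],
}));

console.log(emails.length, emails[0].user, emails[0].domain); // 1 bob example
console.log(phones.length);                    // 2
console.log(phones[0].area, phones[0].prefix); // 555 867
console.log(phones[1].area, phones[1].line);   // undefined 5309
```

Because \w does not cover dots or hyphens, an address like first.last@example.com would only be partially captured; real-world email matching usually needs a more permissive pattern.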
All of this is really just scratching the surface of the utility that Regex provides! I hope that this gives you a decent jumping off point to start playing with regular expressions and to start seeing how they might help you with your next text parsing task. Good luck!
Further reading
- MDN Docs - the gold standard for documentation, and an incredible resource for digging much deeper into how Regex can be utilized with JavaScript.
- Regex Tester - a great utility for testing your captures.
- Regex Crossword - a fun way to learn Regex; different difficulty levels let you learn quickly and painlessly.
- RegexOne - a clean, structured introduction to Regex basics.