JS Regex
May 22, 2020 by Jane

JS strings have search(), match() and replace() methods that accept regular expressions. Let's take a look at how to work with regex!

 

* Notice that regular expressions are not surrounded by single/double quotes; instead they are surrounded by forward slashes, i.e. /pattern/[modifier]

 

Modifiers: (*these modifiers can be combined - simply write them one after another, order is not important)

Case-insensitive - i

E.g. var str = "Hello"; str.search(/hello/i) will return 0, but str.search("hello") will return -1 since the H in the string is capitalized (a plain string argument is treated as a case-sensitive pattern).
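Here is a small sketch of the case-insensitive flag in action. The variable names and the sample string are just for illustration.

```javascript
// Hypothetical sample string for illustration.
const greeting = "Hello";

// With the i flag the capital H still matches, at index 0.
const withFlag = greeting.search(/hello/i);

// Without the flag the search is case-sensitive and finds nothing.
const withoutFlag = greeting.search(/hello/);

console.log(withFlag, withoutFlag); // 0 -1
```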

 

Global search - g

Find all matches in the given string, do not stop after finding the first occurrence.
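To see the difference the g flag makes, here is a minimal sketch using match(). The sample text is made up.

```javascript
// Hypothetical sample text for illustration.
const text = "cat bat cat";

// Without g: only the first occurrence is reported (with its index).
const first = text.match(/cat/);

// With g: every occurrence is returned in an array.
const all = text.match(/cat/g);

console.log(first[0], first.index); // cat 0
console.log(all);                   // [ 'cat', 'cat' ]
```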

 

Multiline search - m

Make ^ match at the beginning of each line and $ at the end of each line (lines are separated by \n or \r in the string), instead of only at the beginning and end of the whole string.
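A quick sketch of the m flag, assuming a made-up three-line string. The g flag is added only so that match() returns every hit.

```javascript
// Hypothetical multi-line string for illustration.
const lines = "one\ntwo\nthree";

// Without m, ^ only matches at the very start of the string.
const startOfString = lines.match(/^\w+/g);

// With m, ^ also matches right after each line break.
const startOfEachLine = lines.match(/^\w+/gm);

console.log(startOfString);   // [ 'one' ]
console.log(startOfEachLine); // [ 'one', 'two', 'three' ]
```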

 

Metacharacters and character classes:

In summary: placing characters and patterns next to each other implies AND, and using a pipe "|" to separate them implies OR. Placing a ^ as the first character inside square brackets implies NOT. With these we can build logical AND/OR/NOT statements to filter out the strings we don't want or find the strings we want.

Find any character/digits - [matches]

E.g. 1. To find any of the characters e, d, c, use [edc]

E.g. 2. To find any digit in 0-9, use [0-9]

E.g. 3. To find any character in the range a-c, use [a-c]

E.g. 4. To find any character in the range d-g or any digit in 2-4, use [d-g2-4]
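The four character classes above can be tried out like this. The sample string is an assumption, and the g flag is used so that every matching character is listed.

```javascript
// Hypothetical sample string for illustration.
const sample = "abc def 123 456";

const anyOfEdc = sample.match(/[edc]/g);       // any of e, d, c
const anyDigit = sample.match(/[0-9]/g);       // any digit 0-9
const aToC = sample.match(/[a-c]/g);           // any letter a-c
const mixedRanges = sample.match(/[d-g2-4]/g); // d-g or 2-4

console.log(anyOfEdc);    // [ 'c', 'd', 'e' ]
console.log(anyDigit);    // [ '1', '2', '3', '4', '5', '6' ]
console.log(aToC);        // [ 'a', 'b', 'c' ]
console.log(mixedRanges); // [ 'd', 'e', 'f', '2', '3', '4' ]
```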

 

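Find any character that is not one of those specified - [^matches]

E.g. To find any character that is not a digit in the range 0-4, use [^0-4] (note that this also matches letters, spaces and punctuation, not just the digits 5-9)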
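A short sketch of a negated class, using a made-up string. It shows that [^0-4] matches anything outside 0-4, letters included.

```javascript
// Hypothetical sample string for illustration.
const code = "a1b5c9";

// Everything except the digits 0-4 is matched, so the "1" is skipped.
const notZeroToFour = code.match(/[^0-4]/g);

console.log(notZeroToFour); // [ 'a', 'b', '5', 'c', '9' ]
```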

 

Find any of these patterns - ( pattern1|pattern2|pattern3...)

E.g. Find any words that match "ok" or "all" use (ok|all)

E.g. 2. Find any that has digits 0-4 or letter a use ([0-4]|a)
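Both alternation examples can be sketched as follows. The input strings are assumptions made for illustration.

```javascript
// Hypothetical sample strings for illustration.
const message = "ok, all good";
const mixed = "a7b2";

// Either "ok" or "all".
const okOrAll = message.match(/(ok|all)/g);

// Either a digit 0-4 or the letter a.
const lowDigitOrA = mixed.match(/([0-4]|a)/g);

console.log(okOrAll);     // [ 'ok', 'all' ]
console.log(lowDigitOrA); // [ 'a', '2' ]
```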

 

Find any alphanumeric character - \w (shorthand for [A-Za-z0-9_]; note that it also matches the underscore)

Find any non-alphanumeric character - \W (equivalent to [^A-Za-z0-9_])

 

Find any digit - \d

This is equivalent to [0-9]

 

Find any non-digit - \D

This is equivalent to [^0-9]

 

Find any whitespace - \s

Find any non-whitespace - \S

 

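Match at a word boundary - \bCHARS or CHARS\b (\b matches the position where a word character sits next to a non-word character or the start/end of the string, without consuming any characters; \bCHARS finds CHARS at the start of a word, CHARS\b finds CHARS at the end of a word)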
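Here is a rough sketch of the shorthand classes and the word boundary \b (all written with a backslash). The sample strings are made up for illustration.

```javascript
// Hypothetical sample strings for illustration.
const tiny = "a1 _!";
const words = "cat concat cats";

const wordChars = tiny.match(/\w/g);    // letters, digits and underscore
const nonWordChars = tiny.match(/\W/g); // everything else
const digits = tiny.match(/\d/g);
const nonDigits = tiny.match(/\D/g);
const spaces = tiny.match(/\s/g);
const nonSpaces = tiny.match(/\S/g);

// \b is a zero-width position between a word and a non-word character.
const startsWord = words.match(/\bcat/g);   // "cat" and the cat in "cats"
const endsWord = words.match(/cat\b/g);     // "cat" and the cat in "concat"
const wholeWord = words.match(/\bcat\b/g);  // only the standalone "cat"

console.log(wordChars, nonWordChars, digits, nonDigits, spaces, nonSpaces);
console.log(startsWord.length, endsWord.length, wholeWord.length); // 2 2 1
```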

 

Dot wildcard "."

means match any single character (except a line break), so /.un/ will match fun, sun, pun, run etc.
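A tiny sketch of the dot wildcard. The word list is taken from the example above, and the variable names are made up.

```javascript
const dotUn = /.un/;

// Every one of these has some character in front of "un".
const candidates = ["fun", "sun", "pun", "run"];
const allMatch = candidates.every((word) => dotUn.test(word));

// The dot needs an actual character, so a bare "un" does not match.
const bareUn = dotUn.test("un");

console.log(allMatch, bareUn); // true false
```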

 

Quantifiers, anchors, lookaheads and groups:

?    means 0 or 1 (memory aid: 0/1 is binary or conditional (true/false), and the ternary operator uses a question mark "?")

+    means 1+ (memory aid: 1+, one or more). This can be a bit tricky sometimes: to match a single alphanumeric character we use /\w/, but to match a run of one or more alphanumerics we use /\w+/. In a string like "hi, how are you", matching with /\w/g will return [h,i,h,o,w,a,r,e,y,o,u] but /\w+/g will return [hi,how,are,you]

*    means 0 or more, i.e. any quantity (memory aid: select * means select any in SQL or * meaning wildcard)

^    means beginning of string (memory aid: start on a good note -- goes up -- the caret looks like an up arrow). * This is different from the NOT operator in the metacharacters section, where the caret goes inside the square brackets: [^pattern] means exclude these characters from the results, while ^[pattern] means match the pattern at the beginning of the string. It doesn't have to be used with []; it can also be used simply like /^Cal/ - find Cal at the beginning of the string.

$    means end of string  (memory aid: one gets paid at the end of the day -- dollar sign means end) 
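The quantifiers ?, + and * and the anchors ^ and $ can be sketched like this. Apart from the "hi, how are you" and /^Cal/ examples from the text, the patterns and strings are assumptions for illustration.

```javascript
// ? : the "u" may appear zero times or once.
const optionalU = [/colou?r/.test("color"), /colou?r/.test("colour"), /colou?r/.test("colouur")];

// + : one or more word characters are grouped into whole words.
const singleChars = "hi, how are you".match(/\w/g);
const wholeWords = "hi, how are you".match(/\w+/g);

// * : zero or more "b" characters.
const zeroOrMore = [/ab*c/.test("ac"), /ab*c/.test("abbbc")];

// ^ and $ : anchor the pattern to the start or end of the string.
const startsWithCal = [/^Cal/.test("California"), /^Cal/.test("Local Cal")];
const endsWithEnd = [/end$/.test("the end"), /end$/.test("endless")];

console.log(optionalU);     // [ true, true, false ]
console.log(singleChars.length, wholeWords); // 11 [ 'hi', 'how', 'are', 'you' ]
console.log(zeroOrMore);    // [ true, true ]
console.log(startsWithCal); // [ true, false ]
console.log(endsWithEnd);   // [ true, false ]
```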

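?=    (a.k.a. positive lookahead) means immediately followed by; the lookahead text is checked but is not part of the returned match. E.g. [pattern1](?=[pattern2]) means return the string that matches pattern1 only if a string matching pattern2 comes right after it. To allow pattern2 to be some n chars away, write [pattern1](?=.*[pattern2])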

?!    (a.k.a. negative lookahead) is the opposite of ?= E.g. [pattern1](?![pattern2]) means return the string only if it matches pattern1 and is not immediately followed by pattern2. To require that pattern2 does not appear anywhere after it, write [pattern1](?!.*[pattern2])
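A minimal sketch of both lookaheads, using made-up words. One thing worth noticing is that a lookahead checks the text right at the current position; to look further ahead, the lookahead itself has to contain something like .*

```javascript
// Positive lookahead: q only when a u comes right after it.
const qThenU = [/q(?=u)/.test("quit"), /q(?=u)/.test("Iraq")];

// The lookahead text is checked but not included in the match.
const matchedText = "quit".match(/q(?=u)/)[0];

// Negative lookahead: q only when no u comes right after it.
const qNoU = [/q(?!u)/.test("Iraq"), /q(?!u)/.test("quit")];

// "Right after" vs "somewhere after": add .* inside the lookahead.
const immediate = /a(?=c)/.test("abc");   // b sits in between
const anywhere = /a(?=.*c)/.test("abc");  // c appears later on

console.log(qThenU, matchedText, qNoU); // [ true, false ] q [ true, false ]
console.log(immediate, anywhere);       // false true
```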

{a,b}     indicates the min and max count of the occurrence. If b is not given ({a,}) it means a or more occurrences; if the comma is not included and simply {a} is provided, it means exactly a occurrences.
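The three forms of the count quantifier can be sketched as below. The ^ and $ anchors are added here (an assumption, not part of the description above) so that the whole string has to fit the count.

```javascript
// {3}: exactly three digits.
const exactlyThree = [/^\d{3}$/.test("123"), /^\d{3}$/.test("12")];

// {2,4}: between two and four digits.
const twoToFour = [/^\d{2,4}$/.test("123"), /^\d{2,4}$/.test("12345")];

// {2,}: two or more digits.
const twoOrMore = [/^\d{2,}$/.test("1234567"), /^\d{2,}$/.test("1")];

console.log(exactlyThree, twoToFour, twoOrMore);
// [ true, false ] [ true, false ] [ true, false ]
```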

([pattern])     Match groups - regex can also be used to extract information for further processing. For example you could use a pattern such as ^(IMG\d+\.png)$ to capture and extract full image filenames, but if you only wanted to capture the filename without the extension, you could use the pattern ^(IMG\d+)\.png$ which only captures the part before the period.

  • Capture groups can also be nested - you could extract both the filename and the picture number using the same pattern by writing an expression like ^(IMG(\d+))\.png$ (using a nested parenthesis to capture the digits).
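Here is a small sketch of the nested capture groups described above. The filename is a made-up example.

```javascript
// Hypothetical filename for illustration.
const fileName = "IMG0042.png";

const groups = fileName.match(/^(IMG(\d+))\.png$/);

// groups[0] is the whole match, groups[1] the outer group,
// groups[2] the nested group with just the digits.
console.log(groups[0]); // IMG0042.png
console.log(groups[1]); // IMG0042
console.log(groups[2]); // 0042
```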

([pattern])\n    (a.k.a. backreference) must be preceded by a pattern surrounded by parentheses (a capture group). The number n refers to which capture group should be repeated, so \1 repeats whatever text the first group matched. E.g. /(\w+)\s\1/ means find two words that are the same, separated by a space.
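A quick sketch of the \1 example, with made-up input strings. The last two lines show a gotcha: without word boundaries the pattern can also match inside words, so adding \b (my addition, not part of the original pattern) makes it stricter.

```javascript
const repeated = /(\w+)\s\1/;

const sameWordTwice = repeated.test("hello hello"); // same word twice
const differentWords = repeated.test("hello world"); // nothing repeats

// "this is" contains "is is" (the end of "this" plus "is").
const partialWord = repeated.test("this is");

// Word boundaries force both sides to be whole words.
const wholeWordsOnly = /\b(\w+)\s\1\b/.test("this is");

console.log(sameWordTwice, differentWords); // true false
console.log(partialWord, wholeWordsOnly);   // true false
```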

 

Common Regex Methods/Functions:

  • test([string]) method:
    • A simple way to check whether a regular expression matches anywhere in a string is to use the test() method. It returns a boolean.
      • [regex].test([str]); E.g. myRegex.test(myString); 
  • match([regex]) string method:
    • Tries to find the regex in the string. Without the g flag the result is an array holding the matched string followed by any captured groups, and it also has the properties index and input. With the g flag the result is an array of all matched strings, without those properties. If nothing matches, the result is null.
      • str.match([regex]) E.g. myString.match(myRegex)
      • if found (without the g flag), retval[0] is the matched string, retval.index is the index of the match in the input, and retval.input is the original string
  • replace([regex], [replace string]) method:
    • Finds the regex and replaces it with the replace string specified. Without the g flag only the first match is replaced.
    • If using capture groups, the replace string can refer to the captured groups with $[order] ($1, $2, ...), for example to swap their order as follows:
      • let str = "one two three";
      • let fixRegex = /(\w+)\s(\w+)\s(\w+)/; // this will capture "one", "two", "three" in the three capture groups 
      • let replaceText = "$3 $2 $1"; // this will reverse the order of the words
      • let result = str.replace(fixRegex, replaceText); // output will be "three two one"
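The three methods above can be tried together in a short sketch. myRegex and myString follow the names used in the list, while their values are made up for illustration.

```javascript
// Hypothetical pattern and input for illustration.
const myRegex = /(\d+)-(\d+)/;
const myString = "range 10-20";

// test(): is there a match at all?
const found = myRegex.test(myString);

// match() without the g flag: matched text, groups, index and input.
const retval = myString.match(myRegex);

// replace() with $1 and $2 swapping the captured groups.
const swapped = myString.replace(myRegex, "$2-$1");

// Without g only the first match is replaced; with g all of them are.
const firstOnly = "a-a-a".replace(/a/, "b");
const everyMatch = "a-a-a".replace(/a/g, "b");

console.log(found);                   // true
console.log(retval[0], retval.index); // 10-20 6
console.log(swapped);                 // range 20-10
console.log(firstOnly, everyMatch);   // b-a-a b-b-b
```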

 

Greedy and Lazy matches:

A greedy match finds the longest possible substring that matches the given regex. This is the default mode of regex matching.

A lazy match finds the shortest. To get a lazy match, put ? right after the quantifier * or +.

E.g.

let a = "helllllllo";
let r = /hel*/;  // a.match(r) will return "helllllll", the longest match possible

// if r = /hel*?/; the match will return "he", the shortest match possible

 

E.g. 2

let a = "helllllllo";

let r = /he.+/; // a.match(r) will return "helllllllo", the whole string, because matching is greedy

// r = /he.+?/; will return "hel", the shortest lazy match
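Another place where greedy vs lazy matters is picking out a tag from a snippet of markup. This is only a sketch with a made-up string, not a recommendation to parse HTML with regex.

```javascript
// Hypothetical snippet for illustration.
const snippet = "<b>bold</b>";

// Greedy: .+ grabs as much as it can, up to the last ">".
const greedy = snippet.match(/<.+>/)[0];

// Lazy: .+? stops at the first ">" it can.
const lazy = snippet.match(/<.+?>/)[0];

console.log(greedy); // <b>bold</b>
console.log(lazy);   // <b>
```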

 

For more advanced topics in regex:

https://www.w3schools.com/jsref/jsref_regexp_dot.asp