Pattern Matching - Characters
Literal Characters
- will match the exact character in the string
- includes:
- digits (0-9)
- Uppercase and lowercase letters (A-Z, a-z)
- escaped characters
Escaped Characters
- there are few character the require escaping to use literally
- backlash () turns off the special meaning of a character
- Characters required to be preceded with a backslash()
- ^$+?*()\
Literal Examples
'Hello!'
'Goodbye\.' # escape the dot
'End of the line\.\n' # escape the dot, white space new line
'\$100\.00' # escape the dollar sign
'24\+36=60' # escape the + sign
'\\t means tab'
Character Classes
- Contains a range or series of characters
- Correspond to matching a single character
- Surrounded by square brackets
Defining Character Classes
- within square brackets []:
- we place one or more literal characters in it
- we can also place a character range using a start character, a dash and an ending character by specifying a starting character, a dash and an ending character Caret (^) [invert he character class by placing] at the beginning of indicates inverse matching
Character Classes Examples
[ABCD] # match only A, B, C or D
[aeiou] # match only a,e,i,o,u
[B-E] # match only B, C, D or E
[^B-E] # match anything but B, C, D or E
[a-z0-5] # match letter from a-z(lowercase) or number 0-5.
[^A-z] # match only non alphabet characters
Character Class Matching
screenshots
Character Classes Shorthand
- Shorthand to frequently used character classes:
- Alphanumeric, numeric and whitespace ranges
- Inverse of the above ranges
- The dot (.) character matches any character
Shorthand Character Classes
Shorthand | Equivalent | Description |
---|---|---|
. |
. |
match any single character except newline |
\w |
[A-z0-9_] |
alphanumeric and _ |
\W |
[^A-z0-9_] |
non - alphanumerics |
\d |
[0-9] |
digits |
\D |
[^0-9] |
non digits |
\s |
[ \b\f\n\r\t\v] |
whitespace |
\S |
[^ \b\f\n\r\t\v] |
non whitespace |
Shorthand Examples
Pattern | Description |
---|---|
... |
match any three characters |
\d\d |
match any two digits |
\w\w\w |
match any three alphanumeric characters |
\S |
match a single - non whitespace character |
[A-z]\W |
match one alphabet followed by a non white space |
\d\d\D\D |
match two digits followed by two non-digits |
\s\w\s |
match one alphanumeric characters surrounded by whitespace |
Shorthand Matching Examples
- three consecutive digits match
Pattern | String |
---|---|
[0-9][0-9][0-9] |
510 |
\d\d\d |
510 |
\d\d\d |
94704 |
\d\d\d |
x300 |
\d\d\d |
42 [no match] |