Pattern Matching - Repetition Qualifiers
- affects the count to match from the preceding character or group
- Can have a variable or specific range
Repetition Qualifier Characters
- match 0 or more occurrences ? match 0 or 1 occurrence of the previous entity
- match 1 or more occurrences
Specific Repetition Qualifiers
{n}
# match exactly n occurrences
{n,}
# match at least n occurrences
{n,m}
# match between n and m occurrences
{,m}
# match between 0 and m occurrences
Repetition Example 1
\d*
match 0 or more consecutive digits
Pattern | String | result |
---|---|---|
\d* |
(empty string) |
match |
\d* |
8 |
match |
\d* |
1234 |
match |
\d* |
12345 |
match |
Repetition Example 2
\d?
match 0 or one consecutive digit
Pattern | String | result |
---|---|---|
\d? |
(empty string) |
match |
\d? |
8 |
match |
\d? |
1234 |
match |
\d? |
12345 |
match |
Repetition Example 3
\d+
match 1 or more consecutive digit
Pattern | String | result |
---|---|---|
\d+ |
(empty string) |
no match |
\d+ |
8 |
match |
\d+ |
1234 |
match |
\d+ |
12345 |
match |
Repetition Example 4
\d{5}
match any 5 consecutive digit
Pattern | String | result |
---|---|---|
\d{5} |
(empty string) |
no match |
\d{5} |
8 |
no match |
\d{5} |
1234 |
no match |
\d{5} |
12345 |
match |
Repetition Example 5
\d{1,4}
match 1 to 4 consecutive digit
Pattern | String | result |
---|---|---|
\d{1,4} |
(empty string) |
match |
\d{1,4} |
8 |
match |
\d{1,4} |
1234 |
match |
\d{1,4} |
12345 |
match |
Greedy Qualifiers
- Match as much text as possible (Greedy)
- Include all repetition qualifiers (*,+,?)
- may lead to unexpected results
- avoid using . together with greedy qualifiers (it matches pretyt much everything)
Greedy Example
import re
s = '<tag>data</tag>'
pattern = '<.*>'
match = re.search(pattern,s)
print(match.group(0))
# Output <tag>data</tag>
Handling Greedy Matching
- use alternaties to repetition qualifiers (to avoid greedy )
- to turn off greedy matching add the ? right after the qualifier (makes it non-greedy)
- Use specific or custom character classes instead of the dot (.)
Non-Greedy Matching
import re
s = '<tag>data</tag>'
pattern = '<.*?>'
match = re.search(pattern,s)
print(match.group(0))
# Output <tag>