Pattern Matching - Repetition Qualifiers
- affects the count to match from the preceding character or group
- Can have a variable or specific range
Repetition Qualifier Characters
- match 0 or more occurrences ? match 0 or 1 occurrence of the previous entity
- match 1 or more occurrences
Specific Repetition Qualifiers
{n}
# match exactly n occurrences
{n,}
# match at least n occurrences
{n,m}
# match between n and m occurrences
{,m}
# match between 0 and m occurrences
Repetition Example 1
- \d*match 0 or more consecutive digits
| Pattern | String | result | 
|---|---|---|
| \d* | (empty string) | match | 
| \d* | 8 | match | 
| \d* | 1234 | match | 
| \d* | 12345 | match | 
Repetition Example 2
- \d?match 0 or one consecutive digit
| Pattern | String | result | 
|---|---|---|
| \d? | (empty string) | match | 
| \d? | 8 | match | 
| \d? | 1234 | match | 
| \d? | 12345 | match | 
Repetition Example 3
- \d+match 1 or more consecutive digit
| Pattern | String | result | 
|---|---|---|
| \d+ | (empty string) | no match | 
| \d+ | 8 | match | 
| \d+ | 1234 | match | 
| \d+ | 12345 | match | 
Repetition Example 4
- \d{5}match any 5 consecutive digit
| Pattern | String | result | 
|---|---|---|
| \d{5} | (empty string) | no match | 
| \d{5} | 8 | no match | 
| \d{5} | 1234 | no match | 
| \d{5} | 12345 | match | 
Repetition Example 5
- \d{1,4}match 1 to 4 consecutive digit
| Pattern | String | result | 
|---|---|---|
| \d{1,4} | (empty string) | match | 
| \d{1,4} | 8 | match | 
| \d{1,4} | 1234 | match | 
| \d{1,4} | 12345 | match | 
Greedy Qualifiers
- Match as much text as possible (Greedy)
- Include all repetition qualifiers (*,+,?)
- may lead to unexpected results
- avoid using . together with greedy qualifiers (it matches pretyt much everything)
Greedy Example
import re
s = '<tag>data</tag>'
pattern = '<.*>'
match = re.search(pattern,s)
print(match.group(0))
# Output <tag>data</tag>
Handling Greedy Matching
- use alternaties to repetition qualifiers (to avoid greedy )
- to turn off greedy matching add the ? right after the qualifier (makes it non-greedy)
- Use specific or custom character classes instead of the dot (.)
Non-Greedy Matching
import re
s = '<tag>data</tag>'
pattern = '<.*?>'
match = re.search(pattern,s)
print(match.group(0))
# Output <tag>