Python Regular Expressions Cheat Sheet
Basic Syntax
Every regular expression operation in Python requires importing the re module:
import re
Special Characters
Character | Description |
---|---|
. | Matches any character except a newline. |
^ | Matches the start of a string. |
$ | Matches the end of a string, or just before a trailing newline at the end. |
* | Matches 0 or more repetitions of the preceding pattern. |
+ | Matches 1 or more repetitions of the preceding pattern. |
? | Matches 0 or 1 repetition of the preceding pattern. |
{n} | Matches exactly n repetitions of the preceding pattern. |
{n,} | Matches n or more repetitions of the preceding pattern. |
{n,m} | Matches between n and m repetitions of the preceding pattern. |
[] | Matches any single character in brackets. |
[^] | Matches any single character not in brackets. |
\ | Escapes a special character. |
\| | Matches either the pattern before or after the \| . |
() | Groups patterns. |
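The table above can be exercised with a few one-liners. This is a minimal sketch; the sample strings are made up for illustration and are not from any particular data set.

```python
import re

# ? makes the preceding item optional: matches "color" and "colour".
optional = re.findall(r"colou?r", "color colour colr")

# [] is a character set; [^] negates it.
vowels = re.findall(r"[aeiou]", "regex")
non_vowels = re.findall(r"[^aeiou]", "regex")

# | is alternation; () limits its scope and captures the chosen branch.
pets = re.findall(r"(cat|dog)s", "cats and dogs")

# \ escapes a metacharacter so that it matches literally.
literal_dot = re.findall(r"\d\.\d", "3.5 and 3x5")

print(optional)     # ['color', 'colour']
print(vowels)       # ['e', 'e']
print(non_vowels)   # ['r', 'g', 'x']
print(pets)         # ['cat', 'dog']
print(literal_dot)  # ['3.5']
```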
Character Classes
Character | Description |
---|---|
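\d | Matches any digit; equivalent to [0-9] when the re.ASCII flag is set (str patterns otherwise also match other Unicode decimal digits). |
\D | Matches any non-digit; the inverse of \d . |
\w | Matches any word character (alphanumeric and underscore); equivalent to [a-zA-Z0-9_] when the re.ASCII flag is set (str patterns otherwise also match Unicode letters and digits). |
\W | Matches any non-word character; the inverse of \w . |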
\s | Matches any whitespace character (space, tab, newline). |
\S | Matches any non-whitespace character. |
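For str patterns, the shorthand classes are Unicode-aware unless re.ASCII is passed. The sketch below illustrates the difference; the sample text is an assumption chosen to contain one accented letter and one non-ASCII digit.

```python
import re

# "café", ARABIC-INDIC DIGIT THREE (U+0663), and "7_up"
text = "caf\u00e9 \u0663 7_up"

# Default behaviour for str patterns: Unicode letters and digits count.
unicode_words = re.findall(r"\w+", text)
unicode_digits = re.findall(r"\d", text)

# re.ASCII restricts \w, \d, and \s to their ASCII-only definitions.
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)
ascii_digits = re.findall(r"\d", text, flags=re.ASCII)

print(len(unicode_words))   # 3
print(len(unicode_digits))  # 2
print(ascii_words)          # ['caf', '7_up']
print(ascii_digits)         # ['7']
```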
Common Patterns
Pattern | Description |
---|---|
r"\b" | Matches a word boundary. |
r"\B" | Matches a non-word boundary. |
r"\A" | Matches the start of a string. |
r"\Z" | Matches the end of a string. |
r"\G" | Not supported by Python's re module (it is an anchor from other regex flavors); compiling it raises re.error. |
r"\n" | Matches a newline character. |
r"\t" | Matches a tab character. |
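A short sketch of how the boundary and anchor patterns behave in practice. The sample text is invented for illustration; re.MULTILINE is used only to contrast ^ with \A.

```python
import re

text = "cat concat cats\ncat"

# \b requires a word boundary on each side, so "concat" and "cats" are skipped.
whole_words = re.findall(r"\bcat\b", text)

# \B requires the absence of a boundary: "cat" preceded by a word character.
inside = re.findall(r"\Bcat", text)

# Under re.MULTILINE, ^ matches at the start of every line, while \A
# still matches only at the very start of the string.
line_starts = re.findall(r"^cat", text, flags=re.MULTILINE)
string_start = re.findall(r"\Acat", text, flags=re.MULTILINE)

print(whole_words)   # ['cat', 'cat']
print(inside)        # ['cat']
print(line_starts)   # ['cat', 'cat']
print(string_start)  # ['cat']
```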
re Module Functions
Compiling Regular Expressions
re.compile(pattern, flags=0)
: Compiles a regex pattern for reuse.
pattern = re.compile(r'\d+')
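A compiled pattern exposes the same operations as the module-level functions, as methods on the pattern object. A minimal sketch, with a made-up input string:

```python
import re

# Compile once, then call methods on the pattern object directly.
number = re.compile(r"\d+")
text = "Order 66 shipped 3 items"

first = number.search(text)
all_numbers = number.findall(text)
masked = number.sub("#", text)

print(first.group())  # 66
print(all_numbers)    # ['66', '3']
print(masked)         # Order # shipped # items
```

This is mainly useful when the same pattern is applied many times, for example inside a loop.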
Basic Functions
re.search(pattern, string, flags=0)
: Scans the string for the first location where the regex pattern produces a match and returns a corresponding Match object, or None if no position matches.
match = re.search(r'\d+', 'The price is 100 dollars')
if match:
    print(match.group()) # Output: 100
re.match(pattern, string, flags=0)
: Determines if the regex pattern matches at the start of the string.
match = re.match(r'\d+', '123 apples')
if match:
    print(match.group()) # Output: 123
re.fullmatch(pattern, string, flags=0)
: Checks if the entire string matches the regex pattern.
match = re.fullmatch(r'\d+', '12345')
if match:
    print(match.group()) # Output: 12345
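The three functions differ only in where the match is allowed to sit. The sketch below runs them against the same input (an arbitrary sample string) to make the contrast visible.

```python
import re

text = "abc123"

found = re.search(r"\d+", text)       # digits anywhere in the string
at_start = re.match(r"\d+", text)     # digits must begin at index 0
whole = re.fullmatch(r"\d+", text)    # the entire string must be digits
prefix = re.match(r"[a-z]+", text)    # letters do begin at index 0

print(found.group())   # 123
print(at_start)        # None
print(whole)           # None
print(prefix.group())  # abc
```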
Finding All Matches
re.findall(pattern, string, flags=0)
: Returns all non-overlapping matches of the pattern in the string as a list of strings.
matches = re.findall(r'\d+', 'There are 12 apples and 5 oranges')
print(matches) # Output: ['12', '5']
re.finditer(pattern, string, flags=0)
: Returns an iterator yielding Match objects over all non-overlapping matches.
matches = re.finditer(r'\d+', 'There are 12 apples and 5 oranges')
for match in matches:
    print(match.group()) # Output: 12, then 5 (one per line)
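One detail worth illustrating: the shape of the re.findall result depends on how many capturing groups the pattern has, whereas re.finditer always yields full Match objects. A small sketch with an invented key=value string:

```python
import re

text = "width=80, height=24"

# No groups: each item is the whole match.
whole = re.findall(r"\w+=\d+", text)
# One group: each item is that group's text.
values = re.findall(r"\w+=(\d+)", text)
# Several groups: each item is a tuple of group texts.
pairs = re.findall(r"(\w+)=(\d+)", text)
# finditer keeps the Match object, so positions are available too.
spans = [(m.group(1), m.span()) for m in re.finditer(r"(\w+)=\d+", text)]

print(whole)   # ['width=80', 'height=24']
print(values)  # ['80', '24']
print(pairs)   # [('width', '80'), ('height', '24')]
print(spans)   # [('width', (0, 8)), ('height', (10, 19))]
```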
Substitution
re.sub(pattern, repl, string, count=0, flags=0)
: Returns the string obtained by replacing the leftmost non-overlapping occurrences of the pattern with the replacement string.
result = re.sub(r'\d+', '#', 'There are 12 apples and 5 oranges')
print(result) # Output: There are # apples and # oranges
re.subn(pattern, repl, string, count=0, flags=0)
: Returns a tuple containing the new string and the number of substitutions made.
result, num_subs = re.subn(r'\d+', '#', 'There are 12 apples and 5 oranges')
print(result, num_subs) # Output: There are # apples and # oranges 2
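The replacement argument can also reference captured groups or be a callable, and count limits the number of replacements. A minimal sketch with made-up inputs:

```python
import re

# \1 and \2 in the replacement refer to the captured groups.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")

# A callable replacement receives each Match and returns the new text.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "3 apples, 10 pears")

# count limits how many occurrences are replaced (0 means all).
first_only = re.sub(r"\d+", "#", "1 2 3", count=1)

print(swapped)     # world hello
print(doubled)     # 6 apples, 20 pears
print(first_only)  # # 2 3
```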
Splitting Strings
re.split(pattern, string, maxsplit=0, flags=0)
: Splits the string by occurrences of the pattern.
parts = re.split(r'\s+', 'Split this string by spaces')
print(parts) # Output: ['Split', 'this', 'string', 'by', 'spaces']
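Two further behaviours of re.split are easy to overlook: maxsplit caps the number of splits, and capturing groups in the pattern keep the separators in the result. A short sketch, using invented sample strings:

```python
import re

# maxsplit caps the number of splits; the remainder stays in the last item.
limited = re.split(r"\s+", "one two three four", maxsplit=2)

# A capturing group in the pattern keeps the separators in the result.
with_seps = re.split(r"([,;])", "a,b;c")

# A character set handles several delimiters, with optional trailing spaces.
fields = re.split(r"[,;]\s*", "a, b;c,  d")

print(limited)    # ['one', 'two', 'three four']
print(with_seps)  # ['a', ',', 'b', ';', 'c']
print(fields)     # ['a', 'b', 'c', 'd']
```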
Flags
Flag | Description |
---|---|
re.IGNORECASE (re.I ) | Case-insensitive matching. |
re.MULTILINE (re.M ) | ^ and $ match the start and end of each line. |
re.DOTALL (re.S ) | . matches any character, including newline. |
re.VERBOSE (re.X ) | Allows for more readable regex with comments and whitespace. |
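The flags can be tried out as follows. This is a sketch only: the sample text and the date pattern are assumptions made for the example.

```python
import re

text = "First line\nsecond LINE"

ignore_case = re.findall(r"line", text, flags=re.IGNORECASE)
line_ends = re.findall(r"\w+$", text, flags=re.MULTILINE)
no_dotall = re.search(r"First.*LINE", text)
with_dotall = re.search(r"First.*LINE", text, flags=re.DOTALL)

# re.VERBOSE ignores pattern whitespace and allows inline comments.
date = re.compile(r"""
    (\d{4})  # year
    -
    (\d{2})  # month
    -
    (\d{2})  # day
""", re.VERBOSE)
parsed = date.match("2024-01-31").groups()

# Flags combine with the | operator.
combined = re.findall(r"^[a-z]+", text, flags=re.MULTILINE | re.IGNORECASE)

print(ignore_case)              # ['line', 'LINE']
print(line_ends)                # ['line', 'LINE']
print(no_dotall)                # None
print(with_dotall is not None)  # True
print(parsed)                   # ('2024', '01', '31')
print(combined)                 # ['First', 'second']
```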
Match Object Methods
When a match is found, a Match object (re.Match) is returned, providing several useful methods:
.group([group1, ...])
: Returns one or more subgroups of the match.
.start([group])
: Returns the starting position of the match.
.end([group])
: Returns the ending position of the match.
.span([group])
: Returns a tuple containing the start and end positions of the match.
match = re.search(r'(\d+)', 'The price is 100 dollars')
if match:
    print(match.group()) # Output: 100
    print(match.start()) # Output: 13
    print(match.end()) # Output: 16
    print(match.span()) # Output: (13, 16)
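The optional group argument accepts a group number or a name. The sketch below uses named groups, (?P<name>...), on an invented string; the group names amount and unit are chosen purely for illustration.

```python
import re

match = re.search(r"(?P<amount>\d+) (?P<unit>\w+)", "Total: 42 items")
if match:
    print(match.group(0))        # 42 items
    print(match.group(1))        # 42
    print(match.group("unit"))   # items
    print(match.groups())        # ('42', 'items')
    print(match.groupdict())     # {'amount': '42', 'unit': 'items'}
    print(match.span("amount"))  # (7, 9)
    print(match.start("unit"))   # 10
```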
Examples
Validate an Email Address
email_pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
email = 'example@example.com'
if re.match(email_pattern, email):
    print('Valid email')
else:
    print('Invalid email')
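A possible variation is to wrap the check in a helper that uses fullmatch on a compiled pattern. The helper name is_valid_email and the candidate addresses are hypothetical. One reason to prefer fullmatch here: $ also matches just before a trailing newline, so the ^...$ form accepts input that ends in "\n". Note that this pattern is a simple heuristic, not a complete check against the email address standards.

```python
import re

# Same character classes as the pattern above, without the ^ and $ anchors,
# because fullmatch already requires the whole string to match.
EMAIL = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

def is_valid_email(address):
    """Hypothetical helper: True if the whole string looks like an email."""
    return EMAIL.fullmatch(address) is not None

candidates = ["example@example.com", "no-at-sign.example.com", "user@host"]
results = {c: is_valid_email(c) for c in candidates}

# The anchored re.match form accepts a trailing newline; fullmatch does not.
anchored = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
loose = re.match(anchored, "example@example.com\n") is not None
strict = is_valid_email("example@example.com\n")

print(results["example@example.com"])     # True
print(results["no-at-sign.example.com"])  # False
print(results["user@host"])               # False
print(loose, strict)                      # True False
```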
Extract Phone Numbers
text = "Contact me at 123-456-7890 or 987.654.3210"
phone_pattern = r'\d{3}[-.]\d{3}[-.]\d{4}'
phones = re.findall(phone_pattern, text)
print(phones) # Output: ['123-456-7890', '987.654.3210']
This cheat sheet provides a quick reference to the most common regex patterns and functions in Python. For more complex regex patterns and usage, consider exploring the [Python re module documentation](https://docs.python.org/3/library/re.html).