Understanding Regular Expressions with `re` in Python
In Python, the `re` module is used for working with regular expressions (regex). Regular expressions are patterns that allow you to search, match, and manipulate strings. This tutorial explains how to use `re.findall()` to extract specific patterns from a string using regex.
We will break down how to write a regex pattern that matches numbers, including integers and floating-point numbers, with optional signs (`+` or `-`). Then, we'll provide multiple examples to show how it works in different scenarios.
Key Concepts
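- Pattern Construction: The pattern `r"[+-]?\d+(?:\.\d+)?"` allows us to:
  - `[+-]?`: Match an optional `+` or `-` sign.
  - `\d+`: Match one or more digits.
  - `(?:\.\d+)?`: Match an optional decimal point followed by one or more digits (for floating-point numbers). The group is non-capturing (`(?:...)`) so that `re.findall()` returns the whole number rather than only the group's text.
- `re.findall()`: This function searches for all non-overlapping occurrences of the pattern in the text and returns them as a list. If the pattern contains a capturing group, the list holds the group's text instead of the full match.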
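The effect of a capturing group on `re.findall()` is easy to overlook, so here is a minimal sketch comparing the two forms of the pattern. The variable names `capturing` and `non_capturing` are chosen only for illustration.

```python
import re

text = "Temperatures range from -3.5 to +27.85 degrees."

# With a capturing group, findall() returns the group's text,
# not the whole match.
capturing = re.findall(r"[+-]?\d+(\.\d+)?", text)

# With a non-capturing group (?:...), findall() returns the whole match.
non_capturing = re.findall(r"[+-]?\d+(?:\.\d+)?", text)

print(capturing)      # ['.5', '.85']
print(non_capturing)  # ['-3.5', '+27.85']
```

When the goal is to extract complete numbers, the non-capturing form is usually the one you want.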
Example 1: Matching Positive and Negative Integers
Let's first create a pattern that matches both positive and negative integers, with an optional `+` or `-` sign.
import re
# Pattern to match integers with optional sign
pattern = r"[+-]?\d+"
# Example text containing integers
text = "The numbers are +42, -17, and 100."
matches = re.findall(pattern, text)
print(matches) # Output: ['+42', '-17', '100']
Explanation:
- The pattern `r"[+-]?\d+"` matches both positive and negative integers (`+42` and `-17`), as well as unsigned integers (`100`).
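The matches are strings, so a common next step is converting them to `int`. The sketch below assumes that every match should be treated as a number. It also shows one caveat of the pattern: a hyphen used as a range separator is read as a minus sign. The `"pages 10-12"` string is a made-up example.

```python
import re

pattern = r"[+-]?\d+"

text = "The numbers are +42, -17, and 100."
# int() accepts a leading "+" or "-" sign.
values = [int(m) for m in re.findall(pattern, text)]
print(values)       # [42, -17, 100]
print(sum(values))  # 125

# Caveat: the hyphen in a range is treated as a sign.
range_matches = re.findall(pattern, "pages 10-12")
print(range_matches)  # ['10', '-12']
```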
Example 2: Matching Floating-Point Numbers
Next, we will use a pattern to match floating-point numbers, which may or may not have a decimal part.
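import re
# Pattern to match floating-point numbers.
# The fractional part is a non-capturing group (?:...) so that
# re.findall() returns whole matches rather than the group's text.
pattern = r"[+-]?\d+(?:\.\d+)?"
# Example text containing floating-point numbers
text = "Temperatures range from -3.5 to +27.85 degrees."
matches = re.findall(pattern, text)
print(matches) # Output: ['-3.5', '+27.85']
Explanation:
- The pattern `r"[+-]?\d+(?:\.\d+)?"` matches numbers like `-3.5` and `+27.85`. It handles signed and unsigned numbers, with or without a decimal part. The fractional part is written as `(?:\.\d+)?` because, with a capturing group `(\.\d+)?`, `re.findall()` would return only the captured text (`['.5', '.85']`).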
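If a pattern does need a capturing group, for example to read the fractional part separately, `re.finditer()` gives access to both the whole match and the group. This is a small sketch. The `results` list is just an illustrative way to collect the pairs.

```python
import re

# Capturing group kept on purpose to expose the fractional part.
pattern = r"[+-]?\d+(\.\d+)?"
text = "Temperatures range from -3.5 to +27.85 degrees."

# group(0) is the whole match; group(1) is the captured fractional part
# (None when the number has no decimal part).
results = [(m.group(0), m.group(1)) for m in re.finditer(pattern, text)]
print(results)  # [('-3.5', '.5'), ('+27.85', '.85')]
```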
Example 3: Matching Only Positive Numbers
You can modify the pattern to target positive numbers by allowing only an optional `+` sign in front of the digits.
import re
# Pattern to match positive numbers (integers and floats).
# The fractional part is non-capturing so findall() returns whole matches.
pattern = r"\+?\d+(?:\.\d+)?"
# Example text containing positive numbers
text = "Prices increased by +5.75 and 12.99."
matches = re.findall(pattern, text)
print(matches) # Output: ['+5.75', '12.99']
Explanation:
- By using `r"\+?\d+(?:\.\d+)?"`, the pattern matches numbers with or without a leading `+` and never includes a `-` in the result. Note that it does not skip negative numbers entirely: in a text containing `-5`, the digits `5` would still match.
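Dropping `-` from the sign class does not by itself skip negative numbers, since the digits of `-3.2` still match as `3.2`. One possible refinement, offered here as an assumption rather than part of the original pattern, is a negative lookbehind that rejects a match starting right after a `-`, a digit, or a `.`. The sample text and variable names are made up for illustration.

```python
import re

text = "Changes: +5.75, -3.2, 12.99, -8"

# Without a lookbehind, the digits of negative numbers still match.
loose = re.findall(r"\+?\d+(?:\.\d+)?", text)
print(loose)   # ['+5.75', '3.2', '12.99', '8']

# (?<![-\d.]) refuses to start a match right after "-", a digit, or ".",
# so negative numbers are skipped as a whole.
strict = re.findall(r"(?<![-\d.])\+?\d+(?:\.\d+)?", text)
print(strict)  # ['+5.75', '12.99']
```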
Example 4: Extracting Multiple Types of Numbers
Sometimes, you may want to extract both integers and floating-point numbers from a mixed string of text.
import re
# Pattern to match integers and floating-point numbers.
# The fractional part is non-capturing so findall() returns whole matches.
pattern = r"[+-]?\d+(?:\.\d+)?"
# Example text with both integers and floating-point numbers
text = "The values are -20, 15.75, +7, and 3.1415."
matches = re.findall(pattern, text)
print(matches) # Output: ['-20', '15.75', '+7', '3.1415']
Explanation:
- This pattern finds both integer (`-20`, `+7`) and floating-point (`15.75`, `3.1415`) numbers, handling both signed and unsigned numbers. With a capturing group `(\.\d+)?` instead, `re.findall()` would return `['', '.75', '', '.1415']`.
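Since the matches mix integers and floats, you may want to convert each one to the appropriate numeric type. The helper `to_number` below is a hypothetical name introduced for this sketch. It simply checks for a decimal point.

```python
import re

pattern = r"[+-]?\d+(?:\.\d+)?"
text = "The values are -20, 15.75, +7, and 3.1415."

def to_number(token):
    """Convert a matched string to float if it has a decimal point, else int."""
    return float(token) if "." in token else int(token)

values = [to_number(m) for m in re.findall(pattern, text)]
print(values)  # [-20, 15.75, 7, 3.1415]
```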
Example 5: Finding Numbers in a Complex String
In cases where the numbers are scattered across a complex string, you can still extract them easily using the same pattern.
import re
# Pattern to match signed and unsigned integers/floats.
# The fractional part is non-capturing so findall() returns whole matches.
pattern = r"[+-]?\d+(?:\.\d+)?"
# Complex string with embedded numbers
text = "In 2021, the growth was +6.5%, compared to -3.14% in 2020."
matches = re.findall(pattern, text)
print(matches) # Output: ['2021', '+6.5', '-3.14', '2020']
Explanation:
- This pattern extracts every numeric value from the string, whether positive or negative, whole number or floating point. That includes the years `2021` and `2020`, not just the percentages.
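The general pattern picks up every number, including the years. If only the percentages are wanted, one option is to append a lookahead that requires a `%` right after the number. This is a sketch of that variation, not part of the original pattern.

```python
import re

text = "In 2021, the growth was +6.5%, compared to -3.14% in 2020."

# (?=%) requires a "%" immediately after the number without consuming it,
# so the years are skipped and the "%" is not part of the result.
percentages = re.findall(r"[+-]?\d+(?:\.\d+)?(?=%)", text)
print(percentages)  # ['+6.5', '-3.14']
```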
Summary
In this tutorial, we explored various examples of how to use regular expressions in Python to extract signed and unsigned integers, floating-point numbers, and a mix of both. By modifying the regex pattern slightly, you can adjust it to match exactly what you need in your specific application.
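Key Patterns:
- `r"[+-]?\d+"`: Matches signed or unsigned integers.
- `r"[+-]?\d+(?:\.\d+)?"`: Matches signed or unsigned floating-point numbers and integers.
- `r"\+?\d+(?:\.\d+)?"`: Matches numbers with or without a leading `+` (it never includes a `-` in the match).
These patterns, combined with the `re.findall()` function, allow you to extract numerical data from a string. Use the non-capturing group `(?:...)` for the fractional part so that `re.findall()` returns whole numbers instead of the group's text.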
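When a pattern is reused across a program, it can help to compile it once and document each piece with `re.VERBOSE`. The sketch below is one way to do that. The name `NUMBER` and the sample string are assumptions made for illustration.

```python
import re

# re.VERBOSE ignores whitespace in the pattern and allows inline comments.
NUMBER = re.compile(
    r"""
    [+-]?        # optional sign
    \d+          # integer part
    (?:\.\d+)?   # optional fractional part (non-capturing)
    """,
    re.VERBOSE,
)

found = NUMBER.findall("a -1 b 2.5 c +3")
print(found)  # ['-1', '2.5', '+3']
```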