Published on October 24, 2024By DeveloperBreeze

Understanding Regular Expressions with `re` in Python

In Python, the re module is used for working with regular expressions (regex). Regular expressions are patterns that allow you to search, match, and manipulate strings. This tutorial explains how to use re.findall() to extract specific patterns from a string using regex.

We will break down how to write a regex pattern that matches numbers, including integers and floating-point numbers, with optional signs (+ or -). Then, we’ll provide multiple examples to show how it works in different scenarios.

Key Concepts

  1. Pattern Construction: The pattern r"[+-]?\d+(\.\d+)?" allows us to:
  • [+-]?: Match an optional + or - sign.
  • \d+: Match one or more digits.
  • (\.\d+)?: Match an optional decimal point followed by one or more digits (for floating-point numbers).
  1. re.findall(): This function searches for all occurrences of the pattern in the text and returns them as a list.

Example 1: Matching Positive and Negative Integers

Let's first create a pattern that matches both positive and negative integers, with an optional + or - sign.

import re

# Pattern to match integers with optional sign
pattern = r"[+-]?\d+"

# Example text containing integers
text = "The numbers are +42, -17, and 100."

matches = re.findall(pattern, text)
print(matches)  # Output: ['+42', '-17', '100']

Explanation:

  • The pattern r"[+-]?\d+" matches both positive and negative integers (+42 and -17), as well as unsigned integers (100).

Example 2: Matching Floating-Point Numbers

Next, we will use a pattern to match floating-point numbers, which may or may not have a decimal part.

import re

# Pattern to match floating-point numbers
pattern = r"[+-]?\d+(\.\d+)?"

# Example text containing floating-point numbers
text = "Temperatures range from -3.5 to +27.85 degrees."

matches = re.findall(pattern, text)
print(matches)  # Output: ['-3.5', '+27.85']

Explanation:

  • The pattern r"[+-]?\d+(\.\d+)?" matches numbers like -3.5 and +27.85. It can handle both signed and unsigned numbers with a decimal point.

Example 3: Matching Only Positive Numbers

You can modify the pattern to match only positive numbers by removing the optional negative sign.

import re

# Pattern to match only positive numbers (integers and floats)
pattern = r"\+?\d+(\.\d+)?"

# Example text containing positive numbers
text = "Prices increased by +5.75 and 12.99."

matches = re.findall(pattern, text)
print(matches)  # Output: ['+5.75', '12.99']

Explanation:

  • By using r"\+?\d+(\.\d+)?", the pattern only matches positive numbers, both with and without a leading +.

Example 4: Extracting Multiple Types of Numbers

Sometimes, you may want to extract both integers and floating-point numbers from a mixed string of text.

import re

# Pattern to match integers and floating-point numbers
pattern = r"[+-]?\d+(\.\d+)?"

# Example text with both integers and floating-point numbers
text = "The values are -20, 15.75, +7, and 3.1415."

matches = re.findall(pattern, text)
print(matches)  # Output: ['-20', '15.75', '+7', '3.1415']

Explanation:

  • This pattern finds both integer (-20, +7) and floating-point (15.75, 3.1415) numbers, handling both signed and unsigned numbers.

Example 5: Finding Numbers in a Complex String

In cases where the numbers are scattered across a complex string, you can still extract them easily using the same pattern.

import re

# Pattern to match signed and unsigned integers/floats
pattern = r"[+-]?\d+(\.\d+)?"

# Complex string with embedded numbers
text = "In 2021, the growth was +6.5%, compared to -3.14% in 2020."

matches = re.findall(pattern, text)
print(matches)  # Output: ['2021', '+6.5', '-3.14', '2020']

Explanation:

  • This pattern efficiently extracts all numeric values from a string, whether they are positive or negative, whole numbers or floating points.

Summary

In this tutorial, we explored various examples of how to use regular expressions in Python to extract signed and unsigned integers, floating-point numbers, and a mix of both. By modifying the regex pattern slightly, you can adjust it to match exactly what you need in your specific application.

Key Patterns:

  • r"[+-]?\d+": Matches signed or unsigned integers.
  • r"[+-]?\d+(\.\d+)?": Matches signed or unsigned floating-point numbers and integers.
  • r"\+?\d+(\.\d+)?": Matches only positive numbers (with or without a +).

These patterns, combined with the re.findall() function, allow you to efficiently extract numerical data from any string.

Comments

Please log in to leave a comment.

Continue Reading:

Validate Password Strength

Published on January 26, 2024

javascript

Python Regular Expressions (Regex) Cheatsheet

Published on August 03, 2024

python