How to Use Metacharacters in Python's re Module

Published: 2022-04-07 10:47:57 Author: iii
Source: Yisu Cloud (亿速云)

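In Python, re is the standard library module for working with regular expressions, a compact notation for matching, searching, and replacing patterns in text. Metacharacters are the core of that notation: each one carries a special meaning instead of matching itself, and together they define the matching rules. This article walks through the metacharacters supported by the re module and shows how each is used.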

1. Common metacharacters

1.1 `.` (dot)

The . metacharacter matches any single character except the newline \n.

import re

text = "cat, bat, hat, rat"
pattern = r"c.t"
matches = re.findall(pattern, text)
print(matches)  # Output: ['cat']
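The newline exception can be lifted with the re.DOTALL flag. The following is a small sketch with an illustrative input string (not from the example above) that contrasts the two behaviors:

```python
import re

text = "a\nb"

# By default "." does not match the newline, so "a.b" finds nothing here.
default_matches = re.findall(r"a.b", text)

# With re.DOTALL, "." also matches "\n".
dotall_matches = re.findall(r"a.b", text, flags=re.DOTALL)

print(default_matches)  # []
print(dotall_matches)   # ['a\nb']
```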

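1.2 `^` (caret)

The ^ metacharacter matches at the start of the string. The example uses re.search, because re.match already anchors at the start of the string and would make ^ redundant.

text = "hello world"
pattern = r"^hello"
match = re.search(pattern, text)
if match:
    print("Match succeeded")  # Output: Match succeeded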
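When the input spans several lines, the re.MULTILINE flag makes ^ match at the start of every line as well. A minimal sketch, using a made-up two-line string:

```python
import re

text = "hello world\nhello python"

# Without re.MULTILINE, "^" matches only at the very start of the string.
single = re.findall(r"^hello \w+", text)

# With re.MULTILINE, "^" also matches right after each newline.
multi = re.findall(r"^hello \w+", text, flags=re.MULTILINE)

print(single)  # ['hello world']
print(multi)   # ['hello world', 'hello python']
```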

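1.3 `$` (dollar sign)

The $ metacharacter matches at the end of the string, or just before a newline that ends the string.

text = "hello world"
pattern = r"world$"
match = re.search(pattern, text)
if match:
    print("Match succeeded")  # Output: Match succeeded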
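The trailing-newline behavior of $ is easy to overlook. The sketch below compares it with \Z, which matches only at the very end of the string; the input is an illustrative variation of the example above:

```python
import re

text = "hello world\n"

# "$" matches at the end of the string and also just before a trailing newline.
dollar = re.search(r"world$", text)

# "\Z" matches only at the very end, so the trailing "\n" prevents a match.
strict = re.search(r"world\Z", text)

print(dollar is not None)  # True
print(strict is None)      # True
```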

1.4 `*` (asterisk)

The * metacharacter matches the preceding element zero or more times.

text = "baaaat"
pattern = r"ba*t"
match = re.match(pattern, text)
if match:
    print("Match succeeded")  # Output: Match succeeded
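The "zero times" case is the part that tends to surprise. As a rough illustration with a hypothetical word list, re.fullmatch shows that ba*t accepts "bt", which contains no "a" at all:

```python
import re

words = ["bt", "bat", "baaat", "bet"]

# "a*" allows zero or more "a" characters between "b" and "t".
star_matches = [w for w in words if re.fullmatch(r"ba*t", w)]

print(star_matches)  # ['bt', 'bat', 'baaat']
```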

1.5 `+` (plus sign)

The + metacharacter matches the preceding element one or more times.

text = "baaaat"
pattern = r"ba+t"
match = re.match(pattern, text)
if match:
    print("Match succeeded")  # Output: Match succeeded
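The only difference from * is that + requires at least one occurrence. A short sketch on an input with no "a" makes the contrast visible:

```python
import re

text = "bt"

# "+" needs at least one "a", so "bt" is rejected.
plus_match = re.fullmatch(r"ba+t", text)

# "*" accepts zero occurrences, so the same input is accepted.
star_match = re.fullmatch(r"ba*t", text)

print(plus_match is None)      # True
print(star_match is not None)  # True
```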

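1.6 `?` (question mark)

The ? metacharacter matches the preceding element zero or one time, which makes that element optional.

text = "bat"
pattern = r"ba?t"
match = re.match(pattern, text)
if match:
    print("Match succeeded")  # Output: Match succeeded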
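A common use of ? is to accept spelling variants. The sketch below uses an illustrative word list to show that one optional letter is accepted but two are not:

```python
import re

words = ["color", "colour", "colouur"]

# "u?" makes the "u" optional: zero or one occurrence, never two.
spellings = [w for w in words if re.fullmatch(r"colou?r", w)]

print(spellings)  # ['color', 'colour']
```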

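1.7 `{}` (curly braces)

The {} metacharacter specifies how many times the preceding element must repeat: {m} means exactly m times, and {m,n} means between m and n times.

text = "baaaat"
pattern = r"ba{4}t"
match = re.match(pattern, text)
if match:
    print("Match succeeded")  # Output: Match succeeded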
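Besides an exact count, the braces accept ranges and open-ended bounds. The following sketch uses a made-up sample string and \b word boundaries so that each word is tested as a whole:

```python
import re

text = "bt bat baat baaat baaaat"

# {2}: exactly two "a" characters.
exactly_two = re.findall(r"\bba{2}t\b", text)

# {2,3}: between two and three "a" characters.
two_to_three = re.findall(r"\bba{2,3}t\b", text)

# {3,}: three or more "a" characters.
at_least_three = re.findall(r"\bba{3,}t\b", text)

print(exactly_two)     # ['baat']
print(two_to_three)    # ['baat', 'baaat']
print(at_least_three)  # ['baaat', 'baaaat']
```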

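1.8 `[]` (square brackets)

The [] metacharacter defines a character set and matches any single character listed inside the brackets. re.findall returns matches in the order they occur in the text.

text = "cat, bat, hat, rat"
pattern = r"[bcr]at"
matches = re.findall(pattern, text)
print(matches)  # Output: ['cat', 'bat', 'rat']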
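Character sets also support ranges such as a-c and negation with a leading ^. A brief sketch, with "9at" added to the sample text purely for illustration:

```python
import re

text = "cat, bat, hat, rat, 9at"

# A range: any one character from "a" through "c".
in_range = re.findall(r"[a-c]at", text)

# A negated set: any one character that is NOT "b" or "c".
negated = re.findall(r"[^bc]at", text)

print(in_range)  # ['cat', 'bat']
print(negated)   # ['hat', 'rat', '9at']
```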

1.9 `|` (vertical bar)

The | metacharacter means alternation: it matches any one of the patterns it separates.

text = "cat, bat, hat, rat"
pattern = r"cat|bat"
matches = re.findall(pattern, text)
print(matches)  # Output: ['cat', 'bat']
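Alternation has the lowest precedence, so | splits the entire pattern unless it is confined by a group. The sketch below illustrates this with a hypothetical input; (?:...) is a non-capturing group used only to limit the scope of |:

```python
import re

text = "gray grey"

# Without grouping, the alternatives are "gra" and "ey".
ungrouped = re.findall(r"gra|ey", text)

# With a group, only "a" or "e" alternate inside "gr...y".
grouped = re.findall(r"gr(?:a|e)y", text)

print(ungrouped)  # ['gra', 'ey']
print(grouped)    # ['gray', 'grey']
```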

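1.10 `()` (parentheses)

The () metacharacter groups part of a pattern so that it is treated as a single unit. A group also captures the text it matched. When a pattern contains exactly one group, re.findall returns the captured text rather than the whole match, which is why the output below contains single letters.

text = "cat, bat, hat, rat"
pattern = r"(c|b|r)at"
matches = re.findall(pattern, text)
print(matches)  # Output: ['c', 'b', 'r']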
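If the whole match is wanted instead of the captured letter, a non-capturing group (?:...) or a match object can be used. A minimal sketch on the same sample text:

```python
import re

text = "cat, bat, hat, rat"

# Capturing group: findall returns only what the group captured.
captured = re.findall(r"(c|b|r)at", text)

# Non-capturing group: findall returns the whole match.
whole = re.findall(r"(?:c|b|r)at", text)

# A match object exposes both: group(0) is the whole match, group(1) the first group.
m = re.search(r"(c|b|r)at", text)
first_whole, first_group = m.group(0), m.group(1)

print(captured)                  # ['c', 'b', 'r']
print(whole)                     # ['cat', 'bat', 'rat']
print(first_whole, first_group)  # cat c
```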

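2. Escaping

Some characters, such as ., *, and +, have a special meaning in a regular expression. To match one of these characters literally, escape it with a backslash \. Writing the pattern as a raw string (r"...") keeps Python from interpreting the backslash first.

text = "3.14"
pattern = r"3\.14"
match = re.match(pattern, text)
if match:
    print("Match succeeded")  # Output: Match succeeded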
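When the literal text is not known in advance, re.escape can add the backslashes automatically. The sketch below, with illustrative inputs, also shows what goes wrong when the dot is left unescaped:

```python
import re

literal = "3.14"

# Unescaped, "." matches any character, so "3x14" is wrongly accepted.
unescaped = re.fullmatch(r"3.14", "3x14")

# re.escape() escapes the special characters in the literal text.
escaped = re.fullmatch(re.escape(literal), "3x14")
exact = re.fullmatch(re.escape(literal), "3.14")

print(unescaped is not None)  # True
print(escaped is None)        # True
print(exact is not None)      # True
```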

3. Predefined character classes

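3.1 `\d`

\d matches any decimal digit. For str patterns this includes digits from other scripts; it is equivalent to [0-9] only when the re.ASCII flag is set or the pattern is a bytes pattern.

text = "123abc"
pattern = r"\d+"
matches = re.findall(pattern, text)
print(matches)  # Output: ['123']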
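The difference between the default Unicode behavior and [0-9] shows up with non-ASCII digits. The following sketch uses Arabic-Indic digits, written as escapes, as an illustrative input:

```python
import re

# "\u0661\u0662\u0663" are the Arabic-Indic digits one, two, three.
text = "123 \u0661\u0662\u0663"

# Default (Unicode) matching: both runs of digits are found.
unicode_digits = re.findall(r"\d+", text)

# With re.ASCII, "\d" is restricted to [0-9].
ascii_digits = re.findall(r"\d+", text, flags=re.ASCII)

print(len(unicode_digits))  # 2
print(ascii_digits)         # ['123']
```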

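3.2 `\D`

\D matches any character that is not a decimal digit. It is the complement of \d, and is equivalent to [^0-9] when the re.ASCII flag is set.

text = "123abc"
pattern = r"\D+"
matches = re.findall(pattern, text)
print(matches)  # Output: ['abc']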

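3.3 `\w`

\w matches any word character: a letter, a digit, or the underscore. For str patterns this includes Unicode letters and digits; it is equivalent to [a-zA-Z0-9_] only when the re.ASCII flag is set or the pattern is a bytes pattern.

text = "hello_world123"
pattern = r"\w+"
matches = re.findall(pattern, text)
print(matches)  # Output: ['hello_world123']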
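As with \d, the Unicode default matters once the text contains non-ASCII letters. A small sketch, assuming an illustrative input that contains an accented letter:

```python
import re

text = "caf\u00e9_1"  # "café_1"

# Default (Unicode) matching treats the accented letter as a word character.
unicode_words = re.findall(r"\w+", text)

# With re.ASCII, "\w" is [a-zA-Z0-9_], so the accented letter splits the match.
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)

print(unicode_words)  # ['café_1']
print(ascii_words)    # ['caf', '_1']
```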

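3.4 `\W`

\W matches any character that is not a word character. It is the complement of \w, and is equivalent to [^a-zA-Z0-9_] when the re.ASCII flag is set.

text = "hello_world123!"
pattern = r"\W+"
matches = re.findall(pattern, text)
print(matches)  # Output: ['!']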

3.5 `\s`

\s matches any whitespace character, including the space, tab, and newline.

text = "hello world\n"
pattern = r"\s+"
matches = re.findall(pattern, text)
print(matches)  # Output: [' ', '\n']
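A typical use of \s+ is to split text on runs of mixed whitespace. A minimal sketch with re.split and a made-up input:

```python
import re

text = "a  b\tc\nd"

# Split on one or more whitespace characters of any kind.
parts = re.split(r"\s+", text)

print(parts)  # ['a', 'b', 'c', 'd']
```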

3.6 `\S`

\S matches any character that is not whitespace. It is the complement of \s.

text = "hello world\n"
pattern = r"\S+"
matches = re.findall(pattern, text)
print(matches)  # Output: ['hello', 'world']

4. Greedy and non-greedy matching

Quantifiers are greedy by default: they match as many characters as possible while still letting the overall pattern succeed. Appending ? to a quantifier (*?, +?, ??, {m,n}?) makes it non-greedy, so it matches as few characters as possible.

text = "<html><head><title>Title</title></head></html>"
pattern_greedy = r"<.*>"
pattern_non_greedy = r"<.*?>"

matches_greedy = re.findall(pattern_greedy, text)
matches_non_greedy = re.findall(pattern_non_greedy, text)

print(matches_greedy)  # Output: ['<html><head><title>Title</title></head></html>']
print(matches_non_greedy)  # Output: ['<html>', '<head>', '<title>', '</title>', '</head>', '</html>']
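The same distinction applies when a group sits between two delimiters. The sketch below, using an illustrative snippet of markup, extracts the text between tag pairs with a greedy and a non-greedy quantifier:

```python
import re

text = "<b>one</b> and <b>two</b>"

# Greedy: ".*" runs to the last "</b>", swallowing the middle of the string.
greedy = re.findall(r"<b>(.*)</b>", text)

# Non-greedy: ".*?" stops at the first "</b>" after each "<b>".
lazy = re.findall(r"<b>(.*?)</b>", text)

print(greedy)  # ['one</b> and <b>two']
print(lazy)    # ['one', 'two']
```

This is only meant to show quantifier behavior; for real HTML a dedicated parser is usually the safer choice.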

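5. Summary

Metacharacters are the core of the regular expressions supported by the re module, and knowing how each one behaves is essential for non-trivial text matching. This article covered the common metacharacters, escaping, the predefined character classes, and greedy versus non-greedy quantifiers. In practice, these pieces are combined as needed to build the pattern a task requires.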
