Regular Expressions
Published: 2019-06-23


1. Some important definitions:

literal

A literal is any character we use in a search or matching expression. For example, to find ind in windows, ind is a literal string: each character plays a part in the search, and it is literally the string we want to find.

metacharacter

A metacharacter is one or more special characters that have a unique meaning and are NOT used as literals in the search expression, for example, the character ^ (circumflex or caret) is a metacharacter.

target string

This term describes the string that we will be searching, that is, the string in which we want to find our match or search pattern.

search expression

Most commonly called the regular expression. This term describes the search expression that we will be using to search our target string, that is, the pattern we use to find what we want.

escape sequence

An escape sequence is a way of indicating that we want to use one of our metacharacters as a literal. In a regular expression an escape sequence involves placing the metacharacter \ (backslash) in front of the metacharacter that we want to use as a literal. For example, if we want to find (s) in the target string window(s) then we use the search expression \(s\), and if we want to find \\file in the target string c:\\file then we need the search expression \\\\file (each of the 2 backslashes we want to search for as a literal is preceded by an escaping \).
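The escaping rule above can be tried out with a short sketch. It assumes Python's re module, which the article itself does not name; the variable names are only for illustration.

```python
import re

target_parens = "window(s)"

# Unescaped parentheses form a group, so only the letter s is matched.
grouped = re.search(r"(s)", target_parens).group(0)
# Escaped parentheses are literals, so the whole "(s)" is matched.
literal = re.search(r"\(s\)", target_parens).group(0)
print(grouped, literal)  # s (s)

# The target text c:\\file contains two backslashes; in a normal Python
# string literal each of them has to be doubled.
target_path = "c:\\\\file"
# In the pattern, each literal backslash is preceded by an escaping backslash.
found = re.search(r"\\\\file", target_path).group(0)
print(found)  # \\file

# re.escape produces the escaped form of a literal string automatically.
print(re.escape("(s)"))  # \(s\)
```

Raw strings (the r prefix) keep Python from interpreting the backslashes before the regular expression engine sees them.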

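escape character (\)

The regular expression language is made up of two basic character types: literal (normal) text characters and metacharacters. The escape character indicates that the character following it is not to be treated as a metacharacter.

metacharacter: a character that has a special meaning in a regular expression.

literal: taken at face value, in its original meaning.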

Extended Regular Expressions (EREs) will support Basic Regular Expressions (BREs are essentially a subset of EREs).

In short, EREs support BREs.

2. Metacharacters and their meanings:

[ ]

Match anything inside the square brackets for ONE character position, once and only once. For example, [12] means match the target to 1 and if that does not match then match the target to 2 while [0123456789] means match to any character in the range 0 to 9.

Matches any one of the characters inside the square brackets, exactly once.

-

The - (dash) inside square brackets is the 'range separator' and allows us to define a range. In our example above, [0123456789] could be rewritten as [0-9].

You can define more than one range inside a list, for example, [0-9A-C] means check for 0 to 9 and A to C (but not a to c).

NOTE: To test for - inside brackets (as a literal) it must come first or last, that is, [-0-9] will test for - and 0 to 9.

Inside square brackets the dash acts as a range operator that lets us define a range; more than one range may be defined.

^

The ^ (circumflex or caret) inside square brackets negates the expression (outside square brackets it has a different use: it anchors the match to the start of the target string). For example, [^Ff] means anything except upper or lower case F and [^a-z] means everything except lower case a to z.

Inside square brackets the caret negates the expression (outside square brackets the caret has a different use).
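The bracket rules (lists, ranges, a literal dash and negation) can be checked with a small sketch, again assuming Python's re module; the sample strings are made up for illustration.

```python
import re

# [12] matches exactly one character that is either 1 or 2.
one_or_two = re.findall(r"[12]", "3120")
print(one_or_two)  # ['1', '2']

# [0-9A-C] combines two ranges; lower case a to c is not included.
two_ranges = re.findall(r"[0-9A-C]", "a1Bc9D")
print(two_ranges)  # ['1', 'B', '9']

# A dash placed first inside the brackets is a literal dash.
dash_or_digit = re.findall(r"[-0-9]", "7-Up")
print(dash_or_digit)  # ['7', '-']

# A leading ^ negates the list: anything except F or f.
not_f = re.findall(r"[^Ff]", "Fife")
print(not_f)  # ['i', 'e']
```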

.

The . (period) means any single character in this position. For example, ton. will find tons, tone and tonn in tonneau, but not wanton because there ton has no following character.

The period matches any single character.
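A quick way to see the period at work, sketched with Python's re module (an assumption, since the article is engine-neutral):

```python
import re

def first_match(pattern, word):
    """Return the first matched text, or None when there is no match."""
    m = re.search(pattern, word)
    return m.group(0) if m else None

for word in ["tons", "tone", "tonneau", "wanton"]:
    print(word, first_match(r"ton.", word))
# tons tons
# tone tone
# tonneau tonn
# wanton None
```

In wanton the letters ton are the last three characters, so nothing is left for the period to match.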

?

The ? (question mark) matches when the preceding character occurs 0 or 1 times only, for example, colou?r will find both color (u is found 0 times) and colour (u is found 1 time).

Question mark: matches when the preceding character occurs 0 or 1 times.

*

The * (asterisk or star) matches when the preceding character occurs 0 or more times, for example, tre* will find tree (e is found 2 times) and tread (e is found 1 time) and trough (e is found 0 times and thus returns a match only on the tr).

Asterisk: matches when the preceding character occurs 0 or more times.

+

The + (plus) matches when the preceding character occurs 1 or more times, for example, tre+ will find tree (e is found 2 times) and tread (e is found 1 time) but NOT trough (0 times).

Plus: matches when the preceding character occurs 1 or more times.
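The three quantifiers ?, * and + can be compared side by side. The following is a minimal sketch assuming Python's re module; the helper name matched is hypothetical.

```python
import re

def matched(pattern, word):
    """Return the text matched by the first hit, or None if nothing matches."""
    m = re.search(pattern, word)
    return m.group(0) if m else None

# ? makes the preceding u optional.
print(matched(r"colou?r", "color"), matched(r"colou?r", "colour"))  # color colour

# * accepts zero or more e, + requires at least one e.
for word in ["tree", "tread", "trough"]:
    print(word, matched(r"tre*", word), matched(r"tre+", word))
# tree tree tree
# tread tre tre
# trough tr None
```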

{n}

Matches when the preceding character, or character range, occurs n times exactly, for example, to find a local phone number we could use [0-9]{3}-[0-9]{4} which would find any number of the form 123-4567. Value is enclosed in braces (curly brackets).

Note: The - (dash) in this case, because it is outside the square brackets, is a literal. Louise Rains writes to say that it is invalid to commence a NXX code (the 123) with a zero (which would be permitted in the expression above). In this case the expression [1-9][0-9]{2}-[0-9]{4} would be necessary to find a valid local phone number.

Curly brackets (braces): match the preceding character or range exactly n times. For example, [0-9]{3}-[0-9]{4} matches 123-4567.

{n,m}

Matches when the preceding character occurs at least n times but not more than m times, for example, ba{2,3}b will find baab and baaab but NOT bab or baaaab. Values are enclosed in braces (curly brackets).

At least n times and at most m times.

{n,}

Matches when the preceding character occurs at least n times, for example, ba{2,}b will find 'baab', 'baaab' or 'baaaab' but NOT 'bab'. Values are enclosed in braces (curly brackets).

At least n times.
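The three brace forms {n}, {n,m} and {n,} can be exercised as follows. This is a sketch assuming Python's re module; fullmatch is used so that the whole string has to fit the pattern.

```python
import re

# {n}: exactly n repetitions. The stricter phone pattern rejects a leading zero.
phone = re.compile(r"[1-9][0-9]{2}-[0-9]{4}")
print(bool(phone.fullmatch("123-4567")))  # True
print(bool(phone.fullmatch("023-4567")))  # False

# {n,m}: between n and m repetitions; {n,}: n or more repetitions.
bounded = re.compile(r"ba{2,3}b")
open_ended = re.compile(r"ba{2,}b")
for s in ["bab", "baab", "baaab", "baaaab"]:
    print(s, bool(bounded.fullmatch(s)), bool(open_ended.fullmatch(s)))
# bab False False
# baab True True
# baaab True True
# baaaab False True
```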

$

The $ (dollar) means look only at the end of the target string, for example, fox$ will find a match in 'silver fox' since it appears at the end of the string but not in 'the fox jumped over the moon'.

The dollar sign matches only at the end of the target string.
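The end-of-string anchor can be sketched like this, assuming Python's re module. The last two lines show a detail specific to Python that other engines may handle differently.

```python
import re

ends_with_fox = re.compile(r"fox$")
print(bool(ends_with_fox.search("silver fox")))                    # True
print(bool(ends_with_fox.search("the fox jumped over the moon")))  # False

# Python-specific detail: $ also matches just before a trailing newline,
# whereas \Z matches only at the very end of the string.
print(bool(re.search(r"fox$", "silver fox\n")))   # True
print(bool(re.search(r"fox\Z", "silver fox\n")))  # False
```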

()

The ( (open parenthesis) and ) (close parenthesis) may be used to group (or bind) parts of our search expression together. Officially this is called a subexpression (a.k.a. a submatch or group) and subexpressions may be nested to any depth. Parentheses (subexpressions) also capture the matched element into a variable that may be used as a backreference.

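The open and close parentheses may be used to group parts of our search expression together. The official name is subexpression, and subexpressions may be nested to any depth. Parentheses also capture the matched element into a variable that may later be used as a backreference.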
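Both roles of parentheses, binding and capturing, can be illustrated with a short sketch. It assumes Python's re module, where a backreference to the first group is written \1; the sample sentence is made up.

```python
import re

# Grouping binds the quantifier to the whole subexpression "ab".
print(bool(re.fullmatch(r"(ab)+", "ababab")))  # True
# Without the group, + applies to b only.
print(bool(re.fullmatch(r"ab+", "ababab")))    # False

# \1 is a backreference to whatever the first group captured,
# so this pattern finds a word that is immediately repeated.
repeated = re.search(r"([a-z]+) \1", "it was the the best of times")
print(repeated.group(0))  # the the
print(repeated.group(1))  # the
```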

|

The | (vertical bar or pipe) is called alternation in techspeak and means find the left hand OR right values, for example, gr(a|e)y will find 'gray' or 'grey' and has the sense that - having found the literal characters 'gr' - if the first test is not valid (a) the second will be tried (e), if the first is valid the second will not be tried. Alternation can be nested within each expression, thus gr((a|e)|i)y will find 'gray', 'grey' and 'griy'.

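The vertical bar, or pipe, is called alternation and means match either the left-hand or the right-hand value, for example gr(a|e)y; only one side of the pipe is used in any given match. Alternation can be nested anywhere in an expression.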
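The alternation examples, including the nested form, can be verified with a sketch that assumes Python's re module; gruy is an extra made-up word that neither pattern should accept.

```python
import re

simple = re.compile(r"gr(a|e)y")
nested = re.compile(r"gr((a|e)|i)y")

for word in ["gray", "grey", "griy", "gruy"]:
    print(word, bool(simple.fullmatch(word)), bool(nested.fullmatch(word)))
# gray True True
# grey True True
# griy False True
# gruy False False
```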

3. Common Extensions and Abbreviations

Backslash Sequences

\d

Match any character in the range 0 - 9 (equivalent of POSIX [:digit:])

\D

Match any character NOT in the range 0 - 9 (equivalent of POSIX [^[:digit:]])

\s

Match any whitespace character (space, tab etc.). Roughly equivalent to POSIX [:space:], except that some engines do not recognize VT (vertical tab) as whitespace.

\S

Match any character NOT whitespace (space, tab). (equivalent of POSIX [^[:space:]])

\w

Match any character in the range 0 - 9, A - Z, a - z and underscore (equivalent to [0-9A-Za-z_], or [[:alnum:]_] using POSIX classes)

\W

Match any character NOT in the range 0 - 9, A - Z, a - z and underscore (equivalent to [^0-9A-Za-z_], or [^[:alnum:]_] using POSIX classes)
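The backslash sequences can be tried on a small sample string. The sketch assumes Python's re module, where \w covers letters, digits and the underscore; the re.ASCII flag restricts the classes to the ASCII ranges listed above, because Python's defaults are Unicode-aware. The sample text is made up.

```python
import re

text = "id_42 = 3.5\tok"

digits = re.findall(r"\d", text, re.ASCII)
spaces = re.findall(r"\s", text, re.ASCII)
words = re.findall(r"\w+", text, re.ASCII)
non_word = re.findall(r"\W", text, re.ASCII)

print(digits)    # ['4', '2', '3', '5']
print(spaces)    # [' ', ' ', '\t']
print(words)     # ['id_42', '3', '5', 'ok']
print(non_word)  # [' ', '=', ' ', '.', '\t']
```

The underscore in id_42 stays inside the word, while the period and the equals sign count as non-word characters.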

Positional Abbreviations

\b

Word boundary. Match any character(s) at the beginning (\bxx) and/or end (xx\b) of a word, thus \bton\b will find ton but not tons, but \bton will find tons.

\B

Not word boundary. Match any character(s) NOT at the beginning(\Bxx) and/or end (xx\B) of a word, thus \Bton\B will find wantons but not tons, but ton\B will find both wantons and tons.
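The four boundary patterns mentioned above can be compared on the same words. This is a sketch assuming Python's re module; the helper name hits is hypothetical.

```python
import re

patterns = [r"\bton\b", r"\bton", r"\Bton\B", r"ton\B"]

def hits(word):
    """Return, for each pattern, whether it is found somewhere in word."""
    return [bool(re.search(p, word)) for p in patterns]

for word in ["ton", "tons", "wantons"]:
    print(word, hits(word))
# ton [True, True, False, False]
# tons [False, True, False, True]
# wantons [False, False, True, True]
```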

This article is reposted from chenzudao's 51CTO blog. Original link: http://blog.51cto.com/victor2016/1871534. To republish it, please contact the original author.
