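Ruby - Regular Expressions
A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern.
A regular expression literal is a pattern written between slashes, or between arbitrary delimiters following %r, as shown here:
Syntax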
/pattern/
/pattern/im # option can be specified
%r!/usr/local! # general delimited regular expression
Example
#!/usr/bin/ruby
line1 = "Cats are smarter than dogs"
line2 = "Dogs also like meat"
# =~ returns the index of the first match, or nil when there is none
if line1 =~ /Cats(.*)/
  puts "Line1 contains Cats"
end
if line2 =~ /Cats(.*)/
  puts "Line2 contains Cats"
end
This will produce the following result:
Line1 contains Cats
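The =~ operator returns the index of the first match (or nil), and the text captured by the parentheses can be read from the match afterwards. The following is a minimal sketch of that behavior; the helper name text_after_cats is hypothetical and only used for illustration.

```ruby
# Hypothetical helper: returns the text captured after "Cats",
# or nil when the line does not match at all.
def text_after_cats(line)
  match = line.match(/Cats(.*)/)   # MatchData on success, nil on failure
  match && match[1]
end

p("Cats are smarter than dogs" =~ /Cats(.*)/)   # 0, the index where the match starts
p("Dogs also like meat" =~ /Cats(.*)/)          # nil, no match
p text_after_cats("Cats are smarter than dogs") # " are smarter than dogs"
p text_after_cats("Dogs also like meat")        # nil
```

Because nil is falsy, the result of =~ can be used directly as an if condition, as in the example above.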
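Regular-Expression Modifiers
Regular expression literals may include an optional modifier to control various aspects of matching. The modifier is specified after the second slash character, as shown previously, and may be represented by one of these characters:
Sr.No.  Modifier  Description
1       i         Ignores case when matching text.
2       o         Performs #{} interpolations only once, the first time the regexp literal is evaluated.
3       x         Ignores whitespace and allows comments in regular expressions.
4       m         Matches multiple lines, recognizing newlines as normal characters.
5       u,e,s,n   Interprets the regexp as Unicode (UTF-8), EUC, SJIS, or ASCII. If none of these modifiers is specified, the regular expression is assumed to use the source encoding.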
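As a rough illustration of the i, x, and m modifiers, the sketch below defines three patterns and tests them against sample strings. The constant names and the sample data are assumptions made up for this example.

```ruby
# i: case-insensitive matching
CASE_INSENSITIVE = /ruby/i

# x: whitespace and comments inside the pattern are ignored
PHONE = /
  \d{3}   # first group of digits
  -       # literal hyphen
  \d{4}   # second group of digits
/x

# m: the dot also matches a newline
ACROSS_LINES = /start.*end/m

p CASE_INSENSITIVE.match?("RUBY")        # true
p PHONE.match?("555-1234")               # true
p ACROSS_LINES.match?("start\nend")      # true
p(/start.*end/.match?("start\nend"))     # false: without m, the dot stops at the newline
```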
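As with string literals delimited with %Q, Ruby allows you to begin a regular expression with %r followed by a delimiter of your choice. This is useful when the pattern you are describing contains many forward slash characters that you don't want to escape: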
# Following matches a single slash character, no escape required
%r|/|
# Flag characters are allowed with this syntax, too
%r[</(.*)>]i
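To make the benefit concrete, here is a small sketch that writes the same path pattern twice, once with slashes and once with %r{}. The constant names and the sample path are illustrative assumptions.

```ruby
# The same pattern written with two different delimiters.
WITH_SLASHES   = /\/usr\/local\/(\w+)/    # every slash must be escaped
WITH_PERCENT_R = %r{/usr/local/(\w+)}     # no escaping needed

SAMPLE_PATH = "/usr/local/bin/ruby"

# String#[] with a regexp and a group number returns that capture.
p SAMPLE_PATH[WITH_SLASHES, 1]     # "bin"
p SAMPLE_PATH[WITH_PERCENT_R, 1]   # "bin"
```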
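Regular-Expression Patterns
Except for the control characters (+ ? . * ^ $ ( ) [ ] { } | \), all characters match themselves. You can escape a control character by preceding it with a backslash.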
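The sketch below shows why escaping matters, using + as the control character. Regexp.escape is a standard Ruby method that adds the backslashes for you; the helper name contains_literal? is hypothetical.

```ruby
# Hypothetical helper: true when text contains needle as literal text,
# with every control character in needle escaped automatically.
def contains_literal?(text, needle)
  text.match?(Regexp.new(Regexp.escape(needle)))
end

p(/1+1/.match?("1+1=2"))              # false: + means "one or more 1s", so this looks for "11"
p(/1\+1/.match?("1+1=2"))             # true: \+ matches a literal plus sign
p Regexp.escape("1+1")                # "1\\+1"
p contains_literal?("1+1=2", "1+1")   # true
```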
The following table lists the regular expression syntax that is available in Ruby.
Sr.No.  Pattern         Description
1       ^               Matches beginning of line.
2       $               Matches end of line.
3       .               Matches any single character except newline. Using the m option allows it to match newline as well.
4       [...]           Matches any single character in brackets.
5       [^...]          Matches any single character not in brackets.
6       re*             Matches 0 or more occurrences of preceding expression.
7       re+             Matches 1 or more occurrences of preceding expression.
8       re?             Matches 0 or 1 occurrence of preceding expression.
9       re{n}           Matches exactly n occurrences of preceding expression.
10      re{n,}          Matches n or more occurrences of preceding expression.
11      re{n,m}         Matches at least n and at most m occurrences of preceding expression.
12      a|b             Matches either a or b.
13      (re)            Groups regular expressions and remembers matched text.
14      (?imx)          Temporarily toggles on i, m, or x options within a regular expression. If in parentheses, only that area is affected.
15      (?-imx)         Temporarily toggles off i, m, or x options within a regular expression. If in parentheses, only that area is affected.
16      (?:re)          Groups regular expressions without remembering matched text.
17      (?imx:re)       Temporarily toggles on i, m, or x options within parentheses.
18      (?-imx:re)      Temporarily toggles off i, m, or x options within parentheses.
19      (?#...)         Comment.
20      (?=re)          Positive lookahead: asserts that re matches at this position without consuming any characters.
21      (?!re)          Negative lookahead: asserts that re does not match at this position without consuming any characters.
22      (?>re)          Matches independent pattern without backtracking.
23      \w              Matches word characters.
24      \W              Matches nonword characters.
25      \s              Matches whitespace. Equivalent to [ \t\r\n\f\v].
26      \S              Matches nonwhitespace.
27      \d              Matches digits. Equivalent to [0-9].
28      \D              Matches nondigits.
29      \A              Matches beginning of string.
30      \Z              Matches end of string. If a newline exists, it matches just before newline.
31      \z              Matches end of string.
32      \G              Matches point where last match finished.
33      \b              Matches word boundaries when outside brackets. Matches backspace (0x08) when inside brackets.
34      \B              Matches non-word boundaries.
35      \n, \t, etc.    Matches newlines, carriage returns, tabs, etc.
36      \1...\9         Matches nth grouped subexpression.
37      \10             Matches nth grouped subexpression if it matched already. Otherwise refers to the octal representation of a character code.
Regular-Expression Examples
Literal Characters
Sr.No.  Example  Description
1       /ruby/   Matches "ruby".
2       /¥/      Matches the Yen sign. Multibyte characters are supported in Ruby 1.9 and Ruby 1.8.
Character Classes
Sr.No.  Example         Description
1       /[Rr]uby/       Matches "Ruby" or "ruby".
2       /rub[ye]/       Matches "ruby" or "rube".
3       /[aeiou]/       Matches any one lowercase vowel.
4       /[0-9]/         Matches any digit; same as /[0123456789]/.
5       /[a-z]/         Matches any lowercase ASCII letter.
6       /[A-Z]/         Matches any uppercase ASCII letter.
7       /[a-zA-Z0-9]/   Matches any of the above.
8       /[^aeiou]/      Matches anything other than a lowercase vowel.
9       /[^0-9]/        Matches anything other than a digit.
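The character classes above can be tried out with String#scan, which returns every match as an array. This is a minimal sketch; the sample strings are made up for illustration.

```ruby
CLASS_SAMPLE = "Ruby 1.9 and ruby 3"

p CLASS_SAMPLE.scan(/[Rr]uby/)            # ["Ruby", "ruby"]
p CLASS_SAMPLE.scan(/[0-9]/)              # ["1", "9", "3"]
p "rhythm and blues".scan(/[aeiou]/)      # ["a", "u", "e"]
p "a1b2".scan(/[^0-9]/)                   # ["a", "b"]
```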
Special Character Classes
Sr.No.  Example  Description
1       /./      Matches any character except newline.
2       /./m     In multi-line mode, matches newline, too.
3       /\d/     Matches a digit: /[0-9]/.
4       /\D/     Matches a non-digit: /[^0-9]/.
5       /\s/     Matches a whitespace character: /[ \t\r\n\f\v]/.
6       /\S/     Matches non-whitespace: /[^ \t\r\n\f\v]/.
7       /\w/     Matches a single word character: /[A-Za-z0-9_]/.
8       /\W/     Matches a non-word character: /[^A-Za-z0-9_]/.
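A short sketch of the shorthand classes in action follows. The sample string is an assumption chosen so that each class has something to match.

```ruby
SHORTHAND_SAMPLE = "id_42 = 7\n"

p SHORTHAND_SAMPLE.scan(/\d/)    # ["4", "2", "7"]
p SHORTHAND_SAMPLE.scan(/\w+/)   # ["id_42", "7"]
p SHORTHAND_SAMPLE.scan(/\s/)    # [" ", " ", "\n"]
p SHORTHAND_SAMPLE.scan(/\W/)    # [" ", "=", " ", "\n"]
p(/./.match?("\n"))              # false: the dot skips newlines by default
p(/./m.match?("\n"))             # true: with m, the dot matches a newline too
```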
Repetition Cases
Sr.No.  Example     Description
1       /ruby?/     Matches "rub" or "ruby": the y is optional.
2       /ruby*/     Matches "rub" plus 0 or more ys.
3       /ruby+/     Matches "rub" plus 1 or more ys.
4       /\d{3}/     Matches exactly 3 digits.
5       /\d{3,}/    Matches 3 or more digits.
6       /\d{3,5}/   Matches 3, 4, or 5 digits.
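The following sketch runs each quantifier from the table against the same input so the differences are visible side by side. The sample strings are illustrative assumptions.

```ruby
WORDS  = "rub ruby rubyyy"
DIGITS = "12 123 123456"

p WORDS.scan(/ruby?/)       # ["rub", "ruby", "ruby"]
p WORDS.scan(/ruby*/)       # ["rub", "ruby", "rubyyy"]
p WORDS.scan(/ruby+/)       # ["ruby", "rubyyy"]
p DIGITS.scan(/\d{3}/)      # ["123", "123", "456"]
p DIGITS.scan(/\d{3,}/)     # ["123", "123456"]
p DIGITS.scan(/\d{3,5}/)    # ["123", "12345"]
```

Note that /\d{3}/ finds two separate matches inside "123456", since nothing in the pattern requires the match to be a whole number.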
Nongreedy Repetition
Adding ? after a quantifier makes it match the smallest possible number of repetitions:
Sr.No.  Example   Description
1       /<.*>/    Greedy repetition: matches "<ruby>perl>".
2       /<.*?>/   Non-greedy: matches "<ruby>" in "<ruby>perl>".
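The difference can be checked directly with String#[], which returns the first match. This is a minimal sketch; the second sample string is an assumption added for illustration.

```ruby
GREEDY_SAMPLE = "<ruby>perl>"

p GREEDY_SAMPLE[/<.*>/]            # "<ruby>perl>"  (greedy: runs to the last ">")
p GREEDY_SAMPLE[/<.*?>/]           # "<ruby>"       (non-greedy: stops at the first ">")
p "<b>bold</b>".scan(/<.*?>/)      # ["<b>", "</b>"]
p "<b>bold</b>".scan(/<.*>/)       # ["<b>bold</b>"]
```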
Grouping with Parentheses
Sr.No.  Example              Description
1       /\D\d+/              No group: + repeats \d.
2       /(\D\d)+/            Grouped: + repeats the \D\d pair.
3       /([Rr]uby(, )?)+/    Matches "Ruby", "Ruby, ruby, ruby", etc.
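The sketch below shows how grouping changes what the + quantifier repeats. The sample string "a1b2c3" is an assumption chosen for illustration.

```ruby
PAIRS = "a1b2c3"

p PAIRS[/\D\d+/]                             # "a1": + repeats only the digit
p PAIRS[/(\D\d)+/]                           # "a1b2c3": + repeats the whole pair
p PAIRS[/(\D\d)+/, 1]                        # "c3": a repeated group keeps its last repetition
p "Ruby, ruby, ruby"[/([Rr]uby(, )?)+/]      # "Ruby, ruby, ruby"
```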
Backreferences
A backreference matches a previously matched group again:
Sr.No.  Example                    Description
1       /([Rr])uby&\1ails/         Matches ruby&rails or Ruby&Rails.
2       /(['"])(?:(?!\1).)*\1/     Single or double-quoted string. \1 matches whatever the 1st group matched, \2 matches whatever the 2nd group matched, etc.
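As a quick check of both backreference patterns, the sketch below stores them in constants and tries a few inputs. The constant names and sample strings are assumptions for illustration.

```ruby
RAILS_PAIR = /([Rr])uby&\1ails/
QUOTED     = /(['"])(?:(?!\1).)*\1/

p RAILS_PAIR.match?("ruby&rails")    # true
p RAILS_PAIR.match?("Ruby&Rails")    # true
p RAILS_PAIR.match?("Ruby&rails")    # false: \1 must repeat the captured "R"

# The closing quote must be the same kind as the opening one,
# so the apostrophe inside the double quotes does not end the match.
puts %q{say "it's fine" now}[QUOTED]
```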
Alternatives
Sr.No.  Example          Description
1       /ruby|rube/      Matches "ruby" or "rube".
2       /rub(y|le)/      Matches "ruby" or "ruble".
3       /ruby(!+|\?)/    Matches "ruby" followed by one or more ! or one ?.
Anchors
Anchors specify the position at which a match must occur.
Sr.No.  Example        Description
1       /^Ruby/        Matches "Ruby" at the start of a string or internal line.
2       /Ruby$/        Matches "Ruby" at the end of a string or line.
3       /\ARuby/       Matches "Ruby" at the start of a string.
4       /Ruby\Z/       Matches "Ruby" at the end of a string, or just before a final newline.
5       /\bRuby\b/     Matches "Ruby" at a word boundary.
6       /\brub\B/      \B is non-word boundary: matches "rub" in "rube" and "ruby" but not alone.
7       /Ruby(?=!)/    Matches "Ruby", if followed by an exclamation point.
8       /Ruby(?!!)/    Matches "Ruby", if not followed by an exclamation point.
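The difference between line anchors (^ and $) and string anchors (\A, \Z, \z) is easiest to see on strings that contain newlines. The sketch below uses made-up sample strings.

```ruby
TWO_LINES = "Perl\nRuby"

p TWO_LINES.match?(/^Ruby/)       # true: ^ also matches right after a newline
p TWO_LINES.match?(/\ARuby/)      # false: \A matches only at the very start
p "Ruby\n".match?(/Ruby\Z/)       # true: \Z tolerates one trailing newline
p "Ruby\n".match?(/Ruby\z/)       # false: \z is the absolute end of the string

p("rub ruby" =~ /\brub\B/)        # 4: only the "rub" inside "ruby" qualifies
p "Ruby!"[/Ruby(?=!)/]            # "Ruby": the "!" is checked but not consumed
p "Ruby?"[/Ruby(?=!)/]            # nil
p "Ruby?"[/Ruby(?!!)/]            # "Ruby"
```

When validating a whole string, \A and \z are usually the safer choice, because ^ and $ can match at any line within the input.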
Special Syntax with Parentheses
Sr.No.  Example           Description
1       /R(?#comment)/    Matches "R". All the rest is a comment.
2       /R(?i)uby/        Case-insensitive while matching "uby".
3       /R(?i:uby)/       Same as above.
4       /rub(?:y|le)/     Group only without creating \1 backreference.
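A brief sketch of these constructs follows; the sample strings are assumptions, and MatchData#captures is used to show whether a group was remembered.

```ruby
p(/R(?i)uby/.match?("RUBY"))                 # true: "uby" is matched case-insensitively
p(/R(?i)uby/.match?("rUBY"))                 # false: the leading "R" is still case-sensitive
p(/R(?i:uby)/.match?("RuBy"))                # true
p(/R(?#a comment)uby/.match?("Ruby"))        # true: the comment is ignored

p "ruble".match(/rub(y|le)/).captures        # ["le"]: capturing group
p "ruble".match(/rub(?:y|le)/).captures      # []: grouping without capture
```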
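Search and Replace
Some of the most important String methods that use regular expressions are sub and gsub, and their in-place variants sub! and gsub!.
All of these methods perform a search-and-replace operation using a Regexp pattern. sub and sub! replace the first occurrence of the pattern, while gsub and gsub! replace all occurrences.
sub and gsub return a new string, leaving the original unmodified, whereas sub! and gsub! modify the string on which they are called.
Following is an example: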
#!/usr/bin/ruby
phone = "2004-959-559 #This is Phone Number"
# Delete Ruby-style comments
# (sub returns a new string; sub! would return nil if nothing matched)
phone = phone.sub(/#.*$/, "")
puts "Phone Num : #{phone}"
# Remove anything other than digits
phone = phone.gsub(/\D/, "")
puts "Phone Num : #{phone}"
This will produce the following result:
Phone Num : 2004-959-559
Phone Num : 2004959559
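One pitfall worth illustrating: sub! and gsub! return nil when no substitution is made, so assigning their result back to the variable can lose the string. The sketch below assumes a hypothetical helper named strip_comment, which also trims the whitespace before the comment.

```ruby
# Hypothetical helper: removes a trailing "# ..." comment and the
# whitespace in front of it, returning a new string in every case.
def strip_comment(text)
  text.sub(/\s*#.*$/, "")
end

plain = "2004-959-559".dup             # dup so the string is safe to modify
p plain.sub!(/#.*$/, "")               # nil: nothing matched, nothing changed
p plain                                # "2004-959-559"

p strip_comment("2004-959-559 #This is Phone Number")   # "2004-959-559"
p strip_comment("2004-959-559")                         # "2004-959-559"
```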
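Following is another example:
#!/usr/bin/ruby
text = "rails are rails, really good Ruby on Rails"
# Change "rails" to "Rails" throughout (plain string pattern)
text.gsub!("rails", "Rails")
# Capitalize the whole word "rails" throughout (regexp with word boundaries);
# a no-op here because the previous call already replaced every "rails"
text.gsub!(/\brails\b/, "Rails")
puts text
This will produce the following result: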
Rails are Rails, really good Ruby on Rails
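The two gsub! calls above behave the same on that input, but they differ when "rails" appears inside a longer word. The sketch below uses a made-up sample string to show the effect of the \b word boundaries.

```ruby
BOUNDARY_SAMPLE = "trails and rails"

# A plain string pattern replaces every occurrence, even inside other words.
p BOUNDARY_SAMPLE.gsub("rails", "Rails")         # "tRails and Rails"

# With \b on both sides, only the standalone word is replaced.
p BOUNDARY_SAMPLE.gsub(/\brails\b/, "Rails")     # "trails and Rails"
```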