Using re to cleanly extract the host from a header

程序员文章站 2022-07-15 22:51:00


hd(re:split(Host, ":", [{return, list}]))
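As a minimal sketch, the one-liner above could be wrapped in a small helper. The module name `host_util` and the function `host_only/1` are hypothetical names introduced here for illustration. The sketch assumes `Host` is the raw value of an HTTP `Host` header such as `"example.com:8080"`; it would not handle a bracketed IPv6 literal like `"[::1]:8080"`, because that value contains colons of its own.

```erlang
-module(host_util).
-export([host_only/1]).

%% Hypothetical helper: return the part of a Host header value that
%% precedes the first ":" (the port separator).
%% Assumes a hostname or IPv4 address, not a bracketed IPv6 literal.
-spec host_only(iodata()) -> string().
host_only(Host) ->
    %% {parts, 2} stops splitting after the first ":";
    %% {return, list} makes every part an Erlang string.
    hd(re:split(Host, ":", [{return, list}, {parts, 2}])).
```

For example, `host_util:host_only("example.com:8080")` returns `"example.com"`, and a value without a port is returned unchanged.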

re:split is quite powerful. Its reference documentation is reproduced below.

split(Subject, RE, Options) -> SplitList

Types:

Subject = iodata() | unicode:charlist()

RE = mp() | iodata() | unicode:charlist()

Options = [Option]

Option = anchored
       | global
       | notbol
       | noteol
       | notempty
       | {offset, integer() >= 0}
       | {newline, nl_spec()}
       | bsr_anycrlf
       | bsr_unicode
       | {return, ReturnType}
       | {parts, NumParts}
       | group
       | trim
       | CompileOpt

NumParts = integer() >= 0 | infinity

ReturnType = iodata | list | binary

CompileOpt = compile_option()

See re:compile/2 for the available compile options.

SplitList = [RetData] | [GroupedRetData]

GroupedRetData = [RetData]

RetData = iodata() | unicode:charlist() | binary() | list()

This function splits the input into parts by finding tokens according to the regular expression supplied.

Splitting works essentially by running a global regular expression match and dividing the subject string wherever a match occurs. The matched part of the string is removed from the output.

As in the re:run/3 function, an mp() compiled with the unicode option requires the Subject to be a Unicode charlist(). If compilation is done implicitly and the unicode compilation option is given to this function, both the regular expression and the Subject should be given as valid Unicode charlist()s.

The result is a list of "strings", each in the data type selected by the return option (default iodata).
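To make the effect of the return option concrete, here is a small sketch that can be pasted into the Erlang shell. The subject `"a,b,c"` is an illustrative input, not taken from the original text. Because the exact shape of the default iodata result is implementation dependent, the last line normalises each part with `iolist_to_binary/1` before comparing.

```erlang
%% Same split, different return types.
["a","b","c"] = re:split("a,b,c", ",", [{return, list}]).
[<<"a">>,<<"b">>,<<"c">>] = re:split("a,b,c", ",", [{return, binary}]).

%% Default return type is iodata: do not rely on its exact shape,
%% convert each part explicitly instead.
Default = re:split("a,b,c", ",").
[<<"a">>,<<"b">>,<<"c">>] = [iolist_to_binary(P) || P <- Default].
```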

If subexpressions are given in the regular expression, the matching subexpressions are returned in the resulting list as well. An example:

    re:split("Erlang","[ln]",[{return,list}]).

will yield the result:

    ["Er","a","g"]

while

    re:split("Erlang","([ln])",[{return,list}]).

will yield

    ["Er","l","a","n","g"]

The text matching the subexpression (marked by the parentheses in the regexp) is inserted in the result list where it was found. In effect this means that concatenating the result of a split where the whole regexp is a single subexpression (as in the example above) will always result in the original string.
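The round-trip property described here can be checked directly in the Erlang shell. This is only a sketch of the claim for the example pattern; the variable names `Subject` and `Parts` are chosen for illustration.

```erlang
Subject = "Erlang".
%% The whole regexp is a single subexpression, so the delimiters are kept.
Parts = re:split(Subject, "([ln])", [{return, list}]).
%% Concatenating the parts rebuilds the original string;
%% this match fails with badmatch if it does not.
Subject = lists:append(Parts).
```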

As there is no matching subexpression for the last part in the example (the "g"), there is nothing inserted after that. To make the group of strings and the parts matching the subexpressions more obvious, one might use the group option, which groups together the part of the subject string with the parts matching the subexpressions when the string was split:

    re:split("Erlang","([ln])",[{return,list},group]).

gives:

    [["Er","l"],["a","n"],["g"]]

Here the regular expression matched first the "l", causing "Er" to be the first part in the result. When the regular expression matched, the (only) subexpression was bound to the "l", so the "l" is inserted in the group together with "Er". The next match is of the "n", making "a" the next part to be returned. Since the subexpression is bound to the substring "n" in this case, the "n" is inserted into this group. The last group consists of the rest of the string, as no more matches are found.
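One possible use of the grouped form is to pull the text parts and the delimiters apart again. The sketch below reuses the grouped result shown above; the variable names `Groups`, `Texts` and `Delims` are illustrative only.

```erlang
Groups = re:split("Erlang", "([ln])", [{return, list}, group]).
%% Groups is [["Er","l"],["a","n"],["g"]]

%% First element of every group: the text between delimiters.
Texts = [hd(G) || G <- Groups].
["Er","a","g"] = Texts.

%% Groups with exactly two elements carry a delimiter; the final
%% group ["g"] does not match the pattern and is skipped.
Delims = [D || [_, D] <- Groups].
["l","n"] = Delims.
```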

By default, all parts of the string, including the empty strings, are returned from the function. For example:

    re:split("Erlang","[lg]",[{return,list}]).

will return:

    ["Er","an",[]]

since matching the "g" at the end of the string leaves an empty remainder, which is also returned. This behaviour differs from the default behaviour of the split function in Perl, where empty strings at the end are removed by default. To get Perl's default "trimming" behaviour, specify trim as an option:

    re:split("Erlang","[lg]",[{return,list},trim]).

The result will be:

    ["Er","an"]

The "trim" option in effect says: "give me as many parts as possible except the empty ones at the end", which can be useful in some circumstances. You can also specify how many parts you want with {parts,N}:

    re:split("Erlang","[lg]",[{return,list},{parts,2}]).

This will give:

    ["Er","ang"]

Note that the last part is "ang", not "an", as we only specified splitting into two parts, and the splitting stops when enough parts are given, which is why the result differs from that of trim.
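The same "stop after N parts" behaviour is convenient when only the first separator is meaningful. As a hedged sketch, the header line `"Host: example.com:8080"` below is an illustrative input, not taken from the original text; with {parts,2} the colon before the port is left alone.

```erlang
%% Split a header line into name and value at the first ":" only.
["Host","example.com:8080"] =
    re:split("Host: example.com:8080", ":\\s*", [{return, list}, {parts, 2}]).

%% Without {parts,2} the value is split at its own colon as well.
["Host","example.com","8080"] =
    re:split("Host: example.com:8080", ":\\s*", [{return, list}]).
```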

More than three parts are not possible with this input, so

    re:split("Erlang","[lg]",[{return,list},{parts,4}]).

will give the same result as the default, which is to be viewed as "an infinite number of parts".

Specifying 0 as the number of parts gives the same effect as the option trim. If subexpressions are captured, empty subexpression matches at the end are also stripped from the result if trim or {parts,0} is specified.
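The stated equivalence of {parts,0} and trim can be checked in the Erlang shell with the same input as the earlier examples. The variable name `Trimmed` is illustrative.

```erlang
Trimmed = re:split("Erlang", "[lg]", [{return, list}, trim]).
%% Matching against the already bound variable asserts that
%% {parts,0} gives the same result as trim.
Trimmed = re:split("Erlang", "[lg]", [{return, list}, {parts, 0}]).
["Er","an"] = Trimmed.
```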

If you are familiar with Perl: the trim behaviour corresponds exactly to the Perl default; {parts,N}, where N is a positive integer, corresponds exactly to the Perl behaviour with a positive numerical third parameter; and the default behaviour of re:split/3 corresponds to the Perl behaviour when a negative integer is given as the third parameter.

Summary of options not previously described for the re:run/3 function:

{return,ReturnType}

Specifies how the parts of the original string are presented in the result list. The possible types are:

iodata

The variant of iodata() that gives the least copying of data with the current implementation (often a binary, but don't depend on it).

binary

All parts returned as binaries.

list

All parts returned as lists of characters ("strings").

group

Groups together the part of the string with the parts of the string matching the subexpressions of the regexp.

The return value from the function will in this case be a list() of list()s. Each sublist begins with the string picked out of the subject string, followed by the parts matching each of the subexpressions in order of occurrence in the regular expression.

{parts,N}

Specifies the number of parts the subject string is to be split into.

The number of parts should be a positive integer for a specific maximum number of parts, or infinity for the maximum number of parts possible (the default). Specifying {parts,0} gives as many parts as possible, disregarding empty parts at the end, the same as specifying trim.

trim

Specifies that empty parts at the end of the result list are to be disregarded. The same as specifying {parts,0}. This corresponds to the default behaviour of the built-in split function in Perl.
