R for Data Science (10)

程序员文章站 2022-03-01 15:43:38

...

Continuing from last time with the stringr package, today we look at five functions: str_detect, str_subset, str_extract, str_replace, and str_split.

1.str_detect

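This function is for pattern matching. It tells you whether each element of a character vector contains a match for the pattern, and returns a logical vector (TRUE or FALSE). That result can be combined with base functions such as sum() and mean():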

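library(stringr)

x <- c("apple", "banana", "pear")
str_detect(x, "e")
#> [1]  TRUE FALSE  TRUE
# The second word contains no "e", so it returns FALSE
# How many words start with "a"?
sum(str_detect(x, "^a"))
#> [1] 1
# What proportion of words start with "a"?
mean(str_detect(x, "^a"))
#> [1] 0.3333333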
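The same counting trick works in the other direction. Below is a small sketch that finds the words that do *not* match. It assumes stringr 1.4.0 or later for the `negate` argument; the `!` operator works in any version.

```r
library(stringr)

x <- c("apple", "banana", "pear")

# Two equivalent ways to ask which words do NOT contain an "e"
no_e <- !str_detect(x, "e")
no_e_negate <- str_detect(x, "e", negate = TRUE)  # `negate` needs stringr >= 1.4.0
no_e
#> [1] FALSE  TRUE FALSE

# TRUE counts as 1 and FALSE as 0, so sum() counts the matches
# and mean() gives the proportion of elements that match
sum(no_e)
#> [1] 1
mean(no_e)
#> [1] 0.3333333
```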

A common use of str_detect() is to select the elements that match a pattern. You can do this with logical subsetting, or with the more convenient str_subset():

x[str_detect(x, "^a")]
#> [1] "apple"

or

str_subset(x, "^a")
#> [1] "apple"
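To confirm that the two approaches really are interchangeable, here is a quick check. It also shows str_which(), a related stringr function that returns the positions of the matching elements rather than their values.

```r
library(stringr)

x <- c("apple", "banana", "pear")

# Logical subsetting and str_subset() give the same result
via_detect <- x[str_detect(x, "^a")]
via_subset <- str_subset(x, "^a")
identical(via_detect, via_subset)
#> [1] TRUE

# str_which() returns the positions of the matches instead of the values
str_which(x, "^a")
#> [1] 1
```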
Typically, however, your strings will be one column of a data frame, and you will want to use filter() instead:

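library(dplyr)    # tibble(), filter() and %>%
library(stringr)  # str_detect() and the `words` data set

df <- tibble(
  word = words, 
  i = seq_along(word)
)
# Refer to the column `word`, not the global vector `words`
df %>% 
  filter(str_detect(word, "x$"))
#> # A tibble: 4 x 2
#>   word      i
#>   <chr> <int>
#> 1 box     108
#> 2 sex     747
#> 3 six     772
#> 4 tax     841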
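If you want to keep every row instead of dropping the non-matches, str_detect() can just as well go inside mutate(). The sketch below adds a logical column (the name `ends_in_x` is purely illustrative) and then counts the matches with summarise().

```r
library(dplyr)    # tibble(), mutate(), summarise() and %>%
library(stringr)  # str_detect() and the `words` data set

df <- tibble(word = words, i = seq_along(word))

# Keep every row and add a logical column recording whether the word ends in "x"
flagged <- df %>%
  mutate(ends_in_x = str_detect(word, "x$"))

# sum() over the logical column counts the matching rows
flagged %>%
  summarise(n_ends_in_x = sum(ends_in_x))
```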

2.str_subset

str_subset() keeps the elements of its first argument that contain a match, and returns them unchanged:

colours <- c("red", "orange", "yellow", "green", "blue", "purple")
# Combine the colours into a single regular expression: "red|orange|yellow|green|blue|purple"
colour_match <- str_c(colours, collapse = "|")
has_colour <- str_subset(sentences, colour_match)
has_colour
#> [1] "Glue the sheet to the dark blue background." "Two blue fish swam in the tank."
#> [3] "The colt reared and threw the tall rider."   "The wide road shimmered in the hot sun."
#> [5] "See the cat glaring at the scared mouse."    "A wisp of cloud hung in the blue air."
#> [7] "Leaves turn brown and yellow in the fall."   "He ordered peach pie with ice cream."
# (only the first eight matches are shown)

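As you can see, str_subset() returns the whole sentences from sentences that contain a match, not just the matched words. Note also that the pattern matches "red" inside words such as "reared", "shimmered" and "scared".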
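One way to stop "red" from matching inside longer words is to wrap the alternation in word boundaries (`\\b`). The sketch below uses a hypothetical three-sentence vector `s` so the effect is easy to see. It is only one possible fix; depending on your data you may prefer a different boundary rule.

```r
library(stringr)

colours <- c("red", "orange", "yellow", "green", "blue", "purple")
colour_match <- str_c(colours, collapse = "|")

# Wrap the alternation in \\b( ... )\\b so only whole words match
colour_word <- str_c("\\b(", colour_match, ")\\b")
colour_word
#> [1] "\\b(red|orange|yellow|green|blue|purple)\\b"

# Hypothetical example sentences
s <- c("The colt reared and threw the tall rider.",
       "Two blue fish swam in the tank.",
       "The wide road shimmered in the hot sun.")

str_subset(s, colour_match)  # all three match ("reared", "blue", "shimmered")
str_subset(s, colour_word)   # only the sentence with the whole word "blue"
#> [1] "Two blue fish swam in the tank."
```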

3.str_extract

To extract the actual text of the match, use str_extract():

matches <- str_extract(has_colour, colour_match)
head(matches)
#> [1] "blue" "blue" "red"  "red"  "red"  "blue"
# Note that str_extract() only returns the first match in each string.
# We can see this most easily by first selecting all the sentences with more than one match:
more <- sentences[str_count(sentences, colour_match) > 1]
str_extract(more, colour_match)
#> [1] "blue"   "green"  "orange"
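To get every match rather than only the first, stringr provides str_extract_all(), which returns a list. With `simplify = TRUE` it returns a character matrix padded with empty strings. The short vector `s` below is a made-up example chosen so that one string contains two colours.

```r
library(stringr)

colour_match <- "red|orange|yellow|green|blue|purple"
s <- c("It is hard to erase blue or red ink.", "The sky was clear blue.")

# str_extract(): only the first match in each string
str_extract(s, colour_match)
#> [1] "blue" "blue"

# str_extract_all(): every match, returned as a list (one element per string)
all_matches <- str_extract_all(s, colour_match)
all_matches

# simplify = TRUE returns a matrix; shorter rows are padded with ""
str_extract_all(s, colour_match, simplify = TRUE)
```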

4.str_replace

str_replace() and str_replace_all() let you replace matches with new strings. The simplest use is to replace a pattern with a fixed string:

x <- c("apple", "pear", "banana")
str_replace(x, "[aeiou]", "-")
#> [1] "-pple"  "p-ar"   "b-nana"
str_replace_all(x, "[aeiou]", "-")
#> [1] "-ppl-"  "p--r"   "b-n-n-"
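The replacement does not have to be fixed. It can refer back to parts of the match through capture groups, where `\\1`, `\\2` and so on stand for the first, second, ... parenthesised group. Here is a minimal sketch with a made-up vector that swaps the first two words:

```r
library(stringr)

x <- c("apple pie", "banana split")

# (\\w+) captures a word; \\1 and \\2 in the replacement refer to the captured groups
swapped <- str_replace(x, "(\\w+) (\\w+)", "\\2 \\1")
swapped
#> [1] "pie apple"    "split banana"
```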

By default, str_replace() replaces only the first match in each element, while str_replace_all() replaces every match.

With str_replace_all() you can perform multiple replacements at once by supplying a named vector:

x <- c("1 house", "2 cars", "3 people")
str_replace_all(x, c("1" = "one", "2" = "two", "3" = "three"))
#> [1] "one house"    "two cars"     "three people"
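When you use a named vector, it is worth knowing that the replacements appear to be applied one after another, each to the result of the previous one. As far as I can tell, stringr passes the vector on to stringi with `vectorize_all = FALSE`. This means the order of the vector can matter when a replacement produces text that a later pattern also matches. A small sketch:

```r
library(stringr)

x <- "1 2"

# "1" -> "2" runs first, then "2" -> "3" also rewrites the new "2"
str_replace_all(x, c("1" = "2", "2" = "3"))
#> [1] "3 3"

# Reversing the order gives a different result
str_replace_all(x, c("2" = "3", "1" = "2"))
#> [1] "2 3"
```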

5.str_split

Use str_split() to split a string into pieces. For example, we can split sentences into words. Note that the result is a list:

sentences %>%
  head(5) %>% 
  str_split(" ")
#> [[1]]
#> [1] "The"     "birch"   "canoe"   "slid"    "on"      "the"     "smooth" 
#> [8] "planks."
#> 
#> [[2]]
#> [1] "Glue"        "the"         "sheet"       "to"          "the"        
#> [6] "dark"        "blue"        "background."
#> 
#> [[3]]
#> [1] "It's"  "easy"  "to"    "tell"  "the"   "depth" "of"    "a"     "well."
#> 
#> [[4]]
#> [1] "These"   "days"    "a"       "chicken" "leg"     "is"      "a"      
#> [8] "rare"    "dish."  
#> 
#> [[5]]
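#> [1] "Rice"   "is"     "often"  "served" "in"     "round"  "bowls."

# Because each string can split into a different number of pieces, str_split() returns a list.
# For a length-1 vector, the easiest thing is to extract the first element of the list: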
"a|b|c|d" %>% 
  str_split("\\|") %>% 
  .[[1]]
#> [1] "a" "b" "c" "d"
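Splitting on a single space has two drawbacks: punctuation stays attached to the words, and repeated spaces produce empty strings. stringr's boundary() helper lets you split on word boundaries instead. A sketch with a made-up sentence:

```r
library(stringr)

x <- "This is a sentence.  This is another sentence."

# Splitting on " ": the double space leaves an empty string ""
# and the full stops stay attached to "sentence."
by_space <- str_split(x, " ")[[1]]
by_space

# boundary("word") splits on word boundaries, dropping spaces and punctuation
by_word <- str_split(x, boundary("word"))[[1]]
by_word
```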

Alternatively, as with the other stringr functions that return a list, you can use simplify = TRUE to get a matrix instead:

sentences %>%
  head(5) %>% 
  str_split(" ", simplify = TRUE)
#>      [,1]    [,2]    [,3]    [,4]      [,5]  [,6]    [,7]    
#> [1,] "The"   "birch" "canoe" "slid"    "on"  "the"   "smooth"
#> [2,] "Glue"  "the"   "sheet" "to"      "the" "dark"  "blue"  
#> [3,] "It's"  "easy"  "to"    "tell"    "the" "depth" "of"    
#> [4,] "These" "days"  "a"     "chicken" "leg" "is"    "a"     
#> [5,] "Rice"  "is"    "often" "served"  "in"  "round" "bowls."
#>      [,8]          [,9]   
#> [1,] "planks."     ""     
#> [2,] "background." ""     
#> [3,] "a"           "well."
#> [4,] "rare"        "dish."
#> [5,] ""            ""     
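str_split() also takes an `n` argument that caps the number of pieces. Combined with `simplify = TRUE`, this makes it easy to split "key: value" strings into a two-column matrix even when the value itself contains the separator. The `fields` vector below is a made-up example.

```r
library(stringr)

fields <- c("Name: Hadley", "Country: NZ", "Time: 10: 30")

# n = 2: split on the first ": " only, so "10: 30" stays in one piece;
# simplify = TRUE returns a character matrix with one row per input string
m <- str_split(fields, ": ", n = 2, simplify = TRUE)
m
```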
