
【ElasticSearch in Practice】

程序员文章站 2022-07-05 14:38:16

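Introduction to ElasticSearch (version 6.0 and above)

ElasticSearch is a search server built on Lucene. It provides a distributed, multi-tenant full-text search engine that delivers near-real-time, stable, reliable, and fast search. It is well suited to collecting and analyzing large volumes of data.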

Checking the ElasticSearch version

		http://<ES node IP>:9200/
		{
		    "name": "xx",
		    "cluster_name": "xxxx",
		    "cluster_uuid": "xxxxxxxxxxxxxxxxxx",
		    "version": {
		        "number": "6.2.4",
		        "build_hash": "ccec39f",
		        "build_date": "2018-04-12T20:37:28.497551Z",
		        "build_snapshot": false,
		        "lucene_version": "7.2.1",
		        "minimum_wire_compatibility_version": "5.6.0",
		        "minimum_index_compatibility_version": "5.0.0"
		    },
		    "tagline": "You Know, for Search"
		}
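
Since everything below assumes version 6.0 or later, it can help to check the major version programmatically. The following is a minimal sketch that pulls the "number" field out of the response shown above with a regular expression; the class and method names are made up for illustration, and in real code a JSON library would be a safer choice than a regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EsVersionCheck {
    // Matches the "number" field of the root endpoint response, e.g. "number": "6.2.4"
    private static final Pattern NUMBER =
        Pattern.compile("\"number\"\\s*:\\s*\"(\\d+)\\.(\\d+)\\.(\\d+)[^\"]*\"");

    /** Returns the major version, or -1 if no version number is found. */
    static int majorVersion(String rootResponseJson) {
        Matcher m = NUMBER.matcher(rootResponseJson);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    /** True when the response reports version 6.x or newer. */
    static boolean isSixOrLater(String rootResponseJson) {
        return majorVersion(rootResponseJson) >= 6;
    }

    public static void main(String[] args) {
        // Shortened copy of the response shown above
        String sample = "{ \"version\": { \"number\": \"6.2.4\", \"lucene_version\": \"7.2.1\" } }";
        System.out.println(majorVersion(sample));
        System.out.println(isSixOrLater(sample));
    }
}
```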

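ElasticSearch core concepts

    Near real-time:
            There is a short delay (about 1 second) between writing data and that data becoming searchable; searches and analytics run on ES typically return within seconds.
    Cluster:
            A cluster contains multiple nodes. The cluster a node belongs to is determined by a configuration setting, the cluster name (default: elasticsearch).
    Node:
            A node is assigned a random name by default, and by default it joins a cluster named "elasticsearch".
    Index:
            An index holds a collection of documents with a similar structure. Roughly equivalent to a database (DB).
    Type:
            A type groups documents within an index that share the same fields. Roughly equivalent to a table in a database. Note that an index created in 6.0 or later can contain only one type; earlier versions allowed several.
    Document:
            A document is the smallest unit of data stored in ES; for example, one document can be one customer record. Roughly equivalent to a row in a database table.
    Field:
            A field is the smallest unit within a document. A document contains multiple fields, and each field holds one data value. Roughly equivalent to a table column.
    Mapping:
            A mapping configures how data is stored in the index, including the data type, whether the field is stored, and whether it is analyzed (tokenized). Roughly equivalent to the column definitions and constraints of a database table.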

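Example:

// Assumes the legacy "elasticsearch" npm client; the host is a placeholder.
const elasticsearch = require('elasticsearch');
const client = new elasticsearch.Client({ host: 'es-node:9200' });

client.indices.putMapping({
    index: 'blog',        // database name
    type: 'article',      // table name
    body: {
        article: {
            properties: {
                id: {                          // table column
                    type: 'text',              // mapping (column constraint): data type; 'string' was replaced by 'text' and 'keyword' in 5.x
                    analyzer: 'ik_max_word',   // mapping (column constraint): analyzer; the IK plugin for ES 5.0+ provides ik_max_word and ik_smart
                    store: true,               // mapping (column constraint): whether the field is stored; 6.x accepts only booleans here
                },
                title: {                       // table column
                    type: 'text',              // mapping (column constraint): data type
                    analyzer: 'ik_max_word',   // mapping (column constraint): analyzer
                    store: false,              // mapping (column constraint): whether the field is stored
                },
                content: {                     // table column
                    type: 'text',              // mapping (column constraint): data type
                    analyzer: 'ik_max_word',   // mapping (column constraint): analyzer
                    store: true,               // mapping (column constraint): whether the field is stored
                }
            }
        }
    }
});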

Listing the indices (databases) in ES

ES:
   http://es-node:9200/_cat/indices?v
Mysql:
  show databases;

Listing the types (tables) in an ES index (database)

ES:
   http://es-node:9200/<index name>/_mapping?pretty=true
Mysql:
    use <database name>;
    show tables;

Viewing ES shards (index shards)

http://es-node:9200/_cat/shards
The output shows which shards the data is distributed across.

Viewing an ES type (table) definition

GET <index name>/_mapping

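group by (grouping)

sql:
     select team, count(*) as player_count from player group by team;
 es:
     TermsAggregationBuilder interfaceIdTeam = AggregationBuilders.terms("teamKey").field("team.keyword");
      Note: teamKey is an arbitrary name for the aggregation; it is used later to read the result.
          team.keyword: team is the field to group by, and keyword is its non-analyzed sub-field. When the field itself is analyzed text, grouping generally has to use the keyword sub-field.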
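
For reference, a terms aggregation like the one above corresponds roughly to a REST search body with an "aggs" section. The sketch below builds that JSON with plain string formatting so the structure is visible; the class and method names are made up for illustration, and the aggregation and field names are assumed to need no JSON escaping.

```java
final class TermsAggBody {
    /**
     * Builds a search body containing a single terms aggregation.
     * "size": 0 suppresses the document hits so only the buckets come back.
     */
    static String build(String aggName, String field) {
        return String.format(
            "{\"size\":0,\"aggs\":{\"%s\":{\"terms\":{\"field\":\"%s\"}}}}",
            aggName, field);
    }

    public static void main(String[] args) {
        // Body that could be sent to POST player/_search
        System.out.println(build("teamKey", "team.keyword"));
    }
}
```

Each bucket in the response carries a key (the team) and a doc_count, which plays the role of count(*) in the SQL version.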

group by multiple fields (note: the author has not used this)

sql:
		select team, position, count(*) as pos_count from player group by team, position;
es:
	// sbuilder is assumed to be a SearchRequestBuilder
	TermsAggregationBuilder teamAgg = AggregationBuilders.terms("player_count").field("team");
	TermsAggregationBuilder posAgg = AggregationBuilders.terms("pos_count").field("position");
	sbuilder.addAggregation(teamAgg.subAggregation(posAgg));

count/max/min/sum/avg (note: the field used for max/min/sum/avg must be numeric)

sql:
	select team, count(*) as player_count,max(age),min(age),sum(age),avg(age) from player group by team;
es:
	// group by
	TermsAggregationBuilder team = AggregationBuilders.terms("teamKey").field("team.keyword");
	// count
	ValueCountAggregationBuilder valueCountAggregationBuilder = AggregationBuilders.count("count_age").field("age");
	// max
	MaxAggregationBuilder maxAggregationBuilder = AggregationBuilders.max("max_age").field("age");
	// min
	MinAggregationBuilder minAggregationBuilder = AggregationBuilders.min("min_age").field("age");
	// sum
	SumAggregationBuilder sumAggregationBuilder = AggregationBuilders.sum("sum_age").field("age");
	// avg
	AvgAggregationBuilder avgAggregationBuilder = AggregationBuilders.avg("avg_age").field("age");
	// attach the metrics to the grouping; each sub-aggregation needs a unique name
	team.subAggregation(valueCountAggregationBuilder);
	team.subAggregation(maxAggregationBuilder);
	team.subAggregation(minAggregationBuilder);
	team.subAggregation(sumAggregationBuilder);
	team.subAggregation(avgAggregationBuilder);
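
To make clear what these sub-aggregations compute for each bucket, here is an in-memory analogy using Java streams. It does not talk to ES; the Player class and the sample data are hypothetical, and the sketch assumes every player has an age value.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class TeamAgeStats {
    /** Hypothetical stand-in for a player document. */
    static final class Player {
        final String team;
        final int age;
        Player(String team, int age) { this.team = team; this.age = age; }
    }

    /** Groups players by team and computes count, min, max, sum and average of age per team. */
    static Map<String, IntSummaryStatistics> statsByTeam(List<Player> players) {
        return players.stream().collect(
            Collectors.groupingBy(p -> p.team, TreeMap::new, Collectors.summarizingInt(p -> p.age)));
    }

    public static void main(String[] args) {
        List<Player> players = Arrays.asList(
            new Player("A", 20), new Player("A", 30), new Player("B", 25));
        statsByTeam(players).forEach((team, s) ->
            System.out.println(team + ": count=" + s.getCount() + " max=" + s.getMax()
                + " min=" + s.getMin() + " sum=" + s.getSum() + " avg=" + s.getAverage()));
    }
}
```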

order by

sql:
	select team, sum(salary) as total_salary from player group by team order by total_salary desc;
ES:
	TermsAggregationBuilder team = AggregationBuilders.terms("teamKey").field("team.keyword");
	SumAggregationBuilder sumAggregationBuilder = AggregationBuilders.sum("total_salary").field("salary");
	team.subAggregation(sumAggregationBuilder);
	// false means descending; the path must equal the sub-aggregation name exactly
	team.order(BucketOrder.aggregation("total_salary", false));
	

limit

sql:
		select team, sum(salary) as total_salary from player group by team order by total_salary desc  limit 1000;
ES:
	TermsAggregationBuilder team = AggregationBuilders.terms("teamKey").field("team.keyword");
	SumAggregationBuilder sumAggregationBuilder = AggregationBuilders.sum("total_salary").field("salary");
	team.subAggregation(sumAggregationBuilder);
	// size limits the number of buckets returned
	team.order(BucketOrder.aggregation("total_salary", false)).size(1000);
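
As an in-memory analogy for the ordering and the limit, the sketch below sums salaries per team, sorts the teams by that total in descending order, and keeps the first few. The Player class and the data are hypothetical. One difference worth keeping in mind: ES collects buckets per shard, so ordering by a sub-aggregation can be approximate on a multi-shard index, whereas this local version is exact.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class TopTeamsBySalary {
    /** Hypothetical stand-in for a player document. */
    static final class Player {
        final String team;
        final long salary;
        Player(String team, long salary) { this.team = team; this.salary = salary; }
    }

    /** Returns team names ordered by total salary (descending), at most `limit` of them. */
    static List<String> topTeams(List<Player> players, int limit) {
        // group by team, sum(salary)
        Map<String, Long> totals = players.stream()
            .collect(Collectors.groupingBy(p -> p.team, Collectors.summingLong(p -> p.salary)));
        // order by total desc, limit
        return totals.entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
            .limit(limit)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Player> players = Arrays.asList(
            new Player("A", 100), new Player("A", 200),
            new Player("B", 500), new Player("C", 50));
        System.out.println(topTeams(players, 2));
    }
}
```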

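where conditions

  1. Compound queries / multi-condition queries / bool queries

      BoolQueryBuilder boolQuery = new BoolQueryBuilder();
      // queryBuilder stands for any QueryBuilder, such as the ones in the sections below
      boolQuery.must(queryBuilder);     // the document must match the condition; equivalent to AND
      boolQuery.mustNot(queryBuilder);  // the document must not match the condition; equivalent to NOT
      boolQuery.should(queryBuilder);   // the document satisfies should if it matches at least one condition; equivalent to OR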
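
The way the three clause types combine can be sketched with plain Java predicates. This is only an illustration of the matching logic (scoring is ignored), and the class and method names are made up. It follows the documented default for minimum_should_match in query context: should clauses are required only when the bool query has no must clause; otherwise they are optional.

```java
import java.util.List;
import java.util.function.Predicate;

final class BoolSemantics {
    /** Combines clauses the way a bool query decides whether a document matches. */
    static <T> Predicate<T> bool(List<Predicate<T>> must,
                                 List<Predicate<T>> mustNot,
                                 List<Predicate<T>> should) {
        return doc -> {
            boolean allMust = must.stream().allMatch(p -> p.test(doc));
            boolean noneMustNot = mustNot.stream().noneMatch(p -> p.test(doc));
            // With no must clause, at least one should clause has to match;
            // with a must clause present, should clauses are optional.
            boolean shouldOk = should.isEmpty() || !must.isEmpty()
                || should.stream().anyMatch(p -> p.test(doc));
            return allMust && noneMustNot && shouldOk;
        };
    }
}
```

To express "a AND (b OR c)", a should-only bool query is usually nested inside a must clause of an outer bool query.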
    
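  2. Exact queries (the value must match completely)

    // Term query: the query value is not analyzed. Parameter 1: field name; parameter 2: value to search for.
    // Because the value is not analyzed, against an analyzed field it can only match a single token: one character for Chinese, one word for English.
    QueryBuilder queryBuilder = QueryBuilders.termQuery("fieldName", "fieldValue");
    // Match query: the query value is analyzed with the default analyzer
    QueryBuilder queryBuilder2 = QueryBuilders.matchQuery("fieldName", "fieldValue");


    Matching multiple values or fields

    // Terms query: not analyzed. Parameter 1: field name; remaining parameters: the values to search for (add more as needed).
    // The same single-token restriction applies: one character for Chinese, one word for English.
    QueryBuilder termsQueryBuilder = QueryBuilders.termsQuery("fieldName", "fieldValue1", "fieldValue2");
    // Multi-match query: the value is analyzed with the default analyzer and searched in several fields
    QueryBuilder multiMatchQueryBuilder = QueryBuilders.multiMatchQuery("fieldValue", "fieldName1", "fieldName2", "fieldName3");
    // Match all documents; equivalent to setting no query condition
    QueryBuilder matchAllQueryBuilder = QueryBuilders.matchAllQuery();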
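
The difference between the non-analyzed and the analyzed queries is easier to see with a toy model. The sketch below uses a deliberately simplified analyzer (lowercase, then split on anything that is not a letter or digit) as a stand-in for a real one; all names are hypothetical, and real analyzers, IK in particular, tokenize differently.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

final class TermVsMatch {
    /** Simplified stand-in for an analyzer: lowercase, then split on non-letter/digit characters. */
    static List<String> analyze(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
            .filter(t -> !t.isEmpty())
            .collect(Collectors.toList());
    }

    /** term query: the query value is compared with the indexed tokens as-is. */
    static boolean termMatches(String indexedText, String value) {
        return analyze(indexedText).contains(value);
    }

    /** match query: the query text is analyzed first; sharing any token is a match (default OR operator). */
    static boolean matchMatches(String indexedText, String queryText) {
        List<String> tokens = analyze(indexedText);
        return analyze(queryText).stream().anyMatch(tokens::contains);
    }

    public static void main(String[] args) {
        String indexed = "Quick Brown Fox";
        System.out.println(termMatches(indexed, "Quick"));      // value is not lowercased
        System.out.println(termMatches(indexed, "quick"));      // equals an indexed token
        System.out.println(matchMatches(indexed, "Quick dog")); // analyzed, then compared
    }
}
```

This is why a term query for a capitalized or multi-word value finds nothing on an analyzed field, while the same value works on a keyword field, where the whole string is indexed as one token.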
    
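  3. Fuzzy queries (a partial match is enough)

    // Five common ways to run a fuzzy query
    // 1. Query string query, commonly used for free-text search
    QueryBuilders.queryStringQuery("fieldValue").field("fieldName"); // matches on both sides of the value
    // 2. More-like-this query, commonly used to recommend similar content
    //    6.x signature: fields, like texts, like items. If no fields are given, all fields are used by default.
    QueryBuilders.moreLikeThisQuery(new String[] {"fieldName"}, new String[] {"pipeidhua"}, null);
    // 3. Prefix query: if the field is not analyzed, the prefix is matched against the whole field value
    QueryBuilders.prefixQuery("fieldName.keyword", "fieldValue");
    // 4. Fuzzy query: fuzziness is the maximum edit distance (characters inserted, deleted, or substituted)
    //    allowed between the query term and a matching term. For example, this matches documents whose
    //    hotelName is "tel" with one letter added before or after it.
    QueryBuilders.fuzzyQuery("hotelName.keyword", "tel").fuzziness(Fuzziness.ONE);
    // 5. Wildcard query: * matches any string; ? matches any single character
    QueryBuilders.wildcardQuery("fieldName.keyword", "ctr*"); // first argument is the field name, second is the pattern
    QueryBuilders.wildcardQuery("fieldName.keyword", "c?r?");
    Note: before running a fuzzy query, check the field's mapping. If the field has a keyword sub-field, query that sub-field (the name is lowercase: keyword); otherwise fuzzy matching on Chinese text and on uppercase letters does not work.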
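
To illustrate what a fuzziness of 1 means, here is a small edit-distance sketch. It implements plain Levenshtein distance, which is an assumption made for simplicity: by default the ES fuzzy query also counts a swap of two adjacent characters as a single edit, which this version counts as two. The class and method names are made up.

```java
final class EditDistance {
    /** Classic Levenshtein distance: minimum number of single-character inserts, deletes, or substitutions. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning an empty prefix into b[0..j) takes j inserts
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into an empty string takes i deletes
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** True when candidate is within the allowed number of edits of term. */
    static boolean withinFuzziness(String term, String candidate, int fuzziness) {
        return levenshtein(term, candidate) <= fuzziness;
    }

    public static void main(String[] args) {
        System.out.println(withinFuzziness("tel", "tell", 1));  // one letter added
        System.out.println(withinFuzziness("tel", "hotel", 1)); // two letters added
    }
}
```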
    
  4. Range queries

     // Closed interval
     QueryBuilder queryBuilder0 = QueryBuilders.rangeQuery("fieldName").from("fieldValue1").to("fieldValue2");
     // Open interval
     QueryBuilder queryBuilder1 = QueryBuilders.rangeQuery("fieldName").from("fieldValue1").to("fieldValue2").includeUpper(false).includeLower(false); // both default to true, i.e. the bounds are included
     // Greater than
     QueryBuilder queryBuilder2 = QueryBuilders.rangeQuery("fieldName").gt("fieldValue");
     // Greater than or equal to
     QueryBuilder queryBuilder3 = QueryBuilders.rangeQuery("fieldName").gte("fieldValue");
     // Less than
     QueryBuilder queryBuilder4 = QueryBuilders.rangeQuery("fieldName").lt("fieldValue");
     // Less than or equal to
     QueryBuilder queryBuilder5 = QueryBuilders.rangeQuery("fieldName").lte("fieldValue");
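
The effect of the two include flags can be spelled out with a tiny check on numbers. This is only an illustration of the interval logic with hypothetical names, not ES code.

```java
final class RangeCheck {
    /** Mirrors from/to with includeLower/includeUpper: each flag decides whether its bound itself matches. */
    static boolean inRange(long value, long from, long to, boolean includeLower, boolean includeUpper) {
        boolean aboveLower = includeLower ? value >= from : value > from;
        boolean belowUpper = includeUpper ? value <= to : value < to;
        return aboveLower && belowUpper;
    }

    public static void main(String[] args) {
        System.out.println(inRange(5, 5, 10, true, true));   // closed interval: lower bound matches
        System.out.println(inRange(5, 5, 10, false, false)); // open interval: lower bound excluded
    }
}
```

In these terms, gt and lt correspond to an excluded bound, and gte and lte to an included one.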
    
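  5. Aggregation queries

      The builder class names below are those of the 5.x/6.x Java API; the 2.x API used shorter names such as ValueCountBuilder and TermsBuilder.
      (1) Count the values of a field
      ValueCountAggregationBuilder vcb = AggregationBuilders.count("count_uid").field("uid");
      (2) Count the distinct values of a field (with a small error margin)
      CardinalityAggregationBuilder cb = AggregationBuilders.cardinality("distinct_count_uid").field("uid");
      (3) Filter aggregation
      FilterAggregationBuilder fab = AggregationBuilders.filter("uid_filter", QueryBuilders.queryStringQuery("uid:001"));
      (4) Group by a field
      TermsAggregationBuilder tb = AggregationBuilders.terms("group_name").field("name");
      (5) Sum
      SumAggregationBuilder sumBuilder = AggregationBuilders.sum("sum_price").field("price");
      (6) Average
      AvgAggregationBuilder ab = AggregationBuilders.avg("avg_price").field("price");
      (7) Maximum
      MaxAggregationBuilder mb = AggregationBuilders.max("max_price").field("price");
      (8) Minimum
      MinAggregationBuilder min = AggregationBuilders.min("min_price").field("price");
      (9) Group by date interval (the one-day interval here is an example value)
      DateHistogramAggregationBuilder dhb = AggregationBuilders.dateHistogram("dh").field("date").dateHistogramInterval(DateHistogramInterval.DAY);
      (10) Return the top matching documents inside an aggregation
      TopHitsAggregationBuilder thb = AggregationBuilders.topHits("top_result");
      (11) Nested aggregation
      NestedAggregationBuilder nb = AggregationBuilders.nested("nested_path", "quests");
      (12) Reverse nested aggregation
      ReverseNestedAggregationBuilder rnb = AggregationBuilders.reverseNested("res_nested").path("kps");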
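
As an illustration of what grouping by date interval produces, the sketch below buckets timestamps by calendar day in UTC and counts them, assuming a one-day interval. It runs in memory with hypothetical names and sample timestamps; unlike this version, an ES date histogram can also return empty buckets for days with no documents.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class DailyBuckets {
    /** Counts timestamps per UTC calendar day, with the days in ascending order. */
    static Map<LocalDate, Long> countPerDay(List<Instant> timestamps) {
        return timestamps.stream().collect(Collectors.groupingBy(
            t -> t.atZone(ZoneOffset.UTC).toLocalDate(), // bucket key: start of the day
            TreeMap::new,
            Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Instant> timestamps = Arrays.asList(
            Instant.parse("2018-04-12T01:00:00Z"),
            Instant.parse("2018-04-12T20:37:28Z"),
            Instant.parse("2018-04-13T00:00:00Z"));
        countPerDay(timestamps).forEach((day, count) -> System.out.println(day + " " + count));
    }
}
```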
    