Integrating Elasticsearch and the ik Analyzer into a Laravel Personal Blog

程序员文章站 2022-05-08 11:04:09
In an earlier post I described a personal blog built with Laravel 5.5 and Vue. The code is on GitHub: https://github.com/Johnson19900110/phpJourney. I recently had some spare time, so I decided to integrate Elasticsearch into it.

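I am rather lazy and write my posts on cnblogs (博客园), so my personal blog was never kept in sync. To fix that, I used the PHP scraping library fabpot/goutte to crawl my cnblogs posts into my own blog.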


The code is as follows:

<?php
namespace App\Libraries;

use App\Post;
use Goutte\Client;
use Symfony\Component\DomCrawler\Crawler;

class CnblogsPostSpider {

    protected $client;

    protected $crawler;

    protected $urls = [];

    public function __construct(Client $client, $url)
    {
        $this->client = $client;
        $this->crawler = $client->request('GET', $url);
    }

    public function getUrls()
    {
        $urls = $this->crawler->filter('.postTitle > a')->each(function ($node) {
            return $node->attr('href');
        });

        foreach ($urls as $url) {
            $crawler = $this->client->request('GET', $url);

            $cnBlogId = $this->getCnBlogId($url);

            $post = new Post();
            if($post->where('cnblogs_id', $cnBlogId)->count()) {
                // Post already crawled: only update the view and comment counts
                $post->where('cnblogs_id', $cnBlogId)->update([
                    'views'         => $this->getViews($crawler),
                    'comments'      => $this->getComments($crawler),
                ]);
            }else {
                $post->insert([
                    'title'         => $this->getTitle($crawler),
                    'category_id'   => 1,
                    'content'       => $this->getContent($crawler),
                    'user_id'       => 1,
                    'views'         => $this->getViews($crawler),
                    'comments'      => $this->getComments($crawler),
                    'cnblogs_id'    => $cnBlogId,
                    'cnblogs_url'   => $url,
                    'created_at'    => $this->getCreatedAt($crawler),
                ]);
            }
        }
    }

    public function getCnBlogId($url)
    {
        $url_arr = explode('/', $url);
        $last = array_pop($url_arr);
        $path_arr = explode('.', $last);
        return intval(array_shift($path_arr));
    }

    protected function getTitle(Crawler $crawler)
    {
        return trim($crawler->filter('.postTitle > a')->text());
    }

    protected function getContent(Crawler $crawler)
    {
        return trim($crawler->filter('#cnblogs_post_body')->text());
    }

    protected function getViews(Crawler $crawler)
    {
        return intval(trim($crawler->filter('#post_view_count')->text()));
    }

    protected function getComments(Crawler $crawler)
    {
        return intval($crawler->filter('#post_comment_count')->text());
    }

    protected function getCreatedAt(Crawler $crawler)
    {
        return trim($crawler->filter('#post-date')->text());
    }
}
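
To make the ID extraction in getCnBlogId concrete, here is a minimal standalone sketch of equivalent logic. It differs slightly from the method above in that it goes through parse_url(), so a query string does not end up in the ID. The function name and the URLs are made up for illustration and only follow the `.../p/<id>.html` pattern that the original code assumes.

```php
<?php
// Standalone sketch of the ID parsing done by CnblogsPostSpider::getCnBlogId().
// Assumption: post URLs end in "<numeric id>.html".
function extractCnblogsId(string $url): int
{
    // Keep only the path so a query string or fragment is ignored.
    $path = (string) parse_url($url, PHP_URL_PATH);
    // Last path segment, e.g. "1234567.html".
    $file = basename($path);
    // pathinfo() strips the extension, leaving only the ID part.
    return (int) pathinfo($file, PATHINFO_FILENAME);
}

// Hypothetical URL, for illustration only.
echo extractCnblogsId('https://www.cnblogs.com/example-user/p/1234567.html'), PHP_EOL;
```

Storing this ID in the cnblogs_id column is what lets the spider tell an already crawled post from a new one.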

Next, integrate ES through Laravel Scout.

First, install the ES package:

 composer require tamayo/laravel-scout-elastic 

This package depends on Laravel Scout, so Scout is installed along with it.

Then publish the config and register the ServiceProviders.
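
For reference, that step typically looks like the fragment below. The provider class names and config keys are the ones I believe the Laravel Scout and tamayo/laravel-scout-elastic READMEs document for this generation of the packages; treat them as assumptions and check them against the versions you actually install.

```php
<?php
// Run once to copy config/scout.php into the project:
//   php artisan vendor:publish --provider="Laravel\Scout\ScoutServiceProvider"

// config/app.php, inside the 'providers' array (if not auto-discovered):
//   Laravel\Scout\ScoutServiceProvider::class,
//   ScoutEngines\Elasticsearch\ElasticsearchProvider::class,

// config/scout.php, the keys relevant to this post:
return [
    'driver' => env('SCOUT_DRIVER', 'elasticsearch'),

    'elasticsearch' => [
        // 'laravel' is an assumed default index name.
        'index' => env('ELASTICSEARCH_INDEX', 'laravel'),
        // Scheme included, no port: the artisan command in this post appends ":9200/".
        'hosts' => [
            env('ELASTICSEARCH_HOST', 'http://localhost'),
        ],
    ],
];
```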

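Now ES itself can be installed. We want the ik plugin for Chinese word segmentation, and working out how to install ik by hand can cost a lot of effort.

I am also new to ES, so we simply use a ready-made project.

That project currently ships Elasticsearch 5.1.1, with the ik plugin already installed.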

$ curl http://localhost:9200

{
  "name" : "Rkx3vzo",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "Ww9KIfqSRA-9qnmj1TcnHQ",
  "version" : {
    "number" : "5.1.1",
    "build_hash" : "5395e21",
    "build_date" : "2016-12-06T12:36:15.409Z",
    "build_snapshot" : false,
    "lucene_version" : "6.3.0"
  },
  "tagline" : "You Know, for Search"
}

If you get this response, ES is installed correctly.
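
The version number matters here, because the index setup used in this post relies on the `_default_` mapping and the `_all` field, both of which were deprecated in Elasticsearch 6.0. As a hedged sketch, a check like the following could guard against running the setup on a newer node. The function name is illustrative and not part of any library.

```php
<?php
// Sketch: decide whether an ES node is old enough for the 5.x-style
// mappings (_default_, _all) used by the index setup in this post.
function supportsLegacyMappings(string $rootResponseJson): bool
{
    $info = json_decode($rootResponseJson, true);
    if (!is_array($info) || !isset($info['version']['number'])) {
        throw new InvalidArgumentException('Not an Elasticsearch root response');
    }
    // _default_ and _all were deprecated in 6.0, so require a version below that.
    return version_compare($info['version']['number'], '6.0.0', '<');
}

// Trimmed copy of the response shown above.
$sample = '{"name":"Rkx3vzo","version":{"number":"5.1.1","lucene_version":"6.3.0"}}';
var_dump(supportsLegacyMappings($sample)); // bool(true)
```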

Now create an artisan command that sets up the ES index and template.

<?php

namespace App\Console\Commands;

use GuzzleHttp\Client;
use Illuminate\Console\Command;

class InitEs extends Command
{
    /**
     * The name and signature of the console command.
     *
     * @var string
     */
    protected $signature = 'es:init';

    /**
     * The console command description.
     *
     * @var string
     */
    protected $description = 'Init es to create index';

    /**
     * Create a new command instance.
     *
     * @return void
     */
    public function __construct()
    {
        parent::__construct();
    }

    /**
     * Execute the console command.
     *
     * @return mixed
     */
    public function handle()
    {
        //
        $client = new Client();
        $this->createTemplate($client);
        $this->createIndex($client);
    }

    public function createTemplate(Client $client)
    {
        $url = config('scout.elasticsearch.hosts')[0] . ':9200/' . '_template/rtf';
        $client->put($url, [
            'json' => [
                'template' => '*',
                'settings' => [
                    'number_of_shards' => 1
                ],
                'mappings' => [
                    '_default_' => [
                        '_all' => [
                            'enabled' => true
                        ],
                        'dynamic_templates' => [
                            [
                                'strings' => [
                                    'match_mapping_type' => 'string',
                                    'mapping' => [
                                        'type' => 'text',
                                        'analyzer' => 'ik_smart',
                                        'ignore_above' => 256,
                                        'fields' => [
                                            'keyword' => [
                                                'type' => 'keyword'
                                            ]
                                        ]
                                    ]
                                ]
                            ]
                        ]
                    ]
                ]
            ]
        ]);

    }

    public function createIndex(Client $client)
    {
        $url = config('scout.elasticsearch.hosts')[0] . ':9200/' . config('scout.elasticsearch.index');
        $client->put($url, [
            'json' => [
                'settings' => [
                    'refresh_interval' => '5s',
                    'number_of_shards' => 1,
                    'number_of_replicas' => 0,
                ],
                'mappings' => [
                    '_default_' => [
                        '_all' => [
                            'enabled' => false
                        ]
                    ]
                ]
            ]
        ]);
    }
}
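
Both methods build the request URL by concatenating the first configured host with `:9200/`. That only works if the host value carries a scheme and no port, for example `http://localhost`. Here is a minimal sketch of that assumption; the helper name `esUrl` and the index name `laravel` are illustrative and not taken from the original.

```php
<?php
// Sketch of the URL building used by createTemplate() and createIndex().
// Assumption: each host entry has a scheme but no port.
function esUrl(array $hosts, string $path, int $port = 9200): string
{
    if (empty($hosts)) {
        throw new InvalidArgumentException('No Elasticsearch host configured');
    }
    // Tolerate a trailing slash on the host and a leading slash on the path.
    $host = rtrim($hosts[0], '/');
    return $host . ':' . $port . '/' . ltrim($path, '/');
}

echo esUrl(['http://localhost'], '_template/rtf'), PHP_EOL;
// http://localhost:9200/_template/rtf
echo esUrl(['http://localhost/'], 'laravel'), PHP_EOL;
// http://localhost:9200/laravel
```

If the configured host already includes a port, the concatenation in the command would produce an invalid URL, so one of the two places has to drop it.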

tamayo/laravel-scout-elastic has no highlight support, so we need to extend it slightly. Create an EsEngine class that extends ElasticsearchEngine and override a few methods.

<?php
/**
 * Created by PhpStorm.
 * User: johnson
 * Date: 2018/6/14
 * Time: 3:10 PM
 */

namespace App\Libraries;


use Laravel\Scout\Builder;
use ScoutEngines\Elasticsearch\ElasticsearchEngine;
use Illuminate\Database\Eloquent\Collection;

class EsEngine extends ElasticsearchEngine
{
    public function search(Builder $builder)
    {
        return $this->performSearch($builder, array_filter([
            'numericFilters' => $this->filters($builder),
            'size' => $builder->limit,
        ]));
    }

    protected function performSearch(Builder $builder, array $options = [])
    {
        $params = [
            'index' => $this->index,
            'type' => $builder->model->searchableAs(),
            'body' => [
                'query' => [
                    'bool' => [
                        'must' => [
                            [
                                'query_string' => [
                                    'query' => "*{$builder->query}*",
                                ]
                            ]
                        ]
                    ]
                ],
            ]
        ];
        /**
         * Apply the highlight settings declared on the model.
         */
        if ($builder->model->searchSettings
            && isset($builder->model->searchSettings['attributesToHighlight'])
        ) {
            $attributes = $builder->model->searchSettings['attributesToHighlight'];
            foreach ($attributes as $attribute) {
                $params['body']['highlight']['fields'][$attribute] = new \stdClass();
            }
        }

        if ($sort = $this->sort($builder)) {
            $params['body']['sort'] = $sort;
        }

        if (isset($options['from'])) {
            $params['body']['from'] = $options['from'];
        }

        if (isset($options['size'])) {
            $params['body']['size'] = $options['size'];
        }

        if (isset($options['numericFilters']) && count($options['numericFilters'])) {
            $params['body']['query']['bool']['must'] = array_merge($params['body']['query']['bool']['must'],
                $options['numericFilters']);
        }

        return $this->elastic->search($params);
    }

    public function map($results, $model)
    {
        if ($results['hits']['total'] === 0) {
            return Collection::make();
        }

        $keys = collect($results['hits']['hits'])
            ->pluck('_id')->values()->all();

        $models = $model->whereIn(
            $model->getKeyName(), $keys
        )->get()->keyBy($model->getKeyName());

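        return collect($results['hits']['hits'])->map(function ($hit) use ($model, $models) {
            // Skip hits whose row no longer exists in the database.
            $one = $models[$hit['_id']] ?? null;
            if ($one === null) {
                return null;
            }

            // If ES returned highlight fragments for this hit, attach them to the model.
            if (isset($hit['highlight'])) {
                $one->highlight = $hit['highlight'];
            }
            return $one;
        })->filter()->values();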
    }
}
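
One detail in performSearch is easy to miss: each highlighted field is set to `new \stdClass()` rather than an empty array. The sketch below isolates that part so it can be run without Laravel or ES. The function name is illustrative; the point is only how PHP encodes the two values as JSON.

```php
<?php
// Sketch: build the "highlight" part of an ES search body the way
// EsEngine::performSearch() does.
function highlightSection(array $attributes): array
{
    $fields = [];
    foreach ($attributes as $attribute) {
        // An empty PHP array is encoded as [] in JSON. ES expects an object
        // per field, so an empty stdClass is used to get {} instead.
        $fields[$attribute] = new stdClass();
    }
    return ['fields' => $fields];
}

echo json_encode(highlightSection(['title', 'content'])), PHP_EOL;
// {"fields":{"title":{},"content":{}}}
echo json_encode(['fields' => ['content' => []]]), PHP_EOL;
// {"fields":{"content":[]}}
```

With the `'*'` wildcard used on the Post model in this post, the same function yields `{"fields":{"*":{}}}`, which asks ES to highlight every matching field.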

What we want to search is blog posts, so add the following to the Post model:

    use Searchable;

    public $searchSettings = ['attributesToHighlight' => ['*']];
    public $highlight = [];

Then use Scout's search method when querying:

    public function search(Request $request)
    {
        $q = $request->get('q', false);

        $posts = [];
        if($q !== false) {
            $posts = Post::search($q)->paginate();
        }

        return view('index', compact('posts', 'q'));
    }

Each model in the result carries a highlight property, so the template can use it like this:


@if(isset($post->highlight['content']))
@foreach($post->highlight['content'] as $item)
...{!! $item !!}...
@endforeach
@else
{{ empty($post->content) ? '...' : mb_substr($post->content, 0, 300) . '...' }}
@endif
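
The same fallback logic can be written as a plain function, which makes it easier to try outside a Blade view. This is only a sketch: the function name and the sample data are hypothetical, and the `<em>` tags reflect the default wrapper ES puts around highlighted terms.

```php
<?php
// Sketch of the Blade logic above as a plain function.
function postExcerpt(array $highlight, ?string $content, int $length = 300): string
{
    if (!empty($highlight['content'])) {
        // Wrap each fragment in "..." just like the @foreach loop does.
        $parts = array_map(function ($fragment) {
            return '...' . $fragment . '...';
        }, $highlight['content']);
        return implode('', $parts);
    }

    // No highlight for this field: fall back to the start of the content.
    return empty($content) ? '...' : mb_substr($content, 0, $length) . '...';
}

// Hypothetical data, shaped like the "highlight" part of an ES hit.
echo postExcerpt(['content' => ['uses <em>Laravel</em> Scout']], 'full text'), PHP_EOL;
// ...uses <em>Laravel</em> Scout...
echo postExcerpt([], 'Elasticsearch', 7), PHP_EOL;
// Elastic...
```

Because `{!! !!}` prints the fragment unescaped, any markup inside the crawled text would be rendered as HTML too. Setting the highlight `encoder` option to `html` in the search body is one way to have ES escape the fragment while keeping its own `<em>` tags.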

The final result looks like this:

[Image: Integrating Elasticsearch and the ik Analyzer into a Laravel Personal Blog]