欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页

初学者MongoDB聚合概览(The Beginner’s Guide to MongoDB Aggregation)

程序员文章站 2022-06-15 16:10:47
...

原文链接: https://studio3t.com/knowledge-base/articles/mongodb-aggregation-framework/#introduction

The Beginner’s Guide to MongoDB Aggregation(初学者概览)

一、Introduction (简介)

When you start with MongoDB, you will use the find() command for querying data and it will probably be sufficient, but as soon as you start doing anything more advanced than data retrieval, you will need to know more about the MongoDB Aggregation Framework.

I will explain the main principles of building working queries and how to take advantage of indexes for speeding up queries.

Furthermore,I will introduce the most important stages of the aggregation pipeline with short examples using each one, and how to apply them to an aggregation pipeline.

开始用 MongoDB 的时候,用 find() 命令来查询数据就够了。但是,如果想做一些比数据检索更复杂的工作,那就需要进一步了解 MongoDB 聚合框架。

接下来我讲一讲构建可用查询的主要原则和利用索引加速查询的方法。

此外,我将通过简单明了的例子,介绍聚合管道里最重要的一些阶段,以及在聚合管道中的应用。

二、The MongoDB Aggregation Framework (MongoDB 聚合框架)

Here is a diagram to illustrate a typical pipeline.

下面只一个典型管道的图解。

初学者MongoDB聚合概览(The Beginner’s Guide to MongoDB Aggregation)

The $match() stage filters those documents we need to work with, those that fit our needs.

$match() 阶段的功能是按需求筛选数据。

The $group() stage does the aggregation job and finally, we $sort() the resulting documents the way we require. In order to store the documents obtained we use the $output() stage.

$group() 阶段负责聚合工作。然后,我们通过 $sort() 把生成的文档按所要求的方式排序。 $output() 用来保存文档。

The input of the pipeline can be one or several collections. The pipeline then performs successive transformations on the data until our goal is achieved. This way, we can break down a complex query into easier stages, in each of which we complete a different operation on the data. So, by the end of the query pipeline, we will have achieved all that we wanted.

管道的输入可以是一个也可以是几个集合,然后持续转换数据,直到完成要求。通过这种方式,我们可以把复杂的查询划分为几个简单阶段,每个阶段我们都对数据执行不同的操作。最后,就可以得到理想的结果。

This approach allows us to check whether our query is functioning properly at every stage by examining both its input and the output. The output of each stage will be the input of the next.

There is no limit to the number of stages used in the query, or how we combine them. To achieve optimum query performance there are a number of best practices to take into account. We will come to those later in the article.

检查每一步的输入和输出,我们可以用这个方法来判断查询是否正常运行。每一步的输出就是下一步的输入。查询中使用的阶段数量和组合是不受限制的。为了达到最佳查询效果,下文列举了一些范例供大家参考。我们将在后文继续探讨这个问题。

三、Syntax (语法)

This is an example of how we build the aggregation query:

下面是一个关于我们如何创建聚合查询的例子:

pipeline = [  
  { $match : { … },  
  { $group : { … },  
  { $sort : { … }, 
  ... 
] 
db.collectionName.aggregate(pipeline, options) 

四、Regarding Limits(一些局限)

Up to 100 MB of RAM can be used per stage. You will get an error from the database if you exceed this limit. If it becomes an unavoidable problem you can opt to page to disk, with the only disadvantage that you will wait a little longer because it is slower to work in the disk rather than memory. To choose the page to disk method, you just need to use the option allowDiskUse, in this way:

每一个阶段最多可以使用 100MB 的运行内存。如果超出这个限制,数据库就会报错。如果不得不这么做的话,可以选择页到磁盘。缺点就是要多等一会儿,因为在磁盘比在内存运行慢。选择页到磁盘,只需要加上选项 allowDiskUse,如下:

db.collectionName.aggregate(pipeline, { allowDiskUse : true })

The documents returned by the aggregation query, either as a cursor or stored via $out() in another collection, are limited to 16MB.

通过聚合查询、光标或是用$out()储存,返回的最大限制为 16MB。

If you are likely to exceed this limit, then you should specify that the output of the aggregation query will be as a cursor and not as a document.

如果超出这个限制,那么必须指定聚合查询的输出从文档改为光标。

五、Our Collections(我们的集合)

I will be giving examples of the use of all the more important stages. To illustrate the examples, I am going to use two collections. The first is called ‘universities’ and is made up of these documents (the data is not real):

对于比较重要的阶段,我会用例子来阐述。为了说明这些例子,我用两个集合。第一个由下面文档组成,“高校集合”(非真实数据):

{
  country : 'Spain',
  city : 'Salamanca',
  name : 'USAL',
  location : {
    type : 'Point',
    coordinates : [ -5.6722512,17, 40.9607792 ]
  },
  students : [
    { year : 2014, number : 24774 },
    { year : 2015, number : 23166 },
    { year : 2016, number : 21913 },
    { year : 2017, number : 21715 }
  ]
}
{
  country : 'Spain',
  city : 'Salamanca',
  name : 'UPSA',
  location : {
    type : 'Point',
    coordinates : [ -5.6691191,17, 40.9631732 ]
  },
  students : [
    { year : 2014, number : 4788 },
    { year : 2015, number : 4821 },
    { year : 2016, number : 6550 },
    { year : 2017, number : 6125 }
  ]
}

If you would like to test these examples on your own installation, you can insert them with this bulk command:
如果想在自己的数据库中测试这些例子,可以用下面的命令集插入。

use 3tdb
db.universities.insert([
{
  country : 'Spain',
  city : 'Salamanca',
  name : 'USAL',
  location : {
    type : 'Point',
    coordinates : [ -5.6722512,17, 40.9607792 ]
  },
  students : [
    { year : 2014, number : 24774 },
    { year : 2015, number : 23166 },
    { year : 2016, number : 21913 },
    { year : 2017, number : 21715 }
  ]
},
{
  country : 'Spain',
  city : 'Salamanca',
  name : 'UPSA',
  location : {
    type : 'Point',
    coordinates : [ -5.6691191,17, 40.9631732 ]
  },
  students : [
    { year : 2014, number : 4788 },
    { year : 2015, number : 4821 },
    { year : 2016, number : 6550 },
    { year : 2017, number : 6125 }
  ]
}
])

The second and last collection is called ‘courses’ and looks like this:
第二个是“课程集合”,大致如下:

{
  university : 'USAL',
  name : 'Computer Science',
  level : 'Excellent'
}
{
  university : 'USAL',
  name : 'Electronics',
  level : 'Intermediate'
}
{
  university : 'USAL',
  name : 'Communication',
  level : 'Excellent'
}

Again, you can insert them in the same way, using the following code:
当然,你也可以用下面的代码插入:

db.courses.insert([
{
  university : 'USAL',
  name : 'Computer Science',
  level : 'Excellent'
},
{
  university : 'USAL',
  name : 'Electronics',
  level : 'Intermediate'
},
{
  university : 'USAL',
  name : 'Communication',
  level : 'Excellent'
}
])

六、Aggregation Stages(联合阶段)

$match

This stage allows us to choose just those documents from a collection that we want to work with. It does this by filtering out those that do not follow our requirements.

这一阶段允许我们从想要使用的集合中选择文档,过滤掉那些不符合要求的。

In the following example, we only want to work with those documents which specify that Spain is the value of the field country, and Salamanca is the value of the field city.

In order to get a readable output, I am going to add .pretty() at the end of all the commands.

在下面的例子中,我们只想处理那些国家字段值是西班牙和城市字段值是萨拉曼卡的文档。为了结果便于理解,我会在所有命令的结尾加一个 .pretty()。

db.universities.aggregate([
  { $match : { country : 'Spain', city : 'Salamanca' } }
]).pretty()

The output is…

输出结果是:

{
    "_id" : ObjectId("5b7d9d9efbc9884f689cdba9"),
    "country" : "Spain",
    "city" : "Salamanca",
    "name" : "USAL",
    "location" : {
        "type" : "Point",
        "coordinates" : [
            -5.6722512,
            17,
            40.9607792
        ]
    },
    "students" : [
		{
			"year" : 2014,
			"number" : 24774
		},
		{
			"year" : 2015,
			"number" : 23166
		},
		{
			"year" : 2016,
			"number" : 21913
		},
		{
			"year" : 2017,
			"number" : 21715
		}
	]
}
{
	"_id" : ObjectId("5b7d9d9efbc9884f689cdbaa"),
	"country" : "Spain",
	"city" : "Salamanca",
	"name" : "UPSA",
	"location" : {
		"type" : "Point",
		"coordinates" : [
			-5.6691191,
			17,
			40.9631732
		]
	},
	"students" : [
		{
			"year" : 2014,
			"number" : 4788
		},
		{
			"year" : 2015,
			"number" : 4821
		},
		{
			"year" : 2016,
			"number" : 6550
		},
		{
			"year" : 2017,
			"number" : 6125
		}
	]
}

$project

It is rare that you ever need to retrieve all the fields in your documents. It is good practice to return only those fields you need so as to avoid processing more data than is necessary. The $project() stage is used to do this and to add any calculated fields that you need.

一般来说,在查询文档时,我们不需要所有值。通常是只返回那些你需要的字段,这样可以避免处理不必要的数据。

In this example, we only need the fields country, city and name. In the code that follows, please note that:

在这个例子里,我们只需要国家、城市、姓名这些字段。下面的代码中,请注意:

  • We must explicitly write _id : 0 when this field is not required
  • 如果我们不需要这个字段,我们必须明确地写出来 _id : 0
  • Apart from the _id field, it is sufficient to specify only those fields we need to obtain as a result of the query
  • 除了_id这个字段,只指定那些查询结果里需要的字段就可以了。

This stage …(这一阶段)

db.universities.aggregate([
  { $project : { _id : 0, country : 1, city : 1, name : 1 } }
]).pretty()

…will give the result …(结果是)

{ "country" : "Spain", "city" : "Salamanca", "name" : "USAL" }
{ "country" : "Spain", "city" : "Salamanca", "name" : "UPSA" }

Use the [KaTeX parse error: Expected 'EOF', got '#' at position 90: …mongodb-arrays/#̲how-to-use-filt…project to filter elements from MongoDB arrays.
用 $filter 操作符配合 $project 来筛选 MongoDB数组的元素。

$group

With the $group() stage, we can perform all the aggregation or summary queries that we need, such as finding counts, totals, averages or maximums.

在 $group阶段,可以执行所有需要的聚合和汇总查询,比如查找数量、总和、平均数或是最大值。

In this example, we want to know the number of documents per university in our ‘universities’ collection:

在这个例子中,我们想知道高校集合这个集合中每个大学文档的数量:

The query …(查询语句)

db.universities.aggregate([
  { $group : { _id : '$name', totaldocs : { $sum : 1 } } }
]).pretty()

…will produce this result …(产出结果)

{ "_id" : "UPSA", "totaldocs" : 1 }
{ "_id" : "USAL", "totaldocs" : 1 }

$out

This is an unusual type of stage because it allows you to carry the results of your aggregation over into a new collection, or into an existing one after dropping it, or even adding them to the existing documents (new in 4.1.2 version).

这个阶段不太常见,其功能是将聚合的结果转移到一个新集合中,或者将其删除后转移到一个现有的集合中,甚至将其添加到现有文档中。(版本4.1.2新支持)

The $out() operator must be the last stage in the pipeline. For the first time, we are using an aggregation with more than one stage. We now have two, a $group() and an $out():

outout 操作符必须是管道最后一个阶段。这里我们第一次要用到具有多个阶段的聚合。我们现在有两个,一个是group(),一个是$out()。

db.universities.aggregate([
  { $group : { _id : '$name', totaldocs : { $sum : 1 } } },
  { $out : 'aggResults' }
])

Now, we check the content of the new ‘aggResults’ collection:
现在我们来检查一下新聚合结果集合的内容:

db.aggResults.find().pretty()
{ "_id" : "UPSA", "totaldocs" : 1 }
{ "_id" : "USAL", "totaldocs" : 1 }
>

$unwind

Now we’ve produced a multi-stage aggregation, we can go on to build up a pipeline.

现在我们已经生成过多阶段聚合,现在可以开始构建管道。

The $unwind() stage is commonly found in a pipeline because it is a means to an end. You cannot work directly on the elements of an array within a document with stages such as $group(). The $unwind() stage enables us to work with the values of the fields within an array.

unwind()unwind() 阶段在管道里最常见,因为它是达到目的的一种手段。你不能直接处理文档中包含在group()等阶段的数组元素。$unwind()阶段可以处理数组中字段的值。

Where there is an array field within the input documents, you will sometimes need to output the document several times, once for every element of that array. Each copy of the document has the array field replaced with the successive element.

当输入文档中有一个数组字段时,有时候我们需要多次输出文档,每次输出对应于该数组的每个元素。文档的每个副本都将数组字段替换为后续元素。

In the next example, I am going to apply the stage only to the document whose field name contains the value USAL.

在接下来的例子中,我将只对字段名包含值为USAL的文档应用该阶段。

This is the document:(下面是文档)

{
  country : 'Spain',
  city : 'Salamanca',
  name : 'USAL',
  location : {
    type : 'Point',
    coordinates : [ -5.6722512,17, 40.9607792 ]
  },
  students : [
    { year : 2014, number : 24774 },
    { year : 2015, number : 23166 },
    { year : 2016, number : 21913 },
    { year : 2017, number : 21715 }
  ]
}

Now, we apply the $unwind() stage, over the student’s array, and check that we get a document per each element of the array. The first document is made up of the fields in the first element of the array and the rest of the common fields, the second document is made up of the fields in the second element of the array and the rest of the common fields, and so on.
现在我们在学生的数组上应用 $unwind()阶段,然后来看看每个数组元素是否都有一个文档。第一个文档由数组第一个元素的字段和其他的公共字段组成,第二个文档由数组第二个元素的字段和其他公共字段组成,以此类推。

db.universities.aggregate([
  { $match : { name : 'USAL' } },
  { $unwind : '$students' }
]).pretty()
{
	"_id" : ObjectId("5b7d9d9efbc9884f689cdba9"),
	"country" : "Spain",
	"city" : "Salamanca",
	"name" : "USAL",
	"location" : {
		"type" : "Point",
		"coordinates" : [
			-5.6722512,
			17,
			40.9607792
		]
	},
	"students" : {
		"year" : 2014,
		"number" : 24774
	}
}
{
	"_id" : ObjectId("5b7d9d9efbc9884f689cdba9"),
	"country" : "Spain",
	"city" : "Salamanca",
	"name" : "USAL",
	"location" : {
		"type" : "Point",
		"coordinates" : [
			-5.6722512,
			17,
			40.9607792
		]
	},
	"students" : {
		"year" : 2015,
		"number" : 23166
	}
}
{
	"_id" : ObjectId("5b7d9d9efbc9884f689cdba9"),
	"country" : "Spain",
	"city" : "Salamanca",
	"name" : "USAL",
	"location" : {
		"type" : "Point",
		"coordinates" : [
			-5.6722512,
			17,
			40.9607792
		]
	},
	"students" : {
		"year" : 2016,
		"number" : 21913
	}
}
{
	"_id" : ObjectId("5b7d9d9efbc9884f689cdba9"),
	"country" : "Spain",
	"city" : "Salamanca",
	"name" : "USAL",
	"location" : {
		"type" : "Point",
		"coordinates" : [
			-5.6722512,
			17,
			40.9607792
		]
	},
	"students" : {
		"year" : 2017,
		"number" : 21715
	}
}

$sort

Sometimes you need to sort your results by the value of a specific field. For this, you need the $sort() stage. For example, let’s sort the documents obtained as a result of the $unwind stage by the number of students in descending order. In order to get a lesser output, I am going to project only the year and the number of students.

如果需要将结果按照某种字段排序,可以用 sort()sort()阶段。比如,把通过unwind()阶段获得的文档按照学生人数降序排列。为了少输出一点,我将只project 年份和学生的个数。

db.universities.aggregate([
  { $match : { name : 'USAL' } },
  { $unwind : '$students' },
  { $project : { _id : 0, 'students.year' : 1, 'students.number' : 1 } },
  { $sort : { 'students.number' : -1 } }
]).pretty()

This gives the result …
结果是

{ "students" : { "year" : 2014, "number" : 24774 } }
{ "students" : { "year" : 2015, "number" : 23166 } }
{ "students" : { "year" : 2016, "number" : 21913 } }
{ "students" : { "year" : 2017, "number" : 21715 } }

The $sort stage can be used with other stages to reduce data in a MongoDB collection to just exactly what you need.

$limit

What if you are only interested in the first two results of your query? It is as simple as:

如果只对查询的前两条结果感兴趣呢?很简单:

db.universities.aggregate([
  { $match : { name : 'USAL' } },
  { $unwind : '$students' },
  { $project : { _id : 0, 'students.year' : 1, 'students.number' : 1 } },
  { $sort : { 'students.number' : -1 } },
  { $limit : 2 }
]).pretty()
{ "students" : { "year" : 2014, "number" : 24774 } }
{ "students" : { "year" : 2015, "number" : 23166 } }

Notice that when you need to limit the number of sorted documents, you must use the $limit stage just after the $sort.
注意:如果需要限制已排序文档的数量,必须在 $sort之后再使用 $limit阶段。

Now we have a full pipeline. We can paste this whole MongoDB aggregate query and all its stages straight into the Aggregation Editor in Studio 3T. It is pasted in by copying it and clicking on the code paste button as shown.

现在,一个完整的管道建立起来了。我们可以将整个MongoDB聚合查询及其所有阶段直接粘贴到Studio 3T的聚合编辑器中。只要复制和点击代码粘贴按钮进来就行,如下图所示。

初学者MongoDB聚合概览(The Beginner’s Guide to MongoDB Aggregation)

Read more about the Aggregation Editor, Studio 3T’s stage-by-stage MongoDB aggregation query builder.
想看更多关于聚合编辑器,Studio 3T的步进MongoDB聚合查询构造器

Just the part shown below is copied and pasted in

复制和粘贴下面的部分

db.universities.aggregate([
  { $match : { name : 'USAL' } },
  { $unwind : '$students' },
  { $project : { _id : 0, 'students.year' : 1, 'students.number' : 1 } },
  { $sort : { 'students.number' : -1 } }
])

In the next screenshot, we can see the full pipeline in Studio 3T and its output.
在以下截图中,可以看到在Studio 3T中的整个管道以及输出。

初学者MongoDB聚合概览(The Beginner’s Guide to MongoDB Aggregation)

Removing stages in Studio 3T is a simple matter of using the button shown in the next screenshot.

在Studio 3T中删除阶段只需使用下一个截图中显示的按钮即可。

初学者MongoDB聚合概览(The Beginner’s Guide to MongoDB Aggregation)

Studio 3T’s Aggregation Editor supports these MongoDB aggregation operators and stages.
Studio 3T的聚合编辑器支持这些MongoDB聚合操作符和阶段。

$addFields

It is possible that you need to make some changes to your output in the way of new fields. In the next example, we want to add the year of the foundation of the university.

这个功能是对输出结果进行增加新字段的更改。在下面的例子里,我们想增加一个大学创立的年份。

db.universities.aggregate([
  { $match : { name : 'USAL' } },
  { $addFields : { foundation_year : 1218 } }
]).pretty()

This gives the result …
结果是

{
	"_id" : ObjectId("5b7d9d9efbc9884f689cdba9"),
	"country" : "Spain",
	"city" : "Salamanca",
	"name" : "USAL",
	"location" : {
		"type" : "Point",
		"coordinates" : [
			-5.6722512,
			17,
			40.9607792
		]
	},
	"students" : [
		{
			"year" : 2014,
			"number" : 24774
		},
		{
			"year" : 2015,
			"number" : 23166
		},
		{
			"year" : 2016,
			"number" : 21913
		},
		{
			"year" : 2017,
			"number" : 21715
		}
	],
	"foundation_year" : 1218
}

$count

The $count() stage provides an easy way to check the number of documents obtained in the output of the previous stages of the pipeline. Let’s see it in action:

$count()阶段提供了一个简单的方法来检查管道前几个阶段的输出中的文档数量。让我们看一下它起的作用:

db.universities.aggregate([
  { $unwind : '$students' },
  { $count : 'total_documents' }
]).pretty()

This provides the total of those years for which we know the number of students at the University.
这里给出了这些年在大学里学生的人数的总和。

{ "total_documents" : 8 }

$lookup

Because MongoDB is document-based, we can shape our documents the way we need. However, there is often a requirement to use information from more than one collection. Using the $lookup(), here is an aggregate query that merges fields from two collections.

因为MongoDB是基于文档的,所以我们可以根据需要来设计文档。但是,通常都会需要来自多个集合的信息。使用$lookup() ,这是一个可以把两个集合的字段合并在一起的聚合查询。

db.universities.aggregate([
  { $match : { name : 'USAL' } },
  { $project : { _id : 0, name : 1 } },
  { $lookup : {
      from : 'courses',
      localField : 'name',
      foreignField : 'university',
      as : 'courses'
  } }
]).pretty()

If you want this query to run fast, you are going to need to index the name field in the universities collection and the university field in the courses collection. In other words, do not forget to index the fields involved in the lookup().lookup(). 如果想要这种查询很快,你得在大学集合中给姓名字段添加索引,还有课程集合中的大学字段也是一样。换句话说,lookup()涉及的字段都要索引。

{
	"name" : "USAL",
	"courses" : [
		{
			"_id" : ObjectId("5b7d9ea5fbc9884f689cdbab"),
			"university" : "USAL",
			"name" : "Computer Science",
			"level" : "Excellent"
		},
		{
			"_id" : ObjectId("5b7d9ea5fbc9884f689cdbac"),
			"university" : "USAL",
			"name" : "Electronics",
			"level" : "Intermediate"
		},
		{
			"_id" : ObjectId("5b7d9ea5fbc9884f689cdbad"),
			"university" : "USAL",
			"name" : "Communication",
			"level" : "Excellent"
		}
	]
}

$sortByCount

This stage is a shortcut for grouping, counting and then sorting in descending order the number of different values in a field.

这个阶段是对字段中不同值的数量进行分组、计数、降序排列的快捷方式。

Suppose you want to know the number of courses per level, sorted in descending order. The following is the query you would need to build:

假设你想知道每个级别课程的数量,并降序排列。我们就可以构建下面的查询。

db.courses.aggregate([
  { $sortByCount : '$level' }
]).pretty()

This is the output:
下面是输出:

{ "_id" : "Excellent", "count" : 2 }
{ "_id" : "Intermediate", "count" : 1 }

$facet

Sometimes when creating a report on data, you find that you need to do the same preliminary processing for a number of reports, and you are faced with having to create and maintain an intermediate collection.

有时,在创建数据报告时,你需要对很多报告进行相同的初步处理,并且必须创建和维护中间集合。

You may, for example, do a weekly summary of trading that is used by all subsequent reports. You might have wished it were possible to run more than one pipeline simultaneously over the output of a single aggregation pipeline.

例如,你可以做一个所有后续报告都使用的每周交易总结。你可能需要可以在单个聚合管道的输出上并行多个管道。

We can now do it within a single pipeline thanks to this stage. Take a look at this example:

有了这一阶段,我们现在可以在一个单独的管道中完成。请看下面的例子:

db.universities.aggregate([
  { $match : { name : 'USAL' } },
  { $lookup : {
      from : 'courses',
      localField : 'name',
      foreignField : 'university',
      as : 'courses'
  } },
  { $facet : {
      'countingLevels' :
      [
         { $unwind : '$courses' },
         { $sortByCount : '$courses.level' }
      ],
      'yearWithLessStudents' :
      [
         { $unwind : '$students' },
         { $project : { _id : 0, students : 1 } },
         { $sort : { 'students.number' : 1 } },
         { $limit : 1 }
      ]
  } }
]).pretty()

What we have done is to create two reports from our database of university courses. CountingLevels and YearWithLessStudents. They both used the output from the first two stages, the $match and the lookup.使lookup. 我们所做的就是从我们的大学课程数据库中创建两个报告:级别数量和较少学生的年份。它们都使用前两个阶段match 和$lookup的输出。

With a large collection, this can save a great deal of processing time by avoiding repetition, and we no longer need to write an intermediate temporary collection.

对于大型集合,通过避免重复来节省大量处理时间,并且我们不再需要写中间临时集合。

{
	"countingLevels" : [
		{
			"_id" : "Excellent",
			"count" : 2
		},
		{
			"_id" : "Intermediate",
			"count" : 1
		}
	],
	"yearWithLessStudents" : [
		{
			"students" : {
				"year" : 2017,
				"number" : 21715
			}
		}
	]
}

[Read more about other use cases of the facetstageinMongoDBaggregation.](https://studio3t.com/knowledgebase/articles/mongodbfacetbucketstages/)MongoDBfacet stage in MongoDB aggregation.](https://studio3t.com/knowledge-base/articles/mongodb-facet-bucket-stages/) 在MongoDB 聚合中阅读更多关于facet 阶段的用例

Exercise(练习)

Now, try to resolve the next exercise by yourself.

现在尝试自己解决一下下面的练习。

How do we get the total number of students that have ever belonged to each one of the universities?

如何才能得到曾经属于每一所大学的学生总数?

db.universities.aggregate([
  { $unwind : '$students' },
  { $group : { _id : '$name', totalalumni : { $sum : '$students.number' } } }
]).pretty()

The output:
输出是:

{ "_id" : "UPSA", "totalalumni" : 22284 }
{ "_id" : "USAL", "totalalumni" : 91568 }

Yes, I have combined two stages. But, how do we build a query that sorts the output by the totalalumni field in a descending order?
是的,我刚合并了两个阶段。但是,怎么构建一个查询让总校友数字段按降序排列输出呢?

db.universities.aggregate([
  { $unwind : '$students' },
  { $group : { _id : '$name', totalalumni : { $sum : '$students.number' } } },
  { $sort : { totalalumni : -1 } }
]).pretty()

Right, we need to apply the $sort() stage at the output of the group().group(). 没错,我们需要在group()的输出应用$sort()阶段。

Checking our Query(检查我们的查询)

I mentioned earlier that it is very easy, and indeed essential, to check that the stages of our query are performing the way we need them to perform.

我前面说过,检查查询的每个阶段是否按我们需要的方式执行非常简单,而且非常重要。

With Studio 3T you have two buttons to check the input and output documents for any particular stage. You can see this in the next screenshot.

在Studio 3T中,你可以按两个按钮来检查任意阶段的输入和输出。请看下面的截图。

初学者MongoDB聚合概览(The Beginner’s Guide to MongoDB Aggregation)

Here’s another MongoDB aggregation example which uses the $match, $group, and $sort stages.
这里还有MongoDB聚合 match,match,group,$sort阶段的一些例子。

Performance(性能)

The aggregation pipeline automatically reshapes the query with the aim of improving its performance.

聚合管道会自动地重新构建查询来提高性能。

If you have both $sort and $match stages, it is always better to use the $match before the $sort in order to minimize the number of documents that the $sort stage has to deal with.

如果你同时使用sortsort 和match 阶段,最好是在sort使sort之前使用match,以最小化$sort 阶段必须处理的文档数量。

To take advantage of indexes you must do it in the first stage of your pipeline. And here, you must use the $match() or the $sort() stages. We can check whether the query is using an index through the explain() method.

想要利用索引,必须在管道的第一阶段进行。这里,必须match()match() 或sort()阶段使用。我们可以通过explain()方法检查查询有没有用索引。

pipeline = [...]
db.<collectionName>.aggregate( pipeline, { explain : true })

Speaking of performance, learn to improve database performance with MongoDB aggregation.
说到性能,学习如何利用聚合提高MongoDB的性能。

Conclusion(总结)

I’ve introduced the Aggregation Pipeline and demonstrated with examples how to use only some stages. The more that you use MongoDB, the more important the Aggregation Pipeline becomes in allowing you to do all those reporting, transforming, and advanced querying tasks that are so integral to the work of a database developer.

我已经介绍了聚合管道,并且演示了如何使用某些阶段。你使用MongoDB越多,聚合管道就越重要。因为它允许你执行所有报告、转换、高级查询任务,而这些任务对数据库开发人员来说不可或缺。

With the more complex pipeline processes, it becomes increasingly important to check and debug the input and output of every stage. There is always a point at which one needs to paste the growing aggregation pipeline into an IDE for MongoDB such as Studio 3T, with a built-in Aggregation Editor, so you can debug every stage independently.

随着滚到流程越来越复杂,检查调试每个阶段的输入和输出变得越来越重要。在某些情况下,需要将不断增长的聚合管道粘贴到MongoDB的集成开发环境中比如Studio 3T,用内置的聚合编辑器独立地调试每一个阶段。

Check out other aggregation query examples:

查看其它聚合查询示例:

更新于2019年12月5日

Updated on December 5, 2019

翻译By 阿迪不会飞

相关标签: 翻译