MongoDB数据库聚合之分组统计$group的用法详解

2024-08-10 12:47 数据库作者：原子星

前言

MongoDB不像关系型数据库，普通的查询不支持汇总，要进行复杂的分组汇总，需要使用聚合管道，$group可以说是MongoDB聚合管道进行数据分析最常用的一个阶段。该阶段根据分组键值（组键）把文档分成若干组，每个唯一的键值对应一个文档。组键通常是一个或多个字段，也可以是表达式的结果。$group阶段输出的结果中，_id字段的值就是组键的值，输出文档中还可以包含汇总表达式的字段，汇总表达式的功能非常丰富，下面的列表会简单介绍，具体的使用方法可以参考详细说明。

$group的语法

{
 $group:
   {
     _id: <expression>, // 组键，就是分组的键值字段或表达式
     <field1>: { <accumulator1> : <expression1> },
     ...
   }
 }

字段说明：

字段	说明
_id	不可省略，通过`_id`表达式指定分组的依据，如果直接指定`_id`的值为`null`或常量，则把全部的输入文档汇总后返回一个文档
field	可选，汇总表达式计算的结果
_id和field可以是任何合法的表达式。

分组汇总操作符

分组汇总操作符比较多，功能丰富且强大，这里简要介绍其用途，详细的用法后续再专文介绍。

操作符	用途介绍
`$accumulator`	返回累加结果
`$addToSet`	把分组中不重复的表达式的值作为数组返回，注意数组的元素无序的，类似分组内的distinct
`$avg`	返回数值的平均值。非数值会被忽略
`$bottom`	按照指定的顺序返回分组中最后一个元素
`$bottomN`	按照指定的顺序返回分组中最后N个元素字段的集合，如果分组元素数量小于N，则返回全部
`$count`	返回分组内的元素数量
`$first`	返回分组内第一个元素表达式的结果
`$firstN`	返回分组内前n个元素的聚合。只有文档有序时才有意义
`$last`	返回分组中最后一个文档的表达式的结果
`$lastN`	返回分组内最后n个元素的聚合。只有文档有序时才有意义
`$max`	返回每个分组表达式值的最大值
`$maxN`	返回分组内最大的n个元素的集合
`$median`	返回分组中的中位数
`$mergeObjects`	返回分组合并后的文档
`$min`	返回分组内表达式的最小值
`$percentile`	返回与指定百分位数值相对应的值的数组
`$push`	返回每个分组表达式值的数组
`$stdDevPop`	返回标准差
`$stdDevSamp`	返回样本标准差
`$sum`	返回合计值，忽略空值
`$top`	根据指定的顺序返回组内最前面的元素
`$topN`	根据指定的顺序返回组内前N个元素的聚合

注意

$group使用内存不能超过100M，超过会报错。如果想要处理更多数据或者少用一些内存，可使用allowDiskUse选项把数据写入临时文件。
当使用$first、$last等操作符时，可以考虑在参与排序的分组字段上添加索引，某些情况下，这些操作可以使用索引快速定位到相应的记录。

一些例子

统计数量

创建并插入数据：

db.sales.insertMany([
  { "_id" : 1, "item" : "abc", "price" : Decimal128("10"), "quantity" : Int32("2"), "date" : ISODate("2014-03-01T08:00:00Z") },
  { "_id" : 2, "item" : "jkl", "price" : Decimal128("20"), "quantity" : Int32("1"), "date" : ISODjavascriptate("2014-03-01T09:00:00Z") },
  { "_id" : 3, "item" : "xyz", "price" : Decimal128("5"), "quantity" : Int32( "10"), "date" : ISODate("2014-03-15T09:00:00Z") },
  { "_id" : 4, "item" : "xyz", "price" : Decimal128("5"), "quantity" :  Int32("20") , "date" : ISODate("2014-04-04T11:21:39.736Z") },
  { "_id" : 5, "item" : "abc", "price" : Decimal128("10"), "quantity" : Int32("10") , "date" : ISODate("2014-04-04T21:23:13.331Z") },
  { "_id" : 6, "item" : "def", "price" : Decimal128("7.5"), "quantity": Int32("5" ) , "date" : ISODate("2015-06-04T05:08:13Z") },
  { "_id" : 7, "item" : "def", "price" : Decimal128("7.5"), "quantity": Int32("10") , "date" : ISODate("2015-09-10T08:43:00Z") },
  { "_id" : 8, "item" : "abc", "price" : Decimal128("10"), "quantity" : Int32("5" ) , "date" : ISODate("2016-02-06T20:20:13Z") },
])

统计sales全部文档数量

相当于collection.find({}).count()

db.sales.aggregate( [
  {
    $group: {
       _id: null,
       count: { $count: { } }
    }
  }
] )

结果：

{ "_id" : null, "count" : 8 }

检索不同的值，等价于distinct

仍以上例的sales集合数据为例

db.sales.aggregate( [ { $group : { _id : "$item" } } ] )

结果：

{ "_id" : "abc" }
{ "_id" : "jkl" }
{ "_id" : "def" }
{ "_id" : "xyz" }

等价于：

db.sales.distinct("item")

按Item分组

下面的聚合先按照item进行分组，计算每个item销售总额，并且返回大于等于100的item。

db.sales.aggregate(
  [
    //阶段1
    {
      $group :
        {
          _id : "$item",
          totalSaleAmount: { $sum: { $multiply: [ "$price", "$quantity" ] } }
        }
     },
     //阶段2
     {
       $match: { "totalSaleAmount": { $gte: 100 } }
     }
   ]
 )

阶段1：$group阶段，根据item进行分组，并计算每个item的销售总额。

阶段2：$math阶段，过滤结果文档，只返回销售总额totalSaleAmount大于等于100的文档。

结果：

{ "_id" : "abc", "totalSalpythoneAmount" : Decimal128("170") }
{ "_id" : "xyz", "totalSaleAmount" : Decimal128("150") }
{ "_id" : "def", "totalSaleAmount" : Decimal128("112.5") }

计算总数、合计和平均值

创建一个sales集合并插入记录：

db.sales.insertMany([
  { "_id" : 1, "item" : "abc", "price" : Decimal128("10"), "quantity" : Int32("2"), "date" : ISODate("2014-03-01T08:00:00Z") },
  { "_id" : 2, "item" : "jkl", "price" : Decimal128("20"), "quantity" : Int32("1"), "date" : ISODate("2014-03-01T09:00:00Z") },
  { "_id" : 3, "item" : "xyz", "price" : Decimal128("5"), "quantity" : Int32( "10"), "date" : ISODate("2014-03-15T09:00:00Z") },
  { "_id" : 4, "item" : "xyz", "price" : Decimal128("5"), "quantity" :  Int32("20") , "date" : ISODate("2014-04-04T11:21:39.736Z") },
  { "_id" : 5, "item" : "abc", "price" : Decimal128("10"), "quantity" : Int32("10") , "date" : ISODate("2014-04-04T21:23:13.331Z") },
  { "_id" : 6, "item" : "def", "price" : Decimal128("7.5"), "quantity": Int32("5" ) , "date" : ISODate("2015-06-04T05:08:13Z") },
  { "_id" : 7, "item" : "def", "price" : Decimal128("7.5"), "quantity": Int32("10") , "date" : ISODate("2015-09-10T08:43:00Z") },
  { "_id" : 8, "item" : "abc", "price" : Decimal128("10"), "quantity" : Int32("5" ) , "date" : ISODate("2016-02-06T20:20:13Z") },
])

按照日期分组

下面的聚合管道计算2014年的销售总额、平均销量和销售数量

db.sales.aggregate([
  //阶段1
  {
    $match : { "date": { $gte: new ISODate("2014-01-01"), $lt: new ISODate("2015-01-01") } }
  },
  //阶段2
  {
    $group : {
       _id : { $dateToString: { format: "%Y-%m-%d", date: "$date" } },
       totalSaleAmount: { $sum: { $multiply: [ "$price", "$quantity" ] } },
       averageQuantity: { $avg: "$quantity" },
       count: { $sum: 1 }
    }
  },
  //阶段3
  {
    $sort : { totalSaleAmount: -1 }
  }
 ])

阶段1使用$math只允许2014年的数据进入下一阶段.

阶段2使用$group根据日期进行分组，统计每个分组的销售总额、平均销量和销售数量。

阶段3使用$sort按照销售总额进行降序排序

结果：

{
   "_id" : "2014-04-04",
   "totalSaleAmount" : Decimal128("200"),
   "averageQuantity" : 15, "count" : 2
}
{
   "_id" : "2014-03-15",
   "totalSaleAmount" : Decimal128("50"),
   "averageQuantity" : 10, "count" : 1
}
{
   "_id" : "2014-03-01",
   "totalSaleAmount" : Decimal128("40"),
   "averageQuantity" : 1.5, "count" : 2
}

按控制null分组

下面的聚合操作，指定分组_id为空，计算集合中所有文档的总销售额、平均数量和计数。

db.sales.aggregate([
  {
    $group : {
       _id : null,
       totalSaleAmount: { $sum: { $multiply: [ "$price", "$quantity" ] } },
       averageQuantity: { $avg: "$quantity" },
       count: { $sum: 1 }
    }
  }
 ])

结果：

{
"_id" : null,
"totalSaleAmount" : Decimal128("452.5"),
"averageQuantity" : 7.875,
"count" : 8
}

数据透视

创建books集合并插入数据

db.books.insertMany([
  { "_id" : 8751, "title" : "The Banquet", "author" : "Dante", "copies" : 2 },
  { "_id" : 8752, "title" : "Divihttp://www.devze.comne Comedy", "author" : "Dante", "copies" : 1 },
  { "_id" : 8645, "title" : "Eclogues", "author" : "Dante", "copies" : 2 },
  { "_id" : 7000, "title" : "The Odyssey", "author" : "Homer", "copies" : 10 },
  { "_id" : 7020, "title" : "Iliad", "author" : "Homer", "copies" : 10 }
])

根据作者对标题分组

下面的聚合操作将books集合中的数据透视为按作者分组的标题。

db.books.aggregate([
   { $group : { _id : "$author", books: { $push: "$title" } } }
 ])

结果

{ "_id" : "Homer", "books" : [ "The Odyssey", "Iliad" ] }
{ "_id" : "Dante", "books" : [ "The Banquet", "Divine Comedy", "Eclogues" ] }

根据作者对文档分组

db.books.aggregate([
   // 阶段1
   {
     $group : { _id : "$author", books: { $push: "$$ROOT" } }
   },
   // 阶段2
   {
     $addFields:
       {
         totalCopies : { $sum: "$books.copies" }
       }
   }
 ])

阶段1：分组$group，根据作者对文档进行分组，使用$$ROOT把文档作为books的元素。

阶段2：$addFields会在输出文档中添加一个字段，即每位作者的图书总印数。

结果：

{
"_id" : "Homer",
"books" :
[
{ "_id" : 7000, "title" : "The Odyssey", "author" : "Homer", "copies" : 10 },
{ "_id" : 7020, "title" : "Iliadphp", "author" : "Homer", "copies" : 10 }
],
"totalCopies" : 20
}

{
"_id" : "Dante",
"eMrCIFDRBs;books" :
[
{ "_id" : 8751, "title" : "The Banquet", "author" : "Dante", "copies" : 2 },
{ "_id" : 8752, "title" : "Divine Comedy", "author" : "Dante", "copies" : 1 },
{ "_id" : 8645, "title" : "Eclogues", "author" : "Dante", "copies" : 2 }
],
"totalCopies" : 5
}

以上内容参考mongodb官方文档整理而来

总结

到此这篇关于MongoDB数据库聚合之分组统计$group的用法详解的文章就介绍到这了,更多相关MongoDB分组统计$group用法内容请搜索编程客栈(www.devze.com)以前的文章或继续浏览下面的相关文章希望大家以后多多支持编程客栈(www.devze.com)！

继续阅读：mongodb 分组统计 mongodb 统计 mongodb分组统计$group

MongoDB数据库聚合之分组统计$group的用法详解

目录

前言

$group的语法

分组汇总操作符

注意

一些例子

统计数量

统计sales全部文档数量

检索不同的值，等价于distinct

按Item分组

计算总数、合计和平均值

按照日期分组

按控制null分组

数据透视

根据作者对标题分组

根据作者对文档分组

总结

更多精彩内容

精彩评论

最新数据库

MySQL多实例部署实战完美攻略

MySQL Hint 功能优化查询性能的操作步骤

深入解析MySQL中的约束

在IDEA中直接使用可视化方式创建项目数据库的具体步骤

在linux系统中使用通用包安装Mysql的步骤

数据库排行榜

Hadoop Key Management Server (KMS)配置及测试

spark报错ERROR ObjectStore: Version information found in metastore differs 2.1.0 from expected schema version 1.2.0. Schema verififcation is disabled hive.metastore.schema.verification so setting version.

Navicat连接Oracle数据库的详细步骤与注意事项

解决Navicat远程连接MySQL出现 10060 unknow error的方法

redis-cluster集群调优之cluster-require-full-coverage参数