MongoDB 2.2 (previewed in the 2.1 development releases) introduced the aggregation framework, a faster alternative to Map/Reduce for common aggregation operations. If you took a look at the documentation and examples, you may have found the feature intimidating. Once you tame it, this new feature reveals itself as a very powerful beast. So read on to discover its true power through a series of examples.

A MongoDB aggregation is a series of special operators applied to a collection. An operator is a JavaScript object with a single property, the operator name, whose value is an options object:
{ $name: { /* options */ } }
Supported operator names are: $project, $match, $limit, $skip, $unwind, $group, and $sort, each with their own set of options. A series of operators is called a pipeline:
[{ $project: { /* options */ } }, { $match: { /* options */ } }, { $group: { /* options */ } }]
When executing a pipeline, MongoDB pipes operators into each other. “Pipe” here takes the Linux meaning: the output of an operator becomes the input of the following operator. The result of each operator is a new collection of documents. So Mongo executes the previous pipeline as follows:
collection | $project | $match | $group => result
You can add as many operators to a pipeline as you like, even twice the same one, at different positions:
collection | $match | $group | $match | $project | $group => result
That explains why a pipeline is not written as a simple JavaScript object, but rather as an array of objects: in an object, the same operator couldn’t appear twice:
// The first appearance of $match and $group would be ignored with this syntax
{
  $match: { /* options */ },
  $group: { /* options */ },
  $match: { /* options */ },
  $project: { /* options */ },
  $group: { /* options */ }
}
// So MongoDB imposes an array of JavaScript objects instead
[
  { $match: { /* options */ } },
  { $group: { /* options */ } },
  { $match: { /* options */ } },
  { $project: { /* options */ } },
  { $group: { /* options */ } }
]
// That's longer and more cumbersome to read, but you'll get used to it
To execute a pipeline on a MongoDB collection, use the aggregate() function on that collection:
db.books.aggregate([{ $project: { title: 1 } }]);
Tip: If you're using Node.js, both the native driver (since v0.9.9.2) and the Mongoose ODM (since v3.1.0) support the new aggregation framework. For instance, to execute the previous pipeline on a Mongoose model, you just need to write:
Books.aggregate([{ $project: { title: 1 } }], function(err, results) {
// do something with the result
});
The main benefit of the aggregation framework is that MongoDB executes it without the overhead of the JavaScript engine. It's implemented directly in C++, and therefore it's very fast. The main limitation, compared to classical SQL aggregation, is that it works on a single collection only: you can’t run a Mongo aggregation across several collections using a JOIN-like operation. Apart from that, it’s very powerful.
In this post, I’ll illustrate the power of pipeline operators by example, and compare them to their SQL counterparts. For a detailed reference, go to docs.mongodb.org.
Use the $project operator to select or rename properties from a collection - similar to what you would do with the SQL SELECT clause.
// sample data
> db.books.find();
[
  { _id: 147, title: "War and Peace", ISBN: 9780307266934 },
  { _id: 148, title: "Anna Karenina", ISBN: 9781593080273 },
  { _id: 149, title: "Pride and Prejudice", ISBN: 9783526419358 }
]
# sample data
> SELECT * FROM book;
+-----+-----------------------+---------------+
| id  | title                 | ISBN          |
+-----+-----------------------+---------------+
| 147 | 'War and Peace'       | 9780307266934 |
| 148 | 'Anna Karenina'       | 9781593080273 |
| 149 | 'Pride and Prejudice' | 9783526419358 |
+-----+-----------------------+---------------+
> db.books.aggregate([
  { $project: {
    // fields that are not listed here (like title) are dropped from the output;
    // only _id is kept by default
    reference: "$ISBN" // use ISBN as source
  } }
]);
[
{ _id: 147, reference: 9780307266934 },
{ _id: 148, reference: 9781593080273 },
{ _id: 149, reference: 9783526419358 },
]
> SELECT id, ISBN AS reference FROM book;
+-----+---------------+
| id  | reference     |
+-----+---------------+
| 147 | 9780307266934 |
| 148 | 9781593080273 |
| 149 | 9783526419358 |
+-----+---------------+
The $project operator can also create composed fields and sub-documents using any of the supported expression operators ($and, $or, $gt, $lt, $eq, $add, $mod, $substr, $toLower, $toUpper, $dayOfWeek, $hour, $cond, $ifNull, to name a few).
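For instance, here is a quick sketch combining a computed field and a sub-document on the books collection (the slug, details, and shoutTitle field names are arbitrary, chosen just for the illustration):
> db.books.aggregate([
  { $project: {
    slug: { $toLower: "$title" },        // computed field
    details: {                           // sub-document
      reference: "$ISBN",
      shoutTitle: { $toUpper: "$title" }
    }
  } }
]);
[
  { _id: 147, slug: "war and peace", details: { reference: 9780307266934, shoutTitle: "WAR AND PEACE" } },
  { _id: 148, slug: "anna karenina", details: { reference: 9781593080273, shoutTitle: "ANNA KARENINA" } },
  { _id: 149, slug: "pride and prejudice", details: { reference: 9783526419358, shoutTitle: "PRIDE AND PREJUDICE" } }
]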
Group documents with, as you would have guessed, the $group operator.
// fastest way
> db.books.count();
3
// if you really want to use aggregation
> db.books.aggregate([
  { $group: {
    // _id is required, so give it a constant value
    // to group the whole collection into one result
    _id: null,
    // increment nbBooks for each document
    nbBooks: { $sum: 1 }
  } }
]);
[
  { _id: null, nbBooks: 3 }
]
> SELECT COUNT(*) FROM book;
+----------+
| COUNT(*) |
+----------+
|        3 |
+----------+
// sample data
> db.books.find();
[
  { _id: 147, title: "War and Peace", author_id: 72347 },
  { _id: 148, title: "Anna Karenina", author_id: 72347 },
  { _id: 149, title: "Pride and Prejudice", author_id: 42345 }
]
# sample data
> SELECT * FROM book;
+-----+---------------------+-----------+
| id  | title               | author_id |
+-----+---------------------+-----------+
| 147 | War and Peace       |     72347 |
| 148 | Anna Karenina       |     72347 |
| 149 | Pride and Prejudice |     42345 |
+-----+---------------------+-----------+
> db.books.aggregate([
{ $group: {
// group by author_id
_id: "$author_id",
// increment nbBooks for each document
nbBooks: { $sum: 1 }
} }
]);
[
{ _id: 72347, nbBooks: 2 },
{ _id: 42345, nbBooks: 1 }
]
> SELECT author_id, COUNT(*) FROM book GROUP BY author_id;
+-----------+----------+
| author_id | COUNT(*) |
+-----------+----------+
|     72347 |        2 |
|     42345 |        1 |
+-----------+----------+
A pipeline can feature more than one operator. Here is a combination of $group and $project:
> db.books.aggregate([
{ $group: {
_id: "$author_id",
nbBooks: { $sum: 1 }
} },
{ $project: {
_id: 0,
authorId: "$_id",
nbBooks: 1
} }
]);
[
{ authorId: 72347, nbBooks: 2 },
{ authorId: 42345, nbBooks: 1 }
]
> SELECT author_id AS author, COUNT(*) AS nb_books FROM book GROUP BY author_id;
+--------+----------+
| author | nb_books |
+--------+----------+
|  72347 |        2 |
|  42345 |        1 |
+--------+----------+
$group supports a lot of aggregation functions: $first, $last, $min, $max, $avg, $sum, $push, and $addToSet. Check the MongoDB documentation for a complete reference.
// sample data
> db.reviews.find();
[
  { _id: 455, bookId: 974147, date: new Date("2012-07-10"), score: 1 },
  { _id: 456, bookId: 345335, date: new Date("2012-07-12"), score: 5 },
  { _id: 457, bookId: 345335, date: new Date("2012-07-13"), score: 2 },
  { _id: 458, bookId: 974147, date: new Date("2012-07-16"), score: 3 }
]
# sample data
> SELECT * FROM review;
+-----+---------+--------------+-------+
| id  | book_id | date         | score |
+-----+---------+--------------+-------+
| 455 |  974147 | "2012-07-10" |     1 |
| 456 |  345335 | "2012-07-12" |     5 |
| 457 |  345335 | "2012-07-13" |     2 |
| 458 |  974147 | "2012-07-16" |     3 |
+-----+---------+--------------+-------+
> db.reviews.aggregate([
{ $group: {
_id: "$bookId",
avgScore: { $avg: "$score" },
maxScore: { $max: "$score" },
nbReviews: { $sum: 1 }
} }
]);
[
{ _id: 345335, avgScore: 3.5, maxScore: 5, nbReviews: 2 },
{ _id: 974147, avgScore: 2, maxScore: 3, nbReviews: 2 }
]
> SELECT book_id, AVG(score) AS avg_score, MAX(score) AS max_score, COUNT(*) AS nb_reviews FROM review GROUP BY book_id;
+---------+-----------+-----------+------------+
| book_id | avg_score | max_score | nb_reviews |
+---------+-----------+-----------+------------+
|  345335 |       3.5 |         5 |          2 |
|  974147 |         2 |         3 |          2 |
+---------+-----------+-----------+------------+
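Among the other accumulators, $push deserves a quick sketch: unlike $addToSet (shown further down), it keeps duplicates. For instance, to collect every score per book into an array (the scores field name is arbitrary):
> db.reviews.aggregate([
  { $group: {
    _id: "$bookId",
    scores: { $push: "$score" } // keeps duplicates, unlike $addToSet
  } }
]);
[
  { _id: 345335, scores: [5, 2] },
  { _id: 974147, scores: [1, 3] }
]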
You can restrict the set of documents to be processed by passing a query object to the $match operator. Placed before a $group operator, $match is the equivalent of the SQL WHERE clause; placed after a $group, it plays the role of HAVING.
> db.reviews.aggregate([
{ $match : {
date: { $gte: new Date("2012-07-11") }
} },
{ $group: {
_id: "$bookId",
avgScore: { $avg: "$score" }
} }
]);
[
{ _id: 345335, avgScore: 3.5 },
{ _id: 974147, avgScore: 3 }
]
> SELECT book_id, AVG(score) FROM review WHERE review.date >= "2012-07-11" GROUP BY review.book_id;
+---------+------------+
| book_id | AVG(score) |
+---------+------------+
|  345335 |        3.5 |
|  974147 |          3 |
+---------+------------+
> db.reviews.aggregate([
{ $group: {
_id: "$bookId",
avgScore: { $avg: "$score" }
} },
{ $match : {
avgScore: { $gt: 3 }
} }
]);
[
{ _id: 345335, avgScore: 3.5 }
]
> SELECT book_id, AVG(score) AS avg_score FROM review GROUP BY review.book_id HAVING avg_score > 3;
+---------+-----------+
| book_id | avg_score |
+---------+-----------+
|  345335 |       3.5 |
+---------+-----------+
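And nothing prevents you from doing both in the same pipeline; here is a quick sketch combining the two previous examples (a WHERE equivalent, then the grouping, then a HAVING equivalent):
> db.reviews.aggregate([
  { $match: { date: { $gte: new Date("2012-07-11") } } }, // WHERE equivalent
  { $group: { _id: "$bookId", avgScore: { $avg: "$score" } } },
  { $match: { avgScore: { $gt: 3 } } }                    // HAVING equivalent
]);
[
  { _id: 345335, avgScore: 3.5 }
]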
If documents inside a collection contain arrays, you can expand ("unwind") these arrays into several distinct documents using the $unwind operator.
// sample data
> db.articles.find();
[
  { _id: 12351254, title: "Space Is Getting Closer", tags: ["science", "space"] },
  { _id: 22956492, title: "Computer Solves Rubiks Cube", tags: ["computing", "science"] }
]
# sample data
> SELECT * FROM article;
+----------+-----------------------------+
| id       | title                       |
+----------+-----------------------------+
| 12351254 | Space Is Getting Closer     |
| 22956492 | Computer Solves Rubiks Cube |
+----------+-----------------------------+
> SELECT * FROM tag;
+-----+------------+-----------+
| id  | article_id | name      |
+-----+------------+-----------+
| 534 |   12351254 | science   |
| 535 |   12351254 | space     |
| 816 |   22956492 | computing |
| 817 |   22956492 | science   |
+-----+------------+-----------+
> db.articles.aggregate([
{ $unwind: "$tags" }
]);
[
{
_id: 12351254,
title: "Space Is Getting Closer",
tags: "science"
},
{
_id: 12351254,
title: "Space Is Getting Closer",
tags: "space"
},
{
_id: 22956492,
title: "Computer Solves Rubiks Cube",
tags: "computing"
},
{
_id: 22956492,
title: "Computer Solves Rubiks Cube",
tags: "science"
}
]
> SELECT article.id, article.title, tag.name FROM article LEFT JOIN tag ON article.id = tag.article_id;
+------------+-----------------------------+-----------+
| article.id | article.title               | tag.name  |
+------------+-----------------------------+-----------+
|   12351254 | Space Is Getting Closer     | science   |
|   12351254 | Space Is Getting Closer     | space     |
|   22956492 | Computer Solves Rubiks Cube | computing |
|   22956492 | Computer Solves Rubiks Cube | science   |
+------------+-----------------------------+-----------+
The true power of the aggregation framework reveals itself when you pipe $unwind into $group. This is similar to using LEFT JOIN ... GROUP BY in SQL.
> db.articles.aggregate([
{ $unwind: "$tags" },
{ $group: {
_id: "$tags",
nbArticles: { $sum: 1 }
} }
]);
[
{ _id: "science", nbArticles: 2 },
{ _id: "space", nbArticles: 1 },
{ _id: "computing", nbArticles: 1 },
]
> SELECT tag.name, COUNT(article.id) AS nb_articles FROM article LEFT JOIN tag ON article.id = tag.article_id GROUP BY tag.name;
+-----------+-------------+
| tag.name  | nb_articles |
+-----------+-------------+
| science   |           2 |
| space     |           1 |
| computing |           1 |
+-----------+-------------+
> db.articles.aggregate([
{ $unwind: "$tags" },
{ $group: {
_id: "$tags",
articles: { $addToSet: "$_id" }
} }
]);
[
{ _id: "science", articles: [12351254, 22956492] },
{ _id: "space", articles: [12351254] },
{ _id: "computing", articles: [22956492] },
]
> SELECT tag.name, GROUP_CONCAT(article.id) AS articles FROM article LEFT JOIN tag ON article.id = tag.article_id GROUP BY tag.name;
+-----------+-------------------+
| tag.name  | articles          |
+-----------+-------------------+
| science   | 12351254,22956492 |
| space     | 12351254          |
| computing | 22956492          |
+-----------+-------------------+
Imagine what you can do with this system... Pipe operators one after the other, group, sort, limit, etc. The ultimate example, taken from the MongoDB documentation itself, shows a pipeline with two successive $group operators. An SQL database can only do that with subqueries.
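That documentation example isn't reproduced here, but a quick sketch on the reviews collection gives the idea: a first $group counts reviews per book, and a second $group averages those counts (the avgReviewsPerBook name is arbitrary):
> db.reviews.aggregate([
  { $group: {
    _id: "$bookId",                 // one document per book
    nbReviews: { $sum: 1 }
  } },
  { $group: {
    _id: null,                      // collapse everything into a single document
    avgReviewsPerBook: { $avg: "$nbReviews" }
  } }
]);
[
  { _id: null, avgReviewsPerBook: 2 }
]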
If your Map/Reduce functions are simple enough, refactor your Mongo code to use the new aggregation framework. It will execute faster, and open up a new realm of possibilities.
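For instance, a tag count written with Map/Reduce looks roughly like the sketch below; the single $unwind + $group pipeline shown earlier produces the same numbers, without the JavaScript engine overhead:
> db.articles.mapReduce(
  function () {
    // map: emit a 1 for each tag of the current article
    this.tags.forEach(function (tag) { emit(tag, 1); });
  },
  function (key, values) {
    // reduce: sum the 1s emitted for a given tag
    return Array.sum(values);
  },
  { out: { inline: 1 } } // return the results inline instead of writing them to a collection
);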
Published on 12 Oct 2012
with tags MongoDB NodeJS