| Title: | A Native Lazy Analytical Backend for MongoDB |
|---|---|
| Description: | Provides a disciplined, lazy subset of 'dplyr' semantics for MongoDB aggregation pipelines. Queries remain lazy until collect() and compile into MongoDB-native aggregation stages. |
| Authors: | Paolo Bosetti [aut, cre] |
| Maintainer: | Paolo Bosetti <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.3.0 |
| Built: | 2026-05-24 07:11:58 UTC |
| Source: | https://github.com/pbosetti/mdbplyr |
Appends a single raw MongoDB aggregation stage, provided as a JSON string, after the stages generated from the current lazy query.
append_stage(x, json_string)
x |
A |
json_string |
A JSON string representing a single MongoDB pipeline
stage, such as |
This is an escape hatch for features not yet modeled by the package. The package does not attempt to infer schema changes introduced by manual stages.
A modified tbl_mongo object.
tbl <- tbl_mongo(
  list(name = "orders"),
  schema = c("status", "amount"),
  executor = function(pipeline, ...) tibble::tibble()
)
query <- append_stage(tbl, "{\"$limit\": 1}")
show_query(query)
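Because append_stage() takes the stage as a JSON string, quoting is the usual source of mistakes. The following is a minimal sketch, in base R only, of two ways to build such a string without backslash escapes. The variable names are hypothetical, and `$sample` is used simply as an example of a MongoDB stage that this page does not list among the modeled verbs.

```r
# Single-quoted R strings avoid escaping the double quotes that JSON requires.
limit_stage <- '{"$limit": 1}'

# Interpolate a local value into a stage with sprintf().
sample_size <- 5L
sample_stage <- sprintf('{"$sample": {"size": %d}}', sample_size)

print(limit_stage)
print(sample_stage)
```

Either string could then be passed as the json_string argument, for example append_stage(tbl, sample_stage).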
Arrange a lazy Mongo query
arrange.tbl_mongo(.data, ..., .by_group = FALSE)
.data |
A |
... |
Bare field names or |
.by_group |
Whether to prefix the ordering with grouping fields. |
A modified tbl_mongo object.
tbl <- tbl_mongo(
  list(name = "orders"),
  schema = c("amount"),
  executor = function(pipeline, ...) tibble::tibble()
)
dplyr::arrange(tbl, dplyr::desc(amount))
Collect a lazy Mongo query
collect(x, ...)
x |
A |
... |
Additional arguments forwarded to the executor. |
A tibble.
tbl <- tbl_mongo(
  list(name = "orders"),
  schema = c("status", "amount"),
  executor = function(pipeline, ...) {
    tibble::tibble(status = "paid", amount = 10)
  }
)
query <- dplyr::filter(tbl, amount > 0)
collect(query)
Compile a lazy Mongo query into an aggregation pipeline
compile_pipeline(x)
x |
A |
A list of MongoDB aggregation stages.
tbl <- tbl_mongo(
  list(name = "orders"),
  schema = c("status", "amount"),
  executor = function(pipeline, ...) tibble::tibble()
)
query <- dplyr::filter(tbl, amount > 0)
compile_pipeline(query)
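To make "a list of MongoDB aggregation stages" more concrete, here is a hand-written sketch of how such a pipeline can be represented as nested R lists. It is not output captured from compile_pipeline(). The exact stages and operators the package emits are an assumption here, and only the general MongoDB stage syntax (`$match`, `$gt`, `$limit`) is standard.

```r
# One list element per stage; each stage is a one-element named list.
# Backticks are needed because stage names start with "$".
pipeline <- list(
  list(`$match` = list(amount = list(`$gt` = 0))),
  list(`$limit` = 1)
)

# Extract the stage operator of each element.
stage_names <- vapply(pipeline, names, character(1))
print(stage_names)
```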
Open a lazy Mongo query as a mongolite cursor
cursor(x, ...)
x |
A |
... |
Additional arguments forwarded to the cursor executor. |
A mongolite iterator when backed by a live MongoDB collection, or a
compatible cursor-like object supplied by the source.
collection <- list(
  name = "orders",
  aggregate = function(pipeline_json, iterate = FALSE, ...) {
    data <- tibble::tibble(status = "paid", amount = 10)
    if (!iterate) {
      return(data)
    }
    local({
      offset <- 1L
      page <- function(size = 1000) {
        if (offset > nrow(data)) {
          return(data[0, , drop = FALSE])
        }
        out <- data[offset:min(nrow(data), offset + size - 1L), , drop = FALSE]
        offset <<- offset + nrow(out)
        tibble::as_tibble(out)
      }
      structure(environment(), class = "mongo_iter")
    })
  }
)
tbl <- tbl_mongo(collection, schema = c("status", "amount"))
iter <- cursor(dplyr::filter(tbl, amount > 0))
iter$page()
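A cursor is typically consumed page by page until it is exhausted. The sketch below shows one way to write that loop in base R against a mock iterator modeled on the example above. make_iter() and drain() are hypothetical helpers, and the stop condition (NULL or a zero-row page) is an assumption about how an exhausted iterator reports itself.

```r
# Mock iterator exposing page(size), like the mock in the example above.
make_iter <- function(data) {
  offset <- 1L
  page <- function(size = 1000) {
    if (offset > nrow(data)) {
      return(data[0, , drop = FALSE])   # exhausted: zero-row page
    }
    rows <- offset:min(nrow(data), offset + size - 1L)
    offset <<- offset + length(rows)
    data[rows, , drop = FALSE]
  }
  environment()
}

# Pull pages until the iterator returns nothing, then bind them together.
drain <- function(iter, size = 2) {
  pages <- list()
  repeat {
    chunk <- iter$page(size)
    if (is.null(chunk) || nrow(chunk) == 0L) break
    pages[[length(pages) + 1L]] <- chunk
  }
  do.call(rbind, pages)
}

orders <- data.frame(
  status = c("paid", "paid", "open", "paid", "open"),
  amount = c(10, 20, 5, 7, 3)
)
all_rows <- drain(make_iter(orders), size = 2)
print(all_rows)
```

Processing each page inside the loop, instead of binding all pages, keeps memory use bounded for large result sets.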
Filter a lazy Mongo query
filter.tbl_mongo(.data, ..., .by = NULL, .preserve = FALSE)
.data |
A |
... |
Predicate expressions. |
.by |
Unsupported. |
.preserve |
Included for dplyr compatibility. |
Predicate expressions use schema-first tidy evaluation:
- bare names refer to MongoDB fields when they are known fields of the lazy table
- otherwise, bare names are evaluated in the local R environment and inlined as literals
- .data$... always forces a MongoDB field reference
- .env$... always forces a local R value
- if a name exists both as a field and as a local variable, the field wins; use .env$... to force the local value
Bare field resolution depends on the known schema of .data. If a field,
including a dotted path such as `message.measurements.Fx`, is missing
from schema_fields(), write it explicitly as .data$... or supply the
schema when creating the tbl_mongo.
The same resolution rules apply to expression arguments in
mutate.tbl_mongo() and summarise.tbl_mongo().
A modified tbl_mongo object.
tbl <- tbl_mongo(
  list(name = "orders"),
  schema = c("status", "amount"),
  executor = function(pipeline, ...) tibble::tibble()
)
dplyr::filter(tbl, amount > 0)
threshold <- 10
dplyr::filter(tbl, amount > threshold)
dplyr::filter(tbl, .data$amount > .env$threshold)
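The schema-first resolution rules for bare names can be sketched in a few lines of base R. resolve_name() below is a hypothetical helper written for illustration only and is not the package's implementation. In particular, raising an error when a name is neither a field nor a local variable is an assumption of this sketch.

```r
# Decide whether a bare name is a MongoDB field or a local literal.
resolve_name <- function(name, schema, env) {
  if (name %in% schema) {
    # Known fields always win over local variables.
    return(list(kind = "field", value = name))
  }
  if (exists(name, envir = env, inherits = FALSE)) {
    # Otherwise the local value is inlined as a literal.
    return(list(kind = "literal", value = get(name, envir = env)))
  }
  stop("`", name, "` is neither a known field nor a local variable")
}

schema <- c("status", "amount")
local_vars <- list2env(list(threshold = 10, amount = 999))

by_field   <- resolve_name("amount", schema, local_vars)     # field wins
by_literal <- resolve_name("threshold", schema, local_vars)  # inlined value
print(by_field$kind)
print(by_literal$value)
```

Under these rules a dotted path that is missing from the schema would not resolve as a field, which is why the text recommends writing it as .data$... or supplying the schema up front.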
Flatten nested object fields into flat columns
flatten_fields(.data, ..., names_fn = identity)
.data |
A |
... |
Optional bare field roots or backticked dotted paths to flatten. |
names_fn |
Optional naming function applied to flattened output names. |
flatten_fields() relies on known schema fields. If nested dotted paths are
not known yet, supply schema = ... when creating the table or call
infer_schema() first.
With no field arguments, all known dotted paths are flattened. Existing
already-flat columns are preserved. Arrays are treated as leaf values; use
unwind_array() first if you need one row per array element.
A modified tbl_mongo object.
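As an illustration of the names_fn argument, the sketch below defines a naming function that replaces dots with underscores. dots_to_underscores() is a hypothetical helper. This page does not state which strings the package passes to names_fn, so the dotted-path input used here is an assumption.

```r
# Hypothetical naming function: turn dotted paths into underscore names.
dots_to_underscores <- function(x) gsub(".", "_", x, fixed = TRUE)

# Assumed input: dotted paths as they would appear in the known schema.
flat_names <- dots_to_underscores(c("message.measurements.Fx", "status"))
print(flat_names)
```

Such a function could then be supplied as flatten_fields(tbl, message, names_fn = dots_to_underscores), where tbl and the message field root are placeholders.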
Group a lazy Mongo query
group_by.tbl_mongo(
  .data,
  ...,
  .add = FALSE,
  .drop = dplyr::group_by_drop_default(.data)
)
.data |
A |
... |
Bare field names. |
.add |
Whether to add to existing groups. |
.drop |
Included for dplyr compatibility. |
A modified tbl_mongo object.
tbl <- tbl_mongo(
  list(name = "orders"),
  schema = c("status", "amount"),
  executor = function(pipeline, ...) tibble::tibble()
)
dplyr::group_by(tbl, status)
Infer schema fields from the first source document
infer_schema(x)
x |
A |
infer_schema() inspects the first document of the source collection and
flattens nested named subdocuments into dotted paths such as
"message.measurements.Fx". Arrays and other non-object values are treated
as leaf fields. Because the schema comes from a single document, heterogeneous
collections may still require manual schema adjustment.
A tbl_mongo object with its source and IR schema updated from the
first document in the underlying collection.
collection <- list(
  name = "orders",
  aggregate = function(pipeline_json, iterate = FALSE, ...) {
    tibble::tibble(
      status = "paid",
      message = list(list(amount = 10, currency = "EUR"))
    )
  }
)
tbl <- tbl_mongo(collection)
schema_fields(infer_schema(tbl))
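The flattening of nested subdocuments into dotted paths can be pictured with a small recursive function. dotted_paths() below is a hypothetical base R sketch of the documented behaviour, not the package's code. It treats any fully named list as a subdocument and everything else, including unnamed lists standing in for arrays, as a leaf.

```r
# Collect dotted field paths from one document represented as a nested list.
dotted_paths <- function(doc, prefix = NULL) {
  paths <- character()
  for (key in names(doc)) {
    value <- doc[[key]]
    path <- if (is.null(prefix)) key else paste(prefix, key, sep = ".")
    is_subdoc <- is.list(value) && !is.null(names(value)) &&
      all(nzchar(names(value)))
    if (is_subdoc) {
      paths <- c(paths, dotted_paths(value, path))  # recurse into subdocument
    } else {
      paths <- c(paths, path)                       # arrays and scalars are leaves
    }
  }
  paths
}

first_doc <- list(
  status = "paid",
  message = list(measurements = list(Fx = 1.2, Fy = -0.4)),
  tags = list("a", "b")
)
fields <- dotted_paths(first_doc)
print(fields)
```

A second document with a field absent from first_doc would contribute nothing to this result, which is the single-document limitation noted above.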
Construct a MongoDB source wrapper
mongo_src(
  collection,
  name = NULL,
  schema = NULL,
  executor = NULL,
  cursor_executor = NULL
)
collection |
A |
name |
Optional human-readable collection name. |
schema |
Optional character vector describing the available fields. |
executor |
Optional function used to execute compiled pipelines. |
cursor_executor |
Optional function used to open a cursor over compiled pipelines. |
A mongo_src object.
collection <- list(
  name = "orders",
  aggregate = function(pipeline_json, iterate = FALSE, ...) {
    if (iterate) {
      data <- tibble::tibble(status = "paid", amount = 10)
      return(local({
        page <- function(size = 1000) data
        structure(environment(), class = "mongo_iter")
      }))
    }
    tibble::tibble(status = "paid", amount = 10)
  }
)
src <- mongo_src(collection, schema = c("status", "amount"))
src
Add computed fields to a lazy Mongo query
mutate.tbl_mongo(.data, ...)
.data |
A |
... |
Named scalar expressions. |
Expression arguments follow the same field-vs-local name resolution rules as
filter.tbl_mongo().
A modified tbl_mongo object.
tbl <- tbl_mongo(
  list(name = "orders"),
  schema = c("amount"),
  executor = function(pipeline, ...) tibble::tibble()
)
dplyr::mutate(tbl, doubled = amount * 2)
Rename fields in a lazy Mongo query
rename.tbl_mongo(.data, ...)
.data |
A |
... |
Named bare field renames. |
A modified tbl_mongo object.
tbl <- tbl_mongo(
  list(name = "orders"),
  schema = c("status", "amount"),
  executor = function(pipeline, ...) tibble::tibble()
)
dplyr::rename(tbl, total = amount)
Inspect known fields for a lazy Mongo query
schema_fields(x)
x |
A |
A character vector of known field names.
src <- mongo_src(
  list(name = "orders", aggregate = function(...) tibble::tibble()),
  schema = c("status", "amount")
)
tbl <- tbl_mongo(src)
schema_fields(src)
schema_fields(tbl)
Select fields from a lazy Mongo query
select.tbl_mongo(.data, ...)
.data |
A |
... |
Bare field names or |
Selecting a dotted field path such as `message.measurements.Fx` does
not flatten nested documents by default. The collected result preserves the
native nested structure unless you explicitly rename the field in select()
or call flatten_fields().
A modified tbl_mongo object.
tbl <- tbl_mongo(
  list(name = "orders"),
  schema = c("status", "amount"),
  executor = function(pipeline, ...) tibble::tibble()
)
dplyr::select(tbl, amount)
Show the MongoDB aggregation pipeline for a lazy query
show_query(x, ...)
x |
A |
... |
Unused. |
The pipeline JSON string, invisibly.
tbl <- tbl_mongo(
  list(name = "orders"),
  schema = c("status", "amount"),
  executor = function(pipeline, ...) tibble::tibble()
)
query <- dplyr::select(tbl, amount)
show_query(query)
Slice a lazy Mongo query
slice_head.tbl_mongo(.data, ..., n = NULL, prop = NULL, by = NULL)
slice_tail.tbl_mongo(.data, ..., n = NULL, prop = NULL, by = NULL)
## S3 method for class 'tbl_mongo'
head(x, n = 6L, ...)
.data |
A |
... |
Must be empty. |
n |
Number of rows to keep. Negative values drop rows from the opposite
end, matching |
prop |
Unsupported. |
by |
Unsupported. |
x |
A |
A modified tbl_mongo object.
tbl <- tbl_mongo(
  list(name = "orders"),
  schema = c("amount"),
  executor = function(pipeline, ...) tibble::tibble()
)
dplyr::slice_head(tbl, n = 2)
dplyr::slice_tail(tbl, n = 2)
head(tbl, 2)
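The sign convention described for n (negative values drop rows from the opposite end) is the same one base R uses for head() and tail() on ordinary vectors. The sketch below demonstrates that convention on a plain integer vector. Whether the lazy methods match base R in every edge case is an assumption.

```r
rows <- 1:10

first_two     <- head(rows, 2)    # keep the first 2
all_but_last2 <- head(rows, -2)   # drop 2 from the end
last_two      <- tail(rows, 2)    # keep the last 2
all_but_first2 <- tail(rows, -2)  # drop 2 from the start

print(all_but_last2)
print(all_but_first2)
```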
Summarise a lazy Mongo query
summarise.tbl_mongo(.data, ..., .by = NULL, .groups = NULL)
.data |
A |
... |
Named summary expressions. |
.by |
Unsupported. |
.groups |
Included for dplyr compatibility. |
Summary expressions follow the same field-vs-local name resolution rules as
filter.tbl_mongo().
A modified tbl_mongo object.
tbl <- tbl_mongo(
  list(name = "orders"),
  schema = c("status", "amount"),
  executor = function(pipeline, ...) tibble::tibble()
)
query <- tbl |>
  dplyr::group_by(status) |>
  dplyr::summarise(total = sum(amount))
show_query(query)
Create a lazy MongoDB table
tbl_mongo(collection, name = NULL, schema = NULL, executor = NULL)
collection |
A |
name |
Optional collection name when |
schema |
Optional character vector describing known fields. |
executor |
Optional executor function for compiled pipelines. |
Supplying schema = ... is the most reliable way to make field references
explicit, especially for dotted paths in nested documents. If you do not want
to write the schema manually, infer_schema() can populate it from the first
document in the collection:
tbl_mongo(collection) |> infer_schema()
This is convenient, but it only reflects one document. If the collection is heterogeneous, fields that do not appear in the first document may still need to be added manually.
A tbl_mongo object.
tbl <- tbl_mongo(
  list(name = "orders"),
  schema = c("status", "amount"),
  executor = function(pipeline, ...) {
    tibble::tibble(status = "paid", amount = 10)
  }
)
tbl
Compute and keep only derived fields
transmute.tbl_mongo(.data, ...)
.data |
A |
... |
Named scalar expressions. |
Expression arguments follow the same field-vs-local name resolution rules as
filter.tbl_mongo().
A modified tbl_mongo object.
tbl <- tbl_mongo(
  list(name = "orders"),
  schema = c("amount"),
  executor = function(pipeline, ...) tibble::tibble()
)
dplyr::transmute(tbl, doubled = amount * 2)
Unwind one array field lazily
unwind_array(.data, field, preserve_empty = FALSE)
.data |
A |
field |
A single bare field name or backticked dotted path. |
preserve_empty |
Whether to preserve rows with missing or empty arrays. |
unwind_array() compiles to MongoDB $unwind and repeats each row once per
array element, replacing the original array field with that element. Only one
field can be unwound per call; chain multiple calls if needed.
A modified tbl_mongo object.
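The row-repetition behaviour of $unwind can be mimicked locally on a list of documents. unwind_rows() below is a hypothetical base R sketch for intuition only. It runs in R rather than in MongoDB, and representing a preserved empty array as a NULL field is an assumption of the sketch.

```r
# Repeat each row once per element of `field`, replacing the array by the element.
unwind_rows <- function(rows, field, preserve_empty = FALSE) {
  out <- list()
  for (row in rows) {
    values <- row[[field]]
    if (length(values) == 0L) {
      if (preserve_empty) {
        row[field] <- list(NULL)   # keep the row, with an empty field
        out <- c(out, list(row))
      }
      next                         # otherwise the row is dropped
    }
    for (value in values) {
      copy <- row
      copy[[field]] <- value
      out <- c(out, list(copy))
    }
  }
  out
}

rows <- list(
  list(id = 1, items = c("a", "b")),
  list(id = 2, items = character(0))
)
unwound <- unwind_rows(rows, "items")
kept    <- unwind_rows(rows, "items", preserve_empty = TRUE)
print(length(unwound))
print(length(kept))
```

Unwinding a second array field would mean calling the function again on the result, mirroring the advice to chain multiple unwind_array() calls.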