[Discussion] MongoDB Swift driver

That is exciting to hear @Geordie_J! Please keep us posted on how it goes and let us know if you run into any questions or issues.

Our ConnectionPool is not quite an accurate mapping to a traditional connection pool, due to the way libmongoc designed pooling: each mongoc_client_t actually maintains one connection per host specified in the connection string. So in a standalone configuration, one of our Connections is truly one connection, but if you were connected to, e.g., a 3-node replica set, each Connection would actually be 3 connections.

Since the number of concurrent operations is capped by the number of threads in the NIOThreadPool, we should expect the number of Connections to max out at around the number of threads in the NIOThreadPool. However, due to some libmongoc quirks, types such as MongoCursor and ClientSession also hold onto their own Connections until they are killed/ended.
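As a rough illustration of the arithmetic above, here is a hypothetical helper (not part of the driver) that estimates an upper bound on open sockets. Pooled Connections scale with busy NIOThreadPool threads, each open cursor or session pins an extra Connection, and every Connection fans out to one socket per host. It assumes cursors and sessions never share a pinned Connection, so real counts may be lower.

```swift
/// Hypothetical back-of-the-envelope estimate, not a driver API.
/// - threadPoolSize: NIOThreadPool threads, which cap concurrent operations.
/// - openCursors / activeSessions: each holds onto its own Connection.
/// - hostCount: each Connection (a mongoc_client_t) keeps one socket per host.
func estimatedMaxSockets(threadPoolSize: Int, openCursors: Int,
                         activeSessions: Int, hostCount: Int) -> Int {
    // Upper bound on Connections: one per busy thread plus one per
    // cursor or session that is still holding a Connection.
    let connections = threadPoolSize + openCursors + activeSessions
    // Each Connection maintains one socket per host in the connection string.
    return connections * hostCount
}
```

For example, 8 threads and 2 open cursors against a 3-node replica set gives at most (8 + 2) × 3 = 30 sockets, while the same load against a standalone server gives at most 10.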

Yes, I definitely expect performance to improve with a pure Swift implementation. Using the NIOThreadPool means we have to cross threads, which could have an impact on performance. As I mentioned in the "alternatives considered" section above, we went with this approach so we could make sure we had a solid API before making structural changes to our existing work.

I don't really have a good sense of how our eventual perf will compare to the Node.js driver (which my co-author @mbroadst works on in addition to this driver :slightly_smiling_face:) but when the time comes we do have an official benchmarks spec that will come in handy.

Thanks for your feedback! Based on our experience maintaining other MongoDB drivers, we've learned that users value the API stability of the drivers greatly, and that updating across major version releases of a driver can be quite difficult for a lot of organizations. Given how frequently both the MongoDB server and the driver require new error types, we decided that using a non-frozen enum would force us to make major releases too frequently and require our users to upgrade too often. By opting for a protocol-oriented error hierarchy, we allow ourselves to introduce new error types as necessary without worrying about API stability. If you haven't already, check out the section of the proposal that discusses this in more depth.
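Here is a minimal sketch of why a protocol-based hierarchy is easier to extend than an enum. All type names are hypothetical, not the driver's actual error types. Callers match errors by protocol, so a new conforming struct added in a minor release is still handled by existing code. By contrast, adding a case to a public enum in a package built without library evolution breaks exhaustive `switch` statements in user code.

```swift
// Hypothetical, simplified hierarchy for illustration only.
protocol MongoErrorProtocol: Error {}
protocol ServerErrorProtocol: MongoErrorProtocol { var code: Int { get } }
protocol UserErrorProtocol: MongoErrorProtocol {}

struct CommandError: ServerErrorProtocol {
    let code: Int
    let message: String
}

struct InvalidArgumentError: UserErrorProtocol {
    let message: String
}

// Hypothetical error added in a later minor release. Existing code that
// matches on ServerErrorProtocol handles it without any changes.
struct LaterAddedServerError: ServerErrorProtocol {
    let code: Int
}

/// Classifies an error by the protocol it conforms to, rather than by
/// switching over a fixed set of enum cases.
func describe(_ error: Error) -> String {
    switch error {
    case let serverError as ServerErrorProtocol:
        return "server error \(serverError.code)"
    case is UserErrorProtocol:
        return "user error"
    default:
        return "other error"
    }
}
```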

Another thing to note is that we document each public method that returns a future with the types of errors the future could fail with. This way, users can more easily discover which types of errors to catch and handle specifically and which ones they can probably ignore safely. For an example of such documentation, see the API docs for insertOne.

Edit: I forgot to introduce myself! I'm Patrick and I also work with @kmahar and @mbroadst at MongoDB on the Swift driver.

Hey @Lupurus, I'm not planning on stopping the development of MongoKitten, and I won't for the foreseeable future. I am, however, looking to get more community involvement, as that is a requirement for becoming an SSWG-accepted project.

libmongoc is a shared core, and that provides a number of benefits, 90% of which is guaranteeing stability. They can focus all their company's effort on testing and improving the C driver, which directly benefits every language implementation built on it. MongoKitten doesn't benefit from this, so I need to test it separately from their ecosystem and automated testing/tooling.

On the flip side, there are downsides to using a shared C driver. Initially, one of them was stability as well, because C doesn't simply translate 1:1 into a (good) Swift API. The wrapper's downside is that it needs to serve as both a C implementation and a Swift driver, so the API has to be adapted to how C works. This ends up creating bottlenecks in the API design as well as in performance.

With all due respect to the MongoDB team, I wouldn't even consider using their blocking C driver if I had to start a MongoKitten driver from scratch. I've been in a solid fight for over four years to create great Swift APIs, and I know it cannot be done as well if you build on top of a huge C codebase. Swift's language features just don't translate well to C, and building on C doesn't allow deep integration with Swift libraries such as the SSWG logging, metrics, channels & event loops. Performance-wise, too, there are bottlenecks in Swift as a language that prevent C libraries from achieving their full potential.

That being said I appreciate the effort for official support because it makes the Swift ecosystem as a whole directly supported for enterprise. If a company has a support contract with MongoDB, I bet they can at least expect some help as a part of that. With MongoKitten there's no way I can help as part of a MongoDB service contract.

Finally, MongoKitten does support all features MongoDB has to offer through its raw APIs, but I didn't create helpers for every edge case. This is something the MongoDB driver doesn't offer either. Missing helpers can be added through feature requests, although nothing stops you from adding a helper for a feature yourself. If anything, there are fewer limitations, in the sense that MongoCore and the MongoKitten core can be used to build clients with very different usability and APIs. The API and socket layer in the C library are fixed, whereas you could, for example, combine MongoKitten with a future SSH library built on NIO and use that to build a proxy.

@Joannis_Orlandos: Thank you for your detailed answer. I really like MongoKitten (especially the fact that it is fully Swift), and I have already made a pull request for a small helper (if you remember ;)).

What I don't understand about the MongoDB team: why start a new driver instead of helping out with MongoKitten?

I remember your PR, although I don't remember what it was specifically :slight_smile: We discussed collaborating, but their terms for a collaboration weren't exactly collaborative. So under those terms, it's not just a case of them passing up the opportunity.

Hi @Lupurus! Thanks for your interest in our proposal, I can help give some context here and hopefully answer your questions.

Background
The original purpose of MongoSwift was to provide a solid foundation for something we called MongoMobile, which was an embedded version of the MongoDB database running in memory on your iOS device (similar to SQLite). In order to accomplish this task in a very limited time, we felt our best approach was to wrap the rock-solid existing libmongoc with a lightweight Swift wrapper. Since the embedded server was written in C++, and we already had a shim for libmongoc to speak directly to the embedded server, it seemed prudent to reuse that work to get something into users' hands as quickly as possible. Language interoperability, in particular with C-like languages, is a core feature of the Swift language, and so in a relatively short period of time we were able to develop a full-featured, stable synchronous MongoDB driver for Swift, even though we only intended to make something useful for MongoMobile!

Around this time you may recall that MongoDB acquired Realm, so efforts on the MongoMobile project were rolled into that project. As a company, we still strongly believe in the promise of Server Side Swift, and so decided to press on with development of MongoSwift.

Approach
Using libmongoc gave us the confidence and security to focus our efforts completely on making the API as idiomatic as possible while providing all the features of a modern MongoDB driver. The 1.0 version of the driver achieves two major goals: API stability for a major version, and an asynchronous version of the driver. libmongoc is treated as an internal implementation detail; none of its API is leaked by our implementation.

What does this mean for users? Given that our goal post-1.0 is to migrate the internals of the driver away from libmongoc to pure Swift, you can safely adopt this version of the driver and gain progressive performance enhancements without needing to alter your code.

Performance
@Geordie_J asked about performance on AWS Lambda compared to the Node.js driver, so I ran a small benchmark this weekend attempting to simulate realistic load on a local REST API against an M10 instance hosted on MongoDB Atlas. This relatively unscientific benchmark shows that MongoSwift 1.0.0-rc0 is ~10% slower than the Node.js driver. We expect that a migration to using SwiftNIO directly will unlock new levels of performance, but considering we haven’t spent any time at all on performance tuning, I would say this is a good start for us.

What I don't understand about the MongoDB team: why start a new driver instead of helping out with MongoKitten?

@kmahar touched on this when we discussed it during the pitch phase, but the short answer to this question is that years of experience have shown us that owning the driver is the best way we can provide a high quality user experience. The advantage of MongoDB owning the driver is in our expertise and commitment to supporting, enhancing, improving, and maintaining the driver consistently for years to come.

We are strongly committed to the ideals of the FOSS community, and we must balance that with the concerns of our users. We are happy to collaborate on components the driver uses (a DNS library was brought up as an example) or on projects that use the driver (maybe a Fluent adapter), but we also need to track current server releases in an agile manner. Not being in control of our codebase could complicate that and risk jeopardizing the user experience.

I hope this answers most of the outstanding questions. We’ve worked very hard on this release and are eager to hear technical feedback on our proposal.

I'm not experienced enough to judge what problems can arise in the API design if you build the API on top of a C driver. Could you give an example?

Okay, that's surely a good reason. I just really feel sorry for Joannis, because he did great work with MongoKitten and, in the end, he may not stand a chance against a bigger (and paid) team.

For me (as a private individual and, so far, not a particularly relevant user), it may be a big plus that the MongoDB team's code is perfectly documented, which really helps. I hope I'll have time soon to run some tests; then I can give further feedback.

hey @kmahar, would you please open a PR at https://github.com/swift-server/sswg/tree/master/proposals with the proposal markdown? The proposal number is #10 and the review manager is @tanner0101.

hey @kmahar only skimmed the proposal, and looking forward to exploring it in more details.

couple of outstanding questions:

  1. the initializer signature suggests you need to pass in an EventLoopGroup:
```swift
public init(
    _ connectionString: String = "mongodb://localhost:27017",
    using eventLoopGroup: EventLoopGroup,
    options: ClientOptions? = nil
)
```

in other parts of the ecosystem we follow a pattern where you pass in an EventLoopGroupProvider, to make this a bit easier in case the user does not already have an ELG at hand. you may want to consider following the same pattern.
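For illustration, here is a self-contained sketch of the provider pattern, modeled on AsyncHTTPClient's `HTTPClient.EventLoopGroupProvider` (`.shared` / `.createNew`). `StandInEventLoopGroup` and `SketchClient` are hypothetical stand-ins that let the snippet compile without SwiftNIO, and the thread count of 4 is arbitrary. The key point is that the client only shuts down a group it created itself.

```swift
// Stand-in for SwiftNIO's EventLoopGroup so this sketch compiles without NIO.
final class StandInEventLoopGroup {
    let numberOfThreads: Int
    private(set) var isShutDown = false
    init(numberOfThreads: Int) { self.numberOfThreads = numberOfThreads }
    func shutdown() { isShutDown = true }
}

enum EventLoopGroupProvider {
    /// Use a group the caller owns; the client must not shut it down.
    case shared(StandInEventLoopGroup)
    /// Let the client create a group and own its lifecycle.
    case createNew
}

// Hypothetical client; not the driver's actual MongoClient.
final class SketchClient {
    let eventLoopGroup: StandInEventLoopGroup
    private let ownsEventLoopGroup: Bool

    init(_ connectionString: String = "mongodb://localhost:27017",
         using provider: EventLoopGroupProvider) {
        switch provider {
        case .shared(let group):
            eventLoopGroup = group
            ownsEventLoopGroup = false
        case .createNew:
            eventLoopGroup = StandInEventLoopGroup(numberOfThreads: 4)
            ownsEventLoopGroup = true
        }
    }

    /// Shuts down the group only if this client created it.
    func close() {
        if ownsEventLoopGroup { eventLoopGroup.shutdown() }
    }
}
```

A caller with no ELG of their own writes `SketchClient(using: .createNew)`, while an app that already runs NIO passes `.shared(group)` and keeps control of that group's shutdown.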

  2. could the BSON module be useful outside the context of the mongo client? if so, is it worth extracting it to a separate library?

Yes I'll put together a PR shortly!

Thanks for this suggestion! We like the idea a lot and are happy to incorporate it into the API. (For anyone who would like to follow along with that change being implemented I've opened SWIFT-749.)

The library could be useful on its own, e.g. you might use it to convert between files containing raw BSON data and JSON files.
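To give a feel for what such a standalone tool does, here is a deliberately tiny, hand-rolled sketch that turns a flat BSON document into JSON. It is not the driver's BSON API. It follows the BSON layout (a little-endian int32 total length, then elements made of a type byte, a NUL-terminated key, and a value, then a trailing 0x00), and it only handles int32 (0x10) and string (0x02) values, without JSON escaping.

```swift
/// Hand-rolled illustration only: converts a flat BSON document whose values
/// are all int32 (type 0x10) or UTF-8 strings (type 0x02) into a JSON string.
/// Returns nil for anything else. Keys and strings are not JSON-escaped.
func bsonToJSON(_ bytes: [UInt8]) -> String? {
    // Reads a little-endian int32 starting at offset i, if enough bytes remain.
    func readInt32(at i: Int) -> Int32? {
        guard i >= 0, i + 4 <= bytes.count else { return nil }
        var value: UInt32 = 0
        for k in 0..<4 { value |= UInt32(bytes[i + k]) << (8 * k) }
        return Int32(bitPattern: value)
    }

    // Header: total length (counting itself and the trailing 0x00).
    guard bytes.count >= 5, bytes.last == 0,
          let total = readInt32(at: 0), Int(total) == bytes.count else { return nil }

    var i = 4
    var fields: [String] = []
    while i < bytes.count - 1 {
        let type = bytes[i]
        i += 1
        // Element key: a NUL-terminated C string.
        guard let nul = bytes[i...].firstIndex(of: 0) else { return nil }
        let key = String(decoding: bytes[i..<nul], as: UTF8.self)
        i = nul + 1

        switch type {
        case 0x10: // int32
            guard let v = readInt32(at: i) else { return nil }
            fields.append("\"\(key)\":\(v)")
            i += 4
        case 0x02: // string: int32 length (counting the NUL), bytes, NUL
            guard let len = readInt32(at: i), len >= 1 else { return nil }
            let start = i + 4
            let end = start + Int(len) - 1 // index of the string's NUL
            guard end < bytes.count, bytes[end] == 0 else { return nil }
            let text = String(decoding: bytes[start..<end], as: UTF8.self)
            fields.append("\"\(key)\":\"\(text)\"")
            i = end + 1
        default:
            return nil // other BSON types are out of scope for this sketch
        }
    }
    return "{" + fields.joined(separator: ",") + "}"
}
```

For instance, the 12 bytes `0C 00 00 00 10 61 00 01 00 00 00 00` encode `{"a": 1}`, and the function returns `{"a":1}` for them.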

Moving this out into a separate package altogether is definitely a goal of ours, though as of now we've been planning to handle that separation post-1.0 as part of the pure Swift BSON rewrite (same public API, new internals) I alluded to in the proposal. Correct me if I'm wrong, but I believe we can do that in a non-breaking manner for driver users by having the driver re-export all the public types we've moved out.

I think the main reason we've held off on this so far is that, for the purpose of the standalone library efficiently interoperating with the driver, we'd need the BSON API to use several C types in its public API, which is something we've worked to avoid in our APIs.

For example, Document would need to expose a pointer to its backing bson_t so that MongoSwift methods accepting Documents would have a bson_t to pass to the corresponding libmongoc method, and would also need an initializer accepting a pointer to a bson_t, and so on. Right now these can be internal since the code lives in one module. In a pure Swift implementation Document would be wrapping a ByteBuffer which we'd be fine to expose in the public API.

this makes sense, thanks for additional details

I see the PR was merged. For our own planning purposes (e.g. roughly when we will release), I have some process-related questions that maybe you or anyone else in the SSWG can answer (apologies if these answers are documented somewhere, but I don't see them mentioned in the incubation process description):

  • Should edits we want to make to the proposal going forward (e.g. incorporating the switch to using an EventLoopGroupProvider) just be opened as PRs?
  • I know there is usually a second [Feedback] thread containing an edited proposal, at what point do we move onto that thread?

Thanks :slight_smile:

I won't speak for the SSWG, but historically the proposal docs are "products of their time". For example, the Redis proposal no longer reflects the current state of the library's API.

However, seeing as this proposal hasn't reached the feedback thread yet, I would say you could just open a PR with edits, if you want to improve the document before that thread begins.

Again, historically this has been when the author feels that they have adequately responded to feedback and left enough time for people to voice their concerns.

the proposal in https://github.com/swift-server/sswg/tree/master/proposals does not need to reflect API changes after the proposal is accepted, but it does need to reflect the state of the proposal prior to the SSWG reviewing it. in other words, as you prepare the [Feedback] thread, if the proposal has materially changed, you should submit a PR to https://github.com/swift-server/sswg/tree/master/proposals to reflect that, and do the same at the end of the feedback period, which is normally 2 weeks after the [Feedback] thread was posted.

related, I am about to suggest to the SSWG a small procedural change such that the [Feedback] thread will be just a "call for action" with link to the updated proposal text in https://github.com/swift-server/sswg/tree/master/proposals.

@Mordil's answer above is correct. To add: if you feel you have not received enough useful feedback, you can call out specific questions, tradeoffs, or design dilemmas you want the community to chime in on.

Thanks for the details @Mordil @tomerd!

One question that's come up for us recently we'd love to hear community thoughts on is how options are provided to API methods. As it stands, we have structs for each API method that contain allowed options for that method. But we've considered an alternative approach where we instead just accept all the options directly as arguments to the method.

For example, countDocuments looks like this:

```swift
public func countDocuments(
    _ filter: Document = [:],
    options: CountDocumentsOptions? = nil,
    session: ClientSession? = nil
) -> EventLoopFuture<Int>
```

and CountDocumentsOptions looks like this:

```swift
public struct CountDocumentsOptions: Codable {
    /// Specifies a collation.
    public var collation: Document?

    /// A hint for the index to use.
    public var hint: Hint?

    /// The maximum number of documents to count.
    public var limit: Int?

    /// The maximum amount of time to allow the query to run.
    public var maxTimeMS: Int?

    /// A ReadConcern to use for this operation.
    public var readConcern: ReadConcern?

    /// A ReadPreference to use for this operation.
    public var readPreference: ReadPreference?

    /// The number of documents to skip before counting.
    public var skip: Int?

    /// Convenience initializer allowing any/all parameters to be optional
    public init(
        collation: Document? = nil,
        hint: Hint? = nil,
        limit: Int? = nil,
        maxTimeMS: Int? = nil,
        readConcern: ReadConcern? = nil,
        readPreference: ReadPreference? = nil,
        skip: Int? = nil
    ) { ... }
}
```

So right now, if you wanted to perform a count using some options, you'd do something like this:

```swift
var opts = CountDocumentsOptions()
opts.readPreference = .primaryPreferred
opts.skip = 100

// alternatively, construct equivalent options in one line
let sameOpts = CountDocumentsOptions(readPreference: .primaryPreferred, skip: 100)

collection.countDocuments(["a": 1], options: opts).map { ... }
```

The alternative would be to have countDocuments look like this:

```swift
public func countDocuments(
    _ filter: Document = [:],
    collation: Document? = nil,
    hint: Hint? = nil,
    limit: Int? = nil,
    maxTimeMS: Int? = nil,
    readConcern: ReadConcern? = nil,
    readPreference: ReadPreference? = nil,
    skip: Int? = nil,
    session: ClientSession? = nil
) -> EventLoopFuture<Int>
```

And then, to call the method with options, you'd do:

```swift
collection.countDocuments(
    ["a": 1],
    readPreference: .primaryPreferred,
    skip: 100
).map { ... }
```

The first approach is more verbose, but it does allow reuse of options structs and prevents our method signatures from becoming extremely lengthy (some API methods like find have 20+ valid options users can specify).
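To make the trade-off concrete, here is a self-contained sketch of both call shapes against an in-memory array. `ReadPreference`, the trimmed `CountDocumentsOptions`, and both free functions are hypothetical stand-ins for the driver's types, and `readPreference` has no effect here. The point is that one options value can be built once and passed to several calls, while the argument style spells each option out at every call site.

```swift
// Stand-in types so this compiles on its own; the real driver's types are richer.
enum ReadPreference { case primary, primaryPreferred }

struct CountDocumentsOptions {
    var limit: Int? = nil
    var readPreference: ReadPreference? = nil
    var skip: Int? = nil
}

// Shared counting logic over an in-memory array (a hypothetical stand-in
// for a collection): filter, skip, then cap at the limit.
private func countMatching(_ values: [Int], where predicate: (Int) -> Bool,
                           limit: Int?, skip: Int?) -> Int {
    let remaining = values.filter(predicate).dropFirst(skip ?? 0)
    return min(limit ?? remaining.count, remaining.count)
}

// Approach 1: options bundled into one reusable struct.
func countWithOptionsStruct(_ values: [Int], where predicate: (Int) -> Bool,
                            options: CountDocumentsOptions? = nil) -> Int {
    countMatching(values, where: predicate, limit: options?.limit, skip: options?.skip)
}

// Approach 2: every option is its own defaulted argument.
func countWithArguments(_ values: [Int], where predicate: (Int) -> Bool,
                        limit: Int? = nil,
                        readPreference: ReadPreference? = nil,
                        skip: Int? = nil) -> Int {
    countMatching(values, where: predicate, limit: limit, skip: skip)
}
```

With the struct, a single `CountDocumentsOptions(limit: 2, readPreference: .primaryPreferred, skip: 1)` value can be handed to any number of calls. With arguments, each call repeats the labels, and for a method like find the parameter list would grow with every supported option.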

Both approaches have precedent amongst our other drivers.

Would love to hear what you all think!

+1 to the options struct pattern. I think it makes the API more readable at a glance. Being able to re-use the options structs is also a win for maintainability and extensibility.

The discussion here was very positive and the SSWG agrees this should move forward to the final [Feedback] phase.

The [Feedback] post should be similar to the original [Discussion] post but updated with all of the latest changes / requested clarification. You can see an example here:

If all goes well in the [Feedback] phase, the last step will be voting on accepting the package to the index.

Thank you for putting this detailed proposal together @kmahar!

@tomerd can you lock this thread?

+1 on that too. It'll also make your life a lot easier if you ever have to add another option without breaking API.
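Here is a small sketch of that point. It uses a trimmed, hypothetical `CountDocumentsOptions` with an invented `comment` option that stands in for a field added in a later release. Because the new property has a default, call sites written before it existed still compile, and the method signature that takes the struct never changes.

```swift
struct CountDocumentsOptions {
    var limit: Int? = nil
    var skip: Int? = nil
    // Hypothetical option added in a later release. Because it has a default,
    // code that builds the struct without it keeps compiling.
    var comment: String? = nil
}

// This signature is identical before and after `comment` was added, so
// protocol requirements, test doubles, and stored function references
// that mention it are unaffected.
func countDocuments(_ values: [Int], options: CountDocumentsOptions? = nil) -> Int {
    let opts = options ?? CountDocumentsOptions()
    let remaining = values.dropFirst(opts.skip ?? 0)
    return min(opts.limit ?? remaining.count, remaining.count)
}
```

With the flat-argument style, the same change would add a parameter to `countDocuments` itself. Defaulted parameters keep ordinary call sites compiling, but they still change the function's type.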

+1