[Discussion] MongoDB Swift driver

Awesome to hear you've had such success @dlbuckley! We knew you were using it but didn't realize it was at that scale :slight_smile:

Hi Kaitlin, thanks for your work on the official MongoDB driver and for your talk about it at try! Swift last year that tipped me off on the NIO implementation.

FYI we are about to complete the first iteration of a project that uses the driver in an AWS Lambda function, also thanks to the work of @fabianfett. It’s an internal project for R&D purposes but I think that also counts as “production use”.

You mentioned the C driver is synchronous, which the Swift driver circumvents by using an NIOThreadPool. Does that mean the driver essentially uses a connection per thread, currently? Would rewriting the internals in Swift, as you mention is a goal for 2.0, likely give the driver better performance characteristics due to “real” non-blocking IO? I would be interested to hear if you have an intuition how that would compare to the node.js implementation, performance-wise.

Hi @Geordie_J, there already is a non blocking MongoDB driver available. It's by @Joannis_Orlandos: https://github.com/OpenKitten/MongoKitten

I haven't worked with either of the implementations (I'm more of a DynamoDB guy :wink:) and can't find a performance comparison anywhere (which doesn't mean that there isn't one). Maybe you can run some tests and let everyone know?

Though I would assume the performance gain shouldn't be too big within Lambda (except when you are making a ton of requests concurrently). But only tests will tell whether that "feeling" is right.

Thanks, we initially tried MongoKitten but had crashes with it on Lambda. It’s probably because we were accidentally calling synchronous code in an EventLoop but the error reporting wouldn’t tell us anything about it.

The question about “real” NIO was more for use outside of Lambda, and out of interest. I have a pretty good grasp on what these concurrency primitives mean but haven’t fully understood how traditionally synchronous APIs interact with NIO / Futures in general.

@Geordie_J If you've got questions regarding using MongoKitten synchronously or experience unexpected problems in general, definitely do reach out. I'm happy to help.

As I'm using MongoKitten for my current project, I'm now wondering what the future of MongoKitten will be. It's cool that there is now official support, because as far as I remember, MongoKitten doesn't support everything that MongoDB can do (which was never actually a problem for my project). @Joannis_Orlandos: Do you plan to stop working on MongoKitten? If so, I hope that all your hard work will now be used for replacing libmongoc. Would you suggest switching my project to the official driver now?

@kmahar: I will need to take a closer look at the API, but at first sight I'm wondering about the fact that you use protocols for errors and not enums. I personally would use enums, and if new error types break the API, I would bump the version number to a new major version (at least that is how I understood version numbers in SwiftPM).

That is exciting to hear @Geordie_J! Please keep us posted on how it goes and let us know if you run into any questions or issues.

Our ConnectionPool is not quite an accurate mapping to a traditional connection pool, due to the way libmongoc designed pooling - each mongoc_client_t actually maintains one connection per host specified in the connection string. So in a standalone configuration, one of our Connections is truly one connection, but if you were connected to e.g. a 3-node replica set each Connection would be in actuality 3 connections.

Since the number of concurrent operations happening is capped by the number of threads in the NIOThreadPool, we should expect the number of Connections to max out around the number of threads in the NIOThreadPool. Though, due to some libmongoc quirks, types such as MongoCursor and ClientSession also hold onto their own Connections until they are killed/ended.
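To make the arithmetic in the two paragraphs above concrete, here is a back-of-the-envelope sketch. The function name, its parameters, and the example numbers are made up for illustration and are not part of the driver; it simply assumes that each Connection holds one socket per host in the connection string and that every live cursor or session holds one extra Connection.

```swift
// Rough estimate only. Assumption: each pooled Connection (one mongoc_client_t)
// keeps one socket per host listed in the connection string.
func estimatedSockets(
    threadPoolSize: Int,
    hostsInConnectionString: Int,
    openCursorsAndSessions: Int = 0
) -> Int {
    // Concurrent operations are capped by the thread pool size, and each
    // live cursor or session holds onto one additional Connection.
    let connections = threadPoolSize + openCursorsAndSessions
    return connections * hostsInConnectionString
}

// Standalone server, 5 threads (illustrative number).
let standalone = estimatedSockets(threadPoolSize: 5, hostsInConnectionString: 1)

// 3-node replica set, 5 threads, 2 cursors still open.
let replicaSet = estimatedSockets(
    threadPoolSize: 5,
    hostsInConnectionString: 3,
    openCursorsAndSessions: 2
)

print(standalone, replicaSet) // prints "5 21"
```

Treat the result as an approximate upper bound, not a measurement.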

Yes, I definitely expect the performance to improve in a pure Swift implementation. The NIOThreadPool usage means we have to cross threads, which could have an impact on performance. As I mentioned in the "alternatives considered" section above, we went with this approach so we could make sure we had a solid API before making structural changes to our existing work.
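For readers wondering how a synchronous API ends up behind a future-style interface, here is a minimal sketch of the general "offload blocking work to a pool of threads" idea. It is not the driver's code: it uses Dispatch as a stand-in for NIOThreadPool and a small hand-rolled `PendingResult` type as a stand-in for a future, and `blockingFind` is a hypothetical placeholder for a blocking C call.

```swift
import Dispatch
import Foundation

// Hypothetical stand-in for a blocking call into a C library.
func blockingFind(id: Int) -> String {
    Thread.sleep(forTimeInterval: 0.01) // simulate waiting on the network
    return "document-\(id)"
}

// Tiny future-like box: filled on a worker thread, read by the caller.
final class PendingResult<Value>: @unchecked Sendable {
    private let lock = NSLock()
    private let ready = DispatchSemaphore(value: 0)
    private var value: Value?

    func fulfill(_ newValue: Value) {
        lock.lock()
        value = newValue
        lock.unlock()
        ready.signal()
    }

    func wait() -> Value {
        ready.wait()
        ready.signal() // re-signal so later waits return immediately too
        lock.lock()
        defer { lock.unlock() }
        return value!
    }
}

// Runs blocking work on a worker queue so the calling thread stays free.
// The result has to cross threads to get back, which is the overhead
// mentioned above.
func offload<Value: Sendable>(
    to queue: DispatchQueue,
    _ work: @escaping @Sendable () -> Value
) -> PendingResult<Value> {
    let pending = PendingResult<Value>()
    queue.async { pending.fulfill(work()) }
    return pending
}

let workers = DispatchQueue(label: "blocking-workers", attributes: .concurrent)
let lookup = offload(to: workers) { blockingFind(id: 7) }
let document = lookup.wait()
print(document) // prints "document-7"
```

With this approach each in-flight operation occupies a worker thread for its full duration, whereas a truly non-blocking implementation would let one thread service many sockets.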

I don't really have a good sense of how our eventual perf will compare to the Node.js driver (which my co-author @mbroadst works on in addition to this driver :slightly_smiling_face:) but when the time comes we do have an official benchmarks spec that will come in handy.

Thanks for your feedback! Based on our experience maintaining other MongoDB drivers, we've learned that users value the API stability of the drivers greatly, and that updating across major version releases of a driver can be quite difficult for a lot of organizations. Given how frequently both the MongoDB server and the driver require new error types, we decided that using a non-frozen enum would force us to make major releases too frequently and require our users to upgrade too often. By opting for a protocol-oriented error hierarchy, we allow ourselves to introduce new error types as necessary without worrying about API stability. If you haven't already, check out the section of the proposal that discusses this in more depth.

Another thing to note is that we document each public method that returns a future with the types of errors the future could fail with. This way, users can more easily discover which types of errors to catch and handle specifically and which ones they can probably ignore safely. For an example of such documentation, see the API docs for insertOne.
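As a rough illustration of the trade-off described above, the sketch below shows how a protocol-based hierarchy lets a library add an error type without breaking callers. All type names here are hypothetical and chosen for the example; they are not the driver's actual error types.

```swift
// Hypothetical hierarchy, for illustration only.
protocol DriverError: Error {}
protocol ServerError: DriverError {}

struct WriteFailure: ServerError { let code: Int }
struct ConnectionFailure: DriverError { let message: String }

// Imagine this type was added in a later minor release. Callers written
// before it existed still compile, because they match on protocols and
// concrete types rather than exhaustively switching over enum cases.
struct TimeoutFailure: ServerError { let seconds: Int }

func describe(_ error: Error) -> String {
    switch error {
    case let failure as WriteFailure:
        return "write failed with code \(failure.code)"
    case is ServerError:
        return "some other server error"
    case is DriverError:
        return "driver-side error"
    default:
        return "unrelated error"
    }
}

print(describe(WriteFailure(code: 11000)))
print(describe(TimeoutFailure(seconds: 5)))
print(describe(ConnectionFailure(message: "refused")))
```

With a frozen enum, adding the equivalent of `TimeoutFailure` would be a source-breaking change for any caller that switches over all cases without a `default`.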

Edit: I forgot to introduce myself! I'm Patrick and I also work with @kmahar and @mbroadst at MongoDB on the Swift driver.

Hey @Lupurus, I'm not planning on stopping the development of MongoKitten, and I won't for the foreseeable future. I am, however, looking to get more community involvement as a requirement for being an SSWG accepted project.

libmongoc is a shared core, and that provides a number of benefits, 90% of which is guaranteeing stability. They can focus all their company's effort on testing and improving the C driver with it having direct benefits to all language implementations. MongoKitten doesn't benefit from this, so I need to test it separately from their ecosystem and automated testing/tooling.

On the flip-side, there are downsides to using a shared C driver. One has been stability as well, initially, because C doesn't simply translate 1:1 to a (good) Swift API. The wrapper's downside is that it needs to be both a C implementation and a Swift driver, so the API needs to be adapted to how C works. This ends up creating bottlenecks in the API design as well as performance.

With all due respect to the MongoDB team, I wouldn't even consider using their blocking C driver if I had to start a MongoKitten driver from scratch. I've been in a solid fight for over four years to create great Swift APIs. I know it cannot be done as well if you build on top of a huge C codebase. Swift's language features just don't translate well to C, and it doesn't allow deep integration with Swift libraries such as the SSWG logging, metrics, channels & event loops. Also, performance-wise, there are bottlenecks in Swift as a language that prevent C libraries from achieving their full potential.

That being said I appreciate the effort for official support because it makes the Swift ecosystem as a whole directly supported for enterprise. If a company has a support contract with MongoDB, I bet they can at least expect some help as a part of that. With MongoKitten there's no way I can help as part of a MongoDB service contract.

Finally, MongoKitten does support all features MongoDB has to offer in its raw APIs. But I didn't create helpers for each edge case. This is something the MongoDB driver doesn't offer either. Missing helpers can be added in "feature" requests, although nothing stops you from adding a helper for some feature yourself. On the contrary, there are fewer limitations in the sense that MongoCore and the MongoKitten core can be used to build clients with a very different usability/API. The API & socket layer in the C library are dead set, while if you wanted you could use MongoKitten together with a future SSH library built on NIO and use that to build a proxy that way.

@Joannis_Orlandos: Thank you for your detailed answer. I really like MongoKitten (especially the fact that it is fully Swift) and I also already made a pull request for a small helper (if you remember ;)).

What I don't understand from the MongoDB team: Why start a new driver instead of helping with MongoKitten?

I remember your PR, although I don't remember what it was specifically :slight_smile: We discussed collaborating, but their demands in a cooperation weren't exactly cooperative. So under those demands it's not just them passing the opportunity.

Hi @Lupurus! Thanks for your interest in our proposal, I can help give some context here and hopefully answer your questions.

Background
The original purpose for MongoSwift was to provide a solid foundation for something we called MongoMobile, which was an embedded version of the MongoDB database running in memory on your iOS device (similar to SQLite). In order to accomplish this task in a very limited time we felt our best approach was to wrap the rock-solid existing libmongoc with a lightweight Swift wrapper. Since the embedded server was written in C++, and we already had a shim for libmongoc to speak directly to the embedded server, it seemed prudent to reuse that work to get something in users' hands as quickly as possible. Language interoperability, in particular with C-like languages, is a core feature of the Swift language and so in a relatively short period of time we were able to develop a full-featured, stable synchronous MongoDB driver for Swift, even though we only intended to make something useful for MongoMobile!

Around this time you may recall that MongoDB acquired Realm, so efforts on the MongoMobile project were rolled into that project. As a company, we still strongly believe in the promise of Server Side Swift, and so decided to press on with development of MongoSwift.

Approach
Using libmongoc gave us the confidence and security to focus our efforts completely on making the API as idiomatic as possible while providing all the features of a modern MongoDB driver. The 1.0 version of the driver achieves two major goals: API stability for a major version, and an asynchronous version of the driver. libmongoc is treated as an internal implementation detail; none of its API is leaked in our implementation.

What does this mean for users? Given that our goal post-1.0 is to migrate the internals of the driver away from libmongoc to pure Swift, you can safely adopt this version of the driver and gain progressive performance enhancements without needing to alter your code.

Performance
@Geordie_J asked about performance on AWS Lambda compared to the Node.js driver, so I ran a small benchmark this weekend attempting to simulate realistic load on a local REST API against an M10 instance hosted on MongoDB Atlas. This relatively unscientific benchmark shows that MongoSwift 1.0.0-rc0 is ~10% slower than the Node.js driver. We expect that a migration to using SwiftNIO directly will unlock new levels of performance, but considering we haven’t spent any time at all on performance tuning I would say this is a good start for us.

What I don't understand from the MongoDB team: Why start a new driver instead of helping with MongoKitten?

@kmahar touched on this when we discussed it during the pitch phase, but the short answer to this question is that years of experience have shown us that owning the driver is the best way we can provide a high quality user experience. The advantage of MongoDB owning the driver is in our expertise and commitment to supporting, enhancing, improving, and maintaining the driver consistently for years to come.

We are strongly committed to the ideals of the FOSS community, and we must balance that with the concerns of our users. We are happy to collaborate on components the driver uses (a DNS library was brought up as an example), or projects which use the driver (maybe a Fluent adapter), and we also need to track current server releases in an agile manner. Not being in control over our code base could become complicated, and risk jeopardizing user experience.

I hope this answers most of the outstanding questions. We’ve worked very hard on this release and are eager to hear technical feedback on our proposal.

I'm not experienced enough to judge what problems can arise in the API design if you build the API on top of a C driver. Could you give an example?

Okay, that's surely a good reason. I just really feel sorry for Joannis, because he did great work with MongoKitten and, in the end, he may not stand a chance against a bigger (and paid) team.

For me (as a private and so far non-relevant person) it is a big plus that the code from the MongoDB team is perfectly documented, which really helps. I hope I will soon have time to run some tests; then I can give further feedback.

hey @kmahar would you please open a PR at https://github.com/swift-server/sswg/tree/master/proposals with the proposal markdown, the proposal number is #10 and the review manager is @tanner0101

hey @kmahar I've only skimmed the proposal, and am looking forward to exploring it in more detail.

couple of outstanding questions:

  1. the initializer signature suggests you need to pass in an EventLoopGroup:

     public init(
         _ connectionString: String = "mongodb://localhost:27017",
         using eventLoopGroup: EventLoopGroup,
         options: ClientOptions? = nil
     )

     in other parts of the ecosystem we follow a pattern where you pass in an EventLoopGroupProvider to make this a bit easier in cases where the user does not already have an ELG at hand. you may want to consider following the same pattern

  2. could the BSON module be useful outside the context of the mongo client? if so, is it worth extracting it to a separate library?

Yes I'll put together a PR shortly!

Thanks for this suggestion! We like the idea a lot and are happy to incorporate it into the API. (For anyone who would like to follow along with that change being implemented I've opened SWIFT-749.)
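For anyone unfamiliar with the provider pattern being discussed, here is a minimal sketch of the idea. It deliberately avoids depending on swift-nio: `EventLoopGroupLike`, `DemoGroup`, and `DemoClient` are hypothetical stand-ins, and the label `eventLoopGroupProvider` is an assumption for illustration, not necessarily what the driver will ship.

```swift
// Stand-in for NIO's EventLoopGroup so the sketch compiles on its own.
protocol EventLoopGroupLike: AnyObject {
    func shutdown()
}

final class DemoGroup: EventLoopGroupLike {
    private(set) var isShutDown = false
    func shutdown() { isShutDown = true }
}

// The caller either shares an existing group or asks the client to create one.
enum EventLoopGroupProvider {
    case shared(EventLoopGroupLike)
    case createNew
}

final class DemoClient {
    let connectionString: String
    let group: EventLoopGroupLike
    private let ownsGroup: Bool

    init(
        _ connectionString: String = "mongodb://localhost:27017",
        eventLoopGroupProvider: EventLoopGroupProvider = .createNew
    ) {
        self.connectionString = connectionString
        switch eventLoopGroupProvider {
        case .shared(let existing):
            group = existing
            ownsGroup = false
        case .createNew:
            group = DemoGroup()
            ownsGroup = true
        }
    }

    // Only shut down a group the client created itself; a shared group
    // remains the caller's responsibility.
    func close() {
        if ownsGroup { group.shutdown() }
    }
}

let appGroup = DemoGroup()
let sharing = DemoClient(eventLoopGroupProvider: .shared(appGroup))
sharing.close()

let standalone = DemoClient()
standalone.close()
```

The main design point is ownership: the client has to remember whether it created the group, so that closing the client never tears down a group other parts of the application still use.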

The library could be useful on its own, e.g. you might use it to convert between files containing raw BSON data and JSON files.

Moving this out into a separate package altogether is definitely a goal of ours, though as of now we've been planning to handle that separation post-1.0 as part of the pure Swift BSON rewrite (same public API, new internals) I alluded to in the proposal. Correct me if I'm wrong, but I believe we can do that in a non-breaking manner for driver users by having the driver re-export all the public types we've moved out.

I think the main reason we've held off on this so far is that, for the purpose of the standalone library efficiently interoperating with the driver, we'd need the BSON API to use several C types in its public API, which is something we've worked to avoid in our APIs.

For example, Document would need to expose a pointer to its backing bson_t so that MongoSwift methods accepting Documents would have a bson_t to pass to the corresponding libmongoc method, and would also need an initializer accepting a pointer to a bson_t, and so on. Right now these can be internal since the code lives in one module. In a pure Swift implementation Document would be wrapping a ByteBuffer which we'd be fine to expose in the public API.

I think the main reason we've held off on this so far is that, for the purpose of the standalone library efficiently interoperating with the driver, we'd need the BSON API to use several C types in its public API, which is something we've worked to avoid in our APIs.

this makes sense, thanks for additional details

I see the PR was merged. Just for our own planning purposes around approximately when we will release, etc. I have some process related questions that maybe you or anyone else in SSWG can answer (apologies if these answers are documented somewhere, but I don't see it mentioned in the incubation process description) -

  • Should edits we want to make to the proposal going forward (e.g. incorporating the switch to using an EventLoopGroupProvider) just be opened as PRs?
  • I know there is usually a second [Feedback] thread containing an edited proposal, at what point do we move onto that thread?

Thanks :slight_smile:

I won't speak for the SSWG, but historically the proposal docs are "products of their time". For example, the Redis proposal no longer reflects the current state of the library's API.

However, seeing as this proposal hasn't reached the feedback thread yet, I would say you could just open a PR with edits, if you want to improve the document before that thread begins.

Again, historically this has been when the author feels that they have adequately responded to feedback and left enough time for people to voice their concerns.

the proposal in https://github.com/swift-server/sswg/tree/master/proposals does not need to reflect API changes after the proposal is accepted, but it does need to reflect the state of the proposal prior to the SSWG reviewing it. in other words, as you prepare the [Feedback] thread, if the proposal has materially changed you want to submit a PR into https://github.com/swift-server/sswg/tree/master/proposals to reflect that, and the same at the end of the feedback period, which is normally 2 weeks after the [Feedback] thread was posted.

related, I am about to suggest to the SSWG a small procedural change such that the [Feedback] thread will be just a "call for action" with link to the updated proposal text in https://github.com/swift-server/sswg/tree/master/proposals.

@Mordil's answer above is correct. to add, if you feel you have not received enough useful feedback you can call out specific questions, tradeoffs or design dilemmas you want the community to chime in on.
