Really struggling to use MongoDB with swift on the server

over the past few days i’ve been trying (with little success) to integrate MongoDB into my server application. i figured that, rather than give up, i would write up my experiences here in the hopes the situation might improve with feedback.

mongo drivers

there are two viable driver libraries in the community today:

  • mongo-swift-driver (MSD)
  • mongokitten

mongo-swift-driver is the “official” driver, but from what i’ve observed, mongokitten seems to be somewhat more popular. in my opinion, both libraries suffer from serious deficiencies that preclude their use in production, which i will detail in the rest of this write-up.

at a systemic level, having two libraries that do essentially the same thing has also been harmful to my productivity, because i will encounter an issue with one library, spend a lot of time rewriting the code to use the other library, make some progress, and then run into an issue with the other library, continually switching back and forth between the two.

documentation

arguably, documentation is even more important than implementation, because it is simply not possible to use a library if you don’t know how to use it.

mongo-swift-driver has a Jazzy API reference, and to its credit, most of its API has at least a sentence or two describing what it does. it is hard to discover though, because you have to manually click each function signature to view its description, and for some reason, the async APIs do not appear alongside their promise/future-based counterparts; instead they appear haphazardly some ways down the page.

for a while, the second issue led me to believe that MSD’s docs had not been updated for async/await.

they are also not sorted alphabetically, which makes it hard to browse the generic MongoSH reference for an API and find its swift language binding.

while i give the docs credit for existing, many of them seem to be missing some basic information that a new developer might be wondering about. for example, none of the paragraphs for createCollection(_:options:session:), including the list of thrown errors, explains what happens if the collection already exists, which is a pretty basic programmer question :slight_smile:

MSD has no tutorials linked from its GitHub homepage. it has examples, but they all use other large web frameworks (Vapor, Kitura, etc.) and there don’t seem to be many minimal examples demonstrating basic tasks like:

  • managing a database life cycle
  • managing a collection life cycle
  • updating/upserting a document
  • storing and retrieving binary data

mongokitten had no API reference before i started hosting it myself on swiftinit. but a majority of its API lacks doccomments (see the page for MongoCollection), so it only gets you so far. on the other hand, it at least has CRUD examples.

the sparse docs (for both libraries) really contrast with the general MongoDB manual, which is really helpful and really well-written. unfortunately, i had a really hard time transferring what i learned from the MongoDB manual to our swift world because…

API inconsistency

mongokitten seems to have designed its own, “swifty” API which looks different from MongoDB’s general API.

MongoDB has a quite simple and elegant interface that really only requires understanding one concept: the document. i was able to learn a lot about how to interact with MongoDB over mongosh just from reading the manuals.

which is why i wish mongokitten just exposed some basic bindings for transmitting documents to and from MongoDB instead of re-engineering its own strongly-typed swift API.

to me, db.collection.updateOne(filter, update, options) is very easy to understand. but we do not have db.collection.updateOne(filter:update:options:) in swift. instead we have:

  • func updateEncoded<E>(where: Document, to: E) async throws -> UpdateReply
  • func updateEncoded<Query, E>(where: Query, to: E) async throws -> UpdateReply
  • func updateOne<Query>(where: Query, to: Document) async throws -> UpdateReply
  • func updateOne(where: Document, to: Document) async throws -> UpdateReply
  • func upsert<Query>(Document, where: Query) async throws -> UpdateReply
  • func upsert(Document, where: Document) async throws -> UpdateReply
  • func upsertEncoded<E>(E, where: Document) async throws -> UpdateReply
  • func upsertEncoded<Query, E>(E, where: Query) async throws -> UpdateReply

interestingly, we have upsert (upsertOne?), but we do not have upsertMany. i have no idea how to pass options with MongoKitten, and the inconsistent ordering of the arguments is confusing to me.
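to illustrate the shape i mean, here is a minimal sketch of a single entry point where upserting is an option rather than a separate family of methods. everything in it (Doc, UpdateOptions, UpdateResult, ToyCollection) is a hypothetical in-memory stand-in, not MongoKitten or MSD API, and the update argument is treated like a $set for simplicity.

```swift
// hypothetical stand-ins for illustration only; not MongoKitten or MongoSwift types
typealias Doc = [String: Int]

struct UpdateOptions {
    var upsert: Bool = false
}

struct UpdateResult: Equatable {
    var matched: Int
    var modified: Int
    var upserted: Bool
}

struct ToyCollection {
    var documents: [Doc] = []

    mutating func insert(_ doc: Doc) {
        documents.append(doc)
    }

    /// mirrors the shape of db.collection.updateOne(filter, update, options).
    /// `update` is applied like a `$set`: its fields overwrite the matched document's fields.
    mutating func updateOne(
        filter: Doc,
        update: Doc,
        options: UpdateOptions = .init()
    ) -> UpdateResult {
        // a document matches when every field in the filter is present with the same value
        if let index = documents.firstIndex(where: { doc in
            filter.allSatisfy { doc[$0.key] == $0.value }
        }) {
            let before = documents[index]
            documents[index].merge(update) { _, new in new }
            let changed = documents[index] != before
            return UpdateResult(matched: 1, modified: changed ? 1 : 0, upserted: false)
        }
        // no match: only insert when the caller asked for an upsert
        guard options.upsert else {
            return UpdateResult(matched: 0, modified: 0, upserted: false)
        }
        documents.append(filter.merging(update) { _, new in new })
        return UpdateResult(matched: 0, modified: 0, upserted: true)
    }
}
```

with this shape, a plain update is `updateOne(filter:update:)` and an upsert is the same call with `options: UpdateOptions(upsert: true)`, so the argument order never changes between the two.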

MSD to its credit hews closely to MongoDB’s general API, and i had a much easier time mapping what i learned from the MongoDB manual to the MSD API.

name collisions

MongoKitten is really prone to toplevel name collisions. i don’t know why it vends a top-level type called “Binary” or “Sample”. MSD is better in this regard.

Foundation dependencies

this one is one of the biggest blockers for me, because i work very hard to keep Foundation from getting linked into my binaries, because Foundation really is a big problem in memory-constrained environments, like a server.

both MSD and MongoKitten are intertwined with Foundation in ways that do not seem well-justified to me.

for example, MongoKitten depends on the bson package, which exposes “convenience APIs” that use Foundation types. these APIs don’t contribute much, but cost a lot because they add a Foundation dependency.

i really wish MongoKitten adopted an organization more like what SwiftNIO does, where they factored out all the Foundation-related APIs into a separate NIOFoundationCompat overlay.
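as a rough sketch of that organization, a package manifest could split the Foundation-free core from an optional overlay. the package and target names below (example-bson, ExampleBSON, ExampleBSONFoundationCompat) are hypothetical and are not the real layout of the bson package or of SwiftNIO.

```swift
// swift-tools-version:5.7
import PackageDescription

// hypothetical layout: only the overlay target would contain `import Foundation`
let package = Package(
    name: "example-bson",
    products: [
        // core library: no Foundation imports anywhere in its sources
        .library(name: "ExampleBSON", targets: ["ExampleBSON"]),
        // opt-in overlay: Data/Date conveniences layered on top of the core
        .library(name: "ExampleBSONFoundationCompat", targets: ["ExampleBSONFoundationCompat"]),
    ],
    targets: [
        .target(name: "ExampleBSON"),
        .target(name: "ExampleBSONFoundationCompat", dependencies: ["ExampleBSON"]),
    ]
)
```

clients that never import the overlay product would never pull Foundation in through this package.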

MSD is even worse because not only does it link Foundation, but it actually requires you to import Foundation and use its Data type in order to work with binary values. BSONBinary has ByteBuffer and [UInt8]-taking initializers, but they are not public, even though BSONBinary also uses ByteBuffer as its internal representation!

C dependencies

MongoKitten is a pure-swift library, so this is not relevant to it. but MSD wraps libmongoc so you have to do C-isms like call cleanupMongoSwift() in a defer block. installing MSD also involves some hoops:

The driver vendors and wraps the MongoDB C driver (libmongoc), which depends on a number of external C libraries when built in Linux environments. As a result, these libraries must be installed on your system in order to build MongoSwift.

it’s also not clear to me (as a new user) which parts of its API are safe vs. unsafe. for example, the docs for MongoCursor make it sound like a dangerous type which requires a lot of management, but i don’t really see many “safety procedures” being performed in examples that use it.

conclusion

i hope this feedback does not come across as too harsh. i really am grateful for everyone who has worked on these two libraries. but as it stands, i cannot really use them, for all of the reasons detailed above, but the Foundation dependencies especially. i hope this changes in future releases.

1 Like

I definitely think it comes across as harsh and not in a productive way. You use language such as "retarded" with regards to software that someone has made available free of charge, possibly in their free time.

If something is different from your expectations, it might not be "retarded"; possibly you don't agree with/follow/know the full context in which this decision was taken. Also: We are all learning as we go. I can say for sure that I regret a number of APIs that I myself pushed for a number of years ago (and the same goes for the contrary).

Lastly, what kind of response do you expect to this post? Maybe get in touch with the authors and suggest improvements. If y'all come to a common ground then I'd expect this to actually lead to something fruitful. If I were any of the authors of the software you mention I can tell you that I certainly wouldn't feel a lot of motivation to respond to this post.

12 Likes

it's always interesting to me the different medians people are accustomed to in their interactions with other people. i’ve been in jobs where being called ‘retarded’ (as opposed to just having your work called ‘retarded’) would constitute a good day. i’ve also been in jobs where a coworker not acknowledging a birthday message would have been undeniably toxic behavior. such is the gamut of human experience.

i understand why you feel this way, because for a long time i operated under a similar mindset. i think it's easy to get defensive and say because you spent a lot of time doing something for the good of the community that everyone should celebrate it. that is after all what i felt entitled to when i quit my last job and i open sourced all my software and couldn't fathom why it wasn't getting the reception i thought it deserved. in hindsight a lot of my closed source libraries were not really ready for public consumption, and even though it hurt my feelings, i grew as a professional (and a person) from that experience.

i think ultimately feedback is feedback and we all need to learn to brush off the ones we don't like instead of retreating into the language of hurt and telling ourselves things like "well i wouldn't be motivated to respond to this". your response is feedback to my feedback, and had i not received it, i probably would have posted a similarly worded writeup about a different set of libraries down the road, and gotten the same outcome. because most of the time i am not motivated to respond to a bitter complaint either, but growth only happens when you do something you don't want to do over and over again until it becomes second nature.

1 Like

This mostly seems like constructive criticism and a reasonable review of the options for Mongo APIs that other developers may find useful.

However, using a slur to describe MSD’s behavior is a clear Code of Conduct violation, and I will hide this thread until it is fixed.

1 Like

i think categorizing it as a slur is a bit extreme (if so, i hear hundreds of slurs each day), but you are right that it was rude and unprofessional of me and i have edited it out of the original post. :slight_smile:

1 Like

sorry, I was drafting a reply and pressed something wrong and it posted. I'll re-post once I'm done writing :slightly_smiling_face:

2 Likes

Hi, I’m one of the authors and current maintainers of mongo-swift-driver.

To first reply in a personal sense: I agree with @weissi’s assessment that your post came off as harsh in particular due to the slur you used, and that along with your doubling down on speaking to members of the community in this way did not make me super motivated to engage with you. Thanks for making the edit in the end.

There is a difference between wanting to be celebrated and more simply just wanting to be spoken to and treated in a respectful manner. I’m sorry to hear you’ve found yourself in toxic work environments in the past (and perhaps still do today) but I think we strive for a more healthy environment here on the forums and in the Swift community at large.

All that said, I will now reply in a professional sense to some of the points in your post.

Documentation

Regarding the ordering and sorting of API methods within each page, this seems feasible to customize in Jazzy. For example, I see that NIO’s docs for EventLoopFuture group the API methods into categories. I filed SWIFT-1658 about looking into this.

Regarding missing examples:

On managing database and collection life cycles: just to clarify, do you mean “managing the life cycles of the Swift objects”, or does this include creating and dropping DBs/collections? If the former, there is nothing to manage; these types are just lightweight structs that know what namespace to send underlying commands to and they don’t require any special handling.

On updating/upserting a document: this is probably one that makes sense to add to the README and I filed SWIFT-1659.

On storing and retrieving binary data: can you give some more details on what you are looking for here or possibly file your own issue/ticket describing it? What Swift type does the binary data start out in?

I will also add that in general we are pretty happy to answer “how do I do X” questions that you can’t find an answer to in the docs if you open a GitHub issue or Jira ticket with your question.

Foundation dependencies

The ByteBuffer and [UInt8] initializers on BSONBinary weren’t omitted from the public API for any strong reason. I filed SWIFT-1660 about adding public API for this.

Foundation shows up in various other places as well though, most notably via the Date type which we use both in our public API to provide BSON datetime support as well as internally. I’m not aware of a non-Foundation Date type we could use. I know we also use some various Foundation types internally and in public API.

Our assumption thus far has been that people would mostly be using the driver in contexts where we aren’t the only library with a dependency on Foundation (e.g. Vapor also depends on it), and among database drivers we are not unique in this dependency - PostgresNIO, RediStack, and MySQLKit all look to have some Foundation imports as well.

I’m not sure about the feasibility of pulling everything out into a Foundation compat module or removing the dependency altogether, but it’s not something we could do until another major version anyway. I filed SWIFT-1661 so we’ll remember to think about this next time we’re doing a major release.

C dependencies

Depending on libmongoc has had some tradeoffs for us and for our users. A major upside is that it has been quite simple for us to keep the Swift driver up-to-date on recent MongoDB features as often a lot of the heavy lifting is handled in the C layer.

For example, adding support for the driver to work with serverless MongoDB Atlas just required a C driver upgrade and adding testing, whereas the effort to support it in a pure Swift driver in a robust, spec-compliant, and well-tested manner would have likely taken a single engineer 2 months of full-time work. The development effort to write a pure Swift driver relative to our resources is what has largely prevented us from making much forward progress on eliminating the libmongoc dependency thus far.

The downsides of the C dependency are those you name: some C-isms required of users, and system dependencies that users must install on Linux.

Just to clarify, our entire API surface is memory-safe; it is possible to leak memory without proper handling of certain types, but we wouldn't consider that a safety issue. Regarding MongoCursor, the only special handling that may be required is ensuring proper resource cleanup by "killing" the cursor. This is needed in cases where the cursor isn't iterated to completion e.g. due to some error, such as a mismatch between the Codable type the user is mapping documents to and the data in the collection.
In the EventLoopFuture example you link, we should show how to do that in the error case. However, on any Swift versions where concurrency is available (which it sounds like you are on) we will handle doing that for you upon deinit in a new Task. Prior to concurrency we did not have an easy way to kick off async background cleanup like that, hence the need for explicit API. So if you are using async/await APIs you shouldn’t need to think about this at all. This is mentioned in the docstring for kill() but we could clarify it in the other places kill() is mentioned. Filed SWIFT-1662.

In summary

Thanks for the feedback you've provided. Just to set expectations, the Swift driver maintainers are a very small team that is also responsible for maintaining MongoDB’s Rust client libraries and maintaining / writing new shared MongoDB driver specifications. So I can’t guarantee any timelines for when we will get to particular issues here, but we have at least recorded all of them now. That said, we very much welcome pull requests so if there are any changes you really want to see (such as adding a missing initializer) the fastest way to get them in is contributing them yourself. Going forward, GitHub or Jira is the best way to get in touch with us on this type of thing.

edit:

I realized I neglected to respond to the question of what happens when createCollection is called for a collection that already exists. The answer is that an error is returned, and I just opened a PR to add that to the docs: “clarify in docstrings what happens if you create a collection with the same name” (Pull Request #776, mongodb/mongo-swift-driver).

21 Likes

the latter. i don’t really understand what you mean “nothing to manage”, because createCollection(_:options:session:) is async, which leads me to believe that it changes some remote state.

some of the things i was interested in include:

  • creating a collection only if it does not already exist
  • renaming a collection
  • binding the same collection to more than one Codable type at once
  • transferring a collection between databases
  • deleting a collection
  • linting empty collections

some of these questions also apply to databases.

BSON arrays are very size-inefficient, because they encode the index of each element as a string key. so when i have a large array of (trivial-typed) values i want to stash in a collection, i want to serialize it manually and pass the buffer as a BSON binary field. i have already implemented the transformations to and from a ByteBuffer, so effectively this is about storing a ByteBuffer (or just a [UInt8]) inside a BSON document. but if there is an easier way to do this without fiddling with ByteBuffer i am all ears.
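to put rough numbers on this, here is a small Foundation-free sketch that compares the two encodings for an array of 32-bit integers and packs the values into a [UInt8]. the function names are made up for illustration; the size arithmetic follows the BSON layout (an array is a document whose keys are the decimal indices, and a binary value is a 4-byte length, a 1-byte subtype, and the raw bytes).

```swift
/// size in bytes of a BSON array value holding `count` int32 elements
func bsonArraySize(ofInt32Count count: Int) -> Int {
    var size = 4 + 1 // int32 document length prefix + trailing 0x00
    for index in 0 ..< count {
        // each key is the decimal index as a NUL-terminated string
        let keyLength = String(index).utf8.count + 1
        size += 1 + keyLength + 4 // type byte + key + int32 payload
    }
    return size
}

/// size in bytes of a BSON binary value holding the same integers, packed
func bsonBinarySize(ofInt32Count count: Int) -> Int {
    4 + 1 + 4 * count // int32 length prefix + subtype byte + payload
}

/// packs the values as consecutive little-endian 32-bit words
func pack(_ values: [Int32]) -> [UInt8] {
    var bytes: [UInt8] = []
    bytes.reserveCapacity(values.count * 4)
    for value in values {
        withUnsafeBytes(of: value.littleEndian) { bytes.append(contentsOf: $0) }
    }
    return bytes
}

/// inverse of `pack(_:)`; returns nil if the byte count is not a multiple of 4
func unpack(_ bytes: [UInt8]) -> [Int32]? {
    guard bytes.count % 4 == 0 else { return nil }
    var values: [Int32] = []
    values.reserveCapacity(bytes.count / 4)
    for offset in stride(from: 0, to: bytes.count, by: 4) {
        var raw: UInt32 = 0
        for i in 0 ..< 4 {
            raw |= UInt32(bytes[offset + i]) << UInt32(8 * i)
        }
        values.append(Int32(bitPattern: raw))
    }
    return values
}
```

for 1000 values, the array form comes to 8895 bytes and the binary form to 4005 bytes, so the string keys more than double the payload. the [UInt8] from pack(_:) is what i would want to hand to a BSON binary initializer.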

great! this will also make it easier to migrate between MSD and MongoKitten, since MongoKitten speaks ByteBuffer.

i’m not sure if MSD relies on any Date operations, but i noticed that BSON also has a Decimal128 type, which it treats as an opaque slug. while i have wished for years that swift had a compliant decimal type (this was a permanent timesink when i worked in fintech), i don’t think leaving the decimals as stubs with no supported operations is a huge blocker. so i wonder if the same approach cannot also be taken for the BSON datetime type.
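a minimal sketch of what i mean by treating the datetime as an opaque value, assuming nothing beyond the wire format (a BSON datetime is a little-endian int64 count of milliseconds since the Unix epoch). the type name OpaqueBSONDateTime is hypothetical and not part of MSD or any BSON library.

```swift
/// hypothetical Foundation-free stand-in for a BSON datetime value
struct OpaqueBSONDateTime: Hashable, Comparable {
    /// milliseconds since the Unix epoch, exactly as stored on the wire
    var millisecondsSinceEpoch: Int64

    init(millisecondsSinceEpoch: Int64) {
        self.millisecondsSinceEpoch = millisecondsSinceEpoch
    }

    /// decodes the 8-byte little-endian payload; returns nil for any other length
    init?(encoded bytes: [UInt8]) {
        guard bytes.count == 8 else { return nil }
        var raw: UInt64 = 0
        for (i, byte) in bytes.enumerated() {
            raw |= UInt64(byte) << UInt64(8 * i)
        }
        self.millisecondsSinceEpoch = Int64(bitPattern: raw)
    }

    /// the 8-byte little-endian payload of the BSON element
    var encoded: [UInt8] {
        withUnsafeBytes(of: millisecondsSinceEpoch.littleEndian) { Array($0) }
    }

    static func < (lhs: Self, rhs: Self) -> Bool {
        lhs.millisecondsSinceEpoch < rhs.millisecondsSinceEpoch
    }
}
```

this supports round-tripping, equality, and ordering without any calendar logic; a conversion to and from Date could live in a separate compat module for people who want it.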

this assumption is both true, and also limiting. because even though a lot of swift libraries depend on Foundation, swift is not the only language people write services in, and if we assume that the only people using the driver are people who either have to depend on Foundation for a separate reason, or have purchased resources to accommodate it, then those are the only people who are going to end up using swift on the server.

in every (server-side) role i have been in, i found it very difficult to justify the usage of swift to others, because the margins in arbitrage are quite thin and cloud bills add up, and partners wonder why they are burning capital waiting on someone to rewrite every Foundation-dependent library from scratch and why they didn’t just find a C++ developer instead. and to be frank, the only way i was ever able to keep using swift was by experiencing and engaging in “toxic work behaviors” :slight_smile:

so i hear people all the time on these forums asking each other “how do we grow the server-side swift community” but they only ever hear from people who are already using swift on the server for various company-specific reasons, because people who gave up on server-side swift and switched to a different language aren’t hanging around these forums, and there are not a lot of us who had the time, energy, and let’s be honest, abrasiveness, to fight for swift. so it really is frustrating to hear “you’re not making sense, everyone uses Foundation” over and over when talking about swift on the server, on web, on embedded, etc.

understood. since it sounds like cursor state is less complicated now with Tasks, i think the major C pain points have been alleviated.

thanks again for all your work on MSD! i know it can be demoralizing to hear people complain how it’s not quite there yet, but the fact that you all are continuing to push this forward goes a long way towards persuading people to keep backing swift on the server and makes me feel less alone in what i do.

2 Likes

By "nothing to manage" I meant "no special handling is required of the Swift types MongoDatabase and MongoCollection" (like you don't need to call some close() method on the objects when you're done inserting data or something like that). I thought you might have been asking about this since other types like MongoClient and MongoCursor have "shutdown" methods.
But I see now you are asking about the databases and collections stored in your MongoDB cluster that these objects correspond to. In which case yes there is management to do. createCollection is async since it sends a create command to the database server to create the corresponding object there.

Thanks for the list of lifecycle-related topics, that is helpful. I filed SWIFT-1666 about adding a guide on this. If any of those questions are burning / things you still need an answer to feel free to open an issue and we can give you an answer sooner than we can write a guide.

Yeah, this is an unfortunate property of the BSON format. I think what you're doing to transform the data to a ByteBuffer and stick it into a BSONBinary seems like the best way to work around this.

I think in a world where the C driver continues to handle everything that might be feasible. We could have some compat module that provides a BSONDateTime <-> Date converter for those who want it. But we might have a harder time in a pure Swift world as the driver would then have to do a lot of time-related things like measure roundtrip times to each host, estimate the "staleness" of secondaries, enforce timeouts, etc. (Maybe we could avoid at least some of the Date usage for that stuff though with new things like ContinuousClock and the timeout capabilities on Tasks.) I'll comment these thoughts on the ticket so we don't lose track of them.
Thanks for surfacing the Foundation concern; I know this issue can be particularly frustrating for some constrained deployment environments e.g. FaaS which we are seeing more and more MongoDB users utilizing these days.
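As a rough illustration of the ContinuousClock idea mentioned above, here is a minimal sketch of measuring a round trip and tracking a timeout without Date. The names measureRoundTrip and Deadline are hypothetical and not part of the driver; the sketch assumes Swift 5.7 or later (on Apple platforms, a deployment target where ContinuousClock is available).

```swift
// Hypothetical helpers for illustration; not part of MongoSwift.

/// Times a synchronous operation (standing in for a server "ping")
/// using the monotonic ContinuousClock from the standard library.
func measureRoundTrip(_ ping: () -> Void) -> Duration {
    ContinuousClock().measure(ping)
}

/// A deadline expressed as a monotonic instant rather than a wall-clock Date.
struct Deadline {
    let instant: ContinuousClock.Instant

    init(after timeout: Duration, from start: ContinuousClock.Instant = .now) {
        self.instant = start.advanced(by: timeout)
    }

    /// True once `now` has reached or passed the deadline.
    func hasExpired(at now: ContinuousClock.Instant = .now) -> Bool {
        now >= instant
    }
}
```

Because the clock is monotonic, neither helper is affected by wall-clock adjustments, which is the property that matters for round-trip times and timeouts.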

My pleasure! :slightly_smiling_face:

3 Likes

just an update: i've uploaded a patch to publicize the two initializers on BSONBinary here:

4 Likes