Generic Compression Library

Doing it that way removes most of the motivation to make the Swift native version and we lose everything the native approach would bring to the language and the ecosystem. Besides which, it's not as if the "official" version would ship immediately, nor prevent anyone from bridging system libraries themselves. Heck, if Swift had an Experiment package, we could put it in there to evolve. So my concern here isn't just that it takes longer, it's that doing an "official" version of anything that's just a wrapper means we're far less likely to get a native version at all, all the while the library has the portability and dependency issues all such wrappers have. These are both net detriments to the language and community in a variety of ways. So I'm not really talking about compression specifically here, but the ability of Swift to have any even semi-official native libraries for any important functionality.


So why is it wrong to take the wrapper approach in the meantime as something we can ship sooner?

I am very supportive of the effort to organize the community to develop Swift implementations of important libraries. I just think that is an orthogonal concern from what we do until they are mature enough to ship for production use. In performance sensitive domains that may be a while as @scanon noted above.


That just brings the functionality, which pretty much anyone can make by importing a system library. It doesn't bring anything else to the language, so an "official" wrapper is of limited value, all the while delaying native development and potentially stifling community efforts. Basically, if we're going to have "official" non-standard libraries, they need to bring more value than the community could provide, the more the better, and that value must be worth the likely loss in community investment in the area the library covers.

There's a WWDC 2015 video about the compression library.

SR-10244 describes some issues with the Compression overlay (for Swift 5.1).

I'm concerned that the Compression overlay's dependency on Foundation.Data may prevent Foundation from importing Compression in the future.

The standard library uses an internal _HasContiguousBytes protocol.

There are several public protocols in Foundation:

I think those protocols belong in the standard library.

If even a C implementation can't reach high performance while using a compiler optimised through years of work, partly by the processor manufacturers themselves, I hardly see how you expect Swift to achieve that miracle.

Use the right tool for the right task, and the right tool for writing a highly optimised, targeted piece of code is not Swift.

I agree that it's remarkable that a rather low-level C library has grown a dependency on Foundation in its Swift overlay, and it probably indicates that stdlib doesn't cover byte-level processing very well.


Meanwhile, Go has had multiple native implementations of compression algorithms in the standard library since version 1 (https://golang.org/pkg/compress/). And it has an idiomatic interface (the equivalent of try data.compress()), not some weird wrappers like the Compression framework.
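
To make "the equivalent of try data.compress()" concrete, here is a minimal sketch of what such an interface could look like in pure Swift. It is an illustration only: the method names and the `RunLengthError` type are hypothetical, and naive run-length encoding stands in for a real algorithm such as DEFLATE.

```swift
// Hypothetical sketch: none of these names come from an existing library.
// Run-length encoding is a stand-in for a real compression algorithm.
enum RunLengthError: Error {
    case truncatedInput
}

extension Sequence where Element == UInt8 {
    /// Encodes the bytes as (count, value) pairs, with each run capped at 255.
    func runLengthCompressed() -> [UInt8] {
        var output: [UInt8] = []
        var current: UInt8? = nil
        var count: UInt8 = 0
        for byte in self {
            if byte == current, count < UInt8.max {
                count += 1
            } else {
                // Flush the previous run (if any) and start a new one.
                if let value = current {
                    output.append(count)
                    output.append(value)
                }
                current = byte
                count = 1
            }
        }
        if let value = current {
            output.append(count)
            output.append(value)
        }
        return output
    }
}

extension Collection where Element == UInt8 {
    /// Expands (count, value) pairs; throws if the input has an odd length.
    func runLengthDecompressed() throws -> [UInt8] {
        guard count % 2 == 0 else { throw RunLengthError.truncatedInput }
        var output: [UInt8] = []
        var iterator = makeIterator()
        while let runLength = iterator.next(), let value = iterator.next() {
            output.append(contentsOf: repeatElement(value, count: Int(runLength)))
        }
        return output
    }
}
```

For the bytes 7, 7, 7, 7, 9, 9, 1 the compressed form is 4, 7, 2, 9, 1, 1, and decompressing it returns the original bytes. Because the code uses only the standard library, it builds anywhere Swift does.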

Since it is native, you can build a Linux binary on your Mac and run it on a Raspberry Pi, without installing anything.

Is it slower than native C libs? Who knows? The 0.1% of people that need the extra performance can benchmark and use whatever super optimized C library they want.

Swift is supposed to be a more powerful language than Go, so it should be even easier to have an even better implementation in Swift. But actions speak louder than words.


The Go gzip implementation is known to be slow compared to native implementations (that's why there are Go packages like cgzip).

The 0.1% includes everyone who wants to serve HTTP content, for instance. Compression is used everywhere nowadays.
Having a poor implementation in the stdlib (just because we want to avoid the cross-compile issue) and telling users to install a native library wrapper to work around the poor performance is probably not the model we want to follow for Swift.


That depends entirely on how poor the performance is, and who the anticipated user of the library is. There's the rare user who needs to write their own HTTP server, but far more who just need to manually compress a file or something, for which a slight slowdown is an easy tradeoff for 100% platform compatibility.

You are basically making my point. Most of the people that write their own HTTP servers are not CPU constrained, otherwise they wouldn’t write their own. And with gzip in the standard library they get that for free. For the rare cases that they do need the extra performance, a C based alternative is readily available.

Contrast this with Swift, which has neither an HTTP server nor gzip in the standard library.

The available open source compression libraries already achieve 100% platform compatibility (and much more -- these also run on platforms that Swift doesn't support). SPM supports building C projects; it should not be too difficult to, say, package zlib up into a module that's available from Swift, along with a Swift-native interface.

For example, Swift NIO does this with BoringSSL. Like compression, production-quality encryption is likely also best done using existing code.
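
As a rough sketch of the packaging approach described above, a package manifest might look like the following. The package and target names (SwiftZlib, CZlib), the tools version, and the directory layout are assumptions made for illustration; this is not an existing package.

```swift
// swift-tools-version:5.1
// Hypothetical manifest: "SwiftZlib" and "CZlib" are illustrative names.
import PackageDescription

let package = Package(
    name: "SwiftZlib",
    products: [
        .library(name: "SwiftZlib", targets: ["SwiftZlib"]),
    ],
    targets: [
        // C target: assumes zlib's .c files are vendored under Sources/CZlib,
        // with its public headers under Sources/CZlib/include.
        .target(name: "CZlib"),
        // Swift target: the Swift-native interface, implemented on top of CZlib.
        .target(name: "SwiftZlib", dependencies: ["CZlib"]),
    ]
)
```

With this layout, clients would import only the Swift module, so the C sources stay an implementation detail of the package.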

If I recall the discussion correctly, there is a long term goal of a swift-native crypto implementation, but for the time being they are packaging BoringSSL because it's better than depending on a system version of OpenSSL (since each platform Swift supports comes with a different version of OpenSSL) and there isn't the manpower/time to develop a swift-native crypto library at the moment. Eventually there will be a pure-swift crypto library.


While you do make a valid point, that it would be easy to package zlib into a module with a swift interface, I think there's a big difference between swift-native crypto and swift-native compression. Crypto is far more intricate and if you do it wrong there can be security ramifications. Plus there is a greater need for it to be fast than something like compression.

Compression is not quite as high-profile. If you get it wrong that usually means it just won't work (I'm betting a working pure-swift implementation would be far less work than a swift native crypto implementation). Performance can and will be improved over time and those who need more performance can go with the "package a C library and use a swift interface" route.

Huh? Compression is every bit as security-critical as crypto[1], and performance is, if anything, more important[2]. I've worked on low-level optimization of both pretty extensively, and this doesn't make any sense at all to me.

[1] People download compressed code and binaries that they run on their machine. Compression bugs lead trivially to RCE. Even when you're not decompressing a binary, "servers leaking sensitive memory via compression bug" is practically its own subgenre of vulnerability.

[2] Asymmetric crypto--where the difficult software performance work is--is used for key exchange. This is slow, but O(1). After that, you typically do O(n) work with a symmetric cipher, and those are more often than not implemented in hardware (AES, etc). Compression is on the critical path for most network accesses. Every bit of speed you can squeeze out counts, because you're trying to optimize the transfer + compression system, not either one in isolation.


I see the validity of your claim; however, wouldn't a pure-swift implementation be less prone to leaking memory than C code? Many of the bugs which plague C programs are far less likely to occur in swift due to its basic language design (ARC, optionals, etc). Isn't that kind of the point of swift? A pure-swift implementation is not the same as a lower-level implementation.

Neither I, nor anyone else on this thread claimed the performance would work for every use case. It has been stated that those who would need better performance could use a C library directly, both by myself and others.

Not everyone is going to use compression for network accesses. The very small number of network libraries out there could use C-interop for a more performant compression implementation while everyone else could use the marginally slower pure-swift compression library.

I'm positive that in time, a pure-swift compression library could become as fast as a C implementation. I'm not claiming version 1.0.0 would match something like zlib with years of work into it, but eventually as swift improves so would everything else that uses swift.

I agree! This is not at all in conflict with my position.

If there is an industry standard implementation already available, then the pragmatic approach is to start with that. There is a clear need for Swift programmers to have access to compression and encryption algorithms right now. The way to provide that is to package up an existing library with a nice Swift overlay. We should concentrate our efforts on making sure we get the API right.

We'll be able to replace the implementation when and if it becomes feasible to do so. We can even do this gradually, starting with specific methods/algorithms. Having a known-good baseline implementation will make it easier to evaluate any potential replacement.


A novel implementation in any language(*) is more suspect than existing heavily fuzzed and tested implementations. Languages like Rust and Swift help avoid some of the worst classes of bugs, of course.

(*) possible exceptions for machine-checked correctness proofs, depending on how battle-tested the compiler and proof assistant are.

Whatever you are using compression for, the optimal choice of compression algorithm is fundamentally determined by two things:

  • the compression ratio achieved.
  • the ratio of compression and decompression speed to transfer speed for whatever medium you are using.

None of what I'm saying is unique to network transfers. It is equally true for transfers between disk and memory, or levels of the cache hierarchy, as it is for messages sent by carrier pigeon. However you are moving data, a faster compressor means you can use a greater compression ratio, which translates into better performance, less energy used, and smaller memory footprint.
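
The interaction between those two factors can be made concrete with a deliberately simple serial model: for n bytes, compression ratio r, link rate b, and compression and decompression rates c and d, the end-to-end time is n/c + n/(r*b) + n/d. The sketch below encodes that model. The type, function, and parameter names are hypothetical, and a pipelined system would overlap these stages rather than add them.

```swift
// Illustrative model only: all names here are assumptions, and any rates
// passed in should come from measurement on the target system.
struct CodecProfile {
    var name: String
    var ratio: Double            // original size / compressed size
    var compressRate: Double     // bytes per second
    var decompressRate: Double   // bytes per second
}

/// Time to compress, send the compressed bytes, and decompress, in sequence.
func endToEndSeconds(bytes: Double, linkRate: Double, codec: CodecProfile) -> Double {
    let compress = bytes / codec.compressRate
    let transfer = (bytes / codec.ratio) / linkRate
    let decompress = bytes / codec.decompressRate
    return compress + transfer + decompress
}
```

Evaluating this for several codecs against a fixed link rate shows the point being made: a higher ratio only helps if the codec is fast enough relative to the medium, so a faster codec makes a higher ratio affordable.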

That would leave it about 2x slower than the implementations that we're currently using on Apple platforms. I'm sure we can do better in time, but we shouldn't regress in the meantime.

None of this is to say people shouldn't write Swift implementations of these things. You totally should, because it's a great demonstration, a great source of useful bug reports, and it's also just fun. Rather, I am only pointing out that the bar for replacing the existing implementations is very high.


Which is why it would be important that any new implementation is compared and tested against the battle-proven implementations.

Outputs can be checked for equivalence over a given set of inputs. Speeds can also be compared quite easily. Memory usage during runtime can be observed. It is entirely possible, although definitely not easy, to build a new implementation that can be just as good as the well-known ones. It will definitely take time and lots of incremental improvements to get there though.
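
A minimal sketch of that kind of comparison, assuming each implementation can be wrapped in a pair of closures. The `Codec` type and `mismatches` function are hypothetical names. Note that two correct compressors need not produce byte-identical compressed output, so the check compares decompressed results instead.

```swift
// Hypothetical harness: a codec is modelled as two closures so that a
// C-backed wrapper and a pure-Swift implementation can both be plugged in.
struct Codec {
    let name: String
    let compress: ([UInt8]) throws -> [UInt8]
    let decompress: ([UInt8]) throws -> [UInt8]
}

/// Returns every input for which the candidate fails one of two checks.
func mismatches(candidate: Codec, reference: Codec, inputs: [[UInt8]]) -> [[UInt8]] {
    return inputs.filter { input in
        // Check 1: the candidate must round-trip its own output.
        guard let packed = try? candidate.compress(input),
              let own = try? candidate.decompress(packed),
              own == input else { return true }
        // Check 2: the reference must be able to decode the candidate's output.
        guard let cross = try? reference.decompress(packed),
              cross == input else { return true }
        return false
    }
}
```

Feeding this a corpus of fixed and randomly generated inputs gives a first correctness signal; timing and memory measurements would be layered on separately.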

Everything you said is perfectly valid, but not everyone cares about using the "most optimal" solution. Average Joe swift dev doesn't care if the pure-swift compression library is 10% slower than the Apple high performance compression library. They most likely just want something that's easy to use (and perhaps cross-platform).

I agree wholeheartedly that there's no way the first version could wholly replace existing implementations and I apologize if my remarks gave anyone that impression.

These points do carry weight though in my opinion. If compression were to be a part of something external, where it isn't tied to the language (like what BoringSSL is in NIO) and can incur breaking changes at any time (like removing its packaged C library overlay) then I wouldn't be opposed to following in the footsteps of NIO.

The compression overlay would need to match exactly what the pure-swift api would be, so that it could be replaced piece-by-piece. Otherwise, I'm not sure how this would work? Vend a C library and swift overlay for it, only to deprecate it in a couple of years after thousands of packages depend on it?

I don't see any reason why designing the perfect API requires a "pure swift" implementation, or why transitioning to a Swift implementation would require API changes. Fundamentally the compression algorithms in question traffic in a contiguous block of memory, a concept that can be well represented in almost every language.
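
One way to picture this is a public API that only ever hands a contiguous byte buffer to an internal backend. In the sketch below, `CompressionBackend`, `StoredBackend`, and `Compressor` are hypothetical names, and the stand-in backend merely copies its input; a zlib-backed or pure-Swift backend would conform to the same protocol.

```swift
// Hypothetical design sketch: the backend sees only contiguous raw bytes,
// so it can be replaced without touching the public API.
protocol CompressionBackend {
    func compress(_ source: UnsafeRawBufferPointer) -> [UInt8]
}

/// Stand-in backend that copies the input unchanged ("stored" mode).
struct StoredBackend: CompressionBackend {
    func compress(_ source: UnsafeRawBufferPointer) -> [UInt8] {
        return Array(source)
    }
}

/// Public-facing type: callers never see which backend is in use.
struct Compressor {
    var backend: CompressionBackend

    func compress(_ bytes: [UInt8]) -> [UInt8] {
        // Array guarantees contiguous storage for the duration of the closure.
        return bytes.withUnsafeBytes { backend.compress($0) }
    }
}
```

Callers would write `Compressor(backend: StoredBackend()).compress(bytes)` today and keep the same call when the backend is replaced.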


I wasn't saying we would need a pure-swift implementation first. Just that we’d need to make sure the api for the C overlay would match whatever it is we do for the pure-swift version.
I can’t think of a reason why the api would change when migrating to the pure swift version. It would just be a real hassle if the implementation interface changed between the C overlays and the swift implementation.


Huh? I guess it's my turn now to say that this doesn't make any sense at all to me. Take 10 gzip and 10 RSA implementations in Python and see which ones have the most bugs. Maybe it makes a bit more sense if we are only talking about C, but crypto code is at least an order of magnitude more complex and difficult to test than compression.

Well, that's true for every kind of code people download. JSON (or ASN.1 for that matter) parsers have an equal number of problems.

Again you can say this for everything, for example JSON encoding/decoding is also on the critical path. However it has been shown repeatedly (e.g. Python) that programmers will take a 100x performance hit in order to get good UX.

And the reality is that for the 4 years that we've been having these debates, people are downloading dodgy libraries from the internet and are implementing these things themselves.

Why do you think that SPM (of all projects) decided to implement its own SHA256 function? [1] The answer is that to do that now you need to depend on closed-source code, and you get an API like this: withUnsafeBytes Data API confusion

[1] https://github.com/apple/swift-package-manager/blob/26d456594d3572f3822a794cacebd8e0b653ad70/Sources/Basic/SHA256.swift
