Generic Compression Library

Introduction

This proposal adds compression functionality, such as flate, gzip/gunzip, and bzip2, to the standard library.

Motivation

Currently, there is no pure Swift implementation of compression algorithms such as deflate, inflate, gzip, gunzip, or bzip2.

A pure Swift implementation would remove the dependency on third-party libraries like zlib (whether statically linked or system-provided) and would improve compile time.
It would also help the many server-side Swift frameworks with their compression implementations.

ref: stdlib/public/Darwin/Compression uses zlib under the hood.

Additions of this kind are more likely to fall under the Foundation framework umbrella than the standard library.

Second, I'm not sure how a pure Swift implementation gives us any advantage over a C implementation that is battle-tested and optimized.

It removes a dependency on external libraries, making Swift more portable and bringing said performance issues to light. And since Foundation doesn’t have this functionality, I wouldn’t think it would live there.

As far as I can see, the Swift maintainers don't want a big, fat standard library. Even essential types like Data are implemented in Foundation. I don't know how we can have this functionality when we don't have the Data struct. In practice, you would have to import Foundation to make these classes useful.

On the other hand, Foundation is maintained by Apple itself, and there is no way to have open discussion about changes in Foundation.

P.S.: I opened a thread on moving Data to the standard library a while ago, and it was unsuccessful.

Well yes, shipping such a library would require changes to Swift processes. We'd likely need a Data-like type in the standard library and support from Apple for making official, if not shipped with the system, packages. I think compression could be a good first candidate for such a system.

I believe compression algorithms don't belong in the core stdlib (i.e., the default "Swift" module that's imported by every Swift program); they should rather be loaded with an explicit import statement.

On its platforms, Apple provides the Compression framework, which provides a unified interface for OS-provided compression algorithms. This framework is accessible to all Swift programs running on these platforms. It's not available on other platforms and the API it provides is outside the scope of the Swift Evolution process.

It would probably not be a good idea to reimplement the existing compression libraries in pure Swift. The existing code is already highly reliable, universally portable and well-optimized for popular target platforms.

However, it would be nice indeed if we had uniform APIs for compression that are usable on all supported platforms. This strikes me as a project that can and should be developed as an external SPM package, then perhaps pitched here for eventual inclusion. (Yes, we may need to come up with new processes to support this -- however, in the meantime, it shouldn't be too hard to import an external package.)

Some elements may lead to pitches for potential stdlib enhancements -- for example, it may be interesting to discuss if the APIs for streaming compression can/should be generalized to a stdlib construct, or if the existing stdlib facilities for dealing with contiguous byte buffers are adequate.
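
As a rough sketch of that question, the streaming shape can be tried today with only the stdlib's contiguous-byte facilities, without Foundation. The `Adler32` type and its `update`/`value` members below are hypothetical names invented for this illustration, and the Adler-32 checksum (the one used by the zlib data format) merely stands in for a real codec, since it has the same "feed chunks, then read the result" shape.

```swift
// Hypothetical type for illustration; not an existing stdlib API.
// Adler-32 is the checksum defined for the zlib data format (RFC 1950).
struct Adler32 {
    private var a: UInt32 = 1
    private var b: UInt32 = 0

    /// Feed the next chunk of input. Chunks may be split anywhere.
    mutating func update(_ bytes: UnsafeRawBufferPointer) {
        for byte in bytes {
            // Both sums are kept modulo 65521, the largest prime below 2^16.
            a = (a + UInt32(byte)) % 65521
            b = (b + a) % 65521
        }
    }

    /// The checksum of everything fed so far.
    var value: UInt32 {
        return (b << 16) | a
    }
}

// Usage: stream the input in two chunks through Array.withUnsafeBytes.
var checksum = Adler32()
let chunks: [[UInt8]] = [Array("Wiki".utf8), Array("pedia".utf8)]
for chunk in chunks {
    chunk.withUnsafeBytes { checksum.update($0) }
}
print(String(checksum.value, radix: 16)) // 11e60398
```

Whether `UnsafeRawBufferPointer` is the right currency type for a public API, or whether a safer generalized construct is needed, is exactly the kind of thing a pitch could explore.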

I agree about where the library would live, but I think there's great value in implementing the algorithms purely in Swift, as it both tests the language's applicability and capabilities (as you suggest, we'd want some improvements in the standard library to really support this) and would also provide 100% compatibility across all of the platforms Swift supports. Otherwise you're constantly running into issues around which version of which compression library Swift is currently running on top of for certain methods, which is not a good look for a supposedly cross platform language.

I agree that a pure Swift implementation of compression algorithms would be a fun project to work on, and it would likely generate useful input in the form of optimizer bug reports. I can also see how prototype-quality pure Swift compression algorithms would be useful to include in the Swift Benchmark Suite, or perhaps the compiler test suite.

However, I don't see a reason we'd want to ship such implementations as a public-facing part of the project. The existing compression libraries have gone through decades of development; they are fast and extremely stable. The cost of producing reimplementations of comparable quality seems extremely high to me, with very little in terms of potential payoff.

That said, there is no doubt this would be a fun project to work on. Why not do this as an external package?

It's worth noting that even "mature" languages are often not fast enough. I wrote several of the optimized encoders and decoders used by the compression library on Apple platforms; the core algorithms are mostly in assembly because it's often possible to go 20-200% faster than C or C++ for that type of workload.

The optimization tasks necessary to get that speed in Swift are awesome to work on, but there's years of work to be done to get there. In the meantime, it probably makes sense to wrap Swifty APIs around the fastest, safest implementations we have available.

An external package brings no buy-in from the community, so participation will be necessarily limited. There's no discoverability of such a package, especially given the current state of Swift's package ecosystem. There's a necessarily limited pool of talent, as such a project would be ineligible for Apple contributions. Without a blessing of some kind, the project is limited to those contributors with the high level of interest such a project would normally bring. Basically, it becomes just another open source project languishing in obscurity and the situation doesn't actually improve.

When and why would this work be done if the answer is always "just wrap the C library"? Swift needs to stand on its own, otherwise it will never succeed as a standalone, cross platform language. It will be relegated to Apple platforms as a wrapper language unless there's official investment in making the language independent, even of other system libraries. Official libraries, even if they're only available as packages, have the best chance of driving the language forward through real world usage and helping adoption of the language by providing solutions to real problems.

This doesn't have to start with a compression library, it could be almost anything (integrate CryptoSwift?), but it does need to happen.

I disagree. It should be possible to work towards being able to write implementations in Swift while shimming existing implementations for the benefit of all users. Write prototypes, add them to test/Prototypes. Add benchmarks to measure performance. File performance bugs. If there are reasons why this approach doesn't work, we should fix them.

There are a number of smaller features that I've been meaning to chip away on in exactly this manner; they've gotten lost in the shuffle of addressing more fundamental numerics issues in the past few months, but I don't see any reason (yet?) why this approach would not work. One great test case would be Tim's floating-point -> string conversion functions; those can be written in Swift, and there's no good reason why they should be slower than the current C implementations. Adding Swift implementations as a benchmark and filing an umbrella bug to bring it up to parity with C would be a great starter project.

Of course it's possible, I just think it would be more effective and less overall effort if the solution to whatever problem wasn't creating a bridged implementation while working on a native one, but instead offering the native one and funneling all of the development effort into improving that, and the language with it. I'd rather see the wrapper libraries be the community driven versions that users fall back to if they need to. After all, any "official" thing will drain resources from community implementations, unless there's something unique, like wrapping a system library, about them. Additionally, like I said, I think the native versions would be more easily portable as well.

The problem with this approach is that it pushes an immature implementation as the “official” solution. I agree with @scanon that an incremental approach is appropriate. If we can define good abstractions for the base API the “official” implementation can be a wrapper for now and eventually swapped out with a Swift implementation if and only if it becomes sufficiently mature. In the meantime, people would be free to use and test the Swift implementation with any code that is written in terms of the protocol-based API.
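
To make the idea concrete, here is a minimal sketch of what such a protocol-based API might look like. Every name in it (`CompressionCodec`, `RunLengthCodec`, `CodecError`, `roundTrip`) is hypothetical, and toy run-length encoding stands in for a real algorithm. A wrapper around a C library would simply be a second type conforming to the same protocol; it is not shown here.

```swift
// Hypothetical API surface; none of these names are real library APIs.
protocol CompressionCodec {
    func compress(_ input: [UInt8]) throws -> [UInt8]
    func decompress(_ input: [UInt8]) throws -> [UInt8]
}

struct CodecError: Error {}

// Stand-in for a pure-Swift backend: output is (count, byte) pairs,
// with each count in 1...255.
struct RunLengthCodec: CompressionCodec {
    func compress(_ input: [UInt8]) -> [UInt8] {
        var output: [UInt8] = []
        var index = 0
        while index < input.count {
            let byte = input[index]
            var run = 1
            // Extend the run while the next byte matches and the count fits in a byte.
            while run < 255, index + run < input.count, input[index + run] == byte {
                run += 1
            }
            output.append(UInt8(run))
            output.append(byte)
            index += run
        }
        return output
    }

    func decompress(_ input: [UInt8]) throws -> [UInt8] {
        // Input must consist of complete (count, byte) pairs.
        guard input.count % 2 == 0 else { throw CodecError() }
        var output: [UInt8] = []
        for i in stride(from: 0, to: input.count, by: 2) {
            output.append(contentsOf: repeatElement(input[i + 1], count: Int(input[i])))
        }
        return output
    }
}

// Callers are written against the protocol only, so the backend can be
// swapped (wrapper now, Swift implementation later) without touching them.
func roundTrip<C: CompressionCodec>(_ bytes: [UInt8], using codec: C) throws -> [UInt8] {
    return try codec.decompress(codec.compress(bytes))
}

// Usage. try! is tolerable here only because the input comes straight from compress.
let original: [UInt8] = [1, 1, 1, 2, 3, 3]
let packed = RunLengthCodec().compress(original)   // [3, 1, 1, 2, 2, 3]
let restored = try! roundTrip(original, using: RunLengthCodec())
print(restored == original) // true
```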

I agree that it is possible that this approach might cause it to take longer for a mature Swift implementation to appear. I don’t think that’s a bad thing. The goal should be a mature implementation exposing a first-class Swift API. It would be cool if the implementation itself is eventually written in Swift but that shouldn’t be the primary focus.

Doing it that way removes most of the motivation to make the Swift native version and we lose everything the native approach would bring to the language and the ecosystem. Besides which, it's not as if the "official" version would ship immediately, nor prevent anyone from bridging system libraries themselves. Heck, if Swift had an Experiment package, we could put it in there to evolve. So my concern here isn't just that it takes longer, it's that doing an "official" version of anything that's just a wrapper means we're far less likely to get a native version at all, all the while the library has the portability and dependency issues all such wrappers have. These are both net detriments to the language and community in a variety of ways. So I'm not really talking about compression specifically here, but the ability of Swift to have any even semi-official native libraries for any important functionality.

So why is it wrong to take the wrapper approach in the meantime as something we can ship sooner?

I am very supportive of the effort to organize the community to develop Swift implementations of important libraries. I just think that is an orthogonal concern from what we do until they are mature enough to ship for production use. In performance sensitive domains that may be a while as @scanon noted above.

That just brings the functionality, which pretty much anyone can make by importing a system library. It doesn't bring anything else to the language, so an "official" wrapper is of limited value, all the while delaying native development and potentially stifling community efforts. Basically, if we're going to have "official" non-standard libraries, they need to bring more value than the community could provide, the more the better, and that value must be worth the likely loss in community investment in the area the library covers.

There's a WWDC 2015 video about the compression library.

SR-10244 describes some issues with the Compression overlay (for Swift 5.1).

I'm concerned that the Compression overlay's dependency on Foundation.Data may prevent Foundation from importing Compression in the future.

The standard library uses an internal _HasContiguousBytes protocol.

There are several public protocols in Foundation:

I think those protocols belong in the standard library.

If even a C implementation can't reach high performance, despite using a compiler that has been optimised over years of work, partly by the processor manufacturers themselves, I can hardly see how you expect Swift to pull off that miracle.

Use the right tool for the task; the right tool for writing highly optimised, targeted code is not Swift.

I agree that it's remarkable that a rather low-level C library has grown a dependency on Foundation in its Swift overlay, and it probably indicates that stdlib doesn't cover byte-level processing very well.

Meanwhile, Go has had multiple native implementations of compression algorithms in its standard library since version 1 (the compress/ directory: compress - Go Packages). And it has an idiomatic interface (the equivalent of try data.compress()), not some weird wrappers like the Compression framework.

Since it is native, you can build a Linux binary on your Mac and run it on a Raspberry Pi, without installing anything.

Is it slower than native C libs? Who knows? The 0.1% of people that need the extra performance can benchmark and use whatever super optimized C library they want.

Swift is supposed to be a more powerful language than Go, so it should be even easier to have an even better implementation in Swift. But actions speak louder than words.
