Rewriting C-Libraries in Swift

It happens very often that Swift libraries are only wrappers around old C libraries (for example libxml or zlib). That's a good start, because this way you avoid the headaches of using C code from Swift directly (I did that several times, and every major Swift version brought breaking changes). But in the end, a native solution would be much better, especially because Swift is a safer language than C (at least that is my understanding, and people often find scary security vulnerabilities in those libraries). Has anyone ever thought about rewriting one of these libraries in Swift? Or would that be a senseless effort?

I'm sure someone did, and I'm sure pure Swift alternatives exist for many C libraries.

That said, porting something to another language is always a significant effort that someone has to be willing to invest. While wrapping C libraries can and often does cause headaches, it is still usually faster than rewriting thousands of lines of battle-tested C code.

So is this a senseless effort? It might be and it might not. It depends on the library, its complexity, its existing ecosystem (i.e. the community that already uses it) and how much porting it to Swift can improve its usability, performance, etc.

zlib is an interesting case: while written in unsafe languages, mainstream implementations (as provided by the system on most platforms) are generally extremely robust, thanks to their widespread use and to aggressive fuzzing that uncovers bugs. Any novel implementation, even one in a safe language, should be considered suspect by comparison¹.

I fully expect it to be rewritten in safe languages eventually (Rust already has multiple implementations, and I'm sure a few folks have written Swift ones, too), but there are (in my opinion) bigger safety gains to be had elsewhere in the short term.

¹ I would relax this claim somewhat for an implementation with a checked proof of correctness.

Just to pick your brain a bit: which libraries do you think would benefit the most from being reimplemented in a safe language? Does anything in particular come to mind?

I think Mozilla's deployment of Rust in their CSS engine is a great example of a good place to target:

  • Exposed to incoming data from random websites
  • Very complex, making proof-style checking much less practical
  • CSS is actively changing, so you can't just write once, verify exhaustively, and be done

I expect they'll see ongoing payoffs for that effort in both a reduced rate of security regressions and reduced maintenance/change costs.

One other thought: I anticipate bigger payoffs in codebases that don't already have ready access to a high-quality data structures library. Replacing hand-rolled C implementations of strings, dynamic arrays, hash tables, and so on with standard ones is almost certainly worthwhile. As of last summer Swift depends on fewer libraries (just ICU/libobjc/libsystem now on Darwin!), so it is much closer to being suitable for these sorts of environments.
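To make the "hand-rolled data structures" point concrete, here is a minimal sketch of the kind of growable array that C codebases often end up writing themselves. The names `IntVec`, `intvec_push`, and `intvec_free` are made up for illustration and do not come from any particular library. Every check in the push function is something a standard container, such as Swift's `Array`, takes care of for you.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical hand-rolled growable array of ints (illustration only). */
typedef struct {
    int *items;
    size_t count;
    size_t capacity;
} IntVec;

/* Appends value; returns false (leaving the vector unchanged) on failure. */
bool intvec_push(IntVec *v, int value) {
    if (v->count == v->capacity) {
        size_t new_capacity = v->capacity ? v->capacity * 2 : 4;
        /* Guard against wrap-around of the capacity and of the byte count. */
        if (new_capacity < v->capacity || new_capacity > SIZE_MAX / sizeof(int)) {
            return false;
        }
        int *grown = realloc(v->items, new_capacity * sizeof(int));
        if (grown == NULL) {
            return false; /* the old buffer is still valid and still owned by v */
        }
        v->items = grown;
        v->capacity = new_capacity;
    }
    v->items[v->count++] = value;
    return true;
}

/* Releases the buffer and resets the vector to the empty state. */
void intvec_free(IntVec *v) {
    free(v->items);
    v->items = NULL;
    v->count = 0;
    v->capacity = 0;
}
```

Forgetting any one of these steps (the wrap-around guard, the `realloc` failure path, the reset after `free`) is a classic source of memory-safety bugs, which is why swapping such code for a well-tested standard container tends to pay off.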

What David said is exactly right. Especially "bigger payoffs in codebases that don't have ready access to a high quality data structures library already". Part of what makes zlib "relatively safe" to have in C¹ is that there's absolutely nothing "clever" about it. The parsing and data structures involved are nearly as simple as possible².

¹ Well, parts of it are implemented in assembly on some systems (including Darwin).
² Of course, there have still been many bugs over the years, but at least in the hardened system implementations, they're pretty rare now.

Some interesting points were raised that I would like to comment on:

  1. "Depends, how many people already use it"
    I have always wondered why developers so often don't consider rewriting (their) old codebase. I'm just a hobby developer, but in my experience rewriting old code can sometimes save time, because maintaining it would cost even more (and I can imagine that time plays a big role in this question).

  2. "Widespread libraries are safe"
    Is that so? I never learned C, and I have always just learned the syntax of a programming language. Until now I was never interested in how things work behind the code (that has changed after reading various things here in the forum). Maybe that is the question behind my first question: is C really less safe than Swift? Or is Swift just easier to code with? At the least, I guess, there would be far fewer bugs (one main reason I love Swift is that things like optionals produce better code).

I came up with this question when I read that the people behind Redox OS are rewriting libc (they call it relibc). They are building an entire OS in a single programming language, which is pretty amazing (although I still prefer Swift over Rust :)). So the question is: why are they doing this?

PS: Could someone explain "bigger payoffs in codebases that don't have ready access to a high quality data structures library already" to me? Is this where the Standard Library comes in?

Imagine you're a new programmer who has just started learning C. You start, as tradition demands, with the hello world program:

    printf("Hello, World!");

You're happy, you love this language, you would grade it :100:. To celebrate you modify your program:

    printf("This is cool! 100%");

Uh, oh. Now you are invoking undefined behavior, and are exposed to the wrath of nasal demons, because you used a printf format specifier incorrectly.
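For comparison, a minimal sketch of two ways to get that literal percent sign out safely. The helper names `format_celebration` and `print_plain` are made up for this example; `snprintf` and `fputs` are the standard C functions.

```c
#include <stdio.h>

/* Writes the celebration message into buf; returns what snprintf returns. */
int format_celebration(char *buf, size_t size) {
    /* "%%" is the escape for a literal percent sign in a format string. */
    return snprintf(buf, size, "This is cool! 100%%");
}

/* Prints fixed text without interpreting it as a format string. */
void print_plain(const char *text) {
    fputs(text, stdout);
}
```

Calling `print_plain("This is cool! 100%")` sidesteps the problem entirely, because `fputs` never looks for conversion specifiers in its argument.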

Okay, let's ignore that and go to the second lesson. Computers are for computing! You ask your tutorial how to add two numbers:

    int8_t a = 2;
    int8_t b = 3;
    int8_t c = a + b;

Great! But these are chump numbers! I can calculate that in my head!

    int8_t a = 123;
    int8_t b = 87;
    int8_t c = a + b;

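Oh no, trouble again: 210 doesn't fit in an int8_t, so c ends up with an implementation-defined value (had these been plain ints overflowing, it would be nasal demons for real). And Xcode didn't even warn me about that.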
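One way to guard against this, as a minimal sketch: C promotes both int8_t operands to int before adding, so the sum 210 itself is computed correctly, and it is only the conversion back to int8_t that goes wrong. Checking the range before narrowing catches that. `add_int8_checked` is a made-up helper name, not a standard function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Adds two int8_t values; returns false if the sum does not fit in int8_t. */
bool add_int8_checked(int8_t a, int8_t b, int8_t *result) {
    /* Both operands are promoted to int, so this addition cannot overflow. */
    int wide = a + b;
    if (wide < INT8_MIN || wide > INT8_MAX) {
        return false; /* *result is left untouched */
    }
    *result = (int8_t)wide;
    return true;
}
```

With this helper, adding 2 and 3 succeeds and stores 5, while adding 123 and 87 reports failure instead of silently storing a wrapped value.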

What's next? If statements! I wanna write my own AI:

    char *message;
    bool b = true;
    if (b) {
        message = "b is set to true";
    } else {
        message = "b is set to false";
    }
    printf("%s\n", message);

But this program is too abstract. Let's write something about me personally:

    char *message;
    bool iLikeMushrooms = false;
    if (iLikeMushrooms) {
        message = "cukr loves mushrooms a lot!";
    }
    printf("%s\n", message);

Oh no, nasal demons again, because of an uninitialized variable.
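A minimal sketch of the usual fix: give the pointer a value on every path before it is read. The function name `mushroom_message` and the default text are invented for this example.

```c
#include <stdbool.h>

/* Returns a message for either answer, so the result is never uninitialized. */
const char *mushroom_message(bool likes_mushrooms) {
    /* Default value covers the branch the original example forgot. */
    const char *message = "cukr does not like mushrooms";
    if (likes_mushrooms) {
        message = "cukr loves mushrooms a lot!";
    }
    return message;
}
```

Printing the result with `printf("%s\n", mushroom_message(false));` is then well defined whichever way the condition goes.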

What is the next thing new programmers learn? Loops! The tutorial teaches you how to write "Hello world" forever:

    while (true) {
        printf("Hello, world!\n");
    }

Nice. But I don't like welcoming the world that much. Let's make it more silent:

    while (true) { }

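Did you know that in C++, infinite loops that don't do anything are undefined behavior, and that since C11 a C compiler may assume such a loop terminates unless its controlling expression is a constant expression (as it is here)?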

At this point you are angry. Why is it that every time you make a seemingly innocent change to your program, you cause undefined behavior, which in theory can delete your hard drive? This time you don't want any of that. You will copy the tutorial, and not change even a single letter of the source code.

    #include <stdio.h>

    int main()
    {
        int a, b, c;
        printf("Enter the first value:");
        scanf("%d", &a);
        printf("Enter the second value:");
        scanf("%d", &b);
        c = a + b;
        printf("%d + %d = %d\n", a, b, c);
        return 0;
    }

You run it, and... no tutorial I could find checks whether scanf failed. You buy a bottle of vodka to drink with your band of nasal demons. You start thinking about rewriting your brain in Rust.
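What the missing check could look like, as a minimal sketch. It uses `sscanf` on a string as a stand-in for `scanf` so that it can be exercised without typing into stdin; the same return-value check applies to `scanf`. `parse_two_ints` is a made-up helper name.

```c
#include <stdbool.h>
#include <stdio.h>

/* Parses two integers from text; returns true only if both were converted. */
bool parse_two_ints(const char *text, int *a, int *b) {
    /* sscanf returns the number of successful conversions (or EOF). */
    return sscanf(text, "%d %d", a, b) == 2;
}
```

Only when this returns true have both `a` and `b` been written, so only then is it safe to compute `a + b`. This sketch still does not handle numbers too large for an int; a `strtol`-based parser would be the more robust choice for that.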

This is a misreading of what has been written here. In general, widespread libraries are not safe.

There are a few specific widespread libraries that are cornerstones of file formats commonly used on the internet. They are extremely well-tested relative to the complexity of the algorithms they use and, while I would not claim they are bug-free, are at least mostly free of simple bugs.

Adobe Flash has joined the chat

Seriously, though:

Developers rewrite old code all the time, for many different reasons. If you're trying to reduce future maintenance costs, we tend to call that a refactor rather than a rewrite. When you write the code, you make some assumptions about how easy it will be to maintain and what things you might want to add in the future, but reality might differ. It's common to go back and visit code if you feel the design isn't meeting your expectations in practice.

Another reason to rewrite code is to adopt new technologies. Again, that is typically because you think it will be an overall maintenance win, or some old technology you used is being deprecated, or because it will help you develop new features later.

When it comes to adopting Swift: well, C isn't being deprecated any time soon, so the two relevant reasons to port a C library to Swift are to reduce maintenance burdens (including from safety issues) and to offer new features (e.g. generics).

Apple is doing it right now by building Swift into the OS. They've rebuilt their UI code in Swift. Some of the motivation may have been to reduce bugs and safety issues in the old code, but they're also making heavy use of features like generics and property wrappers, which they didn't have before Swift.

C is an ISO standard and cannot change very easily, even if inherent safety issues emerge.
Swift is designed to be safe (period. Not just "safer than C"). It's always the most important factor, even more than source stability IIRC.

Okay, but this error could also happen in Swift if there were a function like printf.

Wow, the result is -46! I see that I don't have any knowledge of C, so I had to read up on it. int8_t is just 1 byte, which means it can hold the values -128 to 127. The result of the calculation would be 210, so you need a uint8_t (or an int16_t?) to make your example work correctly. Interesting: Xcode warns you if you write int8_t a = 128;, but there really is no warning for your example.

By the way: in Swift you just write var x = 200. How does Swift manage to make this as performant as C? And what about memory?

What's the problem with that example? Xcode gives a warning about it. The example compiles, but just prints "(null)".

The scanf example crashes when I enter a non-digit. That's horrible, of course. In Swift this is much safer, because you have to check whether the String-to-Integer conversion worked.


Okay, back to the topic (sorry for those questions and comments, but since I started reading this forum, I want to learn more about all the background of programming languages):

@scanon: Sorry, then I misunderstood you. Being well-tested is certainly a good argument, but how often do people still find bugs and vulnerabilities? It's the same question as with open source software: just because the code is open doesn't mean it is safer (because no one can check every line of code, and no one can check a library in every case).

Rewriting old C libraries would at least make it easier for people like me, who never learned C or C++, to contribute to those projects ;)

If you want to write a new compression algorithm, or even a new implementation of zlib, you don’t need an existing version written in Swift.

In general, if you want to build some feature, you can do it as a package. You don’t need to make it part of the original library itself. Swift has been specifically designed to support that, even if you’re extending a C library.

Answering @Lupurus.

Contracted because off-topic

Swift uses the native integer type, Int, as the default. It is the same size as Int64 on a 64-bit machine and Int32 on a 32-bit machine, so it will likely use more memory than the C example, though only marginally. In practice it works well most of the time, and you can specify the size should you need to:

    var x: UInt8 = 200  // Int8 tops out at 127, so 200 needs UInt8 or a wider type

The example never assigns a value to message when the condition is false. The C standard does not specify what happens when you then read it. In other words, this is undefined behavior.

Undefined behavior is very nasty because it can be anything. It may crash at runtime, set all bits to zero, use old memory values, etc. Different compilers can choose to do different things there, and even compiling or running the program twice could result in different behavior. It is a nightmare for debugging because it cranks "it works on my machine" up to the max.

You also cannot reason about the code, because there's no specification to reason from. It could be that Xcode tries to be nice and sets message to a null pointer. It could be that you're lucky and message reuses old memory that happens to be all zero (null in C). Though given that there's a warning, it could be the former.

It's territory only the gods know.
