Why is it possible to get a class from a String but not a struct or enum?

So it's possible to get a class Type using NSClassFromString(), but there doesn't appear to be any way to do the same for struct and enum types. Is there a reason for this limitation? For instance, this is something which could be useful for serializing and deserializing collections of heterogeneous type.

The Swift runtime supports getting any type by mangled name, using the _typeByName function in the standard library. (It's unofficial, though, so may be subject to change.)
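A minimal sketch of what that looks like, assuming _typeByName keeps its current shape (a mangled name in, an optional Any.Type out). The strings "Si" and "Sb" are the mangled names of the standard library's Int and Bool; your own types have longer mangled names that include the module name.

```swift
// _typeByName is an underscored, unofficial standard-library function,
// so its behaviour may change between Swift releases.
// "Si" and "Sb" are the mangled names of Swift.Int and Swift.Bool.
let intType: Any.Type? = _typeByName("Si")
let boolType: Any.Type? = _typeByName("Sb")

// The result is optional: the lookup fails for names the runtime cannot resolve.
if let found = intType {
    print("Resolved:", found)
}
```

Note that this works for structs and enums as well as classes, which is the part NSClassFromString() cannot do.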


Oh cool I didn't know that. Is there an easy way to get a mangled name from a type?

There is also some lack of expressiveness in the language. For example, the type(of:) function isn't implemented in Swift itself, because right now there is no way to split the merged metatype kinds.

One day we 'might' get another keyword meta alongside some and any. That keyword and things like _typeByName would potentially allow us to express a function like:

func subtype<T>(of type: meta T, named name: String) -> (any meta T)? 

Usage:

let type: (any meta Any)? = subtype(of: Any.self, named: "Bool")
type == Bool.self // true

Here are some more details around that idea:

What is the officially unofficial way to get the mangled name for a type?

I ask because I have a use where I want to initialize an instance of a type that was stored in an archive, but right now I have to manually register the type along with some arbitrary name for it before I can read/write those types since I can't just read the archive and turn the stored type name into the type itself.

(It'd be better if such a mechanism for this wasn't.... unofficial... 😛)

Well, for serialization purposes, you probably want that list anyway, because instantiating an arbitrary type is a security hole. There isn't an official way to turn a type into a mangled name yet.

Hmm, that's true. Maybe we shouldn't get one... people like me would just inadvertently use it wrong. 😛

In my case, I could also make this more automatic if there were just a way to get all of the types that conform to some protocol. Then I could just spin through those and take their class name or something and compare them with the string name I stored in the archive and I wouldn't have to pre-register anything. I think that'd still be generally safe?

No.

There is a lesson to be learned from previous serialisation formats that have used "store the type in the archive", which is that they are not safe to use unless you are absolutely, 100%, would swear on it in court certain that you will only use this on trusted data, as they are a vector for arbitrary code execution. This is widely documented with equivalent formats, such as Python's pickle and Java serialization.

The only safe way to use a format of this kind is to list exactly the types you are willing to deserialize. Anything more dynamic than that runs the risk of surprising injection. For example, suppose you said "I'd tolerate any Collection here": well, an attacker can now force you to construct any Collection that is present in any of your dependencies, including internal ones that are not normally constructible by you and that may do weird and unexpected things when constructed in this way.

In essence, you need to be confident that all possible types that conform are actually safe to deserialize in this way, and in the past we have repeatedly learned the lesson that the only way to be sure of this is for your type whitelist to be maximally restrictive: that is, by default you cannot deserialise any type, and you have to explicitly name the types you have audited and found to be safe to deserialise.
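One way to sketch such a maximally restrictive list, if the archive really must carry a type tag, is a table from tag to decoder for exactly the audited types. Everything named here (Note, Point, the tags, decodeValue) is hypothetical and only illustrates the shape; JSON is used as a stand-in for the archive format.

```swift
import Foundation

// Hypothetical payload types that have been audited and explicitly allowed.
struct Note: Codable, Equatable { var text: String }
struct Point: Codable, Equatable { var x: Double; var y: Double }

enum ArchiveError: Error { case typeNotAllowed(String) }

// The allowlist: the only tags the reader accepts, each tied to one concrete type.
// Nothing is looked up by name at runtime, so a new type in a dependency
// never becomes decodable by accident.
let allowedDecoders: [String: (Data) throws -> Any] = [
    "note":  { try JSONDecoder().decode(Note.self, from: $0) },
    "point": { try JSONDecoder().decode(Point.self, from: $0) },
]

func decodeValue(tag: String, payload: Data) throws -> Any {
    // Default is rejection: unknown tags never reach an initializer.
    guard let decode = allowedDecoders[tag] else {
        throw ArchiveError.typeNotAllowed(tag)
    }
    return try decode(payload)
}
```

This is essentially the manual registration described earlier in the thread; the point is that the registration step is the audit.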

Put another way: don't store the type names in the archive. Use a versioned schema instead. This allows you to know a priori what types are in your archive.
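As a rough illustration of the versioned-schema approach for a heterogeneous collection, the set of possible element types can be a closed enum, so the reader knows up front everything an archive may contain. The names below (Shape, Archive, readArchive) are made up for the example, and the synthesized Codable conformance for an enum with associated values assumes Swift 5.5 or later.

```swift
import Foundation

// Hypothetical schema: every kind of element is a case of a closed enum,
// so the archive cannot introduce a type the reader did not anticipate.
enum Shape: Codable, Equatable {
    case circle(radius: Double)
    case square(side: Double)
}

struct Archive: Codable, Equatable {
    static let supportedVersion = 1
    var version: Int
    var shapes: [Shape]
}

enum SchemaError: Error { case unsupportedVersion(Int) }

func readArchive(from data: Data) throws -> Archive {
    let archive = try JSONDecoder().decode(Archive.self, from: data)
    // Reject versions this build was not written to understand.
    guard archive.version == Archive.supportedVersion else {
        throw SchemaError.unsupportedVersion(archive.version)
    }
    return archive
}
```

Adding a new element type then means adding a case and bumping the version, which is a deliberate code change rather than something the data can trigger.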


So why is NSClassFromString allowed? Doesn’t this have exactly the issues you’ve described, only for a subset of types?

It's a holdover from Objective-C.


Also note that my answer was intentionally scoped to a question about serialisation. Being able to instantiate an object from a type name isn't evil: it's useful. There are cases where being able to do this is valuable. For example, it can be used to implement an extension registry, or for tools like Python's WSGI when you need to find another object in the same process as you.

It's only a bad idea when the input is potentially untrusted, such as in a serialisation format: then you're in trouble. Strings hardcoded in the binary or that come from your command line are much less of a thing to be worried about.


Thanks for the writeup!

The design I had in mind would only be dealing in types that conform to some particular "this can be serialized" protocol and therefore would be known to be allowed to be written/read. So in some ways, that type conformance becomes the whitelist and you wouldn't be able to deserialize just any arbitrary type if it didn't already conform to the protocol, at least. That seems more or less equivalent to what you're saying, but perhaps I'm still missing something subtle.

The issue arises when this protocol is widely deployed. Consider, as an example, Codable. Suppose for a moment that instead of the uses of Codable being of the form decode(MyType.self, from: data) they were decode(from: data) and could return any type that conforms to Codable, based on what the archive says it contains.

If you have downstream dependencies, you can now deserialize any of their objects that conform to Codable from your archive, as well as your own. On top of that, if your dependencies add new internal Codable-conforming types in a patch or minor update, those also suddenly become possible deserialisation targets.

The reason this matters is that in order to deserialize a type, you must invoke its initializer and parsing code. If you will take anything that conforms to a protocol type, and that protocol is potentially widely conformed to, then you cannot possibly audit that all of these initialisers have code that it is safe for you to run. These initialisers are arbitrary code: they could perform network access, read files from disk, or have memory safety vulnerabilities in them. These vulnerabilities occur all the time, including in systems that match exactly this property. Do you trust every object that implements Codable to be safe to deserialize at arbitrary points in your program? I don't!

Your proposal is probably safe if the protocol you need to check for conformance with is internal, and if you have carefully audited your initialisers. In general, though, I recommend against it: it's valuable to know what you're expecting to get, and a schema gives you that power.
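For comparison, here is a small sketch of the existing decode(MyType.self, from: data) shape, using JSONDecoder from Foundation. Settings and loadSettings are hypothetical names. The call site picks the type, so a field in the data that claims to be something else has no effect on which initializer runs.

```swift
import Foundation

// Hypothetical type for the example.
struct Settings: Codable, Equatable {
    var name: String
    var retries: Int
}

func loadSettings(from data: Data) throws -> Settings {
    // The caller names the type; the data cannot choose a different one.
    return try JSONDecoder().decode(Settings.self, from: data)
}

// The stray "type" key is simply ignored by the synthesized decoder.
let input = Data(#"{"type":"SomethingElse","name":"demo","retries":3}"#.utf8)
let settings = try? loadSettings(from: input)
```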


(This is a great summary of one of the reasons we designed the Codable APIs exactly in this way.)


Ah ha - that makes a lot of sense and better explains why Codable is the way it is. So it's one of those cases where what might be a fine pattern in a small context becomes a very bad pattern in a larger one.