New literal for string from contents of file

I don’t have a strong opinion about this, but it probably makes sense to use something like #fileContentLiteral if we’re allowing it to be any type, not just a string. My recollection is that the ExpressibleByStringLiteral protocols assume that you’re dealing with Unicode-encoded string data, which random binary blobs may not be. @xwu would know more. Maybe this could be made to work with any type that is ExpressibleByString or ExpressibleByArray literal, and in the latter case it would be sugar for specifying an array literal containing the file contents, à la: [0xC, 0xA, 0xF, 0xE, … ]
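To make the array-literal idea concrete, here is a minimal sketch of what the desugared form could look like today. ByteBlob is a hypothetical type invented for illustration; only ExpressibleByArrayLiteral is an existing standard library protocol, and the literal is written out by hand since no file-content literal exists.

```swift
// Hypothetical byte container; the name ByteBlob is an assumption for illustration.
struct ByteBlob: ExpressibleByArrayLiteral {
    let bytes: [UInt8]

    // The compiler calls this initializer for an array literal of UInt8 values.
    init(arrayLiteral elements: UInt8...) {
        bytes = elements
    }
}

// What a file-content literal might expand to under this suggestion.
let blob: ByteBlob = [0xC, 0xA, 0xF, 0xE]
print(blob.bytes.count)
```

Because the element type is UInt8, no assumption about text encoding is involved at this level.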

In any case, I’m not really sure what the right thing is here and unfortunately I don’t have a lot of time to spend on this. I just think that it is an interesting and promising idea - one that I haven’t heard before.

-Chris

I agree with Chris’s suggestion of supporting various types. I suppose if we end up with a fixed and relatively small number of types (I would say five at most), we could name the macro differently to suit each type. Otherwise, we could try to infer the type from the file extension or similar. However, as Chris mentioned, binary blobs may or may not be Unicode-encoded. This suggests two possible solutions:

  • Find the encoding of the file from its metadata.
  • Let the user decide what type to use with an enum parameter (.data, .uInt8, …), if that is possible with a macro.

The first, obviously, would indeed be wonderful.

I don’t know more, but that’s an interesting point. Encodings other than Unicode aren’t supported by the standard library but by Foundation, so it’s unclear how you’d support reading arbitrarily encoded data using a literal that’s built into the standard library.

You could, however, have some sort of literal that gives you a pointer to code units that you feed into the relevant string initializers. However, even then, you couldn’t use Data (that’s in Foundation), so it’d have to end up being a buffer pointer, which seems pretty awful. And conversion between the data’s encoding and Unicode would still be at runtime.
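As a rough sketch of the buffer-pointer approach, assuming the embedded file happens to be UTF-8: the array below stands in for bytes the compiler would embed, and String(decoding:as:) is the existing standard library initializer that accepts a collection of code units.

```swift
// Stand-in for bytes embedded at compile time; the spelling of "Swift" in UTF-8.
let embedded: [UInt8] = [0x53, 0x77, 0x69, 0x66, 0x74]

// Hand the code units to a standard library string initializer via a buffer pointer.
let text = embedded.withUnsafeBufferPointer { buffer in
    String(decoding: buffer, as: UTF8.self)
}
print(text)
```

Note that the decoding step still runs at runtime, which is the limitation mentioned above.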

Is it necessary to limit ourselves to the standard library only? I assume #colorLiteral doesn’t belong to it, since it uses UIColor.

Again, literals don’t have a type of their own. UIColor supports being initialized by a color literal (it conforms to the reserved protocol _ExpressibleByColorLiteral); that doesn’t require the compiler to know anything about UIColor.
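The same mechanism can be seen with an ordinary string literal. In this sketch, Identifier is a hypothetical type; the compiler only needs to know the ExpressibleByStringLiteral protocol, not the conforming type.

```swift
// Hypothetical wrapper type; the compiler has no built-in knowledge of it.
struct Identifier: ExpressibleByStringLiteral {
    let raw: String

    // Called by the compiler whenever a string literal is used where an Identifier is expected.
    init(stringLiteral value: String) {
        raw = value
    }
}

// The same literal text produces different types depending on context.
let asIdentifier: Identifier = "abc"
let asString: String = "abc"
print(asIdentifier.raw == asString)
```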

If you’re going to support arbitrary encodings at compile time, the compiler needs to know how to decode that data, and that functionality has to come from Foundation or somewhere else.

You might find some inspiration here. I’m not sure if it’s exactly the same, but it might be useful for naming purposes.

I see. But we don’t have to limit ourselves to the Standard Library only. We can declare the protocols in the Standard Library and make use of them elsewhere, including in Foundation, just as UIColor made use of _ExpressibleByColorLiteral. We have Data and NSString from Foundation, and UInt8 and String from the Standard Library. I suppose this would do for a necessary condition.

Is this whole pitch generalisable to something like #dataLiteral(resourceName, optionalEncoding), where the encoding is either supplied explicitly or inferred using some well defined heuristic (could just be file extension, could be something smarter which examines the content at build time)?

The corresponding protocol would then be something like:

protocol ExpressibleByDataLiteral {
  init(data: Data, encoding: String)
}

It feels cleaner to have a single mechanism for embedding an arbitrary sequence of bytes. If you have the encoding information then it ought to be possible to make it work fairly seamlessly with String (for example), but also any number of other types.
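Here is a hedged sketch of how a conforming type might use the encoding information. The protocol is the one pitched above and does not exist today; TextResource and the "utf-8" encoding name are assumptions for illustration, and the initializer is called by hand because there is no #dataLiteral.

```swift
import Foundation

// The pitched protocol, declared locally since it is not part of any library.
protocol ExpressibleByDataLiteral {
    init(data: Data, encoding: String)
}

// Hypothetical conforming type that only understands UTF-8 text.
struct TextResource: ExpressibleByDataLiteral {
    let text: String?

    init(data: Data, encoding: String) {
        // Decode only when the supplied encoding name is one we recognise.
        if encoding.lowercased() == "utf-8" {
            text = String(data: data, encoding: .utf8)
        } else {
            text = nil
        }
    }
}

// What the compiler might synthesise for a literal referring to a small text file.
let resource = TextResource(data: Data([0x68, 0x69]), encoding: "utf-8")
print(resource.text ?? "undecodable")
```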

Data would require Foundation. Besides, we don’t want to limit ourselves to only Data.

IMO the protocol should be ‘generic’ like the other ExpressibleBy…Literal protocols:

protocol ExpressibleByFileContentLiteral {

    associatedtype LiteralType

    init(fileContentLiteral: Self.LiteralType)
}

And it should be defined in the Standard Library, or analogously broken up into several protocols.
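A minimal sketch of two conformances with different LiteralType choices, using only Standard Library types. ShaderSource and RawBytes are hypothetical names, and the initializers are called directly because no compiler support for such a literal exists.

```swift
// The pitched protocol, declared locally for the sketch.
protocol ExpressibleByFileContentLiteral {
    associatedtype LiteralType
    init(fileContentLiteral: LiteralType)
}

// Hypothetical text-backed type; LiteralType is inferred as String.
struct ShaderSource: ExpressibleByFileContentLiteral {
    let source: String
    init(fileContentLiteral: String) {
        source = fileContentLiteral
    }
}

// Hypothetical byte-backed type; LiteralType is inferred as [UInt8].
struct RawBytes: ExpressibleByFileContentLiteral {
    let bytes: [UInt8]
    init(fileContentLiteral: [UInt8]) {
        bytes = fileContentLiteral
    }
}

let shader = ShaderSource(fileContentLiteral: "void main() {}")
let raw = RawBytes(fileContentLiteral: [0xC, 0xA, 0xF, 0xE])
print(shader.source.count, raw.bytes.count)
```

This leaves open the question of which concrete types the compiler would be allowed to construct for LiteralType.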

Isn’t that just a file literal, but changing the initializer so that it passes along the raw content instead of the resource name?

I have the feeling you were replying to @samdeane, since the initializer in my protocol depends on LiteralType.

It seems so, but it’s unclear to me how one would express the encoding part when all interesting encoding functionality is in Foundation.

But that doesn’t seem to be fundamental to the proposal where the pitch is to read the data at compile time, and without it the rest seems to be quite reasonable. So I think this is a nice way to formulate it: #dataLiteral would be a straightforward addition to the existing literals.

No, I was responding to you. But thanks for pointing out @samdeane’s response.

Obviously you can pass whatever you want to that initializer, depending on how you define LiteralType. It could be NSData or String. I believe I got the parameter label semantics wrong; it should be something like resource.

P.S. I edited my response quite a few times, and I suppose that was the reason for the misunderstanding.

I don’t understand how this would work. How would the compiler create something of type NSData without Foundation?

It would be able to, because a Foundation type would conform to the protocol and use NSData as its LiteralType. Without importing Foundation, you wouldn’t be able to use that specific implementation.

What would the protocol to which NSData conforms look like? Literals aren’t magic; at some point, the compiler needs to instantiate a built-in or standard library type to pass as an argument to the literal initializer. What is that type?

Wouldn’t a stream-like type be a good choice here? This could help control memory consumption and support huge resources. Some conforming types may fail early while consuming the input data (imagine a JSON-based type). Some conforming types may reduce the input data (to a hashed value, for example).
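A rough sketch of the stream-like variant, assuming the input is modelled as a Sequence of UInt8. The protocol ExpressibleByFileContentStream and the Checksum type are hypothetical, and the simple wrapping sum is a stand-in for a real hash.

```swift
// Hypothetical protocol: the conforming type consumes the bytes one at a time.
protocol ExpressibleByFileContentStream {
    init<S: Sequence>(fileContentStream: S) where S.Element == UInt8
}

// Reduces the input to a single value without keeping the bytes around.
struct Checksum: ExpressibleByFileContentStream {
    let value: UInt32

    init<S: Sequence>(fileContentStream: S) where S.Element == UInt8 {
        var sum: UInt32 = 0
        for byte in fileContentStream {
            // Wrapping addition so that very large inputs cannot trap on overflow.
            sum = sum &+ UInt32(byte)
        }
        value = sum
    }
}

// An array stands in for the file contents here.
let checksum = Checksum(fileContentStream: [1, 2, 3] as [UInt8])
print(checksum.value)
```

A type that fails early could use a failable or throwing initializer in the same shape, stopping iteration as soon as the input is known to be invalid.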

Like the one I had in my previous reply, and similarly for other types. For the Standard Library, we can have the String variant of the literal, for instance. If you need other encodings or [NS]Data, import Foundation.

Good idea!
