New literal for string from contents of file

mklbtz · March 9, 2018, 9:49pm

Today, there are a number of literals that we can either use the IDE to generate or type by hand, such as: let red = #colorLiteral(red: 1, green: 0, blue: 0, alpha: 1).

Back in 2016, @Erica_Sadun pitched the idea to tidy up the syntax and add a bunch of new literals. I liked the idea, but it received some pushback for being too overreaching (that's my impression anyway).

This is a much more focused pitch with a pretty clear use case and precedent in other languages. Please let me know what you think. Would this fly as a proposal? Could you see yourself using this?

Background

Last year, Swift 4 introduced several nice features for string literals, namely the multi-line syntax. This makes hard-coding strings very convenient. In my experience, this is great for medium-length strings, but it gets unwieldy if the string is especially long or contains some kind of code.

It's easy to imagine an application that would have this kind of problem: consider a web server with dozens of SQL queries, all typed out as multi-line strings. It is more difficult to grok the SQL when it's all one color, plus it clutters up the Swift code around it if it's especially long. You could mitigate some of this by moving all the strings to constants in one source file, but that starts to get messy in its own way.

The same thing goes for HTML, markdown, and other such strings.

Proposal

To ease this pain, I'd love to see a new literal introduced that lets us embed the contents of another file as a static string.

For example, suppose we have the following Xcode project or folder structure:

Project
├── Resources
│   └── long_query.sql
└── Sources
    └── server.swift

Our server.swift file might have a function that runs that "long query":

func execLongQuery() {
    let sql = #stringLiteral(resourceName: "long_query.sql")
    database.execute(sql)
}

This function is nice and short because we've hidden the SQL in its own file. It's easy to imagine how, without this, one's Swift code could quickly become overburdened with embedded SQL.

There is a literal that already exists which gives you the URL of a file. You could implement the above function using that instead:

func execLongQuery() throws {
    let file = #fileLiteral(resourceName: "long_query.sql")
    // Reading the file can fail at run time, so the call must be marked with try.
    let sql = try String(contentsOf: file)
    database.execute(sql)
}

But, there is a subtle difference here. Since #fileLiteral only gives you the URL at compile time, the program still has to load the contents of the file at run time. This behavior is a nonstarter for binaries that are performance-sensitive or don't ship as bundles — at least for this particular problem.
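To make the run-time dependency concrete, here is a minimal sketch. The helper name loadQuery(atPath:) and the path are made up for illustration; the only real API used is Foundation's String(contentsOfFile:encoding:).

```swift
import Foundation

// Hypothetical helper: reads a query from disk when called, i.e. at run time.
// Returns nil if the file is missing or is not valid UTF-8.
func loadQuery(atPath path: String) -> String? {
    return try? String(contentsOfFile: path, encoding: .utf8)
}

// The path is illustrative; nothing checks that the file exists until this line runs.
if let sql = loadQuery(atPath: "Resources/long_query.sql") {
    print("Loaded \(sql.utf8.count) bytes")
} else {
    print("Query file not found at run time")
}
```

The compiler happily builds this whether or not the file ships with the binary, which is exactly the failure mode a compile-time literal would rule out.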

Today, the only solution is to embed the text in a multi-line string literal. But in some cases this is a real hindrance to your workflow. I think the #stringLiteral syntax has some nice benefits:

You get the contents of the file at compile time, removing runtime dependencies.
Your Swift code is cleaner, since it isn't littered with long string literals. (Although this can always be overdone and introduce obscurity problems.)
Using a separate file lets you use all the text-editing features you're used to, like syntax highlighting and autocomplete.

Notes

There is precedent for this in other languages. The one I'm aware of is Rust, which has a compiler macro (include_str!) for doing this exact thing. I'm sure there are other examples.
In Xcode, the #stringLiteral syntax would likely be hidden behind a nice icon as it is with #fileLiteral, #colorLiteral, etc.
SE-0039 – Modernizing Playground Literals
Medium post showing how to use literals

Chris_Lattner3 · March 10, 2018, 10:09pm

Interesting idea. I don't see anything obviously wrong with it. I agree that we don't otherwise have a good solution to this, and that object literal syntax is the right way to go. This could be useful for things like including CUDA code into your program, including binary blobs into your app, etc.

That said, I'd want to make sure that it works with a variety of types, including [Int8], Data, String, StaticString etc. It also isn't clear to me whether building the idea of "resources" into the compiler is the right thing to do. It seems more natural to be an include path sort of thing (which could overlap in practice). In the context of SwiftPM / Xcode, I'd want to make sure that it is possible to have a build step run a tool that produces a blob which is then included into an app.

IOW, cool idea, please develop it further :-)

-Chris

Chris_Lattner3 · March 10, 2018, 10:10pm

Also, as to the proposal itself, please find or develop specific motivating use cases, to show that it would carry its weight and be worth adding.

For example, this is a near replacement for multi-line string literals, so you should make it clear that this is important in other cases as well (e.g. binary blobs) and find motivating cases where copying text into a multi-line string literal doesn't make sense.

mklbtz · March 11, 2018, 2:26am

Thanks for the feedback. I'll try to flesh this out more in a gist.

Using a path instead of a resource name, I agree, would be more natural. Originally I was going to pitch it that way until I saw that #fileLiteral takes a resource name and I figured it best to follow precedent. I'm not really familiar with it at all. Maybe it's more coupled with Xcode than I think. Should we change #fileLiteral to take a path too?

I love the idea of supporting a variety of types. Do you have an opinion on the approach? Seems like we have two options:

provide distinct spellings for each supported type
allow type annotations or inference to control the output of the one spelling

We'll need to workshop the spellings either way, probably. In the first case, stringLiteral and dataLiteral work well, but how would we spell it for [UInt8]? cStringLiteral could work. In the second case, it would look weird to see something like #stringLiteral(...) as Data since the left and right sides seem incongruent. If we go this route, perhaps contentLiteral would work as a more neutral term. Fits with the .init(contentsOf:) API too. I'm leaning toward the second option.

As far as motivating use cases go, my experience is mostly in web apps. I should be able to dream up some other use cases, but suggestions from folks with more direct experience would be very appreciated.

Cheers!

xwu · March 11, 2018, 3:16am

String literals in Swift do not have a type of their own; they can be coerced into any type that conforms to ExpressibleByStringLiteral. I'd think that any new way of writing a string literal should work in exactly the same way.
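A minimal sketch of that coercion as it works today. SQLQuery is a hypothetical wrapper type invented for this example; ExpressibleByStringLiteral and StaticString are the real standard library names.

```swift
// Hypothetical wrapper type, used only to illustrate the coercion.
struct SQLQuery: ExpressibleByStringLiteral {
    let text: String

    // The compiler calls this initializer when a string literal
    // appears where an SQLQuery is expected.
    init(stringLiteral value: String) {
        self.text = value
    }
}

// The same literal takes on whichever type the context asks for.
let asString: String = "SELECT 1"
let asQuery: SQLQuery = "SELECT 1"
let asStatic: StaticString = "SELECT 1"
```

Presumably a file-backed string literal would slot into the same mechanism, with the type chosen by annotation or inference rather than by the spelling of the literal.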

anthonylatsis · March 11, 2018, 4:29am

By the way, there is something I am wondering about. If you paste let color = #colorLiteral(red: 200.0/255.0, green: 0, blue: 0, alpha: 1) into Xcode or a playground, the color will be pure red instead. The same happens for blue and green. If I change the color components to arbitrary numbers greater than 0 and at most 1, it will be pure white. In other words, if a component > 0, it becomes 1, and if it equals 0, it stays 0. Bug?

Chris_Lattner3 · March 11, 2018, 7:06am

I don't have a strong opinion about this, but it probably makes sense to use something like #fileContentLiteral if we're allowing it to be any type, not just a string. My recollection is that the ExpressibleByStringLiteral protocols assume that you're dealing with Unicode-encoded string data, which random binary blobs may not be. @xwu would know more. Maybe this could be made to work with any type that is ExpressibleByStringLiteral or ExpressibleByArrayLiteral, and in the latter case it would be sugar for specifying an array literal containing the file contents, à la: [0xC, 0xA, 0xF, 0xE, ... ]
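As a rough sketch of the array-literal route: any type that conforms to ExpressibleByArrayLiteral with UInt8 elements can already be written as a list of bytes. The Blob type and the byte values below are hypothetical, chosen only for illustration.

```swift
// Hypothetical container for raw bytes, for illustration only.
struct Blob: ExpressibleByArrayLiteral {
    let bytes: [UInt8]

    // Called by the compiler for an array literal in a Blob context.
    init(arrayLiteral elements: UInt8...) {
        self.bytes = elements
    }
}

// Roughly what the sugar would expand to: the file's bytes spelled out as a literal.
let blob: Blob = [0xCA, 0xFE, 0xBA, 0xBE]
let raw: [UInt8] = [0xCA, 0xFE, 0xBA, 0xBE]
```

Under that reading, the file-content literal would only save you from typing (or generating) the byte list by hand.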

In any case, I'm not really sure what the right thing is here and unfortunately I don't have a lot of time to spend on this. I just think that it is an interesting and promising idea - one that I haven't heard before.

-Chris

anthonylatsis · March 11, 2018, 7:56am

I agree with Chris's suggestion of supporting various types. I suppose if we end up with a fixed and relatively small number of types (I would say max 5), we could name the macro differently to suit each type. Otherwise, we can try to infer the type based on the file extension and so on. However, as Chris mentioned, binary blobs may or may not be Unicode encoded. This suggests two possible solutions:

Find the encoding of the file from its metadata.
Let the user decide what type to use with an enum parameter (.data, .uInt8, ...) if that is possible with a macro.

The first, obviously, would indeed be wonderful.

xwu · March 11, 2018, 8:05am

I don't know more, but that's an interesting point. Encodings other than Unicode aren't supported by the standard library but by Foundation, so it's unclear how you'd support reading arbitrarily encoded data using a literal that's built into the standard library.

You could, however, have some sort of literal that gives you a pointer to code units that you feed into the relevant string initializers. However, even then, you couldn't use Data (that's in Foundation), so it'd have to end up being a buffer pointer, which seems pretty awful. And conversion between the data's encoding and Unicode would still be at runtime.
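A minimal sketch of that buffer-pointer approach, assuming the bytes are UTF-8. The codeUnits array stands in for whatever the hypothetical literal would embed; String(decoding:as:) is the real standard library initializer.

```swift
// Stand-in for bytes embedded at compile time (UTF-8 for "SELECT 1").
let codeUnits: [UInt8] = [0x53, 0x45, 0x4C, 0x45, 0x43, 0x54, 0x20, 0x31]

// The decoding itself still happens when the program runs.
let sql = codeUnits.withUnsafeBufferPointer { buffer in
    String(decoding: buffer, as: UTF8.self)
}
```

This needs nothing from Foundation, but it only covers the Unicode encodings the standard library knows about.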

anthonylatsis · March 11, 2018, 8:20am

Is it necessary to limit ourselves to the standard library only? I assume #colorLiteral doesn't belong to it, since it uses UIColor.

xwu · March 11, 2018, 9:05am

Again, literals don't have a type of their own. UIColor supports being initialized by a color literal (it conforms to the reserved protocol _ExpressibleByColorLiteral); that doesn't require the compiler to know anything about UIColor.

If you're going to support arbitrary encodings at compile time, the compiler needs to know how to decode that data, and that functionality has to come from Foundation or somewhere else.

Letan · March 11, 2018, 10:14am

You might find some inspiration here. I'm not sure if it's exactly the same, but it might be useful for naming purposes.

anthonylatsis · March 11, 2018, 10:31am

I see. But we don't have to limit ourselves to the Standard Library only. We can declare the protocols in the Standard Library and make use of them elsewhere, including in Foundation, just as UIColor made use of _ExpressibleByColorLiteral. We have Data and NSString from Foundation, and UInt8 and String from the Standard Library. I suppose this would do for a necessary condition.

samdeane · March 11, 2018, 1:45pm

Is this whole pitch generalisable to something like #dataLiteral(resourceName, optionalEncoding), where the encoding is either supplied explicitly or inferred using some well defined heuristic (could just be file extension, could be something smarter which examines the content at build time)?

The corresponding protocol would then be something like:

protocol ExpressibleByDataLiteral {
  init(data: Data, encoding: String)
}

It feels cleaner to have a single mechanism for embedding an arbitrary sequence of bytes. If you have the encoding information then it ought to be possible to make it work fairly seamlessly with String (for example), but also any number of other types.

anthonylatsis · March 11, 2018, 4:07pm

Data would require Foundation. Besides, we don't want to limit ourselves to only Data.

IMO the protocol should be 'generic', like the other ExpressibleBy*Literal protocols:

protocol ExpressibleByFileContentLiteral {

    associatedtype LiteralType

    init(fileContentLiteral: Self.LiteralType)
}

And be defined in the Standard Library, or analogously broken up into several protocols.
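To show how conformances might look, here is a sketch under stated assumptions: the protocol is the hypothetical one above (it does not exist in the Standard Library), I assume the compiler would hand over the file's raw bytes as [UInt8], and ByteBlob is a made-up type. The initializers are called by hand because no such literal exists.

```swift
// Hypothetical protocol from the pitch; not part of the standard library.
protocol ExpressibleByFileContentLiteral {
    associatedtype LiteralType
    init(fileContentLiteral: LiteralType)
}

// Assumption: the compiler would pass the file's raw bytes to the initializer.
extension String: ExpressibleByFileContentLiteral {
    typealias LiteralType = [UInt8]

    init(fileContentLiteral bytes: [UInt8]) {
        // Treat the bytes as UTF-8; invalid sequences are repaired, not rejected.
        self = String(decoding: bytes, as: UTF8.self)
    }
}

// Hypothetical binary container that keeps the bytes untouched.
struct ByteBlob: ExpressibleByFileContentLiteral {
    typealias LiteralType = [UInt8]
    let bytes: [UInt8]

    init(fileContentLiteral bytes: [UInt8]) {
        self.bytes = bytes
    }
}

// Stand-in for what the compiler would read from long_query.sql.
let fileBytes = Array("SELECT 1".utf8)
let sql = String(fileContentLiteral: fileBytes)
let blob = ByteBlob(fileContentLiteral: fileBytes)
```

Whether LiteralType should be an array, a buffer pointer, or something else is left open here; the sketch only shows that text and binary types can share one protocol.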

xwu · March 11, 2018, 5:39pm

Isn't that just a file literal, but changing the initializer so that it passes along the raw content instead of the resource name?

anthonylatsis · March 11, 2018, 6:04pm

I have the feeling you were replying to @samdeane, since the initializer in my protocol depends on LiteralType.

xwu · March 11, 2018, 6:08pm

It seems so, but it's unclear to me how one would express the encoding part when all interesting encoding functionality is in Foundation.

But that doesn't seem to be fundamental to the proposal where the pitch is to read the data at compile time, and without it the rest seems to be quite reasonable. So I think this is a nice way to formulate it: #dataLiteral would be a straightforward addition to the existing literals.

xwu · March 11, 2018, 6:08pm

No, I was responding to you. But thanks for pointing out @samdeane's response.

anthonylatsis · March 11, 2018, 6:12pm

Obviously you can pass whatever you want to that initializer, depending on how you define LiteralType. Could be NSData or String. I believe I got the parameter label semantics wrong. Should be something like resource.

P.S. I edited my response quite a few times; I suppose that was the reason for the misunderstanding.