New literal for string from contents of file

I'm with you on that. Vending Data seems like the minimum level of ergonomics we'd want to provide. Even building a string, my original motivation, would be fairly easy.

let _ = String(data: #fileContentLiteral(...), encoding: .utf8)!

That said… all the other literals have a corresponding ExpressibleBy protocol. If we're going to the trouble to vend Data, it wouldn't be much more effort to add such a protocol for this and implement it for Data, String, etc.
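
To make the idea concrete, here is a minimal sketch of what such a protocol could look like. The protocol name, the initializer label, and the `fileContent` helper are all hypothetical; the helper only stands in for what the compiler would do after reading the file at build time.

```swift
import Foundation

// Hypothetical protocol: neither the name nor the initializer label exists in Swift today.
protocol ExpressibleByFileContentLiteral {
    init(fileContentLiteral bytes: [UInt8])
}

extension Data: ExpressibleByFileContentLiteral {
    init(fileContentLiteral bytes: [UInt8]) {
        self.init(bytes)
    }
}

extension String: ExpressibleByFileContentLiteral {
    init(fileContentLiteral bytes: [UInt8]) {
        // Assumes UTF-8; invalid sequences are repaired rather than rejected.
        self = String(decoding: bytes, as: UTF8.self)
    }
}

// Stand-in for the compiler: it would pass the embedded bytes to whichever
// conforming type the context asks for.
func fileContent<T: ExpressibleByFileContentLiteral>(_ embeddedBytes: [UInt8]) -> T {
    T(fileContentLiteral: embeddedBytes)
}

let asString: String = fileContent([0x48, 0x69])
let asData: Data = fileContent([0x48, 0x69])
```

The return type is chosen by the type annotation at the use site, which is the same mechanism the existing literals rely on.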

What I'm not sure about is how much effort it would take on the compiler side to make this "macro" behave like a function with a generic return type. But since this is how #colorLiteral works, I can't imagine it'd be too much.

It's possible to make your own Color type and still use literals.

let _: MyColor = #colorLiteral(red: 1, green: 1, blue: 1, alpha: 1)
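
For reference, a sketch of how a custom type can opt in. It assumes the underscored standard-library protocol `_ExpressibleByColorLiteral` and its `init(_colorLiteralRed:green:blue:alpha:)` requirement; the stored properties of `MyColor` are illustrative.

```swift
// Illustrative custom color type; only the protocol and its initializer
// requirement come from the standard library.
struct MyColor: _ExpressibleByColorLiteral {
    var red: Float
    var green: Float
    var blue: Float
    var alpha: Float

    // Called by the compiler when a #colorLiteral is used where MyColor is expected.
    init(_colorLiteralRed red: Float, green: Float, blue: Float, alpha: Float) {
        self.red = red
        self.green = green
        self.blue = blue
        self.alpha = alpha
    }
}

let white: MyColor = #colorLiteral(red: 1, green: 1, blue: 1, alpha: 1)
```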

This is why I was pushing for the same kind of design.

let _: StringyString = #fileContentLiteral(...)

True. But, speaking about reasonable quantities, it's great if we don't have to spend that time at run time. Likely the asymptotics are linear, since we are talking about loading and allocating strings and arrays of bytes. But it looks like we must sacrifice space - twice as much - for our data for everything to be fast. This is a very important point to consider.

Unfortunately, the 'ExpressibleByColorLiteral' protocol is internal, but we could of course do otherwise here, as with the other public ExpressibleBy*Literal protocols. Maybe ExpressibleByDataLiteral should be internal too, to discourage use with types other than the standard ones.

I'm working on the proposal for this, and I'm looking for feedback on some details that haven't been discussed yet. Here's the work-in-progress.

Regardless of the exact syntax, we want the parameter to be a file path, like this: #xxxx(contentsOf: "path/to/file.ext"). Here's an excerpt on the constraints on this parameter:

It is considered a build-time error to provide an empty path, or a path to a file that does not exist or cannot be read. Paths must be written using Unix conventions. When a relative path is written, the compiler will use the project directory as the root directory for the relative path. Here, "project directory" refers to the directory containing the .xcodeproj or Package.swift.
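
The rules in that excerpt can be sketched as an ordinary function. This is only a model of the described behavior, run here at run time; `resolveLiteralPath`, `FileLiteralPathError`, and the `projectDirectory` parameter are hypothetical names, and how the compiler would actually learn the project directory is exactly the open question.

```swift
import Foundation

// Hypothetical error type modelling the build-time diagnostics in the excerpt.
enum FileLiteralPathError: Error, Equatable {
    case emptyPath
    case unreadable(String)
}

// Resolves a literal's path argument the way the excerpt describes:
// absolute paths are used as written, relative paths are rooted at the project directory.
func resolveLiteralPath(_ path: String, projectDirectory: String) throws -> String {
    guard !path.isEmpty else {
        throw FileLiteralPathError.emptyPath
    }

    let resolved: String
    if path.hasPrefix("/") {
        // Unix-style absolute path.
        resolved = path
    } else {
        resolved = URL(fileURLWithPath: projectDirectory, isDirectory: true)
            .appendingPathComponent(path)
            .standardizedFileURL
            .path
    }

    // Covers both "does not exist" and "cannot be read".
    guard FileManager.default.isReadableFile(atPath: resolved) else {
        throw FileLiteralPathError.unreadable(resolved)
    }
    return resolved
}
```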

I'm making a few assumptions about relative file paths here. Using the project directory as the root seemed natural to me, but I might be overlooking something. Anyone have thoughts on this?

Also, I realize it's totally possible to have an Xcode project for a SPM package that's in a different directory. I'm not sure how to rectify that. Do Xcode projects have a more explicit "project root"? Would it happen to match the path to Package.swift when you use swift package generate-xcodeproj?

Cheers

I think this is a great idea, but in its current form it looks to become a usability problem, in Xcode at least.

The problem is that, as soon as you enshrine even a relative path in code, something as simple as dragging a project item to a different place in the project hierarchy may easily break the code. A single, isolated instance of this is not hard to deal with, but in a large, source-controlled, multi-developer project, these kinds of errors build up to a constant nuisance.

A different problem arises when the file contains text. Traditionally, Xcode tolerates variations of encodings (UTF-8, UTF-16, and other things) and variations of line-endings in source code. Text files come from external sources with varying combinations of such variations, and by the time they're added to the project, the variations are opaque to text editing.

That says to me that such files need a "compilation" pass (in general). For pure binary data, the "compilation" is a pass-through. For text, the compilation is (say) a re-encoding to UTF-8 with standardized line ending. For other recognized types, if any, a different compilation might be needed.
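
As a rough illustration of the text case, the "compilation" could be as small as the following sketch. The function name is hypothetical, and it assumes the source encoding is already known, which in practice is the hard part.

```swift
import Foundation

// Re-encodes text as UTF-8 and converts CRLF and lone CR line endings to LF.
// Returns nil if the bytes are not valid in the stated source encoding.
func normalizeText(_ raw: Data, encoding: String.Encoding) -> Data? {
    guard let text = String(data: raw, encoding: encoding) else {
        return nil
    }

    var output = String.UnicodeScalarView()
    var previousWasCR = false
    for scalar in text.unicodeScalars {
        switch scalar {
        case "\r":
            // A CR always produces one LF; a directly following LF is then skipped.
            output.append("\n")
            previousWasCR = true
        case "\n":
            if !previousWasCR {
                output.append("\n")
            }
            previousWasCR = false
        default:
            output.append(scalar)
            previousWasCR = false
        }
    }
    return Data(String(output).utf8)
}
```

Binary files would skip this step entirely and be passed through unchanged.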

Separately, it makes no sense to me that the contents of these files are literally compiled into other Swift source files. That makes the compilation unit bigger and slower, and (by the syntax-coloring argument used as motivation) no one really wants to look at the file contents in situ in the source file.

Surely it would be better to break this entire proposal into two parts:

  1. Compile the external file independently into an object file, with some mangled/namespaced public symbol, representing some kind of Sequence-compatible data (e.g. a [UInt8] if nothing better), that's linked into the final binary. Maybe this could use a double-barreled file extension convention such as .swiftdata.png for binary, or .swifttext.sql for readable text.

  2. Change the #xxx literal syntax to refer to the public symbol, using the associated data in an initializer for Data, as currently proposed.
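
A sketch of how the two parts might fit together, written as plain Swift for illustration. The `EmbeddedResources` namespace and the symbol name are hypothetical; in the suggestion above the first part would be an object file produced by the build system rather than hand-written source.

```swift
import Foundation

// Part 1 (hypothetical output of the separate "compile the file" step):
// a namespaced symbol holding the raw bytes of the resource.
enum EmbeddedResources {
    static let greetingText: [UInt8] = [0x48, 0x65, 0x6C, 0x6C, 0x6F]
}

// Part 2: what the literal would amount to at the use site,
// an initializer call that refers to the symbol instead of a path.
let greetingData = Data(EmbeddedResources.greetingText)
let greeting = String(decoding: greetingData, as: UTF8.self)
```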

I realize this is not a trivial alternative, and — more significantly — cannot be done without changes to Xcode (at least to recognize the need to "compile" one of the special files). However, I can't think of any approach to this file-literal feature that can work reasonably without some Xcode support.

It would be unfortunate if this great feature was implemented in a way that caused Xcode users to shun it because it was too troublesome to use.

FWIW

I have the impression you think the contents of a file will be visibly embedded. The idea is to have a ready-to-use object with the contents loaded into it. Excuse me if I misunderstood.

I don't see why this should be considered a strong argument. Couldn't you say the same about all images, textures, scenes, animations etc. you refer to with paths when developing games, for instance?

Apart from that, you have a point (some of it was already discussed though and people who participated are aware).

Correct. Any kind of special IDE support (like displaying file contents in the editor) is out of scope and indeed would vary from editor to editor. This is more about compiler support for embedding data.

I wasn't sure whether there might be an intention that Xcode would display some representation of the contents (which it does for, say, a color literal in a playground).

If the contents aren't represented "embedded", I was trying to say, that supports the idea of separate compilation. (I was not calling for this kind of representation.)

No, because if you're retrieving resource files at run time, you typically use the Bundle-relative resource APIs, and are not directly concerned with literal path strings at all. (You can choose to use some subpaths within the bundle's resources directory, but it's something of a PITA, and typically not necessary.)
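
For comparison, the run-time lookup being described looks roughly like this. `Bundle.url(forResource:withExtension:)` is the existing Foundation API; the wrapper function is only illustrative.

```swift
import Foundation

// Looks a resource up by name in a bundle; no file-system path appears in the code.
func loadResource(named name: String, withExtension ext: String, in bundle: Bundle = .main) -> Data? {
    guard let url = bundle.url(forResource: name, withExtension: ext) else {
        return nil
    }
    return try? Data(contentsOf: url)
}
```

Because the lookup goes through the bundle, moving the item around in the project hierarchy does not change the call.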

So you mean dragging a file around in your project will likely keep it in the same bundle. I think we can mitigate the problem of invalid paths simply by avoiding absolute paths. For example, "assets/some.js" would search for an assets folder. Or we could even use just file names.

It will (without the "likely"). What goes into the bundle is controlled by target membership, which is unrelated to the project item hierarchy. The project item hierarchy itself bears no necessary relationship to the file system hierarchy of the files represented by the items. The bundle directory hierarchy bears no necessary relationship to the project item or source file system hierarchies.

Even then, Xcode would have to be changed to tell the Swift compiler where to look, since the above variability means that the location of the .swift file is no guide.

Thanks for pointing this out! Although it is obvious upon reading, I never actually thought about it.

Xcode can help with indicating the right bundle, I take it? But yes, of course, for this to work nicely we will need some Xcode support. I presume that is your main point.

Yes! 🙂

This is a great idea that solves one of my biggest problems on Linux -- there is no way for a unit test to read external files (like JSON files to parse) without using an absolute path. String literals don't really work because you have to put escapes in them.

I think a relative path should be resolved against the source file, and an absolute path against the project directory. If someone moves a file, you'll get a compile error. Target membership doesn't matter, since it is included at compile time. If you move a file in Xcode, you'll get a compile error if you didn't move the included file or change the path in the literal.
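
Source-relative resolution can be approximated today at run time, which may help picture the proposed rule. This sketch assumes Swift 5.3 or later for `#filePath`; the `fixtureURL` helper is hypothetical.

```swift
import Foundation

// Resolves a path relative to the directory of the calling source file.
// The default argument is evaluated at the call site, so it names the caller's file.
func fixtureURL(_ relativePath: String, sourceFile: String = #filePath) -> URL {
    URL(fileURLWithPath: sourceFile)
        .deletingLastPathComponent()
        .appendingPathComponent(relativePath)
}
```

The difference from the proposal is that this only works while the sources are still present on the machine running the tests, whereas a literal would embed the contents at compile time.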

I also don't see any reason to worry about encodings. Require it to be UTF-8. This is part of a source file, not something loaded at runtime. A string variant and a data variant that just returns a byte array should cover all cases, with the convenience for String, as that would seem to be the most likely case.

This is a feature idea for Swift, not Xcode. Let the Xcode team worry about how to implement it visually.

I don't think data literals should be part of the language. We need cross-platform resource management, so maybe this will allow embedded resources (similar to the CREATE_INFOPLIST_SECTION_IN_BINARY setting in Xcode).

I'm going to try to come at this from a different direction.

When you start putting long chunks of data in files, you quickly start running into the need for templating. For instance, your SQL query string needs to support both forwards and backwards sorting. Or the HTML you're generating needs a person's name substituted in. However, we don't want to create a whole templating language from scratch—that's a huge undertaking.

Embedding file names in your code is also a bit fraught. We don't really include any compile-time file names in Swift code currently; it's too big a pain, and the difference between compile-time and runtime locations can be too confusing.

So maybe we should take a different approach here: Instead of having a file name literal syntax, we should make it easier to write functions which merely generate big wads of string data. This would allow much more sophisticated behavior and easier handling for the compiler, at the cost of having a little bit of Swift syntax in the files and giving them a .swift extension.

So with this design, the execLongQuery() function would look more like:

func execLongQuery() {
   let sql = makeLongQuery()
   database.execute(sql)
}

And the actual query definition might look like:

func makeLongQuery() -> String """
    SELECT events.id, events.created_at, ...
      FROM events, ...
     WHERE ...
    """

If you wanted to support changing the sort order, that might look like:

func makeLongQuery(ascending: Bool = true) -> String """
    SELECT events.id, events.created_at, ...
      FROM events, ...
     WHERE ...
     ORDER BY events.created_at \(ascending ? "ASC" : "DESC")
    """

If you ever want to use more complicated logic than can easily fit into an interpolation…well, refactor it into a function with a normal body. It's called in exactly the same way.
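
For comparison, the closest equivalent in current Swift is an ordinary function whose body is a single multi-line string literal. This sketch assumes Swift 5.1 or later for the implicit return, and uses a shortened, illustrative query rather than the elided one above.

```swift
// Ordinary function with a normal body: the braces are the only
// difference from the syntax suggested above.
func makeShortQuery(ascending: Bool = true) -> String {
    """
    SELECT events.id, events.created_at
      FROM events
     ORDER BY events.created_at \(ascending ? "ASC" : "DESC")
    """
}
```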

A function like this could be declared to return any ExpressibleByStringLiteral type. For example, if you had a "safe SQL statement" type or a "safe HTML fragment" type with automatic escaping, you could declare your function to use those instead.
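
A minimal sketch of such a "safe SQL statement" type, assuming the string-interpolation protocols available since Swift 5. The type name, the `findUser` function, and the quoting rule are illustrative only and not a complete defense against SQL injection.

```swift
// Illustrative type: interpolated strings are quoted and escaped, literal text is not.
struct SQLStatement: ExpressibleByStringInterpolation {
    var text: String

    init(stringLiteral value: String) {
        text = value
    }

    init(stringInterpolation: StringInterpolation) {
        text = stringInterpolation.text
    }

    struct StringInterpolation: StringInterpolationProtocol {
        var text = ""

        init(literalCapacity: Int, interpolationCount: Int) {
            text.reserveCapacity(literalCapacity)
        }

        // Literal segments of the query are trusted and appended as written.
        mutating func appendLiteral(_ literal: String) {
            text += literal
        }

        // Interpolated values are wrapped in single quotes, with embedded quotes doubled.
        mutating func appendInterpolation(_ value: String) {
            let escaped = value.map { $0 == "'" ? "''" : String($0) }.joined()
            text += "'" + escaped + "'"
        }
    }
}

func findUser(named name: String) -> SQLStatement {
    "SELECT id FROM users WHERE name = \(name)"
}
```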

And as for supporting Data values? Conform Data to ExpressibleByStringLiteral and you can declare your function to return Data instead of String.
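
A sketch of that conformance, assuming the string's UTF-8 bytes are the desired contents. Foundation does not provide this conformance itself, so adding it in your own module is a retroactive conformance, which newer compilers may warn about.

```swift
import Foundation

// Illustrative conformance: a string literal becomes its UTF-8 bytes.
extension Data: ExpressibleByStringLiteral {
    public init(stringLiteral value: String) {
        self.init(value.utf8)
    }
}

// The same literal syntax now works where Data is expected.
func makeGreeting() -> Data {
    "Hello"
}
```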

A #dataLiteral(resourceName:) would likely be used with NSDataAsset in AppKit and UIKit.

On other platforms, it could be used with a third-party NSDataAsset type for source compatibility.

That's a good insight. I'll admit, I haven't been thinking about interpolation at all thus far. I agree, a templating language is not something we should build into Swift. I guess without direct interpolation you'd have to pipe the content through String.init(format:), which is okay.

Either way, I think that's a red herring. This proposal has become more about embedding any data, not just strings. I should think more on this and perhaps pick a different example as my motivating use case. Chris Lattner suggested there may be some interesting uses around graphics programming (CUDA), I think.

Foundation supports the "data" URL scheme and Base-64 encoded strings.

Smaller data assets could be stored as string constants in the binary.
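
For example, a small asset could be kept as a Base-64 string constant and decoded with Foundation's `Data(base64Encoded:)`. The constant name and its contents are illustrative.

```swift
import Foundation

// A small asset stored in the binary as an ordinary string constant.
let embeddedGreetingBase64 = "SGVsbG8="

// Data(base64Encoded:) returns nil for malformed input; fall back to empty data here.
let decodedGreeting = Data(base64Encoded: embeddedGreetingBase64) ?? Data()
let decodedGreetingText = String(decoding: decodedGreeting, as: UTF8.self)
```

The cost is roughly a one-third size increase from the encoding, plus a decoding step at run time.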

I'd much rather have a data literal from file than a string from file.

omg i would love being able to have GLSL shaders be literals inside my swift code. and you wouldn’t really need templating for that. but it shouldn’t be tied to a specific Foundation API it should just be a String

Random opinions:

I agree with Erica and others that it makes sense for this proposal to focus on data rather than strings. The latter can be constructed from the former :-)

I'm personally not too concerned by the load time issues: modern OS's lazily page in the contents of read-only data sections.

I'm also not too worried about the usability issues of "drag files around in Xcode and have references break". I'm not worried for two reasons: on the one hand, this is already an area of fragility in Xcode (sigh), and on the other hand, Xcode would have exactly the information it would need to update these references. Also, if the code was broken, it would be a build error when the compiler tried to handle this, so there isn't a danger of the code being broken in a subtle way.

-Chris
