Should we provide a way to embed/include a file dump as raw data?

Here's some sample syntax:

let myResourceBytes = #embed("myResource")

While following some links, I found a C++ proposal for an object that is a byte dump of a specified file. It is meant to replace hacks that read the file and paste it into a *.cpp (or *.hpp) file as a large array literal.

The argument to #embed is a compile-time string. How that string maps to a file, URL, or other resource to read is implementation-specified. I'm wondering what the return type should be. It could be:

  • A standard library Array: [UInt8]
  • A fixed-sized array: [XXX ; UInt8], where XXX is the byte length of the resource

If we go with the latter, then the embedding feature is blocked until fixed-size arrays (FSAs) are done. The C++ author wanted to use their embed command in constexpr contexts, where the compiler can trim out unused portions of what would go into the binary. If we can do compile-time manipulations on a standard-library Array, then we could add this feature earlier.

I remember trying out things like:

let tooShort: Int8 = 500

Although the default integer types are part of the standard library, the compiler still saw through the expression and reported that the literal is too large for the type. Can we do similar manipulations with Array? If not, could that be added? Could it be added more cheaply than the FSA route?

I think the best way is not to choose one type, but instead to remove the underscore from the _ExpressibleByFileReferenceLiteral protocol and make Array<UInt8> conform to it. That way we can use the already existing #fileLiteral and allow people to add their own conformances.

Just as 123 can initialize both Int and UInt8, and #colorLiteral can initialize both UIColor and NSColor, #fileLiteral should be able to initialize both URL and [UInt8], and in the future fixed-size arrays.
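To illustrate the analogy with existing literals, here is a minimal sketch of one integer literal initializing several types, including a custom one. The Celsius type is hypothetical and only there for illustration.

```swift
// The same integer literal initializes whichever conforming type the context asks for.
let wide: Int = 123
let narrow: UInt8 = 123

// A custom type (hypothetical) can opt in through ExpressibleByIntegerLiteral.
struct Celsius: ExpressibleByIntegerLiteral {
    var degrees: Int
    init(integerLiteral value: Int) {
        degrees = value
    }
}

let boiling: Celsius = 100
print(wide, narrow, boiling.degrees) // prints "123 123 100"
```

A public file-literal protocol would presumably work the same way, with the declared type selecting the conformance.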

I would love to be able to initialize a String and [UInt8] from a file! Very handy when solving Advent of Code puzzles.

I think this would also be useful for test data, where failing to build due to a missing resource is perfectly acceptable.

If this is to be generalized, I'd suggest a design where the resource is written to an Array<UInt8>, and then the variable being assigned would be passed this hidden temp variable as a constructor parameter.

source

let resource: MyType = #resource(url: "file:///path/to/resource")

compiler rewrite

let $_resource = [ ... ] // bytes from file
let resource = MyType(bytes: $_resource)

This would simplify the protocol to a single requirement: the constructor. Parameter name (bytes in the example) to be bikeshedded.
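The rewrite above can be tried by hand today. This is only a sketch: MyType, its text property, and the embeddedBytes stand-in are hypothetical, and a hard-coded array takes the place of the bytes the compiler would read from the file.

```swift
// Hypothetical type with the single requirement: an initializer taking the raw bytes.
struct MyType {
    let text: String
    init(bytes: [UInt8]) {
        // Decode the bytes as UTF-8, replacing any invalid sequences.
        text = String(decoding: bytes, as: UTF8.self)
    }
}

// Stand-in for the array the compiler would synthesize from the file's contents.
let embeddedBytes: [UInt8] = [0x48, 0x65, 0x6C, 0x6C, 0x6F] // "Hello" in ASCII
let resource = MyType(bytes: embeddedBytes)
print(resource.text) // prints "Hello"
```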

The private one already has only one requirement: the constructor :)

I think we should be consistent with the rest of the ExpressibleByXLiteral protocols, and name it init(fileLiteral value: FileLiteralType).

We don't have to choose a single type to write the resource to :) That way we can preserve more information and allow expansion in the future. We can do what the other ExpressibleByXLiteral protocols do and make the type an associated type:

public protocol ExpressibleByFileLiteral {
  associatedtype FileLiteralType: _ExpressibleByBuiltinFileLiteral
  init(fileLiteral value: FileLiteralType)
}

and offer multiple possibilities for FileLiteralType. I would suggest:

  • URL: preserves all info; we can extract everything from it, including path, name, and filesystem flags
  • Array<UInt8>: the one you want for contents; I would guess the compiler could optimize it nicely
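As a rough sketch of how conformances to such a protocol might look, the snippet below declares a local copy of the protocol without the _ExpressibleByBuiltinFileLiteral constraint, since that builtin protocol is part of the proposal and does not exist. With no compiler support for the literal, the initializers are called explicitly, and fileContents is a hypothetical stand-in for the embedded bytes.

```swift
// Local stand-in for the proposed protocol; the builtin constraint is omitted
// because _ExpressibleByBuiltinFileLiteral is hypothetical.
protocol ExpressibleByFileLiteral {
    associatedtype FileLiteralType
    init(fileLiteral value: FileLiteralType)
}

// [UInt8] takes the bytes as they are. FileLiteralType is inferred as [UInt8].
extension Array: ExpressibleByFileLiteral where Element == UInt8 {
    init(fileLiteral value: [UInt8]) {
        self = value
    }
}

// String decodes the same bytes as UTF-8.
extension String: ExpressibleByFileLiteral {
    init(fileLiteral value: [UInt8]) {
        self = String(decoding: value, as: UTF8.self)
    }
}

// Stand-in for what the compiler would read from the file.
let fileContents: [UInt8] = [0x68, 0x69] // "hi" in ASCII
let asBytes = [UInt8](fileLiteral: fileContents)
let asText = String(fileLiteral: fileContents)
print(asBytes.count, asText) // prints "2 hi"
```

Both conformances here pick [UInt8] as the FileLiteralType; a URL-based conformance would choose a different associated type.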

I'm concerned with how the compiler conceptually transforms the code such that you end up with a variable with a custom type, whose data was loaded from a file. I am not as concerned with the syntax of the literal specifier, or what the literal itself represents to the compiler.

If you passed a URL to the constructor, you would be loading the contents at runtime, right? That doesn't seem like the same feature to me.

A similar topic was discussed last year:

But now that "SE-0271: Package Manager Resources" has been accepted, it might be better to think of .embed as another package rule (alongside the existing .copy and .process rules) to be applied to a file or directory path.

"Type-safe access to individual resource files" would then include support for embedded data.
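For context, this is roughly how the existing .copy and .process rules are declared in a package manifest. The package name and resource paths are hypothetical, and the commented-out .embed line is only the rule suggested in this post, not something SE-0271 defines.

```swift
// swift-tools-version:5.3
import PackageDescription

let package = Package(
    name: "MyLibrary", // hypothetical package name
    targets: [
        .target(
            name: "MyLibrary",
            resources: [
                // Existing SE-0271 rules (paths are hypothetical):
                .process("Resources/Images"),
                .copy("Resources/Fixtures"),
                // Hypothetical rule suggested above; not part of SE-0271:
                // .embed("Resources/table.bin"),
            ]
        )
    ]
)
```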