SE-0451: Raw identifiers

1-877-547-7272 · October 26, 2024, 9:14pm

IMO, it's not that inconsistent. Tokens that start with a number would have to be escaped in normal use anyway to prevent the compiler from interpreting it as a number:

let `0` = 5
let x = 0   // x == 0
let y = `0` // y == 5

The above example isn't that different from:

let tuple = (0, `0`: 5)
let x = tuple.0   // x == 0
let y = tuple.`0` // y == 5

With that being said, I strongly agree that non-tuple types (where there wouldn't be any ambiguity) should be allowed to use numeric members without escaping them.

The proposal argues that we need to escape numeric members in order to prevent confusion between indexed and labeled tuple members, but that's not a concern for non-tuple types. Is anyone going to get confused by a statement like this?

let color = Color(hue: .red, variant: .100)

Forcing all uses of numeric members to be escaped adds unnecessary clutter and hinders the usefulness of this proposal. I think that we should allow unescaped numeric members whenever practical.

disc0infern0 · October 28, 2024, 1:17pm

+1000 for the proposed syntax:
@Test func testSquare() {

It is so much clearer.

grynspan · October 28, 2024, 1:20pm

That's the current syntax!

disc0infern0 · October 28, 2024, 1:27pm

In general, I would support this proposal, though I do not like the proposed Swift testing amendments.
I recently was doing some trig work in Swift/SwiftUI and found that, instead of writing Angle(radians: 2*Double.pi), I wanted to create a static extension on Angle that would let me write Angle.360 or, given type inference, .360 to refer to an angle of 360 degrees. Similarly for 90, 180, 270.
So I support the idea of being able to define the extension as

extension Angle {
   static let `180` = Angle(radians: Double.pi) 
}

I would want to use it without the backticks.
I also would prefer not to use backticks when referencing enum cases. Perhaps disallow the use of raw identifiers in tuples, since this seems a far rarer use case?

disc0infern0 · October 28, 2024, 1:30pm

Oops. I missed the /// comments above it. If those could be picked up automatically (similarly to how the Xcode quick help functions do?), then that would be great.

grynspan · October 28, 2024, 1:41pm

I've been trying to figure out how to approach this suggestion. I would emphasize really quickly here that this proposal is not about adding a feature to Swift Testing per se—it's just that Swift Testing could benefit from it if it's implemented.

It is already perfectly valid to write what you have here, and it should compile as-is, but you still need to name the function under the comment, and we've gotten feedback that this is not as ergonomic as it could be and that it would be nicer if a test function could just have a single name. That doesn't then imply that @allevato's proposal here is the only option (or the best option or the worst option, for that matter.)

Comments are not part of the syntax tree in Swift and do not contribute to the name of a symbol. Even if they did in this instance (and from a technical standpoint, it's not impossible for us to make @Test consume those comments), it wouldn't solve Swift Testing's problem, nor would it solve the other problems this proposal is intended to address.

allevato · October 28, 2024, 2:00pm

Treating members of tuples and members of non-tuple types differently actually introduces more inconsistency when you start peeling back the edges. To say that you can write this:

struct S { var `0`: Int }; let s = S(); _ = s.0
enum E { case `0` }; _ = E.0

But then force the tuple use case to add backticks for disambiguation:

let x = (a: 1, `0`: 2); _ = x.`0`

...would be unlike any lookup rules currently in the language, because whether an identifier needs to be escaped or not is a decision made purely on its position in the syntax tree, not based on any type information.

If we wanted to have that rule, then are we restricting it to wholly numeric identifiers? We have to require backticks for member access in some cases because non-identifier characters would otherwise be ambiguous. For example,

enum BillboardTop100 { case `99 Luftballoons` }
_ = BillboardTop100.99 Luftballoons  // " " is confusable with token delimiting
_ = BillboardTop100.`99 Luftballoons`  // ok

enum AspectRatio { case `16:9` }
_ = AspectRatio.16:9  // ":" is confusable with ternary colon
_ = AspectRatio.`16:9`  // ok

enum Version { case `15.0` }
_ = Version.15.0  // `.` is confusable with member access
_ = Version.`15.0`  // ok

extension WestCoast { enum Rapper { case `2Pac` } }
_ = WestCoast.Rapper.2Pac  // not entirely numeric, but not confusable

So as far as I can tell, the requested rule would be "for tuple types, raw identifiers must always be escaped by backticks to distinguish them from element indices; for non-tuple types, raw identifiers may be left unescaped if they start with an identifier character or a digit and otherwise only contain identifier characters and digits." That's a mouthful, and it would have wide-ranging consequences for the implementation of the compiler and other tooling.

The proposal instead offers a simpler rule: "raw identifiers must always be escaped by backticks".

I certainly understand the aesthetic reasons behind the request, but optimizing for something so subjective isn't particularly a goal here, and carving out such narrow special exceptions would actually harm the understanding and usage of the feature. That being said, there's nothing in the proposal that specifically closes the door to that space being explored in the future, either.

filiplazov · October 28, 2024, 2:01pm

As someone who has written extremely long unit test function names, because I like to be very descriptive, I find this change far more readable than long camelCase function names.
However, I hope we can restrict this per module (to unit test targets, for example), because I would really hope people would not use this in regular function names, as that would negatively impact readability in my opinion.

ktoso · October 28, 2024, 2:12pm

I think this is a very useful thing to do, and the testing rationale is a very strong motivation IMHO.

This actually enables one of the "styles" that ScalaTest has offered for a while; look at this in ScalaTest [1]:

class SetSpec extends RefSpec {
  object `A Set` {
    object `when empty` {
      def `should have size 0` {
        assert(Set.empty.size == 0)
      }
    }
  }
}

That's pushing it a bit further, but the basic idea is:

@Test 
func `empty Set: should have size 0`() { }

Small side note is that we could also allow

// @Test 
// var `empty Set: should have size 0` { }

to avoid the "hanging" () at the end there... but perhaps that's only bothering me and it's not too big of a deal.

I should also bring up another example from my previous work-life where we used this Scala feature of `` quoting. The Akka HTTP library I worked on used it to model headers and content explicitly, to great effect:

// akka-http (scala)

// https://tools.ietf.org/html/rfc7232#section-3.2
object `If-None-Match` extends ModeledCompanion[`If-None-Match`] {
  val `*` = `If-None-Match`(EntityTagRange.`*`)
  def apply(first: EntityTag, more: EntityTag*): `If-None-Match` =
    `If-None-Match`(EntityTagRange(first +: more: _*))
}

  val `application/java-archive`                                                  = abin("java-archive", NotCompressible, "jar", "war", "ear")
  val `application/javascript`                                                    = awoc("javascript", "js")
  val `application/json`                                                          = awfc("json", HttpCharsets.`UTF-8`, "json")
  val `application/json-patch+json`                                               = awfc("json-patch+json", HttpCharsets.`UTF-8`)
  val `application/merge-patch+json`                                              = awfc("merge-patch+json", HttpCharsets.`UTF-8`)
  val `application/problem+json`                                                  = awfc("problem+json", HttpCharsets.`UTF-8`)
  val `application/grpc+proto`                                                    = abin("grpc+proto", NotCompressible)

This is very nice in practice because we then don't have to reinvent "weird spellings" and make weird upper-lower-snake-or-something-else constants, but can use the exact spellings a specification uses, which is really nice for reading and mapping the code to the actual runtime.

Long story short, I think this is a worthy extension to the language! I've been missing it here and there, and I'm really happy to see it come to Swift now :-)

[1] ScalaTest

lukeredpath · October 28, 2024, 4:43pm

I am against this proposal and do not see any tangible benefits - the syntax for both defining and using raw identifiers is undesirable. As others have said, if it were down to me I would not allow this to be used in any codebase I was responsible for.

The redundancy in swift-testing functions is indeed a mild annoyance, it would be great to find a solution to this, but this isn't it. -1 from me.

woolsweater · October 28, 2024, 6:03pm

I have carefully re-read the original proposal and the new version. I don't believe any of the original reasons for rejection have been overcome by circumstances or the amendments. This adds syntactic complexity to the language — particularly in reading code — that is not justified by the mild benefits.

The technical detail that the declaration let `foo bar baz` creates the identifier foo bar baz and not `foo bar baz` is irrelevant to a user of the language, because they have to read and write the backticks everywhere anyways. As a specific and concrete example, this does not in fact allow us to create enum cases with leading digits, because using the identifier always requires a leading backtick in the source: expressions like let code = HTTPStatus.403 and .failure(code: .403) are still invalid. The use case of allowing leading digits in property names is a worthy goal, and I don't believe that this proposal achieves it satisfactorily.

Similarly, the distinctions of meaning between escaped identifiers and raw identifiers are too fine. A reader's brain is often forced to act like a compiler at the semantic level; making them puzzle over syntax as well is an unnecessary extra burden. The marginal utility of writing a tuple like (a: Int, `0`: Bool) does not warrant the awareness of language minutiae then required to read the use sites t.`0` vs. t.0.

I believe this proposal should be rejected, again.

jlukas · October 28, 2024, 7:59pm

Could we not resolve this by banning duplicate labels in tuples? E.g. (a: 1, `0`: 2) would be invalid, thus avoiding the conflict. This would also make the .0 syntax that tuples use less magical.

Nobody1707 · October 28, 2024, 8:29pm

I don't see why numeric identifiers need to be back-ticked on use at all. We shouldn't allow tuples to have both an unnamed first field and a named `0` field at the same time (or more generally an unnamed field with an index of N and a named N should not both exist on the same tuple).

FlorianPircher · October 28, 2024, 10:09pm

I am in favor of raw identifiers, mostly for tests. Also, they can be useful for interoperability with other systems (languages, databases, …) that allow a different identifier syntax than Swift.

I don’t think they will be used much in regular code as the ` backtick character is less convenient than the common camelCase. So, most other code should be unaffected by this change. The `identifier` syntax exists already and this just makes it more useful.

Pippin · October 28, 2024, 10:31pm

No matter what is decided upon, it will still be annoying that there are now different rules depending on the structure of the identifier. Currently:

You can use backticks to make an identifier that matches a Swift keyword, but don't need them at the usage site.
Tuples don't need backticked-int-dot-syntax, because implementation/legacy reasons.
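As a minimal sketch of those two existing rules (the names CellStyle and pair are made up for illustration): a keyword needs backticks where it is declared, but not when it is referenced as a member after a dot, and tuple elements can be reached by index whether or not they also have labels.

```swift
// Hypothetical enum; `default` and `repeat` are keywords, so the
// declarations need backticks.
enum CellStyle {
    case `default`
    case `repeat`
}

// No backticks needed when the keyword follows a dot.
let style = CellStyle.default
let mode: CellStyle = .repeat

// Tuple elements are reachable by index as well as by label.
let pair = (first: 1, second: "two")
print(pair.0 == pair.first)   // both refer to the same element
```

Note that outside of member access (for example, a local variable named `class`), the backticks are still required at the use site.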

If allowed there would be more "exceptions" people would have to consider:

You can make (almost) any string an identifier, as long as it's backticked.
If a number, you don't need backticks at the usage site.
If not a number, you will need backticks at the usage site.

Obviously the last 2 exceptions might not make it. Inconsistency within the language is a bit annoying and having to know which X rule to apply in Y situation is even more annoying when writing.

I almost wish instead digits were allowed as identifier heads. The issue of "confusing with integers" is also annoying ... but self choosing. The only area I see that being an issue is if a digit-only identifier is in scope and I wish instead the compiler emitted a warning, or potentially an error, that this would cause confusion with integers and offer remedies (explicit self, maybe force backticked for local variables). This is not an issue for dot syntax usages.

https://docs.swift.org/swift-book/documentation/the-swift-programming-language/summaryofthegrammar#Lexical-Structure

There sure are some weird Unicode characters allowed in identifiers. TIL. /aside

I think the proposal puts too much weight on the testing example. As said above, there are other mechanisms for documenting tests that are cleaner to write and read. Where I see this as the most powerful is in tooling, auto-generated values, or domain-specific code like in the HTTP example. However this lands, there will be opinionated micro/macro style guides for its usage. I personally will probably never use this unless it is really^really^really better than other methods.

So, I support it. This will obviously make its way into frameworks but I see its potential in tooling efforts that have been made, or yet to be made, for the language. I can live with some backticks in certain places, even if it’s annoyingly inconsistent.

YOCKOW · October 29, 2024, 4:03am

I want to support this proposal because it is useful for metaprogramming that is covered in "Naturally non-alphabetic identifiers" and "Code generators and FFI".

You may often want to use Swift identifiers directly rather than dictionaries, for performance or for some other reason.

Examples which come to my mind are:

Use Case 1: Identifiers starting with numerics

// List of path extensions
enum PathExtension: String, Sendable {
     case `123` = "123"
     case `3dml` = "3dml"
     case `3ds` = "3ds"
     case `3g2` = "3g2"
     case `3gp` = "3gp"
     case `7z` = "7z"
     case aab = "aab"
     case aac = "aac"
    // Other cases follow
}
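For comparison, here is a rough sketch of one workaround that compiles without raw identifiers: prefixing the digit-leading cases with an underscore and relying on the raw value for the real spelling. The type name LegacyPathExtension and the underscore convention are assumptions for illustration only, not something the proposal prescribes.

```swift
// Hypothetical workaround: case names cannot start with a digit today,
// so a prefix is added and the String raw value carries the true spelling.
enum LegacyPathExtension: String {
    case _3gp = "3gp"
    case _7z = "7z"
    case aac = "aac"
}

// Lookup still goes through the raw value, not the case name.
let ext = LegacyPathExtension(rawValue: "7z")
print(ext == ._7z)
```

The case name and the spelling it stands for drift apart here, which is the mismatch the raw-identifier version above avoids.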

Use Case 2: Identifiers containing punctuation

// The map of Unicode scalars in "Po" category
enum OtherPunctuation: String, Sendable {
  // This (U+FE50) is allowed in current Swift.
  case `﹐` = "Small Comma"

  // This (U+00A1) is not allowed in current Swift.
  case `¡` = "Inverted Exclamation Mark"

  // :
  // :
}

However, as someone said, I also tend to think that identifiers containing whitespace or punctuation should be limited to internal use. (Not limited by the language, but recommended by a guideline?)

Karl · October 29, 2024, 7:39pm

Your post caused me to consider something else, which I think is quite important.

The reason the Swift grammar limits characters to the set it does is not solely to make parsing easier - it is also for security. Even the Unicode consortium does not recommend that languages accept all characters everywhere: UTS#55 Source Code Handling.

Source code, that is, plain text meant to be interpreted as a computer language, poses special security and usability issues that are absent from ordinary plain text. The reader (who may be the author or a reviewer) should be able to ascertain some properties of the underlying representation of the text by visual inspection, such as:

the extent of lexical elements within the text;

the nature of a lexical element (comment, string, or executable text);

the order in memory of lexical elements;

the equivalence or inequivalence of identifiers.

The potential presence in source code of characters from many writing systems, including ones whose writing direction is right-to-left, can make it difficult to ensure these properties are visually recognizable. Further, the reader may not be aware of these sources of confusion. These issues should be remedied at multiple levels: as part of computer language design, by ensuring that editors and review tools display source code in an appropriate manner, and by providing diagnostics that call out likely issues.

Accordingly, this document provides guidance for multiple levels in the ecosystem of tools and specifications surrounding a computer language. Section 3, Computer Language Specifications, is aimed at language designers; it provides recommendations on the lexical structure, syntax, and semantics of computer languages.

This is explored in more depth in UAX#31 Identifiers and Syntax.

The formal syntax provided here captures the general intent that an identifier consists of a string of characters beginning with a letter or an ideograph, and followed by any number of letters, ideographs, digits, or underscores. It provides a definition of identifiers that is guaranteed to be backward compatible with each successive release of Unicode, but also adds any appropriate new Unicode characters.

The formulations allow for extensions, also known as profiles. That is, the particular set of code points or sequences of code points for each category used by the syntax can be customized according to the requirements of the environment. Profiles are described as additions to or removals from the categories used by the syntax. They can thus be combined, provided that there are no conflicts (whereby one profile adds a character and another removes it), or that the resolution of such conflicts is specified.

And of course, it wouldn't be Unicode if it didn't come with a table of which characters are allowed in which positions. These are the ID_* and XID_* families of properties, which we even make available in Unicode.Scalar.Properties (for example, isIDStart; see the Apple Developer Documentation).
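As a rough illustration of how those properties can be queried from Swift, here is a sketch that applies the default UAX#31 shape (one ID_Start scalar followed by ID_Continue scalars) to a few spellings from earlier in this thread. The helper name is made up, and this is not the Swift grammar's own identifier rule, which differs in its details.

```swift
// Hypothetical helper: checks the default UAX#31 identifier shape only.
// It does not reproduce Swift's actual identifier grammar.
func matchesDefaultUAX31Identifier(_ text: String) -> Bool {
    var scalars = text.unicodeScalars.makeIterator()
    // The first scalar must have the ID_Start property.
    guard let first = scalars.next(), first.properties.isIDStart else {
        return false
    }
    // Every following scalar must have the ID_Continue property.
    while let next = scalars.next() {
        if !next.properties.isIDContinue { return false }
    }
    return true
}

for sample in ["foo2", "2Pac", "16:9", "99 Luftballoons"] {
    print(sample, matchesDefaultUAX31Identifier(sample))
}
```

Only the first sample fits that shape; the others start with a digit or contain a space or colon, which is what makes them candidates for raw identifiers in the first place.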

They document some standard profiles, such as for allowing mathematical symbols and emojis. In theory we could extend it with our own profiles, but any extra characters we allow should be considered carefully.

Anyway, compare that with the set of characters allowed by the proposal:

A raw identifier may contain any valid Unicode characters except for the following:

The backtick (`) itself, which terminates the identifier.

The backslash (\), which is reserved for potential future escape sequences.

Carriage return (U+000D) or newline (U+000A); identifiers must be written on a single line.

The NUL character (U+0000), which already emits a warning if present in Swift source but would be disallowed completely in a raw identifier.

All other non-printable ASCII code units that are also forbidden in single-line Swift string literals (U+0001...U+001F, U+007F).

This seems extremely broad. In particular, it seems to allow isolated combining characters (which may combine with the surrounding text when rendered in an editor) and the Unicode line separator (U+2028), which is explicitly called out in UTS#55 because even though compilers tend not to recognise it as a newline (including the Swift compiler), editors may render it as one:

The Unicode Standard encompasses multiple representations of the New Line Function (NLF). These are described in Section 5.8, Newline Guidelines, in [Unicode], as well as in Unicode Standard Annex #14, Line Breaking Algorithm [UAX14].

An opportunity for spoofing can occur if implementations are not consistent in the supported representations of the newline function: multiple logical lines can be displayed as a single line, or a single logical line can be displayed as multiple lines.

For instance, consider the following snippet of C11, as shown in an editor which conforms to the Unicode Line Breaking Algorithm:

// Check preconditions.
if (arg == (void*)0) return -1;

If the line terminator at the end of line 1 is U+2028 Line Separator, which is not recognized as a line terminator by the language, the compiler will interpret this as a single line consisting only of a comment; to a reviewer, the program is visually indistinguishable from one that has a null check, but that check is really absent.
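To make the gap concrete, here is a minimal sketch that encodes only the exclusions quoted from the proposal above and then feeds it an identifier body containing U+2028. The function name is hypothetical, and this is my reading of the quoted list, not the compiler's implementation.

```swift
// Hypothetical check mirroring only the exclusion list quoted above.
func passesQuotedExclusions(_ body: String) -> Bool {
    for scalar in body.unicodeScalars {
        switch scalar.value {
        case 0x60, 0x5C:
            // Backtick and backslash are excluded.
            return false
        case 0x00...0x1F, 0x7F:
            // NUL, CR, LF, and the other non-printable ASCII code units.
            return false
        default:
            continue
        }
    }
    return true
}

// U+2028 LINE SEPARATOR is not on the quoted list.
let suspicious = "check\u{2028}passed"
print(passesQuotedExclusions(suspicious))
print(passesQuotedExclusions("check\npassed"))
```

Under this reading, the ASCII newline is rejected while the Unicode line separator slips through.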

allevato · October 29, 2024, 7:52pm

This is an excellent point! While I made an adjustment to the proposal after the pitch that forbade an identifier from consisting of only whitespace (which does capture U+2028 and U+2029), that didn't go so far as to forbid the use of those separators in an identifier if it contained other non-whitespace. Just as we currently ban the ASCII line separators from raw identifiers, I think it would be perfectly reasonable (and proper) to extend that to include those Unicode code points as well, essentially matching what the standard library implements in Character.isNewline.
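A small sketch of what such a check could look like, using the standard library's Character.isNewline. The function name is hypothetical, and how the compiler would actually implement the restriction is not specified here.

```swift
// Hypothetical check: reports whether an identifier body contains any
// character that the standard library classifies as a newline.
func containsNewline(_ body: String) -> Bool {
    body.contains { $0.isNewline }
}

print(containsNewline("check\u{2028}passed"))  // U+2028 LINE SEPARATOR
print(containsNewline("check\u{2029}passed"))  // U+2029 PARAGRAPH SEPARATOR
print(containsNewline("check passed"))         // ordinary space only
```

Character.isNewline also covers U+000A, U+000B, U+000C, U+000D, and U+0085, so one predicate would subsume the ASCII line separators that are already banned.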

JuneBash · October 29, 2024, 7:54pm

I am all for the idea of having these sorts of raw identifiers, namely for tests, but I agree with some of the other posters here expressing concerns about security & ugliness. For tests, could this be covered by a variant of the @Test macro?

@Test("2 * 2 evaluates to 4") {
  #expect(2 * 2 == 4)
}

allevato · October 29, 2024, 7:57pm

As @grynspan mentioned above, this is not a swift-testing feature proposal, so major changes to how swift-testing declares tests are out of scope here. The swift-testing use case is just used to illustrate one of many places where the feature would provide a benefit, specifically without having to make significant design changes to other libraries.

(I'll add, however, that in my opinion, removing the test function name entirely would only serve to make the testing experience worse. One of the benefits of this proposal is that the test description becomes the name of the test, meaning that there is a single description/symbol that appears in all backtraces, UIs, debugger views, etc. The proposed syntax would remove the need to "name the test twice", but the tooling must still conjure up a name for that thing, and that name would be completely hidden from users and tooling.)