Is this a Character bug?

Is this expected behavior? Should it even be possible to instantiate a Character like myChar?

let myChar = Character.init(" a👩‍👩‍👧‍👦bc")
print(myChar)              //  a👩‍👩‍👧‍👦bc
print(myChar.isWhitespace) // true
print(myChar.isLetter)     // false
print(myChar.isLowercase)  // true

(myChar is of type Character, not eg Optional<Character>.)


Which .init is being called there?
Alt-clicking the init in Xcode suggests that it is init?(_ description: String) but that can't be right since the following shows that it isn't a failable initializer:

let myChar2 = Character.init(" a👩‍👩‍👧‍👦bc")! // ERROR: Cannot force unwrap value of non-optional type `Character`

Jumping to the definition of the init takes me to this (declared in an extension to Character):

    /// Creates a character from a single-character string.
    ///
    /// The following example creates a new character from the uppercase version
    /// of a string that only holds one character.
    ///
    ///     let a = "a"
    ///     let capitalA = Character(a.uppercased())
    ///
    /// - Parameter s: The single-character string to convert to a `Character`
    ///   instance. `s` must contain exactly one extended grapheme cluster.
    @inlinable public init(_ s: String)

which I guess might be true. But the documentation of this initializer says that the given string must be a single-character string that contains exactly one extended grapheme cluster (which the string of my example doesn't), and it says nothing about what should happen if the given string doesn't meet those requirements. Shouldn't it trap or be a failable initializer?


Is the Character myChar really a "single extended grapheme cluster that approximates a user-perceived character"?
And if so, what does that user-perceived character look like? The following does not work (as expected):

let myChar3: Character = " a👩‍👩‍👧‍👦bc" // ERROR: Cannot convert value of type 'String' to specified type 'Character'

while eg the following does (as expected):

let myChar4: Character = "👩‍👩‍👧‍👦" // Compiles

I understand that to be a rhetorical question, but just in case it isn’t: That string contains 5 extended grapheme clusters (one space, three letters and one Emoji cluster); it is not a valid Character.
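For anyone who wants to check the count themselves, here is a minimal sketch. The emoji is spelled out with scalar escapes so that the example does not depend on how the page renders it; the variable names are only for illustration.

```swift
// The string from the original post: a space, "a", the family emoji
// (four emoji scalars joined by three zero-width joiners), "b" and "c".
let family = "\u{1F469}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}"
let s = " a" + family + "bc"

// String.count counts Characters, i.e. extended grapheme clusters.
print(s.count)                 // 5

// The scalar view shows how much is packed into those clusters.
print(s.unicodeScalars.count)  // 11

// List each cluster together with its number of Unicode scalars.
for (index, character) in s.enumerated() {
    print(index, "[\(character)]", character.unicodeScalars.count)
}
```

The family emoji alone is one cluster, which is why it works as a Character literal while the full string does not.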

No. In addition, when I pasted it into a playground (Xcode 10.3 + Swift 5.0.1), it trapped as expected:

Fatal error: Can’t form a Character from a String containing more than one extended grapheme cluster

Are you seeing this happen in a newer or an older version of Swift?


No, it should trap.

Swift 5.1 also traps.


Thanks, it looks like this happens in only one specific Xcode project of mine (a Command Line Tool project, using Swift 5.0.1).


I tried it with swiftc and then it traps as expected:

$ swiftc --version
Apple Swift version 5.0.1 (swiftlang-1001.0.82.4 clang-1001.0.46.5)
Target: x86_64-apple-darwin18.7.0
$ cat > test.swift
let myChar = Character.init(" a👩‍👩‍👧‍👦bc")
print(myChar)              //  a👩‍👩‍👧‍👦bc
print(myChar.isWhitespace) // true
print(myChar.isLetter)     // false
print(myChar.isLowercase)  // true
$ swiftc test.swift 
$ ./test
Fatal error: Can't form a Character from a String containing more than one extended grapheme cluster
Illegal instruction: 4

And if I start a new Command Line Tool (Xcode 10.3 (10G8)), and let the main.swift file be the same code as above, it will also trap as expected.

But in the Xcode project in which I bumped into this issue, it still behaves as reported in the OP ... Trying to isolate the issue, I can remove all code in the project except the main.swift file containing this:

#if swift(>=5.0.1)
print("Swift version >= 5.0.1")
#else
print("Swift version < 5.0.1")
#endif
let myChar = Character.init(" a👩‍👩‍👧‍👦bc")
print(myChar)              //  a👩‍👩‍👧‍👦bc
print(myChar.isWhitespace) // true
print(myChar.isLetter)     // false
print(myChar.isLowercase)  // true

And it will still compile and run without trapping, printing this:

Swift version >= 5.0.1
 a👩‍👩‍👧‍👦bc
true
false
true

I cannot seem to reproduce the bug anywhere else; it happens only in this particular project, so I filed a bug report with the project as an attachment:

SR-11292


Can you reproduce it in that project on 5.1? If not, could you share why it needs to be investigated on 5.0.1?

I don't have Xcode 11 beta installed, and I guess I thought it might be interesting to know if anyone else can reproduce it using that project, both in Xcode 10.3 and 11 beta (with Swift 5.1), or if it's just me.

I'm not sure why you think it shouldn't be investigated on 5.0.1 (and 5.1)?

It seems that the bug occurs when the program is compiled with optimization:

$ swiftc charbug.swift 
$ ./charbug 
Swift version >= 5.0.1
Fatal error: Can't form a Character from a String containing more than one extended grapheme cluster
Illegal instruction: 4

$ swiftc -O charbug.swift 
$ ./charbug 
Swift version >= 5.0.1
 a👩‍👩‍👧‍👦bc
true
false
true

Setting the build configuration to "Debug" in your sample project also makes the program abort with a fatal error, as one would expect.

Tested with Xcode 10.3/Swift 5.0.1 and Xcode 11 beta 5/Swift 5.1.

Ah 🤦‍♂️
Can't believe I didn't check debug/release and thought that it had to do with that specific project ...

Thanks Martin!

I've edited the Jira so that it points to this post, as the bug can now be fully demonstrated simply like so:

$ swiftc --version
Apple Swift version 5.0.1 (swiftlang-1001.0.82.4 clang-1001.0.46.5)
Target: x86_64-apple-darwin18.7.0
$ cat test.swift 
let myChar = Character(" a👩‍👩‍👧‍👦bc")
print(myChar)              //  a👩‍👩‍👧‍👦bc
print(myChar.isWhitespace) // true
print(myChar.isLetter)     // false
print(myChar.isLowercase)  // true

$ swiftc test.swift && ./test
Fatal error: Can't form a Character from a String containing more than one extended grapheme cluster
Illegal instruction: 4

(Traps as expected.)


$ swiftc -O test.swift && ./test
 a👩‍👩‍👧‍👦bc
true
false
true

(Happily accepts the invalid string and creates a very strange "Character" instance.)


Just to double check, does that mean that the bug has not been fixed in 5.1?

Right, this appears to be enforced as a _debugPrecondition rather than _precondition (https://github.com/apple/swift/blob/175464aa5af150de004c375720013eec7aa10137/stdlib/public/core/Character.swift#L177) at present.
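_debugPrecondition and _precondition are internal to the standard library, so they cannot be called from user code. The public assert and precondition functions show a roughly similar split, though: assert is only evaluated in -Onone builds, while precondition is also checked with -O. The sketch below uses them as a stand-in; the two function names are made up for illustration, and this is not how Character.init itself is written.

```swift
// Hypothetical helper: the check disappears in optimized (-O) builds,
// which is the kind of behaviour observed in this thread.
func firstCharacterDebugChecked(_ s: String) -> Character {
    assert(s.count == 1, "expected exactly one extended grapheme cluster")
    return s[s.startIndex]
}

// Hypothetical helper: the check is kept in both -Onone and -O builds
// (it is only dropped with -Ounchecked).
func firstCharacterAlwaysChecked(_ s: String) -> Character {
    precondition(s.count == 1, "expected exactly one extended grapheme cluster")
    return s[s.startIndex]
}

// Valid input passes both checks in every build configuration.
print(firstCharacterDebugChecked("x"))
print(firstCharacterAlwaysChecked("x"))
```

Compiling this with and without -O and passing a multi-cluster string to each function should show the difference: the first one only traps in the unoptimized build.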

With the reason behind the difference in behaviour now tracked down, I suppose we can say that this isn't a bug. The documentation says:

s: The single-character string to convert to a Character instance. s must contain exactly one extended grapheme cluster.

so it is clearly a programmer error. Whether it makes sense to enforce this at runtime in optimised builds will depend on the performance impact. On the other hand, I'm a little surprised that this isn't picked up at compile time even in optimised builds for this particular case, but I know that Unicode evolution means that this kind of validation can only be done in a best-effort manner.

I still can't see any good reasons why Character.init(_ s: String) shouldn't trap (even in release builds) if s cannot be represented as a valid character (ie exactly one extended grapheme cluster).

The number of extended grapheme clusters in any String is always available at runtime via eg s.count, and if s.count != 1 it could simply trap, no matter if it's debug or release, couldn't it?

If it is for performance reasons, I think adding a Character(unchecked s: String) is a better option than leaving the existing one as is (i.e. silently creating strange non-character Character values).

I have had some hard-to-find bugs that would have been obvious if this init had trapped.
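Until the initializer itself is changed, a check along these lines can be done in user code. This is only a sketch: singleCharacter(from:) is a hypothetical helper, not a standard library API, and it returns nil instead of trapping so that the behaviour is the same in debug and release builds.

```swift
/// Hypothetical helper (not part of the standard library): returns the
/// string's only Character, or nil if the string is empty or contains
/// more than one extended grapheme cluster.
func singleCharacter(from s: String) -> Character? {
    // Stepping one Character past the start must land on endIndex.
    // This avoids walking the whole string the way s.count would.
    guard let first = s.first,
          s.index(after: s.startIndex) == s.endIndex else {
        return nil
    }
    return first
}

let family = "\u{1F469}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}"

print(singleCharacter(from: "a") as Any)                    // Optional("a")
print(singleCharacter(from: family) != nil)                 // true
print(singleCharacter(from: " a" + family + "bc") as Any)   // nil
print(singleCharacter(from: "") as Any)                     // nil
```

Callers who prefer a trap can force-unwrap the result, which fails in release builds as well.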

Sure, performance reasons are exactly why _debugPrecondition exists. I doubt converting every function in the standard library that uses _debugPrecondition into separate checked and unchecked versions is the right answer, but perhaps it makes sense in this case. I guess someone could look into the performance impact here. Note that the unchecked initialiser already exists, but it is currently internal.
