String padding method is broken?

It looks like String's padding method is broken. Its docs say it will pad to characters, not Unicode scalars.

Here's a simple example. This string uses a combining character.

var str = "аа́а"
str.count   ==> 3
str.unicodeScalars.count   ==> 4
str.padding(toLength: 3, withPad: " ", startingAt: 0)   ==> "aá"

Rob

The padding method is actually provided by NSString, so you'll get this behavior in Objective-C as well.

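It also depends on how your string is encoded, i.e. whether the accented letter is a single precomposed scalar or a base letter followed by a combining scalar: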

func printStuff(str: String) {
  let nsstr = NSString(string: str)
  
  print(str.count, nsstr.length, str.unicodeScalars.count)
  
  print(str.padding(toLength: 3, withPad: " ", startingAt: 0))
}

let s1 = "a\u{e1}a"
let s2 = "aa\u{301}a"

printStuff(str: s1)
printStuff(str: s2)

Prints:

3 3 3
aáa
3 4 4
aá
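
To make the difference between the two spellings concrete, here is a small sketch of my own (not from the docs). It assumes Foundation is available and uses precomposedStringWithCanonicalMapping to normalize to NFC before padding. The variable names are only for illustration.

```swift
import Foundation

let precomposed = "a\u{e1}a"     // á as the single scalar U+00E1
let decomposed = "aa\u{301}a"    // a followed by U+0301 COMBINING ACUTE ACCENT

// Swift compares Strings by canonical equivalence, so these are equal...
print(precomposed == decomposed)                          // true
// ...even though they occupy different numbers of UTF-16 code units.
print(precomposed.utf16.count, decomposed.utf16.count)    // 3 4

// Normalizing to NFC folds the combining accent into U+00E1,
// so the string fits in three UTF-16 code units again.
let nfc = decomposed.precomposedStringWithCanonicalMapping
print(nfc.utf16.count)                                    // 3
print(nfc.padding(toLength: 3, withPad: " ", startingAt: 0))
```

This is only a partial workaround: it helps when a precomposed form exists, and does nothing for characters that have none.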

Can you file a documentation bug with Apple (https://bugreport.apple.com) asking that the docs refer to Unicode scalars rather than characters?

Sure, I can do that. Working by character rather than code point seems more useful and more consistent with Swift strings, but I guess if it's an old Objective-C method, the behavior can't be changed now. Fortunately this is simple to implement.
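
For anyone who wants it, here is a minimal sketch of what a Character-based version could look like. paddingCharacters(toLength:withPad:) is a hypothetical name of my own, not a Foundation API, and it is deliberately simpler than the NSString method: the pad is a single Character and there is no startingAt parameter.

```swift
extension String {
    /// Hypothetical helper (not part of Foundation): pads or truncates by
    /// Character (extended grapheme cluster) rather than by UTF-16 code unit.
    func paddingCharacters(toLength newLength: Int, withPad pad: Character = " ") -> String {
        precondition(newLength >= 0, "newLength must be non-negative")
        let current = count
        if current >= newLength {
            // prefix(_:) cuts on a Character boundary, so combining marks
            // and surrogate pairs are never split.
            return String(prefix(newLength))
        }
        // Append enough pad Characters to reach the requested length.
        return self + String(repeating: pad, count: newLength - current)
    }
}

let str = "aa\u{301}a"
let padded = str.paddingCharacters(toLength: 3)   // unchanged: already 3 Characters
let wider = str.paddingCharacters(toLength: 5)    // two spaces appended
print(padded.count, wider.count)
```

Note that count walks the whole string, so this is linear in the string's length, which seems fine for typical padding use.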

Oof, now I wonder if it really is Unicode scalars or if it's UTF-16 code units.

I think it's UTF-16. I tried some emoji strings where unicodeScalars.count and utf16.count differ, and padding(...) destroys the emoji.

It is UTF-16 code units (i.e. the unichars of NSString):

let s = "🏁"
print(Array(s.unicodeScalars)) // ["\u{0001F3C1}"]
print(Array(s.utf16)) // [55356, 57281]

let p = s.padding(toLength: 3, withPad: " ", startingAt: 0)
print(Array(p.unicodeScalars)) // ["\u{0001F3C1}", " "]
print(Array(p.utf16)) // [55356, 57281, 32]

Good analysis! This looks like a good argument for treating Strings as extended grapheme clusters (most of the time).

str.padding(toLength: 3, withPad: " ", startingAt: 0)

is still broken for Unicode strings. Is there anything I can do to help improve this?