How to convert String to byte array?

I would like to take a String:

let myStr = "This is a string"

And take the first n bytes from it:

let firstNBytes = ...

I'm not even sure which data type holds bytes in Swift. Data? UInt8?

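import Foundation // Data comes from Foundation

let myStr = "This is a string"
let n = 5

// .utf8 exposes the string's UTF-8 encoding; Data stores the raw bytes.
let firstNBytes = Data(myStr.utf8.prefix(n))

// Note: prefix(n) returns fewer than n bytes if the string is shorter.
let tooShort = "..."
let bytes = Data(tooShort.utf8.prefix(n))
assert(bytes.count < n)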
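To answer the "Data? UInt8?" part: a single byte is a UInt8, and a run of bytes is usually either Data (Foundation) or [UInt8] (standard library only). Here is a minimal sketch of the [UInt8] variant, assuming you don't need Foundation. The helper name firstBytes(of:count:) is hypothetical, not an existing API.

```swift
// Hypothetical helper: returns up to `n` bytes of the string's UTF-8 encoding.
func firstBytes(of string: String, count n: Int) -> [UInt8] {
    // prefix(_:) stops early if the string has fewer than n bytes.
    return Array(string.utf8.prefix(n))
}

let bytes5 = firstBytes(of: "This is a string", count: 5)
print(bytes5)  // [84, 104, 105, 115, 32]

// Cutting at an arbitrary byte count can split a multi-byte character:
// "ï" (U+00EF) is the two bytes C3 AF, and only C3 survives here.
let cut = firstBytes(of: "na\u{ef}ve", count: 3)
// String(decoding:as:) replaces the incomplete sequence with U+FFFD.
let repaired = String(decoding: cut, as: UTF8.self)
print(repaired == "na\u{FFFD}")  // true
```

So if you need the bytes to turn back into text later, choose a cut point on a character boundary rather than a fixed byte count.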

Wow, that's very elegant, thanks!
Even after reading the docs, I'm still not sure what .utf8 does. The docs say "a utf8 encoding of self".
So... are those bytes? Is the string not already utf8?
If I print(string.utf8) it just shows the original string.

Is the string not already utf8?

No.

Swift models a string as a collection of Character values, where each value represents an extended grapheme cluster. This makes it easier to do the right thing when it comes to Unicode. Consider this:

import Foundation // needed for String(format:)

let s1 = "naive"
let s2 = "na\u{ef}ve"   // U+00EF LATIN SMALL LETTER I WITH DIAERESIS
let s3 = "nai\u{308}ve" // U+0308 COMBINING DIAERESIS
print(s1)   // naive
print(s2)   // naïve
print(s3)   // naïve
print(s1.utf8.map { String(format: "%02x", $0) }.joined() ) // 6e61697665
print(s2.utf8.map { String(format: "%02x", $0) }.joined() ) // 6e61c3af7665
print(s3.utf8.map { String(format: "%02x", $0) }.joined() ) // 6e6169cc887665

The .utf8 property returns a UTF8View, that is, a ‘view’ of the string as UTF-8. The string may or may not be stored as UTF-8 [1], but .utf8 lets you access it as if it were.
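To make the ‘view’ idea concrete, here is a small standard-library-only sketch (not from the post above) that looks at the same two strings through the Character, Unicode scalar, and UTF-8 views. It also shows why print(string.utf8) displays text: the view's description is the string itself, while converting it to an array exposes the bytes.

```swift
let s2 = "na\u{ef}ve"   // precomposed ï (U+00EF)
let s3 = "nai\u{308}ve" // i followed by U+0308 COMBINING DIAERESIS

// Each view counts a different unit of the same text.
print(s2.count, s2.unicodeScalars.count, s2.utf8.count)  // 5 5 6
print(s3.count, s3.unicodeScalars.count, s3.utf8.count)  // 5 6 7

// String equality uses canonical equivalence, so these compare equal...
print(s2 == s3)                          // true
// ...even though their UTF-8 bytes differ.
print(Array(s2.utf8) == Array(s3.utf8))  // false

// Printing the view shows text; an array of its elements shows the bytes.
print(s2.utf8)         // naïve
print(Array(s2.utf8))  // [110, 97, 195, 175, 118, 101]
```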

Share and Enjoy

Quinn “The Eskimo!” @ DTS @ Apple

[1] For the gory details here, see the UTF-8 String blog post.


Note that, depending on your use case, you may not need to copy the UTF-8 bytes into another type such as Array or Data at all. You can work directly with the string's UTF-8 view.
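A minimal sketch of working straight off the view, assuming the goal is something like finding the first word or counting a particular byte (both tasks are illustrative, not from the thread). Because UTF8View's index type is String.Index, an index found in the byte view can slice the original String with no copying.

```swift
let myStr = "This is a string"
let space = UInt8(ascii: " ")

// Search the UTF-8 view directly; nothing is copied into Array or Data.
// UTF8View.Index is String.Index, so it can slice the String itself.
let firstWord = myStr[..<(myStr.utf8.firstIndex(of: space) ?? myStr.endIndex)]
print(firstWord)  // This

// Count matching bytes by iterating the view, again without an intermediate array.
let spaceCount = myStr.utf8.reduce(0) { count, byte in
    byte == space ? count + 1 : count
}
print(spaceCount)  // 3
```

Searching for an ASCII byte is safe here because an ASCII byte never appears inside a multi-byte UTF-8 sequence.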


Is there any way to ask String to convert between the two forms?

let s2 = "na\u{ef}ve"   // U+00EF LATIN SMALL LETTER I WITH DIAERESIS
s2.convert-all-diacritical-to-separate-combine-form

let s3 = "nai\u{308}ve" // U+0308 COMBINING DIAERESIS
s3.convert-all-combine-character-to-single-diacritical-letter

No, not currently.

let naïve = "naïve"
// ↑ I do not even know which it will be after passing through...
//   • the keyboard
//   • the browser text field
//   • the Discourse forum software
//   • the browser communication stream
//   • the forum server
//   • and all the way back again.

import Foundation

let logical = naïve.decomposedStringWithCanonicalMapping
assert(logical.unicodeScalars.count == 6)

let compressed = naïve.precomposedStringWithCanonicalMapping
assert(compressed.unicodeScalars.count == 5)
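This matters when you go back to bytes: if you hash, compare, or store UTF-8, two strings that Swift's == treats as equal can still produce different bytes. A sketch, assuming Foundation is available, that normalizes both forms before taking .utf8. Choosing NFD (decomposedStringWithCanonicalMapping) is arbitrary here; NFC via precomposedStringWithCanonicalMapping works the same way as long as both sides use the same form.

```swift
import Foundation

let precomposed = "na\u{ef}ve"  // U+00EF
let decomposed = "nai\u{308}ve" // i + U+0308

// Raw UTF-8 differs, even though the strings compare equal with ==.
print(Array(precomposed.utf8) == Array(decomposed.utf8))  // false

// Normalize both to NFD before extracting bytes.
let nfdA = Array(precomposed.decomposedStringWithCanonicalMapping.utf8)
let nfdB = Array(decomposed.decomposedStringWithCanonicalMapping.utf8)
print(nfdA == nfdB)  // true
print(nfdA.map { String(format: "%02x", $0) }.joined())  // 6e6169cc887665
```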

Ah. Good point: with the Foundation NSString APIs you can!


with the Foundation NSString APIs you can!

Depending on the API, this might have the (possibly unexpected or undesirable) result of converting your string into a bridged NSString, even if it started life as a native Swift string. That could change the underlying storage to UTF-16. In theory this would be transparent, except perhaps for performance.
