I’m working with a C API that represents strings as UTF-8 data tagged with a length but **without a trailing NUL byte**. In other words, its string type is basically a tuple {const char*, size_t}. I need to convert this representation to and from Swift 4 strings.
This needs to be efficient, as these calls will occur in some areas of my project that are known to be performance-critical. (Equivalent conversions in my Obj-C code have already shown up as hot-spots and been carefully optimized.)
For String-to-UTF-8, I’m using String.withCString():
    _ = str.withCString { bytes in c_function(bytes, strlen(bytes)) }
An alternative is:
    let bytes = [UInt8](str.utf8)
    c_function(bytes, bytes.count)
Any idea which of these is more optimal? The former has to call strlen, but I suspect the latter may incur more heap allocation.
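For what it's worth, a third option may avoid both the strlen call and the temporary array. This is only a sketch, and it assumes a toolchain of Swift 5.1 or later (String.withUTF8 does not exist in Swift 4). The `c_function` defined here is a hypothetical stand-in for the real C function, and `callWithUTF8` is a made-up helper name.

```swift
// Hypothetical stand-in for the C API; the real function would be imported
// from the C header and take (const char*, size_t).
func c_function(_ bytes: UnsafePointer<CChar>?, _ length: Int) -> Int {
    return length
}

// Assumes Swift 5.1 or later, where String.withUTF8 is available.
func callWithUTF8(_ string: String) -> Int {
    // withUTF8 is mutating: it may first move the string into contiguous
    // UTF-8 storage, so it needs a local variable to work on.
    var copy = string
    return copy.withUTF8 { buffer in
        // The buffer holds UInt8; rebind it to CChar to match const char*.
        buffer.withMemoryRebound(to: CChar.self) { chars in
            c_function(chars.baseAddress, chars.count)
        }
    }
}
```

The buffer count is the UTF-8 length in bytes, with no terminating NUL included, which is the shape a {pointer, length} API expects. Whether this is actually faster than withCString plus strlen would still need measuring on realistic strings.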
For UTF-8-to-String I use this, where `stringPointer` is an UnsafeRawPointer and `stringLen` is an Int:
    let data = Data(bytes: stringPointer, count: stringLen)
    return String(data: data, encoding: String.Encoding.utf8)
I’m unhappy about this because it incurs both heap allocation and copying the string bytes. But Data doesn’t seem to have the “noCopy” options that NSData does. Any way to pass the bytes directly to String without an intermediate copy?
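One way to drop the intermediate Data might be String(decoding:as:), which reads straight from a buffer pointer. A minimal sketch follows; `makeString` is a hypothetical helper name. Two caveats: as far as I know a String still copies the bytes into its own storage, so this removes one copy rather than all of them, and unlike String(data:encoding:) it never returns nil, because invalid UTF-8 is replaced with U+FFFD.

```swift
// Hypothetical helper: builds a String from a pointer and a byte count,
// without requiring a trailing NUL byte.
func makeString(_ stringPointer: UnsafeRawPointer?, _ stringLen: Int) -> String {
    guard let base = stringPointer, stringLen > 0 else { return "" }
    // View the raw memory as UInt8 code units without copying it.
    let bytes = UnsafeBufferPointer(
        start: base.assumingMemoryBound(to: UInt8.self),
        count: stringLen)
    // Decodes (and copies) the bytes into the String's own storage;
    // invalid sequences are repaired with U+FFFD instead of failing.
    return String(decoding: bytes, as: UTF8.self)
}
```

If rejecting invalid input matters, the original Data-based version (or a separate validation pass) would still be needed.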
—Jens
PS: I’m aware this is an FAQ, but I’ve already put in time searching. Most of the hits are obsolete because the damn String API keeps changing, or else they assume NUL-terminated C strings; and the remainder don’t consider performance.
Why not just profile it? Set up a loop of 100,000 or so with each method and time it.
Ryan
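A timing loop along those lines could look roughly like this. It is a sketch only: `measure` is a hypothetical helper, the sample string is made up, and wall-clock timing with Date is crude compared with Instruments. Each iteration returns a byte count that is summed into a checksum, so the work has a visible result.

```swift
import Foundation

// Hypothetical helper: runs `body` repeatedly, prints the elapsed time,
// and returns the sum of the results so the work is actually used.
func measure(_ label: String, iterations: Int = 100_000, _ body: () -> Int) -> Int {
    var checksum = 0
    let start = Date()
    for _ in 0..<iterations {
        checksum &+= body()
    }
    let elapsed = Date().timeIntervalSince(start)
    print("\(label): \(elapsed) s, checksum \(checksum)")
    return checksum
}

let sample = "héllo wörld"  // made-up input; real strings may behave differently

let viaCString = measure("withCString + strlen") {
    sample.withCString { strlen($0) }
}
let viaArray = measure("[UInt8](utf8)") {
    let bytes = [UInt8](sample.utf8)
    return bytes.count
}
```

The numbers only mean something in an optimized build, and ideally with strings that resemble the real workload in length and in the proportion of non-ASCII content.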
_______________________________________________
swift-users mailing list
swift-users@swift.org https://lists.swift.org/mailman/listinfo/swift-users
There are lots of different ways to achieve your two goals and I wouldn’t even start optimising this without a realistic model of what your strings look like in practice, and a performance test based on that model.
Share and Enjoy
On 3 Nov 2017, at 19:42, Jens Alfke via swift-users <swift-users@swift.org> wrote:
> Any way to pass the bytes directly to String without an intermediate copy?
Doesn’t the compiler like to optimize the loop out of the benchmarking code? I’ve always had a hard time writing benchmarks in Swift.
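One common workaround is to route inputs and results through functions the optimizer is told not to inline, so it cannot easily prove the work is unused. The sketch below assumes that approach; `identity`, `blackHole` and `runLoop` are illustrative names, and @inline(never) is a strong hint rather than a guarantee, so it is still worth checking that the timings scale with the iteration count.

```swift
// Returns its argument unchanged; not inlined, which is intended to stop
// the optimizer from treating the input as a compile-time constant.
@inline(never)
func identity<T>(_ value: T) -> T {
    return value
}

// Accepts a value and does nothing with it; not inlined, which is intended
// to keep the computation that produced the value from being removed.
@inline(never)
func blackHole<T>(_ value: T) {}

// Runs the conversion under test and reports how many iterations completed.
func runLoop(iterations: Int) -> Int {
    var completed = 0
    for _ in 0..<iterations {
        let input = identity("héllo")
        blackHole([UInt8](input.utf8))
        completed += 1
    }
    return completed
}
```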
On Fri, Nov 3, 2017 at 7:10 PM, Ryan Walklin via swift-users <swift-users@swift.org> wrote:
> Why not just profile it? Set up a loop of 100,000 or so with each method and time it.
>
> Ryan