Extension write() to OutputStream

PatrickA · December 14, 2020, 8:28pm

I'm attempting to write a small extension to OutputStream in Swift 5. I'm unsure of myself, especially dealing with UnsafeBytes, rawBufferPointer and such.

So I'm reaching out here for some constructive criticism to this extension in the hope of better understanding and improving the code. Could the code mishandle the buffer, could self.write() return 0 and loop? Is there a better way to write to an outputStream (potentially by increments)

extension OutputStream {
  /**
   Write `String` to `OutputStream`

   - parameter string:                The `String` to write.
   - parameter encoding:              The `String.Encoding` to use when writing the string. This will default to `.utf8`.
   - parameter allowLossyConversion:  Whether to permit lossy conversion when writing the string. Defaults to `false`.

    - Returns:                         Return total number of bytes written upon success. Return `-1` upon failure.
   */
  
  func write(_ string: String, encoding: String.Encoding = String.Encoding.utf8, allowLossyConversion: Bool = true) -> Int {
    if let data = string.data(using: encoding, allowLossyConversion: allowLossyConversion) {
      var remainingBytes = data.count;
      var totalBytesWritten = 0;
      while (remainingBytes > 0) {
        let bytesWritten = data.withUnsafeBytes({ (rawBufferPointer : UnsafeRawBufferPointer) -> Int in
          let bufferPointer = rawBufferPointer.bindMemory(to: UInt8.self)
          return self.write(bufferPointer.baseAddress! + totalBytesWritten, maxLength: remainingBytes)
        })
        guard bytesWritten >= 0 else { return -1 }
        totalBytesWritten += bytesWritten
        remainingBytes -= bytesWritten
      }
      
      return totalBytesWritten
    }
    return -1
  } // end write
} // end extension

eskimo · December 15, 2020, 10:23am

There’s a bunch of things I’d do differently here but I want to start by clarifying your semantics. Your method returns an Int where:

-1 indicating an error
The only other possible value being the size of the data written

The first result is a quite weird: Usually when I wrap write(_:maxLength:) I have the method throw if it gets an error. In your current model the client has to use the streamError property to find out what went wrong, and that won’t work in all cases [1].

The second result is also a bit weird because you loop until you’ve written all the bytes. Are you expecting the client to need to know how many bytes were written after the string-to-data conversion.

Personally, I’d split this into two:

First create a method that writes a Data value. This can throw but it has no result because it always either writes all the data or fails.
Next create a wrapper that does the string-to-data conversion and then calls the above-mentioned method. You may want this to return an Int saying how many bytes were written, but you could then make that result discardable for clients who don’t care.

Share and Enjoy

Quinn “The Eskimo!” @ DTS @ Apple

[1] If the string-to-data conversion fails you return -1 without ever having written to the stream and thus streamError won’t be valid.

PatrickA · December 16, 2020, 12:50am

Thank you Eskimo for your much appreciated comments.

In another version of that code, I was throwing an error if the string-to-data conversion failed or the (byte) write operation.failed. I only returned actually written bytes when no error occured. I decided after all to mimick the behaviour of outputStream.write() which returns -1 when failing, but I agree that this is not a good idea since -1 could also indicate a failure to convert and the user would not easily be able to distinguish the two causes of failures (conversion error/write error).

I'm interested in any other comment you may have on the code. Any simpler way to do this?

Why doesn't it seem that String.write(to:) is doing similar things (allow to append, allow lossy conversion for all encodings), or maybe it already does so?

Lantua · December 16, 2020, 1:47am

When dealing with UnsafeBufferPointer, I find it Swiftier to navigate it using Collection APIs rather than pointer arithmetic.

To nitpick, the first if could be guard let data = ..., and you don't need semicolons.

Now, if I include eskimo's suggestion, and split the function:

extension OutputStream {
  func write(_ data: Data) -> Int {
    data.withUnsafeBytes { rawBuffer in
      var buffer = rawBuffer.bindMemory(to: UInt8.self)
      let byteCount = buffer.count // Gotta save this before mutating `buffer`
      while !buffer.isEmpty {
        let written = self.write(buffer.baseAddress!, maxLength: buffer.count)
        guard written >= 0 else {
          return -1
        }
        buffer = .init(rebasing: buffer.dropFirst(written))
      }

      return rawBuffer.count
    }
  }
}

Then you can call it like this:

stream.write(string.data(using: .utf8))

Note still that if written is 0, you'll return -1 instead of 0 but streamError won't be valid in that case. You may want to consider returning written instead, then the "error" would match the write API.

eskimo · December 16, 2020, 9:39am

I'm interested in any other comment you may have on the code.

OK, here we go…

I’d use guard let rather than if let at the start. This let’s you get the string conversion out of the way before diving into the unsafe stuff.

Like @Lantua I tend to lean on collections rather than pointer arithmetic, but lean even harder. Consider this code:

extension OutputStream {

    func write(data: Data) throws {
        var remaining = data[...]
        while !remaining.isEmpty {
            let bytesWritten = remaining.withUnsafeBytes { buf in
                // The force unwrap is safe because we know that `remaining` is
                // not empty. The `assumingMemoryBound(to:)` is there just to
                // make Swift’s type checker happy. This would be unnecessary if
                // `write(_:maxLength:)` were (as it should be IMO) declared
                // using `const void *` rather than `const uint8_t *`.
                self.write(
                    buf.baseAddress!.assumingMemoryBound(to: UInt8.self),
                    maxLength: buf.count
                )
            }
            guard bytesWritten >= 0 else {
                // … if -1, throw `streamError` …
                // … if 0, well, that’s a complex question …
                fatalError()
            }
            remaining = remaining.dropFirst(bytesWritten)
        }
    }
}

Share and Enjoy

Quinn “The Eskimo!” @ DTS @ Apple

Lantua · December 16, 2020, 12:15pm

It's interesting that you put the loop outside the withUnsafe block. I was thinking that if the Data is multi-region, we'd have multiple unnecessary copies. Wouldn't that be the case?

eskimo · December 17, 2020, 9:27am

Wouldn't that be the case?

I think you’re correct that the performance would be suboptimal with discontiguous data. Honestly, I was aiming for simplicity rather than performance.

Having said that, my experience is that discontiguous data is pretty darn rare in practice.

Share and Enjoy

Quinn “The Eskimo!” @ DTS @ Apple

PatrickA · December 18, 2020, 5:44am

Thank you Lantua and eskimo for your constructive comments.