Thoughts on the current HTTP API

Hi,

I just had a brief look at the HTTP APIs [1]. Some thoughts (I realise some are different to what I initially proposed :wink:):

- why do we store `original` here? [2]

    public struct HTTPHeaders {
        var original: [(name: Name, value: String)]?
        var storage: [Name: [String]] {
            didSet { original = nil }
        }

- I don't think we should name the method to write the HTTP response head `writeHeader`, that's too confusing [3], it should be `writeResponseHead` maybe?

- why do we have a `struct HTTPResponse` [4] and then we don't use it to write the response? I think the `writeResponseHead` (currently named `writeHeader`) should take a `HTTPResponseHead` at its sole argument

- I think we should rename `HTTPRequest` and `HTTPResponse` to `HTTPRequestHead` and `HTTPResponseHead` as they're not full requests/responses but just the head (body is not included)

- `func writeBody(_ data: UnsafeHTTPResponseBody, ...) ` [5]: where did this come from?
  `UnsafeHTTPResponseBody` is an existential (a protocol-typed variable). That means with this type signature, you'll do an allocation for every call, that's pretty bad. A better option would be
   `func writeBody<Body: UnsafeHTTPResponseBody>(_ data: Body, ...)` which would get rid of the existential and thus make it faster. But I think this should really

   `func writeBody<Bytes: Collection>(_ data: Bytes, ...) where Collection.Element == UInt8`

   that way you
     * don't have the existential
     * it works directly (no conversion necessary) with `[UInt8]`, `Data`, `DispatchData`, `"someString".utf8`, `UnsafeRawBufferPointer`, `UnsafeBufferPointer<UInt8>` and all sorts of things

- we should decide and document the flushing behaviour. Probably it's good enough if we flush after each `write{ReponseHead, Trailer, Body}(...)` call. Some applications would get more performance if there'd be an explicit `flush()` call but then we need to make that a requirement which would make the API even less appealing to use directly. What do people think?

- regarding the PoC implementation: we really need an asynchronous PoC implementation and see if our APIs make sense. I don't quite see the value of the synchronous PoC. I'd recommend to do that on top of `DispatchIO` (without backpressure quite straightforward) or `DispatchSources` (bit more work). Obviously libevent, libuv, ... are also options but then there's always a dependency on quite a bit of C code that doesn't come with Swift

- given Swift 4's new String APIs, we should not depend on `Foundation` I believe. There's no need of going through `Data` anymore as `String` now has a new constructor: `String(decoding: bytes, as: UTF8.self)`:

    let bytes = "some string".utf8 // utf8 encoding (previously `"some string".data(using: .utf8)` from Foundation)
    let asStringAgain = String(decoding: bytes, as: UTF8.self) // utf8 decoding (previously `String(bytes: ..., encoding: .utf8)` from Foundation)

  or
    
     let bytes = [240, 159, 145, 138]
     let firstEmoji = String(decoding: bytes, as: UTF8.self)

  which works on all Collections with UInt8 elements, ie. Data, DispatchData, [UInt8], UTF8View, ...

- I think it'd make sense to clearly split (two different top-level directories) what's the API we propose and what's the PoC implementation. Maybe even two different repositories?

-- Johannes
PS: Please let me know if you want me to expand on the overhead of existential types in APIs. It seems like that's often missed in Swift and often it probably doesn't matter all that much. But in APIs that are meant for performant stuff it does.

[1]: http/Sources/HTTP at develop · swift-server/http · GitHub
[2]: http/HTTPHeaders.swift at develop · swift-server/http · GitHub
[3]: http/HTTPResponse.swift at develop · swift-server/http · GitHub
[4]: http/HTTPResponse.swift at develop · swift-server/http · GitHub
[5]: http/HTTPResponse.swift at develop · swift-server/http · GitHub

Hey,

I just had a brief look at the HTTP APIs [1]. Some thoughts (I realise some are different to what I initially proposed :wink:):

- why do we store `original` here? [2]

   public struct HTTPHeaders {
       var original: [(name: Name, value: String)]?
       var storage: [Name: [String]] {
           didSet { original = nil }
       }

I guess I don’t care too much about this, but I can see two reasons:
a) For proxy like applications, for debugging and other stuff,
   it is nice to preserve the original
b) A dictionary-of-arrays has a higher overhead than a plain array for
   the HTTP header level of complexity.

Probably makes sense to have just one of the two,
and I would suggest the first.

- I don't think we should name the method to write the HTTP response head `writeHeader`, that's too confusing [3], it should be `writeResponseHead` maybe?

+1

- why do we have a `struct HTTPResponse` [4] and then we don't use it to write the response? I think the `writeResponseHead` (currently named `writeHeader`) should take a `HTTPResponseHead` at its sole argument

I think the idea is that you use this method for both, writeResponseHead and for 100-continue writes.
(as mentioned before I’m very much against coupling the two, those should stay distinct methods IMO)

- I think we should rename `HTTPRequest` and `HTTPResponse` to `HTTPRequestHead` and `HTTPResponseHead` as they're not full requests/responses but just the head (body is not included)

+1 (as this was my proposal from the beginning [the discussion around this should not be restarted, it is in the archives])

- `func writeBody(_ data: UnsafeHTTPResponseBody, ...) ` [5]: where did this come from?

This was suggested by someone on GitHub I think. The basic idea is to avoid copying. Say you have a template and you want to get to an IO buffer w/ minimal copying. You wouldn’t want to copy the buffer of the template objects to a Data, only to pass that over to the API.

My opinion is that this really does not belong here. Also: it is poorly implemented. If this is done for performance, it absolutely has to support iovec’s, not just a plain array.
Personally I would just go w/ DispatchData, it is made for this and can preserve discontinuous buffers (iovecs …). (What DispatchData is lacking is getting an iovec out, but maybe we could get such an API?).

`UnsafeHTTPResponseBody` is an existential (a protocol-typed variable). That means with this type signature, you'll do an allocation for every call, that's pretty bad. A better option would be
  `func writeBody<Body: UnsafeHTTPResponseBody>(_ data: Body, …)`

That sounds good to me.

which would get rid of the existential and thus make it faster. But I think this should really

  `func writeBody<Bytes: Collection>(_ data: Bytes, ...) where Collection.Element == UInt8`

I think it may be a good choice for this effort. A problem w/ that is getting back to the buffer, which is what you need for I/O. Collection has no methods for that, right? (withUnsafeBuffer etc is higher level, right?).

And if you do this, you would probably do a `if let data as? DispatchData {}` in the implementation, right?

- we should decide and document the flushing behaviour. Probably it's good enough if we flush after each `write{ReponseHead, Trailer, Body}(...)` call. Some applications would get more performance if there'd be an explicit `flush()` call but then we need to make that a requirement which would make the API even less appealing to use directly. What do people think?

I’m not entirely sure why you think about buffering here, this seems out of scope and the responsibility of a streaming library on top.
In an async implementation I would assume that every write is immediately handed off to the async I/O library (e.g. DispatchChannel.write) and no HTTP-level buffering is involved (the async-IO would be responsible if it would want to do anything extra).
Maybe you could elaborate a little more how you see a `flush` being used.

- regarding the PoC implementation: we really need an asynchronous PoC implementation and see if our APIs make sense. I don't quite see the value of the synchronous PoC. I'd recommend to do that on top of `DispatchIO` (without backpressure quite straightforward) or `DispatchSources` (bit more work).

I agree, and there is plenty of FOSS code available which does an async socket, I don’t even see the use in doing an own from-scratch PoC socket library.
One issue w/ async libs is, that they are harder to integrate w/ TLS libs. But there are solutions for this (I think we found an x-platform one for Noze.io).

For async sockets I have two things to offer, and people can happily steal code from it for this effort, or just look how that works:
a) GitHub - helje5/SwiftSockets: A simple GCD based socket wrapper for Swift.
b) Noze.io: https://github.com/NozeIO/Noze.io/blob/master/Sources/fs/GCDChannelBase.swift

The first uses non-blocking GCD I/O, but it doesn’t use GCD channels (just DispatchSources, the I/O is then done using regular Posix stuff).

The second uses Dispatch channels. Which I think is what you mean by `DispatchIO`. I don’t see issues w/ back-pressures here. It calls a closure when it is done, and that can be passed up.

In general I kinda question the viability of GCD for real server workloads, but IMO it should be fine and the choice for this effort.

- given Swift 4's new String APIs, we should not depend on `Foundation` I believe. There's no need of going through `Data` anymore as `String` now has a new constructor: `String(decoding: bytes, as: UTF8.self)`:

   let bytes = "some string”.utf8

Nice, sounds good to me.

- I think it'd make sense to clearly split (two different top-level directories) what's the API we propose and what's the PoC implementation. Maybe even two different repositories?

I guess keeping them in one is OK. PoC is just a step until the ‘real’ implementation is ready.
(and it is living in separate source dir IIRC)

PS: Please let me know if you want me to expand on the overhead of existential types in APIs. It seems like that's often missed in Swift and often it probably doesn't matter all that much. But in APIs that are meant for performant stuff it does.

Well, I don’t think the latter is really a goal. This is using Strings and stuff. But still a very good advise!

hh

···

On 13. Oct 2017, at 09:34, Johannes Weiß via swift-server-dev <swift-server-dev@swift.org> wrote:

[1]: https://github.com/swift-server/http/tree/develop/Sources/HTTP
[2]: https://github.com/swift-server/http/blob/develop/Sources/HTTP/HTTPHeaders.swift#L11
[3]: https://github.com/swift-server/http/blob/develop/Sources/HTTP/HTTPResponse.swift#L28
[4]: https://github.com/swift-server/http/blob/develop/Sources/HTTP/HTTPResponse.swift#L13
[5]: https://github.com/swift-server/http/blob/develop/Sources/HTTP/HTTPResponse.swift#L39

It is a little off-topic, but since you brought it up in here ;-) Thinking about the existential:

`UnsafeHTTPResponseBody` is an existential (a protocol-typed variable). That means with this type signature, you'll do an allocation for every call, that's pretty bad.

Is that intended or a compiler bug? My assumption for passing something as a protocol would be that

writeBody(_ data: UnsafeHTTPResponseBody)
writeBody(myImp)

Essentially becomes something like

writeBody(_ dataOwner: Any, _ data: vtable<UnsafeHTTPResponseBody>)
writeBody(myImp, myImp.vtableForProtocol<UnsafeHTTPResponseBody>)

You are saying that it is more like this?:

struct existential {
   let dataOwner : Any
   let data : vtable<UnsafeHTTPResponseBody>
}
let e = malloc(existential)
writeBody(e)
free(e)

That sounds a little crazy, doesn’t it? :-)

A better option would be
`func writeBody<Body: UnsafeHTTPResponseBody>(_ data: Body, …)`

But this is only better if the concrete `Body` type lives in the same module like the `writeBody` function, right? (so that WMO optimisation can kick in and inline the instantiation of the generic type?)

In the HTTP module case `Body` would be in a different module and in my understanding would always use some dynamic ‘generics’ dispatcher? Which is slow? Is that really faster than a protocol type?

hh

···

On 13. Oct 2017, at 12:39, Helge Heß via swift-server-dev <swift-server-dev@swift.org> wrote:

Hi,

Hey,

I just had a brief look at the HTTP APIs [1]. Some thoughts (I realise some are different to what I initially proposed :wink:):

- why do we store `original` here? [2]

  public struct HTTPHeaders {
      var original: [(name: Name, value: String)]?
      var storage: [Name: [String]] {
          didSet { original = nil }
      }

I guess I don’t care too much about this, but I can see two reasons:
a) For proxy like applications, for debugging and other stuff,
  it is nice to preserve the original
b) A dictionary-of-arrays has a higher overhead than a plain array for
  the HTTP header level of complexity.

Probably makes sense to have just one of the two,
and I would suggest the first.

that's true. However we'll pay twice the ARC overhead (ie. passing `HTTPHeaders` around will be two retains (for the underlying storages of the array/dict)). Of course we could box the storage internally and implement CoW manually this way

public struct HTTPHeaders {
  private var storage = _HTTPHeaderStorage /* with CoW implemented */
  private class _HTTPHeaderStorage {
    var original: ...
    var storage: ...
  }

  ... mutating func ... {
      if !isKnownUniquelyReferenced(&self.storage) {
          self.storage = self.storage.copy()
      }
      /* perform modification */
  }
}

but actually, ignore me, we can always retrofit this if it turns out to be a perf issue. Lets go with original and storage.

Why is original `Optional` btw?

- I don't think we should name the method to write the HTTP response head `writeHeader`, that's too confusing [3], it should be `writeResponseHead` maybe?

+1

- why do we have a `struct HTTPResponse` [4] and then we don't use it to write the response? I think the `writeResponseHead` (currently named `writeHeader`) should take a `HTTPResponseHead` at its sole argument

I think the idea is that you use this method for both, writeResponseHead and for 100-continue writes.
(as mentioned before I’m very much against coupling the two, those should stay distinct methods IMO)

makes sense! Should we have two methods with clear names and doc?

- I think we should rename `HTTPRequest` and `HTTPResponse` to `HTTPRequestHead` and `HTTPResponseHead` as they're not full requests/responses but just the head (body is not included)

+1 (as this was my proposal from the beginning [the discussion around this should not be restarted, it is in the archives])

- `func writeBody(_ data: UnsafeHTTPResponseBody, ...) ` [5]: where did this come from?

This was suggested by someone on GitHub I think. The basic idea is to avoid copying. Say you have a template and you want to get to an IO buffer w/ minimal copying. You wouldn’t want to copy the buffer of the template objects to a Data, only to pass that over to the API.

My opinion is that this really does not belong here. Also: it is poorly implemented. If this is done for performance, it absolutely has to support iovec’s, not just a plain array.
Personally I would just go w/ DispatchData, it is made for this and can preserve discontinuous buffers (iovecs …). (What DispatchData is lacking is getting an iovec out, but maybe we could get such an API?).

agreed, we have DispatchData and the collection (both methods). Sometimes you can implement a collection without a buffer and produce the actual bytes on the fly (like String.utf8 with UTF8View). But there are cases when you already actually have a straight up memory buffer. DispatchData then serves those needs.

`UnsafeHTTPResponseBody` is an existential (a protocol-typed variable). That means with this type signature, you'll do an allocation for every call, that's pretty bad. A better option would be
`func writeBody<Body: UnsafeHTTPResponseBody>(_ data: Body, …)`

That sounds good to me.

which would get rid of the existential and thus make it faster. But I think this should really

`func writeBody<Bytes: Collection>(_ data: Bytes, ...) where Collection.Element == UInt8`

I think it may be a good choice for this effort. A problem w/ that is getting back to the buffer, which is what you need for I/O. Collection has no methods for that, right? (withUnsafeBuffer etc is higher level, right?).

indeed. The collection doesn't necessarily exist in (contiguous) memory (yet).

And if you do this, you would probably do a `if let data as? DispatchData {}` in the implementation, right?

- we should decide and document the flushing behaviour. Probably it's good enough if we flush after each `write{ReponseHead, Trailer, Body}(...)` call. Some applications would get more performance if there'd be an explicit `flush()` call but then we need to make that a requirement which would make the API even less appealing to use directly. What do people think?

I’m not entirely sure why you think about buffering here, this seems out of scope and the responsibility of a streaming library on top.
In an async implementation I would assume that every write is immediately handed off to the async I/O library (e.g. DispatchChannel.write) and no HTTP-level buffering is involved (the async-IO would be responsible if it would want to do anything extra).
Maybe you could elaborate a little more how you see a `flush` being used.

well, let's say you write an HTTP server that produces its response on the fly as a whole:

GET / HTTP/1.1
Some: headers
Content-Length: 12

Hello World!

Ideally, you'd want that to be a single write(v). But

- writeResponseHead(...) produces

GET / HTTP/1.1
Some: headers
Content-Length: 12

- write Body produces

Hello World!

But if the socket buffers are empty, if you hand it to the async library, it'll issue

  write("GET / ...")
  write("Hello World!")

in two syscalls. It can be preferable to have

  writev(iovec { "GET / ... " }, iovec { "Hello World "})

but it can only really do that if you issue it as two commands to the async library:

  asyncWrite("GET / ...")
  asyncWrite("Hello World!")
  flush()

In other words: I think the async library has basically two choices (the socket is writable):

  1) try handing the kernel the data as soon as it arrives one by one
  2) buffer the data until it gets told to flush and then write everything that's buffered in one writev()

I don't see any performant solution that's neither (1) nor (2) at the moment.

Without an explicit flush() call, the only thing we can implement is (1) which is probably fine.

Does that make some sense?

- regarding the PoC implementation: we really need an asynchronous PoC implementation and see if our APIs make sense. I don't quite see the value of the synchronous PoC. I'd recommend to do that on top of `DispatchIO` (without backpressure quite straightforward) or `DispatchSources` (bit more work).

I agree, and there is plenty of FOSS code available which does an async socket, I don’t even see the use in doing an own from-scratch PoC socket library.
One issue w/ async libs is, that they are harder to integrate w/ TLS libs. But there are solutions for this (I think we found an x-platform one for Noze.io).

For async sockets I have two things to offer, and people can happily steal code from it for this effort, or just look how that works:
a) GitHub - helje5/SwiftSockets: A simple GCD based socket wrapper for Swift.
b) Noze.io: https://github.com/NozeIO/Noze.io/blob/master/Sources/fs/GCDChannelBase.swift

The first uses non-blocking GCD I/O, but it doesn’t use GCD channels (just DispatchSources, the I/O is then done using regular Posix stuff).

The second uses Dispatch channels. Which I think is what you mean by `DispatchIO`. I don’t see issues w/ back-pressures here. It calls a closure when it is done, and that can be passed up.

In general I kinda question the viability of GCD for real server workloads, but IMO it should be fine and the choice for this effort.

- given Swift 4's new String APIs, we should not depend on `Foundation` I believe. There's no need of going through `Data` anymore as `String` now has a new constructor: `String(decoding: bytes, as: UTF8.self)`:

  let bytes = "some string”.utf8

Nice, sounds good to me.

- I think it'd make sense to clearly split (two different top-level directories) what's the API we propose and what's the PoC implementation. Maybe even two different repositories?

I guess keeping them in one is OK. PoC is just a step until the ‘real’ implementation is ready.
(and it is living in separate source dir IIRC)

right, what's the value of PoC in the actual repo then?

PS: Please let me know if you want me to expand on the overhead of existential types in APIs. It seems like that's often missed in Swift and often it probably doesn't matter all that much. But in APIs that are meant for performant stuff it does.

Well, I don’t think the latter is really a goal. This is using Strings and stuff. But still a very good advise!

not happy with Swift's String performance? Or using Strings in places where a byte array could be used?

-- Johannes

···

On 13 Oct 2017, at 11:39 am, Helge Heß via swift-server-dev <swift-server-dev@swift.org> wrote:
On 13. Oct 2017, at 09:34, Johannes Weiß via swift-server-dev <swift-server-dev@swift.org> wrote:

hh

[1]: https://github.com/swift-server/http/tree/develop/Sources/HTTP
[2]: https://github.com/swift-server/http/blob/develop/Sources/HTTP/HTTPHeaders.swift#L11
[3]: https://github.com/swift-server/http/blob/develop/Sources/HTTP/HTTPResponse.swift#L28
[4]: https://github.com/swift-server/http/blob/develop/Sources/HTTP/HTTPResponse.swift#L13
[5]: https://github.com/swift-server/http/blob/develop/Sources/HTTP/HTTPResponse.swift#L39

_______________________________________________
swift-server-dev mailing list
swift-server-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-server-dev

Can I ask for clarification here? Are you looking for something like writeResponseHead and write100Continue? Or something like writeResponse and writeInformationalResponse?

I ask this because a common error made with HTTP is to treat 100-Continue as a lone special case. This is unwise: the specification allows for other informational responses, and the IETF is actively considering specifying other 1XX codes (e.g. 103 Early Hints). While I’m totally sympathetic to the idea that the code should reflect the HTTP framing layer, if it does so you really want three methods:

1. writeInformationalResponseHead (may be called zero or more times until writeResponseHead is called)
2. writeResponseHead (must be called exactly once).
3. writeResonseTrailer (must be called no more than once, after writeResponseHead)

I will note that this puts 101 Switching Protocols in a slightly tricky spot, but it’s possible that this API wants to wash its hands of HTTP upgrade (I’d be very sympathetic to that, HTTP upgrade is no fun) and so could choose to disregard it entirely. Otherwise, unlike 100 Continue it turns out that 101 Switching Protocols *really is* a special snowflake that needs to be considered entirely separately.

···

On 13 Oct 2017, at 07:34, Johannes Weiß via swift-server-dev <swift-server-dev@swift.org> wrote:

- why do we have a `struct HTTPResponse` [4] and then we don't use it to write the response? I think the `writeResponseHead` (currently named `writeHeader`) should take a `HTTPResponseHead` at its sole argument

I think the idea is that you use this method for both, writeResponseHead and for 100-continue writes.
(as mentioned before I’m very much against coupling the two, those should stay distinct methods IMO)

makes sense! Should we have two methods with clear names and doc?

It is a little off-topic, but since you brought it up in here ;-) Thinking about the existential:

`UnsafeHTTPResponseBody` is an existential (a protocol-typed variable). That means with this type signature, you'll do an allocation for every call, that's pretty bad.

Is that intended or a compiler bug? My assumption for passing something as a protocol would be that

writeBody(_ data: UnsafeHTTPResponseBody)
writeBody(myImp)

Essentially becomes something like

writeBody(_ dataOwner: Any, _ data: vtable<UnsafeHTTPResponseBody>)
writeBody(myImp, myImp.vtableForProtocol<UnsafeHTTPResponseBody>)

You are saying that it is more like this?:

struct existential {
  let dataOwner : Any
  let data : vtable<UnsafeHTTPResponseBody>
}
let e = malloc(existential)
writeBody(e)
free(e)

That sounds a little crazy, doesn’t it? :-)

IIRC in Swift 3, that's exactly how it worked. In Swift 4, many of those existential boxes can indeed be put on the stack which is a massive improvement.
That is assuming your actual type is 3 words or less in size. If your object is larger than that, then you'll still get a heap allocation for the actual data.

Also, going through existentials makes many performance improvements impossible. So with the <Bytes: UnsafeHTTPResponseBody> we can safely assume that the specialisation for `[UInt8]` will literally just take Data's underlying pointer out as everything will be inlined. That won't be true for existentials.

I always find it quite hard to estimate how much the difference is in real life.

A better option would be
`func writeBody<Body: UnsafeHTTPResponseBody>(_ data: Body, …)`

But this is only better if the concrete `Body` type lives in the same module like the `writeBody` function, right? (so that WMO optimisation can kick in and inline the instantiation of the generic type?)

yes, unless you put an private API @_specialize(...) in there. However I'd assume that there's a few well known types that will always be used (`[UInt8]`, `Data`, ...) so you'd assume it's specialised for those anyway.

In the HTTP module case `Body` would be in a different module and in my understanding would always use some dynamic ‘generics’ dispatcher? Which is slow? Is that really faster than a protocol type?

I _think_ that if in that module you use it for say `[UInt8]`, then external callers can also benefit from that specialisation.

-- Johannes

···

On 13 Oct 2017, at 1:07 pm, Helge Heß via swift-server-dev <swift-server-dev@swift.org> wrote:
On 13. Oct 2017, at 12:39, Helge Heß via swift-server-dev <swift-server-dev@swift.org> wrote:

hh

_______________________________________________
swift-server-dev mailing list
swift-server-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-server-dev

I just had a brief look at the HTTP APIs [1]. Some thoughts (I realise some are different to what I initially proposed :wink:):

- why do we store `original` here? [2]

public struct HTTPHeaders {
     var original: [(name: Name, value: String)]?
     var storage: [Name: [String]] {
         didSet { original = nil }
     }

I guess I don’t care too much about this, but I can see two reasons:
a) For proxy like applications, for debugging and other stuff,
it is nice to preserve the original
b) A dictionary-of-arrays has a higher overhead than a plain array for
the HTTP header level of complexity.

Probably makes sense to have just one of the two,
and I would suggest the first.

that's true. However we'll pay twice the ARC overhead (ie. passing `HTTPHeaders` around will be two retains (for the underlying storages of the array/dict)).

… well, if I read the code you quoted above right, only one of the two is being used, which also answers your other question:

Why is original `Optional` btw?

You posted it:

    var storage: [Name: [String]] { didSet { original = nil } }

I guess all this doesn’t make much sense to me. I would just store a single array.

- why do we have a `struct HTTPResponse` [4] and then we don't use it to write the response? I think the `writeResponseHead` (currently named `writeHeader`) should take a `HTTPResponseHead` at its sole argument

I think the idea is that you use this method for both, writeResponseHead and for 100-continue writes.
(as mentioned before I’m very much against coupling the two, those should stay distinct methods IMO)

makes sense! Should we have two methods with clear names and doc?

My opinion is yes, but the code ended up with ‘no'. ¯\_(ツ)_/¯

- `func writeBody(_ data: UnsafeHTTPResponseBody, ...) ` [5]: where did this come from?

This was suggested by someone on GitHub I think. The basic idea is to avoid copying. Say you have a template and you want to get to an IO buffer w/ minimal copying. You wouldn’t want to copy the buffer of the template objects to a Data, only to pass that over to the API.

My opinion is that this really does not belong here. Also: it is poorly implemented. If this is done for performance, it absolutely has to support iovec’s, not just a plain array.
Personally I would just go w/ DispatchData, it is made for this and can preserve discontinuous buffers (iovecs …). (What DispatchData is lacking is getting an iovec out, but maybe we could get such an API?).

agreed, we have DispatchData and the collection (both methods). Sometimes you can implement a collection without a buffer and produce the actual bytes on the fly (like String.utf8 with UTF8View).

The important point is that the thing you need in the end is always a buffer, or better iovec. ‘Producing bytes on the fly’ is what you usually want to avoid as much as possible, IMO.
Instead you want direct and stable access to storage buffers backing the objects. (Which Swift unfortunately does not provide in the stdlib.)

But there are cases when you already actually have a straight up memory buffer. DispatchData then serves those needs.

This is not the primary intention. I think the intention is that the conforming objects produce their buffer representation directly into the target buffer (which will be ‘send’).
Instead of your `Producing bytes on the fly`, a `produce a C buffer on the fly`. (I guess mmap’ing a file could be one example).

Don’t know, I personally think it is overkill and I’m good w/ DispatchData. With the exception that I’d like to have API to get an iovec out of a DispatchData stack (instead of just the enumeration method we have now).

BTW: Something which is missing for even a simple web server is some support for `sendfile` to serve static resources.

which would get rid of the existential and thus make it faster. But I think this should really

`func writeBody<Bytes: Collection>(_ data: Bytes, ...) where Collection.Element == UInt8`

I think it may be a good choice for this effort. A problem w/ that is getting back to the buffer, which is what you need for I/O. Collection has no methods for that, right? (withUnsafeBuffer etc is higher level, right?).

indeed. The collection doesn't necessarily exist in (contiguous) memory (yet).

Of course not, but it should have API to produce a contiguous C buffer. If a particular collection *is* backed by contiguous memory, right away, otherwise by generating it.

Or as I said above, even better a `var unsafeBuffer : UnsafeBufferPointer<Element>?`, which would be nil if the collection doesn’t have a stable buffer, and otherwise provide direct access to the buffer (which then could be passed into DispatchData w/ an associated destructor).

But all this is not relevant for this effort :-) W/ the current Swift setup it can’t provide the desired zero-copy behaviours we had with ObjC. ¯\_(ツ)_/¯

So what would

  `func writeBody<Bytes: Collection>(_ data: Bytes, …)`

do? Presumably a malloc(), memcpy(from: data). IMO not great and sub-standard in 2017, but can’t be done much better in Swift v0.4, I guess. (Especially if protocol calls imply a malloc, how crazy is that :palmface:)

I’m not entirely sure why you think about buffering here, this seems out of scope and the responsibility of a streaming library on top.

...

But if the socket buffers are empty, if you hand it to the async library, it'll issue

write("GET / ...")
write("Hello World!")

in two syscalls. It can be preferable to have

writev(iovec { "GET / ... " }, iovec { "Hello World “})

Ah, OK, yes. You want to bundle small bodies with the HTTP header. I agree, that can make some sense.

But introducing explicit buffering for just that case, don’t know. Maybe. I guess +0.5.

Does that make some sense?

Yes.

I guess keeping them in one is OK. PoC is just a step until the ‘real’ implementation is ready.
(and it is living in separate source dir IIRC)

right, what's the value of PoC in the actual repo then?

Have a working demo implementation? To test the API?
Why is it synchronous? I suppose because it is based on the Kitura stuff? ¯\_(ツ)_/¯

not happy with Swift's String performance? Or using Strings in places where a byte array could be used?

We discussed that at length before. I have a set of reasons for that.

Yes, I think throwing a Unicode beast at something which is by definition Latin-1 is just wrong (for performance oriented stuff). Just think what that means wrt to comparing “Content-Length” w/ “content-length” … Even if there are short paths, availability of them would need to be checked first, etc.

What I would do? I have two possible designs in mind, one is zero-copying the input buffers and peek directly into the them for lookup etc. The other one is directly parsing the headers into a HTTP/2 style reuse-table and assign integer ids (at the http_parser.c level). Likely also parsing the values of knows headers (kinda like `enum HTTPHeader { case contentLength(Int), userAgent(String), …`).

Again, I don’t think it matters for this effort. Having API compatibility w/ Swift stdlib/Foundation is more important than actual efficiency. (… a malloc for protocol func calls??? - just too hard to believe ;-)

hh

···

On 13. Oct 2017, at 16:34, Johannes Weiß <johannesweiss@apple.com> wrote:

On 13. Oct 2017, at 09:34, Johannes Weiß via swift-server-dev <swift-server-dev@swift.org> wrote:

- why do we have a `struct HTTPResponse` [4] and then we don't use it to write the response? I think the `writeResponseHead` (currently named `writeHeader`) should take a `HTTPResponseHead` at its sole argument

I think the idea is that you use this method for both, writeResponseHead and for 100-continue writes.
(as mentioned before I’m very much against coupling the two, those should stay distinct methods IMO)

makes sense! Should we have two methods with clear names and doc?

Can I ask for clarification here? Are you looking for something like writeResponseHead and write100Continue? Or something like writeResponse and writeInformationalResponse?

I ask this because a common error made with HTTP is to treat 100-Continue as a lone special case. This is unwise: the specification allows for other informational responses, and the IETF is actively considering specifying other 1XX codes (e.g. 103 Early Hints). While I’m totally sympathetic to the idea that the code should reflect the HTTP framing layer, if it does so you really want three methods:

1. writeInformationalResponseHead (may be called zero or more times until writeResponseHead is called)
2. writeResponseHead (must be called exactly once).
3. writeResonseTrailer (must be called no more than once, after writeResponseHead)

All that makes sense to me.

I will note that this puts 101 Switching Protocols in a slightly tricky spot, but it’s possible that this API wants to wash its hands of HTTP upgrade (I’d be very sympathetic to that, HTTP upgrade is no fun) and so could choose to disregard it entirely. Otherwise, unlike 100 Continue it turns out that 101 Switching Protocols *really is* a special snowflake that needs to be considered entirely separately.

I think it would be kinda embarrassing if the API wouldn’t support WebSocket upgrade?

hh

···

On 13. Oct 2017, at 18:18, Cory Benfield <cbenfield@apple.com> wrote:

On 13 Oct 2017, at 07:34, Johannes Weiß via swift-server-dev <swift-server-dev@swift.org> wrote:

Sure, but that doesn’t mean it needs to be supported at this level of the API: it can be considered the special snowflake it is and be handled by the HTTP server, rather than requiring that the user send the 101 Switching Protocols response themselves.

If that is allowed, then the rules get a bit trickier. Not a lot trickier, but a bit.

Cory

···

On 13 Oct 2017, at 14:08, Helge Heß via swift-server-dev <swift-server-dev@swift.org> wrote:

I will note that this puts 101 Switching Protocols in a slightly tricky spot, but it’s possible that this API wants to wash its hands of HTTP upgrade (I’d be very sympathetic to that, HTTP upgrade is no fun) and so could choose to disregard it entirely. Otherwise, unlike 100 Continue it turns out that 101 Switching Protocols *really is* a special snowflake that needs to be considered entirely separately.

I think it would be kinda embarrassing if the API wouldn’t support WebSocket upgrade?

hh

Hi,

I just had a brief look at the HTTP APIs [1]. Some thoughts (I realise some are different to what I initially proposed :wink:):

- why do we store `original` here? [2]

public struct HTTPHeaders {
    var original: [(name: Name, value: String)]?
    var storage: [Name: [String]] {
        didSet { original = nil }
    }

I guess I don’t care too much about this, but I can see two reasons:
a) For proxy like applications, for debugging and other stuff,
it is nice to preserve the original
b) A dictionary-of-arrays has a higher overhead than a plain array for
the HTTP header level of complexity.

Probably makes sense to have just one of the two,
and I would suggest the first.

that's true. However we'll pay twice the ARC overhead (ie. passing `HTTPHeaders` around will be two retains (for the underlying storages of the array/dict)).

… well, if I read the code you quoted above right, only one of the two is being used, which also answers your other question:

Why is original `Optional` btw?

You posted it:

   var storage: [Name: [String]] { didSet { original = nil } }

I guess all this doesn’t make much sense to me. I would just store a single array.

- why do we have a `struct HTTPResponse` [4] and then we don't use it to write the response? I think the `writeResponseHead` (currently named `writeHeader`) should take a `HTTPResponseHead` at its sole argument

I think the idea is that you use this method for both, writeResponseHead and for 100-continue writes.
(as mentioned before I’m very much against coupling the two, those should stay distinct methods IMO)

makes sense! Should we have two methods with clear names and doc?

My opinion is yes, but the code ended up with ‘no'. ¯\_(ツ)_/¯

- `func writeBody(_ data: UnsafeHTTPResponseBody, ...) ` [5]: where did this come from?

This was suggested by someone on GitHub I think. The basic idea is to avoid copying. Say you have a template and you want to get to an IO buffer w/ minimal copying. You wouldn’t want to copy the buffer of the template objects to a Data, only to pass that over to the API.

My opinion is that this really does not belong here. Also: it is poorly implemented. If this is done for performance, it absolutely has to support iovec’s, not just a plain array.
Personally I would just go w/ DispatchData, it is made for this and can preserve discontinuous buffers (iovecs …). (What DispatchData is lacking is getting an iovec out, but maybe we could get such an API?).

agreed, we have DispatchData and the collection (both methods). Sometimes you can implement a collection without a buffer and produce the actual bytes on the fly (like String.utf8 with UTF8View).

The important point is that the thing you need in the end is always a buffer, or better iovec. ‘Producing bytes on the fly’ is what you usually want to avoid as much as possible, IMO.
Instead you want direct and stable access to storage buffers backing the objects. (Which Swift unfortunately does not provide in the stdlib.)

The main point is when writing Strings. There is no contiguously stored UTF-8 buffer but with "some string".utf8 we get an UTF8View which produces the bytes when needed.

But there are cases when you already actually have a straight up memory buffer. DispatchData then serves those needs.

This is not the primary intention. I think the intention is that the conforming objects produce their buffer representation directly into the target buffer (which will be ‘send’).
Instead of your `Producing bytes on the fly`, a `produce a C buffer on the fly`. (I guess mmap’ing a file could be one example).

we probably need to support both. And I agree that DispatchData is probably good enough for the 'I have a (set) of buffers' usecase

Don’t know, I personally think it is overkill and I’m good w/ DispatchData. With the exception that I’d like to have API to get an iovec out of a DispatchData stack (instead of just the enumeration method we have now).

yeah, a while ago I filed

about it not being possible (in a guaranteed safe way) to use writev from [Data]/[[UInt8]]/DispatchData.

BTW: Something which is missing for even a simple web server is some support for `sendfile` to serve static resources.

which would get rid of the existential and thus make it faster. But I think this should really

`func writeBody<Bytes: Collection>(_ data: Bytes, ...) where Collection.Element == UInt8`

I think it may be a good choice for this effort. A problem w/ that is getting back to the buffer, which is what you need for I/O. Collection has no methods for that, right? (withUnsafeBuffer etc is higher level, right?).

indeed. The collection doesn't necessarily exist in (contiguous) memory (yet).

Of course not, but it should have API to produce a contiguous C buffer. If a particular collection *is* backed by contiguous memory, right away, otherwise by generating it.

Or as I said above, even better a `var unsafeBuffer : UnsafeBufferPointer<Element>?`, which would be nil if the collection doesn’t have a stable buffer, and otherwise provide direct access to the buffer (which then could be passed into DispatchData w/ an associated destructor).

But all this is not relevant for this effort :-) W/ the current Swift setup it can’t provide the desired zero-copy behaviours we had with ObjC. ¯\_(ツ)_/¯

So what would

`func writeBody<Bytes: Collection>(_ data: Bytes, …)`

do? Presumably a malloc(), memcpy(from: data).

well, that needs to happen somewhere. You can do it before passing it to the library or within.

IMO not great and sub-standard in 2017, but can’t be done much better in Swift v0.4, I guess. (Especially if protocol calls imply a malloc, how crazy is that :palmface:)

(only if the actual implementation's type is > 3 words (in Swift 4).)

I’m not entirely sure why you think about buffering here, this seems out of scope and the responsibility of a streaming library on top.

...

But if the socket buffers are empty, if you hand it to the async library, it'll issue

write("GET / ...")
write("Hello World!")

in two syscalls. It can be preferable to have

writev(iovec { "GET / ... " }, iovec { "Hello World “})

Ah, OK, yes. You want to bundle small bodies with the HTTP header. I agree, that can make some sense.

But introducing explicit buffering for just that case, don’t know. Maybe. I guess +0.5.

not saying we have to have it but I thought it might be worth bringing that up.

Does that make some sense?

Yes.

I guess keeping them in one is OK. PoC is just a step until the ‘real’ implementation is ready.
(and it is living in separate source dir IIRC)

right, what's the value of PoC in the actual repo then?

Have a working demo implementation? To test the API?
Why is it synchronous? I suppose because it is based on the Kitura stuff? ¯\_(ツ)_/¯

not happy with Swift's String performance? Or using Strings in places where a byte array could be used?

We discussed that at length before. I have a set of reasons for that.

Yes, I think throwing a Unicode beast at something which is by definition Latin-1 is just wrong (for performance oriented stuff). Just think what that means wrt to comparing “Content-Length” w/ “content-length” … Even if there are short paths, availability of them would need to be checked first, etc.

What I would do? I have two possible designs in mind, one is zero-copying the input buffers and peek directly into the them for lookup etc. The other one is directly parsing the headers into a HTTP/2 style reuse-table and assign integer ids (at the http_parser.c level). Likely also parsing the values of knows headers (kinda like `enum HTTPHeader { case contentLength(Int), userAgent(String), …`).

Again, I don’t think it matters for this effort. Having API compatibility w/ Swift stdlib/Foundation is more important than actual efficiency. (… a malloc for protocol func calls??? - just too hard to believe ;-)

hh

-- Johannes

···

On 13 Oct 2017, at 10:03 pm, Helge Heß via swift-server-dev <swift-server-dev@swift.org> wrote:
On 13. Oct 2017, at 16:34, Johannes Weiß <johannesweiss@apple.com> wrote:

On 13. Oct 2017, at 09:34, Johannes Weiß via swift-server-dev <swift-server-dev@swift.org> wrote:

_______________________________________________
swift-server-dev mailing list
swift-server-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-server-dev

This mail can be kinda skipped and is only for those w/ spare time, it doesn’t add much to the discussion :-)

The important point is that the thing you need in the end is always a buffer, or better iovec. ‘Producing bytes on the fly’ is what you usually want to avoid as much as possible, IMO.
Instead you want direct and stable access to storage buffers backing the objects. (Which Swift unfortunately does not provide in the stdlib.)

The main point is when writing Strings. There is no contiguously stored UTF-8 buffer but with "some string".utf8 we get an UTF8View which produces the bytes when needed.

Well, this is kinda incorrect. Correct is ‘there is not *always*’ a contiguously stored UTF-8 buffer. Quite often (a hell lot of time) there is (should be) one, but Swift hides the fact.
This is essentially my complaint :-)

Don’t know, I personally think it is overkill and I’m good w/ DispatchData. With the exception that I’d like to have API to get an iovec out of a DispatchData stack (instead of just the enumeration method we have now).

yeah, a while ago I filed

[SR-5143] Can't easily drive writev(2) from a collection of Data · Issue #3847 · apple/swift-corelibs-foundation · GitHub
[SR-5144] Can't easily drive writev(2) from a DispatchData · Issue #688 · apple/swift-corelibs-libdispatch · GitHub

about it not being possible (in a guaranteed safe way) to use writev from [Data]/[[UInt8]]/DispatchData.

Nice.

So what would

`func writeBody<Bytes: Collection>(_ data: Bytes, …)`

do? Presumably a malloc(), memcpy(from: data).

well, that needs to happen somewhere. You can do it before passing it to the library or within.

Same like above. It needs to happen somewhere w/ the current Swift stdlib because you can’t get access to underlying buffers, which are usually already there and wouldn’t need to be copied.
But: ¯\_(ツ)_/¯

(Especially if protocol calls imply a malloc, how crazy is that :palmface:)

(only if the actual implementation's type is > 3 words (in Swift 4).)

I’m still a little flabbergasted that this was ever in, but sure, no need to complain about the past :-)

Now if they would fix the malloc when converting between cString and String …

But introducing explicit buffering for just that case, don’t know. Maybe. I guess +0.5.

not saying we have to have it but I thought it might be worth bringing that up.

Sure, thanks, didn’t think of that.

hh

···

On Oct 16, 2017, at 5:41 PM, Johannes Weiß <johannesweiss@apple.com> wrote:

This mail can be kinda skipped and is only for those w/ spare time, it doesn’t add much to the discussion :-)

The important point is that the thing you need in the end is always a buffer, or better iovec. ‘Producing bytes on the fly’ is what you usually want to avoid as much as possible, IMO.
Instead you want direct and stable access to storage buffers backing the objects. (Which Swift unfortunately does not provide in the stdlib.)

The main point is when writing Strings. There is no contiguously stored UTF-8 buffer but with "some string".utf8 we get an UTF8View which produces the bytes when needed.

Well, this is kinda incorrect. Correct is ‘there is not *always*’ a contiguously stored UTF-8 buffer. Quite often (a hell lot of time) there is (should be) one, but Swift hides the fact.
This is essentially my complaint :-)

Don’t know, I personally think it is overkill and I’m good w/ DispatchData. With the exception that I’d like to have API to get an iovec out of a DispatchData stack (instead of just the enumeration method we have now).

yeah, a while ago I filed

[SR-5143] Can't easily drive writev(2) from a collection of Data · Issue #3847 · apple/swift-corelibs-foundation · GitHub
[SR-5144] Can't easily drive writev(2) from a DispatchData · Issue #688 · apple/swift-corelibs-libdispatch · GitHub

about it not being possible (in a guaranteed safe way) to use writev from [Data]/[[UInt8]]/DispatchData.

Nice.

So what would

`func writeBody<Bytes: Collection>(_ data: Bytes, …)`

do? Presumably a malloc(), memcpy(from: data).

well, that needs to happen somewhere. You can do it before passing it to the library or within.

Same like above. It needs to happen somewhere w/ the current Swift stdlib because you can’t get access to underlying buffers, which are usually already there and wouldn’t need to be copied.
But: ¯\_(ツ)_/¯

(Especially if protocol calls imply a malloc, how crazy is that :palmface:)

(only if the actual implementation's type is > 3 words (in Swift 4).)

I’m still a little flabbergasted that this was ever in, but sure, no need to complain about the past :-)

It was described on last years ( 2016 ) WWDC, Session 416 Understanding Swift Performance - WWDC16 - Videos - Apple Developer Section "The Existential Container”

···

On 16. Oct 2017, at 18:01, Helge Heß via swift-server-dev <swift-server-dev@swift.org> wrote:
On Oct 16, 2017, at 5:41 PM, Johannes Weiß <johannesweiss@apple.com> wrote:

Now if they would fix the malloc when converting between cString and String …

But introducing explicit buffering for just that case, don’t know. Maybe. I guess +0.5.

not saying we have to have it but I thought it might be worth bringing that up.

Sure, thanks, didn’t think of that.

hh

_______________________________________________
swift-server-dev mailing list
swift-server-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-server-dev