When interacting with the subscript(canonicalForm:) APIs on the various HTTP header types (1, 2), I struggle to choose between that API and the standard subscript(_:) APIs.
The docstrings suggest that subscript(canonicalForm:) should be preferred:
Retrieves the header values for the given header field in
“canonical form”: that is, splitting them on commas as
extensively as possible such that multiple values received
on the one line are returned as separate entries. Also
respects the fact that Set-Cookie should not be split in this
way.
But when I try using the canonicalizing APIs on real headers like User-Agent, I find that the subscript splits the header values incorrectly, and that this is expected behavior.
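To make the problem concrete, here is a minimal sketch of what splitting "as extensively as possible" on commas does to a typical User-Agent value. The splitOnCommas helper is hypothetical and only approximates the documented behavior; it is not the library's implementation, and it leaves out the Set-Cookie exception.

```swift
import Foundation

// Hypothetical stand-in for comma-based canonicalization. This is NOT the
// library's code, only an approximation of what the docstring describes.
// The Set-Cookie special case is deliberately left out.
func splitOnCommas(_ values: [String]) -> [String] {
    var result: [String] = []
    for value in values {
        for piece in value.split(separator: ",") {
            // Drop the optional whitespace around each list element.
            result.append(String(piece).trimmingCharacters(in: .whitespaces))
        }
    }
    return result
}

// A field with list semantics splits the way you would hope.
let encodings = splitOnCommas(["gzip, deflate", "br"])
print(encodings) // three entries: gzip, deflate, br

// User-Agent is not a list, but its value can still contain a comma.
let userAgent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
let userAgentPieces = splitOnCommas([userAgent])
print(userAgentPieces.count) // 2: the value is cut in the middle of "(KHTML, like Gecko)"
```

The split is purely syntactic, so any field whose grammar allows a bare comma inside a single value gets broken apart this way.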
There isn't a good answer here, sadly, only bad ones. Parsing HTTP header fields sucks, and has typically relied on having bespoke parsers for each field.
However, there is a retrofit specification in progress. This defines rough shapes for a number of field types, which you could use to map those fields onto the existing parsers. Sadly, not all fields map well, and User-Agent is one of them.
This leads to the following decision tree:
1. Does your program know how to parse a given field value? If so:
   a. Is it a structured field? If so, use the regular subscript, join the elements with a comma (,), and pass the joined string to the structured field parser. (Sidebar: we intend to add an API that accepts a collection of header fields directly; someone just needs to resurrect the existing PR that tries this. That would let you skip the join.)
   b. Is it a field with a known different parser? If so, use the regular subscript and parse the elements. The different parser should know how to handle repeated field entries.
2. Your program knows this field has list semantics, but doesn't have a parser for it. In that case, use canonicalForm.
3. Your program knows this field does not have list semantics, but doesn't have a parser for it. In that case, use the regular subscript.
4. Your program doesn't know the semantics of this field. In that case, use the regular subscript.
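The decision tree above can be sketched as a small lookup function. Every name below (FieldKnowledge, LookupStrategy, lookupStrategy(for:), structuredFieldInput(from:)) is hypothetical and exists only for illustration; none of it is part of the HTTP header types under discussion.

```swift
// Hypothetical model of what the program knows about a header field.
enum FieldKnowledge {
    case structuredField        // we have a structured field parser for it
    case bespokeParser          // we have a different, field-specific parser
    case listSemanticsNoParser  // known to be a comma-separated list, no parser
    case notAListNoParser       // known not to be a list, no parser
    case unknown                // semantics not known at all
}

// Hypothetical names for the possible ways of reading the field.
enum LookupStrategy: Equatable {
    case regularThenJoin  // regular subscript, join the lines, run the structured field parser
    case regular          // regular subscript, values used as received
    case canonicalForm    // comma-splitting subscript
}

func lookupStrategy(for knowledge: FieldKnowledge) -> LookupStrategy {
    switch knowledge {
    case .structuredField:
        return .regularThenJoin
    case .bespokeParser, .notAListNoParser, .unknown:
        return .regular
    case .listSemanticsNoParser:
        // The only branch in which comma splitting is the right call.
        return .canonicalForm
    }
}

// For the structured field branch: combine repeated field lines into one
// string before handing it to a structured field parser.
func structuredFieldInput(from fieldLines: [String]) -> String {
    return fieldLines.joined(separator: ", ")
}

print(lookupStrategy(for: .unknown))
print(structuredFieldInput(from: ["a=1", "b=2"]))
```

Laid out this way, it is easy to see that only one of the five situations calls for canonicalForm, while every other branch goes through the regular subscript.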
You should use canonical form only when you are taking this shortcut: you know the field has list semantics, but you have no parser for it. I'd rather hoped that canonicalForm would be more useful than it actually is, but sadly it tends to be a bit of an attractive nuisance. A better long-term strategy would be widespread adoption of structured fields.