The Swift OpenAPI Generator improvement proposal SOAR-0002: Improved naming of content types is now In Review.
The review period will run until 14th August. Please reply to this thread or on the pull request with any feedback.
Here's a direct link for easier reading of the proposal: https://github.com/czechboy0/swift-openapi-generator/blob/hd-soar-0002-content-type-naming/Sources/swift-openapi-generator/Documentation.docc/Proposals/SOAR-0002.md
These specific values were not chosen arbitrarily, instead I wrote a script that collected and processed about 1200 OpenAPI documents from the wild, and aggregated usage statistics. These content types, in this order, were the top used content types from those documents.
Could you share more info about the aggregate statistics? What percentage of all content-types that you discovered did the 7 chosen ones account for? What percentage did the 7th and 8th most used account for?
Having "short" names also feels slightly at odds with the principle to "Faithfully represent the OpenAPI document".
Certainly. Out of the 1192 OpenAPI documents sampled, here is how many documents included each content type at least once. Note that a content type is only counted once per OpenAPI document, so even a doc with 10 occurrences of a content type still counts as 1.

- `application/json`: 1073 (90%)
- `application/x-www-form-urlencoded`: 96 (8%)
- `multipart/form-data`: 76 (6%)
- `text/plain`: 75 (6%)
- `*/*`: 60 (5%)
- `application/xml`: 49 (4%)
- `application/octet-stream`: 39 (3%)

That's where I drew the line; the next content types dropped off quickly, with types like `text/html`, `application/yaml`, `text/csv`, `image/png`, `application/pdf`, and `image/jpeg` all at around 10 each. Then the long tail continued for a total of 492 content types.
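The counting rule described above (each content type counted at most once per document) could be sketched roughly as follows. This is not the script that was actually used; the function name `documentCounts` and the input shape (one array of content type strings per document) are assumptions made for illustration.

```swift
// Hypothetical input: each inner array lists every content type occurrence
// found in one OpenAPI document, duplicates included.
func documentCounts(_ documents: [[String]]) -> [String: Int] {
    var counts: [String: Int] = [:]
    for contentTypes in documents {
        // Converting to a Set collapses repeated occurrences within a
        // single document, so each document contributes at most 1.
        for contentType in Set(contentTypes) {
            counts[contentType, default: 0] += 1
        }
    }
    return counts
}

// Two documents; the first mentions application/json twice.
let sample = [
    ["application/json", "application/json", "text/plain"],
    ["application/json"],
]
let counts = documentCounts(sample)
```

With this sample, `application/json` is counted for two documents and `text/plain` for one, regardless of how often each appears inside a document.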
Let me talk about each of the content types and why they deserve special treatment:

- `application/json`: the most important content type in REST services; at 90%, clearly the most popular one by far
- `application/x-www-form-urlencoded`, `multipart/form-data`, and `application/xml`: structured content types that the OpenAPI specification documents well enough that, while we don't today, we could generate type-safe types for them in the future, just like we do for JSON today
- `text/plain`: commonly used to send unstructured text data, like logs, so it deserves to be deserialized into the native Swift container, `Swift.String`
- `*/*`: also explicitly called out in the OpenAPI specification; we're still figuring out if/how we'd generate special code for it, but it's clearly popular, and its long name is not very nice (see below)
- `application/octet-stream`: also specially called out in the OpenAPI specification as the default raw bytes content type; its long name also isn't super beginner-friendly, as "octet stream" isn't the first term developers use when talking about raw data

That's in contrast to the content types I left under the line, like `text/html`, `image/png`, and `application/pdf`, for which the OpenAPI specification doesn't document any structure, so it's unlikely we'll ever try to introspect them; instead, we'll continue to pass the raw bytes to the adopter's code to handle however they like. So it seemed like a natural point to draw the line, as we only do the work of coming up with short names for 7/492 = ~1.4% of content types, but they still cover the vast majority of use cases.
That's a fair interpretation, but let me compare and contrast the two options we're deciding between here.
Content type | Proposed short name | Long name |
---|---|---|
`application/json` | `json` | `application_sol_json` |
`application/x-www-form-urlencoded` | `form` | `application_sol_x_hyphen_www_hyphen_form_hyphen_urlencoded` |
`multipart/form-data` | `multipart` | `multipart_sol_form_hyphen_data` |
`text/plain` | `text` | `text_sol_plain` |
`*/*` | `any` | `_ast__sol__ast_` |
`application/xml` | `xml` | `application_sol_xml` |
`application/octet-stream` | `binary` | `application_sol_octet_hyphen_stream` |
I think the short names on the left actually represent the intent of the OpenAPI author better than the names on the right. In a world where content types were a closed set, we could come up with a short name for every content type. Since that's not the case, we have to draw a line somewhere between the frequently used content types that deserve pretty names and all other content types, which we stringify using a scheme (recently updated in SOAR-0001) that results in as few conflicts as possible while still being readable, even if not pretty.
If we don't use this split approach, we would either have to write short names for all content types, which we can't (as mentioned, it's an open set), or not write short names for any content types, which is more consistent, but I feel it'd be sacrificing readability and user ergonomics for little benefit, since the extra complexity of the short names is negligible. The split logic seems justified by the fact that 90%+ of adopters will see these identifiers all over their request/response bodies, and have to spell them by hand when sending requests and unwrapping received responses.
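To make the split approach concrete, here is a minimal sketch of a short-name lookup with a long-name fallback. It is not the generator's actual implementation: the function names are hypothetical, the short names are the v1 values from the table, and the fallback only handles the three special characters that appear in the table's long names (`/`, `-`, and `*`).

```swift
// v1 short names, taken from the table above.
let shortNames: [String: String] = [
    "application/json": "json",
    "application/x-www-form-urlencoded": "form",
    "multipart/form-data": "multipart",
    "text/plain": "text",
    "*/*": "any",
    "application/xml": "xml",
    "application/octet-stream": "binary",
]

// Replacement words for the special characters seen in the table's long
// names. Assumption: any other character is left untouched in this sketch.
let specialCharacterNames: [Character: String] = [
    "/": "_sol_",
    "-": "_hyphen_",
    "*": "_ast_",
]

// Hypothetical helper: spells out each special character.
func longName(for contentType: String) -> String {
    var result = ""
    for character in contentType {
        if let replacement = specialCharacterNames[character] {
            result += replacement
        } else {
            result.append(character)
        }
    }
    return result
}

// Hypothetical helper: prefer a short name, fall back to the long name.
func identifier(for contentType: String) -> String {
    shortNames[contentType] ?? longName(for: contentType)
}
```

Under these assumptions, `identifier(for: "application/json")` yields `json`, while a type outside the short list, such as `image/png`, falls back to `image_sol_png`.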
I do wish we could apply a simple rule one way or another, but alas I don't think that'd serve the adopters best.
That said, I'm more than happy to discuss where the line is drawn, and what the exact spelling of the short names should be. I came up with these very unscientifically, by just trying to pick the name I most commonly hear developers use when referring to these content types.
I'm particularly not very happy with `form` and `any`, so would like some ideas on those; the rest I think are okay, but again, feedback on any of them is very welcome.
I'm inclined to agree: these short names lose quite a bit of resolution.
I don't think that anyone would argue that `application_sol_json` is a good name for `application/json`. However, looking at RFC 2045 - Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies, it doesn't appear we need to be as defensive in the name mapping as we are for OpenAPI identifiers, and I would venture that we could probably shoot for something like `application_json` pretty safely. That is, we could replace the slash that separates the type from the subtype with an underscore (cf. `_sol_`).
That said, I think there's some precedent for some shorter names: https://github.com/vapor/core/blob/main/Sources/Core/MediaType.swift.
Looking at the link above, they seemed content with `any` and `form-data`.
My biggest gripe is the use of `multipart` for `multipart/form-data`, which seems to stand out as being the only one "squatting" on a top-level type for its name. What would we call other `multipart/<subtype>` types?
While I'm still not sold on short names, I think that they need to attempt to convey the specificity of the full content-type.
Similarly with `text`, I'd be happier with `plaintext`.
Yeah, those sound better: `formData`, `plainText`, `urlEncodedForm`. That way, none of them squat their top level type. I'll update to those names in a revision of my proposal - thanks!
I'll also investigate getting rid of the `_sol_` part by safing each component of the MIME type separately, and concatenating them with an underscore. Also to come in the next revision.
Interesting – thanks for sharing your data!
I understand your point entirely: providing short names for all media types is impossible. As you also noted, the fallback names are not the most readable.
The concern I have is the transition between the two. While 3% sounds pretty low, out of 1000 OpenAPI documents it's still 30 documents, and each document may use these types multiple times. It's also a bit surprising to me that types I would consider to be common, like `text/html`, aren't covered by the short names.
I think we should:
Sounds good. Here's the continued list with proposed short names; suggestions for better names are welcome:
Content type | Number of occurrences | Proposed short name |
---|---|---|
`text/json` | 21 | None, I think folks are using this by mistake, and mean `application/json`? |
`text/html` | 19 | `html` |
`application/yaml` | 14 | `yaml` |
`text/csv` | 14 | `csv` |
`text/xml` | 14 | None, same as `text/json`? |
`image/png` | 13 | `png` |
`application/pdf` | 11 | `pdf` |
`image/jpeg` | 10 | `jpeg` |
Under 10, the long tail of various custom `application/vnd.*` types starts and quickly approaches a single occurrence each. So I think stopping at 10 occurrences is reasonable.
I'm investigating the idea @beaumont proposed above, where we actually safe the type and subtype separately, and concatenate them with an underscore, so for `foo/bar` we'd end up with `foo_bar` instead of `foo_sol_bar`, which I hope is enough of an improvement to work for the long tail.
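As a rough illustration of that idea, the sketch below splits a content type at the first slash, safes each half on its own, and joins the results with an underscore. The helper names are hypothetical, and the per-component safing shown here (spelling out `-` and `*` as in the v1 long names) is an assumption, not necessarily what the revised proposal does.

```swift
// Hypothetical per-component safing. Assumption: only "-" and "*" are
// spelled out, mirroring the v1 long names; other characters pass through.
func safedComponent(_ component: Substring) -> String {
    var result = ""
    for character in component {
        switch character {
        case "-": result += "_hyphen_"
        case "*": result += "_ast_"
        default: result.append(character)
        }
    }
    return result
}

// Splits into type and subtype at the first "/", safes each part
// separately, then joins them with a single underscore.
func underscoreJoinedName(for contentType: String) -> String {
    contentType
        .split(separator: "/", maxSplits: 1, omittingEmptySubsequences: false)
        .map(safedComponent)
        .joined(separator: "_")
}

let fooBar = underscoreJoinedName(for: "foo/bar")
```

Because the slash is consumed by the split rather than safed, `foo/bar` becomes `foo_bar` with no `_sol_` in the middle.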
Hi everyone,
thanks for the feedback so far. I just pushed v2 of the proposal.
Diff from v1 to v2 is in this commit: SOAR-0002: Improved naming of content types by czechboy0 · Pull Request #170 · apple/swift-openapi-generator · GitHub
It aims to address the two main points of feedback:
You can find the current rendered version of the proposal here, and it contains a versions section describing what changes were made. Also, inline in the proposal, I highlighted what changed from v1 to v2.
One minor naming note:
- `application/x-www-form-urlencoded` maps to `urlEncodedForm`
- `multipart/form-data` maps to `formData`

Should `urlEncodedForm` be `urlEncodedFormData` for consistency with `formData`?
I'm not sure, as the word "data" doesn't appear anywhere in `application/x-www-form-urlencoded`.
Looking around more, I think a lot of people use the term "multipart" and know what it means. I wonder if we could borrow the short name `multipartForm` from Hummingbird for `multipart/form-data`?
That way, we'd have:

- `application/x-www-form-urlencoded` -> `urlEncodedForm`
- `multipart/form-data` -> `multipartForm`

That achieves the consistency that I think you were looking for, as these are just "two kinds of forms", and IMO is closer to the terms used for these content types day to day. WDYT?
Works for me.
Thanks @georgebarnett.
Ok, here's v3 of the proposal, diff from v2: SOAR-0002: Improved naming of content types by czechboy0 · Pull Request #170 · apple/swift-openapi-generator · GitHub
Latest rendered version: https://github.com/czechboy0/swift-openapi-generator/blob/hd-soar-0002-content-type-naming/Sources/swift-openapi-generator/Documentation.docc/Proposals/SOAR-0002.md
Thanks for making the amendments so far @Honza_Dvorsky!
While the idealist in me doesn't love having to pick an arbitrary cutoff for short names, nor a solution that doesn't generalise to all names, the proposal as-is adds value so +1 from me.
The review period has now ended. Feedback broadly fell into two areas:
Feedback has converged and v3 of the proposal is accepted. SOAR-0002 is now Ready for Implementation.
Thanks @georgebarnett.
Ok, the change landed in main now behind the `multipleContentTypes` feature flag.