Certainly. Of the 1192 OpenAPI documents sampled, here is how many included each content type at least once. Note that a content type is counted only once per OpenAPI document, so even a doc with 10 occurrences of a content type still counts as 1.
application/json: 1073 (90%)
application/x-www-form-urlencoded: 96 (8%)
multipart/form-data: 76 (6%)
text/plain: 75 (6%)
*/*: 60 (5%)
application/xml: 49 (4%)
application/octet-stream: 39 (3%)
That's where I drew the line. Beyond it, usage drops off quickly: types like text/html, application/yaml, text/csv, image/png, application/pdf, and image/jpeg each appear in around 10 documents. The long tail then continues, for a total of 492 distinct content types.
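To make the counting rule concrete, here is a minimal sketch of how such a tally could be computed. The `documentCounts` function and the sample input are hypothetical and are not the script used for the numbers above; the only point is that occurrences are deduplicated per document before counting.

```swift
// Hypothetical helper: each inner array lists every content type occurrence
// found in one sampled OpenAPI document.
func documentCounts(_ documents: [[String]]) -> [String: Int] {
    var counts: [String: Int] = [:]
    for occurrences in documents {
        // Deduplicate within a document so repeated occurrences count once.
        for contentType in Set(occurrences) {
            counts[contentType, default: 0] += 1
        }
    }
    return counts
}

// Made-up sample data, for illustration only.
let sample: [[String]] = [
    ["application/json", "application/json", "text/plain"],
    ["application/json"],
    ["multipart/form-data", "text/plain"],
]
let counts = documentCounts(sample)
```

With this sample, application/json is counted for 2 documents even though it occurs 3 times in total.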
Let me go through each of the content types and why it deserves special treatment:
application/json - the most important content type in REST services and, at 90%, by far the most popular one
application/x-www-form-urlencoded, multipart/form-data, and application/xml - structured content types that the OpenAPI specification documents well enough that, while we don't do so today, we could generate type-safe types for them in the future, just like we do for JSON
text/plain - commonly used to send unstructured text data, like logs, so it deserves to be deserialized into the native Swift container: Swift.String
*/* - also explicitly called out in the OpenAPI specification; we're still figuring out if/how we'd generate special code for it, but it's clearly popular, and its long name is not very nice (see below)
application/octet-stream - also specially called out in the OpenAPI specification as the default raw bytes content type; its long name also isn't very beginner-friendly, as "octet stream" is rarely the first term developers use when talking about raw data
That's in contrast to the content types I left below the line, like text/html, image/png, and application/pdf. The OpenAPI specification doesn't document any structure for these, so it's unlikely we'll ever try to introspect them; instead, we'll continue to pass the raw bytes to the adopter's code to handle however they like. So it seemed like a natural point to draw the line: we only do the work of coming up with short names for 7/492 = ~1.4% of content types, but they still cover the vast majority of use cases.
That's a fair interpretation, but let me compare and contrast the two options we're deciding between here.
| Content type | Proposed short name | Long name |
| --- | --- | --- |
| application/json | json | application_sol_json |
| application/x-www-form-urlencoded | form | application_sol_x_hyphen_www_hyphen_form_hyphen_urlencoded |
| multipart/form-data | multipart | multipart_sol_form_hyphen_data |
| text/plain | text | text_sol_plain |
| */* | any | _ast__sol__ast_ |
| application/xml | xml | application_sol_xml |
| application/octet-stream | binary | application_sol_octet_hyphen_stream |
I think the proposed short names actually represent the intent of the OpenAPI author better than the long names. In a world where content types were a closed set, we could come up with a short name for every content type. Since that's not the case, we have to draw a line somewhere between the frequently used content types that deserve pretty names and all other content types, which we stringify using a scheme (recently updated in SOAR-0001) that results in as few conflicts as possible while still being readable, even if not pretty.
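The split can be sketched as a lookup with a fallback. This is only an illustration, not the generator's implementation: the function names are hypothetical, and the fallback models just the three substitutions visible in the table (`/` to `_sol_`, `-` to `_hyphen_`, `*` to `_ast_`), not the full SOAR-0001 scheme. Parameters such as charset are ignored.

```swift
// Short names as proposed in the table.
let shortNames: [String: String] = [
    "application/json": "json",
    "application/x-www-form-urlencoded": "form",
    "multipart/form-data": "multipart",
    "text/plain": "text",
    "*/*": "any",
    "application/xml": "xml",
    "application/octet-stream": "binary",
]

// Assumption: only the substitutions that appear in the table are modeled;
// every other character is passed through unchanged.
func longName(for contentType: String) -> String {
    var result = ""
    for character in contentType {
        switch character {
        case "/": result += "_sol_"
        case "-": result += "_hyphen_"
        case "*": result += "_ast_"
        default: result.append(character)
        }
    }
    return result
}

// Hypothetical entry point: prefer a short name, otherwise fall back to the long name.
func identifier(for contentType: String) -> String {
    shortNames[contentType] ?? longName(for: contentType)
}
```

Under these assumptions, `identifier(for: "text/plain")` yields `text`, while a type below the line such as image/png falls through to `image_sol_png`.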
If we don't use this split approach, we would either have to write short names for all content types, which we can't (as mentioned, it's an open set), or not write short names for any content types. The latter is more consistent, but I feel it would sacrifice readability and user ergonomics for little benefit, since the extra complexity of the short names is negligible. The split logic seems justified by the fact that 90%+ of adopters will see these identifiers all over their request/response bodies, and will have to spell them by hand when sending requests and unwrapping received responses.
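To give a feel for what spelling these identifiers by hand looks like, here is a rough comparison. Both enums are hypothetical stand-ins invented for this illustration and do not reflect the generator's actual output; only the case names are taken from the table.

```swift
// Hypothetical body enum using the proposed short names.
enum BodyWithShortNames {
    case json([String: String])
    case text(String)
    case binary([UInt8])
}

// The same hypothetical enum using the long names.
enum BodyWithLongNames {
    case application_sol_json([String: String])
    case text_sol_plain(String)
    case application_sol_octet_hyphen_stream([UInt8])
}

// Unwrapping a received body with short names.
func describe(_ body: BodyWithShortNames) -> String {
    switch body {
    case .json(let value): return "json with \(value.count) keys"
    case .text(let value): return "text: \(value)"
    case .binary(let bytes): return "binary of \(bytes.count) bytes"
    }
}

// The equivalent code with long names.
func describe(_ body: BodyWithLongNames) -> String {
    switch body {
    case .application_sol_json(let value): return "json with \(value.count) keys"
    case .text_sol_plain(let value): return "text: \(value)"
    case .application_sol_octet_hyphen_stream(let bytes): return "binary of \(bytes.count) bytes"
    }
}
```

The behavior is identical in both versions; the only difference is what the adopter has to read and type at every call site.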
I do wish we could apply a simple rule one way or another, but alas I don't think that'd serve the adopters best.
That said, I'm more than happy to discuss where the line is drawn, and what the exact spelling of the short names should be. I came up with these very unscientifically, by just trying to pick the name I most commonly hear developers use when referring to these content types.
I'm particularly unhappy with form and any, so I would like some ideas on those. The rest I think are okay, but again, feedback on any of them is very welcome.