[Proposal - Ready for Implementation] SOAR-0001 Improved OpenAPI -> Swift name mapping

Honza_Dvorsky · June 29, 2023, 8:35am

The Swift OpenAPI Generator improvement proposal SOAR-0001 Improved OpenAPI -> Swift name mapping is now In Review.

The review period runs until July 10.

Please provide feedback either in this thread on the forums, or on the pull request.

Thank you!

cc @denilchungath @beaumont

FranzBusch · June 29, 2023, 8:49am

I think this a problem worth solving but the proposed solution can still result in name conflicts if someone is literally declaring a name that looks like a substituted name e.g. if my yaml contains a b and a_space_b right?

Honza_Dvorsky · June 29, 2023, 9:10am

Correct, we're just trying to reduce the chance of collisions in the real world, but can't ever fully eliminate it. Let me copy my detailed comment here.

It only leads to a collision if an OpenAPI document contains both the prettified and unprettified version in the same namespace.

Since the original space of allowed characters in OpenAPI is greater than the reduced character space legal in Swift identifiers, and we're mapping from the former to the latter, there will always be room for conflicts. There is no deterministic mapping that produces no conflicts.

We have to balance readability, faithfulness-to-the-OpenAPI-doc, predictability, and generator complexity. We'll want to choose the strategy that overall has the best balance of all of these.

So let's suggest improvements to the mapping here, but we'll be able to find a conflict for every mapping, the question is just how likely it's to happen in real-world OpenAPI docs.

Dante-Broggi · June 29, 2023, 12:28pm

No?
What about the following:
Replace regex (__+[a-zA-Z0-9]) with _\1 and then replace invalid scalars with __{html code}_.
This removes (__[a-zA-Z0-9]) from the output space and then encodes the invalid scalars into that portion of the output space, and so I believe it is fully reversible.

beaumont · June 29, 2023, 2:03pm

I want to take a moment to offer a left-of-field suggestion.

Rather than any automatic mapping to any pretty names, which, as folks have pointed out, will present some opportunity for conflicts, we could instead provide just the hex encoding—which is much less likely to conflict with other handwritten identifiers.

Support for structured preprocessing the OpenAPI document

Then if folks don't like what they get I'm proposing we offer a different solution which has some more general benefits (which I'll explain after).

We could allow for adopters to provide a list of JSON Patches to apply to the OpenAPI document before generation.

JSON Patch is outlined in RFC-6902 and is a good fit for this kind of operation and allows for modifications of keys and values based on their JSON path.

One notable use of this is in Kustomize: a tool extensively used in the Kubernetes ecosystem for managing and composing lots of YAML—so much so they upstreamed the functionality into Kubectl. There's also a more accessible primer at jsonpatch.com.

We could extend the schema for the plugin config file to provide an array of such patches:

# openapi-generator-config.yaml
...
patches:
- op: replace
  path: "/paths//foo/get/operationId"
  value: "getFoo"

- op: move
  from: "/components/headers/my header" 
  path: "/components/headers/MyHeader"

This would transform the following OpenAPI document...

# openapi.yaml
paths:
  /foo:
    get:
      operationId: get+foo
components:
  headers:
    my header:
      schema:
        type: string

...into...

# openapi.yaml
paths:
  /foo:
    get:
      operationId: getFoo
components:
  headers:
    MyHeader
      schema:
        type: string

Our tool could take care of rewriting any references to moved values.

Additional benefits

I suggest that it's better to maintain the changes to the OpenAPI document in this manner because the input to the pipeline is still the unaltered document as published by the service maintainer.

It also allows for the correction or removal of problematic parts of the spec, using the remove JSON Patch operation.

Patching out parts of the spec can also be useful while we are still filling out support for OpenAPI features, as it allows adopters to make use of part of an OpenAPI document, until those features are supported by Swift OpenAPI Generator.

denilchungath · June 29, 2023, 2:31pm

The JSON patch approach is quite clever indeed. It moves the responsibility back to the user. So I assume we can remove any sanitization from our side?

AlexisQapa · June 29, 2023, 3:00pm

Neat !
The spec generated by some server I use is missing a couple fields which I don’t need but make the spec be rejected. Having those patches would allow me to fix the spec without intervention from the server maintainers and keep as you said the original spec as is, easy to update.

Honza_Dvorsky · June 29, 2023, 9:00pm

I might be misunderstanding the details of the algorithm you're proposing, but wouldn't it produce the same output for _1 and __1, resulting in a conflict?

Honza_Dvorsky · June 29, 2023, 9:11pm

I think it's a great idea for many use cases, where users want to make changes to the input OpenAPI document in a way that doesn't impact the generated serialization logic. So for doing things like making operarionIds camelCased, improving description, or even patching bugs in cases where the OpenAPI document is out of sync with the implementation, it'd all could serve really well. I think we should break out this idea into a separate GitHub issue and explore it in more detail!

However, I don't think it's a solution for the problem at hand - and that's turning strings legal in an OpenAPI document into strings legal as Swift identifiers.

For example, let's consider that the OpenAPI document contains an object schema with a property called $nake. That's completely legal in OpenAPI, but in Swift we can't generate a property of this name, as it wouldn't compile. So today, we turn it into _nake and with the proposal at hand, I believe it's become _dollar_nake. However, when we're generating the struct's Codable conformance, we still have to use the original string $nake as the coding key, which is used to parse out the property from the JSON payload coming from the server.

If we relied on preprocessing, and the user patched the OpenAPI document for the property to say snake, then the generated code would look for the key snake, not $nake, and fail to deserialize.

So we have to preserve the original name in the OpenAPI document, as that drives the serialization, but also generate a string safe to be a Swift identifier. And since the space of legal Swift identifier characters is smaller than the space of legal OpenAPI property name characters, we'll always have room for conflicts, we just have to balance that our with usability and readability.

Dante-Broggi · June 29, 2023, 9:46pm

No, single underscore _1 is preserved as _1, even though __1 (2 underscores) goes to ___1 (3 underscores), ___1, and (3 underscores) goes to ____1 (4 underscores), etc.

Honza_Dvorsky · June 29, 2023, 10:09pm

Ah gotcha - yeah that makes sense, and will avoid conflicts, as long as we always use a reversible mapping of illegal characters to legal ones.

We'd basically have to restrict a specific sequence of chars that are legal in Swift, but we wouldn't preserve them as-written by the OpenAPI author. Maybe that's a reasonable trade-off to get rid of all conflicts, I just never considered we'd modify completely legal Swift identifiers (only those that need legalizing).

To show an example: if a user writes __A (legal Swift identifier), we don't respect it, instead we turn it into ___A. If we're happy to relax this expectation, we can do something like you suggested.

// Edit: one more different property of the suggested algorithm is that if you run it repeatedly, you'll get a different string every time. Whereas the existing algorithm is idempotent, as it leaves all legal characters unchanged, and turns illegal ones into legal ones. So running it again gives you the same output as the input. Might not be important to preserve, we can refractor the generator a bit to guarantee to never sanitize a string twice, just wanted to make sure we're aware of it.

Honza_Dvorsky · June 29, 2023, 10:59pm

One more thing to keep in mind that a Swift identifier has a few more limitations:

cannot start with a number
cannot match a keyword

Those also need to be made room for in the transformation, today the code we have for these is here: swift-openapi-generator/Sources/_OpenAPIGeneratorCore/Extensions/String.swift at ec7a1c5b37df0fc20fc5033717c09315a4d40fc3 · apple/swift-openapi-generator · GitHub

beaumont · June 30, 2023, 8:44am

Right, I can see how the JSON Patch would work fine for parts of OpenAPI document that are given friendly names (e.g. Operation IDs), but for JSON/YAML keys or values in the OpenAPI document that are used as-is for the wire protocol, it falls down.

On reflection, though, my tweaked proposal leans in the direction of shifting control back to the user for cases that don't work well in Swift and I'd still be keen to defer that, rather than provide a one-size-fits-most approach.

This could be realised either with an adopter-provided map for illegal characters, or for specific identifiers. We'd then use this in conjunction with the (unedited) OpenAPI document to use the original identifiers for coding keys.

If we do opt for a predetermined, non-configurable mapping, then I'd love to find a one-size-fits-all (guarantees no conflicts), rather than a one-size-fits-most, so I'm sympathetic to @Dante-Broggi's suggestion.

denilchungath · June 30, 2023, 9:10am

This is kind of out there, but might be possible.

Building off of JSON patch, we add our custom operations, obviously this would not be compliant with the standard anymore, and edging on writing our own DSL. This can have a new operation swiftName that the user wants. When generating the types, for each path we query the patch document as well if there is a matching path there and if so apply the name.

denilchungath · June 30, 2023, 9:13am

@Dante-Broggi 's algorithm would be adding underscore on a scalar basis or an a group basis?
So would hello world become _h_e_l_l_o__space___w_o_r_l_d or _hello__space___world ?

This also has a slight disadvantage towards readability, but guarantees conflict free names.

Honza_Dvorsky · June 30, 2023, 9:21am

I think the JSON Patch approach is really promising for a bunch of use cases, can you please file an issue and put some of the existing thinking on it there, @beaumont? I think it can make the story nicer around cosmetic changes that Swift adopters might want to make to OpenAPI docs that use naming conventions more optimized for other languages.

In the spirit of helping provide @denilchungath with actionable feedback on this proposal, I'd like us to come back to the main topic - transforming almost arbitrary OpenAPI strings into constrained Swift identifiers. We have to solve that for real-world OpenAPI documents that we've already encountered, and I'm glad this proposal aims to solve that.

I also like @Dante-Broggi's suggestion, and in my opinion, the downside of it transforming some already-legal strings into different Swift identifiers (when two or more underscores are included) is justified by the benefit of having a fully reversible transformation, guaranteed to avoid conflicts.

To that end, a next step here might be for @denilchungath to work with @Dante-Broggi and incorporate the suggestion and flesh it out fully, as the suggestion doesn't include the extra limitations that Swift identifiers also have, in addition to having to legalize individual characters, and those are not matching reserved keywords and not starting with a number. Plus a question remains about characters outside the printable ASCII range, which in the current shape of the proposal would be transformed into their hex code, but Dante's proposal mentions HTML codes, and the two could end up conflicting unless they each reserve their own legal sequence of characters.

If we can get this proposal to suggest the pseudocode for such an algorithm, that's fully reversible and ends up with legal Swift identifiers, I'd be strongly in favor of taking it as a big improvement over the status quo, and considering additional add-ons in subsequent proposals (the JSON Patch idea, stopping to transform non-wire-affecting names, etc), as we want to keep proposals focused.

beaumont · June 30, 2023, 10:12am

In the spirit of helping provide @denilchungath with actionable feedback on this proposal, I'd like us to come back to the main topic - transforming almost arbitrary OpenAPI strings into constrained Swift identifiers.

I think you may have overlooked my previous post where I acknowledge that the JSON Patch approach isn't going to solve the issue because of wire-protocol concerns, but suggest we put the mapping of problematic characters or identifiers in the hands into configuration.

I object to the suggestion this is off-topic or not moving towards actionable feedback. It's taking a step back from the proposed solution to look at the problem.

For clarity, my intention was not to block or derail @denilchungath—and I hope they don't see it that way. It's feedback of the form "could this problem be solved in a different way?", along with a concrete suggestion of how. Requesting this be discussed entirely separately implies they are orthogonal, which I don't believe they are.

denilchungath · June 30, 2023, 10:33am

I am absolutely fine with having wild new ideas. Honestly, I am not that happy with the current proposal, as it does produce some weird names. Even if we go with @Dante-Broggi that would still create long and unreadable names for some cases. So I suggest we should definitely explore more on the problem before landing on a solution.

For example names like X-Extra-Arguments become X_hyphen_Extra_hyphen_Arguments
With the new suggestion it might be something like _X__ hyphen___Extra__hyphen___Arguments

adam-fowler · July 2, 2023, 8:24am

Why don't you back tick quote keywords eg

let `protocol`: String

instead of adding a _ prefix. You would need multiple sanitizedForSwiftCode functions though, as keywords don't need sanitized when they are a parameter labels.

I had a similar issue in the Soto code generator and this solution seems cleaner.

beaumont · July 6, 2023, 8:14am

I think my current lean here is toward:

A mapping that is most unlikely to conflict, even if it produced slightly cumbersome names; paired with...
A configurable mapping for specific keys and values, that the adopter can use if they don't like what they get from (1).

adam-fowler:

Why don't you back tick quote keywords eg
let `protocol`: String
instead of adding a _ prefix. You would need multiple sanitizedForSwiftCode functions though, as keywords don't need sanitized when they are a parameter labels.

I had a similar issue in the Soto code generator and this solution seems cleaner.

I'd be +1 on using this, where possible.