Range formatting

Hi @akyrtzi, I'm kinda late to the party, but I'd like to do this task outside of GSoC (open-source is fun, right? :slight_smile:)

AFAIK it's not assigned to anybody and no ticket is assigned at swift-board.

Regarding range formatting: all we need to do is to pass "applicationRange" (the range where the rules should apply) to the Context and for every format and lint rule call isRuleEnabled on the context by passing the rule-name and the node.
Obviously "applicationRange" is passed to the context from application code that knows the range of the code to be formatted/linted

// Contex.swift
  public func isRuleEnabled(_ ruleName: String, node: Syntax) -> Bool {
    let loc = node.startLocation(converter: self.sourceLocationConverter)
    guard let line = loc.line else { return false }
    
    let applicationLines: ClosedRange<Int>
    if let lo = applicationRange?.start.line, let hi = applicationRange?.end.line {
      applicationLines = lo...hi
    } else {
      applicationLines = Int.min...Int.max
    }
    
    switch ruleMask.ruleState(ruleName, atLine: line) {
    case .default:
      return applicationLines.contains(line) && (configuration.rules[ruleName] ?? false)
    case .disabled:
      return false
    case .enabled:
      return applicationLines.contains(line)
    }
  }
// Example from DoNotUseSemicolons.swift
public final class DoNotUseSemicolons: SyntaxFormatRule {
  private func transformItems(_ items: CodeBlockItemListSyntax) -> CodeBlockItemListSyntax {
      ...
      // the only change is this if
      if context.isRuleEnabled(DoNotUseSemicolons.ruleName, node: item) {
        item = applyRule(item)
      }

      newItems[idx] = item
    }
    return SyntaxFactory.makeCodeBlockItemList(newItems)
  }

What do you think about this solution? complete diff
I tried it on some tests and it worked (the logic is straight-forward though)

I've tried to integrate SwiftFormat to SourceKit-LSP (my diff), but I get EXC_BAD_ACCESS when running tests (meanwhile running swift-format directly works fine :man_shrugging:)

PS. my previous comment is no longer relevant, since no subtree projection is necessary, everything is already there ;)

I've split this up and moved it to swift-format category
\cc @allevato for feedback on implementing the range API.

Before looking into SourceKit-LSP integration I'd suggest to wait a bit until we have finished overhauling the way we build our Swift packages in our CI.

Thanks for looking into this, @denis631!

The approach you describe sounds like it will work for the tree-transforming rules (the ones implemented in SwiftFormatRules), but will need to be adapted to work with the pretty printer that we use to handle indentation and line wrapping.

Right now, the pretty printer is fairly prescriptive; the token stream creator walks the syntax tree and builds a linear stream of formatting pseudo-tokens that are either the program text or some kind of semantic whitespace (e.g., where lines are allowed to break), and then the pretty printer consumes that, computing the size of each line and figuring out where to break and where to indent.

In order to support range formatting, we might think about doing something like emitting the original text and whitespace verbatim when we're looking at tokens outside of the desired range, and then using the regular behavior when inside the range. But that isn't as straightforward as it sounds, because those semantic breaks that would have gotten inserted in the stream would dictate the indentation depth of everything, so the indentation might not be correct if we only apply breaks partially in the stream.

I haven't given this a great deal more thought yet, so there may be some nice ways to avoid a lot of complexity here. I'm definitely open to ideas!

My idea was to append dummy Intent.empty (which is alias for Indent.spaces(0)) to the indentStack in the print method, when the token (at the line in original code) is not in the application range and append normal indentation is the case we are in the range (actually start of the range is adjusted by -1 so that in captures the open break).

This works pretty well (tests which I wrote pass :slight_smile:). The problem is, however, the code which is indented in original code and is not in application range. The exact problem is that the source code indentation is not represented in the original token stream as Token.space but rather as Token.open and in the code, I can not differentiate between open which was indented in original code and which is not, so if I don't indent then the code is aligned to the left border even though it was not intended to be formatted.
Currently, have no ideas of fixing this (modify token stream creator?).

Current diff

My idea could work, but then token stream creator needs to be modified in order to preserve the spaces/tabs from the syntax tree.
The spaces/tabs need to be consumed/replaced when performing formatting and ignored/replicated when the code is not in application range

Yeah, a strategy like that sounds like it would work. Unfortunately it's complicated by some design decisions we had to make to format things like comments correctly.

When we process the original tokens in the file, we strip off the existing trivia and ignore it, with the exception of comments and discretionary newlines (when the configuration option to respect them is enabled). Then, the thing that gets pushed onto the formatting token stream for the syntax token is just the text of the token, rather than the syntax token itself.

If we pushed the whole TokenSyntax onto the stream, then at pretty-printing time we could make a decision about whether we wanted to print just the text or the entire token including its original trivia, but that would include comments that we've already pushed into the stream elsewhere, unless we somehow excluded those as well. But the trick here is that the leading trivia of a token could potentially span many many lines (think a doc comment for example), so if a user selected a range that partially encompassed a comment, we'd have to figure out how to reconcile that. (In fact, that could still be a problem today, because even though we extract the comments, we still treat the whole comment as a unit.)

We might need to do something in the middle, by recognizing the following property. If we support range formatting, there are two modes:

  • When outside the range, any text that we print should be preceded by the original whitespace that came before it and the pretty-printer should not print any whitespace of its own.
  • When inside the range, original whitespace should be discarded and the pretty-printer should be responsible for it.

So, given that we still need to push comments independently of the TokenSyntax that they're attached to, we could consider augmenting formatting tokens that represent text (comments, verbatim, text) to also remember the whitespace that preceded them. Then, the pretty-printer could make a decision about what to do with it depending on which mode it's in. This will be a little bit more trivia slicing work when we process token trivia, but I wonder if it could lead to a working solution.

Your idea totally makes sense!

I implemented it. Diff is here

My tests pass (obviously more tests need to be written, but this is just the beginning), including all PrettyPrinting tests (except the one with expressions in strings, e.g. "(a + b)") are passing.

Right now it works the following way. If in processing range, run as before, if not in processing range, do something else, this way, old pretty printing code is not affected (therefore old tests are not influenced)

Tell me what you think. Show I make a PR?

Terms of Service

Privacy Policy

Cookie Policy