[SourceKit] NSRange, Swift.String, and NSString


(Tyler Stromberg) #1

I'm currently working on integrating SourceKit with a macOS application. AppKit APIs (e.g. NSAttributedString, NSLayoutManager, etc) deal in terms of NSRange (UTF-16 code units?). SourceKit, however, deals in terms of integer offsets and lengths (UTF-8 code units?). Is there a more efficient or easier way to convert back and forth between the two other than doing the index(_:offsetBy:) -> samePosition(in:) dance?
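For reference, the index(_:offsetBy:) -> samePosition(in:) approach mentioned above might look roughly like the following. This is only a sketch: the function name `nsRange(in:utf8Offset:utf8Length:)` is made up for illustration and is not part of SourceKit or AppKit, and it assumes a Swift version in which the string views share `String.Index`.

```swift
import Foundation

/// Hypothetical helper: converts a UTF-8 byte offset and length into an
/// NSRange measured in UTF-16 code units.
/// Returns nil if the range is out of bounds or does not fall on
/// positions that also exist in the UTF-16 view.
func nsRange(in text: String, utf8Offset: Int, utf8Length: Int) -> NSRange? {
    let utf8 = text.utf8
    let utf16 = text.utf16
    guard utf8Offset >= 0, utf8Length >= 0,
          // Walk the UTF-8 view from the start of the string: O(file size).
          let start = utf8.index(utf8.startIndex, offsetBy: utf8Offset, limitedBy: utf8.endIndex),
          let end = utf8.index(start, offsetBy: utf8Length, limitedBy: utf8.endIndex),
          // Map both positions into the UTF-16 view.
          let start16 = start.samePosition(in: utf16),
          let end16 = end.samePosition(in: utf16)
    else { return nil }
    let location = utf16.distance(from: utf16.startIndex, to: start16)
    let length = utf16.distance(from: start16, to: end16)
    return NSRange(location: location, length: length)
}

// The emoji starts at UTF-8 byte 10 and is 4 bytes long;
// in UTF-16 it starts at unit 9 and is 2 units long.
let source = "let \u{E9} = \"\u{1F642}\""
if let range = nsRange(in: source, utf8Offset: 10, utf8Length: 4) {
    print(range.location, range.length)
}
```

Every call walks from the start of the string, which is why repeated conversions in a large file get expensive.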


(Ben Langmuir) #2

I'm currently working on integrating SourceKit with a macOS application. AppKit APIs (e.g. NSAttributedString, NSLayoutManager, etc) deal in terms of NSRange (UTF-16 code units?). SourceKit, however, deals in terms of integer offsets and lengths (UTF-8 code units?).

UTF8 byte offsets
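To make the difference between the two units concrete, here is a small illustration (not from the thread) that counts UTF-8 bytes and UTF-16 code units for a few characters. The two counts only agree for ASCII:

```swift
// Each sample is a single character: a, é, €, 🙂
let samples = ["a", "\u{E9}", "\u{20AC}", "\u{1F642}"]

for sample in samples {
    // utf8.count is what a byte offset advances by;
    // utf16.count is what an NSRange location advances by.
    print(sample, "utf8:", sample.utf8.count, "utf16:", sample.utf16.count)
}
// a  utf8: 1 utf16: 1
// é  utf8: 2 utf16: 1
// €  utf8: 3 utf16: 1
// 🙂 utf8: 4 utf16: 2
```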

Is there a more efficient or easier way to convert back and forth between the two other than doing the index(_:offsetBy:) -> samePosition(in:) dance?

If you’re doing a bunch of queries in the same file, you could build a table of line start offsets in both UTF8 and UTF16. You may then get faster results by going UTF8 offset -> UTF8 line + delta -> UTF16 line + delta -> UTF16 offset, since the expensive part becomes O(line length) instead of O(file size).
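The line-table idea described above could be sketched as follows. This is one possible reading of the suggestion, not a canned solution: the type `LineOffsetTable` and its method names are hypothetical, only "\n" is treated as a line break, and the UTF-16 count is derived from UTF-8 lead bytes on the assumption that the input is valid UTF-8 (which Swift strings are).

```swift
import Foundation

/// UTF-16 code units contributed by one UTF-8 byte: continuation bytes add
/// nothing, 4-byte lead bytes add a surrogate pair, everything else adds one.
private func utf16Units(forUTF8Byte byte: UInt8) -> Int {
    if byte & 0xC0 == 0x80 { return 0 }   // continuation byte
    return byte >= 0xF0 ? 2 : 1           // 4-byte lead vs. 1- to 3-byte lead
}

/// Hypothetical helper that records where each line starts in both encodings.
struct LineOffsetTable {
    private let utf8Bytes: [UInt8]
    private var utf8LineStarts: [Int] = [0]
    private var utf16LineStarts: [Int] = [0]

    init(_ text: String) {
        utf8Bytes = Array(text.utf8)
        var utf16Count = 0
        for (index, byte) in utf8Bytes.enumerated() {
            utf16Count += utf16Units(forUTF8Byte: byte)
            if byte == 0x0A {             // "\n": the next line starts after it
                utf8LineStarts.append(index + 1)
                utf16LineStarts.append(utf16Count)
            }
        }
    }

    /// UTF8 offset -> line + delta -> UTF16 offset.
    /// Returns nil if the offset is out of bounds or inside a scalar.
    func utf16Offset(forUTF8Offset offset: Int) -> Int? {
        guard offset >= 0, offset <= utf8Bytes.count else { return nil }
        if offset < utf8Bytes.count, utf8Bytes[offset] & 0xC0 == 0x80 { return nil }
        // Binary search for the last line whose start is <= offset.
        var low = 0
        var high = utf8LineStarts.count - 1
        while low < high {
            let mid = (low + high + 1) / 2
            if utf8LineStarts[mid] <= offset { low = mid } else { high = mid - 1 }
        }
        // Only the bytes between the line start and the offset are scanned.
        var units = utf16LineStarts[low]
        for byte in utf8Bytes[utf8LineStarts[low]..<offset] {
            units += utf16Units(forUTF8Byte: byte)
        }
        return units
    }

    /// Converts a UTF-8 offset and length into an NSRange in UTF-16 units.
    func nsRange(utf8Offset: Int, utf8Length: Int) -> NSRange? {
        guard utf8Length >= 0,
              let start = utf16Offset(forUTF8Offset: utf8Offset),
              let end = utf16Offset(forUTF8Offset: utf8Offset + utf8Length)
        else { return nil }
        return NSRange(location: start, length: end - start)
    }
}

// Build the table once per file, then reuse it for every query.
let table = LineOffsetTable("let a = 1\nlet \u{E9} = \"\u{1F642}\"\n")
if let range = table.nsRange(utf8Offset: 20, utf8Length: 4) {
    print(range.location, range.length)   // the emoji on the second line
}
```

Building the table is a single O(file size) pass. Each query afterwards costs a binary search over the line starts plus a scan of at most one line. The table has to be rebuilt (or patched) whenever the text changes.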

I don’t know of a good canned solution.

···

On Mar 24, 2017, at 10:59 AM, Tyler Stromberg via swift-dev <swift-dev@swift.org> wrote:

_______________________________________________
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev


(JP Simard) #3

I ended up writing some convenience APIs to perform these conversions along
with many other useful SourceKit<->Cocoa conversions like line+column,
UTF-8, UTF-16 and String.Index in SourceKitten. It's MIT-licensed so feel
free to grab the String extensions from the project yourself:
https://github.com/jpsim/SourceKitten/blob/master/Source/SourceKittenFramework/String+SourceKitten.swift

That being said, you might have an easier time working with SourceKitten
than with SourceKit directly, since it does a whole lot more, like
dynamically resolving+loading which SourceKit to use, caching expensive
operations, easier multi-threaded access, generating documentation, etc.

···

On Fri, 24 Mar 2017 at 10:59 Tyler Stromberg via swift-dev < swift-dev@swift.org> wrote:



(Tyler Stromberg) #4

I'm currently working on integrating SourceKit with a macOS application. AppKit APIs (e.g. NSAttributedString, NSLayoutManager, etc) deal in terms of NSRange (UTF-16 code units?). SourceKit, however, deals in terms of integer offsets and lengths (UTF-8 code units?).

UTF8 byte offsets

Thanks for the clarification. =)

Is there a more efficient or easier way to convert back and forth between the two other than doing the index(_:offsetBy:) -> samePosition(in:) dance?

If you’re doing a bunch of queries in the same file, you could build a table of line start offsets in both UTF8 and UTF16. You may then get faster results by going UTF8 offset -> UTF8 line + delta -> UTF16 line + delta -> UTF16 offset, since the expensive part becomes O(line length) instead of O(file size).

I don’t know of a good canned solution.

Yeah, this is what I figured I'd have to do.

Thanks for the help!

···

On Mar 24, 2017, at 12:08 PM, Ben Langmuir <blangmuir@apple.com> wrote:


(Tyler Stromberg) #5

We started off using those convenience APIs from SourceKitten (huge thanks for SourceKitten, BTW) but ended up moving to our own solution. For this conversion issue, the reason was performance, particularly in large files. For interfacing with SourceKit in general, we wanted a little more control over request/response handling.

···

On Mar 24, 2017, at 12:09 PM, Jean-Pierre Simard <jp@jpsim.com> wrote:
