Swift 5 UnsafeMutablePointer not working as intended

Hi all,

I'm working on a text editor with syntax highlighting and I'm using a C framework for my parser (to find out what characters to highlight).

The parser receives a String (as a char *), a start offset and a length (both Int), and returns an AST made of structs called token. The return type is UnsafeMutablePointer<token>?.

struct token {
	unsigned short		type;			//!< Type for the token
	short				can_open;		//!< Can token open a matched pair?
	short				can_close;		//!< Can token close a matched pair?
	short				unmatched;		//!< Has token been matched yet?

	size_t				start;			//!< Starting offset in the source string
	size_t				len;			//!< Length of the token in the source string

	struct token 	*	next;			//!< Pointer to next token in the chain
	struct token 	*	prev;			//!< Pointer to previous token in the chain
	struct token 	*	child;			//!< Pointer to child chain

	struct token 	*	tail;			//!< Pointer to last token in the chain

	struct token 	*	mate;			//!< Pointer to other token in matched pair
};

Then I go over this tree using a recursive function:

let t = mmd_engine_parse_substring(e, location, length)

// Match the String with the Syntax
self.handleTokenTree(t)

func handleTokenTree(_ t: UnsafeMutablePointer<token>?) {
        // A tree of token nodes, each of which can point to a child and/or a "next" sibling token. That builds up the structure of the document. You walk through sibling nodes, recursively handling children:
        var t = t // Function parameters are constants in Swift, so shadow `t` with a mutable local
        while t != nil {
            // Recursive function to walk the token tree and apply formatting based on the token types
            self.applyHighlightingForTokenTree(t)
            
            if t!.pointee.child != nil {
                // Recurse into children tokens
                handleTokenTree(t!.pointee.child)
            }
            // Next sibling token
            t = t!.pointee.next
        }
        
        // Cleanup the memory
        token_free(t)
    }

This code worked great in Swift 4.2 and also works perfectly fine in Objective-C.

struct token * t = mmd_engine_parse_substring(e, extendedRange.location, extendedRange.length);

// Match the String with the Syntax
[self handleTokenTree:t];

-(void)handleTokenTree:(token *)t
{
    // A tree of token nodes, each of which can point to a child and/or a "next" sibling token. That builds up the structure of the document. You walk through sibling nodes, recursively handling children:
    while (t != nil) {
        // Recursive function to walk the token tree and apply formatting based on the token types
        [self applySyntaxHighlightingForTokenTree:t];
        
        if (t->child != nil) {
            // Recurse into children tokens
            [self handleTokenTree:t->child];
        }
        // Next sibling token
        t = t->next;
    }
    
    // Cleanup the memory
    token_free(t);
}
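Note that in both versions the loop only exits once t is nil, so the final token_free(t) receives a null pointer rather than the root of the tree. A minimal sketch in plain C of the same walk, keeping the root available for cleanup, might look like the following. The names node, make_node, walk, and free_tree are hypothetical stand-ins for illustration and are not part of MultiMarkdown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Reduced, hypothetical node: only the links the walk needs. */
struct node {
    unsigned short type;
    struct node *next;   /* next sibling */
    struct node *child;  /* first child */
};

/* Allocate a zeroed node of the given type (returns NULL on failure). */
struct node *make_node(unsigned short type) {
    struct node *n = calloc(1, sizeof *n);
    if (n != NULL) {
        n->type = type;
    }
    return n;
}

/* Visit every sibling in order, recursing into children.
   Returns the number of nodes visited. */
size_t walk(const struct node *t) {
    size_t visited = 0;
    while (t != NULL) {
        visited += 1;               /* stand-in for applying highlighting */
        visited += walk(t->child);  /* walk(NULL) simply returns 0 */
        t = t->next;
    }
    return visited;
}

/* Free a whole chain, including all children. */
void free_tree(struct node *t) {
    while (t != NULL) {
        struct node *next = t->next;  /* save the link before freeing */
        free_tree(t->child);
        free(t);
        t = next;
    }
}
```

The caller keeps its own pointer to the root, calls walk(root), and only afterwards calls free_tree(root), so the pointer handed to the cleanup routine is never the loop cursor.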

But in Swift 5, the token loses its values.

For example, the parser parses the text and returns the token with a proper child and tail. But when it reaches Swift (I placed a breakpoint on the line between let t and handleTokenTree), the token has neither the child nor the tail: every field is nil/NULL except type and mate (and mate was supposed to be nil/NULL).

I have made sure the issue is just with Swift 5 by trying different devices, different OS versions, Swift 4.2, Objective-C and debugging, using both breakpoints and print statements. The issue occurs only in Swift 5.

Another thing I've noticed, though I'm not sure whether it has any effect: I've kept an eye on the pointer address, and Swift 5 adds a lot of leading zeros to it. For example, if the pointer address is 0x109a4000, Swift 5 shows it as 0x000000109a4000.
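For what it's worth, leading zeros do not change a hexadecimal value, so the two spellings above name the same address and differ only in display width. A small C sketch of that, using hypothetical helpers parse_address and format_address (not part of any framework mentioned here):

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse a hexadecimal address string such as "0x109a4000". */
uintptr_t parse_address(const char *text) {
    return (uintptr_t)strtoull(text, NULL, 16);
}

/* Print an address zero-padded to `digits` hexadecimal digits.
   Returns the number of characters snprintf would write. */
int format_address(char *buf, size_t size, uintptr_t addr, int digits) {
    return snprintf(buf, size, "0x%0*" PRIxPTR, digits, addr);
}
```

Parsing either string gives the same number, and formatting that number with a wider field only adds padding.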

What can I do to fix the issue except for using Objective-C?

I'm currently using Objective-C as a workaround, but would love to get this resolved and use Swift 5 since the rest of my code is in Swift 5 and I want to calculate offsets using Swift's UTF8/16Views.

This issue has been driving me crazy since I moved to Swift 5 with the release last Monday.

Any help would be much appreciated :pray:

It would help to have a little bit more context. Are you able to share a complete project? What is the e that you're passing into mmd_engine_parse_substring?

Hi @Joe_Groff! :slightly_smiling_face:

I can try and abstract the issue into a smaller but complete project.

MultiMarkdown is the C framework I'm using and it uses a struct called mmd_engine which is saved in e. But it's basically just a string and some options.

This code is all being run in an NSTextStorage subclass, where self.applyHighlightingForTokenTree actually applies the attributes according to the token.

I can also share some screenshots that might help make it clearer.

Thanks. One thing that I'm wondering is whether you might have pointer lifetime issues. If you're constructing the mmd_engine by passing a Swift string as a char*, then that pointer that C receives is only good for the duration of that immediate call. If mmd_engine is holding on to any references into that temporary char* they will end up invalid. Since String's implementation changed heavily in Swift 5 this could've been a latent problem that's now getting exacerbated.

This might be the issue, since I am passing a Swift String to create the engine. Here's the code that creates the engine.

// Create the engine using the NSTextStorage string and specified extensions.
self.e = mmd_engine_create_with_string(self.storage.string, UInt(EXT_NOTES.rawValue | EXT_SMART.rawValue | EXT_CRITIC.rawValue))

Here are the function definitions for mmd_engine_parse_substring and mmd_engine_create_with_string:

/// Parse part of the string into a token tree
token * mmd_engine_parse_substring(mmd_engine * e, size_t byte_start, size_t byte_len);

/// Create MMD Engine using a C string (A private copy of the string will be
/// made.  The one passed here can be freed by the calling function)
mmd_engine * mmd_engine_create_with_string(
	const char *	str,
	unsigned long	extensions
);

How would you suggest I pass the string in a way that won't invalidate the pointer? Although, mmd_engine supposedly creates a private copy of the passed string.

MultiMarkdown is open source, by the way.

Yeah, it looks like string lifetime could well be the issue then. The most straightforward way to address this would be to allocate a copy of the C string you control the lifetime of:

self.cstring = strdup(self.storage.string) // free(self.cstring) in your deinit
self.e = mmd_engine_create_with_string(self.cstring, ...)

I don't know if NSTextStorage itself has a way to get a pointer to its interior storage as a stable char* pointing to a C string.

It didn't fix the issue :pensive:

Here are some screenshots of the code and how it looks on the iPhone.

Putting a breakpoint inside mmd_engine_parse_substring shows the same token for both languages, meaning it's only being altered in Swift 5.

As Sl mentioned, the char pointer is copied to an internal structure (on the assumption that further things might be done with the string and we need a private copy of the contents) and maintained by the engine. So the lifetime of the pointer passed from the Swift string should not be an issue: it only needs to last long enough to have the data copied to a newly allocated char *. Similarly, the tokens don’t point to the source string. They simply measure byte offsets and point to other tokens. (All are malloc’ed.)
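To illustrate why a private copy makes the caller's pointer lifetime irrelevant, here is a minimal C sketch. The struct engine and the functions engine_create and engine_free are hypothetical stand-ins written for this example; they are not MultiMarkdown's actual implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical engine that owns its source text. */
struct engine {
    char *source;   /* private, heap-allocated copy of the input */
    size_t length;  /* length in bytes, excluding the terminator */
};

/* Copy `str` into storage owned by the engine. After this returns,
   the caller's buffer may be changed or released freely. */
struct engine *engine_create(const char *str) {
    struct engine *e = malloc(sizeof *e);
    if (e == NULL) {
        return NULL;
    }
    e->length = strlen(str);
    e->source = malloc(e->length + 1);
    if (e->source == NULL) {
        free(e);
        return NULL;
    }
    memcpy(e->source, str, e->length + 1);  /* includes the terminator */
    return e;
}

/* Release the engine together with its private copy. */
void engine_free(struct engine *e) {
    if (e == NULL) {
        return;
    }
    free(e->source);
    free(e);
}
```

With this shape, overwriting or freeing the original buffer after engine_create returns leaves the engine's text intact, which is the property the temporary char * produced from a Swift String relies on.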

If I can answer any questions, don’t hesitate to ask.

Thanks for the clarification. @Michael_Ilseman or @David_Smith, are there any other Swift 5 string changes that you think could trigger a behavior change like this between Swift 4.2 and Swift 5?

Nothing that seems like it would be relevant. There are bridging changes that change pointer lifetimes, but copying the contents like that should be enough to avoid any issues there.

(I'm working with @PastaCoder to try to reproduce and dig deeper)

Yesterday I did some more debugging and finally managed to solve the issue.

The issue wasn't in either Swift or MultiMarkdown, but really just on me.

The cause of this mess: in the library's token.h, the token struct has two extra fields, out_start and out_len, that were missing from the header I was using. Swift therefore read the token struct with the wrong layout.
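For anyone hitting something similar, here is a rough C sketch of how a stale header shifts every field after the missing ones. It assumes out_start and out_len sit directly after len (the placement is my assumption), and the names token_full, token_stale, read_slot, stale_child, and stale_mate are made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Layout the library is assumed to be compiled with. */
struct token_full {
    unsigned short type;
    short can_open, can_close, unmatched;
    size_t start, len;
    size_t out_start, out_len;  /* the fields missing from the old header */
    struct token_full *next, *prev, *child, *tail, *mate;
};

/* Layout described by the outdated header quoted in the question. */
struct token_stale {
    unsigned short type;
    short can_open, can_close, unmatched;
    size_t start, len;
    struct token_stale *next, *prev, *child, *tail, *mate;
};

/* Read one pointer-sized slot at a byte offset inside a library-built
   token, which is effectively what a consumer of the stale layout does. */
void *read_slot(const struct token_full *t, size_t offset) {
    void *p;
    memcpy(&p, (const char *)t + offset, sizeof p);
    return p;
}

/* What the stale header would report as `child`. */
void *stale_child(const struct token_full *t) {
    return read_slot(t, offsetof(struct token_stale, child));
}

/* What the stale header would report as `mate`. */
void *stale_mate(const struct token_full *t) {
    return read_slot(t, offsetof(struct token_stale, mate));
}
```

Under that assumption, and on platforms where size_t and pointers have the same size, the stale child slot overlaps the real next and the stale mate slot overlaps the real child. For a root token with no siblings that would give a nil child and an unexpectedly non-nil mate, which is consistent with what I saw at the breakpoint.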

Thank you all for the help! I truly appreciate it! :blush: :pray:t2: @Joe_Groff @Michael_Ilseman @fletcher @David_Smith


Nice, glad you were able to figure it out!