May I enhance UUID to be more lenient when parsing UUID strings?

JetForMe · January 8, 2023, 5:35am

When working with various sources of data, one often gets UUIDs as strings in differing formats: often without dashes, or with dashes in non-standard locations. I wish UUID were more lenient in accepting those strings.

So I decided to take a look at the code, and it seemed easy enough to handle arbitrary formats. I coded up a fix, but ran into some issues building swift-corelibs-foundation.

Before I go to the trouble of getting it to build, would such an enhancement be accepted?

Here's the as-yet uncompiled and untested change to uuid_parse() I would offer in a PR. Basically, if the existing parsing fails, I'll copy the input string, stripping it of anything not in [0-9a-fA-F], and try again. I think it would be cleaner to always do that, but stripping adds a slight performance cost for input that is already in the preferred format, which someone might object to.

int
uuid_parse(const uuid_string_t in, uuid_t uu)
{
	int n = 0;

	sscanf(in,
		"%2hhx%2hhx%2hhx%2hhx-"
		"%2hhx%2hhx-"
		"%2hhx%2hhx-"
		"%2hhx%2hhx-"
		"%2hhx%2hhx%2hhx%2hhx%2hhx%2hhx%n",
		&uu[0], &uu[1], &uu[2], &uu[3],
		&uu[4], &uu[5],
		&uu[6], &uu[7],
		&uu[8], &uu[9],
		&uu[10], &uu[11], &uu[12], &uu[13], &uu[14], &uu[15], &n);

	if (n == 36 && in[n] == '\0') {
		return 0;
	}
	
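	//	Parsing the above failed, so strip anything not in [0-9a-fA-F]
	//	and try again. A valid UUID has exactly 32 hex digits, so give up
	//	as soon as the input contains more than that.
	
	char stripped[33];
	size_t i = 0;
	for (size_t j = 0; in[j] != '\0'; j++) {
		char c = in[j];
		int isHex = (c >= '0' && c <= '9')
			|| (c >= 'a' && c <= 'f')
			|| (c >= 'A' && c <= 'F');
		if (!isHex) {
			continue;
		}
		if (i >= sizeof (stripped) - 1) {
			return -1;	//	more than 32 hex digits
		}
		stripped[i++] = c;
	}
	stripped[i] = '\0';
	
	n = 0;
	sscanf(stripped,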
		"%2hhx%2hhx%2hhx%2hhx"
		"%2hhx%2hhx"
		"%2hhx%2hhx"
		"%2hhx%2hhx"
		"%2hhx%2hhx%2hhx%2hhx%2hhx%2hhx%n",
		&uu[0], &uu[1], &uu[2], &uu[3],
		&uu[4], &uu[5],
		&uu[6], &uu[7],
		&uu[8], &uu[9],
		&uu[10], &uu[11], &uu[12], &uu[13], &uu[14], &uu[15], &n);

	if (n == 32 && stripped[n] == '\0') {
		return 0;
	}
	
	return -1;
}
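For experimenting outside swift-corelibs-foundation, here is a minimal, self-contained sketch of just the lenient fallback path, assuming a stand-in `uuid_bytes` type in place of `uuid_t`; the name `uuid_parse_lenient` is hypothetical. Unlike the patch above, it decodes the hex digits by hand instead of calling sscanf(), and it rejects input containing fewer or more than 32 hex digits.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for uuid_t (assumption: 16 raw bytes). */
typedef unsigned char uuid_bytes[16];

/* Value of one hex digit, or -1 if c is not in [0-9a-fA-F]. */
static int hex_value(char c)
{
	if (c >= '0' && c <= '9') return c - '0';
	if (c >= 'a' && c <= 'f') return c - 'a' + 10;
	if (c >= 'A' && c <= 'F') return c - 'A' + 10;
	return -1;
}

/* Hypothetical lenient parser: skips every non-hex character and
 * succeeds only if exactly 32 hex digits remain. Returns 0 on success,
 * -1 on failure; uu is left unmodified on failure. */
int uuid_parse_lenient(const char *in, uuid_bytes uu)
{
	unsigned char tmp[16];
	size_t digits = 0;

	for (const char *p = in; *p != '\0'; p++) {
		int v = hex_value(*p);
		if (v < 0) {
			continue;                 /* dash, brace, space, ... */
		}
		if (digits == 32) {
			return -1;                /* too many hex digits */
		}
		if (digits % 2 == 0) {
			tmp[digits / 2] = (unsigned char)(v << 4);  /* high nibble */
		} else {
			tmp[digits / 2] |= (unsigned char)v;        /* low nibble */
		}
		digits++;
	}
	if (digits != 32) {
		return -1;                    /* too few hex digits */
	}
	memcpy(uu, tmp, sizeof tmp);
	return 0;
}
```

With this sketch, `6ba7b810-9dad-11d1-80b4-00c04fd430c8`, `6BA7B8109DAD11D180B400C04FD430C8` and `{6ba7b810-9dad11d1-80b4-00c0-4fd430c8}` all decode to the same 16 bytes, while a truncated string is rejected. Because every non-hex character is skipped, letters such as `e` in surrounding text would be picked up as digits, so this degree of leniency is a policy choice rather than a free win.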

Edit: Alternatively, and perhaps better, I could add a UUID(string:String) initializer to the Swift struct that accepts an arbitrary string.

Karl · January 8, 2023, 6:28am

Library solution, if you're interested: karwa/uniqueid on GitHub (Random and time-ordered UUID generation in Swift)

The string initializer is quite lenient (see its documentation).

It's pure Swift and generic, so it supports Substring without copying or tedious type conversions (particularly useful now that we have native Regex, whose matches are returned as Substrings). There's also a UTF8 initializer for binary strings.

xwu · January 8, 2023, 1:24pm

In the past, at least, the requirement has always been that the open-source implementation should match Apple's closed-source implementation on Apple platforms. Therefore, a change in behavior would only have a shot at being accepted if someone could shepherd the same change through an internal review at Apple, which I've not yet seen an external contributor accomplish.

JetForMe · January 10, 2023, 2:02am

I'd think an additive change would be acceptable, wouldn't it? I guess I can just add my own new initializer.
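In the same additive spirit, a C-level equivalent could leave uuid_parse() untouched and add a separate helper that rewrites lenient input into the canonical 8-4-4-4-12 form, then hand the result to the existing strict parser. The sketch below assumes a hypothetical `uuid_normalize()`; it is not part of swift-corelibs-foundation or any other existing library.

```c
#include <stddef.h>

/* Hypothetical additive helper: rewrites any string containing exactly
 * 32 hex digits (dashes, braces, spaces, etc. are ignored) into the
 * canonical lowercase 8-4-4-4-12 form. Returns 0 on success, -1
 * otherwise. out must hold 37 bytes; its contents are unspecified on
 * failure. */
int uuid_normalize(const char *in, char out[37])
{
	size_t digits = 0;  /* hex digits copied so far */
	size_t pos = 0;     /* next write position in out */

	for (const char *p = in; *p != '\0'; p++) {
		char c = *p;
		if (c >= 'A' && c <= 'F') {
			c = (char)(c - 'A' + 'a');  /* lowercase the digit */
		} else if (!((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f'))) {
			continue;                   /* skip non-hex characters */
		}
		if (digits == 32) {
			return -1;                  /* too many hex digits */
		}
		/* canonical dashes precede hex digits 8, 12, 16 and 20 */
		if (digits == 8 || digits == 12 || digits == 16 || digits == 20) {
			out[pos++] = '-';
		}
		out[pos++] = c;
		digits++;
	}
	if (digits != 32) {
		return -1;                      /* too few hex digits */
	}
	out[pos] = '\0';                    /* pos == 36 here */
	return 0;
}
```

A caller would then write something like `if (uuid_normalize(s, buf) == 0 && uuid_parse(buf, uu) == 0) { ... }` with `char buf[37]`, so existing callers of uuid_parse() see no change in behavior.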

Also, I thought these things were eventually intended to be "pure Swift," but the current Foundation, including UUID, depends on sscanf().

xwu · January 10, 2023, 2:31am

Nope.