Memory leak with String.range(options: .regularExpression) [SR-3536]

Hi everyone,

I am working on SR-3536, "Memory leak with String.range(options: .regularExpression)" (issue #4303 in apple/swift-corelibs-foundation on GitHub).

There is a memory leak when searching for a substring of a string using a
regular expression.

import Foundation

let myString = "Foo"
for _ in 1...10000 {
  let _ = myString.range(of: "bar", options: .regularExpression)
}

From the above test case I could see that, over a period of time, around 60 MB of memory was leaked.

I see that String.range eventually calls NSString._createRegexForPattern.
There, a mapping from NSString to NSRegularExpression objects is maintained
in an NSCache<NSString, NSRegularExpression>. All entries in the cache are
stored in a dictionary (_entries) whose key is an UnsafeRawPointer, which
seems to be the address of the NSString object, and whose value is an
NSCachedEntry.

Although the pattern is of type String, it is stored in the NSCache as an
NSString. And since the NSCachedEntry objects are stored in a dictionary
indexed by the address (UnsafeRawPointer) of the NSString object, a new
cache entry is created on every iteration of the test case, even though
the pattern string remains the same.
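To make the failure mode concrete, here is a toy model (not the actual NSCache code) of a cache whose dictionary is keyed by object identity, next to one keyed by string value. `PatternBox`, `IdentityKeyedCache` and `ValueKeyedCache` are hypothetical names; `PatternBox` stands in for the fresh NSString that each call bridges from the same Swift String. Each identity-keyed entry keeps its key alive, as described above, so addresses cannot be reused while cached.

```swift
// Hypothetical stand-in for the NSString created each time a String pattern is bridged.
final class PatternBox {
    let pattern: String
    init(_ pattern: String) { self.pattern = pattern }
}

// Toy identity-keyed cache: the key is the object's address, so two boxes
// holding equal strings never map to the same entry.
final class IdentityKeyedCache {
    // Each entry retains its key, so its address cannot be recycled while cached.
    private var entries: [ObjectIdentifier: (key: PatternBox, value: String)] = [:]
    var count: Int { entries.count }

    func value(for key: PatternBox, make: () -> String) -> String {
        let id = ObjectIdentifier(key)
        if let hit = entries[id] { return hit.value }
        let value = make()
        entries[id] = (key: key, value: value)
        return value
    }
}

// Toy value-keyed cache: equal pattern strings share one entry.
final class ValueKeyedCache {
    private var entries: [String: String] = [:]
    var count: Int { entries.count }

    func value(for key: PatternBox, make: () -> String) -> String {
        if let hit = entries[key.pattern] { return hit }
        let value = make()
        entries[key.pattern] = value
        return value
    }
}

let identityCache = IdentityKeyedCache()
let valueCache = ValueKeyedCache()
for _ in 0..<1_000 {
    // A fresh box per iteration, mirroring a fresh bridged NSString per call.
    let key = PatternBox("bar")
    _ = identityCache.value(for: key) { "compiled(bar)" }
    _ = valueCache.value(for: key) { "compiled(bar)" }
}
print(identityCache.count, valueCache.count) // 1000 1
```

The identity-keyed model grows by one entry per call even though the pattern never changes, which is the growth pattern the test case shows.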

Can someone guide me on how to go about resolving this issue?

Thank you.

- Nethra Ravindran

Hi Nethra,


Looks like you’ve done most of the analysis, so you’re already pretty much there. =)

Is there some other way we could be caching the results here?
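One possible direction, sketched under the assumption that keying by the pattern's value (plus its options) is acceptable: a small cache that hashes the Swift String instead of an object address and evicts the oldest entry once a count limit is reached. `RegexCache` and its members are hypothetical names, not the Foundation implementation, and this sketch is not thread-safe.

```swift
import Foundation

// Hypothetical value-keyed cache of compiled regular expressions.
final class RegexCache {
    private struct Key: Hashable {
        let pattern: String
        let options: NSRegularExpression.Options.RawValue
    }

    private var storage: [Key: NSRegularExpression] = [:]
    private var insertionOrder: [Key] = []  // oldest first, used for eviction
    let countLimit: Int

    init(countLimit: Int) {
        precondition(countLimit > 0, "countLimit must be positive")
        self.countLimit = countLimit
    }

    var count: Int { storage.count }

    func regex(for pattern: String,
               options: NSRegularExpression.Options = []) throws -> NSRegularExpression {
        let key = Key(pattern: pattern, options: options.rawValue)
        if let cached = storage[key] { return cached }

        let compiled = try NSRegularExpression(pattern: pattern, options: options)
        if storage.count >= countLimit {
            // Evict the oldest entry so the cache never exceeds countLimit.
            let oldest = insertionOrder.removeFirst()
            storage[oldest] = nil
        }
        storage[key] = compiled
        insertionOrder.append(key)
        return compiled
    }
}

let cache = RegexCache(countLimit: 2)
let first = try cache.regex(for: "bar")
let second = try cache.regex(for: "bar")  // same pattern value: cache hit
_ = try cache.regex(for: "baz")
_ = try cache.regex(for: "qux")           // evicts "bar"
print(first === second, cache.count)      // true 2
```

Eviction here is first-in, first-out rather than least-recently-used, which keeps the bookkeeping simple; a shared cache inside Foundation would also need a lock around `storage` and `insertionOrder`.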

- Tony


On Feb 7, 2017, at 11:44 PM, Nethra Ravindran via swift-corelibs-dev <swift-corelibs-dev@swift.org> wrote:

_______________________________________________
swift-corelibs-dev mailing list
swift-corelibs-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-corelibs-dev

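There's a 'cache.countLimit = 10' set on the cache:

https://github.com/apple/swift-corelibs-foundation/blob/16657160c2c441a58ea01bf7baa90607a0b395f7/Foundation/NSString.swift#L109

Shouldn't it start discarding some of the previous entries after it hits the first 10?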

Alex


On 8 Feb 2017, at 16:51, Tony Parker via swift-corelibs-dev <swift-corelibs-dev@swift.org> wrote:


It doesn't look like it removes any entries from the cache, nor does the cache work when a String is used as a key. I've added this info to SR-3536 (apple/swift-corelibs-foundation issue #4303):

> c.setObject("foo",forKey:"foo")
> c.object(forKey:"foo")
$R6: Foundation.NSString? = nil
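The REPL check above can be written as a small standalone script for comparing toolchains. Here `c` is assumed to be an `NSCache<NSString, NSString>`, and the lookup deliberately uses a separately created NSString with the same contents, so it only succeeds if the cache compares keys by value rather than by address. The REPL output above shows the behaviour at the time of the report; a toolchain that compares keys by value should print `true`.

```swift
import Foundation

// Returns true if NSCache finds an entry using an equal but separately created key.
func cacheFindsEqualKey() -> Bool {
    let c = NSCache<NSString, NSString>()
    c.setObject(NSString(string: "foo"), forKey: NSString(string: "foo"))
    // A different NSString instance with the same contents.
    return c.object(forKey: NSString(string: "foo")) != nil
}

print(cacheFindsEqualKey())
```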

Alex


On 8 Feb 2017, at 16:59, Alex Blewitt via swift-corelibs-dev <swift-corelibs-dev@swift.org> wrote:

Some data points that may be useful for those debugging:
It looks to me like this bug behaves differently on Linux and Mac.
I discovered this bug leaking hundreds of MB in some code and stopped to do an analysis.

Using a condensed version like OP's, it leaks at about 15 MB/s:

import Foundation

let string = "asdfthisisnotastring"

for i in 0..<10000000 {
    if i % 10000 == 0 {
        print(i)
    }
    // leaks
    let _ = string.range(of: "string", options: .regularExpression) != nil

    // no leaks
    // let _ = string.contains("string")
}

This was on a Mac.
Interestingly, a colleague who ran the same code on Linux did not see significant leakage.

Bless your compiler-dude/dudette souls and may the god of bugs smile upon your quest to fix this one.

You'll need to put an autorelease pool in there, because the regex version calls out to ObjC and that generates autoreleased objects.

It's not at all obvious but it is just a consequence of how the language and frameworks are designed.
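Applied to the loop above, that advice might look like the following sketch. `autoreleasepool(invoking:)` comes from the Objective-C overlay on Apple platforms; the `#if canImport(ObjectiveC)` guard is an addition (not from the thread) so the same file also builds on Linux, where there is no pool to drain. The iteration count is reduced for illustration.

```swift
import Foundation

let string = "asdfthisisnotastring"
var found = 0

for i in 0..<20_000 {
    if i % 5_000 == 0 {
        print(i)
    }
    #if canImport(ObjectiveC)
    // Each pass gets its own pool, so autoreleased temporaries from the
    // regex search are released at the end of the iteration.
    let matched = autoreleasepool {
        string.range(of: "string", options: .regularExpression) != nil
    }
    #else
    // No Objective-C runtime, so nothing is autoreleased here.
    let matched = string.range(of: "string", options: .regularExpression) != nil
    #endif
    if matched { found += 1 }
}
print("matched in \(found) iterations")
```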


OK, that's good to know, thanks.
I don't see how that explains the Mac vs. Linux difference, though.

So, if I read you correctly, I should not expect this ever to be 'fixed', since it is a fundamental part of the way the language is structured?

We don't have autoreleasepools or ObjC on Linux. 😄

So on Linux it's just handled by some other mechanism, then?
Should we ever expect this leak to be 'fixed' (without the manual autorelease pool, I mean), or is it baked in?
Thanks

I use a slightly different definition of the term leak -- to me, that means memory that is lost and you can never find it again. Autoreleased objects are not leaked - they are stored in the pool. If the pool is never drained then it forms "abandoned" memory, which has a reference but is no longer in use.

Autorelease pools are a fact of life when working with ObjC code. You may never create the autoreleased object yourself; it may be done inside the framework's own code, which assumes some larger autorelease pool is in place. That can be something as simple as writing [NSArray array]. We've been slowly removing some of these over a few releases for performance reasons, but there are a lot of cases where objects cannot be eagerly released because we find out that various bits of code out there are relying on the lifetime being extended as a side effect. I would say it's safe to assume this is basically not going to change.

Therefore, if you have tight loops where you call out to framework code that generates autoreleased objects (something you can discover by looking at memory usage over time in Instruments, for example), you can simply insert an autoreleasepool { } block there to make sure the pool has a chance to drain. That will reduce the peak memory usage.
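When a fresh pool on every iteration is more overhead than needed, one common variant (an assumption here, not something prescribed above) is to drain once per batch of iterations, trading a somewhat higher peak for fewer pool pushes and pops. `processInBatches` and `batchSize` are hypothetical names; on platforms without Objective-C the guard simply runs each batch directly.

```swift
import Foundation

// Hypothetical helper: runs `body` for every index in 0..<count, draining an
// autorelease pool after each batch of `batchSize` iterations where one exists.
func processInBatches(count: Int, batchSize: Int, _ body: (Int) -> Void) {
    precondition(batchSize > 0, "batchSize must be positive")
    var start = 0
    while start < count {
        let end = min(start + batchSize, count)
        #if canImport(ObjectiveC)
        // Objects autoreleased during this batch are released when the pool drains.
        autoreleasepool {
            for index in start..<end { body(index) }
        }
        #else
        for index in start..<end { body(index) }
        #endif
        start = end
    }
}

var hits = 0
processInBatches(count: 5_000, batchSize: 1_000) { _ in
    if "asdfthisisnotastring".range(of: "string", options: .regularExpression) != nil {
        hits += 1
    }
}
print(hits) // 5000
```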


Awesome, thanks for forgiving my ignorance and for the great explanation!
Out of curiosity, do you know off the top of your head how this ObjC functionality is provided on Linux?

There is no Objective C on Linux, so Linux has no autoreleased objects. Nor does it have or need the functionality to drain pools that do not exist.

I often use no-op stand-ins like this so that calls to autoreleasepool compile on all platforms.
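A stand-in of that kind could look like the following sketch. It assumes a wrapper with its own name (the hypothetical `withAutoreleasePool`) rather than redefining `autoreleasepool` itself, so it cannot clash with any library declaration: on Apple platforms it forwards to the real pool, and elsewhere it just calls the closure.

```swift
import Foundation

#if canImport(ObjectiveC)
/// Forwards to the Objective-C autorelease pool on Apple platforms.
func withAutoreleasePool<Result>(_ body: () throws -> Result) rethrows -> Result {
    try autoreleasepool(invoking: body)
}
#else
/// No-op stand-in: without an Objective-C runtime there is no pool to drain.
func withAutoreleasePool<Result>(_ body: () throws -> Result) rethrows -> Result {
    try body()
}
#endif

// Call sites look the same on every platform.
let matched = withAutoreleasePool {
    "asdfthisisnotastring".range(of: "string", options: .regularExpression) != nil
}
print(matched) // true
```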

Is there a way to never use Objective C, even on Mac? Like can you tell it to compile on Mac as though it were Linux? Or does it have to use Objective C on Mac?