Why does memory leak even when I release the HeapObject obtained from swift_allocBox?

I have a tuple of the following type -

// X is an enum defined below
(predecessor: X, (Float) -> Float)

enum X {
  case bb1(Builtin.RawPointer)
}

I'm allocating this tuple on the heap using swift_allocBox.

Even though I am making sure the boxed object is released by calling swift_release on the HeapObject * originally obtained from swift_allocBox, it seems like I am leaking the memory associated with the closure context of the second element of the tuple.

  1. Am I missing some obvious steps?
  2. Does deallocating the box not actually destroy and then deallocate the contained value?

No, deallocation is just deallocation.

In that case the comments on the swift_allocBox and swift_release functions seem incorrect? Here's what they say.

/// Allocates a heap object that can contain a value of the given type.
/// Returns a Box structure containing a HeapObject* pointer to the
/// allocated object, and a pointer to the value inside the heap object.
/// The value pointer points to an uninitialized buffer of size and alignment
/// appropriate to store a value of the given type.
/// The heap object has an initial retain count of 1, and its metadata is set
/// such that destroying the heap object destroys the contained value.
BoxPair swift_allocBox(Metadata const *type);

/// Atomically decrements the retain count of an object.  If the
/// retain count reaches zero, the object is destroyed as follows:
///
///   size_t allocSize = object->metadata->destroy(object);
///   if (allocSize) swift_deallocObject(object, allocSize);
///
/// \param object - may be null, in which case this is a no-op
void swift_release(HeapObject *object);

These are the parts of the code comments that I'm confused about.

/// The heap object has an initial retain count of 1, and its metadata is set
/// such that destroying the heap object destroys the contained value.

As well as

/// If the retain count reaches zero, the object is destroyed as follows:
///
/// size_t allocSize = object->metadata->destroy(object);
/// if (allocSize) swift_deallocObject(object, allocSize);
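
Putting those two comments together, my mental model is roughly the following (just a sketch of what I expect to happen; tupleMetadata is a placeholder for the Metadata * I'm passing in, and the destroy path is paraphrased from the comments rather than taken from the actual runtime code):

// Allocate a box whose metadata is set up for the tuple type.
auto [boxObject, boxBuffer] = swift_allocBox(tupleMetadata);

// ... initialize the tuple value (including the closure) into boxBuffer ...

// Drop the last reference.
swift_release(boxObject);
// Per the comments, this should run object->metadata->destroy(object), and
// since the box metadata was set up for the tuple type, that destroy should
// destroy the contained tuple -- releasing the closure context of the second
// element -- before the box memory is deallocated.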

That's correct, swift_release is not just deallocation.

What metadata are you passing to swift_allocBox?

I believe it's TargetTupleTypeMetadata. For a bit more context, I'm modifying this Autodiff builtin to take a metadata pointer instead of a uint. Then I'm simply calling swift_allocBox with the received metadata pointer and storing the obtained HeapObject * for release later.

 auto [heapObjectPtr, projection] = swift_allocBox(tupleTypeMetadata); 
// Store heapObjectPtr for release during cleanup

I'm constructing the corresponding metatype in SIL, like below -

auto tupleMetatypeType = CanMetatypeType::get(tupleType, MetatypeRepresentation::Thick);
auto tupleMetatypeSILType = SILType::getPrimitiveObjectType(tupleMetatypeType);
auto tupleMetatype = Builder.createMetatype(original->getLocation(), tupleMetatypeSILType);

That looks right, although there’s an alloc_box instruction you could be using in SIL instead of directly calling the runtime function. Are you sure the function value isn’t getting released? It may be that the leak is from an extra retain somewhere else.

That looks right, although there’s an alloc_box instruction you could be using in SIL instead of directly calling the runtime function.

I believe we're deliberately not using alloc_box at the SIL level (anymore) as that was causing memory allocation scaling issues.

Are you sure the function value isn’t getting released? It may be that the leak is from an extra retain somewhere else.

The corresponding SIL code we generate essentially moves the closure into the memory allocated by the box. I don't think there are any retains on the closure other than that. In fact, if I don't store the closure in the boxed memory, the leak goes away.

I don’t see how the instruction would have any “scaling” issues that the corresponding lowering wouldn’t.

Anyway, could you post the SIL you’re generating?

I don’t see how the instruction would have any “scaling” issues that the corresponding lowering wouldn’t.

In the current lowering (without the changes I'm trying to make) we're actually using a bump pointer allocator to allocate the types I've mentioned above. But you are right -- the changes I'm trying to make (which simply allocate a box in the runtime) should encounter the same memory scaling issues that used to exist prior to using the bump pointer allocator.

We're implementing a short-term solution to get rid of the leaks in the current implementation, where we store closures in memory allocated by the bump pointer allocator but never properly destroy those closures when the allocator's memory is deallocated (we can't, because the bump pointer allocator has no type information). See the sketch below.
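
To make the type-information point concrete, the difference is roughly this (a hand-written sketch, not the actual lowering; tupleMetadata, valueSize, and valueAlignment are placeholders):

// Bump pointer allocator: we only ever get raw bytes back. When the
// allocator's memory goes away there is no Metadata around to run the
// stored tuple's destroy, so the closure context inside it leaks.
void *raw = allocator.Allocate(valueSize, valueAlignment);
// ... a tuple containing an escaping closure is stored into raw ...

// Box: swift_allocBox records the type's metadata in the heap object, so
// releasing the box can destroy the contained tuple (and release its
// closure) before deallocating the storage.
auto [boxObject, boxBuffer] = swift_allocBox(tupleMetadata);
// ... the same tuple is stored into boxBuffer ...
swift_release(boxObject); // destroys the tuple, then deallocates the box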

Anyway, could you post the SIL you’re generating?

Here is the SIL for the function that actually calls the autodiff builtins which perform the boxed allocations.

// reverse-mode derivative of f(x:)
sil hidden @$s4main1f1xS2f_tFTJrSpSr : $@convention(thin) (Float) -> (Float, @owned @callee_guaranteed (Float) -> Float) {
// %0                                             // users: %59, %3
bb0(%0 : $Float):
  %1 = metatype $@thick (predecessor: _AD__$s4main1f1xS2f_tF_bb3__Pred__src_0_wrt_0, (Float) -> Float).Type // user: %2
  %2 = builtin "autoDiffCreateLinearMapContext"(%1 : $@thick (predecessor: _AD__$s4main1f1xS2f_tF_bb3__Pred__src_0_wrt_0, (Float) -> Float).Type) : $Builtin.NativeObject // users: %37, %48, %43, %64, %67
  debug_value %0 : $Float, let, name "x", argno 1 // id: %3
  %4 = float_literal $Builtin.FPIEEE32, 0x3F800000 // 1 // user: %5
  %5 = struct $Float (%4 : $Builtin.FPIEEE32)     // users: %59, %6
  debug_value %5 : $Float, let, name "r"          // id: %6
  %7 = alloc_stack $IndexingIterator<Range<Int>>, var, name "$generator", implicit // users: %18, %52, %25
  %8 = integer_literal $Builtin.Int64, 0          // user: %9
  %9 = struct $Int (%8 : $Builtin.Int64)          // user: %12
  %10 = integer_literal $Builtin.Int64, 1         // user: %11
  %11 = struct $Int (%10 : $Builtin.Int64)        // user: %12
  %12 = struct $Range<Int> (%9 : $Int, %11 : $Int) // user: %14
  %13 = alloc_stack $Range<Int>                   // users: %16, %14, %19
  store %12 to %13 : $*Range<Int>                 // id: %14
  // function_ref specialized Collection<>.makeIterator()
  %15 = function_ref @$sSlss16IndexingIteratorVyxG0B0RtzrlE04makeB0ACyFSnySiG_Tg5 : $@convention(method) (Range<Int>) -> IndexingIterator<Range<Int>> // user: %17
  %16 = load %13 : $*Range<Int>                   // user: %17
  %17 = apply %15(%16) : $@convention(method) (Range<Int>) -> IndexingIterator<Range<Int>> // user: %18
  store %17 to %7 : $*IndexingIterator<Range<Int>> // id: %18
  dealloc_stack %13 : $*Range<Int>                // id: %19
  %20 = tuple ()                                  // user: %21
  %21 = enum $_AD__$s4main1f1xS2f_tF_bb1__Pred__src_0_wrt_0, #_AD__$s4main1f1xS2f_tF_bb1__Pred__src_0_wrt_0.bb0!enumelt, %20 : $() // user: %22
  br bb1(%21 : $_AD__$s4main1f1xS2f_tF_bb1__Pred__src_0_wrt_0) // id: %22

// %23                                            // user: %32
bb1(%23 : $_AD__$s4main1f1xS2f_tF_bb1__Pred__src_0_wrt_0): // Preds: bb2 bb0
  %24 = alloc_stack $Optional<Int>                // users: %28, %30, %31
  %25 = begin_access [modify] [static] %7 : $*IndexingIterator<Range<Int>> // users: %27, %29
  // function_ref specialized IndexingIterator.next()
  %26 = function_ref @$ss16IndexingIteratorV4next7ElementQzSgyFSnySiG_Tg5 : $@convention(method) (@inout IndexingIterator<Range<Int>>) -> Optional<Int> // user: %27
  %27 = apply %26(%25) : $@convention(method) (@inout IndexingIterator<Range<Int>>) -> Optional<Int> // user: %28
  store %27 to %24 : $*Optional<Int>              // id: %28
  end_access %25 : $*IndexingIterator<Range<Int>> // id: %29
  %30 = load %24 : $*Optional<Int>                // user: %35
  dealloc_stack %24 : $*Optional<Int>             // id: %31
  %32 = tuple $(predecessor: _AD__$s4main1f1xS2f_tF_bb1__Pred__src_0_wrt_0) (%23) // users: %50, %39
  %33 = metatype $@thick (predecessor: _AD__$s4main1f1xS2f_tF_bb1__Pred__src_0_wrt_0).Type // user: %37
  %34 = metatype $@thick (predecessor: _AD__$s4main1f1xS2f_tF_bb1__Pred__src_0_wrt_0).Type // user: %48
  switch_enum %30 : $Optional<Int>, case #Optional.some!enumelt: bb2, case #Optional.none!enumelt: bb3 // id: %35

bb2(%36 : $Int):                                  // Preds: bb1
  %37 = builtin "autoDiffAllocateSubcontext"(%2 : $Builtin.NativeObject, %33 : $@thick (predecessor: _AD__$s4main1f1xS2f_tF_bb1__Pred__src_0_wrt_0).Type) : $Builtin.RawPointer // users: %40, %38
  %38 = pointer_to_address %37 : $Builtin.RawPointer to [strict] $*(predecessor: _AD__$s4main1f1xS2f_tF_bb1__Pred__src_0_wrt_0) // user: %39
  store %32 to %38 : $*(predecessor: _AD__$s4main1f1xS2f_tF_bb1__Pred__src_0_wrt_0) // id: %39
  %40 = enum $_AD__$s4main1f1xS2f_tF_bb2__Pred__src_0_wrt_0, #_AD__$s4main1f1xS2f_tF_bb2__Pred__src_0_wrt_0.bb1!enumelt, %37 : $Builtin.RawPointer // user: %41
  %41 = tuple $(predecessor: _AD__$s4main1f1xS2f_tF_bb2__Pred__src_0_wrt_0) (%40) // user: %45
  %42 = metatype $@thick (predecessor: _AD__$s4main1f1xS2f_tF_bb2__Pred__src_0_wrt_0).Type // user: %43
  %43 = builtin "autoDiffAllocateSubcontext"(%2 : $Builtin.NativeObject, %42 : $@thick (predecessor: _AD__$s4main1f1xS2f_tF_bb2__Pred__src_0_wrt_0).Type) : $Builtin.RawPointer // users: %46, %44
  %44 = pointer_to_address %43 : $Builtin.RawPointer to [strict] $*(predecessor: _AD__$s4main1f1xS2f_tF_bb2__Pred__src_0_wrt_0) // user: %45
  store %41 to %44 : $*(predecessor: _AD__$s4main1f1xS2f_tF_bb2__Pred__src_0_wrt_0) // id: %45
  %46 = enum $_AD__$s4main1f1xS2f_tF_bb1__Pred__src_0_wrt_0, #_AD__$s4main1f1xS2f_tF_bb1__Pred__src_0_wrt_0.bb2!enumelt, %43 : $Builtin.RawPointer // user: %47
  br bb1(%46 : $_AD__$s4main1f1xS2f_tF_bb1__Pred__src_0_wrt_0) // id: %47

bb3:                                              // Preds: bb1
  %48 = builtin "autoDiffAllocateSubcontext"(%2 : $Builtin.NativeObject, %34 : $@thick (predecessor: _AD__$s4main1f1xS2f_tF_bb1__Pred__src_0_wrt_0).Type) : $Builtin.RawPointer // users: %51, %49
  %49 = pointer_to_address %48 : $Builtin.RawPointer to [strict] $*(predecessor: _AD__$s4main1f1xS2f_tF_bb1__Pred__src_0_wrt_0) // user: %50
  store %32 to %49 : $*(predecessor: _AD__$s4main1f1xS2f_tF_bb1__Pred__src_0_wrt_0) // id: %50
  %51 = enum $_AD__$s4main1f1xS2f_tF_bb3__Pred__src_0_wrt_0, #_AD__$s4main1f1xS2f_tF_bb3__Pred__src_0_wrt_0.bb1!enumelt, %48 : $Builtin.RawPointer // user: %63
  dealloc_stack %7 : $*IndexingIterator<Range<Int>> // id: %52
  %53 = metatype $@thin Float.Type                // user: %59
  // function_ref static Float.+ infix(_:_:)
  %54 = function_ref @$sSf1poiyS2f_SftFZ : $@convention(method) (Float, Float, @thin Float.Type) -> Float // user: %57
  // function_ref autodiff subset parameters thunk for forward-mode derivative from static Float.+ infix(_:_:)
  %55 = function_ref @$sSf1poiyS2f_SftFZS3fXMtS3fIegyd_IetMyyydo_TJSfSSUpSrSUUP : $@convention(method) (Float, Float, @thin Float.Type) -> (Float, @owned @callee_guaranteed (Float) -> Float) // user: %57
  // function_ref autodiff subset parameters thunk for reverse-mode derivative from static Float.+ infix(_:_:)
  %56 = function_ref @$sSf1poiyS2f_SftFZS3fXMtS3fIegyd_IetMyyydo_TJSrSSUpSrSUUP : $@convention(method) (Float, Float, @thin Float.Type) -> (Float, @owned @callee_guaranteed (Float) -> Float) // user: %57
  %57 = differentiable_function [parameters 0] [results 0] %54 : $@convention(method) (Float, Float, @thin Float.Type) -> Float with_derivative {%55 : $@convention(method) (Float, Float, @thin Float.Type) -> (Float, @owned @callee_guaranteed (Float) -> Float), %56 : $@convention(method) (Float, Float, @thin Float.Type) -> (Float, @owned @callee_guaranteed (Float) -> Float)} // user: %58
  %58 = differentiable_function_extract [vjp] %57 : $@differentiable(reverse) @convention(method) (Float, @noDerivative Float, @noDerivative @thin Float.Type) -> Float // user: %59
  %59 = apply %58(%0, %5, %53) : $@convention(method) (Float, Float, @thin Float.Type) -> (Float, @owned @callee_guaranteed (Float) -> Float) // users: %61, %60
  %60 = tuple_extract %59 : $(Float, @callee_guaranteed (Float) -> Float), 0 // user: %68
  %61 = tuple_extract %59 : $(Float, @callee_guaranteed (Float) -> Float), 1 // user: %63
  // function_ref pullback of f(x:)
  %62 = function_ref @$s4main1f1xS2f_tFTJpSpSr : $@convention(thin) (Float, @guaranteed Builtin.NativeObject) -> Float // user: %67
  %63 = tuple $(predecessor: _AD__$s4main1f1xS2f_tF_bb3__Pred__src_0_wrt_0, @callee_guaranteed (Float) -> Float) (%51, %61) // user: %66
  %64 = builtin "autoDiffProjectTopLevelSubcontext"(%2 : $Builtin.NativeObject) : $Builtin.RawPointer // user: %65
  %65 = pointer_to_address %64 : $Builtin.RawPointer to [strict] $*(predecessor: _AD__$s4main1f1xS2f_tF_bb3__Pred__src_0_wrt_0, @callee_guaranteed (Float) -> Float) // user: %66
  store %63 to %65 : $*(predecessor: _AD__$s4main1f1xS2f_tF_bb3__Pred__src_0_wrt_0, @callee_guaranteed (Float) -> Float) // id: %66
  %67 = partial_apply [callee_guaranteed] %62(%2) : $@convention(thin) (Float, @guaranteed Builtin.NativeObject) -> Float // user: %68
  %68 = tuple (%60 : $Float, %67 : $@callee_guaranteed (Float) -> Float) // user: %69
  return %68 : $(Float, @callee_guaranteed (Float) -> Float) // id: %69
} // end sil function '$s4main1f1xS2f_tFTJrSpSr'

I meant the SIL that does this boxed allocation and moves the tuple into the storage.

The SIL does not do the boxed allocation itself.

It calls the builtins autoDiffCreateLinearMapContext and autoDiffAllocateSubcontext, which perform the boxed allocations and return Builtin.RawPointers.

The SIL then initializes that storage (with store instructions) through these Builtin.RawPointers.
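
For the closure-carrying tuple specifically, that store is an initializing store: ownership of the closure moves into the box's payload, and no extra retain is emitted. A rough C++ stand-in for what the generated code amounts to (Subcontext and initializeTopLevelStorage are hypothetical, purely for illustration):

// Hypothetical stand-in for $(predecessor: ..., @callee_guaranteed (Float) -> Float).
struct Subcontext {
  void *predecessor;       // enum payload (Builtin.RawPointer)
  void *pullbackFn;        // closure function pointer
  HeapObject *pullbackCtx; // closure context, owned (+1) by the tuple
};

void initializeTopLevelStorage(AutoDiffLinearMapContext *ctx,
                               Subcontext value /* consumed */) {
  // builtin "autoDiffProjectTopLevelSubcontext"
  auto *storage = static_cast<Subcontext *>(
      swift_autoDiffProjectTopLevelSubcontext(ctx));
  // The store moves value (including the +1 on pullbackCtx) into the box's
  // payload; no additional retain happens here.
  *storage = value;
}

So the only strong reference to the closure context should be the one sitting inside the box, and releasing the box is what's expected to give it up.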

I can create a branch with the updated builtins, if you need to take a look at those. But essentially this is what they look like -

class AutoDiffLinearMapContext : public HeapObject {
private:
  OpaqueValue *topLevelLinearMapContextProjection;

  // Bump pointer allocator carried over from the existing implementation
  // (referenced by getAllocator() below).
  llvm::BumpPtrAllocator allocator;

public:
  llvm::SmallVector<HeapObject *, 4> heapObjects;

  explicit AutoDiffLinearMapContext(OpaqueValue *topLevelProjection);

  llvm::BumpPtrAllocator &getAllocator() const {
    return const_cast<llvm::BumpPtrAllocator &>(this->allocator);
  }

  OpaqueValue *getTopLevelLinearMapContextProjection() const {
    return this->topLevelLinearMapContextProjection;
  }
};

AutoDiffLinearMapContext *swift::swift_autoDiffCreateLinearMapContext(
    const Metadata *topLevelLinearMapContextMetadata) {
  // Allocate a box for the top-level linear map context
  auto [topLevelContextHeapObjectPtr, topLevelContextProjection] = swift_allocBox(topLevelLinearMapContextMetadata);

  // Create a linear map context object that stores the projection
  // for the top-level context
  auto *linearMapContext = new AutoDiffLinearMapContext(topLevelContextProjection);

  linearMapContext->heapObjects.push_back(topLevelContextHeapObjectPtr);

  // Return the newly created linear map context object
  return linearMapContext;
}

void *swift::swift_autoDiffProjectTopLevelSubcontext(
    AutoDiffLinearMapContext *linearMapContext) {
  return static_cast<void*>(linearMapContext->getTopLevelLinearMapContextProjection());
}

void *swift::swift_autoDiffAllocateSubcontext(
    AutoDiffLinearMapContext *linearMapContext, const Metadata *linearMapSubcontextMetadata) {
  // Allocate a box for the linear map subcontext
  auto [subcontextHeapObjectPtr, subcontextProjection] = swift_allocBox(linearMapSubcontextMetadata);

  linearMapContext->heapObjects.push_back(subcontextHeapObjectPtr);

  // Return the subcontext projection
  return static_cast<void*>(subcontextProjection);
}

Okay. And the heap metadata for AutoDiffLinearMapContext destroys it somehow?

Oh sorry, yeah the destroyer for AutoDiffLinearMapContext looks like this -

static void destroyLinearMapContext(SWIFT_CONTEXT HeapObject *obj) {
  auto *linearMapContext = static_cast<AutoDiffLinearMapContext *>(obj);

  for (auto *heapObjectPtr : linearMapContext->heapObjects) {
    swift_release(heapObjectPtr);
  }

  // Pairs with the plain `new` in swift_autoDiffCreateLinearMapContext.
  delete linearMapContext;
}
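
So, end to end, the teardown chain I'm expecting looks roughly like this (a sketch of my expectations with placeholder metadata names, not code from the branch):

// The pullback's partial_apply context is the AutoDiffLinearMapContext, so
// releasing the pullback should eventually invoke destroyLinearMapContext.
AutoDiffLinearMapContext *ctx =
    swift_autoDiffCreateLinearMapContext(topLevelTupleMetadata);
void *sub = swift_autoDiffAllocateSubcontext(ctx, predecessorTupleMetadata);
// ... the generated SIL stores values (including the pullback closure) into
//     the projections ...

swift_release(ctx);
// destroyLinearMapContext then swift_release-s every recorded box, and each
// box's destroy should destroy the boxed tuple -- which is where the closure
// context of the second element should get released.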

Hmm, alright. Yeah, that seems reasonable. You're sure you know which object is being leaked?

I do feel fairly confident about the leak, yes. This is what the output from leaks looks like -

STACK OF 1 INSTANCE OF 'ROOT LEAK: <Swift closure context>':
9   dyld                                  0x18d5a7f28 start + 2236
8   check-if-leaks                        0x1001124c8 main + 12
7   check-if-leaks                        0x1001126c8 m() + 348
6   libswift_Differentiation.dylib        0x21afe6fd0 valueWithPullback<A, B>(at:of:) + 152
5   check-if-leaks                        0x100112924 thunk for @callee_guaranteed (@unowned Float) -> (@unowned Float, @owned @escaping @callee_guaranteed (@unowned Float) -> (@unowned Float)) + 48
4   check-if-leaks                        0x100112c10 reverse-mode derivative of f(x:) + 292
3   check-if-leaks                        0x100112f1c autodiff subset parameters thunk for reverse-mode derivative from static Float.+ infix(_:_:) + 84
2   libswiftCore.dylib                    0x19c9a8790 swift_allocObject + 64
1   libswiftCore.dylib                    0x19c9a8594 swift_slowAlloc + 64
0   libsystem_malloc.dylib                0x18d740d88 _malloc_zone_malloc_instrumented_or_legacy + 128
====
    1 (48 bytes) ROOT LEAK: <Swift closure context 0x13df04180> [48]

We originally triaged this in this GitHub issue.

Okay. Can you also trace the retains and releases on that object?

How do I do that?

Instruments has a tool for that. Otherwise, I’m not sure.