nRF: memalign issue

ebariaux · August 19, 2024, 11:20am

Hello,

I've been extending the nRF blink example (1. thanks a lot to all those who put the examples together; 2. I'll publish blog posts and code repos once I've got some more interesting stuff), and I'm now facing a weird issue with the memalign implementation.

My simple Swift code is:

print("a")
var iPtr: UnsafeMutablePointer<UInt16> = UnsafeMutablePointer<UInt16>.allocate(capacity: 1)
print("b")

and this prints "a" to the console, then hangs.

I did add a couple of print statements to the memalign implementation from the repo:

int
posix_memalign(void **memptr, size_t alignment, size_t size)
{
  printk(">>posix_memalign\n");
  void *p = aligned_alloc(alignment, size);
  printk("Returned from aligned_alloc\n");
  if (p) {
    *memptr = p;
    return 0;
  }

  return errno;
}

and this is what I see in the console:

a
>>posix_memalign
>>posix_memalign
>>posix_memalign
>>posix_memalign
>>posix_memalign
>>posix_memalign
>>posix_memalign
>>posix_memalign
>>posix_memalign
>>posix_memalign
>>posix_memalign
>>posix_memalign
>>posix_memalign
>>posix_memalign
>>posix_memalign
>>posix_memalign
>>posix_memalign
>>posix_memalign
>>posix_memalign
>>posix_memalign
>>posix_memalign
>>posix_memalign

and the code halts there.

This would suggest some kind of recursive call from aligned_alloc to posix_memalign, but looking at the Zephyr source code, I could not see how that would happen.
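One way to confirm the recursion without waiting for a hang is a reentrancy guard. The sketch below is host-runnable C under stated assumptions: guarded_posix_memalign and suspected_aligned_alloc are hypothetical names, and suspected_aligned_alloc only simulates the suspected behavior (a library aligned_alloc() that calls back into posix_memalign()); it does not call Zephyr or newlib.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

static int depth = 0;            /* number of guarded calls currently active */
static int reentry_detected = 0; /* set once a nested call is seen */

static void *suspected_aligned_alloc(size_t alignment, size_t size);

/* Hypothetical stand-in for the posix_memalign() in Stubs.c, with a guard. */
static int guarded_posix_memalign(void **memptr, size_t alignment, size_t size)
{
    if (depth > 0) {
        /* Re-entered while an outer call is still running: report it and
           bail out instead of recursing until the stack overflows. */
        reentry_detected = 1;
        printf("re-entered posix_memalign (alignment %zu, size %zu)\n",
               alignment, size);
        return ENOMEM;
    }
    depth++;
    void *p = suspected_aligned_alloc(alignment, size);
    depth--;
    assert(depth == 0); /* the guard must stay balanced */
    if (p == NULL)
        return ENOMEM;
    *memptr = p;
    return 0;
}

/* Simulates a C library whose aligned_alloc() delegates to posix_memalign(),
   which is what the repeated ">>posix_memalign" lines suggest (assumption). */
static void *suspected_aligned_alloc(size_t alignment, size_t size)
{
    void *p = NULL;
    if (guarded_posix_memalign(&p, alignment, size) != 0)
        return NULL;
    return p;
}
```

On the board, the same guard inside the real posix_memalign() (using printk instead of printf) would print a single line and return instead of looping.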

If I use malloc() instead of aligned_alloc(), code works properly and I can use the UnsafeMutablePointer normally.
However, I suppose not respecting the alignment could lead to some issues down the road.

I've seen that Zephyr should provide a k_aligned_alloc(), but using it results in an undefined reference when linking (which I have not yet taken the time to investigate further).

Has anyone faced anything similar? Does anybody have an explanation, or ideas on how I could debug the issue further?

This is much lower level than the code I'm usually working with, so bear with me if I don't understand some of the mechanisms at play here.

Thanks,
Eric

Josh_Osborne · August 19, 2024, 2:00pm

Maybe log the actual requested alignment? Or take a leap of faith and special-case an alignment of 1 (and 0!) and see if that fixes it?

ebariaux · August 19, 2024, 3:38pm

Thanks for your feedback.

If I log the alignment, the initial value is 16.

I then forced the alignment to 1 or 0 when calling aligned_alloc in the posix_memalign implementation (that's what I understood from your suggestion).

And the output now looks like

a
>>posix_memalign
alignment 16
>>posix_memalign
alignment 1
>>posix_memalign
alignment 1
...

or

a
>>posix_memalign
alignment 16
>>posix_memalign
alignment 0
>>posix_memalign
...

Which I interpret as a confirmation of the recursive call.

Josh_Osborne · August 21, 2024, 9:19pm

Well, what I actually meant was: check the incoming alignment, and for an alignment of 0 or 1 you are already aligned, so you are good to go. I had thought aligned_alloc might call posix_memalign with something like 0 or 1...
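The check described here can be sketched in C. This is a minimal, host-runnable sketch; the helper names are hypothetical, and MALLOC_GUARANTEED_ALIGN = 8 is an assumption based on the malloc() addresses logged later in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: malloc() on this target returns 8-byte-aligned blocks. */
#define MALLOC_GUARANTEED_ALIGN ((size_t)8)

static bool is_power_of_two(size_t x) { return x != 0 && (x & (x - 1)) == 0; }

/* POSIX requires a power of two that is also a multiple of sizeof(void *). */
static bool is_valid_posix_alignment(size_t alignment)
{
    return is_power_of_two(alignment) && alignment % sizeof(void *) == 0;
}

/* Josh's suggestion: 0 or 1 means "already aligned", and anything malloc()
   already guarantees needs no special treatment either. */
static bool plain_malloc_is_enough(size_t alignment)
{
    if (alignment <= 1)
        return true;
    assert(is_power_of_two(alignment));
    return alignment <= MALLOC_GUARANTEED_ALIGN;
}
```

With the alignment of 16 logged above, plain_malloc_is_enough() returns false, so the 16-byte request is exactly the case plain malloc() may not satisfy.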

Let's see: as an implementation detail, macOS, iOS et al. have a malloc that aligns everything to at least 16 bytes, so maybe that is relevant here. On a non-Apple platform, maybe just use regular malloc, allocate a bit extra and round the pointer up to a multiple of 16, and when free gets called, lop off the low bits... except there isn't a different free, is there? Aw, rats.
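The over-allocation idea can be sketched as follows (host-runnable C; overaligned_alloc and overaligned_free are hypothetical names). Rather than lopping off low bits, this common variant stores malloc()'s original pointer just below the aligned block. As noted above, it only works if every deallocation goes through the matching free function, which would not be the case if, as suspected later in this thread, Swift's deallocation ends up in plain free().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical aligned allocator layered on plain malloc().
   Layout: [padding][original malloc pointer][aligned block ...] */
static void *overaligned_alloc(size_t alignment, size_t size)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    /* Keep the stashed pointer slot itself properly aligned. */
    if (alignment < sizeof(void *))
        alignment = sizeof(void *);
    /* Room for worst-case padding plus the stashed original pointer. */
    size_t extra = alignment - 1 + sizeof(void *);
    if (size > SIZE_MAX - extra)
        return NULL;
    void *raw = malloc(size + extra);
    if (raw == NULL)
        return NULL;
    /* Skip past the pointer slot, then round up to the alignment. */
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + alignment - 1) & ~(uintptr_t)(alignment - 1);
    ((void **)aligned)[-1] = raw; /* stash malloc's pointer just below */
    return (void *)aligned;
}

/* Must be used instead of free(): passing the aligned pointer to free()
   would hand it a pointer that malloc() never returned. */
static void overaligned_free(void *p)
{
    if (p != NULL)
        free(((void **)p)[-1]);
}
```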

ebariaux · August 23, 2024, 3:10pm

I did some further digging using k_aligned_alloc() instead of aligned_alloc().

This initially gave me an "undefined reference to `k_aligned_alloc'" error at the link stage.

Looking into the Zephyr source code, I found that this function is defined in kernel/mempool.c under the conditional
#if (K_HEAP_MEM_POOL_SIZE > 0)

Searching for that constant, I found some information on the "Migration guide to Zephyr v3.6.0" page of the Zephyr Project Documentation, indicating that this constant now drives system heap availability.

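I did not, however, find any way to make this work: defining that constant in Stubs.c has no effect (presumably because kernel/mempool.c is compiled separately, so a #define in Stubs.c never reaches it).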

As the release notes indicated that the old option was still available, I tried that, adding CONFIG_HEAP_MEM_POOL_SIZE=2048 to my prj.conf file.

With that in place, I can now use
void *p = k_aligned_alloc(alignment, size);
in the posix_memalign() implementation in Stubs.c

I now need to address deallocation: calling .deallocate() on my UnsafeMutablePointer should eventually call k_free().

ebariaux · October 3, 2024, 4:22pm

I kept digging on and off during the last few weeks, and although I don't have all my answers yet, I thought I would post as it might interest some people (and I could also receive some help while I keep going down the rabbit hole).

I verified that using k_aligned_alloc in the posix_memalign() implementation works and returns properly aligned addresses, but the memory is never deallocated.

Calling deallocate() on my UnsafeMutablePointer calls free() (I suppose), which does not know anything about memory allocated by k_aligned_alloc() and so does nothing.

Doing a loop of .allocate(capacity:) and deallocate() eventually ends up with an error (as there is no more memory to allocate).

Sample with CONFIG_HEAP_MEM_POOL_SIZE=256 in prj.conf; allocations eventually fail:

    for _ in 1..<100 {
      let ptr = UnsafeMutablePointer<UInt32>.allocate(capacity: 1)
      ptr.deallocate()
    }
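A small host-side model of why this loop runs out of memory, assuming (as suspected above) that free() leaves blocks from the Zephyr heap untouched. pool_alloc/pool_free are hypothetical stand-ins for k_aligned_alloc()/k_free(), foreign_free stands in for a free() that does not own the memory, and the pool size is made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fixed-size pool standing in for the small system heap. */
#define POOL_SLOTS 8
static unsigned char pool[POOL_SLOTS][32];
static bool in_use[POOL_SLOTS];

static void *pool_alloc(void)
{
    for (size_t i = 0; i < POOL_SLOTS; i++) {
        if (!in_use[i]) {
            in_use[i] = true;
            return pool[i];
        }
    }
    return NULL; /* pool exhausted */
}

static void pool_free(void *p)
{
    for (size_t i = 0; i < POOL_SLOTS; i++) {
        if (p == pool[i]) {
            in_use[i] = false;
            return;
        }
    }
    assert(false && "pointer does not belong to this pool");
}

/* A free() that does not own this memory: the block never returns. */
static void foreign_free(void *p) { (void)p; }

/* The allocate/deallocate loop from the post; returns how many iterations
   succeeded before allocation failed (or `iterations` if none failed). */
static int run_loop(int iterations, void (*dealloc)(void *))
{
    for (int i = 0; i < iterations; i++) {
        void *p = pool_alloc();
        if (p == NULL)
            return i;
        dealloc(p);
    }
    return iterations;
}
```

With the matching pool_free, all 99 iterations succeed; with foreign_free, the loop stops as soon as every slot has been handed out once.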

As free() is already implemented by the C library, I cannot reimplement it to call k_free() instead.
I looked into the implementation of deallocate() in UnsafePointer.swift (swift/stdlib/public/core/UnsafePointer.swift at commit c874e6f84cf7c5fdb3665b33febb54b352676e50 in swiftlang/swift on GitHub), but it calls Builtin.deallocRaw(), and that goes beyond my current understanding of the inner workings of Swift.

I tried directly calling k_free() from my Swift code, and that works, but:

- I'm not sure whether there could be side effects (I also tried calling both deallocate() and k_free(), and that works too);
- it's definitely not elegant, and it makes the code less readable for a Swift developer.

The following works fine:

    for _ in 1..<100 {
      let ptr = UnsafeMutablePointer<UInt32>.allocate(capacity: 1)
      ptr.deallocate()
      k_free(ptr)
    }

The other path I took is to use malloc(). This works fine and memory is freed as required, but the requested alignment is not respected, as the example below shows. I'm not sure how bad this is; it never caused issues in the little projects I've worked on.

posix_memalign() implementation

int
posix_memalign(void **memptr, size_t alignment, size_t size)
{
  printk(">>posix_memalign, asking for %d with align %d\n", size, alignment);
  printk("Size of size_t %d\n", sizeof (size_t));

  void *p = malloc(size);
  if (p) {
    /* uintptr_t (from <stdint.h>) is the integer type meant to hold a pointer */
    printk("malloc returned %p, aligned %s\n", p, ((uintptr_t)p % alignment) == 0 ? "yes":"no");
    printk("aligned on size_t ? %s\n", ((uintptr_t)p % sizeof (size_t)) == 0 ? "yes":"no");
    *memptr = p;
    return 0;
  }

  printk("Error, code %d\n", errno);
  return errno;
}

My Swift code:

struct MediumStruct {
  var a = 42
  var b = true
  var c = "Hello"
  var d = 43
  var e = 44
  var f = 45
}

    var iPtr: UnsafeMutablePointer<UInt16> = UnsafeMutablePointer<UInt16>.allocate(capacity: 1)
    var pMedium = UnsafeMutablePointer<MediumStruct>.allocate(capacity: 1)
    var a: [Bool] = Array(repeating: true, count: 1)
    var iPtr2: UnsafeMutablePointer<UInt16> = UnsafeMutablePointer<UInt16>.allocate(capacity: 1)

which results in:

>>posix_memalign, asking for 2 with align 16
Size of size_t 4
malloc returned 0x20002088, aligned no
aligned on size_t ? yes
>>posix_memalign, asking for 32 with align 16
Size of size_t 4
malloc returned 0x20002090, aligned yes
aligned on size_t ? yes
>>posix_memalign, asking for 17 with align 4
Size of size_t 4
malloc returned 0x200020b8, aligned yes
aligned on size_t ? yes
>>posix_memalign, asking for 2 with align 16
Size of size_t 4
malloc returned 0x200020d8, aligned no
aligned on size_t ? yes
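To quantify how aligned each returned address actually is, one can isolate its lowest set bit, which is the largest power of two dividing the address. A minimal sketch (actual_alignment is a hypothetical helper name):

```c
#include <assert.h>
#include <stdint.h>

/* Largest power of two dividing the address, i.e. the strongest alignment
   the pointer actually satisfies (isolates the lowest set bit). */
static uintptr_t actual_alignment(uintptr_t addr)
{
    assert(addr != 0);
    return addr & (~addr + 1u);
}
```

Applied to the four addresses above (0x20002088, 0x20002090, 0x200020b8, 0x200020d8), this gives 8, 16, 8 and 8, which suggests malloc() guarantees 8-byte alignment here and the one 16-byte-aligned result is incidental.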

Regarding my observation that there was some recursion with the original implementation in the example repo: I found that EmbeddedRuntime.swift contains an implementation of alignedAlloc() (swift/stdlib/public/core/EmbeddedRuntime.swift at commit c874e6f84cf7c5fdb3665b33febb54b352676e50 in swiftlang/swift on GitHub), and it calls posix_memalign().
But Stubs.c calls aligned_alloc() (different casing). Are those the same thing?
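For reference on the C side: aligned_alloc() (C11) and posix_memalign() (POSIX) are two distinct standard functions that do the same job with different calling conventions, while alignedAlloc() is the Swift-side helper in EmbeddedRuntime.swift that, per the post, calls posix_memalign(). A minimal sketch of one layered on the other (aligned_alloc_via_posix is a hypothetical name):

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Implements the aligned_alloc() contract (returns the pointer, NULL on
   failure) on top of posix_memalign() (returns an error code and writes
   the pointer through its first argument). */
static void *aligned_alloc_via_posix(size_t alignment, size_t size)
{
    void *p = NULL;
    int rc = posix_memalign(&p, alignment, size);
    /* POSIX only allows EINVAL (bad alignment) or ENOMEM as failures. */
    assert(rc == 0 || rc == EINVAL || rc == ENOMEM);
    return rc == 0 ? p : NULL;
}
```

If a C library's aligned_alloc() had this shape and posix_memalign() were in turn implemented by calling aligned_alloc(), the two would call each other forever, which would match the log at the top of the thread. Whether newlib's aligned_alloc() actually delegates this way is an assumption, not something established here.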

@kubamracek, as you made the commit with that implementation, would you mind shedding some light on this for me? That would be tremendously helpful.

One more question for now: although UnsafeMutablePointer.allocate() does indeed call posix_memalign(), creating an instance of a class does not. I still think the instance ends up on the heap rather than the stack, but I don't understand the mechanism behind it; I'm still experimenting with that one.

Apologies for the super long post. I'll soon post a GitHub repo with my different test projects and when I finally understand the details of this, I'll create a blog post about it.

ebariaux · October 5, 2024, 12:47pm

I created a series of examples around this on GitHub: nelcea/EmbeddedSwift-nRF52-MemoryTests

One more observation from those tests: if I use the "minimal libc" from Zephyr instead of newlib, the original code works fine, allocating with the correct alignment and deallocating as required.

But the flag to use newlib (CONFIG_NEWLIB_LIBC=y in prj.conf) was added in the same commit that added the implementation of posix_memalign.

tera · October 5, 2024, 1:08pm

Pretty much so.

YAGNI suggests that you worry about it (and deal with it) only once (and if) it actually happens.

You didn't include this information, so I have to guess: you need to implement posix_memalign because it is required for some component (swift runtime?) and you are getting a linking error without implementing it yourself because it is not provided out of the box?

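I'd do something like this to get the thing off the ground:

#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

int posix_memalign(void **memptr, size_t alignment, size_t size) {
    void *p = malloc(size);
    if (!p) { return ENOMEM; }
    if (alignment > 1 && (uintptr_t)p % alignment != 0) {
        // p's alignment is less than the wanted alignment:
        // fatalError("start worrying about it")
        abort();
    }
    *memptr = p;
    return 0;
}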

PS. It is very tricky to implement posix_memalign properly if there is no underlying call that allocates aligned memory.

ebariaux · October 6, 2024, 8:00am

Thank you for your feedback.

> You didn't include this information, so I have to guess: you need to implement posix_memalign because it is required for some component (swift runtime?) and you are getting a linking error without implementing it yourself because it is not provided out of the box?

The original code in Apple's Embedded Swift examples repo implements it, so I did not even question that, but I did see it getting called.
Just to confirm, I removed it and I indeed get an "undefined reference" during the link (at least when using newlib).

> if p's alignment is less than wantedAlignment {
>     fatalError("start worrying about it")
> }
That's what I'm observing: when using UnsafeMutablePointer.allocate(), the requested alignment is 16, but malloc seems to align on 8-byte boundaries.
Other operations (e.g. Array() initializer) have other requirements (e.g. 4 in this case).

I'm currently looking into the Swift source code to understand why the 16-byte requirement is there, starting with the allocate() implementation in UnsafePointer.swift in stdlib, which has already led me to this definition

/// Manually allocated memory is at least 16-byte aligned in Swift.
///
/// When swift_slowAlloc is called with "default" alignment (alignMask ==
/// ~(size_t(0))), it will execute the "aligned allocation path" (AlignedAlloc)
/// using this value for the alignment.
///
/// This is done so users do not need to specify the allocation alignment when
/// manually deallocating memory via Unsafe[Raw][Buffer]Pointer. Any
/// user-specified alignment less than or equal to _swift_MinAllocationAlignment
/// results in a runtime request for "default" alignment. This guarantees that
/// manual allocation always uses an "aligned" runtime allocation. If an
/// allocation is "aligned" then it must be freed using an "aligned"
/// deallocation. The converse must also hold. Since manual Unsafe*Pointer
/// deallocation is always "aligned", the user never needs to specify alignment
/// during deallocation.
///
/// This value is inlined (and constant propagated) in user code. On Windows,
/// the Swift runtime and user binaries need to agree on this value.
#define _swift_MinAllocationAlignment (__swift_size_t) 16

in RuntimeShims.h, but there are still holes in my understanding, so I'll report back when I have more information.
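The rule described in that comment can be modelled in a few lines of C. This is a sketch of the documented behavior, not the runtime's actual code; the helper names are hypothetical, and the assumption that stricter alignments are passed as an explicit mask of alignment - 1 goes beyond what the comment states.

```c
#include <assert.h>
#include <stddef.h>

/* Value of _swift_MinAllocationAlignment quoted above. */
#define SWIFT_MIN_ALLOCATION_ALIGNMENT ((size_t)16)

/* Any type alignment <= 16 becomes a request for "default" alignment
   (alignMask == ~0); stricter ones are assumed to pass alignment - 1. */
static size_t manual_alloc_alignment_mask(size_t type_alignment)
{
    assert(type_alignment != 0);
    if (type_alignment <= SWIFT_MIN_ALLOCATION_ALIGNMENT)
        return ~(size_t)0;          /* "default" alignment */
    return type_alignment - 1;      /* explicit mask (assumption) */
}

/* Alignment the aligned allocation path would then request. */
static size_t effective_alignment(size_t align_mask)
{
    return align_mask == ~(size_t)0 ? SWIFT_MIN_ALLOCATION_ALIGNMENT
                                    : align_mask + 1;
}
```

For UInt16 (alignment 2) this yields 16, which matches the "asking for 2 with align 16" lines in the log above.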