Yes, I think that's another way to put my point above: I don't see a need for withLockIfAvailable to synchronize (i.e. to establish transitive ordering) with other threads when it fails, and from discussion, it seems like it would it have a very real performance overhead that we'd prefer to avoid.
Maybe, when we finally get around to nailing down the memory model, we'd benefit from another comprehensive audit of platform implementations to see which ones actually try to treat all mutex operations as sequentially consistent. But I don't think we need that for this proposal.