I've been working on some patches which add basic support for PGO to swift [1].
What I have so far is just a proof-of-concept. I'd like to get some feedback on
the approach I've taken.
I've added support for loading profile data, matching up execution counts to
the right parts of the AST, and attaching those execution counts to conditional
branches. I added two fields to CondBranchInst (in SIL):
/// The number of times the True branch was executed.
Optional<uint64_t> TrueBBCount;
/// The number of times the False branch was executed.
Optional<uint64_t> FalseBBCount;
I fixed up the SILCloner and a few other sites where conditional branches are
created to propagate the branch taken counts. I have a patch to propagate
branch taken counts through the SILOptimizer, but I didn't include it in [1]
because I don't know how to write tests for it.
In IRGen, I added some logic to scale execution counts and create llvm
branch_weight metadata.
Some questions:
1. Is it acceptable to make some SIL objects larger in order to store
execution counts (e.g CondBranchInst)? If not, what's the best way to make
this information visible to SILOptimizer and IRGen?
2. Is it better to associate counts with SIL instructions, or with
SILSuccessor?
3. Does anyone have tips on modifying swift/benchmark? I'd like to add a
benchmark driver which generates profile data, and another driver which
uses that data to turn on PGO.
I've been working on some patches which add basic support for PGO to swift [1].
What I have so far is just a proof-of-concept. I'd like to get some feedback on
the approach I've taken.
I've added support for loading profile data, matching up execution counts to
the right parts of the AST, and attaching those execution counts to conditional
branches. I added two fields to CondBranchInst (in SIL):
/// The number of times the True branch was executed.
Optional<uint64_t> TrueBBCount;
/// The number of times the False branch was executed.
Optional<uint64_t> FalseBBCount;
I fixed up the SILCloner and a few other sites where conditional branches are
created to propagate the branch taken counts. I have a patch to propagate
branch taken counts through the SILOptimizer, but I didn't include it in [1]
because I don't know how to write tests for it.
In IRGen, I added some logic to scale execution counts and create llvm
branch_weight metadata.
Some questions:
1. Is it acceptable to make some SIL objects larger in order to store
execution counts (e.g CondBranchInst)?
Yes, I don’t see a problem with that.
But for saving some space, you might want to store the counts as a plain uint64_t instead of an Optional and use a special value (e.g. ~0) for the none-case.
If not, what's the best way to make
this information visible to SILOptimizer and IRGen?
2. Is it better to associate counts with SIL instructions, or with
SILSuccessor?
I think storing the counts in SILSuccessor makes sense, because counts are needed for all kind of terminator instructions (like switch_enum), except for the unconditional br, but IMO we can live with wasting some bytes for this instructions.
Another thing to consider is that the count information should be printed/parsed/serialized when writing and reading SIL.
···
On Sep 6, 2016, at 12:36 PM, Vedant Kumar via swift-dev <swift-dev@swift.org> wrote:
3. Does anyone have tips on modifying swift/benchmark? I'd like to add a
benchmark driver which generates profile data, and another driver which
uses that data to turn on PGO.
I've been working on some patches which add basic support for PGO to swift [1].
What I have so far is just a proof-of-concept. I'd like to get some feedback on
the approach I've taken.
I've added support for loading profile data, matching up execution counts to
the right parts of the AST, and attaching those execution counts to conditional
branches. I added two fields to CondBranchInst (in SIL):
/// The number of times the True branch was executed.
Optional<uint64_t> TrueBBCount;
/// The number of times the False branch was executed.
Optional<uint64_t> FalseBBCount;
I fixed up the SILCloner and a few other sites where conditional branches are
created to propagate the branch taken counts. I have a patch to propagate
branch taken counts through the SILOptimizer, but I didn't include it in [1]
because I don't know how to write tests for it.
In IRGen, I added some logic to scale execution counts and create llvm
branch_weight metadata.
Some questions:
1. Is it acceptable to make some SIL objects larger in order to store
execution counts (e.g CondBranchInst)?
Yes, I don’t see a problem with that.
But for saving some space, you might want to store the counts as a plain uint64_t instead of an Optional and use a special value (e.g. ~0) for the none-case.
Right, this makes sense. Reserving that value should be fine, since it looks
like llvm expects weights to be scaled below UINT32_MAX.
If not, what's the best way to make
this information visible to SILOptimizer and IRGen?
2. Is it better to associate counts with SIL instructions, or with
SILSuccessor?
I think storing the counts in SILSuccessor makes sense, because counts are needed for all kind of terminator instructions (like switch_enum), except for the unconditional br, but IMO we can live with wasting some bytes for this instructions.
Storing counts in SILSuccessor has pros and cons, like you mentioned. It may
lead to wasted space in some cases, but would require fewer invasive API
changes to implement.
Another thing to consider is that the count information should be printed/parsed/serialized when writing and reading SIL.
Yes! I jotted these down as "TODO's" in my commit messages.
3. Does anyone have tips on modifying swift/benchmark? I'd like to add a
benchmark driver which generates profile data, and another driver which
uses that data to turn on PGO.
I still need to work out how to configure swift/benchmark.
At any rate, I'm going to start gradually cleaning up these patches and
submitting PR's. I expect swift to support frontend-based PGO in the future,
but if it doesn't, some of these patches will still be useful. I'd rather have
them in-tree than let them bitrot.