It looks like the performance difference might come down to us picking a slower string interpolation path in the generic case. If I change all of the test* functions to this:
ret = "\(s as Any), net: \(net as Any), tax: \(tax as Any), gross: \(gross as Any)"
which forces it to always pick the most general dynamic string interpolation implementation, then I get pretty much identical timings for the generic and non-generic implementations:
Double time: -1.4328429698944092
Decimal time: -4.693014979362488
DecFP64 time: -1.6121209859848022
Dec64 time: -1.5860040187835693
TDouble time: -1.400465965270996
TDec64FPtime: -1.6180580854415894
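To make the change concrete, here is a minimal sketch of the two interpolation forms side by side. The values are made up for illustration; only the variable names come from the interpolation above. Both forms should produce the same text, and the difference is only in which interpolation entry point the compiler selects.

```swift
// Hypothetical inputs; only the variable names come from the post.
let s = "item"
let net = 100.0
let tax = 19.0
let gross = net + tax

// Statically typed: the compiler knows each segment is a String or a Double.
let typed = "\(s), net: \(net), tax: \(tax), gross: \(gross)"

// Type-erased: each value is wrapped in Any first, so the interpolation
// has to discover how to print it at run time.
let erased = "\(s as Any), net: \(net as Any), tax: \(tax as Any), gross: \(gross as Any)"

// The rendered text is the same either way; only the code path differs.
print(typed == erased)
```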
cc @beccadax and @Michael_Ilseman. I would guess that we have at least a specific overload in StringInterpolation for interpolating Double, and that in the generic case, we fall into the most generic entry point, since inside templTest the concrete type isn't known, so overload resolution can't see the more specific entry points. Maybe a tailored optimization in the specializer to re-specialize string interpolation calls would help.
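As a rough illustration of that guess, the sketch below uses a hypothetical `Sink` type in place of the real interpolation machinery. `Sink`, `viaGeneric`, and `viaConcrete` are invented names, and this is not how the standard library is actually laid out. It only shows that overload choice is made statically, so a generic caller cannot reach a Double-specific overload even when it is later called with a Double.

```swift
// Hypothetical stand-in for an interpolation type with two entry points.
struct Sink {
    var path = ""

    // General entry point: accepts a value of any type.
    mutating func append<T>(_ value: T) { path = "generic" }

    // Specific entry point: only selectable when the static type is Double.
    mutating func append(_ value: Double) { path = "Double-specific" }
}

// Inside a generic function T is opaque, so only the general overload applies.
func viaGeneric<T: FloatingPoint>(_ value: T) -> String {
    var sink = Sink()
    sink.append(value)
    return sink.path
}

// With a concrete Double, the non-generic overload is preferred.
func viaConcrete(_ value: Double) -> String {
    var sink = Sink()
    sink.append(value)
    return sink.path
}

print(viaConcrete(1.5)) // Double-specific
print(viaGeneric(1.5))  // generic
```

The overload is picked when `viaGeneric` is type-checked, not when it is specialized, which is why re-specializing the call afterwards would need dedicated support in the optimizer.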
Another experiment I tried was making it so that templTest required T: CustomStringConvertible in addition to FloatingPoint, enabling string interpolation to find the conformance statically instead of by dynamic lookup. This also brings Dec64 and TDec64 in line (though Double still apparently benefits from the Double-specific printing overload only in the static case):
Double time: -1.0079069137573242
Decimal time: -3.8584940433502197
DecFP64 time: -0.9931479692459106
Dec64 time: -0.982342004776001
TDouble time: -1.5873949527740479
TDec64FPtime: -0.9742140769958496
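The effect of the extra constraint can be sketched the same way. `Recorder`, `looseTest`, and `tightTest` are hypothetical names, not the code from the benchmark; the point is only that once the generic signature promises CustomStringConvertible, a more constrained overload becomes visible at compile time.

```swift
// Hypothetical stand-in with an unconstrained and a constrained entry point.
struct Recorder {
    var path = ""

    // Fallback: nothing is known about T, so any conformance would have to
    // be looked up dynamically.
    mutating func append<T>(_ value: T) { path = "unconstrained" }

    // Chosen when the caller can prove the conformance statically.
    mutating func append<T: CustomStringConvertible>(_ value: T) {
        path = "CustomStringConvertible"
    }
}

// FloatingPoint alone does not imply CustomStringConvertible.
func looseTest<T: FloatingPoint>(_ value: T) -> String {
    var recorder = Recorder()
    recorder.append(value)
    return recorder.path
}

// Adding the constraint exposes the conformance inside the generic body.
func tightTest<T: FloatingPoint & CustomStringConvertible>(_ value: T) -> String {
    var recorder = Recorder()
    recorder.append(value)
    return recorder.path
}

print(looseTest(2.5)) // unconstrained
print(tightTest(2.5)) // CustomStringConvertible
```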
And finally, if I remove the interpolation entirely and change all of the functions to return structs instead, then the generic and non-generic cases also fall into line:
Double time: -0.000970005989074707
Decimal time: -1.1530550718307495
DecFP64 time: -0.08175003528594971
Dec64 time: -0.09572494029998779
TDouble time: -0.0009540319442749023
TDec64FPtime: -0.08098399639129639
suggesting that, at least, the numeric part of the code is not hitting any optimization barriers.
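For reference, a struct-returning variant could look something like the sketch below. The `Amounts` type, the function names, and the arithmetic are assumptions made for illustration, since the original test bodies aren't shown here; the idea is just that the generic and concrete versions do the same numeric work with no string formatting involved.

```swift
// Hypothetical result type replacing the interpolated string.
struct Amounts<T: FloatingPoint> {
    var net: T
    var tax: T
    var gross: T
}

// Non-generic version, fixed to Double.
func concreteTest(_ net: Double, rate: Double) -> Amounts<Double> {
    let tax = net * rate
    return Amounts(net: net, tax: tax, gross: net + tax)
}

// Generic version with the same arithmetic over any FloatingPoint type.
func genericTest<T: FloatingPoint>(_ net: T, rate: T) -> Amounts<T> {
    let tax = net * rate
    return Amounts(net: net, tax: tax, gross: net + tax)
}

let a = concreteTest(100, rate: 0.25)
let b = genericTest(100.0, rate: 0.25)
print(a.gross == b.gross)
```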