Because Substring is required to be able to produce indexes valid in the original String, and because it’s not a type that’s as important to optimize because it’s not intended to be stored long-term. (I don’t remember if four words is under or over our pass-indirectly threshold, though.)
It's over, the pass-indirectly threshold is 3 words.
It’s target-specific, but currently it’s 4 on every target except i386, which is register-poor and uses 3. That number is the number of “things that can fit in a register”, though, not necessarily words, so a 4x4 matrix that stores four
SIMD4<Float> types would still be direct.
You may be thinking of the boxing threshold for protocol types, which is three words.
Substring.base exists and returns a String. At a minimum, at least one of those 4 pointers will have to point back to the parent String to service that API, even if the actual contents are duplicated in to in-line storage, and even if there was some magic way to produce compatible indexes.
Lacking that, and once you start taking in to account the need to produce compatible indexes (as @jrose mentioned), you basically eat all of that space and arrive more-or-less where we are.
These things happen all the time, and they're really annoying. I remember hearing (many years ago) that Microsoft spent a lot of time trying to replace the Windows Registry (the system's key-value tree thing). The problem was that once you considered all the use-cases they had to support, you ended up with something that looked exactly like the Windows Registry. Frustrating.
This could still be done via some cleverness, like working on the assumption most indices into the string are usually compactable down to a smaller width, and so when possible using less storage for them, leaving more room for content, and widening them back into the full index type when you serve them up.
But this leads to questioning the original assumption, which is that storing more ASCII characters inline in a substring would be useful. A lot of the win with small strings is avoiding allocating memory, but in most cases a substring doesn't need to – since substrings (almost always) came from an existing string, and so they can access that string's storage instead of allocating their own. Small strings also avoid reference counting but the use cases for substrings is hopefully mainly when the reference counting can be avoided by a combination of the optimizer and calling conventions.
So what you end up asking is, would it be more efficient to avoid occasional reference counting and some indirect memory access, vs the cost of packing and unpacking into the larger storage. IIRC during the
String redesign, much of the speedups came from simplifying the internal representations to reduce branching, which these "smarter' storage patterns tend to cause more of.