This mental model is indeed quite brilliant, and it has actually been implemented in hardware before; see the VFPU coprocessor of Sony’s Allegrex CPU:

### 4.5 COP2 (VFPU)

The PSP's VFPU (Vector Floating Point Unit) is a coprocessor that can perform quite a few useful operations. Its main purpose is vector and matrix processing, but it also supports trigonometric functions and other mathematical operations, conversions, and mathematical constants.

#### 4.5.1 Registers

The VFPU has 128 single-precision floating point (IEEE 754) registers (VFR0-VFR127), but they can be arranged and accessed in various ways, which makes the register file very flexible. Many of the instructions for the VFPU support operations on:

- a single register
- a pair of registers
- three registers
- four registers
- 2x2 matrix
- 3x3 matrix
- 4x4 matrix

And if that weren't enough, it can work with matrices in normal or transposed order. The registers are grouped into 8 blocks of 16 registers each. This gives you enough room to work with 8 4x4 matrices, 8 3x3 matrices, or 32 2x2 matrices. Or you can store up to 32 quad vectors, 40 triple vectors, 64 paired vectors, or 128 single values.

http://hitmen.c02.at/files/yapspd/psp_doc/chap4.html#sec4.5
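
To make the register layout in the quote more concrete, here is a minimal Python sketch that models the 128 registers as 8 blocks of 4x4 values and reads them back as rows, columns, or (transposed) matrices. The flat index mapping in `reg_index` and the helper names are assumptions made purely for illustration; they are not the VFPU's actual register encoding or instruction set. The capacity figures at the end only cover the counts that follow directly from "8 blocks of 16 registers".

```python
BLOCKS = 8                                # register blocks, per the quoted text
BLOCK_DIM = 4                             # each block viewed as a 4x4 grid
REGS_PER_BLOCK = BLOCK_DIM * BLOCK_DIM    # 16 registers per block
TOTAL_REGS = BLOCKS * REGS_PER_BLOCK      # 128 registers in total


def reg_index(block, row, col):
    """Hypothetical flat index (row-major within a block), NOT the real encoding."""
    return block * REGS_PER_BLOCK + row * BLOCK_DIM + col


def read_vector(regs, block, line, size=4, column=False):
    """Read `size` registers along one row (default) or one column of a block."""
    if column:
        return [regs[reg_index(block, i, line)] for i in range(size)]
    return [regs[reg_index(block, line, i)] for i in range(size)]


def read_matrix(regs, block, size=4, transposed=False):
    """Read the top-left size x size matrix of a block, optionally transposed."""
    rows = [[regs[reg_index(block, r, c)] for c in range(size)] for r in range(size)]
    if transposed:
        rows = [list(col) for col in zip(*rows)]
    return rows


# Fill the model register file with 0.0 .. 127.0 so indices are easy to follow.
regs = [float(i) for i in range(TOTAL_REGS)]

row0 = read_vector(regs, block=1, line=0)               # first row of block 1
col0 = read_vector(regs, block=1, line=0, column=True)  # first column of block 1
m = read_matrix(regs, block=1)
mt = read_matrix(regs, block=1, transposed=True)

# Capacities that follow directly from 8 blocks of 16 registers.
capacity = {
    "4x4 matrices": BLOCKS,
    "2x2 matrices": BLOCKS * (BLOCK_DIM // 2) ** 2,
    "quad vectors": BLOCKS * BLOCK_DIM,
    "paired vectors": BLOCKS * BLOCK_DIM * (BLOCK_DIM // 2),
    "single values": TOTAL_REGS,
}
print(capacity)
```

In this model, the transposed view costs nothing extra: reading a column instead of a row is just a different index pattern over the same 16 values, which is the flexibility the quoted passage describes.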