I’d go with a magic number of 3, maybe 4. Even with 10 parameters it would still have only 3-4 layers. More parameters will just spread the clunkiness horizontally rather than vertically. But well, that can definitely be done.