The SoC of the solo4k has the same limitation. Not because it's budget, but because it was the first 4K SoC from Broadcom.
Playing audio and video is not something the CPU has anything to do with. I am not familiar with the details (because they're heavily NDA'd), but the overall concept is that the A/V section of the SoC has a certain maximum bandwidth and a certain number of channels (which can be used to transport just about anything and can be bonded as well, if I understood correctly). A video decoder needs a certain amount of bandwidth to operate. Apparently it also needs more bandwidth when the window is bigger and/or positioned further towards the right/bottom of the screen. So the PiP window cannot be enlarged or positioned at will; certain combinations would require too much bandwidth on the SoC.
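To make the bandwidth idea a bit more concrete, here is a toy model in Python. It is only a sketch: the real rules are under NDA and I don't know them, so the budget, the costs and the formula in `pip_cost` are all made-up assumptions. The only thing it tries to show is the general behaviour described above, namely that a PiP window can be rejected either because it is too big or because it sits too far towards the bottom right.

```python
# Toy model only: the real SoC bandwidth rules are under NDA and unknown here.
# Every number and the cost formula below are made-up assumptions.
from dataclasses import dataclass

TOTAL_BUDGET = 100.0      # hypothetical A/V bandwidth budget (arbitrary units)
MAIN_DECODER_COST = 60.0  # hypothetical cost of the full-screen main decoder


@dataclass
class PipWindow:
    x: float       # left edge as a fraction of screen width (0..1)
    y: float       # top edge as a fraction of screen height (0..1)
    width: float   # window width as a fraction of screen width
    height: float  # window height as a fraction of screen height


def pip_cost(win: PipWindow) -> float:
    """Assumed cost: grows with the window area and with how far the
    bottom-right corner of the window reaches towards the bottom right."""
    area = win.width * win.height
    reach = (win.x + win.width) * (win.y + win.height)
    return 20.0 * area + 40.0 * reach


def pip_fits(win: PipWindow) -> bool:
    """True if main decoder plus PiP window stay within the assumed budget."""
    return MAIN_DECODER_COST + pip_cost(win) <= TOTAL_BUDGET
```

With these invented numbers, a quarter-size window in the top-left corner (`PipWindow(0.0, 0.0, 0.25, 0.25)`) fits, while the same window in the bottom-right corner (`PipWindow(0.75, 0.75, 0.25, 0.25)`) or a full-screen window does not.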
Back to your first comment: the SoC has just one audio decoder and it's linked to the first video decoder. You cannot have audio with the second video decoder, simple as that.
You may be confused by the quadpip function, which seems to have four audio decoders. It only seems that way: all audio device nodes connect to just one audio decoder in the end, so you can listen to only one service at a time. The difference is that this audio decoder can be connected to any of the four video decoders, which non-quadpip decoders can't do (nor can the quadpip decoders when not in quadpip mode).
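The routing can be pictured with a small Python sketch. This is purely an illustration of the concept, not driver code: the class `ToySoc` and its method names are hypothetical. There is exactly one audio decoder, it is attached to exactly one video decoder at a time, and only quadpip mode allows moving it away from the first one.

```python
# Conceptual sketch only; ToySoc and its methods are hypothetical names,
# not a real driver or Enigma2 API.
NUM_VIDEO_DECODERS = 4


class ToySoc:
    def __init__(self, quadpip: bool) -> None:
        self.quadpip = quadpip
        # The single audio decoder starts out attached to video decoder 0.
        self.audio_source = 0

    def select_audio(self, video_decoder: int) -> bool:
        """Try to attach the one audio decoder to the given video decoder.
        Returns False if the current mode does not allow it."""
        if not 0 <= video_decoder < NUM_VIDEO_DECODERS:
            raise ValueError("no such video decoder")
        if not self.quadpip and video_decoder != 0:
            # Outside quadpip mode, audio stays tied to the first decoder.
            return False
        # Re-routing replaces the previous attachment; there is never
        # more than one service audible at a time.
        self.audio_source = video_decoder
        return True
```

So in quadpip mode `select_audio(2)` followed by `select_audio(3)` leaves audio on decoder 3 only, and outside quadpip mode `select_audio(1)` is simply refused.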
The SoC in the ET9x00 was indeed the only SoC that could decode two audio streams in parallel, as I already mentioned.
So, conclusion. If you want to have audio of the second decoder, you can't. If you try to work around it by enlarging the PiP window to full screen, only a handful of receivers can do that. But it won't buy you much: the second video decoder (PiP) is free-running and not synchronised to any audio, so audio and video will always be several seconds apart.