Regarding FIVR and die TDP:
In the case of Broadwell to Skylake, the FIVR did contribute to Skylake's increased CPU power, but not by much; from what I can gather, its effect was small. Intel implements features like this because, at the time, they believe the impact on their CPUs will be larger. After some time and testing, they figure out it doesn't do much.
Skylake TDP:
Even though we went down in feature size (22nm to 14nm), Skylake saw an increase in power consumption. It's true that smaller FETs allow for lower power, but they also allow higher current to pass through. With every CPU generation, current consumption either rises or stays close to the last generation's. It's pretty easy to understand why by looking at one aspect of the CPU: transistor count. Each area of the CPU grows in transistor count with a smaller feature size. Now that those areas are more compact, a higher density of current needs to be delivered, and to do that, voltage needs to rise. Remember, voltage maintains current, and when it drops too low, current can bounce the voltage around, causing instability.
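To make that concrete, here's a tiny back-of-the-envelope sketch in Python (all numbers are invented for illustration, not Intel's actual figures): more transistors in the same area means more total current to deliver, and bumping the voltage to keep that delivery stable raises power, since P = I * V.

```python
# Back-of-the-envelope sketch. All values are made up for illustration.

def package_power(current_a, voltage_v):
    """Power drawn by the die: P = I * V."""
    return current_a * voltage_v

# Last generation (hypothetical): 65 A of current demand at 1.00 V
old_power = package_power(65.0, 1.00)   # 65.0 W

# New node packs in more transistors, so total current demand grows,
# and voltage is nudged up slightly to keep the rail stable under load.
new_power = package_power(70.0, 1.05)   # 73.5 W

print(f"old: {old_power:.1f} W, new: {new_power:.1f} W")
```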
So how come we saw a drop in voltage with Haswell/Broadwell? If you recall, Intel moved to FinFET at that time. That process allowed a lower voltage to drive a higher current, because the fin gave the channel more surface area. Now we're back to the situation we had with typical planar CMOS FETs: voltage needs to rise just a bit to keep current happy. Higher voltage doesn't always mean TDP rises, though; current consumption can drop in some places when voltage rises.
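As a rough illustration of the FinFET point (again with invented resistance values): with V = I * R, the fin's extra channel surface area lowers the effective resistance, so the same current can be driven at a lower voltage.

```python
# Ohm's law illustration of why FinFET allowed a voltage drop.
# The resistance values below are invented, purely for illustration.

def voltage_for_current(current_a, resistance_ohm):
    """V = I * R: voltage needed to drive a given current."""
    return current_a * resistance_ohm

target_current = 80.0   # amps; same workload on both processes (assumed)
planar_r = 0.0150       # ohms; planar FET effective resistance (made up)
finfet_r = 0.0125       # ohms; fin's extra surface area lowers R (made up)

print(f"planar: {voltage_for_current(target_current, planar_r):.2f} V")  # 1.20 V
print(f"finfet: {voltage_for_current(target_current, finfet_r):.2f} V")  # 1.00 V
```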
To put this in physics terms:
V = IR, where R is the effective resistance (set in part by the FET's surface area), and P = IV. Decreasing feature size decreases R per transistor, but the total current demand I grows with transistor density. Holding that larger current stable takes a slightly higher V, and since P = IV, TDP rises a little instead of staying flat, which is exactly what Skylake showed.
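Plugging hypothetical numbers into those two formulas shows how a small voltage bump turns into a small TDP bump (again, the figures are made up):

```python
# Tie the two formulas together with invented numbers:
# V = I * R sets the voltage needed for stable delivery,
# and P = I * V is what shows up as TDP.

current = 70.0                  # amps of total current demand (assumed)
resistance = 0.015              # ohms of effective resistance (assumed)

voltage = current * resistance  # V = I * R -> 1.05 V
power = current * voltage       # P = I * V -> 73.5 W

print(f"V = {voltage:.2f} V, P = {power:.1f} W")
# A ~5% rise in voltage at the same current is a ~5% rise in power.
```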