The concepts involving shared or stacked memory are also being made available in OpenGL, and as a member of the Khronos developer program I've had beta development packages for a few months. Even when it's released, don't expect to see it in new titles for a long while. It isn't just the games that require modification (well, yes, them too); it's the game engines. Most game production companies don't make their own engines, and those that do share engine code among several releases (and reuse engines, with minor modifications, for years). A new rendering component for an existing engine can take anywhere from six months to two years to end up in a shipping title.
The benefit isn't quite what you'd expect in multi-card setups. The technology is really aimed first at designs like the AMD APUs in consoles. The problem is that RAM can't be shared over a bus in existing card designs, not as addressable memory. I can't be sure about the most recent nVidia hardware, but the most recent AMD cards are barely equipped to support it (firmware may help). Instead, data to be shared still has to be copied from one card to another, shipped over the bus. It's a serial concept, still steeped in the client/server architecture of general GPU programming. One can request assets (materials, models, etc.) for use on another GPU, but they still have to be shipped as blocks of data over the bus, which isn't much different in timing from the CPU providing them outright. It's more practical to divide assets into groups which can be distributed to one card or another for processing, such that the vertex and fragment processing phases operate on partial scene content, posting to a common output buffer (which still requires bus traffic in most cases). This is a "divide and conquer" approach where the two (or more) cards render a scene's contents divided among them, posting the final results to the hidden display image. It's not really stacking. It doesn't make two cards with 4 GB each look like one 8 GB card; it's still two 4 GB cards, only now the method of using each card is 'inverted'.
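To see why shipping blocks over the bus is no better than the CPU providing them, here's a toy cost model. The bandwidth figure and the two-hop assumption (device A to host, host to device B) are illustrative only, not measurements of any real card:

```python
BUS_GBPS = 16.0  # assumed shared-bus bandwidth in GB/s; purely illustrative

def transfer_time(size_gb, hops=1):
    """Time to move size_gb of asset data, where each hop is one trip
    over the bus."""
    return size_gb * hops / BUS_GBPS

# Card-to-card copy without shared addressing: two hops (A -> host -> B).
peer_copy = transfer_time(1.0, hops=2)

# The CPU uploading the same asset directly to card B: one hop.
cpu_upload = transfer_time(1.0, hops=1)
```

Under this model the "shared" copy is actually twice the direct upload, which is the point: without real shared addressing, nothing is saved on the bus.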
When multiple cards render a scene under the current paradigm, the output is banded or tiled: each card processes the entire scene, which is duplicated in each card's memory. Pixel by pixel, the GPU power is divided over the display area.
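The banding scheme above can be sketched in a few lines. This is only a schematic of how a frame might be split into one horizontal band per GPU, not any real driver's algorithm:

```python
def band_regions(width, height, num_gpus):
    """Divide the display into one horizontal band (x, y, w, h) per GPU.
    Every GPU still holds the whole scene; only output pixels differ."""
    band = height // num_gpus
    regions = []
    for i in range(num_gpus):
        top = i * band
        # the last band absorbs any leftover rows
        bottom = height if i == num_gpus - 1 else top + band
        regions.append((0, top, width, bottom - top))
    return regions

print(band_regions(1920, 1080, 2))
# two bands of 540 rows each: [(0, 0, 1920, 540), (0, 540, 1920, 540)]
```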
In the new paradigm, each card can hold different scene content. There can still be a common output buffer, and they can still tile or band, but now the scene content and materials data need not be fully duplicated. This does not always reduce duplication, however. Materials are often reused, sometimes as layers or components of shader code which renders them quite differently. A quick example: you can have a common brick texture, but provide different noise and 'dirt' components to make different brick walls look quite distinct. Models representing instances are also reused, and under the new paradigm, assets like these will still require duplication on multiple cards.
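The brick-texture point can be made concrete with a small bookkeeping sketch. The asset names and sizes are made up; the point is only that an asset used by both cards' share of the scene must be resident on both:

```python
def per_gpu_footprint(groups, asset_sizes):
    """groups: gpu index -> set of asset names used by that GPU's share
    of the scene. A shared asset (e.g. a common brick texture) counts
    once per GPU that uses it."""
    return {gpu: sum(asset_sizes[name] for name in assets)
            for gpu, assets in groups.items()}

sizes = {"brick": 64, "dirt_a": 16, "dirt_b": 16, "statue": 128}  # MB
groups = {0: {"brick", "dirt_a", "statue"}, 1: {"brick", "dirt_b"}}

footprints = per_gpu_footprint(groups, sizes)
# "brick" is resident on both GPUs, so the per-card totals sum to more
# than the 224 MB of unique asset data
```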
Further, this causes a slight loss in overall storage compared to a flat memory model. If you have 6 GB of assets, you have to figure out how to divide them between two cards. You might assume 3 GB on each, and roughly that may be the case, but inevitably some duplication will make that work out more like 3.5 GB per card, and there are also hard boundaries (you can't split textures or model content), so the final "balance" will be more like 3.7 GB on one card and 3.4 GB on the other. In other words, it won't be as comfortable as thinking two 4 GB cards equate to a single 8 GB memory space. It's still two 4 GB memory spaces. If you had 8 GB of assets to squeeze in, they probably wouldn't divide cleanly into two 4 GB databases. There will be "slack," wasted potential; it will work out to 7 GB and change.
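The "hard boundaries" effect is easy to demonstrate. Below is a greedy first-fit split of indivisible assets across two cards; the asset sizes are invented, and real engines would use far more sophisticated placement, but the uneven totals it produces illustrate the slack:

```python
def split_assets(sizes_mb, capacity_mb=4096):
    """Greedily place indivisible assets (largest first) on whichever of
    two cards is currently emptier; returns per-card totals in MB."""
    cards = [0, 0]
    for size in sorted(sizes_mb, reverse=True):
        target = min(range(2), key=lambda i: cards[i])
        if cards[target] + size > capacity_mb:
            raise ValueError("assets do not fit on two cards")
        cards[target] += size
    return cards

# ~6.2 GB of chunky, unsplittable assets
assets = [900, 800, 700, 650, 600, 550, 500, 450, 400, 350, 300]
print(split_assets(assets))
# [3250, 2950] -- not a clean 3100/3100 split, because assets
# can't be cut at the boundary
```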
This would allow greater complexity in models and materials, but a gain in performance is doubtful. There could actually be a slight loss of performance if the tendency is to increase model or material complexity.