AFAIK, from what I learned in the microcontroller class I took, all ins and outs (memory accesses, instruction fetches, data manipulation) go through the cache. I.e., the cores themselves shouldn't talk directly to each other. A core can access the shared cache, but not another core's exclusive (private) cache, and manipulate the data there.
For example, core 0 adds two numbers and the result is stored in its L2 cache. If core 2 needs that data, the data must be in the shared cache or in RAM so that core 2 can access it. This can be accomplished in many different ways (one very inefficient example: core 0 saves the data to RAM, core 2 loads it from RAM, etc.).
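From the software side, the core 0 / core 2 example looks roughly like the sketch below. It is only a model under stated assumptions: Python threads stand in for cores (the OS, not the program, decides which core runs each thread, and CPython's GIL serializes bytecode), and the names `shared`, `ready`, `producer` and `consumer` are hypothetical. The program only writes to and reads from shared memory after a synchronization signal. Whether the value actually travels through a private L2, a shared L3 or RAM is decided by the hardware, not by the code.

```python
import threading

# Hypothetical shared memory location visible to both threads.
# The program never chooses which cache level holds it; hardware does.
shared = {"result": None}
ready = threading.Event()  # signals that the result has been published

def producer(a: int, b: int) -> None:
    """Plays the role of 'core 0': compute the sum and publish it."""
    shared["result"] = a + b
    ready.set()  # let the consumer know the result is available

def consumer(out: list) -> None:
    """Plays the role of 'core 2': wait for the signal, then read the result."""
    ready.wait()
    out.append(shared["result"])

received = []
t2 = threading.Thread(target=consumer, args=(received,))
t0 = threading.Thread(target=producer, args=(2, 3))
t2.start()
t0.start()
t0.join()
t2.join()
print(received[0])  # 5
```

The event matters: without some synchronization, "core 2" could read `shared["result"]` before "core 0" has written it.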