As previously mentioned, performance is a major concern once the cache is switched into no-fill mode ("frozen"). The broad range of CPU architectures (multi-CPU, multi-core, multi-threaded) and their respective cache configurations makes the matter even more complex.
First off: only a single CPU's cache needs to be "frozen" in order to effectively protect the encryption key; other CPUs are allowed to operate in normal cache mode. This holds as long as each (logical/virtual) CPU uses its own cache exclusively. CPUs that employ threading technology (like Intel's HyperThreading) appear as two (or potentially more) logical CPUs, but these logical CPUs share the same cache, so all of them must change into no-fill cache mode. The situation may be different for multi-core CPUs, if the cores all have their own (L1 and L2) caches.
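To make the "who shares a cache with whom" question concrete, here is a minimal user-space sketch that reads the cache topology Linux exposes under sysfs (`/sys/devices/system/cpu/cpuN/cache/indexM/`). The function names and the `max_level` parameter are my own illustration, not part of any implementation discussed here, and an in-kernel solution would query the topology differently.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Parse a sysfs CPU list such as "0,4" or "0-3,8-11" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def cpus_sharing_cache(cpu, sysfs_root="/sys/devices/system/cpu", max_level=None):
    """Return all logical CPUs that share a cache with `cpu`.

    max_level (hypothetical knob): ignore caches above this level,
    e.g. max_level=2 only looks at L1 and L2.
    """
    sharing = {cpu}
    cache_dir = Path(sysfs_root) / f"cpu{cpu}" / "cache"
    for index_dir in sorted(cache_dir.glob("index*")):
        level = int((index_dir / "level").read_text())
        if max_level is not None and level > max_level:
            continue
        sharing |= parse_cpu_list((index_dir / "shared_cpu_list").read_text())
    return sharing
```

On a HyperThreading system, `cpus_sharing_cache(0, max_level=2)` would typically return CPU 0 plus its sibling thread, which is the set of logical CPUs that would have to enter no-fill mode together.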
The encryption key resides only in a single CPU's cache, so only this CPU may execute the encryption and decryption routines. The most prevalent architecture among Full-Disk-Encryption solutions is a kernel module that spawns a dedicated kernel thread for the encryption and decryption logic. Kernel threads are schedulable entities and can therefore be bound to the CPU that holds the encryption key in its cache.
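The binding idea can be illustrated from user space. The sketch below is only an analogy: a real implementation would bind a kernel thread inside the module (the kernel offers `kthread_bind()` for that kind of job), whereas this uses Python's `os.sched_setaffinity` on Linux. The helper name `run_pinned` and the shape of its result are assumptions made for the example.

```python
import os
import threading


def run_pinned(cpu, work):
    """Run work() in a new thread that is restricted to `cpu` (Linux only)."""
    result = {}

    def target():
        # pid 0 means "the calling thread", so only this worker is moved;
        # the rest of the process keeps its original affinity.
        os.sched_setaffinity(0, {cpu})
        result["affinity"] = os.sched_getaffinity(0)
        result["value"] = work()

    worker = threading.Thread(target=target)
    worker.start()
    worker.join()
    return result
```

Calling `run_pinned(cpu, do_crypto)` with a CPU from `os.sched_getaffinity(0)` and some hypothetical `do_crypto` callable runs that callable only on the chosen CPU, which mirrors what the dedicated kernel thread has to guarantee for the key-holding CPU.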
Back to more "traditional" performance aspects. What can be done to minimize the impact of freezing the CPU cache? Loading the most frequently used memory areas into the cache (before freezing it) is a great start. Among the most promising candidates are the system call entry point, the timer interrupt routine and its "helper" functions, and the encryption/decryption functions executed by the kernel thread. Current L2 caches are usually large enough to hold all this code, but one also needs to consider the cache's associativity in order not to shoot oneself in the foot: code that fits by total size can still evict itself if too many of its cache lines map to the same set. Another good idea is to schedule all other processes onto any of the other available CPUs (which don't use the frozen cache), which allows them to be executed at "normal" speed. There's another reason why this is important, but we'll get to that some other time.
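The associativity caveat is easy to check with a bit of arithmetic. The following sketch counts how many cache lines of a set of memory regions land in each cache set and reports the sets that receive more lines than there are ways. It assumes plain modulo indexing on the address, non-overlapping regions, and addresses of the kind the cache is actually indexed with (physical ones for a typical L2); real CPUs may index differently, so treat it as a back-of-the-envelope model. The function name is mine.

```python
from collections import Counter


def overloaded_sets(regions, cache_size, line_size, ways):
    """Return {set_index: line_count} for sets holding more lines than `ways`.

    regions: iterable of (start_address, length_in_bytes) to preload.
    Assumes simple modulo set indexing and non-overlapping regions.
    """
    num_sets = cache_size // (line_size * ways)
    lines_per_set = Counter()
    for start, length in regions:
        first_line = start // line_size
        last_line = (start + length - 1) // line_size
        for line in range(first_line, last_line + 1):
            lines_per_set[line % num_sets] += 1
    return {s: n for s, n in lines_per_set.items() if n > ways}
```

Take a toy 4 KiB, 2-way cache with 64-byte lines (32 sets). Three 64-byte regions at addresses 0, 2048 and 4096 total only 192 bytes, yet all three map to set 0, so `overloaded_sets` reports `{0: 3}`: one of them cannot stay resident. A contiguous 4 KiB region starting at address 0, on the other hand, fills every set exactly twice and yields an empty result.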
It should be obvious by now that an implementation will have to identify the specific CPU/cache components at runtime and "manage" them accordingly. My proof-of-concept implementation for Linux will only support single-CPU systems (for simplicity), but I'll explain the technical details in a future post (like so many other things).