Use CUDA or another GPU acceleration system with properly optimised machine code, and share the processing across ALL available processors: not just the CPU or just the GPU, but both at the same time. You should be aiming for tens of millions of key tests per second on modern hardware!
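As a rough illustration of the "share the work across every processor" part, here is a minimal CPU-side sketch in C++. Everything in it is an assumption for illustration: `test_key`, `search_keys`, `NOT_FOUND` and the chunk size are hypothetical names and values, and `test_key` is only a stand-in for whatever cipher check you actually run. Workers claim chunks of the key space from a shared atomic counter, so a faster worker simply claims more chunks. A GPU worker could in principle join the same scheme as one more thread that claims much larger chunks and hands each one to a kernel, but that part is not shown here.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Sentinel returned when no key in the range matches.
constexpr std::uint64_t NOT_FOUND = UINT64_MAX;

// Hypothetical stand-in for the real key test (e.g. a trial decryption).
static bool test_key(std::uint64_t key, std::uint64_t secret) {
    return key == secret;
}

// Search [begin, end) using `workers` threads that pull fixed-size chunks
// from a shared counter. Assumes `end` is far enough below 2^64 that the
// counter cannot wrap around (end + workers * chunk must fit in 64 bits).
std::uint64_t search_keys(std::uint64_t begin, std::uint64_t end,
                          std::uint64_t secret,
                          unsigned workers = std::thread::hardware_concurrency(),
                          std::uint64_t chunk = 1u << 16) {
    assert(chunk > 0 && begin <= end);
    if (workers == 0) workers = 1;  // hardware_concurrency() may report 0

    std::atomic<std::uint64_t> next{begin};      // start of next unclaimed chunk
    std::atomic<std::uint64_t> found{NOT_FOUND}; // result shared by all workers

    auto worker = [&] {
        // Stop as soon as any worker has reported a hit.
        while (found.load(std::memory_order_relaxed) == NOT_FOUND) {
            const std::uint64_t lo = next.fetch_add(chunk, std::memory_order_relaxed);
            if (lo >= end) break;  // key space exhausted
            const std::uint64_t hi = (end - lo < chunk) ? end : lo + chunk;
            for (std::uint64_t k = lo; k < hi; ++k) {
                if (test_key(k, secret)) {
                    found.store(k, std::memory_order_relaxed);
                    return;
                }
            }
        }
    };

    std::vector<std::thread> pool;
    pool.reserve(workers);
    for (unsigned i = 0; i < workers; ++i) pool.emplace_back(worker);
    for (auto& t : pool) t.join();  // join also makes `found` visible here
    return found.load();
}
```

Calling `search_keys(0, 1000000, 123456)` would search the first million keys on all hardware threads. Pulling chunks from a counter, rather than splitting the range into equal fixed slices, is what lets a mix of slow and fast processors stay busy until the end.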