Computer scientists have discovered a new way to multiply large matrices faster by eliminating a previously unknown inefficiency, leading to the largest improvement in matrix multiplication efficiency ...
If you start naively without any library that avoids the problem then memory access is the problem. Have a look at how much effort is needed to avoid the problem, for example with blocking algorithms.