Google researchers have published a new quantization technique called TurboQuant that compresses the key-value (KV) cache in ...
Researchers at Tsinghua University and Z.ai built IndexCache to eliminate redundant computation in sparse attention models ...