# INFERENCE

- [Why is inference important?](/inference/why-is-inference-important.md): Speed and cost count
- [Grouped Query Attention](/inference/why-is-inference-important/grouped-query-attention.md)
- [Key Value Cache](/inference/why-is-inference-important/key-value-cache.md): Managing model memory usage
- [Flash Attention](/inference/why-is-inference-important/flash-attention.md)
- [Flash Attention 2](/inference/why-is-inference-important/flash-attention-2.md): Tri Dao's July 2023 follow-up paper
- [StreamingLLM](/inference/why-is-inference-important/streamingllm.md)
- [Paged Attention and vLLM](/inference/why-is-inference-important/paged-attention-and-vllm.md)
- [TensorRT-LLM](/inference/why-is-inference-important/tensorrt-llm.md)
- [TorchScript](/inference/why-is-inference-important/torchscript.md)
- [NVIDIA L40S GPU](/inference/why-is-inference-important/nvidia-l40s-gpu.md): Low-cost inference
- [Triton Inference Server - Introduction](/inference/why-is-inference-important/triton-inference-server-introduction.md)
- [Triton Inference Server](/inference/why-is-inference-important/triton-inference-server.md)
- [FiDO: Fusion-in-Decoder optimised for stronger performance and faster inference](/inference/why-is-inference-important/fido-fusion-in-decoder-optimised-for-stronger-performance-and-faster-inference.md): Google Research, December 2022
- [Is PUE a useful measure of data centre performance?](/inference/why-is-inference-important/is-pue-a-useful-measure-of-data-centre-performance.md)
- [S-LoRA](/inference/why-is-inference-important/slora.md)
