Quantized models
Quantized models reduce the numerical precision of weights and activations (e.g., from **32-bit floating point to 8-bit integer**) to shrink memory footprint and speed up inference on edge devices. The smaller data types also cut power consumption, making quantized models well suited to mobile and embedded AI applications.
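The core float-to-integer mapping can be sketched in a few lines. Below is a minimal example of affine (asymmetric) 8-bit quantization using only the Python standard library; the function names (`quantize`, `dequantize`) and the 8-bit range are illustrative assumptions, and production frameworks add calibration, per-channel scales, and fused integer kernels on top of this idea.

```python
def quantize(values, num_bits=8):
    """Map floats to integers in [qmin, qmax] via a scale and zero point."""
    qmin, qmax = 0, 2 ** num_bits - 1           # e.g. 0..255 for 8 bits
    lo, hi = min(values), max(values)
    hi = max(hi, lo + 1e-8)                     # avoid a zero-width range
    scale = (hi - lo) / (qmax - qmin)           # float step per integer level
    zero_point = round(qmin - lo / scale)       # integer that represents 0.0
    q = [min(qmax, max(qmin, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Approximately recover the original floats."""
    return [(qi - zero_point) * scale for qi in q]

weights = [-1.0, -0.5, 0.0, 0.5, 1.0]
q, scale, zp = quantize(weights)
recovered = dequantize(q, scale, zp)
# Each recovered value is within one quantization step of the original,
# which is the rounding error quantization trades for an 8-bit footprint.
assert all(abs(a - b) <= scale for a, b in zip(weights, recovered))
```

Storing `q` as 8-bit integers uses a quarter of the memory of 32-bit floats, at the cost of the small rounding error checked in the final assertion.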