Researchers from the University of Maryland, Lawrence Livermore, Columbia and TogetherAI have developed a training technique that triples LLM inference speed without auxiliary models or infrastructure ...
With reported 3x speed gains and limited degradation in output quality, the method targets one of the biggest pain points in production AI systems: latency at scale.
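The snippet does not describe how the technique works, only that it avoids auxiliary draft models. As a rough illustration of that general family of ideas (not the researchers' actual method), the sketch below shows draft-free speculative decoding: the same model drafts tokens with a cheap pass and verifies them with a full pass. All names and the toy stand-in "models" are hypothetical.

```python
# Hedged sketch: accelerating greedy decoding WITHOUT an auxiliary draft model,
# by drafting with a cheap pass of the same model and verifying with the full
# pass. Purely illustrative; not the method from the article above.
import random

def full_model(ctx):
    # stand-in for an expensive full forward pass (greedy next token)
    return (sum(ctx) * 31 + 7) % 100

def draft_model(ctx):
    # stand-in for a cheap pass of the SAME model (e.g. fewer layers);
    # agrees with the full pass most of the time
    return full_model(ctx) if random.random() < 0.8 else random.randrange(100)

def generate(prompt, n_tokens, gamma=4):
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        draft, ctx = [], list(out)
        for _ in range(gamma):               # cheap drafting phase
            t = draft_model(ctx)
            draft.append(t)
            ctx.append(t)
        ctx = list(out)
        for t in draft:                      # verification phase (full model)
            if full_model(ctx) == t:
                out.append(t)                # accept the drafted token for free
                ctx.append(t)
            else:
                out.append(full_model(ctx))  # correct the mismatch and stop
                break
        else:
            out.append(full_model(ctx))      # bonus token when all drafts accepted
    return out[:len(prompt) + n_tokens]

print(generate([1, 2, 3], n_tokens=10))
```

When most drafted tokens are accepted, several output tokens are produced per expensive verification pass, which is where the speedup comes from.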
Nvidia researchers developed dynamic memory sparsification (DMS), a technique that compresses the KV cache in large language models by up to 8x while maintaining reasoning accuracy — and it can be ...
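The article quotes the compression ratio but this snippet does not explain DMS itself. As a minimal sketch of the underlying idea of KV-cache compression (a generic eviction policy, not Nvidia's algorithm), the example below keeps only the most-attended cached positions; the function and the `keep_ratio` parameter are assumptions for illustration, with 0.125 mirroring the quoted "up to 8x" figure.

```python
# Illustrative sketch of KV-cache compression via eviction, NOT Nvidia's DMS:
# rank cached key/value pairs by accumulated attention mass and keep the top-k.
import numpy as np

def compress_kv_cache(keys, values, attn_history, keep_ratio=0.125):
    """keys/values: (seq_len, d) cache tensors; attn_history: (seq_len,)
    total attention each cached position has received so far."""
    k = max(1, int(len(keys) * keep_ratio))
    keep = np.argsort(attn_history)[-k:]   # indices of most-attended positions
    keep.sort()                            # preserve original sequence order
    return keys[keep], values[keep]

rng = np.random.default_rng(0)
K, V = rng.normal(size=(64, 16)), rng.normal(size=(64, 16))
scores = rng.random(64)
K_small, V_small = compress_kv_cache(K, V, scores)
print(K.shape, "->", K_small.shape)        # (64, 16) -> (8, 16), i.e. 8x smaller
```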
Diffusion models are widely used in many AI applications, but research on efficient inference-time scalability, particularly for reasoning and planning (known as System 2 abilities), has been lacking.
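The snippet does not specify the scaling procedure studied. A common baseline for inference-time scaling, shown below as a generic illustration, is best-of-N sampling against a verifier: spend more compute drawing candidates and keep the highest-scoring one. The sampler and verifier here are toy stand-ins, not the paper's components.

```python
# Generic inference-time scaling sketch: best-of-N sampling with a verifier.
# Toy stand-ins only; not the specific method from the paper above.
import numpy as np

rng = np.random.default_rng(0)

def sample_candidate():
    # stand-in for one full diffusion sampling run
    return rng.normal(size=8)

def verifier(x):
    # stand-in reward model: prefers samples close to a fixed target
    return -float(np.linalg.norm(x - np.ones(8)))

def best_of_n(n):
    # larger n = more inference compute = better expected verifier score
    candidates = [sample_candidate() for _ in range(n)]
    return max(candidates, key=verifier)

for n in (1, 4, 16, 64):
    print(n, round(verifier(best_of_n(n)), 3))
```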
The shift from training-focused to inference-focused economics is fundamentally restructuring cloud computing and forcing ...
Researchers from DeepSeek and Tsinghua University say that combining two techniques improves the answers large language models generate through computer reasoning.
A new technical paper titled “Efficient Acceleration of Deep Learning Inference on Resource-Constrained Edge Devices: A Review” was published in “Proceedings of the IEEE” by researchers at University ...
In machine learning, privacy risks often emerge from inference-based attacks. Model inversion techniques can reconstruct ...
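Model inversion is the one inference-based attack the snippet names. A minimal sketch of the classic gradient-based variant, assuming a toy PyTorch classifier (the model and hyperparameters below are hypothetical): optimize an input from scratch so the model assigns high probability to a target class, recovering class-typical features.

```python
# Hedged sketch of gradient-based model inversion: optimize an input to
# maximize a target class's probability. Against a trained model, the result
# approximates what that class "looks like" to the model; the classifier here
# is an untrained toy stand-in, so only the mechanics are demonstrated.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # toy classifier
model.eval()

x = torch.zeros(1, 1, 28, 28, requires_grad=True)  # start from a blank input
opt = torch.optim.Adam([x], lr=0.1)
target = torch.tensor([3])                          # class to invert

for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(x), target)
    loss.backward()   # gradients flow to the INPUT, not the weights
    opt.step()

# x now maximizes the model's confidence in class 3
print(model(x).softmax(dim=1)[0, 3].item())
```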