Large Language Model Benchmarks

DeepSeek-V4, the Chinese AI model adapted for Huawei chips

DeepSeek previews new AI model that ‘closes the gap’ with frontier models

Chinese AI lab DeepSeek has launched two preview versions of its newest large language model, DeepSeek V4, a much-awaited update to last year’s V3.2 model and the accompanying R1 reasoning model that ...

DeepSeek previews new AI model adapted to run on Huawei chips

Factbox: DeepSeek-V4, the Chinese AI model adapted for Huawei chips

DeepSeek promises its new AI model has 'world-class' reasoning

DeepSeek has released its latest AI models, the V4 Pro and Flash versions, a little over a year after it went viral and became the top-rated free app on Apple's App Store in the US.

China’s DeepSeek Launches Long-Awaited AI Model

China's AI darling DeepSeek previews new model adapted for Huawei chip technology

China's DeepSeek launches preview of new AI model

DeepSeek-V4 is available in a Pro version and a cheaper Flash version.

Chinese startup DeepSeek launches a long-awaited update to its AI model

China's Long-Awaited DeepSeek V4 AI Model Is Now Available For Preview

Hosted on MSN

Local LLM benchmarks offer guidance for C++ AI use

A recent evaluation of three local large language models (LLMs) provides practical insights for developers integrating AI into C++ workflows. The comparison of Gemma 4 E4B, gpt-oss 20B, and Qwen 3.5 9B across image analysis, structured explanations, and ...

Live Science

AI benchmarking platform is helping top companies rig their model performances, study claims

LMArena, a popular benchmark for large language models, has been accused of giving preferential treatment to AIs made by big tech firms, potentially enabling them to game their results.

Renal & Urology News

Large Language Models Perform Poorly for Differential Diagnosis

Differential diagnosis was less accurate than diagnostic testing, but final diagnosis and management were more accurate.

Hosted on MSN

Three local AI models tested for real-world performance

A recent hands-on comparison put three local large language models—Gemma 4 E4B, gpt-oss 20B, and Qwen 3.5 9B—through identical real-world tasks to assess practical usability. The tests, run on an RTX 3070, focused on image analysis, structured ...

STAT

OpenAI leaps into health care with AI benchmark to evaluate models

OpenAI on Monday released a large dataset for evaluating how well large language models answer questions related to health care. Experts lauded the open-source data and detailed evaluation rubrics, calling them “unprecedented” in scale and breadth.

OpenAI's GPT-5.5 is here, and it's no potato: narrowly beats Anthropic's Claude Mythos Preview on Terminal-Bench 2.0

Among models the general public can access, GPT-5.5 has retaken the crown for OpenAI, achieving state-of-the-art results across 14 benchmarks.

China’s DeepSeek unveils latest V4 model with 1M context and flagship reasoning

Chinese artificial intelligence (AI) company DeepSeek on Friday officially released its next-generation large language model, the DeepSeek-V4 Preview, which highlights a massive 1-million-token context window and formidable performance ...