AI Stats by Phaseo brings together model, provider, and gateway data for teams building with AI APIs.

© 2025 • AI Stats

Instruct HumanEval - Benchmark Leaderboard & Model Performance | AI Stats

Instruct HumanEval

Type: percentage | General

Recorded Results: 1

Average Score: 73.84%

Score Range: 73.84% - 73.84%

Leading Model: Llama 3.1 Nemotron 70B Instruct (73.84%)

Scores Over Time

Individual benchmark scores plotted by date.

Models Using This Benchmark

Organisation	Model	Reported	Top Score	Info	Self Reported	Source
Nvidia	Llama 3.1 Nemotron 70B Instruct	01 Oct 2024	73.84%	-	Yes	Source
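
The summary figures on this page (recorded results, average score, score range, leading model) can be reproduced from the table rows. The sketch below is only an illustration: it assumes the average is a plain arithmetic mean and that the leading model is the row with the highest score, which the page does not state. The `results` list and the `summarise` helper are hypothetical names, not part of any AI Stats API.

```python
from statistics import mean

# Rows copied from the table above:
# (organisation, model, reported date, score in percent).
results = [
    ("Nvidia", "Llama 3.1 Nemotron 70B Instruct", "01 Oct 2024", 73.84),
]


def summarise(rows):
    """Return the count, mean score, (min, max) range, and top-scoring model.

    Assumption: the site's aggregation is a simple arithmetic mean; the
    actual method is not documented on this page.
    """
    scores = [row[3] for row in rows]
    leader = max(rows, key=lambda row: row[3])
    return {
        "recorded_results": len(rows),
        "average_score": round(mean(scores), 2),
        "score_range": (min(scores), max(scores)),
        "leading_model": leader[1],
    }


summary = summarise(results)
print(summary)
```

With a single recorded result, the average, minimum, and maximum all collapse to the same value, which is why the page shows an identical score in every summary field.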