Some Neural Network Benchmark Examples


koolive · published 2024-10-09 21:23:27

1.MLPerf

https://github.com/mlcommons/inference?tab=readme-ov-file
https://docs.mlcommons.org/inference/benchmarks/text_to_image/sdxl/

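MLPerf is an industry-standard machine learning benchmark suite for evaluating the performance of hardware, frameworks, and models. It has a training part and an inference part and covers a range of ML tasks, such as image classification, object detection, natural language processing, and recommendation. Its main sub-projects are Training, Inference, and Tiny (for low-power devices). It supports many platforms, including CPUs, GPUs, and dedicated AI accelerators, and is widely used in academia, industry, and the open-source community to evaluate AI system performance.
The walkthrough below follows https://docs.mlcommons.org/inference/benchmarks/text_to_image/sdxl/#__tabbed_1_1 (Stable Diffusion XL).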

Step 1: Install CM

Python >= 3.8 (this example uses 3.9)
Git must not be too old, otherwise some commands fail
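Before installing, it can save time to confirm both prerequisites programmatically. The sketch below is a minimal illustration, not part of CM: it checks the running interpreter against the 3.8 minimum and parses the output of `git --version`. The note above gives no exact minimum git version, so the sketch only reports the version and leaves any threshold to you.

```python
import re
import shutil
import subprocess
import sys

# Minimum Python version from the note above. No git minimum is stated,
# so the git helpers only report the installed version.
MIN_PYTHON = (3, 8)


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """True if the interpreter's (major, minor) is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


def parse_git_version(output):
    """Turn text like 'git version 2.34.1' into the tuple (2, 34, 1)."""
    match = re.search(r"git version (\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        raise ValueError(f"unrecognized `git --version` output: {output!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def installed_git_version():
    """Run `git --version` if git is on PATH; return None otherwise."""
    if shutil.which("git") is None:
        return None
    result = subprocess.run(["git", "--version"], capture_output=True, text=True)
    return parse_git_version(result.stdout)


if __name__ == "__main__":
    print("Python OK:", python_ok())
    print("git version:", installed_git_version())
```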

pip install cmind
pip install --no-use-pep517 cm4mlops

# PEP 517 has to be bypassed; otherwise the build fails with the following error:

  note: This error originates from a subprocess, and is likely not a problem with pip.
 ERROR: Failed building wheel for cm4mlops
 Running setup.py clean for cm4mlops
Failed to build cm4mlops
ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (cm4mlops)

Step 2: Run inference

Docs tab selection: MLCommons-Python -> PyTorch -> CUDA -> Native -> Performance Estimation for Offline Scenario

cm run script --tags=install,python-venv --name=mlperf
export CM_SCRIPT_EXTRA_CMD="--adr.python.name=mlperf"

cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1-dev \
   --model=sdxl \
   --implementation=reference \
   --framework=pytorch \
   --category=edge \
   --scenario=Offline \
   --execution_mode=test \
   --device=cuda  \
   --quiet \
   --test_query_count=50
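The command above runs the Offline scenario, in which all samples are available up front and the reported figure is throughput (samples per second). The following sketch illustrates only that idea, with a hypothetical `run_batch` callable standing in for the model. It is not the MLPerf LoadGen API and ignores LoadGen's rules, such as its minimum run duration.

```python
import time
from typing import Callable, Sequence


def offline_throughput(run_batch: Callable[[Sequence], object],
                       samples: Sequence,
                       batch_size: int = 1) -> float:
    """Push every sample through run_batch back to back; return samples/second.

    Illustrates the Offline idea only (all samples available at once,
    throughput as the metric). Not LoadGen, and none of its run rules.
    """
    if not samples or batch_size < 1:
        raise ValueError("need samples and a positive batch size")
    start = time.perf_counter()
    for i in range(0, len(samples), batch_size):
        run_batch(samples[i:i + batch_size])
    # Guard against a zero interval on very fast dummy workloads.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return len(samples) / elapsed


def fake_model(batch):
    """Hypothetical stand-in for an SDXL pipeline: ~2 ms per sample."""
    time.sleep(0.002 * len(batch))


if __name__ == "__main__":
    rate = offline_throughput(fake_model, list(range(50)), batch_size=5)
    print(f"{rate:.1f} samples/s")
```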

If git clone https://github.com/mlcommons/inference.git fails, download the repository manually and add the following option to the cm run script command above:

 --inference_src=${USER_PATH}/cuda/inference
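If you script these runs, one way to handle the optional `--inference_src` flag is to build the argument list in Python. The options are copied from the command above. The helper name `build_cm_command` and the example path are hypothetical, and turning the path into an absolute one is a precaution, not something the CM docs are known to require.

```python
import os
import shlex

# Options copied from the cm run script command above.
BASE_ARGS = [
    "cm", "run", "script",
    "--tags=run-mlperf,inference,_find-performance,_full,_r4.1-dev",
    "--model=sdxl",
    "--implementation=reference",
    "--framework=pytorch",
    "--category=edge",
    "--scenario=Offline",
    "--execution_mode=test",
    "--device=cuda",
    "--quiet",
    "--test_query_count=50",
]


def build_cm_command(inference_src=None):
    """Return the argument list, adding --inference_src for a manual clone."""
    args = list(BASE_ARGS)
    if inference_src is not None:
        # Expand ~ and make the path absolute so it does not depend on cwd.
        path = os.path.abspath(os.path.expanduser(inference_src))
        args.append(f"--inference_src={path}")
    return args


if __name__ == "__main__":
    # Hypothetical location of a manually downloaded mlcommons/inference repo.
    print(shlex.join(build_cm_command("~/cuda/inference")))
```

The list can be passed straight to `subprocess.run`, which avoids quoting problems; `shlex.join` (Python 3.8+) is only used to print a shell-ready version.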

Some of the downloads take a long time:

Collecting torch
  Downloading torch-2.4.1-cp39-cp39-manylinux1_x86_64.whl (797.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 797.1/797.1 MB 4.1 MB/s eta 0:00:00
Collecting nvidia-nccl-cu12==2.20.5
  Downloading nvidia_nccl_cu12-2.20.5-py3-none-manylinux2014_x86_64.whl (176.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 176.2/176.2 MB 3.8 MB/s eta 0:00:00
Collecting nvidia-cudnn-cu12==9.1.0.70
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl (664.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 664.8/664.8 MB 2.0 MB/s eta 0:00:00
Collecting nvidia-cublas-cu12==12.1.3.1
  Downloading nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl (410.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 410.6/410.6 MB 3.6 MB/s eta 0:00:00
Requirement already satisfied: sympy in /home/u200810220/CM/repos/local/cache/13d32961b74a4500/mlperf/lib/python3.9/site-packages (from torch) (1.13.3)
Collecting filelock
  Downloading filelock-3.16.1-py3-none-any.whl (16 kB)
Collecting nvidia-cufft-cu12==11.0.2.54
  Downloading nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl (121.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 121.6/121.6 MB 7.2 MB/s eta 0:00:00
Collecting nvidia-cusparse-cu12==12.1.0.106
  Downloading nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl (196.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 196.0/196.0 MB 3.6 MB/s eta 0:00:00
Collecting nvidia-cuda-nvrtc-cu12==12.1.105
  Downloading nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 23.7/23.7 MB 4.1 MB/s eta 0:00:00
Collecting nvidia-cusolver-cu12==11.4.5.107
  Downloading nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl (124.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 124.2/124.2 MB 8.9 MB/s eta 0:00:00
Collecting nvidia-nvtx-cu12==12.1.105
  Downloading nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (99 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 99.1/99.1 kB 9.4 MB/s eta 0:00:00
Collecting triton==3.0.0
  Downloading triton-3.0.0-1-cp39-cp39-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (209.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 209.4/209.4 MB 4.8 MB/s eta 0:00:00
Collecting fsspec
  Downloading fsspec-2024.9.0-py3-none-any.whl (179 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 179.3/179.3 kB 6.7 MB/s eta 0:00:00
Collecting nvidia-curand-cu12==10.3.2.106
  Downloading nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl (56.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.5/56.5 MB 4.8 MB/s eta 0:00:00
Collecting nvidia-cuda-cupti-cu12==12.1.105
  Downloading nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (14.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.1/14.1 MB 5.5 MB/s eta 0:00:00
Collecting networkx
  Downloading networkx-3.2.1-py3-none-any.whl (1.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 5.4 MB/s eta 0:00:00
Collecting jinja2
  Downloading jinja2-3.1.4-py3-none-any.whl (133 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 133.3/133.3 kB 4.9 MB/s eta 0:00:00
Collecting nvidia-cuda-runtime-cu12==12.1.105
  Downloading nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 823.6/823.6 kB 6.3 MB/s eta 0:00:00
Requirement already satisfied: typing-extensions>=4.8.0 in /home/u200810220/CM/repos/local/cache/13d32961b74a4500/mlperf/lib/python3.9/site-packages (from torch) (4.12.2)
Collecting nvidia-nvjitlink-cu12
  Downloading nvidia_nvjitlink_cu12-12.6.77-py3-none-manylinux2014_x86_64.whl (19.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 19.7/19.7 MB 5.2 MB/s eta 0:00:00
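To see how much data the PyTorch and CUDA wheels alone amount to, the sizes pip prints can be summed. A minimal sketch, assuming the `Downloading <file> (<size> <unit>)` line format shown in the log above and pip's decimal (1000-based) units:

```python
import re

# Matches the size in pip lines like "Downloading torch-...whl (797.1 MB)".
SIZE_RE = re.compile(r"Downloading \S+ \(([\d.]+) (kB|MB|GB)\)")
UNIT_TO_MB = {"kB": 1e-3, "MB": 1.0, "GB": 1e3}


def total_download_mb(log_text):
    """Sum the wheel sizes pip reported in a log, in MB."""
    return sum(float(value) * UNIT_TO_MB[unit]
               for value, unit in SIZE_RE.findall(log_text))


if __name__ == "__main__":
    import sys
    # Usage: python total_download.py < pip_log.txt
    print(f"{total_download_mb(sys.stdin.read()):.1f} MB")
```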

Downloading: rclone sync 'mlc-inference:mlcommons-inference-wg-public/stable_diffusion_fp32' '/home/u200810220/CM/repos/local/cache/e00a5f70d26c4213/stable_diffusion_fp32' -P --error-on-no-transfer
Transferred:       12.824 GiB / 12.926 GiB, 99%, 28.401 KiB/s, ETA 1h2m34s
Transferred:           18 / 19, 95%
Elapsed time:     23m42.3s
Transferring:
 * checkpoint_pipe/unet/d…orch_model.safetensors: 98% /9.565Gi, 28.421Ki/s, 1h2m31s^Z
[5]+  Stopped                 cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1-dev --model=sdxl --implementation=reference --framework=pytorch --category=edge --scenario=Offli
Transferred:       12.841 GiB / 12.926 GiB, 99%, 27.910 KiB/s, ETA 53m11s
Transferred:           18 / 19, 95%
Elapsed time:      34m2.3s
Transferring:
 * checkpoint_pipe/unet/d…orch_model.safetensors: 99% /9.565Gi, 28.406Ki/s, 52m16s^Z
[5]+  Stopped                 cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1-dev --model=sdxl --implementation=reference --framework=pytorch --category=edge --scenario=Offli
Transferred:       12.858 GiB / 12.926 GiB, 99%, 27.617 KiB/s, ETA 42m33s
Transferred:           18 / 19, 95%
Elapsed time:     44m44.3s
Transferring:
 * checkpoint_pipe/unet/d…orch_model.safetensors: 99% /9.565Gi, 27.648Ki/s, 42m30s
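The rclone log explains the wait: the last file (the UNet checkpoint) is arriving at only about 28 KiB/s. The sketch below parses one `Transferred:` byte-progress line and recomputes the remaining time as (total - done) / rate. The line format is taken from the log above, not from rclone's documentation.

```python
import re

# One rclone byte-progress line, e.g.
# "Transferred:  12.824 GiB / 12.926 GiB, 99%, 28.401 KiB/s, ETA 1h2m34s"
LINE_RE = re.compile(
    r"Transferred:\s+([\d.]+) (KiB|MiB|GiB) / ([\d.]+) (KiB|MiB|GiB), "
    r"\d+%, ([\d.]+) (KiB|MiB|GiB)/s"
)
TO_KIB = {"KiB": 1.0, "MiB": 1024.0, "GiB": 1024.0 ** 2}


def remaining_seconds(line):
    """Estimate seconds left as (total - done) / current rate."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError(f"not an rclone byte-progress line: {line!r}")
    done, done_u, total, total_u, rate, rate_u = m.groups()
    left_kib = float(total) * TO_KIB[total_u] - float(done) * TO_KIB[done_u]
    return left_kib / (float(rate) * TO_KIB[rate_u])
```

For the first snapshot (12.824 of 12.926 GiB at 28.401 KiB/s) this gives roughly 3,766 s, close to rclone's own ETA of 1h2m34s (3,754 s); the small gap is within the rounding of the printed figures. In other words, about 100 MiB left at this rate takes about an hour.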

2.CUDA_benchmark

https://github.com/hibagus/CUDA_Bench
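CUDA Benchmark is a tool for evaluating the performance of programs running on NVIDIA GPUs. It provides a set of benchmarks that measure how different algorithms, libraries, or workloads perform on the CUDA platform. Typical tests include matrix multiplication, vector addition, and convolution; they help assess key metrics such as hardware utilization, memory bandwidth, and latency, and guide developers in optimizing CUDA programs. It is commonly used to compare GPUs or to measure how efficiently optimized code runs.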
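Metrics like bandwidth and throughput are usually derived from a measured runtime plus a count of bytes moved or floating-point operations performed. The sketch shows those two standard formulas and a simple warm-up-then-median timing loop. It times host-side Python only: CUDA kernel launches are asynchronous, so real GPU measurements need device synchronization or CUDA events, which dedicated GPU benchmarks have to handle themselves.

```python
import statistics
import time


def median_runtime(fn, warmup=3, repeats=10):
    """Run fn a few times untimed, then return the median timed run (s)."""
    for _ in range(warmup):
        fn()
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return statistics.median(times)


def vector_add_bandwidth_gbs(n, seconds, bytes_per_elem=4):
    """c = a + b reads a and b and writes c: 3 * n elements of traffic."""
    return 3 * n * bytes_per_elem / seconds / 1e9


def matmul_tflops(m, n, k, seconds):
    """An (m x k) @ (k x n) GEMM does m*n*k multiply-adds = 2*m*n*k FLOPs."""
    return 2 * m * n * k / seconds / 1e12
```

For example, adding two 250-million-element float32 vectors in 1 s corresponds to 3 GB/s of memory traffic, and a 1000 x 1000 x 1000 GEMM in 2 ms to 1 TFLOP/s.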

Updating CMake:

CMake 3.20.1 or later is required.

sudo apt remove cmake
wget https://github.com/Kitware/CMake/releases/download/v3.20.1/cmake-3.20.1-linux-x86_64.sh
sudo bash cmake-3.20.1-linux-x86_64.sh --skip-license --prefix=/usr/local
export PATH=/usr/local/bin:$PATH
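The export PATH line matters because the shell runs the first matching executable it finds on `PATH`; note also that `export` only affects the current shell session. The sketch below demonstrates the lookup rule with `shutil.which` and two fake `cmake` files in temporary directories, standing in for `/usr/local/bin` and an older system copy. It does not touch the real installation.

```python
import os
import shutil
import stat
import tempfile


def make_fake_tool(directory, name="cmake"):
    """Create a tiny executable file so shutil.which can find it."""
    path = os.path.join(directory, name)
    with open(path, "w") as f:
        f.write("#!/bin/sh\n")
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


with tempfile.TemporaryDirectory() as new_dir, tempfile.TemporaryDirectory() as old_dir:
    new_cmake = make_fake_tool(new_dir)  # stands in for /usr/local/bin/cmake
    old_cmake = make_fake_tool(old_dir)  # stands in for an older system cmake
    # Prepending, as in export PATH=/usr/local/bin:$PATH, makes the new one win.
    first_hit_prepended = shutil.which("cmake", path=os.pathsep.join([new_dir, old_dir]))
    # Appending instead would keep resolving to the old copy.
    first_hit_appended = shutil.which("cmake", path=os.pathsep.join([old_dir, new_dir]))
    print("prepended ->", first_hit_prepended)
    print("appended  ->", first_hit_appended)
```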

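make fails: the build cannot fetch rapids-cmake from GitHub (see the log below), and manually downloading https://github.com/rapidsai/rapids-cmake and adapting the build to it did not succeed either.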

(base) @n1:~/cuda/CUDA_Bench/build$ make
[ 11%] Built target cutlass
[ 13%] Performing configure step for 'nvbench'
make[3]: Entering directory '/home/u200810220/cuda/CUDA_Bench/build/nvbench/build/src/nvbench-build/_deps/rapids-cmake-subbuild'
make[4]: Entering directory '/home/u200810220/cuda/CUDA_Bench/build/nvbench/build/src/nvbench-build/_deps/rapids-cmake-subbuild'
make[5]: Entering directory '/home/u200810220/cuda/CUDA_Bench/build/nvbench/build/src/nvbench-build/_deps/rapids-cmake-subbuild'
make[5]: Leaving directory '/home/u200810220/cuda/CUDA_Bench/build/nvbench/build/src/nvbench-build/_deps/rapids-cmake-subbuild'
make[5]: Entering directory '/home/u200810220/cuda/CUDA_Bench/build/nvbench/build/src/nvbench-build/_deps/rapids-cmake-subbuild'
[ 11%] Performing update step for 'rapids-cmake-populate'
fatal: unable to access 'https://github.com/rapidsai/rapids-cmake.git/': Failed to connect to github.com port 443: Connection timed out
CMake Error at /home/u200810220/cuda/CUDA_Bench/build/nvbench/build/src/nvbench-build/_deps/rapids-cmake-subbuild/rapids-cmake-populate-prefix/tmp/rapids-cmake-populate-gitupdate.cmake:97 (execute_process):
  execute_process failed command indexes:

    1: "Child return code: 128"



CMakeFiles/rapids-cmake-populate.dir/build.make:135: recipe for target 'rapids-cmake-populate-prefix/src/rapids-cmake-populate-stamp/rapids-cmake-populate-update' failed
make[5]: *** [rapids-cmake-populate-prefix/src/rapids-cmake-populate-stamp/rapids-cmake-populate-update] Error 1
make[5]: Leaving directory '/home/u200810220/cuda/CUDA_Bench/build/nvbench/build/src/nvbench-build/_deps/rapids-cmake-subbuild'
CMakeFiles/Makefile2:82: recipe for target 'CMakeFiles/rapids-cmake-populate.dir/all' failed
make[4]: *** [CMakeFiles/rapids-cmake-populate.dir/all] Error 2
make[4]: Leaving directory '/home/u200810220/cuda/CUDA_Bench/build/nvbench/build/src/nvbench-build/_deps/rapids-cmake-subbuild'
Makefile:90: recipe for target 'all' failed
make[3]: *** [all] Error 2
make[3]: Leaving directory '/cuda/CUDA_Bench/build/nvbench/build/src/nvbench-build/_deps/rapids-cmake-subbuild'

CMake Error at /usr/local/share/cmake-3.20/Modules/FetchContent.cmake:1012 (message):
  Build step for rapids-cmake failed: 2
Call Stack (most recent call first):
  /usr/local/share/cmake-3.20/Modules/FetchContent.cmake:1141:EVAL:2 (__FetchContent_directPopulate)
  /usr/local/share/cmake-3.20/Modules/FetchContent.cmake:1141 (cmake_language)
  /usr/local/share/cmake-3.20/Modules/FetchContent.cmake:1184 (FetchContent_Populate)
  /home/u200810220/cuda/CUDA_Bench/build/nvbench/build/src/nvbench-build/NVBENCH_RAPIDS.cmake:35 (FetchContent_MakeAvailable)
  cmake/NVBenchRapidsCMake.cmake:9 (include)
  CMakeLists.txt:16 (nvbench_load_rapids_cmake)


-- Configuring incomplete, errors occurred!
See also "/cuda/CUDA_Bench/build/nvbench/build/src/nvbench-build/CMakeFiles/CMakeOutput.log".
CMakeFiles/nvbench.dir/build.make:91: recipe for target 'nvbench/build/src/nvbench-stamp/nvbench-configure' failed
make[2]: *** [nvbench/build/src/nvbench-stamp/nvbench-configure] Error 1
CMakeFiles/Makefile2:252: recipe for target 'CMakeFiles/nvbench.dir/all' failed
make[1]: *** [CMakeFiles/nvbench.dir/all] Error 2
Makefile:90: recipe for target 'all' failed
make: *** [all] Error 2
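The log shows the root cause: CMake's FetchContent tries to update rapids-cmake from GitHub, and git fails with "Failed to connect to github.com port 443: Connection timed out". A quick way to check reachability before starting a long build is a plain TCP connection test. This minimal sketch uses only the standard library and tests TCP reachability only, not proxies or TLS.

```python
import socket


def can_connect(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers timeouts, refused connections and DNS failures
        return False
```

For example, if `can_connect("github.com")` returns `False` on the compute node, every build step that clones from GitHub will fail the same way, so those dependencies have to be provided locally.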

3.NAS-Bench-Graph

https://github.com/THUMNLab/NAS-Bench-Graph
https://github.com/THUMNLab/AutoGL/tree/agnn
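NAS-Bench-Graph is a benchmark for neural architecture search (NAS) focused on graph neural networks (GNNs). It provides a predefined search space covering many GNN architectures, together with their training and evaluation results on multiple datasets. With it, researchers can run NAS experiments more easily and quickly compare different methods, which greatly speeds up research on GNN architecture search and optimization.
The versions are incompatible.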
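A tabular NAS benchmark replaces training with a lookup: every architecture in the search space has a pre-recorded result. The toy sketch below shows that workflow with random search. The two-layer search space, the operation names, and `toy_score` are hypothetical and produce synthetic numbers, not NAS-Bench-Graph data; the real benchmark's search space and query API are documented in its repository.

```python
import itertools
import random

# Hypothetical toy search space: one operation per layer.
OPS = ("gcn", "gat", "sage", "gin")
NUM_LAYERS = 2


def toy_score(arch):
    """Synthetic stand-in for a benchmark's recorded accuracy (not real data)."""
    return sum((OPS.index(op) + 1) * (i + 1) for i, op in enumerate(arch)) / 100


# A tabular benchmark stores one pre-evaluated result per architecture,
# so "training" a candidate becomes a dictionary lookup.
TABLE = {arch: toy_score(arch) for arch in itertools.product(OPS, repeat=NUM_LAYERS)}


def random_search(table, budget, seed=0):
    """Query `budget` distinct random architectures; return the best one found."""
    rng = random.Random(seed)
    candidates = rng.sample(list(table), k=min(budget, len(table)))
    best = max(candidates, key=table.__getitem__)
    return best, table[best]
```

Because each query costs a lookup instead of a training run, different search strategies can be compared over many repetitions cheaply, which is the point of such benchmarks.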

graph_neural_network

https://github.com/mlcommons/training/tree/master/graph_neural_network

cd training/gnn_node_classification/
docker build -f Dockerfile -t training_gnn:latest .

On the HPC platform, docker commands are denied.

4.ogb

https://github.com/snap-stanford/ogb
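OGB (Open Graph Benchmark) is a collection of large-scale benchmark datasets for graph neural network (GNN) research, covering real-world graphs such as social networks, knowledge graphs, and molecular graphs. OGB provides standardized datasets and evaluation metrics so that researchers can compare GNN models more reliably. It supports several task types, including node classification, link prediction, and graph classification, targets large-scale graph data, and aims to advance GNN research and applications.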

pip install ogb

The dependency versions are incompatible.
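Since ogb could not be installed here, the following dependency-free sketch illustrates what standardized evaluation amounts to for a node-classification accuracy metric. The `{'y_true': ..., 'y_pred': ...}` input and `{'acc': ...}` output are loosely modeled on OGB's `Evaluator` classes; check the OGB docs for each dataset's official metric and expected array shapes. The name `accuracy_evaluator` is hypothetical.

```python
from typing import Dict, Sequence


def accuracy_evaluator(input_dict: Dict[str, Sequence]) -> Dict[str, float]:
    """Accuracy over labeled nodes, given {'y_true': [...], 'y_pred': [...]}."""
    y_true, y_pred = input_dict["y_true"], input_dict["y_pred"]
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    correct = sum(int(t == p) for t, p in zip(y_true, y_pred))
    return {"acc": correct / len(y_true)}
```

Fixing the metric (and, in OGB, the data splits) in shared code is what keeps reported numbers comparable across papers.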

5.nasbench301

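NAS-Bench-301 is a neural architecture search (NAS) benchmark designed to make NAS research more efficient. It offers a more challenging search space and, because it is built on evaluations of a broader set of architectures, avoids the high cost of training on real hardware. NAS-Bench-301 provides an efficient surrogate model that quickly predicts an architecture's performance without training each architecture. The benchmark supports many search strategies and helps researchers run experiments and evaluations more easily.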
https://github.com/automl/nasbench301
https://www.cnblogs.com/pprp/p/15491922.html
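The surrogate idea can be illustrated in a few lines: predict an unseen architecture's accuracy from architectures that were actually trained. NAS-Bench-301 fits learned regression models on a large set of evaluated architectures from the DARTS search space; the sketch below swaps in a much simpler k-nearest-neighbors average over a hypothetical binary encoding, with synthetic accuracies, purely to show the mechanism.

```python
from typing import Dict, Tuple

Arch = Tuple[int, ...]  # hypothetical binary encoding of an architecture


def hamming(a: Arch, b: Arch) -> int:
    """Number of positions where two encodings differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_surrogate(evaluated: Dict[Arch, float], query: Arch, k: int = 3) -> float:
    """Predict accuracy of `query` as the mean of its k nearest evaluated archs."""
    if not evaluated:
        raise ValueError("surrogate needs at least one evaluated architecture")
    nearest = sorted(evaluated, key=lambda arch: hamming(arch, query))[:k]
    return sum(evaluated[arch] for arch in nearest) / len(nearest)
```

Once fitted, a surrogate answers a query in microseconds, which is what makes benchmarking on a search space too large to train exhaustively feasible; its predictions are only as good as the model and the training set behind it.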

6.benchmarking-gnns

https://github.com/graphdeeplearning/benchmarking-gnns
https://www.cvmart.net/community/detail/1578
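Benchmarking-GNNs is a benchmarking framework for graph neural networks (GNNs) that systematically compares GNN models across graph tasks. It provides standardized datasets and a common experimental setup for GNN research, covering tasks such as node classification, link prediction, and graph classification. Its goal is to advance GNN research by helping researchers develop and optimize GNN models more effectively and improve their performance in real-world applications.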
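A standardized benchmark only gives fair comparisons if every model is run the same way, typically over several random seeds with results reported as mean ± standard deviation; as far as I know, the benchmarking-gnns paper reports its tables in this form. A minimal aggregation sketch, with a hypothetical `summarize` helper; any model names and scores fed to it in testing are made up.

```python
import statistics
from typing import Dict, List, Tuple


def summarize(runs: Dict[str, List[float]]) -> List[Tuple[str, float, float]]:
    """Return (model, mean, sample std) per model, best mean first."""
    rows = []
    for model, scores in runs.items():
        if len(scores) < 2:
            raise ValueError(f"{model}: need at least two seeds for a std")
        rows.append((model, statistics.mean(scores), statistics.stdev(scores)))
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    # Synthetic scores from four hypothetical seeds per model.
    for model, mean, std in summarize({"GCN": [0.80, 0.82, 0.78, 0.80],
                                       "GAT": [0.84, 0.83, 0.85, 0.84]}):
        print(f"{model}: {mean:.3f} ± {std:.3f}")
```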

# Setup CUDA 10.2 on Ubuntu 18.04
sudo apt-get --purge remove "*cublas*" "cuda*"
sudo apt --purge remove "nvidia*"
sudo apt autoremove
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.2.89-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804_10.2.89-1_amd64.deb
sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo apt update
sudo apt install -y cuda-10-2
sudo reboot
cat /usr/local/cuda/version.txt # Check CUDA version is 10.2

# Clone GitHub repo
conda install git
git clone https://github.com/graphdeeplearning/benchmarking-gnns.git
cd benchmarking-gnns

# Install python environment
conda env create -f environment_gpu.yml 

# Activate environment
conda activate benchmark_gnn

The CUDA version is incompatible.
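When a repository pins a specific toolkit (here CUDA 10.2, per the setup commands above), it helps to check which version is actually installed before creating the environment. The README's cat /usr/local/cuda/version.txt works for older toolkits such as 10.2, while newer toolkits ship a `version.json` instead. The sketch reads either; the JSON layout (`{"cuda": {"version": ...}}`) is an assumption based on recent toolkits, and `nvcc --version` is an alternative check.

```python
import json
import os
import re


def parse_version_txt(text):
    """Extract '10.2.89' from text like 'CUDA Version 10.2.89'."""
    match = re.search(r"CUDA Version (\d+(?:\.\d+)*)", text)
    return match.group(1) if match else None


def parse_version_json(text):
    """Read data['cuda']['version'] (assumed layout of version.json)."""
    return json.loads(text).get("cuda", {}).get("version")


def installed_cuda_version(cuda_home="/usr/local/cuda"):
    """Return the toolkit version string, or None if it cannot be determined."""
    txt = os.path.join(cuda_home, "version.txt")
    if os.path.isfile(txt):
        with open(txt) as f:
            version = parse_version_txt(f.read())
        if version:
            return version
    js = os.path.join(cuda_home, "version.json")
    if os.path.isfile(js):
        with open(js) as f:
            return parse_version_json(f.read())
    return None
```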

7.gnn-benchmark

https://github.com/shchur/gnn-benchmark

(base) :~/cuda$ sudo apt-get install -y mongodb-org=3.6.4 mongodb-org-server=3.6.4 mongodb-org-shell=3.6.4 mongodb-org-mongos=3.6.4 mongodb-org-tools=3.6.4
Reading package lists... Done
Building dependency tree       
Reading state information... Done
E: Unable to locate package mongodb-org
E: Unable to locate package mongodb-org-server
E: Unable to locate package mongodb-org-shell
E: Unable to locate package mongodb-org-mongos
E: Version '3.6.4' for 'mongodb-org-tools' was not found