Some benchmark examples for neural networks
1.MLPerf
https://github.com/mlcommons/inference?tab=readme-ov-file
https://docs.mlcommons.org/inference/benchmarks/text_to_image/sdxl/
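MLPerf is an industry-standard machine learning benchmark suite designed to evaluate the performance of a wide range of hardware, frameworks, and models. It has training and inference parts and covers many machine learning tasks, such as image classification, object detection, natural language processing, and recommender systems. MLPerf is organized into several sub-projects, including Training, Inference, and Tiny (for low-power devices). It supports multiple platforms, such as CPUs, GPUs, and dedicated AI accelerators, and is widely used by academia, industry, and the open-source community to evaluate AI system performance.
The walkthrough below follows https://docs.mlcommons.org/inference/benchmarks/text_to_image/sdxl/#__tabbed_1_1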
1. Install cm
Python >= 3.8 is required (this walkthrough uses 3.9).
git must not be too old, otherwise some commands fail to run.
pip install cmind
pip install --no-use-pep517 cm4mlops
# PEP 517 has to be bypassed (--no-use-pep517), otherwise the following error is reported:
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for cm4mlops
Running setup.py clean for cm4mlops
Failed to build cm4mlops
ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (cm4mlops)
2. inference
MLCommons-Python -> pytorch -> cuda -> native -> Performance Estimation for Offline Scenario
cm run script --tags=install,python-venv --name=mlperf
export CM_SCRIPT_EXTRA_CMD="--adr.python.name=mlperf"
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1-dev \
--model=sdxl \
--implementation=reference \
--framework=pytorch \
--category=edge \
--scenario=Offline \
--execution_mode=test \
--device=cuda \
--quiet \
--test_query_count=50
If git clone https://github.com/mlcommons/inference.git fails, download the repository manually and append the following path option to the command above:
--inference_src=${USER_PATH}/cuda/inference
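The command above can also be assembled from a script, which makes the optional --inference_src option easy to switch on and off. This is only a minimal Python sketch and is not part of the MLCommons tooling: the helper name build_cm_command and the example path are hypothetical, while the flags are copied unchanged from the command above.

```python
import shlex


def build_cm_command(inference_src=None):
    """Return the cm invocation used in this walkthrough as an argument list.

    inference_src is the optional path to a manually downloaded copy of the
    mlcommons/inference repository (hypothetical helper, for illustration).
    """
    cmd = [
        "cm", "run", "script",
        "--tags=run-mlperf,inference,_find-performance,_full,_r4.1-dev",
        "--model=sdxl",
        "--implementation=reference",
        "--framework=pytorch",
        "--category=edge",
        "--scenario=Offline",
        "--execution_mode=test",
        "--device=cuda",
        "--quiet",
        "--test_query_count=50",
    ]
    if inference_src:
        # Only added when the automatic git clone cannot be used.
        cmd.append(f"--inference_src={inference_src}")
    return cmd


if __name__ == "__main__":
    # "/path/to/inference" is a placeholder, not a real location.
    print(shlex.join(build_cm_command("/path/to/inference")))
```

The printed line can be pasted into a shell, or the list can be passed to subprocess.run once the mlperf virtual environment from the previous step is in place.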
Some of the downloads take a long time:
Collecting torch
Downloading torch-2.4.1-cp39-cp39-manylinux1_x86_64.whl (797.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 797.1/797.1 MB 4.1 MB/s eta 0:00:00
Collecting nvidia-nccl-cu12==2.20.5
Downloading nvidia_nccl_cu12-2.20.5-py3-none-manylinux2014_x86_64.whl (176.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 176.2/176.2 MB 3.8 MB/s eta 0:00:00
Collecting nvidia-cudnn-cu12==9.1.0.70
Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl (664.8 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 664.8/664.8 MB 2.0 MB/s eta 0:00:00
Collecting nvidia-cublas-cu12==12.1.3.1
Downloading nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl (410.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 410.6/410.6 MB 3.6 MB/s eta 0:00:00
Requirement already satisfied: sympy in /home/u200810220/CM/repos/local/cache/13d32961b74a4500/mlperf/lib/python3.9/site-packages (from torch) (1.13.3)
Collecting filelock
Downloading filelock-3.16.1-py3-none-any.whl (16 kB)
Collecting nvidia-cufft-cu12==11.0.2.54
Downloading nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl (121.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 121.6/121.6 MB 7.2 MB/s eta 0:00:00
Collecting nvidia-cusparse-cu12==12.1.0.106
Downloading nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl (196.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 196.0/196.0 MB 3.6 MB/s eta 0:00:00
Collecting nvidia-cuda-nvrtc-cu12==12.1.105
Downloading nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 23.7/23.7 MB 4.1 MB/s eta 0:00:00
Collecting nvidia-cusolver-cu12==11.4.5.107
Downloading nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl (124.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 124.2/124.2 MB 8.9 MB/s eta 0:00:00
Collecting nvidia-nvtx-cu12==12.1.105
Downloading nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (99 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 99.1/99.1 kB 9.4 MB/s eta 0:00:00
Collecting triton==3.0.0
Downloading triton-3.0.0-1-cp39-cp39-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (209.4 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 209.4/209.4 MB 4.8 MB/s eta 0:00:00
Collecting fsspec
Downloading fsspec-2024.9.0-py3-none-any.whl (179 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 179.3/179.3 kB 6.7 MB/s eta 0:00:00
Collecting nvidia-curand-cu12==10.3.2.106
Downloading nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl (56.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.5/56.5 MB 4.8 MB/s eta 0:00:00
Collecting nvidia-cuda-cupti-cu12==12.1.105
Downloading nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (14.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.1/14.1 MB 5.5 MB/s eta 0:00:00
Collecting networkx
Downloading networkx-3.2.1-py3-none-any.whl (1.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 5.4 MB/s eta 0:00:00
Collecting jinja2
Downloading jinja2-3.1.4-py3-none-any.whl (133 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 133.3/133.3 kB 4.9 MB/s eta 0:00:00
Collecting nvidia-cuda-runtime-cu12==12.1.105
Downloading nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 823.6/823.6 kB 6.3 MB/s eta 0:00:00
Requirement already satisfied: typing-extensions>=4.8.0 in /home/u200810220/CM/repos/local/cache/13d32961b74a4500/mlperf/lib/python3.9/site-packages (from torch) (4.12.2)
Collecting nvidia-nvjitlink-cu12
Downloading nvidia_nvjitlink_cu12-12.6.77-py3-none-manylinux2014_x86_64.whl (19.7 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 19.7/19.7 MB 5.2 MB/s eta 0:00:00
Downloading: rclone sync 'mlc-inference:mlcommons-inference-wg-public/stable_diffusion_fp32' '/home/u200810220/CM/repos/local/cache/e00a5f70d26c4213/stable_diffusion_fp32' -P --error-on-no-transfer
Transferred: 12.824 GiB / 12.926 GiB, 99%, 28.401 KiB/s, ETA 1h2m34s
Transferred: 18 / 19, 95%
Elapsed time: 23m42.3s
Transferring:
* checkpoint_pipe/unet/d…orch_model.safetensors: 98% /9.565Gi, 28.421Ki/s, 1h2m31s^Z
[5]+ Stopped cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1-dev --model=sdxl --implementation=reference --framework=pytorch --category=edge --scenario=Offli
Transferred: 12.841 GiB / 12.926 GiB, 99%, 27.910 KiB/s, ETA 53m11s
Transferred: 18 / 19, 95%
Elapsed time: 34m2.3s
Transferring:
* checkpoint_pipe/unet/d…orch_model.safetensors: 99% /9.565Gi, 28.406Ki/s, 52m16s^Z
[5]+ Stopped cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1-dev --model=sdxl --implementation=reference --framework=pytorch --category=edge --scenario=Offli
Transferred: 12.858 GiB / 12.926 GiB, 99%, 27.617 KiB/s, ETA 42m33s
Transferred: 18 / 19, 95%
Elapsed time: 44m44.3s
Transferring:
* checkpoint_pipe/unet/d…orch_model.safetensors: 99% /9.565Gi, 27.648Ki/s, 42m30s
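To see why the last file takes so long, the remaining time can be recomputed from a single rclone progress line. This is a rough sketch that assumes the "Transferred: done / total, percent, rate" layout shown in the log above; the function name remaining_seconds is made up for illustration.

```python
import re

# Unit factors expressed in KiB so that sizes and rates share one base.
UNITS = {"KiB": 1, "MiB": 1024, "GiB": 1024 ** 2}

PROGRESS = re.compile(
    r"Transferred:\s+([\d.]+) (\w+) / ([\d.]+) (\w+), \d+%, ([\d.]+) (\w+)/s"
)


def remaining_seconds(line):
    """Estimate the seconds left from one rclone byte-progress line."""
    m = PROGRESS.search(line)
    if m is None:
        raise ValueError("not an rclone byte-progress line")
    done = float(m.group(1)) * UNITS[m.group(2)]
    total = float(m.group(3)) * UNITS[m.group(4)]
    rate = float(m.group(5)) * UNITS[m.group(6)]
    return (total - done) / rate


if __name__ == "__main__":
    line = "Transferred: 12.824 GiB / 12.926 GiB, 99%, 28.401 KiB/s, ETA 1h2m34s"
    print(f"about {remaining_seconds(line) / 60:.0f} minutes left")
```

For the first progress line in the log the estimate is a little over an hour, in line with the ETA rclone prints itself: only about 0.1 GiB is left, but at roughly 28 KiB/s that still takes that long.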
2.CUDA_benchmark
https://github.com/hibagus/CUDA_Bench
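CUDA Benchmark is a tool for evaluating the performance of programs running on NVIDIA GPUs. It provides a set of benchmarks that measure how different algorithms, libraries, or workloads perform on the CUDA platform. Common test types include matrix multiplication, vector addition, and convolution. These benchmarks measure key metrics such as hardware utilization, bandwidth, and latency, which helps developers optimize CUDA programs. CUDA benchmarks are typically used to compare GPUs or to tune code execution efficiency.
How to update CMake:
CMake 3.20.1 or later is required.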
sudo apt remove cmake
wget https://github.com/Kitware/CMake/releases/download/v3.20.1/cmake-3.20.1-linux-x86_64.sh
sudo bash cmake-3.20.1-linux-x86_64.sh --skip-license --prefix=/usr/local
export PATH=/usr/local/bin:$PATH
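After the update it is worth confirming which cmake is actually picked up from PATH. This is a small Python sketch of such a check; the helper names are hypothetical, and it assumes that cmake --version prints a line of the form "cmake version X.Y.Z".

```python
import re
import shutil
import subprocess


def parse_version(text):
    """Extract the first dotted version number in text as a tuple of ints."""
    m = re.search(r"(\d+(?:\.\d+)+)", text)
    if m is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in m.group(1).split("."))


def cmake_is_new_enough(minimum=(3, 20, 1)):
    """Return True if the cmake found on PATH reports at least `minimum`."""
    exe = shutil.which("cmake")
    if exe is None:
        return False  # cmake is not installed or not on PATH
    out = subprocess.run([exe, "--version"], capture_output=True, text=True).stdout
    return parse_version(out) >= minimum


if __name__ == "__main__":
    print("cmake >= 3.20.1:", cmake_is_new_enough())
```

Versions are compared as integer tuples rather than strings, so that 3.10.2 correctly sorts below 3.20.1.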
make fails. Manually downloading https://github.com/rapidsai/rapids-cmake and adapting the build to use it did not succeed either.
(base) @n1:~/cuda/CUDA_Bench/build$ make
[ 11%] Built target cutlass
[ 13%] Performing configure step for 'nvbench'
make[3]: Entering directory '/home/u200810220/cuda/CUDA_Bench/build/nvbench/build/src/nvbench-build/_deps/rapids-cmake-subbuild'
make[4]: Entering directory '/home/u200810220/cuda/CUDA_Bench/build/nvbench/build/src/nvbench-build/_deps/rapids-cmake-subbuild'
make[5]: Entering directory '/home/u200810220/cuda/CUDA_Bench/build/nvbench/build/src/nvbench-build/_deps/rapids-cmake-subbuild'
make[5]: Leaving directory '/home/u200810220/cuda/CUDA_Bench/build/nvbench/build/src/nvbench-build/_deps/rapids-cmake-subbuild'
make[5]: Entering directory '/home/u200810220/cuda/CUDA_Bench/build/nvbench/build/src/nvbench-build/_deps/rapids-cmake-subbuild'
[ 11%] Performing update step for 'rapids-cmake-populate'
fatal: unable to access 'https://github.com/rapidsai/rapids-cmake.git/': Failed to connect to github.com port 443: Connection timed out
CMake Error at /home/u200810220/cuda/CUDA_Bench/build/nvbench/build/src/nvbench-build/_deps/rapids-cmake-subbuild/rapids-cmake-populate-prefix/tmp/rapids-cmake-populate-gitupdate.cmake:97 (execute_process):
execute_process failed command indexes:
1: "Child return code: 128"
CMakeFiles/rapids-cmake-populate.dir/build.make:135: recipe for target 'rapids-cmake-populate-prefix/src/rapids-cmake-populate-stamp/rapids-cmake-populate-update' failed
make[5]: *** [rapids-cmake-populate-prefix/src/rapids-cmake-populate-stamp/rapids-cmake-populate-update] Error 1
make[5]: Leaving directory '/home/u200810220/cuda/CUDA_Bench/build/nvbench/build/src/nvbench-build/_deps/rapids-cmake-subbuild'
CMakeFiles/Makefile2:82: recipe for target 'CMakeFiles/rapids-cmake-populate.dir/all' failed
make[4]: *** [CMakeFiles/rapids-cmake-populate.dir/all] Error 2
make[4]: Leaving directory '/home/u200810220/cuda/CUDA_Bench/build/nvbench/build/src/nvbench-build/_deps/rapids-cmake-subbuild'
Makefile:90: recipe for target 'all' failed
make[3]: *** [all] Error 2
make[3]: Leaving directory '/cuda/CUDA_Bench/build/nvbench/build/src/nvbench-build/_deps/rapids-cmake-subbuild'
CMake Error at /usr/local/share/cmake-3.20/Modules/FetchContent.cmake:1012 (message):
Build step for rapids-cmake failed: 2
Call Stack (most recent call first):
/usr/local/share/cmake-3.20/Modules/FetchContent.cmake:1141:EVAL:2 (__FetchContent_directPopulate)
/usr/local/share/cmake-3.20/Modules/FetchContent.cmake:1141 (cmake_language)
/usr/local/share/cmake-3.20/Modules/FetchContent.cmake:1184 (FetchContent_Populate)
/home/u200810220/cuda/CUDA_Bench/build/nvbench/build/src/nvbench-build/NVBENCH_RAPIDS.cmake:35 (FetchContent_MakeAvailable)
cmake/NVBenchRapidsCMake.cmake:9 (include)
CMakeLists.txt:16 (nvbench_load_rapids_cmake)
-- Configuring incomplete, errors occurred!
See also "/cuda/CUDA_Bench/build/nvbench/build/src/nvbench-build/CMakeFiles/CMakeOutput.log".
CMakeFiles/nvbench.dir/build.make:91: recipe for target 'nvbench/build/src/nvbench-stamp/nvbench-configure' failed
make[2]: *** [nvbench/build/src/nvbench-stamp/nvbench-configure] Error 1
CMakeFiles/Makefile2:252: recipe for target 'CMakeFiles/nvbench.dir/all' failed
make[1]: *** [CMakeFiles/nvbench.dir/all] Error 2
Makefile:90: recipe for target 'all' failed
make: *** [all] Error 2
[ 11%] Built target cutlass
[ 13%] Performing configure step for 'nvbench'
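The decisive line in the log is "Failed to connect to github.com port 443: Connection timed out", so the build stops because the machine cannot reach GitHub, not because of the CUDA_Bench sources. A quick reachability test before running make could look like the sketch below; the function name can_connect is an illustrative choice, and the check only shows whether a TCP connection opens, not whether git itself will work (for example behind a proxy).

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS failures.
        return False
```

Calling can_connect("github.com", 443) on the build machine returning False would be consistent with the FetchContent failure shown above.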
3.NAS-Bench-Graph
https://github.com/THUMNLab/NAS-Bench-Graph
https://github.com/THUMNLab/AutoGL/tree/agnn
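NAS-Bench-Graph is a benchmark for neural architecture search (NAS) that focuses on graph neural networks (GNNs). It provides a predefined search space covering a variety of GNN architectures, together with the training and evaluation results of these architectures on multiple datasets. With this benchmark, researchers can run NAS experiments more easily and quickly compare the performance of different methods, which speeds up research on GNN architecture search and optimization.
Result: the required versions are incompatible with this environment.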
graph_neural_network
https://github.com/mlcommons/training/tree/master/graph_neural_network
cd training/gnn_node_classification/
docker build -f Dockerfile -t training_gnn:latest .
Result: the docker command is denied on the HPC platform.
4.ogb
https://github.com/snap-stanford/ogb
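OGB (Open Graph Benchmark) is a collection of large-scale benchmarks built for graph neural network (GNN) research. It covers a variety of real-world graph datasets, such as social networks, knowledge graphs, and molecular graphs. OGB provides standardized datasets and evaluation metrics so that researchers can compare GNN models more effectively. It supports several task types, including node classification, link prediction, and graph classification, and targets large-scale graph-structured data.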
pip install ogb
Result: the dependency versions are incompatible with this environment.
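The notes do not record which dependency caused the conflict. One way to narrow it down is to list the versions that are actually installed and compare them with the project's stated requirements. This is a minimal sketch using the standard library; the package list in the example is an assumption and should be replaced with whatever the OGB README currently requires.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    # Assumed list of relevant packages, not taken from the original notes.
    candidates = ["ogb", "torch", "numpy", "pandas", "scikit-learn"]
    for name, version in installed_versions(candidates).items():
        print(f"{name}: {version or 'not installed'}")
```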
5.nasbench301
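NAS-Bench-301 is a neural architecture search (NAS) benchmark designed to make NAS research more efficient. It provides a more challenging search space, is built on a broader set of architecture evaluations, and avoids the high cost of training on real hardware. NAS-Bench-301 provides an efficient surrogate model that quickly predicts the performance of a neural network architecture without retraining each one. The benchmark supports multiple search strategies and makes experiments and evaluation easier for researchers.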
https://github.com/automl/nasbench301
https://www.cnblogs.com/pprp/p/15491922.html
6.benchmarking-gnns
https://github.com/graphdeeplearning/benchmarking-gnns
https://www.cvmart.net/community/detail/1578
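Benchmarking-GNNs is a benchmarking framework focused on graph neural networks (GNNs), used to systematically compare GNN models across different graph tasks. It provides standardized datasets and an experimental environment for GNN research, and the supported tasks include node classification, link prediction, and graph classification. The framework aims to advance GNN research by helping researchers develop and optimize GNN models more effectively and improve their performance in real-world applications.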
# Setup CUDA 10.2 on Ubuntu 18.04
sudo apt-get --purge remove "*cublas*" "cuda*"
sudo apt --purge remove "nvidia*"
sudo apt autoremove
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.2.89-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804_10.2.89-1_amd64.deb
sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo apt update
sudo apt install -y cuda-10-2
sudo reboot
cat /usr/local/cuda/version.txt # Check CUDA version is 10.2
# Clone GitHub repo
conda install git
git clone https://github.com/graphdeeplearning/benchmarking-gnns.git
cd benchmarking-gnns
# Install python environment
conda env create -f environment_gpu.yml
# Activate environment
conda activate benchmark_gnn
Result: the CUDA version is incompatible with this environment.
7.gnn-benchmark
https://github.com/shchur/gnn-benchmark
(base) :~/cuda$ sudo apt-get install -y mongodb-org=3.6.4 mongodb-org-server=3.6.4 mongodb-org-shell=3.6.4 mongodb-org-mongos=3.6.4 mongodb-org-tools=3.6.4
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package mongodb-org
E: Unable to locate package mongodb-org-server
E: Unable to locate package mongodb-org-shell
E: Unable to locate package mongodb-org-mongos
E: Version '3.6.4' for 'mongodb-org-tools' was not found
Result: the MongoDB version required by the original project cannot be found on Ubuntu 18.04.