Name Type Unit Description
container_memory_rss gauge bytes - Resident Set Size (RSS): the physical memory actually allocated to the process, as opposed to virtual memory backed by disk. RSS includes all allocated stack and heap memory, plus shared-library pages loaded into physical memory, but excludes memory that has been swapped out.
container_memory_usage_bytes gauge bytes - Total memory currently in use, including all allocated memory regardless of whether it has been accessed.
container_memory_max_usage_bytes gauge bytes - The maximum memory usage recorded.
container_memory_cache gauge bytes - Page cache usage: disk data cached in RAM to speed up data exchange between the CPU and storage.
container_memory_swap gauge bytes - Swap usage. Swap uses disk to back memory: when physical memory is nearly exhausted or passes a threshold, unused pages can be written out to disk and paged back into physical memory when needed.
container_memory_working_set_bytes gauge bytes - Current working-set usage.
container_memory_failcnt counter - Count of failed memory allocation attempts (times the memory limit was hit).
container_memory_failures_total counter - Cumulative count of memory allocation failures.
  • container_memory_working_set_bytes = container_memory_usage_bytes - total_inactive_file
  • memory used = container_memory_usage_bytes - cache
  • cache = total_inactive_file + total_active_file

PS: the kubelet compares container_memory_working_set_bytes against container_spec_memory_limit_bytes when deciding which container to OOM-kill.

total_inactive_anon and total_inactive_file are inactive memory that can be swapped out or reclaimed, and the page cache holds disk data currently kept in memory. For judging real memory pressure, container_memory_working_set_bytes is therefore more accurate than container_memory_usage_bytes.
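The formulas above can be checked with a few made-up numbers. A minimal sketch (all byte values are invented for illustration), using the working-set formula stated later in this article, memory.usage_in_bytes - inactive_file:

```python
# Hypothetical cgroup v1 memory values, in bytes (invented for illustration).
usage_in_bytes = 13_000_000_000      # memory.usage_in_bytes
total_inactive_file = 500_000_000    # total_inactive_file from memory.stat
total_active_file = 600_000_000      # total_active_file from memory.stat

# cache = total_inactive_file + total_active_file
cache = total_inactive_file + total_active_file

# container_memory_working_set_bytes: usage minus inactive file pages
working_set = usage_in_bytes - total_inactive_file

# "memory used" (the docker stats figure): usage minus the whole cache
memory_used = usage_in_bytes - cache

# The working set exceeds "memory used" by exactly the active file cache,
# which is why kubectl top and docker stats report different numbers.
print(working_set, memory_used)
```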

  • https://segmentfault.com/a/1190000021402244?utm_source=tag-newest
  • https://blog.csdn.net/palet/article/details/82889493
  • https://zhuanlan.zhihu.com/p/96597715
  • https://www.ibm.com/support/pages/kubectl-top-pods-and-docker-stats-show-different-memory-statistics

kubectl top: 12.5G

docker stats: 11.42G

memory_stats and memory.usage_in_bytes

kubectl top (container_memory_working_set_bytes) = memory.usage_in_bytes - inactive_file, which gives 12.5G.
docker stats (memory used) = memory.usage_in_bytes - cache, which gives 11.42G.

  • https://docs.signalfx.com/en/latest/integrations/integrations-reference/integrations.kubernetes.html

PS: the following are official metric descriptions. To avoid ambiguity introduced by translation, many are quoted verbatim; for the CPU and disk sections the original English is recorded alongside the translation. For reference only.

Docker's official notes:

https://docs.docker.com/config/containers/runmetrics/

memory.stat:

Metric Description
cache The amount of memory used by the processes of this control group that can be associated precisely with a block on a block device. When you read from and write to files on disk, this amount increases. This is the case if you use “conventional” I/O (open, read, write syscalls) as well as mapped files (with mmap). It also accounts for the memory used by tmpfs mounts, though the reasons are unclear.
rss The amount of memory that doesn’t correspond to anything on disk: stacks, heaps, and anonymous memory maps.
mapped_file Indicates the amount of memory mapped by the processes in the control group. It doesn’t give you information about how much memory is used; it rather tells you how it is used.
pgfault, pgmajfault Indicate the number of times that a process of the cgroup triggered a “page fault” and a “major fault”, respectively. A page fault happens when a process accesses a part of its virtual memory space which is nonexistent or protected. The former can happen if the process is buggy and tries to access an invalid address (it is sent a SIGSEGV signal, typically killing it with the famous Segmentation fault message). The latter can happen when the process reads from a memory zone which has been swapped out, or which corresponds to a mapped file: in that case, the kernel loads the page from disk, and lets the CPU complete the memory access. It can also happen when the process writes to a copy-on-write memory zone: likewise, the kernel preempts the process, duplicates the memory page, and resumes the write operation on the process’s own copy of the page. “Major” faults happen when the kernel actually needs to read the data from disk. When it just duplicates an existing page, or allocates an empty page, it’s a regular (or “minor”) fault.
swap The amount of swap currently used by the processes in this cgroup.
active_anon, inactive_anon The amount of anonymous memory that has been identified as respectively active and inactive by the kernel. “Anonymous” memory is the memory that is not linked to disk pages. In other words, that’s the equivalent of the rss counter described above. In fact, the very definition of the rss counter is active_anon + inactive_anon - tmpfs (where tmpfs is the amount of memory used up by tmpfs filesystems mounted by this control group). Now, what’s the difference between “active” and “inactive”? Pages are initially “active”; and at regular intervals, the kernel sweeps over the memory, and tags some pages as “inactive”. Whenever they are accessed again, they are immediately retagged “active”. When the kernel is almost out of memory, and time comes to swap out to disk, the kernel swaps “inactive” pages.
active_file, inactive_file Cache memory, with active and inactive similar to the anon memory above. The exact formula is cache = active_file + inactive_file + tmpfs. The exact rules used by the kernel to move memory pages between active and inactive sets are different from the ones used for anonymous memory, but the general principle is the same. When the kernel needs to reclaim memory, it is cheaper to reclaim a clean (=non modified) page from this pool, since it can be reclaimed immediately (while anonymous pages and dirty/modified pages need to be written to disk first).
unevictable The amount of memory that cannot be reclaimed; generally, it accounts for memory that has been “locked” with mlock. It is often used by crypto frameworks to make sure that secret keys and other sensitive material never gets swapped out to disk.
memory_limit, memsw_limit These are not really metrics, but a reminder of the limits applied to this cgroup. The first one indicates the maximum amount of physical memory that can be used by the processes of this control group; the second one indicates the maximum amount of RAM+swap.
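These fields come straight from the cgroup filesystem. A minimal parsing sketch (the sample text and all numbers are invented; on a real host you would read a file such as /sys/fs/cgroup/memory/memory.stat):

```python
def parse_memory_stat(text: str) -> dict:
    """Parse cgroup v1 memory.stat contents into a name -> bytes mapping."""
    stats = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if value:
            stats[key] = int(value)
    return stats

# Illustrative sample, not real output.
sample = """\
cache 1100000000
rss 10800000000
swap 0
inactive_anon 0
active_anon 10800000000
inactive_file 500000000
active_file 600000000
"""

stats = parse_memory_stat(sample)
# Sanity check against the table above: cache = active_file + inactive_file
# (+ tmpfs, which is zero in this sample).
print(stats["cache"] == stats["active_file"] + stats["inactive_file"])
```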
CPU metrics
Name Type Unit Description
container_cpu_usage_seconds_total counter seconds - Cumulative CPU time consumed by the container, per CPU; with multiple CPUs, the total CPU time is the sum of the time consumed on each CPU. (docs: "Cumulative cpu time consumed per cpu in nanoseconds." Despite that wording, the exported Prometheus metric is in seconds, as the name indicates.)
container_cpu_user_seconds_total counter seconds - Cumulative user-mode CPU time consumed by the container. (docs: "Cumulative user cpu time consumed in nanoseconds.")
container_cpu_system_seconds_total counter seconds - Cumulative system-mode CPU time consumed by the container. (docs: "Cumulative system cpu time consumed in nanoseconds.")
container_cpu_cfs_throttled_seconds_total counter seconds - CFS stands for Completely Fair Scheduler, the Linux mechanism that allocates CPU time to cgroups in configured proportions; this metric is the total time for which the container's CPU use has been throttled. (docs: "Total time duration the container has been throttled.")
container_cpu_cfs_throttled_periods_total counter - Number of CFS enforcement periods in which the container was throttled. (docs: "Number of throttled period intervals.")
container_cpu_cfs_periods_total counter - Number of elapsed CFS enforcement periods. (docs: "Number of elapsed enforcement period intervals.")
container_cpu_load_average_10s gauge - Average CPU load over the last 10 seconds. (docs: "Value of container cpu load average over the last 10 seconds.")

CPU formulas

PS: the CPU metrics are cumulative time counters, so use the rate() function to turn them into per-interval usage:

rate(container_cpu_usage_seconds_total{name=~"<component>.*"}[5m])  # per-container CPU usage

sum(rate(container_cpu_usage_seconds_total{name=~"<component>.*"}[5m]))  # total CPU usage

rate(container_cpu_user_seconds_total{name=~"<component>.*"}[5m])  # user-mode CPU usage

rate(container_cpu_system_seconds_total{name=~"<component>.*"}[5m])  # system-mode CPU usage
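What rate() computes can be reproduced by hand from two scrapes of a counter. The sketch below (all sample values invented) also derives a CFS throttling ratio from the two *_cfs_* counters:

```python
# Two samples of container_cpu_usage_seconds_total, 300 s apart (a [5m] window).
t0, t1 = 0.0, 300.0              # scrape timestamps, in seconds
usage0, usage1 = 1250.0, 1700.0  # cumulative CPU seconds consumed

# rate() is approximately counter delta / time delta:
# average number of CPU cores in use over the window.
cores_used = (usage1 - usage0) / (t1 - t0)
print(cores_used)  # 1.5 -> the container averaged 1.5 cores

# CFS throttling ratio over the same window, from the deltas of
# container_cpu_cfs_throttled_periods_total and container_cpu_cfs_periods_total.
throttled_periods_delta = 120
periods_delta = 3000
throttle_ratio = throttled_periods_delta / periods_delta
print(throttle_ratio)  # 0.04 -> throttled in 4% of enforcement periods
```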

Disk metrics
Name Type Unit Description
container_fs_writes_bytes_total counter bytes - Cumulative bytes written. (docs: "Cumulative count of bytes written.")
container_fs_reads_bytes_total counter bytes - Cumulative bytes read. (docs: "Cumulative count of bytes read.")
container_fs_usage_bytes gauge bytes - Disk space consumed by the container. (docs: "Number of bytes that are consumed by the container on this filesystem.")
container_fs_io_time_seconds_total counter seconds - Time spent doing I/O. (docs: "Cumulative count of seconds spent doing I/Os.")
container_fs_io_time_weighted_seconds_total counter seconds - Cumulative weighted I/O time. (docs: "Cumulative weighted I/O time in seconds.")
Network metrics
Name Type Unit Description
container_network_receive_bytes_total counter bytes - Inbound (receive) traffic. (docs: "Cumulative count of bytes received.")
container_network_transmit_bytes_total counter bytes - Outbound (transmit) traffic. (docs: "Cumulative count of bytes transmitted.")
container_network_receive_packets_total counter - Inbound packet count. (docs: "Cumulative count of packets received.")
container_network_transmit_packets_total counter - Outbound packet count. (docs: "Cumulative count of packets transmitted.")
container_network_receive_packets_dropped_total counter - Inbound packets dropped. (docs: "Cumulative count of packets dropped while receiving.")
container_network_transmit_packets_dropped_total counter - Outbound packets dropped. (docs: "Cumulative count of packets dropped while transmitting.")
container_network_receive_errors_total counter - Errors while receiving. (docs: "Cumulative count of errors encountered while receiving.")
container_network_transmit_errors_total counter - Errors while transmitting. (docs: "Cumulative count of errors encountered while transmitting.")
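Like the CPU metrics, these are cumulative counters, so meaningful numbers come from deltas between two scrapes. A sketch deriving a receive drop ratio from two made-up samples:

```python
# Two samples, 60 s apart, of the receive counters (invented values).
rx_packets = (1_000_000, 1_006_000)  # container_network_receive_packets_total
rx_dropped = (500, 530)              # container_network_receive_packets_dropped_total

packets_delta = rx_packets[1] - rx_packets[0]  # packets delivered in the window
dropped_delta = rx_dropped[1] - rx_dropped[0]  # packets dropped in the window

# Fraction of inbound packets dropped over the window.
drop_ratio = dropped_delta / (packets_delta + dropped_delta)
print(f"{drop_ratio:.2%}")  # 0.50%
```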