一、pg_statistic的toast表数据损坏

问题现象

在安装插件的时候使用\dx元命令的时候,突然发现报了一个错误:

postgres=# \dx
ERROR:  missing chunk number 0 for toast value 32789 in pg_toast_2619

根据提示来看,主表字段还留存着Toast Pointer,但Toast表中已经没有对应的Chunk条目,怀疑toast表存在损坏或者缺失数据。

toast表的表名是字符串"pg_toast"与表的oid拼接而成,根据这个pg_toast_2619的2619的oid,可以定位到是pg_statistic的表。

postgres=# select 2619::regclass;
   regclass   
--------------
 pg_statistic
(1 row)
postgres=# select * from pg_statistic;
ERROR:  missing chunk number 0 for toast value 32789 in pg_toast_2619
postgres=# 

问题原因

对应toast表的数据可能存在损坏。

处理方法

需要定位损坏的行的位置,然后可以删除有问题的行。因为pg_statistic的toast数据是更新统计信息的时候插入的,损坏后可以直接清理。vacuum analyze后会重新生成。

其他系统表损坏比较建议建议zero_damaged_pages设置为on来跳过损坏的块,然后备份业务数据,之后恢复到一个新的环境。

可以暂时先做下简单的修复,看是否能解决问题,可能执行过程会出现报错:

postgres=# REINDEX TABLE pg_toast.pg_toast_2619;
postgres=# REINDEX TABLE pg_statistic;
postgres=# VACUUM ANALYZE pg_statistic;

可以使用如下的存储过程

 DO $$
DECLARE
  rec record;
BEGIN
FOR rec in SELECT * FROM pg_statistic LOOP
  raise notice 'Parameter is: %', rec.ctid;
   raise notice 'Parameter is: %', rec;
END LOOP; 
END;
$$
LANGUAGE plpgsql;

结果如下:

NOTICE:  Parameter is: (0,1)
NOTICE:  Parameter is: (1255,28,f,0.9830149,1294,-0.016985118,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,,,,,,,,,,)
NOTICE:  Parameter is: (0,2)
NOTICE:  Parameter is: (1255,29,f,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,,,,,,,,,,)
NOTICE:  Parameter is: (0,3)
NOTICE:  Parameter is: (2606,22,f,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,,,,,,,,,,)
NOTICE:  Parameter is: (0,4)
NOTICE:  Parameter is: (1260,1,f,0,4,-1,2,3,0,0,0,609,609,0,0,0,0,0,0,0,0,,{0.4365325},,,,"{10,3373,3374,3375,3377,4200,4544,4549,4550,4569,4570,4571,6171,6181,6182,6304,16392,16393}",,,,)
NOTICE:  Parameter is: (0,5)
NOTICE:  Parameter is: (1260,2,f,0,64,-1,2,3,0,0,0,660,660,0,0,0,950,950,0,0,0,,{0.104231164},,,,"{pg_checkpoint,pg_create_subscription,pg_database_owner,pg_execute_server_program,pg_maintain,pg_monitor,pg_read_all_data,pg_read_all_settings,pg_read_all_stats,pg_read_server_files,pg_signal_backend,pg_stat_scan_tables,pg_use_reserved_connections,pg_write_all_data,pg_write_server_files,postgres,u1,u2}",,,,)
NOTICE:  Parameter is: (0,6)
NOTICE:  Parameter is: (1260,3,f,0,1,-0.11111111,1,3,0,0,0,91,58,0,0,0,0,0,0,0,0,{0.9444444},{0.68421054},,,,{f},,,,)
NOTICE:  Parameter is: (0,7)
NOTICE:  Parameter is: (1260,4,f,0,1,1,1,3,0,0,0,91,58,0,0,0,0,0,0,0,0,{1},{1},,,,{t},,,,)

... ...

NOTICE:  Parameter is: (16,6)
NOTICE:  Parameter is: (2618,7,f,0,5,2,1,3,0,0,0,98,664,0,0,0,950,950,0,0,0,{0.993007},{0.9716709},,,,{<>},,,,)
NOTICE:  Parameter is: (16,7)
NOTICE:  Parameter is: (2618,8,f,0,589,-1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,,,,,,,,,,)
ERROR:  missing chunk number 0 for toast value 32789 in pg_toast_2619
CONTEXT:  PL/pgSQL function inline_code_block line 5 at FOR over SELECT rows·

最后在ctid=(16,7)的后边停了下来,我们根据查询下发现ctid= '(16,7)'的是正常的数据,接下来的ctid= '(16,8)'是有问题的数据。

postgres=# select * from pg_statistic where ctid= '(16,7)';
 starelid | staattnum | stainherit | stanullfrac | stawidth | stadistinct | stakind1 | stakind2 | stakind3 | stakind4 | stakind5 | staop1 | staop2 | sta
op3 | staop4 | staop5 | stacoll1 | stacoll2 | stacoll3 | stacoll4 | stacoll5 | stanumbers1 | stanumbers2 | stanumbers3 | stanumbers4 | stanumbers5 | sta
values1 | stavalues2 | stavalues3 | stavalues4 | stavalues5 
----------+-----------+------------+-------------+----------+-------------+----------+----------+----------+----------+----------+--------+--------+----
----+--------+--------+----------+----------+----------+----------+----------+-------------+-------------+-------------+-------------+-------------+----
--------+------------+------------+------------+------------
     2618 |         8 | f          |           0 |      589 |          -1 |        0 |        0 |        0 |        0 |        0 |      0 |      0 |    
  0 |      0 |      0 |        0 |        0 |        0 |        0 |        0 |             |             |             |             |             |    
        |            |            |            | 
(1 row)

postgres=# select * from pg_statistic where ctid= '(16,8)';
ERROR:  missing chunk number 0 for toast value 32789 in pg_toast_2619

接下来删除此条有问题的数据,然后发现\dx的元命令可以执行了。

postgres=# delete from pg_statistic where ctid= '(16,8)';
DELETE 1
postgres=# \dx
                                            List of installed extensions
        Name        | Version |   Schema   |                              Description                               
--------------------+---------+------------+------------------------------------------------------------------------
 pg_stat_statements | 1.10    | public     | track planning and execution statistics of all SQL statements executed
 plpgsql            | 1.0     | pg_catalog | PL/pgSQL procedural language
(2 rows)

紧接着我们再次查询pg_statistic表,发现还存在问题数据。函数再执行一次,发现又报了问题。

postgres=# select * from pg_statistic;
ERROR:  missing chunk number 0 for toast value 32791 in pg_toast_2619

NOTICE:  Parameter is: (18,15)
NOTICE:  Parameter is: (12624,1,f,0,5,-0.81349206,1,2,3,0,0,98,664,664,0,0,950,950,950,0,0,"{0.018518519,0.017195767,0.014550265,0.013227513,0.011904762,0.011904762,0.011904762,0.01058201,0.01058201,0.01058201,0.01058201,0.009259259,0.009259259,0.007936508,0.007936508,0.007936508,0.0066137565,0.005291005,0.005291005,0.003968254,0.003968254,0.003968254,0.0026455026,0.0026455026}",,{0.97739166},,,"{E061,E021,E081,E121,E051,E141,F051,E091,F031,F041,T321,E011,F021,E071,F131,F311,F261,E031,E101,E151,E152,F501,F302,S011}","{B011,B016,B033,B111,B117,B126,B203,B209,E161,F035,F054,F114,F171,F221,F271,F304,F341,F385,F393,F405,F432,F437,F471,F531,F641,F690,F711,F762,F812,F841,F847,F856,F862,F868,M005,M012,M018,M024,S023,S041,S091,S096,S161,S211,S251,S281,S404,T031,T044,T050,T061,T101,T133,T173,T180,T212,T218,T271,T322,T332,T434,T491,T522,T571,T612,T618,T624,T651,T662,T812,T824,T829,T835,T851,T865,T871,T877,X010,X016,X034,X041,X046,X052,X058,X068,X074,X081,X090,X111,X131,X142,X152,X170,X192,X205,X232,X254,X263,X281,X301,X410}",,,)
NOTICE:  Parameter is: (18,16)
NOTICE:  Parameter is: (12624,3,f,0,1,18,1,2,3,0,0,98,664,664,0,0,950,950,950,0,0,"{0.81349206,0.03042328,0.02910053,0.023809524,0.022486772,0.017195767,0.015873017,0.013227513,0.01058201,0.005291005,0.005291005,0.0026455026,0.0026455026,0.0026455026}",,{0.26829666},,,"{"""",01,02,03,04,05,06,07,08,09,10,11,12,13}","{14,16,17,19}",,,)
ERROR:  missing chunk number 0 for toast value 32791 in pg_toast_2619
CONTEXT:  PL/pgSQL function inline_code_block line 5 at FOR over SELECT rows

同样的再次删除问题的行,发现pg_statistic表已经可以正常访问了。

postgres=# delete from pg_statistic where ctid= '(18,17)';
DELETE 1
postgres=# select * from pg_statistic;                    
 starelid | staattnum | stainherit | stanullfrac | stawidth |  stadistinct  | stakind1 | stakind2 | stakind3 | stakind4 | stakind5 | staop1 | staop2 | s
taop3 | staop4 | staop5 | stacoll1 | stacoll2 | stacoll3 | stacoll4 | stacoll5 |                                                                        
... ...

完成以上步骤后对该表进行一次完整的维护和索引重建,。

postgres=# REINDEX TABLE pg_toast.pg_toast_2619;
postgres=# REINDEX TABLE pg_statistic;
postgres=# VACUUM ANALYZE pg_statistic;

二、业务表toast损坏恢复手段

1.恢复手段,基于备份恢复(比较建议)

在实际的使用中,如果业务的toast表发生了损坏,我们的首选,其实是根据备份恢复,如果存在全量的备份以及连续的wal日志的话,我们可以在另外的测试环境进行PITR基于时间点恢复,然后导出对应的表的数据,在沟通好变更窗口后,进行恢复操作。(切忌盲目在生产环境直接恢复,就算要操作生产环境也要沟通好变更窗口进行变更

2.跳过损坏的块,备份数据(谨慎使用

会用到一个参数zero_damaged_pages,可以将该参数设置为on来跳过损坏的块,然后重新备份,恢复。

3.删除损坏的行来恢复表(谨慎使用

但是如果没有完整的备份,也没有其他更好的恢复手段的时候,可以考虑删除掉损坏的行来恢复表。
这个时候也需要定位有问题的行,可以使用如下的脚本,适当进行调整,最好根据主键去定位:

#!/bin/bash
 j=500         #500是表的总行数
for ((i=1; i<=j;i++)) 
do
  psql -U user_recover -d postgres -c "SELECT * FROM test_tab order by id  LIMIT 1 offset $i" >/dev/null || echo $i
done

执行的结果大致如下

postgres@ubuntu-linux-22-04-desktop:~$  sh check_toast_tup.sh 
错误:  missing chunk number 0 for toast value 3132142 in pg_toast_521351
100
错误:  missing chunk number 0 for toast value 3132142 in pg_toast_521351
101
错误:  missing chunk number 0 for toast value 3132143 in pg_toast_521351
102
错误:  missing chunk number 0 for toast value 3132144 in pg_toast_521351
103
错误:  missing chunk number 0 for toast value 3132145 in pg_toast_521351
104
错误:  missing chunk number 0 for toast value 3132146 in pg_toast_521351
105
错误:  missing chunk number 0 for toast value 3132147 in pg_toast_521351
106
错误:  missing chunk number 0 for toast value 3132148 in pg_toast_521351
107
错误:  missing chunk number 0 for toast value 3132149 in pg_toast_521351
108

定位到数据后,使用delete语句清除,然后再查询,表的数据恢复正常。

select * from test_tab;
 id  | name 
-----+--------
   1 | a
   2 | b
   3 | c
   4 | d
   5 | e
   6 | f
   7 | g
... ...

完成以上步骤后对该表进行一次完整的维护和索引重建。

postgres=# REINDEX TABLE test_tab;
postgres=# VACUUM ANALYZE test_tab;

postgresql

Logo

腾讯云面向开发者汇聚海量精品云计算使用和开发经验,营造开放的云计算技术生态圈。

更多推荐