Попытка создать аналог ASH для PostgreSQL

в 13:55, , рубрики: postgresql, postgresql performance, Администрирование баз данных

Постановка задачи

Для оптимизации запросов PostgreSQL, очень требуется возможность анализировать историю активности, в частности – ожидания, блокировки, статистика таблиц.

Имеющиеся возможности

Инструмент анализа исторической нагрузки или «AWR для Postgres»: очень интересное решение, но, нет истории pg_stat_activity и pg_locks.

Расширение pgsentinel :
"Вся накопленная информация хранится только в оперативной памяти, а потребляемый объём памяти регулируется количеством последних хранимых записей.

Добавляется поле queryid — тот самый queryid из расширения pg_stat_statements (требуется предварительная установка)."

Это конечно сильно бы помогло, но самая неприятность именно первый пункт “Вся накопленная информация хранится только в оперативной памяти ”, т.е. имеем место импакт на целевую базу. К тому, же нет истории блокировок и статистики таблиц. Т.е. решение вообще говоря неполное: “Готового пакета для установки пока нет. Предлагается скачать исходники и собрать библиотеку самостоятельно. Предварительно требуется установить «devel»-пакет для своего сервера и в переменную PATH прописать путь до pg_config.”.

В общем – возни много, а в случае серьезных продакшн баз, может быть, и не будет возможности что-то делать с сервером. Нужно опять, придумывать, что-то свое.

Предупреждение.

В силу довольно большого объема и в связи с незавершением периода тестирования, статья носит в основном ознакомительный характер, скорее как набор тезисов и результатов.
Более подробный материал, будет подготовлен позже, по частям

Эскизные требования к решению

Необходимо разработать инструмент позволяющий хранить:

Историю представления pg_stat_activity
Историю блокировок сессий, используя представление pg_locks

Требование к решению–минимизировать влияние на целевую базу данных.

Общая идея–агент сбора данных запускается не в целевой базе, а в базе данных мониторинга как сервис systemd. Да, возможны некоторые потери данных, но это не критично, для отчетности, зато нет импакта на целевую базу по памяти и месту на диске. А в случае использования пула соединений, импакт по пользовательским процессам минимален.

Этапы реализации

1.Сервисные таблицы

Для хранения таблиц используется отдельная схема, что бы не затруднять анализ основных использованных таблиц.

DROP SCHEMA IF EXISTS activity_hist ;
CREATE SCHEMA activity_hist AUTHORIZATION monitor ;

Важно: схема создается не в целевой базе данных, а в базе данных мониторинга.
История представления pg_stat_activity
Для хранения текущих снимков представления pg_stat_activity используется таблица

activity_hist.history_pg_stat_activity :

--ACTIVITY_HIST.HISTORY_PG_STAT_ACTIVITY
DROP TABLE IF EXISTS activity_hist.history_pg_stat_activity;
CREATE TABLE activity_hist.history_pg_stat_activity
(
  timepoint timestamp without time zone ,
  datid             oid  , 
  datname           name ,
  pid               integer,
  usesysid          oid    ,
  usename           name   ,
  application_name  text   ,
  client_addr       inet   ,
  client_hostname   text   ,
  client_port       integer,
  backend_start     timestamp with time zone ,
  xact_start        timestamp with time zone ,
  query_start       timestamp with time zone ,
  state_change      timestamp with time zone ,
  wait_event_type   text ,                     
  wait_event        text ,                   
  state             text ,                  
  backend_xid       xid  ,                 
  backend_xmin      xid  ,                
  query             text ,               
  backend_type      text ,  
  queryid           bigint
);

Для ускорения вставки – никаких индексов и ограничений.

Для хранения непосредственно истории используется секционированная таблица:

activity_hist.archive_pg_stat_activity :

DROP TABLE IF EXISTS activity_hist.archive_pg_stat_activity;
CREATE TABLE activity_hist.archive_pg_stat_activity
(
  timepoint timestamp without time zone ,
  datid             oid  , 
  datname           name ,
  pid               integer,
  usesysid          oid    ,
  usename           name   ,
  application_name  text   ,
  client_addr       inet   ,
  client_hostname   text   ,
  client_port       integer,
  backend_start     timestamp with time zone ,
  xact_start        timestamp with time zone ,
  query_start       timestamp with time zone ,
  state_change      timestamp with time zone ,
  wait_event_type   text ,                     
  wait_event        text ,                   
  state             text ,                  
  backend_xid       xid  ,                 
  backend_xmin      xid  ,                
  query             text ,               
  backend_type      text ,
  queryid           bigint
)
PARTITION BY RANGE (timepoint);

Поскольку в данном случае нет требований по скорости вставки, созданы некоторые индексы, для ускорения создания отчетов.

История блокировок сессий

Для хранения текущих снимков блокировок сессий используется таблица:

activity_hist.history_locking :

--ACTIVITY_HIST.HISTORY_LOCKING
DROP TABLE IF EXISTS activity_hist.history_locking;
CREATE TABLE activity_hist.history_locking
(
	timepoint timestamp without time zone ,
	locktype text ,
	relation oid ,
	mode text ,
	tid xid ,
	vtid text ,
	pid integer ,
	blocking_pids integer[] ,
	granted boolean
);

Также, для ускорения вставки – никаких индексов и ограничений.

Для хранения непосредственно истории используется секционированная таблица:

activity_hist.archive_locking:

DROP TABLE IF EXISTS activity_hist.archive_locking;
CREATE TABLE activity_hist.archive_locking
(
	timepoint timestamp without time zone ,
	locktype text ,
	relation oid ,
	mode text ,
	tid xid ,
	vtid text ,
	pid integer ,
	blocking_pids integer[] ,
	granted boolean	
)
PARTITION BY RANGE (timepoint);

Поскольку в данном случае нет требований по скорости вставки, созданы некоторые индексы, для ускорения создания отчетов.

2.Заполнение текущей истории

Для непосредственного сбора снимков представления используется bash скрипт, запускающий plpgsql функцию.

get_current_activity.sh

#!/bin/bash
#########################################################
#get_current_activity.sh

ERROR_FILE='/home/demon/get_current_activity'$(date +%Y%m%d-)'T'$(date +%H)$(date +%M)$(date +%S)
host=$1
s_name=$2
s_pass=$3

psql  -A -t -q -v ON_ERROR_STOP=1 -c "SELECT activity_hist.get_current_activity( '$host' , '$s_name' , '$s_pass' )" >/dev/null 2>$ERROR_FILE

line_count=`cat $ERROR_FILE | wc -l`
if [[ $line_count != '0' ]];
then
    rm -f /home/demon/*.err >/dev/null 2>/dev/null
	cp $ERROR_FILE $ERROR_FILE'.err' >/dev/null 2>/dev/null  
fi
rm $ERROR_FILE >/dev/null 2>/dev/null
exit 0

plpgsql функция по dblink обращается к представлениям в целевой базе данных и вставляет строки в сервисные таблицы в базе данных мониторинга.

get_current_activity.sql

CREATE OR REPLACE FUNCTION activity_hist.get_current_activity( current_host text , current_s_name text , current_s_pass text ) RETURNS BOOLEAN AS $$
DECLARE 
  database_rec record;
  dblink_str text ;
BEGIN   

	EXECUTE 'SELECT dblink_connect(''LINK1'',''host='||current_host||' port=5432 dbname=postgres'||
	                                         ' user='||current_s_name||' password='||current_s_pass|| ' '')';



--------------------------------------------------------------------
--GET pg_stat_activity stats
	INSERT INTO activity_hist.history_pg_stat_activity
	(
		SELECT * FROM dblink('LINK1',
			'SELECT 
			now() , 
			datid             , 
			datname           ,
			pid               ,
			usesysid              ,
			usename              ,
			application_name     ,
			client_addr          ,
			client_hostname      ,
			client_port       ,
			backend_start         ,
			xact_start            ,
			query_start           ,
			state_change          ,
			wait_event_type    ,                     
			wait_event         ,                   
			state              ,                  
			backend_xid         ,                 
			backend_xmin        ,                
			query              ,               
			backend_type   			
		FROM pg_stat_activity
		') 
		AS t (
		    timepoint 		  timestamp without time zone ,			
			datid             oid  , 
			datname           name ,
			pid               integer,
			usesysid          oid    ,
			usename           name   ,
			application_name  text   ,
			client_addr       inet   ,
			client_hostname   text   ,
			client_port       integer,
			backend_start     timestamp with time zone ,
			xact_start        timestamp with time zone ,
			query_start       timestamp with time zone ,
			state_change      timestamp with time zone ,
			wait_event_type   text ,                     
			wait_event        text ,                   
			state             text ,                  
			backend_xid       xid  ,                 
			backend_xmin      xid  ,                
			query             text ,               
			backend_type      text 			
		)
	);

---------------------------------------	
--ACTIVITY_HIST.HISTORY_LOCKING	
	INSERT INTO activity_hist.history_locking
	(
		SELECT * FROM dblink('LINK1',
			'SELECT 
			now() , 
			lock.locktype,
			lock.relation,
			lock.mode,
			lock.transactionid as tid,
			lock.virtualtransaction as vtid,
			lock.pid,
			pg_blocking_pids(lock.pid), 
			lock.granted
			FROM 	pg_catalog.pg_locks lock LEFT JOIN pg_catalog.pg_database db ON db.oid = lock.database
			WHERE NOT lock.pid = pg_backend_pid()	
		') 
		AS t (
			timepoint timestamp without time zone ,
			locktype text ,
			relation oid , 
			mode text ,
			tid xid ,
			vtid text ,
			pid integer ,
			blocking_pids integer[] ,
			granted boolean
		)
	);
	PERFORM dblink_disconnect('LINK1');
	
	RETURN TRUE ;
END
$$ LANGUAGE plpgsql;

Для сбора снимков представления используется systemd service, и два скрипта:

pg_current_activity.service

# /etc/systemd/system/pg_current_activity.service
[Unit]
Description=Collect history of pg_stat_activity , pg_locks 
Wants=pg_current_activity.timer

[Service]
Type=forking
StartLimitIntervalSec=0
ExecStart=/home/postgres/pgutils/demon/get_current_activity.sh 10.124.70.40 postgres postgres

[Install]
WantedBy=multi-user.target

pg_current_activity.timer

# /etc/systemd/system/pg_current_activity.timer
[Unit]
Description=Run pg_current_activity.sh every 1 second
Requires=pg_current_activity.service

[Timer]
Unit=pg_current_activity.service
OnCalendar=*:*:0/1
AccuracySec=1

[Install]
WantedBy=timers.target

Назначаем права скриптам:
# chmod 755 pg_current_activity.timer
# chmod 755 pg_current_activity.service

Запускаем сервис:
# systemctl daemon-reload
# systemctl start pg_current_activity.service

Таким образом, история представлений собирается в виде ежесекундных снимков. Конечно, если все оставить как есть, таблицы очень быстро увеличатся в размерах и более менее продуктивная работа станет невозможна.

Нужно организовать архивирование данных.

3.Архивирование истории

Для архивация используется секционированные таблицы archive*.

Новые секции создаются каждый час, при этом старые данные из таблиц history* удаляются, таким образом, размер таблиц history* сильно не меняется и скорость вставки не деградируется со временем.

Создание новых секций выполняется plpgsql функцией activity_hist.archive_current_activity. Алгоритм работы очень прост( на примере секции для таблицы archive_pg_stat_activity).

Создаём и заполняем новую секцию

EXECUTE format(
'CREATE TABLE ' || partition_name || 
' PARTITION OF activity_hist.archive_pg_stat_activity FOR VALUES FROM ( %L ) TO ( %L ) ' , 
to_char(date_trunc('year', partition_min_range ),'YYYY')||'-'||
to_char(date_trunc('month', partition_min_range ),'MM')||'-'||
to_char(date_trunc('day', partition_min_range ),'DD')||' '||
to_char(date_trunc('hour', partition_min_range ),'HH24')||':00', 
to_char(date_trunc('year', partition_max_range ),'YYYY')||'-'||
to_char(date_trunc('month', partition_max_range ),'MM')||'-'||
to_char(date_trunc('day', partition_max_range ),'DD')||' '||
to_char(date_trunc('hour', partition_max_range ),'HH24')||':00'
);

INSERT INTO activity_hist.archive_pg_stat_activity
(
	SELECT 	* 
	FROM 	activity_hist.history_pg_stat_activity
	WHERE 	timepoint BETWEEN partition_min_range AND partition_max_range 		
);

Создаем индексы

EXECUTE format	(
'CREATE INDEX '||index_name||
' ON '||partition_name||' ( wait_event_type , backend_type , timepoint )' 
);

EXECUTE format	('CREATE INDEX '||index_name||
' ON '||partition_name||' ( wait_event_type , backend_type , timepoint , queryid )' 
);

Удаляем старые данные из таблицы history_pg_stat_activity

DELETE 
FROM 	activity_hist.history_pg_stat_activity
WHERE 	timepoint < partition_max_range;

Разумеется периодически, старые секции, за ненадобностью удаляются.

Базовые отчеты

Собственно для чего все это делается. Для получения отчетов очень отдаленно, приближенно напоминающих AWR Oracle.

Важно добавить, для получения отчетов необходимо построить связь между представлениями pg_stat_activity и pg_stat_statements. Таблицы связываются путем добавления в таблицы 'history_pg_stat_activity', 'archive_pg_stat_activity' столбца 'queryid'. Способ добавления значения столбца, выходит за рамки данной статьи и будет подробно описан чуть позже.

TOTAL CPU TIME FOR QUERIES

Запрос :

WITH hist AS
(
SELECT 
	aa.query ,aa.queryid ,			
	count(*) * interval '1 second' AS duration 
FROM 	activity_hist.archive_pg_stat_activity aa
WHERE timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND  pg_stat_history_end+(current_hour_diff * interval '1 hour')  AND backend_type = 'client backend' AND datname != 'postgres' AND	( aa.wait_event_type IS NULL  ) ANDaa.state = 'active'
GROUP BY aa.wait_event_type , aa.wait_event , aa.query ,aa.queryid		
UNION 
SELECT 
	ha.query ,ha.queryid,
	count(*) * interval '1 second' AS duration 
FROM 	activity_hist.history_pg_stat_activity_for_reports ha
WHERE timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour')  AND 	backend_type = 'client backend' AND datname != 'postgres' AND ( ha.wait_event_type IS NULL  )AND ha.state = 'active'
GROUP BY ha.wait_event_type , ha.wait_event , ha.query ,ha.queryid		
)
SELECT 	query , queryid , SUM( duration ) as duration 
FROM hist
GROUP BY  query , queryid 
ORDER BY 3 DESC

Пример:

-------------------------------------------------------------------
| TOTAL CPU TIME FOR QUERIES : 07:47:36
+----+----------------------------------------+--------------------
|   #|                                 queryid|            duration
+----+----------------------------------------+--------------------
|   1|                      389015618226997618|            04:28:58
|   2|                                        |            01:07:29
|   3|                     1237430309438971376|            00:59:38
|   4|                     4710212362688288619|            00:50:48
|   5|                       28942442626229688|            00:15:50
|   6|                     9150846928388977274|            00:04:46
|   7|                    -6572922443698419129|            00:00:06
|   8|                                        |            00:00:01
+----+----------------------------------------+--------------------

TOTAL WAITINGS TIME FOR QUERIES

Запрос :

WITH hist AS
(
SELECT 
	aa.query ,aa.queryid ,			
	count(*) * interval '1 second' AS duration 
FROM 	activity_hist.archive_pg_stat_activity aa
WHERE timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour')  AND 
	backend_type = 'client backend' AND datname != 'postgres' AND
	( aa.wait_event_type IS NOT NULL  ) 
GROUP BY aa.wait_event_type , aa.wait_event , aa.query ,aa.queryid		
UNION 
SELECT 
	ha.query ,ha.queryid,
	count(*) * interval '1 second' AS duration 
FROM 	activity_hist.history_pg_stat_activity_for_reports ha
WHERE timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour')  AND 
	backend_type = 'client backend' AND datname != 'postgres' AND				
	( ha.wait_event_type IS NOT NULL  )
GROUP BY ha.wait_event_type , ha.wait_event , ha.query ,ha.queryid		
)
SELECT 	query , queryid , SUM( duration ) as duration 
FROM hist
GROUP BY  query , queryid 
ORDER BY 3 DESC 

Пример :

-------------------------------------------------------------------
| TOTAL WAITINGS TIME FOR QUERIES : 21:55:04
+----+----------------------------------------+--------------------
|   #|                                 queryid|            duration
+----+----------------------------------------+--------------------
|   1|                      389015618226997618|            16:19:05
|   2|                                        |            03:47:04
|   3|                     8085340880788646241|            00:40:20
|   4|                     4710212362688288619|            00:13:35
|   5|                     9150846928388977274|            00:12:25
|   6|                       28942442626229688|            00:11:32
|   7|                     1237430309438971376|            00:09:45
|   8|                     2649515222348904837|            00:09:37
|   9|                                        |            00:03:45
|  10|                     3167065002719415275|            00:02:20
|  11|                     5731212217001535134|            00:02:13
|  12|                     8304755792398128062|            00:01:31
|  13|                     2649515222348904837|            00:00:59
|  14|                     2649515222348904837|            00:00:22
|  15|                                        |            00:00:12
|  16|                     3422818749220588372|            00:00:08
|  17|                    -5730801771815999400|            00:00:03
|  18|                    -1473395109729441239|            00:00:02
|  19|                     2404820632950544954|            00:00:02
|  20|                    -6572922443698419129|            00:00:02
|  21|                     2369289265278398647|            00:00:01
|  22|                      180077086776069052|            00:00:01
+----+----------------------------------------+--------------------

WAITINGS FOR QUERIES

Запросы:

WITH hist AS
(
SELECT 
	aa.wait_event_type , aa.wait_event 
FROM 	activity_hist.archive_pg_stat_activity aa
WHERE timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour') AND 
	backend_type = 'client backend' AND datname != 'postgres' AND
	aa.wait_event IS NOT NULL 
GROUP BY aa.wait_event_type , aa.wait_event
UNION 
SELECT 
	ha.wait_event_type , ha.wait_event 
FROM 	activity_hist.history_pg_stat_activity_for_reports ha
WHERE timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour') AND 
	backend_type = 'client backend' AND datname != 'postgres' AND
	ha.wait_event IS NOT NULL 
GROUP BY ha.wait_event_type , ha.wait_event		
)
SELECT 	wait_event_type , wait_event 
FROM hist
GROUP BY wait_event_type , wait_event
ORDER BY 1 ASC,2 ASC

----------------------------------------------------------------------

WITH hist AS
(
SELECT 
	aa.wait_event_type , aa.wait_event , aa.query ,aa.queryid ,			
	count(*) * interval '1 second' AS duration 
FROM 	activity_hist.archive_pg_stat_activity aa
WHERE timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour') AND 
	backend_type = 'client backend' AND datname != 'postgres' AND
	( aa.wait_event_type = waitings_stat_rec.wait_event_type AND aa.wait_event = waitings_stat_rec.wait_event )
GROUP BY aa.wait_event_type , aa.wait_event , aa.query ,aa.queryid		
UNION 
SELECT 
	ha.wait_event_type , ha.wait_event , ha.query ,ha.queryid,
	count(*) * interval '1 second' AS duration 
FROM 	activity_hist.history_pg_stat_activity_for_reports ha
WHERE timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour') AND 
	backend_type = 'client backend' AND datname != 'postgres' AND				
	( ha.wait_event_type = waitings_stat_rec.wait_event_type AND ha.wait_event = waitings_stat_rec.wait_event )
GROUP BY ha.wait_event_type , ha.wait_event , ha.query ,ha.queryid		
)
SELECT 	query , queryid , SUM( duration ) as duration 
FROM hist
GROUP BY  query , queryid 
ORDER BY 3 DESC

Пример:

------------------------------------------------
| WAITINGS FOR QUERIES
+-----------------------------------------------
|                      wait_event_type = Client|
|                       wait_event = ClientRead|
|                        Total time  = 00:46:56|
------------------------------------------------
|    #|             queryid|            duration
+-----+--------------------+--------------------
|    1| 8085340880788646241|            00:40:20
|    2|                    |            00:03:45
|    3| 5731212217001535134|            00:01:53
|    4|                    |            00:00:12
|    5| 9150846928388977274|            00:00:09
|    6| 3422818749220588372|            00:00:08
|    7| 1237430309438971376|            00:00:06
|    8|   28942442626229688|            00:00:05
|    9| 4710212362688288619|            00:00:05
|   10|-5730801771815999400|            00:00:03
|   11| 8304755792398128062|            00:00:02
|   12|-6572922443698419129|            00:00:02
|   13|-1473395109729441239|            00:00:02
|   14| 2404820632950544954|            00:00:02
|   15|  180077086776069052|            00:00:01
|   16| 2369289265278398647|            00:00:01

+-----------------------------------------------
|                          wait_event_type = IO|
|                      wait_event = BufFileRead|
|                        Total time  = 00:00:38|
------------------------------------------------
|    #|             queryid|            duration
+-----+--------------------+--------------------
|    1|   28942442626229688|            00:00:38

+-----------------------------------------------

LOCKED PROCESSES HISTORY

Запрос:

SELECT 
MIN(date_trunc('second',timepoint)) AS started , 
	count(*) * interval '1 second' as duration ,
	pid , blocking_pids , relation , mode , locktype 	 
FROM 
	activity_hist.archive_locking al 
WHERE 
	timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour') AND
	NOT granted AND 
	locktype = 'relation' 
GROUP BY pid , blocking_pids , relation , mode , locktype			
UNION
SELECT 
	MIN(date_trunc('second',timepoint)) AS started , 
	count(*) * interval '1 second' as duration ,
	pid , blocking_pids , relation , mode , locktype
FROM 
	activity_hist.history_locking 
WHERE 
	timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour') AND
	NOT granted AND 
	locktype = 'relation' 
GROUP BY pid , blocking_pids , relation , mode , locktype			
ORDER BY 1

Пример:

-------------------------------------------------------------------------------------------------------------------------------------
| LOCKED PROCESSES HISTORY
+-----+----------+--------------------+----------+--------------------+--------------------+--------------------+--------------------
|    #|       pid|             started|  duration|       blocking_pids|            relation|                mode|            locktype
+-----+----------+--------------------+----------+--------------------+--------------------+--------------------+--------------------
|    1|     26224| 2019-09-02 19:32:16|  00:01:45|             {26211}|               16541|     AccessShareLock|            relation
|    2|     26390| 2019-09-02 19:34:03|  00:00:53|             {26211}|               16541|     AccessShareLock|            relation
|    3|     26391| 2019-09-02 19:34:03|  00:00:53|             {26211}|               16541|     AccessShareLock|            relation
|    4|     26531| 2019-09-02 19:35:27|  00:00:12|             {26211}|               16541|     AccessShareLock|            relation
|    5|     27284| 2019-09-02 19:44:02|  00:00:19|             {27276}|               16541|     AccessShareLock|            relation
|    6|     27283| 2019-09-02 19:44:02|  00:00:19|             {27276}|               16541|     AccessShareLock|            relation
|    7|     27286| 2019-09-02 19:44:02|  00:00:19|             {27276}|               16541|     AccessShareLock|            relation
|    8|     27423| 2019-09-02 19:45:24|  00:00:12|             {27394}|               16541|     AccessShareLock|            relation
|    9|     27648| 2019-09-02 19:48:06|  00:00:20|             {27647}|               16541|     AccessShareLock|            relation
|   10|     27650| 2019-09-02 19:48:06|  00:00:20|             {27647}|               16541|     AccessShareLock|            relation
|   11|     27735| 2019-09-02 19:49:08|  00:00:06|             {27650}|               16541| AccessExclusiveLock|            relation
|   12|     28380| 2019-09-02 19:56:03|  00:01:56|             {28379}|               16541|     AccessShareLock|            relation
|   13|     28379| 2019-09-02 19:56:03|  00:00:01|               28377|               16541| AccessExclusiveLock|            relation
|     |          |                    |          |               28376|                    | 

BLOCKING PROCESSES HISTORY

Запросы:

SELECT 
blocking_pids 
FROM 
	activity_hist.archive_locking al 
WHERE 
	timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour') AND
	NOT granted AND 
	locktype = 'relation' 
GROUP BY blocking_pids 		
UNION
SELECT 
	blocking_pids 
FROM 
	activity_hist.history_locking 
WHERE 
	timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour') AND
	NOT granted AND 
	locktype = 'relation' 
GROUP BY blocking_pids 		
ORDER BY 1

---------------------------------------------------------------

SELECT 
	pid , usename , application_name , datname ,
	MIN(date_trunc('second',timepoint)) as started , 
	count(*) * interval '1 second' as duration ,		 
	state , 
	query
				FROM  	activity_hist.archive_pg_stat_activity
				WHERE 	pid= current_pid AND 
						timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour') 						 
				GROUP BY pid , usename , application_name , 
						datname , 
						state_change, 
						state , 
						query 
				UNION
				SELECT 
					pid , usename , application_name , datname ,
					MIN(date_trunc('second',timepoint)) as started , 
					count(*) * interval '1 second' as duration ,		 
					state , 
					query
				FROM  	activity_hist.history_pg_stat_activity_for_reports
				WHERE 	pid= current_pid AND 
						timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour') 						 
				GROUP BY pid , usename , application_name , 
						datname , 
						state_change, 
						state , 
						query 
				ORDER BY 5 , 1

Пример:

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
BLOCKING PROCESSES HISTORY
+----+----------+----------+--------------------+----------+--------------------+--------------------+------------------------------+----------------------------------------
|   #|       pid|   usename|    application_name|   datname|             started|            duration|                         state|                                   query
+----+----------+----------+--------------------+----------+--------------------+--------------------+------------------------------+----------------------------------------
|   1|     26211|     tuser|                psql|      tdb1| 2019-09-02 19:31:54|            00:00:04|                          idle|
|   2|     26211|     tuser|                psql|      tdb1| 2019-09-02 19:31:58|            00:00:06|           idle in transaction|                                  begin;
|   3|     26211|     tuser|                psql|      tdb1| 2019-09-02 19:32:16|            00:01:45|           idle in transaction|                  lock table wafer_data;
|   4|     26211|     tuser|                psql|      tdb1| 2019-09-02 19:35:54|            00:01:23|                          idle|                                 commit;
|   5|     26211|     tuser|                psql|      tdb1| 2019-09-02 19:38:46|            00:00:02|           idle in transaction|                                  begin;
|   6|     26211|     tuser|                psql|      tdb1| 2019-09-02 19:38:54|            00:00:08|           idle in transaction|                  lock table wafer_data;
|   7|     26211|     tuser|                psql|      tdb1| 2019-09-02 19:39:08|            00:42:42|                          idle|                                 commit;
|   8|     26211|     tuser|                psql|      tdb1| 2019-09-03 07:12:07|            00:00:52|                        active|                     select test_del ();

Развитие.

Показанные базовые запросы и получаемые отчеты, уже сильно облегчают жизнь при анализе инцидентов производительности.
На основе базовых запросов, можно получить отчет, отдаленно приближенно напоминающий AWR Oracle.

Пример сводного отчета

+------------------------------------------------------------------------------------
| CONSOLIDATED REPORT FOR ACTIVITY AND WAITINGS . DATETIME : 03.09.2019 14:08
|------------------------------------------------------------------------------------
| HOST :10.124.70.40
| BEGIN_SNAPSHOT :02.09.2019 14:08 END_SNAPSHOT : 03.09.2019 14:00
|------------------------------------------------------------------------------------
| CURRENT DATABASES SIZE  :
| DATABASE :monitor
| SIZE (MB) :1370.00
|------------------------------------------------------------------------------------
| CLUSTER CPU TIME : 19:44:22
| CLUSTER WAITINGS TIME : 78:49:16
|
| SQL DBTIME : 65:53:09
| SQL CPU TIME : 19:05:21
| SQL WAITINGS TIME : 21:50:46
| SQL IOTIME : 20:53:00
| SQL READ TIME : 20:52:55
| SQL WRITE TIME : 00:00:05
|
| SQL CALLS : 311293
-------------------------------------------------------------
| SQL SHARED BLOCKS READS : 13351563334
| SQL SHARED BLOCKS HITS : 2775427045
| SQL SHARED BLOCKS HITS/READS % : 20.79
| SQL SHARED BLOCKS DIRTED : 21105
| SQL SHARED BLOCKS WRITTEN : 3656
|
| SQL TEMPORARY BLOCKS READS : 7464932
| SQL TEMPORARY BLOCKS WRITTEN : 10176024
-------------------------------------------------------------
|
| WAITINGS STATICTICS
|
+------------------------------------------------------------------------------------
| TOP 10 WAITINGS BY TOTAL WAIT TIME FOR SYSTEM PROCESSES
+-----+------------------------------+--------------------+--------------------
|    #|               wait_event_type|          wait_event|            duration
+-----+------------------------------+--------------------+--------------------
|    1|                      Activity| LogicalLauncherMain|            11:21:01
|    2|                      Activity|    CheckpointerMain|            11:20:35
|    3|                      Activity|      AutoVacuumMain|            11:20:31
|    4|                      Activity|       WalWriterMain|            11:19:35
|    5|                      Activity|        BgWriterMain|            10:14:19
|    6|                      Activity|   BgWriterHibernate|            01:06:04
|    7|                      Activity|       WalSenderMain|            00:04:05
|    8|                        Client|         ClientWrite|            00:04:00
|    9|                            IO|        BufFileWrite|            00:02:45
|   10|                        LWLock|      buffer_mapping|            00:02:14
+-----+------------------------------+--------------------+--------------------
| TOP 10 WAITINGS BY TOTAL WAIT TIME FOR CLIENTS PROCESSES
+-----+------------------------------+--------------------+--------------------+----------
|    #|               wait_event_type|          wait_event|            duration|  % dbtime
+-----+------------------------------+--------------------+--------------------+----------
|    1|                          Lock|       transactionid|            11:55:37|      18.1
|    2|                            IO|        DataFileRead|            07:19:43|     11.12
|    3|                        Client|          ClientRead|            00:46:54|      1.19
|    4|                          Lock|            relation|            00:40:37|      1.03
|    5|                        LWLock|      buffer_mapping|            00:31:08|      0.79
|    6|                        LWLock|           buffer_io|            00:22:12|      0.56
|    7|                       Timeout|             PgSleep|            00:10:58|      0.28
|    8|                          Lock|               tuple|            00:01:30|      0.04
|    9|                            IO|        BufFileWrite|            00:01:16|      0.03
|   10|                            IO|         BufFileRead|            00:00:37|      0.02
+-----+------------------------------+--------------------+--------------------+----------
| WAITINGS TYPES BY TOTAL WAIT TIME, FOR SYSTEM PROCESSES
+-----+------------------------------+--------------------
|    #|               wait_event_type|            duration
+-----+------------------------------+--------------------
|    1|                      Activity|            56:46:10
|    2|                            IO|            00:05:13
|    3|                        Client|            00:04:00
|    4|                        LWLock|            00:03:07
+-----+------------------------------+--------------------
| WAITINGS TYPES BY TOTAL WAIT TIME, FOR CLIENTS PROCESSES
+-----+------------------------------+--------------------+--------------------
|    #|               wait_event_type|            duration|            % dbtime
+-----+------------------------------+--------------------+--------------------
|    1|                          Lock|            12:37:44|               19.17
|    2|                            IO|            07:21:40|               11.17
|    3|                        LWLock|            00:53:26|                1.35
|    4|                        Client|            00:46:54|                1.19
|    5|                       Timeout|            00:10:58|                0.28
|    6|                           IPC|            00:00:04|                   0
+-----+------------------------------+--------------------+--------------------
| WAITINGS FOR SYSTEM PROCESSES
+-----+-----------------------------+----------+--------------------+----------------------+--------------------
|    #|                 backend_type|    dbname|     wait_event_type|            wait_event|            duration
+-----+-----------------------------+----------+--------------------+----------------------+--------------------
|    1| logical replication launcher|          |            Activity|   LogicalLauncherMain|            11:21:01
|    2|                 checkpointer|          |            Activity|      CheckpointerMain|            11:20:35
|    3|          autovacuum launcher|          |            Activity|        AutoVacuumMain|            11:20:31
|    4|                    walwriter|          |            Activity|         WalWriterMain|            11:19:35
|    5|            background writer|          |            Activity|          BgWriterMain|            10:14:19
|    6|            background writer|          |            Activity|     BgWriterHibernate|            01:06:04
|    7|                    walsender|          |            Activity|         WalSenderMain|            00:04:05
|    8|                    walsender|          |              Client|           ClientWrite|            00:04:00
|    9|              parallel worker|      tdb1|                  IO|          BufFileWrite|            00:02:45
|   10|              parallel worker|      tdb1|              LWLock|        buffer_mapping|            00:02:05
|   11|              parallel worker|      tdb1|                  IO|          DataFileRead|            00:01:10
|   12|              parallel worker|      tdb1|                  IO|           BufFileRead|            00:01:05
|   13|              parallel worker|      tdb1|              LWLock|             buffer_io|            00:00:45
|   14|            autovacuum worker|      tdb1|              LWLock|        buffer_mapping|            00:00:09
|   15|                    walwriter|          |                  IO|              WALWrite|            00:00:08
|   16|                    walwriter|          |              LWLock|          WALWriteLock|            00:00:04
|   17|            background writer|          |              LWLock|          WALWriteLock|            00:00:03
|   18|            background writer|          |                  IO|              WALWrite|            00:00:02
|   19|            background writer|          |                  IO|         DataFileWrite|            00:00:02
|   20|                 checkpointer|          |                  IO| ControlFileSyncUpdate|            00:00:01
|   21|            autovacuum worker|      tdb1|              LWLock|             buffer_io|            00:00:01
+-----+-----------------------------+----------+--------------------+----------------------+--------------------
| WAITINGS FOR SQL
+-----+-------------------------+----------+--------------------+--------------------+--------------------+----------
|    #|                  queryid|    dbname|     wait_event_type|          wait_event|            duration|  % dbtime
+-----+-------------------------+----------+--------------------+--------------------+--------------------+----------
|    1|       389015618226997618|      tdb1|                Lock|       transactionid|            09:47:43|     14.87
|    2|       389015618226997618|      tdb1|                  IO|        DataFileRead|            05:47:07|      8.78
|    3|                         |      tdb1|                Lock|       transactionid|            02:07:54|      3.24
|    4|                         |      tdb1|                  IO|        DataFileRead|            01:30:24|      2.29
|    5|      8085340880788646241|      tdb1|              Client|          ClientRead|            00:40:20|      1.02
|    6|       389015618226997618|      tdb1|              LWLock|      buffer_mapping|            00:20:41|      0.52
|    7|       389015618226997618|      tdb1|              LWLock|           buffer_io|            00:17:30|      0.44
|    8|      2649515222348904837|      tdb1|             Timeout|             PgSleep|            00:10:58|      0.28
|    9|      4710212362688288619|      tdb1|                Lock|            relation|            00:10:44|      0.27
|   10|      9150846928388977274|      tdb1|                Lock|            relation|            00:10:24|      0.26
|   11|        28942442626229688|      tdb1|                Lock|            relation|            00:07:48|       0.2
|   12|      1237430309438971376|      tdb1|                Lock|            relation|            00:07:32|      0.19
|   13|                         |      tdb1|              LWLock|      buffer_mapping|            00:04:32|      0.11
|   14|                         |      tdb1|              LWLock|           buffer_io|            00:04:13|      0.11
|   15|                         |      tdb1|              Client|          ClientRead|            00:03:57|       0.1
|   16|      4710212362688288619|      tdb1|              LWLock|      buffer_mapping|            00:02:26|      0.06
|   17|      3167065002719415275|      tdb1|                Lock|            relation|            00:02:20|      0.06
|   18|      5731212217001535134|      tdb1|              Client|          ClientRead|            00:01:53|      0.05
|   19|      1237430309438971376|      tdb1|              LWLock|      buffer_mapping|            00:01:42|      0.04
|   20|       389015618226997618|      tdb1|                Lock|               tuple|            00:01:30|      0.04
|   21|      8304755792398128062|      tdb1|                Lock|            relation|            00:01:29|      0.04
|   22|        28942442626229688|      tdb1|                  IO|        BufFileWrite|            00:01:16|      0.03
|   23|      9150846928388977274|      tdb1|                  IO|        DataFileRead|            00:01:07|      0.03
|   24|        28942442626229688|      tdb1|              LWLock|      buffer_mapping|            00:01:03|      0.03
|   25|      9150846928388977274|      tdb1|              LWLock|      buffer_mapping|            00:00:44|      0.02
|   26|        28942442626229688|      tdb1|                  IO|         BufFileRead|            00:00:37|      0.02
|   27|        28942442626229688|      tdb1|              LWLock|           buffer_io|            00:00:25|      0.01
|   28|      1237430309438971376|      tdb1|                  IO|        DataFileRead|            00:00:24|      0.01
|   29|        28942442626229688|      tdb1|                  IO|        DataFileRead|            00:00:22|      0.01
|   30|      5731212217001535134|      tdb1|                Lock|            relation|            00:00:20|      0.01
|   31|      4710212362688288619|      tdb1|                  IO|        DataFileRead|            00:00:19|      0.01
|   32|      9150846928388977274|      tdb1|              Client|          ClientRead|            00:00:09|         0
|   33|      3422818749220588372|      tdb1|              Client|          ClientRead|            00:00:08|         0
|   34|      1237430309438971376|      tdb1|              Client|          ClientRead|            00:00:06|         0
|   35|       389015618226997618|      tdb1|              LWLock|      buffer_content|            00:00:05|         0
|   36|      4710212362688288619|      tdb1|              Client|          ClientRead|            00:00:05|         0
|   37|      4710212362688288619|      tdb1|              LWLock|           buffer_io|            00:00:04|         0
|   38|        28942442626229688|      tdb1|              Client|          ClientRead|            00:00:04|         0
|   39|        28942442626229688|      tdb1|                 IPC|      ParallelFinish|            00:00:03|         0
|   40|       389015618226997618|      tdb1|                  IO|       DataFileWrite|            00:00:02|         0
|   41|     -5730801771815999400|      tdb1|              Client|          ClientRead|            00:00:02|         0
|   42|      2404820632950544954|      tdb1|              Client|          ClientRead|            00:00:02|         0
|   43|     -6572922443698419129|      tdb1|              Client|          ClientRead|            00:00:02|         0
|   44|      8304755792398128062|      tdb1|              Client|          ClientRead|            00:00:02|         0
|   45|     -1473395109729441239|      tdb1|              Client|          ClientRead|            00:00:02|         0
|   46|                         |      tdb1|              LWLock|      buffer_content|            00:00:01|         0
|   47|       180077086776069052|      tdb1|              Client|          ClientRead|            00:00:01|         0
|   48|                         |      tdb1|                  IO|       DataFileWrite|            00:00:01|         0
|   49|        28942442626229688|      tdb1|                 IPC| MessageQueueReceive|            00:00:01|         0
|   50|      2369289265278398647|      tdb1|              Client|          ClientRead|            00:00:01|         0
|   51|      9150846928388977274|      tdb1|                  IO|       DataFileWrite|            00:00:01|         0
+-----+-------------------------+----------+--------------------+--------------------+--------------------+----------
|
| CLIENT SQL STATICTICS
|
+------------------------------------------------------------------------------------
| CLIENT SQL ordered by Elapsed Time
+--------------------+----------+----------+----------+----------+----------+--------------------
|        elapsed time|     calls|  % dbtime|     % CPU|      % IO|    dbname|             queryid
+--------------------+----------+----------+----------+----------+----------+--------------------
|            06:43:19|        36|      10.2|     85.09|     17.38|      tdb1|  389015618226997618
|            02:06:53|       715|      3.21|      0.85|      0.06|      tdb1| 1237430309438971376
|            01:52:07|       720|      2.84|      1.19|      0.08|      tdb1| 4710212362688288619
|            00:39:03|       357|      0.99|      1.02|      0.33|      tdb1|   28942442626229688
|            00:22:00|         8|      0.56|      0.96|         0|      tdb1| 2649515222348904837
+--------------------+----------+----------+----------+----------+----------+--------------------
| CLIENT SQL ordered by CPU Time
+--------------------+----------+----------+----------+----------+----------+----------+--------------------
|            cpu time|     calls|  % dbtime|total_time|     % CPU|      % IO|    dbname|             queryid
+--------------------+----------+----------+----------+----------+----------+----------+--------------------
|            16:14:38|        36|      10.2|  06:43:19|     85.09|     17.38|      tdb1|  389015618226997618
|            00:13:38|       720|      2.84|  01:52:07|      1.19|      0.08|      tdb1| 4710212362688288619
|            00:11:39|       357|      0.99|  00:39:03|      1.02|      0.33|      tdb1|   28942442626229688
|            00:10:58|         8|      0.56|  00:22:00|      0.96|         0|      tdb1| 2649515222348904837
|            00:09:44|       715|      3.21|  02:06:53|      0.85|      0.06|      tdb1| 1237430309438971376
+--------------------+----------+----------+----------+----------+----------+----------+--------------------
| CLIENT SQL ordered by User I/O Wait Time
+--------------------+----------+----------+----------+----------+----------+----------+--------------------
|        io_wait time|     calls|  % dbtime|total_time|     % CPU|      % IO|    dbname|             queryid
+--------------------+----------+----------+----------+----------+----------+----------+--------------------
|            05:47:09|        36|      10.2|  06:43:19|     85.09|     17.38|      tdb1|  389015618226997618
|            00:02:15|       357|      0.99|  00:39:03|      1.02|      0.33|      tdb1|   28942442626229688
|            00:00:24|       715|      3.21|  02:06:53|      0.85|      0.06|      tdb1| 1237430309438971376
|            00:00:19|       720|      2.84|  01:52:07|      1.19|      0.08|      tdb1| 4710212362688288619
|            00:00:00|         8|      0.56|  00:22:00|      0.96|         0|      tdb1| 2649515222348904837
+--------------------+----------+----------+----------+----------+----------+----------+--------------------
| CLIENT SQL ordered by Shared Buffers Reads
+--------------------+----------+----------+----------+----------+----------+----------+--------------------
|       buffers reads|     calls|  % dbtime|total_time|     % CPU|      % IO|    dbname|             queryid
+--------------------+----------+----------+----------+----------+----------+----------+--------------------
|          2562353244|        36|      10.2|  06:43:19|     85.09|     17.38|      tdb1|  389015618226997618
|            11041689|       357|      0.99|  00:39:03|      1.02|      0.33|      tdb1|   28942442626229688
|             3303551|       715|      3.21|  02:06:53|      0.85|      0.06|      tdb1| 1237430309438971376
|             3242892|       720|      2.84|  01:52:07|      1.19|      0.08|      tdb1| 4710212362688288619
|                   0|         8|      0.56|  00:22:00|      0.96|         0|      tdb1| 2649515222348904837
+--------------------+----------+----------+----------+----------+----------+----------+--------------------
| CLIENT SQL ordered by Disk Reads Time
+--------------------+----------+----------+----------+----------+----------+----------+--------------------
|           read time|     calls|  % dbtime|total_time|     % CPU|      % IO|    dbname|             queryid
+--------------------+----------+----------+----------+----------+----------+----------+--------------------
|            03:37:46|        36|      10.2|  06:43:19|     85.09|     17.38|      tdb1|  389015618226997618
|            00:04:07|       357|      0.99|  00:39:03|      1.02|      0.33|      tdb1|   28942442626229688
|            00:00:59|       720|      2.84|  01:52:07|      1.19|      0.08|      tdb1| 4710212362688288619
|            00:00:42|       715|      3.21|  02:06:53|      0.85|      0.06|      tdb1| 1237430309438971376
|            00:00:00|         8|      0.56|  00:22:00|      0.96|         0|      tdb1| 2649515222348904837
+--------------------+----------+----------+----------+----------+----------+----------+--------------------
| CLIENT SQL ordered by Executions
+--------------------+----------+----------+----------+----------+----------+----------+--------------------
|               calls|      rows|  % dbtime|total_time|     % CPU|      % IO|    dbname|             queryid
+--------------------+----------+----------+----------+----------+----------+----------+--------------------
|                 720|       720|      2.84|  01:52:07|      1.19|      0.08|      tdb1| 4710212362688288619
|                 715|       715|      3.21|  02:06:53|      0.85|      0.06|      tdb1| 1237430309438971376
|                 357|         0|      0.99|  00:39:03|      1.02|      0.33|      tdb1|   28942442626229688
|                  36|        36|      10.2|  06:43:19|     85.09|     17.38|      tdb1|  389015618226997618
|                   8|         8|      0.56|  00:22:00|      0.96|         0|      tdb1| 2649515222348904837
+--------------------+----------+----------+----------+----------+----------+----------+--------------------
| Complete List of SQL Text
----------------------------------------------
…

Продолжение – следует. На очереди формирование истории блокировок (pg_stat_locks), более подробное описание процесса заполнения таблиц.

Автор: rinace

Источник


* - обязательные к заполнению поля


https://ajax.googleapis.com/ajax/libs/jquery/3.4.1/jquery.min.js