ablog

不器用で落着きのない技術者のメモ

MySQL InnoDB で大きなトランザクションを見つける

SHOW ENGINE INNODB STATUS の

で、History list length が大きいとMySQLインスタンス全体での UNDO のサイズが大きくなりロールバックにかかる時間が長くなっており、undo log entries が大きいトランザクションが UNDO ページを大量に使っている犯人だと類推できる。undo log entries は insert/update/delete などを行った行数(未コミット)でサイズではない。
MariaDB 5.5.27以降10未満(XtraDB)では Information Schema の INNODB_UNDO_LOGS. SIZE で undo log が使っているサイズを確認できる。

  • テーブルを作成する
> create table undo_test(
	col1 char(255),
	col2 char(255),
	col3 char(255),
	col4 char(255),
	col5 char(255)
);
  • 1行 insert する
> start transaction;
> insert into undo_test(col1) values('1');
> show engine innodb stauts\G
(中略)
------------
TRANSACTIONS
------------
TRANSACTIONS
------------
Trx id counter 111207046
Purge done for trx's n:o < 111206999 undo n:o < 0 state: running but idle
History list length 719 ★ history list length がインスタンス全体のUNDOページ数
LIST OF TRANSACTIONS FOR EACH SESSION:
(中略)
---TRANSACTION 111207036, ACTIVE 2 sec ★2秒経過
1 lock struct(s), heap size 376, 0 row lock(s), undo log entries 1 ★1行insertしたので、undo log が 1になっている
MySQL thread id 8697(★スレッドID), OS thread handle 0x2b9eb4b47700, query id 1400669 27.0.3.145(★接続元IP) awsuser(★ユーザー名) delayed send ok done
  • 5行 insert する
> start transaction;
> insert into undo_test(col1) values('2');
> insert into undo_test(col1) values('3');
> insert into undo_test(col1) values('4');
> insert into undo_test(col1) values('5');
  • 行数を確認する
> select count(1) from undo_test;
+----------+
| count(1) |
+----------+
|        5 |
+----------+
1 row in set (0.01 sec)
> show engine innodb stauts\G
(中略)
------------
TRANSACTIONS
------------
Trx id counter 111207337
Purge done for trx's n:o < 111206999 undo n:o < 0 state: running but idle
History list length 808
LIST OF TRANSACTIONS FOR EACH SESSION:
(中略)
---TRANSACTION 111207036, ACTIVE 58 sec ★58秒経過
1 lock struct(s), heap size 376, 0 row lock(s), undo log entries 5 ★5行insertしたので、undo log が 5 になっている
MySQL thread id 8697, OS thread handle 0x2b9ef9e43700, query id 1400956 27.0.3.145 awsuser cleaned up
Trx read view will not see trx with id >= 111207290, sees < 111207036
  • セッションを kill する
> call mysql.rds_kill(8697); ★show engine innodb stauts の TRANSACTIONS に表示されていた MySQL thread id 
> show engine innodb status\G
(中略)
------------
TRANSACTIONS
------------
Trx id counter 111208582
Purge done for trx's n:o < 111206999 undo n:o < 0 state: running but idle
History list length 726
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 111208485, not started
MySQL thread id 8719, OS thread handle 0x2b9eb4a43700, query id 1402459 localhost rdsadmin cleaned up
---TRANSACTION 111205044, not started
MySQL thread id 8664, OS thread handle 0x2b9eb4bc9700, query id 1402491 27.0.3.145 awsuser System lock
show engine innodb status
---TRANSACTION 111208058, not started
MySQL thread id 11, OS thread handle 0x2b9eb37ed700, query id 1401866 localhost rdsadmin delayed send ok done
---TRANSACTION 111208576, not started
MySQL thread id 2, OS thread handle 0x2b9eb4a43700, query id 1402490 localhost rdsadmin delayed send ok done
★トランザクションがなくなっている

参考

Global history and purge operations
In addition to every record having a reference to its previous version, there is also a global view of the history of the entire database, called the “history list”. As each transaction is committed, its history is linked into this global history list in transaction serialization (commit) order. The history list is used primarily for cleaning up after a transaction, once no existing read view still needs its history (all other transactions have completed).

In the background, InnoDB runs a continuous “purge” process which is responsible for two things:

  • Actually deleting delete-marked records2, if the current version of the record in the index at the time of purge is still delete-marked and bears the same transaction ID. (That is, the record hasn’t been re-inserted.)
  • Freeing undo log pages and unlinking them from the global history list to make them available for re-use.

InnoDB exposes the total amount of history present in the system as a “History list length”, which can be seen in SHOW ENGINE INNODB STATUS. This is the count of all database modifications present in the undo logs, in units of undo logs (which may contain a single record modification or many).

The basics of the InnoDB undo logging and history system – Jeremy Cole

XtraDB は Information Schema の INNODB_UNDO_LOGS. SIZE(Size in pages of the segment) で undo log が使っているサイズが分かる。MariaDB 5.5.27 で入って、10では削除されてる。

Information Schema INNODB_UNDO_LOGS Table
The Information Schema INNODB_UNDO_LOGS table is a Percona enchancement, and is only available for XtraDB, not InnoDB (see XtraDB and InnoDB). It contains information about the the InnoDB undo log, with each record being an undo log segment. It was removed in MariaDB 10.0.

It has the following columns:

  • ColumnDescription
  • TRX_ID: Unique transaction ID number, matching the value from the information_schema.INNODB_TRX table.
  • RSEG_ID: Rollback segment ID, matching the value from the information_schema.INNODB_RSEG table.
  • USEG_ID: Undo segment ID.
  • SEGMENT_TYPE: Indicates the operation type, for example INSERT or UPDATE.
  • STATE: Segment state: one of ACTIVE (contains active transaction undo log), CACHED, TO_FREE (insert undo segment can be freed), TO_PURGE (update undo segment won't be re-used and can be purged when all undo data is removed) or PREPARED (segment of a prepared transaction).
  • SIZE: Size in pages of the segment.
Information Schema INNODB_UNDO_LOGS Table - MariaDB Knowledge Base

ソースコード

  • mysql-5.6.10/storage/innobase/lock/lock0lock.cc
/*********************************************************************//**
Prints info of locks for all transactions.
@return FALSE if not able to obtain lock mutex
and exits without printing info */
UNIV_INTERN
ibool
lock_print_info_summary(
/*====================*/
        FILE*   file,   /*!< in: file where to print */
        ibool   nowait) /*!< in: whether to wait for the lock mutex */
{
        /* if nowait is FALSE, wait on the lock mutex,
        otherwise return immediately if fail to obtain the
        mutex. */
        if (!nowait) {
                lock_mutex_enter();
        } else if (lock_mutex_enter_nowait()) {
                fputs("FAIL TO OBTAIN LOCK MUTEX, "
                      "SKIP LOCK INFO PRINTING\n", file);
                return(FALSE);
        }

        if (lock_deadlock_found) {
                fputs("------------------------\n"
                      "LATEST DETECTED DEADLOCK\n"
                      "------------------------\n", file);

                if (!srv_read_only_mode) {
                        ut_copy_file(file, lock_latest_err_file);
                }
        }

        fputs("------------\n"
              "TRANSACTIONS\n"
              "------------\n", file);

        fprintf(file, "Trx id counter " TRX_ID_FMT "\n",
                trx_sys_get_max_trx_id());

        fprintf(file,
                "Purge done for trx's n:o < " TRX_ID_FMT
                " undo n:o < " TRX_ID_FMT " state: ",
                purge_sys->iter.trx_no,
                purge_sys->iter.undo_no);

        /* Note: We are reading the state without the latch. One because it
        will violate the latching order and two because we are merely querying
        the state of the variable for display. */

        switch (purge_sys->state){
        case PURGE_STATE_EXIT:
        case PURGE_STATE_INIT:
                /* Should never be in this state while the system is running. */
                ut_error;

        case PURGE_STATE_DISABLED:
                fprintf(file, "disabled");
                break;

        case PURGE_STATE_RUN:
                fprintf(file, "running");
                /* Check if it is waiting for more data to arrive. */
                if (!purge_sys->running) {
                        fprintf(file, " but idle");
                }
                break;

        case PURGE_STATE_STOP:
                fprintf(file, "stopped");
                break;
        }

        fprintf(file, "\n");

        fprintf(file,
                "History list length %lu\n",
                (ulong) trx_sys->rseg_history_len);

#ifdef PRINT_NUM_OF_LOCK_STRUCTS
        fprintf(file,
                "Total number of lock structs in row lock hash table %lu\n",
                (ulong) lock_get_n_rec_locks());
#endif /* PRINT_NUM_OF_LOCK_STRUCTS */
        return(TRUE);
}
  • mysql-5.6.10/storage/innobase/trx/trx0trx.cc
/**********************************************************************//**
Prints info about a transaction.
Caller must hold trx_sys->mutex. */
UNIV_INTERN
void
trx_print_low(
/*==========*/
        FILE*           f,
                        /*!< in: output stream */
        const trx_t*    trx,
                        /*!< in: transaction */
        ulint           max_query_len,
                        /*!< in: max query length to print,
                        or 0 to use the default max length */
        ulint           n_rec_locks,
                        /*!< in: lock_number_of_rows_locked(&trx->lock) */
        ulint           n_trx_locks,
                        /*!< in: length of trx->lock.trx_locks */
        ulint           heap_size)
                        /*!< in: mem_heap_get_size(trx->lock.lock_heap) */
{
        ibool           newline;
        const char*     op_info;

        ut_ad(mutex_own(&trx_sys->mutex));

        fprintf(f, "TRANSACTION " TRX_ID_FMT, trx->id);

        /* trx->state cannot change from or to NOT_STARTED while we
        are holding the trx_sys->mutex. It may change from ACTIVE to
        PREPARED or COMMITTED. */
        switch (trx->state) {
        case TRX_STATE_NOT_STARTED:
                fputs(", not started", f);
                goto state_ok;
        case TRX_STATE_ACTIVE:
                fprintf(f, ", ACTIVE %lu sec",
                        (ulong) difftime(time(NULL), trx->start_time));
                goto state_ok;
        case TRX_STATE_PREPARED:
                fprintf(f, ", ACTIVE (PREPARED) %lu sec",
                        (ulong) difftime(time(NULL), trx->start_time));
                goto state_ok;
        case TRX_STATE_COMMITTED_IN_MEMORY:
                fputs(", COMMITTED IN MEMORY", f);
                goto state_ok;
        }
        fprintf(f, ", state %lu", (ulong) trx->state);
        ut_ad(0);
state_ok:

        /* prevent a race condition */
        op_info = trx->op_info;

        if (*op_info) {
                putc(' ', f);
                fputs(op_info, f);
        }

        if (trx->is_recovered) {
                fputs(" recovered trx", f);
        }

        if (trx->declared_to_be_inside_innodb) {
                fprintf(f, ", thread declared inside InnoDB %lu",
                        (ulong) trx->n_tickets_to_enter_innodb);
        }

        putc('\n', f);

        if (trx->n_mysql_tables_in_use > 0 || trx->mysql_n_tables_locked > 0) {
                fprintf(f, "mysql tables in use %lu, locked %lu\n",
                        (ulong) trx->n_mysql_tables_in_use,
                        (ulong) trx->mysql_n_tables_locked);
        }

        newline = TRUE;

        /* trx->lock.que_state of an ACTIVE transaction may change
        while we are not holding trx->mutex. We perform a dirty read
        for performance reasons. */

        switch (trx->lock.que_state) {
        case TRX_QUE_RUNNING:
                newline = FALSE; break;
        case TRX_QUE_LOCK_WAIT:
                fputs("LOCK WAIT ", f); break;
        case TRX_QUE_ROLLING_BACK:
                fputs("ROLLING BACK ", f); break;
        case TRX_QUE_COMMITTING:
                fputs("COMMITTING ", f); break;
        default:
                fprintf(f, "que state %lu ", (ulong) trx->lock.que_state);
        }

        if (n_trx_locks > 0 || heap_size > 400) {
                newline = TRUE;

                fprintf(f, "%lu lock struct(s), heap size %lu,"
                        " %lu row lock(s)",
                        (ulong) n_trx_locks,
                        (ulong) heap_size,
                        (ulong) n_rec_locks);
        }

        if (trx->has_search_latch) {
                newline = TRUE;
                fputs(", holds adaptive hash latch", f);
        }

        if (trx->undo_no != 0) {
                newline = TRUE;
                fprintf(f, ", undo log entries "TRX_ID_FMT, trx->undo_no);
        }

        if (newline) {
                putc('\n', f);
        }

        if (trx->mysql_thd != NULL) {
                innobase_mysql_print_thd(f, trx->mysql_thd, max_query_len);
        }
}
/************************************************************************//**
Implements the SHOW ENGINE INNODB STATUS command. Sends the output of the
InnoDB Monitor to the client.
@return 0 on success */
static
int
innodb_show_status(
/*===============*/
        handlerton*     hton,   /*!< in: the innodb handlerton */
        THD*            thd,    /*!< in: the MySQL query thread of the caller */
        stat_print_fn*  stat_print)
{
        trx_t*                  trx;
        static const char       truncated_msg[] = "... truncated...\n";
        const long              MAX_STATUS_SIZE = 1048576;
        ulint                   trx_list_start = ULINT_UNDEFINED;
        ulint                   trx_list_end = ULINT_UNDEFINED;

        DBUG_ENTER("innodb_show_status");
        DBUG_ASSERT(hton == innodb_hton_ptr);

        /* We don't create the temp files or associated
        mutexes in read-only-mode */

        if (srv_read_only_mode) {
                DBUG_RETURN(0);
        }

        trx = check_trx_exists(thd);

        trx_search_latch_release_if_reserved(trx);

        innobase_srv_conc_force_exit_innodb(trx);

        /* We let the InnoDB Monitor to output at most MAX_STATUS_SIZE
        bytes of text. */

        char*   str;
        ssize_t flen, usable_len;

        mutex_enter(&srv_monitor_file_mutex);
        rewind(srv_monitor_file);

        srv_printf_innodb_monitor(srv_monitor_file, FALSE,
                                  &trx_list_start, &trx_list_end);

        os_file_set_eof(srv_monitor_file);

        if ((flen = ftell(srv_monitor_file)) < 0) {
                flen = 0;
        }

        if (flen > MAX_STATUS_SIZE) {
                usable_len = MAX_STATUS_SIZE;
                srv_truncated_status_writes++;
        } else {
                usable_len = flen;
        }

        /* allocate buffer for the string, and
        read the contents of the temporary file */

        if (!(str = (char*) my_malloc(usable_len + 1, MYF(0)))) {
                mutex_exit(&srv_monitor_file_mutex);
                DBUG_RETURN(1);
        }

        rewind(srv_monitor_file);

        if (flen < MAX_STATUS_SIZE) {
                /* Display the entire output. */
                flen = fread(str, 1, flen, srv_monitor_file);
        } else if (trx_list_end < (ulint) flen
                   && trx_list_start < trx_list_end
                   && trx_list_start + (flen - trx_list_end)
                   < MAX_STATUS_SIZE - sizeof truncated_msg - 1) {

                /* Omit the beginning of the list of active transactions. */
                ssize_t len = fread(str, 1, trx_list_start, srv_monitor_file);

                memcpy(str + len, truncated_msg, sizeof truncated_msg - 1);
                len += sizeof truncated_msg - 1;
                usable_len = (MAX_STATUS_SIZE - 1) - len;
                fseek(srv_monitor_file, flen - usable_len, SEEK_SET);
                len += fread(str + len, 1, usable_len, srv_monitor_file);
                flen = len;
        } else {
                /* Omit the end of the output. */
                flen = fread(str, 1, MAX_STATUS_SIZE - 1, srv_monitor_file);
        }

        mutex_exit(&srv_monitor_file_mutex);

        stat_print(thd, innobase_hton_name, (uint) strlen(innobase_hton_name),

        my_free(str);

        DBUG_RETURN(0);
}