PostgreSQL Source Code Walkthrough (214) - Background Processes #13 (checkpointer-IsCheckpointOnSchedule)

Published: 2020-08-08 01:57:40 Author: husthxd
Source: ITPUB Blog

This section covers IsCheckpointOnSchedule, the function the checkpointer uses to pace how fast a checkpoint flushes buffers to disk.

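I. Data Structures

Macro definitions
Checkpoint request flag bits
These flag bits describe a checkpoint request: how it should behave, how the requestor is handled, and what caused it.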


/*
 * OR-able request flag bits for checkpoints.  The "cause" bits are used only
 * for logging purposes.  Note: the flags must be defined so that it's
 * sensible to OR together request flags arising from different requestors.
 */
/* These directly affect the behavior of CreateCheckPoint and subsidiaries */
#define CHECKPOINT_IS_SHUTDOWN  0x0001  /* Checkpoint is for shutdown */
#define CHECKPOINT_END_OF_RECOVERY  0x0002  /* Like shutdown checkpoint, but
                       * issued at end of WAL recovery */
#define CHECKPOINT_IMMEDIATE  0x0004  /* Do it without delays */
#define CHECKPOINT_FORCE    0x0008  /* Force even if no activity */
#define CHECKPOINT_FLUSH_ALL  0x0010  /* Flush all pages, including those
                     * belonging to unlogged tables */
/* These are important to RequestCheckpoint */
#define CHECKPOINT_WAIT     0x0020  /* Wait for completion */
#define CHECKPOINT_REQUESTED  0x0040  /* Checkpoint request has been made */
/* These indicate the cause of a checkpoint request */
#define CHECKPOINT_CAUSE_XLOG 0x0080  /* XLOG consumption */
#define CHECKPOINT_CAUSE_TIME 0x0100  /* Elapsed time */

WRITES_PER_ABSORB


/* interval for calling AbsorbSyncRequests in CheckpointWriteDelay */
//number of writes between calls to AbsorbSyncRequests; defined as 1000
#define WRITES_PER_ABSORB   1000

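II. Source Code Walkthrough

IsCheckpointOnSchedule
This function decides whether the running checkpoint is on schedule. If it returns true, the checkpointer is ahead and may rest; if it returns false, it is behind and must keep writing.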


/*
 * Calculate CheckPointSegments based on max_wal_size_mb and
 * checkpoint_completion_target.
 * Computes CheckPointSegments.
 */
static void
CalculateCheckpointSegments(void)
{
  double    target;
  /*-------
   * Calculate the distance at which to trigger a checkpoint, to avoid
   * exceeding max_wal_size_mb. This is based on two assumptions:
   *
   * a) we keep WAL for only one checkpoint cycle (prior to PG11 we kept
   *    WAL for two checkpoint cycles to allow us to recover from the
   *    secondary checkpoint if the first checkpoint failed, though we
   *    only did this on the master anyway, not on standby. Keeping just
   *    one checkpoint simplifies processing and reduces disk space in
   *    many smaller databases.)
   * b) during checkpoint, we consume checkpoint_completion_target *
   *    number of segments consumed between checkpoints.
   *-------
   */
  //#define ConvertToXSegs(x,segsize) (x / ((segsize) / (1024 * 1024)))
  target = (double) ConvertToXSegs(max_wal_size_mb, wal_segment_size) /
    (1.0 + CheckPointCompletionTarget);
  /* round down */
  CheckPointSegments = (int) target;
  if (CheckPointSegments < 1)
    CheckPointSegments = 1;
}
/*
 * IsCheckpointOnSchedule -- are we on schedule to finish this checkpoint
 *     (or restartpoint) in time?
 * IsCheckpointOnSchedule -- is the checkpoint on schedule to finish in time?
 *
 * Compares the current progress against the time/segments elapsed since last
 * checkpoint, and returns true if the progress we've made this far is greater
 * than the elapsed time/segments.
 * Current progress is compared with the elapsed time and WAL segments; if progress is ahead, returns true (the caller may rest).
 */
static bool
IsCheckpointOnSchedule(double progress)
{
  XLogRecPtr  recptr;
  struct timeval now;
  double    elapsed_xlogs,
        elapsed_time;
  Assert(ckpt_active);
  /* Scale progress according to checkpoint_completion_target. */
  //effective progress becomes progress * checkpoint_completion_target
  progress *= CheckPointCompletionTarget;
  /*
   * Check against the cached value first. Only do the more expensive
   * calculations once we reach the target previously calculated. Since
   * neither time or WAL insert pointer moves backwards, a freshly
   * calculated value can only be greater than or equal to the cached value.
   * If progress is below the cached value, return false: the checkpoint must speed up.
   */
  if (progress < ckpt_cached_elapsed)
    return false;
  /*
   * Check progress against WAL segments written and CheckPointSegments.
   * Progress vs. WAL consumed
   *
   * We compare the current WAL insert location against the location
   * computed before calling CreateCheckPoint. The code in XLogInsert that
   * actually triggers a checkpoint when CheckPointSegments is exceeded
   * compares against RedoRecptr, so this is not completely accurate.
   * However, it's good enough for our purposes, we're only calculating an
   * estimate anyway.
   *
   * During recovery, we compare last replayed WAL record's location with
   * the location computed before calling CreateRestartPoint. That maintains
   * the same pacing as we have during checkpoints in normal operation, but
   * we might exceed max_wal_size by a fair amount. That's because there can
   * be a large gap between a checkpoint's redo-pointer and the checkpoint
   * record itself, and we only start the restartpoint after we've seen the
   * checkpoint record. (The gap is typically up to CheckPointSegments *
   * checkpoint_completion_target where checkpoint_completion_target is the
   * value that was in effect when the WAL was generated).
   */
  if (RecoveryInProgress())
    recptr = GetXLogReplayRecPtr(NULL);
  else
    recptr = GetInsertRecPtr();
  elapsed_xlogs = (((double) (recptr - ckpt_start_recptr)) /
           wal_segment_size) / CheckPointSegments;
  if (progress < elapsed_xlogs)
  {
    //progress lags behind WAL generation: keep working
    ckpt_cached_elapsed = elapsed_xlogs;
    return false;
  }
  /*
   * Check progress against time elapsed and checkpoint_timeout.
   * Progress vs. elapsed time
   */
  gettimeofday(&now, NULL);
  elapsed_time = ((double) ((pg_time_t) now.tv_sec - ckpt_start_time) +
          now.tv_usec / 1000000.0) / CheckPointTimeout;
  if (progress < elapsed_time)
  {
    //progress lags behind elapsed time: keep working
    ckpt_cached_elapsed = elapsed_time;
    return false;
  }
  /* It looks like we're on schedule. */
  //on schedule: the caller may rest
  return true;
}

III. Trace Analysis

N/A

IV. References

PG Source Code
PgSQL · Feature Analysis · On Checkpoint Scheduling (PgSQL · 特性分析 · 谈谈checkpoint的调度)
