Commit 1b558ed

MMeent authored and Heikki Linnakangas committed
Emit nbtree vacuum cycle id in nbtree xlog through FPIs
NBTree needs a vacuum cycle ID on pages whose split produced a new right page located before the original page, and on pages that were split from such pages in the current vacuum cycle. By WAL-logging the cycle ID and restoring it during recovery, we ensure that vacuum doesn't fail to clean up the earlier pages.

During recovery, we extract the cycle ID from the original page if that page had an FPI, either directly (when the page was restored) or indirectly (from the record data).

This fixes neondatabase/neon#9929

(Cherry-picked to v18 from v17 by Heikki. Not sure why this was missing. Fixes test_nbtree_pagesplit_cycleid)
1 parent 1c44e35 commit 1b558ed

2 files changed: +94 -1 lines changed

src/backend/access/nbtree/nbtinsert.c

Lines changed: 25 additions & 1 deletion
@@ -1494,6 +1494,7 @@ _bt_split(Relation rel, Relation heaprel, BTScanInsert itup_key, Buffer buf,
     bool        newitemonleft,
                 isleaf,
                 isrightmost;
+    uint16      origcycleid;
 
     /*
      * origpage is the original page to be split. leftpage is a temporary
@@ -1514,6 +1515,8 @@ _bt_split(Relation rel, Relation heaprel, BTScanInsert itup_key, Buffer buf,
     isrightmost = P_RIGHTMOST(oopaque);
     maxoff = PageGetMaxOffsetNumber(origpage);
     origpagenumber = BufferGetBlockNumber(buf);
+    /* NEON: store the page's former cycle ID for FPI check later */
+    origcycleid = oopaque->btpo_cycleid;
 
     /*
      * Choose a point to split origpage at.
@@ -1969,6 +1972,7 @@ _bt_split(Relation rel, Relation heaprel, BTScanInsert itup_key, Buffer buf,
         xl_btree_split xlrec;
         uint8       xlinfo;
         XLogRecPtr  recptr;
+        uint8       bufflags = REGBUF_STANDARD;
 
         xlrec.level = ropaque->btpo_level;
         /* See comments below on newitem, orignewitem, and posting lists */
@@ -1981,7 +1985,27 @@ _bt_split(Relation rel, Relation heaprel, BTScanInsert itup_key, Buffer buf,
         XLogBeginInsert();
         XLogRegisterData(&xlrec, SizeOfBtreeSplit);
 
-        XLogRegisterBuffer(0, buf, REGBUF_STANDARD);
+        /*
+         * NEON: If we split to earlier pages during a btree vacuum cycle,
+         * then we have to include the cycle ID in the WAL record. The
+         * easiest method to do that is to force an image, which happens to
+         * be relatively cheap, as the data already contained in the record is
+         * enough to populate the new right page.
+         *
+         * We MUST log an FPI when the page is split during a vacuum cycle, and:
+         * - The right page's blkno < the left page's blkno, or
+         * - The right page might be 'C' in a page split chain B > C > A after
+         *   B split B > A => B > C > A; or B > C > D > A, etc. (as indicated
+         *   by the presence of a cycle ID).
+         */
+        if (oopaque->btpo_cycleid != 0 &&
+            (origpagenumber > rightpagenumber || oopaque->btpo_cycleid == origcycleid))
+        {
+            /* cycle ID is required */
+            bufflags |= REGBUF_FORCE_IMAGE;
+        }
+
+        XLogRegisterBuffer(0, buf, bufflags);
         XLogRegisterBuffer(1, rbuf, REGBUF_WILL_INIT);
         /* Log original right sibling, since we've changed its prev-pointer */
         if (!isrightmost)
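
The forced-image condition in the hunk above can be read as a small predicate. The following is a minimal standalone sketch, not code from the commit: the function name `split_needs_forced_image`, its parameter names, and the two typedefs are hypothetical stand-ins chosen for illustration, and it assumes (as the diff's comment suggests) that a non-zero cycle ID on the left page after the split means a btree vacuum is in progress.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t BlockNumber;   /* stand-in for PostgreSQL's BlockNumber */
typedef uint16_t BTCycleId;     /* stand-in for nbtree's BTCycleId */

/*
 * Hypothetical helper mirroring the condition added to _bt_split().
 *
 * new_cycleid:  cycle ID on the left page after the split (0 = no vacuum)
 * orig_cycleid: cycle ID the page carried before the split
 * orig_blkno:   block number of the original (left) page
 * right_blkno:  block number of the new right page
 */
static bool
split_needs_forced_image(BTCycleId new_cycleid, BTCycleId orig_cycleid,
                         BlockNumber orig_blkno, BlockNumber right_blkno)
{
    /* No vacuum cycle in progress: no cycle ID needs to be logged. */
    if (new_cycleid == 0)
        return false;

    /*
     * Either the new right page lands before the original page, or the
     * original page already carried this cycle ID from an earlier split.
     */
    return orig_blkno > right_blkno || new_cycleid == orig_cycleid;
}

/* A few illustrative cases; the values are arbitrary. */
static void
split_needs_forced_image_examples(void)
{
    assert(!split_needs_forced_image(0, 0, 10, 5));  /* no vacuum running */
    assert(split_needs_forced_image(7, 0, 10, 5));   /* right page is earlier */
    assert(split_needs_forced_image(7, 7, 5, 10));   /* already in a split chain */
    assert(!split_needs_forced_image(7, 0, 5, 10));  /* later page, first split */
}
```

In the diff itself the result is not returned but folded into `bufflags` via `REGBUF_FORCE_IMAGE`, so the original `REGBUF_STANDARD` behaviour is unchanged whenever the predicate is false.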

src/backend/access/nbtree/nbtxlog.c

Lines changed: 69 additions & 0 deletions
@@ -432,6 +432,75 @@ btree_xlog_split(bool newitemonleft, XLogReaderState *record)
         MarkBufferDirty(buf);
     }
 
+    /*
+     * NEON: If the original page was supposed to be recovered from FPI,
+     * then we need to correct the cycle ID (see _bt_split for reasons)
+     *
+     * Note that we can't just use the buffer in WALRedo on Pageserver,
+     * as that may be InvalidBuffer when the original (left) page of the
+     * split wasn't requested.
+     */
+    if (XLogRecGetBlock(record, 0)->has_image)
+    {
+        /*
+         * btree split FPIs may contain important cycle IDs on the original
+         * page's FPI; make sure we correctly transfer this over
+         */
+
+        /*
+         * Because we don't want to decompress the page if it's not needed, or
+         * reconstruct a whole 8kB page when we're only interested in the 2
+         * bytes of the bkpimg, we recognise there are 3 different ways we can
+         * get the data, in order of efficiency (from most efficient to least
+         * efficient):
+         * - There is an original (left) page in the buffer
+         * - There is no original buffer, the logged FPI was not compressed
+         * - There is no original buffer, the logged FPI was compressed
+         */
+        if (BufferIsValid(buf))
+        {
+            /*
+             * Neat, we can just use the buffer to copy the cycle ID
+             */
+            BTPageOpaque oopaque = BTPageGetOpaque(BufferGetPage(buf));
+            ropaque->btpo_cycleid = oopaque->btpo_cycleid;
+        }
+        else if (!BKPIMAGE_COMPRESSED(XLogRecGetBlock(record, 0)->bimg_info))
+        {
+            /*
+             * Good, we don't have to decompress the data, so we can use
+             * calculated offsets into bkpb->bkp_image
+             */
+
+            /*
+             * offset of the start of cycleid relative to the end of the page,
+             * which is also relative to the end of the FPI
+             */
+            const int   cycleid_off = MAXALIGN(sizeof(BTPageOpaqueData))
+                - offsetof(BTPageOpaqueData, btpo_cycleid);
+            char       *cycleid_ptr;    /* may not be aligned */
+            DecodedBkpBlock *bkpb = XLogRecGetBlock(record, 0);
+
+            cycleid_ptr = &bkpb->bkp_image[bkpb->bimg_len - cycleid_off];
+
+            memcpy(&ropaque->btpo_cycleid, cycleid_ptr, sizeof(BTCycleId));
+        }
+        else
+        {
+            /*
+             * Bummer, we have to decompress the data.
+             */
+            PGAlignedBlock tmp;
+            BTPageOpaque oopaque;
+
+            /* Expensive decompression of data */
+            RestoreBlockImage(record, 0, tmp.data);
+
+            oopaque = BTPageGetOpaque(tmp.data);
+            ropaque->btpo_cycleid = oopaque->btpo_cycleid;
+        }
+    }
+
     /* Fix left-link of the page to the right of the new right sibling */
     if (spagenumber != P_NONE)
     {
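
The uncompressed-FPI branch above reads the cycle ID from a fixed distance before the end of the image instead of rebuilding the page. The sketch below reproduces only that offset arithmetic in isolation. It is an illustration under stated assumptions, not the commit's code: `DemoBTPageOpaqueData`, `DEMO_MAXALIGN`, and `cycleid_from_image_tail` are hypothetical names, the struct assumes the field order of nbtree's `BTPageOpaqueData` (three 4-byte fields, then two 2-byte fields), and the macro assumes 8-byte maximum alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t BTCycleId;     /* stand-in for nbtree's BTCycleId */

/* Stand-in assumed to match the layout of nbtree's BTPageOpaqueData. */
typedef struct
{
    uint32_t    btpo_prev;
    uint32_t    btpo_next;
    uint32_t    btpo_level;
    uint16_t    btpo_flags;
    BTCycleId   btpo_cycleid;
} DemoBTPageOpaqueData;

/* Round up to a multiple of 8; assumes MAXIMUM_ALIGNOF is 8. */
#define DEMO_MAXALIGN(len) (((size_t) (len) + 7) & ~(size_t) 7)

/*
 * Read the cycle ID from the tail of a raw page image.  The special space
 * sits at the very end of the page, so the field lives a fixed number of
 * bytes before the end of the image.
 */
static BTCycleId
cycleid_from_image_tail(const char *image, size_t image_len)
{
    const size_t cycleid_off = DEMO_MAXALIGN(sizeof(DemoBTPageOpaqueData))
        - offsetof(DemoBTPageOpaqueData, btpo_cycleid);
    BTCycleId   cycleid;

    assert(image_len >= cycleid_off);

    /* The source may be unaligned, so copy bytes instead of dereferencing. */
    memcpy(&cycleid, image + image_len - cycleid_off, sizeof(cycleid));
    return cycleid;
}
```

Measuring from the end of the image rather than the start is what lets the diff skip page reconstruction: as far as I understand the FPI format, any removed "hole" lies in the middle of the page, so the trailing bytes of the image should still be the trailing bytes of the page.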
