Implement direct mapped cache for instruction fetch #103
Force-pushed from 98114a7 to 0e4f67b.
Consider a 2-way page cache: given that the current page-level cache already achieves a 98.30% hit rate, consider simply upgrading it instead:

```c
// Current: single-entry page cache
mmu_fetch_cache_t cache_fetch;

// Proposed: 2-way set-associative page cache
mmu_fetch_cache_t cache_fetch[2]; // Only +16 bytes overhead
```

Use a parity hash (like the load/store caches):

```c
uint32_t idx = __builtin_parity(vpn) & 0x1;
if (unlikely(vpn != vm->cache_fetch[idx].n_pages)) {
    // ... fill
}
```
Force-pushed from bb9e6cb to 74e3b99.
Force-pushed from 686cede to 5478710.
Unify the naming scheme.
Extend the existing architecture to cache the last fetched PC instruction, improving instruction fetch hit rate by approximately 2%. Also includes clang-format fixes for several expressions.
- Rename I-cache structures from ic to icache to avoid ambiguity.
- Add explanations for instruction-cache definitions and masks, and align the macro names with the terminology used in the comments.
- Fix tag calculation to use the precomputed tag rather than shifting the physical address.
Replace the previous 1-entry direct-mapped design with a 2-entry direct-mapped cache using hash-based indexing (same parity hash as cache_load). This allows two hot virtual pages to coexist without thrashing. Measurement shows that the number of virtual-to-physical translations during instruction fetch (mmu_translate() calls) decreased by ~10%.
Introduce a small victim cache to reduce conflict misses in the direct-mapped instruction cache. On an I-cache miss, probe the victim cache; on hit, swap the victim block with the current I-cache block and return the data. Also rename ic.block → ic.i_block to distinguish between primary I-cache blocks and victim cache blocks.
Adjust instruction cache related defines and identifiers:
- Rename the IC/ic prefix to ICACHE/icache
- Rename the VC/vc prefix to VCACHE/vcache
The previous implementation did not correctly place the evicted I-cache block into the victim cache, leaving all victim entries empty so they could never hit. This patch stores the replaced I-cache block into the victim cache before the refill, allowing victim hits to function as intended. Measurement shows that the number of virtual-to-physical translations during instruction fetch (mmu_translate() calls) decreased by ~7%.
Force-pushed from fe2d95e to f657fb2.
Rebase onto the latest 'master' branch and resolve build errors.
Adjust expressions to align with the new 2-entry cache_fetch design introduced in "Adopt 2-entry direct-mapped page cache".
Force-pushed from aab465c to db3a37f.
Squash commits and refine commit message.
This series appears to contain several "fix-up," "refactor," or "build-fix" commits that correct or adjust a preceding patch.
To maintain a clean history and ensure the project is bisectable, each patch in a series should be complete and correct on its own.
As a friendly reminder regarding project communication: when you quote-reply to others' comments, please do not translate the quoted text into any language other than English. This is an open-source project, and keeping all discussions in English ensures the conversation remains accessible to everyone in the community, including current and future participants who may not be familiar with other languages.
```c
icache_block_t tmp = *blk;
*blk = *vblk;
*vblk = tmp;
blk->tag = tag;
```
This code looks suspicious to me.
When you move the evicted I-cache block (tmp) back into the victim cache, you are setting the vblk->tag to tmp.tag, which is the 16-bit I-cache tag.
Won't this corrupt the victim cache entry? The VC search logic requires a 24-bit tag ([ICache Tag | ICache Index]) to function. Because only the 16-bit tag is stored, this VCache entry will never be hit again.
Summary by cubic
Implemented a direct-mapped instruction fetch cache with a small victim cache to reduce conflict misses and speed up fetches. Improves instruction fetch hit rate by ~2% (per MMU cache stats from #99) and reduces fetch translations by ~7%.
New Features
Refactors
Written for commit db3a37f.