[Cache Series] Cache in Linux

Date: 2019-07-29 13:48:32

Author: MuggleZero
Column: ARMv8 Architecture Beginner's Notes

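Previous posts:

[Cache Series] A First Look at Cache
[Cache Series] Cache Mapping Schemes
[Cache Series] Inclusive and Exclusive Caches
[Cache Series] Types of Cache
[Cache Series] The MESI Protocol
[Cache Series] Cache False Sharing
[Cache Series] DMA and Cache Coherency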

Generally, a cache line is small (32 bytes is a typical value). CPU caches are laid out linearly, which means a 32-byte cache line is aligned to a 32-byte address boundary.
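As a minimal illustration of that alignment rule, the sketch below splits an address into the base of its cache line and the offset inside the line. The 32-byte line size and the helper names line_base and line_offset are assumptions made for this example; they are not kernel APIs.

```c
#include <stdint.h>

/* Assumed line size for illustration (the "typical" 32 bytes mentioned above). */
#define LINE_SIZE 32u

/* Start address of the cache line that contains addr. */
static inline uintptr_t line_base(uintptr_t addr)
{
    return addr & ~(uintptr_t)(LINE_SIZE - 1);
}

/* Byte offset of addr inside its cache line. */
static inline unsigned int line_offset(uintptr_t addr)
{
    return (unsigned int)(addr & (LINE_SIZE - 1));
}
```

An address whose offset is 0 sits exactly on a line boundary; any other address shares its line with the bytes before it.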

The Linux kernel uses the cache widely and cleverly. Let's look at some of those uses.

Aligning commonly used data structures

In proc_caches_init, the slab descriptors for mm_struct, fs_cache, files_cache, signal_cache and other structures are created with the SLAB_HWCACHE_ALIGN flag, which aligns their objects to the L1 cache line.

How is the alignment done?

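Take mm_struct as an example. A cache created with kmem_cache_create_usercopy holds objects that may be copied to user space. The call flow depends on whether a usersize is passed in, and for mm_struct one clearly is.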

kmem_cache_create_usercopy

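When a usersize is passed in, the alignment has to be computed. The key rule in calculate_alignment is that the hardware cache alignment cannot override the explicitly specified alignment: if the specified alignment is larger, it is the one that is used.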
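The rule can be sketched in user space as follows. This is a simplified model of the idea, not the kernel source: the cache line size is passed in as a parameter, and SKETCH_HWCACHE_ALIGN and SKETCH_SLAB_MINALIGN are assumed stand-ins for SLAB_HWCACHE_ALIGN and the architecture's minimum slab alignment.

```c
#include <stddef.h>

/* Assumed values for illustration; the kernel takes these from its own headers. */
#define SKETCH_HWCACHE_ALIGN 0x1u  /* stand-in for SLAB_HWCACHE_ALIGN */
#define SKETCH_SLAB_MINALIGN 8u    /* stand-in for the minimum slab alignment */

static inline unsigned int sketch_calculate_alignment(unsigned int flags,
                                                      unsigned int align,
                                                      unsigned int size,
                                                      unsigned int line_size)
{
    const unsigned int ptr = (unsigned int)sizeof(void *);

    if (flags & SKETCH_HWCACHE_ALIGN) {
        unsigned int ralign = line_size;

        /* Small objects do not need a whole line: halve while the object
         * still fits in half of it (size 0 is skipped to avoid looping). */
        while (size != 0 && size <= ralign / 2)
            ralign /= 2;

        /* The cache-derived value never lowers the caller's request. */
        if (ralign > align)
            align = ralign;
    }

    if (align < SKETCH_SLAB_MINALIGN)
        align = SKETCH_SLAB_MINALIGN;

    /* Round up to a multiple of the pointer size. */
    return (align + ptr - 1) & ~(ptr - 1);
}
```

With a 64-byte line, a 1000-byte object is aligned to 64 bytes, a 20-byte object to 32 bytes, and an explicit request for 128 bytes stays at 128.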

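cache_line_size reads the SYS_CTR_EL0 register to obtain the CPU's cache line size (derived from the CWG field). If that field reads as zero, it falls back to ARCH_DMA_MINALIGN, the minimum DMA alignment on ARM64, which is 128 bytes.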

#define ARCH_DMA_MINALIGN (128)

static inline int cache_line_size_of_cpu(void)
{
    u32 cwg = cache_type_cwg();

    return cwg ? 4 << cwg : ARCH_DMA_MINALIGN;
}

static inline u32 cache_type_cwg(void)
{
    return (read_cpuid_cachetype() >> CTR_CWG_SHIFT) & CTR_CWG_MASK;
}

static inline u32 __attribute_const__ read_cpuid_cachetype(void)
{
    return read_cpuid(CTR_EL0);
}

#define read_cpuid(reg) read_sysreg_s(SYS_ ## reg)
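To make the arithmetic concrete, here is a user-space sketch that decodes a raw CTR_EL0 value passed in as a plain integer instead of reading the system register. It assumes the CWG field sits in bits [27:24] and encodes the log2 of a size in 4-byte words; the SKETCH_ names are made up for this example.

```c
#include <stdint.h>

#define SKETCH_CTR_CWG_SHIFT 24    /* assumed: CWG occupies CTR_EL0 bits [27:24] */
#define SKETCH_CTR_CWG_MASK  0xfu
#define SKETCH_DMA_MINALIGN  128u  /* fallback, mirrors ARCH_DMA_MINALIGN above */

/* Decode the line size in bytes from a raw CTR_EL0 value. */
static inline unsigned int sketch_line_size_from_ctr(uint32_t ctr)
{
    uint32_t cwg = (ctr >> SKETCH_CTR_CWG_SHIFT) & SKETCH_CTR_CWG_MASK;

    /* CWG counts words as a power of two, so the byte size is 4 << cwg. */
    return cwg ? 4u << cwg : SKETCH_DMA_MINALIGN;
}
```

A CWG of 4 gives 4 << 4 = 64 bytes, while a CWG of 0 means the value is not provided and the 128-byte fallback is used.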

Once the correct alignment is known, create_cache -> __kmem_cache_create() creates a cache whose objects use that alignment.

kmem_cache_create

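kmem_cache_create calls kmem_cache_create_usercopy with a usersize of 0, so the call first goes through __kmem_cache_alias -> find_mergeable. The core idea of find_mergeable is likewise to compute the alignment with calculate_alignment first, and then to search the system's list of slab caches for one that the request can be merged into.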
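The search can be pictured with the simplified sketch below. It is an assumption-laden model rather than the kernel code: the cache list is a plain array, struct sketch_cache is a hypothetical type, flags must match exactly, and align is assumed to be a nonzero power of two that has already been computed.

```c
#include <stddef.h>

/* Hypothetical, stripped-down cache descriptor for this sketch. */
struct sketch_cache {
    const char *name;
    unsigned int size;   /* per-object size of the existing cache */
    unsigned int flags;
};

/* Return the first cache that can serve objects of `size` bytes, or NULL. */
static inline const struct sketch_cache *
sketch_find_mergeable(const struct sketch_cache *caches, size_t n,
                      unsigned int size, unsigned int align,
                      unsigned int flags)
{
    for (size_t i = 0; i < n; i++) {
        const struct sketch_cache *s = &caches[i];

        if (size > s->size)                       /* object would not fit */
            continue;
        if (flags != s->flags)                    /* creation flags differ */
            continue;
        if ((s->size & ~(align - 1)) != s->size)  /* size not a multiple of align */
            continue;
        if (s->size - size >= sizeof(void *))     /* too much wasted space */
            continue;
        return s;
    }
    return NULL;
}
```

A request is merged only when an existing cache is at least as large, compatible in alignment, and wastes less than one pointer's worth of space per object; otherwise a new cache has to be created.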

Note: how the slab system itself is created will be covered in a later post.

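The smallest unit exchanged between the cache and memory is the cache line. If a structure is not aligned to a cache line, it may well end up spanning several cache lines, which hurts performance!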
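The cost of misalignment is easy to quantify. The helper below, a small sketch whose name lines_spanned is made up for this example, counts how many cache lines an object touches given its start address, its size and the line size.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of cache lines touched by an object of `size` bytes at `addr`.
 * Both size and line_size are assumed to be nonzero. */
static inline size_t lines_spanned(uintptr_t addr, size_t size, size_t line_size)
{
    uintptr_t first = addr / line_size;              /* line holding the first byte */
    uintptr_t last  = (addr + size - 1) / line_size; /* line holding the last byte */

    return (size_t)(last - first + 1);
}
```

With 64-byte lines, a 64-byte structure at address 0 occupies one line, but the same structure at address 32 occupies two, so every access to it may need two line fills instead of one.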

Alignment on SMP systems

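On SMP systems, some commonly used data structures (zone, irqaction, irq_stat, worker_pool) are declared with macros such as ____cacheline_aligned_in_smp and ____cacheline_internodealigned_in_smp. The cache false sharing problem discussed earlier has a large impact on SMP systems, and the fix is to align structures to the cache line, for example to L1_CACHE_BYTES in Linux.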

#ifndef L1_CACHE_ALIGN
#define L1_CACHE_ALIGN(x) __ALIGN_KERNEL(x, L1_CACHE_BYTES)
#endif

#ifndef SMP_CACHE_BYTES
#define SMP_CACHE_BYTES L1_CACHE_BYTES
#endif

#ifndef ____cacheline_aligned
#define ____cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES)))
#endif

#ifndef ____cacheline_aligned_in_smp
#ifdef CONFIG_SMP
#define ____cacheline_aligned_in_smp ____cacheline_aligned
#else
#define ____cacheline_aligned_in_smp
#endif /* CONFIG_SMP */
#endif

#ifndef __cacheline_aligned
#define __cacheline_aligned \
    __attribute__((__aligned__(SMP_CACHE_BYTES), \
                   __section__(".data..cacheline_aligned")))
#endif /* __cacheline_aligned */

#ifndef __cacheline_aligned_in_smp
#ifdef CONFIG_SMP
#define __cacheline_aligned_in_smp __cacheline_aligned
#else
#define __cacheline_aligned_in_smp
#endif /* CONFIG_SMP */
#endif

#if !defined(____cacheline_internodealigned_in_smp)
#if defined(CONFIG_SMP)
#define ____cacheline_internodealigned_in_smp \
    __attribute__((__aligned__(1 << (INTERNODE_CACHE_SHIFT))))
#else
#define ____cacheline_internodealigned_in_smp
#endif
#endif

#ifndef CONFIG_ARCH_HAS_CACHE_LINE_SIZE
#define cache_line_size() L1_CACHE_BYTES
#endif
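The effect of these macros can be tried in user space with the same GCC/Clang attribute. The sketch below is an analogue, not kernel code: the 64-byte SKETCH_CACHE_BYTES value, the sketch_cacheline_aligned macro and struct sketch_stat are all assumptions made for illustration.

```c
#include <stddef.h>

/* Assumed line size for this sketch; the kernel uses SMP_CACHE_BYTES. */
#define SKETCH_CACHE_BYTES 64

/* User-space analogue of ____cacheline_aligned (GCC/Clang attribute syntax). */
#define sketch_cacheline_aligned \
    __attribute__((__aligned__(SKETCH_CACHE_BYTES)))

/* Aligning the type also pads its size up to a multiple of the alignment. */
struct sketch_stat {
    unsigned long hits;
    unsigned long misses;
} sketch_cacheline_aligned;

/* One slot per CPU: consecutive slots never share a cache line. */
struct sketch_stat sketch_stats[4];
```

Because each array element starts on its own line, a CPU updating its slot does not invalidate the line that holds a neighbouring CPU's slot, which is the false sharing the macros are meant to prevent.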

Giving a member its own cache line

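A frequently accessed member of a data structure can be given a cache line of its own. Why? Again because of cache false sharing: a hot member that shares a line with other data makes the accesses interfere with each other, so the line is repeatedly loaded and evicted. Examples are zone->lock and zone->lru_lock, two heavily used locks; keeping each on its own cache line helps make lock acquisition more efficient. On SMP systems, spinlock contention causes severe cache line thrashing.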
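A minimal user-space sketch of the idea follows. It is not the kernel's struct zone: struct sketch_zone, its plain int "locks" and the 64-byte SKETCH_CACHE_BYTES are assumptions chosen to show how per-member alignment separates hot fields.

```c
#include <stddef.h>

#define SKETCH_CACHE_BYTES 64  /* assumed line size for this sketch */

/* Hypothetical structure with two hot locks, each on its own cache line. */
struct sketch_zone {
    int lock
        __attribute__((__aligned__(SKETCH_CACHE_BYTES)));
    int lru_lock
        __attribute__((__aligned__(SKETCH_CACHE_BYTES)));
    /* Colder data starts on a third line so it does not share with lru_lock. */
    unsigned long pages
        __attribute__((__aligned__(SKETCH_CACHE_BYTES)));
};
```

The compiler inserts padding after each member, so a CPU spinning on lock does not disturb the line holding lru_lock. The price is memory: this structure takes three full lines for a handful of bytes of data.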
