Kernel Optimization

Hello guys! Has anybody tried to compile and install the CachyOS kernel with AutoFDO and Propeller?
At first I thought this was already included in the AUR linux-cachyos package, but it’s not.

The article “Optimizing the Kernel with AutoFDO on CachyOS” looks outdated to me.

git clone -b 6.12/base-autofdo https://github.com/CachyOS/linux
Cloning into 'linux'...
fatal: Remote branch 6.12/base-autofdo not found in upstream origin
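
Before cloning, it can help to check which branches the remote actually has. A minimal sketch, assuming the repository URL is https://github.com/CachyOS/linux (inferred from the link title in the clone command); `filter_autofdo_branches` is a hypothetical helper, not something shipped by CachyOS:

```shell
# Hypothetical helper: reads `git ls-remote --heads` output on stdin and
# prints only the branch names that mention "autofdo".
filter_autofdo_branches() {
    # Each input line looks like "<sha><TAB>refs/heads/<branch>".
    sed -n 's#^[0-9a-f]*[[:space:]]*refs/heads/##p' | grep -i 'autofdo' || true
}

# Usage (needs network access; the URL is an assumption):
#   git ls-remote --heads https://github.com/CachyOS/linux | filter_autofdo_branches
```

If nothing is printed, no branch with that naming exists any more and the article's branch name is simply stale.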

Would appreciate some instructions on how to do it.

Try the 6.17 kernels. 6.12 may not support AutoFDO; it’s too old.

How do I try that?

I downloaded this one:

But what are the next steps?

The example I gave is from the only article on the topic that I found.

I believe since February the linux-cachyos kernel has had these optimizations. The package you list says that it uses them as well.

Also, not that it matters, but why are you using the AUR instead of the Cachy repos?

There was a bug in August that might still be relevant.

That looks like it was resolved end of August

The AUR package is also official, and the kernel manager uses this AUR package too.

I don’t have any NVIDIA card.

If the kernel is installed via the kernel manager, it doesn’t use an AutoFDO profile and doesn’t apply Propeller optimizations either.

I’m not sure what I can say except that my stock Cachy kernel, installed through the kernel manager, does indeed have AutoFDO and Propeller enabled. Obviously don’t clone an ancient source tree, but I don’t see why the directions on that page and in the PKGBUILD wouldn’t work. Just start from the current kernel.
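
One way to check this on a running kernel is to look for the relevant options in its config. A minimal sketch, assuming the kernel exposes /proc/config.gz and that the option names are CONFIG_AUTOFDO_CLANG and CONFIG_PROPELLER_CLANG (the upstream Kconfig names as far as I know); `report_fdo_config` is a hypothetical helper:

```shell
# Hypothetical helper: reads a kernel config on stdin and reports whether
# the AutoFDO and Propeller options are set to "y".
report_fdo_config() {
    fdo_config=$(cat)
    for fdo_opt in CONFIG_AUTOFDO_CLANG CONFIG_PROPELLER_CLANG; do
        if printf '%s\n' "$fdo_config" | grep -q "^${fdo_opt}=y\$"; then
            echo "${fdo_opt}: enabled"
        else
            echo "${fdo_opt}: not enabled"
        fi
    done
}

# Usage (assumes /proc/config.gz exists on the running kernel):
#   zcat /proc/config.gz | report_fdo_config
```

This only shows that the options were enabled at build time. It says nothing about which profile, if any, was fed into the build.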

The stock kernel is AutoFDO-optimized for some processor. The main idea is to profile the kernel on your own processor and optimize it for exactly that processor.

As for the directions in the PKGBUILD: yes, I need to try them. But they still lack the commands that need to be executed to complete each stage.

For those whom it may concern:
AutoFDO + Propeller + ThinLTO = <1% overall gain (it takes 3 reboots and even more compilations).

I understand that but also I think it’s as much or more about optimizing for your workload, and I’ve personally been fine with Cachy’s presumably well-chosen representative workload there, especially given the expectation of

Thanks for revisiting and coming back with the response. That’s about what I expected which was why I couldn’t really offer much beyond “it’s enabled in github so presumably the kernel and build process isn’t broken”

IMO it’d be more worthwhile if setting up a system with a well-defined workload where I could profile with a known more-representative workload (say a server or other system dedicated to a specific performance-sensitive workload) but I’m also not sure if I’d choose a rolling release there.

3 bash scripts + 1 bash script to automate everything. It’s possible to make the whole process convenient.

I did the same for systemd, which is not optimized in CachyOS: just taking the sources from the Arch Linux repository and compiling them, at least for my processor. The main idea is to see what can really be achieved.

A rolling release might not seem optimal, but at the same time it lets you unleash the real potential.

for example:

Also, I might have done it wrong. It would be helpful if someone clarified:

  1. First kernel build: AutoFDO + debug + (LTO?). LTO is needed for Propeller later, but it’s bad for AutoFDO profiling.

  2. Second kernel build: AutoFDO + AutoFDO profile + debug + ThinLTO (full LTO produced enormously large packages for me, roughly 200 MB larger).

  3. Third kernel build: AutoFDO + AutoFDO profile + no debug + ThinLTO + Propeller.

Can I somehow make the first build without debug for AutoFDO profiling?
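
The three builds can be sketched as sets of PKGBUILD variables. This is only an illustration under assumptions: the variable names come from the PKGBUILD excerpt quoted in this thread, `thin` as a value for `_use_llvm_lto` and the profile name `kernel.afdo` are assumed, and `_propeller=yes` is placed in the second build because the PKGBUILD comments say to enable it at the final AutoFDO compilation. `stage_env` is a hypothetical helper.

```shell
# Hypothetical helper: print the PKGBUILD variables for one of the three builds.
# $1 = stage (1, 2 or 3), $2 = AutoFDO profile file name (assumed default).
stage_env() {
    stage=$1
    profile=${2:-kernel.afdo}
    case "$stage" in
        1)  # first build: kernel with debug info, used only for AutoFDO profiling
            echo "export _autofdo=yes _build_debug=yes _use_llvm_lto=none" ;;
        2)  # second build: apply the AutoFDO profile, keep debug info for Propeller profiling
            echo "export _autofdo=yes _autofdo_profile_name=$profile _build_debug=yes _use_llvm_lto=thin _propeller=yes" ;;
        3)  # third build: apply both profiles, drop debug info
            echo "export _autofdo=yes _autofdo_profile_name=$profile _build_debug=no _use_llvm_lto=thin _propeller=yes _propeller_profiles=yes" ;;
        *)
            echo "usage: stage_env 1|2|3 [profile]" >&2
            return 1 ;;
    esac
}

# Usage:
#   eval "$(stage_env 1)" && makepkg --cleanbuild -sfi --skipinteg
```

Printing the exports instead of setting them directly makes it easy to review what each stage will do before starting a long build.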

this one I don’t get

Even Debian stable is on a reasonably fast kernel at this point. The challenge of a rolling distro is that there are so many moving pieces that something else might change while you’re optimizing elsewhere, particularly with a vaguely binary-centric one like the Arch derivatives. There is too much else changing around your optimizations, IMO. If your goal is to roll out compiler optimizations everywhere and be on the bleeding edge, a source-based distro like Gentoo feels like a better fit. But that’s just my opinion.

Even scripted, the compilation takes time, and for it to be a meaningful net benefit that time needs to be balanced by the performance gains.
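
To make that trade-off concrete, here is a rough break-even calculation. The numbers are hypothetical (three 30-minute builds, a 1% gain), and it assumes the whole gain applies to the workload, which is optimistic; `break_even_hours` is an illustrative helper.

```shell
# Hours of work the machine must do before a speedup repays the build time.
# $1 = total build time in minutes, $2 = speedup in percent (integer).
break_even_hours() {
    # time saved = runtime * gain/100, so runtime = build_time * 100 / gain
    echo $(( $1 * 100 / $2 / 60 ))
}

break_even_hours 90 1   # prints 150
```

With those assumed numbers, the machine has to spend about 150 hours on affected work before the rebuilds pay for themselves, and every kernel update restarts the clock.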

Your gains don’t seem outside of the expected range for an already-optimized kernel.

All I can say is the docs say not to.

# Enable AUTOFDO_CLANG for the first compilation to create a kernel, which can be used for profiling
# Workflow:
# https://cachyos.org/blog/2411-kernel-autofdo/
# 1. Compile Kernel with _autofdo=yes and _build_debug=yes
# 2. Boot the kernel in QEMU or on your system, see Workload
# 3. Profile the kernel and convert the profile, see Generating the Profile for AutoFDO
# 4. Put the profile into the sourcedir
# 5. Run kernel build again with the _autofdo_profile_name path to profile specified
: "${_autofdo:=no}"

# Name for the AutoFDO profile
: "${_autofdo_profile_name:=}"

# Propeller should be applied, after the kernel is optimized with AutoFDO
# Workflow:
# 1. Proceed with above AutoFDO Optimization, but enable at the final compilation also _propeller
# 2. Boot into the AutoFDO Kernel and profile it
# 3. Convert the profile into the propeller profile, example:
# create_llvm_prof --binary=/usr/src/debug/linux-cachyos-rc/vmlinux --profile=propeller.data --format=propeller --propeller_output_module_name --out=propeller_cc_profile.txt --propeller_symorder=propeller_ld_profile.txt
# 4. Place the propeller_cc_profile.txt and propeller_ld_profile.txt into the srcdir
# 5. Enable _propeller_prefix
: "${_propeller:=no}"

# Enable this after the profiles have been generated
: "${_propeller_profiles:=no}"
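
The workflow comments name the profiling and conversion steps without spelling out the commands. Below is a sketch that only prints candidate commands so they can be reviewed first. Assumptions: the perf event is the Intel one as I recall it from the upstream kernel AutoFDO documentation (AMD CPUs need a different event), `--format=extbinary` for the AutoFDO conversion comes from the same documentation, the sample period and the 60-second window are arbitrary, and the output file names are made up. The Propeller conversion line follows the example in the PKGBUILD comments. `profile_cmds` is a hypothetical helper.

```shell
# Hypothetical helper: print (not run) the profiling and conversion commands.
# $1 = phase: "autofdo" or "propeller"; $2 = path to the debug vmlinux.
profile_cmds() {
    phase=$1
    vmlinux=$2
    case "$phase" in
        autofdo|propeller) ;;
        *) echo "usage: profile_cmds autofdo|propeller <vmlinux>" >&2; return 1 ;;
    esac
    # System-wide branch sampling while the workload runs (Intel event, assumed).
    echo "perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c 500009 -o ${phase}.data -- sleep 60"
    if [ "$phase" = autofdo ]; then
        echo "create_llvm_prof --binary=${vmlinux} --profile=${phase}.data --format=extbinary --out=kernel.afdo"
    else
        echo "create_llvm_prof --binary=${vmlinux} --profile=${phase}.data --format=propeller --propeller_output_module_name --out=propeller_cc_profile.txt --propeller_symorder=propeller_ld_profile.txt"
    fi
}

# Usage:
#   profile_cmds autofdo /usr/src/debug/linux-cachyos-rc/vmlinux
```

Run the workload in another terminal while `perf record` is sampling; the resulting files then go into the source directory as described in the workflow comments.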

In fact, building the first kernel with debug and without LTO for AutoFDO profiling made the end result a bit faster.

So if anyone is still interested, the first step is:

export _use_llvm_lto=none
export _processor_opt=native
export _use_lto_suffix=no
export _use_kcfi=yes
export _build_debug=yes
export _autofdo=yes
export _use_gcc_suffix=no

# Run the build
makepkg --cleanbuild -sfi --skipinteg

As for Gentoo, I don’t think it’s fast. It doesn’t even have LTO out of the box.

I tried compiling different CachyOS packages from source (Arch/CachyOS repos): no real gains at all. Even firefox-pure has almost the same performance (it has PGO on board).

@forAUR Did you set makepkg.conf to something like CFLAGS="-march=native -Ofast"?

No, I didn’t edit CachyOS’s default PKGBUILD file.

ing instruction at address: 0xffffffff82324beb with counter sum 28, instruction name: NOOPL
I20251108 21:08:22.559561 52871 llvm_propeller_binary_address_mapper.cc:463] Started reading the binary content from: /usr/src/debug/linux-sakkan//vmlinux
E20251108 21:08:22.594228 52871 create_llvm_prof.cc:238] INTERNAL: Failed to read the LLVM_BB_ADDR_MAP section from /usr/src/debug/linux-sakkan//vmlinux: unable to read SHT_LLVM_BB_ADDR_MAP section with index 61: unsupported SHT_LLVM_BB_ADDR_MAP version: 3.
Is this because I enabled full LTO?

first compilation:
export _use_llvm_lto=none
export _processor_opt=native
export _use_lto_suffix=no
export _use_kcfi=yes
export _build_debug=yes
export _autofdo=yes
export _use_gcc_suffix=no

# Run the build
makepkg --cleanbuild -sfi --skipinteg

I have the same issue. My research says the Propeller toolchain may be outdated, but I’m not sure how to resolve that in the context of CachyOS. I think we’re using the vmlinux provided by CachyOS.

create_llvm_prof.cc:238] INTERNAL: Failed to read the LLVM_BB_ADDR_MAP section from /usr/src/debug/linux-cachyos-lto/vmlinux: unable to read SHT_LLVM_BB_ADDR_MAP section with index 61: unsupported SHT_LLVM_BB_ADDR_MAP version: 3.

Yep, it needs google/llvm-propeller (https://github.com/google/llvm-propeller), and you need to enable ThinLTO.
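
When scripting the conversion step, it can be useful to detect this specific failure instead of continuing with empty profile files. A minimal sketch that pulls the rejected version number out of the `create_llvm_prof` output; the error text is taken from the logs quoted in this thread, and `bb_addr_map_version_error` is a hypothetical helper:

```shell
# Hypothetical helper: scan create_llvm_prof output on stdin for the
# "unsupported SHT_LLVM_BB_ADDR_MAP version" error and print the version number.
bb_addr_map_version_error() {
    sed -n 's/.*unsupported SHT_LLVM_BB_ADDR_MAP version: \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Usage (arguments elided on purpose):
#   create_llvm_prof ... 2>&1 | bb_addr_map_version_error
```

If a number is printed, the installed `create_llvm_prof` does not understand the section format in that vmlinux, which matches the suggestion that a newer llvm-propeller build is needed.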