Kernel Optimization

forAUR · October 8, 2025, 11:12am

Hello guys! Has anybody tried to compile and install the CachyOS kernel with AutoFDO and Propeller?
At first I thought this was already included in the AUR linux-cachyos package, but it’s not.

The “Optimizing the Kernel with AutoFDO on CachyOS” article on the CachyOS blog looks outdated to me.

git clone -b 6.12/base-autofdo https://github.com/CachyOS/linux
Cloning into 'linux'...
fatal: Remote branch 6.12/base-autofdo not found in upstream origin

Would appreciate some instructions on how to do it.

Franck · October 11, 2025, 7:38am

Try the 6.17 kernels; 6.12 may not support AutoFDO, it’s too old.
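
One way to see which branches the repository actually has before cloning is to list the remote heads and filter them. This is only a sketch: it assumes the kernel source lives at https://github.com/CachyOS/linux and that the relevant branch names contain "autofdo", which may no longer be true. The helper name `filter_branches` is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: filter `git ls-remote --heads` output for branches matching a pattern.
# Reads ls-remote output on stdin and prints the bare branch names.
filter_branches() {
  local pattern="$1"
  # Each input line looks like: "<sha><TAB>refs/heads/<branch>"
  sed -n 's#^[0-9a-f]*[[:space:]]*refs/heads/##p' | grep -- "$pattern" || true
}

# Usage (needs network access; "autofdo" is just an example pattern):
#   git ls-remote --heads https://github.com/CachyOS/linux | filter_branches autofdo
```

If nothing is printed, no branch with that name pattern exists and the `git clone -b ...` call will fail the same way as above.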

forAUR · October 12, 2025, 1:21pm

How do I try that?

I downloaded this one:

But what are the next steps?

The example I gave is from the only article on the topic I found.

mattsteg · October 12, 2025, 2:41pm

I believe since February the linux-cachyos kernel has had these optimizations. The package you list says that it uses them as well.

Also not that it matters but why are you using AUR instead of cachy repos?

There was a bug in August that might still be relevant.

github.com/CachyOS/linux-cachyos

Contradictory information - repo readme vs Wiki

opened 09:55AM - 10 Aug 25 UTC

closed 04:26PM - 22 Aug 25 UTC

spinktvis

There's contradictory information on the Wiki vs repo, this not only applies to …the kernel info part and the readme located here, but for example the Wiki gaming section (the Wine part) as well, for which there are a couple of tickets already located and found here https://github.com/CachyOS/CachyOS-PKGBUILDS/issues/779 https://github.com/CachyOS/CachyOS-PKGBUILDS/issues/800

Anyway, back to this, a small example, the Wiki https://wiki.cachyos.org/features/kernel/#why-is-autofdo-not-being-used-for-all-the-other-kernel-variants says only the default linux-cachyos kernel is compiled with FDO vs the info found here in the readme https://github.com/CachyOS/linux-cachyos#compiler-variants which indicates that only linux-cachyos-lto - Clang and Thin LTO, utilizing AutoFDO + Propeller profiling for optimal performance is the only one that has FDO.

So what is the truth? Which kernel is actually compiled with FDO? I must admit that such contradictory information is confusing for new users, especially if one follows Phoronix and reads that FDO + Propellor has been disabled entirely a while back due to the then ongoing issues (not sure how relevant that part still is).

Hopefully the communication between the maintainers and editors of the wiki can be further improved upon to avoid such things, since other than that CachyOS does a lot right the first time!

That looks like it was resolved at the end of August.

forAUR · October 12, 2025, 4:41pm

The AUR is also official, and the kernel manager uses this AUR package too.

I don’t have any NVIDIA card.

If the kernel is installed via the kernel manager, it doesn’t use an AutoFDO profile and doesn’t apply Propeller optimizations either.

mattsteg · October 12, 2025, 5:35pm

I’m not sure what I can say except that my stock Cachy kernel installed through the kernel manager does indeed have AutoFDO and Propeller enabled. Obviously don’t clone an ancient source tree, but I don’t see why the directions on that page and in the PKGBUILD wouldn’t work. Just start from the current kernel.
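
For anyone who wants to check their own kernel, a rough sketch follows. It assumes the build options are named `CONFIG_AUTOFDO_CLANG` and `CONFIG_PROPELLER_CLANG` (the upstream Kconfig names, as far as I know) and that the running kernel exposes `/proc/config.gz`. The helper name `check_fdo_config` is made up. Note that these options only show the kernel was built with AutoFDO/Propeller support switched on; they don’t prove which profile, if any, was applied.

```shell
#!/usr/bin/env bash
# Sketch: report AutoFDO/Propeller build options from a plain-text kernel config.
check_fdo_config() {
  local config_file="$1" opt
  for opt in CONFIG_AUTOFDO_CLANG CONFIG_PROPELLER_CLANG; do
    if grep -q "^${opt}=y" "$config_file"; then
      echo "${opt}: enabled"
    else
      echo "${opt}: not set"
    fi
  done
}

# Usage on a running system (assumes /proc/config.gz is available):
#   zcat /proc/config.gz > /tmp/kconfig && check_fdo_config /tmp/kconfig
```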

forAUR · October 12, 2025, 5:48pm

The stock kernel is AutoFDO’d for some processor… The main idea is to profile the kernel on your own processor and optimize it exactly for that processor.

As for the directions in the PKGBUILD: yes, I need to try them. But they still lack the commands that need to be executed to complete each stage.

forAUR · October 13, 2025, 8:02am

for those whom it may concern:
autofdo + propeller + ThinLTO = <1% overall gain (it takes 3 reboots and even more compilations)
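
For anyone comparing their own runs, a gain like this can be computed from two timings of the same benchmark. A minimal sketch; the helper name `pct_gain` and the sample numbers are made up for illustration and are not my measurements:

```shell
#!/usr/bin/env bash
# Sketch: percentage improvement of an optimized run over a baseline run.
# Both arguments are elapsed times, so lower is better.
pct_gain() {
  awk -v base="$1" -v opt="$2" 'BEGIN { printf "%.2f\n", (base - opt) / base * 100 }'
}

# Example with hypothetical timings in seconds:
#   pct_gain 100 99    # prints 1.00
```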

mattsteg · October 13, 2025, 6:18pm

I understand that but also I think it’s as much or more about optimizing for your workload, and I’ve personally been fine with Cachy’s presumably well-chosen representative workload there, especially given the expectation of

Thanks for revisiting and coming back with the response. That’s about what I expected which was why I couldn’t really offer much beyond “it’s enabled in github so presumably the kernel and build process isn’t broken”

IMO it’d be more worthwhile if setting up a system with a well-defined workload where I could profile with a known more-representative workload (say a server or other system dedicated to a specific performance-sensitive workload) but I’m also not sure if I’d choose a rolling release there.

forAUR · October 13, 2025, 7:46pm

3 bash scripts + 1 bash script to automate everything. It’s possible to make the whole process convenient.

I did the same for systemd, which is not optimized in CachyOS: I just took the sources from the Arch Linux repository and compiled them for my processor. The main idea is to see what can really be achieved.

A rolling release might not seem optimal, but at the same time it allows you to unleash the real potential.

for example:

also I might have done it wrong. it would be helpful if someone clarified:

1. First kernel build: autofdo + debug + (LTO?). LTO is needed for Propeller later, but it’s bad for AutoFDO profiling.
2. Second kernel build: autofdo + autofdo profile + debug + ThinLTO (full LTO produced enormously large packages for me, about 200 MB more).
3. Third kernel build: autofdo + autofdo profile + no debug + ThinLTO + propeller.

Can I somehow make the first build without debug for AutoFDO profiling?

this one I don’t get

mattsteg · October 13, 2025, 9:33pm

Even Debian stable is on a reasonably fast kernel at this point. The challenge of a rolling distro is that there are so many moving pieces that something else might change while you’re optimizing elsewhere, particularly with a vaguely binary-centric one like the Arch derivatives. Too much else is changing around your optimizations, imo. If your desire is to roll out compiler optimizations everywhere and be on the bleeding edge, a source-based distro like Gentoo feels like a better fit. But that’s just my opinion.

Even scripted, the compilation takes time, and for it to be a meaningful net benefit that time needs to be balanced by performance gains.

Your gains don’t seem outside of the expected range for an already-optimized kernel.

All I can say is the docs say not to.

# Enable AUTOFDO_CLANG for the first compilation to create a kernel, which can be used for profiling
# Workflow:
# https://cachyos.org/blog/2411-kernel-autofdo/
# 1. Compile Kernel with _autofdo=yes and _build_debug=yes
# 2. Boot the kernel in QEMU or on your system, see Workload
# 3. Profile the kernel and convert the profile, see Generating the Profile for AutoFDO
# 4. Put the profile into the sourcedir
# 5. Run kernel build again with the _autofdo_profile_name path to profile specified
: "${_autofdo:=no}"

# Name for the AutoFDO profile
: "${_autofdo_profile_name:=}"

# Propeller should be applied, after the kernel is optimized with AutoFDO
# Workflow:
# 1. Proceed with above AutoFDO Optimization, but enable at the final compilation also _propeller
# 2. Boot into the AutoFDO Kernel and profile it
# 3. Convert the profile into the propeller profile, example:
# create_llvm_prof --binary=/usr/src/debug/linux-cachyos-rc/vmlinux --profile=propeller.data --format=propeller --propeller_output_module_name --out=propeller_cc_profile.txt --propeller_symorder=propeller_ld_profile.txt
# 4. Place the propeller_cc_profile.txt and propeller_ld_profile.txt into the srcdir
# 5. Enable _propeller_prefix
: "${_propeller:=no}"

# Enable this after the profiles have been generated
: "${_propeller_profiles:=no}"
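
For steps 2 and 3 of the AutoFDO workflow above, the commands look roughly like the sketch below. This is based on my recollection of the upstream kernel AutoFDO documentation, not on anything in the PKGBUILD: the Intel event name, the sample period, the use of `llvm-profgen`, and the file names are all assumptions to adapt, and AMD systems need a different event. The helper only prints the commands; it does not run them.

```shell
#!/usr/bin/env bash
# Sketch: print (not run) the profiling and conversion commands for AutoFDO.
# Arguments: vmlinux path, perf data file, output profile, optional sample period.
autofdo_profile_cmds() {
  local vmlinux="$1" perf_data="$2" profile="$3" period="${4:-500009}"
  # Branch sampling with LBR on Intel; replace <workload> with your own load.
  echo "perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c ${period} -o ${perf_data} -- <workload>"
  # Convert the perf data into a profile that clang can consume.
  echo "llvm-profgen --kernel --binary=${vmlinux} --perfdata=${perf_data} -o ${profile}"
}

# Usage (paths are examples only):
#   autofdo_profile_cmds /usr/src/debug/linux-cachyos-rc/vmlinux kernel.data kernel.afdo
```

The resulting profile is what `_autofdo_profile_name` would then point to in step 5.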

forAUR · October 13, 2025, 9:52pm

In fact, building the first kernel with debug and without LTO for AutoFDO profiling made the end result a bit faster.

So if someone is still interested, the first step is:

export _use_llvm_lto=none
export _processor_opt=native
export _use_lto_suffix=no
export _use_kcfi=yes
export _build_debug=yes
export _autofdo=yes
export _use_gcc_suffix=no

# Run the build
makepkg --cleanbuild -sfi --skipinteg
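
The later builds only differ in a few variables. Here is a sketch of how the three stages could be expressed, combining the PKGBUILD comments with what I did. The variable names come from the PKGBUILD; the value `thin` for `_use_llvm_lto`, the value `no` for `_build_debug`, and the profile file name `kernel.afdo` are assumptions, and `stage_env` is a made-up helper.

```shell
#!/usr/bin/env bash
# Sketch: print the PKGBUILD variables for each of the three builds.
stage_env() {
  case "$1" in
    # Build 1: instrumentable kernel for AutoFDO profiling, no LTO.
    1) printf '%s\n' _use_llvm_lto=none _build_debug=yes _autofdo=yes ;;
    # Build 2: apply the AutoFDO profile and enable Propeller for profiling.
    2) printf '%s\n' _use_llvm_lto=thin _build_debug=yes _autofdo=yes \
         _autofdo_profile_name=kernel.afdo _propeller=yes ;;
    # Build 3: final kernel with both AutoFDO and Propeller profiles applied.
    3) printf '%s\n' _use_llvm_lto=thin _build_debug=no _autofdo=yes \
         _autofdo_profile_name=kernel.afdo _propeller=yes _propeller_profiles=yes ;;
    *) echo "usage: stage_env 1|2|3" >&2; return 1 ;;
  esac
}

# Usage:
#   export $(stage_env 2) && makepkg --cleanbuild -sfi --skipinteg
```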

As for Gentoo, I don’t think it’s fast. It doesn’t even have LTO out of the box.

I tried compiling different CachyOS packages from source (Arch/CachyOS repos): no real gains at all. Even firefox-pure has almost the same performance (it has PGO on board).

doaxan · October 24, 2025, 3:10pm

@forAUR Did you set makepkg.conf to something like CFLAGS="-march=native -Ofast"?

forAUR · October 24, 2025, 8:03pm

No. I didn’t edit CachyOS’s default PKGBUILD file.

yingzou · November 8, 2025, 1:19pm

ing instruction at address: 0xffffffff82324beb with counter sum 28, instruction name: NOOPL
I20251108 21:08:22.559561 52871 llvm_propeller_binary_address_mapper.cc:463] Started reading the binary content from: /usr/src/debug/linux-sakkan//vmlinux
E20251108 21:08:22.594228 52871 create_llvm_prof.cc:238] INTERNAL: Failed to read the LLVM_BB_ADDR_MAP section from /usr/src/debug/linux-sakkan//vmlinux: unable to read SHT_LLVM_BB_ADDR_MAP section with index 61: unsupported SHT_LLVM_BB_ADDR_MAP version: 3.
Is this because I enabled full LTO?

forAUR · November 9, 2025, 1:01pm

first compilation:
export _use_llvm_lto=none
export _processor_opt=native
export _use_lto_suffix=no
export _use_kcfi=yes
export _build_debug=yes
export _autofdo=yes
export _use_gcc_suffix=no

# Run the build
makepkg --cleanbuild -sfi --skipinteg

infrachris · December 8, 2025, 3:34pm

I have the same issue, my research says the propeller toolchain may be outdated, but I’m not sure how to resolve that in the context of CachyOS. I think we’re using the vmlinux provided by CachyOS.

create_llvm_prof.cc:238] INTERNAL: Failed to read the LLVM_BB_ADDR_MAP section from /usr/src/debug/linux-cachyos-lto/vmlinux: unable to read SHT_LLVM_BB_ADDR_MAP section with index 61: unsupported SHT_LLVM_BB_ADDR_MAP version: 3.
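
When comparing toolchain versions it helps to know exactly which map version the tool rejected. A small sketch that pulls it out of the log; the helper name `bb_addr_map_version` is made up, and the pattern is based only on the error text above:

```shell
#!/usr/bin/env bash
# Sketch: extract the rejected SHT_LLVM_BB_ADDR_MAP version from a
# create_llvm_prof log read on stdin. Prints nothing if no such error is found.
bb_addr_map_version() {
  sed -n 's/.*unsupported SHT_LLVM_BB_ADDR_MAP version: \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Usage (conversion.log is a saved copy of the tool's output):
#   bb_addr_map_version < conversion.log
# To confirm the section exists in the binary at all:
#   llvm-readelf -S /usr/src/debug/linux-cachyos-lto/vmlinux | grep -i bb_addr_map
```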

yingzou · December 9, 2025, 7:03am

Yep, it needs google/llvm-propeller (https://github.com/google/llvm-propeller), and you need to enable thin LTO.