What's the right way to use AutoFDO/BOLT with the CachyOS kernel?

Hi,

I recently found the AutoFDO and LLVM-BOLT patches in the CachyOS patch sources and want to try them out. I was able to compile the kernel with these patches applied and have already disabled KASLR, but I still can't convert the perf data.

From what I've found, this seems to be because the CachyOS kernel enables the relocatable kernel option (CONFIG_RELOCATABLE) by default, and perf2bolt and llvm-profgen don't work with a relocatable kernel. So what's the right way to do this for the CachyOS kernel?

The record command:
perf record -e cycles:u -j any,u -o perf.data -- /opt/geekbench/geekbench6 --cpu

The convert command & output:
./bin/perf2bolt --perfdata=./perf.data -o ./test.perf /usr/src/debug/linux-cachyos-server/vmlinux

PERF2BOLT: Starting data aggregation job for ./perf.data
PERF2BOLT: spawning perf job to read branch events
PERF2BOLT: spawning perf job to read mem events
PERF2BOLT: spawning perf job to read process events
PERF2BOLT: spawning perf job to read task events
BOLT-INFO: Target architecture: x86_64
BOLT-INFO: BOLT version: 64075837b5532108a1fe96a5b158feb7a9025694
BOLT-INFO: Linux kernel binary detected
BOLT-INFO: first alloc address is 0x0
BOLT-INFO: static input executable detected
BOLT-INFO: enabling lite mode
BOLT-WARNING: split function detected on input : do_one_initcall.cold
BOLT-ERROR: symbol seen in the middle of the function srso_untrain_ret/1(*2). Skipping.
BOLT-ERROR: symbol seen in the middle of the function retbleed_untrain_ret/1(*2). Skipping.
BOLT-INFO: pre-processing profile using perf data aggregator
BOLT-INFO: binary build-id is: 5c4e3fbb15b500eaa8fc505305f196147b6eeb13
PERF2BOLT: spawning perf job to read buildid list
PERF2BOLT-WARNING: build-id matched a different file name
PERF2BOLT: waiting for perf task events collection to finish...
PERF2BOLT: parsing perf-script task events output
PERF2BOLT: input binary is associated with 0 PID(s)
PERF2BOLT: waiting for perf events collection to finish...
PERF2BOLT: parse branch events...
PERF2BOLT: read 71227 samples and 2277852 LBR entries
PERF2BOLT: 0 samples (0.0%) were ignored
PERF2BOLT: traces mismatching disassembled function contents: 0 (0.0%)
PERF2BOLT: out of range traces involving unknown regions: 2206658 (100.0%)
PERF2BOLT: waiting for perf mem events collection to finish...
BOLT-INFO: parsed 12288 SMP lock entries
BOLT-INFO: parsed 0 static call entries
BOLT-INFO: parsed 734 exception table entries
BOLT-INFO: parsed 12092 bug table entries
BOLT-INFO: setting --alt-inst-has-padlen=0
BOLT-INFO: setting --alt-inst-feature-size=4
BOLT-INFO: parsed 21111 alternative instruction entries
BOLT-INFO: parsed 476766 ORC entries
BOLT-INFO: parsed 938 PCI fixup entries
BOLT-INFO: parsed 8263 static keys jump entries
BOLT-WARNING: Running parallel work of 0 estimated cost, will switch to trivial scheduling.
PERF2BOLT: processing branch events...
PERF2BOLT: wrote 0 objects and 0 memory objects to ./test.perf
BOLT-INFO: 0 out of 112481 functions in the binary (0.0%) have non-empty execution profile

test.perf is created, but its size is 0.
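A quick way to tell whether a conversion produced any usable profile is to look at two lines of the perf2bolt output: the share of out-of-range traces and the number of functions with a non-empty profile. A minimal sketch, assuming the output was saved to a file (for example by appending 2>&1 | tee perf2bolt.log to the perf2bolt command); the function names are hypothetical helpers, not part of BOLT, and they only match the message formats shown in the log above:

```shell
#!/bin/sh
# Hypothetical helpers for reading a saved perf2bolt log on stdin.
# They only match the message formats printed by the perf2bolt run in this thread.

# Prints the percentage of traces perf2bolt could not map to the input binary.
out_of_range_percent() {
    sed -n 's/.*out of range traces involving unknown regions: [0-9]* (\([0-9.]*\)%).*/\1/p'
}

# Prints how many functions ended up with a non-empty execution profile.
profiled_functions() {
    sed -n 's/.*BOLT-INFO: \([0-9][0-9]*\) out of [0-9][0-9]* functions in the binary.*/\1/p'
}
```

For the run above these would print 100.0 and 0, meaning none of the recorded branches landed inside vmlinux.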

Kernel Config: 33dbe3d

cpuinfo: 779af79

Hey,

I had the same issue, and adding "nokaslr" to the boot command line fixed it.
However, I couldn't get the kernel properly BOLTed when the --split-functions option was used; apart from that, it worked fine.

Unfortunately, that doesn't work for me. I disabled KASLR both on the boot command line and in the kernel config (as you can see, CONFIG_RANDOMIZE_BASE is not set), and the error is still the same.
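To double-check that KASLR really is off for the running kernel, both places mentioned here can be inspected. A minimal sketch; the helper names are hypothetical, and reading the config from /proc/config.gz assumes the kernel was built with CONFIG_IKCONFIG_PROC (otherwise feed it the kernel's .config file instead):

```shell
#!/bin/sh
# Hypothetical helpers to confirm KASLR is disabled.

# Succeeds if the given kernel command line contains the "nokaslr" parameter.
# Usage: cmdline_has_nokaslr "$(cat /proc/cmdline)"
cmdline_has_nokaslr() {
    case " $1 " in
        *" nokaslr "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeeds if the kernel config read on stdin does not enable CONFIG_RANDOMIZE_BASE.
# Usage: zcat /proc/config.gz | config_kaslr_disabled   (needs CONFIG_IKCONFIG_PROC=y)
config_kaslr_disabled() {
    if grep -q '^CONFIG_RANDOMIZE_BASE=y'; then
        return 1
    fi
    return 0
}
```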

That worked fine for me. Are you on llvm-bolt 19 or 18?
Edit: It's best to build it from -git.

Yes, I'm using the latest llvm-bolt 19, which I built myself from the GitHub repo:
/opt/source/tc-build/src/build/bin/llvm-bolt --version

LLVM (http://llvm.org/):
  LLVM version 19.1.0
  Optimized build with assertions.
BOLT revision 64075837b5532108a1fe96a5b158feb7a9025694

  Registered Targets:
    x86    - 32-bit X86: Pentium-Pro and above
    x86-64 - 64-bit X86: EM64T and AMD64

I think your perf command is wrong.

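It needs kernel branches (-j any,k), not user-space ones (:u). With cycles:u and -j any,u, only user-space samples and branches are recorded, so nothing in perf.data falls inside vmlinux (hence the 100% out-of-range traces). See: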
perf record -a -e cycles -j any,k -F 5000 -- /opt/geekbench/geekbench6 --cpu
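A minimal sketch of a record-and-convert wrapper around that command, assuming perf2bolt lives at ./bin/perf2bolt and the unstripped vmlinux is the one under /usr/src/debug (both paths taken from this thread). The function names, the BRANCH_FILTER variable, and the kernel.fdata output name are hypothetical; the guard simply refuses a branch filter that doesn't explicitly ask for kernel branches, which would have caught the original any,u:

```shell
#!/bin/sh
# Sketch of a record-and-convert wrapper for kernel profiles.
# PERF2BOLT and VMLINUX default to the paths used in this thread (assumptions; override as needed).
PERF2BOLT=${PERF2BOLT:-./bin/perf2bolt}
VMLINUX=${VMLINUX:-/usr/src/debug/linux-cachyos-server/vmlinux}
BRANCH_FILTER=${BRANCH_FILTER:-any,k}

# Succeeds if a perf -j branch filter explicitly requests kernel-level branches ("k").
# Simplification: a filter with no privilege level inherits the event's levels,
# which this check does not try to model.
captures_kernel_branches() {
    case ",$1," in
        *,k,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Records system-wide with branch stacks while the given workload runs,
# then converts the samples into a BOLT profile (kernel.fdata).
profile_kernel() {
    if ! captures_kernel_branches "$BRANCH_FILTER"; then
        echo "branch filter '$BRANCH_FILTER' has no kernel level (k)" >&2
        return 1
    fi
    perf record -a -e cycles -j "$BRANCH_FILTER" -F 5000 -o perf.data -- "$@" || return 1
    "$PERF2BOLT" --perfdata=perf.data -o kernel.fdata "$VMLINUX"
}
```

After sourcing the file, something like profile_kernel /opt/geekbench/geekbench6 --cpu would run the whole sequence; system-wide kernel sampling normally needs root or relaxed perf_event_paranoid settings.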

Also, I'm not sure whether the /usr/src/debug approach works. I wrote down somewhere how I did it; let me check.

I was planning to write an article and have started a draft. Check that out.

Thank you, it works!
perf record -a -e cycles -j any,k -F 5000 -- /opt/geekbench/geekbench6 --cpu

But the vmlinux in /usr/lib/modules doesn't work for me, and I'm not sure what's wrong:

./bin/perf2bolt --perfdata=./perf.data -o ./test.perf /usr/lib/modules/6.10.10-1-cachyos-server/build/vmlinux

PERF2BOLT: Starting data aggregation job for ./perf.data
PERF2BOLT: spawning perf job to read branch events
PERF2BOLT: spawning perf job to read mem events
PERF2BOLT: spawning perf job to read process events
PERF2BOLT: spawning perf job to read task events
BOLT-INFO: Target architecture: x86_64
BOLT-INFO: BOLT version: 64075837b5532108a1fe96a5b158feb7a9025694
BOLT-INFO: Linux kernel binary detected
BOLT-INFO: first alloc address is 0x0
BOLT-INFO: static input executable detected
BOLT-INFO: enabling lite mode
BOLT-WARNING: split function detected on input : do_one_initcall.cold
BOLT-ERROR: symbol seen in the middle of the function srso_untrain_ret/1. Skipping.
BOLT-ERROR: symbol seen in the middle of the function retbleed_untrain_ret/1. Skipping.
BOLT-ERROR: input file has split functions but does not have FILE symbols. If the binary was stripped, preserve FILE symbols with --keep-file-symbols strip option

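Yeah, I manually put the unstripped vmlinux there on my system. The one you have appears to be stripped, so it lacks the FILE symbols perf2bolt asks for in that error. Use the debug one then.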
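Before pointing perf2bolt at a vmlinux, it can be checked for the FILE symbols that the error above asks for. A minimal sketch using readelf from binutils; the helper names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helpers: does a vmlinux still carry FILE symbols?
# perf2bolt needs them when the image contains split (.cold) functions.

# Counts FILE-type symbols in `readelf -sW` output read on stdin.
# In that output, field 4 of each symbol row is the symbol Type.
count_file_symbols() {
    awk '$4 == "FILE" { n++ } END { print n + 0 }'
}

# Succeeds if the vmlinux at path $1 has at least one FILE symbol.
has_file_symbols() {
    [ "$(readelf -sW "$1" | count_file_symbols)" -gt 0 ]
}
```

Running has_file_symbols on /usr/lib/modules/6.10.10-1-cachyos-server/build/vmlinux and on /usr/src/debug/linux-cachyos-server/vmlinux should show which of the two perf2bolt can use.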