TrueNAS Core: Saving even more power with Intel Speed Shift
In the previous post we discussed using the powerd daemon to optimize the power consumption of a TrueNAS system based on Intel Xeon E5-2600 v3 (Haswell) and earlier processors, covered the challenges of managing and scaling clocks from the OS, and still ended up with a somewhat unsatisfying solution.
Today, however, a v4 (Broadwell) or newer processor can be had on the used market for pocket change, and mainboards that support v3 often support v4 as well. That makes it a no-brainer upgrade for an aging system: it brings Intel Speed Shift and redefines the approach to power efficiency.
Instead of the operating system actively managing CPU clocks and C-states, the hardware now calls all the shots, while the OS can still provide hints and control the balance between performance and efficiency, even on a per-core basis.
Daemons like powerd are no longer necessary.
BIOS configuration
The items may be named differently on your board; I’ll be using the Supermicro conventions, as Supermicro is the gold standard for TrueNAS systems.
Advanced Power Management Configuration
Head to your system’s BIOS: CPU Configuration → Advanced Power Management Configuration.
On this screen, set Power Technology to Custom and Energy Performance Tuning to Enable. The latter allows the OS to specify the desired performance/power balance; when it is turned on, the Energy Performance BIAS setting is grayed out.
Turn on Energy Efficient Turbo as well:
Advanced Power Management Configuration
--------------------------------------------------------
Power Technology [Custom]
Energy Performance Tuning [Enable]
Energy Performance BIAS setting. [N/A]
Energy Efficient Turbo [Enable]
• CPU P State Control
• CPU HWPM State Control
• CPU C State Control
• CPU T State Control
CPU P State Control
This is where we turn on EIST and set P-State Coordination to HW_ALL, giving the hardware full control:
CPU P State Control
EIST (P-States) [Enable]
Turbo Mode [Enable]
P-State Coordination [HW_ALL]
CPU HWPM State Control
In this section, enable HWPM in Native mode and turn on autonomous C-states. Native mode lets the OS influence the parameters, as opposed to Out of Band (OOB) mode, where, depending on the OS version, the driver may not even load.
CPU HWPM State Control
Enable CPU HWPM [HWPM NATIVE MODE]
CPU Autonomous Cstate [Enable]
CPU C State Control
In this section, enable all the C-states, down to the deepest one:
CPU C State Control
Package C State Limit [C6 (Retention) state]
CPU C3 Report [Enable]
CPU C6 Report [Enable]
Enhanced Halt State (C1E) [Enable]
CPU T State Control
Enable it too.
OS Configuration
Under System → Tunables → Add, add a LOADER tunable that sets machdep.hwpstate_pkg_ctrl to 0. This enables per-core control. We may not need it right away, but it’s a good and recommended default.
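For reference, on stock FreeBSD the equivalent of that LOADER tunable is a line in /boot/loader.conf reading machdep.hwpstate_pkg_ctrl="0". After a reboot it can be worth checking which mode actually took effect. The snippet below is a minimal sketch: it assumes the tunable is exposed as a read-only sysctl of the same name, and describe_pkg_ctrl is a made-up helper name, not part of any tool.

```shell
# Hypothetical helper: translate the tunable's value into plain words.
# Per hwpstate_intel(4), 1 selects package-level control and 0 per-core control.
describe_pkg_ctrl() {
    case "$1" in
        0) echo "per-core control" ;;
        1) echo "package-level control" ;;
        *) echo "unknown value: $1" ;;
    esac
}

# Only query the kernel on FreeBSD-based hosts such as TrueNAS Core.
# Assumption: the tunable is readable through sysctl after boot.
if [ "$(uname -s)" = "FreeBSD" ]; then
    describe_pkg_ctrl "$(sysctl -n machdep.hwpstate_pkg_ctrl)"
fi
```

If this still reports package-level control, the tunable was most likely added with a type other than LOADER, or the system has not been rebooted yet.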
Monitoring
First, ensure that the hwpstate_intel driver is active:
% sysctl dev.cpufreq.0.freq_driver
dev.cpufreq.0.freq_driver: hwpstate_intel0
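If you would rather script this check, for example to run it after a BIOS update, something along these lines might do. It is only a sketch: is_speed_shift_driver is a hypothetical helper, and it simply assumes that any driver name starting with hwpstate_intel means Speed Shift is in use.

```shell
# Hypothetical helper: succeed when a freq_driver value names hwpstate_intel.
is_speed_shift_driver() {
    case "$1" in
        hwpstate_intel*) return 0 ;;
        *) return 1 ;;
    esac
}

# Only query the kernel on FreeBSD-based hosts such as TrueNAS Core.
if [ "$(uname -s)" = "FreeBSD" ]; then
    driver=$(sysctl -n dev.cpufreq.0.freq_driver)
    if is_speed_shift_driver "$driver"; then
        echo "Speed Shift driver active: $driver"
    else
        echo "Unexpected driver '$driver'; recheck the BIOS settings" >&2
    fi
fi
```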
The frequency levels and current frequencies may be somewhat bogus:
% sysctl dev.cpu.{0..7}.freq_levels dev.cpu.{0..7}.freq
dev.cpu.0.freq_levels: 3500/-1
dev.cpu.1.freq_levels: 3500/-1
...
dev.cpu.0.freq: 1197
dev.cpu.1.freq: 1197
...
But that is expected: the OS has little visibility into clock control because the hardware is doing all the work. To monitor it, we can use the Intel PCM tools.
Intel PCM tools
Below is a short guide to building the PCM tools from source: the version installable with pkg is quite old, and we want all the fancy coloring and features.
Create a temporary jail with networking
sudo iocage create -r 13.3-RELEASE -n temp
sudo iocage set dhcp=1 temp
sudo iocage set bpf=1 temp
sudo iocage set vnet=1 temp
sudo iocage console -f temp
Install tools, fetch, and build
Install git and cmake, then fetch and build the PCM tools like so:
env IGNORE_OSVERSION=yes pkg install -y git cmake
git clone --recursive https://github.com/intel/pcm && cd pcm
mkdir -p build && cd build
cmake ..
cmake --build . --parallel 8
Copy the built binary and dependencies
Once this completes, the tools will be in the bin folder. Use ldd to check the dependencies of the pcm tool:
# ldd -f "%p\n" pcm
/usr/lib/libexecinfo.so.1
/usr/lib/libc++.so.1
/lib/libcxxrt.so.1
/lib/libm.so.5
/lib/libgcc_s.so.1
/lib/libthr.so.3
/lib/libc.so.7
/lib/libelf.so.2
Copy them, along with the contents of the bin folder, to some external dataset:
mkdir -p /mnt/tools/pcm
cp -r . /mnt/tools/pcm
cp `ldd -f "%p\n" pcm` /mnt/tools/pcm
The host does not have the required dependencies, which is why we copied them along with the tool. To launch pcm from the host, it’s helpful to create an alias in your ~/.zshrc:
alias pcm="sudo zsh -c 'pushd /mnt/pool1/tools/pcm; LD_LIBRARY_PATH=. ./pcm'"
Once done, exit and destroy the temporary jail
sudo iocage stop temp
sudo iocage destroy temp
Launching pcm produces an endless dump of the current state of the CPU:
Core (SKT) | UTIL | IPC | CFREQ | L3MISS | L2MISS | L3HIT | L2HIT | L3MPI | L2MPI | L3OCC | LMB | RMB | TEMP
0 0 0.27 0.62 3.08 1518 K 4048 K 0.62 0.56 0.0029 0.0078 984 404 0 65
1 0 0.26 0.69 3.14 1592 K 4022 K 0.60 0.57 0.0028 0.0071 3072 445 0 65
2 0 0.26 0.73 2.83 1421 K 3896 K 0.64 0.57 0.0026 0.0071 1272 403 0 66
3 0 0.25 0.74 2.88 1491 K 3888 K 0.62 0.57 0.0028 0.0072 3672 421 0 66
4 0 0.28 0.65 2.97 1434 K 4530 K 0.68 0.53 0.0026 0.0083 936 392 0 66
5 0 0.25 0.70 3.05 1529 K 4053 K 0.62 0.56 0.0029 0.0076 3192 417 0 66
6 0 0.28 0.65 1.20 815 K 1897 K 0.57 0.52 0.0038 0.0087 480 128 0 67
7 0 0.29 0.76 1.20 934 K 2127 K 0.56 0.52 0.0035 0.0080 1440 164 0 67
---------------------------------------------------------------------------------------------------------------
SKT 0 0.27 0.69 2.52 10 M 28 M 0.62 0.56 0.0029 0.0076 15048 2774 0 61
---------------------------------------------------------------------------------------------------------------
TOTAL * 0.27 0.69 2.52 10 M 28 M 0.62 0.56 0.0029 0.0076 N/A N/A N/A N/A
Instructions retired: 3729 M ; Active cycles: 5400 M ; Time (TSC): 3504 Mticks;
Core C-state residencies: C0 (active,non-halted): 26.79 %; C1: 23.90 %; C3: 0.00 %; C6: 49.31 %; C7: 0.00 %;
Package C-state residencies: C0: 64.64 %; C2: 18.17 %; C3: 0.00 %; C6: 17.19 %; C7: 0.00 %;
┌───────────────────────────────────────────────────────────────────────────────┐
Core C-state distribution│0000000000000000000001111111111111111111666666666666666666666666666666666666666│
└───────────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────────┐
Package C-state distribution│000000000000000000000000000000000000000000000000000022222222222222266666666666666│
└─────────────────────────────────────────────────────────────────────────────────┘
---------------------------------------------------------------------------------------------------------------
MEM (GB)->| READ | WRITE | LOCAL | CPU energy | DIMM energy | LLCRDMISSLAT (ns)| UncFREQ (Ghz)|
---------------------------------------------------------------------------------------------------------------
SKT 0 1.42 1.03 100 % 17.91 9.58 164.08 1.18
---------------------------------------------------------------------------------------------------------------
Tuning
Of interest are CFREQ, CPU energy, DIMM energy, Core C-state residencies, and Package C-state residencies.
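To turn the energy columns into something comparable between runs, divide by the sampling interval. This is a rough sketch resting on two assumptions: that pcm reports CPU energy and DIMM energy in joules per sample, and that the sample above covers about one second (the Time (TSC) line shows 3504 Mticks on a 3500 MHz part). The helper name avg_watts is made up for illustration.

```shell
# Hypothetical helper: average power in watts from energy (J) and interval (s).
avg_watts() {
    awk -v joules="$1" -v seconds="$2" 'BEGIN { printf "%.2f\n", joules / seconds }'
}

# Values taken from the sample output above, assuming a 1-second interval.
echo "CPU:  $(avg_watts 17.91 1) W"
echo "DIMM: $(avg_watts 9.58 1) W"
```

With longer pcm intervals, pass the actual interval length as the second argument so the numbers stay comparable.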
It’s useful to run some workload in the jail, such as stress-ng --cpu=N, where N is the number of cores’ worth of work to generate.
Then, by adjusting the epp parameter of the hwpstate_intel driver, you can shift the preference from performance (0) to power (100) and assess the impact.
A handy shortcut to set the epp value for a range of cores:
sudo sysctl dev.hwpstate_intel.{0..7}.epp=80
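The shortcut above hard-codes eight logical CPUs and relies on brace expansion in the shell. A more portable variant could ask the kernel for the CPU count instead. This is a sketch under assumptions: epp_assignments is a hypothetical helper, and it presumes one dev.hwpstate_intel.N device exists per logical CPU reported by hw.ncpu.

```shell
# Hypothetical helper: print one sysctl assignment per logical CPU.
# $1 = number of logical CPUs, $2 = EPP value (0 = performance, 100 = power)
epp_assignments() {
    if [ "$2" -lt 0 ] || [ "$2" -gt 100 ]; then
        echo "epp must be between 0 and 100" >&2
        return 1
    fi
    i=0
    while [ "$i" -lt "$1" ]; do
        echo "dev.hwpstate_intel.$i.epp=$2"
        i=$((i + 1))
    done
}

# On the TrueNAS host, apply the value to every CPU the kernel reports.
# Assumption: hw.ncpu matches the number of hwpstate_intel devices.
if [ "$(uname -s)" = "FreeBSD" ]; then
    epp_assignments "$(sysctl -n hw.ncpu)" 80 | xargs sudo sysctl
fi
```

Because the helper only prints the assignments, you can inspect them first and pipe them to sysctl once they look right.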
Try various values (30-80 is a good starting range) and test various workloads until you are satisfied with the performance/power tradeoff.
Note that the goal is low average power consumption while still ensuring sufficient performance for the most common workloads, including single-threaded Samba. Lower clocks are therefore not always better: sometimes it is more beneficial to complete the job quickly and go to sleep than to stay awake longer while working slowly. Keep an eye on C-state residency for both the package and the individual cores, since you want the CPU to sleep as much as possible; or simply focus on power consumption, because that is what you ultimately want to optimize.
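When comparing several epp values, it can help to pull the residency numbers out of the pcm text rather than eyeballing them. The helper below is a sketch only: residency is a made-up name, and it assumes plain-text pcm output without color escape codes, in the same line format as the sample shown earlier.

```shell
# Hypothetical helper: read pcm text on stdin and print a residency percentage.
# $1 = "Core" or "Package", $2 = C-state name, for example C6
residency() {
    grep "$1 C-state residencies" |
        sed -n "s/.*$2[^:]*: *\([0-9.]*\) %.*/\1/p"
}

# Sample lines copied from the output shown earlier.
sample='Core C-state residencies: C0 (active,non-halted): 26.79 %; C1: 23.90 %; C3: 0.00 %; C6: 49.31 %; C7: 0.00 %;
Package C-state residencies: C0: 64.64 %; C2: 18.17 %; C3: 0.00 %; C6: 17.19 %; C7: 0.00 %;'

echo "Core C6:    $(printf '%s\n' "$sample" | residency Core C6) %"
echo "Package C6: $(printf '%s\n' "$sample" | residency Package C6) %"
```

Higher C6 numbers at the same workload generally mean the CPU is spending more time asleep, which is the trend to look for while tuning.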
References
- Your motherboard’s manual
- 14.6.5. Intel Speed Shift™ of FreeBSD Handbook
- hwpstate_intel driver man page