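Background
I built a server out of secondhand parts: Proxmox VE 8.4 on a white-label motherboard with an i5-12450H. Once it was in use, I noticed the case fans were not spinning at all. The GPU has a modded cooler whose fan also runs off a case fan header, so it sat at 0 RPM, overheated badly, and crashed the machine. Troubleshooting showed that the system could not read any fan information.
- OS: Proxmox VE 8.4 (kernel 6.8.12-9-pve)
- Motherboard: white-label B660 with an onboard i5-12450H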
```
root@pve:~# sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +43.0°C (high = +100.0°C, crit = +100.0°C)
Core 0: +34.0°C (high = +100.0°C, crit = +100.0°C)
Core 4: +37.0°C (high = +100.0°C, crit = +100.0°C)
Core 8: +36.0°C (high = +100.0°C, crit = +100.0°C)
Core 12: +34.0°C (high = +100.0°C, crit = +100.0°C)
Core 20: +33.0°C (high = +100.0°C, crit = +100.0°C)
Core 21: +33.0°C (high = +100.0°C, crit = +100.0°C)
Core 22: +33.0°C (high = +100.0°C, crit = +100.0°C)
Core 23: +33.0°C (high = +100.0°C, crit = +100.0°C)

acpitz-acpi-0
Adapter: ACPI interface
temp1: +27.8°C

nvme-pci-0200
Adapter: PCI adapter
Composite: +39.9°C (low = -273.1°C, high = +89.8°C)
                   (crit = +94.8°C)
Sensor 1: +39.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +39.9°C (low = -273.1°C, high = +65261.8°C)
```
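No motherboard fan readings were detected, and no PWM fan controller (such as a Nuvoton chip) was recognized.
After installing lm-sensors, I first tried loading a few common hardware-monitoring modules:
```
modprobe coretemp
modprobe nct6775
modprobe w83627ehf
```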
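To check the symptom without `sensors`, you can look directly for `fan*_input` files under hwmon. That is where the kernel exposes tachometer readings. Below is a minimal POSIX shell sketch. The function name `list_fans` is my own, and so is the `HWMON_ROOT` override. It defaults to `/sys/class/hwmon` and lets you point the function at a scratch directory. Neither is part of lm-sensors.

```shell
#!/bin/sh
# List every fan tachometer reading exposed through the hwmon sysfs interface.
# HWMON_ROOT is a hypothetical override for testing; it defaults to the real path.
list_fans() {
    root="${HWMON_ROOT:-/sys/class/hwmon}"
    found=0
    for input in "$root"/hwmon*/fan*_input; do
        # If the glob matched nothing it stays literal, so skip that case.
        [ -e "$input" ] || continue
        dir=$(dirname "$input")
        chip=$(cat "$dir/name" 2>/dev/null || echo unknown)
        fan=$(basename "$input" _input)      # e.g. fan3_input -> fan3
        printf '%s %s %s RPM\n' "$chip" "$fan" "$(cat "$input")"
        found=1
    done
    if [ "$found" -eq 0 ]; then
        echo "no fan inputs found under $root" >&2
        return 1
    fi
}
```

Run `list_fans` after each `modprobe`. On this board it should report no fan inputs at all until a driver for the Super I/O chip is loaded.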
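Next step
1. Suspect a less common ITE chip (these are typically what handle fan control)
```
sudo sensors-detect
```
Run it to the end, pressing Enter (or answering yes) at every prompt. At the end it reports the chips it identified and the kernel modules it recommends loading.
This revealed that the board's sensor chip is:
ITE IT8613E Super IO Sensors
However, the Linux kernel currently has no built-in driver for it (sensors-detect lists the driver as `to-be-written`).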
Full output:
```
root@pve:~# sudo sensors-detect
# sensors-detect version 3.6.0
# Board: INTEL AIder Lake PCH B660 M-ATX G660
# Kernel: 6.8.12-9-pve x86_64
# Processor: 12th Gen Intel(R) Core(TM) i5-12450H (6/154/2)

This program will help you determine which kernel modules you need
to load to use lm_sensors most effectively. It is generally safe
and recommended to accept the default answers to all questions,
unless you know what you're doing.

Some south bridges, CPUs or memory controllers contain embedded sensors.
Do you want to scan for them? This is totally safe. (YES/no):
Module cpuid loaded successfully.
Silicon Integrated Systems SIS5595... No
VIA VT82C686 Integrated Sensors... No
VIA VT8231 Integrated Sensors... No
AMD K8 thermal sensors... No
AMD Family 10h thermal sensors... No
AMD Family 11h thermal sensors... No
AMD Family 12h and 14h thermal sensors... No
AMD Family 15h thermal sensors... No
AMD Family 16h thermal sensors... No
AMD Family 17h thermal sensors... No
AMD Family 15h power sensors... No
AMD Family 16h power sensors... No
Hygon Family 18h thermal sensors... No
Intel digital thermal sensor... Success!
    (driver `coretemp')
Intel AMB FB-DIMM thermal sensor... No
Intel 5500/5520/X58 thermal sensor... No
VIA C7 thermal sensor... No
VIA Nano thermal sensor... No

Some Super I/O chips contain embedded sensors. We have to write to
standard I/O ports to probe them. This is usually safe.
Do you want to scan for Super I/O sensors? (YES/no):
Probing for Super-I/O at 0x2e/0x2f
Trying family `National Semiconductor/ITE'... No
Trying family `SMSC'... No
Trying family `VIA/Winbond/Nuvoton/Fintek'... No
Trying family `ITE'... Yes
Found `ITE IT8613E Super IO Sensors' Success!
    (address 0xa30, driver `to-be-written')
Probing for Super-I/O at 0x4e/0x4f
Trying family `National Semiconductor/ITE'... No
Trying family `SMSC'... No
Trying family `VIA/Winbond/Nuvoton/Fintek'... No
Trying family `ITE'... No

Some systems (mainly servers) implement IPMI, a set of common interfaces
through which system health data may be retrieved, amongst other things.
We first try to get the information from SMBIOS. If we don't find it
there, we have to read from arbitrary I/O ports to probe for such
interfaces. This is normally safe. Do you want to scan for IPMI
interfaces? (YES/no):
Probing for `IPMI BMC KCS' at 0xca0... No
Probing for `IPMI BMC SMIC' at 0xca8... No

Some hardware monitoring chips are accessible through the ISA I/O ports.
We have to write to arbitrary I/O ports to probe them. This is usually
safe though. Yes, you do have ISA I/O ports even if you do not have any
ISA slots! Do you want to scan the ISA I/O ports? (yes/NO):

Lastly, we can probe the I2C/SMBus adapters for connected hardware
monitoring devices. This is the most risky part, and while it works
reasonably well on most systems, it has been reported to cause trouble
on some systems.
Do you want to probe the I2C/SMBus adapters now? (YES/no):
Found unknown SMBus adapter 8086:51a3 at 0000:00:1f.4.
Sorry, no supported PCI bus adapters found.
Next adapter: SMBus I801 adapter at efa0 (i2c-0)
Do you want to scan it? (yes/NO/selectively):
Next adapter: i915 gmbus dpa (i2c-1)
Do you want to scan it? (yes/NO/selectively):
Next adapter: i915 gmbus dpb (i2c-2)
Do you want to scan it? (yes/NO/selectively):
Next adapter: i915 gmbus dpc (i2c-3)
Do you want to scan it? (yes/NO/selectively):
Next adapter: i915 gmbus tc1 (i2c-4)
Do you want to scan it? (yes/NO/selectively):
Next adapter: i915 gmbus tc2 (i2c-5)
Do you want to scan it? (yes/NO/selectively):
Next adapter: i915 gmbus tc3 (i2c-6)
Do you want to scan it? (yes/NO/selectively):
Next adapter: i915 gmbus tc4 (i2c-7)
Do you want to scan it? (yes/NO/selectively):
Next adapter: i915 gmbus tc5 (i2c-8)
Do you want to scan it? (yes/NO/selectively):
Next adapter: i915 gmbus tc6 (i2c-9)
Do you want to scan it? (yes/NO/selectively):
Next adapter: AUX B/DDI B/PHY B (i2c-10)
Do you want to scan it? (yes/NO/selectively):
Next adapter: NVIDIA i2c adapter 1 at 1:00.0 (i2c-11)
Do you want to scan it? (yes/NO/selectively):
Next adapter: NVIDIA i2c adapter 2 at 1:00.0 (i2c-12)
Do you want to scan it? (yes/NO/selectively):

Now follows a summary of the probes I have just done.
Just press ENTER to continue:

Driver `coretemp':
  * Chip `Intel digital thermal sensor' (confidence: 9)

Driver `to-be-written':
  * ISA bus, address 0xa30
    Chip `ITE IT8613E Super IO Sensors' (confidence: 9)

Note: there is no driver for ITE IT8613E Super IO Sensors yet.
Check https://hwmon.wiki.kernel.org/device_support_status for updates.

To load everything that is needed, add this to /etc/modules:
#----cut here----
# Chip drivers
coretemp
#----cut here----
If you have some drivers built into your kernel, the list above will
contain too many modules. Skip the appropriate ones!

Do you want to add these lines automatically to /etc/modules? (yes/NO)

Unloading cpuid... OK
```
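Solution: build a third-party driver that supports the IT8613E
The mainline kernel does not support this chip, but the community maintains an unofficial it87 module that does. I used frankcrawford's it87 driver, which supports the chip, and installed it through DKMS: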
1. Install the build dependencies and clone the repository
```
apt update
apt install -y dkms git build-essential linux-headers-$(uname -r)
cd /usr/src
git clone https://github.com/frankcrawford/it87.git
cd it87
```
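2. Build and install the driver module through DKMS
```
root@pve:/usr/src/it87# make dkms-install
cat dkms-install.sh >dkms-install
chmod a+x dkms-install
root@pve:/usr/src/it87# ./dkms-install
```
Here `make dkms-install` appears to fall back on make's built-in rule for `.sh` files (hence the `cat`/`chmod` lines it echoes). That rule simply creates an executable copy named `dkms-install`, so running `./dkms-install.sh` directly should be equivalent. The script installs the module and registers it with DKMS, so the module is rebuilt automatically after kernel updates, as long as headers for the new kernel are installed.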
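3. Load the module with parameters
```
modprobe it87 force_id=0x8613 ignore_resource_conflict=1
```
- `force_id=0x8613`: force the driver to treat the chip as an IT8613E
- `ignore_resource_conflict=1`: ignore resource conflicts with ACPI (use only if needed)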
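Parameters given on the `modprobe` command line are not remembered across reboots. A common way to persist them has two parts. An `options` line in `/etc/modprobe.d/` is applied by modprobe every time `it87` is loaded. An entry in `/etc/modules-load.d/` makes systemd load the module at boot. The sketch below assumes the file name `it87.conf`, which is my choice. It also uses a hypothetical `ETC_DIR` override so you can try it on a scratch directory first.

```shell
#!/bin/sh
# Persist the it87 module parameters used above.
# ETC_DIR is a hypothetical override (default /etc) so the function can be
# tried against a scratch directory before touching the real system.
persist_it87_options() {
    etc="${ETC_DIR:-/etc}"
    mkdir -p "$etc/modprobe.d" "$etc/modules-load.d" || return 1
    # modprobe applies this line whenever it87 is loaded, including at boot.
    printf 'options it87 force_id=0x8613 ignore_resource_conflict=1\n' \
        > "$etc/modprobe.d/it87.conf" || return 1
    # systemd-modules-load.service loads every module listed here at boot.
    printf 'it87\n' > "$etc/modules-load.d/it87.conf"
}
```

Run `persist_it87_options` once as root. After that, a plain `modprobe it87` should pick up the same parameters.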
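4. Verify the sensor and fan readings
```
sensors
```
It turned out that, by default, this white-label board only lets the system control the CPU fan. The SysFan header reads 0; it shows up as fan3 and in practice also cools the GPU area. The BIOS has no setting for it either.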
```
root@pve:/usr/src/it87# sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +43.0°C (high = +100.0°C, crit = +100.0°C)
Core 0: +35.0°C (high = +100.0°C, crit = +100.0°C)
Core 4: +33.0°C (high = +100.0°C, crit = +100.0°C)
Core 8: +36.0°C (high = +100.0°C, crit = +100.0°C)
Core 12: +33.0°C (high = +100.0°C, crit = +100.0°C)
Core 20: +33.0°C (high = +100.0°C, crit = +100.0°C)
Core 21: +33.0°C (high = +100.0°C, crit = +100.0°C)
Core 22: +33.0°C (high = +100.0°C, crit = +100.0°C)
Core 23: +33.0°C (high = +100.0°C, crit = +100.0°C)

acpitz-acpi-0
Adapter: ACPI interface
temp1: +27.8°C

it8613-isa-0a30
Adapter: ISA adapter
in0: 715.00 mV (min = +2.67 V, max = +0.49 V) ALARM
in1: 1.36 V (min = +1.79 V, max = +2.44 V) ALARM
in2: 2.04 V (min = +2.72 V, max = +0.68 V) ALARM
in4: 2.01 V (min = +2.22 V, max = +1.36 V) ALARM
in5: 1.83 V (min = +1.49 V, max = +2.23 V)
3VSB: 3.32 V (min = +3.81 V, max = +3.78 V) ALARM
Vbat: 2.77 V
+3.3V: 3.34 V
fan2: 768 RPM (min = 11 RPM)
fan3: 0 RPM (min = 250 RPM) ALARM
temp1: +33.0°C (low = +89.0°C, high = -41.0°C) ALARM sensor = Intel PECI
temp2: -91.0°C (low = +49.0°C, high = +53.0°C)
temp3: -80.0°C (low = +126.0°C, high = +125.0°C)
intrusion0: ALARM

nvme-pci-0200
Adapter: PCI adapter
Composite: +39.9°C (low = -273.1°C, high = +89.8°C)
                   (crit = +94.8°C)
Sensor 1: +39.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +39.9°C (low = -273.1°C, high = +65261.8°C)
```
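5. Find the PWM control channel
List the available control channels:
```
ls /sys/class/hwmon/hwmon*/pwm*
cat /sys/class/hwmon/hwmon*/name
```
This confirms that the control path for it8613 is /sys/class/hwmon/hwmon3/ (the number may differ on other systems).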
Full output:
```
root@pve:/usr/src/it87# ls /sys/class/hwmon/hwmon*/pwm*
/sys/class/hwmon/hwmon3/pwm2 /sys/class/hwmon/hwmon3/pwm3 /sys/class/hwmon/hwmon3/pwm5
/sys/class/hwmon/hwmon3/pwm2_auto_channels_temp /sys/class/hwmon/hwmon3/pwm3_auto_channels_temp /sys/class/hwmon/hwmon3/pwm5_auto_channels_temp
/sys/class/hwmon/hwmon3/pwm2_auto_point1_temp /sys/class/hwmon/hwmon3/pwm3_auto_point1_temp /sys/class/hwmon/hwmon3/pwm5_auto_point1_temp
/sys/class/hwmon/hwmon3/pwm2_auto_point1_temp_hyst /sys/class/hwmon/hwmon3/pwm3_auto_point1_temp_hyst /sys/class/hwmon/hwmon3/pwm5_auto_point1_temp_hyst
/sys/class/hwmon/hwmon3/pwm2_auto_point2_temp /sys/class/hwmon/hwmon3/pwm3_auto_point2_temp /sys/class/hwmon/hwmon3/pwm5_auto_point2_temp
/sys/class/hwmon/hwmon3/pwm2_auto_point3_temp /sys/class/hwmon/hwmon3/pwm3_auto_point3_temp /sys/class/hwmon/hwmon3/pwm5_auto_point3_temp
/sys/class/hwmon/hwmon3/pwm2_auto_slope /sys/class/hwmon/hwmon3/pwm3_auto_slope /sys/class/hwmon/hwmon3/pwm5_auto_slope
/sys/class/hwmon/hwmon3/pwm2_auto_start /sys/class/hwmon/hwmon3/pwm3_auto_start /sys/class/hwmon/hwmon3/pwm5_enable
/sys/class/hwmon/hwmon3/pwm2_enable /sys/class/hwmon/hwmon3/pwm3_enable /sys/class/hwmon/hwmon3/pwm5_freq
/sys/class/hwmon/hwmon3/pwm2_freq /sys/class/hwmon/hwmon3/pwm3_freq

root@pve:/usr/src/it87# cat /sys/class/hwmon/hwmon*/name
acpitz
nvme
coretemp
it8613
```
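/sys/class/hwmon/hwmon* matches a set of directories (hwmon0, hwmon1, hwmon2, …), each corresponding to one hardware-monitoring device. The name file in each directory identifies that device's driver or chip.
Because the shell expands the glob in sorted order, the names are printed in hwmon order. This is lexical order, so on a system with more devices hwmon10 would sort before hwmon2. Here: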
Directory | Name (contents of the name file) |
---|---|
/sys/class/hwmon/hwmon0 | acpitz |
/sys/class/hwmon/hwmon1 | nvme |
/sys/class/hwmon/hwmon2 | coretemp |
/sys/class/hwmon/hwmon3 | it8613 ✅ |
From this we can infer:
it8613 => /sys/class/hwmon/hwmon3/
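hwmon numbers are assigned in the order devices register, so `hwmon3` is not guaranteed to stay the same across reboots or hardware changes. You can avoid hardcoding it with a small helper that looks the directory up by its `name` file. This is a sketch. `hwmon_by_name` and the `HWMON_ROOT` override are hypothetical names, not part of any tool.

```shell
#!/bin/sh
# Print the hwmon directory whose "name" file matches $1 (e.g. it8613).
# HWMON_ROOT is a hypothetical override for testing (default /sys/class/hwmon).
hwmon_by_name() {
    want="$1"
    root="${HWMON_ROOT:-/sys/class/hwmon}"
    for dir in "$root"/hwmon*; do
        [ -r "$dir/name" ] || continue
        if [ "$(cat "$dir/name")" = "$want" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    echo "no hwmon device named '$want' under $root" >&2
    return 1
}
```

For example: `HW=$(hwmon_by_name it8613) && echo "$HW"`. The resulting path can then replace the literal `/sys/class/hwmon/hwmon3` in the commands that follow.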
Identify which channel drives SysFan (by testing)
After the driver loaded, several pwmX control channels appeared under /sys/class/hwmon/hwmon3/:
Channel | Enable command | Duty-cycle test | Observed result | Conclusion |
---|---|---|---|---|
pwm2 | echo 1 > pwm2_enable | echo 235 > pwm2 | No response | Not it |
pwm3 | echo 1 > pwm3_enable | echo 235 > pwm3 | Fan speeds up immediately (GPU/SysFan area) | ✅ Works |
pwm5 | echo 1 > pwm5_enable | echo 235 > pwm5 | No response | Not it |
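The test commands are below. Writing 1 to pwmN_enable switches that channel to manual control.
```
# Test pwm2, pwm3 and pwm5
echo 1 > /sys/class/hwmon/hwmon3/pwm2_enable
echo 128 > /sys/class/hwmon/hwmon3/pwm2
echo 1 > /sys/class/hwmon/hwmon3/pwm3_enable
echo 128 > /sys/class/hwmon/hwmon3/pwm3
echo 1 > /sys/class/hwmon/hwmon3/pwm5_enable
echo 128 > /sys/class/hwmon/hwmon3/pwm5
# 128 is roughly a 50% duty cycle and 255 is full speed; run one pair at a time,
# and whichever fan reacts tells you which channel it is
```
The test confirmed that pwm3 is the working channel.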
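If several channels are switched at once, it is hard to tell which one reacted. A slightly more careful approach tests one channel at a time and restores the original mode and duty cycle afterwards, so that a channel is not left pinned at a test value. The sketch below is a hypothetical helper that does this. It relies on the usual hwmon convention that `pwmN_enable` = 1 means manual control. It restores the duty cycle before the mode, because the it87 driver may refuse writes to `pwmN` while a channel is in automatic mode.

```shell
#!/bin/sh
# Probe one PWM channel: switch it to manual mode, apply a test duty cycle,
# wait, print every fan reading, then restore the original state.
# Arguments: hwmon directory, channel number, test value (0-255).
# PROBE_WAIT is a hypothetical knob for how long to let the fan spin up.
probe_pwm_channel() {
    dir="$1"; ch="$2"; val="$3"
    pwm="$dir/pwm$ch"
    [ -w "$pwm" ] && [ -w "${pwm}_enable" ] || { echo "cannot write $pwm" >&2; return 1; }

    # Remember the current mode and duty cycle so they can be put back.
    old_enable=$(cat "${pwm}_enable")
    old_val=$(cat "$pwm")

    echo 1 > "${pwm}_enable"        # 1 = manual control in the hwmon ABI
    echo "$val" > "$pwm"
    sleep "${PROBE_WAIT:-5}"

    for f in "$dir"/fan*_input; do
        [ -e "$f" ] && printf '%s: %s RPM\n' "$(basename "$f" _input)" "$(cat "$f")"
    done

    # Restore the duty cycle while still in manual mode, then the original mode.
    echo "$old_val" > "$pwm"
    echo "$old_enable" > "${pwm}_enable"
}
```

Usage, as root: `probe_pwm_channel /sys/class/hwmon/hwmon3 3 235`, then repeat for 2 and 5. The channel whose fan reading changes, or whose fan audibly speeds up, is the one you want.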
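6. Load the driver and set the fan speed automatically at boot
Create a systemd service file:
```
sudo nano /etc/systemd/system/sysfan.service
```
Add the following contents:
```
[Unit]
Description=Set sysfan PWM speed
# Do not use After=multi-user.target here: combined with
# WantedBy=multi-user.target it creates an ordering cycle.
After=systemd-modules-load.service

[Service]
Type=oneshot
ExecStart=/usr/local/bin/set-sysfan.sh

[Install]
WantedBy=multi-user.target
```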
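Create the script that the service runs:
```
sudo nano /usr/local/bin/set-sysfan.sh
```
Add the following:
```
#!/bin/bash
# Load the driver with the same parameters as in step 3; a bare
# "modprobe it87" may not detect the IT8613E without force_id.
modprobe it87 force_id=0x8613 ignore_resource_conflict=1
sleep 1   # give the hwmon device a moment to appear
# hwmon3 is what this system assigned to it8613; verify it on yours.
echo 1 > /sys/class/hwmon/hwmon3/pwm3_enable   # manual control
echo 100 > /sys/class/hwmon/hwmon3/pwm3        # fixed duty cycle: about 100/255
```
Make it executable:
```
chmod +x /usr/local/bin/set-sysfan.sh
```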
Enable the service so that the fan is set to 100/255 automatically at every boot:
```
systemctl daemon-reexec
systemctl daemon-reload
systemctl enable sysfan.service
systemctl start sysfan.service
```
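After a reboot, you can confirm that the service actually applied the setting with a small read-back helper. It prints the channel's mode, duty cycle and matching tachometer reading. This is a sketch: `pwm_status` is a hypothetical name. Pairing `fanN_input` with `pwmN` is an assumption that held for fan3/pwm3 on this board but is not guaranteed in general.

```shell
#!/bin/sh
# Print mode, duty cycle and tachometer reading for one PWM channel, e.g.
#   pwm_status /sys/class/hwmon/hwmon3 3
# Assumes fanN_input belongs to the same header as pwmN (true here, not always).
pwm_status() {
    dir="$1"; ch="$2"
    for f in "pwm${ch}_enable" "pwm$ch"; do
        [ -r "$dir/$f" ] || { echo "missing $dir/$f" >&2; return 1; }
    done
    rpm="n/a"
    [ -r "$dir/fan${ch}_input" ] && rpm="$(cat "$dir/fan${ch}_input") RPM"
    printf 'pwm%s: enable=%s duty=%s/255 fan%s=%s\n' \
        "$ch" "$(cat "$dir/pwm${ch}_enable")" "$(cat "$dir/pwm$ch")" "$ch" "$rpm"
}
```

If the service ran, `pwm_status /sys/class/hwmon/hwmon3 3` should show `enable=1` and `duty=100/255`.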
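7. Optional: a script for manual speed control
Create a small helper script for adjusting the fan speed by hand:
```
sudo nano /usr/local/bin/fan-speed.sh
```
Add the following contents:
```
#!/bin/bash
# Usage: fan-speed.sh 120   (duty cycle, 0-255)
speed=$1
# Reject anything that is not an integer between 0 and 255.
if ! [[ "$speed" =~ ^[0-9]+$ ]] || (( 10#$speed > 255 )); then
    echo "usage: $0 <0-255>" >&2; exit 1
fi
echo 1 > /sys/class/hwmon/hwmon3/pwm3_enable   # manual control
echo "$speed" > /sys/class/hwmon/hwmon3/pwm3
```
Make it executable:
```
chmod +x /usr/local/bin/fan-speed.sh
```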
Usage:
```
fan-speed.sh 180   # higher speed
fan-speed.sh 100   # lower speed
```
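`fan-speed.sh` takes the raw 0-255 duty value. If you prefer to think in percentages, a hypothetical helper can do the scaling with integer arithmetic, rounding to the nearest step. With it, 50% maps to 128 (the test value used earlier), and the fixed value of 100 used by the service works out to about 39%.

```shell
#!/bin/sh
# Convert a duty-cycle percentage (0-100) to the 0-255 scale used by pwmN.
# Rounds to the nearest integer and rejects anything outside 0-100.
pct_to_pwm() {
    case "$1" in
        ''|*[!0-9]*|0[0-9]*) echo "expected an integer 0-100: $1" >&2; return 1 ;;
    esac
    if [ "$1" -gt 100 ]; then
        echo "out of range: $1" >&2
        return 1
    fi
    echo $(( ($1 * 255 + 50) / 100 ))
}
```

After sourcing the function, run `fan-speed.sh "$(pct_to_pwm 70)"` to set roughly 70%.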
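Summary
This white-label board uses the uncommon IT8613E chip, which Linux does not support out of the box. I had to build the third-party it87 driver and identify the PWM channel by hand. In the end, a systemd service plus a couple of bash scripts give full control over SysFan, which is enough to manage cooling for the GPU area.
PS: I have not built a more complete fan-control policy yet, since this SysFan also has to cool the GPU. That said, this T10 draws 150 W, and at 100/255 it stays reliably below 50 °C, so for now I am running the fan at a fixed speed.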