Difference between revisions of "Talk:Problem with fan noise"

From ThinkWiki

Revision as of 18:22, 27 November 2005

Problem with fan noise on R51 1829 L7G (ATI M9)

On my R51 the fan is behaving like this:

  • > 45C -> fan on;
  • < 38C -> fan off.

By using cpufreq + laptop_mode + Xorg DynamicClocks + WiFi power management, I can get the fan to stop from time to time, but only for about 3 minutes (the time it takes to warm up from 38 C to 45 C). The cool-down cycle takes 20 minutes in the best case.

I know about the 'ibm_acpi experimental=1' trick, but in my opinion it is not very useful: nobody can guarantee that a temperature greater than 45 C will not damage the laptop, and at the same time the transition time is very short (the laptop gets hot fast without the fan).

Thinkpad T42 Radeon Mobility M7

When Xorg is running, the fan is always on and pretty loud! Setting DynamicClocks does not help.

It's clear that the GPU is the problem on the ThinkPad:

After 10 minutes with the fan off, temperatures: 44 47 33 52 32 -128 24 -128

1: CPU 2: Mini PCI Module 3: HDD 4: GPU 5: Battery 6: N/A 7: Battery 8: N/A
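To make a raw reading easier to compare against the list above, here is a rough sketch that pairs each value with its sensor name. The function name label_temps is made up for illustration, the name list simply follows the T42 mapping quoted above (other models may differ), and -128 is treated as "no sensor" because it lines up with the N/A slots.

```shell
#!/bin/sh
# Sketch: label each value of an ibm_acpi thermal line with a sensor name.
# The name list is the T42 mapping from this page (assumption for other models).

label_temps() {
    # $1: a line such as "temperatures: 44 47 33 52 32 -128 24 -128"
    names="CPU MiniPCI HDD GPU Battery N/A Battery N/A"
    # Drop the "temperatures:" prefix and split the rest into $1, $2, ...
    set -- ${1#temperatures:}
    for name in $names
    do
        [ $# -gt 0 ] || break
        if [ "$1" = "-128" ]; then
            # -128 shows up in the slots listed as N/A
            echo "$name: not present"
        else
            echo "$name: $1 C"
        fi
        shift
    done
}

# Usage on a live system (needs the ibm_acpi module loaded):
#   label_temps "$(cat /proc/acpi/ibm/thermal)"
```

With the reading quoted above, the fourth line of output is "GPU: 52 C", which is the value that stands out.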

Controlling the fan speed would be really cool!

What is the maximum temperature not to cross?


Word on the 'net is that 85 degrees is the max operating temp for most of the Intel chips. I've seen some high 70's all the time (just put it on carpet for awhile and play some quake3 :). I wouldn't let your processor get much higher than 85...


Older versions of xorg (e.g. 6.7.0) don't seem to be able to use the DynamicClocks option even when it's set in xorg.conf. Search the log to find out whether it's really used.
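One possible way to search the log is sketched below. The function name find_dynamic_clocks is hypothetical, /var/log/Xorg.0.log is only a common default location, and since the exact wording of the driver's message isn't given here, the pattern is deliberately loose.

```shell
#!/bin/sh
# Sketch: print the X server log lines that mention dynamic clocks.
# The driver's exact message text is an unknown, so match loosely (assumption).

find_dynamic_clocks() {
    # $1: path to the X server log (defaults to a common location)
    grep -i 'dynamic *clock' "${1:-/var/log/Xorg.0.log}"
}

# Usage:
#   find_dynamic_clocks                 # default log path
#   find_dynamic_clocks /path/to/log    # explicit log path
```

If nothing is printed, the option was probably never picked up by the driver.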

Thinkpad R32 with Radeon Mobility M6

Updating xorg-x11 from 6.7.0 to 6.8.2 and using Speedstep (with the ondemand module in this case) helped cool the system down significantly:

  • before updating the CPU was ~62 C in idle state, and got very near the critical temperature (72 C) during heavy load - I even got some freezes because of the heat ;)
  • after the update the CPU is ~54 C in idle state, and still gets to about 68 C while under heavy load

The second sensor (which may be the GPU) is somehow fixed to 50 C (maybe a bug?)

The fan on the R32 is behaving like this:

  • > 61 -> fan in state 2 (quite noisy)
  • < 55 -> fan in state 1 (less noisy :) )

But I remember that with my old SuSE distribution (kernel 2.4.16, apm and some old x11 version) the fan actually stopped completely from time to time.

Concerning the maximum temperature of the CPU, I found that the critical temperature on the R32 for the CPU sensor is 72 C (using # cat /proc/acpi/thermal_zone/THM0/trip_points )
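For use in a script, the critical value can be pulled out of that file. This is only a sketch: critical_temp is a made-up name, and it assumes the file contains a line of the form "critical (S5):   72 C", which may not hold on every kernel.

```shell
#!/bin/sh
# Sketch: extract the critical trip point (in degrees C) from a trip_points file.
# Assumes a line like "critical (S5):           72 C" (format is an assumption).

critical_temp() {
    # $1: path to a trip_points file
    sed -n 's/^critical[^:]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1"
}

# Usage on a live system:
#   critical_temp /proc/acpi/thermal_zone/THM0/trip_points
```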

Fan Control script: safer version

ibm_acpi works well on my R50 and R51. But to rely on it completely, I modified the script in two ways:

1. It catches various signals and turns the fan on before it quits

2. It turns off the fan under very strict conditions, leaving it on when unexpected errors occur.

Here is my script:

#!/bin/sh

# July 2005, Erik Groeneveld, erik@cq2.nl
# More conservative and safer version.
# It makes sure the fan is on in case of errors
# and only turns it off when all temperatures are OK.

IBM_ACPI=/proc/acpi/ibm
THERMOMETER=$IBM_ACPI/thermal
FAN=$IBM_ACPI/fan
MAXTRIPPOINT=65
MINTRIPPOINT=60
TRIPPOINT=$MINTRIPPOINT

echo "fancontrol: Thermometer: $THERMOMETER, Fan: $FAN"
echo "fancontrol: Current $(cat $THERMOMETER)"
echo "fancontrol: Controlling temperatures between $MINTRIPPOINT and $MAXTRIPPOINT degrees."

# Make sure the fan is turned on when the script is interrupted or terminated.
# KILL and STOP cannot be caught, so they are not listed here.
trap "echo enable > $FAN; exit 0" HUP INT ABRT QUIT SEGV TERM

while true
do
       temperatures=$(sed s/temperatures:// < $THERMOMETER)
       result=
       for temp in $temperatures
       do
               # Unused sensors read -128 and therefore always pass.
               test "$temp" -le "$TRIPPOINT" && result=$result.Ok
       done
       # Disable the fan only if all 8 readings are at or below the trip point;
       # a failed or incomplete read leaves the fan enabled.
       if [ "$result" = ".Ok.Ok.Ok.Ok.Ok.Ok.Ok.Ok" ]; then
               command=disable
               TRIPPOINT=$MAXTRIPPOINT
       else
               command=enable
               TRIPPOINT=$MINTRIPPOINT
       fi
       echo $command > $FAN
       # Temperature ramps up quickly, so pick this not too large:
       sleep 5
done

I added this script to the other ones. Don't wonder about my talk edits; I didn't realize I was on the talk page. Wyrfel 01:48, 13 Aug 2005 (CEST)


X41

Same fan problem here on the X41. Once it starts it won't stop (unless it is _very_ cold outside). Undervolting the CPU doesn't help - still the same problem.

Fan speed control?

Only the X31 and X40 have an ACPI method for controlling the fan speed (this is why ibm_acpi provides this functionality just for these models).

What will happen if we take the "FANS" method from the X40 DSDT, paste it into an iasl-disassembled DSDT of (say) a T43, recompile it and tell the kernel to use the patched DSDT? ibm_acpi will present the functionality, but it may or may not work.

--Thinker 16:16, 28 Sep 2005 (CEST)

Any risk of damaging the hardware when doing this? E.g. what happens if the system overheats: will the CPU be destroyed, or does it automatically switch off? As I've just bought a new X41 I don't want to take any stupid risks, but otherwise I'd say let's try it out.

--gst Thu Sep 29 18:14:13 CEST 2005

I think Intel CPUs have some built-in thermal protection, but I'd hate to test it. And of course, any fiddling with the hardware at this level might damage it. That said, when the CPU is mostly idle it keeps a reasonable temperature even when the fan is disabled, so as long as you keep an eye on both the CPU usage meter and /proc/acpi/ibm/thermal, things should be pretty safe temperature-wise. For extra safety you can force the CPU to its lowest speed via /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq. --Thinker 18:33, 29 Sep 2005 (CEST)
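A minimal sketch of that extra safety step: it copies the lowest supported frequency into scaling_max_freq. The function name cap_cpu_to_min is made up, the directory is a parameter so it can be pointed elsewhere, and it assumes the cpufreq driver exposes cpuinfo_min_freq next to scaling_max_freq. Writing to sysfs needs root.

```shell
#!/bin/sh
# Sketch: pin the CPU to its lowest speed by capping scaling_max_freq.
# Assumes cpuinfo_min_freq exists in the same cpufreq directory.

cap_cpu_to_min() {
    # $1: cpufreq directory (defaults to cpu0's)
    dir=${1:-/sys/devices/system/cpu/cpu0/cpufreq}
    min=$(cat "$dir/cpuinfo_min_freq") || return 1
    echo "$min" > "$dir/scaling_max_freq"
}

# Usage (as root):
#   cap_cpu_to_min
```

To undo it, write the value from cpuinfo_max_freq back into scaling_max_freq the same way.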

Further discussion

I've just found a very interesting thread regarding the same issue on HP notebooks. IMO it provides a lot of insight into heat/fan problems in general; the URL is: http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=853249 The posts by the HP engineer "Andy Fisher" are especially interesting. IBM should be able to provide the same BIOS fix as HP did (maybe I should have bought an HP notebook instead of a Thinkpad?).

I've also contacted IBM/Lenovo support via the website about the fan issue. Maybe it would help if others did this as well (especially people who bought larger quantities) so that this issue is taken seriously by Lenovo. Is there already any official response to this problem?

--gst Thu Sep 29 19:40:34 CEST 2005


Two of the changes mentioned by the HP engineer make perfect sense here: raise the low trip points and make speed transition gradual. Oh, and get rid of the annoying beat pattern (a brief speed pulse every few seconds) it sometimes gets into!

But from our perspective, what would probably be best is to do the whole thing in software, providing the flexibility for personal preferences and smart decisions. The hardware would only enforce emergency override or throttle/shutdown for extreme temperatures. Then we could do cute things like having a software daemon lower the thresholds in a noisy environment (as judged using the built-in microphone) or when the laptop is on the user's lap (as judged by the built-in accelerometers).

--Thinker 18:47, 30 Sep 2005 (CEST)

I noticed that on my T43 the fan is usually in one of two modes, low speed (around 3300 RPM, triggered around CPU=47deg) and medium speed (around 4100 RPM, can't figure out the trip condition). The former is nearly inaudible, but the latter is quite noticeable in the absence of strong background noises.

Now, the problem is that once it has tripped into medium speed, it usually never comes back to low speed until the next reboot. So once it happens, to quiet things down I can only run one of the fan-disabling scripts given here. But with a disabled fan the T43 is not thermally stable, so it will spend its time moving back and forth between the hysteresis thresholds, i.e., toggling between 4100 RPM and 0 RPM every few minutes. This is quite silly and annoying, when staying at low speed would be both more stable and quieter.

I hope someone will find a way to control the fan speed, or at least to reset the embedded controller's hysteresis state.

--Thinker 10:29, 6 Oct 2005 (CEST)

When you change e.g. the power scheme in Windows, or eject the Thinkpad from the docking station, the controller's state seems to be reset. At least on the X41 the fan stops until the temperature reaches the start threshold again some minutes later. So it should be doable. --85.124.171.70

That's good. But just like a bunch of other functions (e.g., controlling the battery charge threshold), it probably uses low-level undocumented proprietary interfaces which are very hard to figure out without the help of IBM/Lenovo, who are in denial about the whole thing. --Thinker 01:40, 16 Oct 2005 (CEST)


Works fine with APM instead of ACPI?

On my X41 the fan starts after about 10 minutes of use and doesn't stop (unless it is rather cold in my room, and even then it runs most of the time ;). A friend of mine who also has an X41 (though another model) and who uses NetBSD and APM doesn't experience this problem. He claims that the fan only comes on if the system is not idle. So either it is colder in his room, or his X41 model doesn't have this flaw, or APM uses different thresholds than ACPI.

  • Then why not just try the acpi=off kernel parameter and see what happens? --Thinker 18:14, 30 Sep 2005 (CEST)

I currently don't have physical access to the X41. Will try in a few days.

Rewiring the fan?

Since IBM/Lenovo shows no intention of fixing their embedded controller firmware or releasing its specs, how about getting the embedded controller out of the loop? I'd be happy as a clam if my fan was hard-wired to work at a constant 3000RPM, with temperatures kept at bay in software through CPU frequency control.

Assuming the fan has the standard 3-wire connector, we can probably keep the sensor and ground wires untouched, and rewire the positive wire to some nearby current source of appropriate voltage (through a resistor, for fine-tuning). The trick would be to find an easily tappable source that can handle an extra 2W and has the appropriate voltage (i.e., just slightly higher than what the fan needs to rotate at that RPM, so we don't waste too much energy in the resistor). Any idea what the typical fan voltages are and what would be an appropriate hookup point?

--Thinker 01:59, 16 Oct 2005 (CEST)


Secret sensor and the cause of fan always on

On my T43, ecdump offsets 0xC0-0xC2 seem to include 3 more temperature sensors that are not seen in /proc/acpi/ibm/thermal:

# cat /proc/acpi/ibm/thermal
temperatures:   44 41 33 42 33 -128 30 -128
# perl -ne 'm/^EC 0xc0: .(..) .(..) .(..) / or next; print hex($1)." ".hex($2)." ".hex($3)."\n"' < /proc/acpi/ibm/ecdump
40 48 43

Note the "48" entry (EC offset 0xC1). Something's pretty hot even at full speed (level 7, 4700RPM). This sensor increases very quickly when the system starts (in fact, faster than anything else when the CPU is undervolted and fglrx is in maximum powersaving).
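For anyone who wants these values in a shell script rather than through perl, here is a rough equivalent of the one-liner above. The function name ec_hidden_temps is made up; it relies on the same line layout the perl regex expects ("EC 0xc0:" followed by one marker character and two hex digits per byte) and on printf accepting 0x-prefixed numbers.

```shell
#!/bin/sh
# Sketch: decode EC offsets 0xC0-0xC2 from an ecdump listing read on stdin.
# Mirrors the perl regex: a space, one marker character, then two hex digits.

ec_hidden_temps() {
    hex=$(sed -n 's/^EC 0xc0: .\(..\) .\(..\) .\(..\) .*/\1 \2 \3/p')
    [ -n "$hex" ] || return 1
    out=
    for h in $hex
    do
        # printf converts a 0x-prefixed hex value to decimal
        out="$out $(printf '%d' "0x$h")"
    done
    echo "${out# }"
}

# Usage on a live system (needs ibm_acpi with ecdump enabled):
#   ec_hidden_temps < /proc/acpi/ibm/ecdump
```

A line whose first three bytes are 28 30 2b decodes to "40 48 43", matching the perl output above.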

Now, note this: the fan kicks up from low speed to medium speed whenever this sensor reaches 46 degrees, even if no other sensor changes; and this seems to usually be the first trigger encountered. Moreover, this sensor hovers around 47-48 degrees even on an idle machine. Taken together, this fully explains the "fan always on" behavior: a previously-unnoticed sensor that's always hot.

Any idea what this sensor is? It seems correlated with WiFi: there's a 2deg difference when I toggle /sys/bus/pci/drivers/ipw2200/*/rf_kill (without ever being associated so this shouldn't affect anything else), and heavy WiFi data transfer increases temperature by several more degrees. This suggests the sensor is located in or close to the mini-PCI slot (i.e., under the touchpad). That region is indeed often hot to the touch. But why would the mini-PCI slot get so hot? Could it be the southbridge, which sits under the mini-PCI slot with no heatsink and poor ventilation? Can anyone correlate this sensor with other specific activity, or with blocking of specific ventilation holes, or with cooling of specific components? If it is the mini-PCI slot, note that the operating temperature range of the Intel 2200BG is 0-80 deg.

Caveat: this is my experience with a T43 after undervolting the CPU and activating maximal GPU powersaving using fglrx. It could be that for other people, other components are the first to trigger. But either way, those are 3 temperature sensors we didn't know of and they're used by the Embedded Controller's algorithm.

--Thinker 16:20, 20 Nov 2005 (CET)


At the moment I am experimenting with controlling the fan on Windows XP with a self-written tool on a T43 (Model 2668 97G). Having found the information about the secret sensors here, I built these into the program. It seems that after starting my cooled (placed outside) T43 the 0xC1 sensor indeed rises fastest, but it also cools down quite quickly, especially if the CPU is cool too. I have seen it hotter than the CPU but not much cooler, so it is probably a small chip connected to the cooling element of the CPU.

The values at 0xC0 and 0xC2 also seem to show temperature values here, while 0xC4 is always at 128.

First experiments indicate that as long as all the temperature values are below 43°C the Thinkpad comes up with no fan and stays that way. (The fan control register at EC offset 0x2F is set to 0x80; see the bottom of the patch for controlling fan speed page for a description of this register.) If 43°C is reached on the 0xC1 sensor, the fan kicks in at low speed, while 43°C on the CPU does not activate the fan. With regard to the CPU the kick-in seems to be around 48°C.

Once the fan is on, it goes off again if all the sensors drop to around 38°C or lower (the value may not be precise). But that hardly ever happens on its own; for tests I placed it outside in cold weather.
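Written out as code, the behaviour observed so far looks roughly like the sketch below. It is only a model of these observations, not the EC's actual algorithm: next_fan_state is a made-up name, only the CPU and 0xC1 sensors are considered, and the 48/43/38 figures are the approximate values from above.

```shell
#!/bin/sh
# Sketch: model of the observed T43 fan on/off behaviour (not the real EC logic).
# Thresholds are the approximate values reported above.

next_fan_state() {
    # $1: current state (on|off), $2: CPU temperature, $3: 0xC1 temperature
    state=$1 cpu=$2 c1=$3
    if [ "$cpu" -ge 48 ] || [ "$c1" -ge 43 ]; then
        echo on
    elif [ "$cpu" -le 38 ] && [ "$c1" -le 38 ]; then
        echo off
    else
        # Between the thresholds the fan keeps its previous state (hysteresis)
        echo "$state"
    fi
}

# Usage:
#   next_fan_state off 45 43    # 0xC1 reaches 43, fan starts
#   next_fan_state on 40 40     # still above 38, fan stays on
```

The wide gap between the start and stop thresholds is why a running fan rarely stops without external cooling.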

On forums.thinkpad.com there is a discussion by users who experimented with physically cooling the north- and southbridge without success. In a different thread there, a user claimed that he worked with a couple of Thinkpads and silenced them by turning off unused devices, WLAN being among them.

With the XP WLAN device disabled the temperature on 0xC1 stays around 41°C here even if there is heavy activity on the CPU. It rises as soon as the WLAN device is enabled but hardly gets any hotter than 44°C. But I also could not make it get hot at all when running on battery, and the reading there more or less follows the value of the CPU.

Bottom line on my T43 (2668 97G): Fan kicks in for CPU around 48°C or 0xC1 at 43°C and then never goes off again unless you use external cooling. The 0xC1 sensor could be related to WLAN (I'm not really sure about it) and/or is probably placed near the CPU. It could also have something to do with running the machine on AC rather than battery.

-- Shimdoax - 2005-11-27


Shimdoax, you said "I have seen it hotter than the CPU but not much cooler, so it is probably a small chip connected to the cooling element of the CPU", but also "the temperature on 0xC1 stays around 41°C here even if there is heavy activity on the CPU". It follows that your CPU is never much hotter than 41°C, which I find unlikely... Anyway, on my T43, sensor 0xC1 is correlated with the CPU but very slightly; it is more correlated with the GPU, but not very much either.

I suspect that sensor 0xC1 sits on the system board under the touchpad, since this is consistent with all of the following:

  • In idle with wireless off, sensor 0xC1 has roughly the same temperature as the GPU (which is adjacent on the system board, under the spacebar and TrackPoint buttons).
  • Correlation with the WLAN card activity (which is sandwiched between the system board and the touchpad).
  • Quick warm-up (the southbridge is also on the system board under the touchpad, and has no heat spreader).
  • Negligible effect of fan speed on 0xC1 temperature (the touchpad area is cramped and lacks decent ventilation, hence has negligible air flow).
  • When I place a 12cm-by-12cm pad of thick thermally isolating material (a folded fleece blanket...) under the touchpad, 0xC1 temperature consistently rises by 2-3 degrees (and cools back when I remove the pad); other sensors seem unaffected.

If this is indeed the case, it's hard to see what can be done (other than using a fan control script with an increased threshold for this sensor). It looks like IBM/Lenovo counted on this area being passively cooled through the bottom of the case - see how the bottom of the laptop is designed to allow air flow under the front quarter? However, once the desk under the laptop has warmed up (or if air flow is blocked, as when the laptop is sitting on the top of a lap), things just cook up. The mods (http://forum.thinkpads.com/viewtopic.php?t=14580) which thermally connect the southbridge to the GPU cooling assembly might improve things a bit, but on my system sensor 0xC1 isn't much hotter than the GPU anyway. Maybe ventilation can be improved by letting in more air through the speaker grills on the front - does anyone know what things look like under the very front of the palmrest? This won't solve "fan always on" since it will help only when the fan is on, but it may let the fan run at a lower speed.
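As for a fan control script with a separate threshold for this sensor, the check could look something like this sketch. The function name all_within_limits and the limit values in the usage comment are made up for illustration; any missing or unreadable value counts as a failure, so the fan would stay on in that case.

```shell
#!/bin/sh
# Sketch: test each temperature against its own limit instead of one trip point.
# Returns 0 only if every reading is at or below the limit in the same position.

all_within_limits() {
    # $1: space-separated temperatures, $2: space-separated limits (same order)
    temps=$1
    set -- $2
    for temp in $temps
    do
        # A missing limit is treated as a failure, so the fan stays on
        [ $# -gt 0 ] || return 1
        # A non-numeric reading makes the test fail, which is also the safe side
        [ "$temp" -le "$1" ] || return 1
        shift
    done
    return 0
}

# Usage (hypothetical limits; the last value stands for sensor 0xC1):
#   all_within_limits "44 41 33 48" "60 60 60 65" && echo disable
```

The ibm_acpi readings and the decoded ecdump values would have to be joined into one list before the call.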

BTW, Shimdoax, how are you monitoring/controlling the EC under Windows?

--Thinker 18:22, 27 Nov 2005 (CET)