Difference between revisions of "Talk:Problem with fan noise"

From ThinkWiki
Revision as of 16:10, 6 December 2005

Problem with fan noise on R51 1829 L7G (ATI M9)

On my R51 the fan is behaving like this:

  • > 45C -> fan on;
  • < 38C -> fan off.

By using cpufreq + laptop_mode + Xorg DynamicClocks + WiFi power management, I can get the fan to stop from time to time, but only for about 3 minutes (the time it takes to go from 38 C to 45 C). The cool-down cycle takes 20 minutes in the best case.

I know about the 'ibm_acpi experimental=1' trick, but in my opinion it is not very useful: nobody can guarantee that a temperature greater than 45 C will not damage the laptop, and at the same time the transition time is very short (the laptop gets hot fast without the fan).

Thinkpad T42 Radeon Mobility M7

When Xorg is running, the fan is always on and pretty loud! Setting DynamicClocks does not help.

It's clear that the GPU is the problem on this Thinkpad:

After 10 minutes with the fan off, temperatures: 44 47 33 52 32 -128 24 -128

1: CPU 2: Mini PCI Module 3: HDD 4: GPU 5: Battery 6: N/A 7: Battery 8: N/A
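For readability, the raw line can be paired with the sensor names listed above. This is only a sketch: the function name label_temps is made up for illustration, the labels are the ones reported for this T42 and may not hold on other models, and -128 is treated as "no sensor at this position".

```shell
#!/bin/sh
# Print "position label: value" for each sensor in an ibm_acpi thermal line.
# Labels follow the T42 list above; other models may differ (assumption).
label_temps() {
    # $1: a line such as "temperatures:   44 47 33 52 32 -128 24 -128"
    set -- $(printf '%s\n' "$1" | sed 's/^temperatures://')
    i=1
    for label in CPU "Mini PCI" HDD GPU Battery N/A Battery N/A; do
        [ $# -gt 0 ] || break
        # -128 is skipped: it is assumed to mean "no sensor here"
        [ "$1" = "-128" ] || echo "$i $label: $1"
        shift
        i=$((i + 1))
    done
}
```

On a live system it could be called as: label_temps "$(cat /proc/acpi/ibm/thermal)"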

Controlling the fan speed would be really cool!

What is the maximum temperature not to cross?


Word on the 'net is that 85 degrees is the max operating temp for most of the Intel chips. I've seen some high 70's all the time (just put it on carpet for awhile and play some quake3 :). I wouldn't let your processor get much higher than 85...


Older versions of xorg (i.e. 6.7.0) don't seem to be able to use the DynamicClocks option although it's set in the xorg.conf. Search the log to find out if it's really used.
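A rough way to do that log search is sketched below. The function name check_dynamic_clocks is hypothetical, the log path is left as a parameter (it is commonly /var/log/Xorg.0.log, but that is an assumption about your setup), and the check relies on the X server flagging ignored options with the words "is not used".

```shell
#!/bin/sh
# Classify how an X server log treats the DynamicClocks option.
# $1: path to the log file (often /var/log/Xorg.0.log, an assumption).
check_dynamic_clocks() {
    lines=$(grep -i 'DynamicClocks' "$1" 2>/dev/null)
    if [ -z "$lines" ]; then
        echo "no mention"      # option never reached the driver
    elif printf '%s\n' "$lines" | grep -q 'is not used'; then
        echo "ignored"         # server reported the option as unused
    else
        echo "accepted"        # mentioned and not flagged as unused
    fi
}
```

"accepted" only means the option was not rejected; whether it actually lowers the temperature still has to be measured.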

Thinkpad R32 with Radeon Mobility M6

Updating xorg-x11 from 6.7.0 to 6.8.2 and using Speedstep (with the ondemand module in this case) helped cooling the system down significantly:

  • before updating the CPU was ~62 C in idle state, and got very near the critical temperature (72 C) during heavy load - I even got some freezes because of the heat ;)
  • after the update the CPU is ~54 C in idle state, and still gets to about 68 C while under heavy load

The second sensor (which may be the GPU) is somehow fixed to 50 C (maybe a bug?)

The fan on the R32 is behaving like this:

  • > 61 -> fan in state 2 (quite noisy)
  • < 55 -> fan in state 1 (less noisy :) )

But I remember that with my old SuSE distribution (kernel 2.4.16, apm and some old x11 version) the fan actually stopped completely from time to time.

Concerning the maximum temperature of the CPU, I found that the critical temperature on the R32 for the CPU sensor is 72 C (using # cat /proc/acpi/thermal_zone/THM0/trip_points )
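To use that value in a script, the critical trip point can be pulled out of the file with sed. This is a sketch assuming the line looks like "critical (S5):   72 C"; the function name critical_temp is invented here, and the exact layout may vary between kernel versions.

```shell
#!/bin/sh
# Read trip_points text on stdin and print the critical temperature.
# Assumes a line of the form "critical (S5):    72 C".
critical_temp() {
    sed -n 's/^critical[^:]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}
```

Example call: critical_temp < /proc/acpi/thermal_zone/THM0/trip_points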

Fan control script: a safer version

ibm_acpi works well on my R50 and R51. But to rely on it completely, I modified the script in two ways:

1. It catches various signals and turns the fan on before it quits.

2. It turns off the fan only under very strict conditions, leaving it on when unexpected errors occur.

Here is my script:

#!/bin/sh

# july 2005 Erik Groeneveld, erik@cq2.nl
# More conservative and safer version
# It makes sure the fan is on in case of errors
# and only turns it off when all temps are ok.

IBM_ACPI=/proc/acpi/ibm
THERMOMETER=$IBM_ACPI/thermal
FAN=$IBM_ACPI/fan
MAXTRIPPOINT=65
MINTRIPPOINT=60
TRIPPOINT=$MINTRIPPOINT

echo fancontrol: Thermometer: $THERMOMETER, Fan: $FAN
echo fancontrol: Current `cat $THERMOMETER`
echo fancontrol: Controlling temperatures between $MINTRIPPOINT and $MAXTRIPPOINT degrees.

# Make sure the fan is turned on when the script is interrupted or terminated.
# (KILL and STOP cannot be trapped, so they are not listed.)
trap "echo enable > $FAN; exit 0" HUP INT QUIT ABRT SEGV TERM

while [ 1 ];
do
       command=enable
       temperatures=`sed s/temperatures:// < $THERMOMETER`
       result=
       for temp in $temperatures
       do
               test $temp -le $TRIPPOINT && result=$result.Ok
       done
       if [ "$result" = ".Ok.Ok.Ok.Ok.Ok.Ok.Ok.Ok" ]; then
               command=disable
               TRIPPOINT=$MAXTRIPPOINT
       else
               command=enable
               TRIPPOINT=$MINTRIPPOINT
       fi
       echo $command > $FAN
       # Temperature ramps up quickly, so pick this not too large:
       sleep 5
done

I added this script to the other ones. Don't wonder about my talk edits; I didn't realize I was on the talk page. Wyrfel 01:48, 13 Aug 2005 (CEST)


X41

Same fan problem here on the X41. Once it starts it won't stop (unless it is _very_ cold outside). Undervolting the CPU doesn't help - still the same problem.

Fan speed control?

Only the X31 and X40 have an ACPI method for controlling the fan speed (this is why ibm_acpi provides this functionality just for these models).

What will happen if we take the "FANS" method from the X40 DSDT, paste it into an iasl-disassembled DSDT of (say) a T43, recompile it and tell the kernel to use the patched DSDT? ibm_acpi will present the functionality, but it may or may not work.

--Thinker 16:16, 28 Sep 2005 (CEST)

Any risk of damaging the hardware when doing this? E.g. what happens if the system overheats: will the CPU be destroyed, or does it automatically switch off? As I've just bought a new X41 I don't want to take any stupid risks, but otherwise I'd say let's try it out.

--gst Thu Sep 29 18:14:13 CEST 2005

I think Intel CPUs have some built-in thermal protection, but I'd hate to test it. And of course, any fiddling with the hardware at this level might damage it. That said, when the CPU is mostly idle it keeps a reasonable temperature even when the fan is disabled, so as long as you keep an eye on both the CPU usage meter and /proc/acpi/ibm/thermal, things should be pretty safe temperature-wise. For extra safety you can force the CPU to its lowest speed via /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq. --Thinker 18:33, 29 Sep 2005 (CEST)
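The scaling_max_freq suggestion could look like this. It is a sketch only: the function name cap_to_min_freq is made up, it needs root on a real system, and it assumes the cpufreq directory also exposes cpuinfo_min_freq next to scaling_max_freq.

```shell
#!/bin/sh
# Cap the CPU at its lowest supported frequency.
# $1: cpufreq directory, e.g. /sys/devices/system/cpu/cpu0/cpufreq
cap_to_min_freq() {
    dir=$1
    # Lowest frequency the hardware reports (in kHz)
    min=$(cat "$dir/cpuinfo_min_freq") || return 1
    # Forbid the governor from going above that frequency
    echo "$min" > "$dir/scaling_max_freq"
}
```

To undo it, write the value from cpuinfo_max_freq back into scaling_max_freq.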

Further discussion

I've just found a very interesting thread regarding the same issue on HP notebooks. IMO it provides a lot of insight into heat/fan problems in general; the URL is: http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=853249 The posts by the HP engineer "Andy Fisher" are especially interesting. IBM should be able to provide the same BIOS fix as HP did (maybe I should have bought an HP notebook instead of a Thinkpad?).

I've also contacted IBM/Lenovo support via the website about the fan issue. Maybe it helps when others do this as well (especially people who bought larger quantities) so that this issue is taken seriously by Lenovo. Is there any official response to this problem yet?

--gst Thu Sep 29 19:40:34 CEST 2005


Two of the changes mentioned by the HP engineer make perfect sense here: raise the low trip points and make speed transition gradual. Oh, and get rid of the annoying beat pattern (a brief speed pulse every few seconds) it sometimes gets into!

But from our perspective, what would probably be best is to do the whole thing in software, providing the flexibility for personal preferences and smart decisions. The hardware would only enforce emergency override or throttle/shutdown for extreme temperatures. Then we could do cute things like having a software daemon lower the thresholds in a noisy environment (as judged using the built-in microphone) or when the laptop is on the user's lap (as judged by the built-in accelerometers).

--Thinker 18:47, 30 Sep 2005 (CEST)

I noticed that on my T43 the fan is usually in one of two modes, low speed (around 3300 RPM, triggered around CPU=47deg) and medium speed (around 4100 RPM, can't figure out the trip condition). The former is nearly inaudible, but the latter is quite noticeable in the absence of strong background noises.

Now, the problem is that once it has tripped into medium speed, it usually never comes back to low speed until the next reboot. So once it happens, to quiet things down I can only run one of the fan-disabling scripts given here. But with a disabled fan the T43 is not thermally stable, so it will spend its time moving back and forth between the hysteresis thresholds, i.e., toggling between 4100 RPM and 0 RPM every few minutes. This is quite silly and annoying, when staying at low speed would be both more stable and more quiet.

I hope someone will find a way to control the fan speed, or at least to reset the embedded controller's hysteresis state.

--Thinker 10:29, 6 Oct 2005 (CEST)

When you change e.g. the Energy Scheme in Windows or eject the Thinkpad from the docking station, it seems that the controller's state is reset. At least on the X41 the fan stops until the temperature reaches the start threshold some minutes later. So it should be doable. --85.124.171.70

That's good. But just like a bunch of other functions (e.g., controlling the battery charge threshold), it probably uses low-level undocumented proprietary interfaces which are very hard to figure out without the help of IBM/Lenovo, who are in denial about the whole thing. --Thinker 01:40, 16 Oct 2005 (CEST)


Works fine with APM instead of ACPI?

On my X41 the fan starts after about 10 minutes of use and doesn't stop (unless it is rather cold in my room, and even then it runs most of the time ;) A friend of mine who also has an X41 (though a different model) and who uses NetBSD with APM doesn't experience this problem. He claims that the fan only comes on if the system is not idle. So either it is colder in his room, or his X41 model doesn't have this flaw, or APM uses different thresholds than ACPI.

  • Then why not just try the acpi=off kernel parameter and see what happens? --Thinker 18:14, 30 Sep 2005 (CEST)

I currently don't have physical access to the X41. Will try in a few days.

Rewiring the fan?

Since IBM/Lenovo shows no intention of fixing their embedded controller firmware or releasing its specs, how about getting the embedded controller out of the loop? I'd be happy as a clam if my fan was hard-wired to work at a constant 3000RPM, with temperatures kept at bay in software through CPU frequency control.

Assuming the fan has the standard 3-wire connector, we can probably keep the sensor and ground wires untouched, and rewire the positive wire to some nearby current source of appropriate voltage (through a resistor, for fine-tuning). The trick would be to find an easily tappable source that can handle an extra 2W and has the appropriate voltage (i.e., just slightly higher than what the fan needs to rotate at that RPM, so we don't waste too much energy in the resistor). Any idea what the typical fan voltages are and what would be an appropriate hookup point?

--Thinker 01:59, 16 Oct 2005 (CEST)


Secret sensor and the cause of fan always on

On my T43, ecdump offsets 0xC0-0xC2 seem to include 3 more temperature sensors that are not seen in /proc/acpi/ibm/thermal:

# cat /proc/acpi/ibm/thermal
temperatures:   44 41 33 42 33 -128 30 -128
# perl -ne 'm/^EC 0xc0: .(..) .(..) .(..) / or next; print hex($1)." ".hex($2)." ".hex($3)."\n"' < /proc/acpi/ibm/ecdump
40 48 43

Note the "48" entry (EC offset 0xC1). Something's pretty hot even at full fan speed (level 7, 4700RPM). This sensor's reading increases very quickly when the system starts (in fact, faster than anything else when the CPU is undervolted and fglrx is in maximum powersaving).

Now, note this: the fan kicks up from low speed to medium speed whenever this sensor reaches 46 degrees, even if no other sensor changes; and this seems to usually be the first trigger encountered. Moreover, this sensor hovers around 47-48 degrees even on an idle machine. Taken together, this fully explains the "fan always on" behavior: a previously-unnoticed sensor that's always hot.

Any idea what this sensor is? It seems correlated with WiFi: there's a 2deg difference when I toggle /sys/bus/pci/drivers/ipw2200/*/rf_kill (without ever being associated, so this shouldn't affect anything else), and heavy WiFi data transfer increases the temperature by several more degrees. This suggests the sensor is located in or close to the mini-PCI slot (i.e., under the touchpad). That region is indeed often hot to the touch. But why would the mini-PCI slot get so hot? Could it be the southbridge, which sits under the mini-PCI slot with no heatsink and poor ventilation? Can anyone correlate this sensor with other specific activity, or with blocking of specific ventilation holes, or with cooling of specific components? And if it is the mini-PCI card itself: the operating temperature range of the Intel 2200BG is 0-80 deg.

Caveat: this is my experience with a T43 after undervolting the CPU and activating maximal GPU powersaving using fglrx. It could be that for other people, other components are the first to trigger. But either way, those are 3 temperature sensors we didn't know of and they're used by the Embedded Controller's algorithm.
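For anyone who wants to watch a single EC offset without perl, here is a sketch of the same readout in plain sh. The function name ec_byte is hypothetical, and the line layout is an assumption inferred from the regular expression above: each row starts with "EC 0xNN:" and is followed by 16 four-character fields (a space, a one-character marker, two hex digits).

```shell
#!/bin/sh
# Print the decimal value of one EC byte from ecdump text on stdin.
# $1: EC offset such as 0xC1.
# Assumed layout: "EC 0xc0:" then 16 fields of " " + marker + 2 hex digits.
ec_byte() {
    off=$(printf '%d' "$1") || return 1
    # Row label for the 16-byte row containing the offset, e.g. "EC 0xc0:"
    row=$(printf 'EC 0x%02x:' $((off / 16 * 16)))
    col=$((off % 16))
    # Hex digits of column N start at character 11 + 4*N of the row
    hex=$(awk -v row="$row" -v col="$col" \
        'index($0, row) == 1 { print substr($0, 11 + 4 * col, 2); exit }')
    [ -n "$hex" ] && printf '%d\n' "0x$hex"
}
```

Example call: ec_byte 0xC1 < /proc/acpi/ibm/ecdump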

--Thinker 16:20, 20 Nov 2005 (CET)


At the moment I am experimenting with controlling the fan on Windows XP with a self-written tool on a T43 (Model 2668 97G). Having found the information about the secret sensors here, I built them into the program. It seems that after starting my cooled (placed outside) T43, the 0xC1 sensor indeed rises fastest but also cools down quite quickly, especially if the CPU is cool as well. I have seen it hotter than the CPU but not much cooler, so probably it is a small chip connected to the cooling element of the CPU.

The values at 0xC0 and 0xC2 also seem to show temperature values here, while 0xC4 is always at 128.

First experiments indicate that as long as all the temperature values are below 43°C, the Thinkpad comes up with no fan and stays that way (the fan control register at EC offset 0x2F is set to 0x80; see the bottom of the patch for controlling fan speed page for a description of this register). If 43°C is reached on the 0xC1 sensor, the fan kicks in at low speed, while 43°C on the CPU does not activate the fan. For the CPU, the kick-in point seems to be around 48°C.

Once the fan is on, it goes off again if all the sensors drop to around 38°C or lower (the value may not be precise). But this hardly ever happens on its own; for the tests I placed the laptop outside in cold weather.

On forum.thinkpads.com there is a discussion among users who experimented with physically cooling the north- and southbridge, without success. In a different thread there, a user claimed that he had worked with a couple of Thinkpads and silenced them by turning off unused devices, WLAN being among them.

With the XP WLAN device disabled the temperature on 0xC1 stays around 41°C here even if there is heavy activity on the CPU. It rises as soon as the WLAN device is enabled but hardly goes any hotter than 44°C. But I also could not make it go hot at all running on battery. And the heat reading there somehow more or less follows the value of the CPU.

Bottom line on my T43 (2668 97G): the fan kicks in with the CPU around 48°C or 0xC1 at 43°C and then never goes off again unless you use external cooling. The 0xC1 sensor could be related to WLAN (I'm not really sure about it) and/or is probably placed near the CPU. It could also have something to do with running the machine on AC rather than battery.

-- Shimodax - 2005-11-27


Shimodax, you said "I have seen it hotter than the CPU but not much cooler, so probably it is a small chip connected to the cooling element of the CPU", but also "the temperature on 0xC1 stays around 41°C here even if there is heavy activity on the CPU". It follows that your CPU is never much hotter than 41°C, which I find unlikely... Anyway, on my T43, sensor 0xC1 is correlated with the CPU but only very slightly; it is more correlated with the GPU, but not very much either.

I suspect that sensor 0xC1 sits on the system board under the touchpad, since this is consistent with all of the following:

  • In idle with wireless off, sensor 0xC1 has roughly the same temperature as the GPU (which is adjacent on the system board, under the spacebar and TrackPoint buttons).
  • Correlation with the WLAN card activity (which is sandwiched between the system board and the touchpad).
  • Quick warm-up (the southbridge is also on the system board under the touchpad, and has no heat spreader).
  • Negligible effect of fan speed on 0xC1 temperature (the touchpad area is cramped and lacks decent ventilation, hence has negligible air flow).
  • When I place a 12cm-by-12cm pad of thick thermally isolating material (a folded fleece blanket...) under the touchpad, 0xC1 temperature consistently rises by 2-3 degrees (and cools back when I remove the pad); other sensors seem unaffected.

If this is indeed the case, it's hard to see what can be done (other than using a fan control script with an increased threshold for this sensor). It looks like IBM/Lenovo counted on this area being passively cooled through the bottom of the case - see how the bottom of the laptop is designed to allow air flow under the front quarter? However, once the desk under the laptop has warmed up (or if air flow is blocked, as when the laptop is sitting on the top of a lap), things just cook up. The mods which thermally connect the southbridge to the GPU cooling assembly might improve things a bit, but on my system sensor 0xC1 isn't much hotter than the GPU anyway. Maybe ventilation can be improved by letting in more air through the speaker grills on the front - does anyone know what things look like under the very front of the palmrest? This won't solve "fan always on" since it will help only when the fan is on, but it may let the fan run at a lower speed.
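A fan control script with a separate threshold per sensor could build on a check like the one below. This is a sketch, not a tested fan controller: the function name all_below_limits is made up, and any limits passed to it are placeholders that each user would have to choose at their own risk.

```shell
#!/bin/sh
# Succeed only if every temperature is at or below its own limit.
# $1: space-separated temperatures, $2: space-separated limits (same order).
all_below_limits() {
    temps=$1
    set -- $2
    for t in $temps; do
        # More readings than limits: fail safe (treat as "too hot")
        [ $# -gt 0 ] || return 1
        [ "$t" -le "$1" ] || return 1
        shift
    done
    return 0
}
```

For example, all_below_limits "44 41 48" "50 50 52" succeeds, so a script could leave the fan off at that point; any failure should turn the fan back on.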

BTW, Shimodax, how are you monitoring/controlling the EC under Windows?

--Thinker 18:22, 27 Nov 2005 (CET)


Thinker,

I currently don't know where to read the GPU temp from, so I can't say much about it (I'm running XP and have not found drivers or tools that would display the GPU temperature).

However, regarding my experiments: I had the machine on my desk earlier today (when I wrote the post) on AC with WLAN connection to the office network and the "Max. Battery Life" scheme. I had taken it from the trunk of the car (it's quite cold outside, around freezing). During the whole experiment the CPU hardly went higher than 46°C; most of the time it was around 39°C to 43°C. I wasn't very systematic in these tests, these were just first observations.

However, I think I can confirm that the 43°C on the C1 triggers the fan on my machine here. 48°C to 50°C on the CPU also triggers the fan on. Then I put the laptop outside the window twice. Temperatures dropped quite quickly and around MAX(CPU, 0xC1) of 38°C the fan turned itself off.
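Put together, these observations amount to a simple hysteresis rule, sketched below. It is only a model of what was seen on this one T43 (on at 0xC1 >= 43 or CPU >= 48, off at MAX(CPU, 0xC1) <= 38), not the embedded controller's real algorithm, which is unknown; the function name next_fan_state is made up.

```shell
#!/bin/sh
# Model of the observed fan behaviour (an approximation, not EC firmware).
# $1: current fan state (on|off), $2: CPU temp, $3: 0xC1 temp
next_fan_state() {
    state=$1
    cpu=$2
    c1=$3
    max=$cpu
    [ "$c1" -gt "$max" ] && max=$c1
    if [ "$c1" -ge 43 ] || [ "$cpu" -ge 48 ]; then
        echo on            # either trigger reached
    elif [ "$max" -le 38 ]; then
        echo off           # everything cooled down far enough
    else
        echo "$state"      # in between: keep the previous state
    fi
}
```

The gap between the on and off thresholds is what keeps the fan running once it has started: at, say, CPU 40 and 0xC1 40 the state simply stays whatever it was.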

Further tests on the WLAN revealed mixed results about correlation. If the CPU goes up the C1 also goes up, even if WLAN is disabled. On the other hand I had cases where WLAN (big folder copy) made the C1 rise ahead of the CPU. The way I tested it, mostly the C1 triggered the fan before the CPU did. This at least explains why CPU undervolting/clocking doesn't help much.

But I think you're right. Without custom scripts I guess it will be hard to keep the C1 below 43°C. This value may even be intentional by IBM. If it is really near the palmrest, higher values may cause burns (I once read about a guy who actually burnt his balls [no joke!] by working with a laptop which had a 42°C - 45°C battery temp. in his lap for an hour or two). So they may think that fan noise is preferable to bad publicity.

Hence I'm not counting on IBM. Instead I'm currently writing a custom fan control program for XP, that's how I read the EC there. I'll post a first version here later today. Maybe some folks from the hardware modding thread will help to locate the sensors with some cooling spray.

-- Shimodax - 2005-11-27


Shimodax,

Great to see work on a Windows solution, especially from Emtec! (Alas, I let my ZOC registration expire when I switched to Linux). Will you be releasing the source code?

If the 0xC1 sensor is near the southbridge then it will be affected by CPU activity both because of related southbridge activity and by thermal conductance via the motherboard; but I've seen 0xC1 at 47deg and CPU at 59deg (after a long burn-in), so they can't be too close. About the palmrest, IBM actually brags about low palm rest temperature in some of their marketing publications. But ironically the hottest and worst-cooled area of the laptop (where I suspect 0xC1 sits) is in the bottom center right under the touchpad - which tends to coincide with certain anatomical regions... BTW, GPU temp is EC offset 0x7B; there's a partial list inside my new fan control script at Talk:ACPI_fan_control_script (I'll move it to the article page soon).

--Thinker 23:20, 27 Nov 2005 (CET)


+LOL+ I wouldn't have expected that anybody would know me :-)

Yes, I'll release source code soon. I took great pains to write this tool without our proprietary classes and libs in order to be able to release the source (or at least maintain a basic Open Source version). I'll see if SourceForge accepts the project (applied on Saturday), otherwise I'll have to find another place.

Thanks for the info about the GPU ...

Markus

-- Shimodax - 23:42 (CET) - 2005-11-27


For the record: the new "Shimodax fan control tool : sharing values" topic at the thinkpads.com forums tracks some other users' experience with their sensors. So far the only new observation is that sensor 0x7A (3rd) is probably in the vicinity of the CPU or northbridge.

--Thinker 12:53, 28 Nov 2005 (CET)


Just now I see the C2 higher than C1 and the rest of the system for the first time. The only difference I can think of is that the battery is charging. I hooked it up with 6% left about 30 minutes ago. Usage was mainly web browsing (Firefox, maybe a webpage with animated GIF ads). C2 triggered the fan at 50°C two times.

  • CPU 42°C (0x78)
  • APS 41°C (0x79)
  • X7A 34°C (0x7A)
  • GPU 44°C (0x7B)
  • BAT 40°C (0x7C)
  • BAT 31°C (0x7E)
  • XC0 40°C (0xC0)
  • XC1 46°C (0xC1)
  • XC2 48°C (0xC2)

-- Shimodax 00:17 CET - 2005-11-30


Upon further casual observation I would like to offer the theory that the C2 sensor is indeed related to battery charging and may be located at the rear left (under Esc/F1) on a T43. See: page 2 of "Shimodax fan control tool : sharing values"

-- Shimodax 13:27 CET - 2005-12-01


I happen to have a photo of that area from the last time I opened my T43, and indeed it looks like there's some power circuitry there:

T43 CDC

Those two "150 A47L" are just above the ventilation grill. Any idea what they could be?

--Thinker 20:11, 1 Dec 2005 (CET)


Don't know ... they could be power stabilizing transistors, but I have very little knowledge of electronics (especially of SMD circuits) so that's just wild guessing.

However, the system is currently charging the battery again and I played with the fan. The C2 reacts to the fan quite slowly, and when I forced the fan off it rose no higher than 55°C. Also, from touching the bottom of the laptop, I'd say the hottest part of that area is between the grill and the latch for the DRAM expansion (probably below the thing in the center of your photo).

-- Shimodax 01:53 CET - 2005-12-02


Makes perfect sense. So 0xC2 sits under the CDC and monitors the power circuitry (not just battery charging, since it also heats up slightly above its environment without a battery). Then XC2->PWR, I guess. Two more to go: 0x7A and 0xC0 (both are nice and cool here).

--Thinker 03:35, 2 Dec 2005 (CET)


I'll then rename it in my tool with the next release. Btw, do you have any idea what the APS might be on other models?

-- Shimodax 14:07 CET - 2005-12-03


It's easy to check if 0x79 is the HDAPS accelerometer or not: read the HDAPS temperature directly and compare. For getting the HDAPS temperature you can follow the Linux hdaps.c driver, or just reboot to Linux and look at /sys/bus/platform/drivers/hdaps/hdaps/temp1 (and at /proc/acpi/ibm/thermal for the first 8 EC sensors). On my T43, the 0x79 always matches the HDAPS sensor (usually identical but sometimes 1 degree off, probably due to a different sampling time). BTW, my ACPI fan control script monitors both, just in case.
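That comparison is easy to script. The sketch below uses a hypothetical helper, hdaps_matches, that accepts two readings and allows the 1-degree difference mentioned above; how the two values are obtained is left to the caller.

```shell
#!/bin/sh
# Succeed if two temperature readings agree to within 1 degree.
# $1: HDAPS temperature, $2: EC 0x79 reading (2nd value in thermal)
hdaps_matches() {
    diff=$(( $1 - $2 ))
    # Take the absolute value of the difference
    [ "$diff" -lt 0 ] && diff=$(( -diff ))
    [ "$diff" -le 1 ]
}
```

On Linux the first argument could come from /sys/bus/platform/drivers/hdaps/hdaps/temp1 and the second from the second number in /proc/acpi/ibm/thermal; sampling both repeatedly over a warm-up makes a coincidental match less likely.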

Speaking of which, the table at the top of that script reflects all knowledge gleaned from the forum.thinkpads.com discussion. Feel free to update it (maybe we should move it to a separate and more spacious page?).

--Thinker 15:03, 3 Dec 2005 (CET)


For another view of the 0xC2 area, including a peek under the CDC card, see IBM/Lenovo's CDC removal movie. There seems to be nothing very exciting visible on the upper side of the motherboard (but judging by the plastic bulge in the bottom of the case, there's probably some circuitry on the underside).

--Thinker 16:39, 3 Dec 2005 (CET)


I just had the idea that 0xC0 could be the Northbridge chip. See "Shimodax fan control tool : sharing values"

-- Shimodax 23:15 CET - 2005-12-05


Yes, 0xC0 is very much correlated with CPU temperature. But if it's the northbridge then it's surprisingly cool, since northbridges usually run pretty hot, and the 915PM has a small surface area and no cooling assembly whatsoever, see here:

T43 systemboard (click to enlarge)

--Thinker 00:45, 5 Dec 2005 (CET)


Mmmh, I guess I'll remove the keyboard and play with some cooling spray. It seems that a good part of the inside area can be reached through the opening of the keyboard.

-- Shimodax 23:15 CET - 2005-12-05