Talk:Thermal Sensors

From ThinkWiki
Revision as of 01:48, 17 March 2006 by Shiyang (Southbridge sensor 0xC1)

Secret sensor and the cause of fan always on

On my T43, ecdump offsets 0xC0-0xC2 seem to include 3 more temperature sensors that are not seen in /proc/acpi/ibm/thermal:

# cat /proc/acpi/ibm/thermal;  
temperatures:   44 41 33 42 33 -128 30 -128
# perl -ne 'm/^EC 0xc0: .(..) .(..) .(..) / or next; print hex($1)." ".hex($2)." ".hex($3)."\n"' < /proc/acpi/ibm/ecdump
40 48 43
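The same three bytes can be pulled out with a short script instead of the Perl one-liner. This is a minimal sketch that assumes each ecdump data row looks like "EC 0xc0:" followed by two-digit hex bytes, each possibly preceded by a one-character marker (which is why the one-liner skips a character before each pair). Treating 0x80 (-128) as "no sensor" is an assumption borrowed from how /proc/acpi/ibm/thermal reports absent sensors.

```python
import re

# Assumed row format: "EC 0xc0:" followed by hex bytes, each possibly
# prefixed by a one-character marker such as "*".
ROW = re.compile(r"^EC 0x([0-9a-fA-F]{2}):(.*)$")


def parse_ecdump(dump_text):
    """Map EC offset -> unsigned byte value for every data row in an ecdump."""
    ec = {}
    for line in dump_text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header row or unrelated line
        base = int(m.group(1), 16)
        for i, token in enumerate(m.group(2).split()):
            ec[base + i] = int(token[-2:], 16)  # drop any marker character
    return ec


def to_celsius(byte):
    """Read a sensor byte as signed; -128 (0x80) is assumed to mean 'no sensor'."""
    value = byte - 256 if byte >= 128 else byte
    return None if value == -128 else value


def hidden_sensors(dump_text, offsets=(0xC0, 0xC1, 0xC2)):
    """Return the temperatures at the given EC offsets (default 0xC0-0xC2)."""
    ec = parse_ecdump(dump_text)
    return [to_celsius(ec[o]) for o in offsets]
```

Usage would be along the lines of `hidden_sensors(open("/proc/acpi/ibm/ecdump").read())`, which on the machine above should give the same three numbers as the one-liner.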

Note the "48" entry (EC offset 0xC1). Something's pretty hot even at full speed (level 7, 4700 RPM). This sensor heats up very quickly when the system starts (in fact, faster than anything else when the CPU is undervolted and fglrx is in maximum power-saving mode).

Now, note this: the fan kicks up from low speed to medium speed whenever this sensor reaches 46 degrees, even if no other sensor changes; and this seems to usually be the first trigger encountered. Moreover, this sensor hovers around 47-48 degrees even on an idle machine. Taken together, this fully explains the "fan always on" behavior: a previously-unnoticed sensor that's always hot.

Any idea what this sensor is? It seems correlated with WiFi: there's a 2-degree difference when I toggle /sys/bus/pci/drivers/ipw2200/*/rf_kill (without ever being associated, so this shouldn't affect anything else), and heavy WiFi data transfer increases the temperature by several more degrees. This suggests the sensor is located in or close to the mini-PCI slot (i.e., under the touchpad). That region is indeed often hot to the touch. But why would the mini-PCI slot get so hot? Could it be the southbridge, which sits under the mini-PCI slot with no heatsink and poor ventilation? Can anyone correlate this sensor with other specific activity, with blocking of specific ventilation holes, or with cooling of specific components? If it is the mini-PCI card itself, note that the operating temperature range of the Intel 2200BG is 0-80 °C.

Caveat: this is my experience with a T43 after undervolting the CPU and activating maximal GPU powersaving using fglrx. It could be that for other people, other components are the first to trigger. But either way, those are 3 temperature sensors we didn't know of and they're used by the Embedded Controller's algorithm.

--Thinker 16:20, 20 Nov 2005 (CET)


At the moment I am experimenting with controlling the fan on Windows XP with a self-written tool on a T43 (Model 2668 97G). Having found the information about the secret sensors here, I built them into the program, and it seems that after starting my cooled (placed outside) T43 the 0xC1 sensor indeed rises fastest, but it also cools down quite quickly, especially if the CPU is also cool. I have seen it hotter than the CPU but not much cooler, so it is probably a small chip connected to the cooling element of the CPU.

The values at 0xC0 and 0xC2 also seem to show temperature values here, while 0xC4 is always at 128.

First experiments indicate that as long as all the temperature values are below 43°C, the ThinkPad comes up with no fan and stays that way (with the fan control register at EC offset 0x2F set to 0x80; see the bottom of the patch for controlling fan speed page for a description of this register). If 43°C is reached on the 0xC1 sensor, the fan kicks in at low speed, while 43°C on the CPU does not activate the fan. For the CPU, the kick-in point seems to be around 48°C.
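For reading back register 0x2F from a tool, a small decoder helps. This is a sketch under an assumed bit layout that does not come from this thread: it follows the later thinkpad-acpi fan documentation (0x80 = EC automatic control, 0x40 = "disengaged"/full speed, low three bits = manual level 0-7), and the precedence of 0x40 over 0x80 is also an assumption. Only the meaning of 0x80 (automatic mode) is stated above; check the rest against your model before relying on it.

```python
def decode_fan_register(value):
    """Describe EC register 0x2F under an ASSUMED bit layout.

    0x40: "disengaged" (full speed), assumed to override the other bits
    0x80: EC automatic control, i.e. the firmware's own thresholds apply
    0x07: manual fan level 0-7 when neither flag is set
    """
    if not 0 <= value <= 0xFF:
        raise ValueError("EC registers hold a single byte")
    if value & 0x40:
        return "disengaged"
    if value & 0x80:
        return "auto"
    return "level %d" % (value & 0x07)
```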

Once the fan is on, it goes off again if all the sensors drop to about 38°C or lower (the value may not be precise). But this hardly happens on its own; for my tests I placed the laptop outside in cold weather.

On forums.thinkpads.com there is a discussion from users who experimented with physically cooling the north- and southbridge, without success. In a different thread, a user claimed that he had worked with a couple of ThinkPads and silenced them by turning off unused devices, WLAN among them.

With the WLAN device disabled in XP, the temperature on 0xC1 stays around 41°C here even under heavy CPU activity. It rises as soon as the WLAN device is enabled, but hardly goes above 44°C. I also could not make it get hot at all when running on battery. And the reading there more or less follows the CPU value.

Bottom line on my T43 (2668 97G): the fan kicks in at around 48°C on the CPU or 43°C on 0xC1, and then never goes off again unless you use external cooling. The 0xC1 sensor could be related to WLAN (I'm not really sure about it) and/or is probably placed near the CPU. It could also have something to do with running the machine on AC rather than battery.

-- Shimodax - 2005-11-27


Shimodax, you said "I have seen it hotter than the CPU but not much cooler, so probably it is a small chip connected to the cooling element of the CPU", but also "the temperature on 0xC1 stays around 41°C here even if there is heavy activity on the CPU". It follows that your CPU is never much hotter than 41°C, which I find unlikely... Anyway, on my T43, sensor 0xC1 is correlated with the CPU, but only very slightly; it is more correlated with the GPU, but not very much either.

I suspect that sensor 0xC1 sits on the system board under the touchpad, since this is consistent with all of the following:

  • In idle with wireless off, sensor 0xC1 has roughly the same temperature as the GPU (which is adjacent on the system board, under the spacebar and TrackPoint buttons).
  • Correlation with the WLAN card activity (which is sandwiched between the system board and the touchpad).
  • Quick warm-up (the southbridge is also on the system board under the touchpad, and has no heat spreader).
  • Negligible effect of fan speed on 0xC1 temperature (the touchpad area is cramped and lacks decent ventilation, hence has negligible air flow).
  • When I place a 12cm-by-12cm pad of thick thermally isolating material (a folded fleece blanket...) under the touchpad, 0xC1 temperature consistently rises by 2-3 degrees (and cools back when I remove the pad); other sensors seem unaffected.

If this is indeed the case, it's hard to see what can be done (other than using a fan control script with an increased threshold for this sensor). It looks like IBM/Lenovo counted on this area being passively cooled through the bottom of the case - see how the bottom of the laptop is designed to allow air flow under the front quarter? However, once the desk under the laptop has warmed up (or if air flow is blocked, as when the laptop is sitting on top of a lap), things just cook up. The mods which thermally connect the southbridge to the GPU cooling assembly might improve things a bit, but on my system sensor 0xC1 isn't much hotter than the GPU anyway. Maybe ventilation can be improved by letting in more air through the speaker grills on the front - does anyone know what things look like under the very front of the palmrest? This won't solve "fan always on", since it will help only when the fan is on, but it may let the fan run at a lower speed.
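A script with "an increased threshold for this sensor" boils down to per-sensor trip points. The sketch below only shows the decision step; every number in it is hypothetical (0xC1 is simply set a few degrees above the ~43°C at which the EC was reported to start the fan), the level mapping is an arbitrary illustration, and actually writing the level to the EC is deliberately left out.

```python
# HYPOTHETICAL trip points in deg C, keyed by EC offset; illustrative only.
TRIP_POINTS = {0x78: 48, 0xC1: 47}
DEFAULT_TRIP = 50


def choose_level(readings, trips=TRIP_POINTS, default=DEFAULT_TRIP):
    """Pick a fan level 0-7 from {EC offset: deg C, or None if absent}.

    Level 0 if no sensor reaches its trip point; otherwise level 1 at the
    trip point and one more level per 2 degrees above it, capped at 7.
    """
    excess = [t - trips.get(off, default)
              for off, t in readings.items() if t is not None]
    worst = max(excess, default=-1)
    if worst < 0:
        return 0
    return min(7, 1 + worst // 2)
```

Keying the table by EC offset keeps the policy independent of where each sensor turns out to be, which is still being worked out on this page.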

BTW, Shimodax, how are you monitoring/controlling the EC under Windows?

--Thinker 18:22, 27 Nov 2005 (CET)


Thinker,

I currently don't know where to read the GPU temperature from, so I can't say much about it (I'm running XP and have not found drivers or tools that display it).

However, regarding my experiments: I had the machine on my desk earlier today (when I wrote the post) on AC with WLAN connection to the office network and "Max. Battery Life" Scheme. I had taken it from the trunk of the car (it's quite cold outside, around freezing). During the whole experiments the CPU hardly went higher than 46°C, most of the time it was around 39°C to 43°C. I wasn't very systematic in these tests, these were just first observations.

However, I think I can confirm that the 43°C on the C1 triggers the fan on my machine here. 48°C to 50°C on the CPU also triggers the fan on. Then I put the laptop outside the window twice. Temperatures dropped quite quickly and around MAX(CPU, 0xC1) of 38°C the fan turned itself off.
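The behavior reported here is a simple on/off hysteresis. The toy model below encodes only the observed thresholds from this one T43 (on at 43°C on 0xC1 or about 48°C on the CPU, off once max(CPU, 0xC1) is down to about 38°C); it is a sketch of what was observed, not the EC's actual algorithm, and it ignores fan speed levels and the other sensors.

```python
class ObservedFanModel:
    """Toy on/off hysteresis using thresholds reported on one T43 (2668 97G)."""

    def __init__(self, c1_on=43, cpu_on=48, off=38):
        self.c1_on = c1_on    # 0xC1 reading that switched the fan on
        self.cpu_on = cpu_on  # CPU reading that switched the fan on (48-50 seen)
        self.off = off        # max(CPU, 0xC1) at which the fan stopped
        self.fan_on = False

    def update(self, cpu, c1):
        """Feed one pair of readings in deg C; return whether the fan runs."""
        if not self.fan_on and (c1 >= self.c1_on or cpu >= self.cpu_on):
            self.fan_on = True
        elif self.fan_on and max(cpu, c1) <= self.off:
            self.fan_on = False
        return self.fan_on
```

The gap between the on and off thresholds is why the fan, once started, keeps running at room temperature.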

Further tests on the WLAN revealed mixed results about correlation. If the CPU goes up the C1 also goes up, even if WLAN is disabled. On the other hand I had cases where WLAN (big folder copy) made the C1 rise ahead of the CPU. The way I tested it, mostly the C1 triggered the fan before the CPU did. This at least explains why CPU undervolting/clocking doesn't help much.

But I think you're right. Without custom scripts I guess it will be hard to keep the C1 below 43°C. This value may even be intentional on IBM's part. If it is really near the palmrest, higher values may cause burns (I once read about a guy who actually burnt his balls [no joke!] by working for an hour or two with a laptop in his lap that had a battery temperature of 42°C - 45°C). So they may think that fan noise is preferable to bad publicity.

Hence I'm not counting on IBM. Instead I'm currently writing a custom fan control program for XP, that's how I read the EC there. I'll post a first version here later today. Maybe some folks from the hardware modding thread will help to locate the sensors with some cooling spray.

-- Shimodax - 2005-11-27


Shimodax,

Great to see work on a Windows solution, especially from Emtec! (Alas, I let my ZOC registration expire when I switched to Linux). Will you be releasing the source code?

If the 0xC1 sensor is near the southbridge, then it will be affected by CPU activity both because of related southbridge activity and through thermal conductance via the motherboard; but I've seen 0xC1 at 47°C and the CPU at 59°C (after a long burn-in), so they can't be too close. About the palmrest, IBM actually brags about low palmrest temperature in some of their marketing publications. But ironically, the hottest and worst-cooled area of the laptop (where I suspect 0xC1 sits) is in the bottom center right under the touchpad - which tends to coincide with certain anatomical regions... BTW, the GPU temperature is at EC offset 0x7B; there's a partial list inside my new fan control script at Talk:ACPI_fan_control_script (I'll move it to the article page soon).

--Thinker 23:20, 27 Nov 2005 (CET)


+LOL+ I wouldn't have expected that anybody would know me :-)

Yes, I'll release the source code soon. I took quite some pains to write this tool without our proprietary classes and libs in order to be able to release the source (or at least maintain a basic open source version). I'll see if SourceForge accepts the project (applied on Saturday); otherwise I'll have to find another place.

Thanks for the info about the GPU ...

Markus

-- Shimodax - 23:42 (CET) - 2005-11-27


For the record: the new "Shimodax fan control tool : sharing values" topic at the thinkpads.com forums tracks some other users' experience with their sensors. So far the only new observation is that sensor 0x7A (3rd) is probably in the vicinity of the CPU or northbridge.

--Thinker 12:53, 28 Nov 2005 (CET)


Just now I see C2 higher than C1 and the rest of the system for the first time. The only difference I can think of is that the battery is charging: I plugged the machine in with 6% left about 30 minutes ago. Usage was mainly web browsing (Firefox, maybe a web page with animated GIF ads). C2 triggered the fan at 50°C two times.

  • CPU 42°C (0x78)
  • APS 41°C (0x79)
  • X7A 34°C (0x7A)
  • GPU 44°C (0x7B)
  • BAT 40°C (0x7C)
  • BAT 31°C (0x7E)
  • XC0 40°C (0xC0)
  • XC1 46°C (0xC1)
  • XC2 48°C (0xC2)

-- Shimodax 00:17 CET - 2005-11-30


Upon further casual observation, I would like to offer the theory that the C2 sensor is indeed related to battery charging and may be located at the rear left (under the Esc/F1 keys) on a T43. See page 2 of "Shimodax fan control tool : sharing values".

-- Shimodax 13:27 CET - 2005-12-01


I happen to have a photo of that area from the last time I opened my T43, and indeed it looks like there's some power circuitry there:

T43 CDC

Those two "150 A47L" are just above the ventilation grill. Any idea what they could be?

--Thinker 20:11, 1 Dec 2005 (CET)


Don't know ... they look like power-stabilizing transistors, but I have very little knowledge of electronics (especially of SMD circuits), so that's just wild guessing.

However, the system is currently charging the battery again, and I played with the fan. C2 reacts to the fan quite slowly, and when I forced the fan off it rose no higher than 55°C. Also, from touching the bottom of the laptop, I'd say the hottest part of that area is between the grill and the latch for the DRAM expansion (probably below the thing in the center of your photo).

-- Shimodax 01:53 CET - 2005-12-02


Makes perfect sense. So 0xC2 sits under the CDC and monitors the power circuitry (not just battery charging, since it also heats up slightly above its environment without a battery). Then XC2->PWR, I guess. Two more to go: 0x7A and 0xC0 (both are nice and cool here).

--Thinker 03:35, 2 Dec 2005 (CET)


I'll then rename it in my tool with the next release. Btw, do you have any idea what the APS might be on other models?

-- Shimodax 14:07 CET - 2005-12-03


It's easy to check if 0x79 is the HDAPS accelerometer or not: read the HDAPS temperature directly and compare. For getting the HDAPS temperature you can follow the Linux hdaps.c driver, or just reboot to Linux and look at /sys/bus/platform/drivers/hdaps/hdaps/temp1 (and at /proc/acpi/ibm/thermal for the first 8 EC sensors). On my T43, the 0x79 always matches the HDAPS sensor (usually identical but sometimes 1 degree off, probably due to a different sampling time). BTW, my ACPI fan control script monitors both, just in case.
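The comparison described above can be scripted. This sketch assumes that the N-th value in /proc/acpi/ibm/thermal is EC sensor 0x78+N (so the second value is 0x79), that -128 marks an absent sensor, and that the hdaps temp1 file holds a plain integer in °C; the 1-degree tolerance reflects the occasional mismatch mentioned above.

```python
def parse_thermal(text):
    """Parse /proc/acpi/ibm/thermal; -128 (absent sensor) becomes None."""
    for line in text.splitlines():
        if line.startswith("temperatures:"):
            values = [int(v) for v in line.split(":", 1)[1].split()]
            return [None if v == -128 else v for v in values]
    raise ValueError("no 'temperatures:' line found")


def aps_matches_hdaps(thermal_text, hdaps_temp1_text, tolerance=1):
    """Compare EC sensor 0x79 (2nd thermal value) with the hdaps temp1 value."""
    ec_0x79 = parse_thermal(thermal_text)[1]
    hdaps = int(hdaps_temp1_text.strip())
    return ec_0x79 is not None and abs(ec_0x79 - hdaps) <= tolerance
```

On Linux the inputs would be the contents of /proc/acpi/ibm/thermal and /sys/bus/platform/drivers/hdaps/hdaps/temp1, read at about the same time.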

Speaking of which, the table at the top of that script reflects all the knowledge gleaned from the forums.thinkpads.com discussion. Feel free to update it (maybe we should move it to a separate and more spacious page?).

--Thinker 15:03, 3 Dec 2005 (CET)


For another view of the 0xC2 area, including a peek under the CDC card, see IBM/Lenovo's CDC removal movie. There seems to be nothing very exciting visible on the upper side of the motherboard (but judging by the plastic bulge in the bottom of the case, there's probably some circuitry on the underside).

--Thinker 16:39, 3 Dec 2005 (CET)


I just had the idea that 0xC0 could be the Northbridge chip. See "Shimodax fan control tool : sharing values"

-- Shimodax 23:15 CET - 2005-12-05


Yes, 0xC0 is very much correlated with CPU temperature. But if it's the northbridge, then it's surprisingly cool, since northbridges usually run pretty hot, and the 915PM has a small surface area and no cooling assembly whatsoever; see here:

T43 systemboard

--Thinker 00:45, 5 Dec 2005 (CET)


Mmmh, I guess I'll remove the keyboard and play with some cooling spray. It seems that a good part of the inside area can be reached through the opening of the keyboard.

-- Shimodax 23:15 CET - 2005-12-05


Just in case - these instructions and movies are pretty useful. It looks like the palmrest should be easy to remove too, but I didn't try that.

Keep us posted :-)

And please take plenty of photos! You never know what you'll want to look up later (as with those 0xC2 power chips above).

--Thinker 17:34, 6 Dec 2005 (CET)


Someone on a German forum reported that he saw pictures on a U.S. forum where someone said he had located 0xC1 with cooling spray. It indeed seems to be below the left side of the touchpad, on the mainboard (pictures in the forum article linked above).

-- Shimodax 22:50 CET - 2005-12-06


Interesting. That's a T40, right? Similar layout but a different cooling assembly. Anyway, the T40 didn't have HDAPS, but on the T43 the HDAPS accelerometer chip is just 1 or 2 centimeters down from the location of the chip marked here. And on the T43, sensor 0xC1 and direct HDAPS reads give very different results. So maybe they moved 0xC1 away on the T43? Or maybe the temperatures read through the HDAPS driver actually come from a separate sensor located elsewhere (unlikely but possible).

--Thinker 01:05, 7 Dec 2005 (CET)


Yep, that was a T40. Well, I purchased a can of cooling spray today. First results without opening the machine indicate 0xC2 is near the grill below the Esc-F3 keys. The currently still unknown 0x7A cools down if I spray into the PCMCIA slot (also makes sense to place a sensor there, I'd say it is on the mainboard below the slots).

Probably will open the case tonight or tomorrow. I guess for precise results I'll need to remove the bezel and the fan ... quite an adventure ;-). Will keep you posted of further results.

-- Shimodax 2005-12-07


Speaking of the PCMCIA port - I noticed that under heavy CPU load with the fan on, if I insert a PC Card into the slot then the CPU and 0x79 (HDAPS) temperature quickly go up a couple of degrees. This happens even if the PC Card is turned off or not inserted all the way in (no electric contact). Probably blocked airflow.

--Thinker 20:08, 7 Dec 2005 (CET)


I opened the thing today and played around a bit. Here's my assessment of where the sensors are: Image (dunno how to upload an image here; feel free to add it to ThinkWiki directly).

C1 is most likely the southbridge chip itself; APS and BUS are the small highlighted chips or very near them. PWR did not react much to the spray, but does react to spraying through the grill on the bottom of the case, so it's probably on the underside.

-- Shimodax 2005-12-08

ThinkPad T43 thermal sensors


Shimodax,

Beautiful, beautiful work! Got everything back in working order, I hope, including those pesky screwcap stickers?

This confirms our information/guesses about CPU, GPU, BAT, BAT2, PWR and the southbridge, and solves the mystery of 0x7A (PCMCIA) and 0xC0 (Northbridge-RAM bus). A few notes:

  • 0x79 is a surprise - it is quite far from the HDAPS accelerometer chip (search for "accelerometer" here), even though it gives the same temperatures as reading the accelerometer's I/O ports. But this explains why 0x79 quickly gets hotter under load when the PCMCIA port is blocked.
  • The mapping is indeed different from the R52 one that people keep citing because of the ibm-acpi documentation - the 2nd and 3rd sensors are not HDD and Mini-PCI.
  • Some of the chips are quite distant from the hot components they presumably monitor. Most significantly, when some usage causes the Northbridge to heat up rapidly, by the time sensor 0xC0 says 55 degrees the Northbridge core is probably above 60.
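For tools that label readings, the tentative T43 mapping discussed on this page can be kept in one table. This is a sketch: the locations are the unconfirmed guesses summarized above (not IBM/Lenovo data), they do not carry over to other models such as the R52, and the short name "SB" for 0xC1 is a label chosen here for illustration. Unknown offsets fall back to the X7A/XC0 style used in the readings posted earlier.

```python
# Tentative T43 sensor map as summarized in this discussion (unconfirmed).
T43_SENSORS = {
    0x78: ("CPU", "CPU"),
    0x79: ("APS", "small chip, not next to the HDAPS accelerometer"),
    0x7A: ("PCMCIA", "PC Card slot area"),
    0x7B: ("GPU", "GPU"),
    0x7C: ("BAT", "battery"),
    0x7E: ("BAT2", "battery, second reading"),
    0xC0: ("BUS", "Northbridge-RAM bus"),
    0xC1: ("SB", "southbridge area, under the touchpad"),  # "SB" is a made-up label
    0xC2: ("PWR", "power circuitry under the CDC"),
}


def label_readings(readings):
    """Format {EC offset: deg C or None} as 'NAME temp (0xNN)', in offset order."""
    out = []
    for off in sorted(readings):
        if readings[off] is None:
            continue  # absent sensor
        name = T43_SENSORS.get(off, ("X%02X" % off, ""))[0]
        out.append("%s %d°C (0x%02X)" % (name, readings[off], off))
    return out
```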

Shimodax, it would be great if you could upload more (unprocessed) photos, either to the Wiki (using the "Upload file" button on the left navigation bar) or by links. It deserves its own Wiki page, I think. It will save a lot of people the need to open their laptops too... For example, regarding a discussion from yesterday further up the page - got a clear photo of the fan power connector, by any chance?

--Thinker 18:38, 8 Dec 2005 (CET)


See Thermal sensors.

--Thinker 19:48, 8 Dec 2005 (CET)


Just for the record: I did not test the two BAT locations, these were taken from your previous description. Regarding the other locations, I'm quite sure about them but not 100% definite. I would not treat them as bullet proof fact until someone else confirms the experiment.

The whole procedure went quite smoothly. I just found that there were three screws missing in my machine, but luckily I had ordered a ThinkPad screw kit on eBay a few days ago. Also, on my model I did not need the stickers; all screws were without one. Just one minor mishap happened when I stuck one of the #2 screws in the wrong hole (there is an empty hole that looks like a screw hole about 4 mm from the intended location) and had a hard time getting the screw out.

I still have no idea how to upload stuff here, maybe because I don't have an account here. I do have a couple more pictures, also larger ones, although they are not great photography. I'll upload them to the ThinkPad forum via ImageShack, and you can grab them and put them here.

-- Shimodax -- 2005-12-08


Maybe those missing screws were the 3 optional Torx screws "protecting" the wireless card (lip service to FCC regulations). Yes, uploads require an account (registration is straightforward, and has some perks). BTW, that extra hole fooled me too. :-)

--Thinker 20:45, 8 Dec 2005 (CET)


One of the missing screws was the one that secures the HD; the others were two #1 screws near the front. I just zipped and uploaded the other pictures to our server: T43-2668.zip. Feel free to upload or use them anywhere you want.

-- Shimodax -- 2005-12-08

Southbridge sensor 0xC1

The southbridge on the T43 does not have a built-in thermal sensor, according to its specs (see Intel 82801FBM). So sensor 0xC1 may be adjacent or on the underside of the system board. --Thinker


That's possible. Spraying the Southbridge chip did yield the most direct temperature changes, but they were not as quick as I would have expected (I attributed that to the large size of the chip). So it may indeed be the underside or something near (there wasn't anything visible that looked like the BUS or APS sensors in my picture.)

--Shimodax


Since the southbridge is the hottest thing around, and moreover has a large ceramic package, it gives the quickest heat exchange with the cooling spray (i.e., it "absorbs the cold" fastest). Thermal conductance through the PCB would then allow the sensor to be anywhere nearby. Similarly for some of the other locations, which to my (untrained) eye look more like power regulation components than sensors. So the actual sensors are probably on the underside (otherwise you'd spot them, and it also makes sense if the bottleneck is the user's lap rather than the chips), in the vicinity of the identified components.

--Thinker 17:33, 11 Dec 2005 (CET)


For what it is worth, the other ones (BUS, APS) did react very directly, even to very targeted and short bursts of cold (in one instance BUS went down to zero with one burst). Also, they look quite like the thing the user on the thinkpads.com forum had identified on the T40. I also sprayed the southbridge specifically (just covering the chip), but the results were not nearly as direct as with the small chips. The reason could have been that the chip itself was generating heat from the core at the same time, but if the southbridge doesn't have its own sensor, a thermal sensor on the underside would fit the observation just as well (or even better).

--Shimodax


Anybody know what the empty screw hole on the bottom panel is?

Thanks

"APS" sensor 0x79

About the 0x79 sensor: I previously claimed that it's the HDAPS chip, because it gave the same values as reading through the Linux hdaps driver (which reads ports 0x1600-0x161f directly). This didn't sit well with Shimodax's cooling spray experiment. Well, my original claim is misguided: it turns out that the interface at ports 0x1600-0x161f also reports data unrelated to HDAPS (namely battery information), so it's probably yet another view into the embedded controller, and there's no reason to attribute its temperature reports to the HDAPS accelerometer.

--Thinker 00:50, 13 Dec 2005 (CET)


Embedded controller firmware disassembly

This thread points to, and discusses, a commented (partial) disassembly of the embedded controller. It may provide useful hints about the sensors.

--Thinker 07:07, 24 February 2006 (CET)