Locating persistent hot spots in the data center
December 27, 2006
Appeared in SearchDataCenter.com
The threat of overheating is a chief concern among modern data center operators. The presence of hot spots continues to hamper the efficiency of today's high-density computer equipment.
Even though many data center operators believe they have installed adequate cooling capacity to support their IT infrastructure, they are still witnessing a high number of hot spots, which are defined as areas of elevated temperature at the inlet side of computer equipment. Hot spots can typically be attributed to either a lack of cooling capacity or an inability to deliver the cooling where it is needed.
So what is the most efficient way to resolve this complex problem and ensure the data center's cooling infrastructure is optimized?
The facility operator or engineer should begin by completing a thorough evaluation and assessment of the data center based upon three primary techniques: load reconciliation, air distribution evaluation, and systems operational assessment.
Any assessment should start with a load reconciliation, which documents the computer equipment within the room and its associated heat load, then benchmarks it against the facility capacity. The primary goal of this first step is to employ a high level of detail within the load analysis to identify where the low-density and high-density loads are located within the data center, and to quantify each.
Several software programs exist that can track the equipment inventory of the data center to aid in this process. Also, most IT hardware vendors can supply actual heat load and cooling requirement data specific to their equipment. This data can usually be obtained from the manufacturer's website by searching for the relevant installation manuals, which usually include an "environmental specifications" section and an ASHRAE "Thermal Report".
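As a rough illustration of load reconciliation, the sketch below totals a small equipment inventory by zone, flags high-density racks, and compares the total against the installed cooling capacity. The rack names, heat loads, capacity figure, and the 5 kW high-density threshold are all hypothetical values chosen for the example; a real assessment would use the site's inventory and the manufacturers' published heat load data.

```python
from collections import defaultdict

# Hypothetical inventory: (rack id, zone, heat load in kW). Illustrative values only.
INVENTORY = [
    ("R01", "A", 2.0),
    ("R02", "A", 3.0),
    ("R03", "B", 8.0),
    ("R04", "B", 12.0),
]

HIGH_DENSITY_KW = 5.0  # assumed threshold for calling a rack "high density"
KW_PER_TON = 3.517     # 1 ton of refrigeration = 12,000 BTU/h, about 3.517 kW


def reconcile(inventory, cooling_capacity_kw):
    """Summarize heat load by zone and benchmark it against cooling capacity."""
    by_zone = defaultdict(float)
    high_density = []
    for rack, zone, kw in inventory:
        by_zone[zone] += kw
        if kw >= HIGH_DENSITY_KW:
            high_density.append(rack)
    total_kw = sum(by_zone.values())
    return {
        "total_kw": total_kw,
        "total_tons": total_kw / KW_PER_TON,
        "by_zone": dict(by_zone),
        "high_density_racks": high_density,
        # Fraction of installed cooling capacity consumed by the IT load
        "utilization": total_kw / cooling_capacity_kw,
    }


# Assumed installed sensible cooling capacity of 50 kW
report = reconcile(INVENTORY, cooling_capacity_kw=50.0)
for key, value in report.items():
    print(key, value)
```

A room-level utilization well below 100 percent does not rule out hot spots: in this example all of the high-density load sits in one zone, which is the kind of detail the air distribution evaluation then has to address.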
The next step, air distribution evaluation, determines the airflow characteristics necessary to satisfy the loads at all locations and assesses how well the air distribution system meets those load requirements. A key component of this analysis is a set of "failure scenarios," which analyze airflow performance when individual air distribution units (AHUs, CRACs, etc.) fail. This ensures that adequate redundancy exists to meet the facility's operational risk requirements.
Depending on the layout of the air distribution units and the distribution of the computer equipment load, the effect on system performance may vary according to which unit fails. One unit's failure may have a greater impact on the system than another's.
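A simplified single-failure analysis might look like the following sketch. It estimates the airflow each zone needs from its heat load using the common sensible-heat approximation for standard air (CFM is roughly 3.16 times watts divided by the temperature rise in degrees F), then removes one unit at a time and reports any zone left short. The unit names, per-zone airflow figures, zone loads, and the 20 degree F temperature rise are assumptions made for illustration; an actual evaluation would rely on field measurements or airflow modeling.

```python
# Hypothetical airflow (CFM) that each unit delivers to each zone. Illustrative only.
UNITS = {
    "CRAC-1": {"A": 8000, "B": 2000},
    "CRAC-2": {"A": 2000, "B": 8000},
    "CRAC-3": {"A": 5000, "B": 5000},
}

ZONE_LOAD_W = {"A": 40_000, "B": 75_000}  # assumed IT heat load per zone, in watts
DELTA_T_F = 20                            # assumed air temperature rise across equipment


def required_cfm(watts, delta_t_f):
    """Approximate airflow for a sensible heat load (standard air, sea level)."""
    return round(3.16 * watts / delta_t_f)


def single_failure_shortfalls(units, required):
    """For each unit failure, return the zones whose airflow falls short, and by how much."""
    results = {}
    for failed in units:
        delivered = {zone: 0 for zone in required}
        for name, to_zone in units.items():
            if name == failed:
                continue  # this unit is out of service in the scenario
            for zone, cfm in to_zone.items():
                delivered[zone] += cfm
        results[failed] = {
            zone: required[zone] - delivered[zone]
            for zone in required
            if delivered[zone] < required[zone]
        }
    return results


required = {zone: required_cfm(w, DELTA_T_F) for zone, w in ZONE_LOAD_W.items()}
scenarios = single_failure_shortfalls(UNITS, required)
for failed, shortfalls in scenarios.items():
    print(failed, "fails ->", shortfalls or "all zones satisfied")
```

With these made-up numbers, losing CRAC-1 leaves both zones satisfied, while losing CRAC-2 or CRAC-3 leaves zone B short, which mirrors the point that not every unit failure carries the same consequence.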
Finally, the actual operational condition of each critical system and component should be evaluated through a Systems Operational Assessment. This step is employed to verify that the infrastructure can perform to rated design capacity. The evaluation should utilize various on-line condition assessment technologies, such as vibration analysis, IR scanning, tribology (lubricant analysis), motor analysis, laser alignment, and computerized balancing.
Some large facility operations have the in-house capability to perform some or all of these on-line condition assessments. At most sites, however, this work will be contracted out to service providers that specialize in predictive maintenance and operational condition assessment and that employ the trained technicians needed to collect and evaluate the data. All of these activities are performed in the field and will yield data on the spot. Even rudimentary field tests are possible for lubricants, though sophisticated laboratory tests may be required if the field tests identify problems. The Systems Operational Assessment should also include a full evaluation of the operational parameters and control set points to ensure that negative outcomes such as CRAC "fighting" and unnecessary latent cooling (dehumidification) are avoided.
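One part of the set point review can be sketched as a simple consistency check across units. The example below flags a wide spread in temperature or humidity set points and the case where one unit is humidifying while another is dehumidifying, which is a typical sign of CRAC "fighting". The unit data, field names, and tolerance values are hypothetical; acceptable spreads depend on the site and on the control strategy in use.

```python
# Hypothetical snapshot of unit set points and operating modes. Illustrative only.
UNIT_SETTINGS = [
    {"name": "CRAC-1", "temp_setpoint_f": 72, "rh_setpoint_pct": 45, "mode": "dehumidify"},
    {"name": "CRAC-2", "temp_setpoint_f": 72, "rh_setpoint_pct": 50, "mode": "humidify"},
    {"name": "CRAC-3", "temp_setpoint_f": 68, "rh_setpoint_pct": 45, "mode": "cool"},
]


def find_control_conflicts(units, max_temp_spread_f=2, max_rh_spread_pct=3):
    """Return a list of (issue, detail) tuples describing set point conflicts."""
    findings = []

    temps = [u["temp_setpoint_f"] for u in units]
    temp_spread = max(temps) - min(temps)
    if temp_spread > max_temp_spread_f:
        findings.append(("temp_setpoint_spread", temp_spread))

    rhs = [u["rh_setpoint_pct"] for u in units]
    rh_spread = max(rhs) - min(rhs)
    if rh_spread > max_rh_spread_pct:
        findings.append(("rh_setpoint_spread", rh_spread))

    # Units adding and removing moisture at the same time work against each other
    humidifying = [u["name"] for u in units if u["mode"] == "humidify"]
    dehumidifying = [u["name"] for u in units if u["mode"] == "dehumidify"]
    if humidifying and dehumidifying:
        findings.append(("humidify_vs_dehumidify", sorted(humidifying + dehumidifying)))

    return findings


conflicts = find_control_conflicts(UNIT_SETTINGS)
for issue, detail in conflicts:
    print(issue, detail)
```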
Obviously, the goal of this process is to eliminate the hot spots by fully optimizing the data center's existing infrastructure. However, if hot spots persist, then the results of the evaluation and analyses should provide valuable insight and direction regarding what facility and/or infrastructure modifications or upgrades will best serve the data center in the future. Any significant changes or modifications implemented should be followed by a re-commissioning effort to ensure the systems provide the performance expected.