Linux Box Admin
Trusted Remote Administration
Tracking system performance using SAR
Monday, 20 March 2006
originally published on March 20, 2006 at linux.com

Sar is the "system activity report" program. In Linux, it is usually found in the sysstat package. The sysstat package includes programs and scripts to capture and summarize performance data, then produce detailed reports. This suite of programs can be very useful in tracking down performance bottlenecks and providing insight into how the system is used throughout the day.

Collecting performance data

The program that gathers performance data is called sadc (system activity data collector). It primarily gets information from the kernel, pulling data out of the virtual /proc filesystem. Then, it saves the data in a file (one per day) named /var/log/sa/saDD where DD is the day of the month.
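
As a small sketch of the naming scheme just described, the function below builds the path of a daily data file. The function name sa_data_file and the SA_DIR override are made up for illustration; they are not part of sysstat.

```shell
# Hypothetical helper: print the path of the sadc data file for a given
# day of the month (two digits, e.g. 07). Defaults to today.
# SA_DIR is an illustrative override, not a sysstat setting.
sa_data_file() {
    day=${1:-$(date +%d)}
    printf '%s/sa%s\n' "${SA_DIR:-/var/log/sa}" "$day"
}
```

For example, ls -l "$(sa_data_file)" would show whether today's file exists and how large it has grown.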

The package includes two shell scripts that control how the data collector is run. The first script, sa1, controls how often data is collected, while sa2 creates summary reports (one per day) in /var/log/sa/sarDD. Both scripts are run from cron. On Red Hat Enterprise Linux, the scripts are automatically set up in cron as follows:

# run system activity accounting tool every 10 minutes
*/10 * * * * root /usr/lib/sa/sa1 1 1
# generate a daily summary of process accounting at 23:53
53 23 * * * root /usr/lib/sa/sa2 -A

In the default configuration, data is collected every 10 minutes and summarized just before midnight. If you suspect a performance problem with a particular program, you can use sadc to collect data on a particular process (-x), or its children (-X), but you will need to set up a custom script using those flags.

As Dr. Heisenberg showed, the act of measuring something changes it. Any tool that collects performance data has some overall negative impact on system performance, but with sar, it seems to be minimal. I ran a test with the sa1 cron job set to gather data every minute (on a server that was not busy) and it didn't cause any serious issues. That may not hold true on a system that is already busy.

Creating reports

If the daily summary reports created by the sa2 script are not enough, you can create your own custom reports using sar. The sar program reads data from the current daily data file unless you specify otherwise. To have sar read a particular data file, use the -f /var/log/sa/saDD flag. You can select multiple files by using multiple -f flags. Since many reports produced by sar are lengthy, you may want to redirect the output to a file.
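
Here is a minimal sketch of reading one day's data file and saving the report. The function name report_for_day and the SAR_BIN and SA_DIR overrides are hypothetical conveniences for this example, not sysstat features.

```shell
# Hypothetical helper: write the sar report for one day of the month to a file.
# SAR_BIN and SA_DIR are illustrative overrides, not sysstat settings.
report_for_day() {
    day=$1          # two-digit day of the month, e.g. 15
    outfile=$2      # where to save the report
    shift 2         # any remaining arguments (e.g. -B) are passed to sar
    "${SAR_BIN:-sar}" "$@" -f "${SA_DIR:-/var/log/sa}/sa${day}" > "$outfile"
}
```

With this in place, report_for_day 15 /tmp/cpu-15.txt would save the basic CPU report for the 15th, and adding -B at the end would save the paging report instead.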

To create a basic report showing CPU usage and IO wait time percentage, use sar with no flags. It produces a report similar to this:

01:10:00 PM       CPU     %user     %nice   %system   %iowait     %idle
01:20:00 PM       all      7.78      0.00      3.34     20.94     67.94
01:30:00 PM       all      0.75      0.00      0.46      1.71     97.08
01:40:00 PM       all      0.65      0.00      0.48      1.63     97.23
01:50:00 PM       all      0.96      0.00      0.74      2.10     96.19
02:00:00 PM       all      0.58      0.00      0.54      1.87     97.01
02:10:00 PM       all      0.80      0.00      0.60      1.27     97.33
02:20:01 PM       all      0.52      0.00      0.37      1.17     97.94
02:30:00 PM       all      0.49      0.00      0.27      1.18     98.06
Average:          all      1.85      0.00      0.44      2.56     95.14

If %idle stays near zero, your CPU is overloaded. If %iowait is consistently large, your disks are overloaded.
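
To pick out the busy intervals from a long report, something like the following awk filter could be used. It is only a sketch: the function name flag_busy_intervals and the default thresholds (20 for %iowait, 10 for %idle) are arbitrary choices for illustration, and it assumes the report has a header line containing the %iowait and %idle column names, as in the example above.

```shell
# Hypothetical filter: read a sar CPU report on stdin and print only the
# intervals where %idle is below min_idle or %iowait is above max_iowait.
# usage: sar | flag_busy_intervals [max_iowait] [min_idle]
flag_busy_intervals() {
    awk -v max_iowait="${1:-20}" -v min_idle="${2:-10}" '
        /%idle/ {
            # Header line: remember which columns hold the two values.
            for (i = 1; i <= NF; i++) {
                if ($i == "%iowait") w = i
                if ($i == "%idle")   d = i
            }
            next
        }
        # Data lines only: skip the Average row and short or blank lines.
        d && NF >= d && $1 != "Average:" {
            if ($d + 0 < min_idle + 0)
                print "LOW-IDLE    " $0
            else if (w && $w + 0 > max_iowait + 0)
                print "HIGH-IOWAIT " $0
        }
    '
}
```

Run against the report above with the default thresholds, this would flag only the 01:20:00 PM interval, where %iowait was 20.94.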

To see how your paging file is performing, use sar -B to get a report similar to this:

11:00:00 AM  pgpgin/s pgpgout/s   fault/s  majflt/s
11:10:00 AM      8.90     34.08      0.00      0.00
11:20:00 AM      2.65     26.63      0.00      0.00
11:30:00 AM      1.91     34.92      0.00      0.00
11:40:01 AM      0.26     36.78      0.00      0.00
11:50:00 AM      0.53     32.94      0.00      0.00
12:00:00 PM      0.17     30.70      0.00      0.00
12:10:00 PM      1.22     27.89      0.00      0.00
12:20:00 PM      4.11    133.48      0.00      0.00
12:30:00 PM      0.41     31.31      0.00      0.00
Average:       130.91     27.04      0.00      0.00

Raw paging numbers may not be of concern, but a high number of major faults (majflt/s) is an indication that the system needs more memory. (Note: majflt/s is only valid with kernel version >= 2.5.)
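
Since the major fault rate is the number to watch, a quick way to find its peak for the day might look like this. The function name peak_major_faults is hypothetical, and the sketch assumes the report header contains a majflt/s column as shown above.

```shell
# Hypothetical filter: read a "sar -B" report on stdin and print the
# highest majflt/s value seen in any interval (the Average row is ignored).
# usage: sar -B | peak_major_faults
peak_major_faults() {
    awk '
        /majflt\/s/ {
            # Header line: find the majflt/s column.
            for (i = 1; i <= NF; i++) if ($i == "majflt/s") col = i
            next
        }
        col && NF >= col && $1 != "Average:" {
            if (!seen || $col + 0 > max) { max = $col + 0; seen = 1 }
        }
        END { if (seen) printf "%.2f\n", max }
    '
}
```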

For network statistics, use sar -n DEV. This generates a report that shows transmit and receive statistics for each interface. Here is an abbreviated version of the report:

11:00:00 AM     IFACE   rxpck/s   txpck/s   rxbyt/s   txbyt/s
11:10:00 AM        lo      0.62      0.62     35.03     35.03
11:10:00 AM      eth0     29.16     36.71   4159.66  34309.79
11:10:00 AM      eth1      0.00      0.00      0.00      0.00
11:20:00 AM        lo      0.29      0.29     15.85     15.85
11:20:00 AM      eth0     25.52     32.08   3535.10  29638.15
11:20:00 AM      eth1      0.00      0.00      0.00      0.00

For network errors, try sar -n EDEV.
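
On a machine with several interfaces, it can help to reduce the DEV report to one interface and more readable units. The sketch below does that with awk; the function name iface_kbytes is hypothetical, and it assumes the column names IFACE, rxbyt/s, and txbyt/s shown in the report above (other sysstat versions may label these columns differently).

```shell
# Hypothetical filter: read a "sar -n DEV" report on stdin and print the
# receive and transmit rates for one interface, converted to KB/s.
# usage: sar -n DEV | iface_kbytes eth0
iface_kbytes() {
    awk -v iface="$1" '
        /IFACE/ {
            # Header line: locate the columns by name.
            for (i = 1; i <= NF; i++) {
                if ($i == "IFACE")   n  = i
                if ($i == "rxbyt/s") rx = i
                if ($i == "txbyt/s") tx = i
            }
            next
        }
        n && rx && tx && $n == iface && $1 != "Average:" {
            # Everything before the IFACE column is the timestamp.
            t = $1
            for (i = 2; i < n; i++) t = t " " $i
            printf "%s rx=%.1f KB/s tx=%.1f KB/s\n", t, $rx / 1024, $tx / 1024
        }
    '
}
```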

Reports on current activity

Sar can also be used to view what is currently happening with a system component. By passing a time interval (in seconds) and a count for the number of reports to produce, you can take an immediate snapshot of a potential bottleneck.

For example, to see the basic report every second for the next 10 seconds, use sar 1 10. Any of the reports can be run this way to see near-real-time results.

Benchmarking

Even if you have plenty of horsepower to run your applications, you can use sar to track changes in the workload over time. To do this, save the summary reports (sar only saves seven) to a different directory over a period of a few weeks or a month. This set of reports can serve as a baseline for the normal system workload. Future reports can be compared against the baseline to see how the workload is changing over time. You can automate your comparison reports with awk or your favorite programming language.
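
One way to keep the summary reports around is a small script run daily from cron. The following is a sketch only: the function name archive_sar_reports and the archive layout are made up for this example, and it assumes GNU date (for date -r FILE).

```shell
# Hypothetical helper: copy the daily sarDD summaries into an archive
# directory, naming each copy after the file's modification date so that
# reports from different months do not overwrite each other.
# Assumes GNU date, which accepts "date -r FILE".
# usage: archive_sar_reports /var/log/sa /path/to/archive
archive_sar_reports() {
    src=${1:-/var/log/sa}
    dst=$2
    mkdir -p "$dst" || return 1
    for f in "$src"/sar[0-9][0-9]; do
        [ -f "$f" ] || continue                   # no summary reports found
        stamp=$(date -r "$f" +%Y-%m-%d)           # day the report was written
        target="$dst/sar-$stamp"
        [ -e "$target" ] || cp -p "$f" "$target"  # never overwrite an archived copy
    done
}
```

Because each archived file carries a full date, the baseline directory can grow for as long as you like without name collisions.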

In large systems management, benchmarking is important to predict when and how hardware should be upgraded. It also provides ammunition to justify your hardware upgrade requests.

Digging deeper

In my experience, most hardware performance problems are related to the disks, memory, or CPU. Perhaps more frequently, application programming errors or poorly designed databases are the cause of serious performance issues. In any case, sar and friends can give you a comprehensive view of how things are working and help track down and fix a sluggish system. The previous examples just scratch the surface of what can be done with sar. Additional detailed reports can be created with the iostat and mpstat programs included in the package. If you take a look at the documentation, it should be easy to customize a set of reports for your needs.

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 2.5 License.
 

Copyright © 2006,2007 Linux Box Admin.

 