Thursday, July 28, 2011

AIX Process Priority and its Control


Introduction
As an AIX® administrator, you should already know the basics of how to work with processes, including researching, prioritizing, and killing them. You should also know how to tune your processes and optimize them accordingly, using the various tools at your disposal. These tools include some of the more recent tools available to you in AIX 5.3. To provide effective process control on your system, it is imperative that you understand the definition of processes and threads and the difference between them. This article also covers the ps, nice, and schedtune commands, as well as the Process Monitor Console (procmon), AIX Workload Manager (WLM), and other tools available to you. Let's start with definitions of processes and threads:
  • Processes -- A process is an activity within the system that is started with a command, shell script, or another process.
  • Threads -- A thread is an independent flow of control that operates within the same address space as other independent flows of control within a process. A kernel thread is a single sequential flow of control.
Another way of looking at this is that the process is the entity that the operating system uses to control the use of system resources, while the threads control actual processor-time consumption. Most system management tools still require you to refer to the process rather than the thread. The process itself owns the kernel threads, and each process can have one or more kernel threads (for example, multi-threaded applications). Threads of the same process can run on different CPUs at the same time, which takes advantage of computers with more than one processor (Symmetric Multiprocessing or SMP boxes). Applications can be designed to have user-level threads that are scheduled by the application itself or by the pthreads scheduler in libpthreads. Multiple threads of control allow an application to service requests from multiple users at the same time. With the libpthreads implementation, user threads sit on top of virtual processors, which are themselves on top of kernel threads. This article delves into more detail on the kernel aspects of a process and the tools available to help you manage your overall system more effectively. To help you manage your environment, it walks through time-tested UNIX® commands and many of the newer tools available to you as an AIX administrator.
Threads and SMT
Allowing threads to run on different CPUs also allows for effective utilization of simultaneous multi-threading (SMT). With the system in SMT mode, the processor fetches instructions from more than one thread. Introduced with the POWER5 architecture, SMT rests on the observation that no single thread uses all of the processor's execution units at the same time. The POWER5 design implements two-way SMT on each of the chip's cores, so each physical processor core appears to the operating system as two logical processors. SMT is primarily beneficial in commercial environments where the speed of an individual transaction is not as important as the total number of transactions that are performed. SMT is expected to increase the throughput of workloads with large or frequently changing working sets, such as database servers and Web servers. Workloads that are floating-point intensive are likely to gain little from SMT and are the ones most likely to lose performance, because they already make heavy use of either the floating-point units or the memory bandwidth. Workloads with low cycles per instruction (CPI) and low cache miss rates might see some small benefit. Generally, you should expect to see approximately a 30 percent increase in system throughput due to SMT. You must determine whether the critical processes running on your system benefit from SMT. They typically do; however, if you determine otherwise, you can turn SMT off (for example, with the smtctl command). It is enabled by default.
Scheduling concepts
I'll try not to spend too much time on the kernel intricacies of the AIX scheduler, but you need to get to a better level of understanding before going into administering processes or tuning the scheduler.
Each CPU on a system has its own dedicated run queue, which is a list of runnable threads sorted by thread priority value. There is also another run queue called the global run queue. All new threads are placed in the global run queue. Every time the CPU is ready to dispatch a thread, this global run queue is checked before any other run queues. When a thread finishes its time slice on the CPU, it goes back on the run queue of the CPU it was running on. This helps AIX to maintain its processor affinity. (I'll discuss processor affinity in more detail later.)
There are some environment variables you can tune to improve scheduler performance, but they are out of scope for this article. The CPUs on the system are shared among all of the threads by giving each thread a certain slice of time to run. The default time slice is 10 ms (one clock tick). It can be changed with the schedo command. Increasing the time slice can improve system throughput by reducing context switching. You can observe the context-switch rate using either the vmstat or sar commands. If the rate is very high, increasing the time slice can improve performance, but this should only be done after extensive analysis.
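As a rough illustration of watching the context-switch rate, the sketch below pulls the cs column out of vmstat-style output. The function name cs_values is made up for this example, and the column is located by its header label rather than by position, on the assumption that the header contains a field named cs.

```shell
# cs_values: read vmstat-style output on stdin and print the values from
# the "cs" (context switches) column. The column index is taken from the
# header line, so extra leading lines such as a configuration banner are skipped.
cs_values() {
    awk '
        cs == 0 { for (i = 1; i <= NF; i++) if ($i == "cs") cs = i; next }
        $cs ~ /^[0-9]+$/ { print $cs }
    '
}

# Usage on a live system (five samples, one second apart):
#   vmstat 1 5 | cs_values
```

Compare the values before and after any change to the time slice; running schedo -o timeslice displays the current setting.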
A CPU operates in one of two modes: kernel mode and user mode. In user mode, programs have read and write access to the user data in the process private region. This is the mode in which a process should accumulate most of its CPU time. Programs that operate in kernel mode include interrupt handlers and kernel processes. Code operating in this mode has read and write access to the global kernel address space and, when executing within the context of a process, to kernel data in the process region. User data within the process address space must be accessed using kernel services. When a user program makes a system call, the call executes in kernel mode, not user mode. You need to understand this concept when trying to interpret the output of commands such as vmstat and sar.
Processor affinity and binding processors
Processor affinity is a facility provided by operating systems that is used on SMP hardware. Essentially, all the threads within the process can be bound to run on the specified processor. AIX automatically tries to encourage processor affinity by having one run queue per CPU, which was discussed earlier. Using processor affinity settings to bind or unbind threads can help you find the root cause of hangs or deadlocks that are difficult to debug. Some applications might also run faster if their threads are always bound to run on one particular CPU.
In a typical SMP system, all of the processors are identical and can run any thread on the system. Essentially, any process or thread can be dispatched to run on any processor, except for processes or threads that are bound to run on a specific processor. Binding is done with the bindprocessor command. Let's look at an example (see Listing 1).

Listing 1. Using the bindprocessor command
# bindprocessor -q

The available processors are:  0 1 2 3

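From here, you can see there are four processors to call upon, numbered 0 through 3.
The command in Listing 2 searches the thread listing for entries bound to CPU 3. The BND column shows the processor a process or thread is bound to, or a dash if it is unbound. In this sample the only line that matches is the grep command itself, which is not bound to any processor.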

Listing 2. Discovering which thread is bound to CPU 3
# ps -emo THREAD | grep p3
    root 401544 389152        - A    0  60  1 f10001001ece2fb8   200001  pts/0    - grep p3

You can also use the SMIT fast path, smit bindproc, to help bind processes. Another way to bind a process is within a program using the bindprocessor API available on AIX. You should understand that these are powerful commands. When binding a process to a CPU, you can actually reduce performance for that process if the CPU to which the process will be bound is busy while others are idle.
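Listing 1 only queries the available processors. As a minimal sketch of the binding step itself, the wrapper below calls bindprocessor with a process ID and a processor number. The function name bind_to_cpu is hypothetical, and the guard exists only so that the snippet fails cleanly on systems that lack the AIX command.

```shell
# bind_to_cpu PID CPU: bind all kernel threads of process PID to logical
# processor CPU. bindprocessor is AIX-specific, so refuse to run elsewhere.
bind_to_cpu() {
    if ! command -v bindprocessor >/dev/null 2>&1; then
        echo "bindprocessor not found; this sketch assumes AIX" >&2
        return 1
    fi
    bindprocessor "$1" "$2"
}

# Example, using the PID from Listing 2 purely for illustration:
#   bind_to_cpu 401544 3
# To remove the binding again:
#   bindprocessor -u 401544
```

Check the load on the target processor before binding, for the reason given above: a bound process cannot move to an idle CPU.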
Let's talk about the commands that you'll generally use to identify and work with processes.
To get a full listing of processes, use the ps -ef command shown in Listing 3.

Listing 3. Getting a full listing of processes
# ps -ef
     UID    PID   PPID   C    STIME    TTY  TIME CMD
    root      1      0   0   Jan 08      -  0:05 /etc/init 
    root  82126 204974   0   Jan 08      -  0:00 /usr/sbin/snmpmibd 
    root  86210 106640   0   Jan 08      -  0:00 /usr/dt/bin/dtcm 
    root  90172 123038   0   Jan 08      -  0:35 /usr/lpp/X11/bin/X -D /usr/lib/X11//rgb 
-T -force :0 -auth /var/dt/A:0-DjUjUa 
    root  98390      1   0   Jan 08      -  8:36 /usr/sbin/syncd 60 
    root 106640 131160   0   Jan 08      -  0:25 /usr/dt/bin/dtsession

To see how much CPU each process is consuming, see Listing 4. (The wait entries at the top are the kernel's idle processes, not real CPU hogs.)

Listing 4. Identify processes
# ps aux | more
USER        PID %CPU %MEM   SZ  RSS    TTY STAT    STIME  TIME COMMAND
root       8196 12.9  0.0  384  384      - A      Jan 08 14695:30 wait
root      57372 12.8  0.0  384  384      - A      Jan 08 14542:51 wait
root      61470 12.2  0.0  384  384      - A      Jan 08 13884:38 wait
root      53274 12.0  0.0  384  384      - A      Jan 08 13711:38 wait
root     245938  0.0  0.0  828  856      - A      Jan 08 20:17 /usr/bin/xmwlm -
root      98390  0.0  0.0  508  516      - A      Jan 08  8:36 /usr/sbin/syncd 
root      69666  0.0  0.0  960  960      - A      Jan 08  3:46 gil
root          0  0.0  0.0  384  384      - A      Jan 08  2:49 swapper
root      49176  0.0  0.0  448  448      - A      Jan 08  1:13 xmgc
root     241842  0.0  0.0 23

If you want some more information on the nice value of the process, you need to use the -l flag. The NI column is the nice value (see Listing 5).

Listing 5. Using the -l flag to get the nice value
# ps -elf
   F   S   UID   PID   PPID   C PRI   NI ADDR    SZ   WCHAN  STIME   TTY TIME      CMD
200003 A   root   1      0     0      60  20 14001400  660   Jan 08  - 0:05    /etc/init 
240001 A   root  82126 204974  0      60  20 3c22b510  1264  Jan 08  - 0:00 /usr/sbin/snmpmibd 
240801 A   root  86210 106640  0      60  20 584d2400  2156  Jan 08  - 0:00 /usr/dt/bin/dtcm 
240001 A   root  90172 123038  0      60  20 5136 f1000100224650e0  5136  Jan 08  - 0:35 
/usr/lpp/X11/bin/X 
  -D /usr/lib/X11//rgb -T -force :0 -auth /var/dt/A:0-DjUjUa 
240001 A   root  98390     1   0      60  20 41a5400   508 * Jan 08  - 8:36 /usr/sbin/syncd 60 
240001 A   root 106640 131160  0      60  20 3816a400  1880  Jan 08  - 0:25 /usr/dt/bin/dtsession 
40001  A   root 123038     1   0      60  20 5c153400   380  Jan 08  - 0:00 /usr/dt/bin/dtlogin 
  -daemon

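The command in Listing 6 lists processes together with their nice values. It drops the header line and your own processes, then sorts the remainder in reverse order starting at the fourth field (the PID) and keeps the first 15 lines. It does not by itself rank processes by CPU use; sort on the C column, or use ps aux as in Listing 4, when you want the heaviest consumers first.

Listing 6. Listing processes with their nice values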
# ps -elf | egrep -v "STIME|$LOGNAME" | sort +3 -r | head -n 15
   40401 A   nobody 323762 127128   0  60 20 602dc400   660 f1000600002daa08   Jan 08      -  0:00 
/usr/HTTPServer/bin/httpd -d /usr/HTTPServer -k restart 
   40001 A   nobody 319662 127128   0  60 20 6c35f400  1336        *   Jan 08      -  0:00 
/usr/HTTPServer/bin/httpd -d /usr/HTTPServer -k restart 
   40001 A   nobody 307358 127128   0  60 20 3834a400  1340        *   Jan 08      -  0:00 
/usr/HTTPServer/bin/httpd -d /usr/HTTPServer -k restart 
  240001 A   daemon 254084 204974   0  60 20 58272400  1364            Jan 08      -  0:00 
/usr/sbin/rpc.statd -d 0 -t 50
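To rank by CPU use instead, one option is to sort ps aux output on its %CPU column. The following is a small sketch, not a replacement for topas or nmon; the function name top_cpu is invented here, and it assumes the column layout shown in Listing 4, where %CPU is the third field.

```shell
# top_cpu N: read "ps aux"-style output on stdin and print the N lines
# with the highest %CPU value (third column). The header line is dropped.
top_cpu() {
    sed 1d | sort -k3,3 -rn | head -n "$1"
}

# Usage on a live system:
#   ps aux | top_cpu 3
```

On a mostly idle AIX system the top entries will be the wait processes, so you may want to filter those out before reading the result.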

Now that you know which processes are killing the system (you could also use topas or nmon), what can you do about it? Wouldn't it be nice if there were a command to let you prioritize how the kernel schedules its processing? Of course, there is such a command, and also another that lets you reprioritize a process that is already running. The commands are nice and renice, respectively. A user's job in AIX carries a base priority level of 40 and a default nice value of 20. Together, these two numbers form the default priority level of 60. The vast majority of jobs have this value. The higher the priority level number, the lower the priority of the job. If you want to start a job with a lower priority, you can try the command in Listing 7.

Listing 7. Starting a job with a lower priority
 
# nice -n 10 thisjob

The command in Listing 7 adds 10 to the default of 20 and creates the new nice value of 30, with the priority of 70.
The command in Listing 8 adds 10 to the nice value of the running process 1683. If the process started with the default of 20, its nice value becomes 30.

Listing 8. Causing process 1683 to have a nice value of 30
# renice -n 10 -p 1683
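The arithmetic behind Listings 7 and 8 can be sketched as follows. The base priority of 40 and the default nice value of 20 come from the text above; the function and variable names are illustrative only. The sketch covers just the starting value: the priority the kernel actually uses also reflects recent CPU usage, and the sketch does not check that the nice value stays within its valid range.

```shell
# Base priority and default nice value for a user job, as stated in the text.
BASE_PRIORITY=40
DEFAULT_NICE=20

# initial_priority INCREMENT: print the nice value and starting priority
# of a job launched with "nice -n INCREMENT".
initial_priority() {
    nice_value=$((DEFAULT_NICE + $1))
    echo "nice=$nice_value priority=$((BASE_PRIORITY + nice_value))"
}

initial_priority 0     # default job:          nice=20 priority=60
initial_priority 10    # nice -n 10 thisjob:   nice=30 priority=70
```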

The procmon utility
While there are many performance tools that come with the base operating system of AIX, perhaps the best performance monitoring tool introduced recently (in AIX 5.3) is procmon. This utility displays a dynamic, sorted list of processes, with all sorts of information about them. It allows the execution of basic administration commands, such as nice, renice, and kill. The procmon tool runs on the Performance Workbench platform, which is an Eclipse-based tool that provides a graphical view of system activity. To start procmon, run perfwb, which launches Eclipse with the procmon plug-in (see Listing 9). You need the bos.perf.gtools.perfwb fileset.

Listing 9. Starting perfwb
# /usr/bin/perfwb

The procmon tool displays the following, by default:
  • How long a process has been running
  • How much CPU resource the processes are using
  • Whether processes are being penalized by the system
  • How much memory the processes are using
  • How much I/O a process is performing
  • The priority and nice values of a process
  • Who has created a particular process
It can also run the following commands against a process:
  • procfiles
  • proctree
  • procsig
  • procstack
  • procrun
  • procmap
  • procflags
  • proccred
  • procldd
The process table, the main component of procmon, displays the various processes that are running on the system, and they can be ordered and filtered based on the user configuration. By default, the table lists 20 processes, but this number can easily be changed using the Table Properties panel from the main menu.
WLM
WLM is a complex tool you can use for performance monitoring, gathering accounting data, and also for managing the load on a standalone system. You can also use it (with DLPAR) as a resource provisioning tool in a partitioned environment. It is an efficient way for system administrators to monitor and control resource usage by processes.
The WLM feature on AIX provides a set of tools that assist in gathering performance statistics and give you a mechanism to control the allocation of resources to processes. It is intended for use with large systems running multiple applications, databases, and transaction processing systems, where workloads are combined into a single large system ("vertical" server consolidation). It provides the flexibility to divide system resources between jobs without having to partition a system. You can use WLM to prevent different classes of jobs from interfering with each other and to allocate resources based on the requirements of different groups of users.
Many people confuse WLM with Partition Load Manager (PLM), which is a resource manager that assigns and moves resources based on defined policies and resource utilization in an IBM System p™ environment that includes Advanced POWER Virtualization. PLM is able to manage memory, dedicated processor partitions, and shared processor partitions, using micro-partitioning technology to readjust the resources. This adds to the micro-partition flexibility offered by the POWER Hypervisor. Unfortunately, PLM has no knowledge of the importance of any workload running in the partitions and, therefore, cannot readjust priority based on changes in workload types.
Conclusion
Process management is far from the most exciting aspect of being a UNIX systems administrator. While it can be tedious, it is a necessary evil of systems administration. You always have to field questions on how to speed up a process and why it is taking so long to finish. You need to identify problematic processes and do everything you can to make them run more efficiently. You also have to identify the best tool for the job, whether that is simply running a ps command and using renice, working with a newer performance utility such as procmon, or introducing a workload management facility like WLM to help manage all your system processes more efficiently. Try to do some additional research on the kernel concepts of processes and scheduling prior to introducing any new elements. It will be much more helpful in the long run if you really understand what you are doing before you do the work.
