Next: Stack Management Effect on Up: Improved ADLB Scheme Previous: Improved ADLB Scheme Virtual Network Effect on Load BalancingThe virtual network used in local coordination of neighbor processors for workload distribution and messages propagation has a crucial effect on network communication overhead, in addition to physical network restraints, such as channel bandwidth and startup latency. Since more connections means higher communication costs, the number of neighbors on virtual network determines the message propagation speed across the set of processors. On one hand, having more connections among processors can reduce the message diffusion distance and help balance overloaded or underloaded processors. However, on the other hand, increasing communication correspondences may worsen physical network contaminant, especially on Ethernet, consequently delaying message transmit and degrading the load balancing performance. Thus, an appropriate virtual network is necessary. In fact, the optimal selection of virtual network depends on various factors, such as the hardware environment, numerical algorithm, and the size and difficulty of numerical applications.
Here, for a superior load balancing, we consider a two-dimensional (2-D) torus or 2-D wraparound mesh virtual network shown in Figure 12, in addition to 1-D torus or a ring virtual network. As compared with 1-D torus, 2-D torus has a higher communication overhead
due to more neighbors, but keeps a smaller network diameter,
To assess various load balancing algorithms in parallel computing needs a reasonable and systematical measurement. In performance analysis, parallel speedup the ratio of sequential run time over parallel run time and parallel efficiency the ratio of parallel efficiency over the number of processors used are two of commonly used metrics to measure the capability of parallel algorithms. However, the two metrics can be easily influenced by trial factors. For example, due to Amdahl's law [79,53], parallel efficiency enaturally descends as the number of processors increases, and both speedup and efficiency ascend as the problems size increases, regardless of what kind of parallel system is used. These underlying characteristics on parallel processing point out that these metrics may not be capable of accurately distinguishing the essence of parallel algorithms for different applications, since a scalable parallel system needs to maintain a constant efficiency when altering the application scale (problem size) and computer capacity (processor number). Thus, we here perform a standard scalability analysis based on the theory of isoefficiecny function [53] for a parallel tree search using the best sequential scheme. In definition, the isoefficiency function associated with a parallel system describes how much problem size needs to be increased in proportion to the number of processors as keeping the efficiency at a constant level. This function can be determined experimentally by running the parallel algorithm on the a particular parallel architecture (computer) to solve applications with a wide range of problem sizes. The resulting isoefficiency curve, thereafter, presents the rising rate of problem size with respect to adding more processors for reaching the same efficiency. Thus, it provides a means to evaluate the parallel scalability, that is, the smaller the isoefficiency function is, the more scalable a parallel algorithm presents, and vice versa. In parallel processing, there are two types of overhead influencing the parallel run time: search overhead and communication overhead. The former is due to the situation in which the parallel algorithm undergoes more work or nodes before conclude the globally optimal solution than the sequential one, and is in essence due to inappropriate parallel tree search schemes. The latter is frequently occurred when the load balancing schemes take extra time in distributing work and exchanging message via physical network, and is the targeted subject of this scalability analysis. Thus, when running the best sequential scheme over multi-processors, we ensure no extra computational work from parallel tree search with respect to sequential search, thus excluding the factor of search overhead. However, the best sequential tree search scheme to solve the global optimization is generally unknown in prior to execution. Thus, we apply an equation-solving approach that can solve a system of equations for all feasible solutions, or solves an optimization problem as the global optimization approach does, except that the objective range test is turned off allowing all feasible optimal solutions being sought. Following the equation-solving approach, it ascertains the parallel execution undergoes the same amount of work as in the sequential execution.
The resulting isoefficiency curves of 1-D and 2-D torus on up to 32 processors are exhibited in Figure 13. Note that while the 1-D curve grows exponentially with respect to the number of processors, the 2-D curve merely extends in an almost linear manner. It shows that, in comparison to 1-D torus, 2-D torus averagely needs less amounts of work to maintain a constant efficiency when increasing the processor number. In other words, when using 2-D torus merely a small amount of additional work is required to maintain the same level of processor utilization as more processors are employed. Apparently, the increasing amount of communication requirements in higher order virtual connectivity gives little trouble to the Ethernet network in our cluster. Instead, the superior interaction among processors alleviates load distribution unbalance in the network, consequently maintaining more busy processors at run time that helps enhance the scalability. The result affirmatively demonstrates 2-D torus to be a more scalable virtual network over 1-D torus in our parallel system.In fact, a variety of other virtual networks can also be implanted in ADLB algorithm, such as higher dimensional torus network, hypercube network and tree network. The best selection of virtual network, however, is dependent on the parallel system of hardware and software that executes parallel processing. In practice, the optimal virtual network are determined experimentally by the scalability analysis [53] as what has been discussed above. Thus, we here can conclude that the 2-D ADLB algorithm is a very suitable parallel BB to the COW system connected by Ethernet network, and when incorporating with interval analysis or INTBIS, it provides a capability to effective utilize the full power of all processors for finding all solutions of a system of equations in a timely manner.
Next: Stack Management Effect on Up: Improved ADLB Scheme Previous: Improved ADLB Scheme ChaoYang Gau 2001-03-12 |