
Hypergraph-Partitioning Based Decomposition for Parallel Sparse-Matrix Vector Multiplication*

Ümit V. Çatalyürek and Cevdet Aykanat, Member, IEEE
Computer Engineering Department, Bilkent University
06533 Bilkent, Ankara, Turkey
{cumit, aykanat}@cs.bilkent.edu.tr

Abstract

In this work, we show that the standard graph-partitioning based decomposition of sparse matrices does not reflect the actual communication volume requirement for parallel matrix-vector multiplication. We propose two computational hypergraph models which avoid this crucial deficiency of the graph model. The proposed models reduce the decomposition problem to the well-known hypergraph partitioning problem. The recently proposed successful multilevel framework is exploited to develop a multilevel hypergraph partitioning tool PaToH for the experimental verification of our proposed hypergraph models. Experimental results on a wide range of realistic sparse test matrices confirm the validity of the proposed hypergraph models. In the decomposition of the test matrices, the hypergraph models using PaToH and hMeTiS result in up to 63% less communication volume (30%–38% less on average) than the graph model using MeTiS, while PaToH is only 1.3–2.3 times slower than MeTiS on average.

Index Terms — Sparse matrices, matrix multiplication, parallel processing, matrix decomposition, computational graph model, graph partitioning, computational hypergraph model, hypergraph partitioning.

*This work is partially supported by the Commission of the European Communities, Directorate General for Industry, under contract ITDC 204-82166, and by the Turkish Science and Research Council under grant EEEAG-160.

1 INTRODUCTION

Iterative solvers are widely used for the solution of large, sparse linear systems of equations on multicomputers. Two basic types of operations are repeatedly performed at each iteration. These are linear operations on dense vectors and the sparse-matrix vector product (SpMxV) of the form y = Ax, where A is an m × m square matrix with the same sparsity structure as the coefficient matrix [3, 5, 8, 35], and y and x are dense vectors. Our goal is the parallelization of the computations in the iterative solvers through rowwise or columnwise decomposition of the A matrix as

    A = [A_1^{rT}, ..., A_K^{rT}]^T = [A_1^c, ..., A_K^c],

where processor P_k owns row stripe A_k^r or column stripe A_k^c, respectively, for a parallel system with K processors.
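The stripe decomposition above can be sketched in code. The following is a minimal illustration (not the authors' implementation); the coordinate-list matrix format, the function name `stripes`, and the `parts` ownership array are assumptions made for the example.

```python
# Hypothetical sketch of K-way rowwise / columnwise stripe decomposition:
# A = [A_1^r; ...; A_K^r] (row stripes) or A = [A_1^c ... A_K^c] (column
# stripes). The sparse matrix is given as a list of nonzeros (i, j, value);
# parts[i] gives the processor that owns row (or column) i.

def stripes(nonzeros, parts, by="row"):
    """Group nonzeros into K stripes by row or column ownership."""
    K = max(parts) + 1
    out = [[] for _ in range(K)]
    for i, j, v in nonzeros:
        key = i if by == "row" else j
        out[parts[key]].append((i, j, v))
    return out
```

For example, with `parts = [0, 0, 1]`, rows 0 and 1 form stripe A_1^r on processor P_1 and row 2 forms stripe A_2^r on processor P_2; passing `by="col"` produces the conformal column stripes instead.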

In order to avoid the communication of vector components during the linear vector operations, a symmetric partitioning scheme is adopted. That is, all vectors used in the solver are divided conformally with the row partitioning or the column partitioning in the rowwise or columnwise decomposition schemes, respectively. In particular, the x and y vectors are divided as [x_1, ..., x_K]^T and [y_1, ..., y_K]^T, respectively. In rowwise decomposition, processor P_k is responsible for computing y_k = A_k^r x and the linear operations on the k-th blocks of the vectors. In columnwise decomposition, processor P_k is responsible for computing y^k = A_k^c x_k (where y = \sum_{k=1}^{K} y^k) and the linear operations on the k-th blocks of the vectors. With these decomposition schemes, the linear vector operations can be easily and efficiently parallelized [3, 35], such that only the inner-product computations introduce global communication overhead, whose volume does not scale up with increasing problem size. In parallel SpMxV, the rowwise and columnwise decomposition schemes require communication before or after the local SpMxV computations, so they can also be considered as pre- and post-communication schemes, respectively. Depending on the way in which the rows or columns of A are partitioned among the processors, entries in x or entries in y^k may need to be communicated among the processors.
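The pre-communication scheme of the rowwise case can be made concrete with a small sequential simulation. This is a hypothetical sketch, not the paper's code: the dictionary-of-rows matrix format, the function name `rowwise_spmxv`, and the assumption of a square matrix with a symmetric row/vector partition (`parts` owns both row i and x_i) are choices made for illustration. The communication count it returns is the total number of x entries each processor must receive before its local product, i.e. the volume the paper's models seek to minimize.

```python
# Hypothetical sketch of rowwise-decomposed SpMxV with pre-communication:
# each processor P_p first fetches the x entries touched by its row stripe
# but owned by other processors, then computes its local product y_k = A_k^r x.
# A is stored as a dict mapping row index -> {column index: value}.

def rowwise_spmxv(A_rows, x, parts):
    """Simulate K processors; parts[i] = owner of row i and of x[i]/y[i]."""
    K = max(parts) + 1
    y = [0.0] * len(x)
    total_comm = 0
    for p in range(K):
        my_rows = [i for i, owner in enumerate(parts) if owner == p]
        # Pre-communication: x entries needed by P_p but owned elsewhere.
        needed = {j for i in my_rows
                    for j in A_rows.get(i, {}) if parts[j] != p}
        total_comm += len(needed)  # words of x received by P_p
        # Local SpMxV on the row stripe A_p^r.
        for i in my_rows:
            y[i] = sum(v * x[j] for j, v in A_rows.get(i, {}).items())
    return y, total_comm
```

For instance, with `parts = [0, 0, 1]` and a nonzero at position (2, 0), processor P_2 must receive x_1 from P_1 before computing its stripe, contributing one word to the communication volume.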

