#include <limits.h>
#include <math.h>
#include "superlu_ddefs.h"
Functions/Subroutines | |
| static float | a_plus_at_CompRow_loc (int iam,int_t *perm_r,int nprocs_i,int_t *vtxdist_i,int_t n,int_t *rowptr,int_t *colind,int nprocs_o,int_t *vtxdist_o,int_t *p_bnz,int_t **p_b_rowptr,int_t **p_b_colind,gridinfo_t *grid) |
| float | get_perm_c_parmetis (SuperMatrix *A, int_t *perm_r, int_t *perm_c, int nprocs_i, int noDomains, int_t **sizes, int_t **fstVtxSep, gridinfo_t *grid, MPI_Comm *metis_comm) |
-- Distributed symbolic factorization auxialiary routine (version 2.1) -- Lawrence Berkeley National Lab, Univ. of California Berkeley - July 2003 INRIA France - January 2004 Laura Grigori
November 1, 2007
| static float a_plus_at_CompRow_loc | ( | int | iam, | |
| int_t * | perm_r, | |||
| int | nprocs_i, | |||
| int_t * | vtxdist_i, | |||
| int_t | n, | |||
| int_t * | rowptr, | |||
| int_t * | colind, | |||
| int | nprocs_o, | |||
| int_t * | vtxdist_o, | |||
| int_t * | p_bnz, | |||
| int_t ** | p_b_rowptr, | |||
| int_t ** | p_b_colind, | |||
| gridinfo_t * | grid | |||
| ) | [static] |
Purpose =======
Form the structure of Pr*A +A'Pr'. A is an n-by-n matrix in NRformat_loc format, represented by (rowptr, colind). The output B=Pr*A +A'Pr' is in NRformat_loc format (symmetrically, also row oriented), represented by (b_rowptr, b_colind).
The input matrix A is distributed in block row format on nprocs_i processors. The output matrix B is distributed in block row format on nprocs_o processors, where nprocs_o <= nprocs_i. On output, the matrix B has its rows permuted according to perm_r.
Sketch of the algorithm =======================
Let iam by my process number. Let fst_row, lst_row = m_loc + fst_row be the first/last row stored on iam.
Compute Pr' - the inverse row permutation, stored in iperm_r.
Compute the transpose of the block row of Pr*A that iam owns:
T[:,Pr(fst_row:lst_row)] = Pr' * A[:,fst_row:lst_row] * Pr'
All to all communication such that every processor iam receives all
the blocks of the transpose matrix that it needs, that is
T[fst_row:lst_row, :]
Compute B = A[fst_row:lst_row, :] + T[fst_row:lst_row, :]
If Pr != I or nprocs_i != nprocs_o then permute the rows of B (that is compute Pr*B) and redistribute from nprocs_i to nprocs_o according to the block row distribution in vtxdist_i, vtxdist_o.
| float get_perm_c_parmetis | ( | SuperMatrix * | A, | |
| int_t * | perm_r, | |||
| int_t * | perm_c, | |||
| int | nprocs_i, | |||
| int | noDomains, | |||
| int_t ** | sizes, | |||
| int_t ** | fstVtxSep, | |||
| gridinfo_t * | grid, | |||
| MPI_Comm * | metis_comm | |||
| ) |
Purpose =======
GET_PERM_C_PARMETIS obtains a permutation matrix Pc, by applying a graph partitioning algorithm to the symmetrized graph A+A'. The multilevel graph partitioning algorithm used is the ParMETIS_V3_NodeND routine available in the parallel graph partitioning package parMETIS.
The number of independent sub-domains noDomains computed by this algorithm has to be a power of 2. Hence noDomains is the larger number power of 2 that is smaller than nprocs_i, where nprocs_i = nprow * npcol is the number of processors used in SuperLU_DIST.
Arguments =========
A (input) SuperMatrix*
Matrix A in A*X=B, of dimension (A->nrow, A->ncol). The number
of the linear equations is A->nrow. Matrix A is distributed
in NRformat_loc format.
perm_r (input) int_t*
Row permutation vector of size A->nrow, which defines the
permutation matrix Pr; perm_r[i] = j means row i of A is in
position j in Pr*A.
perm_c (output) int_t*
Column permutation vector of size A->ncol, which defines the
permutation matrix Pc; perm_c[i] = j means column i of A is
in position j in A*Pc.
nprocs_i (input) int*
Number of processors the input matrix is distributed on in a block
row format. It corresponds to number of processors used in
SuperLU_DIST.
noDomains (input) int*, must be power of 2
Number of independent domains to be computed by the graph
partitioning algorithm. ( noDomains <= nprocs_i )
sizes (output) int_t**, of size 2 * noDomains
Returns pointer to an array containing the number of nodes
for each sub-domain and each separator. Separators are stored
from left to right.
Memory for the array is allocated in this routine.
fstVtxSep (output) int_t**, of size 2 * noDomains
Returns pointer to an array containing first node for each
sub-domain and each separator.
Memory for the array is allocated in this routine.
Return value ============ < 0, number of bytes allocated on return from the symbolic factorization. > 0, number of bytes allocated when out of memory.
1.5.5