Support for hypre mixedint #1583

jandrej · 2020-06-26T23:27:13Z

This PR should add support for hypre built with the option --enable-mixedint, i.e. allow local sizes to use 32-bit integers and global sizes to use 64-bit integers.
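
To illustrate the split between local and global sizes, here is a minimal, self-contained sketch. `LocalInt` and `GlobalInt` are hypothetical stand-ins for hypre's `HYPRE_Int` and `HYPRE_BigInt` in a mixedint build, and `GlobalOffsets` is an illustrative helper, not code from this PR. No MPI is involved; the per-rank sizes are simply passed in as a vector.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for HYPRE_Int / HYPRE_BigInt in a mixedint build.
using LocalInt = std::int32_t;   // per-rank sizes and local indices
using GlobalInt = std::int64_t;  // global sizes and global indices

// Prefix-sum the per-rank sizes into global offsets. Accumulating in the
// 64-bit type keeps the total valid even when it exceeds 2^31 - 1.
std::vector<GlobalInt> GlobalOffsets(const std::vector<LocalInt> &local_sizes)
{
   std::vector<GlobalInt> offsets(local_sizes.size() + 1, 0);
   for (std::size_t r = 0; r < local_sizes.size(); r++)
   {
      offsets[r + 1] = offsets[r] + static_cast<GlobalInt>(local_sizes[r]);
   }
   return offsets;
}
```

With four ranks owning 1,000,000,000 unknowns each, every local size fits in 32 bits, while the last offset (the global size) is 4,000,000,000 and does not.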

I added a first pass of changes, and I hope we can get a group of contributors who each go through a set of files and check for errors or add missing changes.

Right now the branch does not compile without changes. Locally, I commented out the complex operator files; with that, ex1p runs fine in serial and parallel.

If you participate and want to compile individual files, I recommend using the makefile build system and specifying the file you are working on with

make fem/pfespace.o

This way you only get the errors from the file you are looking at.

I'd especially like to see someone help with the complex stuff, since I'm not comfortable just churning through the files. @dylan-copeland @mlstowell @psocratis

TODO:

In INSTALL, document that mixed-int support requires hypre >= 2.20.0

PR	Author	Editor	Reviewers	Assignment	Approval	Merge
#1583	@jandrej	@tzanio	@v-dobrev + @dylan-copeland + @psocratis	04/20/21	04/24/21	04/28/21

mlstowell · 2020-07-14T19:39:52Z

@jandrej , I think I found all of the BigInt changes in complex_operator and complex_fem. If you notice any other issues I'd be glad to help.

Thanks for tackling this!

jandrej · 2020-07-14T21:06:14Z

Thanks! It compiles fine, but make test shows problems with HypreParMatrixBlocks and

    NURBS miniapp [ mpirun -np 4 nurbs_ex11p ... ]: FAILED  (0.13s 17768kB)
Options used:
   --mesh ../../data/star.mesh
   --refine-serial 2
   --refine-parallel 1
   --order '0'
   --num-eigs 5
   --seed 75
   --no-visualization
Mesh::GeneratePartitioning(...): edgecut = 39
Mesh does not have FEs --> Assume order 1.
Number of unknowns: 1361

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 18902 RUNNING AT dyro.llnl.gov
=   EXIT CODE: 11
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault: 11 (signal 11)

This is not testing problems larger than MAX_INT yet.

mlstowell · 2020-07-14T21:33:36Z

That looks like something @dylan-copeland will want to take a look at. I'm not familiar with HypreParMatrixBlocks but I see a couple places where row and column offsets are treated as integer arrays. The number of non-zeros might also cause a problem.

mlstowell · 2020-07-14T21:45:27Z

This branch doesn't seem to compile with HYPRE version 2.12.0 because HYPRE_BigInt was not defined back in 2017. Is there a plan to support these older versions of HYPRE or do we require a new minimum HYPRE version (the current minimum is 2.10.0b)? Asking for a friend...

v-dobrev · 2020-07-14T22:39:20Z

> This branch doesn't seem to compile with HYPRE version 2.12.0 because HYPRE_BigInt was not defined back in 2017. Is there a plan to support these older versions of HYPRE or do we require a new minimum HYPRE version (the current minimum is 2.10.0b)? Asking for a friend...

I think this will be easy to add: for older versions of hypre (checked using MFEM_HYPRE_VERSION), we can typedef HYPRE_BigInt to be the same as HYPRE_Int and define HYPRE_MPI_BIG_INT as HYPRE_MPI_INT.
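
A minimal sketch of that fallback follows. To keep it compilable without hypre or MFEM, it defines placeholder versions of `MFEM_HYPRE_VERSION`, `HYPRE_Int`, and `HYPRE_MPI_INT` when they are missing; those placeholders, the `SKETCH_BIGINT_MIN_VERSION` name, and its value are assumptions made for illustration, and the sketch assumes the version macro encodes 2.12.0 as 21200.

```cpp
#include <type_traits>

// Stand-ins so the sketch compiles without hypre or MFEM (assumptions):
// a real build gets these from hypre's headers and MFEM's configuration.
#ifndef MFEM_HYPRE_VERSION
#define MFEM_HYPRE_VERSION 21200      // pretend this is hypre 2.12.0
typedef int HYPRE_Int;
#define HYPRE_MPI_INT 0               // placeholder, not a real MPI datatype
#endif

// Assumed cutoff, for illustration only; the first hypre version that
// provides HYPRE_BigInt should be checked against hypre itself.
#define SKETCH_BIGINT_MIN_VERSION 21600

#if MFEM_HYPRE_VERSION < SKETCH_BIGINT_MIN_VERSION
// Older hypre: there is no separate "big" type, so alias it to HYPRE_Int.
typedef HYPRE_Int HYPRE_BigInt;
#define HYPRE_MPI_BIG_INT HYPRE_MPI_INT

static_assert(std::is_same<HYPRE_BigInt, HYPRE_Int>::value,
              "with an old hypre, both index types are the same");
#endif
```

Code written against `HYPRE_BigInt` and `HYPRE_MPI_BIG_INT` then compiles unchanged with both old and new hypre versions.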

v-dobrev · 2020-07-14T22:45:41Z

The issue with HypreParMatrixFromBlocks may be resolved by merging master into this branch -- there was an update to that function recently in #1521.

dylan-copeland · 2020-07-14T23:08:18Z

I pulled from master and checked that the unit tests pass, so @v-dobrev is right.

jandrej · 2020-07-14T23:24:47Z

Great, let's wait for the merge, and then I'll try some large problems and report back what breaks.

v-dobrev · 2020-07-25T22:55:15Z

It turns out --enable-mixedint disables some features in hypre:

  --enable-mixedint       Use long long int for HYPRE_BigInt and int for
                          HYPRE_Int (default is int for both). Note: This
                          option disables Euclid, ParaSails, pilut and CGC
                          coarsening.

Because of this and how the makefiles work, building on Mac with --enable-mixedint (using configure and make) fails with an error like this:

ar -rcu libHYPRE.a 
ar: no archive members specified
usage:  ar -d [-TLsv] archive file ...
	ar -m [-TLsv] archive file ...
	ar -m [-abiTLsv] position archive file ...
	ar -p [-TLsv] archive [file ...]
	ar -q [-cTLsv] archive file ...
	ar -r [-cuTLsv] archive file ...
	ar -r [-abciuTLsv] position archive file ...
	ar -t [-TLsv] archive [file ...]
	ar -x [-ouTLsv] archive [file ...]

One workaround for this is to combine the following lines: https://github.com/hypre-space/hypre/blob/b075d642550c9947b758201f41f5d38dabfe7a51/src/lib/Makefile#L82-L84 with the previous line, so that the list of files is not empty.

v-dobrev · 2020-07-30T18:27:42Z

For proper mixedint support, we need to use the latest HYPRE master (any commit after hypre-space/hypre@13e2cad). We should document this in INSTALL.

tzanio · 2020-10-13T22:45:41Z

Take 2: How far are we from marking this as ready-for-review? It will be great to include it in the xSDK release at the end of the month...

jandrej · 2020-10-13T23:08:53Z

I'd like to do two things before we go into review:

1. Run clang-tidy with the bugprone-too-small-loop-variable check (https://clang.llvm.org/extra/clang-tidy/checks/bugprone-too-small-loop-variable.html, https://reviews.llvm.org/D53974). This detects loop variables that are too small for their bound.
2. Run at least a handful of examples (not just ex1p) so we get more code coverage. This requires us to discuss and pick examples that we know have robust solvers that can handle >2B unknowns.
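
For reference, a minimal sketch of the kind of loop that check targets, and one way to fix it. The function names are hypothetical and not from MFEM. The check's documentation describes a MagnitudeBitsUpperLimit option whose default appears to cover only narrower counter types, so flagging `int` counters may require raising it; treat that as an assumption to verify.

```cpp
#include <cstdint>
#include <limits>

// The pattern the check is meant to catch, left as a comment because a
// 32-bit counter overflowing against a 64-bit bound is undefined behavior:
//
//    for (int i = 0; i < n; i++) { ... }   // n is a 64-bit count
//
// The fix is to give the loop variable the same width as the bound.
std::int64_t CountUpTo(std::int64_t n)
{
   std::int64_t count = 0;
   for (std::int64_t i = 0; i < n; i++) { count++; }
   return count;
}

// True when n is still representable as a 32-bit signed loop counter.
bool FitsIn32Bits(std::int64_t n)
{
   return n >= std::numeric_limits<std::int32_t>::min() &&
          n <= std::numeric_limits<std::int32_t>::max();
}
```

A loop bound for which `FitsIn32Bits` returns false is exactly the case where an `int` loop variable can never reach the bound.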

tzanio · 2020-10-14T01:55:05Z

Running clang-tidy and using it in general is an excellent idea.

My personal feeling is that testing ex1p, ex2p, ex3p and ex4p should be sufficient. We can add maybe ex8p, ex11p and ex13p to that, but I don't think we need to run every single parallel example.

tzanio · 2021-04-21T18:41:50Z

linalg/hypre.hpp

+   /** This is a compact text representation of the local data of the
+       HypreParMatrix that can be used to compare matrices from different runs
+       without the need to save the whole matrix. */
+   void PrintHash(std::ostream &out) const;


tzanio · 2021-04-21T18:45:08Z

general/hash.hpp

@@ -212,6 +213,84 @@ class HashTable : public BlockArray<T>
 };


+/// Hash function for data sequences.
+/** Depends on GnuTLS for SHA-256 hashing. */
+class HashFunction


Should this be #ifdef MFEM_USE_GNUTLS?

No need -- the implementation has the #ifdefs. If you don't have GnuTLS, you'll get a hash string that says you need GnuTLS.

Ok, thanks. Do you want to mention any of the hash changes in CHANGELOG?

It's not crucial.

Is GnuTLS already installed on LC, or does someone know how to install it? Is it the one at https://github.com/gnutls/gnutls so we just need to clone and follow their build instructions?

Thanks @v-dobrev, I tried that but apparently need something else:
In file included from general/hash.cpp(15):
/usr/include/gnutls/crypto.h(35): error: identifier "gnutls_cipher_algorithm_t" is undefined
gnutls_cipher_algorithm_t cipher,

On Mac I got it working, using a brew installation of gnutls. Maybe on LC some include directory needs to be specified? Anyway, the hash feature is a really nice addition.

On the Mac it works for me too. On LC I get the same error as @dylan-copeland. All paths look ok... so it's not clear to me what's wrong here.

Thanks for testing this. I pushed a fix for the issue -- that version of GnuTLS needed <gnutls/gnutls.h> to be included before <gnutls/crypto.h>.

Thanks, I confirmed it works now on LC.

tzanio · 2021-04-21T18:45:57Z

LGTM

tzanio · 2021-04-21T18:53:38Z

LGTM

tzanio · 2021-04-22T21:51:56Z

@dylan-copeland and @psocratis, can you please take a quick look?

psocratis · 2021-04-22T21:53:31Z

> @dylan-copeland and @psocratis, can you please take a quick look?

I will need some time to test it first; by tomorrow, maybe?

tzanio

Thanks @jandrej !

dylan-copeland · 2021-04-22T23:04:37Z

fem/pfespace.cpp

+   HYPRE_Int *j_offd_hi = j_offd;
+#else
+   HYPRE_Int *j_offd_hi = Memory<HYPRE_Int>(offd_cols);
+   Memory<HYPRE_BigInt>(j_offd, offd_cols, true).Delete();


Is j_offd created and then deleted? Can this be simplified?

This is a trick to delete a host pointer that was allocated via the memory manager (using the currently set host allocator). The respective allocation is done with something like HYPRE_BigInt *j_offd = Memory<HYPRE_BigInt>(size); -- you can check above that j_offd was allocated that way.

dylan-copeland

I did not check that the right type is used everywhere in MFEM, but the changes that were made in this PR look correct, and I suppose if something is wrong it will show up when a hypre solve fails. It seems other examples also work (e.g. ex3p), so the CHANGELOG could state that (it would require some more testing to be sure).

psocratis

Though it's really hard to check all the places for the appropriate changes, this looks very good. Some examples that I ran with more than 2 billion dofs seem to work just fine.

tzanio · 2021-04-25T01:36:08Z

Merged in next for testing...

tzanio · 2021-04-25T16:30:59Z

There are a number of errors from tonight's autotest (due to older version of hypre?):

linalg/hypre.cpp: In member function ‘void mfem::HypreParMatrix::PrintHash(std::ostream&) const’:
linalg/hypre.cpp:1642:16: error: ‘struct hypre_CSRMatrix’ has no member named ‘big_j’
       if (csr->big_j == nullptr)
                ^
linalg/hypre.cpp:1648:29: error: ‘struct hypre_CSRMatrix’ has no member named ‘big_j’
          hf.AppendInts(csr->big_j, csr_nnz);
                             ^

See tux426/next/baseline and toss3/next for more details.

v-dobrev · 2021-04-25T23:37:54Z

> There are a number of errors from tonight's autotest (due to older version of hypre?):
>
> linalg/hypre.cpp: In member function ‘void mfem::HypreParMatrix::PrintHash(std::ostream&) const’:
> linalg/hypre.cpp:1642:16: error: ‘struct hypre_CSRMatrix’ has no member named ‘big_j’
>        if (csr->big_j == nullptr)
>                 ^
> linalg/hypre.cpp:1648:29: error: ‘struct hypre_CSRMatrix’ has no member named ‘big_j’
>           hf.AppendInts(csr->big_j, csr_nnz);
>                              ^
>
> See tux426/next/baseline and toss3/next for more details.

Ah, yes, forgot about older versions. I'll fix it.

tzanio · 2021-04-26T13:43:32Z

Re-merged in next for testing ...

dylan-copeland · 2021-04-28T23:30:35Z

@tzanio I started doing some large tests, to verify this branch for examples other than ex1p, but it is going to take a while to get that done due to the large number of nodes required to exceed the max int without also running out of memory. Let's proceed with the CHANGELOG just claiming to support ex1p.
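
As a rough illustration of why such tests need so many nodes, the sketch below computes the smallest number of ranks whose combined size exceeds the 32-bit signed maximum. The helper name and the uniform per-rank size are assumptions made for illustration, not measurements from these runs.

```cpp
#include <cstdint>

// Smallest number of ranks whose combined size exceeds the 32-bit signed
// maximum (2^31 - 1), assuming every rank holds dofs_per_rank unknowns.
// Requires dofs_per_rank > 0.
std::int64_t MinRanksToExceedInt32(std::int64_t dofs_per_rank)
{
   const std::int64_t threshold = std::int64_t(1) << 31;   // first value above INT32_MAX
   return (threshold + dofs_per_rank - 1) / dofs_per_rank; // ceiling division
}
```

For a hypothetical memory-limited 10,000,000 unknowns per rank, this gives 215 ranks before the global size no longer fits in a 32-bit integer.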

tzanio · 2021-04-28T23:31:57Z

Thanks @dylan-copeland. I will merge as is.

started converting to mixedint

352955c

jandrej added the WIP Work in Progress label Jun 26, 2020

tzanio mentioned this pull request Jun 28, 2020

Upgrading to hypre-2.16.0+ #1276

Closed

5 tasks

tzanio added help wanted linalg labels Jun 28, 2020

mlstowell added 2 commits July 14, 2020 12:29

HYPRE Mixed integer support in complex_operator.?pp

77459c0

HYPRE Mixed integer support in complex_fem.?pp

d9b342e

jandrej and others added 3 commits July 21, 2020 15:33

merge master

f867afe

more changes towards mixedint

238afad

Fix a copy-paste bug in the case when HYPRE_MIXEDINT is not used.

2c24256

A few tweaks to support older versions of hypre. Run 'make style'.

v-dobrev added 2 commits July 25, 2020 16:00

Fix a bug affecting versions of hypre < 2.16.0.

4d249b9

Fix a bug affecting HYPRE_MIXEDINT builds -- 'make test' still shows a lot of failures in this case.

Fix a bug affecting hypre builds with "mixedint" enabled. All tests in 'make test' seem to pass now.

3b26ff8

v-dobrev mentioned this pull request Jul 26, 2020

Error building on Mac with mixedint hypre-space/hypre#159

Closed

v-dobrev added 2 commits July 25, 2020 20:47

Update examples, miniapps, tests to use HYPRE_BigInt instead of HYPRE_Int where appropriate.

785486f

In Mesh::GeneratePartitioning, fix a bug in the case when the number of elements is less than the number of processors.

0be20cc

v-dobrev mentioned this pull request Jul 28, 2020

Issue with big partitions in HYPRE_MIXEDINT mode hypre-space/hypre#162

Closed

tzanio added this to the mfem-4.2 milestone Oct 19, 2020

tzanio reviewed Apr 21, 2021

minor

cc6338f

make style

0a620f0

jandrej and others added 5 commits April 22, 2021 15:00

int -> BigInt

58a9a86

changelog entry

cca5dac

changelog

d87873e

install info

7bcb6dc

Editorial

b1359b8

tzanio approved these changes Apr 22, 2021

dylan-copeland reviewed Apr 22, 2021

dylan-copeland approved these changes Apr 23, 2021

psocratis approved these changes Apr 23, 2021

v-dobrev added 2 commits April 22, 2021 21:20

In INSTALL, add HYPRE version requirement for mixedint support.

e990756

In hash.cpp, include <gnutls/gnutls.h> before <gnutls/crypto.h> which seems to be necessary for some versions of GnuTLS.

0c38bd5

tzanio added the in-next label Apr 25, 2021

In HypreParMatrix::PrintHash, skip printing of 'big_j' with older versions of hypre.

f53b140

tzanio merged commit 5b894c6 into master Apr 29, 2021

Pull Requests automation moved this from Review Now to Merged Apr 29, 2021

tzanio deleted the hypre-mixedint branch April 29, 2021 00:02
