deal.II version GIT relicensing2076g6b43d56a25 20241102 12:30:00+00:00

This tutorial depends on step6.
This program grew out of a student project by Sven Wetterauer at the University of Heidelberg, Germany. Most of the work for this program is by him.
This program deals with an example of a nonlinear elliptic partial differential equation, the minimal surface equation. You can imagine the solution of this equation to describe the surface spanned by a soap film that is enclosed by a closed wire loop. We imagine the wire to not just be a planar loop, but in fact curved. The surface tension of the soap film will then reduce the surface to have minimal surface. The solution of the minimal surface equation describes this shape with the wire's vertical displacement as a boundary condition. For simplicity, we will here assume that the surface can be written as a graph \(u=u(x,y)\) although it is clear that it is not very hard to construct cases where the wire is bent in such a way that the surface can only locally be constructed as a graph but not globally.
Because the equation is nonlinear, we can't solve it directly. Rather, we have to use Newton's method to compute the solution iteratively.
In a classical sense, the problem is given in the following form:
\begin{align*} \nabla \cdot \left( \frac{1}{\sqrt{1+\nabla u^{2}}}\nabla u \right) &= 0 \qquad \qquad &&\textrm{in} ~ \Omega \\ u&=g \qquad\qquad &&\textrm{on} ~ \partial \Omega. \end{align*}
\(\Omega\) is the domain we get by projecting the wire's positions into \(xy\) space. In this example, we choose \(\Omega\) as the unit disk.
As described above, we solve this equation using Newton's method in which we compute the \(n\)th approximate solution from the \((n1)\)th one, and use a damping parameter \(\alpha^n\) to get better global convergence behavior:
\begin{align*} F'(u^{n},\delta u^{n})&= F(u^{n}) \\ u^{n+1}&=u^{n}+\alpha^n \delta u^{n} \end{align*}
with
\[ F(u) \dealcoloneq \nabla \cdot \left( \frac{1}{\sqrt{1+\nabla u^{2}}}\nabla u \right) \]
and \(F'(u,\delta u)\) the derivative of F in direction of \(\delta u\):
\[ F'(u,\delta u)=\lim \limits_{\epsilon \rightarrow 0}{\frac{F(u+\epsilon \delta u) F(u)}{\epsilon}}. \]
Going through the motions to find out what \(F'(u,\delta u)\) is, we find that we have to solve a linear elliptic PDE in every Newton step, with \(\delta u^n\) as the solution of:
\[  \nabla \cdot \left( \frac{1}{\left(1+\nabla u^{n}^{2}\right)^{\frac{1}{2}}}\nabla \delta u^{n} \right) + \nabla \cdot \left( \frac{\nabla u^{n} \cdot \nabla \delta u^{n}}{\left(1+\nabla u^{n}^{2}\right)^{\frac{3}{2}}} \nabla u^{n} \right) = \left(  \nabla \cdot \left( \frac{1}{\left(1+\nabla u^{n}^{2}\right)^{\frac{1}{2}}} \nabla u^{n} \right) \right) \]
In order to solve the minimal surface equation, we have to solve this equation repeatedly, once per Newton step. To solve this, we have to take a look at the boundary condition of this problem. Assuming that \(u^{n}\) already has the right boundary values, the Newton update \(\delta u^{n}\) should have zero boundary conditions, in order to have the right boundary condition after adding both. In the first Newton step, we are starting with the solution \(u^{0}\equiv 0\), the Newton update still has to deliver the right boundary condition to the solution \(u^{1}\).
Summing up, we have to solve the PDE above with the boundary condition \(\delta u^{0}=g\) in the first step and with \(\delta u^{n}=0\) in all the following steps.
Starting with the strong formulation above, we get the weak formulation by multiplying both sides of the PDE with a test function \(\varphi\) and integrating by parts on both sides:
\[ \left( \nabla \varphi , \frac{1}{\left(1+\nabla u^{n}^{2}\right)^{\frac{1}{2}}}\nabla \delta u^{n} \right)\left(\nabla \varphi ,\frac{\nabla u^{n} \cdot \nabla \delta u^{n}}{\left(1+\nabla u^{n}^{2}\right)^{\frac{3}{2}}}\nabla u^{n} \right) = \left(\nabla \varphi , \frac{1}{\left(1+\nabla u^{n}^{2}\right)^{\frac{1}{2}}} \nabla u^{n} \right). \]
Here the solution \(\delta u^{n}\) is a function in \(H^{1}(\Omega)\), subject to the boundary conditions discussed above. Reducing this space to a finite dimensional space with basis \(\left\{ \varphi_{0},\dots , \varphi_{N1}\right\}\), we can write the solution:
\[ \delta u^{n}=\sum_{j=0}^{N1} \delta U_{j} \varphi_{j}. \]
Using the basis functions as test functions and defining \(a_{n} \dealcoloneq \frac{1} {\sqrt{1+\nabla u^{n}^{2}}}\), we can rewrite the weak formulation:
\[ \sum_{j=0}^{N1}\left[ \left( \nabla \varphi_{i} , a_{n} \nabla \varphi_{j} \right)  \left(\nabla u^{n}\cdot \nabla \varphi_{i} , a_{n}^{3} \nabla u^{n} \cdot \nabla \varphi_{j} \right) \right] \cdot \delta U_{j}=\left( \nabla \varphi_{i} , a_{n} \nabla u^{n}\right) \qquad \forall i=0,\dots ,N1, \]
where the solution \(\delta u^{n}\) is given by the coefficients \(\delta U^{n}_{j}\). This linear system of equations can be rewritten as:
\[ A^{n}\; \delta U^{n}=b^{n}, \]
where the entries of the matrix \(A^{n}\) are given by:
\[ A^{n}_{ij} \dealcoloneq \left( \nabla \varphi_{i} , a_{n} \nabla \varphi_{j} \right)  \left(\nabla u^{n}\cdot \nabla \varphi_{i} , a_{n}^{3} \nabla u^{n} \cdot \nabla \varphi_{j} \right), \]
and the right hand side \(b^{n}\) is given by:
\[ b^{n}_{i} \dealcoloneq \left( \nabla \varphi_{i} , a_{n} \nabla u^{n}\right). \]
The matrix that corresponds to the Newton step above can be reformulated to show its structure a bit better. Rewriting it slightly, we get that it has the form
\[ A_{ij} = \left( \nabla \varphi_i, B \nabla \varphi_j \right), \]
where the matrix \(B\) (of size \(d \times d\) in \(d\) space dimensions) is given by the following expression:
\[ B = a_n \left\{ \mathbf I  a_n^2 [\nabla u_n] \otimes [\nabla u_n] \right\} = a_n \left\{ \mathbf I  \frac{\nabla u_n}{\sqrt{1+\nabla u^{n}^{2}}} \otimes \frac{\nabla u_n}{\sqrt{1+\nabla u^{n}^{2}}} \right\}. \]
From this expression, it is obvious that \(B\) is symmetric, and so \(A\) is symmetric as well. On the other hand, \(B\) is also positive definite, which confers the same property onto \(A\). This can be seen by noting that the vector \(v_1 = \frac{\nabla u^n}{\nabla u^n}\) is an eigenvector of \(B\) with eigenvalue \(\lambda_1=a_n \left(1\frac{\nabla u^n^2}{1+\nabla u^n^2}\right) > 0\) while all vectors \(v_2\ldots v_d\) that are perpendicular to \(v_1\) and each other are eigenvectors with eigenvalue \(a_n\). Since all eigenvalues are positive, \(B\) is positive definite and so is \(A\). We can thus use the CG method for solving the Newton steps. (The fact that the matrix \(A\) is symmetric and positive definite should not come as a surprise. It results from taking the derivative of an operator that results from taking the derivative of an energy functional: the minimal surface equation simply minimizes some nonquadratic energy. Consequently, the Newton matrix, as the matrix of second derivatives of a scalar energy, must be symmetric since the derivative with regard to the \(i\)th and \(j\)th degree of freedom should clearly commute. Likewise, if the energy functional is convex, then the matrix of second derivatives must be positive definite, and the direct calculation above simply reaffirms this.)
It is worth noting, however, that the positive definiteness degenerates for problems where \(\nabla u\) becomes large. In other words, if we simply multiply all boundary values by 2, then to first order \(u\) and \(\nabla u\) will also be multiplied by two, but as a consequence the smallest eigenvalue of \(B\) will become smaller and the matrix will become more illconditioned. (More specifically, for \(\nabla u^n\rightarrow\infty\) we have that \(\lambda_1 \propto a_n \frac{1}{\nabla u^n^2}\) whereas \(\lambda_2\ldots \lambda_d=a_n\); thus, the condition number of \(B\), which is a multiplicative factor in the condition number of \(A\) grows like \({\cal O}(\nabla u^n^2)\).) It is simple to verify with the current program that indeed multiplying the boundary values used in the current program by larger and larger values results in a problem that will ultimately no longer be solvable using the simple preconditioned CG method we use here.
As stated above, Newton's method works by computing a direction \(\delta u^n\) and then performing the update \(u^{n+1} = u^{n}+\alpha^n \delta u^{n}\) with a step length \(0 < \alpha^n \le 1\). It is a common observation that for strongly nonlinear models, Newton's method does not converge if we always choose \(\alpha^n=1\) unless one starts with an initial guess \(u^0\) that is sufficiently close to the solution \(u\) of the nonlinear problem. In practice, we don't always have such an initial guess, and consequently taking full Newton steps (i.e., using \(\alpha=1\)) does frequently not work.
A common strategy therefore is to use a smaller step length for the first few steps while the iterate \(u^n\) is still far away from the solution \(u\) and as we get closer use larger values for \(\alpha^n\) until we can finally start to use full steps \(\alpha^n=1\) as we are close enough to the solution. The question is of course how to choose \(\alpha^n\). There are basically two widely used approaches: line search and trust region methods.
In this program, we simply always choose the step length equal to 0.1. This makes sure that for the testcase at hand we do get convergence although it is clear that by not eventually reverting to full step lengths we forego the rapid, quadratic convergence that makes Newton's method so appealing. Obviously, this is a point one eventually has to address if the program was made into one that is meant to solve more realistic problems. We will comment on this issue some more in the results section, and use an even better approach in step77.
Overall, the program we have here is not unlike step6 in many regards. The layout of the main class is essentially the same. On the other hand, the driving algorithm in the run()
function is different and works as follows:
Start with the function \(u^{0}\equiv 0\) and modify it in such a way that the values of \(u^0\) along the boundary equal the correct boundary values \(g\) (this happens in the call to AffineConstraints::distribute()
). Set \(n=0\).
Compute the Newton update by solving the system \(A^{n}\;\delta U^{n}=b^{n}\) with boundary condition \(\delta u^{n}=0\) on \(\partial \Omega\).
Compute a step length \(\alpha^n\). In this program, we always set \(\alpha^n=0.1\). To make things easier to extend later on, this happens in a function of its own, namely in MinimalSurfaceProblem::determine_step_length
. (The strategy of always choosing \(\alpha^n=0.1\) is of course not optimal – we should choose a step length that works for a given search direction – but it requires a bit of work to do that. In the end, we leave these sorts of things to external packages: step77 does that.)
The new approximation of the solution is given by \(u^{n+1}=u^{n}+\alpha^n \delta u^{n}\).
If \(n\) is a multiple of 5 then refine the mesh, transfer the solution \(u^{n+1}\) to the new mesh and set the values of \(u^{n+1}\) in such a way that along the boundary we have \(u^{n+1}_{\partial\Gamma}=g\). Note that this isn't automatically guaranteed even though by construction we had that before mesh refinement \(u^{n+1}_{\partial\Gamma}=g\) because mesh refinement adds new nodes to the mesh where we have to interpolate the old solution to the new nodes upon bringing the solution from the old to the new mesh. The values we choose by interpolation may be close to the exact boundary conditions but are, in general, nonetheless not the correct values.
The testcase we solve is chosen as follows: We seek to find the solution of minimal surface over the unit disk \(\Omega=\{\mathbf x: \\mathbf x\<1\}\subset {\mathbb R}^2\) where the surface attains the values \(u(x,y){\partial\Omega} = g(x,y) \dealcoloneq \sin(2 \pi (x+y))\) along the boundary.
The first few files have already been covered in previous examples and will thus not be further commented on.
We will use adaptive mesh refinement between Newton iterations. To do so, we need to be able to work with a solution on the new mesh, although it was computed on the old one. The SolutionTransfer class transfers the solution from the old to the new mesh:
We then open a namespace for this program and import everything from the dealii namespace into it, as in previous programs:
MinimalSurfaceProblem
class templateThe class template is basically the same as in step6. Three additions are made:
zero_constraints
and nonzero_constraints
. The former contains homogeneous boundary conditions to be used for the residual and solution updates, while the latter contains the correct boundary conditions for the solution. Both objects also contain the hanging nodes constraints.setup_system
function takes an argument that denotes whether this is the first time it is called or not. The difference is that the first time around we need to distribute the degrees of freedom and set the solution vector for \(u^n\) to the correct size. The following times, the function is called after we have already done these steps as part of refining the mesh in refine_mesh
.compute_residual()
is a function that computes the norm of the nonlinear (discrete) residual. We use this function to monitor convergence of the Newton iteration. The function takes a step length \(\alpha^n\) as argument to compute the residual of \(u^n + \alpha^n
\; \delta u^n\). This is something one typically needs for step length control, although we will not use this feature here. Finally, determine_step_length()
computes the step length \(\alpha^n\) in each Newton iteration. As discussed in the introduction, we here use a fixed step length and leave implementing a better strategy as an exercise. (step77 does this differently: It simply uses an external package for the whole solution process, and a good line search strategy is part of what that package provides.)The boundary condition is implemented just like in step4. It is chosen as \(g(x,y)=\sin(2 \pi (x+y))\):
MinimalSurfaceProblem
class implementationThe constructor and destructor of the class are the same as in the first few tutorials.
As always in the setupsystem function, we set up the variables of the finite element method. There are some differences to step6, because we need to construct two AffineConstraint<> objects.
This function does the same as in the previous tutorials except that now, of course, the matrix and right hand side functions depend on the previous iteration's solution. As discussed in the introduction, we need to use zero boundary values for the Newton updates; this is done by using the zero_constraints
object when assembling into the global matrix and vector.
The top of the function contains the usual boilerplate code, setting up the objects that allow us to evaluate shape functions at quadrature points and temporary storage locations for the local matrices and vectors, as well as for the gradients of the previous solution at the quadrature points. We then start the loop over all cells:
For the assembly of the linear system, we have to obtain the values of the previous solution's gradients at the quadrature points. There is a standard way of doing this: the FEValues::get_function_gradients function takes a vector that represents a finite element field defined on a DoFHandler, and evaluates the gradients of this field at the quadrature points of the cell with which the FEValues object has last been reinitialized. The values of the gradients at all quadrature points are then written into the second argument:
With this, we can then do the integration loop over all quadrature points and shape functions. Having just computed the gradients of the old solution in the quadrature points, we are able to compute the coefficients \(a_{n}\) in these points. The assembly of the system itself then looks similar to what we always do with the exception of the nonlinear terms, as does copying the results from the local objects into the global ones:
The solve function is the same as always. At the end of the solution process we update the current solution by setting \(u^{n+1}=u^n+\alpha^n\;\delta u^n\).
The first part of this function is the same as in step6... However, after refining the mesh we have to transfer the old solution to the new one which we do with the help of the SolutionTransfer class. The process is slightly convoluted, so let us describe it in detail:
Then we need an additional step: if, for example, you flag a cell that is once more refined than its neighbor, and that neighbor is not flagged for refinement, we would end up with a jump of two refinement levels across a cell interface. To avoid these situations, the library will silently also have to refine the neighbor cell once. It does so by calling the Triangulation::prepare_coarsening_and_refinement function before actually doing the refinement and coarsening. This function flags a set of additional cells for refinement or coarsening, to enforce rules like the onehangingnode rule. The cells that are flagged for refinement and coarsening after calling this function are exactly the ones that will actually be refined or coarsened. Usually, you don't have to do this by hand (Triangulation::execute_coarsening_and_refinement does this for you). However, we need to initialize the SolutionTransfer class and it needs to know the final set of cells that will be coarsened or refined in order to store the data from the old mesh and transfer to the new one. Thus, we call the function by hand:
With this out of the way, we initialize a SolutionTransfer object with the present DoFHandler. We make a copy of the solution vector and attach it to the SolutionTransfer. Now we can actually execute the refinement and create the new matrices and vectors including the vector current_solution
, that will hold the current solution on the new mesh after calling SolutionTransfer::interpolate():
On the new mesh, there are different hanging nodes, computed in setup_system()
above. To be on the safe side, we should make sure that the current solution's vector entries satisfy the hanging node constraints (see the discussion in the documentation of the SolutionTransfer class for why this is necessary) and boundary values. As explained at the end of the introduction, the interpolated solution does not automatically satisfy the boundary values even if the solution before refinement had the correct boundary values.
In order to monitor convergence, we need a way to compute the norm of the (discrete) residual, i.e., the norm of the vector \(\left<F(u^n),\varphi_i\right>\) with \(F(u)=\nabla \cdot \left( \frac{1}{\sqrt{1+\nabla u^{2}}}\nabla u \right)\) as discussed in the introduction. It turns out that (although we don't use this feature in the current version of the program) one needs to compute the residual \(\left<F(u^n+\alpha^n\;\delta u^n),\varphi_i\right>\) when determining optimal step lengths, and so this is what we implement here: the function takes the step length \(\alpha^n\) as an argument. The original functionality is of course obtained by passing a zero as argument.
In the function below, we first set up a vector for the residual, and then a vector for the evaluation point \(u^n+\alpha^n\;\delta u^n\). This is followed by the same boilerplate code we use for all integration operations:
The actual computation is much as in assemble_system()
. We first evaluate the gradients of \(u^n+\alpha^n\,\delta u^n\) at the quadrature points, then compute the coefficient \(a_n\), and then plug it all into the formula for the residual:
As discussed in the introduction, Newton's method frequently does not converge if we always take full steps, i.e., compute \(u^{n+1}=u^n+\delta u^n\). Rather, one needs a damping parameter (step length) \(\alpha^n\) and set \(u^{n+1}=u^n+\alpha^n\delta u^n\). This function is the one called to compute \(\alpha^n\).
Here, we simply always return 0.1. This is of course a suboptimal choice: ideally, what one wants is that the step size goes to one as we get closer to the solution, so that we get to enjoy the rapid quadratic convergence of Newton's method. We will discuss better strategies below in the results section, and step77 also covers this aspect.
This last function to be called from run()
outputs the current solution (and the Newton update) in graphical form as a VTU file. It is entirely the same as what has been used in previous tutorials.
In the run function, we build the first grid and then have the toplevel logic for the Newton iteration.
As described in the introduction, the domain is the unit disk around the origin, created in the same way as shown in step6. The mesh is globally refined twice followed later on by several adaptive cycles.
Before starting the Newton loop, we also need to do ensure that the first Newton iterate already has the correct boundary values, as discussed in the introduction.
The Newton iteration starts next. We iterate until the (norm of the) residual computed at the end of the previous iteration is less than \(10^{3}\), as checked at the end of the do { ... } while
loop that starts here. Because we don't have a reasonable value to initialize the variable, we just use the largest value that can be represented as a double
.
On every mesh we do exactly five Newton steps. We print the initial residual here and then start the iterations on this mesh.
In every Newton step the system matrix and the right hand side have to be computed first, after which we store the norm of the right hand side as the residual to check against when deciding whether to stop the iterations. We then solve the linear system (the function also updates \(u^{n+1}=u^n+\alpha^n\;\delta u^n\)) and output the norm of the residual at the end of this Newton step.
After the end of this loop, we then also output the solution on the current mesh in graphical form and increment the counter for the mesh refinement cycle.
Finally the main function. This follows the scheme of all other main functions:
The output of the program looks as follows:
Obviously, the scheme converges, if not very fast. We will come back to strategies for accelerating the method below.
One can visualize the solution after each set of five Newton iterations, i.e., on each of the meshes on which we approximate the solution. This yields the following set of images:
It is clearly visible, that the solution minimizes the surface after each refinement. The solution converges to a picture one would imagine a soap bubble to be that is located inside a wire loop that is bent like the boundary. Also it is visible, how the boundary is smoothed out after each refinement. On the coarse mesh, the boundary doesn't look like a sine, whereas it does the finer the mesh gets.
The mesh is mostly refined near the boundary, where the solution increases or decreases strongly, whereas it is coarsened on the inside of the domain, where nothing interesting happens, because there isn't much change in the solution. The ninth solution and mesh are shown here:
The program shows the basic structure of a solver for a nonlinear, stationary problem. However, it does not converge particularly fast, for good reasons:
Obviously, a better program would have to address these two points. We will discuss them in the following.
Newton's method has two well known properties:
A consequence of these two observations is that a successful strategy is to choose \(\alpha^n<1\) for the initial iterations until the iterate has come close enough to allow for convergence with full step length, at which point we want to switch to \(\alpha^n=1\). The question is how to choose \(\alpha^n\) in an automatic fashion that satisfies these criteria.
We do not want to review the literature on this topic here, but only briefly mention that there are two fundamental approaches to the problem: backtracking line search and trust region methods. The former is more widely used for partial differential equations and essentially does the following:
determine_step_length()
is written the way it is to support exactly this kind of use case.Whether we accept a particular step length \(\alpha^n\) depends on how we define "substantially smaller". There are a number of ways to do so, but without going into detail let us just mention that the most common ones are to use the Wolfe and ArmijoGoldstein conditions. For these, one can show the following:
We will not dwell on this here any further but leave the implementation of such algorithms as an exercise. We note, however, that when implemented correctly then it is a common observation that most reasonably nonlinear problems can be solved in anywhere between 5 and 15 Newton iterations to engineering accuracy — substantially fewer than we need with the current version of the program.
More details on globalization methods including backtracking can be found, for example, in [104] and [167].
A separate point, very much worthwhile making, however, is that in practice the implementation of efficient nonlinear solvers is about as complicated as the implementation of efficient finite element methods. One should not attempt to reinvent the wheel by implementing all of the necessary steps oneself. Substantial pieces of the puzzle are already available in the LineMinimization::line_search() function and could be used to this end. But, instead, just like building finite element solvers on libraries such as deal.II, one should be building nonlinear solvers on libraries such as SUNDIALS. In fact, deal.II has interfaces to SUNDIALS and in particular to its nonlinear solver subpackage KINSOL through the SUNDIALS::KINSOL class. It would not be very difficult to base the current problem on that interface – indeed, that is what step77 does.
We currently do exactly 5 iterations on each mesh. But is this optimal? One could ask the following questions:
Ultimately, what this boils down to is that we somehow need to couple the discretization error on the current mesh with the nonlinear residual we want to achieve with the Newton iterations on a given mesh, and to the linear iteration we want to achieve with the CG method within each Newton iterations.
How to do this is, again, not entirely trivial, and we again leave it as a future exercise.
As outlined in the introduction, when solving a nonlinear problem of the form
\[ F(u) \dealcoloneq \nabla \cdot \left( \frac{1}{\sqrt{1+\nabla u^{2}}}\nabla u \right) = 0 \]
we use a Newton iteration that requires us to repeatedly solve the linear partial differential equation
\begin{align*} F'(u^{n},\delta u^{n}) &= F(u^{n}) \end{align*}
so that we can compute the update
\begin{align*} u^{n+1}&=u^{n}+\alpha^n \delta u^{n} \end{align*}
with the solution \(\delta u^{n}\) of the Newton step. For the problem here, we could compute the derivative \(F'(u,\delta u)\) by hand and obtained
\[ F'(u,\delta u) =  \nabla \cdot \left( \frac{1}{\left(1+\nabla u^{2}\right)^{\frac{1}{2}}}\nabla \delta u \right) + \nabla \cdot \left( \frac{\nabla u \cdot \nabla \delta u}{\left(1+\nabla u^{2}\right)^{\frac{3}{2}}} \nabla u \right). \]
But this is already a sizable expression that is cumbersome both to derive and to implement. It is also, in some sense, duplicative: If we implement what \(F(u)\) is somewhere in the code, then \(F'(u,\delta u)\) is not an independent piece of information but is something that, at least in principle, a computer should be able to infer itself. Wouldn't it be nice if that could actually happen? That is, if we really only had to implement \(F(u)\), and \(F'(u,\delta u)\) was then somehow done implicitly? That is in fact possible, and runs under the name "automatic differentiation". step71 discusses this very concept in general terms, and step72 illustrates how this can be applied in practice for the very problem we are considering here.
On modern computer systems, accessing data in main memory takes far longer than actually doing something with it: We can do many floating point operations for the time it takes to load one floating point number from memory onto the processor. Unfortunately, when we do things such as matrixvector products, we only multiply each matrix entry once with another number (the corresponding entry of the vector) and then we add it to something else – so two floating point operations for one load. (Strictly speaking, we also have to load the corresponding vector entry, but at least sometimes we get to reuse that vector entry in doing the products that correspond to the next row of the matrix.) This is a fairly low "arithmetic intensity", and consequently we spend most of our time during matrixvector products waiting for data to arrive from memory rather than actually doing floating point operations.
This is of course one of the rationales for the "matrixfree" approach to solving linear systems (see step37, for example). But if you don't quite want to go all that way to change the structure of the program, then here is a different approach: Storing the system matrix (the "Jacobian") in singleprecision instead of double precision floating point numbers (i.e., using float
instead of double
as the data type). This reduces the amount of memory necessary by a factor of 1.5 (each matrix entry in a SparseMatrix object requires storing the column index – 4 bytes – and the actual value – either 4 or 8 bytes), and consequently will speed up matrixvector products by a factor of around 1.5 as well because, as pointed out above, most of the time is spent loading data from memory and loading 2/3 the amount of data should be roughly 3/2 times as fast. All of this could be done using SparseMatrix<float> as the data type for the system matrix. (In principle, we would then also like it if the SparseDirectUMFPACK solver we use in this program computes and stores its sparse decomposition in float
arithmetic. This is not currently implemented, though could be done.)
Of course, there is a downside to this: Lower precision data storage also implies that we will not solve the linear system of the Newton step as accurately as we might with double
precision. At least while we are far away from the solution of the nonlinear problem, this may not be terrible: If we can do a Newton iteration in half the time, we can afford to do a couple more Newton steps if the search directions aren't as good. But it turns out that even that isn't typically necessary: Both theory and computational experience shows that it is entirely sufficient to store the Jacobian matrix in single precision as long as one stores the right hand side in double precision*. A great overview of why this is so, along with numerical experiments that also consider "half precision" floating point numbers, can be found in [128] .