Fault-Tolerant Paradigms
Modern high-performance computers offer hundreds of thousands of processors that can be used in parallel to compute numerical solutions to time-dependent partial differential equations (PDEs). As we move towards exascale computation, the scalability of algorithms and software and their resilience to equipment malfunctions are paramount. This proposal seeks to develop scalable algorithms for grid-based solutions to PDEs that reduce communication costs and provide fault resilience at several levels.