2019

A. Gyulassy, P.-T. Bremer, V. Pascucci.
**“Shared-Memory Parallel Computation of Morse-Smale Complexes with Improved Accuracy,”** In *IEEE Transactions on Visualization and Computer Graphics*, Vol. 25, No. 1, IEEE, pp. 1183--1192. Jan, 2019.

DOI: 10.1109/tvcg.2018.2864848

Topological techniques have proven to be a powerful tool in the analysis and visualization of large-scale scientific data. In particular, the Morse-Smale complex and its various components provide a rich framework for robust feature definition and computation. Consequently, there now exist a number of approaches to compute Morse-Smale complexes for large-scale data in parallel. However, existing techniques are based on discrete concepts which produce the correct topological structure but are known to introduce grid artifacts in the resulting geometry. Here, we present a new approach that combines parallel streamline computation with combinatorial methods to construct a high-quality discrete Morse-Smale complex. In addition to being invariant to the orientation of the underlying grid, this algorithm allows users to selectively build a subset of features using high-quality geometry. In particular, a user may specifically select which ascending/descending manifolds are reconstructed with improved accuracy, focusing computational effort where it matters for subsequent analysis. This approach computes Morse-Smale complexes for larger data than previously feasible with significant speedups. We demonstrate and validate our approach using several examples from a variety of different scientific domains, and evaluate the performance of our method.

D. Hoang, P. Klacansky, H. Bhatia, P.-T. Bremer, P. Lindstrom, V. Pascucci.
**“A Study of the Trade-off Between Reducing Precision and Reducing Resolution for Data Analysis and Visualization,”** In *IEEE Transactions on Visualization and Computer Graphics*, Vol. 25, No. 1, IEEE, pp. 1193--1203. Jan, 2019.

DOI: 10.1109/tvcg.2018.2864853

There currently exist two dominant strategies to reduce data sizes in analysis and visualization: reducing the precision of the data, e.g., through quantization, or reducing its resolution, e.g., by subsampling. Both have advantages and disadvantages and both face fundamental limits at which the reduced information ceases to be useful. The paper explores the additional gains that could be achieved by combining both strategies. In particular, we present a common framework that allows us to study the trade-off in reducing precision and/or resolution in a principled manner. We represent data reduction schemes as progressive streams of bits and study how various bit orderings such as by resolution, by precision, etc., impact the resulting approximation error across a variety of data sets as well as analysis tasks. Furthermore, we compute streams that are optimized for different tasks to serve as lower bounds on the achievable error. Scientific data management systems can use the results presented in this paper as guidance on how to store and stream data to make efficient use of the limited storage and bandwidth in practice.

2018

K Knudson, B Wang.
**“Discrete Stratified Morse Theory: A User's Guide,”** In *CoRR*, 2018.

Inspired by the works of Forman on discrete Morse theory, which is a combinatorial adaptation to cell complexes of classical Morse theory on manifolds, we introduce a discrete analogue of the stratified Morse theory of Goresky and MacPherson (1988). We describe the basics of this theory and prove fundamental theorems relating the topology of a general simplicial complex with the critical simplices of a discrete stratified Morse function on the complex. We also provide an algorithm that constructs a discrete stratified Morse function out of an arbitrary function defined on a finite simplicial complex; this is different from simply constructing a discrete Morse function on such a complex. We borrow Forman's idea of a "user's guide," where we give simple examples to convey the utility of our theory.

W. Usher, S. Rizzi, I. Wald, J. Amstutz, J. Insley, V. Vishwanath, N. Ferrier, M. E. Papka,, V. Pascucci.
**“libIS: A Lightweight Library for Flexible In Transit Visualization,”** In *Proceedings of the Workshop on In Situ Infrastructures for Enabling Extreme-Scale Analysis and Visualization*, ACM Press, 2018.

DOI: 10.1145/3281464.3281466

As simulations grow in scale, the need for in situ analysis methods to handle the large data produced grows correspondingly. One desirable approach to in situ visualization is in transit visualization. By decoupling the simulation and visualization code, in transit approaches alleviate common difficulties with regard to the scalability of the analysis, ease of integration, usability, and impact on the simulation. We present libIS, a lightweight, flexible library which lowers the bar for using in transit visualization. Our library works on the concept of abstract regions of space containing data, which are transferred from the simulation to the visualization clients upon request, using a client-server model. We also provide a SENSEI analysis adaptor, which allows for transparent deployment of in transit visualization. We demonstrate the flexibility of our approach on batch analysis and interactive visualization use cases on different HPC resources.

2016

C. Christensen, S. Liu, G. Scorzelli, J. Lee, P.-T. Bremer, V. Pascucci.
**“Embedded Domain-Specific Language and Runtime System for Progressive Spatiotemporal Data Analysis and Visualization,”** In *Symposium on Large Data Analysis and Visualization*, IEEE, 2016.

As our ability to generate large and complex datasets grows, accessing and processing these massive data collections is increasingly the primary bottleneck in scientific analysis. Challenges include retrieving, converting, resampling, and combining remote and often disparately located data ensembles with only limited support from existing tools. In particular, existing solutions rely predominantly on extensive data transfers or large-scale remote computing resources, both of which are inherently offline processes with long delays and substantial repercussions for any mistakes. Such workflows severely limit the flexible exploration and rapid evaluation of new hypotheses that are crucial to the scientific process and thereby impede scientific discovery. Here we present an embedded domain-specific language (EDSL) specifically designed for the interactive exploration of largescale, remote data. Our EDSL allows users to express a wide range of data analysis operations in a simple and abstract manner. The underlying runtime system transparently resolves issues such as remote data access and resampling while at the same time maintaining interactivity through progressive and interruptible computation. This system enables, for the first time, interactive remote exploration of massive datasets such as the 7km NASA GEOS-5 Nature Run simulation, which previously have been analyzed only offline or at reduced resolution.

D. Maljovec, S. Liu, Bei Wang, V. Pascucci, P. T. Bremer, D. Mandelli, C. Smith..
**“Analyzing Simulation-Based PRA Data Through Traditional and Topological Clustering: A BWR Station Blackout Case Study,”** In *Reliability Engineering & System Safety*, Vol. 145, Elsevier, pp. 262--276. January, 2016.

DOI: 10.1016/j.ress.2015.07.001

Dynamic probabilistic risk assessment (DPRA) methodologies couple system simulator codes (e.g., RELAP, MELCOR) with simulation controller codes (e.g., RAVEN, ADAPT). Whereas system simulator codes model system dynamics deterministically, simulation controller codes introduce both deterministic (e.g., system control logic, operating procedures) and stochastic (e.g., component failures, parameter uncertainties) elements into the simulation. Typically, a DPRA is performed by sampling values of a set of parameters, and simulating the system behavior for that specific set of parameter values. For complex systems, a major challenge in using DPRA methodologies is to analyze the large number of scenarios generated, where clustering techniques are typically employed to better organize and interpret the data. In this paper, we focus on the analysis of two nuclear simulation datasets that are part of the risk-informed safety margin characterization (RISMC) boiling water reactor (BWR) station blackout (SBO) case study. We provide the domain experts a software tool that encodes traditional and topological clustering techniques within an interactive analysis and visualization environment, for understanding the structures of such high-dimensional nuclear simulation datasets. We demonstrate through our case study that both types of clustering techniques complement each other in bringing enhanced structural understanding of the data.

2015

J. Bennett, F. Vivodtzev, V. Pascucci (Eds.).
**“Topological and Statistical Methods for Complex Data,”** Subtitled **“Tackling Large-Scale, High-Dimensional, and Multivariate Data Spaces,”** Mathematics and Visualization, Springer Berlin Heidelberg, 2015.

ISBN: 978-3-662-44899-1

This book contains papers presented at the Workshop on the Analysis of Large-scale,

High-Dimensional, and Multi-Variate Data Using Topology and Statistics, held in Le Barp,

France, June 2013. It features the work of some of the most prominent and recognized

leaders in the field who examine challenges as well as detail solutions to the analysis of

extreme scale data.

The book presents new methods that leverage the mutual strengths of both topological

and statistical techniques to support the management, analysis, and visualization

of complex data. It covers both theory and application and provides readers with an

overview of important key concepts and the latest research trends.

Coverage in the book includes multi-variate and/or high-dimensional analysis techniques,

feature-based statistical methods, combinatorial algorithms, scalable statistics algorithms,

scalar and vector field topology, and multi-scale representations. In addition, the book

details algorithms that are broadly applicable and can be used by application scientists to

glean insight from a wide range of complex data sets.

H. Bhatia, Bei Wang, G. Norgard, V. Pascucci, P. T. Bremer.
**“Local, Smooth, and Consistent Jacobi Set Simplification,”** In *Computational Geometry*, Vol. 48, No. 4, Elsevier, pp. 311-332. May, 2015.

DOI: 10.1016/j.comgeo.2014.10.009

The relation between two Morse functions defined on a smooth, compact, and orientable 2-manifold can be studied in terms of their Jacobi set. The Jacobi set contains points in the domain where the gradients of the two functions are aligned. Both the Jacobi set itself as well as the segmentation of the domain it induces, have shown to be useful in various applications. In practice, unfortunately, functions often contain noise and discretization artifacts, causing their Jacobi set to become unmanageably large and complex. Although there exist techniques to simplify Jacobi sets, they are unsuitable for most applications as they lack fine-grained control over the process, and heavily restrict the type of simplifications possible.

This paper introduces the theoretical foundations of a new simplification framework for Jacobi sets. We present a new interpretation of Jacobi set simplification based on the perspective of domain segmentation. Generalizing the cancellation of critical points from scalar functions to Jacobi sets, we focus on simplifications that can be realized by smooth approximations of the corresponding functions, and show how these cancellations imply simultaneous simplification of contiguous subsets of the Jacobi set. Using these extended cancellations as atomic operations, we introduce an algorithm to successively cancel subsets of the Jacobi set with minimal modifications to some userdefined metric. We show that for simply connected domains, our algorithm reduces a given Jacobi set to its

P. T. Bremer, D. Maljovec, A. Saha, Bei Wang, J. Gaffney, B. K. Spears, V. Pascucci.
**“ND2AV: N-Dimensional Data Analysis and Visualization -- Analysis for the National Ignition Campaign,”** In *Computing and Visualization in Science*, 2015.

One of the biggest challenges in high-energy physics is to analyze a complex mix of experimental and simulation data to gain new insights into the underlying physics. Currently, this analysis relies primarily on the intuition of trained experts often using nothing more sophisticated than default scatter plots. Many advanced analysis techniques are not easily accessible to scientists and not flexible enough to explore the potentially interesting hypotheses in an intuitive manner. Furthermore, results from individual techniques are often difficult to integrate, leading to a confusing patchwork of analysis snippets too cumbersome for data exploration. This paper presents a case study on how a combination of techniques from statistics, machine learning, topology, and visualization can have a significant impact in the field of inertial confinement fusion. We present the ND2AV: N-Dimensional Data Analysis and Visualization framework, a user-friendly tool aimed at exploiting the intuition and current work flow of the target users. The system integrates traditional analysis approaches such as dimension reduction and clustering with state-of-the-art techniques such as neighborhood graphs and topological analysis, and custom capabilities such as defining combined metrics on the fly. All components are linked into an interactive environment that enables an intuitive exploration of a wide variety of hypotheses while relating the results to concepts familiar to the users, such as scatter plots. ND2AV uses a modular design providing easy extensibility and customization for different applications. ND2AV is being actively used in the National Ignition Campaign and has already led to a number of unexpected discoveries.

H. Carr, Z. Geng, J. Tierny, A. Chattophadhyay,, A. Knoll.
**“Fiber Surfaces: Generalizing Isosurfaces to Bivariate Data,”** In *Computer Graphics Forum*, Vol. 34, No. 3, pp. 241-250. 2015.

Scientific visualization has many effective methods for examining and exploring scalar and vector fields, but rather fewer for multi-variate fields. We report the first general purpose approach for the interactive extraction of geometric separating surfaces in bivariate fields. This method is based on fiber surfaces: surfaces constructed from sets of fibers, the multivariate analogues of isolines. We show simple methods for fiber surface definition and extraction. In particular, we show a simple and efficient fiber surface extraction algorithm based on Marching Cubes. We also show how to construct fiber surfaces interactively with geometric primitives in the range of the function. We then extend this to build user interfaces that generate parameterized families of fiber surfaces with respect to arbitrary polylines and polygons. In the special case of isovalue-gradient plots, fiber surfaces capture features geometrically for quantitative analysis that have previously only been analysed visually and qualitatively using multi-dimensional transfer functions in volume rendering. We also demonstrate fiber surface extraction on a variety of bivariate data

J. Edwards, S. Kumar, V. Pascucci.
**“Big data from scientific simulations,”** In *Big Data and High Performance Computing*, Vol. 26, IOS Press, pp. 32. 2015.

Scientic simulations often generate massive amounts of data used for debugging, restarts, and scientic analysis and discovery. Challenges that practitioners face using these types of big data are unique. Of primary importance is speed of writing data during a simulation, but this need for fast I/O is at odds with other priorities, such as data access time for visualization and analysis, ecient storage, and portability across a variety of supercomputer topologies, congurations, le systems, and storage devices. The computational power of high-performance computing systems continues to increase according to Moore's law, but the same is not true for I/O subsystems, creating a performance gap between computation and I/O. This chapter explores these issues, as well as possible optimization strategies, the use of in situ analytics, and a case study using the PIDX I/O library in a typical simulation.

A. Gyulassy, A. Knoll, K. C. Lau, Bei Wang, PT. Bremer, M.l E. Papka, L. A. Curtiss, V. Pascucci.
**“Interstitial and Interlayer Ion Diffusion Geometry Extraction in Graphitic Nanosphere Battery Materials,”** In *Proceedings IEEE Visualization Conference*, 2015.

Large-scale molecular dynamics (MD) simulations are commonly used for simulating the synthesis and ion diffusion of battery materials. A good battery anode material is determined by its capacity to store ion or other diffusers. However, modeling of ion diffusion dynamics and transport properties at large length and long time scales would be impossible with current MD codes. To analyze the fundamental properties of these materials, therefore, we turn to geometric and topological analysis of their structure. In this paper, we apply a novel technique inspired by discrete Morse theory to the Delaunay triangulation of the simulated geometry of a thermally annealed carbon nanosphere. We utilize our computed structures to drive further geometric analysis to extract the interstitial diffusion structure as a single mesh. Our results provide a new approach to analyze the geometry of the simulated carbon nanosphere, and new insights into the role of carbon defect size and distribution in determining the charge capacity and charge dynamics of these carbon based battery materials.

O. A. von Lilienfeld, R. Ramakrishanan, M., A. Knoll.
**“Fourier Series of Atomic Radial Distribution Functions: A Molecular Fingerprint for Machine Learning Models of Quantum Chemical Properties,”** In *International Journal of Quantum Chemistry*, Wiley Online Library, 2015.

We introduce a fingerprint representation of molecules based on a Fourier series of atomic radial distribution functions. This fingerprint is unique (except for chirality), continuous, and differentiable with respect to atomic coordinates and nuclear charges. It is invariant with respect to translation, rotation, and nuclear permutation, and requires no pre-conceived knowledge about chemical bonding, topology, or electronic orbitals. As such it meets many important criteria for a good molecular representation, suggesting its usefulness for machine learning models of molecular properties trained across chemical compound space. To assess the performance of this new descriptor we have trained machine learning models of molecular enthalpies of atomization for training sets with up to 10 k organic molecules, drawn at random from a published set of 134 k organic molecules with an average atomization enthalpy of over 1770 kcal/mol. We validate the descriptor on all remaining molecules of the 134 k set. For a training set of 10k molecules the fingerprint descriptor achieves a mean absolute error of 8.0 kcal/mol, respectively. This is slightly worse than the performance attained using the Coulomb matrix, another popular alternative, reaching 6.2 kcal/mol for the same training and test sets.

S. Liu, D. Maljovec, Bei Wang, P. T. Bremer, V. Pascucci.
**“Visualizing High-Dimensional Data: Advances in the Past Decade,”** In *State of The Art Report*, *Eurographics Conference on Visualization (EuroVis)*, 2015.

Massive simulations and arrays of sensing devices, in combination with increasing computing resources, have generated large, complex, high-dimensional datasets used to study phenomena across numerous fields of study. Visualization plays an important role in exploring such datasets. We provide a comprehensive survey of advances in high-dimensional data visualization over the past 15 years. We aim at providing actionable guidance for data practitioners to navigate through a modular view of the recent advances, allowing the creation of new visualizations along the enriched information visualization pipeline and identifying future opportunities for visualization research.

J. M. Phillips, Bei Wang, Y. Zheng.
**“Geometric Inference on Kernel Density Estimates,”** In *CoRR*, Vol. abs/1307.7760, 2015.

We show that geometric inference of a point cloud can be calculated by examining its kernel density estimate with a Gaussian kernel. This allows one to consider kernel density estimates, which are robust to spatial noise, subsampling, and approximate computation in comparison to raw point sets. This is achieved by examining the sublevel sets of the

P. Skraba, Bei Wang, G. Chen, P. Rosen.
**“Robustness-Based Simplification of 2D Steady and Unsteady Vector Fields,”** In *IEEE Transactions on Visualization and Computer Graphics (to appear)*, 2015.

Vector field simplification aims to reduce the complexity of the flow by removing features in order of their relevance and importance, to reveal prominent behavior and obtain a compact representation for interpretation. Most existing simplification techniques based on the topological skeleton successively remove pairs of critical points connected by separatrices, using distance or area-based relevance measures. These methods rely on the stable extraction of the topological skeleton, which can be difficult due to instability in numerical integration, especially when processing highly rotational flows. In this paper, we propose a novel simplification scheme derived from the recently introduced topological notion of robustness which enables the pruning of sets of critical points according to a quantitative measure of their stability, that is, the minimum amount of vector field perturbation required to remove them. This leads to a hierarchical simplification scheme that encodes flow magnitude in its perturbation metric. Our novel simplification algorithm is based on degree theory and has minimal boundary restrictions. Finally, we provide an implementation under the piecewise-linear setting and apply it to both synthetic and real-world datasets. We show local and complete hierarchical simplifications for steady as well as unsteady vector fields.

I. Wald, A. Knoll, G. P. Johnson, W. Usher, V. Pascucci, M. E. Papka.
**“CPU Ray Tracing Large Particle Data with Balanced P-k-d Trees,”** In *2015 IEEE Scientific Visualization Conference*, IEEE, Oct, 2015.

DOI: 10.1109/scivis.2015.7429492

We present a novel approach to rendering large particle data sets from molecular dynamics, astrophysics and other sources. We employ a new data structure adapted from the original balanced k-d tree, which allows for representation of data with trivial or no overhead. In the OSPRay visualization framework, we have developed an efficient CPU algorithm for traversing, classifying and ray tracing these data. Our approach is able to render up to billions of particles on a typical workstation, purely on the CPU, without any approximations or level-of-detail techniques, and optionally with attribute-based color mapping, dynamic range query, and advanced lighting models such as ambient occlusion and path tracing.

2014

H. Bhatia, V. Pascucci, R.M. Kirby, P.-T. Bremer.
**“Extracting Features from Time-Dependent Vector Fields Using Internal Reference Frames,”** In *Computer Graphics Forum*, Vol. 33, No. 3, pp. 21--30. June, 2014.

DOI: 10.1111/cgf.12358

Extracting features from complex, time-dependent flow fields remains a significant challenge despite substantial research efforts, especially because most flow features of interest are defined with respect to a given reference frame. Pathline-based techniques, such as the FTLE field, are complex to implement and resource intensive, whereas scalar transforms, such as λ

This paper introduces a new data-driven technique to compute internal reference frames for large-scale complex flows. More general than uniformly moving frames, these frames can transform unsteady fields, which otherwise require substantial processing of resources, into a sequence of individual snapshots that can be analyzed using the large body of steady-flow analysis techniques. Our approach is simple, theoretically well-founded, and uses an embarrassingly parallel algorithm for structured as well as unstructured data. Using several case studies from fluid flow and turbulent combustion, we demonstrate that internal frames are distinguished, result in temporally coherent structures, and can extract well-known as well as notoriously elusive features one snapshot at a time.

H. Bhatia, A. Gyulassy, H. Wang, P.-T. Bremer, V. Pascucci .
**“Robust Detection of Singularities in Vector Fields,”** In *Topological Methods in Data Analysis and Visualization III*, Mathematics and Visualization, Springer International Publishing, pp. 3--18. March, 2014.

DOI: 10.1007/978-3-319-04099-8_1

Recent advances in computational science enable the creation of massive datasets of ever increasing resolution and complexity. Dealing effectively with such data requires new analysis techniques that are provably robust and that generate reproducible results on any machine. In this context, combinatorial methods become particularly attractive, as they are not sensitive to numerical instabilities or the details of a particular implementation. We introduce a robust method for detecting singularities in vector fields. We establish, in combinatorial terms, necessary and sufficient conditions for the existence of a critical point in a cell of a simplicial mesh for a large class of interpolation functions. These conditions are entirely local and lead to a provably consistent and practical algorithm to identify cells containing singularities.

H. Bhatia, V. Pascucci, P.-T. Bremer.
**“The Natural Helmholtz-Hodge Decomposition For Open-Boundary Flow Analysis,”** In *IEEE Transactions on Visualization and Computer Graphics (TVCG)*, Vol. 99, pp. 1566--1578. 2014.

DOI: 10.1109/TVCG.2014.2312012

The Helmholtz-Hodge decomposition (HHD) describes a flow as the sum of an incompressible, an irrotational, and a harmonic flow, and is a fundamental tool for simulation and analysis. Unfortunately, for bounded domains, the HHD is not uniquely defined, and traditionally, boundary conditions are imposed to obtain a unique solution. However, in general, the boundary conditions used during the simulation may not be known and many simulations use open boundary conditions. In these cases, the flow imposed by traditional boundary conditions may not be compatible with the given data, which leads to sometimes drastic artifacts and distortions in all three components, hence producing unphysical results. Instead, this paper proposes the natural HHD, which is defined by separating the flow into internal and external components. Using a completely data-driven approach, the proposed technique obtains uniqueness without assuming boundary conditions a priori. As a result, it enables a reliable and artifact-free analysis for flows with open boundaries or unknown boundary conditions. Furthermore, our approach computes the HHD on a point-wise basis in contrast to the existing global techniques, and thus supports computing inexpensive local approximations for any subset of the domain. Finally, the technique is easy to implement for a variety of spatial discretizations and interpolated fields in both two and three dimensions.