ABSTRACT

Title of dissertation: HIGHLY PARALLEL GEOMETRIC CHARACTERIZATION AND VISUALIZATION OF VOLUMETRIC DATA SETS

Derek Christopher Juba

Dissertation directed by: Professor Amitabh Varshney, Department of Computer Science

Volumetric 3D data sets are being generated in many different application areas. Some examples are CAT scans and MRI data, 3D models of protein molecules represented by implicit surfaces, multi-dimensional numeric simulations of plasma turbulence, and stacks of confocal microscopy images of cells. The size of these data sets has been increasing, requiring the speed of analysis and visualization techniques to also increase to keep up.

Recent advances in processor technology have stopped increasing clock speed and instead begun increasing parallelism, resulting in multi-core CPUs and many-core GPUs. To take advantage of these new parallel architectures, algorithms must be explicitly written to exploit parallelism. In this thesis we describe several algorithms and techniques for volumetric data set analysis and visualization that are amenable to these modern parallel architectures.

We first discuss modeling volumetric data with Gaussian Radial Basis Functions (RBFs). RBF representation of a data set has several advantages, including lossy compression, analytic differentiability, and analytic application of Gaussian blur. We also describe a parallel volume rendering algorithm that can create images of the data directly from the RBF representation.

Next we discuss a parallel, stochastic algorithm for measuring the surface area of volumetric representations of molecules. The algorithm is suitable for implementation on a GPU and is also progressive, allowing it to return a rough answer almost immediately and refine the answer over time to the desired level of accuracy.

After this we discuss the concept of Confluent Visualization, which allows the visualization of the interaction between a pair of volumetric data sets. The interaction is visualized through volume rendering, which is well suited to implementation on parallel architectures.

Finally we discuss a parallel, stochastic algorithm for classifying stem cells as having been grown on a surface that induces differentiation or on a surface that does not induce differentiation. The algorithm takes as input 3D volumetric models of the cells generated from confocal microscopy. This algorithm builds on our algorithm for surface area measurement and, like that algorithm, is suitable for implementation on a GPU and is progressive.

HIGHLY PARALLEL GEOMETRIC CHARACTERIZATION AND VISUALIZATION OF VOLUMETRIC DATA SETS

by

Derek Christopher Juba

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy, 2012

Advisory Committee:
Dr. Amitabh Varshney, Chair/Advisor
Dr. Shuvra Bhattacharyya
Dr. Antonio Cardone
Dr. Joseph JaJa
Dr. David Mount

© Copyright by Derek Christopher Juba 2012

Foreword

This document contains work which was co-authored with other colleagues. I certify that I have made substantial contributions to all jointly authored work included in this document.

Acknowledgments

I would like to thank my advisor, Professor Amitabh Varshney, for his support during my time in the graduate program and for sticking with me even when the going was rough. I would also like to thank my family for their continuous support and encouragement throughout my academic career.
Table of Contents

List of Tables
List of Figures

1 Introduction
  1.1 Gaussian RBF Fitting
  1.2 Parallel Stochastic Measurement
  1.3 Confluent Visualization
  1.4 Regenerative Medicine

2 Gaussian RBF Fitting
  2.1 Introduction
  2.2 Related Work
  2.3 Modelling
    2.3.1 Multiresolution Hierarchical Fitting
    2.3.2 Basic Algorithm
    2.3.3 Anisotropic Width Selection
    2.3.4 Weight selection
    2.3.5 Choice of the Basis Function
    2.3.6 Binary Representation
    2.3.7 Fitting Results
  2.4 Rendering
    2.4.1 GPU Direct Volume Rendering
    2.4.2 View-Dependent Level of Detail
    2.4.3 Blending
    2.4.4 Rendering Results
  2.5 Future Work

3 Parallel Stochastic Surface Measurement
  3.1 Introduction
  3.2 Related Work
    3.2.1 Molecular Surface Area Computation
    3.2.2 General Purpose GPU Computing
  3.3 Gaussian Molecular Modelling
  3.4 Stochastic Area Measurement
    3.4.1 Sampling the Space of Lines
    3.4.2 Intersection Counting
  3.5 Parallelization for GPU
  3.6 Test Results
  3.7 Discussion and Future Work

4 Confluent Visualization

5 Parallel Stochastic Classification of Stem Cells
  5.1 Introduction
  5.2 Technical Approach
  5.3 Geometric Characterization of the Cell
  5.4 High-Throughput Processing
  5.5 Classification Through Machine Learning
  5.6 Performance and Validation
  5.7 Future Work
  5.8 Conclusions

6 Future Work

Bibliography

List of Tables

2.1 Comparison of iterative fitting of isotropic RBFs, iterative fitting of anisotropic RBFs, and single-step fitting of anisotropic RBFs. See section 2.3.7 for a description of the data sets.
The iterative fitting method can produce a somewhat better fit, but takes 3-4 times longer.

2.2 Fitting results for several different data sets using anisotropic MLE-based RBF width selection. For comparison, the VHF Torso and LLNL R-M data sets have been downsampled to several different resolutions prior to fitting.

2.3 Comparison of no blending, four-sample blending, and eight-sample blending (see section 2.4.3) for the 256³ UNC Head data set and the 1024³ LLNL R-M data set. Screen shots are given in fig. 2.6.

2.4 Rendering results for several different transfer functions and data sets of several different sizes. All renderings were done using four-sample blending (see section 2.4.3). Step Size is the length of a step along the ray given that the data volume is a unit cube. Images of several of these data sets are given in fig. 2.7.

3.1 Comparison of the surface areas and running times of our method, LSMS with a 128³ grid, LSMS with a 256³ grid, MSMS, and SURF. Protein size is given in RBFs, which is equal to the number of atoms listed in the PDB file. For our method, we used 20,000 sample lines. All methods computed the SES area (or an approximation to it) of all disconnected surface components using a probe sphere radius of 1.4 Å. Tests were performed on a machine with a GeForce 8800 GTX GPU, an Intel Xeon 3.0 GHz CPU, and 4 GB of RAM. A "-" entry means that MSMS or SURF was unable to compute a surface for that molecule.

5.1 Percent correct classification with various threshold values. The voxels in the data set contained values in the range [0, 15]. Parameters used were 10⁶ lines intersected, minimum gap length of 8.0, linear SVM, ten-fold cross validation.

5.2 Percent correct classification with various minimum gap lengths. Parameters used were 10⁶ lines intersected, threshold value of 3.0, linear SVM, ten-fold cross validation.

5.3 Results of intersecting a cell with different numbers of lines. The CPU algorithm was run on an Intel Xeon X5260 (using only one core) with 8 GB of RAM. The GPU algorithms were run on an NVIDIA Tesla C2050. The dimensions of the cell volume data were 512 × 512 × 20. The threshold was 3.0 and the minimum gap length was 8.0.

5.4 Example of one fold of the validation of a trained SVM model. Histograms were generated using the GPU Lists algorithm with threshold 3.0 and minimum gap length 8.0. An SVM was trained with 19 nanofiber cells and 18 spun-coat cells. An additional 2 nanofiber cells and 2 spun-coat cells were set aside to be classified by the model. Of these 4 cells, 3 were classified correctly.

List of Figures

1.1 Renderings from three different levels of an RBF hierarchy for the UNC Head data set. From left to right, the renderings were generated using 4, 6, and 8 levels from the octree and consist of 561, 20.6 K, and 485 K RBFs.

1.2 Given a scalar field data set such as a cell or molecule (part a) we fit the data with Gaussian radial basis functions (part b), then intersect the data and a bounding sphere with a set of random lines, counting the number of intersections (part c).

1.3 Example of Confluent Visualization in gyrokinetic turbulence simulation.
Visible structures indicate zones of high heat flux. Color represents values of the electric potential.

1.4 Scanning Electron Micrographs (SEM) of surfaces upon which stem cells can be grown, and confocal microscopy models of cells grown on the surfaces. Left to right: SEM of spun-coat film surface that does not induce differentiation. Cell grown on spun-coat film. SEM of nanofiber scaffold surface that does induce differentiation. Cell grown on nanofiber scaffold.

2.1 Renderings from three different levels of an RBF hierarchy for the UNC Head data set. From left to right, the renderings were generated using 4, 6, and 8 levels from the octree and consist of 561, 20.6 K, and 485 K RBFs.

2.2 Overview of the fitting process. An RBF is first fit to low-resolution data (a), and the fitting error is evaluated at the full resolution (b). If the error is too high, additional RBFs are fit at a higher resolution (c), and their error evaluated (d). In the actual implementation the borders of each block would overlap neighboring blocks by one data point on each side, giving each block a resolution of 4³ rather than 2³; this is omitted in this illustration for clarity.

2.3 Blending between quadtree nodes. Only one edge-adjacent node (blue) and one corner-adjacent node (red) are shown. Sample points within the central block (grey) would be blended with samples from adjacent blocks whose shaded regions the points fell in.

2.4 RBF and octree parameters packed into 32 bytes.

2.5 Comparison of varying amounts of view-dependent Level of Detail reduction (see section 2.4.2) for the 1024³ LLNL R-M data set. The top row shows the renderings with the LOD reduction applied. The bottom row shows which LOD is being used in which area of the volume by color coding; from front to back, the colors represent levels of detail generated from 10, 9, 8, and 7 levels of the octree. From left to right, the rendering times are 0.59 s, 0.53 s, 0.48 s, and 0.44 s.

2.6 Comparison of (left to right) no blending, four-sample blending, and eight-sample blending (see section 2.4.3) for the 256³ UNC Head and 1024³ LLNL R-M data sets. Rendering times are given in table 2.3.

2.7 Rendering results using three different transfer functions for the 512³ VHF Torso and the 256³ UNC Head. Rendering times and other details are given in table 2.4.

3.1 2D distributions of points generated on a regular grid (left), from the Niederreiter quasi-random sequence [1] (middle), and from a pseudo-random sequence (right).

3.2 Surface area approximation errors for several proteins using various numbers of intersected lines. Vertical axis is the percent error from the final value (the area returned with 10⁶ sample lines). Note that all proteins have converged to near their final area by 20,000 lines.

3.3 Surface area computation times and percent differences for proteins of different sizes using several different algorithms. Data is from table 3.6. Protein size is given in RBFs, which is equal to the number of atoms listed in the PDB file. For MSMS and SURF, data is not available for the larger proteins since these programs were unable to compute the area for proteins of that size. Percent differences are given as absolute magnitudes.
3.4 Point-based renderings of the protein with Protein Data Bank ID 1HTO using 10⁴ points (left), 10⁵ points (middle), and 10⁶ points (right).

4.1 Computational Domain: Visible Structures Indicate Zones of High Heat Flux. Color Represents Values of the Electric Potential.

4.2 Flux Tube Domain in a Toroidal Device. The Circular Region is an Enlargement of an Area of the Ribbon Shaped Domain to Show Detail.

5.1 Left to right: SEM (Scanning Electron Micrograph) of 2D PCL spun-coat film, an hBMSC cultured on a 2D PCL spun-coat film, SEM of 3D PCL nanofiber scaffold, and an hBMSC cultured in a 3D PCL nanofiber scaffold. hBMSC images are reconstructed from 3D confocal fluorescent scans of fluorescently stained actin.

5.2 Overview of our cell data processing pipeline.

5.3 Top-down view of lines defined by points on a cylinder enclosing a cell. The lengths of the intersections of these lines with the cell are stored in a histogram.

5.4 GPU memory and multiprocessor layout.

5.5 Use of CPU and GPU to generate segment length histogram from cell volume using atomic operations algorithm.

5.6 Use of CPU and GPU to generate segment length histogram from cell volume using parallel reduction algorithm.

5.7 Use of CPU and GPU to generate segment length histogram from cell volume using segment length list algorithm.

5.8 Trained Support Vector Machine (SVM). The hyperplane that separates the two classes of histograms is the zero level-set of a decision function that divides space into a positive half and a negative half. Histograms that fall in the positive half are labeled as one class while histograms that fall in the negative half are labeled as the other.

5.9 Percent correct classification of differentiated and non-differentiated stem cells. This data was generated using the GPU Lists algorithm with threshold 3.0 and minimum gap length 8.0.

Chapter 1

Introduction

Modern technology has given rise to the generation of 3D volumetric data sets in many application areas. CAT scans and MRI data are being generated by the medical field. 3D models of protein molecules represented by implicit surfaces are being generated by computational biologists. Physicists are generating multi-dimensional numeric simulations of plasma turbulence. Confocal microscopy technology is allowing biologists to capture stacks of cell images at varying depths, which can then be combined to create a 3D model of the cell.

Because these data sets can be large and can require high-throughput processing, we would like to leverage the most recent advances in high-performance computing for their analysis. Recent advances in processor technology have stopped increasing clock speed and instead begun increasing parallelism, resulting in multi-core CPUs and many-core GPUs. To take advantage of these new parallel architectures, algorithms must be explicitly written to exploit parallelism.

We have found that progressive stochastic algorithms map well to these architectures. With these algorithms each parallel processor can compute one stochastic sample; scaling to more processors then simply results in an increase in the rate at which samples are taken. An additional benefit is that the progressive nature of these algorithms allows an initial estimated result to be produced quickly and then refined over time. The threads also tend to be independent, requiring little or no inter-thread communication.
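To make the pattern concrete, here is a minimal sketch (ours, not code from the dissertation) of a progressive, parallel stochastic estimator: independent workers each compute a batch of samples, and the running estimate is already usable after the first batch and simply sharpens as more batches arrive. The example estimates the volume of the unit sphere; all names are our own.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def count_hits(n_samples, seed):
    """One worker: sample points in the cube [-1,1]^3, count those inside the unit sphere."""
    rng = np.random.default_rng(seed)
    pts = rng.uniform(-1.0, 1.0, size=(n_samples, 3))
    return int(np.count_nonzero((pts * pts).sum(axis=1) <= 1.0))

if __name__ == "__main__":
    batch, hits, total = 100_000, 0, 0
    with ProcessPoolExecutor() as pool:
        # Workers run independently; no inter-worker communication is needed.
        for batch_hits in pool.map(count_hits, [batch] * 8, range(8)):
            hits += batch_hits
            total += batch
            # Progressive estimate: report after every batch, refined over time.
            print(f"{total:>8d} samples -> volume ~ {8.0 * hits / total:.4f}")
    print("exact: 4*pi/3 =", 4.0 * np.pi / 3.0)
```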
We have applied these techniques and algorithms to several different areas. The first is fitting scalar fields with Gaussian radial basis functions in the context of medical imaging and physical simulations. In addition, we have done work on using parallel, stochastic algorithms to measure molecular surface area. We have also done work in confluent visualization, using one data set to control the visualization of another, in the context of plasma physics. Finally, we have done work in the area of stem cell therapy, using these algorithms to identify whether or not a cell has differentiated into a desired type of tissue.

1.1 Gaussian RBF Fitting

3D volumetric data sets are often represented as simple scalar fields. This is sufficient for many purposes, but some applications can benefit from representing the data as a mixture of Gaussian Radial Basis Functions (RBFs). Gaussian RBFs are easily differentiable, which allows certain derived quantities such as the curvature of an implicit surface to be computed analytically. A Gaussian RBF representation can also have a Gaussian blur applied analytically by simply adjusting the parameters of each RBF.

To fit RBFs to data, we make use of our work on Gaussian RBF modelling and rendering [2]. In this work we fit the RBFs using a multi-resolution octree hierarchy. We first fit a single RBF to the highest level of the octree. We then evaluate the fitting error, and if it is too high, we subdivide the octree node into eight children and fit an RBF to each child. We then evaluate the fitting error of each of the children and continue the subdivision and fitting until the error drops below a given threshold.

To fit an RBF to an octree node we must determine the RBF weight, width, and center. The center of the RBF is set to the location of maximum error. The weight is then set such that the error at the center is zero. The anisotropic width can be computed using several methods. One method involves incrementally increasing the width along each axis until the increases stop decreasing the fitting error. Another method involves setting the width non-iteratively using Maximum Likelihood Estimation.

Figure 1.1: Renderings from three different levels of an RBF hierarchy for the UNC Head data set. From left to right, the renderings were generated using 4, 6, and 8 levels from the octree and consist of 561, 20.6 K, and 485 K RBFs.

In addition to the fitting algorithm, we also describe a method of rendering a fitted data set directly from the RBF representation. We perform direct volume rendering by incrementally stepping along rays shot from an eye position through the data set. At each step along the ray we evaluate the RBFs in the hierarchy at that point and use this value to get a color and opacity from a transfer function. We use the colors and opacities along each ray to produce the final color of a pixel in the image.

The octree hierarchy allows us to make several optimizations in the rendering. One optimization is that if the sample point along the ray is far from the eye position, we can produce a lower resolution version of the data by not including RBFs from all levels of the octree, instead stopping the evaluation of the RBFs at some level above the leaves.
Another optimization is that we can skip over octree cells that do not contain a range of values that would give color or opacity with the current transfer function.

1.2 Parallel Stochastic Measurement

Some applications require measuring geometric properties of a 3D object such as its surface area or volume. For applications where the data is large or high throughput is required, it would be beneficial to perform these measurements using a parallel algorithm. Such an algorithm is described in our work on measuring the surface area of molecules by intersecting the surface of the molecule with a set of random lines [3].

In this work we model molecules as a sum of Gaussian radial basis functions, with each atom being represented by a single RBF. The representation of atoms by Gaussian RBFs is inspired by the electron cloud around each atom. Unlike our previously described work on Gaussian RBF fitting, we do not do the fitting hierarchically, instead using a simple bucketing spatial data structure. RBF centers are set to atom centers, and RBF weights and widths are set based on atomic radius so that an isosurface of the RBF corresponds to a van der Waals surface of the atom.

Once we have the RBF model of the molecule, we create a set of random lines to intersect it with by choosing pairs of points on the surface of a bounding sphere. We uniformly generate points on the surface of the sphere by using a quasi-random sequence of numbers to give the point coordinates. Each pair of points then determines a line.

To count the number of intersections of each line with the molecule, we incrementally step along each line, evaluating the RBFs representing the molecule's atoms at that point. We determine whether each point is inside or outside the molecule by comparing the value of the RBFs with an isovalue. If a point is found to be inside the molecule when the previous point was outside, or vice versa, an intersection with the molecular surface is recorded.

Once we have the count of intersections with the molecular surface, we can use the Cauchy-Crofton formula from integral geometry to estimate the molecular surface area. This formula relates the surface area of and number of line intersections with a bounding sphere to the surface area of and number of intersections with the enclosed molecular surface. Since the surface area of the bounding sphere, the number of intersections with the bounding sphere, and the number of intersections with the molecular surface are known, the area of the molecular surface can be estimated.

Figure 1.2: Given a scalar field data set such as a cell or molecule (part a) we fit the data with Gaussian radial basis functions (part b), then intersect the data and a bounding sphere with a set of random lines, counting the number of intersections (part c).

This algorithm has several nice features. First, it is easily parallelizable, since each line's intersections with the molecule can be calculated independently. Our parallel GPU implementation is 3x-10x faster than existing CPU implementations at the same error level. Second, it is incremental, meaning that it returns a rough estimate of the surface area almost immediately and gives gradually more accurate estimates as more lines are intersected. This is useful since in some applications the rough estimate might be used to determine whether or not additional time should be spent to compute the more refined estimate.
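The following sketch (ours) strings this pipeline together under simplifying assumptions: every atom shares one width sigma, lines come from a pseudo-random generator rather than the quasi-random sequence used in the thesis, and crossings are detected by fixed-step sampling, which can miss features thinner than one step. The final line applies the relation just described: a chord of the bounding sphere crosses the sphere surface exactly twice, so N lines produce 2N sphere crossings, and the molecular area is approximately the sphere area scaled by the ratio of molecular crossings to sphere crossings. All function and parameter names are hypothetical.

```python
import numpy as np

def gaussian_field(p, centers, weights, inv_two_sigma2):
    """Sum-of-Gaussians model of the molecule, evaluated at points p of shape (n, 3)."""
    d2 = ((p[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return (weights * np.exp(-d2 * inv_two_sigma2)).sum(axis=1)

def estimate_area(centers, weights, sigma, iso, n_lines=20_000, steps=400, seed=0):
    rng = np.random.default_rng(seed)
    radius = np.linalg.norm(centers, axis=1).max() + 4.0 * sigma  # bounding sphere at the origin
    a, b = rng.normal(size=(2, n_lines, 3))
    a *= radius / np.linalg.norm(a, axis=1, keepdims=True)        # line endpoints on the sphere
    b *= radius / np.linalg.norm(b, axis=1, keepdims=True)
    t = np.linspace(0.0, 1.0, steps)
    crossings = 0
    for ai, bi in zip(a, b):            # in the real method, one line per parallel thread
        pts = ai[None, :] * (1.0 - t)[:, None] + bi[None, :] * t[:, None]
        inside = gaussian_field(pts, centers, weights, 1.0 / (2.0 * sigma ** 2)) > iso
        crossings += int(np.count_nonzero(inside[1:] != inside[:-1]))
    sphere_area = 4.0 * np.pi * radius ** 2
    # Each chord meets the sphere surface twice, giving 2 * n_lines sphere crossings.
    return sphere_area * crossings / (2.0 * n_lines)

# Toy "molecule" of two overlapping atoms:
centers = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]])
print(estimate_area(centers, np.ones(2), sigma=1.0, iso=0.5, n_lines=2000))
```

Because each line is independent, the loop can be distributed across processors, and the returned estimate can be reported progressively as lines accumulate.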
1.3 Confluent Visualization

Some visualization applications involve understanding the spatial relationship between a pair of superimposed data sets. One example is the phenomenon of stem cell differentiation being induced by surface geometry, which involves a data set of the cell itself and a data set of the surface on which it is resting. Interesting information may be derived by looking at the locations where the two data sets interact.

Another example is physical simulations of gyrokinetic turbulence performed in the context of research on fusion energy. These simulations can produce complex, high-dimensional data sets that are difficult to visualize. In order to make the data more amenable to visualization, derived quantities are often visualized instead; in this case, heat flux and electric potential.

Figure 1.3: Example of Confluent Visualization in gyrokinetic turbulence simulation. Visible structures indicate zones of high heat flux. Color represents values of the electric potential.

To visualize these quantities we use standard direct volume rendering with a color and opacity transfer function. However, unlike in traditional direct volume rendering, where the same value from the scalar field would be used to generate both the color and opacity, we use electric potential to generate the color and heat flux to generate the opacity. The result is that values of electric potential are visualized in areas of high heat flux. We refer to this as Confluent Visualization [4].

1.4 Regenerative Medicine

Tissue transplants are used to treat disorders of many different organs and tissues, including the heart, the bladder, and the urethra. Unfortunately there are several drawbacks to current transplant technology, such as requiring patients to be on immunosuppressant drugs for the remainder of their lives. Patients must also compete for a limited supply of donor organs. These and other drawbacks of transplants have created a large market for stem cell and regenerative medicine.

Stem cells are cells which have the ability to differentiate and grow into multiple different types of organ and tissue. One issue with stem cells is how to get them to differentiate into the desired cell type. This can be accomplished through exposure to various chemicals, but one current line of work has shown that in some cases stem cells can be made to differentiate simply by placing them on a surface with a certain texture.

When working with stem cells it is useful to be able to tell whether a given cell population has differentiated or not. We have developed an algorithm based on our work on molecular surface area measurement [3] that can identify whether a cell was grown on a differentiating or non-differentiating surface with over 80% accuracy. The algorithm works on a 3D model of the cell, which can be generated using confocal microscopy.

Figure 1.4: Scanning Electron Micrographs (SEM) of surfaces upon which stem cells can be grown, and confocal microscopy models of cells grown on the surfaces. Left to right: SEM of spun-coat film surface that does not induce differentiation. Cell grown on spun-coat film. SEM of nanofiber scaffold surface that does induce differentiation. Cell grown on nanofiber scaffold.

As with our molecular measurement work, this algorithm begins by intersecting the 3D model of the cell with a set of random lines. The lengths of the line segments that pass through the interior of the cell are stored in a histogram. These histograms can then be used as data points to train a machine learning algorithm, such as a Support Vector Machine. Once the algorithm is trained, histograms of query cells can be classified as coming from either a differentiating or non-differentiating surface.
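A minimal sketch of this training and classification step, using scikit-learn's SVC as a stand-in for whatever SVM implementation is actually used (the dissertation's code is not reproduced here); the input file names are placeholders:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Hypothetical inputs: one chord-length histogram per cell (rows) and a label per cell
# (0 = spun-coat film, 1 = nanofiber scaffold).
histograms = np.load("cell_histograms.npy")   # shape (n_cells, n_bins)
labels = np.load("cell_labels.npy")           # shape (n_cells,)

# Normalize each histogram so classification depends on its shape, not on line count.
features = histograms / histograms.sum(axis=1, keepdims=True)

clf = SVC(kernel="linear")                    # linear SVM, as in chapter 5
scores = cross_val_score(clf, features, labels, cv=10)   # ten-fold cross validation
print("mean cross-validated accuracy:", scores.mean())
```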
Like our molecular measurement algorithm, this algorithm is parallel and progressive, making it suitable for future high-throughput cell classification applications. Classification of the cell by analysis of its shape also allows the cell to be classified at an earlier stage of development than is possible by other means such as chemical staining.

Chapter 2

Gaussian RBF Fitting

2.1 Introduction

In this chapter we discuss modelling volumetric data with Gaussian Radial Basis Functions (RBFs). Representation of volumetric data with Gaussian RBFs has several advantages. First, the RBF representation may be smaller than the original data, resulting in lossy compression. This reduction in data size can reduce memory accesses, resulting in a performance increase. Second, Gaussian RBFs are easily differentiable, which allows certain derived quantities such as the curvature of an implicit surface to be computed analytically. Third, a Gaussian RBF representation can also have a Gaussian blur applied analytically by simply adjusting the parameters of each RBF.

In addition to the RBF representation, we describe a parallel, GPU-accelerated volume rendering algorithm that can generate images of the data directly from the RBF representation. The following material in this chapter applies the concepts of RBF fitting to medical and physics data, but the concepts are general enough that they could be applied to any type of volumetric data. The material in this chapter has been previously published [2].

Scientific visualization is currently facing a grand challenge in coping with vast quantities of data arising from high-fidelity acquisitions and large-scale scientific simulations. Such datasets range from a few gigabytes to several terabytes and are often characterized by an imperative need for interactive exploratory visualization capabilities. For instance, the Richtmyer-Meshkov instability simulation performed at the Lawrence Livermore National Laboratory [5] has produced 2048 × 2048 × 1920 data over 273 time steps. The sheer size of this data makes it a challenge to visualize on commodity graphics hardware.

We believe that implicit representations offer a powerful model for facilitating interactive visual exploration of large volumetric datasets. Implicit functions have been used for almost two decades in visual computing. Introduced as blobby models [6] and metaballs [7], they have grown to be widely used in games and movies. A nice overview of some of the early work in implicit surfaces can be found in [8]. The use of implicit representations for volume modeling and rendering is a recent phenomenon. Implicit representations offer several advantages for volumes. First, their expressive power makes them well-suited for trading off memory accesses against computation. This is proving to be a powerful technique to hide memory latency on modern multi-core architectures. Second, the analytical formulation of implicit representations is highly amenable to local geometry processing, such as computing first and higher-order derivatives. Indeed, recent research has established the relationship of such geometric operators to features such as vortices and shocks by Weiler et al. [9] and the principled design of transfer functions by Kniss et al. [10, 11].
Third, volumetric implicit representations offer a multi-scale and view-dependent capability to control visual detail in an intuitive manner. This is likely to have a direct implication in devising techniques for facilitating comprehension and reducing clutter in visual depictions.

Figure 2.1: Renderings from three different levels of an RBF hierarchy for the UNC Head data set. From left to right, the renderings were generated using 4, 6, and 8 levels from the octree and consist of 561, 20.6 K, and 485 K RBFs.

In this chapter we present an algorithm for succinctly representing and efficiently rendering large scalar volumetric data using a hierarchy of implicit functions based on anisotropic radial basis functions. The novel contributions of our work are:

1. A multiresolution representation using anisotropic radial basis functions that can encode a given volumetric dataset with progressively greater detail. Our representation is well-suited for time-critical rendering, progressive refinement, view-dependent level-of-detail, and progressive transmission.

2. An O(n log n)-scalable fitting algorithm based on Maximum Likelihood Estimation that can rapidly fit large datasets in a memory-friendly manner. Specifically, we can fit a 512³ dataset in around 20 minutes with a 1.6% RMS error.

3. A GPU-based ray-casting algorithm that can efficiently and directly render from the hierarchical implicit representation of volumes. Our GPU-based ray-casting algorithm supports acceleration techniques such as empty-space skipping and early-ray termination with implicit volumes.

4. A multiresolution octree hierarchy over implicit RBFs that can leverage prior work on octree location codes for efficient ray-casting, obviates the need to store cell boundaries, and enables view-dependent level-of-detail rendering.

We review related work in section 2.2. We present our fitting algorithm in section 2.3 and our direct volume rendering algorithm in section 2.4. Finally, we conclude with some suggestions for future work in section 2.5.

2.2 Related Work

The basic goal of implicit function fitting is to fit a representation f(x_i) = d_i, i = 1, ..., n, to surface or volumetric data, where the d_i are appropriately chosen scalars, such that the level sets of the function f given by the values d_i have some meaning associated either with the geometry or the properties of the object. When fitting an implicit function to the input data, a local form of the fitting expression is desirable, since the fit must adapt to local geometrical features. Radial basis functions (RBFs) have been shown to be a versatile fitting tool in various fields. A strong mathematical basis for their theory has been established, and theorems showing accuracy in various normed spaces have been proven for both surface and volumetric data [12, 13, 14, 15, 16, 17, 18, 19]. These representations have several nice properties associated with sensitivity to local features, the ability to support different levels of detail, the ability to mitigate noise, and the ability to incorporate various degrees of smoothness and other priors via regularization and the choice of particular RBFs (e.g., thin plate splines [20]). While these methods are considered to be accurate, naive implementations are expensive both to fit and to subsequently evaluate, making them unpopular for fitting the large datasets of the type we are dealing with in this chapter.

Research by Beatson et al.
[21] shows how RBF function fitting and evaluation of the fitted function can be sped up by an order of magnitude using the fast multipole method (FMM) [22]. They used non-compactly supported RBFs due to their interpolation and extrapolation properties. Morse et al. [15] were the first to fit surfaces using compactly-supported RBFs. Due to their local region of influence, compactly-supported RBFs permit efficient evaluation and the ability to incorporate local changes to the fitted function without the need for the FMM data structures. Co et al. [23] and Jang et al. [24] were the first to represent scattered and irregular volumetric scalar fields using RBFs. They used Principal Component Analysis (PCA) [25] to cluster and determine centers for the Gaussian RBFs and used the Levenberg-Marquardt optimization method to determine the Gaussian RBF variances. Weiler et al. [9] presented a k-d-tree-based method to fit Gaussian RBFs to an unstructured volumetric vector field. In addition to using PCA clustering, they select some of their RBF centers to be at peaks and troughs of low frequencies in the data. They also use an approximate iterative method to quickly solve the system of equations for the RBF weights. Hong et al. [26] use arbitrarily-oriented elliptical RBFs to fit data on an irregular grid. The RBF variances and orientations are chosen to match the Voronoi cells of the data points. PCA is used to create an initial guess for the RBF parameters, which is then refined using an iterative optimization algorithm. Jang et al. [27] give a method of fitting arbitrarily-oriented elliptical RBFs using non-linear optimization, and extend it to support vector data.

Rendering of implicit functions has a rich history in visual computing. Ray tracing of implicit surfaces has been reasonably well-studied [6, 28, 29, 30, 31]. An alternative to ray tracing is to sample the implicit surface with a collection of well-distributed particles and then render those particles. Methods for carefully sampling implicit surfaces have been discussed by Witkin and Heckbert [32] as well as by Turk and O'Brien [18]. Another approach involves converting the implicit functions to polygons using marching cubes [33] or continuation methods [34, 35]. The polygons can then be rendered as in traditional graphics.

Direct volume rendering of implicit functions is a relatively new endeavor. Jang et al. [24] and Weiler et al. [9] have developed GPU-based volume rendering algorithms that proceed in a slice-by-slice fashion. For each fragment in a slice, a fragment program iterates over RBF parameters stored in a texture, computes the scalar value at that location, and looks up the corresponding color from a 1D texture. Neophytou et al. [36] present a splatting-based GPU-accelerated volume rendering method for arbitrarily-oriented elliptical RBFs. Their method splats each RBF onto each intersecting slice as a textured polygon and accumulates the splats in a texture buffer. The slice can then be rendered after classification with a fragment program.

In this chapter we present the first GPU-accelerated ray-casting algorithm for direct rendering of implicit volumes and the first octree-structured RBF representation of regular volume data that lends itself well to adaptive and progressive levels of detail.

2.3 Modelling

We model volume data as a sum of Radial Basis Functions (RBFs).
Symbolically, this can be represented as:

    f(x) = w_0 + Σ_{i=1}^{M} w_i φ_i(x)                          (2.1)

where x is the position vector of a point in the volume, f(x) is the scalar value at that point, φ_i is the ith RBF, M is the number of RBFs, w_0 is a constant term, and w_i is the weight of the ith RBF. In this work we have used Gaussian RBFs:

    φ_i(x) = e^(−r²/2)                                           (2.2)

2.3.1 Multiresolution Hierarchical Fitting

Since we are targeting regular volumetric data, we can take advantage of the inherent structure of the data and fit the RBFs to a multi-resolution octree hierarchy, from low resolution to high. We start by fitting a single RBF to a 2³ resolution downsampled version of the given dataset. This becomes the root of our RBF hierarchy. We then take a version of the dataset that has been downsampled to 4³, evaluate the initial (root) RBF at each of the 4³ data points, and compute the difference between the root RBF value and the downsampled version. The 4³ block of residual data is then divided into eight blocks of 2³ data values, and the entire process is repeated for each of these blocks. This is continued using versions of the data that have been downsampled to 8³, 16³, etc. (see fig. 2.2).

Each time a new block is fit, the RBFs in that block and all of its ancestors are sampled at the locations of the data points in the full resolution data, and the current fitting error is computed. If the error is below a user-defined threshold, that block is not subdivided any further. Otherwise, the blocks will continue to be subdivided until they reach the full original data resolution.

This fitting algorithm only ever needs to access the full data in a sequential manner, during the error evaluation stage. All other operations are performed on one small, fixed-size block of data at a time, which will likely fit in a memory cache. Besides being memory-friendly, the overall fitting algorithm also scales in an O(n log n) manner.

Figure 2.2: Overview of the fitting process. An RBF is first fit to low-resolution data (a), and the fitting error is evaluated at the full resolution (b). If the error is too high, additional RBFs are fit at a higher resolution (c), and their error evaluated (d). In the actual implementation the borders of each block would overlap neighboring blocks by one data point on each side, giving each block a resolution of 4³ rather than 2³; this is omitted in this illustration for clarity.

In this method we truncate each RBF at the boundary of its octree cell. Due to fitting errors, this would cause noticeable discontinuities at the borders between octree cells during rendering. To avoid this, we fit each RBF not only to the data within its 2³ block, but also to data within a one-data-point border around that block (each RBF is thus fitted to 4³ data values). The fitting area of each block therefore overlaps half of each face-adjacent block, one quarter of each edge-adjacent block, and one eighth of each corner-adjacent block (see fig. 2.3 for an example on a quadtree). During rendering, the RBF values are blended together based on the distance from the center of each RBF's block (see section 2.4.3).

Figure 2.3: Blending between quadtree nodes. Only one edge-adjacent node (blue) and one corner-adjacent node (red) are shown. Sample points within the central block (grey) would be blended with samples from adjacent blocks whose shaded regions the points fell in.
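The recursion can be sketched as follows. This is a simplified illustration rather than the implementation: it fits each node's single RBF directly on the full-resolution residual instead of on downsampled blocks, places the RBF at the block's worst-fit voxel with an isotropic width set by a crude heuristic (the actual center, weight, and width selection is described in sections 2.3.2 and 2.3.3), and omits the one-data-point overlap borders. All names are ours.

```python
import numpy as np

def gaussian(coords, center, sigma):
    """Isotropic Gaussian RBF evaluated on a grid of coordinates of shape (..., 3)."""
    r2 = ((coords - center) ** 2).sum(axis=-1)
    return np.exp(-r2 / (2.0 * sigma ** 2))

def fit_hierarchy(residual, coords, origin, size, max_error, nodes, depth=0, max_depth=8):
    """Fit one RBF to this octree node, then recurse into octants where the error is high."""
    sub = (slice(origin[0], origin[0] + size),
           slice(origin[1], origin[1] + size),
           slice(origin[2], origin[2] + size))
    block, bcoords = residual[sub], coords[sub]
    idx = np.unravel_index(np.abs(block).argmax(), block.shape)   # worst-fit voxel
    center, weight, sigma = bcoords[idx], block[idx], size / 4.0  # crude width heuristic
    block -= weight * gaussian(bcoords, center, sigma)            # subtract fit in place
    nodes.append((center, weight, sigma))
    rms = np.sqrt((block ** 2).mean())
    if rms > max_error and size > 1 and depth < max_depth:        # subdivide if still poor
        h = size // 2
        for dx in (0, h):
            for dy in (0, h):
                for dz in (0, h):
                    fit_hierarchy(residual, coords,
                                  (origin[0] + dx, origin[1] + dy, origin[2] + dz),
                                  h, max_error, nodes, depth + 1, max_depth)

# Example on a tiny synthetic volume:
n = 32
g = np.stack(np.meshgrid(*(np.arange(n),) * 3, indexing="ij"), axis=-1).astype(float)
data = np.exp(-((g - n / 2) ** 2).sum(axis=-1) / 50.0)
residual, nodes = data.copy(), []
fit_hierarchy(residual, g, (0, 0, 0), n, max_error=0.02, nodes=nodes)
print(len(nodes), "RBFs, final RMS error:", np.sqrt((residual ** 2).mean()))
```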
Some results of 18 Figure 2.3: Blending between quadtree nodes. Only one edge-adjacent node (blue) and one corner-adjacent node (red) are shown. Sample points within the central block (grey) would be blended with samples from adjacent blocks whose shaded regions the points fell in. this simple algorithm are shown in the rst row of table 2.2. Some possible choices for the initial value of the constant term are zero, the minimum data value, or the mean of the data. In our experiments we found that setting the initial value to zero often produced the best results { this was done for all results reported in this work. To compute the RBF centers and weights, the following simple algorithm can be used: 1. Perform a linear search over the data to nd the data point with the maximum approximation error. This will be the location of the new RBF?s center. 2. Set the RBF?s weight to the error value at this cell. Since Gaussian RBFs have a value of 1 at their center, setting the weight in this way will cause the 19 center data point to have zero error. 3. Update error values stored at each data point within the RBF?s radius of in uence. 4. Repeat this process until the desired error threshold is reached. 2.3.3 Anisotropic Width Selection A natural extension to the basic algorithm is to allow the widths of the RBFs to vary anisotropically. This can be accomplished by making in equation 2.1 a vector and allowing it to take on di erent values for di erent dimensions, dividing it componentwise into the x i vector. General elliptical RBFs were used by Hong et al.[26], Jang et al.[27], and Neophytou et al.[36], although we restrict our RBFs to be axis-aligned, sacri cing some generality for a more compact representation. The e ect of using anisotropic widths is illustrated in table 2.1. The simplest way to select the RBF widths is by using a greedy algorithm. For each RBF, we rst initialize the value of each xi; yi; zi to some small number (we use 0.3 times the width of a grid cell). We then begin iterating repeatedly over the x; y; z dimensions. For each dimension, we increment the corresponding i by some small number (we use 0.1 times the width of a grid cell), and check if the average error in the RBF?s area of e ect was made worse by this change. If it was, the change is reverted, and that dimension is excluded from further iterations. Otherwise, the change is kept, and the iterations continued. Intuitively, this algorithm can be viewed as blowing up the RBF like a balloon inside a closed box, with the sides of 20 the box representing the widths at which further in ation along that axis begins increasing the error. Alternatively, a fast, non-iterative method of selecting the RBF widths is Maxi- mum Likelihood Estimation (MLE)[37]. In this method, the data to be t is treated as a histogram of samples from a Gaussian probability density function with the previously computed mean. The width of the probability density function that would give this set of samples the highest probability of being generated is then computed. This computation can be performed independently for each dimension, giving xi; yi; zi. The equations for computing the MLE width for the x dimension are: U0 = P f Ux = P xf Uxx = P (x Ux=U0)f 2xi = U0=2Uxx (2.4) where x is the x coordinate of the data point and f is the data value at that point. Other dimensions are computed similarly. Since the residual data being t can take on negative values, we must somehow convert this into a positive-valued histogram for the MLE computation. 
We do this by treating negative data values as zero when the RBF weight is positive, and the opposite when the RBF weight is negative (except then also taking the absolute value of the data values). The rationale for this is that we would like the RBF?s region of in uence to be restricted to an area in which the data has the same sign as the RBF, since these are the areas in which the RBF will decrease the tting 21 error rather than increase it. Setting values of the histogram to zero outside this area encourages probability distribution widths that do not produce many samples outside this area, and thus result in RBFs that do not have much in uence outside this area. A comparison between MLE width selection, iterative anisotropic width selec- tion, and iterative isotropic width selection is given in table 2.1. The MLE method is clearly comparable in accuracy and yet is signi cantly faster to compute. 2.3.4 Weight selection Once the RBF center and widths have been chosen, we compute the weights to minimize the sum of squared errors by forming the following system of linear equations, similar to [24, 15, 9]: 2 6 6 6 6 6 6 6 6 4 1 1(x1) m(x1) ... ... . . . ... 1 1(xn) m(xn) 3 7 7 7 7 7 7 7 7 5 2 6 6 6 6 6 6 6 6 6 6 6 6 6 4 w0 w1 ... wm 3 7 7 7 7 7 7 7 7 7 7 7 7 7 5 = 2 6 6 6 6 6 6 6 6 4 f(x1) ... f(xn) 3 7 7 7 7 7 7 7 7 5 (2.5) where m is the number of RBFs, n is the number of data cells, xj is the position of the jth data point in the volume, f(xj) is the data value at that point, w0 is the constant term, wi is the weight of the ith RBF, and i is the ith RBF, with center i and widths i. Ifm = n, then this system can be solved exactly. Typically we will havem < n, so in general we compute the best solution in the least-squares sense through the use of singular-value decomposition. 22 Data Width Fitting Number RMS Data Set Size RBF Type Selection Time of RBFs Error UNC Head 2563 Isotropic Iterative 4.12 m 520 K 1.93% UNC Head 2563 Anisotropic Iterative 9.48 m 455 K 1.50% UNC Head 2563 Anisotropic MLE 2.25 m 485 K 1.75% VHF Torso 5123 Isotropic Iterative 29.9 m 3.48 M 1.68% VHF Torso 5123 Anisotropic Iterative 58.2 m 2.95 M 1.23% VHF Torso 5123 Anisotropic MLE 19.1 m 3.22 M 1.60% LLNL R-M 10243 Isotropic Iterative 139 m 15.9 M 2.22% LLNL R-M 10243 Anisotropic Iterative 315 m 15.4 M 1.64% LLNL R-M 10243 Anisotropic MLE 85.3 m 16.0 M 1.85% Table 2.1: Comparison of iterative tting of isotropic RBFs, iterative tting of anisotropic RBFs, and single-step tting of anisotropic RBFs. See section 2.3.7 for a description of the data sets. The iterative tting method can produce a somewhat better t, but would take 3-4 times longer. 2.3.5 Choice of the Basis Function Gaussian RBFs (as in equation 3.1) have seen much use in prior work [23, 26, 27, 24, 36, 9], largely due to the smooth blending that occurs when these RBFs 23 are brought in close proximity to each other. Since we truncate RBFs at octree cell boundaries and perform explicit blending between the cells, we do not bene t from this natural blending. However, there are still some desirable properties of this basis function. First, it decays to zero rather than going o to in nity, so if an octree cell is poorly t by its RBF in some region, there is a limit to how much damage the bad RBF can do to the nal, blended result. Second, the derivatives of Gaussian RBFs do not degenerate at higher orders, so these basis functions are useful if higher-order derivative information needs to be computed. 
Still, most of our work is not dependent on the particular choice of basis function, and investigation into other possible basis functions may prove useful. 2.3.6 Binary Representation Once the RBF representation has been computed, we store it in the following format. The nodes of the octree from section 2.3.1 are stored in breadth- rst order. Each node contains exactly one RBF and has either eight or zero children. We rst store a 4-byte integer giving the byte o set to the start of this node?s children. We then store the maximum and minimum data values contained within the node as two 2-byte integers. The maximum and minimum values are packed into 2 bytes each by subtracting their minimum value, dividing by their range, and then multiplying by the maximum size of a 2-byte integer. We next store the constant term w0 and RBF weight w1, each as a 4-byte oat. We then store the x; y; z components of the RBF mean, each as a 1-byte integer. The packing computation 24 takes into account the node?s size and position in the octree. For each component we subtract the lowest value it could take on in that node, divide by the width of the node, and then multiply by the maximum size of a 1-byte integer. Finally, we store the x; y; z width factors, each as a 4-byte oat. This gives a total size of 31 bytes per node, which we pad to 32 bytes so that it ts evenly into two 16-byte texture look-ups. This packing is illustrated in gure 2.4. At the start of the le we store a header giving the number of levels in the octree and the minimum values and ranges of the node maximums and minimums, to allow the packing to be undone. Byte Offset to Children Max/Min w0 w1 Mean x,y,z Padding Width x,y,z 0 32 Figure 2.4: RBF and octree parameters packed into 32 bytes. 2.3.7 Fitting Results We have tested our RBF tting implementation on three di erent data sets. The rst (UNC Head) is a CT scan of a human head from the University of North Carolina. This data set originally consisted of 113 slices, with each slice being a 2562 array of 16-bit integers (although only 12 bits of precision are actually used). We linearly interpolated additional slices between the existing ones and duplicated the nal slice several times to produce a data set of size 2563. In our tests we used this data set as well as a version downsampled to 1283. 25 Data File Fitting Number RBF File RMS Data Set Size Size Time of RBFs Size Error UNC Head 1283 4 MB 0.28 m 91.4 K 2.8 MB 2.22% UNC Head 2563 32 MB 1.90 m 485 K 15 MB 1.75% VHF Torso 1283 4 MB 0.38 m 109 K 3.4 MB 2.14% VHF Torso 2563 32 MB 2.10 m 520 K 16 MB 2.00% VHF Torso 5123 256 MB 16.2 m 3.22 M 99 MB 1.60% LLNL R-M 1283 2 MB 0.22 m 62.2 K 1.9 MB 5.24% LLNL R-M 2563 16 MB 1.52 m 402 K 13 MB 4.37% LLNL R-M 5123 128 MB 11.3 m 2.58 M 79 MB 3.31% LLNL R-M 10243 1024 MB 85.3 m 16.0 M 489 MB 1.85% Table 2.2: Fitting results for several di erent data sets using anositropic MLE-based RBF width selection. For comparison, the VHF Torso and LLNL R-M data sets have been downsampled to several di erent resolutions prior to tting. The second data set (VHF Torso) is a 5123 block of the torso area of the Female CT data set from the Visible Human Project[38], in the same format as the UNC Head data. In our tests we used both the full 5123 version and versions downsampled to 2563 and 1283. 26 The third data set (LLNL R-M) is time step 100 from the Richtmyer-Meshkov Instability data set from LLNL[5], which consists of an array of 2048 2048 1920 8-bit integers. 
We padded this to 20483 and then downsampled it to several di erent resolutions for testing. We performed our ttings on a 64-bit Linux machine with dual Intel Xeon 3.0 GHz CPUs and 8 GB of RAM. As our tting program is single threaded, it can only make use of one CPU. Fitting results are given in table 2.2. For most of the data sets our tting method achieves a reduction in le size of about 50% with an RMS error of about 1%- 2%. An exception to this was the lower resolutions of the LLNL R-M data set. We suspect the reason for this is that these low resolution versions contain sharp, large- magnitude discontinuities which are not amenable to tting with smooth Gaussian RBFs. It is di cult to compare the e ciency of our tting results with previous work due to a lack of prior reported tting times. One exception to this is Hong et al. [26] who report tting at the rate of 1300 grid points per minute. Although our results are several orders of magnitude faster, this is not a fair comparison since they are tting unstructured data while we t structured. 2.4 Rendering We use ray-casting to render implicit RBF-encoded volumes. Data values at points in the volume are computed directly from the RBF representation without 27 reconstructing the full data set in memory. Specifcally, the scalar value at a point is computed by summing up the values of all RBFs that overlap that point. The gradient vectors can be computed in a similar fashion, by summing the gradients of all RBFs that overlap the sample point (this works because the gradient operator distributes across summations). This sampling method can then be used to imple- ment any rendering scheme that works by sampling data values and gradients at various points, such as direct volume rendering or point-based isosurface rendering. 2.4.1 GPU Direct Volume Rendering We have implemented a GPU direct volume rendering algorithm using NVIDIA?s new CUDA GPU programming system. Our implementation supports piecewise- linear transfer functions, gradient-based lighting, early ray termination based on ac- cumulated opacity, and empty-space skipping based on checking whether the range of values contained within an octree cell overlaps the range which, given the transfer function, would produce any contribution to cumulative color or opacity. We also perform view-dependent level-of-detail by adjusting the ray step size and maximum octree depth based on the current distance along the ray. Given an eye position and viewing direction, we construct ray origin and di- rection vectors for a perspective projection on the GPU and store them in GPU memory. We then execute a GPU kernel that traces each ray in parallel and writes the nal resulting colors to GPU memory. The kernel rst sets the initial sample point to the ray?s intersection with 28 the data volume?s bounding cube, and then begins stepping along the ray. At each sample point the kernel executes a space skipping routine that descends through the octree cells containing that point. For each cell, it makes a single 16-byte texture look-up and unpacks some parameters as described in section 2.3.6. It then checks whether the cell?s range of data values would produce any contribution to the nal color or opacity. If not, the kernel computes the ray?s intersection with that octree cell and moves the sample point just beyond the intersection point. This is repeated until the skipping routine encounters a sample point at which it decends all the way through the octree. 
The kernel then evaluates the scalar eld and its gradient at the current sample point. This simply requires traversing the octree from the root to a leaf, evaluating the value and gradient of the RBF stored in each cell at the location of the sample point and summing up these values. We make use of the e cient location-code- based octree traversal methods of Frisken and Perry[39]. For each octree cell the kernel makes two 16-byte texture look-ups and unpacks some of the parameters as described in section 2.3.6. Finally, we sample color and opacity values from the transfer function using linear interpolation and apply di use, gradient-based lighting. 2.4.2 View-Dependent Level of Detail Our multi-resolution RBF hierarchy allows us to implement view-dependent level-of-detail rendering by reducing the depth in the octree that we visit based 29 Figure 2.5: Comparison of varying amounts of view-dependent Level of Detail re- duction (see section 2.4.2) for the 10243 LLNL R-M data set. The top row shows the renderings with the LOD reduction applied. The bottom row shows which LOD is being used in which area of the volume by color coding| from front to back, the colors represent levels of detail generated from 10, 9, 8, and 7 levels of the octree. From left to right, the rendering times are 0.59s, 0.53s, 0.48s, and 0.44s. on the distance along the ray. Every time a certain amount of distance has been traversed, we reduce the maximum depth in the octree that we can visit by one level. To avoid level-of-detail transition discontinuities one can store the sample value computed at one level above the current level during the octree traversal, and blend between this sample and the nal sample based on the distance from the previous level-of-detail transition. We experimented with this blending, but found that in practice the level-of-detail transitions tended to occur far enough away from the viewer that the discontinuities were not perceptible, even without blending. We did not use level-of-detail blending for the images and results in this chapter. 30 In addition to decreasing the octree depth based on the distance along the ray, we also double the ray step size every time we decrease the maximum depth by one level. We account for this during the incremental updating of cumulative color and opacity by keeping track of the integer ratio of the current step size to the original step size and iterating our incremental updates this many times at every step, using the same sample color and opacity values. Screen shots of renderings using various amounts of view-dependent level-of- detail reduction and rendering rates are given in g. 2.5. 2.4.3 Blending Truncating the RBFs at octree cell boundaries can lead to visually noticeable discontinuities. To resolve this, we take into account data in adjacent cells when tting each cell?s RBF (see section 2.3.1). Then, for each sample point, in addition to computing a sample value from the RBFs in that point?s octree cells, we also compute sample values at that position from the RBFs in adjacent octree cells. We implement this by simply altering the location code of the sample point to the location code of a point in each adjacent cell. Since each cell?s overlap region extends halfway into each adjacent cell, at each sample point there is the possibility of blending with up to three face-adjacent cells, three edge-adjacent cells, and one corner-adjacent cell. 
Screen shots of renderings using various amounts of view-dependent level-of-detail reduction and rendering rates are given in fig. 2.5.

Figure 2.5: Comparison of varying amounts of view-dependent Level of Detail reduction (see section 2.4.2) for the 1024³ LLNL R-M data set. The top row shows the renderings with the LOD reduction applied. The bottom row shows which LOD is being used in which area of the volume by color coding: from front to back, the colors represent levels of detail generated from 10, 9, 8, and 7 levels of the octree. From left to right, the rendering times are 0.59s, 0.53s, 0.48s, and 0.44s.

2.4.3 Blending

Truncating the RBFs at octree cell boundaries can lead to visually noticeable discontinuities. To resolve this, we take into account data in adjacent cells when fitting each cell's RBF (see section 2.3.1). Then, for each sample point, in addition to computing a sample value from the RBFs in that point's octree cells, we also compute sample values at that position from the RBFs in adjacent octree cells. We implement this by simply altering the location code of the sample point to the location code of a point in each adjacent cell. Since each cell's overlap region extends halfway into each adjacent cell, at each sample point there is the possibility of blending with up to three face-adjacent cells, three edge-adjacent cells, and one corner-adjacent cell. The contribution of each of the adjacent cells is linearly weighted based on the distance from the sample point to the corresponding face, edge, or corner (see fig. 2.3 for an example on a quadtree). This blending scheme is similar to the one used in Ohtake et al.'s multi-level partition of unity implicits [40].

Figure 2.6: Comparison of (left to right) no blending, four-sample blending, and eight-sample blending (see section 2.4.3) for the 256³ UNC Head and 1024³ LLNL R-M data sets. Rendering times are given in table 2.3.

While completely smooth blending requires blending across faces, edges, and corners, many of the discontinuities can be resolved by simply blending across faces. This reduces the number of samples required per point from eight (1 local + 3 face + 3 edge + 1 corner) to four, reducing rendering time by about a factor of two. A comparison between using no blending, four-sample blending, and eight-sample blending is given in fig. 2.6 and table 2.3.

Data Set    Blending Type   Rendering Time
LLNL R-M    One-Sample      0.37 s
LLNL R-M    Four-Sample     0.72 s
LLNL R-M    Eight-Sample    1.11 s
UNC Head    One-Sample      0.10 s
UNC Head    Four-Sample     0.21 s
UNC Head    Eight-Sample    0.33 s

Table 2.3: Comparison of no blending, four-sample blending, and eight-sample blending (see section 2.4.3) for the 256³ UNC Head data set and the 1024³ LLNL R-M data set. Screen shots are given in fig. 2.6.

2.4.4 Rendering Results

Unless stated otherwise, all renderings were done with MLE-fitted data (see section 2.3.3), four-sample blending (see section 2.4.3), and an image resolution of 512². Renderings were performed using a beta version of NVIDIA's CUDA drivers on a GeForce 8800 GTX with 768 MB of RAM. Rendering times for several different data sets and transfer functions are given in table 2.4, and screen shots are given in fig. 2.7.

Figure 2.7: Rendering results using three different transfer functions for the 512³ VHF Torso and the 256³ UNC Head. Rendering times and other details are given in table 2.4.

We found that the ray step size required to avoid artifacts was dependent on the size of the smallest octree nodes in the fitting, which is related to the amount of high frequencies in the fitted data. We therefore needed to use smaller steps for higher resolution data. For a given data set and transfer function, most of the differences in rendering rate between original data resolutions are accounted for by the difference in step size.

2.5 Future Work

We plan to use this work on RBF fitting in the context of fitting scalar fields representing stem cells and nanofiber substrates. Our hope is that this will aid us in intelligently selecting an isosurface of this data set for use in later steps of our proposed algorithm.

A great deal of work has been done in the area of RBF fitting, for example in the machine-learning community. It seems likely that some of that work could further benefit procedural encoding through RBFs. Possible improvements include adding the ability to fit to a given error bound and removing the need to perform explicit blending between octree cells. It also might prove beneficial to investigate basis functions other than the Gaussian RBFs.

Another area of future work might be to investigate 4D fitting of time-varying data sets. Since these data sets often contain a large amount of temporal coherence, fitting multiple time steps simultaneously could result in a significant reduction in the number of basis functions required to represent the data set.
Finally, there are likely many possible uses of the higher-level information provided by the RBF representation that have not yet been explored, perhaps involving the use of high order derivatives that would have been too expensive to compute when using traditional volume rendering techniques.

Data Set    Transfer Function   Data Size   Step Size   Rendering Time
UNC Head    Skin                256³        1/384       0.45 s
UNC Head    Skin/Bones          256³        1/384       0.67 s
UNC Head    Bones               256³        1/384       0.21 s
VHF Torso   Skin                256³        1/384       0.32 s
VHF Torso   Skin/Bones          256³        1/384       1.09 s
VHF Torso   Bones               256³        1/384       0.30 s
VHF Torso   Skin                512³        1/768       0.54 s
VHF Torso   Skin/Bones          512³        1/768       2.15 s
VHF Torso   Bones               512³        1/768       0.53 s
LLNL R-M    Isosurface          256³        1/384       0.26 s
LLNL R-M    Isosurface          512³        1/768       0.40 s
LLNL R-M    Isosurface          1024³       1/1536      0.72 s

Table 2.4: Rendering results for several different transfer functions and data sets of several different sizes. All renderings were done using four-sample blending (see section 2.4.3). Step Size is the length of a step along the ray given that the data volume is a unit cube. Images of several of these data sets are given in fig. 2.7.

Chapter 3
Parallel Stochastic Surface Measurement

3.1 Introduction

In this chapter we discuss a parallel, stochastic algorithm to compute the surface area of a volumetric model of a molecule. The parallel nature of the algorithm makes it suitable for implementation on the GPU and makes it a good candidate for situations where high throughput is required. The stochastic nature of the algorithm makes it progressive, allowing it to return a rough answer almost immediately and refine the answer over time to the desired level of accuracy.

We next describe the application of this algorithm to protein molecules modelled by Gaussian Radial Basis Functions, but the algorithm is general enough that it could be applied to any data representation that allows the computation of the intersections of a line with the object being measured. An application of some of the techniques used in this chapter to the problem of stem cell classification is given in chapter 5. The material in this chapter has been previously published [3].

Computation of molecular surface area is important in the grand challenge problems of molecular docking and protein folding as it allows one to incorporate the effects of solvent in the potential energy calculations. Recent work on interactive manipulation [41] and visualization of large-scale proteins [42] shows us how interactive visualization offers a powerful front end for computational steering of calculations. In such settings, rapid calculation of protein conformations becomes especially important and fast solvent-solute interactions are an essential first step. In this chapter we address the mapping of molecular surface area calculations on the emerging multi-core architectures for potential use in interleaved computation and visualization of large bio-macromolecular complexes.

To serve this need for molecular surface area computation, a wide variety of algorithms and programs have been developed; a few examples are the early works by Connolly [43] [44], MSMS [45] by Sanner et al., GETAREA [46] by Fraczkiewicz and Braun, LSMS [47] by Can et al., 3V [48] by Voss, and an adaptive grid-based algorithm [49] included in TexMol [50] by Bajaj et al. These algorithms have been designed to work well on traditional, single-processor computer architectures using a serial programming model.
However, computer architectures are now facing the first major disruptive challenge in over two decades in the form of pervasive parallelism. For example, AMD and Intel have already changed their product lines to include dual-core and quad-core processors. According to the Intel road map, they plan to have hundreds of cores on a single chip becoming a reality over the next decade. The Cell processor has 8 stream processing cores in addition to a conventional scalar processor. GPUs have been at the forefront of the multi-core revolution in that they are already shipping with hundreds of cores. NVIDIA's G80 has 128 processor cores. Intel has recently disclosed their plans for a GPU consisting of 24-32 cores, each involving a 16-wide SIMD vector processor, with over 2 TFLOPs of performance. In addition, GPUs and CPUs are being merged, thereby blurring the distinction between cores that specialize for graphics and cores that are more general-purpose. Both AMD and Intel are working on fused CPU-GPU cores; this will enable tight coupling between applications and graphics.

Because of the large computer games market, these highly-parallel GPUs are being mass produced and are available for commodity prices. While the use of this hardware for scientific computation originally required some unpleasant hacks, recent development environments such as NVIDIA's CUDA (Compute Unified Device Architecture) [51] and ATI's CTM (Close To Metal) [52] make the use of this hardware much more elegant. Bringing GPU computing further into the mainstream is NVIDIA's Tesla product line, a GPU designed specifically for general-purpose computation.

Unfortunately, algorithms and programs designed for a single-processor architecture are often not able to directly take advantage of these new parallel processors. Algorithms designed for serial computation can sometimes be parallelized, but this can be a non-trivial task.

To take advantage of this new trend in high-performance computer architecture, we present a parallel algorithm, implemented for both CPU and GPU, to efficiently compute molecular surface area. In addition to its parallel nature, the algorithm is also progressive, providing a rough estimate of surface area very quickly and refining the estimate over time until the desired accuracy is reached. Finally, the algorithm generates points on the molecular surface, which can be used to create point-based renderings of the molecule.

3.2 Related Work

3.2.1 Molecular Surface Area Computation

Molecular surface areas have been computed through several different methods. The program MSMS [45] by Sanner constructs the Solvent Accessible Surface (SAS) [53] and Solvent Excluded Surface (SES) [54] by considering the intersections of spheres representing Van der Waals radii of atoms of the molecule, and using this information to compute a set of patches which make up the surface. The reduced surfaces it computes correspond to alpha shapes [55]. The program GETAREA [46] by Fraczkiewicz and Braun also calculates surface area by computing surface patches based on sphere intersections, making use of some additional ideas from computational geometry. A different type of approach was used by Wodak and Janin [56], who give a fast method to estimate molecular surface area using only distances between pairs of atoms.
Additionally, any program that computes triangulations of molecular surfaces, such as SURF [57] by Varshney et al., can be easily converted to give an estimate of molecular surface area by adding up the areas of all the generated triangles. SURF is designed to take advantage of data-parallelism at the granularity of individual atoms, but cannot scale to take advantage of an unlimited degree of parallelism as our algorithm can. The SURF algorithm is also restricted to molecules defined as a collection of discrete atoms, while our algorithm can be applied to molecular surfaces defined in virtually any manner.

More recently, the program LSMS [47] by Can et al. discretizes atomic Van der Waals spheres onto a regular grid, and then uses the level-set method to propagate fronts to compute the SAS and SES; it can also compute the Solvent Excluded Volume (SEV). The program 3V [48] by Voss also discretizes the molecule onto a regular grid and computes area and volume, but does not use the efficient level-set algorithm of LSMS.

Another recent algorithm by Bajaj and Siddavanahalli [49] can compute several different molecular surfaces. Their work models atoms using signed distance fields, which are similar to the radial basis functions used in our work. However, our algorithms are very different: Bajaj and Siddavanahalli's algorithm builds up molecular surfaces incrementally on a grid by adding atoms one at a time, while our algorithm measures surface area using parallel, stochastic sampling.

3.2.2 General Purpose GPU Computing

Although Graphics Processing Units (GPUs) were originally specialized hardware suitable only for 3D graphics computations, modern GPUs have evolved into general-purpose high-performance parallel processors. NVIDIA's G80 product line, for example, features 128 programmable processor cores and advertises a maximum performance of 300 gigaflops. These processors are programmed in an SPMD (Single Program, Multiple Datastream) fashion; all processors execute the same program, but are allowed to take different branches at conditional statements at the cost of a performance penalty. The high peak performance of GPUs relative to CPUs is largely due to the fact that GPUs devote a larger proportion of their transistors to arithmetic computation instead of tasks such as memory caching. Because of this architecture, GPUs perform best with algorithms that do a large amount of computation relative to their number of memory accesses; this type of algorithm is referred to as having high arithmetic intensity.

Modern GPUs have large amounts of on-card memory; first generation Tesla cards, for example, will have 1.5 GB of RAM. Historically, each processor on a GPU was only able to write its output to a single location in memory, corresponding to the pixel whose value that processor was computing. Modern GPUs have overcome this limitation and allow full read and write access to any location in memory from any processor. Additionally, the processors have access to a small pool of very fast shared memory which is suitable for communication between processors within the inner loop of an algorithm.

In the past, writing a general-purpose program for a GPU meant casting the algorithm in terms of graphics operations, such as texture look-ups and RGB color vector manipulations. With the recent advent of development environments such as NVIDIA's CUDA and ATI's CTM, however, general-purpose algorithms can be written in much more natural terms.
CUDA, for example, is basically equivalent to the C language with a few extensions to facilitate the launching of parallel computation kernels.

Even though GPU algorithms can now be written in development environments similar to those used for CPU algorithms, developing an algorithm for a highly parallel architecture such as a GPU requires a different approach than developing for current CPUs. For an algorithm to run efficiently on a GPU, it must be divided into a large number (at least on the order of hundreds) of independent tasks which can be executed simultaneously. We note that our algorithm is ideal for this, since it is based on taking a large number of independent random samples. Care must also be taken to reduce main memory access as much as possible, and to take advantage of the available fast shared memory.

3.3 Gaussian Molecular Modelling

We calculate the surface area of a protein which is represented as the level set of a sum of Gaussian Radial Basis Functions (RBFs), with one RBF being placed at the location of each atom's center. This implicit molecular surface representation has been used as far back as Blinn's 1982 work [58], as well as in many more recent works such as Grant and Pickup [59], Ritchie [60], and Bajaj and Siddavanahalli [49]. Symbolically, each Gaussian RBF \phi_i(x) can be represented as

    \phi_i(x) = w_i \, e^{-\| x - \mu_i \|^2 / \sigma_i^2}    (3.1)

where w_i is the weight of the ith RBF, \mu_i is the location of the ith RBF's center, and \sigma_i controls the width of the ith RBF.

The program reads the same XYZR file format used by MSMS [45], which can be generated from PDB [61] files by the pdb_to_xyzr utility that comes with MSMS. Note that multiple atoms may be combined into a single entry in the PDB file (merging with hydrogens, for example), in which case the number of RBFs will be different than the number of atoms in the protein. For the results reported in this work, we set the \mu_i to the RBF centers in the XYZR file. RBF weights and widths are set based on the constants given in Ritchie [60] and Grant and Pickup [59], which are designed to model the Van der Waals surface of a molecule. Specifically, we set w_i = 2.70 for all RBFs and \sigma_i = r_i / \sqrt{2.3442}, where r_i is the RBF radius from the XYZR file. We form the overall scalar field by summing together all RBFs, and treat the surface as being at an isovalue of 0.259.

To accelerate sampling of the scalar field, we insert the RBFs into a bucketing spatial data structure. We partition space into a regular grid, and store a pointer to a list of the RBFs that overlap each grid cell at the corresponding element of a three-dimensional array. The Gaussian RBFs are truncated to zero at a radius of 3\sigma.
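A minimal sketch of this setup (file handling and structure names are illustrative, not the dissertation's code):

    #include <cmath>
    #include <cstdio>
    #include <vector>

    struct Rbf { float x, y, z, w, sigma; };

    // Load one Gaussian RBF per XYZR entry, using the weight and width
    // constants given above (w = 2.70, sigma = r / sqrt(2.3442)).
    std::vector<Rbf> loadXyzr(const char* path) {
        std::vector<Rbf> rbfs;
        FILE* f = std::fopen(path, "r");
        float x, y, z, r;
        while (f && std::fscanf(f, "%f %f %f %f", &x, &y, &z, &r) == 4)
            rbfs.push_back({x, y, z, 2.70f, r / std::sqrt(2.3442f)});
        if (f) std::fclose(f);
        return rbfs;
    }

    const float kIsovalue = 0.259f;   // level set of the summed field

Each RBF would then be inserted into every bucket of the regular grid that its truncated 3\sigma support overlaps, so that a field evaluation at a point only needs to visit the RBFs stored in that point's bucket.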
3.4 Stochastic Area Measurement

To measure molecular surface area we make use of the Cauchy-Crofton formula (equation 3.2) from integral geometry, which relates the area of a surface to the number of intersections with the surface of a set of lines. This formula can be written as

    \int m \, dL = c \, s    (3.2)

where s is the surface area, m represents the number of intersections along a given line, c is a constant of proportionality, and the integration is taken over the space of all possible lines.

A numeric approximation to this integral can be made by taking a random sample of lines and counting their intersections with the surface. Approximating the integral in this manner gives n_1 / N \approx c \, s_1, where n_1 is the count of intersections, N is the number of sampled lines, s_1 is the surface area, and c is an unknown constant of proportionality. To get rid of c, we can intersect the same set of lines with a second surface, giving n_2 / N \approx c \, s_2. Combining these equations gives

    s_1 \approx (n_1 / n_2) \, s_2    (3.3)

If the area of the second surface s_2 is known, we can then calculate the molecular surface area s_1. This derivation is given in more detail in Li et al. [62], and further applications are discussed in Liu et al. [63].

3.4.1 Sampling the Space of Lines

Several methods for generating lines from randomly chosen parameters are given in Li et al. [62]. We use a method called the Chord Model, which consists of picking two random points from a uniform distribution of points on the surface of a sphere and then taking the line that passes through them. Uniformly distributed points (x, y, z) on a sphere can be generated from pairs (u, \theta) of uniformly distributed random numbers by using the formula

    (x, y, z) = ((1 - u^2)^{1/2} \cos\theta, \; (1 - u^2)^{1/2} \sin\theta, \; u)    (3.4)

where u is in [-1, 1] and \theta is in [0, 2\pi). Further discussion of generating uniformly distributed points on spheres is given on the Mathworld web site [64].

To generate random lines with this method, we must generate uniformly distributed random numbers. This would typically be done using a pseudo-random sequence; however, better results can be obtained by using a quasi-random sequence (also called a low-discrepancy sequence). These sequences have less clustering of values than pseudo-random sequences, which results in a more representative sampling of lines and actually provides an asymptotically lower error bound for the numeric integration [62]. In our implementation we used the Niederreiter quasi-random sequence [1], which can be found in the GNU Scientific Library [65]. We generate 4D quasi-random points (a, b, c, d), and use the first and second coordinate pairs (a, b) and (c, d) to generate the (u_1, \theta_1) and (u_2, \theta_2) for equation 3.4.

Comparisons between 2D points generated from a pseudo-random, a quasi-random, and a regular grid distribution are given in figure 3.1. Note that the pseudo-random distribution has more clusters and bare regions than the quasi-random distribution. The regular grid distribution also avoids clustering, but is so regular that its use could cause aliasing artifacts. Further analysis of sampling points using quasi-random distributions can be found in Rovira et al. [66].

Figure 3.1: 2D distributions of points generated on a regular grid (left), from the Niederreiter quasi-random sequence [1] (middle), and from a pseudo-random sequence (right).
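A sketch of this chord generation using the GSL routines cited above (the function and variable names are ours, not from the dissertation's implementation):

    #include <gsl/gsl_qrng.h>
    #include <math.h>

    /* One 4D Niederreiter point yields two points on the unit sphere via
       equation 3.4; the line through them is one sample chord. */
    void makeChord(gsl_qrng* q, double p0[3], double p1[3]) {
        double v[4];
        gsl_qrng_get(q, v);                         /* each v[i] is in [0,1) */
        double* pts[2] = { p0, p1 };
        for (int k = 0; k < 2; ++k) {
            double u     = 2.0 * v[2*k] - 1.0;      /* map to [-1,1]   */
            double theta = 2.0 * M_PI * v[2*k + 1]; /* map to [0,2*pi) */
            double s     = sqrt(1.0 - u * u);
            pts[k][0] = s * cos(theta);
            pts[k][1] = s * sin(theta);
            pts[k][2] = u;
        }
    }

    /* Setup: gsl_qrng* q = gsl_qrng_alloc(gsl_qrng_niederreiter_2, 4); */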
3.4.2 Intersection Counting

For our line intersection algorithm, we start by enclosing the RBFs representing the atoms in the tightest bounding sphere centered at the center of the molecule. One optimization could be to instead use the tightest overall bounding sphere, computed by a method such as in Gärtner [67]. We then generate a sequence of quasi-random lines using pairs of points on the surface of the bounding sphere as described above. For each line, we step in uniform increments from one point to the other, evaluating the scalar field at each step to determine whether the current point lies in the interior or exterior of the surface. The optimal step size is a function of the typical atomic radii and packing densities. In this work we have used a step size of 0.25 Å, which we have experimentally determined to be a reasonable value.

To evaluate the scalar field at a point, we iterate over all RBFs that overlap that point's bucket, adding their values to a running total until either all RBFs have been processed or the current total exceeds the surface's isovalue. If a point is found to be in the interior of the surface and the previous point was in the exterior (or vice versa), a running count of surface intersections is incremented.

Once the number of intersections of the lines with the isosurface has been computed, equation 3.3 can be used to estimate the surface area of the molecule (the area of the bounding sphere can be easily computed analytically, and the number of intersections with the bounding sphere is simply two times the number of lines intersected). The approximation improves as more lines are intersected.

3.5 Parallelization for GPU

Because the sampling along each line is completely independent from all other lines, this algorithm is a natural fit for a highly parallel architecture such as a GPU. In fact, our algorithm is able to linearly scale to take advantage of an unlimited amount of parallelism, since each additional available processor can be assigned to compute the intersections of another random line, increasing the speed at which the result converges.

We have implemented a version of the algorithm in NVIDIA's CUDA language that runs on a GPU, and compared its performance to the CPU version. In our GPU implementation the 4D quasi-random points that define the sample lines are generated on the CPU and then sent to GPU memory. After the per-line intersection counts are computed in parallel on the GPU, this data is sent back to CPU memory where the per-line counts are aggregated into an overall total. This process could potentially be optimized by computing the quasi-random points and performing the summation of the per-line counts on the GPU, which would not only take further advantage of the GPU's parallel processing capabilities but also avoid time-consuming data transfers to and from the GPU.
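A sketch of such a per-line CUDA kernel, under stated assumptions: evalField stands in for the bucketed RBF summation of section 3.4.2, and all names are illustrative rather than the dissertation's actual code:

    // Each thread steps along its own chord from a[i] to b[i] and counts
    // crossings of the isosurface.
    __device__ float evalField(float3 p);   // bucketed RBF sum (assumed)

    __global__ void countIntersections(const float3* a, const float3* b,
                                       int numLines, float stepLen,
                                       float isovalue, int* counts) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= numLines) return;
        float3 d = make_float3(b[i].x - a[i].x, b[i].y - a[i].y, b[i].z - a[i].z);
        float len = sqrtf(d.x * d.x + d.y * d.y + d.z * d.z);
        int steps = max(1, (int)(len / stepLen));
        bool wasInside = false;   // both endpoints lie on the bounding sphere
        int crossings = 0;
        for (int s = 0; s <= steps; ++s) {
            float t = (float)s / (float)steps;
            float3 p = make_float3(a[i].x + t * d.x, a[i].y + t * d.y,
                                   a[i].z + t * d.z);
            bool inside = evalField(p) > isovalue;
            if (inside != wasInside) ++crossings;   // surface crossing
            wasInside = inside;
        }
        counts[i] = crossings;
    }

    // Host side, after summing counts into n1 over N lines (equation 3.3,
    // with n2 = 2N and s2 = 4*pi*R*R for the bounding sphere of radius R):
    //   double area = ((double)n1 / (2.0 * N)) * 4.0 * M_PI * R * R;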
The number of sample lines could be set higher or lower based on the speed and accuracy requirements of a particular application. To evaluate the accuracy of our surface area computations, we would like to have some ground truth to compare our results against. Both MSMS and SURF compute the SES analytically, and their reported surface areas usually agree very closely. Therefore, we take the true SES area to be the average of the SES areas reported by MSMS and SURF, and measure algorithm accuracy as a percent dif- ference from this average value. The di erences between the area we report and 49 Figure 3.2: Surface area approximation errors for several proteins using various numbers of intersected lines. Vertical axis is the percent error from the nal value (the area returned with 106 sample lines). Note that all proteins have converged to near their nal area by 20,000 lines. 50 this average area come from two main sources| our algorithm not having yet fully converged on the area of our surface, and the fact that the surface whose area we are converging on is not quite the same as the SES surface that we are comparing ourselves against. As can be seen from table 3.6, our di erences are comparable to the di erences of LSMS when using a ne 2563 grid, while our GPU running time is signi cantly faster than LSMS using a 2563 grid, and is often even faster than LSMS using a coarse 1283 grid. We observe that our running time depends mostly on the molecule size, while for LSMS the running time depends mostly on the grid resolution. Our GPU implementation is also faster than MSMS and SURF, especially for larger molecules. A graph of running times of the various algorithms is given in gure 3.3. Note that for similar error levels our GPU algorithm is 3x { 10x faster than existing CPU algorithms. 3.7 Discussion and Future Work Because our algorithm generates points on the molecular surface, it can easily be used to create point-based renderings [68] of the molecule. Surface normals for lighting calculations can also be easily generated by analytically computing the gradient of the implicit function at each surface point. Because of their light weight and simplicity, points are a good primitive for the representation of large models. Some point-based molecular renderings generated from our implementation using di erent numbers of points are shown in gure 3.4. 51 Figure 3.3: Surface area computation times and percent di erences for proteins of di erent sizes using several di erent algorithms. Data is from table 3.6. Protein size is given in RBFs, which is equal to the number of atoms listed in the PDB le. For MSMS and SURF, data is not available for the larger proteins since these programs were unable to compute the area for proteins of that size. Percent di erences are given as absolute magnitudes. One nice feature of our algorithm that we have not explored is its progressive nature. As the algorithm runs, a rough approximation of the surface area is returned almost immediately, while increasingly accurate approximations are obtained as more and more line intersections are computed. 
This feature could be used to tune the speed versus accuracy of the algorithm for different applications, or to provide a rough estimate to decide whether or not more exact calculations are worth performing.

Protein          1GCQ      2PTN      1PPE      8TLN      2CHA      1HIA      1N2C      1PMA     1FFK     1HTO
Size             1,678     1,712     1,991     2,621     3,542     4,584     24,237    56,392   64,268   90,672

Our method
  Area (Å²)      8,806.06  8,258.64  8,805.47  11,256.8  16,761.7  21,157.2  79,010.5  192,382  461,631  335,624
  Area Diff      0.95 %    3.39 %    0.48 %    1.12 %    9.78 %    3.51 %    17.92 %   —        —        —
  CPU Time (s)   1.63      3.13      2.53      2.23      2.52      2.90      2.58      5.11     7.56     5.42
  GPU Time (s)   0.24      0.48      0.38      0.33      0.38      0.52      0.60      2.22     3.49     1.71

LSMS 128³
  Area (Å²)      8,225.56  8,437.62  9,155.30  10,926.5  16,919.5  19,748.0  74,059.3  184,464  422,295  301,674
  Area Diff      5.71 %    5.63 %    4.47 %    4.02 %    8.93 %    9.94 %    23.06 %   —        —        —
  Time (s)       0.76      0.93      0.98      0.87      0.85      0.88      0.75      0.89     1.00     1.11

LSMS 256³
  Area (Å²)      8,272.37  8,466.53  9,148.43  11,202.1  18,171.1  21,078.2  84,250.1  203,324  472,889  352,981
  Area Diff      5.17 %    5.99 %    4.39 %    1.60 %    2.20 %    3.87 %    12.48 %   —        —        —
  Time (s)       6.66      8.01      8.20      6.98      6.94      6.90      5.72      6.87     6.38     6.61

MSMS
  Area (Å²)      8,724.65  8,039.77  8,807.21  11,364.4  18,538.6  21,944.8  97,129.4  —        —        —
  Time (s)       0.83      0.81      0.99      1.36      1.65      2.40      14.51     —        —        —

SURF
  Area (Å²)      8,722.10  7,935.88  8,719.81  11,404.0  18,619.9  21,909.9  95,388.7  —        —        —
  Time (s)       0.66      0.64      0.80      1.03      1.45      1.83      10.55     —        —        —

Table 3.1: Comparison of the surface areas and running times of our method, LSMS with a 128³ grid, LSMS with a 256³ grid, MSMS, and SURF. Protein size is given in RBFs, which is equal to the number of atoms listed in the PDB file. For our method, we used 20,000 sample lines. All methods computed the SES area (or an approximation to it) of all disconnected surface components using a probe sphere radius of 1.4 Å. Tests were performed on a machine with a GeForce 8800 GTX GPU, an Intel Xeon 3.0 GHz CPU, and 4 GB of RAM. A '—' means that MSMS or SURF was unable to compute a surface for this molecule.

One possible area of future work might be to extend the program to compute other geometric properties of molecules, such as volume or mean curvature, as discussed in Schröder [69]. Molecular volume computation in particular would likely be an easy and useful extension.

Figure 3.4: Point-based renderings of the protein with Protein Data Bank ID 1HTO using 10⁴ points (left), 10⁵ points (middle), and 10⁶ points (right).

One final issue worth mentioning is the treatment of hollow cavities within the interior of a molecule. Depending on the application, it may or may not be desirable to include interior cavity surface area in the overall surface area reported. A discussion of these interior cavities can be found in Liang et al. [70]. Our algorithm includes the surface area of these cavities in the final figures reported, as does SURF. MSMS and LSMS give an option to either include these areas or not. All surface areas and running times reported in this work are for the outer surface plus all cavities. If only the outer surface area is required, MSMS can compute this several times faster than it can compute the area of the outer surface plus all cavities.

Chapter 4
Confluent Visualization

In this chapter we discuss the visualization of the interaction between a pair of data sets. For example, the phenomenon of stem cell differentiation being induced by surface geometry involves a data set of the cell itself and a data set of the surface on which it is resting.
Biologists may wish to visualize the area of contact between the cell and the surface to learn about how the surface induces differentiation. The idea of choosing an area of a data set to visualize based on another data set is called Confluent Visualization.

In the following material in this chapter we discuss Confluent Visualization in the context of gyrokinetic plasma turbulence simulations. However, the idea should be applicable to any pair of volumetric data sets whose interaction needs to be visualized. The material in this chapter was previously published [4].

Gyrokinetic simulations of tokamak turbulence are widely used to interpret experimental data. The level of realism in modern gyrokinetic codes is very high; so high, in fact, that it is nearly as difficult to understand and interpret the simulation results as it is to work directly with experimental data. One of the main problems is that the datasets, such as those shown here, are typically of dimension greater than three. Also, many gyrokinetic simulations take place in flux-tube-following coordinates whose geometry is fairly non-trivial and whose size with respect to the enveloping tokamak device is small. Moreover, the very nature of turbulence implies the existence of structures at various scales, which turns coherent graphical representation of data into a challenging problem.

A gyrokinetic simulation describes the time evolution of the five-dimensional probability distribution function(s) for a few plasma species (typically 1-5). Three of these dimensions are spatial and the other two correspond to velocity space variables. Visualizing such a dataset directly is clearly impossible. A traditional diagnostic quantity is the electric potential, whose fluctuations capture the patterns of turbulence present in the system.

One of the most compelling techniques for visualizing three dimensional scalar fields is direct volume rendering. It gives a global qualitative view of the entire dataset, providing a fast insight into spatial patterns and correlations. However, direct volume rendering works best for scalar fields with a high degree of regularity, such that occlusion and cluttering can be eliminated by filtering easily identifiable value ranges. This is not the case with the electric potential, or other turbulent scalar fields that arise in this context, which exhibit fluctuations on a variety of scales.

To deal with this problem we focus on derived diagnostic quantities, which are both physically meaningful and suitable for direct volume rendering. One such diagnostic is the heat flux, which is calculated as a velocity space integral:

    Q = \int f E \, \frac{\partial \langle\phi\rangle}{\partial y} \, dv

where f is the gyrokinetic probability distribution function, E is energy, and \langle\phi\rangle denotes the gyroaveraged electric potential.
There are 768 velocity-space grid points at every spatial grid, for a total meshpoint count of about half a billion. As is standard in the Cyclone benchmark, there is one gyrokinetic ion species, and the electrons are taken to have a Boltzmann response. The code used to generate this data is the GS2 code [72], [73]. Figure 4.1: Computational Domain: Visible Structures Indicate Zones of High Heat Flux. Color Represents Values of the Electric Potential. 57 Figure 4.1 shows "con uent" volumetric rendering (CVR) of the heat ux and the electric potential in the computational domain. Heat ux values are used in the opacity transfer function, in e ect de ning a set of visible spatial structures inside the volume. Electric potential values are used in the color transfer function, which controls how color is applied to visible pixels. In our setup we are looking at a collection of island regions where heat ux is higher than a certain cuto value. Figure 4.2 illustrates the geometry of the ux tube domain and features a blown-up region that corresponds to a small ball around the center of the computational do- main. Note that for aesthetics reasons the partial toroidal shell shown as a reference here is taken from a di erent tokamak con guration. Figure 4.2: Flux Tube Domain in a Toroidal Device. The Circular Region is an Enlargement of an Area of the Ribbon Shaped Domain to Show Detail. 58 Shown here are data derived from the late linear phase of the simulation, when the linearly unstable modes are at high amplitude and are strongly interacting, but before the nonlinearly generated ows and eddies have reached high amplitude. The amount of heat being transported at this instance is quite large (comparable to the steady-state, turbulent value), despite the small spatial lling factor. A conventional diagnostic would show "streamers" at this point in time. We are using the new CVR diagnostic to develop intuition about the relationship between the E B ows (along contours of constant potential) and the radial heat ux. 59 Chapter 5 Parallel Stochastic Classi cation of Stem Cells 5.1 Introduction In this chapter we discuss a parallel, stochastic algorithm that can classify volumetric models of stem cells as having been grown on a surface that induces cell di erentiation or on a surface that does not induce cell di erentiation. This algorithm builds on our algorithm discussed in Chapter 3 for molecular surface area measurement. Like the surface area measurement algorithm, this algorithm is suitable for implementation on a GPU and is progressive, allowing it to return a rough classi cation almost immediately and re ne the accuracy of the classi cation over time. In the following material in this chapter we discuss the application of this algorithm to the classi cation of stem cells, but the algorithm could be applied to the classi cation of any volumetric data sets that can be characterized by the lengths of the intersections of randomly generated lines with the data set. The material in this chapter is under preparation for publication [74]. Tissue engineering has been de ned as an \interdisciplinary eld that ap- plies the principles of engineering and the life sciences toward the development of biological substitutes that restore, maintain, or improve tissue or whole organ func- tion" [75]. Recent advances include many tissues and organs, including heart[76], 60 bladder[77], and urethra[78]. The two critical parts of a tissue-engineered implant are stem cells and 3D tis- sue sca olds [79]. 
Figure 4.2 illustrates the geometry of the flux tube domain and features a blown-up region that corresponds to a small ball around the center of the computational domain. Note that for aesthetic reasons the partial toroidal shell shown as a reference here is taken from a different tokamak configuration.

Figure 4.2: Flux Tube Domain in a Toroidal Device. The Circular Region is an Enlargement of an Area of the Ribbon Shaped Domain to Show Detail.

Shown here are data derived from the late linear phase of the simulation, when the linearly unstable modes are at high amplitude and are strongly interacting, but before the nonlinearly generated flows and eddies have reached high amplitude. The amount of heat being transported at this instance is quite large (comparable to the steady-state, turbulent value), despite the small spatial filling factor. A conventional diagnostic would show "streamers" at this point in time. We are using the new CVR diagnostic to develop intuition about the relationship between the E × B flows (along contours of constant potential) and the radial heat flux.

Chapter 5
Parallel Stochastic Classification of Stem Cells

5.1 Introduction

In this chapter we discuss a parallel, stochastic algorithm that can classify volumetric models of stem cells as having been grown on a surface that induces cell differentiation or on a surface that does not induce cell differentiation. This algorithm builds on our algorithm discussed in Chapter 3 for molecular surface area measurement. Like the surface area measurement algorithm, this algorithm is suitable for implementation on a GPU and is progressive, allowing it to return a rough classification almost immediately and refine the accuracy of the classification over time.

In the following material in this chapter we discuss the application of this algorithm to the classification of stem cells, but the algorithm could be applied to the classification of any volumetric data sets that can be characterized by the lengths of the intersections of randomly generated lines with the data set. The material in this chapter is under preparation for publication [74].

Tissue engineering has been defined as an "interdisciplinary field that applies the principles of engineering and the life sciences toward the development of biological substitutes that restore, maintain, or improve tissue or whole organ function" [75]. Recent advances include many tissues and organs, including heart [76], bladder [77], and urethra [78].

The two critical parts of a tissue-engineered implant are stem cells and 3D tissue scaffolds [79]. Stem cells have the ability to differentiate down multiple lineages for regeneration of different organs and tissues. Scaffolds provide a 3D template for stem cells to adhere and differentiate. Traditionally, control of cell function has been considered in terms of soluble factors, biochemical signaling and paracrine effects. However, recent work shows that the physical properties of the cellular microenvironment can also influence cell function. In particular, the chemistry [80, 81, 82], mechanics [83, 84], and structure [85, 86, 87] of the cell niche are important. Thus, a primary goal in regenerative medicine is to engineer scaffolds to provide a 3D microenvironment that enhances tissue regeneration.

Previously, we demonstrated that 3D PCL (poly(ε-caprolactone)) nanofiber scaffolds drive hBMSCs (human Bone Marrow Stromal Cells) into an elongated and highly-branched morphology that induces them to differentiate down an osteogenic lineage [86]. hBMSCs are isolated from bone marrow and contain adult skeletal stem cells that can differentiate into bone, fat and cartilage [88]. When hBMSCs were cultured on 2D flat PCL films (spun-coat), they assumed a well-spread, polygonal morphology that supported cell proliferation but did not induce osteogenic differentiation. These results indicated that the structure of the scaffold can be designed to drive cells into morphologies that direct their differentiation down a desired lineage. Images of the 2D and 3D surfaces and cells are given in figure 5.1.

Figure 5.1: Left to right: SEM (Scanning Electron Micrograph) of 2D PCL spun-coat film, an hBMSC cultured on a 2D PCL spun-coat film, SEM of 3D PCL nanofiber scaffold, and an hBMSC cultured in a 3D PCL nanofiber scaffold. hBMSC images are reconstructed from 3D confocal fluorescent scans of fluorescently stained actin.

The use of scaffold structure to control stem cell function is attractive because scaffold structure is stable, has a low regulatory burden and is relatively easy and inexpensive to control [89]. Covalent functionalization of scaffold devices with biochemically active molecules is difficult, hard to characterize and expensive. Loading scaffolds with growth factors to guide regeneration is challenging because proteins are hard to manufacture and highly unstable. In addition, including biomolecules or growth factors in a device increases the regulatory costs.

Cell shape and function are known to be intricately linked [90, 91], and recent work has shown that this premise holds true for hBMSCs [86, 87, 92, 93]. Cell shape has the added value of being an early predictor of cell fate. Biochemical assays require weeks or months of culture for differentiation markers to become detectable, whereas cells attain a stable morphology within a day of culture that can be an indicator for their future behavior [93]. Though previous methods for assessing cell shape have focused on 2D cell shape data [94, 93], tissue engineering aims to use 3D scaffolds to control cell shape and function. Culture of cells in complex 3D microenvironments is likely to require 3D cell shape analysis in order to establish meaningful relationships. Rapid throughput is also desirable due to the large number of parameters that must be tested to identify scaffolds that promote the desired biologic response [95, 96].

In addition to the tissue engineering field, rapid 3D methods for measuring cell shape are likely to be needed by the pharmaceutical industry as it moves towards 3D scaffold systems for drug screening [97, 98].
3D in vitro culture models are less expensive than animal models and may be more predictive of human clinical outcomes. Drug screens typically involve thousands of compounds, and cell shape analysis is a parameter frequently used to determine toxicity response. Thus, high-content 3D methods for cell shape classification are required if pharma is to use 3D scaffold technologies to improve the predictive nature of in vitro testing [99, 100]. Herein, we have developed a machine learning algorithm that can rapidly identify differentiating stem cells based on their 3D shape.

The remainder of this chapter is organized as follows. In section 5.2 we give an overview of the algorithm and describe some parameters that need to be set. In section 5.3 we describe how we characterize a cell by intersecting it with random lines. In section 5.4 we describe several ways in which our algorithm can be parallelized for implementation on a GPU (Graphics Processing Unit). In section 5.5 we describe how we use machine learning to classify the cells. We present timing and accuracy results in section 5.6. Finally, we provide suggestions for future work in section 5.7 and give our conclusions in section 5.8.

5.2 Technical Approach

We have cultured hBMSCs (human Bone Marrow Stromal Cells) on two types of substrates: a spun-coat substrate with a flat, 2D surface, and a nanofiber substrate with a fibrous, 3D surface. hBMSCs cultured on the nanofiber substrate underwent osteogenic differentiation and adopted a more spiky, branched appearance, while hBMSCs cultured on the spun-coat substrate did not differentiate and retained a smoother appearance. Our goal is to measure three-dimensional geometric properties of the hBMSCs that can distinguish between hBMSCs cultured on the nanofiber substrate and hBMSCs cultured on the spun-coat substrate. Many methods for classifying three-dimensional objects can be found in the literature. For a survey of these methods, see Cardone et al. [101].

Our general workflow is as follows. First, confocal microscopy was used to collect 3D image data sets of individual cells. Next we use an algorithm to intersect each cell with many lines in 3D and record the lengths of each of the partial line segments that run through the interior of the cell. From this line length data, we generate a line length histogram describing each cell. Finally, we use the histograms to train a machine learning technique which is used to classify new cells as coming from a 3D, fibrous nanofiber scaffold or a 2D, flat spun-coat film. An overview of this pipeline is given in figure 5.2.

Figure 5.2: Overview of our cell data processing pipeline.

The 3D cell images that were analyzed in the current work were collected previously, and this previous work has been described in detail [86]. Briefly, primary human bone marrow stromal cells (hBMSCs) isolated from the iliac crest of healthy donors were cultured for 1 day on PCL (poly(ε-caprolactone)) spun-coat films or PCL nanofiber scaffolds. 2D PCL spun-coat films were made by spin-coating. 3D PCL nanofibers were made by electrospinning. Nanofiber diameter was measured by imaging with scanning electron microscopy (mean nanofiber diameter = 910 nm, S.D. = 526 nm, n = 100). 2D spun-coat films and 3D nanofiber scaffolds were made from the same material, PCL, so that the effects of scaffold chemistry could be discerned from effects of scaffold structure. For 3D imaging, cellular actin was stained with Alexa-Fluor-546-phalloidin and imaged by confocal fluorescence microscopy.
Z-stacks of images of individual hBMSCs on spun-coat films or nanofiber scaffolds were collected using a 1 µm step size.

Previous analysis of hBMSC shape demonstrated that hBMSCs on 3D nanofiber scaffolds have a more branched structure than hBMSCs on the 2D spun-coat films [86]. We want to leverage this difference to distinguish between hBMSCs that were cultured on the different substrates. To accomplish this, we first intersect the cells with a set of randomly generated lines. We measure the lengths of the portions of the lines that passed through the cell and put these lengths into histograms. Our hypothesis was that the cells on the nanofiber substrates would have more short segments since these cells had more long, thin branches. Analysis of the average segment length for the two cell types supports this hypothesis.

A question that arises is how to choose the random lines that are used to intersect with the cells. We want a method of line selection whose results do not depend on microscope orientation. We have tried two different methods of line selection. The first is to simply select pairs of points on a bounding sphere around the cell and generate lines that pass through both of the points. The results of this method should converge to the same result regardless of the cell's orientation. The second method is to select pairs of points on the curved surface of a cylinder rather than the surface of a sphere. This method takes advantage of the particular structure of our data set.

To perform our algorithm we need to classify each voxel in the three-dimensional data set as either being inside or outside the cell. Each voxel contains an integer that represents how much light was received from that location by the microscope. The fluorescently stained cell gives off a large amount of light from locations that are inside the cell, while only some small amount of noise is received from locations outside the cell. We classify the voxels by selecting a threshold value and labeling voxels with values below this threshold as being outside the cell, and voxels with values above this threshold as being inside. If the threshold value is chosen too low then noise in the data set may be labeled as part of the cell, while if the threshold value is chosen too high then significant geometric features of the cell may be eroded. We therefore expect that, as the threshold value is increased, the performance of the algorithm will improve up to a certain point and then begin to worsen.

The selection of the threshold value can be considered part of the training phase of the algorithm, and the value can be chosen based on the training data set. For this work the threshold value was chosen by testing a series of different values and selecting the one that produces the greatest percentage of correct classification. Based on the data presented in table 5.1, we selected a threshold value of 3.0.

Threshold   0.5    1.0    2.0    3.0    4.0    5.0    6.0
% Correct   74.2   75.6   80.5   82.9   73.2   78.0   56.1

Table 5.1: Percent correct classification with various threshold values. The voxels in the data set contained values in the range [0, 15]. Parameters used were 10⁶ lines intersected, minimum gap length of 8.0, linear SVM, ten-fold cross validation.

In addition to choosing a threshold value we clean the data set by identifying connected components and throwing out all components except the largest. This helps remove noise from the data set and also removes portions of other cells that might have been captured in the images.
We fill in any holes (empty regions completely surrounded by cell voxels) we find in the component.

After these steps there can still be some noise in the data set that causes voxels that should be inside the cell to be labeled as empty space. In our ray shooting algorithm we correct this by ignoring gaps in the line segments that are smaller than a certain length, instead treating them as part of one continuous segment. If this minimum gap length is set too low then it will allow gaps that are actually noise or artifacts, while if it is set too high then it will exclude gaps that are actually part of the cell geometry. We therefore expect that, as the minimum gap length is increased, the performance of the algorithm will improve up to a certain point and then begin to worsen. As with the previously mentioned threshold value, the selection of the minimum gap length can be considered part of the training phase of the algorithm and can be performed based on the training data set. For this work we chose our value for the minimum gap length by once again testing a series of values and selecting the one that produced the greatest percentage of correct classification. Based on the data presented in table 5.2, we selected a minimum gap length of 8.0.

Min Gap Length (µm)   5.0    6.0    7.0    8.0    9.0    10.0   11.0
% Correct             78.0   80.5   80.5   82.9   80.5   80.5   80.5

Table 5.2: Percent correct classification with various minimum gap lengths. Parameters used were 10⁶ lines intersected, threshold value of 3.0, linear SVM, ten-fold cross validation.

5.3 Geometric Characterization of the Cell

As described above, our analysis of the cells involves generating a set of random lines that intersect the cell. The lines were generated using several different methods. In the first method we picked pairs of points on the surface of a bounding sphere and generated a line that intersects these points. This is the method used by Juba and Varshney [3] and is described by Li et al. [62] as the Chord Model. Uniformly distributed points (x, y, z) on a sphere can be generated from pairs (u, \theta) of uniformly distributed random numbers by using the formula

    (x, y, z) = ((1 - u^2)^{1/2} \cos\theta, \; (1 - u^2)^{1/2} \sin\theta, \; u)    (5.1)

where u is in [-1, 1] and \theta is in [0, 2\pi). A slightly more computationally efficient formula is given by Rovira et al. [66], which generates the points (x, y, z) from pairs (\xi_1, \xi_2) of uniformly distributed random numbers:

    \cos\theta = 1 - 2\xi_1
    \sin\theta = \sqrt{1 - \cos^2\theta}
    \varphi = 2\pi\xi_2
    (x, y, z) = (\sin\theta \sin\varphi, \; \cos\theta, \; \sin\theta \cos\varphi)    (5.2)

where \xi_1 and \xi_2 are in [0, 1).

In the second method we picked pairs of points on the curved surface of a cylinder and generated a line that intersects these points. The cylinder is oriented such that the central axis is perpendicular to the plane of the 2D microscope images that were stacked together to form the 3D volume. Each of the two points that define the line is defined by an angle \theta around the circumference of the cylinder and a height z along the central axis. This is illustrated in figure 5.3. The formula for points (x, y, z) on the curved surface of a cylinder is

    (x, y, z) = (\cos\theta, \; \sin\theta, \; z)    (5.3)

where z is in [-1, 1] and \theta is in [0, 2\pi).

Figure 5.3: Top-down view of lines defined by points on a cylinder enclosing a cell. The lengths of the intersections of these lines with the cell are stored in a histogram.
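A sketch of chord generation under this cylinder model (our illustrative code, with both endpoints sharing the same height z, as noted below):

    #include <cmath>

    struct Vec3 { double x, y, z; };

    // xi1, xi2, xi3 are uniformly distributed (quasi-)random numbers in [0,1).
    // Each endpoint gets its own angle around the cylinder (equation 5.3);
    // both share the height z, so every chord lies in a constant-z plane.
    void cylinderChord(double xi1, double xi2, double xi3, Vec3* p0, Vec3* p1) {
        double theta0 = 2.0 * M_PI * xi1;
        double theta1 = 2.0 * M_PI * xi2;
        double z      = 2.0 * xi3 - 1.0;    // map [0,1) to [-1,1)
        *p0 = { std::cos(theta0), std::sin(theta0), z };
        *p1 = { std::cos(theta1), std::sin(theta1), z };
    }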
Both of these line generation methods require a set of uniformly distributed random numbers. Typically a pseudo-random sequence of random numbers is used for this purpose. However, it has been shown that a so-called quasi-random sequence (also called a low-discrepancy sequence) has better properties, including a lower error bound in numeric integration [62]. For this work we use the Niederreiter quasi-random sequence [1], which can be found in the GNU Scientific Library [65]. For equation 5.1, we generate quasi-random points (a, b, c, d) in four dimensions and use the first and second coordinate pairs (a, b) and (c, d) to generate the (u_1, \theta_1) and (u_2, \theta_2). For equation 5.3, we generate quasi-random points (a, b, c) in three dimensions and use the three coordinates for \theta_1, \theta_2, and z (in our implementation both points of the line are at the same altitude z).

The sphere method of generating lines is good for general data sets in which no direction should be treated differently than any other. In the data we are working with, however, most of the cell structure variation is in the xy plane of the microscope images, with only minor changes along the z axis along which the images were stacked. We therefore chose to generate lines using the cylinder method, with the central axis of the cylinder aligned with the z axis of the image stack, when computing the results reported in this chapter. We have also found experimentally that we get better cell classification results with the cylinder method (up to 82.9% correct) than with the sphere method (up to 75.6% correct).

Once a line is generated the next step is to compute its intersections with the cell. We do this by stepping along the line at uniform intervals and at each point checking if that point is inside or outside the cell. Initially, the line starts outside the cell. If two adjacent points are inside and outside the cell respectively (or vice versa), we know that we have either entered or left the cell. Whenever we leave the cell we compute the length of the line segment that was inside the cell and store this in a histogram.
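The following sketch shows this measurement loop together with the minimum-gap rule from section 5.2. It is a CPU version with illustrative names; insideAtStep holds the thresholded in/out classification at each step along one line:

    #include <vector>

    // Returns the lengths of the line segments that lie inside the cell,
    // merging interior gaps shorter than minGap (treated as noise).
    std::vector<float> measureSegments(const std::vector<bool>& insideAtStep,
                                       float stepLen, float minGap) {
        std::vector<float> lengths;
        int enter = -1;   // step at which the current segment began
        int exit  = -1;   // step at which we tentatively left the cell
        for (int s = 0; s < (int)insideAtStep.size(); ++s) {
            if (insideAtStep[s]) {
                if (enter < 0) {
                    enter = s;                            // a new segment begins
                } else if (exit >= 0) {
                    if ((s - exit) * stepLen < minGap) {
                        exit = -1;                        // small gap: merge over it
                    } else {                              // real gap: close the segment
                        lengths.push_back((exit - enter) * stepLen);
                        enter = s; exit = -1;
                    }
                }
            } else if (enter >= 0 && exit < 0) {
                exit = s;                                 // tentatively left the cell
            }
        }
        if (enter >= 0) {                                 // close any open segment
            int end = (exit >= 0) ? exit : (int)insideAtStep.size();
            lengths.push_back((end - enter) * stepLen);
        }
        return lengths;
    }

Each returned length would then be binned into the cell's histogram.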
5.4 High-Throughput Processing

A GPU (Graphics Processing Unit) is a SIMT (Single Instruction, Multiple Thread) processor capable of executing many identical instruction threads in parallel on different sets of input data. It has a large number of stream processors, each with some local memory and registers. These are grouped into several multiprocessors, each containing a small amount of fast memory that is shared within the multiprocessor. It also has a large amount of slower memory that is accessible by all thread processors. A diagram of the processors and memory is given in figure 5.4.

Figure 5.4: GPU memory and multiprocessor layout.

The SIMT architecture is similar to the traditional SIMD (Single Instruction, Multiple Data) except that in SIMT, threads can take diverging branches. In the NVIDIA CUDA library [51] which we used to implement our algorithm, threads are divided into groups of 32 called warps. Threads within a warp cannot execute different instructions at the same time. If threads in a warp go down diverging branches, the GPU will first disable all the threads going down the second branch and execute the instructions of the first branch, then vice versa. Greater efficiency can therefore be achieved by ensuring that as often as possible all threads in a warp follow the same branches. Threads in different warps are free to go down different branches without penalty.

If an algorithm can be mapped to this architecture, the GPU can generally execute the algorithm much faster than it could be executed on a CPU. This is because the SIMT nature of the GPU allows more of its transistors to be dedicated to arithmetic operations rather than tasks such as caching and flow control.

We mapped our line intersection algorithm to the GPU by using one thread to compute the intersections of each line with the cell. A straightforward implementation of this algorithm involves each thread writing the lengths of the intersecting segments into a shared histogram (see figure 5.5). However, this requires synchronization of the threads or the use of slow atomic operations.

Figure 5.5: Use of CPU and GPU to generate segment length histogram from cell volume using the atomic operations algorithm.

One way around this requirement is to give each thread its own 256-bin histogram to store its results to, and then merge the histograms at the end using a parallel reduction operation (see figure 5.6). This approach is wasteful, however, since each thread would likely write only a few segment lengths into each histogram.

Figure 5.6: Use of CPU and GPU to generate segment length histogram from cell volume using the parallel reduction algorithm.

Instead, each thread writes its intersected segment lengths into a fixed size list, and simply ignores any intersected segments that occur after the list is full. We found that a list of length 10 is sufficient to produce results that are almost identical to the CPU version of the algorithm. Once the lists are constructed they are read back to the CPU and the lengths are counted to generate the histogram (see figure 5.7).

Figure 5.7: Use of CPU and GPU to generate segment length histogram from cell volume using the segment length list algorithm.

A comparison of performance data for these three algorithms is given in section 5.6.
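A sketch of the list-based kernel described above (our illustrative code; insideCell stands for the thresholded volume lookup, and the gap-merging logic is omitted for brevity):

    #define MAX_SEGS 10   // fixed list length, sufficient in our tests

    __device__ bool insideCell(const unsigned char* vol, int3 dim, float3 p);

    // One thread per line: record up to MAX_SEGS segment lengths into this
    // thread's slot of the global list array (pre-filled with -1 by the host).
    __global__ void traceLines(const unsigned char* vol, int3 dim,
                               const float3* a, const float3* b, int numLines,
                               int steps, float stepLen, float* segLists) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= numLines) return;
        float* out = segLists + i * MAX_SEGS;
        int nSegs = 0, enterStep = 0;
        bool wasIn = false;
        for (int s = 0; s <= steps; ++s) {
            float t = (float)s / (float)steps;
            float3 p = make_float3(a[i].x + t * (b[i].x - a[i].x),
                                   a[i].y + t * (b[i].y - a[i].y),
                                   a[i].z + t * (b[i].z - a[i].z));
            bool in = insideCell(vol, dim, p);
            if (in && !wasIn) enterStep = s;                   // entered the cell
            if (!in && wasIn && nSegs < MAX_SEGS)              // left the cell
                out[nSegs++] = (s - enterStep) * stepLen;      // record the length
            wasIn = in;
        }
    }

The CPU then scans the lists, skips unused slots, and bins the remaining lengths into the histogram.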
5.5 Classification Through Machine Learning

The segment length measurements produced histograms for the nanofiber and spun-coat cells. To show that these measurements could distinguish between the two types of cells, we applied several machine learning algorithms from the Weka machine learning toolkit [102]: Naive Bayes, Support Vector Machine, k-Nearest Neighbor, and Decision Tree. For each algorithm we used ten-fold cross validation: we divided the data into ten groups, used nine groups to train the algorithm, and then classified the tenth. This was repeated using each of the ten groups as the classification group, and the results of the ten classifications were then combined to give classification results for the whole data set. We obtained the best classification results using Support Vector Machines (SVMs); once the decision was made to use SVMs, we switched from Weka to LIBSVM [103].

The training phase of the SVM treats each histogram as a point in a high-dimensional space and computes the hyperplane that best separates the points corresponding to differentiated cells from the points corresponding to non-differentiated cells. The orientation of the hyperplane is determined only by those points close to it, which are referred to as the support vectors. Once the algorithm is trained, additional histograms can be classified as differentiated or non-differentiated by testing on which side of the hyperplane the corresponding point lies. An illustration of a trained SVM is given in figure 5.8. Additional information on Support Vector Machines can be found in Cortes and Vapnik [104].

Figure 5.8: Trained Support Vector Machine (SVM). The hyperplane that separates the two classes of histograms is the zero level-set of a decision function that divides space into a positive half and a negative half. Histograms that fall in the positive half are labeled as one class while histograms that fall in the negative half are labeled as the other.
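For reference, the decision function of a linear SVM of this kind has the standard textbook form [104]; this is general notation, not notation taken from our implementation:

    f(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b,
    \qquad
    \mathrm{label}(\mathbf{x}) = \operatorname{sign}\bigl(f(\mathbf{x})\bigr),

where the weight vector w and offset b define the separating hyperplane f(x) = 0. Training chooses w and b to maximize the margin 2/||w|| subject to y_i(w · x_i + b) >= 1 for every training histogram x_i with class label y_i in {+1, -1}. The quantity |f(x)| / ||w|| is then the distance of a query histogram x from the hyperplane, a natural certainty measure that we return to in section 5.7.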
5.6 Performance and Validation

We tested our algorithm on a data set developed by Kumar et al. [86] consisting of 21 cells grown on a fibrous nanofiber substrate and 20 cells grown on a flat spun-coat substrate. For each cell we were given a stack of 16-bit grayscale confocal microscopy images of resolution 512 × 512, which we merged into a single volume. The number of images in each stack varied from 11 to 20.

We measured the performance of our algorithms when intersecting a cell with various numbers of lines, ranging from 10^3 to 10^6. The results are given in table 5.3. For each algorithm the first step is to generate the random lines with which to intersect the cell. This step is independent of the data and can be done once as a pre-process, with the same set of lines then being used to intersect each cell. The line generation time is therefore not included in the running times for the algorithms.

For each number of lines we measured the time the algorithms took to compute the intersection counts for a single cell, both on the CPU and in the parallel implementations on the GPU. The timing for each algorithm is broken down into the steps shown in figures 5.5, 5.6, and 5.7. For the GPU Reduction algorithm we measured the time required to send the lines to the GPU and trace them ("Trace lines"), as well as the time required to perform the parallel reduction operation and read back the resulting histogram ("Do reduction"), which was implemented using the CUDA Thrust library [105]. For the GPU Lists algorithm we again measured the time required to send the lines to the GPU and trace them ("Trace lines"), as well as the time required to read the segment length lists back to the CPU and convert them into a histogram ("Count lengths"). The total time includes these times as well as the time required for miscellaneous other tasks. In addition to these times, all algorithms took about 270 ms to load the cell volume data from disk.

We also list speedup factors, which give the ratio of the serial CPU time to each parallel GPU time; at 10^6 lines, for example, the Atomic algorithm's speedup is 49154 ms / 450 ms ≈ 109. Note that the best parallel GPU implementation can be over two orders of magnitude faster than the serial CPU implementation.

Num lines                  10^3     10^4     10^5     10^6
Generate lines (ms)        0.163    1.35     12.9     126

CPU Algorithm
  Total time (ms)          50.1     492      4915     49154

GPU Algorithm – Atomic
  Total time (ms)          3.56     6.74     45.7     450
  Speedup factor           14.1     73.0     108      109

GPU Algorithm – Reduction
  Trace lines (ms)         0.388    0.467    1.55     17.1
  Do reduction (ms)        13.4     19.1     80.0     708
  Total time (ms)          14.1     20.1     83.4     743
  Speedup factor           3.55     24.5     58.9     66.2

GPU Algorithm – Lists
  Trace lines (ms)         0.277    0.450    1.55     15.9
  Count lengths (ms)       2.90     6.48     49.2     466
  Total time (ms)          3.52     7.35     52.7     501
  Speedup factor           14.2     66.9     93.3     98.1

Table 5.3: Results of intersecting a cell with different numbers of lines. The CPU algorithm was run on an Intel Xeon X5260 (using only one core) with 8 GB of RAM. The GPU algorithms were run on an NVIDIA Tesla C2050. The dimensions of the cell volume data were 512 × 512 × 20. The threshold was 3.0 and the minimum gap length was 8.0.

On the current-generation GPU listed in the caption of table 5.3, the best performing algorithm was Atomic, followed closely by Lists. In addition to this GPU we also tested the algorithms on two older-generation GPUs, an NVIDIA Quadro NVS 285 and an NVIDIA Quadro NVS 290. On these GPUs the Lists algorithm was actually slightly faster than Atomic; we suspect this is because the implementation of atomic operations has been improved in the current generation of GPUs.

Once the cell histograms have been produced, they are either used to train an SVM classifier (if they are in the training set) or are classified by a trained SVM. For this work we used the SVM implementation in LIBSVM [103]. Training a linear SVM on our test data set of 41 cell histograms took about 23.9 ms. Once the SVM is trained, new cell histograms can be classified in about 17.4 ms each.

In addition to running time, we also measured the classification correctness of our algorithm when intersecting the cells with different numbers of lines. To compute the percentage of correct classifications for each trial we used 10-fold cross validation: we divided the data set of 41 cells into 10 groups of approximately equal size, trained the machine learning algorithm on 9 of the groups, and measured the percent correct classification on the 10th. An example of one fold is given in table 5.4. This was repeated 10 times, each time using a different group to test the classification correctness, and the percent correct classification was then averaged over all the groups. The average percent correct classifications are given in figure 5.9.

Actual           Nanofiber    Nanofiber    Spun-Coat    Spun-Coat
Classified as    Nanofiber    Spun-Coat    Spun-Coat    Spun-Coat

Table 5.4: Example of one fold of the validation of a trained SVM model. Histograms were generated using the GPU Lists algorithm with threshold 3.0 and minimum gap length 8.0. An SVM was trained with 19 nanofiber cells and 18 spun-coat cells. An additional 2 nanofiber cells and 2 spun-coat cells were set aside to be classified by the model. Of these 4 cells, 3 were classified correctly.

This data demonstrates the progressive nature of our algorithm. For a small number of lines the classification accuracy is about 50%, which is what would be expected from random guessing. As the number of intersecting lines increases, so does the classification accuracy, until it levels off at a maximum. If, after a certain number of lines have been intersected, the user desires additional accuracy, the already-computed lines can be re-used and only the additional lines need to be intersected. The algorithm can therefore provide rough results quickly, then improve them to the desired accuracy with additional running time.
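This batched re-use of lines could be organized as a simple host-side loop. The following sketch uses hypothetical wrappers (traceBatchGPU, svmDecisionValue) around the GPU tracing kernel and LIBSVM, and also anticipates the certainty-based stopping rule proposed in section 5.7; it is an illustration, not our production code.

    #include <cmath>
    #include <vector>

    // Hypothetical wrappers around the GPU tracing kernel and LIBSVM.
    void traceBatchGPU(int firstLine, int numLines,
                       std::vector<float>& histogram);
    double svmDecisionValue(const std::vector<float>& histogram);

    // Trace the precomputed lines in batches, accumulating into one
    // histogram so earlier work is never discarded; stop early once
    // the SVM decision value is far enough from the hyperplane.
    double classifyProgressively(int maxLines, int batchSize,
                                 double minMargin) {
        std::vector<float> histogram(256, 0.0f);
        double value = 0.0;
        for (int traced = 0; traced < maxLines; traced += batchSize) {
            traceBatchGPU(traced, batchSize, histogram); // adds new counts
            value = svmDecisionValue(histogram);
            if (std::fabs(value) >= minMargin) break;    // confident enough
        }
        return value;  // the sign gives the predicted class
    }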
5.7 Future Work

One interesting area of potential future work would be to take advantage of the progressive nature of the algorithm during the SVM classification. Rather than generating the histogram using the full number of lines and then performing the classification, we could instead generate the histogram using some small initial number of lines and then check the certainty of the classification. If the classification were still doubtful, the histogram could be improved by intersecting the cell with more lines; if the classification were sufficiently certain, the result could be returned immediately. For an SVM, the certainty of the classification could be measured by the distance of the query point from the hyperplane dividing the two regions of classification, as noted in section 5.5.

Figure 5.9: Percent correct classification of differentiated and non-differentiated stem cells. This data was generated using the GPU Lists algorithm with threshold 3.0 and minimum gap length 8.0.

Although we have only applied the algorithm to cell data, the algorithm is general enough that it could potentially be applied to any type of 3D data, such as CAD models or protein molecules. The use of the algorithm to classify other data types could be another interesting avenue for future work.

5.8 Conclusions

We described an algorithm that can classify hBMSCs (human Bone Marrow Stromal Cells) as having been grown on a differentiation-inducing 3D PCL nanofiber scaffold or on a non-differentiation-inducing 2D PCL spun-coat film. The algorithm takes 3D cell image data and intersects it with randomly generated rays that connect the sides of a cylinder bounding the cell. The lengths of the ray segments that fall within the cell are used to generate a histogram. These histograms can then be used as sample points to train a machine learning algorithm such as a support vector machine (SVM), which can in turn be used to classify future cells.

Our algorithm is easily parallelizable and is also progressive, allowing it to provide a rough histogram quickly and then refine it as desired. Our parallel GPU implementation can convert a cell into a histogram representation suitable for machine learning training or classification by intersecting it with 10^6 lines in about 450 ms, an over 100-fold speedup over the serial CPU implementation. By applying the algorithm to our test data set of 41 cells, we achieved 82.9% correct classification using 10-fold cross validation. This rapid 3D image analysis algorithm can be used to classify differentiating and non-differentiating stem cells for high-throughput screening of 3D tissue scaffolds. The algorithm uses 3D cell image data in order to take advantage of the benefits of 3D culture and to capture the effects of 3D scaffold structure on cell shape. The approach has been demonstrated using stem cell image data from 1-day cultures, which enables identification of differentiating cells at a much earlier stage than is possible with osteogenic markers, which can require weeks of culture.

Chapter 6

Future Work

A current trend in science is the generation of large amounts of data, both from measurement and from computer simulation. Various sources have referred to this phenomenon as the generation of "Big Data". At the same time, computer processor clock speeds have stopped increasing, forcing hardware manufacturers to resort to parallelism to continue increasing processing power. Unfortunately, algorithms designed for the traditional serial processing model are often unable to take advantage of this parallel computing capability. To keep up with Big Data, we must design future algorithms using data-parallel techniques.
There are several potential next steps that could be taken from the work described in this thesis. One possibility would be to combine the Gaussian Radial Basis Function (RBF) fitting from Chapter 2 with the stem cell classification from Chapter 5. The use of RBFs could enable several different improvements to the classification algorithm.

The first improvement would involve the use of an analytic Gaussian blur of the RBF representation. As different levels of blur are applied, the cell shape would change in a way that could be characteristic of a differentiated or non-differentiated cell. The line intersection length histograms from Chapter 5 could then be generated for each of these blur levels and used as additional information for the Support Vector Machine (SVM) training and classification.

The second improvement would involve the use of the Gaussian RBF's analytic differentiability to compute the mean curvature at random points on the cell's implicit surface. Mean curvatures of the implicit surface could be computed directly from the volumetric data using the formulas of Goldman [106]. These curvature samples could then be put into histograms, just as the line intersection lengths were in Chapter 5, and used as additional information for the classification of the cells.

Another potential piece of future work would be to combine the Confluent Visualization from Chapter 4 with the stem cell and surface data from Chapter 5. Biologists believe that the geometry of the material on which a stem cell rests can induce the cell to undergo differentiation. It would therefore be useful to study the geometric interaction between the cell surface and the surface of the material on which it is resting. Confluent Visualization could enable this by allowing only the portions of one surface that are in contact with the other surface to be visualized.

A final direction for potential future work is in the area of cybersecurity. Companies, governments, and consumers depend on secure and reliable computer networks and data products, but as technology becomes more complex, security threats also become more complicated. The scale of network traffic data is truly staggering, and our ability to collect such data has far surpassed our ability to meaningfully analyze it. Visual representations and interaction technologies provide a powerful mechanism for allowing an analyst to see and understand large amounts of information at once. The development of data-parallel visual representation algorithms could be an important step toward the solution of this problem.

Bibliography

[1] Paul Bratley, Bennett L. Fox, and Harald Niederreiter. Implementation and tests of low-discrepancy sequences. ACM Trans. Model. Comput. Simul., 2(3):195–213, 1992.

[2] D. Juba and A. Varshney. Modelling and rendering large volume data with Gaussian radial basis functions. Technical report, University of Maryland, 2007.

[3] D. Juba and A. Varshney. Parallel, stochastic measurement of molecular surface area. Journal of Molecular Graphics and Modelling, 27(1):82–87, 2008.

[4] G. Stantchev, D. Juba, W. Dorland, and A. Varshney. Confluent volumetric visualization of gyrokinetic turbulence. IEEE Transactions on Plasma Science, 36(4):1112–1113, 2008.

[5] A. A. Mirin, R. H. Cohen, B. C. Curtis, W. P. Dannevik, A. M. Dimits, M. A. Duchaineau, D. E. Eliason, D. R. Schikore, S. E. Anderson, D. H. Porter, P. R. Woodward, L. J. Shieh, and S. W. White. Very high resolution simulation of compressible turbulence on the IBM-SP system. In Supercomputing '99: Proceedings of the 1999 ACM/IEEE Conference on Supercomputing (CDROM), page 70, New York, NY, USA, 1999. ACM Press.
[6] James F. Blinn. A generalization of algebraic surface drawing. ACM Transactions on Graphics, 1(3):235–256, July 1982.

[7] Hitoshi Nishimura, Makoto Hirai, Toshiyuki Kawai, Toru Kawata, Isao Shirakawa, and Koichi Omura. Object modeling by distribution function and a method of image generation. The Transactions of the Institute of Electronics and Communication Engineers of Japan, J68-D(4):718–725, 1985. In Japanese (translated into English by Takao Fujiwara while at Centre for Advanced Studies in Computer Aided Art and Design, Middlesex Polytechnic, England, 1989).

[8] J. Bloomenthal, C. Bajaj, J. Blinn, M. Cani-Gascuel, A. Rockwood, B. Wyvill, and G. Wyvill. Introduction to Implicit Surfaces. Morgan Kaufmann, 1997.

[9] Manfred Weiler, Ralf Botchen, Simon Stegmaier, Thomas Ertl, Jingshu Huang, Yun Jang, David S. Ebert, and Kelly P. Gaither. Hardware-assisted feature analysis and visualization of procedurally encoded multifield volumetric data. IEEE Comput. Graph. Appl., 25(5):72–81, 2005.

[10] Joe Kniss, Gordon Kindlmann, and Charles Hansen. Interactive volume rendering using multi-dimensional transfer functions and direct manipulation widgets. In VIS '01: Proceedings of the Conference on Visualization '01, pages 255–262, Washington, DC, USA, 2001. IEEE Computer Society.

[11] Joe Kniss, Gordon Kindlmann, and Charles Hansen. Multidimensional transfer functions for interactive volume rendering. IEEE Transactions on Visualization and Computer Graphics, 8(3):270–285, 2002.

[12] J. C. Carr, W. R. Fright, and R. K. Beatson. Surface interpolation with radial basis functions for medical imaging. IEEE Transactions on Medical Imaging, 16(1):96–107, February 1997.

[13] E. W. Cheney and W. A. Light. A Course in Approximation Theory. Brooks Cole, Pacific Grove, 1999.

[14] N. Dyn, D. Levin, and S. Rippa. Numerical procedures for surface fitting of scattered data by radial basis functions. SIAM Journal on Scientific and Statistical Computing, 7(2):639–659, 1986.

[15] Bryan S. Morse, Terry S. Yoo, Penny Rheingans, David T. Chen, and K. R. Subramanian. Interpolating implicit surfaces from scattered surface data using compactly supported radial basis functions. In Shape Modeling and Applications, SMI 2001 International Conference on, pages 89–98, 2001.

[16] G. Turk, Huong Quynh Dinh, J. F. O'Brien, and G. Yngve. Implicit surfaces that interpolate. In Bob Werner, editor, Proceedings of the International Conference on Shape Modeling and Applications (SMI-01), pages 62–73, Los Alamitos, CA, May 7–11 2001. IEEE Computer Society.

[17] Greg Turk and James F. O'Brien. Shape transformation using variational implicit functions. In SIGGRAPH '99: Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, pages 335–342, New York, NY, USA, 1999. ACM Press/Addison-Wesley Publishing Co.

[18] Greg Turk and James F. O'Brien. Modelling with implicit surfaces that interpolate. ACM Trans. Graph., 21(4):855–873, 2002.

[19] G. Yngve and G. Turk. Creating smooth implicit surfaces from polygonal meshes. Technical Report GIT-GVU-99-42, Graphics, Visualization, and Usability Center, Georgia Institute of Technology, 1999.

[20] J. Duchon. Splines minimizing rotation-invariant semi-norms in Sobolev spaces. In W. Schempp and K. Zeller, editors, Constructive Theory of Functions of Several Variables, number 571 in Lecture Notes in Mathematics, pages 85–100, 1977.
[21] J. C. Carr, R. K. Beatson, J. B. Cherrie, T. J. Mitchell, W. R. Fright, B. C. McCallum, and T. R. Evans. Reconstruction and representation of 3D objects with radial basis functions. In SIGGRAPH '01: Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, pages 67–76, New York, NY, USA, 2001. ACM Press.

[22] L. Greengard and V. Rokhlin. A fast algorithm for particle simulations. J. Comput. Phys., 73(2):325–348, 1987.

[23] Christopher S. Co, Bjoern Heckel, Hans Hagen, Bernd Hamann, and Kenneth I. Joy. Hierarchical clustering for unstructured volumetric scalar fields. In Greg Turk, Jarke J. van Wijk, and Robert Moorhead, editors, Proceedings of IEEE Visualization 2003, pages 325–332. IEEE, October 19–24 2003.

[24] Yun Jang, Manfred Weiler, Matthias Hopf, Jingshu Huang, David S. Ebert, Kelly P. Gaither, and Thomas Ertl. Interactively visualizing procedurally encoded scalar fields. In Joint EUROGRAPHICS – IEEE TCVG Symposium on Visualization (2004), 2004.

[25] I. T. Jolliffe. Principal Component Analysis. Springer-Verlag, 1986.

[26] W. Hong, N. Neophytou, K. Mueller, and A. Kaufman. Constructing 3D elliptical Gaussians for irregular data. In Mathematical Foundations of Scientific Visualization, Computer Graphics, and Massive Data Exploration, 2006.

[27] Yun Jang, Ralf P. Botchen, Andreas Lauser, David S. Ebert, Kelly P. Gaither, and Thomas Ertl. Enhancing the interactive visualization of procedurally encoded multifield data with ellipsoidal basis functions. Eurographics, 25(3), 2006.

[28] John C. Hart. Ray tracing implicit surfaces. In SIGGRAPH 93 Modeling, Visualizing, and Animating Implicit Surfaces course notes, pages 13-1 to 13-15. ACM SIGGRAPH, August 1993.

[29] Devendra Kalra and Alan H. Barr. Guaranteed ray intersections with implicit surfaces. Computer Graphics, 23(3):297–306, 1989.

[30] Andrei Sherstyuk. Fast ray tracing of implicit surfaces. Computer Graphics Forum, 18(2), June 1999.

[31] Geoff Wyvill and Andrew Trotman. Ray-tracing soft objects. In New Trends in Computer Graphics (Proceedings of CG International '90), pages 467–476. Springer-Verlag, 1990.

[32] Andrew P. Witkin and Paul S. Heckbert. Using particles to sample and control implicit surfaces. In SIGGRAPH '94: Proceedings of the 21st Annual Conference on Computer Graphics and Interactive Techniques, pages 269–277, New York, NY, USA, 1994. ACM Press.

[33] William E. Lorensen and Harvey E. Cline. Marching cubes: A high resolution 3D surface construction algorithm. In SIGGRAPH '87: Proceedings of the 14th Annual Conference on Computer Graphics and Interactive Techniques, pages 163–169, New York, NY, USA, 1987. ACM Press.

[34] Jules Bloomenthal. Polygonization of implicit surfaces. Computer Aided Geometric Design, 5(4):341–355, November 1988.

[35] Jules Bloomenthal. An implicit surface polygonizer. In Paul Heckbert, editor, Graphics Gems IV, pages 324–349. Academic Press, Boston, 1994.

[36] N. Neophytou, K. Mueller, K. McDonnell, W. Hong, X. Guan, H. Qin, and A. Kaufman. GPU-accelerated volume splatting with elliptical RBFs. In Joint Eurographics – IEEE TCVG Symposium on Visualization 2006 (EuroVis '06), 2006.

[37] Steven M. Kay. Fundamentals of Statistical Signal Processing: Estimation Theory, chapter 7. Prentice Hall, 1993.

[38] National Library of Medicine (U.S.) Board of Regents. Electronic imaging: Report of the Board of Regents. U.S. Department of Health and Human Services, Public Health Service, National Institutes of Health, 1990. NIH Publication 90-2197.
[39] Sarah F. Frisken and Ron Perry. Simple and efficient traversal methods for quadtrees and octrees. Journal of Graphics Tools, 7(3):1–11, 2002.

[40] Yutaka Ohtake, Alexander Belyaev, Marc Alexa, Greg Turk, and Hans-Peter Seidel. Multi-level partition of unity implicits. ACM Trans. Graph., 22(3):463–470, 2003.

[41] O. Kreylos, N. Max, B. Hamann, S. Crivelli, and W. Bethel. Interactive protein manipulation. In Proceedings of IEEE Visualization, pages 581–588, October 2003.

[42] O. D. Lampe, I. Viola, N. Reuter, and H. Hauser. Two-level approach to efficient visualization of protein dynamics. IEEE Transactions on Visualization and Computer Graphics, 13(6):1616–1623, 2007.

[43] M. L. Connolly. Analytical molecular surface calculation. Journal of Applied Crystallography, 16(5):548–558, 1983.

[44] M. L. Connolly. Solvent-accessible surfaces of proteins and nucleic acids. Science, 221(4612):709, 1983.

[45] Michel F. Sanner, Arthur J. Olson, and Jean-Claude Spehner. Fast and robust computation of molecular surfaces. In SCG '95: Proceedings of the Eleventh Annual Symposium on Computational Geometry, pages 406–407, New York, NY, USA, 1995. ACM Press. http://www.scripps.edu/~sanner/html/msms_home.html.

[46] Robert Fraczkiewicz and Werner Braun. Exact and efficient analytical calculation of the accessible surface areas and their gradients for macromolecules. Journal of Computational Chemistry, 19(3):319–333, 1998. http://pauli.utmb.edu/cgi-bin/get_a_form.tcl.

[47] Tolga Can, Chao-I Chen, and Yuan-Fang Wang. Efficient molecular surface generation using level-set methods. J Mol Graph Model, 25(4):442–454, December 2006.

[48] N. R. Voss, M. Gerstein, T. A. Steitz, and P. B. Moore. The geometry of the ribosomal polypeptide exit tunnel. J Mol Bio, 360(4):893–906, July 2006. http://geometry.molmovdb.org/3v/.

[49] Chandrajit Bajaj and Vinay Siddavanahalli. An adaptive grid based method for computing molecular surfaces and properties. Technical Report TR06-56, University of Texas, 2006.

[50] Chandrajit Bajaj, Peter Djeu, Vinay Siddavanahalli, and Anthony Thane. TexMol: Interactive visual exploration of large flexible multi-component molecular complexes. In VIS '04: Proceedings of the Conference on Visualization '04, pages 243–250, Washington, DC, USA, 2004. IEEE Computer Society.

[51] http://developer.nvidia.com/object/cuda.html.

[52] http://ati.amd.com/companyinfo/researcher/Documents.html.

[53] B. Lee and F. M. Richards. The interpretation of protein structures: estimation of static accessibility. J Mol Biol, 55(3):379–400, 1971.

[54] F. M. Richards. Areas, volumes, packing, and protein structure. Annual Review of Biophysics and Bioengineering, 6(1):151–176, 1977.

[55] H. Edelsbrunner and E. P. Mücke. Three-dimensional alpha shapes. ACM Transactions on Graphics, 13:43–72, 1994.

[56] Shoshana J. Wodak and Joel Janin. Analytical approximation to the accessible surface area of proteins. Proc Natl Acad Sci USA, 77(4):1736–1740, April 1980.

[57] A. Varshney, F. P. Brooks, and W. V. Wright. Linearly scalable computation of smooth molecular surfaces. IEEE Computer Graphics and Applications, 14(5):19–25, 1994.

[58] James F. Blinn. A generalization of algebraic surface drawing. ACM Trans. Graph., 1(3):235–256, 1982.

[59] J. A. Grant and B. T. Pickup. A Gaussian description of molecular shape. Journal of Physical Chemistry, 99(11):3503–3510, December 1995.

[60] David W. Ritchie. Evaluation of protein docking predictions using Hex 3.1 in CAPRI rounds 1 and 2. Proteins: Structure, Function, and Genetics, 52(1):98–106, July 2003.
[61] H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov, and P. E. Bourne. The Protein Data Bank. Nucleic Acids Research, 28:235–242, 2000. http://www.pdb.org/.

[62] Xueqing Li, Wenping Wang, and Adrian Bowyer. Using low-discrepancy sequences and the Crofton formula to compute surface areas of geometric models. Computer-Aided Design, 35(9):771–782, August 2003.

[63] Yu-Shen Liu, Jun-Hai Yong, Hui Zhang, Dong-Ming Yan, and Jia-Guang Sun. A quasi-Monte Carlo method for computing areas of point-sampled surfaces. Computer-Aided Design, 38(1):55–68, January 2006.

[64] Eric W. Weisstein. Sphere point picking. From MathWorld – A Wolfram Web Resource, http://mathworld.wolfram.com/SpherePointPicking.html.

[65] GNU Scientific Library. http://www.gnu.org/software/gsl/.

[66] J. Rovira, P. Wonka, F. Castro, and M. Sbert. Point sampling with uniformly distributed lines. In Point-Based Graphics, 2005. Eurographics/IEEE VGTC Symposium Proceedings, pages 109–118, 2005.

[67] Bernd Gärtner. Fast and robust smallest enclosing balls. In ESA '99: Proceedings of the 7th Annual European Symposium on Algorithms, pages 325–338, London, UK, 1999. Springer-Verlag.

[68] Markus Gross and Hanspeter Pfister, editors. Point-Based Graphics. Morgan Kaufmann, 2007.

[69] Peter Schröder. What can we measure? In SIGGRAPH '06: ACM SIGGRAPH 2006 Courses, pages 5–9, New York, NY, USA, 2006. ACM Press.

[70] J. Liang, H. Edelsbrunner, P. Fu, P. V. Sudhakar, and S. Subramaniam. Analytical shape computation of macromolecules: II. Inaccessible cavities in proteins. Proteins: Structure, Function, and Genetics, 33(1):18–29, 1998.

[71] A. M. Dimits, G. Bateman, M. A. Beer, B. I. Cohen, W. Dorland, G. W. Hammett, C. Kim, J. E. Kinsey, M. Kotschenreuther, A. H. Kritz, L. L. Lao, J. Mandrekas, W. M. Nevins, S. E. Parker, A. J. Redd, D. E. Shumaker, R. Sydora, and J. Weiland. Comparisons and physics basis of tokamak transport models and turbulence simulations. Physics of Plasmas, 7(3):969–983, 2000.

[72] W. Dorland, F. Jenko, M. Kotschenreuther, and B. N. Rogers. Electron temperature gradient turbulence. Phys. Rev. Lett., 85(26):5579–5582, December 2000.

[73] M. Kotschenreuther, G. Rewoldt, and W. Tang. Comput. Phys. Commun., 88:128, 1995.

[74] Derek Juba, Antonio Cardone, Horace Ip, Carl G. Simon Jr., Chris K. Tison, Girish Kumar, Mary Brady, and Amitabh Varshney. Parallel geometric classification of stem cells by their 3D morphology.

[75] Ana Jaklenec, Andrea Stamp, Elizabeth Deweerd, Angela Sherwin, and Robert Langer. Progress in the tissue engineering and stem cell industry: are we there yet? Tissue Engineering Part B: Reviews, 18(3):155–166, 2012.

[76] H. C. Ott, T. S. Matthiesen, S. K. Goh, L. D. Black, S. M. Kren, T. I. Netoff, and D. A. Taylor. Perfusion-decellularized matrix: using nature's platform to engineer a bioartificial heart. Nature Medicine, 14(2):213–221, 2008.

[77] A. Atala, S. B. Bauer, S. Soker, J. J. Yoo, and A. B. Retik. Tissue-engineered autologous bladders for patients needing cystoplasty. The Lancet, 367(9518):1241–1246, 2006.

[78] Atlantida Raya-Rivera, Diego R. Esquiliano, James J. Yoo, Esther Lopez-Bayghen, Shay Soker, and Anthony Atala. Tissue-engineered autologous urethras for patients who need reconstruction: an observational study. The Lancet, 377(9772):1175–1182, April 2011.

[79] Robert Langer and Joseph P. Vacanti. Tissue engineering. Science, 260:920–925, 1993.
[80] D. S. W. Benoit, M. P. Schwartz, A. R. Durney, and K. S. Anseth. Small functional groups for controlled differentiation of hydrogel-encapsulated human mesenchymal stem cells. Nature Materials, 7(10):816–823, 2008.

[81] J. Kohn, W. J. Welsh, and D. Knight. A new approach to the rationale discovery of polymeric biomaterials. Biomaterials, 28(29):4171, 2007.

[82] T. A. Petrie, J. E. Raynor, D. W. Dumbauld, T. T. Lee, S. Jagtap, K. L. Templeman, D. M. Collard, and A. J. García. Multivalent integrin-specific ligands enhance tissue healing and biomaterial integration. Science Translational Medicine, 2(45):45ra60, 2010.

[83] A. J. Engler, S. Sen, H. L. Sweeney, and D. E. Discher. Matrix elasticity directs stem cell lineage specification. Cell, 126(4):677–689, 2006.

[84] S. H. Parekh, K. Chatterjee, S. Lin-Gibson, N. M. Moore, M. T. Cicerone, M. F. Young, and C. G. Simon. Modulus-driven differentiation of marrow stromal cells in 3D scaffolds that is independent of myosin-based cytoskeletal tension. Biomaterials, 32(9):2256–2264, 2011.

[85] M. J. Dalby, N. Gadegaard, R. Tare, A. Andar, M. O. Riehle, P. Herzyk, C. D. W. Wilkinson, and R. O. C. Oreffo. The control of human mesenchymal cell differentiation using nanoscale symmetry and disorder. Nature Materials, 6(12):997–1003, 2007.

[86] G. Kumar, C. K. Tison, K. Chatterjee, P. S. Pine, J. H. McDaniel, M. L. Salit, M. F. Young, and C. G. Simon Jr. The determination of stem cell fate by 3D scaffold structures through the control of cell shape. Biomaterials, 32(35):9188–9196, 2011.

[87] G. Kumar, M. S. Waters, T. M. Farooque, M. F. Young, and C. G. Simon. Freeform fabricated scaffolds with roughened struts that enhance both stem cell proliferation and differentiation by controlling cell shape. Biomaterials, 2012.

[88] M. Dominici, K. Le Blanc, I. Mueller, I. Slaper-Cortenbach, F. C. Marini, D. S. Krause, R. J. Deans, A. Keating, D. J. Prockop, and E. M. Horwitz. Minimal criteria for defining multipotent mesenchymal stromal cells. The International Society for Cellular Therapy position statement. Cytotherapy, 8(4):315–317, 2006.

[89] J. Makower, A. Meer, and L. Denend. FDA impact on US medical technology innovation: a survey of over 200 medical technology companies. Arlington (Virginia): National Venture Capital Association, 2010.

[90] C. S. Chen, M. Mrksich, S. Huang, G. M. Whitesides, and D. E. Ingber. Geometric control of cell life and death. Science, 276(5317):1425–1428, 1997.

[91] J. Folkman and A. Moscona. Role of cell shape in growth control. Nature, 273:345–349, 1978.

[92] R. McBeath, D. M. Pirone, C. M. Nelson, K. Bhadriraju, and C. S. Chen. Cell shape, cytoskeletal tension, and RhoA regulate stem cell lineage commitment. Developmental Cell, 6(4):483–495, 2004.

[93] M. D. Treiser, E. H. Yang, S. Gordonov, D. M. Cohen, I. P. Androulakis, J. Kohn, C. S. Chen, and P. V. Moghe. Cytoskeleton-based forecasting of stem cell lineage fates. Proceedings of the National Academy of Sciences, 107(2):610–615, 2010.

[94] J. T. Elliott, M. Halter, A. L. Plant, J. T. Woodward, K. J. Langenbach, and A. Tona. Evaluating the performance of fibrillar collagen films formed at polystyrene surfaces as cell culture substrates. Biointerphases, 3(2):19–28, 2008.

[95] J. Lovmand, J. Justesen, M. Foss, R. H. Lauridsen, M. Lovmand, C. Modin, F. Besenbacher, F. S. Pedersen, and M. Duch. The use of combinatorial topographical libraries for the screening of enhanced osteogenic expression and mineralization. Biomaterials, 30(11):2015–2022, 2009.

[96] C. G. Simon Jr., Y. Yang, V. Thomas, S. M. Dorsey, and A. W. Morgan. Cell interactions with biomaterials gradients and arrays. Combinatorial Chemistry & High Throughput Screening, 12(6):544–553, 2009.
[97] F. S. Collins. Reengineering translational science: the time is right. Science Translational Medicine, 3(90):90cm17, 2011.

[98] J. H. Sung and M. L. Shuler. A micro cell culture analog (µCCA) with 3-D hydrogel culture of multiple cell lines to assess metabolism-dependent cytotoxicity of anti-cancer drugs. Lab Chip, 9(10):1385–1394, 2009.

[99] E. Krausz, R. de Hoogt, E. Gustin, F. Cornelissen, T. Grand-Perret, L. Janssen, N. Vloemans, D. Wuyts, S. Frans, A. Axel, et al. Translation of a tumor microenvironment mimicking 3D tumor growth co-culture assay platform to high-content screening. Journal of Biomolecular Screening, 2012.

[100] M. Vinci, S. Gowan, F. Boxall, L. Patterson, M. Zimmermann, C. Lomas, M. Mendiola, D. Hardisson, S. A. Eccles, et al. Advances in establishment and analysis of three-dimensional tumor spheroid-based functional assays for target validation and drug evaluation. BMC Biology, 10(1):29, 2012.

[101] A. Cardone, R. K. Gupta, and M. Karnik. A survey of shape similarity assessment algorithms for product design and manufacturing applications. Journal of Computing and Information Science in Engineering, 2003.

[102] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten. The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter, 11(1):10–18, 2009.

[103] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

[104] C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.

[105] Jared Hoberock and Nathan Bell. Thrust: A parallel template library, 2010. Version 1.3.0.

[106] R. Goldman. Curvature formulas for implicit curves and surfaces. Computer Aided Geometric Design, 22(7):632–658, 2005.