ABSTRACT Title of thesis: EDGE-BASED AUTOMATED FACIAL BLEMISH REMOVAL Avisha NessAiver, Master?s in Engineering Thesis directed by: Professor Rama Chellappa Chair, Department of Electrical and and Computer Engineering This thesis presents an end-to-end approach for taking a an image of a face and seamlessly isolating and lling in any blemishes contained therein. This consists of detecting the face within a larger image, building an accurate mask of the facial features so as not to mistake them as blemishes, detecting the blemishes themselves and painting over them with accurate skin tones. We devote the rst part of the thesis to detailing our algorithm for extracting facial features. This is done by rst improving the image through histogram equal- ization and illumination compensation followed by nding the features themselves from a computed edge map. Geometric knowledge of general feature positioning and blemish shapes is used to determine which edge clusters belong to correspond- ing facial features. Color and re ectance thresholding is then used to build a skin map. In the second part of the thesis we identify the blemishes themselves. A Lapla- cian of Gaussian blob detector is used to identify potential candidates. Thresholding and dilating operations are then performed to trim this candidate list down followed by the use of various morphological properties to reject regions likely to not be blem- ishes. Finally, in the third part, we examine four possible techniques for inpainting blemish regions once found. We settle on using a technique that lls in pixels based on nding a patch in the nearby image region with the most similar surrounding texture to the target pixel. Priority in the pixel ll-order is given to strong edges and contours. EDGE-BASED AUTOMATED FACIAL BLEMISH REMOVAL by Avisha NessAiver Thesis submitted to the Faculty of the Graduate School of the University of Maryland, College Park in ful llment of the requirements for the degree of Master?s of Engineering 2013 Advisory Committee: Professor Rama Chellappa, Chair/Advisor Professor Larry Davis Professor Min Wu c Copyright by Avisha NessAiver 2013 Acknowledgments I owe my gratitude to a large number of individuals who have made this thesis possible and made both my graduate experience and the undergraduate one leading up to it one that will be remembered aways with great joy. First and foremost I?d like to thank my advisor, Dr. Rama Chellappa for giving me an invaluable opportunity to work with him over the past three years. While I am fairly certain that his research, teaching and administrative duties made him one of the busiest professors at this university, he always had time to see me when needed and every time I walked into his o ce it felt like at that moment I was de nitely his top priority. If it were not for his teachings in the elds of probability and computer vision I do not know if I would be where I am today. I would also like to thank Dr. Min Wu and Dr. Larry Davis for agreeing to serve on my thesis committee and for sparing their invaluable time reviewing the manuscript. Dr. Min Wu was instrumental in getting me started in the eld of Computer Vision and and invaluable source of advice when I was rst starting the graduate school program. I owe my deepest thanks to my family - my mother and father for always standing by me, guiding me and encouraging me to keep going even when times were hard, and to all my siblings for always providing an open ear and great environment to provide some much needed mental relief. Words cannot express the gratitude I owe them all. I am grateful to my past and present roommates in my apartment- Nonnie ii Howarth for providing friendship and support, Binyamin Besser and Bernie Kramer for giving me a ery place to work, and especially to my brother, Shalev NessAiver for acting as a sounding board for ideas and providing proper encouragement and incentive to work throughout. It is impossible to remember all, and I apologize to those I?ve inadvertently left out. Lastly, thank you all and thank God! iii Table of Contents List of Figures vi List of Abbreviations viii 1 Introduction 1 1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 2 Face Detection 5 2.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2.2 Our Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 3 Feature Extraction 9 3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 3.2 Prior Research and Attempted Techniques . . . . . . . . . . . . . . . 9 3.2.1 Active Shape Modeling (ASM) . . . . . . . . . . . . . . . . . . 12 3.2.1.1 Building The Model . . . . . . . . . . . . . . . . . . 13 3.2.1.2 Matching Model to Image . . . . . . . . . . . . . . . 16 3.2.1.3 Why ASM is Insu cient . . . . . . . . . . . . . . . . 17 3.3 Our Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 3.3.1 Step 1- Equalize and Convert to Grayscale . . . . . . . . . . . 21 3.3.2 Step 2 - Illumination Compensation . . . . . . . . . . . . . . . 22 3.3.3 Step 3 - Edge Detection . . . . . . . . . . . . . . . . . . . . . 25 3.3.4 Step 4 - Face Detection . . . . . . . . . . . . . . . . . . . . . . 26 3.3.5 Step 5- Feature Region Segmentation . . . . . . . . . . . . . . 26 3.3.6 Step 6 - Calculate Center of Masses . . . . . . . . . . . . . . . 27 3.3.7 Step 7 - Remove Outliers . . . . . . . . . . . . . . . . . . . . . 28 3.3.8 Step 8 - Build Feature Mask . . . . . . . . . . . . . . . . . . . 30 4 Creating the Skin Mask 32 5 Blemish Detection 35 5.1 Overview and Prior Research . . . . . . . . . . . . . . . . . . . . . . 35 5.2 Our Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 5.2.1 Step 1 - Blob Detection . . . . . . . . . . . . . . . . . . . . . 37 5.2.2 Step 2 - Thresholding and Dilating . . . . . . . . . . . . . . . 38 5.2.3 Step 3 - Reject Bad Candidate Blemish Regions . . . . . . . . 40 6 Inpainting 47 6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 6.1.1 Inpainting Overview . . . . . . . . . . . . . . . . . . . . . . . 47 6.2 Inpainting Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . 49 6.2.1 Iterative Blurring . . . . . . . . . . . . . . . . . . . . . . . . . 49 iv 6.2.1.1 Blurring Results . . . . . . . . . . . . . . . . . . . . 50 6.2.2 Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 6.2.2.1 Interpolation Results . . . . . . . . . . . . . . . . . . 54 6.2.3 Texture Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . 55 6.2.3.1 Texture Synthesis Results . . . . . . . . . . . . . . . 57 6.2.4 Exemplar-Based Inpainting . . . . . . . . . . . . . . . . . . . 61 6.2.4.1 Exemplar-Based Inpainting Results . . . . . . . . . . 63 7 Conclusion and Future Work 67 7.1 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 A Experimental Results 70 Bibliography 73 v List of Figures 2.1 Haar-like and center-surround features. White areas have positive weights, black areas negative weights [22] . . . . . . . . . . . . . . . . 7 2.2 Viola Jones detector outputs on three test images . . . . . . . . . . . 8 3.1 Models produced by varying the rst three bi?s by three standard deviations in either direction (image from [30]) . . . . . . . . . . . . . 15 3.2 Examples of the ASM algorithm applied to a test image from the IMM database (a) and one of our test images (b) . . . . . . . . . . . 19 3.3 Comparison of ASM output models using the original ASM algorithm and the improved algorithm of Seshadri and Savvides (image taken from [31]) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 3.4 Comparison of original and equalized image . . . . . . . . . . . . . . 22 3.5 Comparison RGB image channels . . . . . . . . . . . . . . . . . . . . 23 3.6 Comparison of edge maps produced by the image and image re- ectance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 3.7 The caption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 3.8 Comparison of feature edge maps before and after removing outliers . 30 3.9 Comparison of Feature Mask before and after lling holes and circling edges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 4.1 Plot of the raw skin map and the result of combining it with the feature mask from Chapter 3 . . . . . . . . . . . . . . . . . . . . . . . 33 5.1 Depiction of initial blob detection, masking, thresholding and dilating 43 5.2 Binary blemish map with the lighter colors representing the large regions and the medium grey those that are circular enough to not be eliminated . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 5.3 Comparison of our original example image and the same image with overlaid identi ed blemish regions. . . . . . . . . . . . . . . . . . . . 45 6.1 Inpainting Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 6.2 Original Cameraman Image (left) Image Corrupted with 70% noise (right) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 6.3 Iterative Blurring Result After 50 Iterations . . . . . . . . . . . . . . 51 6.4 Comparison of varying levels of iterative blurring . . . . . . . . . . . 52 6.5 3D Interpolation Example . . . . . . . . . . . . . . . . . . . . . . . . 54 6.6 Comparison of the original face image with an interpolated result . . 55 6.7 Comparison of the original face image with an interpolated result . . 56 vi 6.8 Overview of Efros and Leungs algorithm ( gure taken from [18]). Given a sample texture image (left), a new image is being synthesized one pixel at a time (right). To synthesize a pixel, the algorithm rst nds all neighborhoods in the sample image (boxes on the left) that are similar to the pixels neighborhood (box on the right) and then randomly chooses one neighborhood and takes its center to be the newly synthesized pixel. . . . . . . . . . . . . . . . . . . . . . . . . . 58 6.9 Texture synthesis in a large block . . . . . . . . . . . . . . . . . . . . 59 6.10 Two examples of images inpainted using texture synthesis . . . . . . 60 6.11 Criminisi Notation Diagram ( gure taken from [19]) Notation dia- gram. Given the patch p, np is the normal to the contour of the target region and 5I?p is the isophote (direction and intensity) at point p. The entire image is denoted with I. . . . . . . . . . . . . . . 62 6.12 Exemplar Inpainting Bungee Results . . . . . . . . . . . . . . . . . . 65 6.13 Comparison of the purely texture-based result and the exemplar- based result along with data and con dence plots . . . . . . . . . . . 66 A.1 Algorithm outputs for a variety of input images . . . . . . . . . . . . 71 A.2 More algorithm outputs for a variety of input images . . . . . . . . . 72 vii List of Abbreviations LoG Laplacian of Gaussian DoG Di erence of Gaussian NCC Normalized Cross Correlation MAD Mean Absolute Di erence PDE Partial Di erential Equation MRF Markov Random Field viii Chapter 1 Introduction 1.1 Background Since the dawn of photography, there have been those who desired to take an existing photo and modify it or improve on it in some way. Even before the advent of modern computers and photo-editing software, sophisticated photo ma- nipulations could be achieved through the use of paint, ink, multiple exposures, airbrushes or various darkroom techniques such as combining negatives from dif- ferent photos [1]. One of the most common uses for such techniques was to take model photos intended for use as magazine covers or in marketing campaigns and remove all possible blemishes or imperfections in a process often referred to as "air- brushing". Nowadays, where these manipulations are made easier through the use of computers and advanced photo editing software such as Adobe?s Photoshop, it has become almost expected that some amount of "airbrushing" will be used on any professionally taken photo and even many personal ones. Nonetheless, while it is true that the use of computer software has made the process of photo manipulation faster and not quite as skill-intensive, it remains the case that retouching a single photo can take hours of human labor. This labor cost is especially ampli ed in situations where entire batches of photos must be processed, such as with wedding photo albums. With modern signal processing techniques it is possible to take this 1 painstaking process and automate every stage of it. While there has been some tangentially related work in the eld of identifying facial blemishes for the purpose of facial recognition such as that done by Park et al. [2], Ramesha et al. [3], and Pierrard and Vetter [4], we have found no complete works focused on removing these blemishes with an aesthetic goal in mind. In this thesis we present a start-to- nish algorithm that takes an input photograph, locates a face within it, extracts the smooth skin regions out from the facial features and image background, isolates all imperfections in said skin and smoothly lls them in using skin textures taken from similar regions in the face. 1.2 Thesis Outline Chapter 2 presents a brief overview of the existing work done in the eld of facial detection before settling on using the simple but e ective Viola Jones face de- tector [5]. It then explains the workings of this algorithm and why it is su cient for our needs. Chapter 3 presents a review of existing literature on the task of extracting accurate outlines of the important features in a face. It gives brief summaries of the techniques used in [6], [6], [7], [8], [9], [10], [11], [12], [13], and [14]. It then goes on to more detailed explanation Active Shape Modelling (ASM) [15], one of the best of these existing techniques and one that we attempted to utilize for our purposes. It describes why ASM is still insu cient due to its sometimes imperfect model align- ment and frequently not covering all of the important edges from the facial features. The steps of our feature extraction algorithm are then explained. We rst adjust 2 the entire image by equalizing and converting to grayscale then adjust for gradual lighting changes by extracting the re ectance. Canny edge detection is applied to the result. The Viola Jones [5] face detector is used to produce a bounding box for the face and simple geometric calculations are then used to determine rough feature regions. Within these regions we then calculate the center of mass of the edges to nd the features? centers, discard outlying edges according to various geometric clas- si ers, circle all remaining edges proportional to the distance from the center and close o remaining holes. The end result is a feature mask conforming to the precise edges of the eyes, nose and mouth. Chapter 4 discusses the creation of a skin mask to determine the outer boundaries of the face. It brie y discusses several attempted techniques such as RGB averaging, graph cutting and texture segmentation before describing the nal process based on the previously determined re ectance. Chap- ter 5 deals with nding the actual blemishes within the skin region. It begins by again giving brief summaries of related research done in [2], [3], [4], [16], and [17]. It then start to describe our algorithm, beginning with performing a convolution with a Laplacian of Gaussian operator across many scales, and then dilating the resulting binary mask. We then analyze each resulting connected region, reject- ing many based on criterion such as weighted areas, eccentricity and a custom Fill Measure. Chapter 6 nishes o our algorithm by examining four di erent types of inpainting techniques that could be used to ll in the blemishes. First we introduce iterative blurring and interpolation, both using mathematical extensions of bound- ary pixels to ll in the holes. We then examine the texture synthesis method of Efros and Lueng [18] and its Exemplar-based inpainting from Criminisi, Perez and 3 Toyoma[19]. We conclude that the Exemplar-based method seems to produce the overall best result but, given a lower resolution image or computational constraints, the interpolation method might su ce in its place. 4 Chapter 2 Face Detection 2.1 Overview The detection of a human face from within a cluttered scene is a task that, while simplistic for the youngest of infants, has been one of the most highly re- searched topics in computing for the past half century. In the general case, one of the main reasons why the task is so di cult is due to the large amounts of variations possible in the faces? location, orientation, scale, pose, occlusions, lighting, facial ex- pressions and ethnicity. For our purposes we make the simplifying assumption that the inputted image will be frontward facing and not occluded, thus leaving only the last three di culties to deal with. There have been hundreds of di erent approaches to face detection that have so far been published , most of which can be grouped into four general categories: template matching methods, feature invariant approaches, knowledge{based meth- ods and appearance-based methods [20]. Template matching methods compare a test image with a predetermined face template or feature templates and look for high similarity. Feature invariant approaches try and nd structure features that are invariant to lighting and pose variations. Knowledge-based methods use sets of rules based on what we know of how faces are generally shaped (they all have eyes, nose, mouth, etc). Appearance-based methods use large sets of training data 5 to build face models and classi ers to perform the detection. The appearance-based methods take far more computing power and storage space to operate, but the increased availability of such resources in recent years has led to their taking the overall lead in performance amongst the di erent detectors. 2.2 Our Method The face detector that we decided to use was the one used in the OpenCV library[21], rst developed by Viola and Jones [5] and later improved upon by Lien- hard and Maydt [22]. While there have been some improvements made on this method since its initial publication, such as the improved feature extraction tech- niques of Brubaker et al. [23] and Froba [24] or the improved boosted learning algorithms of Wu et al.[25], Pham and Cham [26] or Vasconcelos [27], those mostly provide improved accuracy for the cases with di cult pose or illumination. For our constrained requirements the basic Viola Jones detector works almost perfectly. The rst step is to extract features that will be used as input to train a classi er. The reason for using features rather than just the raw pixel data is that they encode knowledge about the domain such as the presence of edges or lines. This serves to reduce in-class variability while increasing intra-class variability as compared to the raw data and thus improve the accuracy of the classi er. The features used are similar to Haar features, consisting of the summing the values of pixels within rectangles in the image and then taking the di erence between various con gurations and orientations of adjacent rectangles (features shown in Figure 2.2). 6 1. Edge features 1. Line features 3. Center-surround features (a) (b) (c) (d) (a) (b) (c) (d) (a) (b) (e) (f) (g) (h) Figure 2.1: Haar-like and center-surround features. White areas have positive weights, black areas negative weights [22] To minimize computation time, a cascade of classi ers is used instead of a single-stage one. Classi cation is performed using a sliding window with a range of resolutions, rst using a coarse classi er to reject the obviously bad regions and passing on the positively identi ed regions to the next stage. Given the extremely large number of features generated for each image, the Discrete Adaboost algorithm [28] is used to train each of the thirteen stages of the classi er. It is able to train a strong classi er based on a large number of weak ones by re-weighting the training samples (with the weak classi ers in this case consisting of a single feature that best di erentiates the positive and negative training examples along with a simple binary threshold). In each stage the threshold is adjusted to minimize false negatives (and therefore increase the rate of false positives, hoping to eliminate them during later stages). 7 (a) Test image 1 (b) Test image 2 (c) Test image 3 Figure 2.2: Viola Jones detector outputs on three test images Given that the output of this procedure is a rectangular face region that often cuts o small outer portions of the face (as seen in Figure 2.2, we cannot focus our attention from here on out on just this region. Instead, we use it as a rough guide for our more accurate feature and skin isolation procedures. 8 Chapter 3 Feature Extraction 3.1 Overview 3.2 Prior Research and Attempted Techniques Given that feature extraction and facial detection are both just speci c in- stances of object-detection applications applied at two di erent levels, they have the same overall grouping of types of methods that have been applied to them: template matching methods, feature-invariant approaches, knowledgebased meth- ods and appearance-based methods. One of the main di erences stems from the fact that the majority of feature extraction methods rely on the face having already been located. This allows them to use fare more detailed knowledge of the position and shape of the various features without worrying about the cost of an exhaustive search in a large image being computationally prohibitive or turning up too many false positives. The most work overall seems to have been done on eye-detection methods due to their high applicability to many machine-interaction applications as well as the ability to determine the rough pose of the face based on the relative size and positioning of the eyes. Rajpathaka et al. [6] used skin detection followed by morphological operations 9 to detect the sharp points of re ection found in any suitably illuminated eye [6]. They achieved fairly high accuracy but only located the center of the eyes, not the boundaries, and the technique would fail on blurry or poorly illuminated images. Yuille et al. [7] proposed a method for describing facial features via a de- formable parameterized template. The template is linked to an energy function that corresponds to edges, peaks and valleys in the image intensity and interacts dynamically to modify the parameters to minimize the energy function. The algo- rithm works decently well for tracking the contours of the eyes and mouth but is still prone to occasional failure due to imperfect initialization or edge cases. This di culty was partially dealt with by Lam and Yan [8] who developed a technique for using corner detection to more accurately initialize the position of the deformable templates but is still imperfect. Pentland et al. [9] used an eigenspace-based method for feature detection as part of their larger facial recognition framework. It works fairly well as long as the training database has enough variability in appearance, pose and illumination but has di culty picking up any strong deviations from the training set. It also is only capable of identifying the feature centers rather than the full contours. Han et al. [10] used morphological-based methods followed by dilation opera- tions and labeling to detect eye-analogue segments. Feng and Yuen [11] proposed a multi-cues method for detecting eyes based on face intensity values, the estimated direction of the line joining the eye centers, the response of convolving an eye vari- ance lter with the face, and then a cross validation using a variance projection function. 10 Phimoltares et al. [12] used a neural visual model that involves a combination of utilizing the geometric properties of the facial features in polar coordinates and training a separate multilayer perceptron network for each feature. They follow it up with a dilation operation on the binary output to improve the overall results. Their technique works fairly well for neutral faces but fails to properly handle any large variation in pose or expression. Sohail and Bhattacharya [29] developed a technique for detecting the 18 most important facial feature points based on a statistical anthropometric model. They rst perform eye-center detection based on a generative object detection framework proposed by Fasel et al. [13]. The relation between these two points is then used to determine the rotation angle of the face as well as to predict the rough locations of the other feature regions based on knowledge of average distances between facial features. A combination of di erent image processing techniques is then used in each of these regions to better determine the precise feature point locations. Their algorithm is one of the best reviewed so far for determining feature contours but their average accuracy of only 90% makes it insu cient for our needs. While many of these methods are accurate enough to add feature data to a facial recognition database, all that requires is the rough size, shape and positioning of the features. This works very well with model-based methods because all that matters is that the same model produce similar enough outputs for the same face. And, if the technique fails for a particular feature, as long as it is not incorrectly added to the database it does not negatively a ect the recognition process. For our purposes, however, given that we need to ensure that we do not overwrite any of 11 the important facial features, rough matching with less than 100% accuracy is not good enough. Of all the existing methods that we found, the most promising for isolating the proper facial contours tended to be the model-based techniques. Indeed, the feature detection techniques used most often by attempting to use soft biometrics such as moles or scars for facial recognition are those of Active Appearance Models (AAM) [14] or Active Shape Models (ASM) [2]. We therefore decided to examine these techniques in closer detail. 3.2.1 Active Shape Modeling (ASM) First developed by Cootes [15], ASM works by training a statistical model of an object using a set of images whose important points and curves have been carefully annotated by a human expert. A diverse training set with variations in shape and appearance allows for the creation of a model which can mimic this variation. Analyzing a new image then reduces to a process of determining a set of parameters which best match the model instance to the image. Unlike methods such as the Viola Jones detector which are only able to output rectangular regions containing the desired feature, ASM is capable of producing an output that fairly closely models the contours of the image. 12 3.2.1.1 Building The Model Building up a good model requires a set of images each with carefully labeled important points, called landmark points. The most important of these landmark points are those that lie on the corners between edges or points of maximum cur- vature within the image. However, given that using these types of points would produce a very sparse output we also label points spaced out along the boundaries in between the key ones. Additionally, to properly keep track of contours we record the connectivity between points [30]. For a given image, the n landmark points (xi; yi) can be represented by the 2n 1 vector x, where x = (x1; ; xn; y1; ; yn)T (3.1) For a database of s training samples, we get s vectors xj where each point (xi; yi) in one image corresponds to the same geometric point in the others. When dealing with faces, for example, the point (xi; yi) could represent the left corner of the mouth for all images in the training set. Given that the shapes themselves are invariant to position, orientation and scale, the next step before we can properly build the model is to transform all of the vectors xj so that they are lined up in a common coordinate system. This is done by scaling, translating and rotating the points in each shape so as to minimize the sum of the distances between each shape and the mean ( D = P jxi xj2). Once we have this set of points on a 2n-D space the next step is to use Principal 13 Component Analysis (PCA) to reduce the dimensionality and make the data more manageable. The training set x can then be approximated by x x + Pb (3.2) where P = (P)1jP)2j jP)t) are the t eigenvectors of the covariance matrix and b is the t-dimensional vector de ned as b = PT (x x) (3.3) The vector b consists of a set of parameters of a deformable model. The i?s give the variance of the ith parameter bi across the training set. It is then possible to generate new images that are variations on the model by choosing values in b and plugging back in to Equation 3.2. The lower the value of i the more pronounced the variation from the mean model will be. Figure 3.1 shows the models produced by varying the rst the elements of b by 3 p i. Each of these parameters corresponds to a di erent mode of the model. The last step in building the model is to come up with a feature vector for each landmark that can be compared against those of candidate images in order to form the initial guess at where the landmarks should be located. This is done by sampling the brightness intensity levels in the region around each landmark and storing them in a vector. Some methods sample only along a line running in the direction of the normal to the shape?s surface at that point, others sample a two-dimensional region 14 Mode 1 Mode 2 Mode 3 Figure 3.1: Models produced by varying the rst three bi?s by three standard deviations in either direction (image from [30]) around the landmark [31]. The value of each element in the vector is then replaced by the intensity gradient at that location and then the entire vector is divided by the mean of the absolute value of its elements. This serves to make the measure mostly invariant to illumination or coloration changes and thus depend primarily on the structure of the object. Carrying out this procedure for each each image in the training set and taking the mean produces the mean pro le vector g. The covariance matrix of all these vectors is then denoted by SG. Generating the mean pro le vectors and covariance matrices for each landmark in the image across the full training set completes our model. 15 3.2.1.2 Matching Model to Image The set of points produced by Equation 3.2 correspond to a model in normal- ized coordinate space. To match the model up with an individual image we must perform a renormalization to nd the position (Xt; Yt), orientation , and scale s of the model within the image. The model points within the image coordinates are then de ned as X, given by X = TXt;Yt;s; ( x + Pb) (3.4) where the Euclidian transformation function TXt;Yt;s; is de ned as TXt;Yt;s; 0 B B @ x y 1 C C A = 0 B B @ Xt Yt 1 C C A+ 0 B B @ s cos s sin s sin s cos 1 C C A 0 B B @ x y 1 C C A (3.5) To nd the best pose parameters of T to match a model instance X to a new set of image points Y we minimize the sum of square distances between corresponding image and model points. This task is carried out by minimizing the expression jY TXt;Yt;s; (x^ + Pb)j 2 (3.6) To generate the estimates for the landmark points in the candidate image, the cost function used by most ASM implementations is the minimum Mahalanobis 16 distance f1(g) given by f1(g) = (g g)TSG 1(g g) (3.7) where g is the candidate?s pro le and g is the mean pro le.[31] There have been some more recent improvements made to the original tech- nique that involve more complex comparison metrics or weighting functions that take into account edge data, but the overall process remains essentially the same. 3.2.1.3 Why ASM is Insu cient Upon implementing and testing the ASM algorithm using the IMM Image Database for training [32], several problems arose. 1. Even using the Viola-Jones face detector to narrow down the search region, the initial guess at lining up the model with the input image was often far enough o that it was never able to properly converge. Sometimes this happened only for speci c features, as with the eyebrows in Figure 3.2(a), and other times the entire model was mis-aligned. 2. Even with an accurate initial guess, there were often times where parts of the model would start to conform to the image in the wrong direction and then continue along that path. Given that the cost function in Equation 3.7 only searches a fairly localized area it is possible for it to produce false matches from a di erent part of the face that have a similar feature vector to the actual 17 correct location. Figure 3.2(b) shows one such example of a result that was completely distorted. One of our best results using ASM was that in Figure 3.2(a), and that with using an image from the same IMM database used for training (and therefore had brightness variations very similar to the trained model, as well as similar size). And yet even there the match was not perfect. If those shapes were used as the feature mask in our algorithm it is very possible that the edge regions at the corners of the mouth or eyes could accidentally be mistaken as blemishes (how this works will be explained later on). Or, had the subject any blemishe in the region directly above the eyebrows they would not have been detected. Looking at the comparison of the results from the original and improved ASM algorithm given in [31] (Figure 3.3), it is evident that even in the best case scenario where the model lines up accurately the edges still do not t the contours perfectly. Indeed, we can expect similar di culties with any model-based method of feature extraction. Any time a technique tries to t a model to the image rather than make use of all the edge data from the ground-up, there is a risk that not all of the important edges will be included. Our algorithm attempts to address this issue. 3.3 Our Algorithm The feature extraction needed for our purposes is at the same time both simpler and more di cult than most of the applications described at the beginning of the chapter. We do not care about storing where the actual features are in relation 18 (a) ASM Best result (b) ASM Failure Figure 3.2: Examples of the ASM algorithm applied to a test image from the IMM database (a) and one of our test images (b) 19 (a) Original ASM (b) Improved ASM Figure 3.3: Comparison of ASM output models using the original ASM algorithm and the improved algorithm of Seshadri and Savvides (image taken from [31]) to the rest of the face, or where the di erent parts of the features are (such as irises, eyelids, eyebrows, or lips) in relation to each other for some purpose such as expression recognition or eye-tracking. What we do care about is that every last edge or textured area that can be associated with a facial feature be labeled as such so as not to accidentally classify it as a blemish. We also need this classi cation to be as robust as possible across all face con gurations, lighting changes, facial expressions and skin colorations for the same reason. No matter how well developed, all existing model or matching-based methods of feature extraction tend to fail at some point and therefore would prove insu cient. It was therefore decided to base our algorithm after the most simplistic "model" possible for a face: all faces have two eyes, a nose and a mouth in roughly the same 20 relation to each other, and these regions are by de nition not smooth. Our algo- rithm identi es these rough feature regions based on the already detected face, nds the primary edge clusters within these regions and removes any outlying stragglers. The only way it is capable of failing is therefore in misclassifying any edges at the outskirts of these regions rather than missing entirely. 3.3.1 Step 1- Equalize and Convert to Grayscale Given that our algorithm is primarily based on edge and blob-detection tech- niques run on a grayscale image, our rst step is to perform some image manipula- tions so as to best emphasize the edges of any blemishes present. This is done by rst equalizing the image so as to increase the contrast between the skin and the blemish regions. This increased contrast can be seen in Figure 3.4. After equalizing, we then convert the image to grayscale. The standard method for doing this would be to just take the image intensity values from the YCbCr representation as seen in Figure 3.5(a). However, given our desire to maximize the contrast between blemishes and the surrounding skin region we can actually do better than this. The rest of Figure 3.5 shows a comparison between the Red, Green and Blue color channels present in the full color image. As expected, the Red channel contains the highest values and actually seems to make the picture look better, hiding some of the blemishes from the start. In contrast, the Blue channel seems darkest and causes the blemishes to pop out the most. This result proved to be true across a wide variety of test images with vastly di erent base skin 21 (a) Original Image (b) Equalized Image Figure 3.4: Comparison of original and equalized image colorations. We therefore select the Blue channel of the image to continue working with. 3.3.2 Step 2 - Illumination Compensation One di culty with processing photos of faces is that the same face can appear vastly di erent based on varying pose an lighting conditions. While pose can be di cult due to occluding and changing the relative positioning of facial features, lighting is an easier problem to compensate for. Pierrard et al. [4] and Raja et al. [3] discuss a method for illumination compensation based on homomorphic ltering, a variation of which we used in this paper. In general, images are represented by scalar intensity values I(x; y) that vary as a function of the image spatial coordinates. We make use of an underlying re ectance model wherein each pixel location (x; y) can be represented by a product 22 (a) Image Intensity (b) Red Channel (c) Green Channel (d) Blue Channel Figure 3.5: Comparison RGB image channels 23 of the amount of source illumination incident on the scene being imaged and the amount of light re ected back by objects in the scene. I(x; y) = R(x; y) L(x; y) (3.8) where L(x; y) is the incident source illumination and R(x; y) is the amount re ected. Thus to obtain a representation of the image that is theoretically independent of lighting conditions all we must do is nd R(x; y). However, given that all we know from the start is I(x; y) we cannot simply divide by L(x; y) to obtain our solution given that it is an unknown. First we must try and separate the re ectance and illumination from the intensity. To do this, the model suggests that we make use of the fact that illumination tends to change gradually across the face whereas re ectance tends to manifest itself primarily in high-frequency components. We therefore can approximate L(x; y) by a low-pass ltered version of the image denoted by Flp(I). This is done by a convolution with a Gaussian kernel of size proportional to the size of the face. Given that frequencies of function products are not directly separable, we carry out this operation in the log-domain, giving us: log(R(x; y)) = log(I(x; y)) log(L(x; y)) log(I(x; y)) [Flp(log(I))](x; y) (3.9) The re ectance approximation can then be achieved by simply exponentiating the result. As seen in Figure 3.6, the edge map generated from the re ectance shows far greater detail around the facial features and edges which will be very useful in 24 (a) Re ectance Map (b) Initial Edge Map (c) Re ectance Edge Map Figure 3.6: Comparison of edge maps produced by the image and image re ectance the next stage of our algorithm. 3.3.3 Step 3 - Edge Detection Once the best grayscale image has been extracted and adjusted for lighting variation, the next step is to calculate all of the edges contained therein. For this we use the edge detector developed by Canny in [33]. A brief overview of the steps in this method is given as follows: 1. The image is ltered with a 2-D Gaussian lter with = p 2 to suppress noise 2. A basic edge detection operator is used to calculate the derivatives in the horizontal and vertical direction at each point. These are then combined to nd the overall gradient direction. 3. Find the maximum gradient magnitude in the direction perpendicular to the 25 gradient direction found in the previous step. 4. Use a large threshold to determine all de nite edges, then trace continuous along these edges with a smaller threshold value to build the nal edge map. The result of this procedure can be seen in the edge maps in Figure 3.6. 3.3.4 Step 4 - Face Detection The Viola Jones [5] face detector is used to nd a bounding box for the face, as explained in the previous chapter. 3.3.5 Step 5- Feature Region Segmentation While the same form of Viola Jones classi er used in the previous step to locate faces could technically also be used for identifying facial feature locations, we found that such a method was not quite reliable enough for our purposes. When testing, there were often times when the classi er would simply fail to nd one of the eyes or other features. This was doubly likely when any of the eyes were closed or the mouth was in a strange expression, and even when it succeeded it only gave a bounding box that often did not fully encompass the feature in question. Given that our method requires 100% accuracy for this step in the process lest we risk falsely identifying a facial feature as a blemish we had to develop a better method. Earlier in the chapter we stated that our approach was made more robust by being non model-based. While this is certainly true in that we never compare an image to an existing model of a face or facial feature, we do still make use of prior 26 OpenCV Face Box h w h/5.5 w/4 w*3/4 h*3/4 h/2 h/3 h/1.5 Figure 3.7: The caption positioning knowledge in order to properly isolate the various features. This is made possible by the fact that we can safely assume that all potential images contain two eyes, a nose and a mouth and that while head size may vary, the relative positioning of these features stays roughly the same. Using the relative feature positioning as a function of the faces height and width as shown in Figure 3.7 we de ne bounding regions for the eyes, nose and mouth. 3.3.6 Step 6 - Calculate Center of Masses Given that the regions calculated in the previous step are only rough estimates our next step is to gure out exactly where the actual features lie within these regions. We do this by making use of the fact that the facial features tend to be 27 collections of sharp edges surrounded by smoother skin regions. This means that within the regions found in Step 5, each feature will be centered at the center of the largest cluster of edges found within that region. We perform these calculations by de ning regions Ri(x; y) as found in Step 5 taking binary values determined by the edge map found in Step 3. The center of masses can then be found by taking C(x)i = P y2Ri P x2Ri x Ri(x; y) P y2Ri P x2Ri Ri(x; y) (3.10) C(y)i = P y2Ri P x2Ri y Ri(x; y) P y2Ri P x2Ri Ri(x; y) (3.11) where C(x)i and C (y) i are the x and y components of the center of mass Ci of region Ri. 3.3.7 Step 7 - Remove Outliers The edges in each feature region will eventually be turned into a feature mask. Before we can do this we eliminate all outlying edge clusters that are more likely to be blemishes than part of the features. The same process is used for the eyes, nose and mouth regions but for the mouth we are able to use one additional classi cation parameter. Whereas in the eye and nose regions there are often circular edge clusters due to the iris or nostrils that can resemble those of circular blemishes, we can expect the edges that de ne the mouth to be composed primarily of continuous lines (see Figure 3.8). We can therefore assume that any edge cluster coming too close in 28 form to a circle will with high probability be a blemish region. To determine this measure of similarity we use the region?s eccentricity, the measure of how much any conic section deviates from being circular de ned as follows: = r 1 b2 a2 (3.12) where a is the length of the semi-major axis and b that of the semi-minor axis of the ellipse that has the same second-moments as the edge region. The value ranges between 0 and 1, with 0 representing a perfect circle and 1 a line segment. The full process is carried out as follows: 1. Build a list of all connected regions in the edge map and calculate their areas aj and centroid locations cj of each of them. 2. Calculate the weighted areas ~aj by dividing by the distance between each cen- troid and the center of mass Ci of the region Ri that contains the component. ~aj = aj 1 q (C(x)i c (x) j )2 + (C (y) i c (y) j )2 (3.13) This gives greater weight to edge regions closer to the region?s center of mass. 3. Remove from the list all regions with ~aj < s(h+w) 2 where h and w are the face?s height and width and s is a customizable sensitivity parameter. 4. For the mouth region, we instead remove from the list all regions smaller than the larger value of ~aj < s (h+w) 2 but enact an additional constraint that 29 (a) Initial Feature Edges (b) Removed Outliers Figure 3.8: Comparison of feature edge maps before and after removing outliers the region must also have an eccentricity of < :9 3.3.8 Step 8 - Build Feature Mask Once the nal feature edges have been found we use them to build a full feature mask of the entire feature regions that we want to preserve against alteration by the future blemish removal process. We do this by building a binary mask from the edge map seen in Figure 3.8 and circling each pixel contained therein with a circle with initial radius proportional to the size of the inputted face and dropping o proportional to distance from the feature?s center of mass. For each pixel: ri = round( w80) 1 rf = round(ri(1 dw )) (3.14) where w is the width of the face, d the Euclidian distance between the pixel and the center of mass, ri is the initial radius size and rf is the radius with the drop-o due to distance. The point of this drop-o is so that we ll as much of the interior 30 (a) Initial Circled Edges (b) Final Mask Figure 3.9: Comparison of Feature Mask before and after lling holes and circling edges of the feature region as possible while at the same time minimizing the amount of skin beyond the actual outermost feature-edge that we count as part of the feature. To ensure no gaps remain within the eye and mouth regions where edges may have been more sparse, we ll any holes left in the resulting binary mask. Finally, to eliminate any blemish regions apart from the features themselves that may have made it through the initial Remove Outliers step we again build a list of connected components and this time eliminate any with an area of less than h w400 . The results of this step are shown in Figure 3.9 31 Chapter 4 Creating the Skin Mask While the feature mask is su cient for showing us where not to search for blemishes, we still must build a skin mask to isolate the region where we should search. We could just use the inverse feature mask combined with the face region from the original face detection, but that only gives us a rectangular region not tted to facial contours and often clipping out portions of the skin. This is therefore insu cient for creating an accurate skin mask. It is, however, still useful. The rst method that we tried for extracting the skin map was to compare the RGB value for each pixel to the average value over the non-features region. To make sure that we did not get the background skewing our averages we made use of the recognized face region, in this case not caring that it clipped out part of the face because it still gives enough skin samples to get an accurate average. Once we found the mean RGB value, we calculated the Euclidian distance between this value and each pixel in the image. We then classi ed each pixel as either skin or non-skin based on an empirically determined threshold distance value. This method worked fairly well but sometimes misclassi ed shadowed regions of the face as non-skin pixels due to the darker values deviating too much from the average brighter ones. 32 (a) Skin Map (b) Final Skin Map Figure 4.1: Plot of the raw skin map and the result of combining it with the feature mask from Chapter 3 To address this issue we again turned to the re ectance map used in the previous chapter which already subtracted out the gradual changes in illumination. Using the re ectance values for calculating the average and distances gives us a more accurate result. As seen in Figure 4.1 however using this method alone would not be suitable for feature extraction. Combining this result with our feature mask produces the skin map which we then search for blemishes. Rather than just using a simple thresholding value to determine skin regions, it would be possible to use a graph segmentation algorithm such as Grab-Cut (as used by [3]. However, in addition to greatly increasing the computational complex- ity, it was determined by Pierrard and Veter [4] that while such a method would 33 generate slightly less scattered regions with smoother boundaries, it would also cut o large numbers of pixels in shaded regions. For our purposes, smoothness is an irrelevant issue whereas gaps in the skin-map could be highly detrimental. One nal method that we attempted to use was that of texture segmentation, again based on the work Pierrard and Vetter [4] (and also Efros and Leung [18]). This worked by taking swatches of skin from easily identi able cheek regions and using a texture segmentation algorithm to identify all other areas in the image with the same tex- ture, which would generally nd the rest of the skin regions. In our test images, however, given that they tend to start with extremely bad skin in the rst pace it turned out that that a ected the texture enough to render the method mostly ine ectual. 34 Chapter 5 Blemish Detection 5.1 Overview and Prior Research Once we have a working mask that leaves us with just the skin region, the next step is to then gure out which areas have a smooth, clear skin tone and which could use some improvement. While there has been a fair amount of research done into the eld of identifying prominent facial blemishes such as large moles or scars, this has primarily been carried out solely for the purpose of improving existing facial recognition techniques. As such, the existing methods only look for very large, clear marks that would appear across a variety of photos and lighting conditions and be less prone to changing with time. They also need to be able to distinguish between di erent marks and be able to classify and store the location and appearance of each one. Additionally, when in doubt these techniques tend towards discarding a potential blemish due to the potential small reduction in recognition accuracy being less costly than a false positive. Park et al. [2] used ASM combined with a commercial SDK to extract facial features followed by a Laplacian of Gaussian blob detector with a fairly high thresh- old to identify blemishes. They then classi ed and stored each mark according to six di erent categories based on morphology and coloration which they then used in a face matching database. 35 Ramesha et al. [3] used a Grab-Cut image segmentation technique to isolate the skin from features and background and a normalized cross-correlation (NCC) matching with a complement of Gaussian lter mask to get mole candidates. Mole candidates were accepted or rejected according to a threshold value based on mole size, darkness and uniqueness with respect to its surrounding region. Pierrard and Vetter [4] used a combination of building a 3D Morphable Model of the face and grayscale texture recognition to extract facial features and skin regions. They then used NCC matching with a Laplacian of Gaussian lter mask to determine blemish candidates. These candidates were then kept or discarded based on a saliency measure calculated by determining how unique that type of mark was to the surrounding region. Cho and Freeman [16] detected moles from within a hair-covered skin region on a human arm by using color thresholding to isolate skin regions, a Di erence of Gaussian lter to determine mole candidates and then a trained Support Vector Machine classi er to identify the actual moles. Barnes et al. [17] were able to detect blemishes on the skins of potatoes with 90% accuracy by using statistical information based on color and texture of hand labeled blemish regions to build a classi er based on an adaptive boosting algorithm (AdaBoost). 36 5.2 Our Algorithm Unlike mole-extraction algorithms used for facial recognition where the goal is to extract only prominent features unlikely to change or be corrupted by noise, our aim is to improve skin tones by eliminating any blemishes or markings on a image- by-image basis. Our goal is to develop a technique that does not just eliminate prominent blemishes surrounded by patches of clear skin. Instead, we aimed to isolate any aberrations or abnormalities that caused a deviation from the ideal, perfectly smooth skin tone and to eliminate them as seamlessly as possible. We started o using a Circular Hough Transform [34] to identify all circles within the skin region, and thus the majority of the blemishes, but it turned out to be very sensitive to varying input parameters as well as miss certain blemishes that were atypical in appearance. Similarly, training a SVM classi er proved to be inaccurate based on the extreme variation in size, shape and coloration of potential blemishes as well as large potential variance in subjects? skin-tone. 5.2.1 Step 1 - Blob Detection Similar to Cho and Freeman [16] and Park et al. [2], we used a Laplacian of Gaussian (LoG) blob detector as the rst step to identify any potential points of interest. When convolving an m n input image I(x; y) image with a LoG lter with standard deviation , it produces an m n output image B(x; y; ) with maximum values corresponding to the size of . Rather than just choosing speci c values based on our input face size and the expected size of individual blemishes, 37 we instead choose to detect blobs over an entire scale space, building a m n p matrix B(x; y; k) shown as follows: B(x; y; k) = 2 k 5 2 G(x; y; k) I(x; y); k = 1; 2; ; p; (5.1) with 2k 5 2 G(x; y; k) being the scale-normalized Laplacian of Gaussian operator and k = k 0 taking p values based on an initial value of 0 = p 2. Unlike Park et al. [2], we do not care about storing the precise size and location of each individual blemish, instead only needing to build a map of which pixels to paint over. We therefore immediately simplify this result by taking the maximum value of B over every scale at each pixel location, giving Bmax(x; y) = max B(x; y; k) (5.2) This produces a matrix whose magnitude at each pixel location (x; y) corre- sponds to the strength of the blob centered at that point. 5.2.2 Step 2 - Thresholding and Dilating As seen in gure 5.1(a), the raw plot of Bmax includes the entire image, not just the skin regions found earlier, and even in those areas it is far too sensitive in identifying the blobs. To address the rst issue we apply the skin mask found in Chapter 3, producing 5.1(b). To address the second, we introduce a threshold- ing value proportional to a percentage of the maximum blob intensity found in 38 Bmax(x; y) Bmax(x; y) = 8 >>< >>: 1 : 8(x; y)s:t:Bmax(x; y) > 0 : 8(x; y)s:t:Bmax(x; y) < (5.3) where the value = 15maxx;y Bmax(x; y) was experimentally determined to elimi- nate most of the noise while leaving behind the important blemishes. The result of this thresholding process is a binary mask taking values in the centers of most of the blemishes, where the blob detector output was strongest, but with the outer edges of the blemishes having been set as zero due to their values being below the threshold, as seen in 5.1(c). To adjust for this and make sure that the entire blemish area is covered we perform a dilation operation on the thresholded image. The grayscale dilation is de ned as: (B M A)(x; y) = max [B(x x0; y y0) + A(x0; y0)]j(x0; y0) 2 DA; (5.4) whereA is a structuring element (in this case a sphere with radius = (faceheight+ facewidth)=200) and B is the Bmax as de ned in Equation 5.3, DA is the domain of A. As seen in 5.1(d), this dilation operation increased the areas of the larger blobs more than the smaller ones which is perfect for covering the outlying areas of blemishes. 39 After dilating, we again perform an ANDing operation with the compliment feature mask found earlier to ensure that we did not accidentally cover any of the important features in the expansion. 5.2.3 Step 3 - Reject Bad Candidate Blemish Regions The Bmax(x; y) found in the previous step gives us all of the potential blemish regions. Based on the imperfect nature of the skin and feature extractions, this will occasionally include edges of the face, portions of the background or small parts of facial features. The next step in our algorithm attempts to use the size and shape of the candidate regions to eliminate all of the suspect blobs. This is done in a manner somewhat similar to the Remove Outliers step in section 3.3.7. Examining our test images to see what type of false candidates can occur showed certain similarities between most of the bad outputs. First, because they tend to be on the edges of the face or clothing they tend to be long and thin regions as opposed to the circular shape of most blemishes. Second, these edges are usually far larger than individual pimples would be. Finally, third, because they are on the edges of the face or even further out they are usually located relatively far from the image center. We use these three properties to build a classi er for determining if the blob detector output is a blemish or just some other edge region. As before, we can use the eccentricity to get a measure of how circular a region is. In this stage, however, there are often long arcs or L-shaped regions that represent the borders of the face or clothing as seen towards the bottom of 40 Figure 5.1(d). Given that the eccentricity is calculated based on the ellipse that encompasses the region it often will be fooled into thinking that these areas are actually circular. To deal with these cases we de ne a new metric that we call the Fill Measure M as the product of the region?s minor and major axis lengths divided by the region area. If we model the typical blemish as a perfectly circular region then this would give a value of M = (2r) (2r)=( r2) = 4= . In practice, we have found that most of the long L-shaped or curved edge regions tend to have a Fill Measure value of > 3 The classi er now works as follows: 1. Get a listing of all the connected regions in Bmax(x; y) along with their centroid locations ci, areas ai, eccentricities i, and major and minor axis lengths ki and li respectively. 2. De ne weighted areas ~ai for region i as the product of the unweighted area ai and the ratio of distance from image center Cface to region centroid cj and the total height of the face h. ~ai = ai q (C(x)face c (x) i )2 + (C (y) face c (y) i )2 h (5.5) 3. De ne B as the set of all connected regions with weighted areas ~ai > (h+w)=5 where h and w are the face height and width. These will be removed as long as they are not too circular, which would indicate a clumping of pimples rather 41 than a long edge. 4. Calculate the ll measure Mi for each connected region. If Mi < 2:5 and i < :94 then remove region i from B. 5. Remove all remaining regions in B from Bmax(x; y) i c(x)i c (y) i Mi 1 12.240 314.736 0.985 1.370 2 24.725 297.024 0.983 1.337 3 26.156 164.928 0.963 1.311 4 33.330 315.176 0.990 1.531 5 46.000 356.842 0.911 3.494 6 121.580 324.162 0.980 1.681 7 126.144 281.694 0.806 1.656 8 144.214 232.643 0.677 1.740 9 186.669 357.629 0.931 2.653 10 198.634 323.435 0.972 1.412 11 207.647 305.912 0.990 1.386 Table 5.1: Region properties for all the large regions in the image, with the bolded numbers being above or below their respective thresholds As seen in Figure 5.2 and Table 5.1, there were a total of 11 big regions found initially as part of B. Of those, only regions 5, 7, 8 and 9 all had an eccentricity of lower than .94 with the rest being long lines of the ear, the jaw and the shirt. As described earlier, however, there were two regions (5 and 9) that seemed to have a circular eccentricity but upon closer examination are still just edge regions with an L-shape. These are easily accounted for by examining the ll measure which shows a drastic jump by each. We therefore only remove regions 7 and 8 from B. This is as desired because upon visual inspection we see that those two regions look like they easily could just be clumped blemishes. 42 (a) Raw LoG (b) LoG Skin Mask (c) LoG Thresholded (d) LoG Dilated Figure 5.1: Depiction of initial blob detection, masking, thresholding and dilating 43 Figure 5.2: Binary blemish map with the lighter colors representing the large regions and the medium grey those that are circular enough to not be eliminated 44 (a) Skin Map (b) Final Skin Map Figure 5.3: Comparison of our original example image and the same image with overlaid identi ed blemish regions. 45 We can see from Figure 5.3 the advantage of our blob-based approach over a machine learning or pattern recognition based classi er approach such as that used by Barnes [17]. Rather than just looking for prominent blemishes, we expand our search to include any aberrations in smooth skin tone that would detract from the aesthetic appeal. Even compared against the others like Peirrard [4], Raja [3] or Park et al [2]. who also used a LoG detector, because we don?t care about any individual classi cation of the blemishes we are able to use far ner detail in our search. 46 Chapter 6 Inpainting 6.1 Introduction 6.1.1 Inpainting Overview The term "inpainting" originally referred to times when painting would get cracked or damaged and an artist was hired to go through and physically paint over the defects. This would sometimes even extend to modifying the painting in small ways such as adding or removing objects. In modern times inpainting essentially refers to the same thing, just instead of happening in the physical realm it usually applies to digital images or video. Oftentimes when an image or video is transmitted there will be some blocks of data that are lost. Rather than just displaying them with the missing portions, techniques have been developed to automatically ll in the holes based on information taken from the rest of the image. Or, instead of image blocks being lost, a user will sometimes want to eliminate an object from a photo and have the background be automatically and seamlessly lled in (see Figure 6.1). Given that we want our algorithm to be able to run smoothly and quickly on batches of photos of all sizes, in this section we will compare examine four separate inpainting techniques ranging from computationally simple to complex and deter- 47 mine their relative advantages and disadvantages. The rst two, Iterative Blurring and Interpolation, utilize mathematical combinations of existing pixels to come up with best-guess values for the missing data. These techniques usually reproduce the overall colors fairly well but by their very nature tend to produce results without any high frequency information. The second two, Texture Synthesis and Exemplar-Based Inpainting, replace the missing pixels by directly sampling from the most likely portions of the uncor- rupted image using various di erent lling orders. Because of this direct sampling these methods are capable of propagating all of the texture and edge information and therefore tend to produce better results. They do however tend to be far more computationally intensive than the rst two techniques. It should be noted that given that the end goal of inpainting is to produce a nal result that looks best to human perceptions, it is very di cult to formulate a mathematical representation for the quality of the inpainted outputs. Attempting to use MAD (Mean Absolute Di erence) between the inpainted images and either the original or a manually xed-up version did not produce results in line with human evaluations. Finally, unlike the general case where an inpainting technique would have to be robust enough to ll in any number of pixels based on any type of background, our requirements are somewhat more narrow. Ideally, we will always be lling in small holes with a smooth skin texture. In practice, because of the imperfect nature of our blemish detection it can sometimes still turn up some false positives and therefore we want our inpainting to be robust enough to seamlessly cover up any 48 Figure 6.1: Inpainting Example such mistakes. Nonetheless we will still be taking some advantage of this limited scope in several of the techniques that we cover. 6.2 Inpainting Techniques 6.2.1 Iterative Blurring The simplest technique that we analyzed is that of iterative blurring. Useful more for repairing images corrupted by noise than for lling in full objects, the technique nonetheless produces surprisingly good results for its simplicity. To start, we take an image and corrupt it with 70% noise (Figure 6.2). We then perform the following steps to try and repair it: 1. Construct a mask made up of noise locations in the image 49 2. Fill the hole locations with a medium-intensity value (this just helps to speed up convergence) 3. De ne a blurring lter b = 1=9 1=9 1=9 1=9 1=9 1=9 1=9 1=9 1=9 4. Convolve b with 5. Use to restore the undamaged pixels in to their original, unblurred values 6. Repeat until the MAD (Mean Absolute Di erence) between the original and damaged image converges 6.2.1.1 Blurring Results As seen in Figure 6.3, the MAD seems to converge in between ten and twenty iterations to produce what looks to be just a blurred version of the original uncor- rupted image. Applying the same technique to another example of our blemish detection algorithm, shown in Figure 6.4, it can be seen that while it does seem to converge at least somewhat, it takes far longer to do so. This is as expected given that the holes in the image due to removed pimples are far larger than those corrupted portions of the cameraman image. It therefore takes longer for the blurring to fully ll up the gaps. Analyzing the resulting image we nd that it seems to have worked fairly well to cover up the small blemish regions but the larger regions remain fairly blotchy as well as a bit smoother than the actual skin tone in the surrounding areas. 50 Figure 6.2: Original Cameraman Image (left) Image Corrupted with 70% noise (right) Figure 6.3: Iterative Blurring Result After 50 Iterations Additionally, while it works decently well with this image where all the blem- ishes are in the middle of skin regions, were we to have one where the blemish was at the edge then the blurring would blur in part of the background. Even if we restricted the blurring to pixels included in the skin map this might still make it take an extremely long time for some of the regions to get lled in. 51 (a) Original Image (b) 10 Blur Iterations (c) 50 Iterations (d) 150 Iterations Figure 6.4: Comparison of varying levels of iterative blurring 52 6.2.2 Interpolation Similar to blurring, with interpolation we attempt to mathematically calculate pixel values to use to ll the holes based on the values at the edges of the hole region. We do this by formulating a partial di erential equation (PDE) that is assumed to apply in the region to be inpainted, with boundary values taken from the perimeter. To simplify the calculation the PDE is then approximated using nite di erence methods and a large linear system of equations is solved for the missing elements in the array. To give a simple example of this technique [35] consider the following 3x4 matrix of values: A = 2 6 6 6 6 6 6 4 0 0 0 0 1 NaN NaN 4 2 3 5 8 3 7 7 7 7 7 7 5 (6.1) For an equally spaced grid, we can approximate the Lagrange?s equation using nite di erences to partial derivatives, implying that any element in the grid can be replaced by an average of its four neighbors. This gives us: A(2; 2) = (0 + 3 + 1 + A(2; 3))=4 (6.2) A(2; 3) = (0 + 5 + A(2; 2) + 4)=4 (6.3) Solving this simple system of equations gives us the following values (plotted in Figure 6.5): A(2; 2) = 1:667 (6.4) 53 1 1.5 2 2.5 3 3.5 4 1 1.5 2 2.5 3 0 1 2 3 4 5 6 7 8 A simply inpainted surface Figure 6.5: 3D Interpolation Example A(2; 3) = 2:667 (6.5) This same technique can be expanded to ll much larger gaps. 6.2.2.1 Interpolation Results Figure 6.6 shows what happens when we use interpolation to ll in the holes in our face image. This result seems to be fairly similar to that obtained through the previous blurring method but the colors are slightly more accurate and the results blend in better. It is also far less computationally intensive than the repeated blurring operation. To illustrate how this technique would work at lling in larger regions we show Figure 6.7. Here it becomes obvious that while the overall coloration is maintained fairly well, all high frequency information such as edges and textures fail to propagate inwards leaving a very blurred looking result. Overall, interpolation still su ers from most of the same problems of the blur- ring. It bleeds color along the edges, does not maintain contours and loses all high 54 (a) Original Image (b) Interpolated Image Figure 6.6: Comparison of the original face image with an interpolated result frequency texture data. However, in cases where the inputed image is of a relatively low resolution such that the pores of the skin are not individually distinguishable (i.e. the smooth areas of the skin are actually just smooth color transitions) then this method could be a viable option should computation time be an important concern. This is due to the fact that almost all the regions that we will need to inpaint can be assumed to be comparatively small. 6.2.3 Texture Synthesis The fundamental problem with the previous two techniques is that they at- tempt to recreate pixel values based on a mathematical combination of surrounding 55 (a) Original Bungee Image (b) Interpolated Bungee image Figure 6.7: Comparison of the original face image with an interpolated result intensities, thus introducing a blurring e ect and essentially low-pass ltering the ll regions. To get around this, Efros and Leung developed a completely di erent approach based on replacing hole values with values directly sampled from elsewhere in the image [18]. Although initially designed for texture synthesis, this technique provides a very elegant solution to the inpainting problem. We begin by modeling the texture as a Markov Random Field (MRF). This is equivalent to making the assumption that the probability distribution of the brightness values for a given pixel is dependent solely on the values of the pixels in a close window around the pixel and independent of the rest of the image. This window is generally a square region of a size selected by the user to represent the largest regular feature of the texture. The algorithm then works as follows. 56 If P is a pixel at the image hole boundary and Q is a pixel lled with valid information, we select the value of pixel Q such that (Q), or the pixels in the window around Q, is most similar to the window (P ). This can be expresses as an optimization problem: Output(P ) = V alue(Q); P 2 ; Q =2 ; Q = argmin(d( (P ); (Q))); (6.6) Where d( (P ); (Q)) is the Sum of Squared Distances (SSD) between the windows (Q) and (P ) (comparing only valid pixels): d( 1; 2) = X i X j j 1(i; j) 2(i; j)j 2; (6.7) where the indices i; j span across the selection windows. In practice, we generally nd multiple patches that are fairly similar in SSD and will therefore choose our Q randomly from amongst them (see Figure 6.8). The algorithm lls in pixels at the hole boundary in an onion-peel fashion, starting at the outermost layer all around and working inwards from there. 6.2.3.1 Texture Synthesis Results Applying this algorithm to our test images produces Figures 6.9-6.10. The result in the rst image is far better than that achieved through interpolation, with the result looking like it may have actually come from a real photograph. There are some small artifacts (some trees over the shed and black spot in the water) but 57 Figure 6.8: Overview of Efros and Leungs algorithm ( gure taken from [18]). Given a sample texture image (left), a new image is being synthesized one pixel at a time (right). To synthesize a pixel, the algorithm rst nds all neighborhoods in the sample image (boxes on the left) that are similar to the pixels neighborhood (box on the right) and then randomly chooses one neighborhood and takes its center to be the newly synthesized pixel. overall the textures have been repeated to form a fairly seamless inpainted image. Similarly, for the face images most of the hole regions have been painted in perfectly. The only problems are around the upper hair regions where some of the edges of the hair have been misclassi ed as blemish regions and inpainted. Overall the gaps are not very noticeable but upon close examination it can be seen that some of the tips of the hair have been painted in with skin textures instead of hair While this technique is a vast improvement upon the rst two discussed, it still su ers from the following three problems: 1. The high computational cost of an exhaustive search. -We are able to mitigate this di culty through modifying Efros and Le- ung?s original algorithm. Whereas they have no prior knowledge as to the form of the the image being inpainted and therefore must search through the entire thing to nd a matching texture patch, we know that we want the blemish region to look like the surrounding skin. We can therefore restrict our search 58 (a) Original Image With Mask (b) Texture Synthesized Image Figure 6.9: Texture synthesis in a large block window to a fairly small area surrounding the blemish. 2. It requires manual selection of a window size corresponding to the largest expected texture element. -This is circumvented by using a heuristic such as selecting a window equal to half the size of the largest blemish 3. The onion-peel lling order sometimes has problems with properly propagating contours in the image -This issue will be addressed with the next technique, Exemplar Based Inpainting 59 (a) Inpainted Image (b)Original Image (c) Blemish Mask (d) Inpainted Image Figure 6.10: Two examples of images inpainted using texture synthesis 60 6.2.4 Exemplar-Based Inpainting The work of Efros and Leung was later improved upon by Criminisi et al. [19] in two major fashions. First, they replaced the onion-peel ll method by introducing a priority-based lling order that lls in pixels along image contours before at regions. This allows them to correctly inpaint boundaries that otherwise would have become corrupted with the original formulation. Second, they greatly improved the speed of the algorithm by copying whole patches of pixels rather than just one pixel at a time. The algorithm works as follows. De ne a source region and target ll region with boundary (see Figure 6.11). Choose a pixel p along the boundary with surrounding window p (with window size still chosen by the user). We then compute the ll priority for that pixel with the following equations: P (p) = C(p)D(p) (6.8) C(p) = P q2 p T C(q) j pj (6.9) D(p) = jrI?p npj ; (6.10) where P (p) is the ll priority, C(p) is the con dence term, D(p) is the data term, np is the normal vector to the ll region boundary, is a normalization factor (usually 255 for standard images) and rI?p is a vector in the direction of the image contour. The con dence term C(p) can be viewed as a measure of the quantity of reliable information surrounding the pixel p. The goal is to try and rst ll in patches that 61 Figure 6.11: Criminisi Notation Diagram ( gure taken from [19]) Notation diagram. Given the patch p, np is the normal to the contour of the target region and 5I?p is the isophote (direction and intensity) at point p. The entire image is denoted with I. have the greatest number of pixels in their window either from the original image or at least already lled in. This automatically targets corner or thin tendril regions. This is balanced out with a data term D(p). The data term represents the strength of the isohpote (region of uniform intensity) owing into the ll region. Thus, when there is a strong contour owing perpendicularly into the ll region this term will have a large e ect on the priority and cause those pixels to be lled in rst. The steps of the algorithm are as follows (taken from [19]): Extract the manually selected initial from 0 Repeat until done: 1. Identify the ll front t. If t = ;, exit. 2. Compute priorities P (p) 8p 2 t 3. Find the patch p^ with the maximum priority, i.e. p^jp^ = argmaxp2 tP (p) 62 4. Find the exemplar (best t patch) q^ 2 that minimizes d( p^; q^) 5. Copy image data from q^ to p^. 6. Update C(p) 8p jp 2 p^ T 6.2.4.1 Exemplar-Based Inpainting Results Looking at Figure 6.12, we see that the Exemplar-Based Inpainting result for the bungee photo is even better than that achieved by the texture synthesis algorithm. The linear structure in the middle is propagated very smoothly with only a small amount of tree textures overlaid. The two plots at the bottom of the gure represent the con dence and data terms used for determining ll priorities. It is apparent how the con dence is generally highest by the outer edges and that the bottom portion of the data term is very weak. This makes sense given that there are few contours in the water region and the most centered at the building in the middle. In Figure 6.13 we see a the side-by-side comparison of the previous texture- based inpainting result and the exemplar-based one. In the majority of the image where the blemishes fall in the middle of smooth-toned skin regions with no edges to propagate the two algorithms perform almost identically. Towards the top hair regions, however, it is possible to start to see some divergence. Looking at Figure 6.13(d) we see some subtly brighter colored regions indi- cating the strong contours of the hair tendrils owing perpendicularly into the ll region. This causes those areas to be lled in rst, keeping the edges sharp. Indeed, 63 a close examination of Figures 6.13 (a)-(b) shows that where in (a) the hair strands seem a bit blurred and rounded o at the edge, in (b) they are propagated onwards and sharply terminated in a manner more closely resembling the original image. 64 Figure 6.12: Exemplar Inpainting Bungee Results 65 (a) Efros Texture Inpainted Image (b)Exemplar-Based Inpainted Image (c) Con dence Term Plot (d) Data Term Plot Figure 6.13: Comparison of the purely texture-based result and the exemplar-based result along with data and con dence plots 66 Chapter 7 Conclusion and Future Work In this thesis, we presented a start-to- nish approach for isolating the blem- ishes in the face region of an inputted photograph and smoothly lling them in. We began by exploring various face detection methods, settling on using the Viola Jones detector [5]. We then explored various existing methods for isolating facial features, de- voting particular attention to that of Active Shape Modeling (ASM) [15], only to determine that even ASM did not give accurate enough contour-tracking to suit our purposes. We then presented our own algorithm for extracting the features. We started o by maximizing the contrast between the skin, blemishes and features by equalizing the image and adjusting for illumination by computing the re ectance. We then used a Canny edge-detector [33] combined with geometric knowledge of rel- ative feature positioning to identify edge-clusters belonging to each feature. These clusters were then trimmed down to exclude any edges with a low probability of be- longing to a main feature based on positioning and various morphological properties. Circling each edge and closing o holes then left the nal feature mask. After using thresholding with the re ectance values of known skin regions to isolate the remaining skin from the background we then turned our attention to detecting the blemishes themselves. We used a Laplacian of Gaussian blob detector 67 across a full scale-space of potential blemish sizes to identify all potential blem- ishes. Thresholding and dilation operations were then carried out to remove the less prominent results. Finally, morphological and position properties were again used to reject any blobs found that were unlikely to be blemishes. In the last portion of the thesis we discussed four di erent methods for in- painting blemish regions once they were found. We determined that interpolation could be useful for minimizing computation time in speci c instances but that the overall highest quality result was produced by a modi ed version of Criminisi et al.?s exemplar-based inpainting technique [19] that uses texture samples from the surrounding image regions to ll in missing pixels, prioritizing contours in the pixel ll-order. 7.1 Future work Right now the entire algorithm is designed to operate almost as well on grayscale images as it does on color. This is bene cial when dealing with grayscale inputs, but most of the time the input will be a color image. Taking into account the added color information should be able to increase the overall performance by a reasonable margin. One way to do this would be to combine edge and LoG data for all three channels, weighting the blue channel for the blemish detection stage and the red channel for the feature extraction. As seen in Figure A.2(d), one issue that sometimes surfaced in our testing was the blemish detection improperly counting parts of the ears as blemishes. This 68 could be dealt with by adding the ears to the feature map using either more geo- metric properties (albeit with some di cult due to frequent occlusion and variance in shape), an object detector, or some combination thereof. A simple solution could be to just discard any smaller connected regions in roughly the correct positions from the skin map, as it seems from our testing that the ears tend to show up as separate regions. Right now the algorithm works very well across varying test subject ethnicities and lighting conditions and even seems to work on people with glasses, but as of right now it is not con gured to deal with anything but a frontal pose. Pose could be accounted for by calculating relative eye size and positioning similar to that used by Sohail and Bhattacharya [29]. Once the pose is known the rough locations of the other facial features can then be calculated and the rest of the algorithm would work without a problem. Other improvements that could be experimented with are downsampling at the beginning to detect prominent edges such as for the features or face border or using a LoG with a very high starting radius to detect initial feature locations. Overall, we have developed a start-to- nish facial blemish removal algorithm that, as is, could be used to save thousands of man-hours of work in image editing programs. 69 Appendix A Experimental Results 70 (a) (b) (c) (d) Figure A.1: Algorithm outputs for a variety of input images 71 (a) (b) (c) (d) Figure A.2: More algorithm outputs for a variety of input images 72 Bibliography [1] M.R. Peres, The Focal Encyclopedia of Photography, Focal Press. Taylor & Francis, 2007. [2] Unsang Park, Shengcai Liao, Brendan Klare, Jimmy Voss, and Anil K. Jain, \Face nder: Filtering a large face database using scars, marks and tattoos", Tech. Rep. MSU-CSE-11-15, Department of Computer Science, Michigan State University, East Lansing, Michigan, August 2011. [3] K Ramesha, K Raja, K Venugopal, and L Patnaik, \Template based mole detection for face recognition", International Journal of computer theory and Engineering, vol. 2, no. 5, pp. 1793{8201, 2010. [4] Jean sbastien Pierrard and Thomas Vetter, \Skin detail analysis for face recog- nition", in In Proc. IEEE Conference on Computer Vision and Pattern Recog- nition, Miineapolis, MN, 2007, pp. 1{8. [5] Paul Viola and Michael Jones, \Rapid object detection using a boosted cascade of simple features", in Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on. IEEE, 2001, vol. 1, pp. I{511. [6] T Rajpathaka, R Kumarb, and E Schwartzb, \Eye detection using morphologi- cal and color image processing", in Proceeding of Florida Conference on Recent Advances in Robotics, 2009. [7] A.L. Yuille, D.S. Cohen, and P.W. Hallinan, \Feature extraction from faces using deformable templates", in Computer Vision and Pattern Recognition, 1989. Proceedings CVPR ?89., IEEE Computer Society Conference on, San Diego, CA, Jun, pp. 104{109. [8] Kin-Man Lam and Hong Yan, \Locating and extracting the eye in human face images", Pattern Recognition, vol. 29, no. 5, pp. 771 { 779, 1996. [9] A. Pentland, B. Moghaddam, and T. Starner, \View-based and modular eigenspaces for face recognition", in Computer Vision and Pattern Recogni- tion, 1994. Proceedings CVPR ?94., 1994 IEEE Computer Society Conference on, Jun, pp. 84{91. [10] Chin-Chuan Han, Hong-Yuan Mark Liao, Gwo-Jong Yu, and Liang-Hua Chen, \Fast face detection via morphology-based pre-processing", Pattern Recogni- tion, vol. 33, no. 10, pp. 1701{1712, 2000. [11] Guo Can Feng and Pong C. Yuen, \Multi-cues eye detection on gray intensity image", Pattern Recognition, vol. 34, no. 5, pp. 1033 { 1046, 2001. 73 [12] S. Phimoltares, C. Lursinsap, and K. Chamnongthai, \Locating essential facial features using neural visual model", in Machine Learning and Cybernetics, 2002. Proceedings. 2002 International Conference on, Nov., vol. 4, pp. 1914{ 1919 vol.4. [13] Ian Fasel, Bret Fortenberry, and Javier Movellan, \A generative framework for real time object detection and classi cation", Comput. Vis. Image Underst., vol. 98, no. 1, pp. 182{210, Apr. 2005. [14] Anil K Jain and Unsang Park, \Facial marks: Soft biometric for face recogni- tion", in Image Processing (ICIP), 2009 16th IEEE International Conference on. IEEE, 2009, pp. 37{40. [15] T.F. Cootes, C.J. Taylor, D.H. Cooper, and J. Graham, \Active shape models- their training and application", Computer Vision and Image Understanding, vol. 61, no. 1, pp. 38 { 59, 1995. [16] Taeg Sang Cho, William T Freeman, and Hensin Tsao, \A reliable skin mole localization scheme", in Computer Vision, 2007. ICCV 2007. IEEE 11th In- ternational Conference on. IEEE, 2007, pp. 1{8. [17] Michael Barnes, Tom Duckett, Grzegorz Cielniak, Graeme Stroud, and Glyn Harper, \Visual detection of blemishes in potatoes using minimalist boosted classi ers", Journal of Food Engineering, vol. 98, no. 3, pp. 339 { 346, 2010. [18] Alexei Efros and Thomas Leung, \Texture synthesis by non-parametric sam- pling", in In International Conference on Computer Vision, 1999, pp. 1033{ 1038. [19] Antonio Criminisi, Patrick P erez, and Kentaro Toyama, \Region lling and object removal by exemplar-based image inpainting", Image Processing, IEEE Transactions on, vol. 13, no. 9, pp. 1200{1212, 2004. [20] Ming-Hsuan Yang, D. Kriegman, and N. Ahuja, \Detecting faces in images: a survey", Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 24, no. 1, pp. 34{58, Jan. [21] G. Bradski, \The OpenCV Library", Dr. Dobb?s Journal of Software Tools, 2000. [22] Rainer Lienhart and Jochen Maydt, \An extended set of haar-like features for rapid object detection", in Image Processing. 2002. Proceedings. 2002 Interna- tional Conference on. IEEE, 2002, vol. 1, pp. I{900. [23] S.Charles Brubaker, Jianxin Wu, Jie Sun, MatthewD. Mullin, and JamesM. Rehg, \On the design of cascades of boosted ensembles for face detection", International Journal of Computer Vision, vol. 77, pp. 65{86, 2008. 74 [24] Bernhard Fr oba and Andreas Ernst, \Face detection with the modi ed cen- sus transform", in Proceedings of the Sixth IEEE international conference on Automatic face and gesture recognition, Washington, DC, 2004, pp. 91{96. [25] Bo Wu, Haizhou Ai, Chang Huang, and Shihong Lao, \Fast rotation invariant multi-view face detection based on real adaboost", in Proceedings of Sixth IEEE International Conference on Automatic Face and Gesture Recognition, May, 2004, pp. 79{84. [26] Minh-Tri Pham and Tat-Jen Cham, \Online learning asymmetric boosted clas- si ers for object detection", in IEEE Computer Society Conference on Com- puter Vision and Pattern Recognition, Minneapolis, MN, June, 2007, pp. 1{8. [27] Hamed Masnadi-Shirazi and Nuno Vasconcelos, \Asymmetric boosting", in Proceedings of the 24th international conference on Machine learning. ACM, 2007, pp. 609{619. [28] Yoav Freund and Robert E. Schapire, \Experiments with a new boosting algo- rithm", 1996. [29] AbuSayeedMd. Sohail and Prabir Bhattacharya, \Detection of facial feature points using anthropometric face model", in Signal Processing for Image En- hancement and Multimedia Processing, Ernesto Damiani, Kokou Ytongnon, Peter Schelkens, Albert Dipanda, Louis Legrand, and Richard Chbeir, Eds., vol. 31 of Multimedia Systems and Applications Series, pp. 189{200. Springer US, 2008. [30] Tim Cootes, \An introduction to active shape models", Image Processing and Analysis, pp. 223{248, 2000. [31] K. Seshadri and M. Savvides, \Robust modi ed active shape model for auto- matic facial landmark annotation of frontal faces", in IEEE International Con- ference on Biometrics: Theory, Applications, and Systems BTAS ?09., Wash- ington D.C., September, 2009, pp. 1{8. [32] J. Fagertun and M. B. Stegmann, \The IMM frontal face database", Tech. Rep., Informatics and Mathematical Modelling, Technical University of Denmark, DTU, Richard Petersens Plads, Building 321, DK-2800 Kgs. Lyngby, 2005. [33] John Canny, \A computational approach to edge detection", Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. PAMI-8, no. 6, pp. 679{ 698, 1986. [34] Dhiraj, J. L. Raheja, Vaishali Singh, Sreekanth Pusapati, and Ashutosh Gupta, \Article: Design and implementation of oating point based system for pellet size distribution using circular hough transform", International Journal of Computer Applications, vol. 62, no. 13, pp. 9{15, January 2013. 75 [35] John. AD?Errico, \Inpaint nans", http://www.mathworks.com/ matlabcentral/fileexchange/4551, 2012, [Online; accessed 19-March-2012]. 76