ABSTRACT 
 
 
 
 
Title of Document: PROSPECTIVE HEAD MOVEMENT 
CORRECTION FOR HIGH-RESOLUTION 
MRI USING AN IN-BORE OPTICAL 
TRACKING SYSTEM.   
  
 Lei Qin, Doctor of Philosophy, 2009 
  
Directed By: Professor Yang Tao, Fischell Department of 
Bioengineering 
 
 
In MRI of the human brain, subject motion is a major cause of magnetic resonance 
image quality degradation. To compensate the effects of head motion during data 
acquisition, an in-bore optical motion tracking system is proposed. The system 
comprises one or two MR compatible infrared cameras that are fixed on a holder right 
above and in front of the head coil. The resulting close proximity of the cameras to 
the object allows precise tracking of its movement. During image acquisition, the 
MRI scanner uses this tracking information to prospectively compensate for head 
motion by adjusting gradient field direction and RF phase and frequency. 
Experiments performed on subjects demonstrate the system?s robustness, exhibiting 
an accuracy of better than 0.1mm and 0.15?. 
 
 
 
 
  
 
 
 
 
PROSPECTIVE HEAD MOVEMENT CORRECTION FOR HIGH-RESOLUTION 
MRI USING AN IN-BORE OPTICAL TRACKING SYSTEM.      
 
 
 
By 
 
 
Lei Qin 
 
 
 
 
 
Dissertation submitted to the Faculty of the Graduate School of the  
University of Maryland, College Park, in partial fulfillment 
of the requirements for the degree of 
Doctor of Philosophy 
2009 
 
 
 
 
 
 
 
 
 
 
Advisory Committee: 
Professor Yang Tao, Chair 
Professor Rama Chellappa 
Dr.           Jeff H Duyn 
Professor Elliot R McVeigh 
Professor Hubert Montas 
 
 
 
 
 
 
 
  
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
? Copyright by 
Lei Qin 
2009 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 ii 
 
Acknowledgements 
I would like to express my deepest gratitude to my supervisor, Dr. Jeff H Duyn, who 
has been an invaluable mentor for the past thee and half years at NIH. His insightful 
guidance and advice have made substantial differences in my work, as well as my 
life. His confidence in my capability has enabled much of my creative work. His 
encouragement and understanding have made lots of difficulties seem easier.  
I am also deeply grateful to my supervisor, Dr. Yang Tao, for seeing my 
research potential and pushing me to complete my PhD degree. His wide knowledge 
and his logical way of thinking have been of great value for me. His understanding 
and guidance have provided a good basis for the present thesis. His support and 
encouragement have been invaluable for the past four years.  
I would also like to thank Dr. Rama Chellappa, Dr. Elliot R McVeigh and Dr. 
Hubert Montas for serving on my committee and for their time, support, and helpful 
advices during my research and academic development. 
I also owe my most sincere gratitude to Dr. Alan Koretsky, for his guidance, 
encouragement, understanding and generous help in both professional and personal 
matters. He made me believe in myself when I was in doubt, showed me the right 
path when I was lost. I cannot thank him enough for all he did for me. 
I am also grateful to Dr. John Andrew Derbyshire from NHLBI for his help on 
my understanding of MR physics and sequence programming. I will never forget the 
long hours he spent with me on the scanner. He gave me tremendous inspiration in 
my work and gave me help whenever I needed it.  
 
 iii 
 
I enjoyed every single day I worked in the Advance MRI lab. Every one in the 
lab is nice, generous and helpful to me. I want to thank Dr. Peter van Gelderen, Dr. 
Jacco de Zwart, Dr. Jonghoo Lee, Dr. Karin Shmueli, Dr. Bing Yao, Dr. Masaki 
Fukunaga, Dr. Marta Bianciardi, Dr. Shuming Wang, Dr. Joe Murphy-Boesch , Dr. 
Tieqiang Li and Susan Fulton. Special thanks go to Peter and Tieqiang for their 
guidance during my first steps into MR pulse sequence programming; and to Jacco 
for his help on my numerous computer problems. Only because of the help from all of 
you can I make this thesis possible. 
 I would also like to express my appreciation to all present and former 
members of the Bio-imaging and Machine Vision Laboratory at University of 
Maryland for their constant suggestions and friendship, including Bin Zhu, Lu Jiang, 
Xin Chen, Hansong Jing, Xuemei Chen, Abby Vogel, Angela Vargas, and Gary 
Seibel.  
 I would like to extend my deepest gratitude to my parents, who have always 
believed in me and knew I could accomplish the goal of being the first PhD in my 
family. And to my husband Dr. Fenghua Jin, thanks for the enduring love, 
encouragement, inspiration and support through all these years. His presence with me 
has made the whole work more enjoyable. 
 Last, but far from least, I want to express my thanks to all the people who 
have made this thesis possible and because of whom my graduate experience has 
been one that I will cherish forever.  
 
 iv 
 
Table of Contents 
 
 
Acknowledgements ....................................................................................................... ii 
Table of Contents ......................................................................................................... iv 
List of Figures .............................................................................................................. vi 
Chapter 1 Introduction .................................................................................................. 1 
Chapter 2 Review of Literature ..................................................................................... 2 
2.1 Principles of Magnetic Resonance Imaging ....................................................... 2 
2.1.1 Nuclear Magnetic Resonance Signal ........................................................... 2 
2.1.2 Spatial Localization for Imaging ................................................................. 5 
2.1.3 Fast Imaging ? Echo Planar Imaging ......................................................... 10 
2.2 Motion Artifacts ................................................................................................ 11 
2.2.1 Reason of motion artifacts ......................................................................... 11 
2.2.2 Correction for Motion Artifacts ................................................................. 13 
2.3 Stereo Tracking System .................................................................................... 17 
2.3.1 Camera Calibration .................................................................................... 17 
2.3.2 3D Reconstruction ..................................................................................... 19 
2.3.3 Object Tracking ......................................................................................... 20 
2.3.4 3D Motion Estimation................................................................................ 21 
Chapter 3 Objective .................................................................................................... 23 
Chapter 4 Prospective Motion Correction for MRI using a Stereo Tracking System 24 
4.1 System Setup ..................................................................................................... 24 
4.2 System Calibration ............................................................................................ 26 
4.2.1 CCD Camera Calibration ........................................................................... 27 
4.2.2 Stereo Calibration ...................................................................................... 36 
4.2.3 3D Reconstruction from Stereo ................................................................. 39 
4.2.4 Integrated Optical and MRI System Calibration ....................................... 42 
4.3 Real-time Motion Calculation........................................................................... 46 
4.3.1 Kanade-Lucas-Tomasi Tracker .................................................................. 46 
4.3.2 Stereo Matching ......................................................................................... 48 
4.3.3 Rigid Motion Estimation............................................................................ 51 
4.4 Motion Calculation in MRI coordinates ........................................................... 54 
4.4.1 Motion in MRI physical coordinates ......................................................... 54 
4.4.2 Motion in MRI logical coordinates ............................................................ 55 
4.5 Motion Correction for MRI .............................................................................. 57 
4.5.1 Rotation Correction .................................................................................... 57 
4.5.2 Slice Selection Displacement Correction................................................... 58 
4.5.3 Frequency Encoding Displacement Correction ......................................... 66 
4.5.4 Phase Encoding Displacement Correction ................................................. 69 
4.5.5 Summary of Position Compensation ......................................................... 72 
4.5.6 K-space Line Re-acquisition ...................................................................... 73 
4.6 Software Flowchart for the Motion Correction System .................................... 73 
Chapter 5 Results and Discussion with the Stereo Motion Correction System for MRI
..................................................................................................................................... 75 
 
 v 
 
5.1 Camera Calibration Test ................................................................................... 75 
5.2 Stereo Reconstruction Validation ..................................................................... 76 
5.3 Tracker Noise Evaluation ................................................................................. 77 
5.4 Calibration Accuracy Between Cameras and MRI Coordinate System ........... 79 
5.5 Pulse Sequence Test with a Phantom................................................................ 80 
5.6 System Performance Test ................................................................................. 83 
5.7 Volunteer Study ................................................................................................ 84 
5.7.1 Breathing Related Motion .......................................................................... 85 
5.7.2 Minimal Motion ......................................................................................... 86 
5.7.3 Large Stepwise Motion .............................................................................. 87 
5.7.4 Slow Motion............................................................................................... 88 
5.8 Discussion ......................................................................................................... 90 
Chapter 6 Prospective Motion Correction for MRI using a Single Camera ............... 94 
6.1 System Setup ..................................................................................................... 94 
6.2 Method .............................................................................................................. 95 
6.2.1 Motion Computation .................................................................................. 97 
6.2.2 Best Match Search ..................................................................................... 98 
6.3 Result .............................................................................................................. 100 
6.3.1 Accuracy of the Motion Estimation ......................................................... 100 
6.3.2 Volunteer Study ....................................................................................... 103 
6.4 Discussion ....................................................................................................... 104 
Chapter 7 Conclusion ................................................................................................ 107 
Chapter 8 Future Work ............................................................................................. 109 
Chapter 9 Appendices ............................................................................................... 111 
9.1 Derivation of motion in logical coordinate system ......................................... 111 
9.2 MRI Coordinates from DICOM Image........................................................... 112 
9.3 Gauss-Newton Algorithm ............................................................................... 114 
9.4 Stereo System QA ........................................................................................... 117 
Chapter 10 Bibliography ........................................................................................... 119 
 
 
 
 
 
 
 
 
 
 
 
 vi 
 
List of Figures 
 
Figure 2-1    Slice selection. Activating a given magnetic-field gradient creates a 
linear correspondence between the magnetic-field strength and the position along 
the gradient. An RF pulse with center frequency c?  excites spins located at 
position cz ; the bandwidth of ??  and the gradient strength determine the slice 
thickness z? . ......................................................................................................... 6 
Figure 2-2    Sequence diagram for a 2D gradient echo imaging sequence. RF pulse is 
a sinc function. Gz is the slice selection z-gradient. Gy is the phase encoding y-
gradient, pictured as a series of horizontal lines to denote that it is being stepped 
regularly through increasing values during different repetition periods. Gx is the 
frequency encoding x-gradient and S is the echo. ................................................. 7 
Figure 2-3    k-space trajectory for 2D gradient echo imaging sequence. The signal 
starts from kx=ky=0 after each slice selection rephasing lobe. Phase encoding 
gradient moves the k-space trajectory vertically to a ky depending on the current 
Gy (red vertical line). The dephasing lobe of Gx transverses the trajectory 
horizontally to kx,min (red horizontal line). The reversed read gradient then brings 
the spins to kx,max through a complete set of kx values (red arrows). The 
experiment is repeated by changing Gy to obtain a complete set of data (black 
arrows). .................................................................................................................. 9 
Figure 2-4   (a) An EPI sequence with blipped phase encoding gradient.  (b) K-space 
trajectory for the corresponding EPI sequence. ................................................... 11 
Figure 2-5    An axial brain image exhibits ghosting due to head motion. Since the 
movement is random, the ghosts are smeared out along the phase encoding 
direction (left/right). ............................................................................................ 12 
Figure 2-6    A high resolution T2* weighted GRE brain image. The resolution is 
200?200?1000 micrometer, and the scan time is about 5 minutes. ................... 13 
Figure 2-7    Navigator sequence and its application for motion correction. (a) X-
NAV sequence for x-axis displacement measurement. The NAV echo is 
interleaved into the imaging sequence and is similar to an image echo, except 
that no phase encoding is applied. (b) A transverse section of the legs of a 
volunteer. Left leg moved from side to side while the right leg stayed stationary. 
The artifact caused by the left leg motion was marked with the arrow. (c) The X-
NAV data obtained during the scan. The motion of the left leg and the static 
position of the right leg were clearly shown. (d) Motion corrected images using 
the NAV data from (c). The motion artifact of the left leg was eliminated. 
(Figures from Ehman and Felmlee, 1989) ........................................................... 15 
Figure 2-8    k-space trajectory. (a) projection reconstruction; (b) spiral; (c) 
PROPELLER. ...................................................................................................... 16 
Figure 4-1    Experimental setup. The top pane is an illustration of the position of the 
cameras and head coil in the MRI scanner bore. The bottom images show the real 
instruments. Cameras are highlighted with the red ellipse. ................................. 25 
Figure 4-2    Calibration model. (a) Schematics of the phantom; (b) View of the actual 
phantom surface. .................................................................................................. 27 
 
 vii 
 
Figure 4-3    Pinhole camera geometry. OC is the camera center and also the origin of 
the camera coordinate system, Op is the principle point, M is a point in the space 
which has different coordinates in the world and camera coordinate systems, and 
md and mf are the image of M in the image plane with and without lens 
distortion. ............................................................................................................. 28 
Figure 4-4    Camera calibration using a square target. (a) Projection of a square 
ABCD to the image plane; (b) The image and the four vanishing points { 1~V , 2~V , 
3
~V , 
4
~V  }, and the horizon line
H?
~ . ....................................................................... 31 
Figure 4-5    Two views of the calibration target. The red crosses mark the extracted 
corners used for calibration, which have the number of 100 for each view. ....... 36 
Figure 4-6    The epipolar geometry. M is a point in space, Oc and Oc? are the two 
camera centers. The camera baseline intersects each image plane at the epipoles e 
and e?. The camera centers, 3D space point M and its images m and m? lie in a 
common plane, called epipolar plane. The two lines me and m?e? are epipolar 
lines. ..................................................................................................................... 37 
Figure 4-7    Calibration between the optical system and MRI.  The calibration 
phantom was put in the field of view of both the cameras and the MRI scanner.43 
Figure 4-8    Two images from the 3D scan of the calibration phantom. The red 
crossings mark the corners.  The blue arrow marks an air bubble inside the 
phantom. .............................................................................................................. 44 
Figure 4-9    Phantom corner position estimation in MRI coordinate system. (a) shows 
one side of a prism. Blue dots are the observed corner position along that side; 
green line is the estimated side and the red circle is the estimated corner position 
on the phantom surface. (b) shows the alignment between the extracted internal 
structure (green) of the phantom and an ideal model (red). The green lines are the 
side of the prisms and the red circles are the estimated corners position in MRI.
 ............................................................................................................................. 45 
Figure 4-10    Epipolar lines  estimated after stereo calibration. (a) shows image from 
one of the camera and two randomly chosen points. (b) shows the image from the 
other camera and the estimated epipolar lines corresponding to the two points in 
(a) respectively. .................................................................................................... 49 
Figure 4-11    MRI physical and logical coordinate system. {OM, XM, YM, ZM} is MRI 
physical coordinate system, which doesn?t change after the MRI system is set up. 
{OL, XL, YL, ZL} is MRI logical coordinate system, which changes with different 
prescribed scan plane. XL is the frequency encoding direction, YL is phase 
encoding direction and ZL is slice selection direction. ........................................ 56 
Figure 4-12    The change of RF frequency to select the same slice when motion 
occurs.  Under the same slice-selective gradient, to follow the motion, the only 
way is to change the RF frequency. ..................................................................... 59 
Figure 4-13    Slice selection sequences (RF pulse and z-gradient) and the three paths 
that generate the RF pulse, i.e. center frequency path, delta frequency path and 
phase path.  t=0 is the start of TR and only one TR is shown here. .................... 60 
Figure 4-14    Hybrid magnitude and phase image of the cylindrical phantom.  Phase 
encoding gradient was zero. Frequency encoding was left-right direction. ........ 62 
Figure 4-15    The hybrid space image after applying a linear RF frequency. Phase 
encoding gradient is set to zero. For each RF excitation, its frequency changes 
 
 viii 
 
linearly with a step ?? . Perform FFT on each measured echo in k-space can we 
get a hybrid image.  The magnitude of each line in the frequency encoding 
direction (shown with a rectangle shape) represents the object profile. The red 
line is to select an arbitrary point in the hybrid image, and the phases of those 
points also change linearly with a step ?? .  Complex means to combine the 
magnitude and phase of the point in that line. ..................................................... 63 
Figure 4-16    The hybrid phase image of the phantom when phase encoding gradient 
is set to zero. RF transmit frequency varies linearly along the phase encoding 
direction. (a) Phase correction is calculated with t? =264 s? ; (b) Phase 
correction is calculated with t? =180 s? ; (c) Phase correction is calculated with 
t? =230 s? ; (d) Phase correction is calculated with t? =480 s? . ....................... 65 
Figure 4-17    Frequency encoding gradient and the acquisition window associated 
with it. Also the three paths that generate the receive frequency and phase. ...... 67 
Figure 4-18    Hybrid phase image used for measurement of 't? . Phase encoding 
gradient is set to zero. RF receive frequency varies linearly along the phase 
encoding direction with a step of 100Hz. (a) Phase correction is calculated with 
't? =144 s? ; (b) Phase correction is calculated with 't? =100 s? ; (c) Phase 
correction is calculated with 't? =200 s? . (d) Phase correction is calculated with 
't? =300 s? . ......................................................................................................... 69 
Figure 4-19    Software flowchart for the prospective head movement correction 
system for MRI. Calibration is all done offline and programmed with MATLAB. 
Real-time correction was programmed with C++ running in Linux. N is the 
number of total phase encoding steps for a scan. ................................................ 74 
Figure 5-1    Distortion and correction for the camera. (a) Distortion model for the 
camera, containing radial and tangential distortion.  The arrows and number on 
the curve mark the distortion direction and ......................................................... 76 
Figure 5-2    Noise of the tracker. The maximum x translation error as a function of 
the number of tracked feature points. .................................................................. 78 
Figure 5-3    Rotation and Translation parameters estimated from MRI images (solid 
lines)and from camera images transformed into MRI coordinates(dotted lines).  
The two sets of lines matching very well means the calibration between camera 
and MRI is accurate. ............................................................................................ 80 
Figure 5-5    The MR hybrid image without phase encoding gradient. Small breathing 
induced head movement was observed during the scan  of the volunteer. (a) 
without motion correction; (b) with motion correction. ...................................... 84 
Figure 5-6    Prospective breathing induced motion correction for the 2D gradient 
echo acquisition. Images (a,b) show the evolution of motion parameters reported 
by the tracker during the acquisition of images (c) and (d) respectively. Image (c) 
was scanned without motion correction, and (d) was with motion correction; (e) 
and (f) were zoomed in on an area in the frontal brain indicated by the red square 
in images (c) and (d) respectively. ....................................................................... 86 
Figure 5-7    Prospective motion correction data obtained for a volunteer that 
attempted to minimize head motion. Images (a,b) show the motion parameters 
reported by the tracker during the acquisition of images (c) and (d) respectively. 
Image (c) was scanned without motion correction, and (d) was with motion 
 
 ix 
 
correction; (e) and (f) magnify an area in the front of the brain (the area marked 
by the red square in (c) and (d) respectively). ..................................................... 87 
Figure 5-8   Example of the performance of the prospective motion correction method 
in the presence of large stepwise motion during the acquisition of multi-shot 2D 
gradient echo images on a normal volunteer. Images (a-c) show the motion 
parameters reported by the tracker during the acquisition of images (d-f) 
respectively. Image (d) was scanned without intentional motion during the scan, 
whereas (e,f) were scanned in the presence of motion; (e) was acquired without 
correction but (f) was acquired using the proposed method. (g-i) show additional 
detail in an area in the front of the brain that was indicated by the red square in 
images (d-f). ......................................................................................................... 88 
Figure 5-9    Prospective motion correction in the presence of slow motion. The 
motion parameters reported by the tracker during the scan of images (d-f) are 
displayed in (a-c), respectively. Image (d) was acquired in the absence of 
intentional head motion, (e,f) during slow head motion, where, unlike (e), (f) was 
acquired with the proposed correction method. A magnification of the area in the 
red box in (d-f) is shown in (g-i). ........................................................................ 89 
Figure 6-1    System setup. The camera was fixed right on top of the head coil. ....... 95 
Figure 6-2    Method of the motion estimation using a single camera. N training EPI 
and camera images were acquired. From EPI images, 6DOF motion parameters 
were calculated from registration for each training data. During the real-time 
scan, the most similar camera image in the training set was found for the newly 
acquired camera image. The corresponding motion parameters from EPI were 
sent to MRI pulse sequence for prospective motion correction. ......................... 96 
Figure 6-3    Motion parameters from the training data set. The total number of 
training images is 145. The motion is mainly rotation around z axis and 
translation in x axis. ........................................................................................... 101 
Figure 6-4    Correlation of two randomly chosen images with all training images. 
Red crossings mark the chosen image, (a) the correlation between image No. 47 
with others; (b) the correlation between image No. 100 with others. ................ 101 
Figure 6-5    Prospective correction of large motion. (a) was acquired without 
intentional motion during the scan. (b c) were scanned in the presence of motion; 
(b) was acquired using the proposed method but (c) was without correction. (d,e) 
show the evolution of motion parameters during the acquisition of image (b,c), 
respectively. ....................................................................................................... 104 
Figure 9-1    3D reconstruction error if camera parameters change.  The solid lines 
projection is correct and the dotted projection is the result from different camera 
parameters with a small focal length and principle point difference. It?s obvious 
that the3D estimation magnify the camera parameters distinction. ................... 117 
Figure 9-2    3D position estimation for a  checker board pattern. Every circle 
represents a corner of the checker board. Red color shows position estimation 
from the left camera; blue color shows position estimation from the right camera; 
and green color shows those from the stereo. (a) when correct calibration 
parameters were used. (b) when focal length was changed with only 0.1mm. . 118 
.
 
 1 
 
Chapter 1  Introduction 
 
Magnetic resonance imaging (MRI) has been widely used as a noninvasive clinical 
and research modality for the study of human anatomy. However, subject motion 
during scanning remains a severe problem and may degrade image quality below 
levels acceptable for clinical diagnosis. Recent improvements that yield high spatial 
resolution MRI of the brain (~ 0.2 mm) make this problem more acute, since for this 
application the tolerance to motion is reduced, and scan time is increased. The latter 
makes it more difficult for the subject to maintain the same position throughout the 
scan, which is especially problematic for children and patients.  
To compensate the effects of head motion during data acquisition, an in-bore 
optical motion tracking system is proposed. The system comprises one or two MR 
compatible infrared cameras that are fixed on a holder right above and in front of the 
head coil. The resulting close proximity of the camera(s) to the object allows precise 
tracking of its movement. During image acquisition, the MRI scanner uses this 
tracking information to prospectively compensate for head motion by adjusting 
gradient field direction and RF phase and frequency. Experiments performed on 
subjects demonstrate the system was able to improve MR image quality if motion 
occurred during acquisition. 
 
 2 
 
Chapter 2  Review of Literature 
 
Magnetic resonance imaging (MRI) has been widely used as a noninvasive clinical 
and research modality for the study of human anatomy. It exploits the phenomenon of 
nuclear magnetic resonance (NMR) in an external magnetic field, whereby nuclei 
absorb and re-emit certain radio frequency (RF) waves during change in nuclear spin 
state. Image formation using NMR signals was developed by Lauterbur and 
Mansfield (Lauterbur, 1973; Mansfield and Grannell, 1973) in 1973 based on spatial 
encoding principles, which won them the 2003 Nobel Prize in Physiology and 
Medicine.  Since then, MRI has undergone dramatic improvements in all the features 
that define image quality, such as resolution, signal-to-noise ratio (SNR), contrast 
enhancemant and speed. Specific structures, such as arteries, lesions, white matter 
fiber tracts, can also be visualized by manipulating RF fields and local magnetic field. 
 
2.1 Principles of Magnetic Resonance Imaging 
2.1.1 Nuclear Magnetic Resonance Signal 
The nucleus is composed of protons and neutrons, each of which intrinsically 
possesses angular momentum J , often called spin (Liang et al., 2000; Gerlach and 
Stern, 1924). Nuclei with an odd number of protons and/or an odd number of 
neutrons have a net spin. Because the nucleus is positively charged and possesses 
angular momentum, it possesses a magnetic moment ?  also (Rabi et al., 1938). The 
relationship between the spin angular momentum and magnetic moment is 
 
 3 
 
J?? =                                                                                                                      [2.1] 
 
Where ?  is the gyromagnetic ratio. A related constant  ?  is also widely used, which 
is defined as 
?
??
2=                                                                                                                      [2.2] 
The value of ?  or ?  is nucleus-dependent. For hydrogen (proton), ?  is 2.675?108 
rad/s/T, ?  is 42.58MHz/T. Because of its high concentration in the body in water and 
fat molecules, and their relatively high gyromagnetic ratio, the hydrogen proton is the 
source of signal for most clinical MRI exams. 
In the absence of an externally applied magnetic field, the hydrogen spins in 
biological materials have random orientations, therefore the vector sum of the 
magnetic moments, often referred to as net magnetization, denoted by 0M , is zero.  
However, if an external magnetic field 0B  is applied, the spins will take only a 
discrete number of orientations, which equals to twice the nuclear-spin value plus 
one.  For protons with a spin of ?, ?  has two possible orientations with respect to 
0B  direction.  
Since spins in different orientations have different energy of interaction with 
the external magnetic field 0B , two energy states can be measured for protons, a 
high- and a low-energy state. The high-energy state is when spin is aligned opposing 
0B , or spin-down ?E , while the low-energy state is when spin is aligned with 0B , or 
spin-up ?E . At equilibrium, the relative populations of the up and down states are 
given by the Boltzmann distribution as: 
 
 4 
 
kT
Bh
kT
Bh
N
N 00 1exp ?? ???
?
??
?
??=
?
?                                                                               [2.3] 
Where h is Planck?s constant with the value h =1.0546?10-34J.s; k is Boltzmann?s 
constant, with the value k=1.381?10-23J/K; T is the absolute temperature of the 
sample in Kelvin.  
Since the net magnetization 0M  is the vector sum of the individual magnetic 
moments, the excess of spins in the spin-up state results in an 0M  aligned with the 
external field B0. To measure MRI signal by the RF receiver coil, 0M  needs to be 
tipped away from the longitudinal direction (B0 direction) to the transverse plane 
(orthogonal to B0). This can be achieved by applying a radiofrequency (RF) pulse at 
the Larmor frequency,  
00 B?? =                                                                                                                   [2.4] 
The RF pulse is normally turned on for a few microseconds or milliseconds. 
The created B1 magnetic field makes the spins to precess, As a result, the net 
magnetization is tipped away from B0 direction, creating a measurable transverse 
component xyM . 
At equilibrium, the fractional population difference kTBh 0?  is very small, which 
is about 1 in 100,000 nuclei at 1.5T. Because the population difference is so small, 
the equilibrium net magnetization 0M  is also small, making the measured MR signal 
xyM  small. Therefore, to get a better signal-to-noise ratio (SNR) image, a stronger B0 
is desirable. 
 
 
 5 
 
2.1.2 Spatial Localization for Imaging 
A basic example of spatial localization is selective excitation and Fourier spatial 
encoding. To form an image, the first step is to localize the signals by exciting the 
magnetization only within a specified slice or slab of body (Bernstein et al., 2004; 
Lauterbur, 1973; Mansfield and Grannell, 1973). This slice selection is achieved by 
applying the RF pulse in the presence of a magnetic field gradient Gz. As given by the 
Larmor equation, the precession frequency is proportional to the field strength. The 
gradient therefore creates a variation in the Larmor frequency along the direction of 
the gradient. So, as shown in Figure 2-1, the RF pulse with a specific frequency c?  
excites only spins at a specific position cz because only those spins possess a resonant 
frequency that equals to the frequency of the applied RF pulse. The selection of 
another slice can be accomplished by changing the RF frequency, for example in 
Figure 2-1, increasing the RF frequency will excite a slice more superior while 
decreasing the RF frequency will select a slice more inferior. The thickness of the 
slice is determined by the bandwidth of the RF pulse and the amplitude of the 
gradient, as: 
zG
z ? ??=?                                                                                                                 [2.5] 
Where z?  is the slice thickness, Gz is the applied gradient field strength and ??  is 
the bandwidth of the RF pulse. 
 
 6 
 
 
Figure 2-1    Slice selection. Activating a given magnetic-field gradient creates a linear 
correspondence between the magnetic-field strength and the position along the gradient. An RF 
pulse with center frequency c?  excites spins located at position cz ; the bandwidth of ??  and 
the gradient strength determine the slice thickness z? . 
 
After the slice selection gradient is turned off, a phase encoding gradient (Gy) 
can be implemented. The magnetization accumulates a y-dependent phase when the 
y-gradient is kept on for a time y? : 
yG yyy ??? =                                                                                                              [2.6] 
The encoded phase can be written in terms of the k-space spatial frequency (Twieg, 
1983) in y direction: 
yk yy ?? 2=   with  yyy Gk ??=                                                                                 [2.7] 
The gradient Gy varies each time after excitation in a step-like fashion, to 
gather enough information for localizing the spin density in y direction. 
 
 7 
 
The frequency encoding gradient Gx is applied after Gy. A negative dephasing 
lobe is followed by a positive lobe, during which an echo of the magnetization is 
formed and the signal is measured. The accumulated x-dependent phase is: 
xkxtG xxxx ??? 2==   with  xxx tGk ?=                                                                    [2.8] 
The gradient Gx is a constant, and the time tx is a variable along the measurement 
period.  
These spatial encoding events can be depicted in a graphical format, called a 
pulse sequence. Figure 2-2 shows the basic gradient echo (GRE) sequence described 
above. 
 
Figure 2-2    Sequence diagram for a 2D gradient echo imaging sequence. RF pulse is a sinc 
function. Gz is the slice selection z-gradient. Gy is the phase encoding y-gradient, pictured as a 
series of horizontal lines to denote that it is being stepped regularly through increasing values 
during different repetition periods. Gx is the frequency encoding x-gradient and S is the 
measured echo signal. 
 
The spin density ),,( 0zyx?  and the measured signal ),( yx kks  satisfy the 
following equation: 
 
 8 
 
dxdyykxkizyxkks yx
fov
yx ))(2exp(),,(),( 0 +?= ?? ??                                                [2.9] 
Therefore, sufficient k-space coverage is needed to reconstruct the image of the spin 
density with inverse 2D Fourier transform, 
yxyx
fov
yx dkdkykxkikkszyx ))(2exp(),(),,( 0 += ?? ??                                                      [2.10] 
The k-space trajectory of the GRE imaging sequence is illustrated in Figure 
2-3. The initial phase is 0== yx kk  after each slice selection, corresponding to the 
center of the k-space. A phase encoding gradient moves the k-space trajectory 
vertically to a ky depending on the current Gy (red vertical line). The dephasing lobe 
of the frequency encoding gradient transverses the trajectory horizontally to kx,min (red 
horizontal line). The positive read gradient then brings the spins to kx,max, with signal 
being measured (red arrows). This gives one line in k-space, so the experiment is 
repeated by stepped changing Gy to obtain a complete set of data (black arrows).    
 
 9 
 
 
Figure 2-3    k-space trajectory for 2D gradient echo imaging sequence. The signal starts from 
kx=ky=0 after each slice selection rephasing lobe. Phase encoding gradient moves the k-space 
trajectory vertically to a ky depending on the current Gy (red vertical line). The dephasing lobe of 
Gx transverses the trajectory horizontally to kx,min (red horizontal line). The reversed read 
gradient then brings the spins to kx,max through a complete set of kx values (red arrows). The 
experiment is repeated by changing Gy to obtain a complete set of data (black arrows).   
 
 
The MR signal is collected by uniformly sampling points in the k-space rather 
than by continuous monitoring, leading to discrete k-space coverage. 
( )???
=
?
=
?+????=
1
0
1
0
)(2exp),(),(
M
m
N
n
yxyx yknxkmiynxmkks ??                                   [2.11] 
 Where M and N are the number of sampled k-space data points in x and y direction; 
x? and y?  are the pixel size. Therefore, the image of the spin densities can be 
reconstructed from the discrete inverse Fourier transform of the k-space data. 
 
 10 
 
The total scan time required for 2D GRE acquisition is described by the 
following equation: 
NEXMTRTacq ??=                                                                                              [2.12]    
Where TR equals the repetition time and NEX equals the number of repetitions used 
(signal averaging). For a scan with TR 1s, phase encoding 256, and NEX 2, the total 
scan time is about 5 minutes. A higher resolution scan using much more phase 
encoding can prolong the scan time even more. Fast imaging strategies are therefore 
developed. 
 
2.1.3 Fast Imaging ? Echo Planar Imaging 
The first ultra high-speed imaging technique was proposed by Mansfield in 1977 
(Mansfield, 1977), named Echo Planar Imaging (EPI). Since then, various EPI 
techniques have been proposed.  
EPI encodes complete image position information in a single acquisition line 
rather than phase encoding a single line of k-space with every TR interval. Therefore, 
with a single excitation, an image can be formed. A GRE-EPI sequence is shown in 
Figure 2-4(a). Phase encoding of individual gradient echoes is through the use of a 
series of blipped Gy pulses. Figure 2-4(b) illustrates the k-space trajectory of the EPI 
sequence.  
 
 11 
 
 
Figure 2-4   (a) An EPI sequence with blipped phase encoding gradient.  (b) K-space trajectory 
for the corresponding EPI sequence. 
 
Because of the longer readout, EPI suffers more than GRE from various types 
of artifacts. In regions of high field inhomogeneities or at interfaces between tissues 
of different susceptibilities, image distortions or signal drop-out can be generated. 
The odd and even echoes shown in Figure 2-4(b) may have slight timing mismatches 
caused by imperfections in the gradients, which translate into phase errors and 
manifest themselves as duplicate images of the main image along the phase direction. 
 
2.2 Motion Artifacts 
2.2.1 Origin of motion artifacts 
For a slow high resolution structural scan, the head motion of an uncooperative 
volunteer causes motion artifacts primarily in the phase-encoding direction. The 
frequency encoding direction is less affected since the k-space data corresponding to 
the readout direction are collected over a relatively short period of a few milliseconds 
or less, and can be regarded as instantaneous on the time scale of most physiologic 
 
 12 
 
motion. However, in the phase encoding direction, the positions are encoded by a 
step-wise changing magnetic field gradient prior to the acquisition of each line of k-
space. Therefore, the time interval between neighboring two points in phase encoding 
direction is TR, which ranges from tens of milliseconds or several seconds and cannot 
be regarded as instantaneous. The motion will cause additional phase shifts to the 
spins from line to line, which will corrupt the spatial encoding and give rise to 
ghosting. 
Head motion is usually random motion, which makes the artifacts smeared out 
along the phase encoding direction and can obscure pathology (Figure 2-5). 
 
 
Figure 2-5    An axial brain image exhibits ghosting due to head motion. Since the movement is 
random, the ghosts are smeared out along the phase encoding direction (left/right).  
 
 
Recent improvements that yield high spatial resolution MRI of the brain (~ 
0.2 mm, Figure 2-6) make this problem more acute, since for this application the 
tolerance to motion is reduced, and scan time is increased (Duyn et al., 2007; van 
Gelderen et al., 2007). The latter makes it more difficult for the subject to maintain 
 
 
 13 
 
the same position throughout the scan, which is especially problematic for children 
and patients. 
 
Figure 2-6    A high resolution T2* weighted GRE brain image. The resolution is 200?200?1000 
micrometer, and the scan time is about 5 minutes. 
 
 
2.2.2 Correction for Motion Artifacts 
Several methods have been proposed to solve the head movement problem in MRI. 
All model head movement as rigid body motion with six degrees of freedom (DOF), 
namely three rotations and three translations along the MRI coordinate system. These 
parameters are then used to either retrospectively or prospectively compensate for the 
effects of motion on the image data.  
Retrospective motion correction addresses motion artifacts after the 
acquisition of a complete set of raw image data. While this might work well for in-
plane motion, it is generally inadequate for through-plane motion, primarily because 
it cannot correct for the effects of this motion on the local magnetization history (i.e. 
 
 
 14 
 
changes in saturation level of longitudinal magnetization due to motion-induced 
changes in the image-slice location). To avoid this problem prospective motion 
correction techniques have been proposed recently (Lee et al., 1998; Lee et al., 1996; 
Thesen et al., 2000; van der Kouwe et al., 2006; Ward et al., 2000). They track the 
head motion and rectify the acquisition planes correspondingly, by adjusting gradient 
direction and RF phases and frequencies. This avoids problems related to the effects 
of motion on spin history.  
The object motion parameters used by prospective and retrospective 
correction methods can be derived from MRI data using image or navigator based 
methods. Image based methods use image registration algorithms to detect the motion 
parameters (Friston et al., 1995; Friston et al., 1996; Jenkinson et al., 2002; Thesen et 
al., 2000). These techniques, while straightforward, can only calculate motion after 
acquiring a volume and are therefore always lagging the motion. Also, any ghosting 
and blurring artifacts that motion may cause may affect the accuracy of the 
registration algorithms. 
Navigator based motion correction methods acquire a motion-sensitive 
reference signal with the image (Ehman and Felmlee, 1989; Fu et al., 1995; Lee et al., 
1998; Lee et al., 1996; van der Kouwe et al., 2006; Barnwell et al., 2007; Kadah et 
al., 2004). The earliest navigator method was capable of detecting only one-
dimensional (1D) translation (Ehman and Felmlee, 1989) by employing a frequency 
encoding gradient but no phase encoding gradient before image acquisition. Soon 
after, orbital navigators (Fu et al., 1995; Ward et al., 2000), spherical navigators 
(Welch et al., 2002; Wyatt et al., 2005; Petrie et al., 2005), cloverleaf navigators (van 
 
 15 
 
der Kouwe et al., 2006) and other navigators (Spuentrup et al., 2002; Thiel et al., 
2002) (Wang et al., 2005; Pipe et al., 2002)were proposed to allow full tracking of 3D 
motion. However, the extra time required for measuring the navigator echoes led to 
an increased scan time. 
 
Figure 2-7    Navigator sequence and its application for motion correction. (a) X-NAV sequence 
for x-axis displacement measurement. The NAV echo is interleaved into the imaging sequence 
and is similar to an image echo, except that no phase encoding is applied. (b) A transverse section 
of the legs of a volunteer. Left leg moved from side to side while the right leg stayed stationary. 
The artifact caused by the left leg motion was marked with the arrow. (c) The X-NAV data 
obtained during the scan. The motion of the left leg and the static position of the right leg were 
clearly shown. (d) Motion corrected images using the NAV data from (c). The motion artifact of 
the left leg was eliminated. (source: Ehman and Felmlee, 1989)   
 
Self-navigating methods can also be used to retrospectively correct motion 
when combined with particular image acquisition techniques including projection 
acquisition (Glover and Noll, 1993; Glover and Pauly, 1992; Kim et al., 2008), spiral 
acquisition (Glover and Lee, 1995; Liu et al., 2004; Meyer et al., 1992; Hu and 
Glover, 2007), and PROPELLER (Pipe, 1999; Pipe and Zwart, 2006; Wang et al., 
2005; Pipe et al., 2002). These all involve the collection of redundant MRI data, 
  (a) 
   
         (b)              (c)         (d) 
 
 16 
 
usually including the central region of the k-space. The over-sampled central region 
of k-space provides intrinsic averaging of image features that reduces motion 
artifacts, and can be further used to correct spatial inconsistencies in position, 
rotation, and phase between acquisitions. One drawback of these methods is the 
increased scan time related to redundant k-space sampling. 
 
Figure 2-8    k-space trajectory. (a) projection; (b) spiral; (c) PROPELLER. 
 
Alternatively, external motion tracking systems can be used for MRI motion 
correction. These tracking systems rely on additional hardware, such as miniature 
coils (Derbyshire et al., 1998; Nevo et al., 2002), fiducial samples (Eviatar et al., 
1997), optical reflectors (Eviatar et al., 1999) or stereo cameras (Dold et al., 2006; 
Tremblay et al., 2005; Zaitsev et al., 2006). Among them, stereo optical systems have 
been shown to give good motion correction results with reasonable accuracy 
retrospectively (Tremblay et al., 2005) as well as prospectively (Dold et al., 2006; 
Zaitsev et al., 2006). The stereo tracking system works in parallel with the scanner 
thus needs no extra scan time for motion detection in the imaging sequence. 
However, there are limitations to the current tracking systems, the most important 
being that the tracking target required for monitoring object position is either 
uncomfortable for the subjects (e.g., a mouthpiece used in Zaitsev et al., 2006), or 
                    
         (a)         (b)             (c) 
 
 17 
 
hard to combine with close-fitting receiver arrays (e.g., a cap used in Tremblay et al., 
2005). To overcome these limitations, we developed an in-bore video tracking system 
for prospective motion correction that monitors the position of features on the 
subject?s face. The system was designed to allow high resolution (~ 0.2mm) MRI at 
7T in the presence of substantial head motion.   
 
2.3 Stereo Tracking System 
Stereo tracking system includes two cameras, acquiring images of the same object 
simultaneously from slightly different view points. To determine the 3D motion of an 
object, the system needs to be accurately calibrated beforehand. The object?s motion 
parameters can be estimated by analyzing the image pairs captured by the stereo 
system. The tracking feature points are first detected in each image pair and the 3D 
position of these feature points are calculated through stereo triangulation. The 6-
DOF motion parameters (3-DOF translations plus 3-DOF rotations) can be further 
estimated based on the 3D coordinates of these feature points.  
 
2.3.1 Camera Calibration 
Camera calibration is a crucial issue in computer vision. It serves to determine the 
relationship between a 2D image perceived by a camera (or images perceived by 
multiple cameras) and the 3D information of the real object. Precise camera 
calibration is especially important for applications that involve metric measurements, 
such as dimensional measurements (Tomasi and Kanade, 1992), object motion 
 
 18 
 
(Broggi, 1998; Charbonnier and Fournier, 1995), visual inspection (Newman and 
Jain, 1995), etc. 
Camera models usually involve of two sets of parameters: extrinsic and 
intrinsic. Extrinsic parameters, including a rotation and translation vector, model the 
geometrical relationship between the camera and the scene. Intrinsic parameters, 
including focal length, principal points, and distortions, model the internal geometry 
of a camera. They determine how to derive an image point position given the spatial 
position of the point with respect to the camera. The position and orientation of a 
stereo setup is also intrinsic since it is independent of scene structure. Intrinsic 
parameters remain constant after cameras are set up while extrinsic parameters may 
change with different scenes. The estimation of these two sets of parameters is called 
camera calibration, which allows one to relate the image measurements to the spatial 
structure of the observed scene. 
The existing camera calibration approaches can be classified into three 
categories: 
a) Linear closed-form methods. These methods construct a linear 
transformation to relate 3D points with their 2D projections (Abdel-Aziz and Karara, 
1971; Faugeras and Toscani, 1986; Hall et al., 1982). Then least squares methods are 
used to get the closed-form solution. The advantage of these methods is the simplicity 
of the model and fast computation. However, they have the following disadvantages. 
First, the linear models don?t include lens distortion, which make these methods less 
accurate. Second, camera parameters are not explicit in these models, and constraints 
(i.e. rotation matrix should be orthonormal) are not considered.  
 
 19 
 
b) Non-linear optimization methods. When lens distortion is considered in the 
camera model, the relationship between the 3D scene points and their 2D projections 
becomes non-linear. The non-linear error function is usually defined as the distance 
between 2D image points and the projection of the 3D points. Camera parameters can 
then be optimized iteratively by minimizing the defined error function (Faig, 1975; 
Slama et al., 1980). The advantage of these methods is that any distortion can be 
considered in the model and good accuracy can be achieved by reaching correct 
convergence. However, a good initial guess is required to guarantee correct 
convergence.    
c) Two-step methods. These methods first find closed-form solutions for some 
parameters, then do iterative optimization for the other or all of the parameters (Tsai, 
1987; Weng et al., 1992b; Zhang, 2000). These methods take the advantages of the 
previous two methods and can always converge quickly to a correct solution due to 
the closed-form guess in the first step. Explicit intrinsic and extrinsic camera 
parameters and major distortions (namely, radial, tangential, decentering) can all be 
estimated with high accuracy in a short time. 
 
2.3.2 3D Reconstruction 
3D reconstruction computes the position of a point in 3D given its image in two 
views and the camera intrinsic parameters. The linear triangulation method (Bouguet, 
1999; Hartley and Zisserman, 2003) back projects rays from the measured image 
points and find the intersection of the two rays for the 3D point. Since errors are 
inevitable in the measured points, the epipolar constraint (explained in Section 4.2.2 
 
 20 
 
in detail) may not be satisfied and the intersection of the back-projected rays may not 
exist. Sampson error correction is applied to these measured points to correct first-
order errors (Torr and Zisserman, 1997). This approximation is valid only when the 
error of the measured points is small. However, the epipolar constraint is still not 
satisfied exactly. A more costly triangulation method (Hartley and Sturm, 1997) can 
find the optimal 3D points by solving a sixth-degree polynomial.  
 
 
2.3.3 Object Tracking 
Visual tracking is a very important component for systems with applications in robot 
control (Milella and Siegwart, 2006), surveillance (Coifman et al., 1998), human-
computer interface (Gorodnichy and Roth, 2004), and visual reconstruction (Tomasi 
and Kanade, 1992). Various tracking methods have been proposed, which can be 
classified in four categories: model based (Kakadiaris and Metaxas, 2000; Zhou et al., 
2004), region based (Hager and Belhumeur, 1998), contour based (Blake et al., 1993) 
and feature based (Shi and Tomasi, 1994; Yao and Chellappa, 1995; Zheng and 
Chellappa, 1993) methods. In this project, feature based tracking is used, because of 
its subpixel precision, speed, as well as robustness.  
The feature tracking process can be divided into two major subtasks: feature 
selection and feature tracking. Good features can be identified as edges or corners 
(Harris and Stephens, 1988) or zero crossings of the Laplacian of the image intensity 
(Marr et al., 1979). However, these feature selection criteria are often defined 
independently of the tracking algorithm. With the Kanade-Lucas-Tomasi (KLT) 
tracker, instead, a good feature is defined as the one that can be tracked well or that 
 
 21 
 
can make the tracker work best (Lucas and Kanade, 1981; Tomasi and Kanatani, 
1991). The tracker starts from Lucas and Kanade?s fast image registration technique 
that makes use of the spatial intensity gradient of the images to find a good match 
(Lucas and Kanade, 1981). After that initial work, Tomasi and Kanade developed a 
feature tracker based on sum of squared intensity differences (SSD) and assumed 
translational inter-frame motion (Tomasi and Kanatani, 1991). Then, Shi and Tomasi 
proposed an affine model to make the region matching better over longer time spans 
(Shi and Tomasi, 1994). Subsequently, Tommasini proposed an extension of this 
tracker to automatically reject spurious features, which made the tracker even more 
robust (Tommasini et al., 1998). Now, the KLT tracker has been widely used in many 
systems for high accuracy tracking, such as facial feature tracking, robot control, and 
flow velocity measurement.  
 
2.3.4 3D Motion Estimation 
Computing a 3D rigid motion that maps a set of 3D points to another set is the basis 
of the absolute orientation problem (Horn, 1987; Horn et al., 1988). It is an important 
computer vision task and has numerous applications, such as robotics, pose 
estimation, and motion analysis. Solutions to this problem include closed-form and 
iterative methods, between which closed-form solution is preferred because of its 
robustness and efficiency. Iterative methods, however, may be trapped in local 
minima of the error function and always need a good initial guess. Because of these 
reasons, we adopted closed-form methods for our motion estimation. 
 
 22 
 
Four major closed-form solutions are used for 3D rigid body transformations. 
The first one was proposed by Arun (Arun et al., 1987). In their derivation, rotation is 
represented as a standard orthonormal matrix and the solution is based on the singular 
value decomposition (SVD) of a covariance matrix of the data. However, this method 
will fail if the data is seriously noisy or becomes planar. Umeyama (1991) and 
Kanatani (1994) corrected this with different ways of derivation. Horn (1987) 
represented rotation using quaternions and the solution is the eigenvector 
corresponding to the largest positive eigenvalue of a symmetric matrix. Horn (1988) 
later also used an orthonormal matrix to represent rotation, and derived an alternative 
method that used manipulation of matrices and their eigenvalue-eigenvector 
decomposition. Walker et al (1991) presented a solution by representing rotation and 
translation together as a dual quaternion. These four algorithms were compared with 
respect to accuracy, stability and efficiency by Eggert et al (1997). The accuracy 
shows no discernible differences, while the SVD method (Arun et al., 1987; 
Umeyama, 1991) gives the best stability. Even though it is somewhat slower than 
Horn?s (1988) method, the superior accuracy and stability made us choose this 
method in our research.   
 
 23 
 
Chapter 3  Objective 
 
The overall objective of this project is to design a new tracking system to 
prospectively correct head motion for MRI. This optical system should be compatible 
with MRI. It should be fast, robust, and accurate for tracking of head motion and 
should also be able to communicate with MR scanner before the MRI data are 
acquired. With the accomplishment of this project, sensitivity of brain MRI to head 
motion should be substantially reduced. Specifically, the objectives are: 
1) To setup the optical tracking system. One or two MR compatible infrared 
cameras should be used for tracking with illumination of infrared LEDs. The 
tracking system needs to be able to determine the subject?s head position in 
real time (three rotations plus three translations).  
2) To program MR pulse sequence to receive motion parameters from the optical 
tracking system and apply motion correction in real-time. Before each 
excitation, the MRI scan computer should be able to read the motion 
parameters to adjust the gradient field direction and RF phase and frequency 
in the imaging sequence to compensate the changes in position. 
3) To evaluate the system accuracy and performance. Specifically, we need to 
evaluate the accuracy of the optical tracking system, the calibration accuracy 
of the combined optical and MRI system, and the acquired MRI image quality 
with and without motion compensation.  
 
 24 
 
Chapter 4  Prospective Motion Correction for MRI using a 
Stereo Tracking System 
4.1 System Setup 
MRI experiments were performed on a GE (Milwaukee, WI, USA) 7T MRI system, 
equipped with a detunable volume transmit coil and a 32-channel receive-only head 
coil (Nova Medical, Wilmington, MA, USA). The optical tracking system included 
two MR compatible infrared cameras (MRC Systems GmbH, Germany) as part of a 
stereo-vision system to measure the subject?s head motion in 3D. The size of the 
cameras was 28 x 18 x 30mm3. They were fixed on a holder right above and in front 
of the head coil (Figure 4-1). The distance and the angle between them were 12 cm 
and 28? respectively. Surrounding each camera lens, six infrared light emitting diodes 
were used to illuminate the field of view. The distance from the cameras to the face 
was around 8cm. The high spatial resolution of this setup (640 x 480 pixels, about 13 
x 10cm2 field of view) allowed detection of very small movements. 
 
 
 
 25 
 
 
 
Figure 4-1    Experimental setup. The top pane is an illustration of the position of the cameras 
and head coil in the MRI scanner bore. The bottom images show the real instruments. Cameras 
are highlighted with the red ellipse. 
 
The two cameras were connected to a filter box (MRC Systems GmbH, 
Germany) via the camera connector cable. This low pass filter, which has a cutoff 
frequency of 1MHz, was used to prevent interferences caused by the high frequency 
signals of the MR scanner, as well as possible infiltration of MR-frequency noise into 
the magnet room. The video output signals were then captured by a Matrox Morphis 
frame grabber (Matrox Electronic Systems Ltd. Quebec, Canada), using a standard 
BNC cable connection, by a tracking computer running a Linux operating system. 
Real-time tracking and motion estimation software was developed using C++. The 
tracking speed was 10Hz. The tracking computer communicated with the MRI 
scanner through a TCP/IP connection (Santos et al., 2004) to provide the console 
(scan computer) with the real-time motion parameters. This allowed the image 
acquisition pulse sequence to alter gradient direction and RF offset phases and 
frequencies to compensate for the changes in object position.  
 
 
 26 
 
4.2 System Calibration 
A precise calibration is necessary to achieve the goal of motion correction for high-
resolution MRI. A three step calibration procedure was developed, which included 
the calibration for each single camera, the calibration for the stereo, and the 
calibration between the stereo cameras and the MRI scanner.  
To achieve this, a calibration phantom was designed, the internal of which 
consisted of a grid of rectangular prisms (Figure 4-2a) carved out of a plastic form. 
The prisms were all of equal size, 10 x 10 x 18mm3 (length x width x height), and 
were 10 x 10mm2 apart, except those on the four corners, which were cut into 
triangular prisms to provide differentiation in the height direction. The open spaces in 
the phantom were filled in with saline to make them visible to MRI. The cover of the 
phantom was 2.4mm thick. On the surface of the phantom, a piece of paper with a 
black-and-white checkerboard representation of the internal grid was attached (Figure 
4-2b). This design provided many corners for calibration, and the clear sharp corners 
improved calibration accuracy.  
 
 
 27 
 
 
 
(a) (b) 
Figure 4-2    Calibration model. (a) Schematics of the phantom; (b) View of the actual phantom 
surface. 
 
 
4.2.1 CCD Camera Calibration 
Camera calibration is required to build a model that can map between a 3D world and 
a 2D image. The pinhole model is a widely used one and its geometric illustration is 
presented in Figure 4-3: 
In this camera model, a 3D point TWWWW ZYX ],,[=M , is transformed from 
the world coordinate system to the camera coordinate system, TCCCC ZYX ],,[=M . 
Through central projection, the ideal (undistorted) image coordinate of the object 
point is [ ]Tuuu yx ,=m . However, common cameras are often characterized by a 
somewhat distorted projection behavior which causes the observed (distorted) image 
point shifting to [ ]Tddd yx ,=m . 
 
 
 28 
 
 
Figure 4-3    Pinhole camera geometry. OC is the camera center and also the origin of the camera 
coordinate system, Op is the principle point, M is a point in the space which has different 
coordinates in the world and camera coordinate systems, and md and mu are the image of M in 
the image plane with and without lens distortion. 
 
In practice, the image in the image plane will be further sampled by a frame 
grabber card and stored in an image memory buffer. The relationship between a point 
WM  in the world coordinate system and its image pm  in image pixel coordinate can 
be represented as  
( )CMRM ?= WC                                                                                                    [4.1]                  
??
?
??
?==
??
?
??
?=
??
?
??
?=
n
n
n
cc
cc
u
u
u y
xff
ZY
ZXf
y
x mm
/
/                                                                [4.2] 
( ) ( )
( ) ( )??
?
??
?
+++++
+++++=
??
?
??
?=
22
12
4
2
2
1
22
21
4
2
2
1
221
221
uuuu
uuuu
d
d
d yrpyxprkrky
xrpyxprkrkx
y
xm                                 [4.3] 
Where 222 uu yxr +=  
??
?
??
?
+
+=
??
?
??
?=
0
0
yyD
xxD
y
x
dy
dx
p
p
pm                                                                                       [4.4] 
 
 29 
 
 
R and C are rotation matrix and translation vector which characterize the 
transformation between the world coordinate system and the camera coordinate 
system;  f is the focal length; nm  is the normalized coordinate; 1k and k2 are the radial 
lens distortion coefficients; p1 and p2 are the tangential lens distortion coefficients; 
Dx, Dy are the number of pixels per metric unit distance (For most imaging sensors 
currently manufactured, pixels may be assumed perfectly square, implying Dx=Dy. In 
the general case, they may be different); ),( 00 yx  are the coordinates of the optical 
center (principal point) in the image coordinate system. The objective of the camera 
calibration procedure is to determine optimal values for these parameters based on 
image observations of a known calibration target. Since any calibration point WM  
and its corresponding image in the CCD sensor pm  should satisfy Equation 4.1-4.4, 
the unknown parameters can be determined when the computed projection of a 
known structure (i.e. a checkerboard pattern) matches the best with the observed 
projection on the image. If enough (larger than the number of unknown camera 
parameters) calibration point pairs are given, the camera parameters can be estimated.   
A closed-form solution for the unknown parameters without considering lens 
distortion gives a good initial guess. In absence of lens distortion, the projection of 
WM  to pm  can be simplified as: 
??
?
??
?
+
+=
?
?
?
?
?
?
?
?
?
?
?
?
+
+
=?
?
?
??
?=
0
0
0
0
yyf
xxf
yZYfD
xZXfD
y
x
ny
nx
c
c
y
c
c
x
p
p
pm                                                           [4.5] 
 
 30 
 
Where xx fDf =  and yy fDf =  represent the focal length of the camera in terms of 
pixel dimensions in x and y direction. 
Denoting the homogeneous coordinates of a vector [ ]Tyx ?,,=x  by x~ , i.e. 
[ ]Tyx 1,,,~ ?=x , Equation 4.1 and 4.5 can be written as: 
[ ] [ ] WWW
cy
x
p yf
xf
s
MPMTRKMRCR0IK
Mm
~~~
10
~
0
0
0
1
~
0
0
==?
?
?
??
? ?=
?
?
?
?
?
?
?
?
?
?
=
                                         [4.6] 
Where s is an arbitrary scale; K is a 3?3 matrix, mapping the normalized image 
coordinates to the image pixel coordinates; RCT ?= ; and P is a 3?4 matrix, called 
the perspective projection matrix. In matrix P, there are a total of 10 unknowns: 4 for 
K, 3 for R and 3 for C. The parameters contained in K are called the intrinsic camera 
parameters (distortion parameters are intrinsic parameters too, but are not estimated in 
this closed-form solution), and those in R and C are extrinsic parameters.  
Vanishing points can be used for calibration for its simplicity and ease of 
computation (Bouguet, 1999; Daniilidis and Ernst, 1996; Wang and Tsai, 1991). 
Figure 4-4 shows a perspective image of a square ABCD in space. 1nm , 2nm , 3nm  
and 4nm  are the four corners of the square observed in the image plane after 
normalization, which means: 
piy
x
ni yf
xf
s mm ~
1
/1
/1
~
0
0
?
?
?
?
?
?
?
?
?
?
?
?
=                                                                              [4.7] 
 
 
 31 
 
 
Figure 4-4    Camera calibration using a square target. (a) Projection of a square ABCD to the 
image plane; (b) The image and the four vanishing points { 1~V , 2~V , 3~V , 4~V  }, and the horizon 
line H?~ . 
 
By assuming the optical center is at the center of the image, the intrinsic 
parameters to be estimated are therefore xf  and yf  only, and Equation 4.7 can be 
simplified as: 
ipiy
x
ni sf
f
s pKpm ~~
1
0/1
0/1
~ 1?=
?
?
?
?
?
?
?
?
?
?
=                                                                      [4.8] 
Where Kp is the intrinsic camera matrix without principle points, and )4,1(~ ?=iip  is 
the homogeneous pixel coordinates of the corners after subtraction of the optical 
center. 
From the geometry in Figure 4-4, the two vanishing points 1~V  and 2~V  can be 
determined by: 
 
 32 
 
21
432
144
323
211
432
211
~~~~
~~~~
~~~~
~~~~
~~~~
~~~~
~~~~
VV?
??Vmm? mm?
??Vmm? mm?
???
?
?
?
???
?
???
?
?
?
??
??
???
?
?
?
??
??
H
nn
nn
nn
nn
                                                [4.9] 
Where ? is the standard cross product. All the equalities in Equation 4.9 are defined 
up to scale. However, the normalized coordinates of the corners are not available 
since the focal lengths are unknown. But the pixel coordinates of the four corners 
)4,1(~ ?=iip  are available, so Equation 4.9 should be written in pixel unit:  
p
HpH
p
p
p
p
p
pp
ppp
p
ppp
p
p
p
p
p
pp
ppp
p
ppp
?K?
VK?K?KV?KpKpK? ?KpKpK?
VK?K?KV?KpKpK? ?KpKpK?
~~~
~~)~()~(~~~
~)~()~(~~
~~)~()~(~~
~~)~()~(~~~
~)~()~(~~
~~)~()~(~~
*1
2
1
4
*1
3
*1
2
4
*1
1
1
4
1
4
3
*1
3
1
2
1
3
1
1
2
*1
1
*1
1
2
*1
4
1
3
1
2
1
*1
2
1
1
1
1
?
???
???
???
???
???
???
??
?
?
?
?
?
?
?
?????
?
???
???
???
?????
?
???
???
???
     [4.10] 
Where *pK  is the adjoint of Kp. Since K is a 3?3 matrix, and u, v are 3-vectors, we 
have )()()( * vuKKvKu ?=?  (Faugeras and Robert, 1996). Because K is invertible, 
1* )( ?= TKKK , therefore KK ?~** . And also explicitly, 
pp
H
ppp
p
p
ppp
p
p
21
432
144
323
211
432
211
~~~~
~~~~
~~~~
~~~~
~~~~
~~~~
~~~~
VV?
??Vpp? pp?
??Vpp? pp?
???
?
?
?
???
?
???
?
?
?
??
??
???
?
?
?
??
??
                                              [4.11] 
Because the two vanishing points 1~V  and 2~V  are mutually orthogonal: 
0~)()~(
)~()~(~~
2
11
1
2
1
1
1
21
=?
???
??
??
p
p
T
p
Tp
p
p
p
p
VKKV
VKVKVV                                                                    [4.12] 
 
 33 
 
In Equation 4.12, p1~V  and p2~V  can be calculated from Equation 4.11, and are denoted 
here as Tp cba ],,[~~ 1111 ?V  and Tp cba ],,[~~ 2222 ?V , so Equation 4.12 provides one 
constraint in the focal lengths: 
021221221 =++ ccfbbfaa
yx
                                                                                           [4.13] 
Another constraint for the focal lengths comes from the fact that the diagonals 
of the square ABCD are orthogonal too. Therefore, similarly, we get: 
??
???
???
????
??
???
???
???
???
???
???
???
p
p
p
Hp
p
p
p
p
p
Hp
p
p
p
ppp
p
ppp
4
1*1
6
*1
4
3
1*1
5
*1
3
6
*1
4
1
2
1
6
5
*1
3
1
1
1
5 ~
~)~()~(~~
~~)~()~(~~
~~)~()~(~~
~~)~()~(~~
VK?K?KV
VK?K?KV
?KpKpK?
?KpKpK?        [4.14] 
Then, since 
0~)()~(
)~()~(~~
4
11
3
4
1
3
1
43
=?
???
??
??
p
p
T
p
Tp
p
p
p
p
VKKV
VKVKVV                                                                    [4.15] 
We get another constraint for focal lengths with denoting Tp cba ],,[~~ 3333 ?V , and 
Tp cba ],,[~~
4444 ?V : 
043243243 =++ ccfbbfaa
yx
                                                                                           [4.16] 
So, xf  and yf  are the solution of Equation 4.13 and 4.16.  
Next, the extrinsic parameters can be initialized through homography (Zhang, 
2000). Without loss of generality, the calibration pattern plane is assumed to be on 
0=WZ  plane. Denote the ith column of the rotation matrix R by ri. From Equation 
4.6, we have 
 
 34 
 
[ ] [ ]
?
?
?
?
?
?
?
?
?
?
=
?
?
?
?
?
?
?
?
?
?
?
?
=
?
?
?
?
?
?
?
?
?
?
1101
21321 W
W
W
W
p
p
Y
XYX
y
x
s TrrKTrrrK                                       [4.17] 
 Therefore, a calibration point WM~  (since ZW is 0, TWWW YX ]1,,[~ =M ) and its image 
pm
~  is related by a homography H: 
Wps MHm
~~ =        with       [ ]TrrKH
21=                                                       [4.18] 
To calculate the homography, we first define a 9? 1 vector Ha as 
TTTT ],,[
321 hhh ,  where 
T
ih  (i=1,2,3) is the i
th row of H. Then Equation 4.18 can be 
written as 
  
0~~
~~
~ 0
~ 0
~
~
1 3
2
1
3
2
1
=
?
?
?
?
?
?
?
?
?
??
??
??
?
=
=?
=?
?
?
?
?
?
?
?
?
?
?
?
=
?
?
?
?
?
?
?
?
?
?
aHMM0
M0M
Mh
Mh
Mh
M
h
h
h
T
Wp
T
W
T
T
Wp
TT
W
W
T
W
T
p
W
T
p
W
T
T
T
p
p
y
x
s
sy
sx
y
x
s
                                     [4.19] 
When all the corners in our calibration checkerboard pattern were used(Figure 4-2b), 
which was 100, we had 100 of the above equations. These can be written as 
0LHa = , where L is a 200? 9 matrix. Therefore, the solution of Ha is the right 
singular vector of L associated with the smallest singular value. Note that H is 
defined up to a scale factor. Denoting H by its column vector ][ 321 hhhH = , from 
Equation 4.18, we get 
?
?
?
???
?
=
?=
=
=
?=
?
?
?
?
3
1
213
2
1
2
1
1
1
1
21 ][
hKT
rrr
hKr
hKr
HKTrr
?
?
?
?                                                                      [4.20] 
 
 35 
 
To assure r1 and r2 are unit vectors, 2/11
2
1
1
1 ?
?
?
?
?
?
?
?
+= ?? hKhK?  (or choose either 
1
1
1
hK ?=?  or 21
1
hK ?=?  since they are very close). However, because of noise, 
the computed ][ 321 rrrR =  does not in general satisfy the properties of a rotation 
matrix, therefore we need to do the following: 
213
222
12122
111
/
,
/
rrr
rrr
rrrrr
rrr
?=
?
??
?
                                                                                                [4.21] 
Where ??? means ?to be replaced by?, and <,> is the standard inner product.  
The solution of the intrinsic parameters from Equation 4.9 ? 4.16, and the 
extrinsic parameters from Equation 4.18 ? 4.21 assumes no lens distortion of the 
cameras. However, the MR compatible cameras used in this project exhibited 
significant lens distortion, especially radial distortion.  So to estimate all the camera 
parameters, we need to use the already calculated parameters as an initial guess for 
the minimization of the nonlinear cost function: 
2
1 1
212100 ),,,,,,,,,,(??
= =
?
M
i
N
j
wjiiyxppij ppkkyxff MTRmm                                  [4.22] 
Where ),,,,,,,,,,( 212100 wjiiyxp ppkkyxff MTRm  is the projection of point WjM  
from the world coordinate system into the ith calibration image according to Equation 
4.1 ? 4.4. The 3?3 rotation matrix Ri has 3 DOF only and is converted to a 3?1 
vector through Rodrigues formula for computation efficiency (Faugeras, 1993). pijm  
is the measured pixel coordinate. M is the total number of views used for calibration 
 
 36 
 
and N is the number of corners on the calibration target in each view. It has been 
shown that calibration errors decrease when more images are used (Zhang, 2000), so 
20 ~ 30 calibration images were used in our project. Figure 4-5 shows two calibration 
images, with red ?+? marking the N=100 corners (Harris and Stephens, 1988). The 
nonlinear minimization is solved with sparse Gauss-Newton algorithm (Hartley and 
Zisserman, 2003, Section 9.3). 
 
 
Figure 4-5    Two views of the calibration target. The 100 red crosses on each image mark the 
extracted corners used for calibration. 
 
 
 4.2.2 Stereo Calibration 
After calibration of both cameras, the relationship between the two cameras can be 
estimated. Let Oc and Oc? be the camera centers of the left and right cameras and I, I? 
the corresponding image planes respectively (Figure 4-6). Given a point M in the 
world, its projection on the left image is m, and on the right image is m?. For every 
point on the line OcM, their projections in the left image are all m, but their 
projections in the right image are on a line 'ml , which is called the epipolar line of m. 
The line 'ml  is the intersection of the plane I? with the plane ? , defined by m, Oc and 
   (a)       (b) 
 
 37 
 
Oc?, which is called the epipolar plane. Because the epipolar geometry is symmetric, 
point m? in the right image corresponds to an epipolar line lm? in the left image, which 
is the intersection of the same plane ?  with the left image plane I. Furthermore, all 
epipolar planes (with different M) contain the line OcOc?. They all intersect I at a 
common point e and intersect I? also at a common point e?, which are called epipoles.  
 
Figure 4-6    The epipolar geometry. M is a point in space, Oc and Oc? are the two camera centers. 
The camera baseline OcOc?intersects each image plane at the epipoles e and e?. Oc , Oc?, M and its 
images m and m? all lie in a common plane, called epipolar plane. The two lines me and m?e? are 
epipolar lines. 
 
 
The epipolar geometry provides a computational significance in stereo vision 
that for a point in one image, its correspondence in the other image must lie on the 
epipolar line, and therefore the search space for a correspondence is reduced from 2D 
to 1D. But to get that, the rotation Rs and translation Ts between the two camera 
coordinate systems need to be determined.  
Given a set of points in space Mij (i=1,?,M represents the number of images 
used for calibration, j=1,?,N represents the number of corners in each calibration 
image), and let Tcijcijcijcij ZYX ],,[=M , Tcijcijcijcij ZYX ],,[ '''' =M  be the Euclidean 
coordinates of Mij in the two camera coordinate systems, then  
 
 38 
 
ss TMRM += cijcij
'                                                                                                  [4.23]  
From each camera?s calibration step, we get the extrinsic parameters. Also, since the 
two cameras get images of the same object simultaneously, they have the relation: 
iiiicijiicij
iwjicij
iwjicij TRRTMRRM
TMRM
TMRM 1''1''
'''
?? ?+=?
??
???
+=
+=                                    [4.24] 
Therefore, with the M calibration images, we get M rotation and translation 
estimations. The median of these is used as the initial guess to search for the optimal 
Rs and Ts by minimizing the following cost function: 
2
1 1
'
2
'
1
'
2
'
12121 ),,,,,,,,,,,,,,(??
= =
?
M
i
N
j
wjiippij ppkkppkk MTRTRK'Kmm ss
((            [4.25] 
 Where [ ]TTpijTpijpij ',mmm =(  is the observed corner position in both the left and right 
images; wjM  is the point in world coordinate system; 
),,,,,,,,,,,,,,( '2'1'2'12121 wjiip ppkk'ppkk MTRTRKKm ss(  is the projection of a space 
point Mwj to the two cameras by combining Equation 4.1 ? 4.4 and 4.25; K and K? 
are the calibration matrix of the two cameras, including 00 ,,, yxff yx  and 
'
0
'
0
'' ,,, yxff
yx  ; 2121 ,,, ppkk  and 
'
2
'
1
'
2
'
1 ,,, ppkk  are distortion parameters; Ri, and Ti are 
the extrinsic parameters corresponding to the ith calibration image captured by one 
camera. The nonlinear minimization is also solved with the sparse Gauss-Newton 
algorithm (Hartley and Zisserman, 2003). Note that in Equation 4.25, all the intrinsic 
parameters of the two cameras are to be refined even though they have been 
determined already after calibration of each camera. The reason to do it here is that 
 
 39 
 
the epipolar constraint can provide some refinement on these parameters for better 
calibration accuracy.  
 
4.2.3 3D Reconstruction from Stereo 
With all the parameters of the stereo known, from any corresponding image points m 
and m? in the two camera images, we can retrieve the 3D coordinate of that point 
based on stereo triangulation (Hartley and Zisserman, 2003; Hartley and Sturm, 1997; 
Bouguet, 1999). From epipolar geometry in Figure 4-6, m, m?, Oc and Oc? must lie in 
the epipolar plane, therefore the back-projection of the two lines Ocm and Oc?m? 
intersect in 3D at the point M. The computation of a 3D point position from its image 
in two 2D views is called stereo triangulation.  
Let Tnnn yx ],[=m  and Tnnn yx ],[ ''' =m  be the normalized coordinates for 
points m and m? respectively, calculated using Equation 4.2 ? 4.4 after the two 
cameras being calibrated. Let Tcccc ZYX ],,[=M  and Tcccc ZYX ],,[ '''' =M  be the 
coordinates of M in the two camera coordinate systems respectively, from Equation 
4.2, we can get: 
'''
'
'' ~
1
,~
1
ncn
n
ccncn
n
cc Zy
x
ZZy
x
Z mMmM =
?
?
?
?
?
?
?
?
?
?
==
?
?
?
?
?
?
?
?
?
?
=                                                     [4.26]                          
With the known Rs and Ts from stereo calibration, and ss TMRM += cc'  from 
Equation 4.23, we get: 
ss TmRm += ncnc ZZ
~~ ''                                                                                             [4.27] 
 
 40 
 
Since cZ  and 'cZ  are the unknowns, the above equation can be rewritten as: 
[ ] ss TmmR =?
?
?
??
??
'
'~~
c
c
nn Z
Z                                                                                         [4.28] 
Denote ]~~[ 'nn mmRA s?=  (a 3?2 matrix), the least square solution is then: 
sTAAA
TT
c
c
Z
Z 1
' )(
?=?
?
?
??
?                                                                                             [4.29] 
If there is no noise in m and m?, Equation 4.29 returns the exact 3D position 
of the point M, which is the intersection of the two back-projecting rays Ocm and 
Oc?m?. However, if noise exists, the two rays don?t intersect in general, it is thus 
necessary to estimate a best solution for the point in 3D world. This can be realized 
by Sampson correction of the measured points (Hartley and Zisserman, 2003). But 
before Sampson correction, the epipolar constraint needs to be reiterated here. 
For Equation 4.27, by taking the cross product with Ts, followed by the inner 
product with '~ nm , we obtain 0)~(~ ' =?? nn mRTm ss . This can also be written as: 
ss RTEmEm ?== ][0
~~ ' where
n
T
n                                                                         [4.30] 
E is called the essential matrix, and ?][ sT is the skew-symmetric matrix of Ts, defined 
as: 
?
?
?
?
?
?
?
?
?
?
?
?
?
=?
0
0
0
][
12
13
23
ss
ss
ss
TT
TT
TT
sT                                                                                 [4.31] 
Equation 4.30 mathematically describes the epipolar constraint illustrated in 
Figure 4-6. In particular it means that '~ nm  lies on the epipolar line nm mEl ~' = .  
 
 41 
 
A typical observation of a noisy point correspondence 'nn mm ?  doesn?t in 
general satisfy the epipolar constraint. They have to be corrected to 'ncnc mm ?  
which have minimum distance to the measured 'nn mm ?  but satisfy the epipolar 
constraint 0~~ ' =ncTnc mEm exactly. To emphasize the dependency on TTnTn ],[ 'mmx = , 
we will write the epipolar constraint as 0)(C =xE . This cost function may be 
approximated by a Taylor expansion to first order: 
x
E
ExE ?xx?x ?
?+=+ C)(C)(C                                                                               [4.32] 
The corrected TTncTncc ],[ 'mmx =  satisfy the constraint, i.e. 0)(C =cE x . Since 
xc ?xx += , we have 0
C)(C =
?
?+
x
E
E ?xx . Denote the partial derivative matrix 
(Jacobian matrix) x?? EC as J and the cost  )(C xE  as ? , the Sampson correction term 
x?  is thus the solution of equation ?x ?=J? : 
?TTx 1)( ??= JJJ?                                                                                                  [4.33] 
The corrected point is therefore: 
?TTc 1)( ??= JJJxx                                                                                               [4.34] 
 Explicitly, the cost here is nTn? mEm ~~ '= , and the Jacobian matrix is: 
])~(,)~(,)~(,)~[( 212'1' nnnTnT? mEmEmEmExJ =??=                                                   [4.35] 
The elements of J in the previous equation are computed by the chain rule as: 
1
''
''
1 )
~(~)0,0,1(~ ~~~~~
n
T
n
T
n
n
T
n
n
n
n
n
T
n
n xxx
? mEmE
m
mEmmmEmJ ==
?
?
?
?=
?
?=
?
?=                     [4.36] 
 
 42 
 
Then from Equation 4.34 the corrected point is: 
??
?
?
?
?
?
??
?
?
?
?
?
+++???
?
?
?
?
?
?
?
?
?
?
=
?
?
?
?
?
?
?
?
?
?
?
?
2
1
2
'
1
'
2
2
2
1
2
2
'2
1
'
'
'
'
'
'
)~(
)~(
)~(
)~(
)~()~()~()~(
~~
n
n
n
T
n
T
nnn
T
n
T
n
T
n
n
n
n
n
nc
nc
nc
nc
y
x
y
x
y
x
y
x
mE
mE
mE
mE
mEmEmEmE
mEm                 [4.37] 
After Sampson error correction, the 3D position of a point can be estimated 
using Equation 4.29. But since Sampson correction is a first-order correction, it is 
accurate only if the noise in each image is small, making it a suboptimal solution. The 
optimal solution could be found by minimizing the function: 
0~~),(),(),(C '2''2' =+= ncTncncnncnnnE tosubjectdd mEmmmmmmm                [4.38] 
Where d(*,*) is the Euclidean distance between the points. This cost function can be 
minimized using Newton iteration algorithm, but it is more computationally 
expensive. The Sampson error correction therefore gives a close solution but a much 
cheaper computation. 
 
4.2.4 Integrated Optical and MRI System Calibration 
The optical tracking system measured the motion parameters in the camera coordinate 
system. These needed to be converted to the MRI coordinate system to be used by the 
imaging sequence. Therefore, an accurate calibration was necessary to estimate the 
rotation (Rmc) and translation (Tmc) between the camera and MRI coordinates.  
To achieve this, the phantom was put in the target field (approximately the 
position of a subject?s face) within the field of view of the scanner and video cameras 
(Figure 4-7). A 3D image of the phantom was acquired by the MRI scanner 
(resolution 0.4 x 0.4 x 0.7 mm3, TR=55ms, TE=6ms, FA=10?, Slice Number=40) and 
 
 43 
 
the stereo measured the 3D position of the calibration corners on the surface pattern 
based on stereo triangulation including distortion correction for the cameras (Section 
4.2.3).  
 
Figure 4-7    Calibration between the optical system and MRI scanner.  The calibration phantom 
was put in the field of view of both the cameras and the MRI scanner. 
 
To estimate the surface corner positions in MRI coordinates, the internal 
structure needed to be extracted. Figure 4-8 shows two slices from the 3D scan of the 
calibration phantom. We used corners to describe the internal structure, which when 
connected are the sides of prisms. Harris corner detection (Harris and Stephens, 1988) 
was applied on these MR images. A potential problem may be caused by air bubbles 
inside the phantom, i.e. the blue arrow in Figure 4-8a, which could result in a false 
corner. To correct this, lines were fitted on the corners and the crossings of these lines 
were marked as real corners, shown with red ?+? in Figure 4-8. The MRI coordinates 
of these corners can be determined from their pixel coordinates and the slice position 
information from DICOM image header (Section 9.2).  
 
 44 
 
 
Figure 4-8    Two images from the 3D scan of the calibration phantom. The red crossings mark 
the corners.  The blue arrow marks an air bubble inside the phantom. 
 
For each side of the prism, we have forty positions in MRI coordinates since 
we scanned 40 slices. They were used to estimate the corresponding corner position 
on the phantom surface. Figure 4-9a shows the estimation of one side of a prism from 
the extracted corners. Blue dots were the observed corners (Figure 4-8), which 
obviously did not lie on a line because of noise. To avoid this, a line that minimizes 
the perpendicular distances from each of the points to the line needs to be fitted. 
Principal Components Analysis (PCA) is a good solution for 3D line fitting. Denote 
all the blue dots as Mmi=[Xmi,Ymi,Zmi]T  and Mm=[ Mm1, Mm2,? Mmk] (i=1,2,?,k) 
where k is the total number of slices (k equals to 40 here). Principal components 
(P=[p1 p2 p3]) of Mm are the eigenvectors of the covariant matrix of Mm: 
DPPMM TTmm =                                                                                                     [4.39] 
Since Mm is of size 3?k, PCA generates three orthogonal components. The 
first principal component (denote as p1) gives the direction vector of the line because 
   (a)       (b) 
 
 45 
 
it explains as much of the variance in the points as is possible with one dimension 
(green line in Figure 4-9a). The second and third components (p2 and p3) define 
directions perpendicular to the line, and are the error term in the fitting (blue lines in 
Figure 4-9a). The function of the fitted line is therefore: 
1)( pMl tmean nf +=                                                                                              [4.40] 
Where t is the free variable for the line function. Then the position of the 
corresponding corner on the surface can be estimated because the thickness of the 
phantom cover is known as 2.4mm (red circle in Figure 4-9a).  
 
Figure 4-9    Phantom corner position estimation in MRI coordinate system. (a) shows one side of 
a prism. Blue dots are the observed corner position along that side; green line is the estimated 
side and the red circle is the estimated corner position on the phantom surface. (b) shows the 
alignment between the extracted internal structure (green) of the phantom and a simulated ideal 
model (red). The green lines are the side of the prisms and the red circles are the estimated 
corners position in MRI (all dimensions are in mm).      
 
  
By performing the line fitting on each side of all the prisms inside the 
phantom, one can estimate all the corner positions on the surface. These corners do 
not lie in a plane exactly because of noise. So they were further registered to a 
simulated phantom surface (checkerboard pattern). The corners on the simulated 
                                       (a)                                                             (b) 
 
 46 
 
phantom were transformed into the MRI coordinate system and were the corrected 
positions used for calibration. Figure 4-9b shows the results after alignment. Red 
prisms represent the simulated phantom, which has 10 x 10 x 18 mm3 rectangular 
prisms in the middle and triangular prisms on the four corners. Green prisms are 
structures extracted from the MRI 3D image. Red circles represent the calibration 
corner positions (on the camera-visible checkerboard grid, 2.4 mm from the liquid 
surface).  
Therefore, based on the simultaneous measurements of the same points in 
both the MRI scanner and stereo coordinates, the rotation (a 3?3 orthonormal matrix 
Rmc) and translation (a 3?1 vector Tmc) between them can be calculated using a 
singular value decomposition (SVD) method (Section 4.3.3). 
. 
4.3 Real-time Motion Calculation 
4.3.1 Kanade-Lucas-Tomasi Tracker 
The Kanade-Lucas-Tomasi (KLT) tracker (Lucas and Kanade, 1981; Tomasi and 
Kanatani, 1991; Shi and Tomasi, 1994) is a feature tracker widely used in the 
computer vision community. It offers sub-pixel accuracy and is very computationally 
efficient, therefore is utilized for the head tracking in this project.  
Denote ),( tI x  as the image intensity, where x is the position of a point, and t 
is time. As the object moves, the image intensity changes in a complex way, but if the 
camera grabbing frequency is sufficiently high, or the movement between successive 
images is sufficiently small, we can model the two successive images as: 
 
 47 
 
)(),(),( xxdx ntItI +=++ ?                                                                                  [4.41] 
Where ?  is the time difference between sampling of the two frames; d is the 
displacement vector (dx,dy)T; and n is noise.  
The displacement vector d is the one that can minimize the residual error: 
[ ]2),(),()( ? ?++=
W
ii tItI xdxx ??                                                                       [4.42] 
Where xi are points in the Gaussian window W centered on x.  
When the displacement vector is small, the intensity function can be 
approximated by Taylor series expansion truncated to the linear term, as: 
dxxxdx ),(),(),( ??? +??++=++ tItItI iii                                                          [4.43] 
By dropping the time term in the intensity for simplicity and denoting 
),(),()( tItIh iii xxx ?+= ? , ),()( ?+??= tI ii xxxg , we can rewrite the residual error 
as: 
[ ]2)()()( ? +=
W
iih dxgxx?                                                                                     [4.44] 
To find the displacement d, we let the derivative of the error to zero: 
[ ]
??
?
=?
=+=??
W
ii
W
i
T
i
i
W
ii
h
h
)()()()(
0)()()(2)(
xgxdxgxg
xgdxgxdx?
                                                                   [4.45] 
In other words, we must solve the equation: 
eGd =                                                                                                                     [4.46] 
 
 48 
 
Where G is a symmetric 2?2 matrix ?=
W
i
T
i )()( xgxgG and e is a 2?1 vector 
?=
W
iih )()( xgxe . However, because of the linearization of Equation 4.43, the 
solution of d is only approximately correct. Iteration in Newton style is therefore 
needed to search for the right displacement by resampling )( ih x with new d in each 
step. Iteration is stopped when difference in d is negligible, which usually takes less 
than 5 iterations. 
It can also be seen from Equation 4.46 that to get a reliable displacement, both 
eigenvalues of G must be larger than noise level. Explicitly, two small eigenvalues 
mean a roughly flat intensity within a window; a large and a small eigenvalue 
correspond to an edge and two large eigenvalues represent corners. So, to select good 
reliable features, we accept a window if: 
??? >)2,1min(                                                                                                      [4.47] 
Where 2,1 ??  are the eigenvalues of G and ?  is a predefined threshold. 
 
4.3.2 Stereo Matching 
In Section 4.2.3, it was shown that the position of a 3D point can be 
reconstructed from two perspective images. To achieve a correct 3D reconstruction, 
point correspondences between two images need to be established. A large amount of 
work has been carried out for matching different images of a single scene (Chou and 
Chen, 1990; Weng et al., 1992a), but the results are not satisfactory if the only 
geometric constraint, i.e., the epipolar constraint, is not used. In our application, after 
 
 49 
 
stereo calibration, the epipolar geometry is known, therefore robust correspondences 
can be established   between the two images (Zhang et al., 1994). 
After applying KLT on one of the camera image, we got N feature points 
suitable for tracking. For each point mi (i=1,2,?,N) in image 1, a corresponding 
epipolar line can be found as nim mEl ~' = , where E is the essential matrix defined in 
Equation 4.30, nim~  is the normalized homogeneous coordinate for point mi. Figure 
4-10 shows the two epipolar lines (red lines in b) calculated for two randomly picked 
points (red ?+? in a). Both lines contain the corresponding points. Therefore the 
search range for a matching point is narrowed to a band centered on the epipolar line. 
All corners 'ijm (j=1,?,number of corners) are detected in this band as candidates 
(Harris and Stephens, 1988) and correlations are computed to search for the 
corresponding point.  
  (a) (b) 
Figure 4-10    Epipolar lines  estimated after stereo calibration. (a) shows an image from one of 
the camera and two randomly picked points. (b) shows the image from the other camera and the 
estimated epipolar lines corresponding to the two points in (a) respectively. 
 
 
 
 50 
 
Given a point mi in image one and candidate points 'ijm  in image two, a 
correlation window of size (2n+1)? (2m+1) centered at each of these points was 
used. The correlation coefficient is defined as: 
 
[ ][ ]
[ ] [ ] 2'2'2211
'
2
'
211
'
)()()()(
)()()()(
),(
???
?
???
? ?+
???
?
???
? ?+
?+?+
=
??
?
??
?
W
ijij
W
ii
W
ijijii
iji
IIII
IIII
c
??
?
m?mm?m
m?mm?m
mm        [4.48] 
Where ? are the points in the Gaussian window W centered on zero. 
)]12)(12/[()()( +++= ?
?
mnII
W
kk
?
?mm  is the average at point m of Ik (k=1,2). 
Correlation measures the relationship between two windows. Possible correlations 
range from +1 to ?1. A zero correlation indicates that there is no relationship between 
the windows. A correlation of ?1 indicates a perfect negative correlation, meaning 
that as the intensities in one window go up, the other go down. A correlation of +1 
indicates a perfect positive correlation, meaning that intensities in both windows have 
the same pattern. A best match point for mi is therefore the one in 'ijm  which has the 
highest correlation. However, there could be cases where no match point exist in 
image two for a point in image one (occlusion, outside of field-of-view, etc), so a 
minimum correlation threshold was set to avoid this problem. Therefore, the best 
match point is 
thresholdccm ijir _)},(max{ '' >= mm                                                                   [4.49] 
In our implementation, n=m=7 for the correlation window, and a correlation threshold 
of was set to 0.7. 
 
 
 51 
 
4.3.3 Rigid Motion Estimation 
3D positions from the corresponding points in the two views can be calculated based 
on stereo triangulation. When these feature points are tracked in real-time, rigid 
motion parameters (rotation Rc, translation Tc) can be calculated.  
Given two sets of points },...,,{ 21 Nxxx  and },...,,{ 21 Nyyy in 3D space at 
successive time, rigid motion parameters are the solution by minimizing the cost 
function (Arun et al., 1987; Horn, 1987; Horn et al., 1988; Kanatani, 1994; 
Umeyama, 1991; Haralick et al., 1989): 
( )
2
1
2 ?
=
+?=
N
i
ii cc TxRy?                                                                                     [4.50] 
The minimum should occur when the partial derivative of 2?  with respect to Tc 
equals zero.  
( )( )
xRyT
TxRyT
cc
cc
c
?=?
=?+?=?? ?
=
N
i
ii
1
2
0)1(2?                                                                     [4.51] 
Where 
??
==
==
N
i
i
N
i
i NN
11
1,1 yyxx                                                                                   [4.52] 
Therefore once Rc is known, Tc can be calculated easily.  
To minimize 2?  with respect to Rc, the orthonormal constraint of the rotation matrix 
needs to be used. Let [ ]TTTT 321 ,, rrrR c = , where each Tkr  is a row of Rc. The 
constraints can be written as: 
kj
k
k
T
j
k
T
k
?=
==
,1
3,2,1,1
rr
rr                                                                                                 [4.53] 
 
 52 
 
Construct the Lagrangian function for 2?  with the above constraint, we get 
( )
?
??
=
= =
+++?+
??=
3
1
326315214
1
3
1
22
222)1(
k
TTT
k
T
kk
N
i k
cki
T
kik Ty
rrrrrrrr
xr
????
?
                                    [4.54] 
Where [ ] [ ]TcccTiiii TTTyyy 321321 , == cTy . 
Substitute the constraint of Tc in Equation 4.51 to Equation 4.54, we get 
( )
?
??
=
= =
+++?+
???=
3
1
326315214
1
3
1
22
222)1(
)(
k
TTT
k
T
kk
N
i k
i
T
kkik yy
rrrrrrrr
xxr
????
?
                                    [4.55] 
Then take the partial derivative of 2?  with respect to kr , and set them to zero, we get: 
))(())((
))(())((
))(())((
3
1
3
1
3326153
2
1
2
1
3622142
1
1
1
1
3524111
xxrrrrxxxx
xxrrrrxxxx
xxrrrrxxxx
??=+++??
??=+++??
??=+++??
??
??
??
==
==
==
i
N
i
i
N
i
T
ii
i
N
i
i
N
i
T
ii
i
N
i
i
N
i
T
ii
yy
yy
yy
???
???
???
                        [4.56] 
Let  
?
=
??=
N
i
T
ii
1
))(( xxxxA  
?
?
?
?
?
?
?
?
?
?
=
365
624
541
???
???
???
?  
And 
 
 53 
 
T
i
N
i
i
i
N
i
ii
N
i
ii
N
i
i yyyyyy
)()(
))(())(())((
1
3
1
32
1
21
1
1
yyxx
xxxxxxB
??=
??
?
??
? ??????=
?
???
=
===                 [4.57] 
Equation 4.56 can be rewritten as 
BR?ARRB?RAR ccccc =+?=+ TTT                                                                 [4.58] 
Since A is symmetric, AA =T , therefore ( ) TTTTT cccccc ARRRARARR ==  is also 
symmetric. Both of the two terms on the left-side are symmetric, which means the 
right-side must also be symmetric 
TT
cc RBBR =                                                                                                           [4.59] 
If the singular value decomposition (SVD) of B is TUDVB =  where U and V are 
orthonormal and D is diagonal, then 
TTTTTT
ccc RVDURUDVUDVR == )(                                                                  [4.60] 
It can be observed that a solution for Rc is simply 
TVUR
c =                                                                                                               [4.61] 
But Kanatani (Kanatani, 1994) proves that this solution gives a reflection instead of 
rotation if det(VUT)=-1. So the correct solution should be adjusted to 
T
T
U
VU
VRc
?
?
?
?
?
?
?
?
?
?
=
)det(
1
1
                                                                               [4.62] 
Which obviously still satisfies Equation 4.60.  
Given 3D feature point positions at any two successive times, real time 
motion parameters can be calculated by first constructing the correlation matrix B 
 
 54 
 
using Equation 4.57, then calculating the rotation Rc from the SVD of B using 
Equation 4.62 and finally determining translation Tc from Equation 4.51.  
 
4.4 Motion Calculation in MRI coordinates 
4.4.1 Motion in MRI physical coordinates 
During the actual experiment, a number of feature points on the object (or face) were 
automatically selected in the left camera image after which the corresponding points 
were identified in the right camera image (Section 4.3.2). After correction for the 
image distortion of the camera lenses, the position of these feature points could be 
calculated in 3D using triangulation (Section 4.2.3). The changes in position of these 
feature points in the camera images were tracked over time (Section 4.3.1), allowing 
calculation of changes in their 3D position. The translation and rotation parameters 
were then estimated from this collection of 3D points (Section 4.3.3). However, these 
parameters were in the camera coordinate system and needed to be transformed into 
MRI coordinate system based on the prior calibration data. 
At time t0, the relationship between the points MRI coordinates (Xm) and 
camera coordinates (Xc) can be described as follows, based on calibration data: 
mccmcm TXRX += )()( 00 tt                                                                                     [4.63] 
 where Rmc and Tmc are the rotation and translation parameters for the conversion 
between MRI and camera system (Section 4.2.4). 
At time ti, the tracker measures the movement parameters in camera 
coordinate system, 
 
 55 
 
cccc TXRX += )()( 0tti                                                                                           [4.64]                                                             
Where Rc and Tc are the rotational and translational motion parameters. Using the 
above two equations, we can calculate the movement (rotation Rm and translation Tm) 
in MRI coordinates, 
mccmcmcmccmcm
mccmcm
mccmcmcmccmcmmccmcm
TTRTRRRT
RRRR
TTRTRRRXRRRX
++?=
=?
++?=
?
?
??
1
1
1
0
1 )()( tt
i
                               [4.65]     
 
4.4.2 Motion in MRI logical coordinates 
Since the pulse sequence used MRI logical coordinate system instead of physical 
coordinate system, the motion parameters calculated from Equation 4.65 need to be 
transformed into the logical coordinate system.  
MRI physical coordinate system refers to the standard axes of an MRI magnet 
system, shown in Figure 4-11. The subject is usually placed head first into the magnet 
and carried in by a sliding gantry on the patient table. If the person is supine (lying on 
his/her back), the head point along M MO Z  (direction of the main magnetic field), 
right shoulder along M MO X  (left-right direction), and nose along M MO Y  (up-down 
direction). OM is the gradient isocenter, where the gradients produce zero magnetic 
field and which typically corresponds to the center of the magnet. This coordinate 
system does not change and is used in camera and MRI system calibration in Section 
4.2.4. 
The MRI logical coordinate system, however, depends on the prescribed scan 
plane. As in Figure 4-11, OLXL, OLYL and OLZL correspond to frequency encoding, 
 
 56 
 
phase encoding and slice selection directions respectively. These change with 
different scan plane prescription and are used in MRI pulse sequence.  
 
Figure 4-11    MRI physical and logical coordinate system. {OM, XM, YM, ZM} is MRI physical 
coordinate system, which doesn?t change after the MRI system is set up. {OL, XL, YL, ZL} is MRI 
logical coordinate system, which changes with different prescribed scan plane. XL is the 
frequency encoding direction, YL is phase encoding direction and ZL is slice selection direction. 
 
The rotation and translation ( mlR and mlT ) between these two coordinate 
systems can be read out from the pulse sequence after the scan-plane is prescribed: 
))(()( 00 mllmlm TXRX += tt                                                                                   [4.66]                                                    
Therefore, the final motion parameters as will be used by the pulse sequence should 
be (see derivation in Section 9.1): 
mlmcorr RRR =                                                                                                        [4.67] 
mcorrcorr TRT
1?=                                                                                                        [4.68] 
The motion parameters (Rcorr and Tcorr) were sent from the tracking system to the 
MRI scanner over a TCP/IP connection (Santos et al., 2004) for real time 
compensation of motion.  
 
 
 57 
 
4.5 Motion Correction for MRI 
Motion correction was evaluated using a GRE pulse sequence. The sequence updated 
the scan parameters before each RF excitation. Correction for rotation was performed 
by adjusting the gradient rotation matrix; correction for translation in the slice 
selection and read out direction was performed by adjusting the RF transmit and 
receive frequencies and phases respectively; and correction for translation in the 
phase encoding direction was performed by adjusting the receive phase offset. Since 
the correction for translation [ ]Tzyx ddd ,,=corrT  is different in all three directions, 
we will discuss each of them separately. 
 
4.5.1 Rotation Correction 
MRI can acquire images in any plane, which is realized by varying the frequency-
encoding, phase-encoding, and slice-selection gradient waveforms. The directions of 
these three gradients define the XL, YL, and ZL for MRI logical coordinate system 
(Section 4.4.2). Therefore, after the operator?s prescription of a scan plan, the logical 
gradient waveforms that are played out should be in this coordinate system. However, 
a magnetic field gradient system normally consists of only three orthogonal gradient 
coils that generate gradient field in the physical XM, YM and ZM axis. In order to 
create the three logical gradient waveforms, three physical waveforms should be 
digitally mixed. The mixing ratios are determined by angles of the prescribed scan 
plane and can be described with an orthonormal rotation matrix. 
 
 58 
 
?
?
?
?
?
?
?
?
?
?
=
?
?
?
?
?
?
?
?
?
?
)(
)(
)(
)(
)(
)(
)(
tG
tG
tG
t
tG
tG
tG
lZ
lY
lX
mZ
mY
mX
mlR                                                                                     [4.69] 
Where Gmk and Glk (k=X, Y, Z) are the physical and logical gradient waveforms 
respectively, mlR (0) is the same matrix in Equation 4.66. When motion occurs during 
the acquisition, the orientation of the gradient fields needs to be adjusted in real time 
to follow the motion. The rotation matrix Rcorr from Equation 4.67 is thus used in the 
pulse sequence to correct the combination of physical gradient waveforms to generate 
the desired logical gradient waveforms.  
 
4.5.2 Slice Selection Displacement Correction 
It has been shown in Figure 2-1 that a slice can be excited by applying a slice-
selective RF pulse. Since RF pulse is frequency-selective, only spins resonating at the 
same frequency can be excited. To make an RF pulse spatially selective, it is 
necessary to make the spin resonance frequency position-dependent, which is 
accomplished by applying a linear z-gradient field during the excitation. In the 
presence of this gradient, the Larmor frequency at position z is given by 
zGzGBz zz ???? +=+= 00 )()(                                                                            [4.70] 
Therefore a slice at position z can be excited by an RF pulse with 
frequency )(z? . If, however, the object moves dz in this direction, frequency of the 
RF pulse needs to be adjusted to select the same slice for each excitation (Figure 
4-12), 
)())((),(' 00 zzzzz dzGdzGBdz ++=++= ????                                                [4.71] 
 
 59 
 
 
Figure 4-12    The change of RF frequency to select the same slice when motion occurs.  Under 
the same slice-selective gradient, to follow the motion, the only way is to change the RF 
frequency. 
 
For a GE scanner, the RF frequency generation path can be broken up into 3 
components: center frequency, slice selection delta frequency, and phase. The center 
frequency is 0? , which is only B0 dependent, e.g. 298MHz for a 7T scanner. The 
delta-frequency component creates an output which is added to the output of the 
center frequency path for slice selection ( zGz?  in Equation 4.70). The phase 
component determines the phase of the RF transmit pulse.  
Without loss of generality, we assume z=0 from now on, because it is a 
constant after scan plane prescription and only dz is the variable changing with 
motion. Figure 4-13 shows the sequences for slice selection and the three components 
that determine RF frequency and phase. For the GRE sequence used in our 
application, a SINC RF excitation pulse in conjunction with a slice-selection gradient 
is used. The center frequency is a constant, which is 298MHz as mentioned above. 
After prescription by an operator, delta frequency ?d  is set, which is zero here 
because z=0 (the black straight line of ?d ). Also, the phase  ?d  is zero.  
 
 60 
 
 
Figure 4-13    Slice selection sequences (RF pulse and z-gradient) and the three components that 
generate the RF pulse, i.e. center frequency, delta frequency and phase.  t=0 is the start of TR 
and only one TR is shown here. 
 
Figure 4-13 also shows a parameter isodelay, which is a very important 
parameter for an RF pulse. After the RF pulse is played out, because of precession 
caused by the slice selection gradient, the transverse magnetization will generally not 
be coherent across the slice. Instead there will be phase dispersion ( ztGz?? = ). A 
rephasing lobe with an opposite polarity to the slice selection gradient is therefore 
needed to compensate for the phase dispersion. The isodelay is the parameter used to 
calculate the optimal area of rephasing gradient. It is defined as the effective 
precession or dephasing time that results in the phase dispersion across the slice. As 
 
 61 
 
shown in the figure, the rephasing gradient lobe (negative blue lobe) should have the 
same area as the slice selection gradient under isodelay: 
Rz AisodelayG =?                                                                                                   [4.72] 
Where AR is the area of the slice rephasing lobe. The shape of the rephasing lobe does 
not affect the phase, only the area matters. For the SINC RF pulse used in GRE, the 
isodelay is nearly equal to one-half the RF pulse width.  
After the rephasing gradient lobe, phase dispersion of the transverse 
magnetization is eliminated for each RF excitation pulse. Since multiple RF 
excitations have to be applied for imaging (Section 2.1.2), the phase at the start point 
of isodelay should therefore also be consistent for each excitation. 
If no motion occured, 0== ?? dd , meaning no phase difference adjustment 
is made between RF excitations. However, if there is displacement in the z direction, 
the delta frequency component needs to be changed with an offset zz dGd ?? = , a 
phase therefore is accumulated. At the start of isodelay, the phase is: 
)1_( isodelayrfpwtdGd zz ?+?=??                                                                     [4.73] 
This phase obviously is not zero anymore (the blue line of ?d in Figure 4-13), 
which will not cause phase dispersion across slice, but rather phase difference 
between every RF excitations. This phase needs to be subtracted in the RF phase 
generation path to recover net zero phase at isodelay (the red line of ?d  in Figure 
4-13). All parameters in Equation 4.73 are known, except t? , which is the period of 
turning on the delta frequency before the RF pulse plays out. Therefore, to calculate 
the correct phase, the constant t?  has to be measured.  
 
 62 
 
  
To measure t? , a cylindrical shaped phantom was used. The advantage of this 
phantom is that when scanned axially (and the phantom axis is aligned with B0), 
every slice is exactly the same. We set phase encoding gradient to zero, and 
reconstructed the data with FFT transformation only in the frequency encoding 
direction. The resultant is therefore a hybrid space image, with every line in the 
frequency encoding direction corresponding to the profile of the phantom. Figure 
4-14 shows the hybrid magnitude and phase image of the phantom (256?128 matrix, 
TR=100ms, TE=6ms, FA=10?, Slice Thickness=1mm). Every line is exactly the same 
due to identical phase encoding.  
  
Figure 4-14    Hybrid magnitude and phase image of the cylindrical phantom.  Phase encoding 
gradient is zero. Frequency encoding is left-right direction. 
 
If a linear frequency is added to each RF excitation frequency, the selected 
slices will vary correspondingly. However, because the phantom is cylindrical, every 
slice along the slice-selection direction is the same. Therefore the measured signal 
could be regarded as from the same slice. When the phase encoding gradient is set to 
 
 63 
 
zero, the hybrid magnitude image will not change, but the phase along the phase 
encoding direction will change linearly, as shown in Figure 4-15.  
 
Figure 4-15    The hybrid space image after applying a linear RF frequency. Phase encoding 
gradient is set to zero. For each RF excitation, its frequency changes linearly with a step ?? . 
Perform FFT on each measured echo in k-space we can get a hybrid image.  The magnitude of 
each line in the frequency encoding direction (shown with a curvature shape) represents the 
object profile. The red line is to select an arbitrary point in the hybrid image, and the phases of 
those points also change linearly with a step ?? .  Complex means to combine the magnitude and 
phase of the point in that line. 
 
Only three lines are shown for illustration. The RF frequency changed with a 
step ?? . The acquired k-space signals were reconstructed by FFT along only the 
frequency encoding direction. The magnitude profiles all have the same shape 
because the profile of the phantom is independent of slice position. However, the 
phase of any arbitrary point along the phase encoding direction (red line) should vary 
linearly corresponding to ?? , shown as ? , ?? ?+ , ?? ?+ 2 . Even though ??  can 
be calculated by using only two lines, improved precision is achieved when using all 
the information from an acquisition 
?
?
??
?
?=? ??
=
+
1
1
1
*arg
view
i
ii PP?                                                                                              [4.74] 
 
 64 
 
Where view is the total number of phase encoding, Pi is the reconstructed complex 
value of an arbitrary point (must be on the scanned object) on the ith phase encoding 
line. 
Then because 
)2/1_( isodelayrfpwt ?+??=? ??                                                                    [4.75] 
We can find the only unknown parameter t? , which equals to 264 s?  for the scanner 
used.  
If we calculate ??  as the negative value of Equation 4.75 and change the 
phase of the RF pulse with this ?? , the linear phase change along the phase encoding 
direction caused by ??  should disappear (keep the phase encoding gradient still at 
zero), meaning the phase accumulation at the start of isodelay is compensation to 
zero. Since this compensation depends on a correct measurement of t? , the effect of 
using various t?  for RF phase compensation is illustrated in Figure 4-16, in which 
f? ( ???=? 2f ) was set to 100Hz; (a) uses the correct t? =264 s?  and the resultant 
phase image is very homogeneous showing the fully compensated phase, while (b-d) 
uses t? =180 s? , 230 s?  and 480 s?  and the resultant images show obvious 
uncorrected residual phases. 
 
 65 
 
 
Figure 4-16    The hybrid phase image of the phantom when phase encoding gradient is set to 
zero. RF transmit frequency varies linearly along the phase encoding direction. (a) Phase 
correction is calculated with t? =264 s? ; (b) Phase correction is calculated with t? =180 s? ; (c) 
Phase correction is calculated with t? =230 s? ; (d) Phase correction is calculated with 
t? =480 s? . 
 
It can also be seen from Figure 4-16 that the phase image is very sensitive to 
the change of t?  (if f?  is set big, e.g. 100Hz), therefore the image itself can be used 
for the measurement of t? . After setting a f?  for RF pulse, start changing its phase  
     (a)       (b) 
  
      (c)       (d) 
 
 66 
 
using various t? . The t?  that results in the most homogeneous hybrid phase image 
is the correct t?  for the scanner.  
   
4.5.3 Frequency Encoding Displacement Correction 
A frequency encoding gradient waveform consists of two portions, a dephasing 
gradient lobe and a readout gradient lobe. The polarity of the dephasing gradient lobe 
is opposite to the readout gradient lobe, which is used to prepare the transverse 
magnetization so that an echo signal can be created at a later time. The time of the 
echo peak is determined by the area under the dephasing gradient lobe. Because of 
the similar reason as in the previous section, the echo reaches its maximum (all spins 
are in phase, without considering B0 inhomogeneity, susceptibility variations, etc) 
when the area under the readout lobe equals to the area of the dephasing lobe. To 
describe this quantitatively, denote the dephasing gradient as -Gx, the spins along the 
frequency encoding direction precess at different frequencies: 
)()( 0 xGBx x?= ??                                                                                                [4.76] 
The phase accumulation due to the dephasing gradient (in a rotating frame with 
frequency of 00 B?? = ) is 
xTGx xd ?? ?=)(                                                                                                      [4.77] 
Where T is the duration of dephasing gradient.  
When the readout gradient Gx is applied, an additional phase dispersion is 
introduced. The accumulated phase will be: 
)()( TtxGx x ?= ??                                                                                                  [4.78] 
 
 67 
 
At a specific time techo=T, the phase in the above equation equals zero, meaning all 
the spins are in phase and form an echo. If we want the echo to be at the center of the 
k-space, the readout gradient must be on for a time Tacq=2T (Figure 4-17). So during 
the acquisition period from 0 to Tacq, the sampled data produces a symmetric k-space 
line spanning kx,min to kx,max (Figure 2-3). 
 
Figure 4-17    Frequency encoding gradient, the acquisition window associated with it, and the 
three components that generate the receive frequency and phase. 
 
 
When motion dx occurs at this direction, the RF receive frequency needs to be 
offset by 
xxdGd ?? =                                                                                                             [4.79] 
 
 68 
 
Given a receive bandwidth  BW?  and the target xFOV (x direction FOV), the 
readout gradient should be 
xFOV
BWG
x ?= ?                                                                                                        [4.80] 
By substituting Equation 4.80 to 4.79, we can get the offset receive frequency 
xdxFOV
BWd =
?                                                                                                        [4.81] 
Similarly as in Figure 4-13, the offset frequency change results in an extra 
phase on the echo signal. It has to be subtracted from the receive phase for each 
acquisition to get rid of phase difference caused by different x displacement (Figure 
4-17). 
)2/_'( gxwpwtdd +?= ??                                                                                      [4.82] 
Where pw_gxw/2 is the pulse width (duration) of the readout gradient, 't?  is the time 
interval between start of the delta frequency and data acquisition.  
To measure 't? , we again set phase encoding gradient to zero and 
reconstructed the hybrid phase image. Change the receive frequency linearly with a 
step f? (e.g. 100Hz) for each k-space line acquisition, and compensate receive phase 
using Equation 4.82 with various 't? . The 't?  corresponding to the most 
homogeneous phase image was therefore the correct 't?  of the scanner. Figure 4-18 
shows four hybrid phase images corresponding to 't?  = 144 s? , 100 s? , 200 s?  and 
300 s? . The homogeneous image (a) means the receive phase caused by frequency 
offset was all compensated and therefore 't?  = 144 s? . The diagonal shape was due 
to the applied linear change of the receive frequency. 
 
 69 
 
 
Figure 4-18    Hybrid phase image used for measurement of 't? . Phase encoding gradient is set 
to zero. RF receive frequency varies linearly along the phase encoding direction with a step of 
100Hz. (a) Phase correction is calculated with 't? =144 s? ; (b) Phase correction is calculated 
with 't? =100 s? ; (c) Phase correction is calculated with 't? =200 s? . (d) Phase correction is 
calculated with 't? =300 s? . 
 
4.5.4 Phase Encoding Displacement Correction 
Phase encoding gradient is applied while the magnetization is in the transverse plane 
but before the readout (Figure 2-2). By varying the phase encoding linearly, a linear 
       (a)              (b) 
      (c)          (d) 
 
 70 
 
phase is introduced to the magnetization in the phase encoding direction (Figure 2-2). 
FFT is used to reconstruct the spatial information of the object.  
Since the frequency and phase encoding can be analyzed independently, 
Equation 2.11 can be simplified to one dimension: 
??
=
???=
1
0
)2exp()()(
N
n
yy ynkiynks ??                                                                       [4.83] 
Where yny ?= , y? and N are the pixel size and number of pixels in phase encoding 
direction respectively. For each phase encoding step, the gradient Gy changes with a 
constant step creating a constant yk?  ( yyy Gk ???=? ). The N steps phase-encoding 
starting at the bottom edge of k-space (Figure 2-3) can therefore be written as: 
1,...,1,0)2()( ?=??= NmkNmmk yy                                                [4.84] 
The signal is: 
??
=
?????=
1
0
))2/(2exp()()(
N
n
ykNmyniynms ??                                                     [4.85] 
To satisfy the Nyquist criterion, the phase encoding step size yk?  must be chosen so 
that: 
yNyFOVk y ?==?
11                                                                                              [4.86] 
Substitute Equation 4.86 to 4.85, we have 
?
?
?
=
?
=
??=
?????=
1
0
1
0
)/2exp()exp()(
)1)2/(2exp()()(
N
n
N
n
Nnminiyn
yNNmyniynms
???
??
                                                  [4.87] 
The image can be reconstructed from the discrete inverse Fourier transform: 
 
 71 
 
)/2exp()()1)((
1
0
Nnmimsyn
N
m
n ?? ?
?
=
=??                                                                  [4.88] 
The factor (-1)n means to change the sign of the alternate elements of the output 
image.  
If there is dy displacement occurred in this direction, the phase of the receiver needs 
to be changed: 
)/2exp()1(2exp)1)(()(
)/(2exp)()1)((
1
0
/
1
0
/
NmniydNmimsdyn
ydnNmimsdyn
n
N
m
yydy
y
y
N
m
ydyn
y
???
??
?
???
?
???
?
???
?
???
?
??=+??
?????? ?+=?+?
?
?
?
=
?
?
=
?+
          [4.89] 
Note also that   
yyyy Gk
Nk ?? )0(
2)0( =??=                                                                                  [4.90] 
Where )0(yG is the gradient for the first step phase encoding. 
( )yyyyydy dkNidyN Nidyi ??=??
?
?
???
?
?
?=
???
?
???
?
?
?=? ? ??? expexpexp)1( /                            [4.91] 
Substitute Equation 4.90 to 4.91, we get 
( )yyyydy dGi ??? )0(2exp)1( / =? ?                                                                             [4.92] 
Therefore, the phase offset needed to follow dy displacement is: 
yyy dyFOV
mGd
???
?
???
? += ????? 2)0(2                                                                             [4.93]                                                                         
The first phase offset term in Equation 4.93 is the phase accumulated in the 
first encoding step; the second term is a linear phase ramp, meaning the phase 
increases yFOVd y /2?  per phase encoding step.  
 
 72 
 
Note that most of the time, when off-center FOV is required, the shift theorem 
for Fourier transforms was simply used, which says to shift the image along a 
direction, a linear phase ramp is needed in the frequency space (the second term in 
Equation.4.93). But in our case, this is not exactly true because phase encoding 
started from 0 instead of ?N/2. When dy is a constant, the linear phase ramp does 
produce an off-center FOV. The constant phase (first term in Equation 4.93) does not 
affect the image contrast because magnitude image does not change at all and phase 
image has only a global shift. However, when dy varies with motion, the first term 
multiplied with dy is not constant anymore. Thus if not compensated, artifacts will 
appear. 
 
4.5.5 Summary of Position Compensation 
The pulse sequence updated the scan parameters before each RF excitation. The 
correction of motion for the GRE sequence can be summarized as follows.  
a. Correction for rotation was performed by adjusting the gradient rotation 
matrix with Rcorr (Equation 4.67). 
b. Correction for slice selection translation was performed by adjusting the RF 
transmit frequency offset with zz dG? , and adjusting the RF phase with 
)1_( isodelayrfpwtdGd zzz ?+??= ??  (Equation 4.73). 
c. Correction for read out direction translation was performed by adjusting the 
RF receive frequency with xdxFOVBWd =? (Equation 4.81), and adjusting the 
receiver phase with )2/_'( gxwpwtdd x +??= ??  (Equation 4.82). 
 
 73 
 
d.  Correction for translation in the phase encoding direction was performed by 
adjusting the receive phase offset with yyyy dyFOVmGd ??
?
?
???
? += ????
?
2)0(2  
(Equation 4.93). (The total receiver phase offset is therefore the summation of 
phase shift in c and d, yx ddd ??? += ) 
 
 
4.5.6 K-space Line Re-acquisition 
An additional feature of the pulse sequence was that it allowed for a re-acquisition of 
motion corrupted k-space lines. This feature was added to reduce the effect of motion 
that was too fast to be compensated by gradient and RF frequency and phase 
adjustment. Re-acquisition was performed when the motion that occurred during the 
acquisition of a given k-space line exceeded a preset threshold. The displacement 
threshold for the reacquisition of a k-space line was set to 0.2mm (because image 
resolution was chosen to be about 0.2mm). A rotation threshold was not used because 
head rotation was always perceived as a displacement as well. This is because 
rotation is defined relative to the center of the image, whereas the head generally 
pivots around the back of the head, where it is supported by the receive coil array, 
opposite of the part of the head viewed by the cameras. 
 
4.6 Software Flowchart for the Motion Correction System 
Figure 4-19 shows the software flowchart of the proposed motion correction system. 
Calibration was done offline and was programmed with MATLAB. Real time 
 
 74 
 
tracking software was running on Linux with C++. The tracking computer 
communicated with the MRI scanner through a TCP/IP connection to provide the 
console (scan computer) with the real-time motion parameters. 
  
 
 
 
Figure 4-19    Software flowchart for the prospective head movement correction system for MRI. 
Calibration is all done offline and programmed with MATLAB. Real-time correction is 
programmed with C++ running in Linux. N is the number of total phase encoding steps for a 
scan. 
 
 
 75 
 
Chapter 5  Results and Discussion with the Stereo Motion 
Correction System for MRI 
 
5.1 Camera Calibration Test 
The cameras used in this project have 4 mm lenses and were positioned about 8cm 
away from the volunteer?s face. Distortion of the images is very serious. Figure 5-1a 
shows the distortion model on each pixel of one of the stereo cameras. The blue cross 
in the center marked the image center and the blue circle marked the principal point. 
Each arrow represented the displacement of the pixel caused by lens distortion. The 
displacement was bigger at the corners of the image, which reached more than 50 
pixels. It meant that without a correct camera calibration, the 3D estimation from 
stereo would not be accurate. Image (b) shows a checkerboard image acquired with 
the corresponding camera. The edges of the checkerboard pattern were not straight. 
Instead, they were curved because of the distortions. Image (c) shows the image after 
distortion correction, which recovered the straight checkerboard edges. 
(a) 
 
 76 
 
(b) (c) 
Figure 5-1    Lens distortion and correction for a camera. (a) Distortion model of the camera.  
The arrows and number on the curve mark the distortion direction and amplitude. (b) an image 
of a checkerboard pattern. The lens distortion is obvious especially close to the corners of the 
image. (c) lens distortion corrected image of b.  
 
5.2 Stereo Reconstruction Validation 
The checkerboard pattern was attached to a highly sensitive, 6-axis-
positioning system (M-824 compact 6-axis-positioning system, Physik Instruments, 
GmbH & Co. KG, Germany) to test the stereo?s accuracy. The hexapod was driven 
by six high-resolution actuators with sub-micron precision Experiments were done 
outside of the scanner since the positioning system was not MR-compatible. The 
cameras were fixed firmly on top of the moving platform at a distance of about 8cm 
to simulate the position of the stereo with respect to the human face. The platform 
was controlled by a computer to move in the x, y, and z direction over known 
distances and rotate around the three axes with known degrees, where the x, y and z 
were defined as the two orthogonal axes of the platform, and the z axis was defined as 
pointing out of it. Camera images were acquired at each position to measure these 
movements. The discrepancy between the real and measured positions was evaluated 
 
 77 
 
to determine the stereo?s accuracy, which turned out to be 0.04?0.03mm, 
0.03?0.02mm and 0.02?0.01mm in x, y and z direction, and errors of rotation around 
x, y and z axes were 0.03?0.13?, -0.02?0.05? and 0.01?0.03? respectively. This high 
accuracy is well within the highest available MRI resolution, which garantees 
adequate motion correction. 
 
5.3 Tracker Noise Evaluation 
Since the motion correction was performed for each acquired k-space line, the 
tracking noise was also an important factor. Our real-time tracking was based on 
feature points, which provided sub-pixel accuracy but was also sensitive to noise in 
the video images. The tracking noise was evaluated by monitoring a stationary object 
for several minutes by tracking 3 feature points. 
When the tracker was set up outside of the scanner, the noise was very small 
even by tracking only 3 feature points, as shown in Table 1. However, when the 
tracker was put inside the MRI, a significant increase of the error was observed, even 
without scanning. During a scan the noise level increased even further. This might be 
attributed to remaining interference of the MRI scanner with the video cameras. 
These findings suggest that there is an effect of the static magnetic field as well as of 
the switching gradients and/or RF. The maximum translation error exceeded 2mm, 
which was unacceptable for the high-resolution MRI motion correction.  
 
 
 
 
 78 
 
Table 1    Accuracy and noise of the tracker when it was put outside or inside of the scanner.  
 Tracker Outside Tracker Inside (no 
scan) 
Tracker Inside (during 
scan) 
 Mean ?  Max Mean ?  Max Mean ?  Max 
x (mm) 0.065 0.047 0.215 0.002 0.315 1.057 0.702 0.609 2.772 
y (mm) 0.027 0.014 0.068 0.076 0.034 0.179 0.052 0.057 0.314 
z (mm) 0.029 0.025 0.107 0.158 0.221 0.918 0.111 0.288 1.138 
rx (?) 0.021 0.014 0.061 0.146 0.169 0.740 0.047 0.226 0.881 
ry (?) 0.016 0.024 0.081 0.068 0.080 0.338 0.034 0.143 0.588 
rz (?) 0.038 0.028 0.132 0.019 0.249 0.864 0.545 0.485 2.166 
 
To reduce the tracking noise, multiple feature points were used.  Figure 5-2 
shows the maximum x-translation error as a function of the number of feature points. 
It shows that tracker noise decreased with increasing number of feature points. For 
our application, we opted to select around 20 points to keep the maximum error 
below 0.2 mm and 0.2 degree.  
 
Figure 5-2    Noise of the tracker. The maximum x translation error as a function of the number 
of tracked feature points. 
 
However, the number of feature points on the human face detectable with both 
video cameras was often below 20, in part because of the cameras? limited field of 
 
 79 
 
view of about 13 x 10cm2. To overcome this we decided to place a sticker on the 
subject?s forehead that displayed a number of white triangles on a black background. 
 
5.4 Calibration Accuracy Between Stereo and MRI Coordinate System 
The calibration phantom (Figure 4-2) was used to test the calibration accuracy 
between the stereo system and the MRI scanner. First, the phantom was put at an 
arbitrary position. The stereo measured the corner positions in 3D and transformed 
them to MRI coordinates based on the calibration results. Then 3D MR images were 
acquired, from which the 3D MRI coordinates of the corners were determined. 
Accuracy was obtained by measuring the discrepancy between the estimated position 
(obtained by using the camera model and calibration parameters) and real ones 
(measured from MR images). This procedure was repeated at different arbitrary 
phantom positions. The accuracy of these points was found to be 0.06?0.05mm, 
0.10?0.08mm and 0.15?0.13mm in the x, y, and z direction in the MRI coordinate 
system respectively, which should be sufficient for the motion compensation task.  
The results are also visualized in Figure 5-3 where rotation and translation 
parameters are calculated from these points at different phantom positions. Solid lines 
show parameters estimated using MRI images. Dotted lines show parameters 
estimated using camera images but further transformed into MRI coordinate system. 
The two sets of estimation matched very well, which means the calibration between 
the two coordinate systems was accurate. 
 
 80 
 
 
Figure 5-3    Rotation and Translation parameters estimated from MRI images (solid lines) and 
from camera images transformed into MRI coordinates (dotted lines).  The two sets of lines 
matching very well means the calibration between camera and MRI is accurate. 
 
5.5 Pulse Sequence Test with a Phantom 
To test the accuracy of RF frequency and phase adjustment for translational 
displacement in slice-selection, phase-encoding and frequency-encoding directions, a 
high resolution phantom was scanned with our modified GRE sequence. (Rotation 
doesn?t need to be tested for the sequence because it only requires rotation matrix 
adjustment but doesn?t need any frequency or phase compensation.) Halfway through 
the experiment, a translation of 10mm was executed and the scan was continued until 
the end with the phantom staying at the new position. The particular motion pattern 
was used to maximize errors near the center of ky-space, which causes the most 
salient artifacts. 
 
 81 
 
Figure 5-4 shows the results of three sets of scans (256?128 matrix, TR= 
100ms, TE= 6.9ms, FA=30o). The first column represents the control scan without 
motion, second and third columns represent scans with displacements, without 
correction and with correction respectively. Displacements were executed all in the 
OMZM direction, so for axial scans (a-c), displacements were in the slice-selection 
direction; for coronal scans (d-f) with frequency-encoding superior-inferior (OMZM), 
displacements were in the frequency-encoding direction; for coronal scans (g-i) with 
phase-encoding superior-inferior (OMZM), displacements were in the phase-encoding 
direction. It can be seen that for all cases, the displacements were compensated 
successfully, which means the measurement of scanner specific time t?  and 't?  is 
accurate and the strategy of RF frequency and phase adjustments are correct. 
Note that image (g-i) show an aliasing artifact because of the small FOV 
prescribed in the phase-encoding direction, which, however, does not affect the 
displacement compensation test.  
 
 82 
 
 
 
 
 
    
        (a)                         (b)        (c) 
   
         (d)                       (e)         (f) 
   
          (g)                          (h)          (i) 
Figure 5-4    Translation compensation of a phantom. (a,d,g) were scanned statically; all others 
were scanned with the phantom locating in one position for the first half of the scan and the 
second half in another position 10mm apart. (b,e,h) were scanned without position correction, 
but (c,f, i) were with correction. Displacements in (a-c) were in slice-selection direction;  (d-f)  in 
frequency-encoding direction, and (g-i) in phase-encoding direction. 
 
 83 
 
5.6 System Performance Test 
With all the separate parts of the prospective motion correction system validated, the 
performance of the system as a whole needed to be tested. The optimal way is to 
evaluate hybrid space data of a scan with motion. Because ideally without motion, all 
lines in the phase-encoding direction should be exactly aligned. Imperfect alignment 
would indicate imperfect motion compensation. If motion can be corrected fully, then 
an alignment should be achieved.  
We set phase encoding gradient to zero and reconstructed the hybrid data 
(resolution 0.23mm2, TR=100ms, TE=25ms, FA=10?). Figure 5-5a shows motion 
induced by the breathing of a volunteer recorded with the tracker during the scan. 
Image (b) was scanned without real-time motion correction and the lines in the phase 
encoding direction shifted during the scan. When prospective motion correction was 
applied for image (c) acquisition, those lines were all aligned.  
Since the in plane resolution was only 0.2mm, the perfectly aligned lines in 
Figure 5-5b suggested that the error of the whole correction system should be less 
than a pixel or 0.2mm. Therefore, this is a quick and coarse way to validate the 
accuracy and reliability of the system. 
 
 
 84 
 
 
Figure 5-5    The MR hybrid image without phase encoding gradient. (a) Small breathing 
induced head movement was observed during the scan of the volunteer. (b) without motion 
correction; (c) with motion correction. 
 
5.7 Volunteer Study 
The efficacy of the tracking system was studied on a normal volunteer under an 
institutional review board-approved protocol. A nominal in-plane spatial resolution of 
0.23 x 0.23mm2, a slice thickness of 3mm, 300ms TR, 25ms TE and 30? flip angle 
were used. Four motion conditions were studied: one with cyclic breathing-related 
head motion (Section 5.7.1), one with minimal motion (Section 5.7.2), one with large 
stepwise motion (Section 5.7.3), and one with slow gradual motion (Section 5.7.4). 
 
                (a) 
  
                                         (b)                                                                           (c) 
 
 85 
 
5.7.1 Breathing Related Motion 
During this experiment, the volunteer was instructed to not pay any effort to lay still.  
Motion parameters reported by the tracker (Figure 5-6a,b) showed cyclic head motion 
with a period consistent with the respiratory cycle. The observed rotation was mainly 
around the x-axis and the translation along the z-axis (in the MRI physical coordinate 
system), which suggested a nodding like motion. The scan was performed twice, once 
without motion correction and once by using the prospective motion correction 
technique proposed here. Figure 5-6c,e showed that even though the motion was only 
about 2mm and 1?, ghosting and blurring artifacts were clearly observed in the high-
resolution images (0.23 x 0.23mm2 in-plane resolution). The prospective line-by-line 
motion corrected image using our real-time compensation system showed much 
improved image quality and a virtual elimination of ghosting artifact (Figure 5-6d,f) 
as compared to the uncorrected images (Figure 5-6c,e). The reacquisition rate was 
2.3%.  
 
 
 86 
 
 
Figure 5-6    Prospective breathing induced motion correction for the 2D gradient echo 
acquisition. Images (a,b) show the evolution of motion parameters reported by the tracker 
during the acquisition of images (c) and (d) respectively. Image (c) was scanned without motion 
correction, and (d) was with motion correction; (e) and (f) were zoomed in on an area in the 
frontal brain indicated by the red square in images (c) and (d) respectively.   
 
5.7.2 Minimal Motion 
During this experiment, the volunteer was instructed to stay as still as possible. 
Despite the efforts of the volunteer, the motion parameters reported by the tracker 
(Figure 5-7a,b) showed a small gradual and jittering motion. This scan was also 
repeated twice, once without (a,c,e) and once with (b,d,f) motion correction. Both 
uncorrected (c,e) and corrected (d,f) images have similar quality and show little 
evidence of ghosting. The reacquisition rate of Figure 5-7d was 4.6%.  
 
 
 87 
 
  
Figure 5-7    Prospective motion correction data obtained for a volunteer that attempted to 
minimize head motion. Images (a,b) show the motion parameters reported by the tracker during 
the acquisition of images (c) and (d) respectively. Image (c) was scanned without motion 
correction, and (d) was with motion correction; (e) and (f) magnify an area in the front of the 
brain (the area marked by the red square in (c) and (d) respectively).   
 
5.7.3 Large Stepwise Motion 
Experiments in the presence of strong intentional motion demonstrated the robustness 
of the prospective motion correction technique (Figure 5-8). The motion parameters 
reported by the tracker indicated a range of motion of close to 10 mm translation and 
10 degree rotation. The severe ghosting and blurring artifacts observed without 
correction (e,h) were almost entirely avoided with prospective motion correction (f,i), 
and the image quality of (f,i) was very similar to (d,g), which were acquired in the 
absence of intentional motion and with motion correction on (motion parameters in 
a). The reacquisition rate for (f,i) was 10%.  
 
 
 88 
 
 
Figure 5-8   Example of the performance of the prospective motion correction method in the 
presence of large stepwise motion during the acquisition of multi-shot 2D gradient echo images 
on a normal volunteer. Images (a-c) show the motion parameters reported by the tracker during 
the acquisition of images (d-f) respectively. Image (d) was scanned without intentional motion 
during the scan, whereas (e,f) were scanned in the presence of motion; (e) was acquired without 
correction but (f) was acquired using the proposed method. (g-i) show additional detail in an 
area in the front of the brain that was indicated by the red square in images (d-f).   
 
 
5.7.4 Slow Motion 
The final motion type that was investigated was slow motion resulting in large 
displacements and/or substantial rotation angles. Even without any reacquisition this 
type of motion was handled well by the prospective motion correction system. The 
motion parameters reported by the tracker (Figure 5-9b,c) showed that the range of 
 
 
 89 
 
the motion was close to 10 mm of translation and 10 degrees of rotation. The 
reacquisition rate for (f,i) was 0% since the reacquisition mode was turned off. Image 
quality of (f,i), when motion correction was used, was notably better than (e,h), 
acquired in the absence of motion correction. Furthermore, image quality of (f,i) was 
similar to (d,g), acquired in the absence of intentional motion (motion parameters in 
a). 
 
Figure 5-9    Prospective motion correction in the presence of slow motion. The motion 
parameters reported by the tracker during the scan of images (d-f) are displayed in (a-c), 
respectively. Image (d) was acquired in the absence of intentional head motion, (e,f) during slow 
head motion, where, unlike (e), (f) was acquired with the proposed correction method. A 
magnification of the area in the red box in (d-f) is shown in (g-i). 
 
 
 90 
 
5.8 Discussion 
In this study, a method for the prospective correction of 3D motion for high-
resolution MRI was developed and evaluated. Preliminary data obtained on a human 
volunteer show substantial image quality improvement for high resolution gradient 
echo imaging at 7T, in particular under conditions of periodic head motion related to 
breathing and intentional head rotation. 
To obtain adequate tracking accuracy, the cameras were put right above the 
MRI receiver coil, monitoring the human face from a very small distance. Since the 
stereo 3D reconstruction accuracy depends on the angle and relative distance between 
the two cameras, the best setup of the two cameras should be explored further for 
better accuracy. 
The accuracy of the calibration between the tracker and the MRI scanner is 
also essential for system performance. Since all the motion parameters were estimated 
in the camera coordinate frame, they had to be transformed precisely to the MRI 
coordinate frame to be used in the sequence. However, calibration was difficult in this 
system because the tracker can only see the surface of the calibration object, whereas 
the MRI can only see its internal structure. A previous publication (25) dealt with this 
issue in an iterative way. They used three gel-filled reflective spheres for estimating 
the initial coordinate transformation, and then iteratively refined this transformation 
based on the internal structures of the phantom. This method, though accurate, took 
more than 30 minutes to accomplish. This limitation was addressed in our project by 
designing a phantom with high resolution, well ordered structures. Since every 
parameter of the phantom was known, the relationship between the external 
 
 91 
 
appearance (black-and-white checkerboard) and internal MR-visible structure was 
easily established, which simplified the calibration and shortened the calibration time 
to several minutes.  
The noise of the tracker was another important factor essential for reliable 
correction of MRI data. It was demonstrated here that at least 20 feature points 
needed to be tracked for adequate precision. Because of the setup of our cameras, 
only a part of a face, lacking a large number of feature points, could be seen from the 
cameras. This was the reason for using a ?tracking target? mounted to the forehead. If 
the cameras could be put further away from the face, direct tracking of the face 
curvatures could be used instead of these feature points. Fixing the cameras to the 
inside of the magnet bore, instead of to the transmit coil as was done here may be a 
solution. Another solution to this issue of limited field of view would be the use of 
cameras with a fish-eye lens, which would give a larger view for a given camera 
distance from the face. However, the camera distortion correction may be more 
complicated. It has also been demonstrated that the noise comes from the interference 
between the cameras and the MRI scanner, so a better shielding or hardware filtering 
could be a solution too. 
Due to the tracker?s relative low frame rate some fast motion could not be 
adequately corrected. For this reason, a reacquisition function was added to the 
sequence in this project. The number of re-acquired lines (reacquisition rate) was 
dependent on the type of head motion and quite variable, e.g. 2.3% for the conditions 
presented in Figure 5-7d and 10% for those in Figure 5-8f. Considering the resulting 
improvement in image quality and the small increase in total imaging time, this 
 
 92 
 
sacrifice appears worthwhile. Nevertheless, improved tracking speed might reduce or 
eliminate the need for re-acquisition. This was confirmed in the ?slow motion? 
experiment, during which the re-acquisition was turned off. Under this condition, the 
tracking speed exceeded the head motion speed, resulting in excellent image quality 
without any reacquisition, as was seen in Figure 5-9(f,i). With the continuing increase 
of computational power, and improved motion tracking algorithms, it is therefore 
expected that the need for re-acquisition will be reduced. One possible area for 
improved tracking is the use of predicitive estimates for object motion, for example 
by using Kalman filtering.  
Although the motion correction scheme presented here was effective in 
reducing the effects of motion on image quality (e.g. compare Figure 5-6d and Figure 
5-8f with correction versus the not corrected versions in Figure 5-6c and Figure 5-8e), 
there remain possibilities for improvement. For example, the motion-corrected 
images in Figure 5-8f and Figure 5-9f are slightly inferior to the corresponding 
images acquired during the minimal head motion (Figure 5-8d and Figure 5-9d). This 
apparently imperfect motion compensation could be due to a number of factors, for 
example phase changes due to small changes in local B0 amplitude, or subtle changes 
in B1 amplitude and phase. The contribution of these effects and their potential 
mitigation remains to be investigated.   
In this current method, quantitative validation of image quality improvement 
on volunteer study has not been fulfilled. One of the reasons is that since image 
resolution is only 0.2mm, the reference image (no motion) can hardly be exactly the 
same slice as in the with motion case, which makes quantitative comparison very 
 
 93 
 
difficult. To get a better validation, we need to test this technique on more volunteers 
and ask neurologists to give objective judgments on the image quality.  
 
  
 
 94 
 
Chapter 6  Prospective Motion Correction for MRI using a 
Single Camera 
 
A substantial amount of clinical and research MRI images are rendered unusable 
because of subject motion. To solve this problem, prospective motion correction 
based on external tracking systems has been proposed. Stereo optical tracking system 
has been shown to give good real-time motion correction with reasonable accuracy. 
However, the calibration of this system is very complicated and time consuming, as it 
requires a camera system calibration as well as a calibration between cameras and 
MRI system using dedicated phantoms. We therefore propose an alternative motion 
correction method for MRI that does not require calibration and can work with just a 
single video camera.  
 
6.1 System Setup 
All experiments were performed on a GE (Milwaukee, WI, USA) 7T MRI system. An 
MR compatible infrared camera (MRC Systems GmbH, Germany) was fixed on a 
holder right above and in front of the head coil (Figure 6-1). Six infrared emitting 
diodes were used to illuminate the field of view. The distance from the camera to 
volunteer?s face was around 8cm. The high spatial resolution of this setup (320 x 240 
pixels, about 13 x 10cm2 field of view focusing mostly on the nose of the volunteer) 
allowed the detection of sub-millimeter movements. 
 
 
 95 
 
 
Figure 6-1    System setup. The camera was fixed right on top of the head coil. 
 
6.2 Method 
Rather than using phantom calibration scans, this method uses training scans to 
estimate head position and link the camera image measurement to the MRI scanner 
coordinate system.  
During the training scan, the subject was asked to slowly move his head in the 
directions that were least restrained, including rotations in the axial and sagittal 
planes. N EPI (Section 2.1.3; parameters: 128?96 voxels over 24?18 cm2, 11 3mm 
slices with 0.5mm gap, TE 50ms, TR 1.2s) and camera images were acquired 
simultaneously (Figure 6-2). From the EPI images, motion parameters in the MRI 
coordinate system (3 rotations and 3 translations) between each volume were 
calculated with image registration method. Since camera images were acquired 
simultaneously with EPI, each camera image in the training data thus corresponded to 
a motion vector. The training data were then sorted based on the motion parameters to 
help speed up the real-time search. An ROI region (the nose) was also selected based 
on these training camera images. 
 
 96 
 
 
Figure 6-2    Method of the motion estimation using a single camera. N training EPI and camera 
images were acquired. From EPI images, 6DOF motion parameters were calculated from 
registration for each training data. During the real-time scan, the most similar camera image in 
the training data was found for the newly acquired camera image. The corresponding motion 
parameters from EPI were sent to MRI pulse sequence for prospective motion correction.   
 
During the real-time scan, the correlation between the ROI of the newly 
captured camera image and the ones in the training camera images was calculated to 
find the most similar one. The corresponding motion parameters of that training 
image served as the estimation of the current head position. These motion parameters 
were then sent to the MRI scan computer at 10Hz speed. A gradient echo sequence 
 
 97 
 
was modified to incorporate motion correction by adjusting gradient rotation and RF 
frequencies and phases before each RF excitation (Section 4.5). 
 
6.2.1 Motion Computation 
The motion parameters can be calculated from the N EPI images in the training data 
through image registration. Image registration involves the mapping between a pair of 
images, so that one image after transforming with the mapping parameters can be 
matched to the other one. 
tt TXRY += ii                                                                                                         [6.1] 
Where Xi=[xi1, xi2, xi3]T and Yi=[yi1, yi2, yi3]T are the corresponding points in the two 
images in MRI physical coordinate system (after transformation as in Section 9.2) 
and Rt, Tt are the rigid body transformation between them. 
The registration parameters can therefore be determined by minimizing the 
sum of squared intensity differences between images (Friston et al., 1995; Friston, 
2007): 
?? ?=??=
i
ii
i
ii f
222 );(
ttt pXYTXRY?                                                    [6.2] 
Where f is the transformation function, and pt = [pt1, pt2,?, pt6]T is the vector 
containing three rotation and three translation parameters. 
Given n corresponding points in the two images, Equation 6.2 can be solved 
using Gauss-Newton algorithm (Section 9.3). For the mth iteration, the vector pt is 
updated as: 
( ) bJJJpp tt TTmm 1)()1( ?+ +=                                                                                      [6.3] 
 
 98 
 
Where 
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
=
6
)(
2
)(
1
)(
6
)(
2
2
)(
2
1
)(
2
6
)(
1
2
)(
1
1
)(
1
);();();(
);();();(
);();();(
t
m
n
t
m
n
t
m
n
t
m
t
m
t
m
t
m
t
m
t
m
p
f
p
f
p
f
p
f
p
f
p
f
p
f
p
f
p
f
ttt
ttt
ttt
pXpXpX
pXpXpX
pXpXpX
J
?
????
?
?
 (3n?6 matrix) and 
??
?
?
?
?
?
??
?
?
?
?
?
?
?
?
=
);(
);(
);(
)(
)(
22
)(
11
m
nn
m
m
f
f
f
t
t
t
pXY
pXY
pXY
? ? (3n?1 vector). 
For each iteration step, Yi are re-sampled using the new transformation 
matrix. The iterative process is repeated until 2?  can not be decreased or for a fixed 
number of iterations. This approach is based on the assumption that the starting 
estimate is close to the correct solution and the image is smooth, which is the case in 
our application because the motion between successive images acquired in the 
training data is very small and the relative low resolution EPI images are smooth. 
Therefore the possibility that a solution stuck in a local minimum instead of a global 
minimum is small 
 
 
6.2.2 Best Match Search 
The search for the camera image in the training data that most closely matched the 
newly acquired image was realized through a similarity measure. The correlation 
coefficient (CC) is optimal in similarity measure as defined in Equation 4.48, 
reiterated here for a correlation between images A and B within a window W: 
 
 99 
 
( )( )
( ) ( ) 22 )()(
)()(
???
?
???
? ?
???
?
???
? ?
??
=
??
?
??
?
WW
W
BBAA
BBAA
CC
xx
x
xx
xx
                                                       [6.4] 
Where A  and B are the mean pixel values of image A and B within the window W 
respectively. To speed up the computation, the correlation in image space can be 
calculated in frequency space using FFT.  
Let AAA ?= )()(? xx , BBB ?= )()(? xx , then the numerator in Equation 6.4 is 
?? ?
??
==
1
0
)(?)(?)(?)(?
N
nW
n nBnABACC
x
xx                                                                          [6.5] 
Where N is the size of the window. It can be calculated through )(? kFA , Fourier 
transform of )(? nA ; and )(? kFB , Fourier transform of )(? nB . 
Since 
?? ?
? ??
?
?
=
?
=
?
=
?
=
?
=
?
=
?
=
?
?
??
?
? ???=
??=
?=?????? ?
1
0
1
0'
1
0
1
0
1
0'
1
0
1
0
/)'(2)'(?)(?1
/2/'2)'(?/2)(?1
/2)(*?)(?1)(*?)(?
N
n
N
n
N
k
N
k
N
n
N
n
N
k
BABA
NkmnnienBnA
N
NmkieNknienBNnkienA
N
NmkiekFkF
NkFkFIFFT
?
???
?
 
The quantity in parentheses is 0 for all values of n? except those of the form 
pNmnn +?=' , where  p is any integer. At those values, it is 1. Therefore, 
( ) ??
=
+?=?
1
0
* )(?)(?)(?)(?
N
n
BA pNmnBnAkFkFIFFT                                                         [6.6] 
By comparing Equation 6.6 with 6.5, the correlation coefficient can be 
calculated as follows: convert the two images to 1D vectors, FFT the two vectors, 
multiply one resulted transform by the complex conjugate of the other, inverse FFT 
 
 100 
 
transform the product and the first component (m=0) is the correlation coefficient. All 
other components of ( ))(?)(? * kFkFIFFT BA ?  are the values of the correlation at different 
lags.  
 
6.3 Result 
6.3.1 Accuracy of the Motion Estimation 
To estimate the accuracy of the method, several minutes of data were acquired, 
consisting of 145 EPI volumes and the corresponding 145 camera images. During this 
time, the subject slowly moved his head in the directions that were least restrained, 
mainly rotations in the axial plane. All the EPI images in the training set were 
registered to a reference (the first volume in the training set can be used as a 
reference) to get the motion parameters. The range of rotation around ZM axis was 
about 17 degrees and translation in XM direction was 18mm (Figure 6-3).  
Figure 6-4 shows the correlation of two randomly chosen images in the 
training data set with all images. The curve suggests that the correlation is higher 
when positions are close, which means CC is a good measurement for most closely 
match image search. 
 
 
 101 
 
 
Figure 6-3    Motion parameters from the training data set. The total number of training images 
is 145. The motion is mainly rotation around ZM axis and translation in XM axis. 
 
 
Figure 6-4    Correlation of two randomly chosen images with all training images. Red crossings 
mark the chosen image, (a) the correlation between image No. 47 with others; (b) the correlation 
between image No. 100 with others. 
 
 
To test the accuracy of motion estimation, the data were divided into two sets, 
set A and set B. For every camera image in set B, the most similar image in set A was 
determined based on correlation. The motion parameters of this image in set A served 
as motion estimate for that image in set B, whereas the true motion was derived from 
               (a)      (b) 
 
 102 
 
registration of the EPI image in set B to the reference. Difference between camera 
derived position and EPI registration parameters was used to validate motion 
estimation accuracy.  
A k-fold cross-validation method (Duda et al., 2001) was used, where k was 
chosen to be 10. Training images were divided into k folds. For each trial, one fold 
images were used as set B, while the remaining k-1 fold images formed set A. After 
repeating the validation k times by using all different sets of B, the average error 
across the k trials was computed. Translation error in XM, YM and ZM direction is 
0.021? 0.249mm, 0.004? 0.052mm and 0.001 ? 0.066mm and rotation error around 
XM, YM and ZM axis is 0.001? 0.026?, -0.001? 0.050?, and -0.006 ? 0.129? 
respectively.  
However, note that this error estimation was only true if the sampled motion 
space covered the whole space the volunteer could move inside the head coil. For 
instance, if the sampled space during training stage only covered rotations about ZM, 
while during the real-time scan, the volunteer moved around XM axis, even though a 
most closely matched image in the training data could be determined based on the 
highest correlation coefficient, the estimated motion parameters were not correct. 
Therefore the key issue for this method is to sample as complete as possible the 
motion space (the volunteer moves as much as he can) during the training stage. 
 
 
 
 
 
 103 
 
6.3.2 Volunteer Study 
A GRE sequence modified to incorporate motion correction by adjusting gradient 
rotation and RF frequencies and phases before each RF excitation was used on a 
normal volunteer under an institutional review board-approved protocol (parameters: 
512?384 voxels over 24?18 cm2, 3mm slice thickness, TR=100ms, TE=30ms). 
During the real-time scan, the most similar camera image in the training data was 
found for the newly captured camera image based on the correlation coefficient inside 
the ROIs (nose area chosen from the training data). The corresponding motion 
parameters of that training image were sent to the MRI scan computer at 10Hz speed 
for prospective motion correction.  
Experiments were performed with and without motion correction each with 
similar motion. Results are shown in Figure 6-5. The image quality of (b) is similar to 
(a), and much better than (c). The results suggest that the new system has good 
precision to be used in high resolution MRI. Motion parameters shown in (d,e) were 
not smooth because of the not dense enough sampling of the training data acquisition. 
 
 
 104 
 
 
Figure 6-5    Prospective correction of large motion. (a) was acquired without intentional motion 
during the scan. (b c) were scanned in the presence of motion; (b) was acquired using the 
proposed method but (c) was without correction. (d,e) show the evolution of motion parameters 
during the acquisition of image (b,c), respectively. 
 
6.4 Discussion 
We propose a new motion correction method for MRI using single camera without 
any calibration. A short training scan is required to allow object motion estimation 
from the camera images. It does not need any tracking target, therefore is more user 
friendly. Initial testing results show that this method is very promising.  
The major factor that affects the motion estimation accuracy is the registration 
error from EPI images. Even thought registration can get sub-voxel accuracy, if 
motion occurs between the acquisition of the first and last slice of a volume, rigid 
body registration is not able to correct. This could be a problem in our training data 
        (a)       (b)    (c) 
         (d)        (e) 
 
 105 
 
acquisition, because the TR of EPI acquisition was set relatively long (1s) to get a 
good coverage of the brain, and the volunteer was asked to continuously move his 
head during that period of time. In addition, ghost artifacts in a single slice could 
make the registration even harder. A solution to this is that during the training period 
of time, the volunteer should move very slowly to introduce negligible artifacts 
coming from these two effects. 
  Another issue that could affect the accuracy is the EPI image quality. The 
tradeoff for EPI?s fast imaging speed is that it suffers image distortions, especially in 
a high magnetic field like in our case, which may in turn affect the registration 
performance. Field inhomogeneity can cause image distortions and even signal loss. 
Even though a high order shim can ameliorate this problem to some extent, the 
inhomogeneities remain, particularly close to tissue-air and tissue-bone interfaces. 
And even the field can be shimmed to almost homogeneous in the brain for 
acquisition of the first image in the training, the field changes when volunteer moves 
for about 20 degrees and 20mm, which could result in different distortion in the 
images and cannot be recovered by solely registration. Furthermore, gradient non-
linearity, Nyquist ghost, etc, could also distort EPI acquisitions (Cohen and 
Weisskoff, 1991; Farzaneh et al., 1990; Jezzard and Clare, 1999). 
The correlation coefficient was proved to be optimal in searching for the most 
similar image in the training data for a new image. It was not only accurate, but also 
insensitive to noise because global information in an image has been used. However, 
the accuracy for motion estimation does not just rely on the best image search. It 
depends on the completeness of the sampled motion space during training stage. If the 
 
 106 
 
sample space is very limited, then the best match image is not able to provide correct 
motion parameters.  
In conclusion, to get the best performance with this method, several 
conditions have to be satisfied: EPI image quality should be good and the volunteer 
should move slowly and to the full extent (not large because of the tight receive coil 
used) during the training data acquisition.    
 
 
 
 107 
 
Chapter 7  Conclusion 
 
 
Head motion during scanning can greatly affect MRI image quality and is a major 
reason for rendering data useless in both clinical and research application. High-
resolution brain structural imaging is especially compromised by head movement 
because high-resolution scans have longer scan time. Two new motion correction 
methods were proposed here using either stereo or a single camera to measure and 
correct rigid body motion in real time. A gradient echo sequence was modified to 
accept motion estimates before each TR to adjust the gradient orientations and RF 
frequencies and phases for each excitation. Images were reconstructed on the scanner 
computer for immediate access. Experiments done with both of the two methods 
demonstrated a consistent improvement in image quality if motion occurred during 
the acquisition. 
The advantage of using an optical system was its totally parallel computation 
to the pulse sequence, therefore no extra scanning time was needed for motion 
detection. The camera set up right on top of the head coil provided very high 
sensitivity for motion estimation.    
For the motion correction method using stereo cameras, the position 
estimation accuracy was better than 0.1mm and 0.15?, which has been shown to be 
good enough for high-resolution imaging. However, it did have several drawbacks 
that a good calibration took some extra preparation time before experiments, and the 
?tracking target? used for motion detection was not totally user friendly. The first 
problem was not a big issue since stereo calibration only needed to be done offline 
 
 108 
 
once as long as the cameras were fixed solidly on the holder (Section 9.4), and the 
calibration between the cameras and the MRI scanner was simplified to just several 
minutes. For the second problem, the tracking target was used only to remove noise 
caused by the scanner. A better shielding of the cameras should be able to solve the 
problem. 
For the single camera method, no calibration was needed at all, but only a 
short training period. The big advantage of this method was that no tracking target 
was needed, because face features were used directly for motion detection. It is more 
user friendly. However, the accuracy depended a lot on EPI image quality and 
sampled motion space during the training stage. For the first problem, parallel 
imaging (Pruessmann et al., 1999) should be able to improve both the speed and 
image quality. For the second problem, the volunteer should be instructed to move as 
much as he can inside the coil during the training data acquisition. 
 
 
 109 
 
Chapter 8  Future Work 
 
 
In this project, two systems were built for prospective motion correction for high-
resolution MRI in 7T. The feasibility of using optical tracking system to help MRI 
image acquisition has been demonstrated. But future work can be carried out to 
improve the performance further. 
 Both methods calculated motion after acquiring video data. Therefore, there is 
always a time delay for correction in the sequence. A better solution is to predict 
motion using the Kalman filter. Another positive feature about Kalman filter is its 
ability to remove tracking noise, which may further lead to an unnecessary tracking 
target for the stereo correction method.  
 Currently, the motion calculation was done continuously and whenever it was 
ready, it was sent to the MRI scan computer. The MRI scan computer then read the 
motion parameters inside the registers and executed the correction. Therefore when 
MRI started to read the data in the registers, they may not be fully updated yet. As a 
result, the adjustment of gradients and RF frequencies and phases may use 
uncorrected parameters. A future work is to synchronize the motion computation and 
MRI motion correction to avoid this problem. 
  A better way to acquire training data for the single camera method is to 
synchronize camera image acquisition with navigators. The purpose of training is to 
link camera images with motion parameters in MRI system. The EPI takes too long to 
achieve this. A 3D navigator, however, will take only several milliseconds to tens of 
milliseconds. As a result, the sampling of complete motion space will take less than 
 
 110 
 
one minute. With this method, we will be able to combine the advantage of both 
optical system (parallel computation with the sequence) and navigator (fast 
acquisition), and is believed to be more practical. 
 
 
 
 111 
 
Chapter 9  Appendices 
 
9.1 Derivation of motion in logical coordinate system 
The rotation and translation ( mlR  and mlT ) between MRI physical and logical 
coordinate systems can be read out from the pulse sequence after the scan-plane is 
prescribed as in Equation 4.66 (the time variable t0 is dropped for brevity): 
)( mllmlm TXRX +=                                                                                                [9.1]                                                                                                                             
New motion parameters are denoted as mR and mT  in physical coordinate system. 
Use subscripts ?o? and ?n? to define the position before and after motion (short for 
?old? and ?new?), we have 
)( mmmm TXRX += on                                                                                                 [9.2] 
and 
)( mllmlm TXRX += oo                                                                                                [9.3] 
The new motion parameters in logical coordinate system Rn and Tn must also satisfy 
the physical and logical relation, as 
)( nlnm TXRX += nn                                                                                                   [9.4] 
So, 
 
[ ]
[ ]
nmnmlmlmnlmlmn
nmnmllmlmn
nmnmmn
nmmmn
nmnl
TTRTRRRXRRR
TTRTXRRR
TTRXRR
TTXRR
TXRX
?++=
?++=
?+=
?+=
?=
???
??
??
?
?
111
11
11
1
1
)(
)(
o
o
o
o
nn
                                               [9.5] 
 
 112 
 
Since the scan plane needs to follow the motion, it means nlX  should equal 
o
lX  all the time, therefore the rotation term in Equation 9.5 should be the identity 
matrix and the translation should be zero: 
??
???
+=
=?
??
???
=?+
=
???
?
mnmln
mlmn
nmnmlmlmn
mlmn
TRTT
RRR
0TTRTRRR
IRRR
111
1
                                      [9.6] 
The motion parameters that need to be sent to the sequence for correction are 
the difference between the new ones and the old ones read-out from the sequence (i.e. 
mlR  and mlT ), which are: 
mlmcorr RRR =                                                                                                          [9.7] 
mcorcorr TRT
1?=
r                                                                                                          [9.8] 
 
9.2 MRI Coordinates from DICOM Image 
 
To get the MRI physical coordinate (millimeter) of any point from its DICOM image 
(index) is not very straightforward, because they use two different coordinate systems 
and different units. 
DICOM uses patient coordinate system, meaning for axial scan 
Xd increases  right to left 
Yd increases anterior to posterior 
Zd increases inferior to superior 
MRI physical coordinate system on the other hand is defined as: 
Xm increases  left to right 
Ym increases posterior to anterior 
 
 113 
 
Zm increases inferior to superior 
DICOM headers have directions within the image plane, i.e. the Xd and Yd direction, 
r1=Header.ImageOrientationPatient[1:3]                                                             [9.9] 
r2=Header.ImageOrientationPatient[4:6]                                                           [9.10] 
where Rd=[r1 r2 r3] is the rotation matrix describing the DICOM image plane in the 
patient coordinate system. The direction orthogonal to the plane, i.e. the Zd direction, 
is defined as 
213 rrr ?=                                                                                                               [9.11] 
Then inter-slice distances need to be determined for unit transformation. If 
only one slice is scanned, 
z = Header.Slicethickness                                                                                       [9.12] 
Otherwise, it can be calculated from slice positions, as 
z =( Header(i+1).ImagePositionPatient - Header(i).ImagePositionPatient)r3 [9.13] 
where i is the index of one slice. 
Voxel size is therefore, 
v = [Header.PixelSpacing(1)  Header.PixelSpacing(2)  z]                                   [9.14] 
where PixelSpacing saves in-plane pixel size in DICOM. 
Position of the first voxel in DICOM?s patient coordinate system is 
v0 = Header.ImagePositionPatientT                                                                                                        [9.15] 
To map voxels to the patient coordinates is 
??
?
??
?=
1
)(* 0
__ 0
vvRH diagd
patienttodicom                                                                    [9.16] 
Where diag(v) is a 3?3 matrix with the diagonal elements coming from v.  
 
 114 
 
Then map from the patient coordinate system adopted by DICOM to the MRI 
physical coordinate system is a rotation about the Z axis: 
?
?
?
?
?
?
?
?
?
?
?
?
?
?
=
1
1
1
1
__ MRItopatientH                                                                           [9.17] 
Combine both transformations: 
patienttoDICOMMRItopatient ____ * HHH =                                                                       [9.18] 
Any voxel in DICOM image has MRI coordinate as 
dm HXX =                                                                                                              [9.19] 
Note that if MATLAB is used, the voxel index is from 1 instead of 0, a translation 
matrix  
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
=
1
11
11
11
mat  is needed to be multiplied with Equation 9.18 for a 
correct mapping. 
matHHH ** ____ patienttoDICOMMRItopatient=                                                             [9.20] 
 
9.3 Gauss-Newton Algorithm 
Gauss-Newton iteration is a technique to locate the minimum of a multivariate 
function that is expressed as the sum of squares of non-linear functions (Kelley, 1999; 
Nocedal and Wright, 2006). In this section, we do not have the ambition to state new 
concepts on Newton iteration, but just to give a derivation for a better explanation in 
Section 4.2.1, 4.2.2 and 6.2.1. 
 
 115 
 
Let f be the function that maps a measurement vector x through parameter vector p  to 
another measurement y, the solution p is obtained when distance 2?  is minimized: 
22 );( pxy? f?=                                                                                                   [9.21] 
By assuming );( pxf as a linear function in the neighborhood of p0, a Taylor series 
expansion will lead to the approximation 
)();();();(? 000 ppppxpxpx ???+= fff                                                                   [9.22] 
The derivative of the error function at p is 
));(?();(
?
2?
2
pxyp pxp? ff
T
????=??                                                                             [9.23] 
And because );(? pxf is linear, 
p
px
p
px
?
?=
?
? );();(? 0ff                                                                                               [9.24] 
Substitute Equation 9.22 and 9.24 to 9.23, we get 
))();();(();(2? 0000
2
yppppxpxppxp? ????+??=?? fff
T
                                         [ 9.25] 
Let  the Jacobian matrix be: 
p
pxJ
?
?= );( 0f                                                                                                          [9.26] 
Then solve for p when 0?
2
=?? p? : 
00
1)( p?JJJp += ? TT                                                                                              [9.27] 
Therefore, the solution p is obtained with an initial estimate p0 and refined iteratively 
according to: 
 
 116 
 
nn
TT
n p?JJJp +=
?
+
1
1 )(                                                                                         [9.28] 
The above algorithm is suitable for minimization with respect to a small 
number of parameters, but not suitable for minimizing cost functions with respect to 
large numbers of parameters because of the heavy burden of matrix inverse in 
Equation 9.28. However, in this project, when many parameters need to be estimated, 
e.g. 6M+8 unknowns in Equation 4.22, the Jacobian matrix is very sparse and can be 
taken advantage for great time savings.  
Use Equation  4.22 as an example, the parameters to be estimated are intrinsic 
parameters Tyx ppkkyxff ],,,,,,,[ 212100=a  and extrinsic parameters ii TR ,  (using  
],...,,[ 621 iiii rrr=b  to denote the 6DOF parameters describing them). Then p can be 
written as  
TT
M
TTT ),...,,,(
21 bbbap =                                                                                          [9.29] 
The Jacobian matrix is therefore 
??
?
?
?
?
?
??
?
?
?
?
?
=??=
MM
p
B
B
B
A
A
A
p
mJ
??
2
1
2
1
                                                                [9.30] 
Where amA ??= /ipi , iipi bmB ??= / . The sparse condition is because of the fact 
that each projected points ipm depend only on a and ib  but not any other )( ijj ?b , 
so that 0/ =?? jip bm . Then Equation 9.28 can be solved for each iteration step a lot 
faster based on this sparse matrix. 
 
 117 
 
9.4 Stereo System QA 
The stereo camera calibration takes quite some time but it does not have to be done 
every time before using it, as long as nothing changes. Therefore, it is important to 
check the stereo to make sure nothing changed, i.e. a system QA. It can be realized by 
comparing the 3D position estimations for a checkerboard pattern using either camera 
or the stereo. Because the checkerboard size is known, we are able to estimate corner 
positions in 3D from a single camera based on homography (Equation 4.18 ? 4.21). 
The difference between single camera and stereo estimation can be very large even if 
a small change in the cameras? parameters changed (Figure 9-1)  
 
 
Figure 9-1    3D reconstruction error if camera parameters change.  The solid lines projection is 
correct and the dotted projection is the result from different camera parameters with a small 
focal length and principle point change. It?s obvious that the 3D estimation magnify the camera 
parameters distinction.  
 
 
Figure 9-2 shows the 3D corner positions estimated using left camera (red 
circles), right camera (blue circles) and stereo (green circles). When correct 
calibration parameters were used, the three estimations match perfectly as shown in 
(a). But if a very small change is made on the focal length, e.g. 0.1mm, they differ 
quite a lot (b). Also, the difference between three estimations can be calculated. For 
 
 118 
 
image (a), the maximum error between left-right camera estimation is 0.0098092; the 
maximum error between left-stereo camera estimation is 0.064398; and the maximum 
error between right-stereo camera estimation is 0.087699. But for image (b) the 
maximum error between left-right camera estimation is 0.48134; the maximum error 
between left-stereo camera estimation is 3.287; and the maximum error between 
right-stereo camera estimation is 3.107. Therefore, the stereo QA can be 
accomplished either visually or quantitatively.  
In real application, the calibration parameters we use do not change, while 
cameras? status may change accidentally. But the error magnification should be the 
same as in this simulation. Therefore, each time before the stereo system is used, run 
the system QA to make sure if a re-calibration is necessary. 
 
  
Figure 9-2    3D position estimation for a checkerboard pattern. Every circle represents a corner 
of the checkerboard. Red circles represent positions estimated from the left camera; blue circles 
represent positions estimated from the right camera; and green circles represent those estimated 
from the stereo. (a) correct calibration parameters were used. (b) focal length was changed with 
0.1mm. 
          (a)                  (b) 
 
 119 
 
Chapter 10 Bibliography 
 
 
 
Abdel-Aziz, Y. I. and Karara, H. M. (1971) Direct linear transformation into object 
space coordinates in close-range photogrammetry. Proc. Symposium on Close-
Range Photogrammetry, 1-18. 
Arun, K. S., Huang, T. S. and Blostein, S. D. (1987) Least-squares fitting of two 3D 
point sets. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 
9, 698-700. 
Barnwell, J. D., Smith, J. K. and Castillo, M. (2007) Utility of navigator-prospective 
acquisition correction technique (PACE) for reducing motion in brain MR 
imaging studies. AJNR Am J Neuroradiol, 28, 790-1. 
Bernstein, M. A., King, K. F. and Zhou, Z. J. (2004) Handbook of MRI pulse 
sequences. Amsterdam ; Boston: Academic Press. 
Blake, A., Curwen, R. and Zisserman, A. (1993) A framework for spatiotemporal 
control in the tracking of visual contours. International Journal of Computer 
Vision, 11, 127-145. 
Bouguet, J. Y. (1999) Visual methods for three-dimensional modeling. Ph.D. Thesis: 
California Institute of Technology. 
Broggi, A. (1998) Vision-based diving assistance. Intelligent Systems and their 
Applications, IEEE, 13, 22-23. 
Charbonnier, L. and Fournier, A. (1995) Heading guidance and obstacles localization 
for an indoor mobile robot. Advanced Robotics, 1995. ICAR '95. Proceedings., 
8th International Conference on. pp. 507-513. 
Chou, C.-H. and Chen, Y.-C. (1990) Moment-preserving pattern matching. Pattern 
Recognition, 23, 461-474. 
Cohen, M. S. and Weisskoff, R. M. (1991) Ultra-fast imaging. Magnetic Resonance 
Imaging, 9, 1-37. 
 
 120 
 
Coifman, B., Beymer, D., McLauchlan, P. and Malik, J. (1998) A real-time computer 
vision system for vehicle tracking and traffic surveillance. Transportation 
Research Part C: Emerging Technologies, 6, 271-288. 
Daniilidis, K. and Ernst, J. (1996) Active intrinsic calibration using vanishing points. 
Pattern Recognition Letters, 17, 1179-1189. 
Derbyshire, J. A., Wright, G. A., Henkelman, R. M. and Hinks, R. S. (1998) Dynamic 
scan-plane tracking using MR position monitoring. J Magn Reson Imaging, 8, 
924-32. 
Dold, C., Zaitsev, M., Speck, O., Firle, E. A., Hennig, J. and Sakas, G. (2006) 
Advantages and limitations of prospective head motion compensation for MRI 
using an optical motion tracking device. Acad Radiol, 13, 1093-103. 
Duda, R. O., Hart, P. E. and Stork, D. G. (2001) Pattern classification. New York: 
Wiley. 
Duyn, J. H., van Gelderen, P., Li, T. Q., de Zwart, J. A., Koretsky, A. P. and 
Fukunaga, M. (2007) High-field MRI of brain cortical substructure based on 
signal phase. Proc Natl Acad Sci U S A, 104, 11796-801. 
Eggert, D. W., Lorusso, A. and Fisher, R. B. (1997) Estimating 3-D rigid body 
transformations: a comparison of four major algorithms. Machine Vision and 
Applications, 9, 272-290. 
Ehman, R. L. and Felmlee, J. P. (1989) Adaptive technique for high-definition MR 
imaging of moving structures. Radiology, 173, 255-63. 
Eviatar, H., Saunders, J. and Hoult, D. (1997) Motion Compensation by gradient 
adjustment. Proc Intl Magn Reson Med, 1898. 
Eviatar, H., Schattka, B., Sharp, J., Rendell, J. and Alexander, M. (1999) Real time 
head motion correction for functional MRI. Proc Intl Magn Reson Med, 269. 
Faig, W. (1975) Calibration of close-range photogrammetric systems: mathematical 
formulation. Photogrammetric Engineering and Remote Sensing, 41, 1479-
1486. 
 
 121 
 
Farzaneh, F., Riederer, S. J. and Pelc, N. J. (1990) Analysis of T2 limitations and off-
resonance effects on spatial resolution and artifacts in echo-planar imaging. 
Magnetic Resonance in Medicine, 14, 123-139. 
Faugeras, O. (1993) Three-dimensional computer vision : a geometric viewpoint. 
Cambridge, Mass.: MIT Press. 
Faugeras, O. and Robert, L. (1996) What can two images tell us about a third one? 
International Journal of Computer Vision, 18, 5-19. 
Faugeras, O. and Toscani, G. (1986) The calibration problem for stereo. Computer 
Vision and Pattern Recognition, 1986. Proceedings., 1986 IEEE Computer 
Society Conference on. pp. 15-20. 
Friston, K., Ashburner, J., Frith, C., Poline, J., Heather, J. and Frackowiak, R. (1995) 
Spatial registration and normalization of images. Human Brain Mapping, 2, 
165-189. 
Friston, K. J. (2007) Statistical parametric mapping : the analysis of funtional brain 
images. Amsterdam ; Boston: Elsevier/Academic Press. 
Friston, K. J., Williams, S., Howard, R., Frackowiak, R. S. and Turner, R. (1996) 
Movement-related effects in fMRI time-series. Magn Reson Med, 35, 346-55. 
Fu, Z. W., Wang, Y., Grimm, R. C., Rossman, P. J., Felmlee, J. P., Riederer, S. J. and 
Ehman, R. L. (1995) Orbital navigator echoes for motion measurements in 
magnetic resonance imaging. Magn Reson Med, 34, 746-53. 
Gerlach, W. and Stern, O. (1924) Ueber die Richtungsquantelung im Magnetfeld. Ann 
Phys, 74, 673. 
Glover, G. H. and Lee, A. T. (1995) Motion artifacts in fMRI: comparison of 2DFT 
with PR and spiral scan methods. Magn Reson Med, 33, 624-35. 
Glover, G. H. and Noll, D. C. (1993) Consistent projection reconstruction (CPR) 
techniques for MRI. Magn Reson Med, 29, 345-51. 
 
 122 
 
Glover, G. H. and Pauly, J. M. (1992) Projection reconstruction techniques for 
reduction of motion effects in MRI. Magn Reson Med, 28, 275-89. 
Gorodnichy, D. O. and Roth, G. (2004) Nouse [`]use your nose as a mouse' 
perceptual vision technology for hands-free games and interfaces. Image and 
Vision Computing, 22, 931-942. 
Hager, G. D. and Belhumeur, P. N. (1998) Efficient region tracking with parametric 
models of geometry and illumination. Pattern Analysis and Machine 
Intelligence, IEEE Transactions on, 20, 1025-1039. 
Hall, E. L., Tio, J. B. K., McPherson, C. A. and Sadjadi, F. A. (1982) Measuring 
Curved Surfaces for Robot Vision. Computer, 15, 42-54. 
Haralick, R. M., Joo, H., Lee, C., Zhuang, X., Vaidya, V. G. and Kim, M. B. (1989) 
Pose estimation from corresponding point data. Systems, Man and 
Cybernetics, IEEE Transactions on, 19, 1426-1446. 
Harris, C. and Stephens, M. (1988) A combined corner and edge detector. Alvey 
Vision Conference. pp. 147-151. 
Hartley, R. and Zisserman, A. (2003) Multiple view geometry in computer vision. 
Cambridge, UK ; New York: Cambridge University Press. 
Hartley, R. I. and Sturm, P. (1997) Triangulation. Computer vision and image 
understanding, 68, 146-157. 
Horn, B. K. P. (1987) Closed-form solution of absolute orientation using unit 
quaternions. J. Opt. Soc. Am. A, 4, 629-642. 
Horn, B. K. P., Hilden, H. M. and Negahdaripour, S. (1988) Closed-form solution of 
absolute orientation using orthonormal matrices. J. Opt. Soc. Am. A, 5, 1127-
1135. 
Hu, Y. and Glover, G. H. (2007) Three-dimensional spiral technique for high-
resolution functional MRI. Magn Reson Med, 58, 947-51. 
 
 123 
 
Jenkinson, M., Bannister, P., Brady, M. and Smith, S. (2002) Improved optimization 
for the robust and accurate linear registration and motion correction of brain 
images. Neuroimage, 17, 825-41. 
Jezzard, P. and Clare, S. (1999) Sources of distortion in functional MRI data. Human 
Brain Mapping, 8, 80-85. 
Kadah, Y. M., Abaza, A. A., Fahmy, A. S., Youssef, A. B., Heberlein, K. and Hu, X. 
P. (2004) Floating navigator echo (FNAV) for in-plane 2D translational 
motion estimation. Magn Reson Med, 51, 403-7. 
Kakadiaris, L. and Metaxas, D. (2000) Model-based estimation of 3D human motion. 
Pattern Analysis and Machine Intelligence, IEEE Transactions on, 22, 1453-
1459. 
Kanatani, K. (1994) Analysis of 3-D rotation fitting. Pattern Analysis and Machine 
Intelligence, IEEE Transactions on, 16, 543-549. 
Kelley, C. T. (1999) Iterative methods for optimization. Philadelphia: SIAM. 
Kim, S., Dougherty, L., Rosen, M. A., Song, H. K. and Poptani, H. (2008) Automatic 
correction of in-plane bulk motion artifacts in self-navigated radial MRI. 
Magn Reson Imaging, 26, 367-78. 
Lauterbur, P. C. (1973) Image formation by induced local interactions. Examples 
employing nuclear magnetic resonance. . Nature, 190-191. 
Lee, C. C., Grimm, R. C., Manduca, A., Felmlee, J. P., Ehman, R. L., Riederer, S. J. 
and Jack, C. R., Jr. (1998) A prospective approach to correct for inter-image 
head rotation in fMRI. Magn Reson Med, 39, 234-43. 
Lee, C. C., Jack, C. R., Jr., Grimm, R. C., Rossman, P. J., Felmlee, J. P., Ehman, R. 
L. and Riederer, S. J. (1996) Real-time adaptive motion correction in 
functional MRI. Magn Reson Med, 36, 436-44. 
Liang, Z.-P., Lauterbur, P. C. and IEEE Engineering in Medicine and Biology 
Society. (2000) Principles of magnetic resonance imaging : a signal 
processing perspective. Bellingham, Wash. 
New York: SPIE Optical Engineering Press ; 
 
 124 
 
IEEE Press. 
Liu, C., Bammer, R., Kim, D. H. and Moseley, M. E. (2004) Self-navigated 
interleaved spiral (SNAILS): application to high-resolution diffusion tensor 
imaging. Magn Reson Med, 52, 1388-96. 
Lucas, B. D. and Kanade, T. (1981) An interative image registration technique with 
an application to stereo vision. International Joint Conference on Aritificial 
Intelligence, 1981. Proceedings.,, 674-679. 
Mansfield, P. (1977) Nulti-planar image formation using NMR spin-echoes. J Phys C 
Solid State Phys, 10, L55-L58. 
Mansfield, P. and Grannell, P. (1973) NMR 'diffraction' in solids? J Phys C Solid 
State Phys, 6, L422. 
Marr, D., Poggio, T. and Ullman, S. (1979) Bandpass channels, zero-crossings, and 
early visual information processing. J. Opt. Soc. Am., 69, 914-916. 
Meyer, C. H., Hu, B. S., Nishimura, D. G. and Macovski, A. (1992) Fast spiral 
coronary artery imaging. Magn Reson Med, 28, 202-13. 
Milella, A. and Siegwart, R. (2006) Stereo-Based Ego-Motion Estimation Using Pixel 
Tracking and Iterative Closest Point. Computer Vision Systems, 2006 ICVS 
'06. IEEE International Conference on. pp. 21-21. 
Nevo, E., Roth, A. and Hushek, M. (2002) An electromagnetic 3D locator system for 
use in MR scanners. Proc Intl Magn Reson Med, 334. 
Newman, T. S. and Jain, A. K. (1995) A Survey of Automated Visual Inspection. 
Computer vision and image understanding, 61, 231-262. 
Nocedal, J. and Wright, S. J. (2006) Numerical optimization. New York: Springer. 
Petrie, D. W., Costa, A. F., Takahashi, A., Yen, Y. F. and Drangova, M. (2005) 
Optimizing spherical navigator echoes for three-dimensional rigid-body 
motion detection. Magn Reson Med, 53, 1080-7. 
 
 125 
 
Pipe, J. G. (1999) Motion correction with PROPELLER MRI: application to head 
motion and free-breathing cardiac imaging. Magn Reson Med, 42, 963-9. 
Pipe, J. G., Farthing, V. G. and Forbes, K. P. (2002) Multishot diffusion-weighted 
FSE using PROPELLER MRI. Magn Reson Med, 47, 42-52. 
Pipe, J. G. and Zwart, N. (2006) Turboprop: improved PROPELLER imaging. Magn 
Reson Med, 55, 380-5. 
Pruessmann, K. P., Weiger, M., Scheidegger, M. B. and Boesiger, P. (1999) SENSE: 
Sensitivity encoding for fast MRI. Magnetic Resonance in Medicine, 42, 952-
962. 
Rabi, I. I., Zacharias, J. R., Millman, S. and Kusch, P. (1938) A new method of 
measuring nuclear magnetic moments. Phys Rev, 53, 318. 
Santos, J. M., Wright, G. A. and Pauly, J. M. (2004) Flexible real-time magnetic 
resonance imaging framework. Conf Proc IEEE Eng Med Biol Soc, 2, 1048-
51. 
Shi, J. and Tomasi, C. (1994) Good features to track. Computer Vision and Pattern 
Recognition, 1994. Proceedings CVPR '94., 1994 IEEE Computer Society 
Conference on. pp. 593-600. 
Slama, C. C., Theurer, C. and Henriksen, S. W. (1980) Manual of photogrammetry. 
Falls Church, Va.: American Society of Photogrammetry. 
Spuentrup, E., Stuber, M., Botnar, R. M., Kissinger, K. V. and Manning, W. J. (2002) 
Real-time motion correction in navigator-gated free-breathing double-oblique 
submillimeter 3D right coronary artery magnetic resonance angiography. 
Invest Radiol, 37, 632-6. 
Thesen, S., Heid, O., Mueller, E. and Schad, L. R. (2000) Prospective acquisition 
correction for head motion with image-based tracking for real-time fMRI. 
Magn Reson Med, 44, 457-65. 
Thiel, T., Czisch, M., Elbel, G. K. and Hennig, J. (2002) Phase coherent averaging in 
magnetic resonance spectroscopy using interleaved navigator scans: 
 
 126 
 
compensation of motion artifacts and magnetic field instabilities. Magn Reson 
Med, 47, 1077-82. 
Tomasi, C. and Kanade, T. (1992) Shape and motion from image streams under 
orthography: a factorization method. International Journal of Computer 
Vision, 9, 137-154. 
Tomasi, C. and Kanatani, K. (1991) Detection and tracking of point features. Tenical 
Report: CMU-CS-91-132. 
Tommasini, T., Fusiello, A., Trucco, E. and Roberto, V. (1998) Making good features 
track better. Computer Vision and Pattern Recognition, 1998. Proceedings. 
1998 IEEE Computer Society Conference on. pp. 178-183. 
Torr, P. H. S. and Zisserman, A. (1997) Robust parameterization and computation of 
the trifocal tensor. Image and Vision Computing, 15, 591-605. 
Tremblay, M., Tam, F. and Graham, S. J. (2005) Retrospective coregistration of 
functional magnetic resonance imaging data using external monitoring. Magn 
Reson Med, 53, 141-9. 
Tsai, R. (1987) A versatile camera calibration technique for high-accuracy 3D 
machine vision metrology using off-the-shelf TV cameras and lenses. 
Robotics and Automation, IEEE Journal of, 3, 323-344. 
Twieg, D. B. (1983) The k-trajectory formulation of the NMR imaging process with 
applications in analysis and synthesis of imaging methods. Med Phys, 10, 610-
21. 
Umeyama, S. (1991) Least-squares estimation of transformation parameters between 
two point patterns. Pattern Analysis and Machine Intelligence, IEEE 
Transactions on, 13, 376-380. 
van der Kouwe, A. J., Benner, T. and Dale, A. M. (2006) Real-time rigid body motion 
correction and shimming using cloverleaf navigators. Magn Reson Med, 56, 
1019-32. 
 
 127 
 
van Gelderen, P., de Zwart, J. A., Starewicz, P., Hinks, R. S. and Duyn, J. H. (2007) 
Real-time shimming to compensate for respiration-induced B0 fluctuations. 
Magn Reson Med, 57, 362-8. 
Walker, M. W., Shao, L. and Volz, R. A. (1991) Estimating 3-D location parameters 
using dual number quaternions. CVGIP: Image Understanding, 54, 358-367. 
Wang, F. N., Huang, T. Y., Lin, F. H., Chuang, T. C., Chen, N. K., Chung, H. W., 
Chen, C. Y. and Kwong, K. K. (2005) PROPELLER EPI: an MRI technique 
suitable for diffusion tensor imaging at high field strength with reduced 
geometric distortions. Magn Reson Med, 54, 1232-40. 
Wang, L. L. and Tsai, W.-H. (1991) Camera calibration by vanishing lines for 3-D 
computer vision. Pattern Analysis and Machine Intelligence, IEEE 
Transactions on, 13, 370-376. 
Ward, H. A., Riederer, S. J., Grimm, R. C., Ehman, R. L., Felmlee, J. P. and Jack, C. 
R., Jr. (2000) Prospective multiaxial motion correction for fMRI. Magn Reson 
Med, 43, 459-69. 
Welch, E. B., Manduca, A., Grimm, R. C., Ward, H. A. and Jack, C. R., Jr. (2002) 
Spherical navigator echoes for full 3D rigid body motion measurement in 
MRI. Magn Reson Med, 47, 32-41. 
Weng, J., Ahuja, N. and Huang, T. S. (1992a) Matching two perspective views. 
Pattern Analysis and Machine Intelligence, IEEE Transactions on, 14, 806-
825. 
Weng, J., Cohen, P. and Herniou, M. (1992b) Camera calibration with distortion 
models and accuracy evaluation. Pattern Analysis and Machine Intelligence, 
IEEE Transactions on, 14, 965-980. 
Wyatt, C. L., Ari, N. and Kraft, R. A. (2005) Spherical navigator registration using 
harmonic analysis for prospective motion correction. Inf Process Med 
Imaging, 19, 738-49. 
Yao, Y.-S. and Chellappa, R. (1995) Tracking a dynamic set of feature points. Image 
Processing, IEEE Transactions on, 4, 1382-1395. 
 
 128 
 
Zaitsev, M., Dold, C., Sakas, G., Hennig, J. and Speck, O. (2006) Magnetic resonance 
imaging of freely moving objects: prospective real-time motion correction 
using an external optical motion tracking system. Neuroimage, 31, 1038-50. 
Zhang, Z. (2000) A flexible new technique for camera calibration. Pattern Analysis 
and Machine Intelligence, IEEE Transactions on, 22, 1330-1334. 
Zhang, Z., Deriche, R., Faugeras, O. and Luong, Q.-t. (1994) A robust technique for 
matching two uncalibrated images through the recovery of the unknown 
epipolar geometry. Tenical Report: INRIA. 
Zheng, Q. and Chellappa, R. (1993) Automatic feature point extraction and tracking 
in image sequences for unknown camera motion. Computer Vision, 1993. 
Proceedings., Fourth International Conference on. pp. 335-339. 
Zhou, S. K., Chellappa, R. and Moghaddam, B. (2004) Visual tracking and 
recognition using appearance-adaptive models in particle filters. Image 
Processing, IEEE Transactions on, 13, 1491-1506.