ABSTRACT 
 
 
 
 
Title of Document: SUPPORTING EXPLORATORY WEB 
SEARCH WITH MEANINGFUL AND 
STABLE CATEGORIZED OVERVIEWS 
  
 William Michael Kules, III,  
Doctor of Philosophy, 2006 
Directed By: Professor Ben Shneiderman,  
Department of Computer Science 
 
 
 
This dissertation investigates the use of categorized overviews of web search results, 
based on meaningful and stable categories, to support exploratory search. When 
searching in digital libraries and on the Web, users are challenged by the lack of 
effective overviews. Adding categorized overviews to search results can provide 
substantial benefits when searchers need to explore, understand, and assess their 
results. When information needs are evolving or imprecise, categorized overviews 
can stimulate relevant ideas, provoke illuminating questions, and guide searchers to 
useful information they might not otherwise find. When searchers need to gather 
information from multiple perspectives or sources, categorized overviews can make 
those aspects visible for interactive filtering and exploration. However, they add 
visual complexity to the interface and increase the number of tactical decisions to be 
made while examining search results. 
 
  
Two formative studies (N=18 and N=12) investigated how searchers use categorized 
overviews in the domain of U.S. government web search. A third study (N=24) 
evaluated categorized overviews of general web search results based on thematic, 
geographic, and government categories. Participants conducted four exploratory 
searches during a two hour session to generate ideas for newspaper articles about 
specified topics. Results confirmed positive findings from the formative studies, 
showing that subjects explored deeper while feeling more organized and satisfied, but 
did not find objective differences in the outcomes of the search task. Results indicated 
that searchers use categorized overviews based on thematic, geographic, and 
organizational categories to guide the next steps in their searches.  
 
This dissertation identifies lightweight search actions and tactics made possible by 
adding a categorized overview to a list of web search results. It describes a design 
space for categorized overviews of search results, and presents a novel application of 
the brushing and linking technique to enrich search result interfaces with lightweight 
interactions. It proposes a set of principles, refined by the studies, for the design of 
exploratory search interfaces, including ?Organize overviews around meaningful 
categories,? ?Clarify and visualize category structure,? and ?Tightly couple category 
labels to search result list.? These contributions will be useful to web search 
researchers and designers, information architects and web developers.  
 
  
 
 
 
 
 
 
SUPPORTING EXPLORATORY WEB SEARCH WITH MEANINGFUL AND 
STABLE CATEGORIZED OVERVIEWS   
 
 
 
By 
 
 
William Michael Kules, III 
 
 
 
 
 
Dissertation submitted to the Faculty of the Graduate School of the  
University of Maryland, College Park, in partial fulfillment 
of the requirements for the degree of 
Doctor of Philosophy 
2006 
 
 
 
 
 
 
 
 
 
 
Advisory Committee: 
Professor Ben Shneiderman, Chair 
Professor Dagobert Soergel 
Associate Professor Douglas W. Oard 
Assistant Professor Lise Getoor 
Catherine Plaisant, Associate Research Scientist 
 
 
 
 
  
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
? Copyright by 
William Michael Kules, III 
2006 
 
 
 
 
 
 
 
 
 
 
 
 
 
ii 
 
Dedication 
 
 
 
 
 
 
 
 
 
This dissertation is dedicated to 
 
the memory of Abbott and Wanda Washburn, 
 
and to Julia, Genna, and Ruby. 
 
 
 
 
iii 
 
Acknowledgements 
Researching and writing this dissertation ranks among the most satisfying intellectual 
activities I have been privileged to pursue. It would not have been possible without 
the support of family, friends, mentors, and colleagues. 
 
Ben Shneiderman introduced me to the field of human-computer interaction research. 
He showed me a path that allows me to pursue my interest in technology while 
contributing ? in a small but direct way ? to humane ends. He has mentored me as I 
followed this path, provided financial support, made terrific opportunities available to 
me, and encouraged me when I questioned my ability to finish. 
 
My committee members have challenged and supported me along the way.  
Doug Oard has consistently challenged me to sharpen my thinking, clarify my 
writing, and more deeply explore fundamental human-computer interaction issues. 
Lise Getoor has provided practical advice and feedback at critical junctures. Dagobert 
Soergel has pushed me to expand my understanding of information organization 
beyond data structures, to appreciate the human importance of classification much 
more deeply than I otherwise would. Catherine Plaisant has been a mentor since the 
beginning of my association with the Human-Computer Interaction Lab. She 
facilitated my early work with government agencies, provided detailed advice on my 
research, introduced me to the espresso machine on the fourth floor, and has always 
been available to bounce ideas around. 
 
Other colleagues, mentors, and friends have helped in ways too numerous to recount. 
I am grateful for help from Alex Aris, Ira Chinoy, Gene Chipman, Abdur 
Chowdhury, Chip Denman, Jerry Fails, Kathleen Grathwol, Harry Hochheiser, Chang 
Hu, Hilary Hutchinson, Jack Kustanowitz, Tom Lalonde, Katy Lawley, Jaime 
Montemayor, Craig Murray, Anne Rose, Kiki Schneider, Greg Smith, Ryen White, 
Haixia Zhao, Julie Zito, the study participants, and many others in the Human-
Computer Interaction Lab and beyond. The staff of the Computer Science Department 
and at the Institute for Advanced Computing Studies have provided valued 
administrative support. This research was supported in part by an AOL Fellowship in 
Human-Computer Interaction and National Science Foundation Digital Government 
Initiative grant (EIA 0129978) ?Towards a Statistical Knowledge Network.? 
 
My deepest gratitude goes to my family: my daughters, Genna and Ruby, who put up 
with a too-occasionally distracted dad, and to my partner and wife, Julia Washburn. 
Her unwavering support during a decade of graduate school made this possible. 
 
 
 
 
iv 
 
Table of Contents 
Dedication..................................................................................................................... ii 
Acknowledgements......................................................................................................iii 
Table of Contents......................................................................................................... iv 
List of Tables .............................................................................................................viii 
List of Figures............................................................................................................... x 
Chapter 1: Introduction........................................................................................... 1 
1.1 Motivation..................................................................................................... 1 
1.2 Illustrative example....................................................................................... 2 
1.3 Research contributions.................................................................................. 7 
1.4 Terminology.................................................................................................. 9 
Chapter 2: Related work ....................................................................................... 10 
2.1 Information seeking ? theory, studies and systems .................................... 10 
2.2 Using categories for information retrieval.................................................. 16 
2.2.1 Studies of categorized overviews for web search............................... 17 
2.2.2 Other studies of categorized overviews for search results.................. 21 
2.3 Visualizing and interacting with search results .......................................... 23 
2.4 Summary..................................................................................................... 28 
Chapter 3: Early designs and formative studies ................................................... 29 
3.1 Early designs............................................................................................... 30 
3.2 Formative study prototypes......................................................................... 35 
3.3 Study 1: Expandable outliner vs. treemap vs. control ................................ 37 
3.3.1 Research questions.............................................................................. 37 
3.3.2 Experimental conditions ..................................................................... 38 
3.3.3 Hypotheses.......................................................................................... 39 
3.3.4 Scenario and task design..................................................................... 39 
3.3.5 Materials and procedure...................................................................... 43 
3.3.6 Participants.......................................................................................... 44 
3.3.7 Results................................................................................................. 45 
3.4 Study 2: Automated clustering vs. government hierarchy.......................... 54 
3.4.1 Research questions.............................................................................. 54 
3.4.2 Experimental Conditions .................................................................... 56 
3.4.3 Scenario and task design..................................................................... 57 
3.4.4 Procedure ............................................................................................ 60 
3.4.5 Participants.......................................................................................... 61 
3.4.6 Results................................................................................................. 61 
3.5 Discussion of studies 1 and 2...................................................................... 69 
3.5.1 Benefits of categorized overviews...................................................... 69 
3.5.2 Effect of visual presentation of overviews.......................................... 71 
3.5.3 Effect of categories used for overviews.............................................. 72 
3.5.4 The importance of text........................................................................ 73 
3.5.5 Other findings ..................................................................................... 73 
3.5.6 Limitations of these studies ................................................................ 74 
3.5.7 Summary of studies 1 and 2................................................................ 75 
 
 
v 
 
Chapter 4: Analysis, principles, and design of the SERVICE system.................. 76 
4.1 Analysis of categorized overview use......................................................... 76 
4.1.1 Process model of exploratory search .................................................. 77 
4.1.2 Action: Scan categorized overview .................................................... 84 
4.1.3 Action: Narrow or broaden by category ............................................. 86 
4.1.4 Action: Move pointer over result........................................................ 87 
4.1.5 Action: Move pointer over category................................................... 87 
4.1.6 Tactics................................................................................................. 88 
4.1.7 Other impacts of categorized overviews............................................. 89 
4.1.8 Implications......................................................................................... 91 
4.2 Design principles for exploratory search interfaces.................................... 92 
4.2.1 Provide overviews of large sets of results........................................... 94 
4.2.2 Organize overviews around meaningful categories............................ 95 
4.2.3 Visualize and clarify category structure ............................................. 96 
4.2.4 Tightly couple category labels to result list ........................................ 97 
4.2.5 Ensure that full category information is available .............................. 99 
4.2.6 Support multiple types of categories and visual presentations ......... 100 
4.2.7 Use separate facets for each type of category................................... 101 
4.2.8 Arrange text for scanning/skimming ................................................ 102 
4.2.9 Visually encode quantitative attributes on a stable visual structure . 103 
4.2.10 Summary........................................................................................... 104 
4.3 SERVICE requirements and architecture ................................................. 104 
4.4 Fast Feature classifiers.............................................................................. 109 
4.4.1 Online Lean Techniques ................................................................... 115 
4.4.2 Top-Level DNS Domain Classifier .................................................. 116 
4.4.3 Last Time Visited Classifier ............................................................. 117 
4.4.4 Document Size Classifier.................................................................. 118 
4.4.5 Online Rich Techniques.................................................................... 119 
4.4.6 U.S. Government Classifier .............................................................. 120 
4.4.7 Open Directory Project Classifier..................................................... 121 
4.4.8 Multi-threading the ODP Classifier.................................................. 125 
4.4.9 Extracting multiple facets from the ODP hierarchy ......................... 125 
4.5 AOL Music prototype............................................................................... 126 
4.6 General web search interface.................................................................... 131 
4.7 Summary of the SERVICE system........................................................... 139 
Chapter 5: Study 3: Categorized overviews using ODP and US government 
categories 141 
5.1 Research questions.................................................................................... 142 
5.2 Experimental conditions ........................................................................... 144 
5.3 Scenario and task design........................................................................... 147 
5.4 Hypotheses................................................................................................ 150 
5.4.1 Process-oriented hypotheses ............................................................. 150 
5.4.2 Outcome-oriented hypotheses........................................................... 155 
5.5 Participants................................................................................................ 156 
5.6 Materials ................................................................................................... 157 
5.6.1 Interfaces........................................................................................... 157 
 
 
vi 
 
5.6.2 Script and training videos ................................................................. 158 
5.6.3 Online questionnaires........................................................................ 158 
5.6.4 Paper forms ....................................................................................... 159 
5.6.5 System technology............................................................................ 159 
5.7 Procedure .................................................................................................. 160 
5.8 Pilot testing ............................................................................................... 162 
5.9 Analysis methodology .............................................................................. 163 
5.9.1 Quantitative analysis methodology................................................... 163 
5.9.2 Qualitative analysis methodology..................................................... 165 
5.10 Results....................................................................................................... 170 
5.10.1 Quantitative results ........................................................................... 170 
5.10.2 Qualitative results ............................................................................. 193 
5.11 Discussion................................................................................................. 208 
5.11.1 Topic and task efficacy ..................................................................... 208 
5.11.2 Differences in search behavior.......................................................... 209 
5.11.3 Cognitive impact of categorized overviews...................................... 212 
5.11.4 Differences by breadth of topic......................................................... 217 
5.11.5 Differences in searcher thinking about search tactics....................... 218 
5.11.6 Effect on quality of search outcome ................................................. 220 
5.12 Limitations ................................................................................................ 221 
5.12.1 Subject population ............................................................................ 221 
5.12.2 Category structure and membership ................................................. 221 
5.12.3 Scenario and task .............................................................................. 223 
5.12.4 Time constraints................................................................................ 224 
5.12.5 Interface design................................................................................. 224 
5.12.6 Topic breadth .................................................................................... 224 
5.12.7 Quantitative analysis......................................................................... 225 
5.12.8 Qualitative analysis........................................................................... 225 
5.13 Summary................................................................................................... 226 
Chapter 6: Contributions..................................................................................... 230 
6.1 Benefits of categorized overviews............................................................ 230 
6.2 Limitations of categorized overviews....................................................... 230 
6.3 Analysis of search tactics with categorized overviews............................. 231 
6.4 Design principles for categorized overviews of search results................. 232 
6.5 Fast feature classifiers............................................................................... 233 
6.6 Enriching search result interaction with brushing and linking ................. 233 
6.7 Design space of categorized overviews .................................................... 234 
6.8 Working system for categorized overviews of web search results........... 234 
Chapter 7: Future work....................................................................................... 236 
7.1 Evaluation of exploratory search interfaces.............................................. 236 
7.2 Structure of category hierarchies for search results.................................. 237 
7.3 Graphical overviews of search results ...................................................... 238 
7.4 Leveraging the Semantic Web.................................................................. 239 
7.5 Lightweight customization of categories.................................................. 239 
Appendix A: Study 1 ? Perspectives identified by subjects ..................................... 241 
Appendix B: Study 1 ? Unusual results identified by subjects................................. 245 
 
 
vii 
 
Appendix C: Study 3 ? Paper materials.................................................................... 247 
Appendix D: Study 3 ? Online questionnaires ......................................................... 255 
Bibliography ............................................................................................................. 261 
 
 
viii 
 
List of Tables 
Table 1. Mean correctness scores for each interface, with standard deviation in 
parentheses.......................................................................................................... 46 
Table 2. Median position of identified perspective, with standard deviation in 
parentheses.......................................................................................................... 47 
Table 3. The fraction and percent of perspectives which were found beyond the top 
10 results. ............................................................................................................ 47 
Table 4. Mean number of top-level and second-level categories selected during 
perspectives task for the overview conditions, with standard deviation in 
parentheses.......................................................................................................... 48 
Table 5. Number and percent of participants who found something unusual by 
condition and scenario. ....................................................................................... 48 
Table 6. Number and percent of times a participant identified selected unusual items. 
Maximum possible was 18 (6 participants per condition, 3 scenarios each)...... 49 
Table 7. Mean subjective satisfaction measures, 1=poor, 9=good, except for #4 
(Difficulty) which is reversed. Standard deviations are shown in parentheses 
with ANOVA degrees of freedom, F values and significance. Signifiant 
differences are shown in bold. ............................................................................ 51 
Table 8. Mean differences in subjective ratings between conditions (standard 
deviation in parentheses). These questions were asked immediately after each 
scenario. .............................................................................................................. 62 
Table 9. Mean preferences for each task by all participants, participants associated 
with federal government and participants not associated with federal government 
(1 = preferred automated clustering, 9 = preferred government hierarchy). ...... 63 
Table 10. Actions available to searchers when evaluating a typical search result list.81 
Table 11. Additional actions available to searchers when evaluating search results 
with categorized overviews................................................................................. 83 
Table 12. Tactics enabled by categorized overviews.................................................. 89 
Table 13. Seven web search interfaces that represent large result sets in the initial 
results. The default value and user-selectable range are shown where it was 
reported or could be determined. ........................................................................ 95 
Table 14: Techniques for Search Result Categorization. SERVICE implements a set 
of online, fast-feature classifiers, in the black border....................................... 113 
Table 15. Online lean classifiers can provide simple categories to help users locate 
relevant information.  The three classifiers that have been implemented in 
SERVICE 2.0 are highlighted in bold............................................................... 116 
Table 16. Online rich classifiers can provide meaningful and stable categories that 
add context to the search results. ...................................................................... 120 
Table 17. Percent of the top 100 results categorized by the US Government classifier 
for five representative queries........................................................................... 121 
Table 18. Percent of the top 100 results categorized by the Open Directory Project 
classifier for five representative queries in each of two domains: general web 
search and government web search................................................................... 123 
 
 
ix 
 
Table 19. Coverage for the top 100, 250 and 350 search results from 246 queries 
based on the TREC 2004 Robust Topics. ......................................................... 124 
Table 20. Dimensions of the design space for categorized overviews. .................... 139 
Table 21. Top level categories extracted from the ODP for the Topic facet............ 145 
Table 22. Paired topics (broad and narrow) used for the study. This was the complete 
text read to the participants to describe the topic.............................................. 150 
Table 23. Percent of collected pagesthat had been categorized, by System
*
............ 179 
Table 24. Percent of collected pages that were categorized, by Topic
*
.................... 179 
Table 25. Top 3 categories used for each topic. ....................................................... 190 
Table 26. System preferences for known item, simple informational, comparison, and 
exploratory tasks. .............................................................................................. 191 
Table 27. Accuracy of participant understanding for selected categories (Kids and 
Teens, Reference, and Computers). .................................................................. 193 
Table 28. Mean (SD) query length by topic and system........................................... 195 
Table 29.  The 6 behavioral codes. Plus signs indicate that participants considered this 
a positive aspect. Negative signs indicate they considered it a negative aspect of 
their interaction. Neutral or mixed opinions are indicated by a 0. The count is the 
number of participants who made this type of comment.................................. 196 
Table 30.  The 34 cognitive and affective codes. ..................................................... 202 
Table 31.  The 9 judgment codes.............................................................................. 205 
Table 32. Mentions of geographic or government category use............................... 207 
Table 33. A BBC web page on human smuggling was categorized into eight 
categories in two facets, most of which were at least four levels deep. Truncating 
the categories to two levels removed useful contextual information................ 214 
Table 34. Perspectives identified for the Urban Sprawl scenario............................. 241 
Table 35. Perspectives identified for the Breast Cancer scenario............................. 242 
Table 36. Perspectives identified for the Alternative Energy scenario..................... 243 
Table 37. Unusual results identified for the Urban Sprawl scenario. ....................... 245 
Table 38. Unusual results identified for the Breast Cancer scenario........................ 246 
Table 39. Unusual results identified for the Alternative Energy scenario................ 246 
 
 
x 
 
List of Figures 
Figure 1. Search results for the query "median" are coupled with a categorized 
overview................................................................................................................ 4 
Figure 2. Placing the pointer over the Kids and Teens category pops up a list of its 
nonempty subcategories and highlights the visible results in the Kids and Teens 
category................................................................................................................. 5 
Figure 3. Selecting the Kids and Teens category filters the results to just that 
category................................................................................................................. 6 
Figure 4. This automatically clustered overview for the same query, from Clusty.com, 
does not provide a meaningful cluster label for child-friendly pages................... 7 
Figure 5. The Flamenco interface permits users to navigate by selecting from multiple 
facets. In this example, the displayed images have been filtered by specifying 
values for two facets (Materials and Structure Types). The matching images are 
grouped by subcategories of the Materials facet?s selected Building Materials 
category............................................................................................................... 13 
Figure 6. The CitiViz search interface visualizes search results using scatterplots, 
hyperbolic trees, and stacked discs. The hyperbolic tree, stacked disks, and 
textual list on the left are all based on the ACM Computing Classification 
System................................................................................................................. 14 
Figure 7. The PunchStock photo search interface provides categorized overviews of 
photo search results............................................................................................. 15 
Figure 8. The NCSU library catalog provides categorized overviews of search results 
using subject headings, format, and library location. ......................................... 16 
Figure 9. The Cha-Cha system organizes intranet search results by an automatically 
generated web site overview............................................................................... 19 
Figure 10. The WebTOC system provides a table of contents visualization that 
supports search within a web site........................................................................ 19 
Figure 11. The Clusty metasearch engine uses automated clustering to produce an 
expandable overview of labeled clusters. ........................................................... 21 
Figure 12. The Dyna-Cat system organized medical search results by a taxonomy of 
question types...................................................................................................... 22 
Figure 13. Grokker clusters documents into a hierarchy and produces an Euler 
diagram, a colored circle for each top-level cluster with sub-clusters nested 
recursively........................................................................................................... 25 
Figure 14. Kartoo generates a thematic map from the top dozen search results for a 
query, laying out small icons representing results onto the map. ....................... 25 
Figure 15.  This GRiDL example shows search results organized by the ACM 
classification and date......................................................................................... 27 
Figure 16. This treemap shows 157 search results for the query ?breast cancer? 
encoded as leaf nodes in a broad and deep thematic hierarchy.  The leaf nodes 
have constant size, so it is easy to see that most results fall under the Health top-
level category. The bright red nodes (which appear as dark gray when rendered 
as a gray-scale image) are highly ranked, while the orange and yellow nodes are 
 
 
xi 
 
ranked lower. This makes it easy to see that there is at least one moderately 
ranked page in the Society category. .................................................................. 32 
Figure 17. Zooming into the Society category provides previews of the three web 
pages falling in that category. ............................................................................. 33 
Figure 18. The top 200 search results for the query ?soybeans? in government agency 
web sites is shown as a treemap. Each node represents an agency. The color 
coding shows that most results are from the Department of Agriculture, but the 
National Aeronautics and Space Administration (NASA), the House of 
Representatives, and the Senate all yielded many results, too. Leaf node size is 
constant. .............................................................................................................. 33 
Figure 19. Clicking on the NASA node displays a text list of the search results from 
that agency. ......................................................................................................... 34 
Figure 20. In this mock-up, the top 40 search results from the query ?breast cancer? 
are organized by thematic categories and represented as red markers on vertical 
bars for each category. Two of the categories (Society and Health) are expanded 
horizontally to show the top results in those categories. The other categories are 
collapsed, showing just the bars and markers to indicate the number of results 
and their ranks within the entire list of results.................................................... 34 
Figure 21.  Detail of the expandable outliner condition. The top 200 urban sprawl 
results have been categorized into a two-level government hierarchy, which is 
used to present a categorized overview on the left. The Interior Department, 
which has 20 results, has been expanded and the National Park Service has been 
selected. The effect on the right side is to show just the three results from the 
Park Service. ....................................................................................................... 36 
Figure 22.  Detail of the treemap condition, which used nesting to show both top and 
second-level categories simultaneously. The set of results and the selected 
agency (NPS) is the same as in Figure 21........................................................... 36 
Figure 23.  The control condition mimics a typical set of Google search results, 
adding the government department and agency.................................................. 39 
Figure 24. The Vivisimo search engine was used for the clustered hierarchy 
condition. ............................................................................................................ 59 
Figure 25. Process model of search in the context of work and information-seeking 
tasks..................................................................................................................... 78 
Figure 26. Long labels are obscured by the bar charts in this WebTOC display. ...... 92 
Figure 27. The SERVICE system consists of three major subsystems: the user 
interface, the data model (which includes machine interfaces to two search 
engines and the search result classes), and the classifiers. It also includes 
facilities to log JavaScript events from the search result page. ........................ 106 
Figure 28.  SERVICE operation is shown as a dataflow. Queries are sent to the search 
engine, which generates a result set. The results are categorized using one or 
more classifiers. The overview is created from the categorized search results. 106 
Figure 29. Components used to categorize web search results.  A set of search results 
returned from a search engine is categorized by a classifier. The classifier may 
optionally reference previously acquired information or knowledge, such as a 
database of rules or training data. ..................................................................... 111 
 
 
xii 
 
Figure 30. A search for songs with the words "road" and "travel" in the title yields 
124 results.  The results are presented with two categorized overviews: by genre 
and by date. Here, the results have been filtered (by clicking) to show just the 21 
Country songs. .................................................................................................. 128 
Figure 31.  Brushing the pointer over a category highlights the results that fall in that 
category. In this screenshot, the pointer has been placed over the ?2000s? 
category, showing albums released in the 2000s highlighted with yellow (shown 
boxed for clarity in these figures). .................................................................... 129 
Figure 32. Brushing the pointer over an album title highlights all the categories for 
that album. Here we see that J.E. Mainer?s ?20 Old-Time favorites? is in both 
the Country and Folk categories, and that it was released in the 1990s. .......... 130 
Figure 33. This SERVICE search interface allowed users to select one set of 
categories at a time, which were displayed with an expandable outliner. This 
screenshot shows search results with a categorized overview based on the DNS 
domain. The US and international categories have been expanded. The results 
have been filtered to display just the 53 US commercial (.COM) sites. A drop-
down list at the top of the overview allows users to select alternate category sets.
........................................................................................................................... 132 
Figure 34. In this search interface, ODP top-level categories are shown as separate 
facets. ................................................................................................................ 135 
Figure 35. The search interface treats the ODP Reference category as a top-level 
facet. The remaining ODP categories are treated as another facet, in conjunction 
with the top-level DNS domain and the US government categories. ............... 136 
Figure 36. The search interface for the final study coupled the ranked result list with a 
categorized overview based on topical, geographical and US government 
classifications.................................................................................................... 138 
Figure 37. The baseline system (control condition) presented search results as a 
typical ranked list, similar to Google. It was referred to as the Kittery system in 
the study............................................................................................................ 146 
Figure 38. The experimental condition coupled the ranked result list with a 
categorized overview based on topical, geographical and US government 
classifications. This was referred to as the Portsmouth system in the study. ... 147 
Figure 39. The interface used by participants was comprised of the system under test 
(left) and the Collector form (right).................................................................. 158 
Figure 40. The experimental setup. Study participants sat in front of the computer, 
and the observer sat to their left........................................................................ 161 
Figure 41. Subject assessment of topic breadth (N=12). Participants did not perceive 
the breadth of the topics significantly differently............................................. 171 
Figure 42. Histograms of a) original location of search result in list, and b) 
log(original location). ....................................................................................... 172 
Figure 43. Normal Quantile-Quantile plot of the residuals for the log(original 
location) model. Residuals are moderately skewed, but not enough to invalidate 
the ANOVA results........................................................................................... 173 
Figure 44. Original location of viewed pages in search results, a) by System
*
, and b) 
by Topic
+
 (N=924). (Note: For all boxplots, the bold line in the middle of the 
box indicates the median; the upper and lower boundaries of the box indicate the 
 
 
xiii 
 
first and third quartiles, and the whiskers extend 1.5 times the interquartile range 
from the box. For all figures, statistically significant differences, p<0.05, are 
marked with an asterisk in the caption, and marginally significant differences, 
p<0.10, are marked with a plus sign.)............................................................... 173 
Figure 45.  Percent of pages viewed by original location of page within search results, 
for each system. The interface displayed approximately 10 results per screen. 
The dashed line shows the initial screen break................................................. 174 
Figure 46. Interaction plot of mean depth of viewed pages for System and Topic 
factors. Except for the human smuggling topic, searchers viewed pages more 
deeply using the Categorized overview system. The largest change between 
systems was for the workplace allergies topic.................................................. 174 
Figure 47. For each topic, percent of pages viewed by original location of page within 
search results..................................................................................................... 175 
Figure 48. Histograms of a) original location of collected pages, and b) log(original 
location). ........................................................................................................... 176 
Figure 49. Original location of collected pages, a) by System, and b) by Topic
+
 
(N=611)............................................................................................................. 176 
Figure 50. Percent of pages collected by original location of page within search 
results. The interface displayed approximately 10 results per screen. The dashed 
line shows the initial screen break. ................................................................... 177 
Figure 51. For each topic, percent of pages collected by original location of page 
within search results.......................................................................................... 178 
Figure 52. Histograms of a) queries per search and log(queries per search)............ 180 
Figure 53. Normal Q-Q plot of residuals for the number of queries per search....... 181 
Figure 54. The number of queries per search, a) by System
*
, and b) by Topic
*
 (N=95).
........................................................................................................................... 181 
Figure 55. Interaction plot of mean number of queries per search for System and 
Topic factors. .................................................................................................... 182 
Figure 56.  Ease/difficulty (1=difficult, 9=easy) of exploring search results, a) by 
System
+
, and b) by Topic (N=96)..................................................................... 182 
Figure 57. Agreement that they got a good overview of the topic, a) by System, and 
b) by Topic
+
 (N=96).......................................................................................... 183 
Figure 58. Normal Q-Q plot of residuals for the organization of search results 
measure. ............................................................................................................ 184 
Figure 59. Agreement that system organized results well, a) by System
*
, and b) by 
Topic (N=96). ................................................................................................... 184 
Figure 60. The normal Q-Q plot shows a slightly skewed distribution of residuals. 185 
Figure 61. Agreement that interface helped assess results, a) by System
*
, and b) by 
Topic (N=96). ................................................................................................... 185 
Figure 62.  Adjectives by System. ............................................................................ 186 
Figure 63. The normal Q-Q plot shows a normal distribution of residuals, indicating a 
good fit for the model. ...................................................................................... 187 
Figure 64. Change in familiarity after search, a) by System, and b) by Topic
*
 (N=96).
........................................................................................................................... 187 
Figure 65. Useful information responses, a) by System, and b) by Topic (N=96)... 188 
Figure 66. Progress toward scenario goal, a) by System, and b) by Topic (N=96).. 188 
 
 
xiv 
 
Figure 67. Distribution of idea quality ratings, a) by System, and b) by Topic
+
 
(N=679; idea rating 1 = poor, 9 = excellent). ................................................... 189 
Figure 68. For the query ?leonardo da vinci?, placing the pointer over the top-level 
category Computer opened a small pop-up window with the five populated 
subcategories..................................................................................................... 192 
 
 
 
1 
 
Chapter 1:  Introduction 
1.1 Motivation 
The World Wide Web creates tantalizing opportunities for learning and research. 
Every day, teachers, journalists, researchers and ordinary citizens search the web as 
they attempt to find, organize, understand, and ultimately learn from information on 
the web. These users struggle with information overload, coping with an 
overabundance of information that lacks a comprehensible organization. Search 
engines are effective at generating extensive lists of results that are highly relevant to 
user-provided query terms. For known-item queries, users often find the site they are 
looking for in the first page of results. However, a list may not suffice for more 
sophisticated exploratory tasks, such as learning about a new topic or surveying the 
literature of an unfamiliar field of research, or when information needs are imprecise 
or evolving (White, Kules, Drucker, & schraefel, 2006). 
 
The lack of comprehensible overviews of web search results is particularly 
problematic when users initiate exploratory searches to satisfy information needs that 
are imprecise or evolving or when their domain knowledge is limited. Incompletely 
formulated queries yield a plethora of potentially relevant search results, which must 
be examined and understood. This is exacerbated by the frequent use of short queries 
(Spink, Wolfram, Jansen, & Saracevic, 2001 & Saracevic, 2001). Although it is 
difficult to quantify the prevalence of such exploratory searches, recent analysis of 
 
 
2 
 
search goals suggests that between 20-30% of all web queries may be exploratory in 
nature (Rose & Levinson, 2004), which motivates study of this type of search. 
 
This dissertation explores the premise that organizing search results into 
comprehensible visual overviews using meaningful and stable categories can support 
user exploration and understanding of large search result sets. When searchers need to 
gather information from multiple perspectives or sources, categorized overviews can 
organize results from web or digital library searches.  Categorized overviews can help 
searchers explore alternative sources, assess utility of results, and decide on next 
steps.  When searchers' information needs are evolving or imprecise, categorized 
overviews help by stimulating relevant ideas, provoking illuminating questions, and 
guiding searchers to useful information they might not otherwise find.  
 
Research prototypes and commercial search engines have incorporated categorized 
overviews, but (as discussed in the Related Work section) there have been few user 
studies of categorized overviews for exploratory web search, and there is little 
research explaining whether they are effective, why, and under what circumstances. 
Research is needed to understand how categorized overviews change the way users 
conduct web searches, to guide the design of search engine interfaces, and to justify 
the entry and maintenance of category metadata. 
1.2 Illustrative example 
A simple scenario, using the SERVICE search system (described in Chapter 4), 
illustrates the use of categorized overviews in a web search. Genna, who is 10, has 
 
 
3 
 
been assigned a homework problem to find the median value of a set of numbers. Her 
father wants to quickly find an age-appropriate definition and example for her. He 
isn?t sure what query terms would best limit his query to age-appropriate definitions, 
so he types ?median? into the search engine and peruses the results (Figure 1). The 
fifth item in the list looks promising, so he clicks on it to view the page, but it turns 
out to be too wordy. Placing the pointer over the Kids and Teens category pops up a 
list of its nonempty subcategories and highlights the two visible results that fall in the 
Kids and Teens category (Figure 2). These two items are in the Wikipedia, so they 
might be helpful, but he sees a subcategory called School Time that looks more 
promising and he decides to see the list of all the Kids and Teens results. Clicking on 
Kids and Teens yields a list of child-friendly web pages. ?Lesson on the Median of a 
Set of Data? is no longer available, but ?How to Calculate the Median Value? looks 
like what he wants. The snippet says it is for K-12 kids and uses easy language. He 
clicks on the result and finds exactly what he needs. 
 
This example illustrates several common elements of exploratory search using 
categorized overviews. Genna?s father did not know what term to use in his query to 
select for age-appropriate pages.  He did know that there was a top-level category for 
Kids and Teens, because he had seen it on previous searches, so he was confident that 
he could use a broad query and then narrow his results if needed. After scanning the 
result list, he used the categories. The pop-up subcategories provided additional 
information that induced him to explore all the Kids and Teens results instead of 
clicking on the Wikipedia results. Finally, the desired item was ranked #29 in the 
 
 
4 
 
original list, so he would have had to scroll or page to the third screen before he 
would have found it without the category overview. For comparison, Figure 4 shows 
an automatically clustered overview from Clusty.com for the same query, which does 
not provide a meaningful cluster label for child-friendly pages. 
 
Figure 1. Search results for the query "median" are coupled with a categorized overview. 
 
 
5 
 
 
 
Figure 2. Placing the pointer over the Kids and Teens category pops up a list of its nonempty 
subcategories and highlights the visible results in the Kids and Teens category. 
 
 
6 
 
 
Figure 3. Selecting the Kids and Teens category filters the results to just that category.  
 
 
7 
 
 
Figure 4. This automatically clustered overview for the same query, from Clusty.com, does not 
provide a meaningful cluster label for child-friendly pages. 
1.3 Research contributions 
This dissertation investigates the use of categorized overviews based on meaningful 
and stable categories to support exploratory search. It makes three contributions. 
First, it presents an analysis of search with categorized overviews, particularly 
focusing on how searchers evaluate their search results and decide their next move 
 
 
8 
 
(e.g. scroll/page for more results, refine their query, revise their conception of the 
information need, etc.).  
 
The analysis provides theoretical support for the second contribution, a set of 
principles for the design of search interfaces to support exploratory search. The 
principles, refined and validated by empirical studies, complement and extend general 
human-computer interaction, web design, information architecture, and information 
visualization principles. They will be useful for search interface designers, because 
they provide guidance for the appropriate integration of visual overviews with search 
result lists, and particularly for the textual surrogates embedded in result lists. These 
principles represent a strong call for exposing meaningful structure ? which is often 
used internally by search engines, but less often visible at the user interface ? without 
abandoning the tried and true value of text.  
 
The final contribution of this dissertation research is the SERVICE (SEarch Result 
Visualization and Interactive Categorized Exploration) architecture and 
implementation technology, illustrated with two working categorizing search 
interfaces: AOL music search and general web search. The ideas embedded in the 
user interface will be useful to designers of other search interfaces.  The SERVICE 
system will be a flexible, extensible platform for additional research in categorizing 
search interfaces. 
 
 
9 
 
1.4 Terminology 
In this dissertation, the term category is used to designate a concept (with an 
associated label) for grouping entities such that all of the entities that are members of 
that group share a common attribute. A category may be drawn from a formally 
defined classification or ontology with controlled vocabulary or indexing language. 
Alternately, it may come from an informal grouping that is simply meaningful within 
a context of use. This broad definition glosses over differences between 
categorization and classification systems, and between different types of 
classifications (Jacob, 2004; Soergel, 1974; Taylor, 1999). For this work, the most 
important characteristic of a set of categories is that the categories provide some way 
to organize and filter search results that is meaningful, and ultimately useful, to 
information seekers. 
 
 
10 
 
Chapter 2:  Related work 
Exploratory search is a sub-task in the context of a higher-level information seeking 
task, which is in turn motivated by a perceived information need. Searchers interact 
with search engines or search systems to formulate and execute queries, examine 
results, and browse for information to satisfy their information need. Categories may 
be used to organize results, which are then visualized for searchers to examine and 
use. This chapter presents a review of three areas of work related to this dissertation: 
information seeking (section 2.1), the use of categories to support information 
seeking (section 2.2), and the visualization of search results (section 2.3). 
2.1 Information seeking ? theory, studies and systems 
Evolving information needs form a core motivation for information seeking. Dervin 
and Nilan (1986) consider user needs in the context of a sense-making theory of 
human behavior. Gaps in knowledge are conceptualized as questions, which can 
motivate a person to seek information. Belkin (1980) developed the Anomalous 
States of Knowledge model to explain information seeking behavior on open-ended 
questions. The model addresses iteration and refinement of the seeker?s knowledge, 
specification of the problem, and an evolving ability to articulate requests. Kuhlthau?s 
model of the stages of the information seeking process tracks cognitive and affective 
states in a constructive knowledge acquisition process such as writing a paper 
(Kuhlthau, 1991). Particularly in the latter two models, users? information needs are 
initially ill-defined, requiring a process of refinement. Marchionini?s electronic 
browsing model includes problem definition and refinement in a seven-stage process 
 
 
11 
 
(Marchionini, 1995). Choo, Detlor and Turnbull (2000) develop a behavioral model 
of organizational information seeking on the web by integrating Ellis? (1989) six 
stages of information seeking (starting, chaining, browsing, differentiating, 
monitoring, and extracting) with Aguilar?s (1988) four modes of scanning (undirected 
viewing, conditioned viewing, informal search, and formal search). 
 
Problem refinement is inherent in each of these models, as users struggle to 
understand available information, refine the information need, and find new 
information. There has been growing interest in successive searching on the Web, in 
digital libraries, and in online public access catalogs (OPACs). Studies have found 
that users perform repeated searches on similar topics over a period of time. Spink, 
Bateman and Jansen (1999) surveyed users of the Excite search engine and found that 
two-thirds performed successive searches, with 30% searching at least 6 times on one 
topic. Spink, Wilson et al. (2002) found that successive searches often involved 
refining or extending previous searches in response to changes in understanding and 
evaluation of previous results. Vakkari (2000) studied 11 students who attended a 
two-semester proposal writing seminar, and found that as students progressed, they 
used more search terms and the search terms were more specific. 
 
Many information seeking environments have been developed. The Digital Library 
Integrated Task Environment (DLITE) supports interaction with multiple search 
services while developing bibliographic citations (Cousins, Paepcke, Winograd, Bier, 
& Pier, 1997). It supports iterative searching by providing a persistent desktop on 
 
 
12 
 
which queries, results and services are maintained. The SketchTrieve system provides 
a similar information seeking environment, with an emphasis on allowing the user to 
connect services to generate search results, then place and annotate them (Hendry & 
Harper, 1997). The NaviQue workspace supports information seeking using a 
navigational perspective, based on a zooming user interface (Furnas & Rauch, 1998). 
More recently, researchers have advocated embedding the search function into 
application environments to support task-specific searching (Hendry, to appear). 
 
Traditional OPACs allow users to browse and search using subject classifications. 
Allen (1995) describes two digital library interfaces based on two hierarchical 
classifications, the Dewey Decimal System and the ACM Computer Reviews 
classification. These interfaces show search results against the classification hierarchy 
and integrate several other features. HIBROWSE, an OPAC system, exploits faceted 
hierarchies to provide visual query specification and to organize results (Pollitt, 
1997). Flamenco (Figure 5) provides interfaces to specialized collections (art, 
architecture, and tobacco documents), using faceted hierarchies to produce menus of 
choices for navigational searching (Hearst et al., 2002). The Envision digital library 
of computer science literature displayed search results using a matrix of icons, 
allowing searchers to easily manipulate the visualization (Nowell, France, Hix, Heath, 
& Fox, 1996). Citiviz (Figure 6) displays search results using a hyperbolic tree 
(Lamping & Rao, 1996) and a scatterplot (Perugini et al., 2004). The Technical 
Report Visualizer prototype (Ginsburg, 2004) allows users to browse a digital library 
by one of two user-selectable hierarchical classifications, also displayed as hyperbolic 
 
 
13 
 
trees and coordinated with a detailed document list. Categorized overviews are used 
in the Punchstock image search interface (punchstock.com, Figure 7) and the search 
interface for the North Carolina State University (NCSU) library catalog 
(www.lib.ncsu.edu/catalog/, Figure 8). 
  
 
Figure 5. The Flamenco interface permits users to navigate by selecting from multiple facets. In 
this example, the displayed images have been filtered by specifying values for two facets 
(Materials and Structure Types). The matching images are grouped by subcategories of the 
Materials facet?s selected Building Materials category. 
 
 
 
14 
 
   
Figure 6. The CitiViz search interface visualizes search results using scatterplots, hyperbolic 
trees, and stacked discs. The hyperbolic tree, stacked disks, and textual list on the left are all 
based on the ACM Computing Classification System. 
 
 
 
15 
 
 
Figure 7. The PunchStock photo search interface provides categorized overviews of photo search 
results.  
 
 
 
16 
 
 
Figure 8. The NCSU library catalog provides categorized overviews of search results using 
subject headings, format, and library location. 
2.2 Using categories for information retrieval 
The field of Library and Information Science (LIS) has an established history of 
research in classification systems and their use. The emphasis within LIS on human 
 
 
17 
 
information behavior and information seeking has traditionally informed the 
development of classifications for libraries, archives, and museums. Faceted 
classification (Vickery, 1960), which is of particular interest in this dissertation, has 
influence beyond the LIS world, with human-computer interaction (HCI) researchers 
adopting its methods to support exploration and retrieval in large digital document 
collections. 
 
For exploratory searchers, categories drawn from classifications, taxonomies, 
ontologies, and other knowledge structures support information organization and 
retrieval, provide semantic roadmaps to fields of knowledge, and improve learning 
(Soergel, 1999). There is growing use of thesauri on the web to support information 
retrieval (Shiri & Revie, 2000). Web directories such as Yahoo! (www.yahoo.com) 
and the Open Directory Project (www.dmoz.org) (DMOZ) catalog a small but 
important fraction of the Web, providing an overview of general Web content and 
enabling users to find information by browsing a familiar subject hierarchy. These 
knowledge structures can be used to categorize search results for presentation.  
 
In this dissertation, the interest is not in how classifiers work (e.g., machine learning), 
but simply that they provide a way to identify category membership for search results. 
2.2.1 Studies of categorized overviews for web search 
Meaningful and stable categories have been found beneficial for the organization of 
web search results in the limited studies conducted. Grouping search results by a two-
level subject classification expedited document retrieval for informational tasks with 
 
 
18 
 
a single correct answer (Dumais, Cutrell, & Chen, 2001). For question answering 
tasks, search results augmented with category labels produced the fastest performance 
and were preferred over results without category labels (Drori & Alon, 2003). The 
Cha-Cha system organizes intranet search results by an automatically generated web 
site overview (Figure 9). Preliminary evaluations were mixed, but promising, 
particularly for what users considered ?hard-to-find information? (Chen, Hearst, 
Hong, & Lin, 1999). The WebTOC system (Figure 10) provides a table of contents 
visualization that supports search within a web site, although no evaluation of its 
search capability has been reported (Nation, Plaisant, Marchionini, & Komlodi, 
1997). WebTOC displays an expandable/collapsible outliner (similar to a tree 
widget), with embedded colored histograms showing quantitative variables such as 
size or number of documents under the branch. 
 
 
 
19 
 
 
Figure 9. The Cha-Cha system organizes intranet search results by an automatically generated 
web site overview. 
 
Figure 10. The WebTOC system provides a table of contents visualization that supports search 
within a web site. 
 
 
20 
 
 
Clustering web search results into dynamic categories, in which documents are 
grouped by similarity measures rather than explicit categorical attributes, has been 
investigated as an alternative to classification, and has been shown to improve on 
ranked lists for information retrieval metrics such as precision and recall (Hearst & 
Pedersen, 1996; K?ki, 2005; Marshall, McDonald, Chen, & Chung, 2004; Zamir & 
Etzioni, 1999; Zeng, He, Chen, Ma, & Ma, 2004) or task completion time (Turetken 
& Sharda, 2005). Chen, Houston, Sewell, & Schatz (1998) found that recall improved 
when searchers were allowed to augment their queries with terms from a thesaurus 
generated via a clustering-based algorithm. A one-level clustered overview was found 
helpful when the search engine failed to place desirable web pages high in the ranked 
results, possibly due to imprecise queries (K?ki, 2005). Clusty (www.clusty.com) 
uses this technique to produce an expandable overview of labeled clusters (Figure 
11). The benefits of clustering include domain independence, scalability, and the 
potential to capture meaningful themes within a set of documents, although results 
can be highly variable (Hearst, 1999). Generating meaningful groups and effective 
labels is a recognized problem (Rivadeneira & Bederson, 2003). As Rivadeneira and 
Bederson observed, web search results lack ?1)? a natural spatial layout of the data; 
and 2)? good small representations,? which makes designing effective visual 
representations of search results challenging. Using visual structures built around 
meaningful classifications may ameliorate this problem, as illustrated by promising 
interfaces like WebTOC. 
 
 
 
21 
 
 
Figure 11. The Clusty metasearch engine uses automated clustering to produce an expandable 
overview of labeled clusters. 
2.2.2 Other studies of categorized overviews for search results 
The Flamenco system (Hearst et al., 2002; Yee, Swearingen, Li, & Hearst, 2003) 
provided interfaces to specialized collections (art, architecture and tobacco 
documents), using faceted hierarchies to produce menus of choices for navigational 
searching. A usability study compared the interface to a keyword-based search 
interface for an art and architecture database for structured and open-ended, 
exploratory tasks (Yee, Swearingen, Li, & Hearst, 2003). With Flamenco, users were 
more successful at finding relevant images (for the structured tasks) and reported 
higher subjective measures (for both the structured and exploratory tasks). The 
exploratory tasks were evaluated using subjective measures, because there was no 
 
 
22 
 
(single) correct answer and the goal was not necessarily to optimize a quantitative 
measure such as task duration. The Dyna-Cat system (Figure 12) organized medical 
search results by a taxonomy of question types (Pratt, Hearst, & Fagan, 1999). In a 
comparison with clustering and ranked list interfaces, Dyna-Cat helped searchers find 
more answers to general fact-finding questions within a fixed time. Searchers also felt 
that they learned more using Dyna-Cat. The SuperBook interface organized search 
results within a book according to the text?s table of contents, expediting searches 
without loss of accuracy (Egan et al., 1989). The GRiDL prototype displays search 
result overviews in a matrix using two hierarchical categories, allowing users to drill 
down for details (Shneiderman, Feldman, Rose, & Grau, 2000). The List and Matrix 
Browsers provide similar functionality, again using linear and grid-based displays 
(Kunz, 2003). Informal evaluations of these two interfaces have been promising, 
although no extensive studies of the techniques have been published. 
 
 
Figure 12. The Dyna-Cat system organized medical search results by a taxonomy of question 
types. 
 
 
 
23 
 
2.3 Visualizing and interacting with search results 
The most common presentation of search results is the textual list, typically showing 
document titles and a few other pieces of information such as author, URL, a snippet 
of text (possibly with matching query terms highlighted). The results can be ordered 
by a computed relevance rank or by other attributes such as date, author, organization, 
etc. Drori and Alon (2003) compared four textual lists based on permutations of two 
variables (document category and lines from the document) in a 2x2 arrangement. 
Results were presented with and without categories, and with either the first lines of 
the document or the first lines relevant to the query. They found that the interface 
with categories and query-relevant lines from each document produced the fastest 
performance and was preferred by subjects. Dumais, Cutrell and Chen (2001) studied 
the effect of grouping results by a two-level category hierarchy and found that 
grouping by a well-defined classification speeds user retrieval of documents. 
Northern Light (www.northernlight.com), a commercial search engine, provides such 
a capability by grouping results in their Custom Search Folders. Exalead 
(exalead.com) organizes search results according to categories in the Open Directory 
Project. Other categories, such as organization charts, and geographic and temporal 
hierarchies, can also be used to organize search results. 
 
The success of search result visualization has been mixed. Several web search (or 
metasearch) engines, including Grokker (www.grokker.com), Kartoo 
(www.kartoo.com), and FirstStop WebSearch (www.firststopwebsearch.com) 
incorporate visualization. Grokker clusters documents into a hierarchy and produces 
 
 
24 
 
an Euler diagram, a colored circle for each top-level cluster with sub-clusters nested 
recursively (Figure 13). Users explore the results by ?drilling down? into clusters 
using a 2-D zooming metaphor. It also provides several dynamic query controls for 
filtering results. Unfortunately, this interface has been found to compare poorly with 
textual alternatives (Rivadeneira & Bederson, 2003). The authors found that the 
textual interfaces were significantly preferred. Kartoo, a metasearch engine, generates 
a thematic map from the top dozen search results for a query, laying out small icons 
representing results onto the map. When the pointer is placed over a document icon, 
arcs are displayed from that document to each relevant theme on the map. When the 
pointer is placed over a theme on the map, arcs are displayed to the related 
documents. This Flash-based alternative to search results is eye-catching (they offer a 
similar HTML-based version, too), but its utility is not clear. FirstStop WebSearch 
optionally displays collections of thumbnails instead of textual lists as part of a 
desktop-based search appliance. 
 
 
 
25 
 
 
Figure 13. Grokker clusters documents into a hierarchy and produces an Euler diagram, a 
colored circle for each top-level cluster with sub-clusters nested recursively. 
 
 
Figure 14. Kartoo generates a thematic map from the top dozen search results for a query, laying 
out small icons representing results onto the map. 
 
 
26 
 
 
The WebTOC and GRiDL prototypes display search results using hierarchical 
categories, allowing users to drill down for details (Nation, Plaisant, Marchionini, & 
Komlodi, 1997; Shneiderman, Feldman, Rose, & Grau, 2000). WebTOC displays an 
expandable/collapsable tree browser/outliner, with embedded colored histograms 
showing the number of documents under the branch and their sizes. GRiDL uses a 
grid to display two categorical attributes of a collection of documents. Each 
row/column of the grid represents a value for one of the categorical attributes. For 
each cell, if there are fewer than about 50 documents with that combination of values, 
each document is represented as a colored dot, where colors indicate a third 
categorical variable. If there are too many documents to fit into the cell, a histogram 
shows the distribution of documents across the third variable. More recently, outliner 
and matrix displays have been used to show search results, categorized into an 
ontology-based classification (Kunz & Botsch, 2002). SuperTable (Klein, M?ller, 
Reiterer, & Eibl, 2002) integrates several information visualization techniques, 
including a scatterplot, TileBars (Hearst, 1995), and a bargraph, using linking and 
brushing to coordinate multiple tiled windows. Informal evaluations of these 
interfaces have been promising, but no extensive studies of the techniques have been 
published. 
 
 
 
27 
 
 
Figure 15.  This GRiDL example shows search results organized by the ACM classification and 
date. 
 
Evaluations often indicate that interface effectiveness is dependent on the specific 
information-seeking task. Risden, Czerwinski et al. (2000) compared a standard 
collapsible tree browser, a 2D textual layout (similar to Yahoo!) and a 3D interface 
for tasks that involved finding or creating categories of content in a web site. The 3D 
interface produced significantly faster performance when finding existing categories, 
but not when adding new categories. The authors speculate that the accessibility of 
context information in the 3D interface (not available in the other interfaces) may 
have been more beneficial for the finding task than the creation task. Sebrechts, 
Vasilakis et al. (1999) compared text, 2D, and 3D visualizations of clustered search 
results, finding that overall, the text was fastest and 3D was slowest, although for 
experienced users 3D was faster. They also found reliable differences in response 
time by the interaction of task type and interface, concluding that the match between 
 
 
28 
 
visualization features and tasks was more important than the dimensionality of the 
visualization. A comparison of information retrieval systems from TREC-6 found 
similar results (Swan & Allen, 1998). Kleiboemer, Lazear et al. (1996) found 
graphical displays to be more difficult than text, and Chen, Houston et al. (1998) 
suggest that the simple labels provided by Yahoo! were more useful for navigating a 
document space than a Kohonen map. Becks, Seeling and Minkenberg (2002) found 
document maps to be successful for tasks requiring detailed structural analysis of 
document inter-relationships, but also noted that users wanted to see more text, tightly 
coupled to the display, or another expressive arrangement of clusters. 
2.4 Summary 
Using categories to organize and explore general web search results is a promising 
but unproven technique (Hearst, 2006). Few user studies have examined the use of 
meaningful and stable categories specifically for organizing web search results. User 
studies have investigated meaningful and stable categories for organizing database 
search results, and studies have been conducting using automated clustering of web 
search results to generate dynamic categories. Most studies have focused on non-
exploratory tasks. With the growing use of categorized overviews for search results, 
there is a need for design principles for more open, exploratory search interfaces that 
are based on a firm theoretical and empirical foundation. This dissertation addresses 
these issues.  
 
 
29 
 
Chapter 3:  Early designs and formative studies 
This chapter describes early user interface designs for the SERVICE system, and 
reports on two formative studies conducted with categorized overviews that used 
United States (US) government agencies and departments as meaningful and stable 
categories. The purpose of the studies was to illuminate searchers? use of categorized 
overviews to explore and understand search results. This would help to refine the 
emerging principles and analysis (both described in Chapter 4). The research goals 
motivating these studies include: 
1. Identifying search tasks and sub-tasks that benefit from categorized overviews 
2. Understanding how the visual presentation of the overview affects its utility 
3. Understanding how the categories used for the overview affect its utility and 
the user?s search experience 
 
Study 1 compared three presentations of results categorized into a 2-level government 
hierarchy. Two overview+detail interfaces (an expandable outliner and a treemap) 
allowed users to narrow the search results by categories, and a third interface (the 
control) provided a typical set of results with category information displayed below 
each result. Study 2 investigated the effect of two different kinds of categories. One 
search interface used the government organizational hierarchy and the other used 
Vivisimo?s automated clustering. The information seeking tasks used in these two 
studies were motivated by work with government agencies through the GovStats 
project (Ceaparu & Shneiderman, 2004; Hert, 2002; Kules & Shneiderman, 2003). In 
this domain, web sites such as FirstGov (www.firstgov.gov), FedStats 
 
 
30 
 
(www.fedstats.gov), Science.gov (www.science.gov), and other specialized search 
engines provide some help for searchers. FirstGov has recently launched a search tool 
that incorporates Vivisimo?s automated clustering technology to provide clustered 
overviews of search results, but to my knowledge no search engines currently provide 
overviews of search results categorized by government agency. Studies have found 
that queries for governmental information comprise 1.5%-3.0% of all queries to 
general web search engines (Jansen, Spink, & Pedersen, 2005; Spink & Jansen, 
2004), suggesting that this would be a useful niche to study. 
 
This chapter first presents early designs and the prototypes used for the two studies. 
The study designs and results are presented in sections 3.3 and 3.4, followed by a 
discussion of the findings and limitations of both studies in section 3.5. 
3.1 Early designs 
Early designs helped to define the design principles. They explored graphical 
approaches to display overviews of search results. Treemap displays used the leaf 
nodes (boxes) to represent items (individual web pages) in a thematic hierarchy 
(Figure 16, Figure 17), and government agencies (Figure 18, Figure 19). These 
displays effectively showed the distribution of results across categories and 
highlighted unusually placed results. An alternate mock-up (Figure 20) showed 
search results as red markers on vertical bars that represented categories. One bar was 
displayed for each category. The placement of the marker indicated the rank of the 
results within the entire list, with the highly ranked documents at the top of the bar. 
The vertical bars could be expanded horizontally to display the title and snippet for 
 
 
31 
 
the top 2-3 results within that category. Up to two categories could be expanded at a 
time. This design showed the distribution of results across categories along with the 
rank of each result, and embedded the text of the top 5-10 results in the overview. It 
allowed comparison of results between two categories. 
 
The visual overviews were promising during informal reviews with professional 
colleagues and fellow students. The larger-than-usual number of results, the 
meaningful categories (thematic and government agency-based), and the color-coding 
were appreciated for their ability to provide a visual overview of the search results. 
The reviews also highlighted the importance of retaining the title, snippet, and URL 
in a textual list of results, simultaneously visible on the screen. Users wanted to read 
the text for details while they looked at the overview.   
 
 
32 
 
 
                    
Figure 16. This treemap shows 157 search results for the query ?breast cancer? encoded as leaf 
nodes in a broad and deep thematic hierarchy.  The leaf nodes have constant size, so it is easy to 
see that most results fall under the Health top-level category. The bright red nodes (which 
appear as dark gray when rendered as a gray-scale image) are highly ranked, while the orange 
and yellow nodes are ranked lower. This makes it easy to see that there is at least one moderately 
ranked page in the Society category. 
 
 
 
33 
 
 
Figure 17. Zooming into the Society category provides previews of the three web pages falling in 
that category. 
 
                     
Figure 18. The top 200 search results for the query ?soybeans? in government agency web sites is 
shown as a treemap. Each node represents an agency. The color coding shows that most results 
are from the Department of Agriculture, but the National Aeronautics and Space Administration 
(NASA), the House of Representatives, and the Senate all yielded many results, too. Leaf node 
size is constant.  
 
 
34 
 
 
Figure 19. Clicking on the NASA node displays a text list of the search results from that agency. 
 
 
Figure 20. In this mock-up, the top 40 search results from the query ?breast cancer? are 
organized by thematic categories and represented as red markers on vertical bars for each 
category. Two of the categories (Society and Health) are expanded horizontally to show the top 
results in those categories. The other categories are collapsed, showing just the bars and markers 
to indicate the number of results and their ranks within the entire list of results. 
 
 
35 
 
3.2 Formative study prototypes 
These two early studies organized a pre-computed set of search results from 
government web sites into a two-level hierarchy of departments and agencies. The 
U.S. federal government organizational hierarchy was used as a meaningful and 
stable structure to categorize search results. Results were categorized into the leaf 
nodes of a broad, shallow, 2-level government agency hierarchy by matching the 
URLs to a database of federal government web sites. 
 
Two forms of the categorized overview were prototyped: an expandable outliner and 
a treemap. Based on feedback on the initial designs, the overview was paired with a 
Google-style ranked list of search results. This provided the title, snippet, and URL in 
a form suitable for efficient skimming and scanning. The overview was tightly 
coupled with the list so that clicking on a node in the overview filtered the list results 
to show results from that category. Both overview conditions allowed participants to 
show or hide empty categories, and the expandable outliner additionally allowed 
participants to display or hide the counts of results in parentheses after each category. 
 
 
36 
 
 
 
Figure 21.  Detail of the expandable outliner condition. The top 200 urban sprawl results have 
been categorized into a two-level government hierarchy, which is used to present a categorized 
overview on the left. The Interior Department, which has 20 results, has been expanded and the 
National Park Service has been selected. The effect on the right side is to show just the three 
results from the Park Service. 
 
 
Figure 22.  Detail of the treemap condition, which used nesting to show both top and second-level 
categories simultaneously. The set of results and the selected agency (NPS) is the same as in 
Figure 21. 
 
 
37 
 
 
3.3 Study 1: Expandable outliner vs. treemap vs. control 
3.3.1 Research questions 
This study investigated the first two research goals listed above:  
1. Identifying search tasks and sub-tasks that benefit from categorized overviews 
2. Understanding how the visual presentation of the overview affects its utility 
 
It used a constant categorization, thus the effect of different categorizations was not 
examined. For the visual presentation of results, an overview+detail approach was 
consistent with the initial principles. Three common exploratory search tasks were 
identified:  
? Finding groupings of information (based on departments and agencies) that 
have large numbers of results,  
? Identifying different aspects of or perspectives of a query topic, and  
? Identifying unusual results.  
 
This study addressed three research questions: 
? Can an overview+detail display of search results based on a government 
hierarchy improve exploratory search success over the typical ranked list? 
? Can a graphical overview improve on a non-graphical overview? 
? What patterns of usage does the overview+detail approach induce? 
 
 
38 
 
3.3.2 Experimental conditions 
The study compared presentations of search results with and without categorized 
overviews using pre-specified queries and a fixed set of search results. The U.S. 
federal government organizational hierarchy served as a meaningful and stable 
structure to categorize search results. Results were categorized into the leaf nodes of a 
broad, shallow, 2-level government agency hierarchy by matching the URLs to a 
database of federal government web sites. Although the organizational hierarchy is 
strictly a tree and not a hierarchy as defined by Kwasnik (1999) because it does not 
implement the is-a relationship or inheritance, it has many benefits: It is reasonably 
complete and comprehensive; the categorization rules are systematic and predictable, 
and a given result will (with very few exceptions) be found in a single category 
(mutual exclusivity).  
 
The study used a 1x3 between groups design (N=18, 3 groups of 6), with interface 
type as the independent variable. The control condition (Figure 23) displayed search 
results in a manner similar to Google, adding the government department and agency, 
but it provided no categorized overview. Two experimental conditions used 
overview+detail interfaces: an expandable outliner (Figure 21), or a treemap (Figure 
22). Both allowed participants to limit the displayed list of results by selecting 
(clicking on) a single category. The overview conditions allowed participants to show 
or hide empty categories. The expandable outliner additionally allowed participants to 
display or hide the counts of results in parentheses after each category, although this 
 
 
39 
 
was not used in the experiment. Both quantitative and qualitative data were collected. 
Preliminary results were reported in Kules & Shneiderman (2004). 
 
 
Figure 23.  The control condition mimics a typical set of Google search results, adding the 
government department and agency. 
 
3.3.3 Hypotheses 
In addition to collecting qualitative data, this study tested three hypotheses, based on 
the initial design principles for exploratory search interfaces: 
1. Overview conditions will yield higher successful completion rates within a 
fixed time. 
2. Overview conditions will be rated more favorably than the control. 
3. Overview conditions (and particularly the treemap) will be judged as more 
complex than the control and more difficult to learn. 
3.3.4 Scenario and task design 
Scenarios and tasks were carefully constructed to provide a realistic exploratory 
search context while constraining the search task to the examination of a constant 
(across participants) set of search results. It was also desirable to control ? to the 
extent possible ? for differences in interpretation of the exploratory search tasks 
(J?rvelin & Ingwersen, 2004). Examining search results is a necessary step within a 
 
 
40 
 
larger information seeking process, the objective of which is to satisfy a perceived 
information need or problem (Marchionini, 1995). In turn, the perceived information 
need is situated within a higher level social, cultural and organizational context and 
motivated by a higher-level work (or pleasure) objective (Bystr?m & Hansen, 2002; 
J?rvelin & Ingwersen, 2004). For these reasons, the task design for these studies 
considered multiple levels of context. Bystr?m and Hansen (2002) proposed a three-
level abstraction for task context which was adapted as a frame for these two studies.  
 
The highest level of Bystr?m and Hansen?s taxonomy is the work task. Work tasks 
are situated in the work organization and reflect organizational and cultural norms, as 
well as organizational resources and constraints. In these two studies, the scenarios 
described a simulated work task, as advocated in Borlund (2003), which provided the 
?cover story? that encouraged participants to bring their own knowledge and 
experience (however limited) to the subsequent tasks. The scenarios provided a 
second level of context, the information seeking context, by locating the searcher 
within the initial stages of an exploratory search task, equivalent to the pre-focus 
exploration stage of Kuhlthau?s (1991) six stages or the pre-focus stage of Vakkari 
(2001). The scenarios described the participant (information searcher) as being at a 
?starting point? or ?exploring topics and defining your paper?s thesis.? Within this 
stage, the third level of context was the information retrieval context, which placed 
the participants in the Examine Results stage of an information seeking session by 
indicating that they had just entered a pre-specified query. This enabled the use of a 
consistent set of search results across all participants. 
 
 
41 
 
 
The scenarios thus attempted to provide a set of situational and contextual cues to 
induce a realistic information need within each participant. Due to practical 
limitations on the software (search results had to be pre-processed), and the duration 
of experimental sessions, it was not practical to use real-life, participant-provided 
search tasks as recommended by Borlund (2003). Because these were formative 
studies, I chose to expose participants to three diverse scenarios and collect a wider 
range of data, rather than a tailored scenario advocated by Borlund.  
 
The scenario content was motivated by work on the challenges of finding government 
information and publications (Ceaparu & Shneiderman, 2004; Kules & Shneiderman, 
2003; Marchionini, Plaisant, & Komlodi, 1998). The GovStats project?s work with 
statistical agencies generated 15 prototype scenarios (Ceaparu & Shneiderman, 2004). 
Many of these involved some aspect of learning about a general topic such as breast 
cancer, Alzheimer?s disease, or soybean production. The statistical information 
seeking scenarios were readily generalized to the full government domain for these 
studies, with details such as age and location included to provide a plausible 
description. 
 
Each scenario introduced a pre-specified query and a set of 200 search results for the 
queries ?breast cancer?, ?alternative energy? and ?urban sprawl?: 
 
 
 
42 
 
Scenario 1 (Urban sprawl) - Imagine that you are a 40-year old social 
activist in a rural town near the Washington, DC metropolitan area and have 
become increasingly concerned about the impact of urban sprawl on your 
town. You are planning to write a letter to your neighbors about the issue, and 
you would like to learn more about it. You are using the Web as a starting 
point, because you are not located near a major library. You are first 
interested in federal government information, and later you?ll look at state 
and local information. You have just entered the search terms ?urban sprawl? 
into a new search engine for government web sites. 
 
Scenario 2 (Breast cancer) - You are a 30-year old journalist writing an 
article on breast cancer and what the federal government is doing about it. 
You are exploring the topic, starting by looking on the Web to find out what 
kind of information is available. You have just entered the search terms 
?breast cancer.? 
 
Scenario 3 (Alternative energy) - You are taking an undergraduate class in 
environment sciences, and preparing to write a term paper on government 
involvement in alternative energy technologies. Your first step is to get an 
overview from the web of the information available to identify potential topics. 
You have just entered the search terms ?alternative energy.? 
 
For each scenario the three tasks were described to the participants as: 
 
 
43 
 
 
Task A (Overview) - Your first step is to get an overview of which federal 
agencies (the 2
nd
 level organizations) have substantial amounts of information 
on this topic. This will help you decide where to focus your research efforts. 
What 3 agencies publish the most information about this topic? (Time limit: 3-
4 minutes) 
 
Task B (Finding perspectives) - The web contains a variety of sources, 
perspectives and viewpoints on almost any given topic, and this is true within 
the federal government. Find 3 web pages providing different aspects of or 
perspectives on this topic. (Time limit: 3-4 minutes) 
 
Task C (Finding unusual results) - Spend a couple more minutes exploring 
these results. Do you notice any results that, at first glance, appear to be 
unusual, unexpected or surprising? If so, explain why they are unusual. (Time 
limit: 2-3 minutes) 
 
The unusual results in Task C were interpreted by participants, with respect to 
individual results or the entire set of results. The tasks were time-limited to permit 
completion of the session within approximately one hour.  
3.3.5 Materials and procedure 
After the participants signed an informed consent form, they completed a short 
demographic questionnaire, providing their age, gender, occupation, knowledge of 
 
 
44 
 
federal government organization, web experience, search experience and search 
frequency. They were asked to think-aloud (Ericsson & Simon, 1984) and ask 
questions throughout the session. Training was provided for the interface to be used, 
and they were encouraged to use it with sample search results (from the query 
?soybeans?) until they were comfortable. They were instructed to view just the results 
and categorized overview (when available). After participants were comfortable with 
the interface, the first scenario was presented, and they were asked to perform the 
three tasks. The tasks were presented in an order searchers would commonly follow 
in the exploratory search scenario. That is, they would start by seeking an overview of 
the results, then explore, and finally integrate and reflect on their findings, possibly 
identifying unusual results or yielding other insights. Following these tasks, each 
participant was asked for subjective ratings of the interface and an informal interview 
was conducted to elicit comments. These steps were repeated for the remaining two 
scenarios. The total session time was approximately one hour. The procedures and 
materials were pilot tested with four participants to refine scenarios, tasks and 
measures. The task time limits were adjusted to keep the sessions within the one-hour 
target while giving participants enough time to at least make a good start on each 
task. 
 
3.3.6 Participants 
Eighteen participants (11 male, 7 female) were recruited from university and 
professional contacts. They ranged in age from 22 to 54, with the average age being 
35. Seven were students. A heterogeneous group was appropriate due to the formative 
 
 
45 
 
nature of the study. All reported some familiarity with the federal government. All 
had at least one year of experience with web search and reported searching at least 
once a week. 
 
3.3.7 Results 
A one-way analysis of variance (ANOVA) for 10 measures was performed using 
SPSS or Excel. The measures were a correctness score on task A plus nine subjective 
satisfaction measures. When the ANOVA indicated significant differences, post hoc 
analysis was performed using a Tukey test. For the perspectives task, the position of 
selected pages was measured, as well as the number of pages selected beyond the top 
10. For the unusual results, the number of unusual results identified was measured. 
Participants made individual determinations of what was unusual. After the sessions, 
the perspectives and unusual items identified were reviewed, along with the 
comments of participants and the observer?s notes. 
3.3.7.1 Correctness score 
In task A participants were asked to find the three agencies that provided the most 
pages within the provided results. When several agencies were tied for third place, 
any of them were considered correct. The measured scores for all three scenarios 
were summed, yielding a total score in the range 0-9. Rank order was not evaluated 
for correctness. The ANOVA showed significant differences in the mean total scores, 
f(2, 15) = 6.74, p = 0.008. Post hoc analysis showed significant differences between 
the control and expandable outliner and between the control and treemap, but not 
 
 
46 
 
between the expandable outliner and treemap (Table 1). These results support our 
conjecture that a meaningful categorical grouping would benefit users for this task. 
 
Table 1. Mean correctness scores for each interface, with standard deviation in parentheses. 
 Control Expandable Outliner Treemap 
Correctness score 6.50 (1.38) 8.33 (1.21) 8.67 (0.52) 
 
3.3.7.2 Perspectives found 
The perspectives task required participants to identify three different perspectives on 
or aspects of the topic. I compared task completion rates, position of pages found and 
number of pages found beyond the top 10. The perspectives reported by participants 
are listed in Appendix A. 
 
Task completion ? With two exceptions, all participants completed all tasks. One 
member of the control group provided only one perspective for the Urban Sprawl 
scenario, and one member of the Expandable Outlier group provided only two 
perspectives for the Breast Cancer scenario. 
 
Position of perspectives found ? For each scenario, I determined the positions (rank) 
of the pages from which each identified perspective was drawn and computed the 
median value (Table 2), as well as the fraction and percent of perspectives that were 
identified from beyond the top 10 results (Table 3). The ANOVA showed significant 
differences, f(2, 146) = 17.10, p << 0.01. Post hoc analysis showed significant 
 
 
47 
 
differences between the control and expandable outliner and between the control and 
treemap, but not between the expandable outliner and treemap. 
 
Table 2. Median position of identified perspective, with standard deviation in parentheses 
 Control Expandable Outliner Treemap 
Position of identified 
perspective 
4 (9.79) 38 (55.77) 18 (56.85) 
 
Table 3. The fraction and percent of perspectives which were found beyond the top 10 results. 
Scenario Control Expandable 
Outliner 
Treemap Over all 
conditions 
Urban Sprawl   8/16 (50%) 10/18 (56%)   6/18 (33%) 24/52 (46%) 
Breast Cancer 10/18 (56%) 10/17 (59%)   8/18 (44%) 28/53 (53%) 
Alternative Energy   7/18 (39%) 14/18 (78%) 16/18 (89%) 37/54 (69%) 
Over all scenarios 25/52 (48%) 34/53 (64%) 30/54 (56%)  
 
Category use ? For the overview conditions, I computed the mean number of 
categories selected during the task (Table 4). Note that no top-level categories were 
selected within the treemap. I can conjecture two explanations for this. First, users 
may have preferred the specificity of the second-level categories (agencies) rather 
than the top-level (departments). The nature of the treemap layout, however, suggests 
another explanation. The top level categories are selected by clicking on narrow 
rectangles containing the labels, whereas the second-level categories are selected by 
clicking on the much larger color-coded rectangles. Users may not have noticed this 
distinction, and clicked second-level rectangles intending to select the top-level 
categories. Random clicking could have had a similar effect. 
 
 
48 
 
 
Table 4. Mean number of top-level and second-level categories selected during perspectives task 
for the overview conditions, with standard deviation in parentheses. 
 Expandable Outliner Treemap 
Top-level categories 3.07 (2.76) 0.00 (0.00) 
Second-level categories 2.07 (1.22) 2.22 (1.35) 
Total 5.13 (2.85) 2.22 (1.35) 
 
3.3.7.3 Unusual results task 
The number of participants who found something unusual for each condition and scenario was 
counted ( 
Table 5).  
 
Table 5. Number and percent of participants who found something unusual by condition and 
scenario. 
Scenario Control Expandable 
Outliner 
Treemap 
Urban Sprawl 4 (67%) 6 (100%) 5 (83%) 
Breast Cancer 5 (83%) 5 (83%) 6 (100%) 
Alternative Energy 4 (67%) 5 (83%) 6 (100%) 
 
For each condition, the number of times participants identified unusual items was 
counted. The full tables for each scenario are in Appendix B. With six participants per 
condition and three scenarios each, any item could be identified at most 18 times. 
Two unusual items were notable, both related to the number of results found from a 
department or agency. The table shows the number of times participants identified 
these two items and the corresponding percent of the maximum possible.  
 
 
49 
 
 
Table 6. Number and percent of times a participant identified selected unusual items. Maximum 
possible was 18 (6 participants per condition, 3 scenarios each). 
Unusual item Control Expandable 
Outliner 
Treemap 
Why so many from a 
department/agency 
3 (17%) 4 (22%) 8 (44%) 
Why so few from a 
department/agency 
0 (0%) 9 (50%) 4 (22%) 
 
During the experimental sessions, many of the 12 overview participants 
spontaneously commented on the lack of results from an agency. As the comments in 
the following sections illustrate, this could be surprising and useful information. 
Since this was not anticipated, I reviewed the video of all sessions, and found that 
only one of the six control participants indicated (at any time during the experimental 
session) that they found it surprising that an agency had few or no results. However, 
nine of the 12 overview participants at some time found this surprising. From 
participant comments, it appears that the display of agencies with zero results and the 
color coding contributed to the searchers making such observations.  
3.3.7.4 Subjective satisfaction measures 
The subjective satisfaction questionnaire used a nine-point scale for all nine 
questions. Participants were asked to circle the number that most closely reflected 
their impression of the software. Five semantic differentials measured ranges between 
two assessments (1 = left-hand side, 9 = right-hand side): 
1. Confusing?Understandable 
2. Unhelpful?Helpful 
 
 
50 
 
3. Complex?Simple 
4. Easy?Difficult 
5. Frustrating?Satisfying 
 
Four questions assessed agreement with the following statements (1 = disagree, 9 = 
agree): 
6. Overall, I was able to get a good overview of the available search results 
for the tasks 
7. For the first task in each scenario, I am confident that I found the agencies 
with the most pages in the search results 
8. For the second task in each scenario, I am confident that I found good 
examples of web pages that represent different perspectives or viewpoints 
in the search results 
9. For the third task in each scenario, I was able to find unusual results 
effectively 
 
For all questions, higher values indicate higher satisfaction ratings. Question 4 was 
originally written with a value of ?9? meaning the most difficult and is reversed for 
presentation here. The values have been adjusted to reflect this reversal. 
 
 
51 
 
 
Table 7. Mean subjective satisfaction measures, 1=poor, 9=good, except for #4 (Difficulty) which 
is reversed. Standard deviations are shown in parentheses with ANOVA degrees of freedom, F 
values and significance. Signifiant differences are shown in bold. 
    ANOVA 
 Control Expandable 
Outliner 
Treemap df F sig
1. Under-
standable 
6.50 (1.34) 8.33 (1.21) 8.67 (0.52) 2,15 1.985 .172
2. Helpful 6.00 (1.27) 8.33 (0.52) 7.50 (0.84) 2,15 9.805 .002
3. Simple 7.50 (0.55) 7.50 (1.05) 7.50 (1.04) 2,15 0.000 1.000
4. Easy 4.50 (0.55) 7.67 (2.34) 7.00 (1.55) 2,15 6.143 .011
5. Satisfying 5.17 (1.83) 7.83 (0.98) 6.78 (1.73) 2,15 6.698 .008
6. Overview 6.17 (2.14) 8.50 (0.84) 7.83 (0.75) 2,15 4.457 .030
7. Most pages 5.33 (1.97) 7.50 (1.38) 8.00 (2.00) 2,15 3.703 .049
8. Perspectives 6.33 (1.21) 8.33 (0.52) 7.83 (0.98) 2,15 7.222 .006
9. Unusual 4.83 (2.79) 7.33 (1.21) 6.17 (1.83) 2,15 2.235 .141
 
The ANOVA analyses show significant differences for questions 2 and 4-8. For these 
questions, the post hoc analysis shows significant differences between the control and 
each overview condition, but not between the two overview conditions. Table 7 
shows satisfaction values with standard deviation in parentheses and ANOVA 
degrees of freedom, F values and significance. Clearly, users with an overview were 
more satisfied. 
3.3.7.5 Observations and participant comments 
Task A (Overview) ? Most users of the control interface linearly scanned the list to 
get a rough idea of the top agencies. They usually scanned the list once and produced 
an educated guess. Several particularly motivated participants scanned the entire list 
twice, once to get a rough idea of the top agencies and a second time to confirm their 
initial estimate by counting (spending much more time on the task). Users of the 
 
 
52 
 
expandable outliner interface typically scanned the top-level departments, and then 
drilled down into the agency level. The implementation only showed one open 
department at a time, and participants often had to re-open a department several times 
to compare counts between agencies. Users of the treemap interface appeared to use 
the color-coding more than the expandable outliner users, and then they would scan 
for the counts. When the counts were not displayed (which occasionally occurred due 
to a programming error) they would move their pointer over the node to view the 
pop-up details. Many participants were puzzled or frustrated by this obvious usability 
flaw and commented on it. Several users of the treemap suggested that a color 
gradient could be used to show more detail. In both overview interfaces, some 
participants commented on using the ?Hide empty categories? feature extensively. 
The readability advantage that this provided was particularly noted in the treemap 
interface. In both overview conditions, several participants asked if there was a way 
to sort the overview by the result count. 
 
Task B (Finding perspectives) ? The control group typically scanned the results 
linearly until they had found three satisfactory perspectives. A few participants would 
scan down one or two pages, and then scan up from the bottom, stating that they 
expected the lower-ranked results would produce different perspectives. Most 
participants scanned either the title only or title and snippet. Very few of these 
participants appeared to use the department/agency name. The overview groups, 
however, often immediately clicked on a department or agency node. When asked to 
explain this behavior, they typically replied that their knowledge of the agency or the 
 
 
53 
 
large number results from that agency led them to believe they would get a certain 
perspective by doing so. A few indicated that they just picked agencies randomly with 
a similar expectation. After selecting an agency, some participants would 
exhaustively scan the restricted list of results before selecting another agency, while 
others would find an acceptable page and immediately select another agency.  
 
Task C (Finding unusual results) ? Participants typically used similar tactics as for 
task B. The control group participants often satisficed after a few pages. As with task 
B, the findings varied widely among all participants and within groups. Several 
participants commented: 
 
What I found informative was? what didn?t show up, which I wouldn?t know 
if the hierarchy wasn?t there. 
 
The biggest surprises are the ones that are red [have the most results] and 
black [have no results]? 
 
This participant added that if he noticed that an agency had no results, and he 
expected it to, he would look at the uncategorized results. Since the result set had 200 
results total, the ability to filter out the 130 results that were categorized into known 
agencies would allow him to focus on the remaining 70 uncategorized results: 
 
 
 
54 
 
I would... go to the uncategorized and see what I find there. When that was the 
case [it would be] frustrating that there were 70 [uncategorized] results, but... 
70 is a whole lot better than 200, and look how much I can cut out. 
 
Several participants indicated that they selected an agency that had results but which 
they believed was unrelated to the topic to look for a surprising result. 
 
For both tasks B and C, participants occasionally asked for clarification of the task or 
expressed concerns that they weren?t sure that they were doing what had been 
requested. 
 
The results from this study are discussed in section 3.5, in conjunction with the results 
from the second study. 
3.4 Study 2: Automated clustering vs. government hierarchy  
3.4.1 Research questions 
The second study focused on the first and third research goals listed at the beginning 
of this chapter:  
1.   Identifying search tasks and sub-tasks that benefit from categorized overviews 
3.   Understanding how the categories used for the overview affect its utility and 
the user?s search experience 
It varied the categories and used a single display style, an expandable outliner, for 
both conditions. 
 
 
 
55 
 
The emerging principles (described in section 4.2) asserted that overviews should be 
organized by meaningful, stable classifications, but the overviews built with 
dynamically generated categories used by clustering search engines (e.g. Vivisimo) 
have been found helpful, even though participants sometimes fail to understand the 
clusters or their labels. This motivated investigation of how automatically clustered 
overviews supported user examination of search results. For this study, two new tasks 
were identified: idea generation and resource finding. These more complex 
exploratory search tasks were refinements of the tasks used in study 1, and allowed 
me to explore different search tasks (research goal 1). This study addressed three 
specific research questions with a combination of observation and questionnaires: 
 
1. What differences can we observe in how participants examine search results 
with respect to domain and classification knowledge when they use overviews 
based on dynamic categories (automated clustering) vs. overviews based on 
stable categories (government hierarchy)? 
2. What differences can we observe in how participants examine search results 
with respect to the type of search task when they use overviews based on 
dynamic vs. overviews based on stable categories? 
3. What differences do participants perceive in their search processes and 
outcomes when they use overviews based on dynamic categories vs. 
overviews based on stable categories? 
 
 
56 
 
3.4.2 Experimental Conditions 
A within-subject experimental design (N=12) with qualitative observation was used 
to address these questions. Two experimental conditions were used by each 
participant: Condition 1 used the Vivisimo search engine (Figure 24), as an example 
of an interface using dynamic categories to provide an overview. Vivisimo uses a 
form of automated clustering that generates hierarchies of concisely labeled clusters. 
The clusters are formed and labeled by finding common words and phrases in the 
titles and snippets. The cluster labels are displayed using an expandable outliner to 
provide an overview of the search results. Condition 2 used the expandable outliner 
interface from the previous study, in which results were organized by government 
department and agency. This experimental design unavoidably conflated several 
search engine and interface design issues with the classification. In addition to the 
different presentation style of the results, the search results for condition 1 were 
computed prior to the start of the experimental sessions, whereas Vivisimo was used 
on-line with live results. This was acceptable, however, because  
a) the basic layout of results and interaction styles were consistent,  
b) the study did not seek specific quantitative measures that would be affected 
by these differences, and  
c) the focus was on subjective satisfaction measures and observation.  
The order of interface presentation was counterbalanced; half the participants used 
the Vivisimo interface first, and half used the government hierarchy first. Two of the 
three scenarios were used for each participant, one for each interface, allowing me to 
collect data from each scenario eight times over the course of the study. 
 
 
57 
 
3.4.3 Scenario and task design 
As argued earlier, the exploratory search tasks must be placed in the context of 
realistic higher level information seeking and work scenario to motivate the specific 
tasks and control for how participants interpret the search tasks. The three scenarios 
from study 1 were revised and adapted to more clearly specify a high-level 
information need and to provide a stronger indication of the organizational context. 
The age element was removed because it was not judged helpful in setting the context 
in the first study. The revised scenarios were: 
 
Scenario 1 (Breast cancer) - Imagine that you are a Washington Post 
reporter who writes about government affairs. You have been asked to 
research a special series of articles for the Health section on what the federal 
government is doing about breast cancer. You have just entered the search 
terms ?breast cancer? in a new government search engine. 
 
Scenario 2 (Alternative energy) - Imagine that you are a Senate staffer. You 
have been asked to write a summary of government activity on wind power as 
an alternative energy source as background for a comprehensive legislative 
funding initiative. The summary will be read by the senators and other 
legislative staff. It will overview federal government activities, without 
advocating particular actions or expressing specific opinions. As a starting 
point, you are using a new government search engine to gather information. 
You have just entered the search terms ?alternative energy wind power?. 
 
 
58 
 
 
Scenario 3 (Urban sprawl) - Imagine that you an undergraduate student 
taking a class on Science and Public Policy. Your professor has assigned a 
20-page term paper on the federal government?s role in addressing urban 
sprawl. (Urban Sprawl is low density, automobile dependent development 
beyond the edge urban areas.) You are at the stage of exploring topics and 
defining your paper?s thesis. As a starting point, you are using a new 
government search engine to gather information. You have just entered the 
search terms ?urban sprawl?. 
 
Within each scenario, participants were asked to perform 3 tasks: 
 
Task A (Overview) ? Please spend 2-3 minutes exploring these search 
results to find out what kind of information is available. 
 
Task B (Idea generation) ? The wording of this task was customized for 
each scenario (see discussion in section 3.4.6.1): 
Scenario 1 - Please spend 4-5 minutes using these results to formulate 
2 story ideas that could be developed into a series of articles. State 
each story idea in a single sentence. Bookmark the pages that 
contribute to the ideas. 
Scenario 2 - Please spend 4-5 minutes using these search results to 
find 3 examples of important programs, studies, activities, etc. that 
 
 
59 
 
should be considered by anyone interested in this legislation. You 
should try to find the 3 most important examples within these results. 
Bookmark the pages. 
Scenario 3 - Please spend 4-5 minutes using these results to identify 3 
possible paper topics. State each topic idea as a single sentence. 
Bookmark the pages that contribute to the topic. 
 
Task C (Finding resources) ? Please spend 2-3 minutes using these search 
results to find 3 web pages likely to list sources (people or organizations) you 
would like to contact. Bookmark the pages you found. 
 
 
Figure 24. The Vivisimo search engine was used for the clustered hierarchy condition. 
 
 
 
60 
 
3.4.4 Procedure 
After the participants signed an informed consent form, they completed a short 
demographic questionnaire, providing their age, gender, occupation, knowledge of 
federal government organization, web experience, search experience, search 
frequency and whether they had participated in study 1. The two hierarchical 
overviews were described and they were given a sample task to try with both 
interfaces. They were encouraged to think aloud as they attempted the sample tasks, 
and any questions were addressed. As in the first study, participants were instructed 
to view just the results and categorized overview (when available). When they were 
comfortable with the interfaces, the first scenario was presented, and they performed 
the three tasks and completed a short subjective questionnaire. These steps were 
repeated for the second scenario. After the second scenario, participants completed 
another short questionnaire comparing the two interfaces and an unstructured 
interview was conducted to collect additional user comments. Due to the small 
sample size and formative nature of the study, statistical significance was not 
analyzed. The audio and screen video for the session was captured using Camtasia 
(about 8 hours total). Sessions lasted approximately one hour. 
 
The procedures and materials were pilot tested with 2 participants to clarify the 
scenarios and task descriptions and to streamline the questionnaires. The instructions 
were clarified so that participants would avoid Vivisimo?s sponsored links and the 
?Find in clusters? feature, which was not available in the government hierarchy 
interface. 
 
 
61 
 
3.4.5 Participants 
Twelve participants (6 male, 6 female) were recruited from university and 
professional contacts. They ranged in age from 22 to 58, with the average age being 
42. Three were students, and six had some strong connection to the federal 
government, either being employees or working closely with a department or agency. 
All had at least a year of experience with web search and reported searching at least 
once/week. All except one participant reported some familiarity with the federal 
government. Three participants in the previous study were recruited to see if their 
experience would differ from others. 
3.4.6 Results 
3.4.6.1 Subjective Measures 
Post-scenario questionnaires - After each scenario, participants were asked to 
complete a short questionnaire in which they provided subjective ratings for their 
experience with that interface (Table 8). Differences between the two conditions were 
slight, not more than one point on the nine-point scale, and variance was high.  
 
 
62 
 
 
Table 8. Mean differences in subjective ratings between conditions (standard deviation in 
parentheses). These questions were asked immediately after each scenario. 
Mean difference (std dev) Question 
Favors 
automated 
clustering 
Favors 
government 
hierarchy 
Q1. Prior familiarity with topic   1.00 (3.61) 
Q2a. Stressful/relaxing 0.67 (1.43)  
Q2b. Interesting/boring 0.33 (0.98)  
Q2c. Tiring/restful   0.33 (1.50) 
Q2d. Easy/difficult 0.17 (1.33)  
Q3. Tried to only view related information 0.83 (0.79)  
Q4. Got a good overview of results   0.58 (3.06) 
Q5. Usefulness of hierarchy for general exploration 0.75 (4.14)  
Q6. Usefulness of hierarchy for ideas/examples task 0.83 (2.25)  
Q7. Usefulness of hierarchy for finding resources task   0.58 (2.97) 
Q8. Noticed something unusual/surprising 0.08 (0.67)  
Q9. Confidence that respondent found good resources   0.75 (1.54) 
Q10. Confidence that respondent generated good ideas 0.25 (2.30)  
 
Exit questionnaires ? A post-session questionnaire solicited, participant preferences 
(Table 9). One participant did not answer these questions. Mean preferences are also 
shown with participants segmented by whether they were associated with the federal 
government (participants were evenly divided). 
 
 
63 
 
 
Table 9. Mean preferences for each task by all participants, participants associated with federal 
government and participants not associated with federal government (1 = preferred automated 
clustering, 9 = preferred government hierarchy). 
Mean preference (std dev) Question 
All 
participants
Associated with 
federal government
Not associated 
with federal 
government 
Q1. Preferred condition 
for general exploration 
task 
3.82 (2.68) 4.00 (3.16) 3.60 (2.30) 
Q2. Preferred condition 
for ideas/examples task 
4.27 (2.45) 4.38 (2.56) 3.60 (2.40) 
Q3. Preferred condition 
for finding resources 
task 
6.00 (2.79) 6.67 (2.66) 5.20 (3.03) 
 
Based on participant comments and a post-hoc review, I determined that generating 
ideas (scenarios 1 and 3) and finding examples (scenario 2) were not the same type of 
tasks. When the analysis was limited to the 4 cases in which scenarios 1 and 3 were 
both used, the mean preference value for question 2 was 3.25 (standard deviation 
2.06), suggesting a stronger preference for the clustered hierarchy for the task of 
generating ideas.  
3.4.6.2 Observations and Participant Comments 
The observed interactions varied widely between participants, reflecting personal 
preferences, skills, knowledge, motivation and attitude. They suggest interactions 
between domain knowledge, task and the classification scheme. 
 
 
 
64 
 
Domain and classification knowledge ? Participants applied their government 
knowledge to both interface conditions, but particularly to the government hierarchy: 
 
Now I definitely want to go over here, because we're talking energy... go to 
DOE [Department of Energy]... you're saying wind energy... important to 
DOE... what other government agency?.... well nothing showed up under  
defense, that's interesting... go to Uncategorized... The other one where wind 
energy might be important might be Commerce, but let?s look at Energy first. 
 
They also used opinions and biases to guide their exploration, as another participant 
admitted: 
 
The fact that I have feelings about how HUD works... (laughs) and there was 
a subcategory that said Independent Agencies appealed to my revolutionary 
spirit... I said alright well who's trashing these guys...and that probably 
played some role... 
 
They occasionally chose the wrong category based on incorrect domain knowledge: 
 
Well I know that NASA is under commerce [clicks Commerce]..., oh I?m not 
even clicking on NASA. Is NASA part of Commerce? No, maybe it's not. It's its 
own independent agency [clicks Independent Agencies]. There you go, I was 
looking at NOAA. 
 
 
65 
 
 
For at least one participant, the utility of the government hierarchy also depended on 
his specific knowledge of the government relative to the scenario topic. He 
commented: 
 
What you bring to it becomes a very powerful factor. The fact that I know the 
agencies with respect to this topic made this a snap which wasn't the case with 
the other one. 
 
When using the clustered hierarchy, participants occasionally expressed confusion 
when they noticed that government agencies were not organized in a manner 
consistent with their understanding of the U.S. government?s organization. 
 
Classification and task ? Participants expressed a variety of opinions on the 
applicability of each classification (the government hierarchy or the Vivisimo 
clustered  hierarchy) to the different tasks (ideas versus resources). Comments 
included: 
 
If I was just looking for sources of people to talk to I might prefer [the 
government hierarchy], but if I'm looking for ideas, stories [the clustered 
hierarchy] is probably more useful. 
 
 
 
66 
 
For what I do I would prefer the government thing, because at my level what I 
care about are finding data, but the data that I find, but the data I use has to 
be "blessed"... has to come from BLS... if I'm using statistics on agency size, if 
I want to know how big homeland security is, I got to get it from Homeland, or 
OMB or OPM or something like that. 
 
One user initially found the clustered hierarchy too complex, but after using it 
commented: 
 
It?s sort of set up posing a question. If you want cancer facts, do you want this 
aspect or that? It?s sort of leading you down a path. It?s helping you ask the 
questions you need to ask, whereas you?re sort of asking them intuitively, it?s 
doing that in sort of a logical path. I like that. It?s helping you burrow down 
into your search strategy. 
 
But another participant was wary of the level of detail in the clustered hierarchy: 
 
Sometimes, particularly when I'm looking for ideas, having stuff ? this is the 
nature of the digital age ? having stuff broken down too finely makes thinking 
more difficult, makes search for stuff more efficient but makes thinking about 
stuff more difficult for me... it's a lot easier for me to think in a category that 
talks about the statements of independent agencies... as opposed to going 
 
 
67 
 
through [the clustered hierarchy].  I'm not necessarily looking for something 
that's that efficient.? 
 
The same participant found using the clustered hierarchy condition to induce ?a more 
deliberative process? it requires me to put a lot more into this thing.? 
 
Category labels ? Participants would often look at categories without selecting them. 
They expressed two reasons for this. First the category label might be meaningful but 
not relevant. Second, the category label might not be meaningful in the context of the 
scenario. As one participant commented about the labels used for the clustered 
hierarchy: 
 
Stuff like ?Green? is useless to me. ?Renewable and Alternative?... is what it 
and a hundred other things are... doesn?t save me time. 
 
Several participants compensated for this by expanding each of those categories. This 
often revealed more interesting subcategories: 
 
The refinements were more useful than the major subject headings. They get 
down to a level of detail that is more useful. I'd have to look and see how well 
that correlates... the breakdowns are actually a whole lot more useful. The 
next time through I'd use them more aggressively. 
 
 
 
68 
 
Assessing search results ? When assessing the relevance of search result items or 
categories, participants commented on multiple facets, including topicality, 
pertinence, utility, document quality and source credibility. They often expressed 
skepticism about the results they found, because they were not able to view the 
individual web pages (due to the experimental procedure). As two participants noted: 
 
I find a web site that seems to have a lot of really interesting stuff [in the 
search result list] and then find it... is sponsored by the nuclear industry and 
everything is powerfully skewed... or some rant by some lunatic...with federal 
sites in particular they have this laundry list of what they?re responsible for... 
but it ends up so sanitized... 
 
I?d have to see if this stuff is substantive or not? so much of this stuff is 
window dressing. 
 
Acronyms appeared to be widely problematic, although the study did not 
quantitatively measure this. Problems were particularly noticeable within category 
labels. Even experienced government participants had puzzling encounters with 
unknown agency or project acronyms. 
 
Usability of the expandable outliner ? Participants found both interfaces quite 
understandable and quickly became comfortable with the expandable outliner. Most 
participants became comfortable alternating between the outliner, selecting a 
 
 
69 
 
category, and then scanning the search result list. Several usability issues were 
observed or noted by participants. The small size of the expander (a plus sign) in both 
interfaces caused several participants to initially overlook this capability ("I sort of 
forgot about this little plus thing"). One participant was irritated by the fact that in the 
Vivisimo interface the overview pane scrolled back to the top whenever a category 
was expanded.  
 
The following section discusses the results from this study in conjunction with the 
results from the first study. 
3.5 Discussion of studies 1 and 2 
These two studies began to answer the research goals posed at the beginning of this 
chapter and suggested additional insights. They showed that categorized overviews of 
the top 200 search results could be useful for the selected tasks. They also showed 
benefits and drawbacks of the dynamic categories. They corroborated several of the 
emerging principles (section 4.2) and entailed revisions to others, as discussed in the 
following sub-sections. 
3.5.1 Benefits of categorized overviews 
As expected, study 1 confirmed that the categorized overview conditions (the 
expandable outliner and the treemap) produced significantly higher successful 
completion rates for the task of identifying the agency with the most pages 
(hypothesis 1). The subjective measures showed that the overview treatments were 
preferred (hypothesis 2) and this was supported by user comments. Participants found 
 
 
70 
 
the overviews significantly easier to use, more helpful, and more satisfying than the 
control (the standard Google interface), and they were more confident of their own 
success. They agreed more strongly that they had gained a good overview and found 
good examples of different perspectives. There was no significant difference between 
the three interfaces on the question of whether they had found unusual results 
effectively, although the difference in means is suggestive. This task was the most 
open-ended and most subject to interpretation by participants, and this was reflected 
in the subjective measure variability as well as the questions participants asked to 
clarify the task. 
 
The results support the premise that the categorized overview interfaces are seen as 
simple, understandable and easy to learn (i.e., hypothesis 3 of study 1 was not 
supported). For the treemap interface, this conclusion is qualified by noting that 
participants were provided brief training in the use of the treemap. 
 
During the perspectives task in study 1 (?Find 3 web pages providing different 
aspects of or perspectives on this topic?), participants found their perspectives 
significantly deeper in the ranked list of results. This result is consistent with results 
reported in K?ki (2005), that searchers viewed pages deeper in the results. It provides 
quantitative evidence that the categorized overviews also helped searchers find 
relevant and useful pages deeper in the results. Participants using the expandable 
outliner found more of their perspectives beyond the top 10 results than did 
participants using the control, but the treemap outcomes were mixed. Participants 
 
 
71 
 
may have taken longer to become comfortable with the treemap interface. I observed 
a large variation in how participants interpreted this task.  
 
Having the overview available helped participants to notice areas particularly well-
covered and not well-covered by the search results. This can be attributed to the use 
of the meaningful and comprehensive hierarchy, which allowed users to make 
inferences and draw conclusions. In all of the experimental sessions for study 1, only 
one of the six control participants found it surprising that an agency had few or no 
results, whereas nine of the 12 overview participants at some time found this 
surprising. During the Unusual results tasks, treemap users particularly noted 
agencies that they had not expected to have results (but that did), while expandable 
outliner users noticed the opposite, i.e., those agencies with few or no results. This 
difference might be explained by the large, colored rectangles used for the treemap 
(thus drawing attention to agencies with results) and the expandable outliners linear 
arrangement of text (which encouraged scanning of agency names). This explanation 
is supported by the participant comments and suggests that color coding might be 
more useful in the expandable outliner if used more extensively. 
3.5.2 Effect of visual presentation of overviews 
The appeal of both the expandable outliner and treemap presentation of overviews 
was confirmed by the lack of statistically significant differences between the 
expandable outliner and the treemap in study 1. Most participants preferred the 
expandable outliner, although several participants found the graphical nature of the 
treemap more appealing. The participant comments suggest that additional user 
 
 
72 
 
control of the overview would be desirable. This included allowing participants to 
select the desired presentation, as well as creating or selecting the categorization 
scheme used. 
3.5.3 Effect of categories used for overviews 
When the overview was available participants took advantage of it, even when the 
organizing structure was not optimal for the task. Observations and participant 
comments indicated that participants used their prior knowledge of the classification 
to interpret search results. Participants indicated that they became more familiar with 
the government hierarchy over the course of the experiment. Because the government 
hierarchy is stable, this familiarity may be beneficial in successive searches. 
 
In study 2, the distinct nature of the categories probably contributed to differences in 
which tasks each was preferred for. Some participants appreciated the dynamically 
generated hierarchy for the ideas task. Its statistically based clustering yielded labels 
that they found suggestive of topic ideas. The labels of the dynamic categories were 
drawn from the titles and snippets in the results, and may have been more suggestive 
of themes. Some participants felt strongly that the government hierarchy helped them 
explore and understand the results more effectively. The labels in the government 
hierarchy indicated the provenance, or source, of web pages. The inclusion rules were 
more transparent and predictable to users for the government hierarchy than for the 
Vivisimo hierarchy, permitting more reliable inferences. Based on the results of study 
2, one emerging design principle (originally ?Organize results by meaningful, stable 
classifications,? in section 4.2.2) was revised to reflect the complementary nature of 
 
 
73 
 
stable and dynamically generated classifications. Together, they supported a variety 
of exploratory search sub-tasks.  
 
Individual user characteristics as well as task type appeared to affect user preferences 
for the classification hierarchy, suggesting that searchers be allowed to select from 
multiple organizational schemes. Several participants commented that they would like 
the ability to organize results in multiple ways, possibly customizing their own 
organization scheme. This buttresses another design principle (Support multiple 
visual presentations and classifications), suggesting that the faceted category 
approach (Yee, Swearingen, Li, & Hearst, 2003) could be beneficial for organizing 
web search results. Participant comments suggested that there may also be value in 
personally-created or customizable taxonomies. 
3.5.4 The importance of text 
Observations and participant comments confirmed that text was important, even with 
the overviews available. As one person noted, the overview was a starting point. But 
searchers still needed to scan substantial amounts of text. This was particularly 
noticeable with those participants who interpreted the tasks more realistically, 
requiring in-depth evaluation/assessment. This bolstered confidence in a third 
principle (Arrange text for scanning/skimming).  
3.5.5 Other findings 
Government agency acronyms were problematic for all participants, particularly 
within category labels. A simple capability to perform a glossary lookup would 
 
 
74 
 
probably be very helpful. Using hover text could allow searchers to pause the pointer 
over unfamiliar acronyms to see the full name of the agency or department. 
 
Participants rarely commented on the need to scroll within either the overview or 
result list. This suggested that it is a very lightweight action, and may not 
substantially affect the searcher?s cognitive process. It further suggests that larger sets 
of results (at least 100-200) can be usefully accommodated on a single page. Google, 
Yahoo!, and Vivisimo can return 100 results per page (with typical load times less 
than 5 seconds on a broadband network), so this is technically feasible. 
3.5.6 Limitations of these studies 
These studies were formative in nature, and the results must be interpreted within the 
context of the specified tasks and domain. They employed a small sample of subjects, 
who were presented with pre-defined scenarios, queries and tasks. The presentation of 
the categorized overview and results in study 2 was not strictly equivalent. The 
government hierarchy was limited in size and the specific tasks represented only a 
small slice of the tasks searchers perform in real-world topic searches. But, based on 
participant comments, the scenarios appeared to evoke a realistic information need in 
the subjects, and they used tasks that exploratory searchers really do perform. 
Examining large numbers of results and evaluating them in the context of current 
knowledge are characteristic of exploratory search tasks. By focusing on a specific 
domain (government web search), the immediate scope of the findings was limited, in 
return for gaining a deeper understanding of how searchers used categorized 
overviews within that domain. 
 
 
75 
 
3.5.7 Summary of studies 1 and 2 
The results of these two formative studies suggested answers to the three research 
goals. Exploratory search tasks can be supported by categorizing search results into 
comprehensible visual overviews using meaningful classifications. Stable 
classifications and dynamically generated classifications can be complementary ways 
to organize results, valuable for different tasks. The use of stable hierarchies helped 
participants notice missing information, and the dynamically generated classifications 
were found useful for generating topic ideas. The study results also motivated several 
new requirements: user-selectable classifications and a lightweight mechanism for 
customizing hierarchies. The studies were used to refine the emerging design 
principles. They raised the question of which tasks are best supported by stable 
categories versus dynamic categories.  
 
Situating the study tasks within the specific domain of government web search and 
within higher level work tasks reduced variation in participants? perception of the 
tasks without resorting to known-item search tasks. It allowed collection of a rich set 
of observations about how searchers use categorized overviews of search results.  
 
 
76 
 
Chapter 4:  Analysis, principles, and design of the SERVICE 
system 
This chapter presents the three main contributions of this dissertation: an analysis of 
categorized overviews (section 4.1), design principles for exploratory search 
interfaces (section 4.2), and the architecture and design process of the SERVICE 
system (sections 4.3 - 4.6). Although presented linearly here, they evolved in an 
interwoven, iterative manner. The results of the two early studies, described in 
Chapter 3, informed the design of the SERVICE system. They also helped refine the 
emerging analysis and design principles. The design process was informed by the 
analysis and design principles, and in turn these were challenged and refined by the 
design process. Each of the three was influenced by the process of developing and 
refining the other two. The third and final study, described in Chapter 5, helped 
validate elements of the analysis and principles, and suggested limitations that 
continued the iterative process of refinement. 
4.1 Analysis of categorized overview use 
The purpose of this analysis is to explain how categorized overviews can change the 
way searchers comprehend and interact with their search results. This helps to justify 
the design principles and ground the SERVICE interface design in a principled 
theoretical base. This analysis is applicable to the form of categorized overviews 
studied here, specifically the use of the categorized overview presented 
simultaneously with a list of search results. It is focused on one activity in the search 
process (examining search results) and one form of interface (categorized overview). 
 
 
77 
 
It is presented as one step in understanding how exploratory searchers conduct their 
searches, with the hope that it will be useful as a framework for more ambitious 
theoretical analysis. 
 
This section first presents a process model of exploratory search, and then identifies 
functional capabilities that categorized overviews provide and actions that they permit 
searchers to take. It describes how searchers can reason about search results using 
categorized overviews and tactics that they may adopt to take advantage of the 
overviews. 
4.1.1 Process model of exploratory search 
Examining search results is a necessary step within a larger information seeking 
process, the objective of which is to satisfy a perceived information need or problem 
(Marchionini, 1995). In turn, the perceived information need is motivated and 
initiated by a higher-level work task (Bystr?m & Hansen, 2002; J?rvelin & 
Ingwersen, 2004). Work tasks are situated in the work organization and reflect 
organizational culture and social norms, as well as organizational resources and 
constraints. The work task is similar to Sutcliffe and Ennis? goal or information need, 
or Marchionini?s recognition and acceptance of an information problem, but the work 
task specifically situates these in an organizational context. In the context of the work 
task, a second level of context is defined, in which information-seeking tasks are 
identified. These tasks vary as the work task progresses. The third level of context is 
the information retrieval context, wherein searchers identify sources, issue queries, 
and examine results. 
 
 
78 
 
 
The process model proposed here (Figure 25) combines the Marchionini model with 
the three-level Bystr?m & Hansen model used in the formative studies (described in 
section 3.3.4). The model defines five activities: recognize an information need (to 
satisfy a work task), define an information-seeking problem (to satisfy the 
information need), formulate query, examine results, and view documents. It places 
activities in the context of the three levels of information-seeking and work tasks. It 
shows how search activities are sequenced within the iterative search process. Each 
higher-level activity can involve multiple subsidiary activities. 
 
 
Figure 25. Process model of search in the context of work and information-seeking tasks. 
 
 
 
79 
 
The process is initiated when a searcher recognizes an information need and decides 
to try to satisfy it (Bystr?m & Hansen, 2002; Marchionini, 1995). This need may arise 
because the searcher perceives a gap or anomaly in knowledge needed to satisfy an 
externally imposed work task (Belkin, 1980). To satisfy the information need, the 
searcher undertakes one or more information-seeking tasks, which can be structured 
as a linear sequence or hierarchical decomposition of tasks. For example, the paper 
writing process could be modeled as a series of stages (Kuhlthau, 1991), or a medical 
search could be modeled using a hierarchical decomposition of goals (Bhavnani & 
Bates, 2002). Each of these tasks requires selecting a source, and then engaging in 
one or more information retrieval tasks. Within each information retrieval task, the 
searcher formulates queries, examines results, and selects individual documents to 
view. As a result of examining search results and viewing documents, the searcher 
gathers information to help satisfy the immediate information-seeking problem and 
eventually the higher level information need. This model collapses Marchionini?s 
source selection stage into the information-seeking problem. It also combines query 
formulation and execution. Reflection is inherent in each activity, and each activity 
except query formulation can return to a previous or higher level activity. 
 
The strategies and tactics that searchers use are affected by the capabilities provided 
by the search interface (Bates, 1990; Golovchinsky, 1997). Strategies are high level 
plans for the whole search, and tactics are individual actions or sequences of actions 
(often called moves) taken to further the search (Bates, 1979; Marchionini, 1995). 
Searchers can take numerous actions while examining search results (Bates, 1990; 
 
 
80 
 
Fidel, 1985; Garcia & Sicilia, 2003; Marchionini, 1995; Shneiderman & Plaisant, 
2004; Wildemuth, 2004). Specific actions supported by a web search interface can be 
discerned by analyzing the structure of text and hyperlinks on a search result page. 
For a typical search result page showing a ranked list of results, this yields the set of 
actions listed in Table 10. Each action involves cognitive and physical effort and can 
result in visual changes in the interface or changes in task, domain, or category 
knowledge (cognitive changes). The visual changes enable cognitive changes by 
making information visible on the screen. The cognitive changes are necessary to 
make progress on the information problem and are reflected in transitions between 
activities. For example, while examining results, searchers may scan a screen of 
results, causing them to identify additional query terms, causing a transition to the 
formulate query activity. In this analysis, actions that require visual scanning and/or 
moving the mouse without clicking are classified as low effort because they involve 
little physical effort, and they do not result in major changes to the display, thus 
minimizing cognitive effort. Actions such as selecting a result to view or scrolling the 
screen require a moderate amount of cognitive or physical effort because they require 
clicking and reorienting as the visual presentation changes. Issuing a new query 
requires a high amount of effort because of the cognitive effort required to formulate 
the query and the need to reorient when the new set of search results is displayed.  
 
 
 
 
 
 
81 
 
Table 10. Actions available to searchers when evaluating a typical search result list. 
Action Effort Visual changes Cognitive changes 
Scan one 
screen of 
results list 
Low 
physical; 
low-
medium 
cognitive 
(depends 
on type of 
scan) 
None 
? Identify page to view 
? Assess results overall 
? Identify additional query 
terms 
? Refine information need 
? Refine information problem 
? Extract useful information 
Scroll screen Medium Shift visible 
subset of search 
results 
None 
Select next or 
previous page 
of results 
Medium Shift visible 
subset of search 
results 
None 
Select a result 
to view 
specific web 
page 
Medium Bring web page 
into view 
None 
Reformulate 
query 
High Generate new set 
of search results 
None 
View specific 
web page 
Variable Variable ? Identify additional query 
terms 
? Refine information need 
? Refine information problem 
? Extract useful information 
 
Adding a categorized overview to search results changes the information that is 
available and the actions searchers can take with low or moderate physical and 
cognitive effort. The categorized overviews used in this research add the following 
design elements: 
 
 
82 
 
? The overview presentation ? a visual or graphical representation of the 
categories represented by the search results 
? Hyperlinks to narrow and broaden the displayed set of search results 
? When the pointer is placed over a category, the corresponding search results 
in that category are highlighted and a pop-up window is displaying with a list 
of non-empty sub-categories 
? When the pointer is placed over a search result, the corresponding categories 
of which that result is a member are highlighted 
Table 11 summarizes the actions afforded by these design elements, the effort 
required, and their visual and cognitive effects. 
 
 
83 
 
 
Table 11. Additional actions available to searchers when evaluating search results with 
categorized overviews. 
Action Effort Visual changes Cognitive changes 
Scan 
categorized 
overview 
Low 
physical; 
low-
medium 
cognitive 
(depends 
on type of 
scan) 
None 
? Identify category to consider 
? Assess results overall 
? Identify additional query 
terms 
? Refine information need 
? Refine information problem 
? Extract useful information 
? Assess match between 
categories and information 
need 
Select category 
to narrow or 
broaden results 
Medium Filter visible 
results, limiting to 
members of 
selected category 
None 
Move pointer 
over result 
Low Highlight 
category 
membership 
? Identify categories to consider 
? Assess results overall 
? Identify additional query 
terms 
? Refine information need 
? Refine information problem 
Move pointer 
over category  
Low Highlight results 
in category 
(currently visible 
results only) and 
display 
subcategories 
? Identify page to view 
? Assess results overall 
? Identify additional query 
terms 
? Refine information need 
? Refine information problem 
 
 
 
84 
 
4.1.2 Action: Scan categorized overview 
Scanning an overview is a lightweight physical action if the complete overview is 
visible on the screen (i.e., no scrolling is needed) and the elements are arranged in a 
consistent manner, using linear lists, columns, or matrices (Teitelbaum & Granda, 
1983). The cognitive effort will be low when the categories and their structure are 
familiar, but may be higher when first encountered or for unfamiliar knowledge 
domains. This impacts the knowledge that searchers can draw on to reason about the 
results and make inferences or predictions about meaning, authority, validity, 
relevance, and overall utility (Marchionini, 1995). Anderson (1990) argues that 
categories are ideally suited to supporting prediction. The category labels indicate 
statistical and conceptual relationships among members of a category, as well as 
distinguishing relationships between members of different categories (Markman & 
Ross, 2003). They limit the information people need to consider when making 
inferences (Markman & Ross, 2003), thus permitting reduced cognitive effort. This 
helps searchers to efficiently predict the utility of subsets of pages (i.e., the pages in a 
selected category) within the search results. For example, in the ?median? scenario 
described in Chapter 1, the task was to find an age-appropriate description of the term 
?median? for a ten year-old. In that context, web pages in the Kids and Teens 
category would be very likely to be useful. Category information should help 
searchers assess their search results overall, assess the match between the categories 
and their information need, and identify subsets of the results to consider exploring. 
The category labels can be thought of as suggesting alternative ?patches? of data 
within the results (Bates, 1989). The categories can also be considered to increase the 
 
 
85 
 
?information scent? (Pirolli & Card, 1995; Pirolli & Card, 1999) of pages that fall 
below the first screen of results. 
 
Exploratory searchers may value novelty in their search results. Unusual results or 
patterns of results may be important. When searchers value novelty, categories that 
were expected (or not expected) to contain results can surprise searchers when they 
do not (or do). This can cause searchers to reflect on their queries, information needs, 
or information problems. It may also prompt additional questions. This was 
particularly notable in the first study, when users spontaneously commented on the 
absence of any ?breast cancer? search results from the Department of Education. In 
the context of the third study, it also prompted users to think of additional story ideas, 
which they pursued by selecting the category. 
 
Relationships used to predict relevance or utility may be based on belief or logic 
(Marchionini, 1995). They can be based on searcher experience, or even bias or 
prejudice. As one subject admitted during study 1, he used his opinions about the 
Department of Housing and Urban Development (HUD) and his affinity for the 
concept of independence to guide his information seeking (quote on page 64). Of 
course incorrect knowledge can lead to incorrect predictions of relevance or utility, 
and thus to poor choices. This was observed in the first formative study when, for 
example, a participant incorrectly thought that the National Aeronautics and Space 
Administration was an agency under the Commerce Department and filtered using the 
wrong category. 
 
 
86 
 
 
Stable categories enable a searcher to develop familiarity and reuse category 
knowledge. When a searcher first encounters a category, he or she is not merely 
evaluating the results with respect to the information need. The searcher is also 
assessing whether the results are consistent with their expectation of the category. 
The assessment affects the category knowledge, confidence, and intentions for future 
use. Cognitive load is higher for the first encounter, but less upon subsequent 
encounters. The searcher?s understanding of specific categories grows with use. A 
poor first impression, such as when a category selection yields mystifying results, can 
discourage use, as was occasionally observed in the empirical studies. 
 
How people interpret category labels, the meanings they infer, and ultimately how 
they use categorized overviews, depends on personal knowledge of a subject, past 
experience, and the immediate context of use (Jacob, 2004). This is true even for 
well-defined categories, like the US government agencies used in the formative 
studies. With hierarchically organized categories, like the Open Directory Project, the 
interpretation of a category label can be affected by its parent category and any child 
categories. This was notable in the third study, which truncated categories at the third 
level. This had the effect of removing valuable contextual information from the 
category label, and resulted in confusion about the contents of the category. 
4.1.3 Action: Narrow or broaden by category 
Selecting a category to narrow results restricts the displayed results to those that are 
members of the specified category. After being narrowed by a category, results may 
 
 
87 
 
be broadened, removing these restrictions. This action can be considered as a form of 
query reformulation (Golovchinsky, 1997) or view navigation (Furnas, 1997). Study 
participants? comments indicated both perspectives. This action requires moderate 
physical effort because the user must move the pointer and click on a link. It requires 
moderate cognitive effort because the user must reorient to the changed list of results. 
4.1.4 Action: Move pointer over result 
In the SERVICE web search prototype (described in section 4.6), moving the pointer 
over a result provides additional details about that result by highlighting any 
categories displayed in the overview that it is a member of. This action requires low 
effort. Because the overview might only be displaying an upper level of the category, 
this does not necessarily provide complete category information. For example, if a 
result were a member of the thematic category 
/Arts/Television/Networks/Cable/BBC, but the display was only showing top-level 
thematic categories (e.g., Arts, Business, Computers), the Arts label would be 
highlighted. The search interface could also open a pop-up window near the result 
with the complete category information, although this might be large and distracting 
when a result is a member of many categories. 
4.1.5 Action: Move pointer over category 
In the SERVICE web search prototype, moving the pointer over a category in the 
overview has two effects. First, it highlights any results visible on the screen that are 
members of the category. This can provide examples of the members of that category. 
Second, it opens a pop-up window with a list of the non-empty subcategories of that 
 
 
88 
 
category, providing a preview of the effect of clicking. This action requires low 
effort. 
4.1.6 Tactics 
The actions enabled by categorized overviews can lead to altered search tactics 
because they change the information available and the range of possible interactions.  
This allows searchers to draw on new tactics and revise old ones while reducing effort 
and/or improving outcomes. For example, studies have found that most searchers do 
not examine more than the first page of search results (Jansen, Spink, & Saracevic, 
2000), suggesting the often observed tactic for evaluating search results. With the 
typical list, the searcher may scan 10-20 results, assessing their predicted utility for 
the task. With the addition of a categorized overview, the searcher can also scan the 
overview, using the categories to help predict the utility of the results that fall within 
those categories, as part of a single cognitive action. The categorized overview can 
typically show 20-30 categories and slightly reduces the number of results that can be 
displayed on a screen, typically by less than one result. This increases the amount of 
information that searchers can acquire within a limited time without appreciably 
raising their cognitive effort and with no additional physical effort (beyond eye 
movement). The use of these tactics does depend on searchers having an appropriate 
mental model of the categorized overview. Seven tactics evidenced in study 3 are 
shown in Table 12.  
 
 
89 
 
 
Table 12. Tactics enabled by categorized overviews. 
Tactic Description Benefit 
Broad queries Type broader queries in the 
search box, with few terms, then 
narrow results using the 
categorized overview. 
Reduced cognitive effort to 
generate the query. 
Organize 
examination by 
overview 
Use the categorized overview to 
determine the order in which 
result subsets are examined. 
Helps monitor search to keep 
it on track and efficient. 
Overview as 
backup 
Examine the top portion of the 
list first. If not satisfied, examine 
the overview to identify subsets 
to examine. 
May help when relevant 
documents are not at top of 
list. 
Preview before 
narrowing 
Examine the subcategory 
information before narrowing 
results to that category. 
Avoids low relevance results. 
Improves confidence in 
expected results of action. 
Assess result set Scan categorized overview to 
determine what categories are 
represented and how results are 
distributed across categories.  
Helps provide an overall 
understanding of the results of 
the query. May help assess the 
overall quality of the results 
and by implication the query. 
Probe using 
categorized 
overview 
Select specific categories and 
examine the results to assess 
subsets of the results. 
Reduces effort compared to 
typing multiple queries. 
Ignore Ignore the categorized overview. Avoids or simplifies decisions 
about actions to take. 
 
4.1.7 Other impacts of categorized overviews 
One question raised by the changes in search tactic described above is whether the 
visibility of the category labels biases the way that searchers assess their results. This 
could be a particular concern when there is limited metadata for categorizing search 
results because there may be a non-trivial number of pages that remain uncategorized. 
 
 
90 
 
If searchers are biased toward pages that have been categorized, perhaps at the 
expense of uncategorized pages, they could overlook valuable information. 
 
Satisficing is a well-known behavior in information seeking, as searchers deal with 
time constraints and information overload (Simon, 1979). Information seekers will 
not spend an unlimited amount of time and effort on a search. They stop when they 
achieve an acceptable level of achievement, which can often be quite low. Searchers 
seek to minimize effort or maximize expected value using the available information. 
Categorized overviews provide more choices and lower cognitive cost for those 
choices involving category selection. For example, in study 3, with thematic 
categories, one subject made extensive use of the News category, because the task 
involved generating ideas for news articles, and this appeared to simplify his task, 
even though it also removed many potentially useful results from consideration. 
 
Categorized overviews do add visual and cognitive complexity. They add visual 
complexity because the overview typically contains 20-30 category labels. They 
increase the number of possible actions and therefore the number of decisions that 
must be made. This can lead to excessive cognitive effort for some searchers. For 
some searchers, these issues can overshadow the benefits. During one experimental 
session during the third study, the subject asked if he could turn off the overview. 
However, the same study confirmed that most subjects found that the added 
complexity was a reasonable trade-off.  
 
 
91 
 
4.1.8 Implications 
Organizing search results by meaningful categories allows the category knowledge to 
be used when viewing results. Tightly coupling category labels to the result list 
allows searchers to efficiently narrow /refine results using the categories. Supporting 
multiple kinds of categories permits searchers to draw on multiple forms of category 
knowledge. Retaining the result list and arranging it for efficient scanning and 
skimming is essential to supporting efficient assessments of the surrogates. Interfaces 
that lack a result list prevent searchers from efficiently assessing results. Attempts to 
use purely graphical displays are inappropriate for many tasks, according to this 
analysis, because users need to evaluate concepts and semantics, and these are best 
represented by text, by language. Meaningful category labels, however, can 
compactly encode important concepts because even short labels can convey meaning 
accurately and effectively (although this sometimes requires learning the meaning of 
the labels). Categorized overviews do add to the visual complexity of the display, 
increase the number of decisions that searchers must make, and may bias searchers 
away from uncategorized but valuable search results. 
 
The analysis leaves open the possibility that quantitative concepts, such as document 
counts, that influence relevance prediction or otherwise indicate utility or novelty, can 
be usefully encoded with graphical elements, provided they do not affect the primacy 
of the text. For example, the color-coded bars employed by WebTOC (Nation, 
Plaisant, Marchionini, & Komlodi, 1997) can provide ancillary value by showing 
which categories are highly populated, but with the design shown in Figure 26, longer 
 
 
92 
 
labels are obscured by the bars. The challenges of first generation search visualization 
tools can be analyzed from this perspective, too. Visualizations like Grokker 
(www.grokker.com) and Kartoo (www.kartoo.com) privileged graphical displays, 
relegating text to a secondary role. Moreover, the visualizations were not based on 
meaningful underlying categories. 
 
 
Figure 26. Long labels are obscured by the bar charts in this WebTOC display. 
4.2 Design principles for exploratory search interfaces 
User interface design principles capture important constraints, capabilities, features, 
tradeoffs, human preferences, domain knowledge, and human and machine 
processing limits encompassed by a design space. They can document best practices, 
useful heuristic strategies, and design patterns. Principles represent an integration of 
theoretical knowledge and empirical study, distilled to provide practical guidance to 
interface designers. They evolve, informed by theory, study, and reflection. The 
design principles proposed here integrate knowledge from human-computer 
interaction, information visualization, and information science with the analysis in 
 
 
93 
 
section 4.1, the results of the three studies (described in Chapters 3 and 5), and 
practical experience developing search interfaces. The principles are: 
 
? Provide overviews of large sets of results 
? Organize overviews around meaningful categories 
? Clarify and visualize category structure 
? Tightly couple category labels to result list 
? Ensure that the full category information is available 
? Support multiple types of categories and visual presentations 
? Use separate facets for each type of category 
? Arrange text for scanning/skimming 
? Visually encode quantitative attributes on a stable visual structure 
 
This set of design principles is based on the premise that consistent, comprehensible 
visual displays built on meaningful and stable classifications will better support user 
understanding of search results. As users explore search results, they are grappling 
with multiple simultaneous information problems: Their conceptualizations of the 
high-level information needs are imperfect and evolving; their understandings of the 
relevant concepts and terminology are limited, and their understandings of the 
presentation and interactions available in the interface are incomplete (Marchionini, 
1995). Helping searchers to incrementally solve these problems allows them to 
fluently transition between the information seeking activities shown in Figure 25 and 
enables them to make more effective progress toward their high-level objectives.  
 
 
94 
 
4.2.1 Provide overviews of large sets of results 
During an exploratory search, users may not have clearly formed information needs, 
the needs may be evolving, or they may not know the terminology and concepts of 
the search domain. In contrast to known-item search, fact retrieval navigational 
search, where the smallest possible number of highly relevant documents is desirable, 
during exploratory search, there may be hundreds or thousands of potentially relevant 
results. The visual information-seeking mantra prescribes, ?Overview first?? 
(Ahlberg, 1993), and this is as appropriate for displays of search results as it is for 
other forms of information visualization. This is not a new idea ? Table 13 lists 
several web search interfaces that display large number of search results ? but it is 
important to reiterate that not all searches can be satisfied with a high-precision result 
set. The ideal number will certainly depend on many factors, including (but not 
limited to) the task domain, topic, the quality and quantity of documents, and search 
engine capabilities. The fact that many of the pages viewed in the three studies were 
ranked in the range of 50
th
-100
th
 suggests that at least 100 results will be required to 
form the basis of a useful overview of Web search results.  
 
 
95 
 
 
Table 13. Seven web search interfaces that represent large result sets in the initial results. The 
default value and user-selectable range are shown where it was reported or could be determined. 
 Number of results displayed 
Search interface Default Range 
Vivisimo 200 100-500 
Findex 150 Unknown 
Google 10 10-100 
Grokker 160 Unknown 
Grouper 50 10-200 
SWISH 100 N/A 
Yahoo! 10 10-100 
 
4.2.2 Organize overviews around meaningful categories 
Gaining an overview of search results involves a number of cognitive subtasks, 
including interpretation of the results within the context of the searcher?s internal 
mental model of the knowledge domain. Using meaningful, stable categories to 
organize results can place each result in a known context. Soergel (1999) has 
observed that classifications, taxonomies, and ontologies provide semantic roadmaps 
to fields of knowledge, improve communication and learning, and support 
information retrieval, among other benefits. The categories help searchers understand 
what concepts, ideas, and relationships are relevant in a domain, as well as suggesting 
query refinements. Categories based on document format, language, or Domain Name 
Service (DNS) domain can be useful. Numeric attributes such as date or size can be 
grouped into meaningful categories. For example, the Last Time Visited classifier 
 
 
96 
 
described in section Last Time Visited Classifier categorizes web search results into 
the categories: Today, Yesterday, Within a Week, Before Last Week, and Never 
Visited. Even abstract or computed attributes such as a journal impact factor 
(Garfield, 2005) can form the basis of meaningful, albeit controversial or limited, 
categories. Kwasnik (1999) argued that classifications support reflection, discovery, 
and knowledge creation.  
 
The analysis in section 4.1 suggests that stable categories will allow searchers to 
reuse category knowledge on subsequent searches. This principle was originally 
formulated as, ?Organize results by meaningful and stable classifications,? 
emphasizing the importance of the stability imposed by traditional classification 
schemes. Dynamic categories, such as those generated by automated clustering 
techniques change with each query. Thus the learning benefits of stable categories 
may accrue less. In study 2, however, participants commented on the benefits of both 
stable and dynamic categories for different exploratory search tasks. Consequently, it 
was revised to reflect the complementary value of both stable and dynamic 
categories. 
4.2.3 Visualize and clarify category structure 
If the categories are drawn from a classification, taxonomy, or ontology, the structure 
should be made visible. This simple rule can be overlooked by implementers, who 
use that information for sophisticated query modification or relevance ranking 
schemes but then neglect to present it to the end user. The structure provides context 
for individual category labels, shows relationships between concepts, allows users to 
 
 
97 
 
focus on the portions of the concept space that are of most interest. The visual 
presentation must be disciplined to avoid overwhelming or disorienting searchers.  
 
Practitioners should review at least the top two levels of a hierarchy, considering 
whether they need to be adjusted to provide the clearest overview. Parent-child (or 
broader-narrower) relationships that are clear when encountered while browsing a 
thesaurus or directory of web pages are not always clear when used in the context of a 
categorized overview of search results. The structure of the hierarchy may need to be 
changed in these cases. The importance of this emerged from the third study 
(described in Chapter 5). Some participants were puzzled because Television was a 
subcategory of Arts. Both of these categories were drawn from the Open Directory. 
The relationship between the two is clear when they are browsed within the Open 
Directory but not when used to organize search results, which lack the context 
provided on the home page. 
4.2.4 Tightly couple category labels to result list 
Tightly coupling the category labels displayed in the overview with the result list 
enables searchers to rapidly explore relationships between the two. Most commonly, 
the category labels can be clicked to narrow or broaden the result list. When this 
capability is implemented, it is important to provide clear feedback indicating which 
categories are currently applied. In all three studies, participants appeared to 
occasionally forget or overlook the fact that they were viewing a subset of their 
original query. 
 
 
 
98 
 
One benefit of tight coupling between the categories and results is that it allows 
searchers to very quickly see examples. Within a category, example results help to 
clarify the meaning of the categories and often provide indications of relevance, 
quality, etc. Even within well-known classifications, some category labels may be 
ambiguous or unfamiliar. A few examples can often clarify this. Dumais, Cutrell, & 
Chen (2001) noted that individual page titles helped disambiguate category names in 
their study of search results. This principle was initially formulated as ?Provide 
examples of documents for each category.? It was replaced with the current version 
because tightly coupling an overview with the result list provides a mechanism for 
users to quickly view a few examples or the complete set of all matching documents.  
 
Brushing and linking techniques tightly couple multiple views of data in an 
information visualization, so that an action in one view (brushing) is linked to an 
action in another view. This can be applied to search results (Klein, Reiterer, M?ller, 
& Limbach, 2003) to synchronize two views of the results, an overview and a detailed 
list. This can support richer interactions between category information and individual 
results. For example, pausing the pointer over a result in the list can highlight (in the 
overview) all categories containing that result. Brushing must be carefully used, 
though. During the evolution of the SERVICE prototypes, I experimented with a 
variation of this technique. In one version, pausing the pointer over a category label 
had the effect of immediately hiding all results that were not from that category. This 
was a very quick way to see results in a category, but was very disruptive. The screen 
would flash excessively as users moved the pointer over the categorized overview, 
 
 
99 
 
and the rearrangement of the list required users to visually reorient themselves with 
each change. The final design highlights the currently visible results that are members 
of a category when the pointer is placed over the category. 
4.2.5 Ensure that full category information is available 
When using deep hierarchies, designers should ensure that full category information 
(the complete label or descriptor) is available to searchers. The category labels in the 
overview indicate which categories results are in, but this may be limited to the top 
few levels because of the limited display space. During all three studies, but 
particularly during study 3, participants wondered aloud what specific category 
results were in. They were occasionally confused because only the top two levels of 
the category were visible in the overview. For example, the category 
/Arts/Television/Networks/Cable/BBC was truncated to /Arts/Television in the 
overview. Providing the full category label could clarify this. Displaying category 
labels in each result can be helpful (Drori & Alon, 2003). However, when this was 
implemented in the SERVICE system, the individual results became too large 
because results often appeared in multiple categories. Therefore, it was disabled prior 
to study 3. During development, we also experimented briefly with opening a pop-up 
window when the pointer moved over the result, but this was found to be visually 
distracting, because of the large size of the pop-up window. A small hyperlink in each 
result may be an appropriate design compromise, although this was not implemented 
or evaluated. 
 
 
100 
 
4.2.6 Support multiple types of categories and visual presentations 
No single type of category is effective for all users, tasks, and domains. In her 
comparison of categories and clustering for organizing search results, Hearst (1999) 
noted that neither categories nor automatically constructed clusters will always align 
with users? interests. Libraries provide subject, author, and title indexes and archives 
provide multiple finding aids for their holdings. GRiDL, SuperTable (Klein, M?ller, 
Reiterer, & Eibl, 2002), and Vivisimo?s new Clusty.com search engine are examples 
of search result interfaces that permit users to reorganize results using alternate sets of 
categories. During the studies, several participants noted that they would like to be 
able to select or define their own categories and re-arrange them for their own 
purposes. Likewise, no single presentation style is ideal for all situations and tasks 
(Risden, Czerwinski, Munzner, & Cook, 2000; Sebrechts, Vasilakis, Miller, Cugini, 
& Laskowski, 1999; Shneiderman & Plaisant, 2004; Swan & Allen, 1998). 
Exploratory searchers should be allowed to select a task-appropriate form of data 
display (Shneiderman, Byrd, & Croft, 1997). Alternatively, if that level of control and 
the corresponding increase in complexity is not appropriate for the intended users, 
designers should have a variety of categories and presentation styles to choose from, 
so they can choose appropriate categories and visual presentation styles. It may be 
useful to provide functionality that enables a knowledgeable proxy for the user (e.g., a 
?power user?) to customize the overview and share it with others. Supporting 
multiple classifications and multiple visual presentations will enable users to view 
and explore search results from the perspectives most appropriate to their needs. 
 
 
101 
 
4.2.7 Use separate facets for each type of category 
When a rich set of categories encodes multiple types of relationships, presenting them 
as separate facets can clarify meanings and relationships that might otherwise be 
ambiguous. For example, categories for is-a, is-about, and part-of relationships 
should be presented separately. Faceted classifications organize a domain into 
orthogonal sets of categories, which are ideally homogeneous, mutually exclusive, 
and represent a single characteristic of division (Vickery, 1960). They have been used 
to organize catalogs, classifications, and thesauri (Soergel, 1974; Vickery, 1960), 
information spaces on the Web (Louie, Maddox, & Washington, 2003), and search 
interfaces (Yee, Swearingen, Li, & Hearst, 2003). Facets are flexible and extensible; 
they do not require comprehensive knowledge or impose a rigid ordering, and they 
allow the indexed entities to be viewed from a variety of perspectives (Kwasnik, 
1999). The importance of this principle was clarified during the development of the 
SERVICE system. During informal user tests, searchers experienced confusion when 
categories with different meanings were used in the same facet. Separating 
geographic categories from topical categories in the final interface helped reduce this 
problem in the third study (described in Chapter 5). Other instances of categories that 
should have been separated out remained problematic. Therefore, hierarchies used in 
a categorized overview should be analyzed to determine whether they should be 
restructured into separate facets. The informal analysis performed during 
development yielded a noticeable improvement, suggesting that even a lightweight 
faceted analysis focused on the upper levels of a hierarchy could be beneficial. 
 
 
102 
 
4.2.8 Arrange text for scanning/skimming 
At a perceptual level, users of search results attempt to rapidly ingest large amounts 
of text. In the formative studies, I observed searchers scanning titles and snippets of 
text to quickly select specific pages to view. They skimmed the pages and returned to 
the list to repeat this cycle. It could be argued that this is simply a result of the textual 
presentation format, but it also reflects more fundamentally that the source documents 
are inherently textual and are not easily presented graphically. Considered from an 
information visualization and perceptual processing perspective, text may be one of 
the most compact representations available for the broad range of information to be 
displayed as the result of a search. The graphical marks that humans recognize as 
letters are rapidly processed into words and concepts, allowing such diverse concepts 
as ?war in Iraq,? ?hot coffee and muffin,? and ?search result visualization? to be 
represented in just a few pixels. This reflects a fundamental distinction between the 
strengths of human and machine capabilities. As humans we have an extensive, 
nuanced understanding of language that allows us to take advantage of a rich set of 
cues, including morphology, syntax, lexicon, context, and pragmatics that are only 
approximated by the algorithms implemented in machines. 
 
Three important attributes of web search results identified by Drori (2003) ? title, line 
in context (a snippet of text containing one or more query terms), and keywords ? are 
free-text and not easily represented visually. The fourth important attribute identified 
by Drori, category, can be drawn from a controlled vocabulary and often structured 
hierarchically in thematic groupings. Arranging these elements in a consistent manner 
 
 
103 
 
(e.g. linear lists, columns, or matrices) (Teitelbaum & Granda, 1983) and ensuring 
that they are visible (rather than requiring interaction such as moving the pointer over 
an item) will support fast scanning and skimming. Aula (2004) found that presenting 
snippets as bulleted lists was 20% faster than the standard textual display. 
Appropriate use of font weights, styles, sizes, and colors will also help (Tullis, 1988).  
4.2.9 Visually encode quantitative attributes on a stable visual structure 
Information visualization principles are grounded in our understanding of human 
perceptual and cognitive systems, particularly their structure, functions, strengths, and 
limitations. Visualization techniques such as size, color, or shape-coding engage the 
human perceptual and cognitive systems by encoding data into visual constructions 
(Card, Mackinlay, & Shneiderman, 1999). Quantitative attributes such as dates or 
document counts and nominal attributes with a small range of values such as 
document types can be visually encoded by position, color, shape or size. Compared 
with text, quantitative attributes may be effectively visualized in more flexible ways. 
The underlying structure (the visual substrate) upon which the quantitative attributes 
are displayed is not limited to a list or grid because the perceptual systems are 
effective at detecting visual patterns, outliers, etc.  
 
Stable, consistent, and meaningful displays have been shown to promote success in 
user interfaces (Shneiderman & Plaisant, 2004; Tullis, 1988). Niemela & Saariluoma 
(2003) demonstrated the importance of both spatial layout and semantics (labels) in 
learning a visual display. Providing a stable visual structure for the overviews, 
 
 
104 
 
structured around meaningful categories, will allow searchers to focus on the task at 
hand rather than re-interpreting a changing presentation of the results. 
4.2.10 Summary 
These eight design principles for categorized overviews have been refined and 
validated by the design and evaluation of the SERVICE system. They complement 
and extend general human-computer interaction, web design, information 
architecture, and information visualization principles. They will be useful for search 
interface designers because they provide guidance for the appropriate integration of 
visual overviews with search result lists, and particularly for the textual surrogates 
embedded in result lists. They do not yet address a number of issues, including how 
much stability is needed in the visual structure versus how much variability can be 
tolerated, what the permissible trade-offs are, and how much context is needed when 
navigating search results. These principles represent a strong call for exposing 
structure ? which is often used internally by search engines, but less often exposed at 
the user interface ? without abandoning the tried and true value of text. 
4.3 SERVICE requirements and architecture 
The initial SERVICE platform (version 1.0) supported the formative studies (studies 
1 and 2) by providing tools to generate prototype interfaces with categorized 
overviews of search results using a government hierarchy. These prototypes could be 
used to explore a pre-computed set of results. SERVICE 2.0 was designed to satisfy 
three objectives: 
? Provide a platform for investigating categorized overview interfaces 
 
 
105 
 
? Implement an architecture that facilitates easy plug-in of web search result 
classifiers 
? Provide working search interfaces and logging features for study 3 (described 
in Chapter 5) 
 
The findings of the early studies were used with the analysis and emerging design 
principles to define a set of high level requirements and specific desirable features. 
The feature list was pared to the features most important for the final study. The 
requirements and feature list guided the design and development of SERVICE 2.0. 
The following sections describe the SERVICE 2.0 architecture, the Fast Feature 
Classifiers that were implemented, the AOL Music Search Prototype, and the search 
interface constructed for the study 3. 
 
The SERVICE architecture is organized around three major subsystems, all built 
using Java technology: the user interface, the data model (which includes the search 
result classes and machine interfaces to two search engines), and the classifiers 
(Figure 27). It also includes a small subsystem for logging JavaScript events from the 
search result page. The general operation of the system is shown as a data flow in 
Figure 28. Queries are sent to the search engine, and the results are categorized using 
one or more classifiers. the overview is generated from the categorized results. 
 
 
106 
 
 
Figure 27. The SERVICE system consists of three major subsystems: the user interface, the data 
model (which includes machine interfaces to two search engines and the search result classes), 
and the classifiers. It also includes facilities to log JavaScript events from the search result page. 
 
 
Figure 28.  SERVICE operation is shown as a dataflow. Queries are sent to the search engine, 
which generates a result set. The results are categorized using one or more classifiers. The 
overview is created from the categorized search results. 
 
 
 
107 
 
The search result classes are used to create and manage search results, and include an 
interface to the search engines. They send queries to the search engine and parse the 
results to extract individual search result elements and group them into a set of search 
results. There are methods to cache results to and retrieve them from a database, 
optionally using a user ID. By default, when processing a query, the cache is first 
checked to see if it the query can be satisfied locally. 
 
The classifiers are Java classes that that implement a common Classifier interface to 
categorize search results into meaningful and stable categories. A SERVICE classifier 
at minimum implements methods to: 
? Categorize a single search result  
? Categorize a set of search results 
? Return the name of the classifier 
A SERVICE classifier is any class that provides these minimal services. They do not 
necessarily implement machine learning or automated classification methods, 
although these could be integrated using the SERVICE architecture. An important 
design criterion for the classifiers was that they rapidly categorize results, using only 
data available in the search results (Zamir & Etzioni, 1998). This motivated the 
development of a set of Fast Feature classifiers, described below. SERVICE 2.0 
supports nine classifiers, allowing the search interface to categorize search results into 
thematic categories and a US government organizational hierarchy, as well as others. 
 
 
 
108 
 
The user interface (UI) is the third major component of the SERVICE system. To 
support future studies, an important design goal was that the system be easily used by 
a variety of users. Ideally, the system would be accessible from any standard web 
browser without requiring special configuration. Early versions of the UI were 
implemented as Java applications, but early in the development process this was 
changed to a web-based application, using JavaServer Pages (JSP). This allows Java 
and HTML code to be combined in a single file, which is useful for rapidly 
prototyping and refining the UI, even though it does tightly couple content generation 
and presentation. Using Java applets or building browser plug-ins would have 
supported a richer set of interactions and visualizations, but for the purposes of this 
research, a combination of JSP and JavaScript provided enough functionality with 
minimal end-user demands. The design process and prototype evolution are discussed 
below. 
 
SERVICE 2.0 implements a client-side logging function to capture events on the 
search results pages. The interface used for the study logs new queries (via the 
onsubmit event), page loads (oninit), mouseovers (onmouseover), page scrolls 
(onscroll), and link selection (onclick). Event time and the user ID are captured along 
with an event type and optional event data. JavaScript functions manage a set of log 
buffers, which are filled by calls from event handlers on the search result page. As the 
buffers fill, they are asynchronously sent to a log service. This is currently done by 
encoding the log contents as a URL and using that as the source for a JavaScript 
image object. This causes the JavaScript engine to send a request using that URL, 
 
 
109 
 
ostensibly to retrieve an image file. A more elegant approach would use the 
XMLHttpRequest object. The log service parses the URL request to recover the 
individual events. It timestamps entries upon receipt, so that any large differences in 
the clocks between the client and server can be accounted for. It does not account for 
differences due to network transport. For the study, the client and server were both 
hosted on the same machine, so the clock differences were not an issue, but future 
studies will involve remote clients. An important limitation of the JavaScript-based 
logging function is that it only logs events on search result pages. These pages are 
generated by SERVICE, so the logging code can be included. Other pages are not 
instrumented, and therefore do not generate log events. Since the primary interest of 
study 3 was on the search result page, this was acceptable, but a proxy server was 
installed to log all non-local pages.  
4.4 Fast Feature classifiers 
The need to rapidly categorize search results into meaningful and stable categories 
motivated development of a set of nine Fast Feature classifiers (Kules, Kustanowitz, 
& Shneiderman, to appear).
1
 These classifiers use information available in the search 
results, typically the title, snippet, and URL, with valuable knowledge from external 
digital resources. The need to augment search results with additional metadata is 
indicative of the growing challenge facing digital libraries and archives caused by 
semi-structured and unstructured documents. Traditional digital libraries maintain 
                                                 
1
 Jack Kustanowitz contributed to the initial design of the fast feature classifiers and implemented five 
classifiers, under my direction. 
 
 
110 
 
rich metadata for their holdings, but as their holdings expand to include 
heterogeneous collections of semi-structured information, the available metadata 
dwindles, and human-generated metadata is expensive to create. External sources of 
digital knowledge can be integrated to provide valuable metadata, in this case, by 
supplying meaningful category information. 
 
Figure 29 elaborates on Figure 28, showing a general data flow for the process of 
categorizing search results. Classifiers can be characterized along three dimensions: 
Lean/rich, online/offline and fast-feature/full-feature (Kules, Kustanowitz, & 
Shneiderman, to appear). Lean/rich captures the scope, breadth, and depth of the 
categories used. Online/offline refers to whether the categorization process requires 
extensive offline setup or configuration (e.g., training a statistical text classifier). 
Fast-feature/full-feature indicates whether the classifier can rapidly categorize search 
results at search time. These three dimensions are used as a framework to characterize 
the SERVICE classifiers. 
 
 
111 
 
        
Figure 29. Components used to categorize web search results.  A set of search results returned 
from a search engine is categorized by a classifier. The classifier may optionally reference 
previously acquired information or knowledge, such as a database of rules or training data. 
 
Lean categories are simple, readily understandable categories with modest breadth 
and depth. In the context of the web, they can be constructed from document 
attributes such as file formats (DOC, PDF, PPT, etc.), DNS top-level domains (COM, 
GOV, ORG, etc.), and meaningful date or size ranges. As an example of the utility of 
lean categories, Matsuda & Fukushima (1999) found that using the document type 
(e.g., product catalog, online shop, call for papers, home page, bulletin board) in 
searches improved precision of the results.  
 
Rich categories are extensive classifications, taxonomies, ontologies, or other 
knowledge structures, often professionally developed, that provide ?semantic 
roadmaps? of an area of knowledge that can be useful for searchers (Soergel, 1999). 
Examples of rich classifications include the ACM Computing Classification System, 
West Publishing's Key Numbers classification of legal topics, Library of Congress 
Start: Query 
Search Engine 
Result Set 
Classifier 
End: Categorized Results 
DB 
Offline 
Classifier 
Offline 
Online 
(optional) 
 
 
112 
 
Subject Headings, and the US Government organizational hierarchy. Web directories 
like Yahoo! and ODP organize web sites into thematic hierarchies. They are of 
interest here because they cover a small but important portion of the web with high 
quality. Taxonomies such as MeSH also have been used to organize search results in 
specialized (non-web) search applications (Hearst & Karadi, 1997; Pratt, Hearst, & 
Fagan, 1999).  
 
Categorization can be done either completely online (at query time), or it may require 
prior processing (offline). Online categorization can be done when the search results 
are generated if the mapping of page to the hierarchy is trivial (for example, grouping 
by the DNS domain suffix such as .GOV, .COM, .EDU, etc.), or if it comes ?for free? 
with the result set (search engines may provide one or more topical categories for 
each result), or if it is a function of the result set (such as grouping by document size, 
where the size ranges depend on the result set).  Online categorization can be done 
from a database, either local or remote (such as querying the Open Directory Project 
(ODP) web directory (dmoz.org) if the topical category is not provided with the query 
result set). 
 
Offline categorization is required if no database exists to map search results to the 
desired categories.  In that case, an agent such as a web crawler looks at URLs (fast-
feature) or actual web pages (full-feature), potentially creates a hierarchy or reads an 
existing one, and places that page into the appropriate place in the hierarchy, storing 
the resulting mapping in a database.  Run-time activity is then simply looking up the 
 
 
113 
 
URL in question in the database and returning the appropriate mapping. Web page 
classifiers may require offline training to learn statistical models of the categories. 
 
A search-result categorization technique is referred to as fast-feature if it requires 
only information provided in the search result set, and therefore does not require the 
full text of each link destination. In contrast, a full-feature technique is one that 
requires the full text of the link destination (or possibly other documents, e.g., if it 
uses structural information such as hyperlinks). Typically information returned 
includes URL, date, size, and perhaps summary and/or topical category. Thus, for 
example, a technique such as a text match on the URL would be considered fast-
feature, but one that does textual analysis of the body of the HTML page pointed to 
by the link would not. Table 14 summarizes how these distinctions may divide up the 
space that describes how search results are analyzed. 
 
Table 14: Techniques for Search Result Categorization. SERVICE implements a set of online, 
fast-feature classifiers, in the black border 
 Online (at query time) Offline (requires prior setup or 
background processing) 
Full-feature Accessing each web page in 
a search result and doing 
extensive analysis (not 
addressed here; often 
impractical due to 
performance) 
Extensive text processing, manual, 
link analysis, machine learning 
(Work done by information retrieval 
and classification researchers) 
Fast-feature Uses only features in result 
set, such as title, snippet, 
URL, domain, size, ODP, 
pre-existing database map 
Web crawler for URL directory 
hierarchy parsing, search engine 
mining (query probing) 
 
 
 
114 
 
Full-feature online techniques would consist of reading a list of links returned from a 
search engine, and then at runtime, downloading each destination, performing some 
analysis on each page, and then doing some kind of categorization.  This is not easily 
scalable to large result sets, because it requires N network calls for N results and is 
largely dependent on remote sites for correct functionality.  While it might be feasible 
on a set of pages with reliable links and guaranteed fast network performance, or 
when pages are available on the local machine (e.g. a search engine that caches 
indexed pages), it is not practical in general. 
 
Much research has been done on full-feature offline techniques by information 
retrieval classification researchers.  In general, these require downloading and 
analyzing the full contents of each page, whether it is using link data to automatically 
build site maps as in MAPA (Durand & Kahn, 1998), or machine-based learning 
techniques that can categorize pages based on statistical analysis of word counts.  
Manual categorization, in which page designers categorize their respective pages can 
also be seen as a full-feature technique, as it also requires knowledge of the page 
contents. 
 
The following subsections discuss two kinds of fast-feature classifiers. Six online lean 
techniques are briefly considered before focusing on the three online rich techniques. 
The fast-feature techniques draw on meaningful relationships between a feature in the 
search result and some external database or other knowledge structure. If the 
relationship exists, that is evidence of membership in the category. The converse, 
 
 
115 
 
however, is not true. If no relationship exists, that can either mean that the page is not 
a member of the category, or that the external database is incomplete. When no 
relationship exists, an assignment could be made using traditional classification 
techniques. This might result in more pages being categorized, but it could also result 
in incorrectly categorized pages. The techniques described here are conservative; they 
do not assign pages to categories without an explicit relationship in the external 
database.  
 
When analyzing these techniques, an important characteristic is what proportion of 
search results can be categorized. To assess the potential utility of these methods, 
examples of each kind of classifier were implemented, and the percentage of search 
results that each categorized (which will be referred to as coverage) was measured or 
analytically assessed. Each classifier was targeted to a specific domain, so five 
representative queries were constructed for each target domain. For each query, the 
top 100 search results were retrieved from the Google search engine, and the number 
of results categorized by the classifier was measured. Additional analysis was 
performed on the ODP classifier, because I intended to use it in study 3. 
4.4.1 Online Lean Techniques 
A fast-feature online categorization technique is one that does not require the offline 
creation of a database, and also does not require the full text of the link destinations. 
The lean techniques often draw on surface features of the URL, such as the top-level 
domain to classify documents into simple categories. Table 15 contains a sample of 
lean classifiers. This list is far from complete, but it illustrates the breadth of 
 
 
116 
 
classifications available using only the data returned from the search engine and any 
freely available, pre-existing databases. The following three sections describe online 
lean fast-feature classifiers. 
 
Table 15. Online lean classifiers can provide simple categories to help users locate relevant 
information.  The three classifiers that have been implemented in SERVICE 2.0 are highlighted 
in bold. 
Name Description 
Top-level DNS 
Domain 
This classifier extracts the final part of the hostname, which 
typically indicates either a country code (e.g., us, jp, uk, de, 
etc.), or one of the defined top-level domains (.COM, .EDU, 
.ORG, .GOV, etc.).  This provides a simple way to provide a 
flat (non-hierarchical) categorization. A search for ?chip 
manufacturers?, for example, could be usefully organized 
according to country code. 
Last Time Visited The web browser history can be used to categorize documents 
by how recently they were visited (e.g., today, yesterday, this 
week, this month, never).  
Document Format The file format of the document (e.g., HTML, PDF, PS), can 
often be determined from the suffix of the filename in the URL 
or from a format indicator in the search results. 
Document 
Language 
The document language can be inferred from the title and 
snippet using dictionary lookup, yielding a flat categorization.  
Document Size This classifier groups results into similar size classes.  Size 
categorization may be useful for image search. 
Document Indexing 
Date 
Search engines sometimes provide the date the document was 
indexed (or ?crawled?) in search results. This can be used to 
categorize documents by how recently they were indexed, 
using values similar to the previous example. 
 
4.4.2 Top-Level DNS Domain Classifier 
The domain classifier is one of the simplest of the classifiers implemented in 
SERVICE. It places URLs into a flat set of about 110 categories based on the domain 
suffix (.COM, .EDU, .GOV, etc.) or the appropriate country code. A lookup table 
 
 
117 
 
maps the country code to country name, so that the categorization text can use the 
actual country name. For example, the following two URLs are categorized as 
follows: 
? www.whitehouse.gov/ -> GOV 
? http://www.corriere.it/ -> Italy 
(Country names can also be determined for non-country code-based URLs 
(Periakaruppan & Nemeth, 1999; Watters & Amoudi, 2003), which would provide a 
mechanism for categorizing URLs into a geographic hierarchy.) 
 
A user interface showing this categorization would allow quick navigation to all 
educational institution web sites, for example. Because the domain is available in 
almost every search result, this has the desirable property of nearly 100% coverage, 
that is, almost no results are left ?uncategorized.? Country codes may not be 
immediately recognizable to searchers, and at least one country (Tuvalu) has used its 
top-level domain (.tv) to host television websites, which could pose some challenges 
to searchers.  
4.4.3 Last Time Visited Classifier 
Categorizing search results by when they were last seen can be useful in certain 
situations. Although searchers attempt to re-access previously found documents via 
search engines, they have trouble remembering the specific query and/or navigation 
sequence that they originally used (Aula, Jhaveri, & K?ki, 2005; Wen, 2003). 
Integrating these categories into a search interface could help searchers more readily 
find previously visited pages. Alternatively, these pages could be excluded from 
 
 
118 
 
search results if the searcher wished to find new material. Personal browse histories 
maintained by a web browser can be used to indicate whether a web page or its web 
site has been visited and if so, when it was last visited. The SERVICE classifier 
categorizes web pages into five categories: Today, Yesterday, Within a Week, Before 
Last Week, and Never Visited. This classifier depends on the existence of a complete 
browse history, which introduces the issues of privacy and data storage size. The 
initial implementation works with the Firefox web browser 
(www.mozilla.com/firefox). It uses an external script to read the web browser history 
file, which is only updated when the browser exits, so sites visited in the current 
session are not immediately visible. If a complete browse history is available, this 
technique will provide 100% coverage, because any page not in the history can 
accurately be placed in the Never Visited category. If the browse history is limited, 
however, the Never Visited category cannot be used, because the absence of a page in 
the history file could either mean the page was never seen, or that it was seen but 
subsequently removed from the history. 
4.4.4 Document Size Classifier 
The Document Size Classifier uses page size information when it is available in the 
search results. When search engines return size information for pages, a dynamic 
categorization of sizes can be determined automatically, and this classifier can thus 
also run online. This could be useful when searching for images or multimedia 
documents. Categorization may be done uniformly using a fixed set of ranges (which 
may yield many categories with 0 results), or by online defining ranges that contain 
matches within the result set. Our implementation defines a constant number of 
 
 
119 
 
groups, divides the range of page sizes by the number of groups, and then places the 
results into one of those groups.  This is useful for visualizing a uniform distribution 
of page sizes. An alternative implementation could choose categories of fixed 
intervals, such as 100k-200k, 200k-300k, etc., even if the categories were not a 
uniform size.  This would be useful for seeing, for example, that no results were 
between 100k and 3MB for a given query.  If both of these implementations were 
published and adhered to the common interface, a searcher could choose which size 
classifier to use based on the desired visualization or search. This classifier will 
trivially yield 100% coverage. 
4.4.5 Online Rich Techniques 
Rich categories are appealing to users because the descriptive terms facilitate 
understanding. The fast-feature, rich techniques typically use a pre-existing database 
to map a URL to one or more categories. Table 16 identifies several rich classifiers. 
This illustrates the breadth of classifiers available. The following sections describe 
online, rich, fast-feature classifiers. 
 
 
 
 
 
 
 
 
 
120 
 
Table 16. Online rich classifiers can provide meaningful and stable categories that add context to 
the search results. 
Name Description 
US Government This classifier uses a pre-existing database that maps URLs to a 
government hierarchy. For example 
www.whitehouse.gov/president maps to the second-level 
category Executive/Executive_Office_of_the_President. 
Open Directory 
Project (ODP) 
This classifier uses the Open Directory Project category 
information that is returned with the query results to build its 
hierarchy. The ODP is a human-edited web directory 
(www.dmoz.org). 
Musical Genre This classifier parses search results from the AOL Music search 
engine to categorize songs according to a two-level musical 
genre. (A similar classifier categorizes songs by period.) 
 
4.4.6 U.S. Government Classifier 
The government classifier uses an existing database that maps government web pages 
into a government hierarchy, for example mapping http://www.af.mil/ to the hierarchy 
node /Executive/Executive_Agencies/Department_of_Defense/ 
Department_of_the_Air_Force.  Since the lookup is done locally, this can be done 
online at query-time. On its own, this classifier has coverage that is limited to the list 
of URLs in the database. However, any URL that is an extension of this base URL is 
also associated with the Air Force. The coverage can therefore be extended by using 
prefix matching, i.e., any URL beginning with www.af.mil/ would be mapped to this 
node, unless a more detailed match was found. Five representative queries were 
constructed by selecting the most commonly asked questions reported by the 
First.Gov web site (http://answers.firstgov.gov/ cgi-bin/gsa_ict.cfg/php/enduser/ 
std_alp.php), removing obviously navigational questions, as described in Broder 
(2002), and creating short queries from keywords in the questions. The results are 
 
 
121 
 
shown in Table 17. For both the ?new passport? and ?foreign embassy? queries, many 
of the uncategorized pages were from the domain ?usembassy.gov?, whereas the 
database had ?usembassy.state.gov?. This slight difference illustrates the sensitivity 
of this approach to URL variations, and suggests that additional heuristics or an index 
of synonymous URLs could be developed to make it more robust, a technique used 
by search engines. 
 
Table 17. Percent of the top 100 results categorized by the US Government classifier for five 
representative queries. 
Query % Categorized 
new passport site:gov   39 
start business site:gov   58 
gasoline prices site:gov 100 
foreign embassy site:gov   43 
obtain grant site:gov   72 
 
4.4.7 Open Directory Project Classifier 
Web directories such as Yahoo! (www.yahoo.com), LookSmart 
(www.looksmart.com), and the Open Directory Project (www.dmoz.org) catalog a 
small but important fraction of the Web. They provide an overview of general Web 
content and enable information seekers to find information by browsing a familiar 
subject hierarchy. As of April, 2006, its 72,000 volunteer editors had indexed 5.3 
million web sites in 590,000 categories (16 top level categories). The Open Directory 
Project classifier uses Open Directory Project information to place search results into 
categories within the ODP hierarchy. Even though web directories cover only a small 
fraction of the web, popularity follows a power law (Cunha, Bestavros, & Crovella, 
 
 
122 
 
1995). That is, a few sites receive much use. I conjectured that the highest ranking 
pages in search results would often be cataloged in the ODP. To categorize a search 
result into the ODP hierarchy, the web site is looked up in the ODP using prefix 
matching as in the US Government classifier. Since web sites can be cataloged in 
multiple categories, this yields a list of categories for the result. For example, a web 
page from the web site of the University of Maryland Human-Computer Interaction 
Lab would be categorized into the following three ODP categories: 
? /Computers/Human-Computer_Interaction/Academic 
? /Computers/Computer_Science/Academic_Departments/North_America/Unit
ed_States/Maryland 
? /Reference/Education/Colleges_and_Universities/North_America/United_Stat
es/Maryland/University_of_Maryland/College_Park/Departments_and_Progra
ms 
 
The classifier used a web service provided by Alexa.com. The Alexa service only 
categorized a single web page per HTTP request, so a cache was implemented to 
minimize processing time for search results when they have been previously 
encountered. 
 
Five queries representative of general web search were selected from the most 
common searches reported by AskJeeves search engine (Ask.com, 2005), after 
removing navigational queries. In addition, the five government queries described 
above were also evaluated (Table 5).  
 
 
123 
 
 
Table 18. Percent of the top 100 results categorized by the Open Directory Project classifier for 
five representative queries in each of two domains: general web search and government web 
search. 
Query % Categorized
General web search 
music lyrics 76
Games 83
Maps 90
real estate 82
Poems 76
Government web search 
new passport site:gov 69
start business site:gov 73
gasoline prices site:gov 90
foreign embassy site:gov 68
obtain grant site:gov 88
 
The preliminary tests were promising, and I wished to measure coverage for a more 
extensive set of searches. Coverage rates for the ODP were particularly interesting, 
because the study 3 would use these categories for general web search. The TREC 
2004 Robust Topics provided a set of 250 queries created as realistic, but difficult 
topics for information retrieval. For each of the 250 topics, the contents of the Title 
field were submitted to a Google search and the top 350 results were collected. This 
yielded 86,900 results. Because of the quantity of results, it was not practical to use 
the Alexa service to categorize them. The ODP data was imported it into a MySQL 
database and processed using PHP scripts. Each result was then checked to see if it 
could be categorized in the ODP. The number of results categorized within the top 
100, 250 and 350 results was measured (Table 19). The average coverage for the 246 
 
 
124 
 
queries successfully processed and categorized was 66.0%, 62.9% and 61.6% for the 
top 100, 250 and 350 results, respectively. 
 
Table 19. Coverage for the top 100, 250 and 350 search results from 246 queries based on the 
TREC 2004 Robust Topics. 
 Range Mean (SD) % Categorized 
Top 100 36-87   66.0   (7.68) 66.0 
Top 250 87-194 157.2 (16.00) 62.9 
Top 350 110-257 215.6 (21.11) 61.6 
 
This work is related to work by Chirita, Nejdl, Paiu, & Kohlsch?tter (2005) in its use 
of ODP data to organize search results. They used the ODP data to re-rank Google 
search results, boosting the rank of preferred categories, which were selected in 
advance by the searchers. They found that the top 5 re-ranked results were judged 
better than the original top 5, which illustrates the value that a large-scale knowledge 
resource can provide. The ODP is used differently in SERVICE, to expose the 
structure of the search results to searchers in the form of an overview, allowing 
searchers to choose categories at search time and avoiding the need to pre-specify 
categories of interest. The measured coverage results were higher for the SERVICE 
tests than in theirs, and we can consider two possible causes for this. They elicited 
specific types of queries (ambiguous, partially ambiguous, and unambiguous) from 
their test participants, who were research colleagues, whereas the SERVICE tests 
used a set of TREC topics. It is possible that their queries were focused more 
narrowly to yield the desired level of ambiguity. It is also possible that the prefix 
 
 
125 
 
matching strategy allowed the SERVICE classifier to categorize a larger fraction of 
pages. The results of study 3 lend support to the prefix matching approach. 
4.4.8 Multi-threading the ODP Classifier 
During development of the search UI, two additional requirements were identified. 
As mentioned above, the Alexa Web Service categorizer processes only one URL per 
call. Each call typically required 1-2 seconds to send the HTTP request, and then 
receive and parse the XML response. To categorize 100 search results was found to 
take close to two minutes. At that time, the ODP data had not yet been downloaded to 
a local database, so to reduce the categorization time, I implemented a multi-
threading option. This allowed multiple search results to be simultaneously processed. 
Alexa, however, would occasionally return an error, with likelihood of the error 
increasing with the number of simultaneously outstanding requests. This required 
retrying the request, which could further add to the load. A backoff strategy was 
needed to slow the request rate when this happened. By evaluating different 
combinations of values for the number of threads and the backoff times, I was able to 
reduce the typical time to process a set of 100 results to 20-30 seconds, with an 
acceptable error rate, approximately one or two per result set. 
4.4.9 Extracting multiple facets from the ODP hierarchy 
The second additional requirement was due to the need to extract multiple facets from 
the (single) ODP hierarchy. Specifically, the Geographic facet is extracted from the 
top-level Region category. This was desirable because the geographic categories 
implied a different relationship than the other categories. When a web site was 
 
 
126 
 
categorized under the other categories that generally (although far from always) 
meant that the web site was about the concept represented by the category. However, 
web sites were categorized under the Region category when the organization that 
published the site was located in a specific region. This qualitative difference in the 
relationship between a category and its member web sites warranted a visually 
separate facet in the UI. To accommodate this need, the classifier was extended, 
adding a new constructor that accepted a top-level category value as the root of a 
category hierarchy. 
4.5 AOL Music prototype 
The AOL Music prototype demonstrated categorized overviews of music search 
results, integrating an external database with AOL Music Search results to generate 
the overviews. It began to explore the use of multiple facets by displaying two types 
of categories in the overview: genre and era. The genre facet consisted of 11 top-level 
genres of music (e.g., blues, classical, country, etc.), combined with an optional 
second-level, drawn from an uncontrolled vocabulary. Thus a song could be 
categorized into the top-level category, Rock, and a second-level category, Pop-Folk. 
The era facet was composed of decades from 1910 to 2000.  
 
This design is similar to guided search designs such as Tower Records music search 
(towerrecords.com). Whereas that system draws data from a single database, the AOL 
music prototype demonstrates the integration of web-based search results with an 
external data resource. It also illustrates the utility of the SERVICE 2.0 architecture, 
since the prototype was built in less than a day. The category information for both 
 
 
127 
 
facets was extracted from the freedb.org CD database, which contains entries for 1.9 
million CD albums. Two new classifiers were constructed, and the web-based 
interface was adapted to display the song, artist, and album for each search result. 
Additionally, a search engine interface was constructed to send the user-specified 
song query to AOL?s music search engine and parse the HTML results. A query 
typically can be processed, categorized, and results displayed within 5-10 seconds, 
depending on the speed of the AOL Music search engine. 
 
 
128 
 
 
Figure 30. A search for songs with the words "road" and "travel" in the title yields 124 results.  
The results are presented with two categorized overviews: by genre and by date. Here, the 
results have been filtered (by clicking) to show just the 21 Country songs. 
 
 
 
129 
 
 
Figure 31.  Brushing the pointer over a category highlights the results that fall in that category. 
In this screenshot, the pointer has been placed over the ?2000s? category, showing albums 
released in the 2000s highlighted with yellow (shown boxed for clarity in these figures). 
 
 
 
130 
 
 
Figure 32. Brushing the pointer over an album title highlights all the categories for that album. 
Here we see that J.E. Mainer?s ?20 Old-Time favorites? is in both the Country and Folk 
categories, and that it was released in the 1990s. 
 
 
 
131 
 
4.6 General web search interface 
The SERVICE requirements document and feature list guided development of the 
search interface for study 3. The SERVICE architecture facilitated implementation of 
alternate user interface designs and categorization schemes. User interface designs 
were informally reviewed with HCIL and professional colleagues through the 
evolution of the interface designs to the design used in the third study. 
 
Since the study 3 would investigate categorized overviews in the context of general 
web search, it was important to select sets of categories that were appropriate for that 
domain. The evolving SERVICE designs explored multiple presentations based on 
the Open Directory Project classifier. As a formal classification, web directories have 
several limitations (Taylor, 1999); however, they provide a rich hierarchy appropriate 
for categorizing general web search results. As reported in section 4.4.7, a substantial 
portion of typical web search results have been cataloged within the ODP, which 
made it practical to use for the study. The evolving SERVICE designs explored 
multiple presentation based on the ODP categories. 
 
The first implementation of SERVICE 2.0 used a Java application to send queries to 
the search engine, parse and categorize the results, and cache them in a local database 
(Figure 33). The application opened an external web browser to display the results. 
The URL passed to the browser pointed to a local JSP script and encoded any 
selected category filters. The JSP script extracted the results from the database and 
formatted them to display the (possibly filtered) list of results to mimic Google. The 
 
 
132 
 
overview allowed users to select (via a drop-down list at the top) from multiple 
category sets (ODP categories, US government, and DNS domain). One category set 
was visible at a time, displayed using an expandable outliner. Because only a single 
facet was displayed, the entire height of the screen could be used, and multiple 
branches or sub-categories could be expanded at one time. 
 
 
Figure 33. This SERVICE search interface allowed users to select one set of categories at a time, 
which were displayed with an expandable outliner. This screenshot shows search results with a 
categorized overview based on the DNS domain. The US and international categories have been 
expanded. The results have been filtered to display just the 53 US commercial (.COM) sites. A 
drop-down list at the top of the overview allows users to select alternate category sets. 
 
 
 
133 
 
This design had several drawbacks. It required searchers to manage two windows. 
Tiling the windows was preferred, but the one of the windows could inadvertently be 
moved, minimized, obscured, or closed. When searchers clicked on a category in the 
overview, the results page would load in the browser window, but if that window was 
obscured, searchers would not see the results, because the browser did not receive the 
focus and get moved to the top of the visible stack of windows. The design also 
exacerbated existing usability issues with the browser Back button (Cockburn & 
Jones, 1996; Kaasten & Greenberg, 2001; Milic-Frayling et al., 2004), because the 
state of the overview could become inconsistent with the browser window. As users 
navigated with the Back and Forward buttons, the overview was not tightly coupled 
to the browser and would not be updated. Finally, the use of a Java application, 
although acceptable in the confines of a controlled experimental study, would present 
installation challenges for end-users. 
 
This led to two important decisions about the evolving SERVICE system. Integrating 
the categorized overview and the list into a single JSP page would present a cleaner, 
more consistent interface to searchers. And displaying multiple facets simultaneously 
would provide alternate perspectives within the overview, hopefully providing a more 
complete overview of the results. These changes were first instantiated in the AOL 
music search prototype, followed by the next web search design, which displayed 
four facets simultaneously in the overview (ODP categories, DNS domain, US 
government, and document size). Display and selection of categories within each 
facet was sequential, however. This meant that only one branch at a time could be 
 
 
134 
 
explored within a facet, and only one category at a time could be selected within a 
facet. Trade-offs inherent in the display and navigation of hierarchies are well-
recognized (Hochheiser & Shneiderman, 1999; Larson & Czerwinski, 1998; Miller, 
1981; Norman, 1991; Zaphiris & Mtei, 1997). For the constrained space available to 
the categorized overviews these were reasonable choices. 
 
Subsequent designs explored variations in the structure of the categorized facets. For 
example, one design promoted all 16 top-level categories in the ODP to separate 
facets (Figure 34). This forced the overview to extend beyond the first screen and 
required excessive scrolling. Another design promoted just one ODP top-level 
category, Reference, to a separate facet, retaining the others in a Topic facet (Figure 
35). This was done based on the observation that the Reference category implied a 
different relationship with its member sites than the other ODP categories. Reference 
indicated a kind-of web site, whereas other categories were thematic, indicating what 
the web site was about. This separation seemed to help searchers better comprehend 
the search results as they filtered and explored. The ODP Regional category was also 
promoted to a facet for a similar reason. Although the official description of the 
category in the ODP states that, ?The Regional category contains English language 
sites about geographical regions of the world,? in practice, web sites are apparently 
categorized there because the publishing organization is located in a geographic area, 
thus encoding a location-of-publisher meaning. 
 
 
 
135 
 
 
Figure 34. In this search interface, ODP top-level categories are shown as separate facets. 
 
 
 
136 
 
 
Figure 35. The search interface treats the ODP Reference category as a top-level facet. The 
remaining ODP categories are treated as another facet, in conjunction with the top-level DNS 
domain and the US government categories. 
 
The Last-Time-Visited facet, seen in the ?median? search example of Figure 1, was 
incorporated to explore the use of personally meaningful categories in the overview. 
It currently requires running external scripts to update the database for each use, so it 
has not been used extensively or evaluated. An approach to implementing a practical 
classifier is discussed as future work in Chapter 7. 
 
 
 
137 
 
There is a trade-off between the number of facets displayed and the need to constrain 
the overview to a single screen. Additional facets also bring additional visual and 
cognitive complexity to the overview. With rich sets of categories (which yield wide 
and deep hierarchies), as the user navigates into the second and third-level categories, 
the number of categories often expands substantially, and this can cause the overview 
to grow beyond a single screen. The final study design used three facets: ODP topics, 
geography (drawn from the ODP Regional category), and US government (Figure 
36). Limiting the overview to three facets helped ensure that they did not extend 
beyond one screen and avoided ?facet overload.? As the study results reported in 
Chapter 5 show, this was an effective compromise for most searchers. An alternative, 
permitting searchers to customize facet and category display, is discussed as future 
work in Chapter 7. 
 
 
 
138 
 
 
Figure 36. The search interface for the final study coupled the ranked result list with a 
categorized overview based on topical, geographical and US government classifications. 
 
The design decisions made during this process are shown in Table 20. They illustrate 
the breadth of issues encountered when designing categorized overviews. 
 
 
139 
 
Table 20. Dimensions of the design space for categorized overviews. 
Design dimension Design choices Support in 
SERVICE? 
In study 3 
interface? 
Display of facets 
? Design-time 
? User-controlled 
 9 
 
 9 
 
Selection of facets 
? One-at-a-time 
? Simultaneous 
 9 
 9 
 
 9 
Display and 
selection of 
categories within a 
facet 
? Sequential 
? Simultaneous 
 9 
 9 
 9 
Display and 
selection of 
categories between 
facets 
? Sequential 
? Simultaneous 
 9 
 9 
 
 9 
Visible levels of 
hierarchy displayed 
? Current level 
? Current + children 
? Current + grandchildren 
? Display children in pop-up 
 9 
 9 
 
 9 
 9 
 
 
 9 
Overall depth of 
hierarchy 
? Fixed 
? Unlimited 
 9 
 9 
 9 (3) 
Display of 
?uncategorized? 
pseudo-category 
? Displayed 
? Hidden 
 9 
 9 
 9 
Display of empty 
categories 
? Displayed 
? Hidden 
 9 
 9 
 
 9 
Sort order of 
categories 
? Alphabetic 
? Thematic 
? Numeric (largest first) 
 9 
 9 
 9 
 
 
 9 
Actions / operations 
on overview 
? Filter/narrow 
? Broaden 
? Exclude category 
? Hide category 
? Brushing and linking 
? Edit/restructure hierarchy 
 9 
 9 
 9 
 
 9 
 9 
 9 
 9 
 
 9 
 
4.7 Summary of the SERVICE system 
The SERVICE architecture and infrastructure support two working categorizing 
overview interfaces: AOL music search and general web search. These search 
 
 
140 
 
interfaces were developed in accord with the analysis and emerging design principles 
for exploratory search interfaces. They support multi-faceted exploration of large sets 
of search results, providing categorized overviews based on meaningful and stable 
categories. The general web search interface was evaluated in the third study, as 
reported in the next chapter, which helped to validate and refine the principles and 
analysis. 
 
The SERVICE architecture defines a common Java interface to support easy plug-in 
of alternate category schemes. The technology is comprised of approximately 40 Java 
class files, which implement nine classifiers plus the two search interfaces. The two 
search interfaces use JavaServer Pages (JSP), hosted by an Apache Tomcat servlet 
container. The system runs on Windows and Linux, and uses the Java Database 
Connectivity (JDBC) API to integrate with MySQL and MS-Access databases. The 
system also implements a client-side logging facility that supports capture of any 
JavaScript events, including scrolling, mouse clicks and mouseovers, passing the 
timestamped events back to a Java-based logging tool. Four external data resources 
containing over 500 MB of data were processed to extract category information, 
using Java, Perl and PHP. The ideas embedded in the user interface will be useful to 
designers of other search interfaces, and it will be made available on the Categorized 
Search web page (http://www.cs.umd.edu/hcil/categorizedsearch/). The SERVICE 
system will be a flexible, extensible platform for additional research in categorizing 
search interfaces. 
 
 
141 
 
Chapter 5:  Study 3: Categorized overviews using ODP and US 
government categories 
Study 3 built on the results of the formative studies, scaling up from prototype to the 
working SERVICE 2.0 system. SERVICE enables support of general web search with 
the thematic classifications provided by the Open Directory Project (ODP). At least 
one commercial search engine (Exalead.com) has implemented categorized 
overviews of web search results using an adaptation of the ODP. However I am not 
aware of any studies of this approach. 
 
The previous studies used a diverse set of participants and scenarios. That was 
desirable because of the formative nature of those studies. The queries and search 
results were fixed, and the search tasks were narrowly described. This third study 
used a narrowly tailored scenario ? asking participants to generate newspaper article 
ideas for selected topics ? that would be meaningful for a homogeneous group of 
study participants, recruited primarily from journalism students. This allowed 
participants to perform real web searches using the working SERVICE prototype and 
permitted data to be collected in a more realistic context, enhancing the external 
validity of the study. To minimize the impact on the study?s internal validity, it was 
important to control the participant and scenario/task variables. This has been shown 
to be an effective way to balance the need for experimental control with realism when 
evaluating information retrieval systems (Borlund, 2000). 
 
 
 
142 
 
We first looked for a pool of subjects whose common background could be used as 
the basis for a simulated work task in an exploratory search scenario. Visiting 
Professor of Journalism Ira Chinoy was intrigued by the potential benefits of 
categorized overviews of search results for journalists and agreed to critique the study 
scenario and help recruit journalism students. He helped develop a narrowly tailored 
scenario that was meaningful for the homogeneous group of study participants.  
5.1 Research questions 
The research questions for this study were: 
1. How do searchers think differently about their search tactics when categorized 
overviews are available to augment the result list? 
2. What kinds of novel behaviors do searchers exhibit when categorized 
overviews are available?  
3. How do the benefits of categorized overviews of search results for exploratory 
search observed in the first two studies compare with those observed in the 
domain of general web search? Specifically, how do searchers experience 
different topical perspectives or unusual/surprising results? Do they notice 
categories that are particularly well-covered by search results? 
4. In what ways could the presence of categorized overviews affect the quality of 
the search outcome? 
5. When categorized overviews are used, what differences can we identify for 
the above questions between broad and narrow topics? 
 
 
 
143 
 
Evaluating exploratory search interfaces is challenging. The nature of exploratory 
tasks can make it difficult to specify objective performance measures like time to 
completion, error rates, precision, or recall. Completing an exploratory task often 
involves developing and refining an information need that is specific to the 
individual. Mistakes and back-tracking are part of the process as searchers learn 
concepts and vocabulary. Documents that have great utility or novelty to one person 
may have little value to another, because of variations in domain knowledge, 
interests, and previously encountered information, so establishing ground truth for a 
measure of relevance is problematic. Evaluations have assessed and rated the quality 
of a task outcome to generate quantitative measures on a lesson plan creation task 
(Kabel, Hoog, Wielinga, & Anjewierden) or measured incidental learning that 
occurred during a search session (Pirolli, Schank, Hearst, & Diehl, 1996). Exploratory 
tasks have been decomposed or narrowed to constrain the task (Janecek & Pu, 2005). 
A combination of quantitative and qualitative evaluation methods have also been used 
(Yee, Swearingen, Li, & Hearst, 2003).  
 
This study adopted the latter approach. Based on previous research (the formative 
studies and other studies), I expected to observe quantifiable and significant 
differences relative to the first three questions. They suggested hypotheses, described 
below, that were empirically tested. A qualitative approach extended the hypothesis 
tests by looking for phenomena not modeled by the research variables. For example, I 
expected that searchers would explore deeper in their result lists using the categorized 
overview, which was a testable hypothesis. I also anticipated that the interface would 
 
 
144 
 
prompt additional behavioral changes, but there was no a priori list; that would be 
developed from the data. Thus a combination of observation and semi-structured 
interview questions was used to investigate all five questions. 
 
5.2 Experimental conditions 
This study compared presentations of search results with and without categorized 
overviews. The categorized overviews were based on three facets: Topic, Geography 
and Government Agency. The topical facet (extracted from the ODP) classified web 
sites according to the 14 top-level categories shown in Table 21. The geographical 
facet was extracted from the ODP top-level category, Region. The US Government 
facet used a revised version of the hierarchy of the prior studies. The main difference 
was the addition of information to categorize state-level web sites. The topic and 
geography facets were chosen because they would apply to most search results and 
because they had been perceived favorably during the design and informal user 
testing. The government facet was included because participants in the earlier studies 
had commented on the credibility of government information. It was likely that the 
exploratory search scenario would induce subjects to look for government 
information because of its perceived credibility.  
 
Web sites were categorized into the top 2 levels of each hierarchy. Unlike the 
previous studies, in which all sites were placed into leaf nodes, in this study sites 
could be cataloged into any level of the hierarchy. Based on observations during those 
studies, I did not expect users to find this problematic. This structure was consistent 
 
 
145 
 
with the organization of the ODP and simplified development of the prototype. The 
categorized overviews were thus comprised of three 2-level facets. Each facet 
included a top-level pseudo-category (called Uncategorized) in which pages not 
categorized within that facet were placed. This allowed searchers to narrow results to 
just those not categorized within the facet. 
Table 21. Top level categories extracted from the ODP for the Topic facet. 
Arts Business Computers 
Games Health Home 
Kids and Teens News Recreation 
Reference Science Shopping 
Society Sports  
 
The study also attempted to investigate the effect of broad and narrow topics on the 
search process and outcomes. Because the topical facet was based on a general-
purpose classification and limited to a depth of two levels, it forms a set of broad 
categories. I wished to investigate whether broader topics were more amenable to the 
categorized overview than narrower topics. Ultimately, the variability in the 
participants? perceptions of topics foiled a rigorous comparison, but it did provide 
illuminating qualitative results. 
 
This study used a 2x2 within-subjects comparative design (N=24), with System 
(baseline or categorized overview) and Topic Type (broad or narrow) as the 
independent variables. The baseline condition presented search results as a typical 
ranked list, similar to Google (Figure 37). The experimental condition augmented the 
list with a categorized overview (Figure 38). 
 
 
146 
 
 
 
Figure 37. The baseline system (control condition) presented search results as a typical ranked 
list, similar to Google. It was referred to as the Kittery system in the study. 
 
 
 
147 
 
 
Figure 38. The experimental condition coupled the ranked result list with a categorized overview 
based on topical, geographical and US government classifications. This was referred to as the 
Portsmouth system in the study. 
 
5.3 Scenario and task design 
As in the formative studies, a high-level scenario was constructed around an 
exploratory search task. The task involved generating ideas for newspaper articles. 
Information seeking by journalists involves identification of an ?angle? or perspective 
on the story. The angle is often structured as a working hypothesis that drives 
 
 
148 
 
development of the story. Information needs are often uncertain because of the 
fluidity of evolving plans, and the story angle can change ? even at later stages of the 
information seeking process ?in response to external events (such as breaking news) 
or internal needs (e.g., increasing or decreasing the desired story length) . Journalists 
work under tight deadlines, often with only hours between story assignment and 
filing. These characteristics guided design of the scenario and task. Professor Chinoy 
verified that the scenario and task were appropriate for the journalism students we 
would recruit as study participants. They were also verified as part of the exit 
interview. The scenario and task were described to participants as follows: 
 
Imagine that you are a reporter for a national newspaper. Due to some recent 
events, your editor has just asked you to generate a list of ideas for a series of 
articles on [the topic, e.g. urban sprawl]. There?s a meeting in an hour, so she 
doesn?t need a lot of detail, but she wants a diverse list of 8-10 (or more) 
ideas for discussion. They should cover many different aspects of the topic, to 
appeal to a broad range of readers. Unusual or provocative ideas are good. 
You have about 10 minutes to conduct a short web search to find out what 
information is available and generate the ideas. Your results will be judged 
(by your imaginary editor) on the quality and diversity of ideas. For example, 
?public health impact? would be an okay idea. and ?obesity as a public 
health impact of urban sprawl? would be even better, because it is a bit more 
specific. As you use the search engine to explore and generate article ideas, 
enter them in the Collector form and include the web page that inspired your 
 
 
149 
 
idea. It is important that you enter the ideas, not notes like ?a good page?. 
Think of this list [point to the Collector] as a bullet list for the discussion. 
 
Matched pairs of topics (broad and narrow) were developed using the following 
procedures (see Table 22). For the broad topics, an informal survey of the literature 
generated a list of potential topics. For each potential topic, a query was constructed 
from the topic terms. A Google search was conducted, and for those searches that 
produced at least 500,000 results, the top 100 results were categorized into the three 
facets and the percentage of categorized results within each facet was computed. 
Topics that had similar percentages between the three facets were used in various 
combinations during the early study design and the pilot testing, and a pair of topics 
that participants found similar was selected. A similar procedure was used to select 
the narrow topics, starting with 250 topics from the 2004 TREC Robust Topics, 
eliminating topics with specific geographic references. To further narrow the scope of 
the topic for participants, the TREC description field was adapted and included in the 
description that was provided to participants. This procedure did not ultimately have 
the intended effect of providing broad and narrow topics, and it is critiqued in section 
5.11.1. 
 
 
 
 
 
 
150 
 
Table 22. Paired topics (broad and narrow) used for the study. This was the complete text read 
to the participants to describe the topic. 
 Topic 1 Topic 2 
Broad Workplace allergies (WA) The aging workforce (AW) 
Narrow Human smuggling - Human smugglers 
make money by smuggling, although 
the people being smuggled may or 
may not be willing participants. (HS) 
International art crime - Includes 
theft, fraud or embezzlement in 
the international buying or selling 
of art objects. (IAC) 
 
5.4 Hypotheses 
The research questions entailed two kinds of hypotheses, process-oriented and 
outcome-oriented. Process-oriented hypotheses addressed questions related to how 
the interface affected the search process and attitudes. Outcome-oriented hypotheses 
addressed the question of how the interface affected the quality of the participants? 
generated ideas and overall progress toward the scenario goal. 
5.4.1 Process-oriented hypotheses 
The categorized overviews make more terms visible to the searcher, at the slight cost 
of reducing the number of individual results visible on the screen. The analysis 
suggests that this could induce searchers to examine results distributed more evenly 
throughout the list, in effect, more deeply within the list. Because relevant search 
results can be found well beyond the top 10 or 20 results, this behavior could be 
beneficial. Studies of clustered search results have observed this beneficial behavior. 
For narrow topics, the effect could be reduced either because of fewer results being 
categorized (i.e., more uncategorized results) or less cognitive overlap between the 
topic and searcher domain and category knowledge. 
 
 
151 
 
 
H1a. Searchers will view (click on) results more deeply when using the categorized 
overview than when using the baseline. 
H1b. When using the categorized overview, searchers will view (click on) results 
more deeply for broad topics than for narrow topics. 
 
If searchers view deeper pages, then they might also collect pages from more deeply 
within the list, too. Similarly, if searchers use the categorized overviews to filter 
results, this might collect more pages that have been categorized (into any category) 
instead of uncategorized pages. For narrow topics, the effect could be reduced 
because of less use of the categorized overview. Although viewing pages more deeply 
within the result list is considered beneficial, these two behaviors could indicate that 
the categorized overview biased searchers. In the context of the study scenario, the 
decision to collect a page depends on the searchers? assessment of the utility of the 
page, and specifically whether that page suggested an idea. Thus a higher-level 
cognitive factor could be involved. 
 
H2a. Searchers will collect results more deeply when using the categorized 
overviews. 
H2b. When using the categorized overview, searchers will collect results more deeply 
for broad topics. 
 
 
 
152 
 
H3a. Searchers will collect a larger proportion of links from categorized facets (i.e. 
in ODP or government sites) when using the categorized overviews. 
H3b. When using the categorized overview, searchers will collect a larger proportion 
of links from categorized facets (i.e. in ODP or government sites) for broad topics. 
 
If searchers are exploring each set of results more fully with the categorized 
overview, then they might issue fewer queries overall during the time allotted for 
searching (12 minutes). For narrow topics, the effect could be reduced because of less 
use of the categories. 
 
H4a. Searchers will issue fewer queries with the categorized overviews. 
H4b. When using the categorized overviews, searchers will issue fewer queries for 
broad topics. 
 
The analysis suggests that the availability of categorized overviews will provide more 
information to the searcher and give them additional control over their results. This 
should lead them to rate the categorized overview interface higher than the baseline 
for organizing, exploring and gaining an overview of the results, finding useful pages 
and several measures of user satisfaction. The additional display and interaction 
elements, and the need to make more search decisions, could also cause users to 
perceive the categorized overview as more complex. For narrow topics, the effect 
could be reduced because of less use of the categorized overview. 
 
 
 
153 
 
H5a. Searchers will find it easier to explore search results with the categorized 
overview. 
H5b. When using the categorized overview, searchers will find it easier to explore 
search results for broad topics. 
 
H6a. Searchers will agree more strongly that the system provided a good overview of 
information available about this topic on the Web when using the categorized 
overview. 
H6b. When using the categorized overview, searchers will agree more strongly that 
the system provided a good overview of information available about this topic on the 
Web for broad topics. 
 
H7a. Searchers will agree more strongly that the system organized the results well 
when using the categorized overview. 
H7b. When using the categorized overview, searchers will agree more strongly that 
the system organized the results well for broad topics. 
 
H8a. Searchers will agree more strongly that the system helped them assess results 
and decide what to do next when using the categorized overview. 
H8b. When using the categorized overview, searchers will agree more strongly that 
the system helped them assess results and decide what to do next for broad topics. 
 
H9. Searchers will rate the categorized overview easier to use than the baseline. 
 
 
154 
 
H9b. When using the categorized overview, searchers will rate the system easier to 
use for broad topics. 
 
H10a. Searchers will rate the categorized overview more stimulating than the 
baseline. 
H10b. When using the categorized overview, searchers will rate the system more 
stimulating for broad topics. 
 
H11a. Searchers will rate the categorized overview more ?wonderful? than the 
baseline. 
H11b. When using the categorized overview, searchers will rate the system more 
?wonderful? for broad topics. 
 
H12a. Searchers will rate the categorized overview more satisfying than the baseline. 
H12b. When using the categorized overview, searchers will rate the system more 
satisfying than the baseline. 
 
H13a. Searchers will rate the categorized overview more complex than the baseline. 
H13b. When using the categorized overview, searchers will rate the system more 
complex than the baseline. 
 
 
 
155 
 
5.4.2 Outcome-oriented hypotheses 
Categorized overviews may enable searchers to make more connections between the 
search results displayed to them and their existing knowledge. Although there are 
more intervening variables between interface and outcomes than between interface 
and behavior, the categorized overviews might help searchers become more familiar 
with the topic, find more useful information, make more progress towards the 
scenario goal, and produce better article ideas. For narrow topics, the effect could be 
reduced either because of less use of the categorized overview or less cognitive 
overlap between the topic and searcher domain and category knowledge. 
 
H13a. Searchers will feel more familiar with the topic with the categorized overview. 
H13b. With the categorized overview, searchers will feel more familiar with the topic 
for broad topics. 
 
H14a. Searchers will find more useful information with the categorized overview. 
H14b. With the categorized overview, searchers will find more useful information for 
broad topics. 
 
H15a. Searchers will make more progress toward the scenario goal with the 
categorized overview. 
H15b. With the categorized overview, searchers will make more progress toward the 
scenario goal for broad topics. 
 
 
 
156 
 
H16a. Searchers will produce higher quality article ideas with the categorized 
overview. 
H16b. With the categorized overview, searchers will produce higher quality article 
ideas for broad topics. 
5.5 Participants 
Twenty-four participants (5 male, 19 female) were recruited primarily from the 
University of Maryland?s Philip Merrill College of Journalism and paid $30 for their 
participation. Campus colleagues agreed to distribute an email solicitation to mailing 
lists. Respondents were asked to complete an online questionnaire to collect basic 
demographic information, yielding a pool of 59 potential participants. This included 
journalism and non-journalism students. The journalism students were invited to sign 
up for experiment sessions via an online scheduling form, yielding 20 participants. 
Non-journalism students were then invited to sign up for the remaining sessions. 
They ranged in age from 18 to 27 years, with a median age of 20. Twenty-one were 
undergraduate students, one was a graduate student and two had graduate degrees.  
Participants reported being experienced and proficient at web searching. All reported 
at least three years of search experience, and all but two reported searching at least 
once per day, with two reporting searching 1-2 times per week. They all reported 
being successful in their searches ?Most of the time? or ?Always or almost always.? 
When asked to rate their search skills on a 1-5 scale (1 = novice, 5 = expert), nine 
reported a 3, twelve reported a 4, and three reported a 5. All used the Google search 
engine and 14 reported using other search engines. All reported performing searches 
 
 
157 
 
for class research, 23 searched for entertainment or recreation, and 22 searched for 
news and information on events. 
 
5.6 Materials 
5.6.1 Interfaces 
The search interfaces were assigned neutral names (Kittery for the baseline and 
Portsmouth for the experimental) and displayed alongside a small web application, 
the Collector form (Figure 39). The Collector form provided fields to capture ideas 
and the relevant URLs, and listed them in reverse chronological order so participants 
could refer to them during the session. The screen resolution was 1280x1024 pixels. 
Prior to search, the search window was set to 1024 pixels wide and the collector 
window to 256 pixels wide. 
 
 
 
158 
 
 
Figure 39. The interface used by participants was comprised of the system under test (left) and 
the Collector form (right).  
5.6.2 Script and training videos 
A written script provided participants with background information on the study, to 
describe the scenario and task and to introduce the training task. Three short (1-3 
minute) training videos, produced using Camtasia Studio, introduced participants to 
the two interfaces and the Collector form. 
5.6.3 Online questionnaires 
Three online questionnaires were used during the experimental sessions (see 
Appendix D). An entry questionnaire collected participants? demographic and search 
experience data. A pre-search questionnaire captured knowledge of and interest in 
 
 
159 
 
each topic prior to the search. A post-search questionnaire repeated the pre-search 
questions and collected reactions to the topic, interface and search process. Paper 
print-outs of all forms were available in case of communication problems with the 
external server (but were never needed). 
5.6.4 Paper forms 
The informed consent form was approved by the University of Maryland Institutional 
Review Board (see Appendix C). A payment acknowledgement form was used to 
verify that subjects had received payment for their participation in the study. One 
paper checklist ensured completion of all parts of the experimental procedure in the 
correct order, and another checklist ensured that participants were exposed to the 
basic system features and task elements during the training task. The exit interview 
questions were read to the participants from a paper form. 
5.6.5 System technology 
Participants used an IBM T42p laptop with a 15 inch display, 1 GB of RAM, and a 
1.8 GHz Intel Pentium M processor running Windows XP Professional. An external 
keyboard and mouse were attached, with an external pair of speakers for the training 
videos. Camtasia Studio 3 was used to capture screen video and audio, with a desktop 
microphone. The SERVICE 2 web search prototype was configured in two versions 
(one for each interface), both with logging enabled to capture category and result list 
clicks, as well as mouseover and scroll events. A Tomcat 5 server running on the 
laptop hosted the search application, interfacing to the search engine, managing the 
fast-feature classifiers, and generating the user interfaces. It also hosted the Collector 
 
 
160 
 
application. The applications connected via JDBC to a MS-Access database that was 
used to cache search results and to store the ideas and links. An Apache web server 
was configured as a proxy server to log all pages visited during the experiment 
sessions. This was desirable because the JavaScript based logs only capture web 
pages directly visited from the search results page. An open source web survey 
system, phpESP (phpesp.sourceforge.net), was adapted for the online questionnaires. 
This was hosted on a Redhat Linux server with a MySQL database to store 
questionnaire results. Participants used the Mozilla Firefox browser (v. 1.0.7) 
configured to use the proxy server for non-local HTTP requests. The laptop was 
connected to the Internet via the campus T3 connection. 
5.7 Procedure 
The experiment sessions were individually conducted in an office on the University 
of Maryland campus (Figure 40). As participants arrived, they were welcomed and 
provided a short introduction to the study, informed that they would be asked to 
perform four searches, answer several questionnaires, and that they would receive 
$30 at the end of the session. After an opportunity to ask questions, they signed two 
copies of the informed consent form. They were invited to adjust the chair, keyboard 
and mouse for their comfort and offered water and some candy snacks. After they 
signed the informed consent form, they completed the online entry questionnaire, 
providing demographic and search experience data, and viewed the training video 
appropriate to the first interface condition. Following the video, the scenario and task 
were described, and they used the system for a training task on the topic ?urban 
sprawl.? They were encouraged to ask questions and think out loud, using a think-
 
 
161 
 
aloud protocol (Ericsson & Simon, 1984). The training checklist ensured that they 
used the basic system features on their own or with prompting. When the checklist 
was completed, they were asked if they had any questions and if they were ready to 
continue. 
 
 
Figure 40. The experimental setup. Study participants sat in front of the computer, and the 
observer sat to their left. 
 
They were then presented with the first topic. They completed the online pre-search 
questionnaire, performed the timed search and completed the post-search 
questionnaire. This was repeated this for the second topic. After a short break, they 
were shown the second interface and given time to become comfortable with it. The 
 
 
162 
 
remaining two searches were then completed as before. The session concluded with a 
semi-structured exit interview and payment of the $30. 
 
The order of the training videos varied slightly depending on the interface 
presentation order. When the baseline interface was used first, the video for the 
collector form only was shown prior to the training task, and then the video for the 
experimental interface was shown after the break, immediately before they would use 
that interface. When they used the experimental interface first, they viewed the video 
for both the collector and the experimental interface prior to the training task. An 
alternative approach would have shown all videos and conducted all trainings at the 
beginning of the session. After discussion with colleagues, I decided that participants 
would be more likely to forget how to use the experimental interface if they weren?t 
shown it immediately prior to use. 
5.8 Pilot testing 
Before conducting the study, I pilot tested portions of the materials and procedures 
with six participants, and then ran the entire final experimental protocol with six 
others. During the pilot testing, various pairings of broad and narrow topics were used 
to select the final pairing. I observed how participants responded to the topics, and 
after the session asked them to compare the pairs of topics, and used this feedback to 
select the final pairs of topics for each topic type. The training time was extended to 
permit participants to work until they felt comfortable with both the systems and the 
task, and the scenario and task descriptions were refined. The final pilot tests 
 
 
163 
 
confirmed that the session duration was about two hours and 15 minutes, including 
about 30 minutes for the semi-structured exit interview. 
5.9 Analysis methodology 
5.9.1 Quantitative analysis methodology 
The quantitative data to be analyzed included the original location of items in the 
search result lists, counts, and subject preferences rated on an interval scale. In all 
cases, the null hypothesis was that there was no difference between the groups. A p-
value of 0.05 was used to reject that hypothesis, yielding a 5% chance of incorrectly 
rejecting the null hypothesis (a false positive, or type I error). Marginally significant 
differences (p <0.10) are also reported. Except where noted statistical tests were 
performed with the R Statistics package, version 2.2.0 (R Development Core Team, 
2005). R is an open source implementation of the S language and environment which 
was developed at Bell Laboratories. 
 
Search result location data 
The original location of selected or collected items in the search result list was treated 
as an interval scale. A professional statistician confirmed that an ANOVA analysis 
would be appropriate for these data. For all significant ANOVA results, the normal 
Quantile-Quantile (Q-Q) Plots were examined to confirm that the residuals were 
distributed normally. Where the raw data did not follow a normal distribution, the raw 
data was transformed using a logarithmic transform, an accepted technique for 
handling non-normal distributions (Jaccard, 1983). 
 
 
 
164 
 
Collecting pages from categorized facets 
When entering ideas into the idea collector, searchers simultaneously entered the 
URL of the page that prompted the idea, referred to here as the collected page. For 
each collected page a boolean variable (InAnyFacet) was computed indicating 
whether the page was found in any of the facets (topic, geographic, or government). 
Chi square analyses were used to test the relationship between the InAnyFacet 
variable and the System and Topic variables. For the System analysis, where there 
contingency table contains two rows and two columns, the Yates' continuity 
correction was applied. This is a commonly, albeit not universally, applied adjustment 
intended to provide a better estimate of the significance level (Jaccard, 1983). A 
factorial logistic regression analysis would also have been appropriate, and would 
have allowed me to additionally investigate the interaction between System and Topic 
variables. However, this was not a compelling interest for this study, and the Chi 
square tests are simpler to interpret and report.  
 
Quality of generated ideas 
The quality of the generated ideas was assessed using newsworthiness criteria 
suggested by Professor Chinoy. High quality ideas would pose a question or paradox, 
contain conflict and human interest elements, indicate the context of the idea, and 
reflect intangible elements such as ?coolness?. Other factors included timeliness, 
potential impact, and proximity. Diverse ideas were preferred, and redundant ideas 
were ignored. Each idea was rated on a scale from 1 (poor) to 9 (excellent) by a single 
judge (the researcher). The ideas were assessed blind, without knowledge of the 
 
 
165 
 
system used or participant, using a MS-Access form. Two passes were made through 
the ideas for each topic. This allowed me to become familiar with all the ideas before 
assigning final quality rating. 
 
To test the relationship between the Idea Quality variable and the System factor, a 
Wilcoxon rank sum test was used. This is the non-parametric equivalent of the 
independent samples t-test. It is appropriate here because the independent variable 
(System) is categorical with two levels, and the dependent variable (Idea Quality) is 
ordinal. To test the relationship between Idea Quality and Topic, a Kruskal-Wallis 
test was used. It is the non-parametric equivalent of a one-way ANOVA, and is used 
here because the independent variable is categorical with four levels. As the results 
below indicate, no statistically significant differences were detected for the System 
factor. If any had been detected, a second assessor would have rated the ideas, and the 
inter-rater agreement would have been checked to provide a more rigorous 
assessment. 
 
Subjective measures 
Subjective measures used Likert scales and semantic differentials on a 9 point scale. 
ANOVA statistics were used as above to identify significant differences. 
5.9.2 Qualitative analysis methodology 
The research questions were addressed qualitatively by direct observation, review of 
selected video and participant response to questions, and by a limited quantitative 
analysis of responses to three selected questions. Three forms of raw data were 
 
 
166 
 
available for this purpose. First, all sessions, including training, searches and 
interviews were recorded and participants were instructed to think out loud while they 
searched, which enabled me to flag interesting actions or comments in my 
observation notes and then review the sessions afterwards. This provided a total of 
about 100 minutes of audio and video per session. Second, immediately after each 
search, subjects were asked, ?What are your thoughts at this point?? They were asked 
to respond verbally or in written form on the post-search questionnaire. This typically 
yielded a 1-2 minute reply or 3-5 written sentences. Third, the exit interview included 
10 open-ended questions, usually lasting 20-35 minutes. 
 
Participants were instructed to think out loud during their searches, with the following 
request: 
Please think out loud as you take each action, for example, when you enter a 
query, click on something, or scroll a page. Briefly say why you did it and 
then tell me your reaction to the system?s response. I?m also interested in 
what?s good or bad, problems or insights, and anything confusing. 
During the training task, they were encouraged to think out loud, but due to the 
limited amount of time for each session, they were not specifically trained in the use 
of the think-aloud technique (Ericsson & Simon, 1984). Instead they were encouraged 
during the training tasks and prompted several times at the beginning of each search. 
Several of the participants responded well and provided useful concurrent reports. 
Others were more reticent during the search, but all subjects responded eagerly during 
the exit interview.  
 
 
167 
 
 
Three open-ended questions from the exit interview were chosen for a detailed 
quantitative analysis. These questions related directly to the research questions, and I 
expected that the responses would help identify the concepts and issues that were 
most salient to searchers as they reflected on their experience. The selected questions 
were: 
? Did the categorized overview change the way you searched? Can you 
describe an example? 
? Can you describe an example where the categorized overview [helped; OR 
hindered, frustrated or mislead ? whichever not indicated in previous 
question]? 
? Did you notice any difference in how you used the categorized overview each 
time? Can you describe an example? 
Notice that in the second question, the object was to elicit feedback on whichever 
aspect (positive or negative) the participant did not mention when answering the first 
question. 
 
Responses for each question were transcribed into an Access database table, and an 
inductive approach was used to develop and assign an initial code list. Although a 
qualitative data analysis tool such as NUD*IST or NVivo could have been used, the 
availability of Access and the limited nature of the analysis made Access preferable. 
Each response was reviewed, noting salient comments that appeared relevant to the 
research questions, and assigning a short label to sets of related comments. After 
 
 
168 
 
responses from 12 participants were transcribed and coded, the codes were reviewed 
and obvious duplicates were merged. These codes were divided into five groups: 
? Behavior differences 
? Cognitive and affective impacts 
? Judgments of outcomes 
? Facet usage 
? Miscellany 
The code also noted whether the comment reflected a positive or negative judgment 
by the participant (some comments were neutral or did not have a judgment element). 
Each response was then entered in the Access table. Before the remaining 12 
responses were coded, the code list was again reviewed. A second full pass was 
conducted to review the initial code assignments and assign a small number of new 
codes. This yielded a set of 64 codes (see Appendix E for complete list). The five 
code groups were used to organize the subsequent analysis, and individual code 
values were used to prompt consideration of specific behaviors, judgments, etc. I 
reviewed my notes, participant responses to the interview questions and the session 
recordings to analyze each code. 
 
Validity 
This analysis represents a principled approach answering to the research questions, 
drawing on the naturalistic inquiry paradigm (Guba & Lincoln, 1982). It 
complements the quantitative analysis, which seeks to identify commonalities across 
search experiences, by illuminating differences in search experiences. As a step 
 
 
169 
 
toward validating the analysis, this section has been peer-reviewed by a colleague in 
the College of Information Studies, Katy Newton Lawley.
2
 
 
The three interview questions required introspection and reflection. Introspection and 
reflection can allow the investigator to gain access to thoughts that are ?mediated by 
knowledge structures or artefacts that we design and use,? (Nielsen, Clemmensen, & 
Yssing, 2002) Categorized overviews are designed expressly to expose specific 
knowledge structures, thus this form of analysis is appropriate for examining 
responses to categorized overviews. Verbal reports, and retrospective reports in 
particular, are subject to known problems and limitations (Ericsson & Simon, 1984). 
Respondents may misremember a thought or action, or inadvertently use inferences 
instead of memory. The form of the verbal probe or even its emphasis can affect the 
information provided. Subjects were asked to report on aspects of their thoughts and 
actions that they did not necessarily attend to at the time of the interaction. Inevitably, 
respondents make judgments about past thoughts, decisions or actions that emphasize 
some and distort or overlook others. To minimize these problems, the questions were 
constructed to elicit specific examples and concrete details in conjunction with 
reflection/introspection. 
                                                 
2
 This section has benefited from Ms. Lawley?s critique and advice, but any errors or deficiencies in 
this section are, of course, the responsibility of the author. 
 
 
170 
 
5.10 Results 
These sophisticated users coping with challenging search tasks over a two hour period 
produced a wealth of data. The quantitative results provide a baseline for future 
studies while showing some differences in behavior and strong preferences. They do 
not show objective differences in outcomes. The qualitative data include thoughtful 
comments indicating many strengths and some weaknesses of the categorized 
overviews. 
5.10.1 Quantitative results 
5.10.1.1 Breadth of Topics 
The original intent of the analysis was to consider the pairs of broad (aging workforce 
and workplace allergies) and narrow (human smuggling and international art crime) 
topics as equivalent. This would have permitted a within-subject analysis, but subject 
comments during the first dozen sessions raised questions about the validity of this 
matched pair assumption. For the latter half of the sessions, a question was added that 
specifically asked participants to rate the breadth of the topics. Analysis of their 
responses using a one-way ANOVA indicated no significant differences. This 
confirmed that participants did not perceive topic breadth consistently (Figure 41). In 
the analysis, the Topic factor was therefore treated as a between-subjects variable 
with four levels: aging workforce (AW), human smuggling (HS), international art 
crime (IAC), and workplace allergies (WA). Where significant differences by topic 
were detected, the Camtasia video and observation notes were reviewed to investigate 
the differences. 
 
 
171 
 
 
Figure 41. Subject assessment of topic breadth (N=12). Participants did not perceive the breadth 
of the topics significantly differently. 
5.10.1.2 Original location of viewed (clicked-on) pages in search result list 
Searchers viewed (clicked on) a total of 924 pages from the search results. A 
histogram of the raw data reveals that the results are highly skewed (Figure 42). The 
log-transformed values do appear to be normally distributed, so they are used here. 
The results of a 2 (system) x 4(topic) factorial analysis indicated a significant 
difference by system F(1,919)=8.96, p<0.01 and by topic F(3,919)=5.73, p<0.01, and 
a marginally significant interaction between system and topic F(3,919)=2.19, p<0.10. 
The Normal Quantile-Quantile (Q-Q) plot shows that the residuals are moderately 
skewed, but not enough to invalidate the ANOVA results (Figure 43). Searchers 
viewed pages at a mean (median) depth of 28.4 (18) when using the categorized 
overview, whereas they viewed pages at a mean depth of 22.3 (12) with the baseline. 
The plot in Figure 45 shows modest but noticeable differences in the distribution of 
viewed pages of views. With the categorized overview, searchers viewed results from 
a broader portion of the result list. 
 
Broad 
 
 
 
 
Narrow 
 
 
172 
 
Tukey post-hoc tests on topic indicated that there were significant differences 
between two of the four topic pairs: IAC-AW and IAC-WA. The mean (median) 
depth of pages viewed for IAC was 20.2 (10), whereas the mean depths for AW and 
WA were 29.6 (19) and 27.0 (20), respectively. In general, searchers searched 
somewhat more deeply with the categorized overview than the baseline across all 
topics (Figure 46 and Figure 47). The exception was with HS, where the mean depth 
was the same between systems. The log data indicates that two of the subjects who 
used the baseline system for the HS topic viewed substantially more pages than the 
other subjects (22 and 18, versus an average of 9.4 for the other 22 subjects). Video 
of those two searches shows that both subjects scrolled far down the result list and 
selected several pages from deep in the list. One of these subjects indicated that she 
found the baseline interface easier to use. 
Original location
F
r
eque
ncy
0 20 40 60 80 100
0
100
200
300
40
0
log(Original location)
F
r
eque
ncy
012345
0
2
0
4
0
6
0
8
0
1
00
120
 
Figure 42. Histograms of a) original location of search result in list, and b) log(original location). 
 
 
173 
 
-3 -2 -1 0 1 2 3
-3
-2
-1
0
1
2
Normal Q-Q Plot
Theoretical Quantiles
S
a
m
p
le
 Q
u
a
n
t
il
e
s
 
Figure 43. Normal Quantile-Quantile plot of the residuals for the log(original location) model. 
Residuals are moderately skewed, but not enough to invalidate the ANOVA results. 
 
Figure 44. Original location of viewed pages in search results, a) by System
*
, and b) by Topic
+
 
(N=924). (Note: For all boxplots, the bold line in the middle of the box indicates the median; the 
upper and lower boundaries of the box indicate the first and third quartiles, and the whiskers 
extend 1.5 times the interquartile range from the box. For all figures, statistically significant 
differences, p<0.05, are marked with an asterisk in the caption, and marginally significant 
differences, p<0.10, are marked with a plus sign.) 
 
 
174 
 
 
 
Figure 45.  Percent of pages viewed by original location of page within search results, for each 
system. The interface displayed approximately 10 results per screen. The dashed line shows the 
initial screen break.  
 
Figure 46. Interaction plot of mean depth of viewed pages for System and Topic factors. Except 
for the human smuggling topic, searchers viewed pages more deeply using the Categorized 
overview system. The largest change between systems was for the workplace allergies topic. 
 
 
175 
 
 
Figure 47. For each topic, percent of pages viewed by original location of page within search 
results.  
 
5.10.1.3 Original location of collected pages in search result list 
Searchers collected 611 pages from the search results during their searches. As with 
the viewed pages, the raw data are skewed, but the log-transformed values appear to 
have a normal distribution (Figure 48). The results of a 2 (system) by 4 (topic) 
factorial analysis were not significant, although there was a marginally significant 
effect for topic F(3, 603) = 2.48, p<0.10. The mean (median) depth of collected pages 
was 26.1 (16) for both the baseline and categorized overview. The histograms in 
Figure 50 and the plot in Figure 50 show that the distribution of original locations is 
similar between the two systems. 
0
5
10
15
20
25
30
35
40
45
50
55
60
65
70
1-10 11-20 21-30 31-40 41-50 51-60 61-70 71-80 81-90 91-100
0
5
10
15
20
25
30
35
40
45
50
55
60
65
70
1-10 11-20 21-30 31-40 41-50 51-60 61-70 71-80 81-90 91-100
0
5
10
15
20
25
30
35
40
45
50
55
60
65
70
0
5
10
15
20
25
30
35
40
45
50
55
60
65
70
1-10 11-20 21-30 31-40 41-50 51-60 61-70 71-80 81-90 91-100
Baseline
Categorized overview
 
 
176 
 
 
Figure 48. Histograms of a) original location of collected pages, and b) log(original location). 
 
 
Figure 49. Original location of collected pages, a) by System, and b) by Topic
+
 (N=611). 
 
 
 
177 
 
 
Figure 50. Percent of pages collected by original location of page within search results. The 
interface displayed approximately 10 results per screen. The dashed line shows the initial screen 
break. 
 
The interaction diagram in Figure 46 shows that the largest change in mean depth of 
viewed pages between systems occurred with the workplace allergies topic. To see if 
this change was reflected in the pages they chose to collect, the original location of 
collected pages was computed for each topic. Figure 51 shows that for this topic 
participants collected more pages from the lower 40 locations in the result list and 
fewer from the top 30 locations.  
 
 
178 
 
  
Figure 51. For each topic, percent of pages collected by original location of page within search 
results.  
 
5.10.1.4 Proportion of pages collected from categorized facets 
Searchers collected a total of 679 pages, including pages that were not in any search 
results (e.g. pages found by following links). For each collected page a boolean 
variable (InAnyFacet) was computed indicating whether the page was found in any of 
the facets (topic, geographic, or government). The proportion of categorized pages 
differed significantly by System, ?
2
(1, N = 679) = 5.11, p < .05, and Topic, ?
2
(1, N = 
679) = 18.00, p < .001. The difference for the System factor is 7.5 percentage points, 
suggesting that the categorized overview biased searchers toward categorized pages. 
The percentage for the workplace allergies topic is substantially different from the 
0
5
10
15
20
25
30
35
40
45
50
55
60
65
70
1-10 11-20 21-30 31-40 41-50 51-60 61-70 71-80 81-90 91-100
0
5
10
15
20
25
30
35
40
45
50
55
60
65
70
1-10 11-20 21-30 31-40 41-50 51-60 61-70 71-80 81-90 91-100
0
5
10
15
20
25
30
35
40
45
50
55
60
65
70
0
5
10
15
20
25
30
35
40
45
50
55
60
65
70
1-10 11-20 21-30 31-40 41-50 51-60 61-70 71-80 81-90 91-100
Baseline
Categorized overview
Workplace allergies
Aging workforce 
Human smuggling
International art crime 
 
 
179 
 
other topics (12-17 points). This may be related to the relative ease of finding 
information about the workplace allergies topic, which several participants 
commented on. 
 
Table 23. Percent of collected pagesthat had been categorized, by System
*
. 
System Percent  
Categorized 
Baseline 75.4 % 
Categorized overview 82.7 % 
 
Table 24. Percent of collected pages that were categorized, by Topic
*
. 
Topic Percent  
Categorized 
Aging workforce 85.3 % 
Human smuggling 80.2 % 
International art crime 82.7 % 
Workplace allergies 68.5 % 
 
5.10.1.5 Number of queries issued during searches 
Searchers conducted a total of 96 searches. All subjects except one issued at most 10 
queries. One subject issued 15 queries during a search, and that outlier is removed 
from the following analysis. The raw data are skewed, but the log-transform values 
are somewhat more normally distributed (Figure 52). The results of a 2 (system) x 4 
(topic) factorial analysis indicated a significant difference by system F(1,87)=7.15, 
p<0.01 and by topic F(3,87)=3.63, p<0.05. The Normal Q-Q plot shows that the 
residuals are reasonably distributed (Figure 53). The mean (median) number of 
 
 
180 
 
queries per search was 3.0 (2) for the categorized overview system and 3.5 (3) for the 
baseline. Tukey post-hoc tests on topic indicated there were significant differences 
between the WA and IAC topics. The mean (median) number of queries per search 
was 3.3 (3) for the WA topic and 3.1 (3) for the IAC topic. The magnitudes of 
differences, although statistically significant, are modest. 
 
 
Figure 52. Histograms of a) queries per search and log(queries per search). 
 
 
 
181 
 
-2 -1 0 1 2
-
1
.5
-1
.
0
-
0
.5
0
.
0
0
.
5
1
.
0
1
.5
Normal Q-Q Plot
Theoretical Quantiles
Sa
m
p
le
 Q
u
a
n
t
i
le
s
 
Figure 53. Normal Q-Q plot of residuals for the number of queries per search. 
 
 
Figure 54. The number of queries per search, a) by System
*
, and b) by Topic
*
 (N=95).  
 
 
 
182 
 
 
Figure 55. Interaction plot of mean number of queries per search for System and Topic factors. 
 
5.10.1.6 Ease of exploration of search results 
The results of a 2 (system) x 4 (topic) factorial analysis showed a marginally 
significant difference by system F(1,88)=2.99, p<0.10 and no significant difference 
by topic. 
 
 
Figure 56.  Ease/difficulty (1=difficult, 9=easy) of exploring search results, a) by System
+
, and b) 
by Topic (N=96). 
 
 
183 
 
 
5.10.1.7 Got a good overview of the information available on the Web for topic 
The results of a 2 (system) x 4 (topic) factorial analysis showed no significant 
difference by system and a marginally significant difference by topic F(3,88)=2.57, 
p<0.10. 
 
 
Figure 57. Agreement that they got a good overview of the topic, a) by System, and b) by Topic
+
 
(N=96). 
 
5.10.1.8 Organization of search results 
The results of a 2 (system) x 4 (topic) factorial analysis indicated a significant 
difference by system F(1,88)=42.11, p< 0.001 and no significant difference by topic. 
The Normal Q-Q plot shows that the residuals are normally distributed. The mean 
agreement for the categorized overview system was 7.4, and the mean agreement for 
the baseline system was 4.9. The corresponding medians were 7 and 5. 
 
 
184 
 
 
Figure 58. Normal Q-Q plot of residuals for the organization of search results measure. 
 
 
Figure 59. Agreement that system organized results well, a) by System
*
, and b) by Topic (N=96). 
 
5.10.1.9 Agreement that system helped assess results and decide what to do next 
The results of a 2 (system) x 4 (topic) factorial analysis indicated a significant 
difference by system F(1,88)=13.63, p<0.001 and no significant difference by topic. 
The Normal Q-Q plot shows that the residuals are slightly skewed, but this is unlikely 
to affect the overall significance of the system difference. The mean agreement for 
the categorized overview system was 6.5, and the mean agreement for the baseline 
 
 
185 
 
system was 5.3. The corresponding medians were 7 and 5. 
 
-2 -1 0 1 2
-4
-3
-2
-
1
0
1
2
Normal Q-Q Plot
Theoretical Quantiles
S
a
m
p
le
 Q
u
a
n
t
i
le
s
 
Figure 60. The normal Q-Q plot shows a slightly skewed distribution of residuals. 
 
 
Figure 61. Agreement that interface helped assess results, a) by System
*
, and b) by Topic (N=96). 
 
5.10.1.10 Adjectives to describe system 
The 2 (system) x 4 (topic) factorial analysis for each of the eight system adjectives 
(semantic differentials) indicated no significant differences by topic for any measure, 
 
 
186 
 
but three measures showed significant differences by system: Terrible/wonderful 
F(1,88)=7.05, p<0.01; dull/stimulating F(1,88)=13.73, p< 0.001; and 
disorganized/organized F(1,88)=45.7, p<0.001. The analysis indicated marginally 
significant differences by system for the frustrating/satisfying measure F(1,88)=3.03, 
p<0.10.  
 
Figure 62.  Adjectives by System. 
 
5.10.1.11 Familiarity with topic 
The results of a 2 (system) x 4 (topic) factorial analysis indicated no significant 
difference by system and a significant difference by topic F(3,88)=5.71, p<0.01. The 
normal Q-Q plot shows that the residuals are reasonably distributed. The mean 
change in familiarity was 3, that is, searchers rated their familiarity 3 points higher on 
a 9 point scale after the search. Tukey post-hoc tests on topic indicated significant 
 
 
187 
 
differences between the workplace allergies topic and the other three topics. The 
mean changes for AW, HS, IAC, and AW were 2.5, 2.1, 2.3, and 4.2, respectively. 
This is consistent with comments by several participants, who commented on the 
relative ease of finding information about the workplace allergies topic. 
 
-2 -1 0 1 2
-4
-
2
0
2
4
Normal Q-Q Plot
Theoretical Quantiles
S
a
mple
 Q
uant
il
es
 
Figure 63. The normal Q-Q plot shows a normal distribution of residuals, indicating a good fit 
for the model. 
 
 
Figure 64. Change in familiarity after search, a) by System, and b) by Topic
*
 (N=96). 
 
 
 
188 
 
5.10.1.12 Finding useful information 
The results of a 2 (system) x 4 (topic) factorial analysis showed no significant 
difference due to either factor.  
 
Figure 65. Useful information responses, a) by System, and b) by Topic (N=96). 
 
5.10.1.13 Progress toward scenario goal 
The results of a 2 (system) x 4 (topic) factorial analysis showed no significant 
difference due to either factor.  
 
Figure 66. Progress toward scenario goal, a) by System, and b) by Topic (N=96). 
 
 
 
189 
 
5.10.1.14 Idea quality 
Searchers generated a total of 679 ideas. Idea quality was generally low, at least in 
part because of the time limit, which several participants commented on. Although a 
nine-point scale was used (1 = poor, 9 = excellent), the highest rating assigned was 5. 
A Wilcoxon rank sum test did not detect a significant difference in Idea Quality by 
System. A Kruskal-Wallis test detected a marginally significant difference by Topic, 
p<0.10. 
 
Figure 67. Distribution of idea quality ratings, a) by System, and b) by Topic
+
 (N=679; idea 
rating 1 = poor, 9 = excellent). 
 
The mean idea ratings were computed for the four non-journalism students and 
compared to the journalism students. They were almost equivalent. The mean idea 
rating was 2.27 for journalism students and 2.29 for non-journalism students. 
5.10.1.15 Category use 
Searchers used an average of 5.4 categories per search with the categorized overview. 
Most of the selected categories were used only once or twice, which suggests strong 
variation in individual assessments of the utility of each category, except for a few 
0
10
20
30
40
50
60
70
12345
Idea rating
Pe
r
c
e
n
t
Baseline Categorized overview
0
10
20
30
40
50
60
70
12345
Idea rating
Pe
r
c
e
n
t
AW HS IAC WA
 
 
190 
 
highly used categories. Of the 259 instances of category use, 68 were for categories 
only selected once, and 44 were for categories selected twice. The most popular 
categories are shown in Table 25.  
 
Table 25. Top 3 categories used for each topic. 
Topic Category Distinct users Absolute use 
/Health 7 7 
/Home 5 5 
/Business 4 (tie) 5 
AW 
 
/Society 4 (tie)  
   
/Society 6 7 
/North America 5 6 
HS 
/News 3 5 
    
/Arts 9 10 
/News 6 10 
IAC 
/Reference 6 8 
    
/Health 9 12 
/Business 6 7 
WA 
 
/Society 6 6 
 
5.10.1.16 Preferred system for task types 
During the exit interview for the last 12 sessions, participants were asked which 
system they would rather use for four new tasks. They could respond with a system or 
say no preference. The sequence of the four tasks was randomized. The responses 
suggest that searchers would prefer a categorized overview for the comparison and 
 
 
191 
 
exploratory task. They would prefer the baseline system for the known item task, and 
were evenly divided for the simple informational task (Table 26). 
 
Table 26. System preferences for known item, simple informational, comparison, and 
exploratory tasks. 
 Preferred system 
Task (type) Baseline No 
preference 
Categorized 
overview 
a) Find the home page for the daily 
newspaper in Concord, NH, The 
Concord Monitor. (known item) 
7 3 2 
b) Find information on caring for a pet 
gerbil. (simple informational) 
4 4 4 
c) Start looking for information to help 
you select and buy a new digital camera. 
(comparison) 
3 2 7 
d) Learn about U.S. business investment 
in Africa. (exploratory) 
3 0 9 
 
5.10.1.17 Understanding of selected categories 
Early in the sessions, I observed that participants had particular difficulty 
understanding the Open Directory?s top-level Computers category during the training 
task. There seemed to be a discrepancy between what they expected to see under that 
category and the actual results. This prompted the addition of a question to the exit 
interview, starting with twelfth session, asking participants to describe what they 
would expect to find in three selected categories in the context of a search for 
?leonardo da vinci.? They were shown the search results and asked to answer without 
using the mouse. After they answered, the pointer was placed over the Computers 
 
 
192 
 
category, which showed the subcategories in a small pop-up (Figure 68). They were 
then asked to answer the question again for the Computer category. Answers were 
informally evaluated to determine whether it was correct, partially correct, or 
incorrect/did not know and tabulated (Table 27). The Computers category was clearly 
problematic. A review of the pages in this category suggests that many pages are 
placed here because they are in computer-related web-sites. For example, Wikipedia 
pages are categorized in the /Computers/Open Source/Open 
Content/Encyclopedias/Wikipedia category (as well as others). This is an example of 
the ODP combining two kinds of relationships (is-a and about) and of minor 
problems that can be cause by limiting the depth of the hierarchy. These issues are 
discussed in section 0. 
 
Figure 68. For the query ?leonardo da vinci?, placing the pointer over the top-level category 
Computer opened a small pop-up window with the five populated subcategories. 
 
 
193 
 
 
Table 27. Accuracy of participant understanding for selected categories (Kids and Teens, 
Reference, and Computers). 
Category Correct Partial Incorrect/ 
Did not know 
Kids and Teens 13 5 0 
Reference 11 5 2 
Computers ? without pop-ups 3 2 12 
Computers ? with pop-ups 9 6 3 
 
5.10.2 Qualitative results 
The relatively long (2 hours) study time enabled participants to consider their tactics 
and produced novel insights into the search processes of sophisticated searchers 
coping with challenging tasks. The qualitative results are organized into five sub-
sections: Behavioral differences, cognitive and affective impacts, judgments of 
outcome, facet usage, and miscellany. These sub-sections include observations, 
quotes and comparisons between participants to highlight differences that the 
quantitative results do not capture. 
5.10.2.1 Behavioral impacts 
In confirmation of expectations, participants indicated that they used the overviews to 
filter, narrow, refine and explore their results. One participant was particularly 
effusive about the ease of narrowing her results, appreciating the immediacy of the 
interaction and commenting on two aspects of relevance that the overview enhanced 
for her. 
 
 
194 
 
 
I loved it. I was in love with that. I wish Google had that?That really helps if 
you can narrow it down by geography or if you?re really looking for a 
credible source and you wish to go for government. The government sources 
are right there. It?s just one click of the button and you have your government 
source?With 3 clicks you have 5 pieces of information. That's all you need to 
look through. (Participant 220) 
 
Participants were observed reading contents of the subcategory pop-up windows, 
which provided a form of query preview (Tanin, Plaisant, & Shneiderman, 2000), 
before clicking on that category or moving the pointer to a different category. Some 
commented during their searches and in a separate exit interview question on how 
they used the list of subcategories in the pop-up window to help decide whether to 
explore specific categories.  
 
Two participants felt that they used fewer queries and five felt that their queries were 
more general when they used the categorized overview, but they had varying 
reactions to these changes. The average query length did differ slightly between 
systems for the aging workforce and workplace allergies topics, although not for the 
other two topics (Table 28). Most participants apparently considered this reduction in 
work a positive effect. 
 
 
 
195 
 
I knew that if I did a broader word it could be divided by the categories; I 
didn't necessarily have to be so specific. (222) 
 
Rather than narrow down my search by adding additional search words I 
found myself narrowing my search by exploring categories and subcategories. 
(211) 
 
Table 28. Mean (SD) query length by topic and system. 
Topic System Mean (SD)  
Word Count 
Baseline 3.35 (1.15) Aging workforce 
Categorized overview 2.96 (1.18) 
Baseline 2.65 (0.78) Human smuggling 
Categorized overview 2.60 (0.77) 
Baseline 3.57 (1.04) International art crime
Categorized overview 3.60 (1.05) 
Baseline 2.96 (1.40) Workplace allergies 
Categorized overview 2.77 (1.57) 
 
Two participants expressed reservations about the change in their tactics: 
 
I didn't use as many queries, which is part of the reason why I didn't get as 
good information. (216) 
 
Maybe it made me a little bit lazy. But I felt like I had to do less because it 
would do more?.it didn't take as much from me because they were gonna sort 
 
 
196 
 
through them and organize them for me.. I guess I changed by doing less. 
(213) 
 
Two participants indicated that they adopted a tactic wherein they looked at the top of 
the search results first, then looked at the overview. One participant commented that 
it provided, ?sort of a search within a search. That was very cool? (203). Six 
participants indicated that they used the overviews more on their second search and 
two felt they used it less. Of the two who used it less, one attributed this to 
encountering a very useful hub page (a page with many links). The other participant 
felt that the overview did not help and opted to use more queries instead. 
 
Table 29.  The 6 behavioral codes. Plus signs indicate that participants considered this a positive 
aspect. Negative signs indicate they considered it a negative aspect of their interaction. Neutral 
or mixed opinions are indicated by a 0. The count is the number of participants who made this 
type of comment. 
Description +/-/0 Count 
Overview helped to filter or narrow list + 7 
Issued more general queries 0 5 
Issued fewer queries 0 2 
Ping-ponged ? alternated between using the overview and the list 0 2 
Explore ? used the overview to explore the results + 1 
Used the overview to refine search + 1 
 
5.10.2.2 Cognitive and affective impacts 
Thirty-four comments related to cognitive or affective impacts were gathered. The 
placement of pages within categories generated numerous comments. Eight 
 
 
197 
 
participants commented on pages that did not belong within a category at all, judging 
them as incorrectly categorized, whereas eight people indicated that they found 
unexpected pages in a category. This persisted even though the instructions 
emphasized that it was typically the web sites that were categorized, not the specific 
web pages. The prevalence of these concerns suggests that searchers may not 
remember the nature of the relationship entailed by category membership. 
 
I wasn't exactly sure what I thought Shopping would be but I didn?t think it 
was going to be here is where you can buy things like mold 
remover...whatever I thought it wasn't a web site where you can go shopping. 
(202) 
 
One participant particularly noted this problem in the geographic facet. 
 
In the human smuggling one, because that one has a lot to do with geography 
but I noticed that in the geography sections you'd click on Europe but it 
wouldn't be about Europe, it'd be like.. like I said, companies based in Europe 
talking about human smuggling anywhere, you know? It wasn't always exactly 
what you'd think it would be... yeah, it could be a BBC story talking about 
something in Asia but it still categorized as Europe? It would be hard to fix 
that? I don't think it was a big problem, you just have to know that something 
could sort of have a double meaning like a geographic location. (208) 
 
 
 
198 
 
Seven participants commented on the structure or organization of a facet as being 
confusing or non-intuitive. 
 
Personal Finance under home I guess that makes sense but it?s not something 
I would go to intuitively. I might have gone to...Business if I was looking at 
finance, but business is more like the corporate world and home would be 
your personal world, so after viewing it I can see the logic but it wouldn't 
have been there for me initially. (203) 
 
Why did they put News and Media under Computers? Publications under 
Shopping? (215) 
 
Five participants commented on confusing categories; two people felt that the topical 
categories were too general and one person felt that they were ambiguous. 
Interestingly, for all of the above codes, about half of the respondents indicated that 
the problems were minor or not a hindrance; perhaps they were able to quickly 
compensate for this variability in a manner similar to which searchers compensate for 
other breakdowns on the Web. A review of two of the session videos seems to 
indicate that those who reported experiencing such problems would quickly continue 
their search in the face of typical Web errors. One person indicated that he 
specifically did not go to one page because it did not fall in the category he expected. 
 
 
 
199 
 
I was shocked at the category that it was under, and I didn't pursue it but, and 
I can't remember the specific... seemed like it was very strange that it would 
be under that category? I'm not going to that site. [laughs] I just kept 
moving, which is probably not the best thing to do because it might be worth 
investigating but that?s what I did. (203) 
 
One person was concerned that she might have missed useful pages in categories she 
did not explore. 
 
Some categories I didn't even look at, and there might have been something 
useful there. ?Cause? I mean?I guess? I really don't know what the person 
was thinking when they categorized it. So? I mean? They might have been 
thinking about something that, like, never occurred to me but that is perfectly 
relevant so I feel like that might have, uh,  hidden some information from me. 
(223) 
 
Four people commented that the categories helped generate ideas.  
 
I was just looking for general information about the aging workforce but on 
the side it gave like me what the government was doing about it and I was like 
?oh, that?s a good idea to look for,? and like social issues about it? 
 
Two people commented that they used the overview when they were stuck.  
 
 
200 
 
 
When I was stuck on something I could start a new search pretty much 
because I could go in there and click on a new topic and then go see 
everything that they listed. (214) 
 
? like allergies in the workplace. It was tougher to find varying things so I 
used the categories more when I was kind of stuck. (201) 
 
Three people felt that the categories exposed them to different aspects of the topic. 
 
I think it kind of opened up my mind a little bit to investigate a little bit 
deeper. Without the categories I just saw a list and I just had this mentality 
that I didn't want to go ahead and search through all of them but the 
categories made me think of different possibilities so I was more opted [sic] to 
search through a variety of different pages versus just looking for specific 
factors. (204) 
 
It definitely changed the way I searched, probably for the better for something 
like this because it made me look at a wide range of categories. (210) 
 
Another participant said that the overview provoked an illuminating question. 
 
 
 
201 
 
For the art crimes one, when I clicked on, I saw science and it was just, "What 
does that have to do with art crimes?" So that made me click on it and I found 
out that science can help solve art crimes. So that was something that I 
probably wouldn't have picked up on if that subcategory hadn't been there. 
(225) 
 
One person indicated that she used the overview to get an overall sense of how results 
were distributed within or across top-level categories. 
 
It also changed how I originally took in the results rather than reading the 
titles and descriptions.  I looked to see how they were divided up, what main 
categories there where, because I thought it would be faster way to see what I 
had in front of me especially for this particular task where I'm looking for 
different angles within a larger topic I wanted to see,? well, there?s a social 
issue and a health issue and a business issue,? so that lends itself very well to 
that. (211) 
 
 
202 
 
 
Table 30.  The 34 cognitive and affective codes. 
Description +/-/0 Count 
Incorrectly categorized ? Subject considered the page to be in 
the wrong category 
- 8 
Unexpected pages in category ? Subjects did not initially 
expect the pages they found within that category, although 
they did not consider it incorrect 
- 8 
Classification structure undesirable or confusing - 7 
Confusing categories - 5 
Generated ideas + 4 
Takes experience - 3 
Overwhelming - 
More complex - 2 
Indicated frustration - 2 
Pages appeared In multiple categories - 2 
Subject had topic in mind 0 2 
Overview helped organize results better + 2 
Categories suggested idea + 2 
Used overview when stuck + 2 
Experience was less overwhelming + 2 
Felt more comfortable 2
nd
 time 0 2 
Categories too general - 2 
Exposed searcher to different aspects of topic + 2 
Concern that they might miss something - 2 
Ambiguous categories - 1 
Misleading - 
Provoked a question + 1 
Distraction - 
Many uncategorized results - 1 
Difficult to change search style 0 1 
 
 
203 
 
Confusing interface - 1 
Less confusing + 1 
Was more cautious using overview - 1 
Was more careful using overview - 1 
Human editors cataloged pages - 1 
Idea of where pages fit in categories + 1 
Overview made subject look at wide range of categories + 1 
Showed how pages were distributed across categories + 1 
Did less work 0 1 
 
5.10.2.3 Judgments of outcomes 
Participant comments included judgments on the outcomes of their searches when 
using the categorized overview. During their responses to the questions, ten 
participants indicated that the categorized overview was helpful. Three felt it was 
unhelpful and one commented that it was mixed overall. Eight participants 
commented that the problems they encountered were minor or did not hinder their 
search. They typically also described their rationale for this assessment. The first 
comment here illustrates one line of reasoning. 
 
[It wasn?t helpful for] Amazon.com. But, like you said, it didn't really frustrate 
me, it just, I just had to keep in my mind that it?s human-generated. So it?s not 
the web site?s fault that it?s there, its just somebody categorized Amazon.com 
as shopping, or say they considered it computers cause its an internet web 
site. It's not their fault that I clicked on it when that web site is just 
categorized as that, so it?s okay. (220) 
 
 
 
204 
 
Did it hinder searching at all? I would say generally no because I would go to 
the results here [indicates the list] first and then use this [indicates overview] 
as sort of a backup to reorder or filter again sort of thing. So it?s a helpful 
tool. (203) 
 
One participant observed that a new query would generate more results. 
 
With that whole legislation thing, I looked under US Government and I didn't 
find anything so I realized that, ?Oh, maybe it is a little bit more specific,? so 
then I just did a whole new search for it?. I got a lot more when I actually did 
a separate search than when I just clicked on US Government and expected 
more stuff to be there? (206) 
 
One participant attributed his assessment of poorer results to the fact that he issued 
fewer queries with the categorized overview and followed unhelpful links. Another 
participant felt she got sidetracked because of the overview. 
 
I didn't use as many queries which is part of the reason why I didn't get as 
good information.. It led me down paths I didn't need to go down, because of 
the links on the side. (216) 
 
 
205 
 
 
Table 31.  The 9 judgment codes. 
Description +/-/0 Count 
Problems were not a hindrance + 4 
Problems were a minor hindrance + 4 
Saw something that wouldn?t have been seen 
otherwise 
+ 4 
Search went faster + 3 
Search went slower - 1 
Got more results from a new query - 1 
Search was more efficient + 1 
Found poorer quality information - 1 
Got side-tracked - 1 
 
5.10.2.4 Facet and category usage 
All participants commented on aspects of their use of the topic facet. Several 
commented on use of government and geographic facet use. Participants found that 
these facets helped narrow results and focus their search in ways that the topic facet 
did not. 
 
That really helps if you can narrow it down by geography, or if you?re really 
looking for a credible source and you wish to go for government. The 
government sources are right there. Its just one click of the button and you 
have your government source. It's easier to cite it. You don't go looking for ? 
like with Google ? you'd go through what the US government has to say about 
 
 
206 
 
workplace allergies. Here, it?s in front of you, you know, Dept of Health and 
Labor. (220) 
 
I like the government sites at the bottom, because I tended to look at 
government sites first. (224) 
 
When I was doing the smuggling thing I focused more on the geography, 
because ? human smuggling clearly is a social issue, an economic issue, well 
that's obvious, but then it?s like, where is it in the world, so I looked under 
geography (206) 
 
I was getting a lot of stuff about the US, so I clicked on Europe and it gave me 
stuff about the UK. (207) 
 
As with the topic categories, participant comments indicated minor problems with the 
categorization, or their interpretation of the categorization rules. In this quote, the 
participant was evidently confused about what pages would be placed in the US 
government categories. 
 
I think even though certain things are categorized under certain topics...things 
under US government might just mention US government. It might not be an 
actual government page. (207) 
 
 
 
207 
 
Table 32. Mentions of geographic or government category use. 
Description +/-/0 Count 
Used geographic facet 0 7 
Used government facet 0 4 
 
5.10.2.5 Miscellaneous 
Three participants commented that the topic had an effect on how much they used the 
overview. 
 
I definitely used it more the second one because... It was also a tougher, 
tougher thing to find, like allergies in the workplace. It was tougher to find 
varying things so I used the categories more when I was kind of stuck. (201) 
 
The second topic was more conducive to that kind of thing because the 
workplace allergies sorted so well.. .there's health issues, there's business 
issues there's government issues. It worked really well with those categories, a 
little better than the human smuggling one because that [topic] doesn't fit well 
into like health or computers as the other one. It's a little bit more narrow 
probably that's why? [Workplace allergies] is more broad so it fits into the 
categories a little bit more, except that it doesn't fit into all of them. 
 
 
208 
 
5.11 Discussion 
5.11.1 Topic and task efficacy 
The four topics used for the searches were intended to be matched pairs (two broad 
and two narrow) for the Topic Type factor. It became clear during the study that they 
were not well matched. In hindsight, the procedure used to select the topics was not 
sufficient to ensure a match. The evaluation of the candidate topics during the pilot 
test was not rigorous enough, in part because the broad/narrow concept was not 
operationally defined in a way that permitted an objective assessment of topic 
breadth. The lack of a clear definition also hindered the construction of the topics 
because there were no guiding criteria, and the resulting pairs of tasks were 
differentiated more in terms of difficulty than breadth. The two topics drawn from the 
TREC Robust track (international art crime and human smuggling) were generally 
perceived as more difficult than the other two (aging workforce and workplace 
allergies), although not universally. Participants varied in how they interpreted the 
topics, and some participants had knowledge that caused them to consider a topic 
easy. The exploratory nature of the task encouraged participants to apply their own 
experience and knowledge, and this amplified the variations. These factors 
contributed to the unmatched nature of the topics, and necessitated changes in the 
quantitative analysis. What was to have been analyzed as a 2 level, within-subjects 
Topic Type factor had to be analyzed as a 4-level, between groups Topic factor 
instead, and the Topic Type hypotheses (all the ?B? hypotheses) could not be tested. 
The variability may also have reduced measured differences by System in dependant 
variables. 
 
 
209 
 
 
The combination of positive participant response to the interface with no differences 
in outcomes could indicate that the specific task was less dependent on gaining an 
overview than originally anticipated. Most participants appreciated and used the 
overview, but when that wasn?t available, scanning the result lists and reformulating 
queries were reasonably effective tactics for generating article ideas to satisfy the 
assigned task. The quality ratings for all ideas were generally low and there were not 
noticeable differences between systems.  
5.11.2 Differences in search behavior 
The quantitative and qualitative data indicate that the overviews did change searcher 
behavior in several ways. The log data showed that participants explored significantly 
more deeply within the result list. This supports hypothesis H1a and is consistent 
with, but more modest than, previous studies (K?ki, 2005). Overall, they did not 
collect pages significantly more deeply with the categorized overview. The mean 
depths of collected pages were the same for both conditions (as were the medians). 
Thus hypothesis H2a was not supported. Overall, for the given task and topics, the 
categorized overview did not have a significant effect. For the aging workforce topic, 
however, which showed the largest difference in depth of viewed links between 
systems, participants using the categorized overview did collect links from deeper in 
the result list. This could suggest that when they did explore deeper in the results, 
they did find useful pages more deeply.  
 
 
 
210 
 
With the categorized overview, participants did collect slightly more pages that were 
categorized (i.e., they collected fewer uncategorized pages), supporting hypothesis 
H3a. Thus the categorized overview biased participants toward pages that were found 
in at least one category. Whether this bias is positive or negative depends on the 
context of search, the number of uncategorized pages, the value of the uncategorized 
pages, and the negative impact of not viewing the uncategorized pages. A few 
participants were concerned that they might overlook something by using the 
categories. This implies that to minimize undesirable impacts, searchers should 
understand when they are limiting their search to categorized results, whether it is 
important for them to view uncategorized results, and how to do so. This suggests a 
need for better training and/or clearer indications to searchers that their results are 
being filtered. Participants did not always appear to comprehend this distinction. This 
finding has been incorporated into the design principles. 
 
The participants issued fewer queries with the categorized overview, supporting 
hypothesis H4a. The categorized overview appeared to provide cues, similar to the 
notion of ?information scent? (Pirolli & Card, 1995; Pirolli & Card, 1999), that 
induced participants to click on categories instead of refining their query. This is 
supported by the participant comments. Participants commented on submitting more 
general queries and then using the categories to explore or narrow their results. They 
did issue somewhat shorter queries for two topics (aging workforce and workplace 
allergies). It is possible that there was a confounding factor: Issuing a new query took 
substantially longer than simply exploring a category, which could have induced 
 
 
211 
 
participants to avoid query refinement. Participants were alerted in advance that there 
could be delays, and I asked them to ?be patient and search as you normally would.? 
None of the participants appeared to indicate (verbally or non-verbally) impatience at 
the delays incurred by the search or a reluctance to refine their query due to the time. 
Thus, the query time is unlikely to have been a confounding factor.  
 
Participants clearly appreciated the categorized overview. Search engine operators 
might also benefit if they can serve the same number of searchers with fewer queries. 
Processing fewer transactions with larger result sets could be a desirable engineering 
trade-off. This would require that the client be able to receive the entire result set at 
once and then allow the searcher to interact with it. The SERVICE prototype 
currently implements the filtering logic on the web server, but with the use of client-
side technology currently available (JavaScript, Ajax, etc.) it is feasible to implement 
the entire UI on the browser. Because the amount of data being transmitted in a set of 
search results is modest (100-150k for 100 results from Google), this could be 
accomplished in a single HTTP request per query without substantial delay. Even a 
small reduction in queries per search could be beneficial for high-volume search 
services. This could also improve interactive performance from the searcher 
perspective. 
 
During the interview, participants commented that they changed their tactics to utilize 
the overview. One participant commented that he skipped a page because it was in an 
 
 
212 
 
unexpected category. Section 5.11.5, Differences in searcher thinking, discusses these 
changes. 
5.11.3 Cognitive impact of categorized overviews 
The overviews provided an alternative perspective on the search results that 
participants found helpful. In some cases the benefit derived from a reduction in 
work, for example by replacing a query refinement step with a single click. The query 
log data corroborate participant assessments of this effect. The subcategory pop-up 
windows provided contextual information and formed a query preview that helped 
searchers decide whether to explore a category. In other cases the participants 
concluded that the overviews suggested an idea or question or exposed them to a 
concept they would not have otherwise seen. They were speculating, of course, but 
they considered it a positive contribution to their search experience.  
 
One of the most common complaints concerned the assignment of pages to 
categories. When page categorizations did not match participant expectations, they 
experienced frustration, confusion and doubt. The ODP classification generally 
captures what web sites are about, i.e. the topic. Fourteen of its 16 top-level 
categories are primarily topical, which was the rationale for using them to construct 
the Topic facet. However, three factors clearly reduced the categorization accuracy 
from the participants? perspective: encoding different relationships within the same 
facet, ambiguous categories, and the hierarchical structure of categories. 
 
 
 
213 
 
Pages from the British Broadcasting Corporation (BBC), for example, were 
categorized under /Category/Arts/Television, which is closer to encoding an is-a 
relationship than an about relationship. (It could more accurately be construed as the 
hosting organization, producer, host or even author relationship.) Thus, when a BBC 
web page about a human smuggling story was found under Television, it was 
puzzling to many participants. It did not match their expectations. 
 
Participants also commented on the generality or ambiguity of the categories, 
particularly the topical categories. This could be attributed, at least in part, to the 
limited depth of the hierarchy that was used in the categorized overview. The depth of 
each facet was limited to two levels due to performance issues with the specific 
implementation. The ODP-assigned categories were frequently four or more levels 
deep. For example, the BBC web page mentioned earlier was assigned to categories 
in the topical and regional facets (Table 33). Truncating the categories to two levels 
removed useful contextual information. The end result was a more general category. 
This could also have contributed to the perception that pages were incorrectly 
categorized. 
 
An alternative approach could have preserved the contextual information in the 
overview by promoting the lower level categories, thereby flattening the hierarchy. 
This approach could work well with larger displays, but is problematic in the limited 
space available for the overview. In early tests, this multiplied the number of second 
 
 
214 
 
level categories unacceptably; there were too many to fit on one screen in the 
overview.  
 
Table 33. A BBC web page on human smuggling was categorized into eight categories in two 
facets, most of which were at least four levels deep. Truncating the categories to two levels 
removed useful contextual information. 
 Original categories Two-level category 
Topical 
Facet 
 
? /Arts/Television/Networks/Cable/BBC 
? /Arts/Television/Networks/Europe 
? /Arts/Art History/Artists/D/Da Vinci, 
Leonardo 
? /Science/Educational Resources 
? /Arts/Television 
? /Arts/Art History 
? /Science/Educatio
nal Resources 
Regional 
Facet 
 
? /Europe/United Kingdom/Government 
/Culture, Media and 
Sport/Broadcasting 
? /Europe/United Kingdom/News and 
Media 
? /Europe/United Kingdom/Science and 
Environment/News and Media 
? /Europe/United Kingdom/Guides and 
Directories/Search Engines 
? /Europe/United 
Kingdom 
 
 
The category structure was sometimes problematic. Some participants did not initially 
expect to find the Television category under Arts, for example, and found this 
troubling. The example in Table 33 illustrates how the ODP uses the descriptor 
?News and Media? in two separate entries. A more rigorous approach to facet 
analysis (Soergel, 1974) could yield more nearly orthogonal facets and identify 
additional facets. This might yield substantial improvements in the perceived 
accuracy of the category assignments. A lightweight tool could allow experienced 
 
 
215 
 
indexers or ?power searchers? with expertise in specific domains to customize the 
category structure, quickly edit hierarchies, splitting, merging, promoting, or hiding 
categories. 
 
The occasional problems with category structure partly reflect the tension between 
ideals and practice in a classification that is used as a boundary object between 
communities of practice (Bowker & Starr, 1999). The ODP categories are used by 
50,000 editors to catalog web sites. These editors have varying backgrounds, 
motivations and interpretations of category meaning. Moreover, the application of the 
ODP to organizing search results changes the context within which the categories are 
interpreted. The original context, within which the editors operated when cataloging, 
was a browseable directory of web sites. The primary concept was the classification 
structure. When used to organize search results, however, searchers have a different 
conceptualization of the context, in which the search goal and immediate task are 
primary, and the classification structure is secondary. 
 
Fortunately, participants indicated that these problems were minor. Their comments 
on difficulties indicated that their problems were with details of the categorization 
and that they managed these by relying on the stability of the overall categorization 
scheme. They commented on being more familiar and comfortable with the categories 
and having a more accurate understanding of the categorization scheme by the second 
categorized overview task. This could be a benefit when compared to automatically 
 
 
216 
 
clustered or dynamically generated categories, which will differ for each set of search 
results. 
 
The satisfaction data support this interpretation of the experimental results. Although 
the positive results for hypothesis H5a (searchers will find it easier to explore search 
results) were only marginally significant, participants agreed that the categorized 
overview organized the results well and helped them assess their results and decide 
what to do next, partially supporting hypothesis H8a. Hypothesis H6a (searchers will 
agree more strongly that the system provided a good overview of the information 
available on the Web) was not supported. It is likely that the topic breadth and 
difficulty contributed to the variability of this measure. Participants also found the 
categorized overview more generally appealing (?wonderful?) and stimulating, 
supporting hypotheses H10a and H11a. There was no significant difference in ease of 
use ratings, so hypothesis H9a was not supported. The satisfaction ratings, which 
favored the categorized overview, were marginally significant. This suggests some 
support for hypothesis H12a. Participant satisfaction depended on many factors, 
including the information available as well as the search interface, so it is possible 
that participants? assessment of their modest progress and generally poor quality 
results (which many commented on) reduced their overall satisfaction. 
 
There was no significant difference in the overwhelming/manageable or 
complex/simple measures, although some participants commented on the overview 
being more complex or overwhelming. This lack of support for hypothesis H13a 
 
 
217 
 
(searchers will rate the categorized overview more complex) is good news, because it 
suggests that the categorized overviews were not, in general, perceived as 
substantially more complex. This does not mean that complexity effects should be 
ignored. Indeed, one participant specifically asked if he could hide the overview 
because it was distracting. But for this task there were clear benefits to most 
participants. During the exit interview, the 12 participants who were asked which 
interface they would rather use for a range of fact finding to exploratory tasks, they 
indicated a preference for the baseline interface for fact finding and the categorized 
overview for the exploratory tasks. These results reinforce the value of providing 
searchers additional control over their search (Greene, Marchionini, Plaisant, & 
Shneiderman, 2000; Koenemann & Belkin, 1996; Shneiderman, Byrd, & Croft, 
1998), including whether to include display or hide the categorized overview. 
5.11.4 Differences by breadth of topic 
Although the pilot testing suggested the topics were matched in terms of breadth, it 
became apparent during the experimental sessions that participants had highly varied 
interpretations of the breadth of the topic. Their knowledge of the topic, attitudes 
toward it, their response to specific web pages, all contributed to their perception of 
the breadth of the topic. More importantly, their perception of the topic difficulty 
varied widely, for similar reasons. These two factors clearly affected their assessment 
of their progress toward the scenario goal, along with the limited search time. 
Participants commented on these issues during the sessions and afterwards. In 
hindsight, the study would have benefited from a more rigorous and defined 
development of topic breadth and difficulty (Bell & Ruthven, 2004). This could have 
 
 
218 
 
contributed to more reliably measurable effects. Nevertheless, the varied individual 
interpretations of the topic provided useful data for the qualitative analysis 
contributing to the other research questions. 
 
An additional post hoc analysis could be performed to stratify the cases by the 
perceived breadth or the topic difficulty. A two-factor analysis with system and 
perceived topic breadth as factors might show differences by topic breadth that the 
above analysis did not. Because of the small size of this data set, the likelihood of 
needing to use a less powerful non-parametric analysis technique, and the inherent 
variability of the data, it is unlikely that any differences would be significant. 
However, future studies could take this into consideration. 
5.11.5 Differences in searcher thinking about search tactics 
Participants commented on many interesting effects that the categorized overviews 
had on their thinking during searches. They confirmed expectations that they would 
change their tactics to utilize the overview. Some used it before looking at the result 
list, whereas others used it in an ancillary or backup role, for example, when they felt 
?stuck.? Participants used the categorized overview to understand the distribution of 
the pages across categories. They also used the categories to confirm interest in a 
particular page seen in the result list. They used the query preview capability 
provided by the subcategory pop-up window to predict what would be in the category 
and help decide whether to view the results within that category. In these cases, they 
appreciated the categorized overviews, and several commented on feeling more 
efficient. 
 
 
219 
 
  
Several participants spoke of the difficulty of changing established search tactics. In 
the time allotted, some searchers changed their tactics rapidly, whereas others only 
started to change. Two participants did not appear to change their tactics at all during 
the session. Rather than use the overview to help guide their idea generation, they 
thought of specific ideas, and then searched for them. Sometimes they would simply 
issue queries specific to that idea, ignoring the overview. At other times, they would 
use the overview to filter the results to pages that were related to the desired topic. 
  
Participants often thought that they used the overview more during the second 
categorized overview search. They actually clicked on categories slightly more during 
their first categorized overview search (132 vs. 127), but they appeared to be 
exploring the interface and probing categories. Some participants specifically said 
that?s what they were doing, and comments like ?let?s see what this is? were frequent. 
By the second categorized overview search, it appeared that most participants were 
taking advantage of the overview, although many were still exploring the categories 
and revising their search tactics. The robustness with which participants responded to 
the problems discussed in previous sections also suggests that they quickly began to 
adapt their search tactics to take advantage of the categorized overview while 
compensating for its weaknesses. 
 
They also commented on feeling cautious in using the categories, or of being more 
careful than usual, particularly after seeing a web page categorized in an unexpected 
 
 
220 
 
category. It is possible that these feelings would subside with greater use of the 
categorized overviews and increased familiarity with the categories.  
5.11.6 Effect on quality of search outcome 
None of the outcome-oriented hypotheses (H13a-H15a) were supported by the 
quantitative results. As noted earlier, many individual factors can affect search 
outcomes, particularly in exploratory searches. Participants perceived the breadth and 
difficulty of topics very differently. Their comments suggest that the challenging 
nature of the experimental task, the tight time limit and the topic difficulty all 
contributed to the difficulty in making progress toward their goal and the generally 
low quality of ideas.  
 
The qualitative data suggest that ideas were provoked by the categorized overviews, 
and some participants felt that they would not have generated specific ideas without 
the overviews. The data also suggest one possible negative outcome on the quality of 
ideas. One participant indicated concern that idea quality was negatively affected, 
indirectly, by changes in his search tactics due to the overview. He felt that he was 
not getting as many good results because he relied on the categories instead of 
analyzing the results to identify new concepts and terms to refine his query. Although 
other participants did not directly comment on this, observations of their actions and 
comments while searching lends credence to this concern. When presented with a 
feature that reduced cognitive effort, some participants used it even if it produced 
non-optimal results. They found beneficial trade-offs in this satisficing behavior 
(Marchionini, 1995; Simon, 1979) due to the context of the search. In this case, the 
 
 
221 
 
low negative impact of poorer results, the non-trivial effort needed to generate high 
quality story ideas, and the limited search time, probably induced participants to 
accept the poorer outcome. In a bona fide context, they would probably be more 
motivated and have more time, which might produce better results.  
5.12 Limitations 
5.12.1 Subject population 
This study was limited by the fact that the participants (N=24) were all students at the 
University of Maryland. Twenty were journalism students, so the scenario and task 
was appropriate for them (as most of them confirmed), but they might not be 
representative of the needs of other exploratory searchers. The journalism scenario 
was not relevant to the four non-journalism students, although the specific task 
appeared to be similar to tasks they had performed. The participants were all 
experienced searchers, and many appeared to have established sophisticated search 
tactics. They are unlikely to be representative of searchers with less experience.  
5.12.2 Category structure and membership 
The study was limited by several factors related to the categories: the specific facets 
used, the proportion of uncategorized results, and the structure of the categories 
within facets. Only three facets were used: topic, geography, and US government. 
They were selected because they could be practically extracted from existing, 
available data. The first two were selected because they categorized a broad set of 
web sites, providing a wide, but shallow, set of categories, and the third was selected 
because it provided a comprehensive categorization for a narrow domain that was 
 
 
222 
 
conceivably useful for the scenario. Other facets could have been chosen, 
representing different types of relationships. For example, the Last Time Visited 
classifier might be useful for searchers attempting to re-find a page if they had 
knowledge of when they had viewed the desired page. This would have yielded 
different quantitative and qualitative results. 
 
The modest proportion of pages that were categorized was a limitation of the study. 
Typically 40-80% of the search results for a query were categorized, which left many 
uncategorized pages. This had negative cognitive and affective impacts, discussed in 
section 5.11.3, like complicating the search process by the need to consider 
uncategorized pages in decision making. For domains in which search results can be 
more comprehensively categorized, these negative effects might not be observed. The 
modest changes in behavior (e.g. depth of viewed pages) might be more pronounced. 
Overall, the categories used in the study were intended to provide a pragmatic 
assessment based on the amount and kind of information currently available for 
categorizing search results from general web search engines. They did not utilize 
traditional text classification techniques. Incorporating these techniques might 
improve categorization rates. 
 
The structure of the topic facet was a limitation of the study. The ODP is not a well-
structured, formal classification. It represents different types of relationships within 
the hierarchy, the relationships can be ambiguous or loosely defined, and their 
interpretation can differ between the ODP editors and the searcher. This had minor 
 
 
223 
 
negative cognitive and affective impacts, puzzling and frustrating searchers. In 
particular, searchers sometimes perceived pages as being incorrectly categorized or 
were surprised by their placement within categories. This was exacerbated by the 
limited depth of the hierarchy used (three levels). For domains in which the 
classifications are formally defined, these impacts might be less prevalent. 
5.12.3 Scenario and task 
This study is limited because only one scenario and task type was evaluated. Other 
exploratory search tasks may benefit more or less from the categorized overview. In 
fact, the task was an important limitation on the quantitative results of the study. The 
overview may not have been as important to task performance as originally expected. 
The task could be successfully completed with tactics that did not utilize the 
overview. Individual differences also affected the quality of the generated ideas. 
These factors probably reduced the quantitative impact of the two different interfaces. 
The task had the desired effect of encouraging participants to re-evaluate and revise 
their existing search tactics, and it encouraged participants to analyze and integrate 
search results with their own knowledge, which is an important component of 
exploratory search tasks. 
 
This study was limited because the characteristics of the desired outcome were not 
fully described to participants. Although they were asked to generate a diverse set of 
article ideas, they were not told the specific newsworthiness criteria by which the 
ideas would be assessed. Providing this information might have helped participants 
 
 
224 
 
generate higher quality ideas overall, which might have in turn enabled a distinction 
to be seen between the two interface conditions. 
5.12.4 Time constraints 
The study was also limited by several time constraints. Although the training time 
was sufficient for subjects to learn the mechanics of using the categorized overview 
and the practice the task, it took time for subjects to reflect upon and revise their 
search tactics. They were often still in the process of refining their tactics at the end 
of the second categorized overview task. The time allocated to each task (12 minutes) 
was also short, which limited their ability to conduct more thorough searches and 
generate high quality ideas. A longitudinal or multi-day study could overcome this 
shortcoming by giving searchers time to adapt before conducting the assessed tasks. 
5.12.5 Interface design 
The study was limited because the experimental design implemented only one design 
idea for presenting the overview, a textual list supported sequential selection of 
categories within a facet and simultaneous selections between facets. Table 20 lists 
nine dimensions of the design space which could be explored. Alternate approaches 
would have different trade-offs, possibly leading to different results. In particular, 
graphical elements could have been incorporated into the overview, a possibility that 
the Future Work chapter addresses. 
5.12.6 Topic breadth 
The topics were not matched, as discussed in section 5.11.1. This prevented 
quantitative investigation of the effect of topic breadth, and complicated the 
 
 
225 
 
qualitative analysis. It also limited the statistical power of the quantitative analysis of 
the Topic variable, which had to be analyzed as a 4-level between-groups factor 
instead of a 2-level within-subject factor. 
5.12.7 Quantitative analysis 
The statistical power of the quantitative analysis was limited by several factors. The 
non-matched nature of the broad and narrow topics has been noted. Individual 
differences appeared to be a factor in the variability of observed behaviors and the 
quality of the generated ideas. The overall statistical power was limited by the modest 
number of subjects, and may have been affected by the inclusion of the four non-
journalism students.  
5.12.8 Qualitative analysis 
The qualitative analysis was limited in several important ways. The research was 
conducted in a laboratory setting rather than the participants? own workplaces, and 
the task was not of their own choosing. They were removed from their typical 
environment and asked to perform a task with artificial constraints. The detailed 
scenario was designed to provide a rich context for the task and to encourage them to 
draw on their own experience, but the experience was certainly only a facsimile of 
what it would be in practice. Participants did, however, show an awareness of these 
differences. They acknowledged the differences in during the exit interview, and they 
commented on the essential elements of the task that were common between the 
research setting and their workplace. 
 
 
 
226 
 
The qualitative analysis is limited because of the primary reliance on peer review of 
the interpretations and conclusions. A single researcher analyzed and interpreted the 
raw data. Conducting member checks was not considered feasible because of the cost 
and time required to recall participants after the intervening Christmas and New Year 
holidays. Using a second researcher to code the exit interview questions might have 
identified additional behaviors, tactics, and thoughts, or provided alternative 
interpretations. The study does make modest use of triangulation with the quantitative 
data, and it provides direct quotes to support interpretations. The phenomena being 
examined in this study was constrained by the laboratory environment and the task, 
which removed many external factors that could lead to variations in interpretations. 
The interpretations were closely tied to the raw data, often using the same language 
that participants used. 
5.13 Summary 
As a whole, this study and the two early studies provide qualitative support for the 
use of categorized overviews of search results based on meaningful and stable 
categories, and identify some possible limitations. Across two different domains, the 
three studies showed that searchers explored more deeply in their results, and were 
more satisfied with the experience, although they do not show objective differences in 
search outcomes. Searchers agreed that the categorized overviews helped them 
organize, explore and assess their results, and were not appreciably more complex 
than typical Google-like interfaces.  
 
 
 
227 
 
The early studies refined the design principles for exploratory search, and this study 
corroborated the principles by evaluating the SERVICE prototype, which was 
designed according to many of the principles described in section 4.2: 
? Provide overviews of large sets of results 
? Organize overviews around meaningful categories 
? Tightly couple category labels to result list 
? Arrange text for scanning/skimming 
? Support multiple kinds of categories 
? Make category structure visible 
? Use separate facets for each type of category 
 
One important implication of this study for search interface designers is that the 
hierarchy used in a categorized overview should be carefully analyzed and may need 
to be modified in two ways. First, different relationships encoded in the hierarchy 
(e.g. is-a vs. part-of) should be separated into separate top-level facets. Second, and 
more generally, parent-child (or broader-narrower) relationships that are clear when 
encountered while browsing a thesaurus or directory of web pages, will not always be 
clear when used in the context of a categorized overview of search results. The 
structure of the hierarchy will need to be changed in these cases. This suggested a 
new principle (?Use separate facets for each type of category?) and refinement to the 
initial principle, ?Visualize and clarify category structure.? Practitioners should 
analyze at least the top two levels of a hierarchy, considering whether they need to be 
adjusted to provide the clearest overview. 
 
 
228 
 
 
The study suggested an additional design principle: ?Ensure that full category labels 
are available.? It also suggested a refinement to the principle, ?Tightly couple 
category labels to result list?: Provide clear indications to searchers when and how 
their results are being filtered.  
 
The study suggested how categorized overviews affect cognitive processes, and 
illustrated ways that participants began to adapt their exploratory tactics to use the 
categorized overviews. The categorized overviews encouraged searchers to create 
fewer, and possibly broader, queries for the search tasks, which changed the tactics 
searchers used to (re-)formulate queries. Several different tactics for using the 
categorized overview emerged, including using it to organize the exploration of the 
results, alternating between the overview and the list, and using the overview simply 
as a backup or secondary tool. The study highlighted the difficulty that some 
participants had in adapting their existing search tactics to take advantage of the new 
capabilities. The study provided several examples of searchers apparently satisficing 
by using the categorized overview. These results helped to refine the analysis. 
 
Evaluating exploratory search task outcomes is challenging, and these studies did not 
detect quantitative differences in search outcomes. The results do provide qualitative 
indications that categorized overviews suggest ideas and questions to searchers that 
would not surface with the baseline system. They also raise cautionary questions 
about possible negative impacts on the quality of search results. 
 
 
229 
 
 
One important economic implication of the study for search engine developers is that 
they might serve more searchers with fewer transactions by providing larger result 
sets with categorized overviews. This assumes that the category information is 
available at query time. 
 
 
 
230 
 
Chapter 6:  Contributions 
6.1 Benefits of categorized overviews 
The qualitative analysis of study 3 identified changes in how searchers think about 
and interact with search results when a categorized overview is available. It identified 
seven tactics that searchers adopted in response to the categorized overviews. Study 
participants agreed significantly more that the search system helped them assess their 
search results and determine the next steps in their search process with the 
categorized overviews than without. Study participants found the categorized 
overview interface significantly more organized than the baseline system. 
 
Studies 1 and 3 confirmed previous findings that searchers view pages deeper in their 
search results when overviews are available (K?ki, 2005). Study 1 extended these 
findings by providing quantitative (albeit not statistically significant) indications that 
the categorized overviews also helped searchers find relevant and useful pages deeper 
in the results for an exploratory search task (?Find 3 web pages providing different 
aspects of or perspectives on this topic?.  
 
Studies 1 and 3 confirmed previous findings that searchers were more satisfied with 
their experience when using the categorized overview than without it. 
6.2 Limitations of categorized overviews 
Study 3 found no differences in the outcomes of an exploratory search task (generate 
newspaper article ideas). Analysis of the results suggested that several factors 
 
 
231 
 
contributed to this. First, task performance for that task may not be dependent on an 
overview, even though searchers appreciated it. Second, a large number of 
uncategorized results may have limited the effectiveness of the overview. Third, 
flaws in the hierarchical structure of the categories may have limited the effectiveness 
of the overview. This suggested that when categories are incorporated from existing 
knowledge structures, such as the Open Directory, the hierarchical structure should be 
carefully analyzed and may need to be modified for use in the categorized overview. 
This yielded several design principles and suggested refinements for future studies.  
 
Study 2 indicated that automated clustering techniques supported an exploratory 
search task that involved generating ideas for newspaper articles. Participant 
comments indicate that the words and phrases in the cluster labels suggested article 
ideas. 
6.3 Analysis of search tactics with categorized overviews 
This dissertation presents an analysis of search with categorized overviews. It 
proposes a model of the exploratory search process (Figure 25), identifies four 
lightweight actions available to searchers when evaluating search results with 
categorized overviews (Table 11), and describes six beneficial tactics that searchers 
can adopt when categorized overviews are available (Table 12). This provides 
theoretical support for a set of principles for the design of exploratory search 
interfaces. The analysis helped guide the design of the SERVICE categorizing search 
system. 
 
 
 
232 
 
The analysis should stimulate research into the delicate interplay between the 
presentation of categorized overviews and the search results, the forms of interaction 
available to the searcher, the learned tactics that searchers employ, and the 
fundamental human and machine constraints that affect search. This analysis, 
narrowly focused on one step (examining search results) and one form of interface 
(categorized overview) should be seen as one step in understanding how exploratory 
searchers search. 
6.4 Design principles for categorized overviews of search results 
This dissertation proposes a set of design principles for exploratory search interfaces, 
supported and refined by the empirical studies: 
? Provide overviews of large sets of results 
? Organize overviews around meaningful categories 
? Clarify and visualize category structure 
? Tightly couple category labels to result list 
? Ensure that the full category information is available 
? Support multiple types of categories and visual presentations 
? Use separate facets for each type of category 
? Arrange text for scanning/skimming 
? Visually encode quantitative attributes on a stable visual structure 
 
These principles will be useful for digital library and web search designers, 
information architects, and web developers because they provide guidance for the 
appropriate integration of visual overviews with search result lists, and particularly 
 
 
233 
 
for the textual surrogates embedded in result lists. These principles embed a strong 
call for the surfacing of structure ? which is often used internally by search engines, 
but less often exposed at the user interface ? without abandoning the tried and true 
value of text. 
6.5 Fast feature classifiers 
This research contributes a framework in three dimensions (fast-feature/full-feature, 
rich/lean, online/offline) to analyze techniques for categorizing web search results. It 
describes nine Fast-feature, online classifiers that integrate information available in 
web search results with external data sources to categorize search results into 
meaningful and stable categories. The implementation and analysis of the Fast-
Feature classifiers shows their potential for use in categorized overviews for web 
search results. An analysis of search results from queries based on 250 TREC Robust 
topics showed that an average of 66% of the top 100 and 61.6% of the top 350 results 
for each query could be categorized in a rich thematic hierarchy based on the Open 
Directory. 
6.6 Enriching search result interaction with brushing and linking 
The general web search interface enabled novel, lightweight interactions with web 
search results by incorporating a brushing and linking technique. Specifically, 
brushing the pointer over a category label in the overview had the effect of 
highlighting any of the currently visible results in that category. Brushing the pointer 
over a result highlighted the categories that it was in. In study 3 participants did not 
find the system with the categorized overview significantly more complex than a 
ranked list of results. This demonstrated that searchers can use and appreciate 
 
 
234 
 
lightweight interactions that support, but do not get in the way of, their search tactics 
and actions. 
6.7 Design space of categorized overviews 
The description of the SERVICE design decisions and the summary of the design 
space for categorized overviews (Table 20) will help to guide designers as they 
develop categorized overview interfaces. The design space summary helps to identify 
decisions they will need to make during the design process. The design space can 
serve as a framework for additional research. 
6.8 Working system for categorized overviews of web search results 
The final contribution of this dissertation research is the SERVICE architecture and 
implementation technology, which supports two working categorizing search 
interfaces: AOL music search (Figure 30) and general web search (Figure 1). The 
SERVICE architecture defines a common Java interface to support easy plug-in of 
alternate category schemes. The SERVICE technology is comprised of approximately 
40 Java class files, which implement nine classifiers plus the two search interfaces. 
The two search interfaces use JavaServer Pages (JSP), hosted by an Apache Tomcat 
servlet container. The system runs on Windows and Linux, and uses JDBC to 
integrate with MySQL and MS-Access databases. The system also implements a 
client-side logging facility that supports capture of any JavaScript events, including 
scrolling, mouse clicks and mouseovers, passing the timestamped events back to a 
Java-based logging tool. Four external data resources containing over 500 MB of data 
were processed to extract category information, using Java, Perl and PHP. The ideas 
embedded in the user interface will be useful to designers of other search interfaces, 
 
 
235 
 
and the SERVICE system is available to researchers at the categorized overview 
project page (http://www.cs.umd.edu/hcil/categorizedoverview/). This will provide a 
flexible, extensible platform for additional research in categorizing search interfaces. 
 
 
 
236 
 
Chapter 7:  Future work 
7.1 Evaluation of exploratory search interfaces 
Evaluation of exploratory search interfaces is an exciting research challenge (White, 
Muresan, & Marchionini, 2006; White, Kules, Drucker, & schraefel, 2006). Task-
based evaluation of exploratory search interfaces using controlled experiments has 
been effective for showing subjective satisfaction differences between interfaces, but 
less effective at showing objective differences in task performance, particularly in 
task outcomes. (Kabel, Hoog, Wielinga, & Anjewierden, 2004; Yee, Swearingen, Li, 
& Hearst, 2003). Controlled experiments and in-depth case studies are two 
approaches to evaluation of exploratory search interfaces. 
 
Three factors may have contributed to the lack of objective differences in study 3: the 
proportion of uncategorized results, the structure of the hierarchies, and the degree to 
which the task depended on an overview. Controlled experiments may help quantify 
the effect of each factor in an exploratory search context. Future research in this area 
should carefully construct the topics to ensure that they are indeed distinguishable by 
breadth. The broad/narrow concept should be operationally defined in terms of 
specific criteria, such as searcher perception, or in relation to a specific set of 
categories (e.g. distribution of search results), and tested with pilot subjects. Studies 
of exploratory search should also account for individual differences. Differences in 
cognitive abilities, cognitive styles, and problem-solving styles have been shown to 
affect search behavior and outcome (Kim & Allen, 2002; Wang, Hawk, & Tenopir, 
 
 
237 
 
2000). This appeared to be particularly true for the exploratory search tasks used in 
these three studies. 
 
The situated nature of exploratory search tasks can lead to many different, but 
successful, task outcomes for different searchers. In-depth, longitudinal case studies 
have been used to evaluate information visualization interfaces and creativity support 
tools (Shneiderman et al., 2006; Shneiderman & Plaisant, 2006). These techniques 
integrate ethnographic and quantitative methods, using participant observation, 
surveys, interviews, and usage logs to study users performing complex tasks with 
individually defined goals. These techniques may be beneficial for investigating how 
searchers adapt their tactics when rich web search interfaces like categorized 
overviews are available. 
7.2 Structure of category hierarchies for search results 
Research on web directories generally indicates that broad, shallow hierarchies are 
desirable. These studies have typically used known-item or other narrowly defined 
search tasks. Does the exploratory search task benefit from a different seat of breadth, 
depth and size trade-offs? Does the content domain affect these trade-offs? 
 
Zaphiris, Shneiderman & Norman (2002) found that expandable menus outperformed 
sequential menus on hierarchies of depth 2 or 3, but performed poorer than sequential 
menus with hierarchies of depth 4. As with other studies, they used narrowly defined 
search tasks with a single correct answer. They speculate that fully expanded 
hierarchy (of depth 4) became unwieldy for users. Supporting hierarchy 
 
 
238 
 
customization operations ?on-the-fly? as users explore search results may ameliorate 
that by allowing them to promote and move sub-trees of interest. However, that 
benefit could be offset by the additional training and possible cognitive effort. A 
comparison of sequential menus versus expandable outliners in this problem domain 
could yield different results than Zaphiris, et al. observed, and could deepen our 
understanding of the trade-offs inherent hierarchical displays. 
7.3 Graphical overviews of search results 
Graphical displays of web search results, inspired by the success of information 
visualization for abstract data, are a promising way to improve information retrieval. 
They have yielded mixed results to date, though. This dissertation has argued that 
designers of first generation tools (e.g., Grokker and Kartoo) overlooked the ongoing 
importance of text in their zeal to reap the perceptual benefits of graphical displays. 
The analysis and principles begin to address the graphical elements of categorized 
overviews, but have not yet been theoretically or empirically validated. Compact 
graphical overviews, paired with search result lists, are one promising research 
direction. This approach does impose moderate to severe size constraints on the 
graphical elements. Information visualization techniques like GRIDL (Shneiderman, 
Feldman, Rose, & Grau, 2000), SuperTable (Klein, M?ller, Reiterer, & Eibl, 2002), 
and WebTOC (Nation, Plaisant, Marchionini, & Komlodi, 1997), and the treemaps 
used in study 1are starting points, and may provide additional opportunities for 
lightweight interaction with search results, in the spirit of dynamic queries. 
Additional research is needed to better understand the tasks as well as the 
fundamental perceptual and cognitive processes that will benefit. 
 
 
239 
 
7.4 Leveraging the Semantic Web 
The Semantic Web community advocates the development of machine usable 
metadata to support automated resource discovery and reasoning, but there is growing 
recognition that both human and automated agents can benefit from interoperability 
between metadata standards. A plethora of proposals and standards purport to address 
the needs of classification users in multiple fields. Topic Maps, the Simple 
Knowledge Organization System (SKOS), and other proposals for the interchange of 
thesauri, classifications and ontologies promise a way to distribute classifications 
widely and maybe even to interconnect them at strategic points. Documenting and 
distributing the instantiated algorithms and rules for categorizing items into a 
classification has not yet been addressed. Additional research in this area could 
extend the fast-feature classifiers to take advantage of work done by projects such as 
the Dublin Core Metadata Initiative, the Open Archives Initiative, and CITIDEL to 
find, harvest, and integrate external metadata. Collaborative taxonomies, or 
folksonomies, like Flikr (flikr.com) and del.icio.us (del.icio.us) could be incorporated 
into categorized overviews. Collaborative taxonomies are not controlled vocabularies, 
but social forces encourage evolution toward a common set of tags. Both services 
provide application programmer interfaces to their tagging engines.  
7.5 Lightweight customization of categories 
The formative study results motivate a lightweight mechanism for customizing 
hierarchies. The need to restructure and reorganize hierarchies was highlighted by 
during the development of the SERVICE system and the final study. Existing 
taxonomic maintenance tools are designed to manage extensive metadata for 
 
 
240 
 
taxonomies and classifications. They are full-featured, complex and require a 
commitment of time to learn and use. There may be value to end-users and designers 
in a lightweight tool to customize rich category hierarchies. This could allow 
motivated end-users (perhaps ?power users?) to customize hierarchies for niche uses.  
 
 
 
 
241 
 
Appendix A: Study 1 ? Perspectives identified by subjects 
The following three tables list the perspectives identified for each scenario in study 1, 
and the number of times each was identified within each condition. 
 
Table 34. Perspectives identified for the Urban Sprawl scenario. 
Perspective Rank in 
results
Control Expand-
able 
Outliner
Tree-
map 
Total
Health-public health 2 4 1 3 8
NASA-satellite mapping 6 3 2 3 8
other-Interior Dept. 1 2 1 4
Health-obesity 8 2 3
overview-Definition of 
urban sprawl 
93 3
environmental 2 1  3
Health-NIH 1 1 1 3
environmental-agricultural 
impact 
32 2
autos/traffic 1 1 2
economic factors 2  2
environmental-air pollution 1 1 2
overview-big picture 1 1 1
assessing 3 1 1
other-Michican 5 1 1
development-brown fields 1 1
development-coastal 1 1
development-density 1 1
development-Smart growth 1  1
environmental-
photosynthesis 
1 1
environmental-water 
resources 
1 1
Health-CDC 1  1
NASA 1 1
NASA-scientific 1 1
Total 16 18 18 
 
 
 
242 
 
 
Table 35. Perspectives identified for the Breast Cancer scenario. 
Perspective Rank in 
results
Control Expand-
able 
Outliner
Tree-
map 
Total
other-male BC 2 4 3 1 8
research-NASA/space 
based 
1 2 2 5
general info-self-detection, 
diagnosis, screening 
5, 7112 4
general info-what you need 
to know 
4,3 2 1 1 4
risks-assessment 10 2 2 4
legislation-senate 1 1 1 3
reports-medline 1 1 2 3
research-genes 3 3
general info-treatments 2  2
legislation 1 1 2
other-NIH 2 2
other-NIH-NCI 3 2 2
risks-heart 1 1 2
general info-cancer types 1  1
general info-early detection 1  1
general info-facts 1  1
other-NOAA 1 1
other-pre-knowledge/post-
knowledge 
1 1
reports-news/scientific 1  1
research-studies-biggest is 
NIH 
1 1
risks-anti-perspirant 1 1
risks-environmental 1 1
Total 18 17 18 
 
 
 
243 
 
 
Table 36. Perspectives identified for the Alternative Energy scenario. 
Perspective Rank in 
results
Control Expand-
able 
Outliner
Tree-
map 
Total
agriculture 5 4 2 1 7
legislation-presidential 
initiative 
13 4
mailing list 1 3  3
promotion-benefits 2 2 1 3
legislation-house 1 1 2
legislation-tax code 1 1 2
lists of technology 6 1 1  2
medical use 1 1 2
sustainable 2 2
who [agency] is dealing 
with it 
2 2
coast guard 1  1
economic-energy futures 1 1
Economic-hydro power-
cost 
1 1
environmental-climate 
change 
1 1
environmental-conservation 1  1
environmental-green 
communities 
91  1
form 1 1
halogen alternatives 1  1
info 1
info-overview 1 1
land management 1 1
legislation-senate 1 1
microbial 1 1
NOAA-current law 1 1
products of process 1  1
promotion-educational 1 1
prototypes 1 1
renewable 4 1 1
reporting-statistics 1  1
source 1 1
source-biomass energy 1  1
source-fuel cells 1  1
source-fuels/crops 1 1
 
 
244 
 
source-solar power 1  1
studies-DOE labs 1  1
Total 18 18 18 
 
 
 
 
245 
 
Appendix B: Study 1 ? Unusual results identified by subjects 
The following three tables list the unusual results identified for each scenario in study 
1. If a participant identified multiple instances of the same value within the scenario, 
that was counted as one instance, i.e., noticing missing results from two agencies 
within the Urban Sprawl scenario would be coded as one instance. The user?s first 
reaction was counted, even if they subsequently explained the instance and/or 
changed their mind.  
Table 37. Unusual results identified for the Urban Sprawl scenario. 
Unusual-1 (Urban Sprawl) Control ExpOut TM Total
why not more from agency 3 3
why so many/why any at all from agency 2 1 3
NASA-why/satellite images 1 1 1 3
Myths 1 1 2
Obesity 1 1
library of Michigan 1  1
desert blooms-guide to plants 1  1
incorrectly categorized page 1 1
aggressive driving 1 1
measuring heat 1 1
why does lab link urban sprawl with natural disasters  1 1
invalid titles  1 1
coastal growth  1 1
hadn't clicked on that yet  1 1
Total 5 7 6
 
 
 
246 
 
Table 38. Unusual results identified for the Breast Cancer scenario. 
Unusual-2 (Breast Cancer) Control ExpOut TM Total
why not more from agency 2 2 4
why so many/why any at all from agency 1 1 2 4
NASA-space based research 1 1 2 4
Male BC 3 1  4
myths 1 1 1 3
simulations of BC 1 1  2
FAQ on hereditary 1   1
hawaii 1 1
new gene found 1   1
CBCTR 1 1
SPORES project 1   1
URL changed 1  1
Defense bill 1  1
surveillance 1 1
expected general pages to be ranked higher 1  1
LOC/tracer bullets  1 1
economic statistics  1 1
Total 12 11 9 
 
Table 39. Unusual results identified for the Alternative Energy scenario. 
Unusual-3 (Alternative Energy) Control ExpOut TM Total
why so many/why any at all from agency 2 1 5 8
why not more from agency 4 2 6
atrial defibrillation/AW for medical use 1 2  3
mailing list 2   2
student congressional town meeting 1 1 2
health 1 1
titles not helpful 1   1
photosynthesis 1 1
north korea 1   1
how few provide overviews 1   1
USAID & Brazil 1   1
Yurok 1 1
climate change 1   1
incorrectly categorized page 1  1
homeland security 1  1
computer aided manufacturing 1  1
why not more wacky sites  1 1
Total 13 11 9 
 
 
247 
 
Appendix C: Study 3 ? Paper materials 
Study Introduction 
IRB Project: User Interfaces for Public Access Information Systems 
 
Thank you for agreeing to participate in this study. In a moment I?ll ask you to fill out 
a few forms, but first I?ll give you a little background.  
 
Although people often use search engines to find a particular piece of information or 
a specific web page, they also search when they want to explore more generally, for 
example to find out what information is available about an issue of concern, or to start 
learning about an unfamiliar subject.  
 
A journalist could be doing background research for an article or a person might want 
to learn more about a friend?s health problem. We call this exploratory searching. Can 
you think of an instance when you did a search like that? 
 
We are conducting this study to learn more about this kind of searching. To do that 
we have to observe how real people conduct these searches ? how they explore, 
gather information, and make sense of what they find.  
 
That?s why your help is so important. In this study, you will perform several of these 
exploratory searches. I will provide a scenario, asking you to imagine that you have a 
particular need for information from the Web and then you will use a search engine to 
gather information to satisfy that need.  
 
You will use two experimental systems that implement new ways of retrieving and 
presenting search results. You?ll do 4 searches total, with a short break in the middle. 
Before and after each search, you will complete short questionnaires. At the end of 
the session, which will last about 2 hours, you will receive a $30 reward in 
recognition of your help. 
 
Do you have any questions at this point? Okay, if you would please read and sign this 
informed consent form, we can get started. [Informed Consent; offer water; turn 
off cellphones] 
 
Ok, could you fill out this short questionnaire? [Entry questionnaire; 3-button 
mouse; tabbed browser] 
 
Now I?ll show you a short video. [training video (Portsmouth first => 
Portsmouth+Collector.avi, Kittery first => Collector.avi)] 
 
[Both note] The search engine returns 100 results, listed with the most relevant pages 
at the top, like a regular search engine. 
  
 
 
248 
 
[Portsmouth note] It?s important to remember that the web sites for these pages 
[point to results] are cataloged, not necessarily the specific pages. And the 
cataloging is done by human editors, so there will always be some sites that haven?t 
been categorized yet. Those pages might appear near the top of the results, and might 
be good pages, even though they in the Uncategorized group. The Empty item is a list 
of categories that don?t have any results in them. Sometimes it?s useful to see what 
categories your results are not coming from. 
 
Do you have any questions? 
 
[ Training task ] 
 
Now we?ll walk through the scenario that we?re using for this session and give you a 
chance to use the system. 
 
Imagine that you are a reporter for a national newspaper. Due to some recent events, 
your editor has just asked you to generate a list of ideas for a series of articles on 
[urban sprawl]. There?s a meeting in an hour, so she doesn?t need a lot of detail, but 
she wants a diverse list of 8-10 (or more) ideas for discussion.  
 
They should cover many different aspects of the topic, to appeal to a broad range of 
readers. Unusual or provocative ideas are good. You have about 10 minutes to 
conduct a short web search to find out what information is available and generate the 
ideas. 
 
Your results will be judged (by your imaginary editor) on the quality and diversity of 
ideas. For example, ?public health impact? would be an okay idea. and ?obesity as a 
public health impact of urban sprawl? would be even better, because it is a bit more 
specific. 
 
As you use the search engine to explore and generate article ideas, enter them in the 
Collector form and include the web page that inspired your idea. It is important that 
you enter the ideas, not notes like ?a good page?. Think of this list [point to the 
Collector] as a bullet list for the discussion. 
 
Please think out loud as you take each action, for example, when you enter a query, 
click on something, or scroll a page. Briefly say why you did it and then tell me your 
reaction to the system?s response.  
 
I?m also interested in what?s good or bad, problems or insights, and anything 
confusing. You don?t have to describe what you are doing, since we?re recording it. 
We?ll spend a few minutes on this now. 
 
[Start Camtasia. Encourage them to explore the system and generate 2-3 ideas. 
If they haven?t done all items on checklist, prompt them. After training task: ] 
Do you have any questions? [Stop Camtasia] 
 
 
249 
 
 
Please remember that you are not being tested. Instead, you are helping us to evaluate 
these systems by using them to the best of your ability and producing the best results 
you can. 
 
 
250 
 
 
Participant ID: ________ 
 
 
251 
 
  
Training Task Checklists 
IRB Project: User Interfaces for Public Access Information Systems 
 
 
Participant ID: _____  Sequence: ____/____/____/____ Date: 
____________ 
 
 
Check mark ( 9) if they do it on their own, ?P? if prompted. 
 
Portsmouth 
 
__ Enter query 
 
__ Pointer over category; view subcats & highlighted results 
 
__ Pointer over result; view highlighted cats 
 
__ Filter on category 
 
__ Exclude category 
 
__ Pointer over Empty pseudo category; view list 
 
__ Show all results 
 
Collector 
 
__ Collects idea & link (from a web page) 
 
__ Collects idea & link (results list) 
 
__ Collects idea & link (URL location bar) 
 
__ Generates 2+ appropriate ideas  
 
 
 
252 
 
Study Procedural Checklist 
IRB Project: User Interfaces for Public Access Information Systems 
 
Participant ID: _____ Sequence: ____/____/____/____ Date: ____________ 
 
Start 
Time 
Cap-
ture 
Step 
  Introduction 
  Informed consent form 
  Entry questionnaire 
  Video:    __ Collector.avi 
               __ Portsmouth+Collector.avi 
  9 Training task ? Time limit: 8 
  Task 1: __Br1    __Br2    __N1    __N2 
      Pre-search questionnaire 
  9     Search ? Limit: 12  End: __________ 
  9     Post-search questionnaire 
  Task 2: __Br1    __Br2    __N1    __N2 
      Pre-search questionnaire 
  9     Search ? Limit: 12  End: __________ 
  9     Post-search questionnaire 
  Break (restart Eclipse) 
Video: __Portsmouth.avi 
            __None 
  9 Training task: __Yes    __No 
  Task 3: __Br1    __Br2    __N1    __N2 
      Pre-search questionnaire 
  9     Search ? Limit: 12  End: __________ 
  9     Post-search questionnaire 
  Task 4: __Br1    __Br2    __N1    __N2 
      Pre-search questionnaire 
  9     Search ? Limit: 12  End: __________ 
  9     Post-search questionnaire 
  9 Exit interview 
  Reward; receipt; copy of form 
 
 
253 
 
Exit Interview Questions 
IRB Project: User Interfaces for Public Access Information Systems 
 
Participant ID: _____  Date: ____________ 
 
1. Which system would you rather use for these tasks (Kittery, Portsmouth 
or no preference) [order: ____/____/____/____]: 
a. [K, P, no-op] Find the home page for the daily newspaper in 
Concord, NH, The Concord Monitor. 
b. [K, P, no-op] Find information on caring for a pet gerbil. 
c. [K, P, no-op] Start looking for information to help you select and 
buy a new digital camera. 
d. [K, P, no-op] Learn about U.S. business investment in Africa. 
2. How do you feel about the quality of ideas that you generated for each 
task? Rank (worst) ____ - ____ - ____ - ____ (best) 
3. Did the categorized overview change the way you searched? Can you 
describe an example? Why? 
4. Can you describe an example where the categorized overview 
[helped/hindered, frustrated or mislead ? whichever not indicated in 
previous question]? 
5. Did you notice any difference in how you used the categorized overview 
each time? Can you describe an example? 
6. Did you ever read the category pop-ups? subcategories? Why/why not? 
How did that help you decide? 
7. [Show Leonardo da Vinci] What kind of results do you expect to see if 
you did this search and clicked on the Kids and Teens link? Computers? 
Reference? K&T ______     Computer ______ / ______     Reference 
______ 
8. Did you ever use the Uncategorized pseudo-category? Example? Empty? 
9. These systems display 100 results at a time. Do you have any thoughts on 
this? Would you typically use all 100? 
10. How similar or dissimilar is this scenario what a journalist would really 
need to do? How does it differ?  
11. How similar or dissimilar are the searches that you did to a search that 
would do? How/how not? 
12. Could you compare the difficulty finding information on the topics? 
Rank them Rank (easiest) ____ - ____ - ____ - ____ (hardest) 
13. How much time would you have spent on the tasks if there were no 
specific time limit? 
14. Did you notice any changes in your energy level or alertness over the 
course of the session? 
15. Do you have any suggestions for additions or changes to the categories? 
16. Do you have any suggestions for changes to the system? features? 
layout? interactions? 
 
 
 
 
254 
 
Exit Interview Questions 
IRB Project: User Interfaces for Public Access Information Systems 
 
Participant ID: _____  Date: ____________ 
 
 
On a scale from 1 to 9, please rate how narrow or broad each of the topics was: 
 
 
 Narrow                                             
Broad 
 
Aging workforce 
 
 
     1     2     3     4     5     6     7     8     9    
 
Human smuggling 
 
 
     1     2     3     4     5     6     7     8     9    
 
International art crime 
 
 
     1     2     3     4     5     6     7     8     9    
 
Workplace allergies 
 
 
     1     2     3     4     5     6     7     8     9    
 
 
 
 
255 
 
Appendix D: Study 3 ? Online questionnaires 
 
Entry Questionnaire 
User Interfaces for Public Access Information Systems 
Web Search Study 3 
Investigator: Bill Kules 
 
    This questionnaire provides us with background information that helps us analyse 
the answers you give in later stages of this experiment. 
 
Questions marked with a * are required. 
  
*1.  Participant ID (provided by experiment monitor) 
 
  
*2.  Your age 
 
  
*3.  Your gender 
 
     Female 
     Male 
 
  
*4.  Your occupation (if student, department/major) 
 
  
*5.  Highest level of education achieved 
 
     High school 
     Part way through undergraduate program 
     Undergraduate degree 
     Part way through graduate program 
     Graduate degree (e.g. Masters, PhD) 
 
  
*6.  Web searching experience ? How long have you used search engines to look 
for information on the Web? 
 
     Less than 6 months 
     6-12 months 
     1-3 years 
     More than 3 years 
 
 
256 
 
 
  
*7.  Search frequency ? How often do you use a search engine to search for 
information on the Web? 
 
     Less than once a week 
     1-2 times per week, but less than once a day 
     At least once a day 
 
  
*8.  How do you rate your searching skills? 
 
     1  2  3  4  5 
    Novice     Expert 
 
  
*9.  When you search on the Web, how often do you find the information you are 
looking for? 
 
     Never or almost never 
     Rarely 
     Some of the time 
     Most of the time 
     Always or almost always 
 
  
*10.  What web search engines do you use frequently (select all that apply)? 
 
     Google 
     Yahoo! 
     MSN 
     AOL 
     Other:  
 
  
*11.  What type of information do you normally search for on the web (select all 
that apply)? 
 
     Research for classes 
     Research for work 
     Job searching 
     Entertainment/recreation 
     Places or products 
     News or information on events 
     Locate people (email, addresses, phone numbers, etc.) 
     Commerce, travel, employment or economy 
 
 
257 
 
     Society, culture, ethnicity or religion 
     Government information 
     Other:  
 
 
Pre-Search Questionnaire 
 
Questions marked with a * are required. 
  
*1.  Participant ID (provided by experiment monitor) 
 
  
*2.  Sequence number (provided by experiment monitor) 
 
  
*3.  System (provided by experiment monitor) 
 
     Kittery 
     Portsmouth 
 
  
*4.  Topic (provided by experiment monitor) 
 
  
*5.  How familiar are you with this topic now? 
 
     1  2  3  4  5  6  7  8  9 
 Not al        Very 
 
  
*6.  How interested are you in this topic now? 
 
     1  2  3  4  5  6  7  8  9 
 Not al        Very 
 
  
*7.  How confident are you that you can find useful information about your topic 
on the Web? 
 
     1  2  3  4  5  6  7  8  9 
 Not al        Very 
 
  
*8.  How do you feel about your ability to complete the task at this point? 
 
 
 
258 
 
     1  2  3  4  5  6  7  8  9 
 Very uncertain        Very certain 
 Pessimistic         Optimistic 
 Confused         Clear 
 Doubtful         Confident 
 
 
 
 
Post-Search Questionnaire 
 
Questions marked with a * are required. 
  
*1.  Participant ID (provided by experiment monitor) 
 
  
*2.  Sequence number (provided by experiment monitor) 
 
  
*3.  System (provided by experiment monitor) 
 
     Kittery 
     Portsmouth 
 
  
*4.  Topic (provided by experiment monitor) 
 
  
5.  What are your thoughts at this point (you may write or comment out loud)? 
 
  
*6.  How familiar are you with this topic now? 
 
     1  2  3  4  5  6  7  8  9 
 Not al        Very 
 
  
*7.  How interested are you in this topic now? 
 
     1  2  3  4  5  6  7  8  9 
 Not al        Very 
 
  
*8.  How confident are you that you can find useful information about your topic 
on the Web? 
 
 
259 
 
 
     1  2  3  4  5  6  7  8  9 
 Not al        Very 
 
  
*9.  How do you feel about your ability to complete the task at this point? 
 
     1  2  3  4  5  6  7  8  9 
 Very uncertain        Very certain 
 Pessimistic         Optimistic 
 Confused         Clear 
 Doubtful         Confident 
 
  
*10.  The search I performed was: 
 
     1  2  3  4  5  6  7  8  9 
 Stressful         Relaxing 
 Boring         Interesting 
 Tiring         Restful 
 Difcult        Easy 
 
  
*11.  How much progress did you make on generating good ideas? 
 
     1  2  3  4  5  6  7  8  9 
    None         I?m ready to give 
my editor the list 
 
  
*12.  How useful was the information you found? 
 
     1  2  3  4  5  6  7  8  9 
    Not at all useful        Very useful 
 
  
*13.  How difficult was it to explore / navigate the results of your search? 
 
     1  2  3  4  5  6  7  8  9 
 Very hard         Very easy 
 
  
*14.  I was able to get a good overview of the information available on the Web for 
this topic: 
 
     1  2  3  4  5  6  7  8  9 
 
 
260 
 
 Strongly disagree        Strongly agree 
 
  
*15.  The system helped me organize my search results: 
 
     1  2  3  4  5  6  7  8  9 
 Strongly disagree        Strongly agree 
 
  
*16.  The system helped me find useful pages: 
 
     1  2  3  4  5  6  7  8  9 
 Strongly disagree        Strongly agree 
 
  
*17.  The system helped me assess the results of my queries to decide what to do 
next: 
 
     1  2  3  4  5  6  7  8  9 
 Strongly disagree        Strongly agree 
 
  
*18.  Please indicate how well these descriptions apply to this system: 
 
     1  2  3  4  5  6  7  8  9 
 Terrible         Wonderful 
    Difficult to use        Easy to use 
 Dull         Stimulating 
 Frustrating         Satisfying 
 Complex        Simple 
 Too Slow         Fast Enough 
 Overwhelming        Manageable 
 Disorganized        Organized 
 
 
 
261 
 
Bibliography 
1. Aguilar, F. J. (1988). General Managers in Action. New York, NY: Oxford 
University Press. 
2. Ahlberg, C., Shneiderman, B. (1993). Visual information seeking: Tight coupling 
of dynamic query filters with starfield displays. In Proceedings of the SIGCHI 
Conference on Human Factors in Computing Systems (pp. 313-317). New York: 
ACM Press.  
3. Allen, R. (1995). Two digital library Interfaces that exploit hierarchical structure, 
DAGS95: Electronic Publishing and the Information Superhighway. 
4. Anderson, J. R. (1990). The Adaptive Character of Thought. Hillsdale, NJ: 
Lawrence Erlbaum Associates. 
5. Ask.com. (2005). About Ask.com: IQ. Retrieved April 18, 2006, from 
http://sp.ask.com/en/docs/iq/iq.shtml. 
6. Aula, A. (2004). Enhancing the readability of search result summaries. In 
Proceedings Volume 2 of the Conference HCI 2004: Design for Life, Leeds, UK. 
Retrieved April 27, 2006, from http://www.cs.uta.fi/~aula/aula_summary.pdf. 
7. Aula, A., Jhaveri, N., & K?ki, M. (2005). Information search and re-access 
strategies of experienced web users. In Proceedings of the 14th International 
Conference on the World Wide Web, Chiba, Japan (pp. 583-592). New York: 
ACM Press.  
8. Bates, M. (1990). Where should the person stop and the information search 
interface start. Information Processing and Management, 26(5), 575-591.  
9. Bates, M. J. (1979). Information search tactics. Journal of the American Society 
for Information Science, 30, 205-214.  
10. Bates, M. J. (1989). The design of browsing and berrypicking techniques for the 
on-line search interface. Online Review, 13(5), 407-431.  
11. Becks, A., Seeling, C., & Minkenberg, R. (2002). Benefits of document maps for 
text access in knowledge management: A comparative study. In Proceedings of 
the 2002 ACM Symposium on Applied Computing (pp. 621-626). New York: 
ACM Press.  
12. Belkin, N. J. (1980). Anomalous states of knowledge as a basis for information 
retrieval. Canadian Journal of Information Science, 5, 133-143.  
 
 
262 
 
13. Bell, D. J., & Ruthven, I. (2004). Searchers? assessments of task complexity for 
Web searching. In S. Macdonald & J. Tait (Eds.), Proceedings of the 26th BCS-
IRSG Euopean Conference on Information Retrieval (pp. 57-71). Berlin: 
Springer-Verlag.  
14. Bhavnani, S. K., & Bates, M. J. (2002). Separating the knowledge layers: 
Cognitive analysis of search knowledge through hierarchical goal 
decompositions. In Proceedings of the American Society for Information Science 
and Technology Annual Meeting (Vol. 39, pp. 204-213). Medford, NJ: 
Information Today.  
15. Borlund, P. (2000). Experimental components for the evaluation of interactive 
information retrieval systems. Journal of Documentation, 56(1), 71-90.  
16. Borlund, P. (2003). The IIR evaluation model: A framework for evaluation of 
interactive information retrieval systems. Information Research, 8(3), paper no. 
152. Retrieved April 17, 2006, from http://informationr.net/ir/8-3/paper152.html. 
17. Bowker, G., & Starr, S. (1999). Sorting Things Out: Classification and Its 
Consequences. Cambridge MA: MIT Press. 
18. Broder, A. (2002). A taxonomy of web search. SIGIR Forum, 36(2), 3-10.  
19. Bystr?m, K., & Hansen, P. (2002). Work tasks as units for analysis in information 
seeking and retrieval studies. In H. Bruce, R. Fidel, P. Ingwersen & P. Vakkari 
(Eds.), Emerging Frameworks and Methods (pp. 239-251). Greenwood Village, 
CO: Libraries Unlimited.  
20. Card, S., Mackinlay, J., & Shneiderman, B. (1999). Readings in Information 
Visualization: Using Vision to Think. San Fransisco: Morgan Kaufmann. 
21. Ceaparu, I., & Shneiderman, B. (2004). Finding governmental statistical data on 
the Web: A study of categorically organized links for the FedStats topics page. 
Journal of the American Society for Information Science and Technology, 55(11), 
1008 - 1015.  
22. Chen, H., Houston, A. L., Sewell, R. R., & Schatz, B. R. (1998). Internet 
browsing and searching: User evaluations of category map and concept space 
techniques. Journal of the American Society for Information Science, 49(7), 582-
608.  
23. Chen, M., Hearst, M., Hong, J., & Lin, J. (1999, October 11-14, 1999). Cha-Cha: 
A system for organizing intranet search results. Paper presented at the 2nd 
USENIX Symposium on Internet Technologies and Systems, Boulder, CO. 
Retrieved April 17, 2006, from 
http://www.sims.berkeley.edu/~hearst/papers/usits99/. 
 
 
263 
 
24. Chirita, P. A., Nejdl, W., Paiu, R., & Kohlsch?tter, C. (2005). Using ODP 
metadata to personalize search. In Proceedings of the 28th Annual International 
ACM SIGIR Conference on Research and Development in Information Retrieval, 
Salvador, Brazil (pp. 178-185). New York: ACM Press.  
25. Choo, C. W., Detlor, B., & Turnbull, D. (2000). Web Work: Information Seeking 
and Knowledge Work on the World Wide Web. Dordrecht, The Netherlands: 
Kluwer Academic Publishers. 
26. Cockburn, A., & Jones, S. (1996). Which way now? Analysing and easing 
inadequacies in WWW navigation. International Journal of Human-Computer 
Studies, 45(1), 105-129.  
27. Cousins, S. B., Paepcke, A., Winograd, T., Bier, E. A., & Pier, K. (1997). The 
digital library integrated task environment (DLITE). In Proceedings of the Second 
ACM International Conference on Digital Libraries, Philadelphia, Pennsylvania 
(pp. 142-151). New York: ACM Press.  
28. Cunha, C., Bestavros, A., & Crovella, M. (1995). Characteristics of WWW client-
based traces (No. TR-95-010): Boston University. Retrieved January 24, 2005, 
from http://cs-www.bu.edu/faculty/crovella/paper-archive/TR-95-010/paper.html. 
29. Dervin, B., & Nilan, M. (1986). Information needs and uses. In M. Williams 
(Ed.), Annual Review of Information Science and Technology (Vol. 21, pp. 3-33). 
White Plains, New York: Knowledge Industries.  
30. Drori, O. (2003). Display of search results in Google-based Yahoo! vs. LCC&K 
interfaces: A comparison study. Proceedings of Informing Science 2003 
Conference, Pori, Finland. Retrieved April 27, 2006, from 
http://shum.huji.ac.il/~offerd/papers/drori062003-b.pdf. 
31. Drori, O., & Alon, N. (2003). Using documents classification for displaying 
search results list. Journal of Information Science, 29(2), 97-106.  
32. Dumais, S., Cutrell, E., & Chen, H. (2001). Optimizing search by showing results 
in context. In Proceedings of the SIGCHI Conference on Human Factors in 
Computing Systems, Seattle, WA (pp. 277-284). New York: ACM Press.  
33. Dumais, S., Cutrell, E., & Chen, H. (2001). Optimizing search by showing results 
in context. Proceedings of the SIGCHI conference on Human factors in 
computing systems, 277 - 284.  
34. Durand, D., & Kahn, P. (1998). MAPA: A system for inducing and visualizing 
hierarchy in websites. In Proceedings of the Ninth ACM Conference on Hypertext 
and Hypermedia (pp. 66-76). New York: ACM Press.  
 
 
264 
 
35. Egan, D. E., Remde, J. R., Gomez, L. M., Landauer, T. K., Eberhardt, J., & 
Lochbaum, C. C. (1989). Formative design evaluation of SuperBook. ACM 
Transactions on Information Systems, 7(1), 30-57.  
36. Ellis, D. (1989). A behavioral model for information retrieval system design. 
Journal of Information Science, 15(4-5), 237-247.  
37. Ericsson, K. A., & Simon, H. A. (1984). Protocol Analysis: Verbal Reports as 
Data. Cambridge, MA: MIT Press. 
38. Fidel, R. (1985). Moves in online searching. Online Review, 9(1), 61-74.  
39. Furnas, G. W. (1997). Effective view navigation. In Proceedings of the SIGCHI 
Conference on Human Factors in Computing Systems (pp. 367-374). New York: 
ACM Press.  
40. Furnas, G. W., & Rauch, S. J. (1998). Considerations for information 
environments and the NaviQue workspace. In Proceedings of the Third ACM 
Conference on Digital libraries, Pittsburgh, PA (pp. 79-88). New York: ACM 
Press.  
41. Garcia, E., & Sicilia, M.-?. (2003). User interface tactics in ontology-based 
information seeking. Psychology Journal, 1(3), 242-255.  
42. Garfield, E. (2005). The agony and the ecstasy-The history and meaning of the 
journal impact factor. Paper presented at the International Congress on Peer 
Review and Biomedical Publication, Chicago, IL. Retrieved April 16, 2006, from 
http://garfield.library.upenn.edu/papers/jifchicago2005.pdf. 
43. Ginsburg, M. (2004). Visualizing digital libraries with open standards. 
Communications of the Association for Information Systems, 13, 336-356.  
44. Golovchinsky, G. (1997). Queries? Links? Is there a difference? In Proceedings 
of the SIGCHI Conference on Human Factors in Computing Systems, Atlanta, GA 
(pp. 407-414). New York: ACM Press.  
45. Greene, S., Marchionini, G., Plaisant, C., & Shneiderman, B. (2000). Previews 
and overviews in digital libraries: Designing surrogates to support visual 
information-seeking. Journal of the American Society for Information Science, 
51(3), 380-393.  
46. Guba, E. G., & Lincoln, Y. S. (1982). Epistemological and methodological bases 
of naturalistic inquiry. Educational Communication and Technology, 30(4), 233-
252.  
47. Hearst, M. (1995). TileBars: Visualization of term distribution information in full 
text information access. In Proceedings of the SIGCHI Conference on Human 
Factors in Computing Systems (pp. 59-66). New York: ACM Press.  
 
 
265 
 
48. Hearst, M., Elliot, A., English, J., Sinha, R., Swearingen, K., & Yee, P. (2002). 
Finding the flow in web site search. Communications of the ACM, 45(9), 42-49.  
49. Hearst, M. A. (1999). The use of categories and clusters for organizing retrieval 
results. In T. Strzalkowski (Ed.), Natural Language Information Retrieval (pp. 
333-373). Boston: Kluwer Academic Publishers.  
50. Hearst, M. A. (2006). Clustering versus faceted categories for information 
exploration. Communications of the ACM, 49(4), 59-61.  
51. Hearst, M. A., & Karadi, C. (1997). Cat-a-Cone: An interactive interface for 
specifying searches and viewing retrieval results using a large category hierarchy. 
In Proceedings of the 20th Annual International ACM SIGIR Conference on 
Research and Development in Information Retrieval (pp. 246-255). New York: 
ACM Press.  
52. Hearst, M. A., & Pedersen, J. O. (1996). Reexamining the cluster hypothesis: 
Scatter/Gather on retrieval results. In Proceedings of the 19th Annual 
International ACM SIGIR Conference on Research and Development in 
Information Retrieval, Zurich, Switzerland (pp. 76-84). New York: ACM Press.  
53. Hendry, D. (to appear). Workspaces for Search. Journal of the American Society 
for Information Science and Technology. from 
http://faculty.washington.edu/dhendry/docs/jasis2004.pdf. 
54. Hendry, D., & Harper, D. (1997). An informal information-seeking environment. 
Journal of the American Society for Information Science, 48(11), 1036-1048.  
55. Hert, C. (2002). Developing and evaluating scenarios for use in designing the 
National Statistical Knowledge Network. Retrieved February 15, 2006, from 
http://ils.unc.edu/govstat/papers/scenario_paper_nov_14_2002.doc. 
56. Hochheiser, H., & Shneiderman, B. (1999). Performance benefits of simultaneous 
over sequential menus as task complexity increases. International Journal of 
Human Computer Interaction, 12(2), 173-192.  
57. Jaccard, J. (1983). Statistics for the Behavioral Sciences. Belmont, CA: 
Wadsworth Publishing Company. 
58. Jacob, E. (2004). Classification and categorization: A difference that makes a 
difference. Library Trends, 52(3), 515-540.  
59. Janecek, P., & Pu, P. (2005). An evaluation of semantic fisheye views for 
opportunistic search in an annotated image collection. Journal of Digital 
Libraries, 5(1), 42-56.  
 
 
266 
 
60. Jansen, B. J., Spink, A., & Pedersen, J. (2005). A temporal comparison of 
AltaVista Web searching. Journal of the American Society for Information 
Science and Technology, 56(6), 559-570.  
61. Jansen, B. J., Spink, A., & Saracevic, T. (2000). Real life, real users, and real 
needs: A study and analysis of user queries on the Web. Information Processing 
and Management, 36, 207-227.  
62. J?rvelin, K., & Ingwersen, P. (2004). Information seeking research needs 
extension towards tasks and technology. Information Research, 10(1), paper 212. 
Retrieved April 27, 2006, from http://informationr.net/ir/10-1/paper212.html. 
63. Kaasten, S., & Greenberg, S. (2001). Integrating Back, History and Bookmarks in 
Web Browsers. In CHI '01 Extended Abstracts on Human Factors in Computer 
Cystems (pp. 379-380). New York: ACM Press.  
64. Kabel, S., Hoog, R. d., Wielinga, B. J., & Anjewierden, A. (2004). The added 
value of task and ontology-based markup for information retrieval. Journal of the 
American Society for Information Science and Technology, 55(4), 348-362.  
65. K?ki, M. (2005). Findex: search result categories help users when document 
ranking fails, Proceeding of the SIGCHI conference on Human factors in 
computing systems. Portland, Oregon, USA: ACM Press. 
66. K?ki, M. (2005). Findex: search result categories help users when document 
ranking fails. In Proceeding of the SIGCHI Conference on Human Factors in 
Computing Systems, Portland, OR (pp. 131-140). New York: ACM Press.  
67. Kim, K.-S., & Allen, B. (2002). Cognitive and task influences on Web searching 
behavior. Journal of the American Society for Information Science and 
Technology, 53(2), 109-119.  
68. Kleiboemer, A., Lazear, M., & Pedersen, J. (1996). Tailoring a retrieval system 
for naive users. In Proceedings of the 5th Annual Symposium on Document 
Analysis and Information Retrieval.  
69. Klein, P., M?ller, F., Reiterer, H., & Eibl, M. (2002). Visual information retrieval 
with the SuperTable + Scatterplot. In Proceedings of the Sixth International 
Conference on Information Visualisation (IV '02) (pp. 70-75). New York: IEEE 
Computer Society.  
70. Klein, P., Reiterer, H., M?ller, F., & Limbach, T. (2003). Metadata visualisation 
with VisMeB. In Proceedings of the Seventh International Conference on 
Information Visualization (IV?03) (pp. 600-605). New York: IEEE Computer 
Society.  
71. Koenemann, J., & Belkin, N. J. (1996). A case for interaction: A study of 
interactive information retrieval behavior and effectiveness. In Proceedings of the 
 
 
267 
 
SIGCHI Conference on Human Factors in Computing Systems: Common Ground, 
Vancouver, British Columbia, Canada (pp. 205-212). New York: ACM Press.  
72. Kuhlthau, C. C. (1991). Inside the search process: Information seeking from the 
user's perspective. Journal of the American Society for Information Science, 
42(5), 361-371.  
73. Kules, B., Kustanowitz, J., & Shneiderman, B. (to appear). Categorizing web 
search results into meaningful and stable categories using Fast-Feature 
techniques. Proceedings of the 6th ACM/IEEE-CS joint conference on Digital 
libraries.  
74. Kules, B., & Shneiderman, B. (2003). Designing a metadata-driven visual 
information browser for federal statistics. In Proceedings of the 2003 National 
Conference on Digital Government Research (pp. 117-122). Retrieved April 27, 
2006, from http://hcil.cs.umd.edu/trs/2003-08/2003-08.pdf. 
75. Kules, B., & Shneiderman, B. (2004). Categorized graphical overviews for web 
search results: An exploratory study using U.S. government agencies as a 
meaningful and stable structure. Paper presented at the Third Annual Workshop 
on HCI Research in MIS, Washington, DC. Retrieved April 27, 2006, from 
http://hcil.cs.umd.edu/trs/2004-38/2004-38.html. 
76. Kunz, C. (2003). SERGIO - An Interface for context driven knowledge retrieval. 
In Proceedings of eChallenges, Bologna, Italy, 2003. Retrieved April 27, 2006, 
from 
http://www.hci.iao.fraunhofer.de/uploads/tx_publications/Kunz2003_SERGIO_Pr
oceedings_of_eChallenges.pdf. 
77. Kunz, C., & Botsch, V. (2002). Visual representation and contextualization of 
search results ? List and Matrix Browser. In Proceedings of the International 
Conference on Dublin Core and Metadata for e-Communities (pp. 229-234): 
Firenze University Press. Retrieved April 27, 2006, from 
http://www.bncf.net/dc2002/program/ft/poster10.pdf. 
78. Kwasnik, B. H. (1999). The role of classification in knowledge representation and 
discovery. Library Trends, 48(1), 22-47.  
79. Lamping, J., & Rao, R. (1996). The Hyperbolic Browser: A focus + context 
technique for visualizing large hierarchies. Journal of Visual Languages and 
Computing, 7(1), 33-55.  
80. Larson, K., & Czerwinski, M. (1998). Web page design: Implications of memory, 
structure and scent for information retrieval. In Proceedings of the SIGCHI 
Conference on Human Factors in Computing Systems (pp. 25-32). New York: 
ACM Press.  
 
 
268 
 
81. Louie, A. J., Maddox, E. L., & Washington, W. (2003). Using faceted 
classification to provide structure for information architecture. Paper presented at 
the The 62nd ASIS Annual Meeting, Washington, D.C. Retrieved April 17, 2006, 
from http://depts.washington.edu/pettt/presentations/conf_2003/IASummit.pdf. 
82. Marchionini, G. (1995). Information Seeking in Electronic Environments: 
Cambridge University Press. 
83. Marchionini, G., Plaisant, C., & Komlodi, A. (1998). Interfaces and tools for the 
Library of Congress National Digital Library Program. Information Processing & 
Management, 34(5), 535-555.  
84. Markman, A. B., & Ross, B. H. (2003). Category use and category learning. 
Psychological Bulletin, 129, 592-613. Retrieved April 19, 2006, from 
http://www.psy.utexas.edu/psy/faculty/Markman/PB03.pdf. 
85. Marshall, B., McDonald, D., Chen, H., & Chung, W. (2004). EBizPort: Collecting 
and analyzing business intelligence information. Journal of the American Society 
for Information Science and Technology, 55(10), 873-891.  
86. Matsuda, K., & Fukushima, T. (1999). Task-oriented World Wide Web retrieval 
by document type classification. In Proceedings of the Eighth International 
Conference on Information and Knowledge Management, Kansas City, MO (pp. 
109-113). New York: ACM Press.  
87. Milic-Frayling, N., Jones, R., Rodden, K., Smyth, G., Blackwell, A., & 
Sommerer, R. (2004). Smartback: supporting users in back navigation. In 
Proceedings of the 13th International Conference on World Wide Web (pp. 63-
71). New York: ACM Press.  
88. Miller, D. (1981). The depth/breadth tradeoff in hierarchical computer menus. 
Proceedings of the Human Factors Society, 296-300.  
89. Nation, D. A., Plaisant, C., Marchionini, G., & Komlodi, A. (1997). Visualizing 
websites using a hierarchical table of contents browser: WebTOC. Proceedings of 
the Third Conference on Human Factors and the Web. Retrieved April 27, 2006, 
from http://hcil.cs.umd.edu/trs/97-10/97-10.html. 
90. Nielsen, J., Clemmensen, T., & Yssing, C. (2002). Getting access to what goes on 
in people's heads? - Reflections on the think-aloud technique. In Proceedings of 
the Second Nordic Conference on Human-Computer Interaction, Aarhus, 
Denmark (pp. 101-110). New York: ACM Press.  
91. Niemela, M., & Saariluoma, P. (2003). Layout attributes and recall. Behaviour & 
Information Technology, 22(5), 353-363.  
 
 
269 
 
92. Norman, K. (1991). The Psychology of Menu Selection: Designing Cognitive 
Control at the Human/Computer Interface. Norwood, NJ: Ablex Publishing 
Corporation. 
93. Nowell, L. T., France, R. K., Hix, D., Heath, L. S., & Fox, E. A. (1996). 
Visualizing search results: Some alternatives to query-document similarity. In 
Proceedings of the Nineteenth Annual International ACM/SIGIR Conference on 
Research and Development in Information Retrieval (pp. 67-75). New York: 
ACM Press.  
94. Periakaruppan, R., & Nemeth, E. (1999). GTrace - A graphical traceroute tool. In 
Proceedings of the 13th USENIX Conference on System Administration, Seattle, 
WA (pp. 69-78): USENIX Association. Retrieved April 18, 2006, from 
http://www.caida.org/publications/papers/1999/GTrace/GTrace.pdf. 
95. Perugini, S., McDevitt, K., Richardson, R., Manuel Perez-Quiones, Shen, R., 
Ramakrishnan, N., et al. (2004). Enhancing usability in CITIDEL: multimodal, 
multilingual, and interactive visualization interfaces, Proceedings of the 4th 
ACM/IEEE-CS joint conference on Digital libraries. Tuscon, AZ, USA: ACM 
Press. 
96. Pirolli, P., & Card, S. (1995). Information foraging in information access 
environments. In Proceedings of the SIGCHI Conference on Human Factors in 
Computing Systems (pp. 51-58). New York: ACM Press.  
97. Pirolli, P., Schank, P., Hearst, M., & Diehl, C. (1996). Scatter/gather browsing 
communicates the topic structure of a very large text collection. In Proceedings of 
the SIGCHI Conference on Human Factors in Computing Systems: Common 
Ground, Vancouver, British Columbia, Canada (pp. 213-220). New York: ACM 
Press.  
98. Pirolli, P. L., & Card, S. K. (1999). Information foraging. Psychological Review, 
106(4), 643-675.  
99. Pollitt, S. (1997). Interactive information retrieval based on faceted classification 
using views in knowledge organization for information retrieval, Sixth 
International Study Conference on Classification Research. University College 
London, 16-19 June 1997. 
100. Pratt, W., Hearst, M. A., & Fagan, L. M. (1999). A knowledge-based 
approach to organizing retrieved documents. In Proceedings of the 16th National 
Conference on Artificial Intelligence, Orlando, FL (pp. 80-85): American 
Association for Artificial Intelligence. Retrieved April 27, 2006, from 
http://www.sims.berkeley.edu/~hearst/papers/AAAI-99.pdf. 
101. R Development Core Team. (2005). R Language Definition. Vienna, Austria: 
R Foundation for Statistical Computing. Retrieved April 27, 2006, from 
http://cran.r-project.org/doc/manuals/R-lang.pdf. 
 
 
270 
 
102. Risden, K., Czerwinski, M., Munzner, T., & Cook, D. (2000). An initial 
examination of ease of use for 2D and 3D information visualizations of Web 
content. International Journal of Human-Computer Studies, 695 - 714.  
103. Rivadeneira, W., & Bederson, B. B. (2003). A Study of Search Result 
Clustering Interfaces: Comparing Textual and Zoomable User Interfaces: 
University of Maryland HCIL Technical Report HCIL-2003-36. Retrieved April 
27, 2006, from http://hcil.cs.umd.edu/trs/2003-36/2003-36.pdf. 
104. Rose, D. E., & Levinson, D. (2004). Understanding user goals in web search. 
In Proceedings of the 13th International Conference on World Wide Web (pp. 13-
19). New York: ACM Press.  
105. Sebrechts, M., Vasilakis, J., Miller, M., Cugini, J., & Laskowski, S. (1999). 
Visualization of search results: A comparative evaluation of text, 2D, and 3D 
interfaces. In Proceedings of the 22nd Annual International ACM SIGIR 
Conference on Research and Development in Information Retrieval (pp. 3-10). 
New York: ACM Press.  
106. Shiri, A. A., & Revie, C. (2000). Thesauri on the web: current developments 
and trends. Online Information Review, 24(4), 273-279.  
107. Shneiderman, B., Byrd, D., & Croft, W. B. (1997). Clarifying search: A user-
interface framework for text searches. D-Lib Magazine. Retrieved April 16, 2006, 
from http://www.dlib.org/dlib/january97/retrieval/01shneiderman.html. 
108. Shneiderman, B., Byrd, D., & Croft, W. B. (1998). Sorting out searching: A 
user-interface framework for text searches. Communications of the ACM, 41(4), 
95-98.  
109. Shneiderman, B., Feldman, D., Rose, A., & Grau, X. F. (2000). Visualizing 
digital library search results with categorical and hierarchial axes. In Proceedings 
of the Fifth ACM International Conference on Digital Libraries (San Antonio, TX, 
June 2-7, 2000) (pp. 57-66). New York: ACM Press.  
110. Shneiderman, B., Fischer, G., Czerwinski, M., Resnick, M., Myers, B., Candy, 
L., et al. (2006). Creativity support tools: Report from a U.S. National Science 
Foundation sponsored workshop. International Journal of Human-Computer 
Interaction, 20(2), 61-77.  
111. Shneiderman, B., & Plaisant, C. (2004). Designing the User Interface: 
Strategies for Effective Human-Computer Interaction (4th ed.). Boston: 
Pearson/Addison-Wesley. 
112. Shneiderman, B., & Plaisant, C. (2006). Strategies for evaluating information 
visualization tools: Multi-dimensional in-depth long-term case studies, Beyond 
Time and Errors: Novel Evaluation Methods for Information Visualization 
 
 
271 
 
(BELIV '06): A Workshop of the AVI 2006 International Working Conference. 
Venezia, Italy. 
113. Simon, H. A. (1979). Models of Thought. New Haven, CT: Yale University 
Press. 
114. Soergel, D. (1974). Construction and Maintenance of Indexing Languages 
and Thesauri. New York: Wiley. 
115. Soergel, D. (1999). The rise of ontologies or the reinvention of classification. 
Journal of the American Society for Information Science and Technology, 50(12), 
1119-1120.  
116. Spink, A., Bateman, J., & Jansen, B. J. (1999). Searching the Web: A survey 
of EXCITE users. Internet Research: Electronic Networking Applications and 
Policy, 9(2), 117-128.  
117. Spink, A., & Jansen, B. J. (2004). Web Search: Public Searching of the Web. 
New York: Kluwer. 
118. Spink, A., Wilson, T. D., Ford, N., Foster, A., & Ellis, D. (2002). Information 
seeking and mediated searching study. Part 3. Successive searching. Journal of 
the American Society for Information Science and Technology, 53(9), 716-727.  
119. Spink, A., Wolfram, D., Jansen, B. J., & Saracevic, T. (2001). Searching the 
web: The public and their queries. Journal of the American Society for 
Information Science, 52(3), 226-234.  
120. Swan, R., & Allen, J. (1998). Aspect Windows, 3-D visualizations, and 
indirect comparisons of information retrieval systems. In Proceedings of the 21st 
Annual International ACM SIGIR Conference on Research and Development in 
Information Retrieval (pp. 173-181). New York: ACM Press.  
121. Tanin, E., Plaisant, C., & Shneiderman, B. (2000). Browsing large online data 
with Query Previews. In Proceedings of the Symposium on New Paradigms in 
Information Visualization and Manipulation (NPIVM) 2000, Washington, DC: 
ACM Press. Retrieved April 27, 2006, from 
http://citeseer.ist.psu.edu/tanin00browsing.html. 
122. Taylor, A. (1999). The Organization of Information. Englewood, CO: 
Libraries Unlimited, Inc. 
123. Teitelbaum, R. C., & Granda, R. E. (1983). The effects of positional 
constancy on searching menus for information. In Proceedings of the SIGCHI 
Conference on Human Factors in Computing Systems, Boston, MA (pp. 150-153). 
New York: ACM Press.  
 
 
272 
 
124. Tullis, T. (1988). Screen design. In M. Helander (Ed.), Handbook of Human-
Computer Interaction (pp. 377-411). Amsterdam, The Netherlands: Elsevier 
Science Publishers.  
125. Turetken, O., & Sharda, R. (2005). Clustering-based visual interfaces for 
presentation of web search results: An empirical investigation. Information 
Systems Frontiers, 7(3), 273-297.  
126. Vakkari, P. (2000). eCognition and changes of search terms and tactics during 
task performance: A longitudinal case study. In Proceedings of the RIAO 2000 
Conference. Retrieved April 27, 2006, from 
http://www.info.uta.fi/vakkari/Vakkari_Tactics_RIAO2000.pdf. 
127. Vakkari, P. (2001). A theory of the task-based information retrieval process: 
A summary and generalisation of a longitudinal study. Journal of Documentation, 
57(1), 44-60.  
128. Vickery, B. C. (1960). Faceted Classification: A Guide to Construction and 
Use of Special Schemes. London: Aslib. 
129. Wang, P., Hawk, W. B., & Tenopir, C. (2000). Users? interaction with World 
Wide Web resources: An exploratory study using a holistic approach. Information 
Processing & Management, 36(2), 229-251.  
130. Watters, C., & Amoudi, G. (2003). GeoSearcher: Location-based ranking of 
search engine results. Journal of the American Society for Information Science 
and Technology, 54(2), 140-151.  
131. Wen, J. (2003). Post-valued recall web pages: User disorientation hits the big 
time. IT & Society, 1(3), 184-194. Retrieved April 27, 2006, from 
http://www.stanford.edu/group/siqss/itandsociety/v01i03/v01i03a10.pdf. 
132. White, R., Muresan, G., & Marchionini, G. (2006). Evaluating Exploratory 
Search Systems - SIGIR 2006 Workshop Call for Papers. Retrieved April 24, 
2006, from http://www.umiacs.umd.edu/~ryen/eess. 
133. White, R. W., Kules, B., Drucker, S. M., & schraefel, m. c. (2006). Supporting 
exploratory search. Communications of the ACM, 49(4), 36-39.  
134. Wildemuth, B. M. (2004). The effects of domain knowledge on search tactic 
formulation. Journal of the American Society for Information Science and 
Technology, 55(3), 246-258.  
135. Yee, K.-P., Swearingen, K., Li, K., & Hearst, M. (2003). Faceted metadata for 
image search and browsing. In Proceedings of the SIGCHI Conference on Human 
factors in Computing Systems, Ft. Lauderdale, FL (pp. 401-408). New York: 
ACM Press.  
 
 
273 
 
136. Zamir, O., & Etzioni, O. (1998). Web document clustering: a feasibility 
demonstration. In Proceedings of the 21st Annual International ACM SIGIR 
Conference on Research and Development in Information Retrieval, Melbourne, 
Australia (pp. 46-54). New York: ACM Press.  
137. Zamir, O., & Etzioni, O. (1999). Grouper: a dynamic clustering interface to 
Web search results. Computer Networks, 31, 1361-1374.  
138. Zaphiris, P., & Mtei, L. (1997). Depth v. Breadth in the Arrangement of Web 
Links. Retrieved April 27, 2006, from http://otal.umd.edu/SHORE/bs04. 
139. Zaphiris, P., Shneiderman, B., & Norman, K. (2002). Expandable indexes 
versus sequential menus for searching hierarchies on the World Wide Web. 
Behaviour & Information Technology, 21(3), 201-207.  
140. Zeng, H.-J., He, Q.-C., Chen, Z., Ma, W.-Y., & Ma, J. (2004). Learning to 
cluster web search results. In Proceedings of the 27th Annual International 
Conference on Research and Dvelopment in Information Retrieval, Sheffield, 
United Kingdom (pp. 210-217). New York: ACM Press.