Outsourcing Cataloging at the University of Maryland, College Park: Problems and Opportunities

by Benjamin Bradley (Discovery Librarian, University of Maryland Libraries) and Beth Guay (Continuing Resources Librarian, University of Maryland Libraries)

Against the Grain / December 2019 - January 2020

Background

For the University of Maryland Libraries (the Libraries), a major outsourcing initiative began in late 2011 following an earlier implementation of WorldCat Local as a discovery tool. The Libraries transitioned from MARC record set loads of e-resources collections into the local catalog to the creation and activation of e-resources collections in the WorldCat knowledge base (WCKB) using WorldShare Collection Manager. WCKB collections automate holdings maintenance on WorldCat catalog records and provide direct linking from those catalog records to the Libraries' e-resources in the local discovery layer.

Cataloging of individual ebooks purchased on approval plans or firm orders transitioned from review and enhancement of vendor MARC records loaded to the local catalog to title activations in appropriate knowledge base collections. Catalogers of monographic e-resources not sourced from vendor records but received on standing orders or subscriptions, licensed in perpetuity and not available in WCKB collections, discontinued exporting catalog records to the local catalog, and instead began creating and updating local WCKB collections for access provision. With the cessation of use of SFX by staff in the then Technical Services Division of the Libraries in December 2015 and the completion of the data migration to WorldCat Discovery (WCD) and the WorldCat link resolver in July 2015, electronic resources cataloging transitioned from a traditional to a highly automated and outsourced environment. [1] From yet another perspective, with 77% of the Libraries' FY2013 expenditures among print and electronic books and journals going to electronic resources, WorldCat Discovery had effectively replaced the local catalog. [2, 3]

In June 2013, seven Metadata Services Department staff members' duties included e-resources cataloging, among other assignments. Following staff retirements, departures, reassignments, and a March 2017 reorganization, the number of catalogers working on electronic resources, among other assignments, had fallen from seven to three. [4] Thus, cataloging of some monographic e-resources received on standing orders or subscriptions, licensed in perpetuity, and not available in WCKB collections was suspended.

The reorganization resulted in the formation of four new units within Collection Services (formerly the Technical Services Division): Acquisitions and Data Services, Continuing Resources and Database Management (CRDM), Discovery and Metadata Services, and Original and Special Collections Cataloging. This change brought relief in the form of a new Discovery Librarian position. The Continuing Resources Librarian, one of the three remaining e-resources catalogers, and the Discovery Librarian began collaborating to distill the benefits of the outsourcing experience.

Outsourcing Problems and Opportunities

Problem: Transportation research record

SFX had been customized to push local catalog record URLs to the link resolver services menu for catalog records that either lacked an ISBN or lacked, in the first ISBN field, an ISBN matching those in the SFX knowledge base. This functionality was lost in the migration to the WorldCat link resolver. In these cases, from the local catalog, the WorldCat link resolver returns a canned response, "We were unable to find direct full text links for this item." A good number of e-resources local catalog records produce this response, for example, many among the 124,885 "legacy" e-resources records that had been loaded in MARC record sets.

As intended when the Libraries first implemented WorldCat Local, URLs from local catalog records are accessible in WCD. OCLC created a Groovy Script for making this service possible. [5] With this fact in mind, the Discovery and Continuing Resources librarians had discussed the issue of duplicate working links displaying on WorldCat catalog records from local catalog records and from the WCKB, concluding that the inconvenience to the patron would be negligible.

In situations in which domain and/or path names for URLs within e-resource records that had been loaded into the local catalog changed, local catalog URLs would present an inconvenience to WCD users. One such problem did surface, through staff investigation, on two separate occasions, for titles in the series Transportation research record, or "TRR." [6] In the first instance in 2018, a STEM librarian was considering withdrawing print versions of resources for which the Libraries had purchased perpetual access to corresponding e-versions. The librarian consulted with CRDM's Library Services Supervisor, who determined the extent of the Libraries' perpetual access rights; the Library Services Supervisor then consulted with the Continuing Resources Librarian, because e-version records for titles in the series were in the local catalog. The staff then discovered the URL changes. An additional factor for consideration was that the cataloging of these e-version resources had been suspended due to the previously discussed staffing issues. The Continuing Resources Librarian and Library Services Supervisor consulted with the Discovery Librarian, who created a local WCKB collection derived from the OCLC numbers of the local catalog records for the resources, and globally updated the collection URLs. In addition, the Discovery Librarian identified local catalog records for URL field deletion by the Consortial Library Applications Support (CLAS) unit (staff responsible for the ILS database) to prevent the incorrect links from appearing in WCD.

To create the local collection, the Discovery Librarian worked with CLAS to receive the MARC records for the series and then used MarcEdit to transform the MARC records into a KBART file. The MARC 2 Kbart Converter MarcEdit plugin pulls the necessary data from MARC records (such as the URL, title, and standard number) and creates a tab-separated values file using that data. The Continuing Resources Librarian alerted the Discovery Librarian to the previous two cataloging policies for e-resources: the earlier "single record approach," under which e-resources holdings were added to print version records, followed by a "separate record approach" to print and e-version resources. WCKB records need an OCLC number to place holdings on the corresponding record and for linking. Because of the two different cataloging policies, the Discovery Librarian needed to find which OCLC numbers in the local MARC records referred to a print master record and which referred to an electronic master record. [7] Using the OCLC numbers from the local records, the Discovery Librarian ran a Z39.50 batch search of the WorldCat database using MarcEdit. Once MarcEdit returned the set of master records, the Discovery Librarian exported the 776 (additional physical form entry) fields to check what other formats were listed. If the 776 subfield $i referenced a print version, the librarian could deduce that the record was an electronic version and that the OCLC number was correct. If an electronic version was referenced, the Discovery Librarian could replace the OCLC number from the local record with the OCLC number of that electronic version. Finally, the URLs needed to be updated. This was a minor change, as only the domains needed to be changed, which could be accomplished using find and replace, correcting all the URLs at once. Then the 881 records in the KBART file could be uploaded as a local collection to WorldShare Collection Manager, adding the Libraries' holdings to the correct records and linking directly to the resources.

The second incident occurred about a year later, when the Acquisitions and Data Services Graduate Assistant reviewing the TRR license found that links to the resources via WorldCat Discovery were again failing. She reported the problem via the Libraries' help desk ticketing system, and within minutes, the Library Services Supervisor contacted the Continuing Resources Librarian. [8] Once again, they consulted with the Discovery Librarian, who quickly resolved the URL access problem. In the previous incident, the base URLs needed a small update, but in this case, the series changed platforms, so the changes to the URLs were more significant. However, the Discovery Librarian found that the URLs for each issue followed a similar pattern: each title's URL consisted of the same domain and path followed by a pattern using the title's issue number. Using find and replace in a text editor, the Discovery Librarian used regular expressions to automate the URL corrections. This time, the Continuing Resources Librarian suggested contributing the Libraries' local WCKB collection to OCLC's global collections, since the staffing issue remained and the cataloging of this particular collection had been suspended.
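The pattern-based URL correction described above for the second TRR incident can be sketched in Python. This is only an illustration: the article does not give the old or new TRR URLs, so the two URL shapes, the example.org hosts, and the function names below are hypothetical, and the Discovery Librarian worked in a text editor rather than in a script.

```python
import re

# Hypothetical URL shapes: the article does not state the real old or new
# TRR URLs, only that each one was a fixed domain and path plus an issue number.
OLD_URL = re.compile(
    r"^https?://old-platform\.example\.org/trr/issue/(?P<issue>\d+)/?$"
)
NEW_URL_TEMPLATE = "https://new-platform.example.org/toc/trr/{issue}"


def rewrite_url(url: str) -> str:
    """Return the new-platform URL, or the input unchanged if it does not match."""
    match = OLD_URL.match(url.strip())
    if match is None:
        # Leave unexpected URLs alone so they can be reviewed by hand.
        return url
    return NEW_URL_TEMPLATE.format(issue=match.group("issue"))


def rewrite_kbart_line(line: str, url_column: int) -> str:
    """Rewrite the URL cell of one tab-separated row, leaving other cells as-is."""
    cells = line.rstrip("\n").split("\t")
    cells[url_column] = rewrite_url(cells[url_column])
    return "\t".join(cells)
```

Capturing the issue number in a named group keeps the substitution explicit, and returning non-matching URLs unchanged means a row with an unexpected shape is never silently altered.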
If contributed as a global collection, there would be potential for other catalogers, outside of the Libraries, to add to it. [9] This time, the Continuing Resources Librarian took steps to remove local catalog records for the e-resources, in consideration of several factors, including differing past e-resources cataloging practices, new practices, and the fact that e-resources in this series were no longer being cataloged. One effect of this action is that call number searching for these e-resources, which now lack local catalog records, is no longer an option in WCD.

Problem: Discoverability of PMLA in WorldCat Discovery

While WorldCat Discovery and the associated systems offer efficiencies, they can come at the cost of local control over the metadata presented to users and the user experience in general. One such example was brought to the Discovery Librarian's attention by a Humanities Librarian. The librarian was having trouble finding the Libraries' print holdings for a journal. The Discovery Librarian looked into the problem and found that when the user tried finding the title, PMLA, the Publications of the Modern Language Association, an important title for our English literature students, the appropriate records with the Libraries' holdings attached did not display on the first page of results; they were often buried and ended up around the fifth or sixth page of results. [10] This search surfaces article and issue records from Crossref, all titled "PMLA." The results do not provide enough metadata for patrons to understand what the articles are about (see Figure 1). These records do link to the journal's homepage and display our coverage. Nevertheless, the Libraries' print holdings would require expert searching to find. This is an example of the kind of problem solving that libraries with outsourced catalogs face: library personnel need to work with multiple third parties to troubleshoot problems.

Figure 1

When this issue first arose, the Discovery Librarian worked to find the source of the problem: was this a problem with the system provided by OCLC (WCD), or a problem with the metadata provided by Crossref? He began by exploring the records in WorldCat Discovery. They were created from the Crossref data and had a minimal amount of metadata, yet included DOIs. The DOIs in the article records redirected to a defunct DOI page on Crossref, but the DOIs in the issue records successfully directed to the correct issues of the journal. He then used Crossref's Metadata Search tool to find out more about these records. The tool provides access to the Crossref metadata in a JSON format, wherein he found that many of the records had "Test accounts" listed as the publisher (see Figure 2). [11]

Figure 2

Because the records appeared to be test records, the Discovery Librarian contacted Crossref to ask if there was anything that could be done about them. The contact at Crossref explained that these records were a holdover from an older method for managing defunct DOIs, and that there were no current plans to fix those DOIs. The Discovery Librarian posted on the WorldCat Discovery Community Center, a forum hosted by OCLC for librarians using WorldCat Discovery, asking if other librarians had encountered this problem. A few librarians responded about similar situations and shared an enhancement request. Since then, OCLC has shared that they hope to make changes to their algorithm that would help fix these problems. On the other hand, Crossref could clean up the metadata for these defunct DOIs.

Opportunity: Automated KBART Feeds [12]

One opportunity libraries can leverage with the automation afforded by the connection of WCD and WorldShare Collection Manager is automated KBART feed services.
This service enables publishers to send a library's entitlements to OCLC to automatically activate its holdings in the knowledge base. While the process is automated, initiating the service is not a matter of merely flipping on a switch; it requires manual intervention.

In our case, the KBART feed activates titles in two collections, one for serials titles and one for monograph titles, which cannot have titles already activated in them when the automated feed service begins. The Discovery Librarian worked with the Head, CRDM, to deselect titles in these collections. To prevent loss of access, they worked together to activate these titles elsewhere: the Discovery Librarian created a collection for the ebooks, while the Head, CRDM, worked with her staff to ensure that the serials were activated in other collections.

After they were sure that the ebooks and serials were activated in other collections, the Discovery Librarian deselected the ebook and journals collections, and the Head, CRDM, retrieved the Libraries' credentials from the provider, Springer Nature, and sent the command, via email, to OCLC to start the automated feeds. Subsequently, the ebook collection received 44,489 records from Springer Nature, 29 of which were listed as invalid, meaning 29 ebook entitlements were not activated. Upon review of the report for the load, it was found that these titles did not yet exist in the knowledge base collection. The Discovery Librarian reached out to Springer Nature, and over time, the missing titles were added.

Overall, automated feeds demonstrate the potential of the Collection Manager and the KBART format.
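The review described above, finding entitlements in a feed that have no matching title in the knowledge base collection, amounts to a comparison of identifiers. The following is a minimal sketch under stated assumptions: the article does not describe how OCLC flags a record as invalid or what the load report contains, so the function name and the idea of comparing on the KBART `title_id` column are illustrative only.

```python
from __future__ import annotations

import csv
import io


def find_unmatched_entitlements(
    feed_tsv: str, collection_title_ids: set[str]
) -> list[dict[str, str]]:
    """Return feed rows whose title_id is absent from the collection.

    feed_tsv is the text of a tab-separated KBART file with a header row;
    collection_title_ids holds the identifiers already present in the
    knowledge base collection (how these are obtained is an assumption).
    """
    reader = csv.DictReader(
        io.StringIO(feed_tsv), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return [row for row in reader if row["title_id"] not in collection_title_ids]


if __name__ == "__main__":
    # Made-up sample data, not taken from the Springer Nature load.
    feed = (
        "publication_title\ttitle_id\n"
        "Book A\ta\n"
        "Book B\tb\n"
        "Book C\tc\n"
    )
    for row in find_unmatched_entitlements(feed, {"a", "c"}):
        print(row["title_id"], row["publication_title"])
```

A list of unmatched rows like this is the kind of evidence a librarian could send to the provider when asking for missing titles to be added to the collection.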
Having vendors send KBART data for a library's entitlements in order to manage e-resources holdings updates is the goal of the KBART Automation recommended practice, which vendors and system providers have gradually adopted. While there is much promise in the automation, the process is not perfect, requiring manual interventions and checks to ensure quality.

Opportunity: ProQuest Dissertations and Theses (PQDT) Global KB project

By leveraging WorldCat data using WCD, libraries are able to provide enhanced discovery and access to materials not previously available in their traditional catalogs. However, the scope of WorldCat is larger than the WCKB. Materials the library has access to may be discoverable in WorldCat because of library-contributed records, but without a corresponding WCKB collection, a library cannot provide access to those materials. At the University of Maryland, this situation has resulted in ILL requests for such materials. In particular, the ILL department found that ProQuest Dissertations & Theses titles made up 16% of ILL requests received and cancelled because the title was available on the PQDT platform, the largest share of any single platform in their study. [13] So while the discovery layer enables the Libraries' users to find records for resources to which they are entitled, it does not offer access to those resources because of the lack of a knowledge base collection. Furthermore, ILL staff are inconvenienced by having to review requests and redirect patrons to the PQDT platform. The Discovery Librarian undertook the work of creating a collection for PQDT titles to provide both discovery and access in WCD. PQDT is a large collection; as of this writing, ProQuest states that the database contains 5 million items, and from some searching in WorldCat, there are an estimated 1 million titles cataloged there.
In order to work on such a large collection, the Discovery Librarian developed a Python script to use the WorldCat Search API to find records for ETDs and to write the data to a file. The script is run from the command line and searches only a single year at a time because the API limits access to the first 100,000 search results, meaning one cannot pull all the records at the same time. Initially the script wrote the results to a file and required manual data cleanup and transformation of the MARC21XML into KBART using MarcEdit. The Discovery Librarian has since refined the script to automate the data cleanup and conversion from MARC21XML to KBART. While searching by year has generally operated within the 100,000 search result limit, some years return well over 100,000 results, so logic was added to initiate additional searches, running through the alphabet to search for items based on the first letter of the title and author. The script starts with the title search, but if too many results are returned, it runs the author search as well. In practice this has not been found to be a wholly effective method, but it enables the searches to run.

Once the script has the MARC data, it saves the elements needed for the KBART file, including title, author, and publication data. When the script reads the 856 (URL) fields, it uses regular expressions to find particular URLs and extract certain elements. Many ProQuest URLs are structured the following way: search.proquest.com/(a unique identifier). Because the URLs are coming from OCLC records, they often have proxies prepended, so the script uses the regular expression "docview/(\d+)" to find the unique identifier and rewrite the URL altogether, removing proxies or other bad data. The script performs similar cleanup for URLs containing "gateway.proquest.com" or "wwwlib.umi.com." If a matching URL is not found during this process, the script records the value "not found" for that field. After the URLs are revised, the script writes the data to the text file and then moves on to the next set of search results. The output still requires manual checking, with the assistance of a recently hired Coordinator in Discovery and Metadata Services. ILL personnel now send the Discovery Librarian a monthly report of theses and dissertations that patrons have requested and that ILL personnel have canceled and referred; these titles can then be manually added to the collection. After about a year of work, the knowledge base contains 275,000 titles.

Conclusion

The authors have illustrated that within libraries, interdependence across unit and division lines is indisputable in highly automated environments. In the case of Transportation research record, within Collection Services, staff in three of the four units played roles in identifying problems and contributing to their resolutions. In this effort, good communication skills have been shown to be essential. We have also demonstrated that hand in hand with highly technical skills, institutional memory plays an important role in the process of electronic resources management in libraries.

Working across division lines, e.g., the Collection Services Discovery Librarian's work with the ILL department, has also been shown to be highly valuable. Beyond collegiality, creativity as shown by the Discovery Librarian's approach to assisting with the ILL department's PQDT problem is a useful and effective complement to technological "knowhow."

The methods and tools for creating the PQDT knowledge base collection have supported the creation of many others. In addition to subscription databases, the Discovery Librarian has developed and contributed a number of open access collections (which also include collections whose titles are in the public domain) to the WorldCat knowledge base. These open access collections include the University of Nebraska-Lincoln Zea books, Indiana Authors and Their Books from Indiana University Libraries, the Illinois Open Publishing Network, University of Nebraska-Lincoln Open Access Journals, ACRL Open Access titles, and more. The Discovery Librarian is looking into additional avenues for sharing these KBART files, such as sharing them on GitHub in addition to sharing them via the WorldShare Collection Manager.

While the automated KBART feed is an incomplete story, this case demonstrated how the use of a discovery product aggregating data from multiple vendors can be complicated for librarians to untangle. As with the case of PMLA, librarians troubleshooting vendor data need to understand where the data comes from and work with stakeholders, including other librarians and vendors, to understand and attempt to resolve problems in library systems. It is also important to understand that when a solution is out of librarians' hands and dependent on factors such as providers' development timelines, providers need to communicate about those factors, such as their development schedules, to librarians, who must responsibly keep their customers apprised of these situations beyond their control. We feel for our public services librarians, such as our Humanities Librarian, who will undoubtedly encounter the same recurring and new problems when providing services to library patrons seeking discovery and access to the Libraries' resources.

Endnotes

1. The UM Libraries are members of a seventeen-member library consortium. The Libraries share the ExLibris' ALEPH ILS with the other consortium members, but have opted out of ExLibris' SFX services. See "USMAI," viewed Aug. 29, 2019, http://www.usmai.org/ and "USMAI (University System of Maryland & Affiliated Institutions); and: Summary of Available and Shared E-Resources, Platforms, and Services, Version 1.0," updated 4/25/2019, viewed August 29, 2019, http://www.usmai.org/sites/public/files/USMAI_Summary_Available_and_Shared_E-Resources_Platforms_Services.pdf.
2. University of Maryland Libraries. Annual report, 2013.
3. WorldCat is OCLC's catalog record database.
4. This number excludes copy catalogers activating individual title access in WCKB collections and making minor adjustments, i.e., adding or revising the order of the ISBN fields in the local catalog's vendor records.
5. See "Apache Groovy," Wikipedia, https://en.wikipedia.org/wiki/Apache_Groovy, viewed July 29, 2019.
6. See TRR at the ISSN Portal, Print ISSN, 0361-1981 (https://portal.issn.org/resource/ISSN/0361-1981); e-ISSN, 2169-4052 (https://portal.issn.org/resource/ISSN/2169-4052).
7. The phrase "master record" here and throughout the paper refers to records from OCLC's WorldCat database.
8. For information on the Libraries' trouble ticketing system, see Rebecca Kemp Goldfinger and Mark Hemhauser, "Looking for Trouble (Tickets): A Content Analysis of University of Maryland, College Park E-Resource Access Problem Reports," Serials Review 42, no. 2 (2016): 84-87.
9. See "Knowledge base collections," OCLC, https://help.oclc.org/Metadata_Services/WorldShare_Collection_Manager/Choose_your_Collection_Manager_workflow/Knowledge_base_collections, viewed July 30, 2019.
10. See PMLA at the ISSN Portal, Print ISSN, 0030-8129 (https://portal.issn.org/resource/ISSN/0030-8129); e-ISSN, 1938-1530 (https://portal.issn.org/resource/ISSN/1938-1530).
11. JSON example: https://api.crossref.org/v1/works/10.1632/pmla.2003.118.6.1434d.
12. See KBART Automation Working Group, KBART Automation: Automated Retrieval of Customer Electronic Holdings: NISO RP-26-2019 (Baltimore, MD: National Information Standards Organization, 2019), https://groups.niso.org/apps/group_public/download.php/21896/NISO_RP-26-2019_KBART_Automation.pdf.
13. Hilary Thompson, "Find It Fail: What ILL can tell us about Challenges related to Known Item Discovery," presentation at the UM Libraries' Library Research & Innovative Practice Forum, June 4, 2015, viewed Aug. 29, 2019, https://drum.lib.umd.edu/handle/1903/16385.