Friday, April 19, 2019

Cement trucks, Undifferentiated Services and Collections, Google, and Library Discovery Systems


Every morning when my daughter Rachel and I head to South Coast Plaza for our morning hour of walking, we see at least one cement truck. Whether we take the freeway or city streets, we consistently see at least one. Considering that greater Los Angeles seems to be the concrete capital of the U.S., this isn’t particularly surprising. Nonetheless, it has become a bit of a running joke for us.

If you’re familiar with cement trucks, you know they are pretty big. They’re built to carry a massive load and designed to dump it quickly, though not in a particularly precise fashion. That said, they are pretty efficient if you just need an undifferentiated 8 cubic yards, or 40,000 pounds, of concrete.

I think that in the past libraries have approached information provision much like the cement truck: we gave users everything in a huge undifferentiated pile, without even understanding what they might want or need. Because we’re not the only information game in town, it is important to customize our approach to help users appreciate and navigate our services. Here are a few places where I think we could start.

·         Print collections: In most libraries, our print collections occupy large swaths of floor space, without any signage other than a call number, leaving users with no idea of the wealth of materials at their fingertips. For them, it is as undifferentiated as the pile of cement. Most of today’s users take their cues from retail, be it the local bookstore, grocery store, or large clothing store. All of these are replete with signage and other visual cues that help users navigate the physical space.  Perhaps we should take a cue from the work that is happening at the Arizona State University Libraries where they are experimenting with how they can use their collections as a means of intellectual and strategic engagement. Their white paper “The Future of the Academic Library Print Collection: A Space for Engagement” is quite insightful.

·         Reference services: Most academic libraries still offer some sort of reference service, be it in person or online. Whether the service is branded as “research help,” “reference,” or “research consultations,” the average user is still likely to see it as an undifferentiated service that they do not know what to do with. Perhaps some examples of what “reference” or “research help” means would be helpful. These might include things like “help finding peer-reviewed articles for your assignment,” “how to know if sources you find with Google are credible,” or “help finding primary sources.” A poster that I once saw at Dickinson College in PA gives a good example.

·         Database A-Z list: Almost every academic library has an A-Z list of databases that usually runs to more than 200 entries. While vendors have made some progress in naming databases, for many users the names are still pretty opaque. Some names are misleading, like “Web of Science,” which implies that it contains only science content even though it includes vast amounts of social science, arts, and humanities content. Others, like “Scopus” or “arXiv,” give the uninitiated no clue as to their content or use. Libraries should be providing searchable database descriptions and subject groupings for their databases. Fortunately, this is beginning to happen in at least some libraries.

·         Microfilm, Government Documents, Maps: In large research libraries, microfilm, government documents, and maps fall into the undifferentiated category that most users ignore. Microfilm is the most problematic, not only because it is a difficult-to-use format, but also because library microform collections typically have no coherent collection structure. The cabinets will hold a mix of journals, monographs, reports, and dissertations. It isn’t easy to present this mass in any way that would prompt users to use its resources. Likewise, maps and government documents are not arranged in ways that invite use. Government documents are often as opaque as the government itself can be, and libraries have increasingly eliminated government documents reference services, leaving large collections of documents as an unnavigable block. Perhaps better signage, or better ways to display the content visually, might help users find these intellectually rich treasures.


Google and our library discovery systems also resemble the cement truck and its imprecise dump of material. A search on Google for “global warming” yields 112 million results, and in a discovery system like the Cal State University Libraries’ Ex Libris Primo, about 1.1 million hits. It is little wonder that users are frustrated and do not go beyond the first page or two of results. What are users expected to do with that much data, especially when some of it does not seem to relate to the search query at all?

Google attempts to add some structured resources that might help users parse the search results, such as showcasing articles from Wikipedia, spotlighting some images, and creating a “Top Stories” section. While these can be helpful, users are still faced with 112 million hits and no real way to sort or filter them usefully.

Library discovery systems are somewhat better as they offer fielded searching and a set of filters that allow users to narrow down the results. Even so, the results are still confusing in that the user often cannot understand, based on their search query, why these particular results are on the screen.

As a librarian, I consider myself a fairly well-informed and skilled searcher, and I still find both of these tools incredibly frustrating and limiting at times. Here are some areas that I think need attention from Google and from the vendors of library discovery systems.
  • Better algorithms: Google and most library discovery systems have moved away from strictly relying on metadata to using the full-text corpus as the basis of search. While the addition of the full-text does provide the ability to discover needed information buried in the middle of an article or book, the sheer mass of text often works against “search” and results in a slew of false positives. Google and library discovery system vendors need to improve their algorithms as well as incorporate some of the machine learning research that is being done, in order to improve search results.
  • Better metadata for precision searching: While traditional metadata (author, title, journal, publisher, etc.) is no longer the total basis of online searching, it is critical when a user is interested in precision searching, especially in cases when they are searching for a known item. Library discovery systems make reasonably good use of this metadata in their fielded searching capabilities, but Google appears not to think this is important. Vendors should demand better metadata from information providers.
  • Ability to specify when punctuation should be included or excluded: Google and most library discovery systems throw out all punctuation when building their indexes. This works fine in many cases, but sometimes the presence or absence of punctuation changes the meaning of what is being searched. For example, I was recently searching in Google Scholar for articles on the concept of “identity space,” and about half of the results on every page contained the two words next to each other but separated by a comma or a period, and therefore not the concept of “identity space” I was looking for. There was no way to filter the 8,300 results to get rid of those with the punctuation. Because the punctuation was discarded, Google treated “…identity. Space” as equivalent to “identity space.” This leaves the student or researcher to labor through pages of results to find what a search engine or discovery layer should have been able to present.
  • Ability to search only specific fields: While most library discovery systems provide some level of fielded searching (e.g., title, author, journal, publisher), Google and Google Scholar offer little of it. Students and researchers are often searching for a known item, and it is very frustrating when they enter a search query in quotation marks and it either fails or retrieves so much “garbage” that they try over and over, hoping to get what they want or need.
  • Better image searching: This is, in my opinion, one of Google’s biggest failings. Searching for an image of a known person may or may not get you a picture of that person, but you will certainly get images from documents that merely mention the person, or someone else’s Instagram picture on which they commented. Google needs to do a better job so that you don’t get millions of pictures, most of which are totally irrelevant to your search. For example, if you search for my name, “Clem Guthro,” in quotes, you get thousands of hits, but I am in only 21 of the pictures, and many of the others show no relation to me or my work. It has been most frustrating when I have been looking for pictures of a particular university library and ended up with a slew of pictures of a library at a different university, sometimes not even on the same continent. Surely, in today’s world of AI, machine learning, and advanced algorithms, Google can do better.
  • Linked Open Data: Linked Open Data is de rigueur for much of the web, and we can see it in action in Google searches that pop up a Wikipedia blurb or a marquee of movie covers on the same topic. There has been some talk of including Linked Open Data in library catalogs, and some elements of it appear in the RDA standards. To date, however, library discovery systems are not making much use of Linked Open Data, and publishers are not making enough of their content available in this form. Wikipedia has gained such prominence, despite the groaning and protestations of some academics, precisely because its content is available as Linked Open Data, through DBpedia, for Google and others to use. Might it not be transformative if every encyclopedia and dictionary that libraries license were available as Linked Open Data? Every time a user searched our discovery layer, in-depth academic content from these resources could appear immediately alongside the results.
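To make the punctuation problem a little more concrete, here is a minimal sketch in Python of how an index could remember where punctuation fell between words, so that a phrase search could optionally refuse to match across a comma or period. This is not how Google, Google Scholar, or any discovery vendor actually works; the functions `tokenize` and `phrase_match` and the `respect_punctuation` option are hypothetical names invented purely for illustration.

```python
import re

# Hypothetical tokenizer pattern: word tokens, or single punctuation marks
# that we want to remember rather than discard.
TOKEN_RE = re.compile(r"\w+|[.,;:!?]")


def tokenize(text):
    """Return (word, preceded_by_punct) pairs, lowercased.

    preceded_by_punct is True when a punctuation mark separated this word
    from the previous word.
    """
    tokens = []
    saw_punct = False
    for match in TOKEN_RE.finditer(text.lower()):
        tok = match.group()
        if tok in ".,;:!?":
            saw_punct = True  # remember the boundary instead of throwing it away
        else:
            tokens.append((tok, saw_punct))
            saw_punct = False
    return tokens


def phrase_match(text, phrase, respect_punctuation=True):
    """True if the words of `phrase` appear consecutively in `text`.

    With respect_punctuation=False this mimics an index that discards
    punctuation, so "identity. Space" counts as a match for "identity space".
    """
    words = [w for w, _ in tokenize(phrase)]
    tokens = tokenize(text)
    n = len(words)
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        if [w for w, _ in window] != words:
            continue
        # Punctuation before the first word is irrelevant; only check the
        # boundaries between words inside the phrase.
        if respect_punctuation and any(p for _, p in window[1:]):
            continue
        return True
    return False


results = [
    "Theories of identity space in online communities",
    "Questions of personal identity. Space, however, is another matter",
]
print([phrase_match(r, "identity space") for r in results])  # [True, False]
print([phrase_match(r, "identity space", respect_punctuation=False) for r in results])  # [True, True]
```

The design choice is simply to keep one extra bit of information per word at indexing time; a search interface could then expose it as a user option rather than forcing every phrase search to ignore punctuation.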

We need to move beyond the cement truck approach and become more agile and customizable. It is what users need and want, and what they experience on most of the web and in the apps they use.
