Every morning when my daughter Rachel and I head to South Coast Plaza for our hour of walking, we see at least one cement truck. Whether we take the freeway or city streets, we consistently see at least one. Considering that greater Los Angeles seems to be the concrete capital of the U.S., this isn’t particularly surprising. Nonetheless, it has become a bit of a running joke for us.
If you’re familiar with cement trucks, you know they are pretty big. They’re built to carry a massive load and to dump it quickly, though not in a particularly precise fashion. That said, they are pretty efficient if you just need an undifferentiated 8 cubic yards or 40,000 pounds of concrete.
I think that in the past, libraries have approached information provision in much the same manner as the cement truck: we gave users everything in a huge, undifferentiated pile, without even understanding what they might want or need. Since we’re not the only information game in town, it is important to customize our approach to help users appreciate and navigate our services. Here are a few places where I think we could start.
· Print collections: In most libraries, our print collections occupy large swaths of floor space with no signage beyond call numbers, leaving users with no idea of the wealth of materials at their fingertips. For them, the collection is as undifferentiated as the pile of cement. Most of today’s users take their cues from retail, be it the local bookstore, the grocery store, or a large clothing store. All of these are replete with signage and other visual cues that help shoppers navigate the physical space. Perhaps we should learn from the Arizona State University Libraries, which are experimenting with using their collections as a means of intellectual and strategic engagement. Their white paper “The Future of the Academic Library Print Collection: A Space for Engagement” is quite insightful.
· Reference services: Most academic libraries still offer some sort of reference service, whether in person or online. Whether the service is branded as “reference,” “research help,” or “research consultations,” the average user is still likely to see it as an undifferentiated service and not know what to do with it. Some examples of what “reference” or “research help” means would be helpful. These might include “help finding peer-reviewed articles for your assignment,” “how to tell whether sources you find with Google are credible,” or “help finding primary sources.” A poster I once saw at Dickinson College in Pennsylvania gives a good example.
· Database A-Z list: Almost every academic library has an A-Z list of databases that usually runs to more than 200 entries. While vendors have made some progress in naming databases, for many users the names are still pretty opaque. Some names are misleading: “Web of Science” implies science-only content, even though it contains vast amounts of social science, arts, and humanities content. Others, like “Scopus” or “arXiv,” give the uninitiated no clue as to their content or use. Libraries should provide searchable descriptions and subject groupings for their databases. Fortunately, this is beginning to happen in at least some libraries.
· Microfilm, Government Documents, Maps: In large research libraries, microfilm, government documents, and maps fall into the undifferentiated category that most users ignore. Microfilm is the most problematic, not only because it is a difficult format to use but also because library microform collections typically have no coherent structure. The cabinets hold a mix of journals, monographs, reports, and dissertations, and it isn’t easy to present this mass in any way that would prompt users to explore it. Likewise, maps and government documents are not arranged in ways that invite use. Government documents are often as opaque as the government itself, and libraries have increasingly eliminated government documents reference services, leaving large collections as unnavigable blocks. Better signage or ways to display the content visually might help users find these intellectually rich treasures.
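To make the database A-Z suggestion more concrete, here is a minimal Python sketch of a searchable list with descriptions and subject groupings. It is an illustration under stated assumptions, not any vendor’s or library system’s actual design: the `Database` record, the helper functions, and the short descriptions and subject tags are all hypothetical examples written for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    """One entry in a hypothetical A-Z database list."""
    name: str
    description: str
    subjects: set = field(default_factory=set)


# Illustrative entries: descriptions and subject tags are examples, not vendor text.
DATABASES = [
    Database(
        "Web of Science",
        "Multidisciplinary citation index covering the sciences, social sciences, arts and humanities",
        {"science", "social science", "arts", "humanities"},
    ),
    Database(
        "Scopus",
        "Multidisciplinary abstract and citation database of peer-reviewed literature",
        {"science", "social science", "arts", "humanities"},
    ),
    Database(
        "arXiv",
        "Open-access preprint server for physics, mathematics, computer science and related fields",
        {"physics", "mathematics", "computer science"},
    ),
]


def search_descriptions(databases, query):
    """Return names of databases whose name or description contains every query word."""
    words = query.lower().split()
    hits = []
    for db in databases:
        text = f"{db.name} {db.description}".lower()
        if all(word in text for word in words):
            hits.append(db.name)
    return hits


def by_subject(databases, subject):
    """Return the sorted names of databases tagged with a given subject grouping."""
    return sorted(db.name for db in databases if subject in db.subjects)


if __name__ == "__main__":
    print(search_descriptions(DATABASES, "preprint"))
    print(by_subject(DATABASES, "humanities"))
```

A user who types “preprint” finds arXiv, which its name alone would never suggest, and a “humanities” grouping surfaces Web of Science despite its science-sounding name.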
Google and our library discovery systems also resemble the cement truck and its imprecise dump of data. A Google search for “global warming” yields 112 million results; the same search in a discovery system such as the Cal State University Libraries’ Ex Libris Primo returns about 1.1 million hits. It is little wonder that users are frustrated and do not go beyond the first page or two of results. What are users expected to do with that much data, especially when some of it does not seem to relate to the search query at all?
Google attempts to add some structured resources that might help users parse the search results, such as showcasing articles from Wikipedia, spotlighting some images, and creating a “Top Stories” section. While these can be helpful, users are still faced with 112 million hits and no real way to sort or filter them usefully.
Library discovery systems are somewhat better, since they offer fielded searching and a set of filters that let users narrow down the results. Even so, the results can still be confusing: users often cannot tell, from their own search query, why these particular results are on the screen.
As a librarian, I consider myself a fairly well-informed and skilled searcher, yet I still find both of these tools incredibly frustrating and limiting at times. Here are some areas that I think need attention from Google and from the vendors of library discovery systems.
- Better algorithms: Google and most library discovery systems have moved away from relying strictly on metadata toward using the full text as the basis of search. While full text makes it possible to discover needed information buried in the middle of an article or book, the sheer mass of text often works against precise searching and produces a slew of false positives. Google and library discovery system vendors need to refine their algorithms and incorporate some of the current machine learning research in order to improve search results.
- Better metadata for precision searching: While traditional metadata (author, title, journal, publisher, etc.) is no longer the sole basis of online searching, it is critical for precision searching, especially when a user is looking for a known item. Library discovery systems make reasonably good use of this metadata in their fielded searching, but Google does not appear to consider it important. Vendors should demand better metadata from information providers.
- Ability to specify when punctuation should be included or excluded. Google and most library discovery systems throw out all punctuation in their indexes. This works well enough in many cases, but sometimes the presence or absence of punctuation changes the meaning of what is being searched. For example, I was recently searching Google Scholar for articles on the concept of “identity space,” and on every page about half of the results had the two words together but separated by a comma or a period, which is not the concept of “identity space” I was looking for. There was no way to filter the 8,300 results to remove those with the punctuation. Because the punctuation was discarded, Google presented the results as if “…identity. Space” were equal to “identity space.” This leaves the student or researcher to labor through pages of results to find what a search engine or discovery layer should have been able to present.
- Ability to search only specific fields. While most library discovery systems provide some level of fielded searching (e.g., title, author, journal, publisher), Google and Google Scholar offer little of this. Students and researchers are often searching for a known item, and it is very frustrating when a query entered in quotation marks either fails or retrieves so much “garbage” that the user tries over and over, hoping to get what they want or need.
- Better image searching. In my opinion, this is one of Google’s biggest failings. Searching for an image of a known person may or may not return a picture of that person, but it will certainly return images from documents that merely mention the person, or someone else’s Instagram picture on which the person commented. Google needs to do a better job so that searchers don’t get millions of pictures, most of which are totally irrelevant to their search. For example, if you search my name, “Clem Guthro,” in quotes, you get thousands of hits, but I appear in only 21 of the pictures, and many of the others show no relation to me or my work. It has been most frustrating when I have looked for pictures of a particular university library and ended up with a slew of pictures of a library at a different university, sometimes not even on the same continent. Surely, with today’s AI, machine learning, and advanced algorithms, Google can do better.
- Linked Open Data: Linked Open Data is de rigueur for much of the web, and we can see it in action in Google searches that pop up a Wikipedia blurb or a marquee of movie covers on the same topic. There has been some talk of including Linked Open Data in library catalogs, and some elements of it appear in the RDA standard. To date, however, library discovery systems are not making much use of Linked Open Data, and publishers are not making enough of their content available in this form. Wikipedia has gained such prominence, despite groaning and protests from some academics, precisely because its structured content is available as DBpedia, in Linked Open Data format, for Google and others to use. Might it not be transformative if every encyclopedia and dictionary that libraries license were available as Linked Open Data? Every time a user searched our discovery layer, in-depth academic content from these resources could pop up immediately in the results.
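The “identity space” problem described in the punctuation item can be reproduced in a few lines. The Python sketch below is a toy illustration, not how Google Scholar or any discovery system actually tokenizes text: the three document snippets and the helper names are hypothetical. It runs the same phrase search twice, once with an index that discards punctuation and once with one that keeps each punctuation mark as its own token.

```python
import re

# Hypothetical snippets standing in for full-text search results.
DOCUMENTS = {
    "a": "We define an identity space for each learner.",
    "b": "Questions of identity. Space, however, is treated separately.",
    "c": "Conflicts over identity, space, and belonging.",
}


def tokens_without_punctuation(text):
    """Lowercase word tokens with all punctuation discarded, as many indexes do."""
    return re.findall(r"\w+", text.lower())


def tokens_with_punctuation(text):
    """Lowercase tokens that keep each punctuation mark as a separate token."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def phrase_match(tokens, phrase_tokens):
    """True if phrase_tokens occur as a contiguous run inside tokens."""
    n = len(phrase_tokens)
    return any(tokens[i:i + n] == phrase_tokens for i in range(len(tokens) - n + 1))


def search(phrase, keep_punctuation):
    """Return ids of documents containing the phrase under the chosen tokenizer."""
    tokenize = tokens_with_punctuation if keep_punctuation else tokens_without_punctuation
    phrase_tokens = tokenize(phrase)
    return [doc_id for doc_id, text in DOCUMENTS.items()
            if phrase_match(tokenize(text), phrase_tokens)]


if __name__ == "__main__":
    print(search("identity space", keep_punctuation=False))
    print(search("identity space", keep_punctuation=True))
```

With punctuation discarded, all three snippets match, including “identity. Space” and “identity, space”; keeping punctuation in the token stream leaves only the snippet that actually discusses the concept. Offering users a switch between the two behaviors is one way an engine could support the option the item asks for.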
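The difference between full-text matching and the fielded searching discussed in the metadata and specific-fields items can also be sketched briefly. This Python example is a hypothetical illustration, not any product’s search engine: the `Record` structure, the titles, authors, and journal names are invented, and matching is simple case-insensitive substring containment.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """A hypothetical catalog record: structured metadata plus full text."""
    title: str
    author: str
    journal: str
    full_text: str


RECORDS = [
    Record("Identity Space in Online Learning", "Lee", "Journal A",
           "This article develops the concept of identity space."),
    Record("Campus Architecture Revisited", "Ortiz", "Journal B",
           "Reviews of identity space in design appear in the conclusion."),
    Record("Notes on Library Signage", "Lee", "Journal C",
           "Signage helps users navigate the physical space."),
]


def full_text_search(records, phrase):
    """Match the phrase anywhere in the full text (broad, imprecise)."""
    phrase = phrase.lower()
    return [r.title for r in records if phrase in r.full_text.lower()]


def fielded_search(records, **fields):
    """Match each named field (e.g. title=..., author=...) only within that field.

    An unknown field name raises AttributeError via getattr.
    """
    def matches(record):
        return all(value.lower() in getattr(record, name).lower()
                   for name, value in fields.items())
    return [r.title for r in records if matches(r)]


if __name__ == "__main__":
    print(full_text_search(RECORDS, "identity space"))
    print(fielded_search(RECORDS, title="identity space", author="lee"))
```

The full-text search returns a record that only mentions the phrase in passing, while restricting the phrase to the title and adding an author narrows the result to the known item.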