Open and Not Very Linked Data -- Metadata Quality in Art Museum Data Sets: A Review and Case Study of Example Data Sets from a Data Analyst Perspective, Yingying Han (University of Illinois)
The American Art Collaborative (AAC) is a consortium of 14 art museums that strives to provide linked open data for its members' collections. Open data has recently gained attention and adoption among museums. Even though these datasets are open, they are typically published in isolation from one another, and integrated datasets are more valuable than standalone subsets. This review examines geographic metadata from three representative AAC members, Princeton University Art Museum (PUAM), Gilcrease Museum (GM), and Smithsonian American Art Museum (SAAM), and explores metadata quality problems across three dimensions: interoperability, completeness, and accuracy.
Interoperability: Each museum used a different data structure. GM and PUAM both published geographic data about each object's original location, while SAAM shared geographic data about the artists' places of birth and death. Furthermore, curators did not adopt a standard when assigning data values, especially for country and state names, which created interoperability problems.
Completeness: This review examined geographic data completeness to identify (1) records in which every attribute value is "null" and (2) records without longitude and latitude values. Longitude and latitude are important geographic markers because country, state, and city can be inferred from them computationally. This presentation will share Python scripts that accomplish these tasks.
Accuracy: Frequent errors in geographic metadata include (1) vague or incorrect values for continent, country, and state; for example, "Central America" appeared as a continent value in the PUAM dataset, and the review presents Python scripts to identify this kind of error; and (2) data uncertainty introduced when curators were unsure which value to assign, such as "probably Chiapas". This underscores the importance of longitude and latitude values, which constrain curators' options and improve data accuracy.
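The completeness and accuracy checks described above could be sketched roughly as follows. This is not the presenter's script, only a minimal illustration under stated assumptions: the field names (`continent`, `country`, `state`, `city`, `latitude`, `longitude`), the seven-continent vocabulary, the list of hedge words, and the sample records are all hypothetical, and the real AAC exports use their own schemas. The sketch also assumes a record counts as lacking coordinates when either latitude or longitude is missing.

```python
import re

# Hypothetical field names; each museum's real export has its own structure.
GEO_FIELDS = ("continent", "country", "state", "city", "latitude", "longitude")

# Assumed controlled vocabulary: the seven conventional continent names.
CONTINENTS = {
    "Africa", "Antarctica", "Asia", "Europe",
    "North America", "Oceania", "South America",
}

# Illustrative, incomplete list of markers that signal curator uncertainty.
HEDGE_PATTERN = re.compile(
    r"\b(probably|possibly|perhaps|maybe|likely)\b|\?", re.IGNORECASE
)


def is_blank(value):
    """Treat None, empty strings, and the literal text 'null' as missing."""
    return value is None or str(value).strip().lower() in ("", "null")


def check_record(record):
    """Return a list of quality flags for one geographic record."""
    flags = []

    # Completeness (1): every geographic attribute is missing.
    if all(is_blank(record.get(field)) for field in GEO_FIELDS):
        flags.append("all_null")

    # Completeness (2): latitude or longitude is missing (assumed reading).
    if is_blank(record.get("latitude")) or is_blank(record.get("longitude")):
        flags.append("missing_coordinates")

    # Accuracy (1): continent value falls outside the controlled vocabulary.
    continent = record.get("continent")
    if not is_blank(continent) and str(continent).strip() not in CONTINENTS:
        flags.append("invalid_continent")

    # Accuracy (2): a place-name value contains an uncertainty marker.
    for field in ("continent", "country", "state", "city"):
        value = record.get(field)
        if not is_blank(value) and HEDGE_PATTERN.search(str(value)):
            flags.append("uncertain_" + field)

    return flags


# Made-up sample records that echo the examples quoted in the abstract.
sample_records = [
    {"continent": "Central America", "country": "Mexico",
     "state": "probably Chiapas", "latitude": None, "longitude": None},
    {"continent": "null", "country": "null", "state": "null",
     "city": "null", "latitude": "null", "longitude": "null"},
    {"continent": "North America", "country": "United States",
     "state": "Oklahoma", "city": "Tulsa",
     "latitude": 36.15, "longitude": -95.99},
]

for record in sample_records:
    print(check_record(record))
```

A vocabulary lookup only catches values outside the list; it cannot tell whether a valid continent name is the right one for a given object, which is one reason coordinates are the stronger check.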
Audience: Library catalogers, museum curators, technologists, and researchers seeking to understand data analysts' needs regarding cultural Linked Open Data (LOD), and those who aim to improve LOD quality in data sets.