Data.wa.gov is a digital catalog of open data consisting of data sets and visualizations uploaded by independent authors. Because tag input is uncontrolled, nearly 1,200 tags have been applied to these assets, and the resulting duplicates and variants have reduced the effectiveness of the tags as a whole.
We created a tag protocol consisting of defined rules and edge-case handling, along with a dashboard and recommendations for future work. This solution gives the site's tag list structure and improves data quality, asset searchability, and sustainability by reducing tag maintenance and preserving tag effectiveness.
Data administrators benefit from the reduced upkeep costs of the site
Data publishers benefit through increased asset accessibility and more efficient publishing processes
General public members benefit from improved asset findability through tag-filtering and search
We began our project by familiarizing ourselves with the problem and with current industry best practices for tagging. This involved a literature review as well as meetings with the sponsor, trusted Data Publishers, and subject matter experts.
Next, we developed a standardized protocol by which to keep, remove, or update tags already used on the site. After several iterations, we found a protocol that would match the needs of our sponsor and maximize the benefits felt by the stakeholders.
After we developed our protocol, we began to apply it to the list of roughly 1,200 tags used on data.wa.gov. The protocol continued to evolve as unique and corner-case tags were discovered, culminating in a cleaner and more functional final list of tags.
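As a rough illustration of how duplicates and variants might be surfaced during such a cleanup, the sketch below groups tags by a normalized key. The normalization rules (lowercasing, treating hyphens and underscores as spaces, collapsing whitespace) and the sample tags are assumptions made for this example, not the rules of the actual protocol.

```python
import re
from collections import defaultdict


def normalize_tag(tag: str) -> str:
    """Hypothetical normalization: trim, lowercase, and unify separators."""
    key = tag.strip().lower()
    key = re.sub(r"[-_]+", " ", key)   # treat hyphens/underscores as spaces
    key = re.sub(r"\s+", " ", key)     # collapse repeated whitespace
    return key


def group_variants(tags):
    """Return only the normalized keys that have more than one spelling."""
    groups = defaultdict(list)
    for tag in tags:
        groups[normalize_tag(tag)].append(tag)
    return {key: variants for key, variants in groups.items() if len(variants) > 1}


# Illustrative tags only; not taken from data.wa.gov.
sample_tags = ["Public Health", "public-health", "public_health ", "Salmon", "salmon", "budget"]
duplicates = group_variants(sample_tags)

for key, variants in duplicates.items():
    print(f"{key!r} has {len(variants)} variants: {variants}")
```

Each group with more than one spelling is a candidate for merging into a single tag; which spelling to keep would still be a decision for the protocol and its reviewers.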
To ensure that the tag list remains standardized, we developed a dashboard that allows site administrators to monitor the list. Through the dashboard, admins can view summary statistics of the list and any new incoming tags. If any new tags violate the protocol, the admins can take action accordingly.
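The kind of check such a dashboard could run on incoming tags can be sketched as follows. This is a minimal example under stated assumptions: the function name `check_incoming`, the two rules it enforces (no case or whitespace variants of an approved tag, and lowercase tags without stray whitespace), and the sample data are all hypothetical and do not describe the dashboard's real implementation.

```python
def check_incoming(approved, incoming):
    """Compare incoming tags against the approved list (hypothetical rules)."""
    approved_set = set(approved)
    # Map a normalized form back to the approved spelling.
    approved_keys = {tag.strip().lower(): tag for tag in approved}
    report = {"accepted": [], "flagged": []}
    for tag in incoming:
        if tag in approved_set:
            continue  # already on the approved list, nothing to review
        key = tag.strip().lower()
        if key in approved_keys:
            report["flagged"].append((tag, f"variant of '{approved_keys[key]}'"))
        elif tag != key:
            report["flagged"].append((tag, "not lowercase or has stray whitespace"))
        else:
            report["accepted"].append(tag)
    return report


# Illustrative data only.
approved_tags = ["public health", "salmon"]
incoming_tags = ["Salmon", "salmon", "Transit ", "energy"]
report = check_incoming(approved_tags, incoming_tags)

# Simple summary statistics an administrator might look at.
summary = {
    "approved_total": len(approved_tags),
    "new_accepted": len(report["accepted"]),
    "new_flagged": len(report["flagged"]),
}
print(summary)
print(report["flagged"])
```

Flagged tags are only reported, not changed, which leaves the decision about how to act on a violation with the administrators.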
The cleaned list of tags increases the findability of assets on the site in two ways. First, filtering by tags is much more manageable with a condensed list. Second, more relevant assets appear when a user searches for something, as the tags on each asset are more precise.
Typos and inconsistencies amongst the tags cause a loss of credibility for the site. By cleaning the tag list to remove any of these errors, our solution allows the site to regain that credibility in the eyes of the Data Explorers and the general public.
In addition to the findability benefits provided to end users, the cleaned list of tags also improves the usability of the site for Data Publishers, as the straightforward protocol reduces the burden of deciding how to tag their assets.
Simply creating a protocol and cleaning the current tags would likely lead to many of the same problems appearing again down the line. We addressed this by setting up an administrative dashboard that helps to prevent the tag list from getting out of hand in the future.