Automated genome annotation is an essential tool for extracting biological information from sequence data. tRNA genes are frequently identified and annotated with the software package tRNAscan-SE, whose output is listed, for selected genomes, in the Genomic tRNA database (GtRNAdb). Given the central role of tRNA in molecular biology, the accuracy and proper application of tRNAscan-SE are important both for interpreting its output and for continued improvement of the software. Here, we report a manual annotation of the predicted tRNA gene sets for 20 complete genomes from the archaeal taxon Thermococcaceae. According to GtRNAdb, these 20 genomes contain a number of putative deviations from the standard set of canonical tRNA genes in Archaea. However, manual annotation reveals that only one represents a true divergence; the other instances are either (i) non-canonical tRNA genes resulting from the integration of horizontally transferred genetic elements or from CRISPR-Cas activity, or (ii) attributable to errors in the input DNA sequence. To distinguish between canonical and non-canonical archaeal tRNA genes, we recommend combining automated pseudogene detection by tRNAscan-SE with the tRNAscan-SE isotype score. This greatly reduces manual annotation effort and improves predictions of tRNA gene sets in Archaea.
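The recommended combination of the two signals can be sketched as a simple filter over predicted tRNA genes. This is a minimal illustration under stated assumptions, not the authors' pipeline: the record type `TRNAPrediction`, its field names, the function `classify`, and the cut-off `min_isotype_score` are all hypothetical, and the default threshold is an arbitrary placeholder rather than a value taken from the paper or from tRNAscan-SE. Parsing of actual tRNAscan-SE output is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class TRNAPrediction:
    # Hypothetical record; field names are illustrative, not tRNAscan-SE column names.
    name: str
    isotype: str          # isotype implied by the anticodon, e.g. "Ala"
    isotype_score: float  # isotype score reported for this prediction (assumed numeric)
    is_pseudo: bool       # True if tRNAscan-SE flagged the prediction as a possible pseudogene


def classify(pred: TRNAPrediction, min_isotype_score: float = 50.0) -> str:
    """Label a prediction 'canonical' or 'non-canonical'.

    min_isotype_score is a hypothetical placeholder cut-off; a real value
    would have to be chosen from the data.
    """
    # Signal 1: automated pseudogene detection.
    if pred.is_pseudo:
        return "non-canonical"
    # Signal 2: a low isotype score suggests an atypical tRNA gene.
    if pred.isotype_score < min_isotype_score:
        return "non-canonical"
    return "canonical"


predictions = [
    TRNAPrediction("tRNA-1", "Ala", 85.0, False),
    TRNAPrediction("tRNA-2", "Ile", 20.0, False),
    TRNAPrediction("tRNA-3", "Gly", 90.0, True),
]
for p in predictions:
    print(p.name, classify(p))
```

Only the predictions labelled non-canonical would then need manual inspection, which is where the reduction in annotation effort would come from.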

doi.org/10.5281/zenodo.6782366

van der Gulik, P., Egas, M., Kraaijeveld, K., Dombrowski, N., Groot, A., Spang, A., & Gallie, J. (2023). Distinguishing between canonical and non-canonical tRNA genes reveals that Thermococcaceae adhere to the standard archaeal tRNA gene set. doi:10.5281/zenodo.6782366