Automated genome annotation is an essential tool for extracting biological information from sequence data. The identification and annotation of tRNA genes is frequently performed by the software package tRNAscan-SE, the output of which is listed, for selected genomes, in the Genomic tRNA Database (GtRNAdb). Given the central role of tRNA in molecular biology, the accuracy and proper application of tRNAscan-SE are important for both interpretation of the output and continued improvement of the software. Here, we report a manual annotation of the predicted tRNA gene sets for 20 complete genomes from the archaeal taxon Thermococcaceae. According to GtRNAdb, these 20 genomes contain a number of putative deviations from the standard set of canonical tRNA genes in Archaea. However, manual annotation reveals that only one represents a true divergence; the other instances are either (i) non-canonical tRNA genes resulting from the integration of horizontally transferred genetic elements or from CRISPR-Cas activity, or (ii) attributable to errors in the input DNA sequence. To distinguish between canonical and non-canonical archaeal tRNA genes, we recommend combining automated pseudogene detection by tRNAscan-SE with the tRNAscan-SE isotype score. This combination greatly reduces manual annotation effort and leads to improved predictions of tRNA gene sets in Archaea.
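The recommended combination of pseudogene detection and isotype score can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The record type `TrnaPrediction`, its field names, the function `split_canonical`, and the cutoff parameter `min_isotype_score` are all hypothetical. The abstract gives no score threshold, so the cutoff is left to the caller, and the sketch assumes the tRNAscan-SE results have already been parsed into records.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class TrnaPrediction:
    """One predicted tRNA gene. All field names are hypothetical."""
    name: str
    isotype: str
    is_pseudogene: bool    # assumed: taken from the tool's pseudogene call
    isotype_score: float   # assumed: taken from the tool's isotype score


def split_canonical(
    predictions: Iterable[TrnaPrediction],
    min_isotype_score: float,
) -> Tuple[List[TrnaPrediction], List[TrnaPrediction]]:
    """Split predictions into (canonical candidates, non-canonical candidates).

    A prediction counts as a canonical candidate only if it is not flagged
    as a pseudogene AND its isotype score reaches the caller-chosen cutoff.
    """
    canonical: List[TrnaPrediction] = []
    non_canonical: List[TrnaPrediction] = []
    for p in predictions:
        if not p.is_pseudogene and p.isotype_score >= min_isotype_score:
            canonical.append(p)
        else:
            non_canonical.append(p)
    return canonical, non_canonical
```

Calling `split_canonical(records, min_isotype_score=cutoff)` returns two lists. Under these assumptions, only the second list would need manual inspection.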
Algorithms and Complexity

van der Gulik, P.T.S., Egas, M., Kraaijeveld, K., Dombrowski, N., Groot, A.T., Spang, A., & Gallie, J. (2023). Distinguishing between canonical and non-canonical tRNA genes reveals that Thermococcaceae adhere to the standard archaeal tRNA gene set. doi:10.5281/zenodo.6782366