Comparative analysis of the complete mitochondrial genomes of three rockfishes (Scorpaeniformes, Sebastiscus) and insights into the phylogenetic relationships of Sebastidae

Abstract Mitochondrial genome is a powerful molecule marker to provide information for phylogenetic relationships and revealing molecular evolution in ichthyological studies. Sebastiscus species, a marine rockfish, are of essential economic value. However, the taxonomic status and phylogenetic relationships of Sebastidae have been controversial so far. Here, the mitochondrial genomes (mitogenomes) of three species, S. tertius, S. albofasciatus, and S. marmoratus, were systemically investigated. The lengths of the mitogenomes’ sequences of S. tertius, S. albofasciatus, and S. marmoratus were 16910, 17056, and 17580 bp, respectively. It contained 13 protein-coding genes (PCGs), two ribosomal RNAs (rRNAs), 22 transfer RNA (tRNA) genes, and one identical control region (D-loop) among the three species. The genetic distance and Ka/Ks ratio analyses indicated 13 PCGs were suffering purifying selection and the selection pressures were different from certain deep-sea fishes, which were most likely due to the difference in their living environment. The phylogenetic tree was constructed by Bayesian Inference (BI) and Maximum Likelihood (ML). Most interestingly, the results indicated that Sebastidae and Scorpaenidae were grouped into a separate branch, so the taxonomic status of Sebastidae should be classified into subfamily Sebastinae. Our results may lead to a taxonomic revision of Scorpaenoidei.


Introduction
The family Sebastidae belongs to Scorpaeniformes, commonly known as rockfishes or false kelpfishes, comprises 7 genera and approximately 133 species, and it is a diverse and economically important group distributed in Atlantic, Indian, and Pacific Oceans [1]. Sebastidae has become one of the most important marine fishes due to its huge economic value and high biodiversity. However, there is still no consensus about the taxonomic status of Sebastidae up to now. Some authors considered Sebastidae as a family of the order Scorpaeniformes [2][3][4][5][6]. By contrast, some authorities recognized it as a subfamily of Scorpaenidae [7][8][9][10][11][12]. Besides, the databases that are widely referenced also showed different results, like Fish-Base (https://www.fishbase.se) acknowledged the validity of Sebastidae, but Integrated Taxonomic Information System (ITIS) (https://www.itis.gov) is opposed to this opinion. Before the conclusion, it will be unified use of the appellation 'Sebastidae' in the present study.

albofasciatus (B) and S. marmoratus (C)
The innermost circle of the images represents GC% per every 5 bp of the mitogenome; the darker lines are, the higher their GC% are.
The genus Sebastiscus is mainly distributed in the western Pacific [13,14]. Despite their diversity, abundance, and economic importance, our understanding of the relationships within the genus remains limited. It was classified as part of the genus Sebastes until 1904 when Jordan and Starks identified it as a separate genus by recognizing the development of the air bladders [13]. Besides, Matsubara used the structure of the suborbital bone and the number of vertebrae as the diagnosis traits of Sebastiscus [15,16]. Although Barsukov and Chen classified Sebastiscus as a subgenus of Sebastes, it is commonly regarded as an independent genus at present [17][18][19][20][21]. Three valid species and a newly reported species are included in genus Sebastiscus: S. albofasciatus [22], S. marmoratus [23], S. tertius [17], and S. vibrantus [6]. It is worth noting that S. marmoratus is widely distributed in the northwestern Pacific Ocean and the remaining species are likely to be confined to the warm waters of East Asia and Indonesia [6,24].
Mitochondrial genomes (mitogenomes) have become a powerful molecule marker of species classification, population genetics, molecular systematic geography, molecular ecology, and other fields [25][26][27][28][29]. Small size, maternal inheritance, compact gene arrangement, high conservatism, and simple structure are the main features of mitochondrial genomes [30][31][32]. The structure, characteristics, and properties of the mitochondrial genomes of fish have been studied more and more widely since Johansen et al. completed the complete sequence determination of the mitochondrial genome of Gadus morhua [33].
From the above, there have been so many meaningful achievements about phylogenetic analyses and fish taxa through research of mtDNA. However, there exists some limitations in short mitochondrial gene fragments in discussing and resolving more complicated phylogenetic relationships in many fish lineages [34]. For these limitations, the longer DNA sequences liked complete mitochondrial genomes which have additional informative sites will have better ways to solve these higher level relationships and deeper branches thoroughly [35]. Therefore, the mitochondrial genomes applied in this study may help recognize the evolutionary and relationships of the family Sebastidae and verify the accuracy of traditional taxonomy.
In the present study, we sequenced and annotated three complete mitochondrial genomes of Sebastiscus species (S. tertius, S. albofasciatus, and S. marmoratus) and compared them with each other. The characteristics were described to evaluate the variation and conservation in the mitochondrial genome among Sebastiscus species. The relative synonymous codon usage (RSCU) and AT skew value of protein-coding genes (PCGs) were analyzed to better understand the functional inference of related genes. Moreover, the phylogenetic analyses among Sebastiscus species were performed and related species in Scorpaenoidei to preferably discuss the taxonomic status of Sebastidae. All information reported in this article will help supplement and enhance the limited molecular data available for Sebastiscus species, and provide the essential evidence for the taxonomic status of Sebastidae.

Sampling and DNA extraction
The sample of rockfishes were collected using hook-and-line fishing from the coastal waters of Zhoushan in China  [6]) and stored at −20 • C for further study. Muscle tissue was taken from the tails of each sample and digested using proteinase K (Merck, Germany). Genomic DNA was isolated following a standard phenol-chloroform method and detected by 2.0% agarose electrophoresis. DNA precipitation was dissolved in double-distilled water and stored at 4 • C after concentration quantification. The study protocol was approved by the Experimental Animal Ethics Committee of the Ocean University of China.

PCR amplification and sequencing
Three Sebastiscus mitogenomes were amplified with 34 pairs of Sebastiscus-specific universal primer sets (Supplementary Table S1) and 4 pairs of specific primer sets (Supplementary

Mitogenome annotation and sequence analyses
SeqMan (DNAStar, U.S.A.) software was used to manually correct and compare the contiguous sequence fragments, then splice the complete mitogenome sequence and calculate the full length of mitogenome. The starting and ending positions of each gene were determined by comparing the full mitogenome sequence of S. marmoratus (Accession no. KF667491) published in GenBank (https://www.ncbi.nlm.nih.gov/genbank/). Online software tRNAscan-SE2.0 (https://lowelab.ucsc.edu/tRNAs can-SE) was used to identify the transfer RNA (tRNA) gene and predict the secondary structure diagram of tRNA [37]. The online software Tandem Repeats Finder [38] was used to search and analyze the tandem repeats in the control region. The O L and tandem repeat structures were simulated and drawn by RNAstructure6.2 software. All complete mitogenomes were preliminarily annotated and drawn the mitochondrial genome map by Mitofish (https://mitofish.aori.u-tokyo.ac.jp) [39,40]. The composition of complete mitochondrial genome sequence and each segment (including the non-coding region, ribosomal RNA (rRNA), tRNA, and the PCGs) were calculated by MEGA5.0 software [41], and the PCG codon and base content were determined. Composition skewness of each segment was calculated by the following formulas:

Phylogenetic analyses
In order to discuss the phylogenetic relationships of Sebastidae and explore the taxonomic status of Sebastiscus in Scorpaenoidei, mitogenomes of 33 previously sequenced Scorpaenoidei and 2 previously sequenced Gobiidae species (with the latter as the outgroup taxon; Table 1) were used in the phylogenetic analyses. We used the nucleotide sequences of the 13 PCGs as the dataset to construct the phylogenetic tree. Sequences were aligned using SeqMan from DNAStar software (U.S.A.). The optimal model for nucleotide sequences was estimated by MEGA 5.0 (Tamura et al. 2011). TN93 + G + I captured the minimum values of Bayesian Information Criterion (BIC) and it was considered to be the best model for phylogenetic tree construction. The Maximum Likelihood (ML) phylogenetic tree was constructed by MEGA 5.0 software with 1000 replicates of bootstrap and the Bayesian Inference (BI) analysis was inferred by the software of MrBayes 3.2.6 based on 10000000 generations [42]. The divergence time was predicted by MEGA 10 [43] with the ML method of RelTime. The calibration of divergence times was obtained from online Time Tree database (http://www.timetree.org/) [44].

Mitogenome organization and composition
The complete mitochondrial genomes of S. albofasciatus (Accession no. MT117230), S. tertius (Accession no. MT117231), and S. marmoratus (Accession no. MT789709) in GenBank were 17056, 16910, and 17580 bp in length, respectively ( Figure 1). The size variation of the three mitogenomes was mainly caused by the differences in the lengths of the non-coding regions. Of all 36 sequenced Scorpaenoidei mitogenomes, the length of the mitogenome of Sebastes koreanus (17606 bp) was the longest, whereas that of Sebastes oblongus (16396 bp) was the shortest. The mitogenome lengths of five Scorpaenoidei species were longer (>17000 bp) because of a longer control region (>1300 bp). The mitogenome of S. albofasciatus contained the typical 37 genes (13 PCGs, 22 tRNAs, and 2 rRNAs), 1 control region, and 1 OL. The mitogenome of S. tertius and S. marmoratus had the same composition (Table 2). Most mitochondrial genes were encoded on the H-strand, except for ND6 and eight tRNA (Glu, Ala, Asn, Cys, Tyr, Ser-UCN, Gln, and Pro) genes that were encoded on the L-strand. The nucleotide composition of S. albofasciatus, S. tertius, and S. marmoratus mitogenomes had a higher A+T bias of 55.15, 55.04, and 55.30%, respectively, and both showed positive AT-skew and negative GC-skew ( Figure 2 and Supplementary Table S3).

PCGs and codon usages
All the 13 PCGs in the mitogenomes of the three rockfishes were similar to those of other vertebrates. Twelve PCGs (ND1, ND2, COI, COII, ATP8, ATP6, COIII, ND3, ND4L, ND4, ND5, and CYTB) were coded on the heavy strand    (H-strand) and the remaining one (ND6) was coded on the light strand (L-strand). The length, codon usage, and A+T content of PCGs in the S. albofasciatus, S. tertius, and S. marmoratus mitogenomes were nearly identical. Among the three mitogenomes, all the 13 PCGs of Sebastiscus species encoded 3800 amino acids in total. All the PCGs used the initiation codon ATG except COI used GTG. ATG is an accepted conventional initiation codon for many Osteichthyes mitogenomes including among Scorpaeniformes fishes [45,46]. At the same time, GTG was commonly used as the initiation codon of COI in many other Osteichthyes mitogenomes [47][48][49][50][51]. All three mitogenomes have the same termination codon for 11 PCGs (ND2, COII, ATP8, ATP6, COIII, ND3, ND4L, ND4, ND5, ND6, and CYTB). TAA were commonly used as the termination codons, although the incomplete termination codons T or TA were found in ND2, COII, ATP6, COIII, ND3, ND4, and CYTB in all three mitogenomes. The incomplete termination codon has also been found in all other sequenced Osteichthyes species [48][49][50][51][52]. It has been confirmed that the incomplete termination codons could act as the complete functional termination codons in polyadenylation processes and polycistronic transcription cleavage [29,51,53]. TAG was used as the termination codon of ND6 in the three Sebastiscus mitogenomes. Additionally, in the S. tertius mitogenome, ND1 used TAG as the termination codon, too. Although TAG is the typical termination codon in many mitogenomes, it is not used frequently, due to the high percentage of AT nucleotide use by the PCGs [32]. The average AT contents of the 13 PCGs in S. albofasciatus and S. tertius were 54.50 and 54.40%, respectively, and both were similar values to that of S. marmoratus (54.54%). The PCGs encoded by the H-strand displayed T-skews (T > A) and C-skews (C > G) whereas the L-strand displayed T-skews and G-skews (G >C). We calculated the RSCU of the three Sebastiscus species mitogenomes (Figure 3 and Supplementary Table S4) and the result showed that the frequency of using NNA and NNC (N represents A, T, C, G) were higher than NNT and NNG in S. albofasciatus, S. tertius, and S. marmoratus. The most frequent amino acids in the coding sequences of S. albofasciatus, S. tertius, and S. marmoratus mitochondrial proteins were Leu (CUN), Ala, and Thr (>290) (Supplementary Figure S1). These three amino acids were also frequently used in other Osteichthyes mitogenomes [48][49][50][51]. Moreover, the minimally used amino acid in the three mitogenomes was Cys (<30).

Genetic distance and evolution rates of PCGs
The genetic distance could be used to evaluate different mutation pressures among genes [54]. The pairwise genetic distances (p-distance) were calculated to reveal the sequence conservation and divergence of the PCGs among the Sebastiscus species (Figure 4). The genetic distance at the third nucleotide position was obviously higher than the first and second nucleotide position, indicating that the evolution of the third position was faster than the first and the second. The highest p-distance was found in ND1 (0.289) and ND4 (0.243, 0.161) at the third nucleotide of codons, while explored in ND2 (0.039, 0.034) and ND3 (0.013) base on the first and second nucleotide position. The COI-III and ATP8 genes had the low genetic distance in both first + second and third analysis. ND1, ND2, ND3, and ND4 genes might have high evolutionary rates among the three species, while COI-III and ATP8 were low. The value of nonsynonymous substitution (K a )/synonymous substitution (K s ) is a common indicator to assess selective pressure and evolutionary relationships of species in molecular studies [55]. K a /K s < 1, K a /K s = 1, and K a /K s > 1 were represented purifying selection, neutral mutation, and positive selection, respectively [56]. All 13 PCG genes were under strong purifying selection with K a /K s values below 1 ( Figure 5). The result was different from deep-sea fishes, where most genes exhibited positive selection or convergent/parallel signals with the exception of ND4L and ND5 [57]. One of the reasons might be the different living environment between them. Environmental difference leads to the different expression levels of protein-coding, which involved in energy regulation, reproduction, and immune behaviors [26]. The basic characteristics of genome evolution depended on random genetic drift and mutation pressure that closely connected with the environment [58]. The deep-sea fishes inhabited in the condition of oxygen deficiency, lacked food, no sunlight, and extreme cold, while the Sebastiscus species survived in the warm coastal waters [57,59,60]. Positive selection usually related to the adaptation of new environments and the development of the new function, and most nonsynonymous mutations were disadvantage [61,62]. The K a /K s values in Sebastiscus species showed they were under purifying selection, indicating that the environment variation was not great enough to change their genetic function. ND2 and ATP8 genes had high K a /K s (mean: 0.135, 0.188) values across three Sebastiscus mitogenomes compared with other genes, while ND3, ATP6, and Cytb genes were low (mean: 0.013, 0.014, 0.014). Low mutation rates tended to occur on highly expressed genes due to DNA repair mechanisms [63]. Compared with other genes, the ND3, ATP6, and Cytb showed low K a /K s representing a low mutation rate, indicating that they may have higher expression level.

rRNAs and tRNAs
Similar to S. marmoratus, the mitogenomes of S. albofasciatus and S. tertius each had one 12S rRNA and one 16SrRNA gene. The 12S rRNA gene was located between tRNA Phe and tRNA Val , and the 16S rRNA gene was located between tRNA Val and tRNA Leu(UUR) as also occurs in some other Scorpaeniformes fishes [64,65]. The size of the 12S rRNA in S. albofasciatus and S. marmoratus were 946 bp, both a little shorter than in S. tertius (947 bp). The size of the 16S rRNA was 1692 bp in S. albofasciatus and S. tertius, which was consistent with S. marmoratus. In the S. tertius mitogenome, the A+T content of the rRNA genes was the minimum (52.25%) whereas the A+T content of rRNA in the S. albofasciatus and S. marmoratus mitogenomes were approximately 52.31 and 52.35%, lower than the control region. In three Sebastiscus species, the AT-skew of rRNA was strongly positive whereas the GC-skew was slightly negative indicating that the contents of A and C were higher than those of T and G in the rRNA, respectively.
Like the typical set of tRNA genes in Osteichthyes mitogenomes, there were 22 tRNA genes predicted in the three species. All of them had two kinds of tRNA Leu and tRNA Ser ( Table 2). The secondary clover-leaf structures of tRNA genes identified in the mitogenome of S. tertius, S. albofasciatus, and S. marmoratus are shown in Figure 6. These tRNA genes varied in length from 65 to 74 bp. All the predicted tRNAs displayed the typical clover-leaf secondary structure, except for tRNA Ser(AGY) , which can not form a stable secondary structure because of lack of the DHU arm [66,67], this structure was common among fish mitogenomes [68].

Control region
The control region is the largest non-coding region of the fish mitogenome which is equivalent to the A+T-rich region of the insect mitogenome [29,69]. The control regions of Sebastiscus mitogenomes were located between tRNA Pro and tRNA Phe with lengths of 1246 and 1391 bp, respectively (Table 2), much shorter than that of S. marmoratus (1918 bp). The length of the control region is variable, which is the main reason for the differences in mitochondrial DNA lengths in fishes [70][71][72]. In the mitogenomes of S. tertius and S. albofasciatus, the contents of A+T were 69.34 and 68.01%, respectively, similar to that of S. marmoratus (68.81%). The control regions of all three species genomes showed positive AT-skew values while all control regions of the three Sebastiscus species displayed negative GC-skew values. Additionally, we found tandem repetitive sequences in all three Sebastiscus species by Tandem Repeat Finder V 4.07 (http://tandem.bu.edu/trf/trf.html) [38]. The tandem repetitive sequence units of S. tertius, S. albofasciatus and S. marmoratus were 22, 275, and 269 bp, respectively. The secondary structures of tandem repetitive sequence for three species showed that the two kinds of structures can form multiple stem ring structures, respectively ( Figure 7). As a non-coding region, the control region of mitochondrial DNA has a low evolutionary pressure. Therefore, it is frequency of base insertion and deletion is high, and the tandem repetitive sequence has often occurred, too. With the accumulation of data, the phenomenon of tandem repetition is becoming more and more frequent in the mitochondrial genome of fish [73]. The formation mechanism has been made some progress, of which slipped-strand mispairing is the most likely mechanism for tandem repeats [74,75]. In this study, the number of the core sequence repetitions in S. tertius and S. albofasciatus were 6 and 2, respectively, and that of S. marmoratus was 4. The difference between tandem repetitive sequence is the main reason the mitogenome lengths of the three species are different. Besides, the control region is responsible for the regulatory functions of DNA replication and transcription, which length variation will inevitably affect its functions and the metabolic frequency of the whole organism, thus resulting in interspecific differences.

L-strand origin of replication
The L-strand origins of replication (O L ) of S. tertius and S. albofasciatus mitogenomes were located between tRNA Asn and tRNA Cys with the same lengths of 37 bp, 1 bp shorter than S. marmoratus (38 bp). Secondary structures of O L were displayed in Figure 8. In these O L structures, the use of stem codons showed significant asymmetry, with more pyrimidines at the 5 end of the sequence. Consistent of other studies in fish, all O L regions had an identical conserved sequence region at the end of the stem (5 -GCCGG-3 ) [45,[76][77][78]. This may be related to the mechanism of RNA transformation to DNA [79]. Like found in many fishes, the use frequency of T and C bases in the ring region was higher [76,80].

Phylogenetic analyses
Phylogenetic relationships were reconstructed based on the sequences of 13 PCGs of 38 mitogenomes using BI and ML methods. The phylogenetic trees constructed by two methods were consistent with high intermediate bootstrap values, post probabilities, and the topological structure of the two phylogenetic trees were basically the same ( Figure  9). We found three Sebastiscus species (S. marmoratus, S. tertius, and S. albofasciatus) formed a monophyletic group in both the BI and ML analyses. S. tertius and S. albofasciatus formed a sister group, which together had a sister relationship with S. marmoratus. Barsukov and Chen (1978) regarded Sebastiscus as a subgenus of Sebastes [17]. However, the result of our phylogenetic inference showed that Helicolenus are more closely related to Sebastes than to Sebastiscus. Our results supported the view that Sebastiscus should be treated as a genus independent from Sebastes which also was supported by some previous studies [11,[81][82][83]. It was the strong evidence that supports the Sebastiscus as an independent species. Sebastidae and Scorpaenidae have been considered as two families by several authors [2][3][4][5][6]. Based on this study, however, if Sebastidae is valid, the family Scorpaenidae was regarded as paraphyletic with its subfamily Pteroinae in a sister relationship with Sebastidae ( Figure 10). A number of authors thought that Sebastidae was considered as a subfamily of Scorpaenidae, and had an equal level with Pteroinae and Scorpaeninae [10,81,84]. Such taxonomic division was consistent with our results of phylogenetic relationships.
We estimated the species divergence time estimated using the ML method of RelTime by MEGA 10 ( Figure 11). The divergence time calculations indicated that the Scorpaenidae had differentiated from other species ∼45.45 million years ago and all the families of Scorpaenoidei began to diverge more than 40 million years ago, except for Sebastidae. The differentiation time of Sebastidae from other species of Scorpaenidae was the lastest, which began ∼17.96 million years ago and the evolutionary rate of Sebastidae fishes were similar to the average rate of other subfamilies of Scorpaenidae. In summary, the taxonomic status of Sebastidae maybe coincides with the subfamilies of Scorpaenidae according to the result.

Conclusions
In the present study, the complete mitogenomes of S. tertius, S. albofasciatus, and S. marmoratus were successfully determined. Their mitogenomes were with a total length of 16910 bp in S. tertius, 17056 bp in S. albofasciatus, and 17580 bp in S. marmoratus, respectively. A total of 22, 275, and 269 bp tandem repetitive sequences were respectively detected in three mitogenomes, which may be a typical characteristic of Sebastiscus fish. The ratio of K a and K s indicated that three species were suffering a purifying selection, while the ND2 and ATP8 showed the highest K a /K s values. Both ML and BI analyses indicated that Sebastiscus was monophyletic and S. marmoratus was a sister clade to S. tertius and S. albofasciatus. That is to say, the taxonomic status of Sebastidae, well supported by results of a phylogenetic tree, should be subfamily Sebastinae. We believe the present study will greatly improve our understanding of the evolution profile and phylogenetic position in Scorpaeniformes, which could benefit resource management and species protection in fishery and aquaculture.

Data Availability
The data used in the manuscript can be found on NCBI website (https://www.ncbi.nlm.nih.gov/) by accession number.