Acorus tatarinowii Schott is a well-known Chinese traditional herb. Lignin is the major biologically active ingredient and exerts a broad range of pharmacological effects: it is an antitumor, antioxidant and bacteriostatic agent, and protects the cardiovascular system. In the present study, the transcriptomes of the leaf and rhizome tissues of A. tatarinowii Schott were obtained using the BGISEQ-500 platform. A total of 141777 unigenes were successfully assembled, of which 76714 were annotated in public databases. Further analysis of the lignin biosynthesis pathway revealed a total of 107 unigenes encoding 8 key enzymes, which were involved in this pathway. Furthermore, the expression of the key genes involved in lignin synthesis in different tissues was identified by quantitative real-time PCR. Analysis of the differentially expressed genes (DEGs) showed that most of the up-regulated unigenes were enriched in rhizome tissues. In addition, 2426 unigenes were annotated to the transcriptome factor (TF) family. Moreover, 16 TFs regulating the same key enzyme (peroxidase) were involved in the lignin synthesis pathway. The alignment of peroxidase amino acid sequences and the analysis of the structural characteristics revealed that the key peroxidase enzyme had well-conserved sequences, spatial structures, and active sites. The present study is the first to provide comprehensive genetic information on A. tatarinowii Schott at the transcriptional level, and will facilitate our understanding of the lignin biosynthesis pathway.
Acorus tatarinowii Schott (A. tatarinowii) is a well-known traditional Chinese herb that is abundant in eastern and southern Asia. A. tatarinowii is believed to have the potential to alleviate cognitive decline, epilepsy, and dementia . The main biologically active ingredients are phenylpropanoids, lignins, sesquiterpenoids, and alkaloids [2,3]. The active ingredients of A. tatarinowii confer a broad range of pharmacological effects, including hypoglycemic, anti-Alzheimer’s disease (AD), antiepileptic, neuroprotective, and memory-enhancing activities [4,5].
Lignin is the major bioactive ingredient in A. tatarinowii. Lignin is a composite phenolic polymer, composed mainly of hydroxyphenyl (H), guaiacyl (G), and syringyl (S) units, which are formed through the oxidative polymerization of p-coumaryl, coniferyl, and sinapyl alcohols, respectively . The main functions of lignin are to provide mechanical support for plants, water conductivity, and protection against plant pathogens [7,8]. Reduced lignin contents often affect lodging resistance . In addition, lignin has a waterproofing effect on the cell wall, forms a vascular system for the transport of water and solutes, and participates in the protection of plants from pathogens . The initial three steps of lignin biosynthetic pathway, catalyzed by phenylalanine ammonia-lyase (PAL), cinnamate 4-hydroxylase (C4H), and 4-coumaroyl CoA-ligase (4CL), are mandatory and provide the basis for all subsequent branches and resulting metabolites. PAL participates in the first step, catalyzing the nonoxidative deamination of phenylalanine to cinnamic acid and directing the carbon flow from the phenylalanine pathway . Cinnamic acid forms p-cinnamic acid, which is catalyzed by C4H; p-cinnamic acid is then converted into p-coumaroyl-CoA and feruloyl-CoA by 4CL and caffeic acid 3-O-methyltransferase (COMT) . Cinnamyl-CoA reductase (CCR) is responsible for catalyzing the most important metabolic reaction in the biosynthesis of lignin monomers . Finally, monomeric lignin is catalytically synthesized by cinnamyl-alcohol dehydrogenase (CAD), peroxidase (POX), and ferulate-5-hydroxylase (F5H) [14,15]. Although researchers have studied A. tatarinowii for more than a century, the genomic information of A. tatarinowii remains incomplete.
RNA sequencing is a particularly effective technology for gene discovery, especially for species without a reference genome . This technology has been used widely in the study of traditional medicinal plants, such as Rehmannia glutinosa , Andrographis paniculata , Solanum elaeagnifolium , and Arisaema heterophyllum Blume . Comprehensive transcriptome information can be analyzed further to identify gene expression levels, gene sequences, biosynthetic pathway, and key enzymes, and to reveal candidate genes potentially involved in specific metabolic pathways. However, given that the biosynthesis of lignin and other secondary metabolites in A. tatarinowii is still largely unknown, further research on this species has been hindered. In the present study, we characterized the transcriptomes of the leaf tissue and rhizome tissue of A. tatarinowii and identified the unigenes encoding key enzymes in the lignin biosynthesis pathway. The transcriptome data for A. tatarinowii reported in the present study will provide an important resource for the study of lignin biosynthesis and other metabolic pathways. Furthermore, this database will provide important clues to explore the biological characteristics of other plants closely related to A. tatarinowii.
RNA-Seq and assembly
In total, 124084 raw reads were obtained from the rhizome tissue and 76197 were obtained from the leaf tissue. The raw reads were generated using the BGISEQ-500 sequencing platform. The Q30 of each plant tissue was greater than 86.82%, and the GC content of each plant tissue was approximately 42.11%. In total, 141777 unigenes were assembled and clustered from high-quality reads, thorough quality control, and filtering. Among these unigenes, 51352 (36.22%) were longer than 1000 nt and 85212 (53.69%) were longer than 500 nt. The average length of the unigenes was 1108 nt and the N50 was 2062 nt. The distribution of the lengths of the unigenes is presented in Supplementary Figure S1. The quality of transcriptome sequencing was tested using a single copy orthologous database (BUSCO), and 96% of unigenes had exact matches in the BUSCO database (Supplementary Figure S2). The complete transcriptome data obtained in this sequencing have been deposited in the NCBI SRA database (accession number: SRA226668).
Functional annotation of unigenes
The 141777 identified unigenes were annotated using seven NCBI publicly available databases; the annotations were distributed as follows: Nr, 77368 unigenes (54.57%); Nt, 57501 unigenes (40.56%); Swissprot, 58557 unigenes (41.30%); KEGG, 62765 unigenes (44.27%); GO, 42681 unigenes (30.10%); KOG, 62928 unigenes (44.39%), and Pfam, 56187 unigenes (39.63%) (Table 1). A totaol of 76714 (68.22%) unigenes were annotated in at least one public functional database, and 23137 (16.32%) co-annotated unigenes were identified in five databases (Figure 1A). We obtained the species homology distribution of A. tatarinowii transcripts from the Nt annotations. The matched unigenes revealed that A. tatarinowii had the closest homology with Nelumbo nucifera (12.16%) in the Nr database, followed by Elaeis guineensis (12.14%), Phoenix dactylifera (8.35%), Macleaya cordata (8.28%), and Ananas comosus (3.72%) (Figure 1B).
Unigene functional annotation
|Database .||Number annotated .||Percentage (%) .|
|Database .||Number annotated .||Percentage (%) .|
GO functional annotations for A. tatarinowii unigenes were obtained from the Nr database: 42681 unigenes (30.10%) were categorized into main GO groups of molecular function, cellular components, and biological processes. In the biological processes group, most of the unigenes were related to ‘cellular process’ (13121 unigenes), ‘biological regulation’ (5009 unigenes), ‘metabolic process’ (3173 unigenes), and ‘localization’ (3038 unigenes). In the molecular function group, 21714 unigenes were related to ‘binding,’ and 20872 unigenes were most abundant in the ‘catalytic activity’ class. In the cellular components group, most of the unigenes were involved in ‘cell’ (13439 unigenes), followed by ‘membrane part’ (11735 unigenes), and ‘organelle part’ (5289 unigenes) (Supplementary Figure S3).
In total, 62928 unigenes (44.39%) were classified into 25 unigene functional groups, with 11690 unigenes classified in the ‘general function prediction only’ group and 7981 unigenes classified in the ‘signal transduction mechanisms’ group.
Overview of expression and MAPMAN analysis
The expression value of all transcripts (fragments per kilobase of transcript per million mapped reads (FPKMs) > 1) in each tissue was counted. In total, 41327 and 57460 unigenes were expressed in the leaf tissue and rhizome tissue, respectively (Figure 2A). The overall gene expression level in rhizome tissue was higher than that in leaf tissue (Figure 2B).
Overall expression profiles of Acorus tatarinowii leaf and rhizome tissues
The overall level of gene expression was analyzed in the A. tatarinowii by using MAPMAN software. The most common categories for the unigenes were protein (7.92%), followed by RNA (5.32%), signaling (3.03%), transport (2.46%), miscellaneous function (misc) (2.05%), and cell (1.76%) (Supplementary Figure S4). An overview of the metabolic and secondary metabolic pathways was prepared from the MAPMAN analysis based on the unigenes with FPKM > 1 (Supplementary Figure S5). In the ‘metabolic pathway’ subcategory, the highest number of unigenes was assigned to ‘secondary metabolism.’ The ‘secondary metabolic pathways’ subcategory contained 20 pathways, and most of the unigenes were in the ‘phenylpropanoids’ pathway.
Characterization of functional genes involved in lignin biosynthesis via KEGG analysis
In total, 62765 unigenes (44.27%) were mapped to canonical pathways and assigned to 20 pathways based on the KEGG database. The most prevalent functional group in the KEGG pathways was ‘global and overview maps’ (13855 unigenes), followed by ‘carbohydrate metabolism’ (5653 unigenes), ‘translation’ (5232 unigenes), and ‘folding, sorting and degradation’ (4185 unigenes) (Supplementary Figure S6 and S7). The ‘biosynthesis of other secondary metabolites’ subcategory contained 16 biosynthetic pathways, including phenylpropanoid biosynthesis (ko00940), flavonoid biosynthesis (ko00941), isoquinoline alkaloid biosynthesis (ko00950), tropane, piperidine and pyridine alkaloid biosynthesis (ko00960), stilbenoid, diarylheptanoid and gingerol biosynthesis (ko00945), monobactam biosynthesis (ko00261), flavone and flavonol biosynthesis (ko00944), caffeine metabolism (ko00232), isoflavonoid biosynthesis (ko00943), betalain biosynthesis (ko00965), glucosinolate biosynthesis (ko00966), benzoxazinoid biosynthesis (ko00402), indole alkaloid biosynthesis (ko00901), anthocyanin biosynthesis (ko00942), biosynthesis of secondary metabolites-unclassified (ko00999), and carbapenem biosynthesis (ko00332) (Figure 3). Among these pathways, we annotated 1309 unigenes involved in phenylpropanoid biosynthesis and 219 unigenes involved in flavonoid biosynthesis. From further analysis of the lignin biosynthesis pathway, 107 unigenes encoding the key enzymes involved in lignin biosynthesis were identified, including PAL, C4H, COMT, 4CL, CCR, CAD, POX, F5H (Table 2). Most of the genes encoding CCR were more highly expressed in the leaf tissue, whereas the expression of most genes encoding PAL, C4H, POX, COMT, 4CL, and CAD was higher in rhizomes (Figure 4).
Pathway classifications for biosynthesis metabolism
Proposed lignin biosynthetic pathways in Acorus tatarinowii
|Abbreviation .||EC number .||Unigene number .||Enzyme name .|
|COMT||184.108.40.206||9||Caffeic acid 3-O-methyltransferase|
|Abbreviation .||EC number .||Unigene number .||Enzyme name .|
|COMT||220.127.116.11||9||Caffeic acid 3-O-methyltransferase|
Protein–protein interaction network construction
The protein–protein interaction network between the key enzymes was predicted using STRING database (https://string-db.org/) to identify functional protein association networks. From the analysis of 107 key enzyme genes using the STRING website, seven key enzymes involved in lignin biosynthesis were annotated as node proteins: PAL, C4H, 4CL, CCR, CAD, COMT, and F5H (Supplementary Table S1). The average node degree was 9.85 and the PPI enrichment P-value was <1.0 × 10−16. This result indicated a strong association among node proteins in the PPI networks. These key enzymes may interact with each other during lignin synthesis and play crucial roles in the regulation of lignin biosynthesis (Supplementary Figure S8).
Analysis of differentially expressed genes and specifically expressed genes in Acorus tatarinowii
In total, 68575 shared unigenes were identified in the two different tissue types of A. tatarinowii based on a Venn diagram analysis; 9997 and 51930 unigenes were only expressed in the leaf tissue and rhizome tissue, respectively (Figure 5A).
Unigenes expressed in Acorus tatarinowii tissues
In the comparison of rhizome tissue with leaf tissue, 44879 unigenes were identified as significant differentially expressed genes (DEGs), which included 25602 up-regulated unigenes and 19277 down-regulated unigenes (Figure 5B). To determine the significantly enriched KEGG pathways, the 25602 up-regulated unigenes were mapped to the KEGG database. The four most common categories were ‘phenylpropanoid biosynthesis’ (288 genes), ‘flavonoid biosynthesis’ (36 genes), ‘isoquinoline alkaloid biosynthesis’ (23 genes), and ‘stilbenoid, diarylheptanoid and gingerol biosynthesis’ (21 genes). When these unigenes were mapped to the GO database, the most genes were enriched in ‘integral component of membrane’ (1952 genes), ‘ATP binding’ (973 genes), ‘nucleus’ (525 genes), and ‘DNA binding’ (344 genes) (Figure 5C, D).
The rhizome of A. tatarinowii is the main medicinal part of the plant. In the rhizome tissue, 51930 specifically expressed unigenes were identified; the significantly enriched KEGG pathways were ‘phenylpropanoid biosynthesis’ (468 unigenes), ‘flavonoid biosynthesis’ (80 unigenes), ‘isoquinoline alkaloid biosynthesis’ (61 unigenes), and ‘tropane, piperidine and pyridine alkaloid biosynthesis’ (55 unigenes). The significantly enriched GO terms were ‘integral component of membrane’ (3329 unigenes), ‘ATP binding’ (1958 unigenes), ‘nucleus’ (1226 unigenes), and ‘cytoplasm’ (904 unigenes) (Supplementary Figure S9 and S10).
Identification of transcriptome factors involved in lignin biosynthesis in Acorus tatarinowii
In the A. tatarinowii transcriptome database, 2426 unigenes were mapped to 57 transcriptome factor (TF) families. Mostly, unigenes in the MYB family were enriched (249 unigenes), followed by bHLH (179 unigenes), AP2-EREBP (170 unigenes), C2H2 (146 unigenes), C3H (135 unigenes), WRKY (130 unigenes), NAC (120 unigenes), and FAR1 (118 unigenes) (Figure 6A). From the KEGG pathway analysis of all TFs, 40 TFs related to the biosynthesis of other secondary metabolites were identified. Among these, 10 unigenes were enriched in the Trihelix family, and in the MYB (6 unigenes), zf-HD (6 unigenes), FAR1 (3 unigenes), C2H2 (3 unigenes), and GRAS (3 unigenes) families (Figure 6B). Sixteen unigenes involved in lignin biosynthesis were identified, including Trihelix (8 unigenes), MYB (4 unigenes), C2C2-Dof (1 unigene), OFP (1 unigene), zf-HD (1 unigene), and bHLH (1 unigene) (Figure 6C), and these TFs related to lignin biosynthesis regulated the same key enzyme (POX) (Supplementary Table S2).
Analysis of transcription factors
Structural characteristics of peroxidase involved in lignin biosynthesis
The alignment of the four peroxidase amino acid sequences revealed that they shared a very high sequence identity (92.67%, Figure 7C). A 3D structural model of peroxidase (CL4977.Contig4, Range P29-N330, Coverage 0.85) was constructed based on the crystal structure of peroxidase from Sorghum bicolor (PDB ID: 5aog, 64.57% sequence identity) by using SWISS-MODEL (https://swissmodel.expasy.org/) and PyMOL software. The structure of peroxidase (CL4977.Contig4) is presented in Figure 7. The predicted tertiary structure indicated that the peroxidase contained two different domains, comprising 11 α-helixes. ‘AAGLLRLHFHDC’ (A66–C77) is the peroxidase active site signature (colored in pink, Figure 7B), and ‘DLVALSGGHTI’ (D196–I206) is the peroxidase proximal heme-ligand signature (colored in blue, Figure 7B). Moreover, the peroxidases contained four conserved disulfide bonds (C211–C238, C77–C82, C131–C326, and C125–C44, Figure 7A).
Spatial structure model of peroxidase
qRT-PCR validation of gene expression of key enzymes
The gene expression levels of POX (CL4977-4), F5H (CL7903-2), CAD (CL6617-1), COMT (CL9267-1), and PAL (CL11361-1 and CL11361-3) were evaluated in the rhizome and leaf tissues of A. tatarinowii by using qRT-PCR. The relative expression of these genes was significantly higher in rhizome tissue than in leaf tissue, which was consistent with the transcriptome data (Figure 8).
qRT-PCR and FPKM-value analysis of six genes encoding enzymes involved in lignin biosynthesis in Acorus tatarinowii
A. tatarinowii is a well-known traditional medicinal plant : its sedative, digestive, analgesic, diuretic, and antifungal effects are recorded in the Chinese Pharmacopoeia . Even though lignins are considered to be the major bioactive components of A. tatarinowii, there are comparatively few studies relating to lignins . To identify the unigenes involved in the lignin biosynthesis pathway, we constructed a large number of gene sequences to help identify more key enzymes to assist research of lignin biosynthesis in A. tatarinowii. Based on the A. tatarinowii transcriptome database, 141777 unigenes were uncovered, with an average length of 1108 nt and an N50 length of 2062 nt; 51352 unigenes (36.22%) were longer than 1000 nt. Furthermore, the relative gene expression of POX, F5H, CAD, COMT, and PAL was examined using qRT-PCR, and found to be consistent with our transcriptional data. These results confirmed the high quality of the transcriptome assembly and the accuracy of our A. tatarinowii transcriptome database. The availability of A. tatarinowii transcriptome data will support the study of biological metabolism and functional genes of other homologous species.
When the DEGs were mapped to KEGG pathways, the expression level of 25602 unigenes in rhizome tissues was significantly up-regulated, these were mainly associated with the pathways of ‘phenylpropanoid biosynthesis,’ ‘flavonoid biosynthesis,’ and ‘isoquinoline alkaloid biosynthesis.’ A total of 51930 unigenes were identified as specifically expressed in rhizome tissue, and were mainly associated with the following KEGG pathways: ‘phenylpropanoid biosynthesis,’ ‘flavonoid biosynthesis,’ and ‘isoquinoline alkaloid biosynthesis. These data may confirm the medicinal value of A. tatarinowii from rhizome tissues at the molecular level. KEGG annotations revealed eight key enzymes related to lignin biosynthesis, namely PAL, C4H, COMT, 4CL, CCR, CAD, POX, and F5H. These key enzymes play important roles in the molecular mechanisms of lignin synthesis in A. tatarinowii. Compared with the leaf tissue, unigenes encoding the PAL, C4H, COMT, CAD, and POX enzymes were more highly expressed in rhizome tissue. COMT, CAD, and POX have been reported to play important roles in the process of lignin biosynthesis, and revealed that the expression levels of key enzymes genes may regulate the synthesis of lignin [24–26].
In the transcriptome data of A. tatarinowii, 2426 unigenes were successfully annotated to 57 transcription factor families. Mostly, unigenes in the MYB family were enriched. Studies have reported that MYB transcription factors are involved in lignin biosynthesis in many plant species, including Arabidopsis, Loquat undergoes, and poplar, and play a role in the formation of secondary cell walls [27–29]. Through KEGG pathway analysis of TFs, 16 TFs involved in lignin biosynthesis in A. tatarinowii were identified, and these TFs regulated the same key enzyme (peroxidase). These results indicate that peroxidase may play a key role in the regulation of lignin biosynthesis in A. tatarinowii. Hence, the TFs regulating peroxidase are important candidates for future studies of the regulation of lignin biosynthesis in A. tatarinowii. However, RNA-Seq is not appropriate for the detection of regulatory genes with low expression levels [30,31]. For this reason, many TFs are hard to detect, which has obstructed research into their functions; however, they are still very important, because a small increase in the expression of TFs may have a dramatic effect.
The four sequences of peroxidase in A. tatarinowii had well-conserved amino acid sequences, spatial structures, active sites, and proximal heme-ligand signatures. Moreover, the peroxidase had eight conserved cysteine residues, which serve as the site of oxidation by peroxides. Other research has shown that the peroxidases active site (AAGLLRLHFHDC) and proximal heme-ligand (DLVALSGGHTI) played a catalytic role by forming covalent bonds with Ca2+, Na+, and Fe2+; of these, Ca2+ and Na+ are the proximal and distal conserved binding sites. The incorporation of metal ions plays a regulatory role in the catalytic process of peroxidase [32,33].
Lignin is a promising material for pharmaceutical and biomedical applications owing to its biocompatibility, low cytotoxicity, antioxidant and antimicrobial properties. In medicine, it can be used as an auxiliary material to improve the oral bioavailability of the drug . In addition, lignin can increase the stability of the antimicrobial properties of wound dressings [35,36]. Lignin-based hydrogels, which combine lignin with other polymers, have been used for gene delivery , drug delivery , and tissue engineering [39,40].
This is the first transcriptome sequencing study of two tissues of A. tatarinowii using RNA-Seq technology. The obtained data have provided comprehensive information on the gene sequences and expression level of genes in A. tatarinowii, and will support more detailed studies of the molecular mechanism of lignin biosynthesis.
Materials and methods
Plant material and RNA extraction
Three plants of A. tatarinowii, which were cultivated in the natural environment of the pharmaceutical garden of Anhui University of Chinese Medicine, were harvested in April 2019. Fresh plants were quickly cleaned with sterile water and wiped clean with filter paper. The tissues (leaf and rhizome) were separately collected into 50-mL centrifuge tubes, frozen quickly in liquid nitrogen for 6 h, and then stored at −80 °C for RNA extraction. An independent experiment was conducted for each A. tatarinowii tissue, with three biological replicates used for each treatment.
cDNA library construction and sequencing (RNA-Seq)
Each plant tissue sample of RNA was extracted by using the RNA Plant Kit (Aidlab Biotech, Beijing, China) in accordance with the manufacturer’s protocol. The mRNA from each sample was isolated from the total RNA by using beads with oligo(dT), and lysed into short fragments using fragmentation buffer. Random N6 primers were used to reverse-transcribe the short fragments of mRNA obtained; the first-strand cDNA was synthesized as a template, and two-strand cDNA was synthesized to form double-stranded cDNA by using dNTPs, RNase H, and DNA polymerase I. The obtained short cDNA fragment was ligated to the adapter, and the entire fragment was selected for PCR amplification. Finally, the quantification of cDNA library per tissue was detected, and cDNA library of plant tissues was constructed by BGISEQ-500 .
Transcriptome de novo assembly and unigene functional annotation
To obtain high-quality clean data, raw reads were filtered to remove low-quality reads with greater than 20% Q-value < 20 bases, ambiguous reads containing >5% unknown base (N), and for non-coding RNA by SOAPnuke and Trimmomatic software. Full-length transcripts were de novo assembled through a broad range of expression levels and sequencing depths by using Trinity software (v2.0.6), default parameters (K-mer = 25, group pairs distance = 400) . These contigs were then further processed with Trinity to remove the redundant transcripts by using TGICL (v2.0.6) software, and finally, to generate unigenes.
Unigenes were mapped to several public databases using BLAST (version 2.2.23, e-value ≤ 1 × 10−5), including Nt (ftp://ftp.ncbi.nlm.nih.gov/blast/db), Nr (ftp://ftp.ncbi.nlm.nih.gov/blast/db), COG (https://www.ncbi.nlm.nih.gov/COG/), KEGG (http://www.genome.jp/kegg), and Pfam (http://pfam.xfam.org). According to the Nr annotation, the GO (http://geneontology.org) annotation of unigenes was obtained using the Blast2GO program (https://www.blast2go.com) .
MAPMAN annotation and protein–protein interaction network construction
MAPMAN (http://mapman.gabipd.org/) is a tool for plant metabolism distribution research that performs large gene data enriched to diagrams of plant metabolic pathways . All A. tatarinowii gene sequences were annotated by the MAPMAN database (http://www.plabipd.de/portal/mercator-sequence-annotation) to form independent species mapping. The expression level of unigenes in A. tatarinowii rhizome tissue was analyzed using MAPMAN software.
Based on Arabidopsis thaliana genomic data, the STRING online tool was used to analyze the genes encoding the key enzyme involved in the biosynthesis of lignin metabolism, and to predict the interactions between them. The protein–protein interactions of key enzymes were selected with score (median confidence) >0.4, and the node proteins were selected based on their association with other proteins. In a PPI network, an interactive network with a higher degree of association with other interactive networks plays a more important role .
Analysis of DEGs
RNA-seq has emerged as a powerful technology to measure gene expression and tissue specificity at the whole-transcriptome level. In the present study, the transcriptome data were evaluated to determine the differences in gene expression in the leaf and rhizome of A. tatarinowii. We calculated FPKM values to quantify the expression levels of each unigene. From the analysis of the DEGs, high-quality reads were assembled on the final transcriptome using Bowtie2 (version 2.2.5). Differential gene expression in two tissues was determined using the PossionDis method, which was performed as described: FC (fold change) ≥ 2.00 and FDR ≤ 0.001 . Moreover, functional enrichment of the DEGs was annotated by the GO functional database by using GO-Term Finder software. KEGG pathway annotation was applied to significant DEGs.
Identification of TFs
In order to identify the transcription factor family represented in the A. tatarinowii transcriptome, Getorf (http://emboss.sourceforge.net/apps/cvs/emboss/apps/getorf.html) was applied to detect the open reading frames (ORFs) of the unigenes, and then to align them with the protein domains of transcription factors. In our transcriptome, unigene-encoded protein sequences were probed against and matched with all the transcription factor protein sequences in the Plant transcription factor database (PlantTFDB). Finally, the transcription factors were successfully predicted in the A. tatarinowii transcriptome .
Analysis of the structural characteristics of peroxidase
Four amino acid sequences of Pox were submitted to the SmartBlast in NCBI (http://www.ncbi.nlm.nih.gov/) to search for enzyme sequences of homologous species. The alignment of four ORFs of POX (peroxidase) was compared using DNAMAN software. SWISS-MODEL (https://swissmodel.expasy.org/) and PyMOL were applied to analyze the 3D structure model of the peroxidase protein .
The key enzyme genes involved in lignin synthesis were determined by quantitative real-time PCR in accordance with the instructions of Power SYBR® Green PCR Master Mix (Roche) and Quantstudio multiplex real-time PCR instrument (Life Technologies, U.S.A.). Based on transcriptome sequences, special PCR primers were designed for different genes using Primer Premier 6.0 and Beacon designer 7.8 software (Supplementary Table S3). We used 20 µl of reaction mixture, including 1 µL of diluted cDNA, 10 µl of Power SYBR® Green Master Forward Primer, 8 µL of SDW, 0.5 µL of forward primer (10 µM), and 0.5 µL of reverse primer (10 µM). The real-time PCR program used was 95 °C for 1 min, 40 cycles of 95 °C for 15 s, and 63 °C for 25 s. For each sample, three technical replicates of the qRT-PCR analyses were repeated, and the mean of the three replicates was used as the PCR result. The actin gene (Unigene12762) served as the internal control. The relative expression levels of the selected genes were calculated using the 2(−ΔΔCt) method .
The RNA-seq datasets from two A. tatarinowii tissues were deposited in the NCBI Sequence Read Archive (SRA) database (accession: SRP213064).
The authors declare that there are no competing interests associated with this paper.
This work was supported by the Natural Science Research Grant of Higher Education of Anhui Province [grant numbers KJ2018ZD028, KJ2019A0476]; the Outstanding Youth Project of Anhui Institution of Higher Education (gxyq2019030); the Natural Science Foundation of Anhui Province of China [grant number 2008085MH268]; the National Key Research and Development Program [grant number 2017YFC1701600]; and the National Project Cultivation Fund of Anhui University of Chinese Medicine [grant number 2020PY02] provided experimental support.
CRediT Author Contribution
Shengxiang Zhang: Data curation, Writing—original draft. Liqiang Zhao: Data curation, Writing—original draft. Chunmiao Shan: Data curation. Yuanyuan Shi: Data curation. Kelong Ma: Funding acquisition, Data curation. Jiawen Wu: Funding acquisition, Validation.
We thank the Beijing Genomics Institute for assistance with experiments.
caffeic acid 3-O-methyltransferase
differentially expressed gene
False Discovery Rate
fragments per kilobase of transcript per million mapped read
open reading frame
real-time quantitative Polymerase Chain Reaction
sterile distilled deionized water
These authors contributed equally to this work.