Light-dependent protochlorophyllide oxidoreductase (LPOR) catalyzes the reduction of protochlorophyllide to chlorophyllide, which is a key reaction for angiosperm development. Dark operative light-independent protochlorophyllide oxidoreductase (DPOR) is the other enzyme able to catalyze this reaction; however, it is not present in angiosperms. LPOR, which evolved later than DPOR, requires light to trigger the reaction. The ancestors of angiosperms lost DPOR genes and duplicated the LPORs, but the evolution of LPOR in angiosperms has not yet been investigated. In the present study, we built a phylogenetic tree using 557 nucleotide sequences of LPORs from both bacteria and plants to uncover the evolution of LPOR. The tree revealed that all modern sequences of LPOR diverged from a single sequence ∼1.36 billion years ago. The LPOR gene was then duplicated at least 10 times in angiosperms, leading to the formation of two or more LPOR isoforms in multiple species. In the case of Arabidopsis thaliana, AtPORA and AtPORB originated in one duplication event, in contrast with the isoform AtPORC, which diverged first. We performed biochemical characterization of these isoforms in vitro, revealing differences in their lipid-driven properties. The results prompted us to hypothesize that duplication events of LPOR gave rise to isoforms with different lipid-driven activity, which may predispose them to function in different locations in plastids. Moreover, we showed that LPOR from Synechocystis operated in a lipid-independent manner, revealing differences between bacterial and plant LPORs. Based on the presented results, we propose a novel classification of LPOR enzymes based on their biochemical properties and phylogenetic relationships.
Chlorophylls, the main photosynthetic pigments, are tetrapyrroles, which are among the most ancient prosthetic groups in all organisms. The origin of chlorophylls and oxygenic photosynthesis traces back at least 2.7 billion years, and it contributed to the rise in the level of atmospheric oxygen [2–4]. The penultimate reaction of the chlorophyll biosynthesis pathway is the reduction of protochlorophyllide (Pchlide) to chlorophyllide (Chlide), which today can be catalyzed by two distinct enzymes: DPOR and LPOR. DPOR (dark operative light-independent protochlorophyllide oxidoreductase, EC 1.3.7.7) is considered to be phylogenetically older. It is closely related to nitrogenase and it is oxygen sensitive. When the level of atmospheric oxygen began to rise, DPOR activity decreased, which triggered ancient cyanobacteria to develop a new, oxygen-insensitive mechanism of Pchlide reduction catalyzed by LPOR [6,8,10]. LPOR (light-dependent protochlorophyllide oxidoreductase, EC 1.3.1.33) is a member of the short-chain dehydrogenase reductase (SDR) family and is the only enzyme among SDR proteins that requires light to trigger the reaction. Nowadays, most oxygenic photosynthetic organisms have both DPOR and LPOR [6,7].
Angiosperms, on the other hand, are the only known plants that completely lost DPOR genes and cannot synthesize chlorophyll without light. The light-triggered conversion of Pchlide to Chlide regulates the rate of chlorophyll biosynthesis and has profound consequences for angiosperm morphogenesis, including chloroplast development. In the dark, angiosperm seedlings and leaves become etiolated, developing photosynthetically inactive etioplasts instead of chloroplasts. Etioplasts contain large paracrystalline structures called prolamellar bodies (PLB), which accumulate lipids and proteins, mainly LPOR in the form of ternary photoactive complexes with its substrates, NADPH and Pchlide [14,15]. Upon illumination, light triggers the reduction of Pchlide, followed by disintegration of the PLB, which is the first step of chloroplast development [12,16,17].
The light-induced enzymatic activity of LPOR has long attracted attention. The LPOR protein was originally identified independently in Oryza sativa and Hordeum vulgare as a single polypeptide of ∼36 kDa. Later, two distinct isoforms of the enzyme were discovered in different locations of etiochloroplast inner membranes. The genes coding the isoforms in Arabidopsis thaliana and H. vulgare were shown to be differently regulated by light, which was interpreted as two independent pathways of light-dependent chlorophyll synthesis [22,23]. The isoform expressed during etiolation was called PORA, whereas the isoform PORB was considered to be responsible for maintaining chlorophyll synthesis throughout the whole life of the plant. This distinction, however, became inaccurate for plants having only one isoform, like pea. Moreover, the discovery of a third LPOR isoform in A. thaliana, named PORC, added to the inconsistency of the LPOR nomenclature. In contrast with AtPORA and AtPORB, AtPORC is expressed only in light and is not controlled by the circadian clock. Knowledge about LPOR isoforms among plant species is still fragmentary.
The photoreduction of Pchlide to Chlide has been investigated using fluorescence spectroscopy at 77 K. Freezing the samples in liquid nitrogen does not damage Pchlide–LPOR complexes, but makes them temporarily light-insensitive, which allows spectroscopic measurements [14,15]. The spectra recorded for etiolated plant tissues showed high complexity, which was interpreted as a result of different sizes of Pchlide:LPOR aggregates with NADPH or NADP+. Numerous studies on leaves and isolated plastids, carried out over the last 50 years, indicated that Pchlide:LPOR:NADPH aggregates interact with the lipids of the PLB and are responsible for light-triggered PLB dispersion. The exact mechanisms of these processes, however, are still elusive. The roles of the distinct LPOR isoforms present in etioplasts are also unknown.
Only fragmentary research on the enzymatic properties of recombinant LPOR isoforms has been conducted so far, using both bacterial and plant LPORs from various species [26–30]. The results depended on the studied organism and on the presence or absence of lipids in the reaction mixture [31–33]. Recently, interactions with the plant lipids monogalactosyldiacylglycerol (MGDG), phosphatidylglycerol (PG) and sulfoquinovosyldiacylglycerol (SQDG) were shown for the reconstituted photoactive ternary Pchlide–enzyme complexes of AtPORA.
The difficulty in drawing any strong universal conclusions from the studies on LPOR activity described above indicates that we probably do not fully understand how the enzyme works and what regulates its activity. It cannot be excluded that inappropriate isoforms have been compared, since there are only very limited data on LPOR evolution. In particular, the origin of LPOR isoforms in angiosperms requires investigation, as do their roles in plastids and the potential benefits for plants of having more than one LPOR isoform. The availability of protein and nucleotide sequences from an increasing number of genome projects opens up a new opportunity to address these issues. In the present study, we combined a large-scale phylogenetic analysis of LPOR evolution with a study of the enzymatic activity of LPORs selected from different branches of the phylogenetic tree. This analysis revealed that the enzymatic activity of the LPOR isoforms is regulated differently by lipids. These are the crucial results, indicating a new way to classify all the LPOR isoforms.
Materials and methods
The protein sequences of LPOR enzymes were downloaded from the UniProtKB database. We included in the analysis the sequences from both Swiss-Prot and TrEMBL. Duplicates, gene variants and partial sequences were excluded, and the remaining sequences were manually validated for the presence of the motifs characteristic of SDR enzymes (the TGxxxGXG and YxxxK motifs) and the extra loop characteristic of LPORs. Then, the corresponding DNA sequences were downloaded from the European Nucleotide Archive (ENA) and the KEGG database. Finally, the sequences with ambiguous nucleotides were excluded. The final database consisted of 557 sequences.
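The motif check in the validation step can be illustrated with a simple pattern search. The following is a minimal sketch, not the procedure actually used (the validation was manual): it assumes that `x`/`X` in the motif notation stands for any residue, the function name `has_sdr_motifs` and the toy peptides are hypothetical, and the LPOR-specific extra loop is not tested.

```python
import re

# SDR motifs named in the text; x/X is assumed to mean "any amino acid".
TG_MOTIF = re.compile(r"TG...G.G")  # TGxxxGXG
YK_MOTIF = re.compile(r"Y...K")     # YxxxK


def has_sdr_motifs(protein_seq: str) -> bool:
    """Return True if both SDR motifs occur somewhere in the sequence."""
    seq = protein_seq.upper()
    return bool(TG_MOTIF.search(seq)) and bool(YK_MOTIF.search(seq))


# Hypothetical toy peptides, for illustration only (not real LPOR sequences).
toy_with_motifs = "MAATGASSGLGLAAAYAAAKLL"
toy_without_tg = "MAAAYAAAKLL"
print(has_sdr_motifs(toy_with_motifs), has_sdr_motifs(toy_without_tg))
```

A filter of this kind can only pre-screen candidates; it would not replace the manual inspection described above.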
Phylogenetic analysis and the sequence similarity analysis
The sequences were analyzed using MEGA7 software. First, the sequences were aligned with the Muscle algorithm, using default parameters, and then trimmed (Supplementary Data S1). During the analysis, multiple combinations of parameters and methods were tested: neighbor-joining, maximum likelihood, minimum-evolution and maximum parsimony with different substitution models. Only the neighbor-joining method with pairwise deletion and the Maximum Composite Likelihood model with a gamma distribution of the rate variation among sites (shape parameter = 5) (Supplementary Data S2) resulted in a tree with a topology resembling the known phylogeny of seed plants (Supplementary Figure S1). Additionally, support for the nodes in the tree was assessed by non-parametric bootstrapping with 100 replicates. The differences in the composition bias among sequences were considered in evolutionary comparisons. The analysis involved 557 nucleotide sequences and included 1st codon positions. There were 884 codons in the final dataset. All trees presented in this study were visualized with the use of iTOL. The sequence similarity analysis was performed on the aligned sequences using scripts written in the Python programming language.
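To make the term "pairwise deletion" concrete, the sketch below computes a distance between two aligned sequences while discarding gapped sites for that pair only, rather than for the whole alignment. It is an illustration under stated assumptions: it uses the simple proportion of differing sites (p-distance) in place of the Maximum Composite Likelihood distance computed by MEGA7, and the function name and toy sequences are hypothetical.

```python
def p_distance_pairwise_deletion(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Proportion of differing sites, skipping sites gapped in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # pairwise deletion: drop this site for this pair only
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return mismatches / compared


# Toy aligned fragments (hypothetical): 5 comparable sites, 1 mismatch.
toy_distance = p_distance_pairwise_deletion("ATG-CA", "ATGACT")
print(toy_distance)
```

With complete deletion, the gapped column would instead be removed from every pairwise comparison in the dataset, which discards more data when gaps are scattered across many sequences.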
First, the tree was rooted with the use of the minimal ancestor deviation method, since no confirmed outgroup sequence was available. The minimal ancestor deviation method accommodates heterotachy by using all pairwise topological and metric information in unrooted trees and was shown to correctly recover the known root for both eukaryotic and cyanobacterial sequences. After the rooting, we performed a molecular clock test based on the maximum likelihood method, using the test of the molecular clock implemented in MEGA.
The tree was calibrated at three calibration points: two fossil-based (Figure 2A, calibration times 2 and 3; Supplementary Table S2) and one based on another phylogenetic analysis (Figure 2A, calibration time 1; Supplementary Table S2). The bacterial evolution of LPOR sequences was outside the scope of this study; therefore, the bacterial clades Z1 and Z2 were neither calibrated nor analyzed. To verify the proper calibration of the tree, the divergence times of different plant species were compared with literature data originating from four different studies [46,48–50] (Figure 2B, Supplementary Table S3).
Expression vector construction
The LPOR gene of Synechocystis sp. PCC 6803 (SynPOR) was amplified with Q5 polymerase (New England Biolabs) using disrupted cells as a template with the following primers: ATGGAACAACCGATGAAACCCAC as forward and CAGCCGGATCCTCGAGCATCTAAACCAGACCCACTAACTTTTCGC as reverse (annealing at 70°C). Simultaneously, pET15b vector was amplified using the following primers: ATGCTCGAGGATCCGGCTGCTAACAAAG as forward and GTTTCATCGGTTGTTCCATATGGCTGCCGCGCGGCAC as reverse (annealing at 71°C).
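The primer pairs above can be checked for the terminal homology that an overlap-based assembly relies on. The sketch below is an illustrative check, not part of the original protocol: it assumes that the 5' end of each insert primer was designed as the reverse complement of the 5' end of the matching vector primer, and the helper names are hypothetical. The primer sequences are copied from the text.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def shared_overlap(primer_a: str, primer_b: str) -> int:
    """Length of the longest 5' stretch of primer_a whose reverse complement
    equals the 5' start of primer_b (0 if there is none)."""
    for k in range(min(len(primer_a), len(primer_b)), 0, -1):
        if reverse_complement(primer_a[:k]) == primer_b[:k].upper():
            return k
    return 0


# Primers as listed in the text.
SYNPOR_FWD = "ATGGAACAACCGATGAAACCCAC"
SYNPOR_REV = "CAGCCGGATCCTCGAGCATCTAAACCAGACCCACTAACTTTTCGC"
VECTOR_FWD = "ATGCTCGAGGATCCGGCTGCTAACAAAG"
VECTOR_REV = "GTTTCATCGGTTGTTCCATATGGCTGCCGCGCGGCAC"

# Junction at the 3' end of the insert and at the 5' end of the insert.
insert_end_overlap = shared_overlap(SYNPOR_REV, VECTOR_FWD)
insert_start_overlap = shared_overlap(VECTOR_REV, SYNPOR_FWD)
print(insert_end_overlap, insert_start_overlap)
```

For the four primers listed, both junctions evaluate to a 19-nt overlap, which is the homology the assembly step can use to join the insert and the linearized vector.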
The products of the reactions were purified from agarose gel using Gel Extraction Minipreps Kit (Bio Basic Canada) and ligated together with NEBuilder HiFi DNA Assembly Kit (New England Biolabs). Escherichia coli DH5α competent cells were transformed with the ligation mixtures and the cells containing assembled plasmid were selected on an agar medium supplemented with 100 mg/l ampicillin. The clones containing inserts were identified using colony PCR. The recombinant plasmids were isolated using plasmid purification kit (Bio Basic Canada) and the inserts were verified by sequencing (Genomed, Poland).
The expression vectors for AtPORA, AtPORB and AtPORC from A. thaliana were constructed previously.
Protein expression and purification
The proteins were expressed in E. coli BL21(DE3)pRIL and purified via a 6×His tag at the N-terminal end according to a previously described protocol.
The Pchlide was isolated from etiolated wheat seedlings as described previously  with some modifications including additional purification steps to minimize lipid impurities in the final preparation. The collected chloroform fractions after phase separation were evaporated on a rotary evaporator, and the extract was vortexed in methanol. The insoluble impurities were removed by centrifugation and the methanol solution was separated on a Sephadex LH-20 column to remove carotenoids. The Pchlide fraction was evaporated again and dissolved in acetonitrile/methanol/water (72/8/1, v/v). The insoluble impurities were removed by centrifugation, and the Pchlide solution was further purified twice on an Acclaim C30 RP column (ThermoScientific) in acetonitrile/methanol/water (72/8/5, v/v) at a flow rate of 1 ml/min, using absorption detection at 210 nm (to monitor lipid impurities) and fluorescence detection at 432/640 nm.
Low-temperature fluorescence spectra measurements
Low-temperature (at 77 K) fluorescence measurements were performed with the use of PerkinElmer LS-50B spectrofluorometer equipped with sample holder cooled with liquid nitrogen. Fluorescence emission spectra were recorded in the range from 600 to 700 nm with the scanning speed of 100 nm min−1; the data collection frequency of 0.5 nm and the excitation wavelength of 440 nm. Both excitation and emission slits were set to 10 nm.
Reaction mixtures contained 15 μM LPOR and 6 μM Pchlide in phosphate buffer (37 mM phosphate, pH 7.0; 225 mM NaCl; 150 mM imidazole; 25% v/v glycerol; 7 mM 2-mercaptoethanol) with the addition of NADPH (0–200 μM). Some samples also contained a lipid mixture of MGDG:DGDG:PG:SQDG (50:25:12.5:12.5 mol%). Pchlide and lipids were added from methanol stock solutions; the methanol concentration did not exceed 6.5%. Each reaction mixture was prepared in at least two quartz capillaries and incubated in darkness for 30 min before being frozen in liquid nitrogen for fluorescence measurements. Samples were prepared under dim and scattered green light, which was previously shown not to induce Pchlide photoreduction. After the spectra measurements, the capillaries were thawed in darkness and illuminated for 20 s with continuous white light of 8 μmol photons m−2 s−1. These light conditions provided complete photoreduction of Pchlide bound in the reconstituted Pchlide:LPOR:NADPH complexes. The samples were then frozen again and used for measuring the fluorescence spectra.
The analysis of the low-temperature fluorescence spectra
All the spectra were normalized at 635 nm. The intensities at 647 and 655 nm were read from the spectra recorded before illumination, while the maximal intensity of the band at ∼680 nm was read from those measured after illumination. Then, the relative increases of the fluorescence intensities at 647 nm (ΔF647), at 655 nm (ΔF655) and at the maximum of the 680 nm band (ΔChlide) were calculated. Mixtures prepared without NADPH were included as the respective reference samples (Figure 3G–I). This method of analysis was previously described by Gabruk et al. The ΔF647 parameter, which is defined as the increase of the fluorescence intensity at 647 nm as compared with the control sample without NADPH (Figure 3G), reflects the formation of ternary NADPH:LPOR:Pchlide complexes. The ΔF655 parameter, which is the increase of the fluorescence intensity at 655 nm as compared with the control sample (Figure 3H), reflects the formation of oligomeric NADPH:LPOR:Pchlide complexes with MGDG and the negatively charged plant lipids PG and SQDG. ΔChlide originates from the increase of the Chlide fluorescence due to Pchlide photoreduction and reflects LPOR activity.
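The calculation of ΔF647 and ΔF655 described above can be sketched as follows. This is a simplified illustration under explicit assumptions: a spectrum is represented as a mapping from wavelength (nm) to intensity, the "increase" is taken as the difference between the normalized sample and the normalized NADPH-free reference, and all names and numbers are hypothetical rather than measured data. ΔChlide would additionally require locating the maximum of the ∼680 nm band in the post-illumination spectrum.

```python
def normalize_at(spectrum: dict, wavelength: float = 635.0) -> dict:
    """Scale a spectrum so that its intensity at `wavelength` equals 1."""
    reference_intensity = spectrum[wavelength]
    if reference_intensity == 0:
        raise ValueError("cannot normalize: zero intensity at reference wavelength")
    return {wl: value / reference_intensity for wl, value in spectrum.items()}


def delta_f(sample: dict, reference: dict, wavelength: float) -> float:
    """Increase of normalized fluorescence at `wavelength` relative to the
    NADPH-free reference sample (assumed here to be a simple difference)."""
    return normalize_at(sample)[wavelength] - normalize_at(reference)[wavelength]


# Hypothetical intensities (arbitrary units) at the three wavelengths of interest.
with_nadph = {635.0: 2.0, 647.0: 3.0, 655.0: 1.0}
without_nadph = {635.0: 4.0, 647.0: 2.0, 655.0: 1.0}

delta_f647 = delta_f(with_nadph, without_nadph, 647.0)
delta_f655 = delta_f(with_nadph, without_nadph, 655.0)
print(delta_f647, delta_f655)
```

Normalizing at 635 nm, the emission of unbound Pchlide, makes the two parameters comparable across capillaries that differ in absolute signal.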
LPOR sequences were downloaded from UniProtKB (April 2018). To ensure the correctness of the data, every sequence was manually validated after downloading for the presence of the motifs of SDR enzymes, namely the TGxxxGXG and YxxxK motifs, and of the extra loop characteristic of LPORs. Then, the corresponding DNA sequences were downloaded and used for the analysis. The final database consisted of 557 sequences: 336 sequences of bacterial LPORs, 2 sequences originating from Rhodophyta and 219 sequences from photosynthetic green eukaryotes (including 190 sequences from angiosperms and 21 sequences from algae) (Supplementary Data S1 and Figure S1).
The phylogenetic tree of LPOR nucleotide sequences has 557 leaves representing all of the sequences used in this analysis (Figure 1A). The tree splits into three main clades: two consisting mostly of bacterial sequences (hereafter bacterial clades Z1 and Z2), and one consisting of both bacterial (clade Z3) and eukaryotic sequences (Figure 1A). Clade Z1 consists of 141 exclusively bacterial sequences, while clade Z2 has 180 bacterial sequences and two sequences originating from Rhodophyta. Clade Z3 consists of 15 bacterial sequences and is followed by the branches reflecting the evolutionary history of land plants: algae, mosses and liverworts, a lycophyte, gymnosperms and finally angiosperms.
The phylogeny of LPOR.
The part of the tree containing the seed plants is enlarged in Figure 1B. Most of the sequences, 190 out of 194, belong to the angiosperm clade, which splits into two main branches. One of them contains sequences originating from monocots, the other from eudicots. The two branches are separated by the LPOR sequence from Amborella trichopoda, the most basal lineage in the clade of angiosperms. Surprisingly, both branches seem to be duplicated: each branch splits into two subbranches of nearly the same organisms (Figure 1B, points 1 and 2). This is evidence of two independent duplications of the LPOR gene that occurred separately in monocots and eudicots, leading to the independent formation of two distinct isoforms of the enzyme in these organisms. Most of the analyzed modern angiosperms have two LPOR isoforms that originate from one of these duplication events.
Interestingly, some eudicot species underwent secondary duplication events (that is, the duplication of a gene that had already been duplicated in the past), leading to the formation of a third, a fourth and further isoforms. These include order Asterales (Helianthus, Lactuca and Cynara, all having three isoforms; Figure 1B point 5), order Brassicales (including Arabidopsis; Figure 1B point 7), order Solanales (including genera Solanum, Capsicum and Nicotiana, some having more than three isoforms; Figure 1B point 8), order Fabales (having at most two isoforms; Figure 1B point 6), genus Brassica (some members have up to 10 LPOR genes; Figure 1B point 10) and genus Gossypium (Figure 1B point 9).
The tree reveals that some of the species of angiosperms have only one LPOR gene. To ensure that these organisms indeed have only one copy of LPOR gene, we analyzed the genomes of every selected species and looked for the genes resembling LPOR using BLAST. We found nine species with fully sequenced genomes that have only one LPOR isoform (Supplementary Table S1). These include: Amborella trichopoda, Pisum sativum, Medicago truncatula, Trifolium pratense, Cicer arietinum, Prunus persica, Rosa chinensis, Citrus clementina and Citrus sinensis.
The phylogenetic tree of LPOR sequences originating from the seed plants resembles the known phylogeny of these species with high accuracy (Supplementary Figure S2). Apart from the duplication of selected branches caused by gene duplication, the two trees differ in the placement of A. trichopoda and a few other species. Amborella, as the most basal lineage in the clade of angiosperms, is expected to have diverged from the rest of the angiosperms first. The incorrect placement of this and some of the other branches (Supplementary Figure S2) is probably caused by the high identity between the sequences of seed plants (overall mean distance = 0.149 for seed plants; 0.31 for the whole dataset). To assess the support of the nodes we performed a bootstrap analysis; however, there is no obvious correlation between the incorrect placement of a node and its bootstrap value (Supplementary Figure S2).
To tackle the time scale of LPOR evolution, we rooted the tree with the minimal ancestor deviation method and performed a molecular clock test based on the maximum likelihood method, using the test of the molecular clock implemented in MEGA. The log-likelihood values under the clock and non-clock models were calculated to be −2 875 379.936 and −53 950.512, respectively. The likelihood ratio test (p < 0.00001, degrees of freedom = 555) showed that the evolutionary rate is not constant over time among the analyzed species; therefore, local clocks were applied.
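The likelihood ratio test reported above can be reproduced approximately from the two log-likelihood values. The sketch below is an illustration, not the MEGA implementation: it assumes the usual statistic 2(lnL_non-clock − lnL_clock) compared against a chi-square distribution with 555 degrees of freedom, and it substitutes the Wilson-Hilferty normal approximation for an exact chi-square tail so that only the standard library is needed. The function names are hypothetical.

```python
import math

# Log-likelihoods and degrees of freedom as reported in the text.
LNL_CLOCK = -2_875_379.936
LNL_NO_CLOCK = -53_950.512
DEGREES_OF_FREEDOM = 555


def lrt_statistic(lnl_null: float, lnl_alternative: float) -> float:
    """Likelihood ratio statistic: twice the log-likelihood difference."""
    return 2.0 * (lnl_alternative - lnl_null)


def chi2_upper_tail_approx(statistic: float, df: int) -> float:
    """Approximate P(chi-square_df >= statistic) via the Wilson-Hilferty
    cube-root normal approximation (an assumption of this sketch)."""
    variance = 2.0 / (9.0 * df)
    z = ((statistic / df) ** (1.0 / 3.0) - (1.0 - variance)) / math.sqrt(variance)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


statistic = lrt_statistic(LNL_CLOCK, LNL_NO_CLOCK)
p_value = chi2_upper_tail_approx(statistic, DEGREES_OF_FREEDOM)
print(statistic, p_value < 1e-5)
```

The statistic is several orders of magnitude larger than the degrees of freedom, so the strict-clock hypothesis is rejected regardless of which tail approximation is used.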
The analysis revealed that all analyzed sequences started diverging from a single ancestral LPOR ∼ 1358 ± 214 MYA (Figure 2). The divergence times for selected angiosperms species estimated in this study are in good agreement with the literature data (Figure 2AB). Therefore, this confirms the considerable accuracy of the described estimation of LPOR duplication events (Figure 2AC).
Time-calibrated evolution of LPOR.
Enzymatic activity of LPORs
A. thaliana has so far been the only described plant with three independently regulated LPOR genes encoding three LPOR isoforms (Figure 1AB). AtPORC, which was discovered last, originated from the earliest duplication event (Figure 1, point 2) and occupies a different branch of the tree than AtPORA and AtPORB, which arose much later and are therefore more closely related (Figure 2). It has been shown recently that lipids interact with recombinant AtPORA and increase its enzymatic activity. Moreover, in the presence of lipids in the reaction mixture, the maximum of the fluorescence emission spectra measured at 77 K for reconstituted photoactive Pchlide:AtPORA:NADPH complexes is red-shifted to 655 nm. In mixtures without lipids, this maximum was observed at ∼647 nm. These observations prompted us to examine all three A. thaliana LPORs and to look for differences between the isoforms. We compared the effect of the lipids on the fluorescence properties of the photoactive complexes, on enzymatic activity and on NADPH binding. At the same time, SynPOR, the enzyme of Synechocystis, was also examined to compare plant and bacterial LPORs.
Exemplary low-temperature fluorescence emission spectra of the reaction mixtures containing the investigated LPOR isoforms with the substrates are shown in Figure 3A, and those with the addition of the lipids (MGDG:DGDG:PG:SQDG 50:25:12.5:12.5 mol%) in Figure 3D. The addition of the lipids resulted in a shift of the fluorescence emission maximum to 655 nm for all the A. thaliana enzymes, whereas it had no effect on SynPOR. Two parameters, ΔF647 and ΔF655, were calculated from the spectra. The ΔF647 parameter (Figure 3G) reflects the formation of ternary NADPH:LPOR:Pchlide complexes, whereas the ΔF655 parameter (Figure 3H) represents the oligomers of the ternary NADPH:LPOR:Pchlide complexes triggered by the interaction with MGDG and the negatively charged plant lipids PG and SQDG. The third parameter, ΔChlide (Figure 3I), calculated from the spectra measured after the photoreduction of Pchlide, reflects the increase of the fluorescence intensity of the product of the reaction (Chlide). The band with a maximum at ∼635 nm that was observed in Figure 3A,D,G–I originates from Pchlide unbound to the enzyme.
Fluorescence properties of LPORs at 77 K.
When the lipids were not present in the reaction mixture, AtPORA and AtPORB formed far fewer ternary NADPH:LPOR:Pchlide complexes over a wide range of NADPH concentrations than AtPORC and SynPOR (Figure 3A,B). The ability to form complexes having a maximum at 647 nm correlated with the enzymatic activity of the isoforms (Figure 3C): AtPORA and AtPORB were weakly active even at high NADPH concentrations, compared with AtPORC and SynPOR, the latter being the most active (Figure 3C).
The addition of the lipids to the reaction mixtures affected the emission spectra for all plant enzymes (Figure 3D), shifting the maximum to 655 nm. The biggest effect was observed for AtPORB. The presence of the lipids did not affect the ability of SynPOR to form 647 nm complexes, regardless of NADPH concentration (Figure 3B,E right axis). On the other hand, the lipids increased the affinity of the plant LPORs towards NADPH, since the enzyme formed complexes at much lower NADPH concentrations in the presence of the lipids (Figure 3E vs B). Additionally, the presence of the lipids increased the enzymatic activity of all plant isoforms (Figure 3F vs C).
The influence of the lipids on the enzymatic activity of AtPORs was investigated in more detail (Figure 4). Both the formation of 655 nm complexes and Chlide production revealed unexpected differences between the isoforms. The optimal lipid concentration for AtPORC and AtPORB was 40–100 μM, while for AtPORA it was 20–40 μM. The formation of the complexes by AtPORB was much less efficient than for the other isoforms. AtPORB was also less enzymatically active (Figure 4B). AtPORC, on the other hand, was highly active over a broad range of lipid concentrations, while AtPORA had only a sharp peak of activity at 20 μM lipids (Figure 4B).
The dependence of the formation of PLB-like complexes (Δ655) on the lipid concentration.
The sequence dataset and the phylogenetic tree (Figure 1) provided us with a unique opportunity to compare sequences between different clades and to highlight the conserved differences between them. Such differences are clearly visible between the lipid-independent (i.e. bacterial) and lipid-dependent (i.e. eudicot) sequences (Figure 5A). Apart from the extended N-terminal end of the eudicot sequences, which acts as a transit peptide and is unimportant for the enzyme activity, there are several residues uniquely conserved in plant and in bacterial sequences. Based on the recently published LPOR structure, most of the differences described above are located on the surface of the enzyme (Figure 5B).
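A comparison of this kind, finding positions that are conserved within each clade but differ between clades, can be sketched as a column-wise scan of the alignment. The code below is a hypothetical illustration, not the script used in the study: the 90% conservation threshold, the function names and the toy sequences are all assumptions, and gaps are simply ignored when computing conservation.

```python
from collections import Counter


def conserved_residue(column: str, threshold: float = 0.9):
    """Return the dominant residue of an alignment column if its frequency
    among non-gap characters reaches `threshold`, otherwise None."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return None
    residue, count = Counter(residues).most_common(1)[0]
    return residue if count / len(residues) >= threshold else None


def conserved_differences(group_a, group_b, threshold: float = 0.9):
    """List (position, residue_a, residue_b) for columns conserved in both
    groups of aligned sequences but occupied by different residues."""
    hits = []
    for i in range(len(group_a[0])):
        res_a = conserved_residue("".join(s[i] for s in group_a), threshold)
        res_b = conserved_residue("".join(s[i] for s in group_b), threshold)
        if res_a and res_b and res_a != res_b:
            hits.append((i, res_a, res_b))
    return hits


# Toy aligned fragments (hypothetical, not real LPOR sequences).
bacterial_toy = ["MKAL", "MKAL", "MRAL"]
plant_toy = ["MKSL", "MKSL", "MKSL"]
print(conserved_differences(bacterial_toy, plant_toy))  # [(2, 'A', 'S')]
```

Positions are zero-based alignment columns; mapping them onto a structure would require translating them to the residue numbering of the reference protein.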
The sequence similarity analysis.
LPOR evolution — LPOR ancestor is still missing
It is believed that DPOR is evolutionarily older and emerged first, probably at least 2.7 billion years ago, judging by the rise in the level of atmospheric oxygen produced by photosynthetic organisms [3,10]. Over the next hundreds of millions of years, the level of oxygen in the atmosphere slowly increased and reached ∼3% 2.2–2.0 billion years ago. Modern DPOR is no longer operative under such a concentration of oxygen, which led to the hypothesis that LPOR had to emerge at roughly the same time to compensate for the decreased activity of DPOR. The data presented in this study show that all analyzed modern LPOR sequences diverged from a single sequence ∼1.36 ± 0.21 billion years ago (Figure 2A). The discrepancy between the phylogenetic and geological data suggests that the last common ancestor of all modern LPORs was itself a product of ∼0.7 billion years of evolution. It was probably during that time that the light-dependent properties were developed and mastered; however, another phylogenetic analysis focused on the evolution of SDR enzymes is required to address this hypothesis.
LPOR–lipids interaction is characteristic for plant LPORs
According to our data, the lipids we used neither change the fluorescence parameters of reconstituted Pchlide:SynPOR:NADPH complexes nor influence NADPH binding by SynPOR (Figure 3). This suggests that the lipid-dependent regulation of LPOR is an exclusive feature of plant enzymes. Moreover, our results show that LPORs originating from bacteria cannot form complexes having an emission maximum at 655 nm, which are believed to represent the oligomeric state of the enzyme. Interestingly, AtPORA in Arabidopsis can be complemented by heterologous expression of SynPOR. Such plants have PLBs of regular size; however, the amount of 655 nm complexes in these plants is comparable to that in the AtPORA knock-out. Therefore, the lipid-driven activity and oligomerization properties of LPOR that we see in plant isoforms must have evolved at some point in the course of evolution and should be reflected in residues uniquely conserved in plant and in bacterial sequences. Sequence analysis (Figure 5) revealed several such residues, most of which are located at the protein surface. Some of them must be responsible for the observed differences in the interaction with the lipids and in the ability to form oligomers. However, extensive biochemical research is required to identify exactly which ones and to elucidate the molecular mechanisms of the lipid–protein interaction. The sequence comparison (Figure 5) may help in the selection of residues for this kind of research.
LPOR duplication and loss in angiosperms
Most oxygenic photosynthetic organisms have both DPOR and LPOR, with a few exceptions (summarized in: [7,11]). There are only a handful of studies on the DPOR activity in organisms having both enzymes, including cyanobacteria , liverworts  and gymnosperms . In all of these organisms, DPOR is expressed in the dark and under low intensities of light, whereas LPOR expression is triggered by moderate and strong light intensities. The loss of DPOR genes by the ancestors of angiosperms resulted in the loss of the ability to synthesize chlorophyll in the dark [7,11]. Angiosperms seedlings are, therefore, particularly vulnerable during dark-to-light transitions after etiolation, because they cannot immediately perform photosynthesis upon illumination. This seems to be an important factor driving the LPOR evolution in angiosperms, namely the duplication of LPOR gene and specialization of newly formed isoform dedicated to etiolation. Such an isoform has been found in Arabidopsis , barley (Hordeum vulgare) , rice (Oryza sativa) , maze (Zea mays), pea (Pisum sativum), bean (Phaseolus vulgaris), mustard (Sinapis alba), tomato (Lycopersicon esculentum) and sunflower (Helianthus annuns) .
Surprisingly, A. trichopoda — the most basal lineage in the clade of angiosperms  — has only one LPOR gene in its fully sequenced genome (Supplementary Table S1). This gene is located on a separate branch of the phylogenetic tree (Figure 1B), what is a strong indicator that it diverged before any detected LPOR duplication event took place. Therefore, it suggests that the ancestors of angiosperms had only one copy of LPOR gene and that Amborella is their direct descendent. It took ∼40 and 55 MY since the divergence of Amborella to the first duplication event of LPOR gene in the ancestors of monocots and eudicots, respectively (Figure 2, points 1 and 2). Both duplications independently gave rise to isoforms dedicated to etiolation in monocots and eudicots. Similar duplications of the original LPOR gene occurred independently in genus Zostera (Figure 2 green point 4) and in orders Ranunculales and Proteales (Figure 2 green point 3), however, these findings are ambiguous due to the limited numbers of sequences from these orders and the lack of experimental data on the expression of the isoforms in these plants.
As the phylogenetic tree revealed, the LPOR gene kept duplicating with the diversification of angiosperms species, at least in eudicots. We detected six secondary duplications of LPOR gene, i.e. the duplication of the gene that was duplicated previously. The secondary duplications led to the presence of more than two isoforms in numerous species of angiosperms (Supplementary Table S1), namely: in orders Asterales, Brassicales and Solanales (Figure 1B, points 5, 7 and 8), and in genera Gossypium and Brassica (Figure 1B, points 9, 10). The apparent duplication of LPOR gene in the latter are probably caused by the polyploidization events in these organisms [61,62], respectively). In others, i.e. belonging to the orders Asterales, Solanales and Brassicales (Figure 1B, points 5, 7 and 8), the independent formation of the third LPOR isoform may have a physiological significance, since one of these duplications gave rise to isoforms AtPORA and AtPORB in Arabidopsis. It is highly plausible that LPOR isoforms in these organisms have developed some unique and distinctive properties, just like AtPORA and AtPORB. The detailed biochemical study of these isoforms, however, is required to investigate this hypothesis.
Interestingly, the ancestor of the subfamily Faboideae (including Vigna, Phaseolus, Cajanus, Glycine and Lupinus, Figure 1B point 7) underwent an LPOR gene duplication that followed the loss of one copy of the gene. As a result, these organisms still have two LPOR isoforms, but of a different origin than those of other eudicots.
The phylogenetic tree revealed a few species of eudicots that have only one LPOR gene in their fully sequenced genomes (Supplementary Table S1), even though their ancestors underwent the gene duplication in the past (Figure 1B point 2). This suggests that one LPOR gene has been lost by the ancestors of these plants. We identified three such events: in the genus Citrus, in the Vicioid clade (including the genera Pisum, Medicago, Trifolium and Cicer) and in the family Rosaceae (including the genera Prunus and Rosa).
Regulation of Arabidopsis LPORs by lipids is important for their physiological role
Lipids influence the enzymatic activity of every AtPOR (Figures 3 and 4) and promote aggregation of the ternary Pchlide:LPOR:NADPH complexes in the reaction mixture, judging by the shift of the fluorescence emission maximum. As a result, the recorded fluorescence emission spectra (Figure 3D) resemble those of PLBs. Interestingly, the activity of AtPORC in the absence of lipids is much higher than that of AtPORA and AtPORB. The results of the current work provide an explanation for what is already known about the occurrence and functioning of particular AtPOR isoforms.
The AtPORA isoform is expressed during etiolation and is strongly down-regulated by light [22,64,65]. Our results indicate that the interaction with PG increases the affinity of AtPORA towards NADPH (Figure 4 and ), so active AtPORA complexes preferentially bind to lipid membranes. Moreover, 655 nm complexes can form effectively even when the lipid concentration is low, and such complexes have a high enzymatic activity (Figure 4). The isoform is, therefore, well tuned to the early stage of etioplast development, when few or no inner membranes are available in the plastids for the enzyme to form PLBs. At the same time, the high activity of this isoform efficiently provides Chlide for chlorophyll biosynthesis at the beginning of deetiolation, which makes this isoform crucial for the process.
AtPORB, which is expressed in etiolated seedlings and in mature leaves, is involved in PLB formation and in chlorophyll synthesis both during deetiolation and in light-grown plants [64,65]. Similarly to AtPORA, active complexes of AtPORB preferentially bind to lipid membranes due to the interaction with PG (Figure 3). AtPORB, however, is less active than AtPORA (Figure 4); therefore, during deetiolation it reduces Pchlide more slowly and over a longer period. In mature leaves, AtPORB is expected to form complexes exclusively on membranes, i.e. on grana, lamellae or the inner chloroplast membrane; however, additional research is required to demonstrate this.
The expression of AtPORC is positively regulated by light [25,67]. Therefore, this isoform is expressed after deetiolation and is present in mature leaves only. It can bind Pchlide without lipids at low NADPH concentrations (Figure 4); thus, it is the only isoform in Arabidopsis that might be active in the stroma of chloroplasts. On the other hand, when the lipid concentration is high, AtPORC can oligomerize on the lipid membrane, which increases its activity (Figure 4). This isoform, together with AtPORB, is crucial for maintaining chlorophyll biosynthesis in green plants and for the formation of grana thylakoids. The lipid-regulated activity of AtPORC may be important for maintaining chlorophyll synthesis at an optimal level under changing light intensities, especially during the rearrangement of thylakoid grana.
The results of the biochemical characterization of the isoforms agree with the conclusions drawn from the phylogenetic tree: AtPORA and AtPORB are more similar to each other than to AtPORC, since they originated in the duplication of the same gene, whereas AtPORC diverged much earlier (Figure 1B). Surprisingly, the interaction with plant lipids seems to be the main factor driving the specialization of the isoforms. It remains unclear, however, whether the lipids regulate the activity of the enzyme, or rather the enzyme influences the lipids and the membranes. LPOR isoforms from other branches of the phylogenetic tree must be biochemically characterized to tackle this issue.
A need for a new LPOR classification
Letters A and B, which are currently used to distinguish individual LPOR isoforms in most plants, were proposed several years ago for LPOR classification [22,23,25], solely based on the expression pattern of the respective genes: letter A for the isoforms expressed during etiolation, and letter B for those whose expression is triggered by light. In other papers, the isoforms were simply numbered (for review see ). Both approaches are flawed and inconsistent. The numbering system provides no information about the properties of the isoforms it refers to. The letter system, in turn, is impractical for unorthodox isoforms such as AtPORB, which has constitutive expression, and for plants that have more than two LPOR genes or only one LPOR gene that is expressed both in the etiolated state and in mature leaves. Moreover, such a criterion may be misleading, because it suggests that all LPORA isoforms share a common origin in the same duplication event. This is clearly not true in the light of the phylogenetic analysis presented in this study (Figure 2). The analysis indicates that LPOR isoforms dedicated to etiolation evolved independently in at least monocots and eudicots (Figure 2B points 1 and 2), probably to compensate for the lack of DPOR genes.
Therefore, we propose a novel classification of LPOR enzymes based on their biochemical properties and phylogenetic relationship. We distinguish three types of isoforms: S, L and Z type.
Z-type LPORs do not form 655 nm complexes with lipids, and PG does not affect their NADPH-binding properties. This is the LPOR type present in bacteria.
S-type and L-type isoforms originated in the main duplication event in eudicots (Figure 1, point 2). Both can form 655 nm complexes, and the interaction with PG increases their affinity towards NADPH. S-type isoforms (active in Soluble form), including AtPORC, can be enzymatically active without lipids, although they do form more active oligomeric complexes when interacting with lipids. L-type isoforms (active mostly with the Lipids), on the other hand, are weakly active unless bound to the lipid membrane, like AtPORA and AtPORB.
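The biochemical criteria of the proposed classification can be summarized as a simple decision rule. The following Python sketch is only an illustration: the class `LporProfile`, its field names and the function `classify_lpor` are hypothetical names introduced here, the measurements are reduced to yes/no observations, and the phylogenetic criterion that is also part of the classification is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LporProfile:
    """Hypothetical yes/no summary of the in vitro behaviour of one isoform."""
    forms_655nm_complexes: bool        # forms 655 nm complexes with lipids
    pg_increases_nadph_affinity: bool  # PG raises the affinity towards NADPH
    active_without_lipids: bool        # enzymatically active in soluble form


def classify_lpor(profile: LporProfile) -> str:
    """Assign the S, L or Z type from the biochemical criteria in the text."""
    lipid_driven = (profile.forms_655nm_complexes
                    and profile.pg_increases_nadph_affinity)
    lipid_independent = (not profile.forms_655nm_complexes
                         and not profile.pg_increases_nadph_affinity)
    if lipid_independent:
        return "Z"  # bacterial, lipid-independent type
    if lipid_driven:
        # S: also active in Soluble form; L: active mostly with the Lipids
        return "S" if profile.active_without_lipids else "L"
    # Mixed observations are not covered by the three types described above.
    return "unclassified"


# Profiles filled in from the qualitative descriptions given in the text.
at_porc = LporProfile(True, True, True)
at_pora = LporProfile(True, True, False)
bacterial = LporProfile(False, False, True)

print(classify_lpor(at_porc), classify_lpor(at_pora), classify_lpor(bacterial))
```

In practice, each yes/no field would have to be derived from quantitative data (for example, relative activity with and without lipids) using thresholds that are not defined in this study.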
Surprisingly, all four independent secondary duplication events that occurred in the ancestors of eudicots affected the L-type LPOR (Figure 1, points 5–8). Moreover, all three groups of eudicots that independently lost one copy of the gene lost the S-type isoform and kept the L-type. These findings indicate that the interaction with lipids and the L-type isoform play a crucial physiological role that is yet to be discovered.
It should be mentioned here that the sequence analysis of S- and L-type isoforms from eudicots revealed only a few subtle conserved differences between them (Supplementary Figure S3). This suggests that the enzymes within the clades may be more diverse in terms of their biochemical properties, and therefore in their sequences. Such an explanation seems valid, bearing in mind that the L-type isoforms have been duplicated at least four times, leading to the formation of a third isoform in numerous species. The biochemical characterization of these isoforms is necessary to understand the evolution and real significance of light-dependent chlorophyll synthesis.
The results presented in this paper shed new light on the evolution of the chlorophyll biosynthetic pathway. The phylogenetic analysis revealed multiple duplication events of LPOR in angiosperm evolution, probably triggered by the loss of DPOR genes. Based on the biochemical study, we propose a new classification of the LPOR family into three types: one bacterial type that is lipid-independent, and two lipid-driven types present in angiosperms, L-type and S-type. L-type isoforms preferentially form complexes on lipid membranes, while S-type isoforms are active both with and without lipids. The new classification takes into account the biochemical characteristics of the interaction between the isoforms and the lipids, as well as their phylogenetic relationships. The phylogenetic data and the biochemical analysis suggest a high diversity in the properties of LPOR isoforms originating in different duplication events, which reflects the complicated evolution of LPOR isoforms in angiosperms (Figure 6). The biochemical characterization of other isoforms, from different branches of the phylogenetic tree, is necessary to understand the evolution and real significance of light-dependent chlorophyll synthesis.
A schematic representation of the origin of S-type and L-type LPOR isoforms in angiosperms.
The authors declare that there are no competing interests associated with the manuscript.
M.G. performed all the analyses. M.G. and B.M.-K. interpreted the results and wrote the manuscript.
This study was supported by a Bekker scholarship granted by the Polish National Agency for Academic Exchange (NAWA) (PPN/BEK/2018/1/00105) and a START scholarship granted by the Foundation for Polish Science (FNP) (024.2018), both awarded to M.G. The work was in part the result of the implementation of research project No. 2013/10/E/NZ3/00748 financed by the National Science Centre, awarded to B.M.-K.