What’s in a mass?

Juan J. Calvete, Libia Sanz, Diana Mora-Obando, Bruno Lomonte, Anita M. Tanaka-Azevedo, Karen de Morais-Zani, Sávio S. Sant’Anna and Cleópatra A.S. Caldeira
Evolutionary and Translational Venomics Laboratory, Instituto de Biomedicina de Valencia, CSIC, Valencia, Spain; Instituto Clodomiro Picado, Facultad de Microbiología, Universidad de Costa Rica, San José, Costa Rica; Laboratório de Herpetologia, Instituto Butantan, São Paulo, Brasil; Centro de Estudos de Biomoléculas Aplicadas a Saúde (CEBio), Fundação Oswaldo Cruz, Fiocruz Rondônia, Porto Velho, RO, Brasil

This short essay aims to make the reader reflect on the concept of biological mass and on the added value that determining this molecular property of a protein brings to the interpretation of evolutionary and translational snake venomics research. Starting from the premise that the amino acid sequence is the most distinctive primary molecular characteristic of any protein, the thesis underlying the first part of this essay is that the isotopic distribution of a protein's molecular mass serves to unambiguously differentiate it from any other protein of an organism's proteome. In the second part of the essay, we discuss examples of collaborative projects among our laboratories in which mass profiling of snake venom PLA₂ across conspecific populations played a key role in revealing the dispersal routes that determined the current phylogeographic pattern of the species.

Introduction: fundamentals of biological mass spectrometry
The title of this short essay is inspired by Gilbert O'Sullivan's 1968 song 'What's in a Kiss'. A kiss is something so everyday that we rarely stop to think about its meaning and the feelings it conveys. A kiss can mean very different things to people of different cultures. One example may suffice to start a reflection in this regard.
In his 1864 book 'Savage Africa', Scottish philosopher, historian, anthropologist and explorer William Winwood Reade (1838-1875) recounts falling in love with Ananga, the daughter of an African king. After several months courting her, he dared to kiss her. Unfortunately, things developed in an unpredictable way. The girl, who had never conceived the possibility of such a thing, 'gave a shriek and bounded from the house like a frightened fawn' [1]. Only later did Reade realise that the young princess had interpreted his kiss as an intention to dine off her! Something similar happens with our perception of mass. We are so familiar with the intuitive Newtonian notion of 'mass' as a proxy for quantity of matter that we do not stop to think about the conceptual complexity of this fundamental quality of everything around us that moves at subluminal speed. The large number of ways and units in which mass can be measured or operationally defined (e.g. inertial mass, gravitational mass, atomic mass, relativistic mass…) highlights this point [2].
In the natural sciences, the mass of polyatomic molecules is generally measured in daltons (Da) or unified atomic mass units (u) [3,4]. Prior to the 2019 redefinition of the base units of the International System of Units (SI), quantities expressed in Da or u were, by definition, numerically equal to the same quantities expressed in g/mol, and the two were thus strictly interchangeable. Since the redefinition of 20 May 2019, this equality holds only approximately. The definition of the dalton (1/12 of the mass of an unbound neutral atom of carbon-12 in its nuclear and electronic ground state and at rest, whose mass is by definition exactly 12 Da) was not affected by the 2019 redefinition of the SI base units. However, the mole was redefined as the amount of substance consisting of exactly 6.02214076 × 10²³ entities, and the kilogram (kg) was redefined as well, by fixing the numerical value of the Planck constant h (6.62607015 × 10⁻³⁴ J s) in SI units (kg m² s⁻¹), where the metre (m) and the second (s) are defined in terms of c (the speed of light in vacuum, 299 792 458 m s⁻¹) and the caesium hyperfine frequency, ΔνCs (9 192 631 770 Hz) [3].
Version of Record published: 30 April 2021
Proteins are large biomolecules that perform a vast array of functions within organisms. They are polymers formed by the linear arrangement of up to 20 kinds of building blocks (the amino acids, AAs), and fold into specific 3D structures that determine their activity. Proteins differ from one another primarily in their amino acid sequence, which is dictated by the nucleotide sequence of their genes. The amino acid sequence can thus be considered an important, if not the most distinctive, primary molecular characteristic of a protein.
In an analysis of 10 739 protein sequences, the vast majority (99.9%) conformed to a similar basic molecular formula, $\mathrm{C}_{n}\mathrm{H}_{(1.58 \pm 0.015)n}\mathrm{N}_{(0.28 \pm 0.005)n}\mathrm{O}_{(0.30 \pm 0.007)n}\mathrm{S}_{(0.01 \pm 0.002)n}$ [5]. Only molecules with the same molecular (elemental) formula have identical mass, which in the case of proteins means having the same polypeptide length and amino acid composition. Isobaric polypeptides may differ in the order of their building blocks, but any change in the number or type of residues (other than the isobaric isoleucine/leucine replacement) will be reflected in the coefficients of the constituent atoms of the formula. The occurrence of absolutely isobaric proteins in an organism's proteome is highly unlikely. The number of isobaric sequences that can theoretically be generated by permutations with repetition of n elements (n = number of residues of the amino acid sequence) of 20 classes (n₁, n₂, n₃ … n₂₀) of building blocks is given by the formula $n!/(n_1!\, n_2!\, n_3! \cdots n_{20}!)$, which for a 100-residue protein equals 1.5 × 10¹⁴⁸, a number much greater than the estimated number of atoms in the observable Universe (10⁷⁷-10⁸⁰)! One must therefore conclude that only a small subset of all the theoretically possible amino acid sequences of any length occurs in real biological space [6]. Furthermore, biological evolution is intrinsically a creative process of molecular diversity governed by the so-called 'Fourth Law of Thermodynamics', by which, when a system can follow various paths that allow it to maximise entropy, it will follow the path(s) that do so most rapidly, given the restrictions to conserve a specific architecture and to retain the interactions that mediate function [7]. Thus, while the structural divergence of the copies of a duplicated gene represents the substrate for molecular evolution, structural convergence of two genes to produce two identical copies has never been documented.
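The permutation formula above can be evaluated exactly with integer arithmetic. The following is a minimal sketch; the composition used (five residues of each of the 20 amino acids) is a hypothetical choice made only for illustration, and the exact count depends on the composition assumed.

```python
from math import factorial


def isobaric_sequence_count(composition):
    """Number of distinct sequences that share one amino acid composition.

    `composition` maps each residue to its count; the result is the
    multinomial coefficient n! / (n1! * n2! * ... * nk!), n = sum of counts.
    """
    n = sum(composition.values())
    count = factorial(n)
    for k in composition.values():
        count //= factorial(k)  # exact at every step for a multinomial
    return count


# Hypothetical composition: 100 residues, five of each standard amino acid.
uniform = {aa: 5 for aa in "ACDEFGHIKLMNPQRSTVWY"}
n_sequences = isobaric_sequence_count(uniform)

# Compare with the upper estimate of atoms in the observable Universe.
print(f"{n_sequences:.3e} sequences; exceeds 10^80: {n_sequences > 10**80}")
```

Even for this single composition the count exceeds the upper estimate for the number of atoms in the observable Universe by many orders of magnitude, which is the point the argument relies on.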
In line with this view, except for highly structurally and functionally constrained short (<50 AAs) proteoforms whose sequences are subject to compensatory mutations of co-evolving residues [8], and proteins encoded by a recently (on the evolutionary scale) duplicated gene, there is no documentation in any organism of two proteins with identical length and amino acid composition (i.e. identical molecular mass) [9]. Additionally, chemical modifications of proteins not only play a central role in encoding and decoding cellular information, but may also modify isobaric polypeptide sequences, creating distinct protein species [10,11]. So, can the mass of a given protein serve to unambiguously differentiate it from any other protein molecule? In this mini-review we argue that it can. In keeping with this thesis, in what follows we discuss the critical importance of high-resolution (HR) mass spectrometry (MS) for deriving the exact (to several decimal places) monoisotopic mass of a protein ion from the resolved isotopic distribution of its mass-to-charge ratio (m/z). HRMS is readily available in the field of biomolecular MS. However, the accompanying information on the isotopic distribution in HR spectra is rarely used to its full potential. Here we discuss recent examples from our own research that highlight the added value of accurate molecular mass profiling of snake venom toxins in evolutionary and ecological venomics.
The pillars of protein mass profiling: mass resolving power, accuracy, and the isotope distribution
A mass spectrometer is a device that measures the mass-to-charge ratio of ions. Protein molecules, like any other terrestrial matter, incorporate elemental isotopes in proportion to their natural abundances [12]. Consequently, the molecular mass of a protein is not a single value but a distribution of values, the isotopic distribution, represented by the spectral peaks of the monoisotopic ion (M) and a series of peaks (M + 1n, M + 2n, M + 3n…) differing from each other in the number (n) of heavy isotopes in their atomic composition (Figure 1).
Accurate mass measurement requires the highest possible mass resolution to ensure that only a single molecular structure populates the protein's monoisotopic mass spectral peak. The accuracy (i.e. the difference between the theoretical and the experimentally measured mass-to-charge ratio (m/z) of an ion) and resolving power (defined as M/ΔM, where M is the m/z of an analyte's ion peak and ΔM is the full width at half maximum (FWHM) of that peak) of protein mass measurements by biomolecular mass spectrometry have increased steadily over the last decade. The highest available broadband mass resolving power and mass accuracy are provided by Fourier transform (FT) orbital trapping (i.e. Orbitrap) and ion cyclotron resonance (ICR) mass analysers. With resolving power of 1 000 000-10 000 000 and mass accuracy of ±0.5-1 ppm (0.0005-0.001 u at 1000 u), these MS configurations are capable of determining the monoisotopic mass of multicharged medium-size (15-25 kDa) protein ions to 3-4 decimal places or better.
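The two figures of merit defined above reduce to simple ratios. The sketch below, with hypothetical function names, shows how a ppm tolerance translates into an absolute mass window and how resolving power follows from a peak's width.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Mass accuracy: signed deviation from the theoretical m/z, in ppm."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def mass_tolerance(mass, ppm):
    """Absolute window (same unit as `mass`) corresponding to a ppm value."""
    return mass * ppm * 1e-6


def resolving_power(mz, fwhm):
    """Resolving power R = M / dM, with dM the peak's FWHM."""
    return mz / fwhm


# 1 ppm at 1000 u corresponds to 0.001 u, as quoted in the text.
print(mass_tolerance(1000.0, 1.0))
# A peak at m/z 1000 that is 0.001 wide (FWHM) implies R = 1 000 000.
print(resolving_power(1000.0, 0.001))
```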
The monoisotopic mass, represented by the 'lightest' peak, which is also the least abundant species of the isotopic distribution, is the most reliable mass characteristic of biomolecules. The main reason is that this mass spectral signal has a unique isotopic composition made up of the most abundant, which are also the lightest, isotopes of the biomolecular elements (¹²C, ¹H, ¹⁴N, ¹⁶O, and ³²S), and therefore its mass is unaffected by isotopic variability. Mass assignments based on any of the other (usually the most abundant) isotopic peaks may introduce an error due to incorrect assignment of the number of heavy isotopes in that peak. Furthermore, owing to the different mass defects contributed by the different heavy isotopes of CHNOS, isotopic peaks other than the monoisotopic one are ensembles of all the isotopic variants containing an equal number of additional neutrons with respect to the monoisotopic species (Figure 1). Mass defect differences between the light and heavy isotopes of CHNOS are due to the different binding energy, with its relativistic mass equivalent, involved in accommodating an additional neutron in the nucleus. Every isotope of every chemical element has a subtly different mass defect, meaning that each peak of the isotopic distribution exhibits isotopic fine structure, which can, however, be resolved at sufficiently high resolving power. For example, at ∼2 Da above the monoisotopic mass of a protein, the isotopologues having two ¹⁵N, one ³⁴S, one ¹⁸O, two ¹³C, and two ²H lie 1.99407 Da, 1.995796 Da, 2.004245 Da, 2.00671 Da, and 2.012553 Da, respectively, above the monoisotopic mass. Resolution of the fine structure of the isotopologue peaks of the 16+ charge state of a 15.8 kDa protein is achievable with the resolving power in excess of 8 × 10⁶ of high magnetic field (9.4 T) FT-ICR instruments [13].

Figure 1. The monoisotopic (M) and isotope-averaged (Mave) masses are highlighted. The fine structure underlying the M + 2n isotopic peak and the calculated relative contributions and masses for the ²H, ¹³C, ¹⁵N, ¹⁸O and ³⁴S isotopologues are displayed. The inserted right panel shows the theoretical number of distinct ensembles of all the quasi-isobaric isotopic topological variants containing an equal number of additional neutrons with respect to the monoisotopic species that could form part of the envelope of the isotopic peaks M + n and M + 2n. Isotopic combinations that, due to the relative abundance of the isotopes they contain, may contribute at least one structure are rendered in red.
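The mass shifts of the M + 2 isotopologues follow directly from the differences between heavy and light isotope masses. The sketch below recomputes them from standard isotope masses (rounded here to seven decimals; small differences in the last digit with respect to quoted values may reflect the mass table used). The final estimate of the resolving power needed to separate the two closest isotopologues uses a deliberately rough criterion (peak separation equal to one peak width) and is an assumption of this sketch, not a statement about any instrument.

```python
# Standard isotope masses in Da, rounded to seven decimals.
ISOTOPE_MASS = {
    "1H": 1.0078250, "2H": 2.0141018,
    "12C": 12.0000000, "13C": 13.0033548,
    "14N": 14.0030740, "15N": 15.0001089,
    "16O": 15.9949146, "18O": 17.9991596,
    "32S": 31.9720712, "34S": 33.9678670,
}
PROTON = 1.007276  # Da

# (heavy isotope, light isotope, number of substitutions) adding two neutrons.
M_PLUS_2 = {
    "2x15N": ("15N", "14N", 2),
    "1x34S": ("34S", "32S", 1),
    "1x18O": ("18O", "16O", 1),
    "2x13C": ("13C", "12C", 2),
    "2x2H": ("2H", "1H", 2),
}

shifts = {
    label: n * (ISOTOPE_MASS[heavy] - ISOTOPE_MASS[light])
    for label, (heavy, light, n) in M_PLUS_2.items()
}
for label, shift in shifts.items():
    print(f"{label}: +{shift:.6f} Da")


def required_resolving_power(mass, charge, gap_da):
    """Rough R needed to separate two species `gap_da` apart at a given charge."""
    mz = (mass + charge * PROTON) / charge
    return mz / (gap_da / charge)


ordered = sorted(shifts.values())
smallest_gap = min(b - a for a, b in zip(ordered, ordered[1:]))
# 15.8 kDa protein, 16+ charge state, as in the example in the text.
print(f"R needed: {required_resolving_power(15800.0, 16, smallest_gap):.2e}")
```

Under these assumptions the closest pair (two ¹⁵N versus one ³⁴S) is about 1.7 mDa apart, which places the required resolving power in the range of several million.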
The fine structure underlying the M + 1 and M + 2 isotopologue peaks can give information about the chemical structure of the monoisotopic ion. For instance, identification of the ³⁴S isotopologue can unequivocally confirm the presence and number of ³²S atoms in the monoisotopic species [13]. The increased probability of multiple heavy isotopes as the mass of a molecule increases causes a decrease in the relative abundance of the monoisotopic peak. A fine-structure-resolved isotopic distribution is particularly useful for proteins with masses larger than 13 kDa, such as the snake venom phospholipase A₂ whose mass spectrum is displayed in Figure 1, where the monoisotopic peak (13803.58149 Da) has only 0.0197% of the intensity of the most abundant ion; mass imprecision (which ultimately limits mass accuracy) depends directly on the signal-to-noise ratio [14].
The impressive analytical capability of frequency-based mass analysers does not necessarily represent the theoretical limit of the performance characteristics of FTMS. Leucine and isoleucine are positional isomeric amino acids, and thus have identical nominal and monoisotopic mass. Positional isomers, however, have different heats of formation, and thus different relativistic mass, ΔE = Δmc² ≈ 1 eV, which corresponds to a mass difference of ∼10⁻⁹ Da. Mass resolving power increases linearly with increasing magnetic field strength, and thus achieving the ultrahigh resolving power needed for unique assignment of positional isomeric compositions will likely go hand in hand with the development and implementation of higher-field superconducting magnets.
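The energy-to-mass conversion behind the ∼10⁻⁹ Da figure can be checked in a few lines. This is a minimal sketch using the CODATA energy equivalent of the dalton; the 1 eV energy difference is the order-of-magnitude value quoted in the text, and the leucine mass is used only to illustrate the scale of resolving power that such a difference would imply.

```python
# Energy equivalent of 1 Da (atomic mass constant times c^2), in eV (CODATA 2018).
DALTON_IN_EV = 931.49410242e6


def energy_to_mass_da(energy_ev):
    """Mass equivalent (Da) of an energy given in eV, via E = m c^2."""
    return energy_ev / DALTON_IN_EV


delta_m = energy_to_mass_da(1.0)  # ~1 eV difference in heat of formation
print(f"mass difference: {delta_m:.3e} Da")

# Order of magnitude of M/dM needed to distinguish Leu from Ile on this basis
# (monoisotopic mass of free leucine, C6H13NO2, about 131.0946 Da).
print(f"M/dM: {131.0946 / delta_m:.1e}")
```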
Since its development for routine analysis at the turn of the current century, molecular mass spectrometry, along with omics technologies, has revolutionised our way of conceiving biological research. The continuous increase in the resolving power of mass analysers represents a quantum leap for making inroads into integrative biological research in the gas phase [15-17]. In the previous section we reflected on the information that can be derived from the simple act of measuring the mass of a polyatomic compound in general, and of a protein in particular. Depending on the instrument resolution, different levels of detail can be achieved, i.e. aggregated, fine, and hyperfine structure [18]. An inherent drawback of cutting-edge technologies capable of revealing the hyperfine structure of the isotopic distribution of molecular mass is the high cost of both the instrumentation and its maintenance. Fortunately, on the one hand, most of today's benchtop mass spectrometry configurations are endowed with analytical capabilities for fine mass measurements and, on the other hand, not every biological experiment requires determining molecular masses with hyperfine resolving power. Consequently, the trend in biological mass spectrometry is to have in-house instrumentation to carry out routine work and to submit samples whose analysis requires specialised technology to specialised services or technology-leading laboratories. The aim of this section is to show examples of how the concepts set out in the previous section, particularly mass profiling of snake venom PLA₂ molecules, have been useful for elaborating biological stories in the context of ecological-evolutionary venomics projects.
We focused on PLA₂s because, owing to their size (13-14 kDa) and absence of post-translational modifications (other than seven absolutely conserved intramolecular disulfide bonds), their monoisotopic molecular mass can be accurately derived both in silico (calculated from the amino acid sequence translated from a species-specific transcriptomic database) and experimentally (i.e. by electrospray-ionisation mass spectrometry, ESI-MS) using most of the available benchtop mass spectrometry systems [19]. However, it should be emphasised that the concepts and applications of mass profiling in venomics are not limited to a particular type of toxin. Theoretically, any toxin class whose identity can be unambiguously derived from its mass can serve the same purpose. This is the case of PI snake venom metalloproteinases (PI-SVMP), cysteine-rich secretory proteins (CRISP), and disintegrins (DISI), commonly found in Viperidae venoms, and of toxins from multigene families typically expressed in elapid venoms, such as Kunitz-type inhibitors (KUN) and three-finger toxins (3FTxs) [20,21]. This cast of molecules expands the possibilities of applying mass profiling to the study of important snake species whose venoms lack PLA₂s, e.g. the African non-spitting cobras Naja annulifera [22], N. senegalensis [23], N. haje [24] and N. nivea [24].
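The in silico derivation mentioned above amounts to summing monoisotopic residue masses, adding one water for the free termini, and subtracting two hydrogens per disulfide bond. The sketch below illustrates this under the assumption that no other modification is present; the function names and the short test sequence are hypothetical and do not correspond to any real toxin.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids,
# standard reference values rounded to five decimals.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056     # H2O added for the free N- and C-termini
HYDROGEN = 1.00783   # each disulfide bond removes two hydrogen atoms
PROTON = 1.00728     # charge carrier in positive-mode ESI


def monoisotopic_mass(sequence, disulfide_bonds=0):
    """Neutral monoisotopic mass of an otherwise unmodified polypeptide."""
    residues = sum(RESIDUE_MASS[aa] for aa in sequence)
    return residues + WATER - 2 * HYDROGEN * disulfide_bonds


def mz(mass, charge):
    """m/z of the [M + zH]z+ ion of a species of neutral mass `mass`."""
    return (mass + charge * PROTON) / charge


toy = "NLYQFKNMICCK"  # hypothetical peptide, for illustration only
mass = monoisotopic_mass(toy, disulfide_bonds=1)
print(f"M = {mass:.5f} Da; [M+2H]2+ at m/z {mz(mass, 2):.5f}")
```

For a real PLA₂ the same function would be called with the mature sequence and `disulfide_bonds=7`, and the result compared with the deconvoluted experimental mass.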
Venom toxins evolve by accelerated evolution under strong adaptive selective pressures [25]. Acting through a variety of mechanisms, neutral, positive and balancing selection promote structural and functional innovation, whereas purifying (or negative) selection operates to conserve structurally and functionally important residues in venom toxins. Conserved cysteines and buried residues appear to be under strong purifying selection due to their role in maintaining the active conformation of the toxin's fold, whereas residues under diversifying selection are generally surface-exposed, often localised in loops extending from the compact central core. Consequently, absolute conservation of toxin structure may occur only within and between recently (in evolutionary terms) genetically isolated populations [26]. The smallest mass difference generated by a single amino acid change, the quasi-isobaric Gln/Lys replacement, is ±0.03638 Da; leaving aside the isobaric Leu/Ile pair, any other single-residue replacement shifts the mass by close to 1 Da or more (e.g. ±0.94763 Da for Lys/Glu and ±0.98401 Da for Asn/Asp or Gln/Glu). At the unit mass resolution provided by quadrupole and ion trap analysers, a map of connectivities between conspecific snake populations that express venom toxins of identical or quasi-isobaric monoisotopic masses reveals a global dispersal pattern of the species. However, the directionality of the resulting biogeographic branching pattern from the ancestral to the subsequently derived populations is difficult, if not impossible, to establish. Eco-evolutionary feedback is required to unveil fine spatio-phenotypic venomics correlations that inform about the gene flow underlying the individual dispersal steps that ultimately determine the biogeographic structuring of venom variation within wide-ranging species. All these aspects were taken into account in the elaboration of this essay.
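The mass shifts caused by single-residue replacements can be enumerated from the table of monoisotopic residue masses. The following sketch (standard residue masses rounded to five decimals; the helper name is hypothetical) lists the smallest non-zero differences, skipping the strictly isobaric Leu/Ile pair.

```python
from itertools import combinations

# Monoisotopic residue masses (Da), standard values rounded to five decimals.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def substitution_deltas(masses):
    """Sorted (|delta mass|, residue_a, residue_b) for every non-isobaric pair."""
    deltas = []
    for a, b in combinations(sorted(masses), 2):
        delta = round(abs(masses[a] - masses[b]), 5)
        if delta > 0:  # drops the isobaric Leu/Ile pair
            deltas.append((delta, a, b))
    return sorted(deltas)


deltas = substitution_deltas(RESIDUE_MASS)
for delta, a, b in deltas[:6]:
    print(f"{a}/{b}: {delta:.5f} Da")
```

With this table, the Gln/Lys pair stands alone in the tens-of-millidalton range; the next differences all lie just below 1 Da, so they are detectable even at unit mass resolution once the charge state is known.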
The mass spectra underlying the research selected here to illustrate the biological value of determining molecular masses of venom toxins in snake venomics projects were recorded initially on a Sciex QTRAP 2000 MS instrument (resolving power of 6000 at full width at half maximum, FWHM) and, more recently, on a Synapt G2 HR mass spectrometer (R in excess of 40 000 FWHM).
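To put the two resolving powers in context, the sketch below compares the expected peak width with the spacing between adjacent isotope peaks of a multiply charged protein ion. It assumes a simple criterion (isotope peaks count as resolved when the FWHM is smaller than their spacing), a 13.8 kDa toxin, and a 10+ charge state; all three are illustrative assumptions rather than experimental settings from the studies discussed.

```python
PROTON = 1.00728           # Da
ISOTOPE_SPACING = 1.00335  # Da, approximated by the 13C - 12C mass difference


def peak_fwhm(mz, resolving_power):
    """Peak width (FWHM, in m/z units) implied by R = M / dM."""
    return mz / resolving_power


def isotopes_resolved(mass, charge, resolving_power):
    """True if the FWHM is narrower than the isotope spacing at this charge."""
    mz = (mass + charge * PROTON) / charge
    return peak_fwhm(mz, resolving_power) < ISOTOPE_SPACING / charge


for r in (6000, 40000):
    print(r, isotopes_resolved(13800.0, 10, r))
```

Because both the peak width and the isotope spacing scale with 1/z, this criterion is nearly independent of charge state and reduces to R exceeding the protein mass expressed in Da: a resolving power of 6000 yields an unresolved isotopic envelope for a 13-14 kDa toxin, whereas 40 000 separates the individual isotope peaks.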
On the added value of measuring the mass of venom toxins in evolutionary and translational snake venomics
Venoms are integrated phenotypes used for predatory and defensive purposes by a wide phylogenetic range of organisms [27-29]. Understanding these complex evolutionary and intrinsically ecological traits demands a detailed picture of the functional significance and evolutionary history of the proteins that make up venoms. Adaptations to ecosystems require evolutionary changes of both morphological and molecular phenotypic traits that maximise the organism's fitness in local environments. Although venoms are traits of moderate genetic complexity in terms of the number of genes that encode toxin families, within- and between-species venom variability seems to be a common feature at all taxonomic levels. Genomic reorganisations and post-transcriptional regulation of the expression patterns of toxin messengers have been reported to underlie geographic and ontogenetic venom variability [30]. Understanding how natural selection shapes the genetic variation in populations underlying the adaptations that drive intraspecific venom evolution is of paramount importance in evolutionary biology, and requires that research questions be formulated from an ecologically informed perspective. The geographic pattern of intraspecific venom toxin conservation among disjunct conspecific populations provides clues for tracing the dispersal routes that account for the current biogeographic distribution of the species [31,32]. The occurrence of intraspecific individual variability resulting from divergence in allopatry highlights the concept that a species should be considered as a group of metapopulations [33]. This, in turn, may help rationalise the often documented compositional and toxinological variability of snake venoms, particularly in wide-ranging species. The examples that follow illustrate this point.

Mass profiling of PLA₂ molecules provides a venomics ground to trace Russell's viper landscape of structural and toxinological variability across the Indian subcontinent
The well-documented variable clinical manifestations of envenomings caused by Russell's viper (Daboia russelii) in different areas of the Indian subcontinent indicate geographical variations in venom composition. However, only venom obtained from a single source, the Irula tribal snake venom extraction cooperative situated in the south Indian state of Tamil Nadu, is being used for manufacturing antivenoms to treat Russell's viper bites in India and surrounding countries. Consequently, Indian anti-Russell's viper antivenoms are suboptimal, and in some cases even inadequate, to fully neutralise all the geographic-specific clinical complications across the species' wide range (consult [34] and references cited therein). PLA₂s represent the dominant toxin family in the venom proteomes of D. russelii from Bangladesh, Sri Lanka, Pakistan, and South India. Figure 2 displays the distribution of D. russelii venom PLA₂s across the Indian subcontinent. Connecting populations that share identical PLA₂ molecules depicts a rough picture of the dispersal pattern of Russell's viper ancestors to the species' current biogeographic distribution. The data led us to propose two alternative routes of radiation for Russell's viper, towards the northeast of India and Bangladesh and towards south India and Sri Lanka, with a common origin in Pakistan. The mass-profiling-informed phylovenomics landscape provided a ground for rationalising the geographic variability in venom composition and the documented North-South divide in Russell's viper envenoming in India. The dispersal route towards the south explains the paraspecific neutralisation activity exhibited by antivenoms raised against D. russelii venom from Tamil Nadu or Sri Lanka towards the toxic effects, including lethality, of D. russelii venom from Pakistan, and their poor outcome against venoms from northern India and Bangladesh [34]. In addition, this knowledge offers the possibility of improving current antivenoms in a scientifically informed way.
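The idea of connecting populations that share PLA₂ molecules can be expressed as building an undirected graph whose edges link populations with at least one toxin mass in common within a tolerance. The sketch below is only an illustration: the population labels, the masses and the tolerance are invented placeholders, not data from the studies cited.

```python
from itertools import combinations


def shared_masses(masses_a, masses_b, tol_da):
    """Pairs of masses, one from each profile, that agree within `tol_da`."""
    return [(a, b) for a in masses_a for b in masses_b if abs(a - b) <= tol_da]


def connectivity_map(profiles, tol_da=0.5):
    """Undirected edges between populations sharing at least one toxin mass."""
    edges = {}
    for (pop_a, m_a), (pop_b, m_b) in combinations(sorted(profiles.items()), 2):
        hits = shared_masses(m_a, m_b, tol_da)
        if hits:
            edges[(pop_a, pop_b)] = hits
    return edges


# Hypothetical PLA2 mass profiles (Da) for three hypothetical populations.
profiles = {
    "population_A": [13600.2, 13812.6],
    "population_B": [13600.3, 13701.1],
    "population_C": [13701.0, 13950.4],
}
edges = connectivity_map(profiles)
for (pop_a, pop_b), hits in edges.items():
    print(pop_a, "<->", pop_b, hits)
```

Such a graph captures which populations are linked but, as noted in the text, not the direction of dispersal; the tolerance should reflect the mass accuracy of the instrument used.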
Tracing dispersal patterns of the wide-ranging species Bothrops atrox and B. asper
The genus Bothrops includes at least 50 species of venomous pitvipers (Viperidae: Crotalinae), which are widely distributed throughout the Americas, from México to southern Argentina. Bothrops species inhabit diverse habitats, from tropical and subtropical forests to arid and semiarid regions, and from sea level to altitudes of more than 3000 m [35]. The common ancestor of the extant species within the genus Bothrops was the first viperid to colonise South America, sometime during the late Miocene. Rapid dispersal and diversification of Bothrops across South America may have been favoured through adaptive radiations into habitats devoid of viperid competitors. A single species, the ancestor of B. asper, reinvaded Central America, where it remains the only widespread species of Bothrops. Biogeographic studies support B. asper as the first species complex to split from the B. atrox clade in the Pliocene, ∼3.02-2.32 Mya, as the result of geological events, including the uplift of the Guanacaste, Central, and Talamanca mountain ranges in Mesoamerica from the late Miocene through the Pliocene; the closure of the Isthmus of Panamá; and the uplift of the Andes mountain range in South America. Habitat fragmentation promoted new ecological niche opportunities and led to allopatric speciation in these groups [36,37]. We have applied PLA₂ mass profiling to elucidate biogeographical patterns of the highly adaptable and widely distributed B. atrox and B. asper, which are considered the most dangerous snakes throughout much of their distribution range in the Amazon region (B. atrox) [31] and throughout Central America and west of the Andes in northern South America (B. asper) [32].

Figure 2. The pale green and brown arrows highlight, respectively, the proposed Northwest-South and Northwest-East dispersal routes that emerge by connecting geographic variants of Russell's viper venoms sharing PLA₂ molecules. Shared PLA₂s distributed along the northwest-northeast and northwest-south dispersal routes are colour-coded in brown and green, respectively (for primary data and bibliography, please consult [34]).

Panel A of Figure 3 shows a map of connectivities revealing a phylogeographic pattern consistent with a gradual expansion of the B. atrox range. Analysed in the frame of current molecular evidence, which consistently places the northern Venezuelan haplotype as the sister group to all other South American mainland Bothrops haplotypes [38], the proteomic data suggest the existence of an Andean corridor linking northern and southern Amazonian B. atrox populations. The data also showed a 'PLA₂ connection' between the B. atrox population of Puerto Ayacucho (Venezuelan Amazonia, 3) and the Colombian Magdalena Medio Valley B. asper population (1) (Figure 3B). This finding is consistent with a model of dispersal of the ancestors of B. asper along a northeast to northwest path, presumably through corridors between the Venezuelan Cordillera de Mérida and the Colombian Cordillera Oriental and Sierra de Perijá and Sierra Nevada de Santa Marta (asterisk in Figure 3A).

[…] [40] revealed that the two Brazilian taxa share an identical PLA₂ molecule between themselves but not with Peruvian B. b. smaragdinus (Figure 4B). This seemingly unusual result only found a logical explanation when the origin of the venom donors was investigated. Venom from B. b. bilineatus was collected from an […] [41]. The Santo Antônio hydroelectric dam on the Madeira River is 6 km southwest of Porto Velho, suggesting that both B. bilineatus subspecies may live in sympatry in Rondônia, and may eventually hybridise in areas of secondary contact, resulting in introgression of venom genes.

Perspectives
• Tracing the natural history of snakes demands a detailed picture of structure-function correlations of the toxins that make up their venoms. Toxin mass profiling across evolutionary clades complements molecular approaches to integrate and rationalise omics data in the context of an evolutionary and ecological hypothesis.
• Venoms act as integrated adaptive phenotypes that help snakes secure their hunting success while minimising the risks posed by predators and dangerous prey, and by the inherent physical threats in their habitat. Rationalising the landscape of venom variability within and between extant conspecific populations in the context of the natural history of the snake lineage is of paramount importance to understand the molecular basis of local adaptations.
• The depth to which a proteome can be analysed is directly related to the completeness of the database against which the proteomic data are mapped. The emerging field of comparative snake genomics promises to uncover a treasure trove of biological information and to give venomics a big boost in the reconstruction of the evolutionary scenario of venom evolvability-aided local adaptations and speciation of snakes.