Jen E. Werner and Jennifer A. Swift*
Georgetown University, Department of Chemistry, Washington, DC 20057-1227, USA. E-mail: jas2@georgetown.edu
First published on 12th January 2021
Data informatics approaches were applied to the Cambridge Structural Database (CSD) in an effort to discern fundamental trends related to the preparation, occurrence, and general properties of organic solvates. First, the 50 most abundant solvate classes in the CSD were identified through SMILES string matching implemented through the CSD Python API, and their relative occurrence rates were compared against data reported 20 years earlier. The two data sets suggest that solvate preparation methods have become less diverse over that period, with an increasing fraction of solvates derived from a smaller subset of solvents, though the relative abundance of hetero-solvates containing more than one type of solvent molecule simultaneously increased. A subsequent SMILES string matching step facilitated the identification of ∼2700 pairs of solvate and solvent-free structures from the top 10 solvate classes. The two related groups showed statistical differences in both lattice symmetries and packing fractions. Solvates exhibited an inherent bias favoring triclinic lattice symmetry, which is likely related to the larger number of unique molecular components in the asymmetric unit. More surprising was the finding that solvates without disorder statistically had lower packing fractions than their solvent-free analogues. While solvate formation may in fact be a means to achieve phases with higher packing efficiency for some organic molecules, the data indicate this is not a general trend.
The role(s) solvent plays in the formation and stability11 of a solvate can be difficult to pinpoint. In general, strong solute–solvent interactions are thought to play a significant role in predicting solvate formation,12 and these same interactions likely provide some degree of lattice stability that may not be possible in a solvent-free form. Relatedly, it has been shown that the frequency of hydrate formation increases as the total polar surface area of a molecule increases.13,14 Even in the absence of strong solute–solvent interactions, solvate formation has been considered a means to achieve more efficient space filling, allowing for the generation of denser phases relative to solvent-free alternatives. This is often used to rationalize why some compounds are prolific solvate formers (e.g. gossypol,15 sulfathiazole,16 olanzapine,17 axitinib18 and galunisertib19). Although the number of pharmaceuticals marketed as solvates (other than hydrates) is relatively small20,21 owing to strict safety22 and stability requirements, solvates can play an important role in the development process as precursor phases that can be intentionally desolvated to yield novel solvent-free polymorphs.23–29 Such process-induced transformations become especially relevant when crystal structure prediction methods30,31 indicate that the lowest-energy polymorph has not yet been experimentally realized.
An increasingly popular approach to gain insights into solvate formation and properties is the statistical analysis of large data sets.32–38 Though the occurrence rate of solvates (and/or other multicomponent crystals) in the CSD and in other industry compilations may differ slightly,39,40 the CSD remains the largest and most widely accessible source of crystallographic data. The last comprehensive CSD survey of organic solvates was performed by Görbitz and Hersleth in 2000 (October 1998 release),5 and the number of database entries has grown considerably in the two decades since. Herein we provide an updated analysis of organic solvates in the CSD using a structure search method based on simplified molecular-input line-entry system (SMILES)41,42 string matching, implemented through the CSD Python application programming interface (API).43 A similar approach was previously used to analyze hydrates in the CSD.44 Here, SMILES string matching facilitates an updated analysis of the occurrence frequencies of the 50 most common solvents (beyond water) and enables statistical comparisons between solvated and solvent-free forms for the ten most common solvate types.
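The core idea of SMILES string matching for solvate/solvent-free pairs can be sketched in plain Python. This is a minimal illustration, not the authors' CSD Python API code: it assumes each entry is available as a refcode mapped to a dot-separated entry SMILES, that the same compound is always written with the same (canonical) SMILES string, and it uses a small hypothetical solvent set in place of the study's top-50 dictionary. Solvent components are stripped and the remaining host components become the matching key.

```python
from collections import defaultdict

# Hypothetical solvent set (water plus a few organic solvents) written as
# component SMILES; the study itself used water plus the top 50 solvents.
SOLVENT_COMPONENTS = {"O", "CO", "ClCCl", "ClC(Cl)Cl", "CC#N"}


def host_key(entry_smiles):
    """Drop solvent components and return the remaining host components as a key.

    Components are de-duplicated and sorted so that component order and
    repetition in the entry SMILES do not affect the match (an assumption).
    """
    hosts = sorted({c for c in entry_smiles.split(".") if c not in SOLVENT_COMPONENTS})
    return ".".join(hosts)


def candidate_pairs(solvates, solvent_free):
    """Match solvate refcodes to solvent-free refcodes that share a host key.

    Both arguments map refcode -> entry SMILES; returns (solvate, solvent-free) tuples.
    """
    by_key = defaultdict(list)
    for refcode, smiles in solvent_free.items():
        by_key[host_key(smiles)].append(refcode)
    pairs = []
    for refcode, smiles in solvates.items():
        for partner in by_key.get(host_key(smiles), []):
            pairs.append((refcode, partner))
    return pairs


# Hypothetical refcodes for illustration only.
print(candidate_pairs({"SOLV01": "OC(=O)c1ccccc1.CO"},
                      {"FREE01": "OC(=O)c1ccccc1", "FREE02": "c1ccncc1"}))
```

Because plain string equality carries no stereochemical check, a list built this way can contain false pairs (for example, a solvate matched to its enantiomer's solvent-free form), which is why a separate stereochemical screening step is needed.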
A second verification step was applied to confirm that each refcode corresponded to a unique polymorph. Duplicate structures were identified with a two-step approach using (1) the “Crystal Packing Similarity” tool in Mercury,45 and (2) a comparison of unit cell parameters. The Similarity tool reports the fraction of molecules within a packing shell of a given size that overlap between two structures. A ratio of one means the two structures are identical, whereas any ratio less than one indicates that they differ. However, a significant fraction of solvates exhibit disorder, and duplicate entries of such structures can fail this test despite being topologically identical. Because disordered entities no longer have a single fixed representation in the crystal lattice, the Similarity tool either returns a ratio of less than one or fails to converge for these structures. Therefore, any refcode pairs with either of these outcomes were subjected to a secondary comparison of the unit cell lengths and angles. Pairs of structures that differed by <1.5% of the largest reduced cell length were treated as identical. When duplicate entries of the same polymorph were identified, the first was retained in the working data set and the others were removed.
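The secondary unit cell comparison can be sketched as follows. This is a minimal illustration under stated assumptions: reduced cells are taken as already computed and in the same setting, the length tolerance is 1.5% of the largest reduced cell length of the two cells being compared, and the angular tolerance (`ANGLE_TOL_DEG`) is a hypothetical value, because the text does not say how angles were thresholded. The comparison is intended only for entries of the same compound that the Similarity tool could not resolve.

```python
from dataclasses import dataclass

LENGTH_FRACTION = 0.015  # 1.5% of the largest reduced cell length, as stated in the text
ANGLE_TOL_DEG = 1.5      # hypothetical angular tolerance (not specified in the text)


@dataclass(frozen=True)
class ReducedCell:
    a: float
    b: float
    c: float
    alpha: float
    beta: float
    gamma: float


def cells_match(x, y):
    """Return True if two reduced cells agree within the tolerances above."""
    # The length tolerance scales with the largest reduced cell length of the pair.
    tol = LENGTH_FRACTION * max(x.a, x.b, x.c, y.a, y.b, y.c)
    lengths_ok = all(abs(p - q) < tol for p, q in ((x.a, y.a), (x.b, y.b), (x.c, y.c)))
    angles_ok = all(abs(p - q) < ANGLE_TOL_DEG
                    for p, q in ((x.alpha, y.alpha), (x.beta, y.beta), (x.gamma, y.gamma)))
    return lengths_ok and angles_ok


def deduplicate(entries):
    """Keep the first refcode of each group of matching cells.

    `entries` is a list of (refcode, ReducedCell) tuples in database order.
    """
    kept = []
    for refcode, cell in entries:
        if not any(cells_match(cell, kept_cell) for _, kept_cell in kept):
            kept.append((refcode, cell))
    return [refcode for refcode, _ in kept]
```

Keeping the first entry mirrors the rule in the text that the first duplicate was retained and the others removed; the refcodes passed in would be whatever identifiers the working data set uses.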
The final curation step was a preemptive validation of the SMILES string associated with each unique structure. In our previous analysis of hydrates,44 we found that while the vast majority of CSD entries have an entry SMILES string that correctly gives the component string for each molecule in the crystal, a small fraction of structures had either an incomplete SMILES string or a SMILES string of “none” (∼1.6%). All refcodes with SMILES strings were checked for completeness using a text search that verified that each solvent molecule in the chemical formula was represented in the entry SMILES string. When an incomplete entry SMILES string was identified, the component SMILES string of the missing water or organic solvent molecule was added from a dictionary that linked each solvent to its corresponding formula and compound name(s). The corrected SMILES strings were then used in all subsequent steps. Structures with a SMILES string of “none” were not included. Application of these steps resulted in a final data set of 325104 unique organic structures.
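The completeness check and repair can be sketched as a text search against a solvent dictionary. This is a minimal illustration, not the authors' code: the comma-separated formula notation with multipliers such as `2(H2 O1)` is an assumption about how the chemical formula is written, `SOLVENT_BY_FORMULA` is a small hypothetical stand-in for the full solvent dictionary, and the match relies on the solvent SMILES being written exactly as in the entry SMILES.

```python
import re

# Hypothetical dictionary: solvent formula (assumed notation) -> component SMILES.
SOLVENT_BY_FORMULA = {
    "H2 O1": "O",           # water
    "C1 H4 O1": "CO",       # methanol
    "C1 H2 Cl2": "ClCCl",   # dichloromethane
    "C2 H3 N1": "CC#N",     # acetonitrile
}


def formula_components(formula):
    """Split a comma-separated formula, stripping multipliers such as '2(H2 O1)'."""
    parts = []
    for part in formula.split(","):
        part = part.strip()
        match = re.fullmatch(r"[\d.]+\((.*)\)", part)
        parts.append(match.group(1) if match else part)
    return parts


def complete_smiles(entry_smiles, formula):
    """Append the SMILES of any dictionary solvent in the formula that the entry SMILES lacks."""
    present = set(entry_smiles.split("."))
    missing = []
    for component in formula_components(formula):
        smiles = SOLVENT_BY_FORMULA.get(component)
        if smiles and smiles not in present and smiles not in missing:
            missing.append(smiles)
    return ".".join([entry_smiles] + missing)


print(complete_smiles("OC(=O)c1ccccc1", "C7 H6 O2,0.5(H2 O1)"))  # prints OC(=O)c1ccccc1.O
```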
Fig. 1 Flow chart illustrating the steps used to identify and sort organic structures in the CSD into five categories: solvates, hetero-solvates, solvate–hydrates, hydrates, and solvent-free forms.
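The sorting into the five categories of Fig. 1 can be sketched from the component SMILES of a curated entry. This is a minimal illustration under stated assumptions: `ORGANIC_SOLVENTS` is a hypothetical subset of the top-50 solvent dictionary, an entry containing both water and organic solvent is assigned to solvate–hydrates (the precedence rule is assumed, not taken from the text), and entries with no host molecule are set aside under a hypothetical "unclassified" label.

```python
# Hypothetical subset of the solvent dictionary, as component SMILES.
WATER = "O"
ORGANIC_SOLVENTS = {"CO", "CCO", "ClCCl", "ClC(Cl)Cl", "CC#N"}


def classify(entry_smiles):
    """Assign an entry to one of the five Fig. 1 categories from its component SMILES."""
    components = set(entry_smiles.split("."))
    has_water = WATER in components
    solvents = components & ORGANIC_SOLVENTS
    if not components - ORGANIC_SOLVENTS - {WATER}:
        return "unclassified"      # no host molecule, e.g. a pure solvent crystal (assumed handling)
    if not solvents:
        return "hydrate" if has_water else "solvent-free"
    if has_water:
        return "solvate-hydrate"   # assumption: water presence takes precedence
    return "hetero-solvate" if len(solvents) > 1 else "solvate"


for smiles in ["c1ccncc1", "c1ccncc1.O", "c1ccncc1.CO", "c1ccncc1.CO.ClCCl", "c1ccncc1.CO.O"]:
    print(smiles, classify(smiles))
```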
Solvent | 2020 rank | All solvates | (% of total) | Solvate–hydrates | (% of that solvent) | Hetero-solvates | (% of that solvent) | 2000 rank5 |
---|---|---|---|---|---|---|---|---|
Methanol | 1 | 5007 | (15.79%) | 924 | (18.5%) | 372 | (7.4%) | 1 |
Dichloromethane | 2 | 4349 | (13.71%) | 298 | (6.9%) | 406 | (9.3%) | 2 |
Chloroform | 3 | 4142 | (13.06%) | 301 | (7.3%) | 392 | (9.5%) | 5 |
Acetonitrile | 4 | 2834 | (8.94%) | 415 | (14.6%) | 226 | (8.0%) | 6 |
Ethanol | 5 | 1984 | (6.26%) | 392 | (19.8%) | 134 | (6.8%) | 4 |
Dimethyl sulfoxide | 6 | 1738 | (5.48%) | 205 | (11.8%) | 79 | (4.5%) | 13 |
Acetone | 7 | 1616 | (5.10%) | 221 | (13.7%) | 99 | (6.1%) | 7 |
N,N-Dimethylformamide | 8 | 1384 | (4.36%) | 178 | (12.9%) | 57 | (4.1%) | 14 |
Benzene | 9 | 1346 | (4.24%) | 63 | (4.7%) | 106 | (7.9%) | 3 |
Toluene | 10 | 1171 | (3.69%) | 58 | (5.0%) | 97 | (8.3%) | 8 |
Ethyl acetate | 11 | 1034 | (3.26%) | 101 | (9.8%) | 65 | (6.3%) | 10 |
Tetrahydrofuran | 12 | 927 | (2.92%) | 57 | (6.1%) | 77 | (8.3%) | 9 |
n-Hexane | 13 | 799 | (2.52%) | 41 | (5.1%) | 207 | (25.9%) | 15 |
Diethyl ether | 14 | 668 | (2.11%) | 54 | (8.1%) | 112 | (16.8%) | 11 |
Dioxane | 15 | 627 | (1.98%) | 71 | (11.3%) | 32 | (5.1%) | 12 |
2-Propanol | 16 | 381 | (1.20%) | 61 | (16.0%) | 27 | (7.1%) | 18 |
Acetic acid | 17 | 341 | (1.08%) | 51 | (15.0%) | 7 | (2.1%) | 16 |
Pyridine | 18 | 328 | (1.03%) | 27 | (8.2%) | 15 | (4.6%) | 17 |
1,2-Dichloroethane | 19 | 262 | (0.83%) | 23 | (8.8%) | 26 | (9.9%) | 25 |
Cyclohexane | 20 | 241 | (0.76%) | 9 | (3.7%) | 36 | (14.9%) | 19 |
n-Pentane | 21 | 237 | (0.75%) | 18 | (7.6%) | 64 | (27.0%) | 26 |
p-Xylene | 22 | 233 | (0.73%) | 10 | (4.3%) | 18 | (7.7%) | 20 |
Carbon disulfide | 23 | 208 | (0.66%) | 7 | (3.4%) | 34 | (16.3%) | 22 |
Chlorobenzene | 24 | 160 | (0.50%) | 9 | (5.6%) | 16 | (10.0%) | 29 |
Nitromethane | 25 | 145 | (0.46%) | 18 | (12.4%) | 8 | (5.5%) | 23/24 |
N,N-Dimethylacetamide | 26 | 131 | (0.41%) | 12 | (9.2%) | 8 | (6.1%) | 35/36 |
1-Propanol | 27 | 108 | (0.34%) | 21 | (19.4%) | 11 | (10.2%) | 27 |
1,2-Dichlorobenzene | 28 | 104 | (0.33%) | 2 | (1.9%) | 16 | (15.4%) | 35/36 |
Tetrachloromethane | 29 | 95 | (0.30%) | 5 | (5.3%) | 7 | (7.4%) | 21 |
n-Heptane | 30 | 87 | (0.27%) | 12 | (13.8%) | 30 | (34.5%) | 37–40 |
Nitrobenzene | 31 | 79 | (0.25%) | 10 | (12.7%) | 8 | (10.1%) | 23/24 |
n-Butanol | 32 | 75 | (0.24%) | 13 | (17.3%) | 1 | (1.3%) | 31–33 |
Formic acid | 33 | 72 | (0.23%) | 12 | (16.7%) | 4 | (5.6%) | 37–40 |
o-Xylene | 34 | 66 | (0.21%) | 1 | (1.5%) | 4 | (6.1%) | 31–33 |
m-Xylene | 35 | 64 | (0.20%) | 1 | (1.6%) | 12 | (18.8%) | 30 |
t-Butanol | 36/37 | 41 | (0.13%) | 4 | (9.8%) | 2 | (4.9%) | 44–46 |
Ethylene glycol | 36/37 | 41 | (0.13%) | 9 | (22.0%) | 1 | (2.4%) | 42–43 |
2-Butanol | 38 | 36 | (0.11%) | 9 | (25.0%) | 1 | (2.8%) | 44–46 |
1,2-Dimethoxyethane | 39 | 32 | (0.10%) | 6 | (18.8%) | 1 | (3.1%) | 28 |
2-Butanone | 40 | 31 | (0.10%) | 1 | (3.2%) | 5 | (16.1%) | 31–33 |
Benzonitrile | 41 | 29 | (0.09%) | 2 | (6.9%) | 1 | (3.4%) | 42–43 |
Propionic acid | 42 | 27 | (0.09%) | 1 | (3.7%) | 0 | (0%) | 47–49 |
Cyclohexanone | 43 | 26 | (0.08%) | 5 | (19.2%) | 1 | (3.8%) | 37–40 |
Bromobenzene | 44 | 24 | (0.08%) | 2 | (8.3%) | 0 | (0%) | 37–40 |
Dibromomethane | 45 | 16 | (0.05%) | 0 | (0%) | 0 | (0%) | 50 |
Acetophenone | 46–48 | 13 | (0.04%) | 0 | (0%) | 0 | (0%) | 41 |
Diethyl ketone | 46–48 | 13 | (0.04%) | 0 | (0%) | 1 | (7.7%) | 47–49 |
Ethylenediamine | 46–48 | 13 | (0.04%) | 1 | (7.7%) | 2 | (15.4%) | 34 |
1,1,2-Trichloroethane | 49 | 10 | (0.03%) | 0 | (0%) | 0 | (0%) | 44–46 |
Acetylacetone | 50 | 4 | (0.01%) | 1 | (25.0%) | 0 | (0%) | 47–49 |
Total | | 31710 | | 3433 | | 1380 | | |
a Hetero-solvates are counted in each solvent category, making the total listed for each category less than the sum of the numbers in each respective column.
Overall, the total number of solvate structures in the CSD increased by a factor of ∼6, from 5366 in 2000 to 31710 in 2020, though the increases were unevenly distributed across the different solvent types. The top ten solvate types in 2020 were: (1) methanol, (2) dichloromethane, (3) chloroform, (4) acetonitrile, (5) ethanol, (6) dimethyl sulfoxide (DMSO), (7) acetone, (8) N,N-dimethylformamide (DMF), (9) benzene, and (10) toluene. These top 10 accounted for just over 80% of all solvates in 2020, up from ∼69% in 2000. DMSO and DMF are new to the top 10 list, while ethyl acetate and tetrahydrofuran fell out of the top ten they occupied in 2000. Expanding to the top 15 solvates, the 2020 and 2000 lists contain the same solvents, albeit in a different order. The top 15 accounted for ∼89% of all solvates in 2020, up from ∼85% in 2000. This suggests that the typical range of solvents used in the preparation of organic crystals is less diverse than it was two decades ago.
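The headline numbers in the paragraph above can be reproduced from Table 1. The sketch below sums the 2020 counts of the top 10 solvents and divides by the total number of solvates; because hetero-solvates are counted under each of their solvents, this row sum matches how the per-solvent percentages in Table 1 are expressed rather than a count of unique structures. The 2000 total (5366) is taken from the text.

```python
# 2020 counts for the top 10 solvate classes, from Table 1.
TOP10_2020 = {
    "methanol": 5007, "dichloromethane": 4349, "chloroform": 4142,
    "acetonitrile": 2834, "ethanol": 1984, "dimethyl sulfoxide": 1738,
    "acetone": 1616, "N,N-dimethylformamide": 1384, "benzene": 1346,
    "toluene": 1171,
}
TOTAL_2020 = 31710
TOTAL_2000 = 5366

top10_share = sum(TOP10_2020.values()) / TOTAL_2020
growth = TOTAL_2020 / TOTAL_2000
print(f"top-10 share: {top10_share:.1%}, overall growth: {growth:.1f}x")
# prints: top-10 share: 80.6%, overall growth: 5.9x
```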
While the number of CSD entries for solvates in general increased ∼6-fold in two decades, DMSO and DMF solvates rose into the top 10 list with more than twice that growth, increasing by ∼13.5 and ∼12.1 times, respectively. Their entry into the top 10 may in part reflect increased use of these solvents in other expanding scientific fields.46,47 Other solvate types that grew much faster than average include N,N-dimethylacetamide, chloroform, and chlorobenzene, with the number of entries increasing by a factor of ∼10. In contrast, benzene solvates showed a notable drop in relative rank from #3 to #9, with only a ∼3-fold increase in the number of reported structures over the same 20 year period. This may in part be related to a greater awareness of the solvent's toxicological properties.48,49
Interestingly, while the data suggest a decrease in the diversity of solvents used over the past two decades, the proportion of solvates with more than one type of solvent molecule increased. Whereas the number of solvates overall increased by ∼6 times, solvate–hydrate and hetero-solvate entries increased by ∼7.4 and ∼10.5 times, respectively. Not surprisingly, highly polar solvents (e.g. methanol, acetonitrile, ethanol) were the most frequently encountered in solvate–hydrates. Trends among the top 15 solvate–hydrates generally parallel the overall solvate statistics, even though solvate–hydrates account for only 10.8% of the total.
Hetero-solvates, on the other hand, draw on a more diverse combination of solvents. The most common solvent pairs were methanol–chloroform, methanol–dichloromethane, and dichloromethane–hexane, with 110, 90, and 58 reported structures, respectively. The first two solvent pairs were also among the most common hetero-solvates in 2000. In particular, n-alkanes (e.g. hexane, pentane, heptane) appear more likely to crystallize as hetero-solvates than any other solvent in the top 50 list, as evidenced by the significantly higher percentage of n-alkane solvates (26.5–47.1%) that contain multiple solvents. In these n-alkane hetero-solvates, dichloromethane and chloroform are the most common second components.
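Tallying the most common solvent combinations among hetero-solvates amounts to counting unordered pairs. The sketch below is illustrative only: each hetero-solvate is represented as a set of solvent names, and a structure with three or more solvents contributes every pair it contains, which is an assumption because the text does not say how such structures were counted. The example data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_solvent_pairs(hetero_solvates):
    """Tally unordered solvent pairs across hetero-solvates.

    Each hetero-solvate is given as a set of solvent names; structures with
    three or more solvents contribute every pair they contain (an assumption).
    """
    pairs = Counter()
    for solvents in hetero_solvates:
        for a, b in combinations(sorted(solvents), 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical input, not data from this study.
example = [{"methanol", "chloroform"}, {"chloroform", "methanol"},
           {"dichloromethane", "n-hexane"}]
print(count_solvent_pairs(example).most_common(2))
```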
# Solvent molecules | Hydrates44 | Methanol | Dichloromethane | Chloroform | Acetonitrile | Ethanol | DMSO | Acetone | DMF | Benzene | Toluene |
---|---|---|---|---|---|---|---|---|---|---|---|
Integral number of solvent molecules in composition | |||||||||||
1 | 10977 (46.3%) | 2895 (57.8%) | 2183 (50.2%) | 2013 (48.6%) | 1370 (48.3%) | 1203 (60.6%) | 937 (53.9%) | 850 (52.6%) | 782 (56.5%) | 536 (39.8%) | 515 (44.0%) |
2 | 3893 (16.4%) | 713 (14.2%) | 600 (13.8%) | 796 (19.2%) | 491 (17.3%) | 222 (11.2%) | 427 (24.6%) | 255 (15.8%) | 341 (24.6%) | 148 (11.0%) | 159 (13.6%) |
3 | 1092 (4.6%) | 132 (2.6%) | 119 (2.7%) | 181 (4.4%) | 155 (5.5%) | 44 (2.2%) | 72 (4.1%) | 56 (3.5%) | 43 (3.1%) | 54 (4.0%) | 44 (3.8%) |
4 | 759 (3.1%) | 108 (2.2%) | 55 (1.3%) | 143 (3.5%) | 128 (4.5%) | 30 (1.5%) | 64 (3.7%) | 32 (2.0%) | 51 (3.7%) | 24 (1.8%) | 21 (1.8%) |
5 | 270 (1.1%) | 20 (0.4%) | 12 (0.3%) | 37 (0.9%) | 36 (1.3%) | 6 (0.3%) | 20 (1.2%) | 9 (0.6%) | 7 (0.5%) | 15 (1.1%) | 3 (0.3%) |
6 | 273 (1.2%) | 29 (0.6%) | 11 (0.3%) | 34 (0.8%) | 44 (1.6%) | 5 (0.3%) | 13 (0.7%) | 8 (0.5%) | 23 (1.7%) | 7 (0.5%) | 7 (0.6%) |
7 | 113 (0.5%) | 10 (0.2%) | 5 (0.1%) | 10 (0.2%) | 19 (0.7%) | 0 (0%) | 9 (0.5%) | 4 (0.2%) | 5 (0.4%) | 3 (0.2%) | 3 (0.3%) |
8 | 145 (0.6%) | 9 (0.2%) | 5 (0.1%) | 9 (0.2%) | 10 (0.4%) | 1 (0.05%) | 15 (0.9%) | 3 (0.2%) | 5 (0.4%) | 1 (0.07%) | 1 (0.09%) |
9 | 68 (0.3%) | 3 (0.06%) | 0 (0%) | 2 (0.05%) | 2 (0.07%) | 0 (0%) | 4 (0.2%) | 1 (0.06%) | 1 (0.07%) | 2 (0.1%) | 0 (0%) |
10 | 81 (0.3%) | 0 (0%) | 3 (0.07%) | 8 (0.2%) | 8 (0.3%) | 0 (0%) | 8 (0.5%) | 1 (0.06%) | 1 (0.07%) | 1 (0.07%) | 0 (0%) |
>10 | 467 (1.9%) | 7 (0.1%) | 10 (0.2%) | 7 (0.2%) | 10 (0.4%) | 3 (0.2%) | 21 (1.2%) | 1 (0.06%) | 1 (0.07%) | 4 (0.3%) | 1 (0.09%) |
Non-integral number of solvent molecules in composition | |||||||||||
0.5 | 2414 (10.2%) | 514 (10.3%) | 674 (15.5%) | 330 (8.0%) | 222 (7.8%) | 270 (13.6%) | 56 (3.2%) | 220 (13.6%) | 70 (5.1%) | 375 (27.9%) | 225 (19.2%) |
<1 | 1242 (5.2%) | 263 (5.3%) | 391 (9.0%) | 244 (5.9%) | 121 (4.3%) | 127 (6.4%) | 26 (1.5%) | 84 (5.2%) | 21 (1.5%) | 84 (6.2%) | 73 (6.2%) |
>1 | 1836 (7.7%) | 260 (5.2%) | 267 (6.1%) | 302 (7.3%) | 213 (7.5%) | 63 (3.2%) | 60 (3.5%) | 81 (5.0%) | 32 (2.3%) | 89 (6.6%) | 104 (8.9%) |
Not specified | 68 (0.3%) | 44 (0.9%) | 14 (0.3%) | 26 (0.6%) | 5 (0.2%) | 10 (0.5%) | 6 (0.3%) | 11 (0.7%) | 1 (0.07%) | 3 (0.2%) | 15 (1.3%) |
TOTAL | 23698 | 5007 | 4349 | 4142 | 2834 | 1984 | 1738 | 1616 | 1384 | 1346 | 1171 |
In our previous analysis of hydrates44 we found a strong bias in favor of structures with an integral number of water molecules, with mono- and dihydrates collectively accounting for 62.7% of all hydrates. A similar bias favoring integral solvent stoichiometries was observed for solvates; however, the magnitude of that bias and the general diversity in compositions varied across the solvate types. DMSO and DMF solvates were the most likely to crystallize with 1 or 2 solvent molecules per host (78.5 and 81.1%, respectively). DMSO and DMF solvates were also far less likely than others to have sub-stoichiometric solvent content. To the extent that non-integral solvent compositions could result from partial desolvation of the lattice prior to structure determination, it may be worth noting that these two solvents have significantly higher boiling points than all others in the top 10. However, desolvation seems an unlikely explanation for the particularly low occurrence of DMSO and DMF hemi-solvates (3.2 and 5.1%) compared with other solvate types. In contrast, the fraction of benzene and toluene solvates with 1:1 and 1:2 host:solvent compositions (50.8 and 57.6%, respectively) was significantly lower than for all other solvates or hydrates. At least in part, this appears to be due to the much higher occurrence of 2:1 host:solvent hemi-solvate compositions for these aromatic solvents. p-Xylene (rank #22) showed a similarly high occurrence rate of hemi-solvate formation.
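The 1:1 and 1:2 host:solvent percentages quoted above follow directly from the counts in Table 2. This short calculation reproduces them for hydrates and for the four solvate classes discussed.

```python
# Counts from Table 2: (1 solvent per host, 2 solvents per host, total entries).
TABLE2_COUNTS = {
    "Hydrates": (10977, 3893, 23698),
    "DMSO": (937, 427, 1738),
    "DMF": (782, 341, 1384),
    "Benzene": (536, 148, 1346),
    "Toluene": (515, 159, 1171),
}


def one_or_two_per_host(counts):
    """Percentage of entries with exactly 1 or 2 solvent molecules per host, to 0.1%."""
    return {name: round(100 * (one + two) / total, 1)
            for name, (one, two, total) in counts.items()}


for name, pct in one_or_two_per_host(TABLE2_COUNTS).items():
    print(f"{name}: {pct}%")
# Hydrates 62.7%, DMSO 78.5%, DMF 81.1%, Benzene 50.8%, Toluene 57.6%
```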
Across the different solvate types, the fraction of solvate–hydrates varies, although trends largely reflect what might be expected from a given solvent's miscibility with water. Solvates of the alcohols ethanol (19.8%) and methanol (18.5%), followed by acetonitrile (14.6%) and acetone (13.7%), had the largest fractions of solvate–hydrates in the top 10. At the other end of the spectrum, the aromatic solvents benzene (4.7%) and toluene (5.0%) and the halogenated solvents dichloromethane (6.9%) and chloroform (7.3%) had the lowest occurrence of solvate–hydrates. Restricting the stoichiometric analysis to water-free solvates yielded minor changes to the overall statistics, but in general resulted in a modest increase in the fraction of structures with 1:1 and 1:2 host:solvent compositions.
One of the most noticeable differences between hydrates and other solvates appears to be the ability of hydrates to access higher stoichiometric ratios, which we define as 4 or more integral solvent molecules in the composition. In our previous hydrate analysis, we found a natural decrease in the number of structures as the number of water molecules increased, though hydrates with 4 or more water molecules still constituted ∼9.2% of the total. When the same analysis was applied to water-free solvates, only acetonitrile (8.8%) and DMSO (8.1%) showed a similar proclivity for adopting compositions with 4 or more solvent molecules. For all other solvates in the top 10, there was a marked decrease in the number of higher solvates, with ethanol (2.2%), dichloromethane (2.2%), methanol (3.0%), and toluene (3.0%) among the least likely to crystallize with high solvent stoichiometric ratios.
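The ∼9.2% figure for hydrates can be recovered from the integral rows of Table 2 (4 through 10 and >10 water molecules). The solvate percentages in the paragraph refer to water-free subsets, which Table 2 does not list separately, so only the hydrate value is reproduced here.

```python
# Hydrate counts from Table 2 for 4, 5, 6, 7, 8, 9, 10 and >10 water molecules.
HIGH_STOICHIOMETRY_HYDRATES = [759, 270, 273, 113, 145, 68, 81, 467]
TOTAL_HYDRATES = 23698

share = sum(HIGH_STOICHIOMETRY_HYDRATES) / TOTAL_HYDRATES  # 2176 of 23698
print(f"hydrates with 4 or more water molecules: {share:.1%}")
# prints: hydrates with 4 or more water molecules: 9.2%
```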
Because the SMILES string matching used here does not distinguish between stereoisomers, this initial list overestimates the number of actual pairs. For example, SMILES string matching will incorrectly match the solvate of a chiral molecule with a solvent-free structure of its enantiomer or of the racemate. To identify and remove these false pairs, a sorting method was applied using the CSD Python API to identify chiral molecules. An automated search for chiral centers was carried out on each non-solvent molecule in the asymmetric unit. For any structure in which chiral centers were identified, the space groups were compared. Chiral molecules crystallizing in centrosymmetric space groups were assumed to be racemic, whereas chiral molecules in Sohncke space groups with one host molecule in the asymmetric unit (ASU) must be homochiral, because Sohncke groups contain no improper symmetry operations (inversion, mirror or glide reflection, or rotoinversion).50 If both entries crystallized in the same one of these space group categories and, for Sohncke groups, each had one chiral molecule in the ASU with matching chiral centers, they were considered a pair. For any pair of structures in which one or both entries had a space group that was neither Sohncke nor centrosymmetric, the molecular contents of the unit cell were manually compared. The same was true if the entries had Sohncke space groups and the chiral molecule appeared more than once in the ASU.
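The space-group screening rules above can be written as a small decision function. This is a sketch, not the authors' implementation: the Sohncke (65) and centrosymmetric (92) space-group number sets are standard crystallographic facts, while `Entry`, its fields, and the configuration label `chirality` are hypothetical stand-ins for whatever the chiral center search returns. A racemic phase paired with a homochiral phase is treated as "not pair", which is an assumption consistent with, but not stated in, the text.

```python
from dataclasses import dataclass
from typing import Optional


def _numbers(*spans):
    """Expand inclusive (first, last) space-group number spans into a set."""
    return {n for first, last in spans for n in range(first, last + 1)}


# The 65 Sohncke space groups (no inversion, mirror, glide, or rotoinversion operations).
SOHNCKE = _numbers((1, 1), (3, 5), (16, 24), (75, 80), (89, 98), (143, 146), (149, 155),
                   (168, 173), (177, 182), (195, 199), (207, 214))
# The 92 centrosymmetric space groups.
CENTROSYMMETRIC = _numbers((2, 2), (10, 15), (47, 74), (83, 88), (123, 142), (147, 148),
                           (162, 167), (175, 176), (191, 194), (200, 206), (221, 230))


def sg_category(number):
    if number in SOHNCKE:
        return "sohncke"
    if number in CENTROSYMMETRIC:
        return "centrosymmetric"
    return "other"  # neither, e.g. Pc (7), Cc (9), Pna21 (33)


@dataclass(frozen=True)
class Entry:
    space_group: int          # International Tables number
    chiral_in_asu: int        # chiral host molecules in the asymmetric unit
    chirality: Optional[str]  # hypothetical configuration label, e.g. "R,S"; None if none found


def screen_pair(solvate, solvent_free):
    """Return 'pair', 'not pair', or 'manual' following the screening rules in the text."""
    if solvate.chirality is None or solvent_free.chirality is None:
        return "manual"  # no chiral centers reported: inspected by hand
    categories = {sg_category(solvate.space_group), sg_category(solvent_free.space_group)}
    if "other" in categories:
        return "manual"
    if categories == {"centrosymmetric"}:
        return "pair"  # both assumed racemic
    if categories == {"sohncke"}:
        if solvate.chiral_in_asu != 1 or solvent_free.chiral_in_asu != 1:
            return "manual"
        return "pair" if solvate.chirality == solvent_free.chirality else "not pair"
    return "not pair"  # assumption: racemic phase vs homochiral phase


print(screen_pair(Entry(19, 1, "R"), Entry(4, 1, "S")))  # prints: not pair
```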
All structures found to have no reported chiral centers were investigated manually. For achiral compounds, a quick comparison of the ASU was sufficient. Molecules with asymmetric carbons that eluded the chiral center search, or with axial chirality, went through the same space group assessment to determine whether, and to what degree, the unit cells needed to be compared. These additional screening steps address the stereochemical limitations of the SMILES string-matching search and led to 2719 “Solvate and Solvent-free Pairs”. The pairs represent ∼8.6% of the total number of water-free solvates in the top 10 solvate classes (for comparison, ∼6.2% of hydrates had known anhydrate forms). Table 3 shows the breakdown of the number of pairs for each solvate class in the top 10.
Solvent | Total | No disorder | Same temp |
---|---|---|---|
Methanol | 448 | 289 (64.5%) | 146 (32.6%) |
Dichloromethane | 272 | 145 (53.3%) | 56 (20.6%) |
Chloroform | 280 | 143 (51.1%) | 58 (20.7%) |
Acetonitrile | 288 | 166 (57.6%) | 83 (28.8%) |
Ethanol | 194 | 115 (59.3%) | 56 (28.9%) |
DMSO | 344 | 191 (55.5%) | 81 (23.5%) |
Acetone | 236 | 160 (67.8%) | 83 (35.2%) |
DMF | 267 | 187 (70.0%) | 96 (36.0%) |
Benzene | 246 | 160 (65.0%) | 78 (31.7%) |
Toluene | 144 | 42 (29.2%) | 24 (16.7%) |
Total | 2719 | 1598 | 761 |
Data set | Triclinic | Monoclinic | Orthorhombic | Tetragonal | Trigonal | Hexagonal | Cubic |
---|---|---|---|---|---|---|---|
ORGANIC DATA SET | 22.5% | 53.2% | 21.7% | 1.2% | 1.0% | 0.3% | 0.1% |
SOLVATES (all) | 35.6% | 45.5% | 14.5% | 1.7% | 2.1% | 0.6% | 0.1% |
Hetero-solvates (all) | 42.4% | 41.1% | 10.5% | 2.2% | 3.0% | 0.7% | 0.1% |
TOP 10 PAIRS | 34.6% | 45.4% | 13.4% | 1.8% | 3.7% | 0.9% | 0.1% |
1 non-solvent molecule | 34.1% | 46.2% | 12.9% | 1.8% | 4.0% | 1.0% | 0% |
2+ non-solvent molecules | 37.9% | 40.7% | 16.5% | 2.0% | 2.4% | 0% | 0.4% |
1. Methanol (all) | 29.2% | 46.3% | 20.6% | 1.5% | 1.8% | 0.6% | 0.1% |
(Pair): solvate | 32.4% | 42.0% | 15.4% | 1.3% | 8.5% | 0.4% | 0% |
: Solvent-free | 21.1% | 51.6% | 17.2% | 0.7% | 8.3% | 0.4% | 0.7% |
2. Dichloromethane (all) | 34.8% | 46.5% | 14.4% | 1.6% | 2.0% | 0.5% | 0.2% |
(Pair): solvate | 36.0% | 43.0% | 15.1% | 0.7% | 3.7% | 1.5% | 0% |
: Solvent-free | 24.3% | 49.6% | 18.8% | 2.6% | 2.9% | 1.5% | 0.4% |
3. Chloroform (all) | 37.9% | 44.0% | 13.1% | 1.7% | 2.5% | 0.5% | 0.3% |
(Pair): solvate | 32.9% | 43.2% | 16.8% | 2.1% | 3.2% | 1.8% | 0% |
: Solvent-free | 23.6% | 57.5% | 14.3% | 1.8% | 2.5% | 0.4% | 0% |
4. Acetonitrile (all) | 38.4% | 44.7% | 12.7% | 1.9% | 1.7% | 0.5% | 0.1% |
(Pair): solvate | 34.0% | 44.4% | 11.8% | 1.0% | 7.6% | 0.7% | 0.3% |
: Solvent-free | 23.6% | 45.5% | 16.0% | 1.0% | 12.5% | 0.7% | 0.7% |
5. Ethanol (all) | 33.8% | 46.1% | 16.4% | 1.5% | 1.4% | 0.7% | 0.1% |
(Pair): solvate | 41.2% | 40.7% | 14.4% | 0.5% | 3.1% | 0% | 0% |
: Solvent-free | 30.9% | 47.9% | 19.1% | 0% | 0% | 0% | 0% |
6. DMSO (all) | 40.7% | 46.8% | 10.1% | 0.8% | 1.3% | 0.3% | 0% |
(Pair): solvate | 57.0% | 26.6% | 10.1% | 1.3% | 5.1% | 0% | 0% |
: Solvent-free | 37.5% | 49.7% | 9.6% | 2.0% | 0% | 1.2% | 0% |
7. Acetone (all) | 31.8% | 47.5% | 16.5% | 1.7% | 1.7% | 0.8% | 0% |
(Pair): solvate | 32.6% | 46.2% | 15.7% | 2.1% | 3.0% | 0.4% | 0% |
: Solvent-free | 26.7% | 46.2% | 16.9% | 2.1% | 5.1% | 2.1% | 0.8% |
8. DMF (all) | 46.2% | 42.3% | 9.7% | 0.7% | 0.8% | 0.3% | 0% |
(Pair): solvate | 39.2% | 44.7% | 6.5% | 3.7% | 10.6% | 1.6% | 0% |
: Solvent-free | 17.9% | 56.1% | 19.5% | 3.3% | 2.0% | 0.4% | 0.8% |
9. Benzene (all) | 40.0% | 43.9% | 10.3% | 1.8% | 3.3% | 0.6% | 0.1% |
(Pair): solvate | 44.9% | 46.4% | 6.7% | 0.7% | 1.1% | 0% | 0% |
: Solvent-free | 33.0% | 47.2% | 17.2% | 0.7% | 0.7% | 1.1% | 0% |
10. Toluene (all) | 44.3% | 40.3% | 10.6% | 2.0% | 2.2% | 0.5% | 0% |
(Pair): solvate | 36.1% | 40.3% | 9.0% | 9.0% | 4.9% | 0.7% | 0% |
: Solvent-free | 25.0% | 56.2% | 16.0% | 0.7% | 2.1% | 0% | 0% |
In our previous analysis of hydrate–anhydrate pairs,44 the distribution of hydrates across the different lattice types was essentially the same as that of all structures in the working data set. Yet as the number of unique molecules in the lattice increased, the fraction of structures with lower, triclinic symmetry also appeared to increase. This was evident from comparisons of hydrate–anhydrate pairs, in which the hydrates by necessity contain a higher number of unique molecules. A trend toward reduced symmetry was also evident when hydrates (with or without anhydrate pairs) were sorted into two categories, those with 1 and those with 2+ organic components; the latter showed an even stronger bias toward triclinic lattices.
When a similar analysis was performed on the 2719 solvate and solvent-free structure pairs, the trends were less clear. Whether pairs were considered in aggregate or treated as separate solvate classes, the fraction of solvates with triclinic structures was consistently higher than the fraction in the solvent-free group. This is consistent with the notion that lattice symmetry tends to decrease as the number of unique molecules in the lattice increases. However, when the 2719 solvate pairs were sorted into groups with either 1 or 2+ non-solvent molecules, a much more modest change in the distribution across lattice types was observed. In 5 solvate classes (methanol, ethanol, acetone, benzene, and toluene), comparison of structures with 1 and 2+ non-solvent molecules revealed that the latter had a higher proportion of triclinic lattices, a trend that paralleled what was seen in the hydrates. In the other 5 solvate classes (dichloromethane, chloroform, acetonitrile, DMSO, and DMF), structures with 2+ non-solvent molecules were actually less likely to be triclinic than those with 1 non-solvent molecule. Based on these data, more subtle factors may affect lattice symmetry comparisons across the different solvate classes.
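Percentages like those in Table 4 can be tabulated from space-group numbers once each group of structures (e.g. the solvate members and the solvent-free members of the pairs) has been assembled. The sketch below maps International Tables numbers to the seven crystal systems, placing rhombohedral R groups under trigonal; that this matches the table's "Trigonal" column is an assumption. The input list is hypothetical.

```python
from collections import Counter

# Upper space-group number bound of each crystal system (International Tables numbering).
CRYSTAL_SYSTEMS = [
    (2, "triclinic"), (15, "monoclinic"), (74, "orthorhombic"), (142, "tetragonal"),
    (167, "trigonal"), (194, "hexagonal"), (230, "cubic"),
]


def crystal_system(sg_number):
    if not 1 <= sg_number <= 230:
        raise ValueError(f"invalid space group number: {sg_number}")
    for upper, name in CRYSTAL_SYSTEMS:
        if sg_number <= upper:
            return name


def system_distribution(sg_numbers):
    """Percentage of structures in each crystal system for one group of structures."""
    counts = Counter(crystal_system(n) for n in sg_numbers)
    total = sum(counts.values())
    return {name: 100 * counts[name] / total for _, name in CRYSTAL_SYSTEMS}


# Hypothetical group: P-1, P21/c (twice), P212121.
print(system_distribution([2, 14, 14, 19]))
```

Running the same function separately on the solvate and solvent-free members of each class gives the side-by-side rows used for the triclinic comparison.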
While analysis of compounds that form both solvated and solvent-free forms is an effective means to eliminate some biases, it assumes that the subset of solvates considered is representative of the class of solvates as a whole. In analyzing lattice symmetry specifically, we note that the distribution of lattices across all structures in a given solvate class and across the subset with a solvent-free form sometimes differ. This is perhaps most notable in the DMSO and toluene solvates, where the fraction of triclinic structures differs substantially depending on whether all structures in the class or only the subset with known solvent-free forms are considered (40.7 vs. 57.0% for DMSO and 44.3 vs. 36.1% for toluene). Similarly, a disproportionately large fraction of methanol, acetonitrile, and DMF solvates with known solvent-free forms have trigonal lattices relative to all solvates in that class.
To avoid differences due to thermal expansion effects, we further limited our analysis to structure pairs determined from data collected at the same temperature. After this step, 761 pairs remained. The hydrogen atom positions in each structure were normalized, and the packing fraction (PF) was calculated using the packing coefficient algorithm in Mercury. Comparison of the PF of each solvate and solvent-free pair showed that in the vast majority of cases (84%) the difference was 5.0% or less. Pairs were sorted according to the magnitude of the PF difference between the solvate and solvent-free forms and binned in 0.5% increments. Fig. 3 plots, for each bin, the number of pairs in which the solvate (red) or the solvent-free (black) structure has the higher packing fraction. The data suggest that solvates are statistically more likely to have the lower PF, and that this bias appears to be independent of the magnitude of the difference between structure pairs. However, it is not known whether this trend would still be apparent if the disordered structures excluded from the analysis had also been considered.
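The binning behind Fig. 3 can be sketched as follows. This is an illustration under stated assumptions: the PF values (computed in Mercury) are supplied as inputs expressed in percent, the 0.5% bins are taken as absolute differences in percentage points, and ties are counted separately because the text does not say how they were handled. The example pairs are hypothetical.

```python
import math
from collections import defaultdict


def bin_pf_differences(pairs, width=0.5):
    """Bin same-temperature pairs by |PF(solvate) - PF(solvent-free)|.

    `pairs` holds (pf_solvate, pf_solvent_free) tuples in percent. Each bin,
    keyed by its lower edge, counts which member has the higher packing fraction.
    """
    bins = defaultdict(lambda: {"solvate": 0, "solvent-free": 0, "tie": 0})
    for pf_solvate, pf_free in pairs:
        diff = round(pf_solvate - pf_free, 6)  # suppress floating-point noise at bin edges
        lower = math.floor(abs(diff) / width) * width
        if diff > 0:
            bins[lower]["solvate"] += 1
        elif diff < 0:
            bins[lower]["solvent-free"] += 1
        else:
            bins[lower]["tie"] += 1  # tie handling is an assumption
    return dict(sorted(bins.items()))


# Hypothetical PF values (percent), not data from this study.
print(bin_pf_differences([(68.0, 69.2), (70.0, 69.7), (65.0, 66.1)]))
```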
That solvates without disorder statistically have lower packing fractions than their solvent-free forms was unexpected, since solvate formation is often rationalized as a means to achieve greater packing efficiency. While this is undoubtedly true in some cases, the data here indicate that, statistically speaking, the argument does not hold for all solvate classes. Notably, when the top 10 solvate classes are treated individually, the bias is apparent in only about half of them. Although the number of pairs in each individual solvate class is low, only for ethanol (72.1%), DMSO (69.6%), DMF (68.4%), chloroform (64.9%), acetone (62.9%), and benzene (60.0%) do statistically more solvates have lower packing fractions than their solvent-free forms; each percentage gives the fraction of pairs in that class in which the solvate has the lower packing fraction. The other 4 solvate classes (methanol, dichloromethane, acetonitrile, and toluene) do not show meaningful biases in either direction.
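The text does not state which statistical test underlies the per-class comparisons. One common option for paired data of this kind is an exact sign test, which asks how likely a split at least as lopsided as the observed one would be if solvate and solvent-free forms were equally likely to have the lower PF. The sketch below is that swapped-in technique, not the authors' method; tied pairs should be removed before calling it, and the counts in the example are hypothetical.

```python
from math import comb


def sign_test_two_sided(k, n):
    """Exact two-sided sign test for k 'successes' in n non-tied pairs under p = 0.5.

    Here a success could be a pair in which the solvate has the lower packing fraction.
    """
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    tail = min(k, n - k)
    one_sided = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * one_sided)


print(sign_test_two_sided(8, 10))  # hypothetical counts; prints 0.109375
```

With small per-class pair counts, even fractions well above 50% can give large p-values, which is one reason to interpret the individual class results cautiously.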
Comparison of current CSD data against similar data from Görbitz and Hersleth points to some likely shifts over the past two decades in how solvates are generated. It appears that while the range of solvents commonly used in solvate formation has become less diverse, mixed solvent use has at the same time led to a disproportionate increase in reports of hetero-solvates. Relative changes in the growth of individual solvate classes clearly point toward the expanded use of DMSO and DMF and the significantly decreased use of benzene. Organic solvates were also found to adopt a much narrower range of solvent:host stoichiometries than organic hydrates.
Direct comparison of the ∼2700 pairs of solvates and solvent-free forms from the top 10 solvate classes indicated differences between the two groups in both lattice symmetries and packing fractions. Solvates as a group were found to have an inherent bias favoring triclinic lattice symmetry, a trend that is especially magnified in hetero-solvates. Relative increases in the fraction of structures with trigonal lattices were also observed in solvates with known solvent-free forms, though assigning significance to this increase may be premature given the low number of structure pairs. More surprising to us was the fact that solvates without disorder in general, and some solvate classes in particular, showed a bias toward lower packing fractions than their solvent-free analogues. While solvate formation may be a means to achieve phases with higher packing efficiency for some organic molecules, the data indicate this is not an across-the-board trend.
The two overarching goals of this paper were (1) to assess whether the data indicate that practitioner methods for solvate generation have changed over the past 20 years, and (2) to compare some simple metrics based on solvate and solvent-free pairs that might point to hidden structural trends. We hope that the general approach adopted here can inspire more advanced data mining efforts that address other fundamental questions pertinent to developing a more complete understanding of solvate formation. Toward that end, we stress to practitioners the importance of reporting detailed information on the specific growth conditions employed when new crystal structures are deposited with the CCDC.
This journal is © The Royal Society of Chemistry 2021