Nandini Sarkar a, Joydeep Mitra b, Molly Vittengl c, Lexi Berndt a and Christer B. Aakeröy *a
a Department of Chemistry, Kansas State University, 213 CBC Building, 1212 Mid Campus Dr North, Manhattan, KS 66506-0401, USA. E-mail: aakeroy@ksu.edu; Fax: +785 532 6666; Tel: +785 532 6096
b Department of Computer Science, Kansas State University, Manhattan, Kansas 66506, USA
c Department of Chemistry, Truman State University, Kirksville, Missouri 63501, USA
First published on 30th September 2020
An automated application, CoForm, was used to predict the outcomes of attempted co-crystallizations between two active pharmaceutical ingredients, loratadine and desloratadine, and 41 potential co-formers of general interest (OGI). The predictive abilities of the app were compared with those of structure-informatics tools based on hydrogen-bond propensity (HBP) and molecular complementarity (MC). CoForm delivered a success rate of 78% for both loratadine and desloratadine, compared with 76% and 54%, respectively, for HBP, and 39% and 22%, respectively, for MC.
Over the last two decades, co-crystallization has emerged as an active area of research for producing high-value organic crystalline solids. A pharmaceutical co-crystal combines an active pharmaceutical ingredient (API) with a suitable molecular partner, the co-former. Unfortunately, finding molecules that can act as co-formers for a specific drug generally relies on extensive combinatorial experimental co-crystal screens, which are time-consuming and expensive.9 The difficulty of identifying molecules that are likely to form a new crystalline solid with a given API is one reason why co-crystal synthesis has not yet become a widely used technology. Consequently, there is a need for cheaper, faster, and more reliable methods for predicting when a pair of molecules will form a co-crystal, and when they will not.
There are a handful of predictive methods for co-crystal formation in the literature. However, some of these methods are very complex and require in-depth knowledge of theoretical chemistry and quantum mechanical methods.10–13 Such methods also tend to be computationally expensive and less suitable for systematic screens. Other methods have combined data mining and structure informatics, taking advantage of the more than one million small-molecule crystal structures in the Cambridge Structural Database (CSD).14–16 Because the CSD contains reliable, properly curated data, several structure-informatics methods developed by the Cambridge Crystallographic Data Centre (CCDC), such as hydrogen-bond propensity,9,17 hydrogen-bond coordination,18 and molecular complementarity,19 have been applied to co-crystal prediction. One inherent problem with building a predictive tool on existing crystal structures is that only positive co-crystallization results are included; failed co-crystallizations, by definition, cannot be part of such a training data set. With this in mind, a new approach that predicts the outcome of co-crystallization reactions from both positive and negative experimental results could be of interest to a broad spectrum of the organic solid-state community.
CoForm is based on a mathematical model that compares the numbers of hydrogen-bond donors and acceptors of the target of interest with those of a set of known compounds. Each known target is associated with a list of co-formers with which it forms co-crystals (positive partners) and a list of co-formers with which it does not (negative partners). See ESI† Fig. S1 for a detailed description of the algorithm. The database of known compounds is based on the outcomes (as determined by infrared spectroscopy) of approximately 2000 attempted co-crystallizations.20–23 The quality of CoForm's predictions depends on the compounds present in the database; however, the app can be customized to work with databases tailored to the types of targets and co-formers that a prospective user is specifically interested in. The automated algorithm is very fast and is accessible through an easy-to-use desktop application, so users with relatively limited technical knowledge can run the app and interpret the results. The current version of CoForm uses a database comprising 41 co-formers of general interest (OGI) for pharmaceutical co-crystals and an additional 50 co-formers that are conventionally used in co-crystallization experiments (see ESI† Table S1).
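The comparison described above can be pictured with a short Python sketch. It is an illustration under stated assumptions, not CoForm's Groovy implementation: it assumes that a query matches a known target when both its donor and acceptor counts are equal (the actual rule is given in ESI† Fig. S1 and may differ), and it assumes that co-formers seen only as positive partners of the matched targets are 'highly likely', those seen as both positive and negative partners are 'likely', and those seen only as negative partners are 'least likely'. The names `KnownTarget` and `rank_coformers` and the toy database (T1–T3, CF1–CF5) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class KnownTarget:
    """A known compound in a hypothetical CoForm-style database."""
    name: str
    donors: int     # number of hydrogen-bond donors
    acceptors: int  # number of hydrogen-bond acceptors
    positive: set[str] = field(default_factory=set)  # co-formers that gave a co-crystal
    negative: set[str] = field(default_factory=set)  # co-formers that did not


def rank_coformers(donors: int, acceptors: int,
                   database: list[KnownTarget]) -> dict[str, set[str]]:
    """Bin co-formers using known targets with identical donor/acceptor counts.

    Assumption: exact equality of both counts is the similarity criterion.
    """
    matches = [t for t in database
               if t.donors == donors and t.acceptors == acceptors]
    seen_pos = set().union(*(t.positive for t in matches))
    seen_neg = set().union(*(t.negative for t in matches))
    return {
        "highly likely": seen_pos - seen_neg,  # only ever co-crystallized
        "likely": seen_pos & seen_neg,         # mixed outcomes across matches
        "least likely": seen_neg - seen_pos,   # never co-crystallized
    }


if __name__ == "__main__":
    toy_db = [
        KnownTarget("T1", donors=1, acceptors=3, positive={"CF1", "CF2"}, negative={"CF3"}),
        KnownTarget("T2", donors=1, acceptors=3, positive={"CF2", "CF3"}, negative={"CF4"}),
        KnownTarget("T3", donors=2, acceptors=2, positive={"CF5"}),
    ]
    for category, coformers in rank_coformers(1, 3, toy_db).items():
        print(f"{category}: {sorted(coformers)}")
    # prints:
    # highly likely: ['CF1', 'CF2']
    # likely: ['CF3']
    # least likely: ['CF4']
```

Target T3 is ignored because its counts differ from the query; with a customized database, only the contents of `database` change, not the ranking logic.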
CoForm is built in the Groovy programming language.24 Groovy was chosen because it runs on the Java Virtual Machine and is therefore platform-independent, so the app can be used on all three major operating systems, i.e., Windows, Linux, and Mac OS X. Moreover, as a scripting language, Groovy allows rapid software prototyping.
CoForm requires two inputs from the user:
1. Number of hydrogen-bond donors (donor: molecule or molecular fragment X–H in which X is an electronegative atom such as N, O, and F).
2. Number of hydrogen-bond acceptors (acceptor: an electronegative atom such as N or O).
Optionally, the user can also enter the name of the target for which co-crystals are to be predicted, to make the results easier to manage. The name simply labels the search and carries no scientific meaning.
CoForm ranks the co-formers as ‘highly likely’, ‘likely’, and ‘least likely’ to produce a co-crystal with a specific target. The output is in the form of tables that can be exported as .csv files. The highly likely and least likely lists, as the names suggest, contain the co-formers with the highest and lowest probability of forming a co-crystal, respectively. The likely list consists of co-formers that formed co-crystals with some compounds in the database but not with others. Because co-crystallization outcomes are binary, we assigned co-formers in the likely list a ‘YES’ for co-crystallization. Although this will generate some false positives, false positives are preferable to false negatives in the context of co-crystallization screens, since a false positive costs one additional experiment whereas a false negative may exclude a viable co-former from the screen.
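The mapping from ranking categories to binary predictions, together with a .csv export, might look like the sketch below. Treating both ‘highly likely’ and ‘likely’ as YES follows the rule stated above; the helper names `to_binary` and `export_csv` and the column layout of the exported file are hypothetical and are not CoForm's documented output format.

```python
from __future__ import annotations

import csv
import io

# Categories mapped to a predicted "YES"; "likely" is deliberately included,
# accepting some false positives in exchange for fewer false negatives.
YES_CATEGORIES = {"highly likely", "likely"}


def to_binary(category: str) -> str:
    """Map a ranking category to a binary YES/NO prediction."""
    return "YES" if category in YES_CATEGORIES else "NO"


def export_csv(rankings: dict[str, list[str]]) -> str:
    """Return CSV text with one row per co-former.

    The header and column order are assumptions made for this sketch.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["coformer", "category", "prediction"])
    for category, coformers in rankings.items():
        for coformer in sorted(coformers):
            writer.writerow([coformer, category, to_binary(category)])
    return buffer.getvalue()


if __name__ == "__main__":
    print(export_csv({"highly likely": ["CF2", "CF1"],
                      "likely": ["CF3"],
                      "least likely": ["CF4"]}), end="")
```

Writing to an `io.StringIO` keeps the sketch free of file-system side effects; replacing it with `open("predictions.csv", "w", newline="")` would write the same rows to disk.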
CoForm is a data-driven predictive application based on experimental data from attempted co-crystallizations, including both successful and unsuccessful reactions. In contrast, other structure-informatics tools such as hydrogen-bond propensity (HBP)25 and molecular complementarity (MC)19 rely exclusively on existing crystallographic data, although both can also be used to predict co-crystallization outcomes (see ESI† Table S2). The predictions of CoForm, HBP, and MC were compared for the same two molecules, loratadine and desloratadine. The accuracy of each method was expressed as a success rate, i.e., the number of predictions that match the experimental results divided by the total number of predictions (see ESI† S3–S6 for the experimental and predicted co-crystallization screening outcomes).
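The success rate is simply the fraction of predictions that agree with experiment. The minimal sketch below assumes outcomes are recorded as 'YES'/'NO' strings; `success_rate` is a hypothetical helper, and the example lists are constructed only to reproduce the 32/41 count reported for CoForm on loratadine, not the actual per-reaction predictions.

```python
from __future__ import annotations


def success_rate(predicted: list[str], observed: list[str]) -> float:
    """Fraction of predictions that match the experimental outcome."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("at least one outcome is required")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


if __name__ == "__main__":
    # Illustrative lists with 32 agreements out of 41 (not the real screen data).
    predicted = ["YES"] * 41
    observed = ["YES"] * 32 + ["NO"] * 9
    print(f"{success_rate(predicted, observed):.0%}")  # prints 78%
```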
The three methods gave the following success rates for predicting the outcomes of the 41 attempted co-crystallizations of loratadine: CoForm 78%, HBP 76%, and MC 39%. The results are summarized in a confusion matrix (Fig. 2).
Of the 41 attempted co-crystallizations with loratadine, 30 produced a positive result, and CoForm, HBP, and MC predicted these correctly with success rates of 78%, 86%, and 26%, respectively. For the 11 reactions that did not produce a co-crystal, the three methods correctly predicted the negative outcome with success rates of 72% (CoForm), 45% (HBP), and 63% (MC).
A similar analysis of the predictions for desloratadine (again, 41 co-crystallizations were attempted) is given in Fig. 3.
For the successful co-crystallizations, CoForm, HBP, and MC achieved prediction accuracies of 89%, 50%, and 16%, respectively. For the five failed attempts, CoForm predicted none correctly, whereas HBP and MC each predicted three of the five correctly.
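Splitting the success rate by experimental outcome, as in the confusion matrices of Figs. 2 and 3, can be sketched as follows. The example counts are reconstructed from the CoForm figures for desloratadine: 36 of the 41 attempts gave a co-crystal (41 minus the five failures), of which 32 were predicted correctly (consistent with 89% and with the 32/41 overall rate in Table 1), and none of the five failures was predicted correctly. The function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def confusion_counts(predicted: list[str], observed: list[str]) -> Counter:
    """Count (observed, predicted) pairs of binary 'YES'/'NO' outcomes."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    return Counter(zip(observed, predicted))


def per_outcome_rates(predicted: list[str], observed: list[str]) -> dict[str, float]:
    """Fraction of each experimental outcome ('YES' or 'NO') predicted correctly."""
    counts = confusion_counts(predicted, observed)
    rates = {}
    for outcome in ("YES", "NO"):
        total = counts[(outcome, "YES")] + counts[(outcome, "NO")]
        # NaN signals that the screen contained no experiments with this outcome.
        rates[outcome] = counts[(outcome, outcome)] / total if total else float("nan")
    return rates


if __name__ == "__main__":
    observed = ["YES"] * 36 + ["NO"] * 5
    predicted = ["YES"] * 32 + ["NO"] * 4 + ["YES"] * 5
    rates = per_outcome_rates(predicted, observed)
    print({k: f"{v:.0%}" for k, v in rates.items()})  # prints {'YES': '89%', 'NO': '0%'}
```

Because every failed reaction was predicted as YES, the overall rate (32/41) stays high even though the rate for negative outcomes is zero, which is the imbalance discussed next.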
Overall, for both loratadine and desloratadine, CoForm predicted positive co-crystallization outcomes more successfully than negative ones. Our database contains a total of 1136 successful and 649 failed co-crystallization results, so positive outcomes make up about 64% (1136/1785) of all attempted reactions, which can help to explain why CoForm shows an imbalance between predicting positive and negative outcomes. Table 1 lists the overall success rates for the co-crystal predictions of loratadine and desloratadine.
Table 1 Overall success rates for the co-crystal predictions of loratadine and desloratadine
Compound | Method | Success rate |
---|---|---|
Loratadine | CoForm | 32/41 = 78% |
| HBP | 31/41 = 76% |
| MC | 16/41 = 39% |
Desloratadine | CoForm | 32/41 = 78% |
| HBP | 22/41 = 54% |
| MC | 9/41 = 22% |
We hope this tool will be further tested, refined, and used by researchers interested in the crystalline solid state, especially in the context of improving physical properties.26 In addition, the app is customizable and will produce the most reliable outcomes if the unknown target closely matches the known targets in the database (in molecular weight, number of rotatable bonds, and functional groups). Therefore, a user-specific database should increase the predictive ability of the app. We believe that the customizability of CoForm can extend its usability to hydrogen-bonded solids in areas such as pharmaceutics, agrochemicals, and energetic materials.
Footnote
† Electronic supplementary information (ESI) available: HBP, MC, and CoForm prediction table and FT-IR grinding experiment data table. See DOI: 10.1039/d0ce01074j
This journal is © The Royal Society of Chemistry 2020