Design and virtual screening of donor and non-fullerene acceptor for organic solar cells using long short-term memory model

Long-Fei Lva, Cai-Rong Zhang*a, Rui Caoa, Xiao-Meng Liua, Mei-Ling Zhanga, Ji-Jun Gonga, Zi-Jiang Liub, You-Zhi Wuc and Hong-Shan Chend
aDepartment of Applied Physics, Lanzhou University of Technology, Lanzhou, Gansu 730050, China. E-mail: zhcrxy@lut.edu.cn
bSchool of Mathematics and Physics, Lanzhou Jiaotong University, Lanzhou 730070, China
cSchool of Materials Science and Engineering, Lanzhou University of Technology, Lanzhou, Gansu 730050, China
dCollege of Physics and Electronic Engineering, Northwest Normal University, Lanzhou, Gansu 730070, China

Received 5th July 2024, Accepted 31st July 2024

First published on 2nd August 2024


Abstract

In organic solar cells (OSCs), electron donor–acceptor materials are key factors influencing device performance. However, traditional experimental methods for developing new, high-performance materials are often time-consuming, costly and inefficient. To accelerate the development of novel OSC donor–acceptor materials, we constructed a database of 547 donor–acceptor pairs and derived 30 easily obtainable molecular structure descriptors through transformation and screening. Using the long short-term memory (LSTM) network, a deep learning model, tuned with grid search for optimal hyperparameters, we predicted the power conversion efficiency (PCE), open-circuit voltage, short-circuit current density and fill factor. The SHapley Additive exPlanations analysis revealed that the number of rotatable bonds and the presence of two or more rings in acceptor molecules positively impact PCE. We then systematically fragmented and recombined molecules in the constructed database, creating 142,560 donor molecules and 61,732 acceptor molecules. The tuned LSTM model predicted photovoltaic parameters for these new donor–acceptor pairs. After excluding the donor–acceptor pairs in the database, we identified 7632 novel pairs with a predicted PCE greater than 18.00%, including five pairs exceeding 18.50%, with a maximum PCE of 18.52%. This method facilitates the cost-effective design and rapid, accurate prediction of OSC material performance, enabling efficient screening of high-performance candidates.


1 Introduction

With the continuous increase in the global population and rapid economic development, the demand for energy is growing quickly. Traditional energy sources are not only limited in supply but also pose significant environmental problems, such as climate change and air pollution. Therefore, there is an urgent need to accelerate the research of clean and renewable energy to meet energy demands and reduce negative environmental impacts.1,2 Among the current research on clean and renewable energy, organic solar cells (OSCs) are considered an important avenue for future energy development due to their advantages of being clean, low-cost, and sustainable.3–6 In OSCs, the heterojunction formed by electron donor and acceptor materials is the key for achieving photoelectric conversion, making their selection and properties crucial to the performance of OSCs. Thus, the research focus in OSCs is the design and optimization of electronic donor and acceptor materials.7,8

Traditional fullerene materials, such as PC70BM, PC71BM, and C60, had achieved relatively high power conversion efficiencies (PCEs), making them the mainstream acceptor materials in the OSC field.9–13 However, the synthesis of fullerene materials is costly, and their electronic structure leads to poor light absorption in the UV-vis region. This limits their light harvesting efficiency and photovoltaic performance, thereby restricting the further development of fullerenes in OSCs. In contrast, non-fullerene acceptors (NFAs) exhibit broader absorption spectra and more easily tunable energy levels, as well as narrower optical band gaps and greater carrier mobility, which are beneficial for improving OSC performance.14–17 Therefore, OSCs using NFAs are considered to have a very promising application prospect.18–20

In recent years, OSCs developed rapidly, with significant improvements in PCE. The PCE of binary or ternary OSCs using NFAs reached 19%,21–24 and the PCE of tandem OSCs exceeded 20%.25 Layer-by-layer OSCs have experienced significant advancements in recent years.26–29 However, since the PCE of OSCs is still relatively low compared to that of currently commercialized silicon-based and perovskite solar cells, improving the PCE of OSCs remains the primary research goal.

Designing new donor and acceptor materials, particularly those with high PCE, using traditional experimental methods is very challenging. Due to the complexity of chemical composition, conventional methods are time-consuming and labor-intensive. Consequently, scientists have proposed using machine learning to accelerate molecular design.30–34 Researchers have utilized machine learning algorithms to analyze a series of performance data from OSCs and discovered that optimizing certain key descriptors can improve the accuracy of prediction models.35–37 Sahu et al. introduced methods for predicting the PCE of OSCs using machine learning and the improved descriptors.38–40 Han and Yi proposed the singlet–triplet energy gap (ΔEST) as a key molecular descriptor for predicting PCE, achieving a Pearson correlation coefficient (r) of 0.81 in their predictions.41 Saeki and Nagasawa screened conjugated molecules for polymer-fullerene OSC applications through supervised learning methods.42 Sun et al. used a database containing actual donor materials collected from the literature, and employed images, ASCII strings, two types of descriptors and seven molecular fingerprints as inputs for machine learning models to predict PCE.43 David et al. proposed a machine learning method for extracting data information from OSCs, utilizing a database composed of 1850 device characteristics, performance, and stability data, and employed the Sequential Minimal Optimization Regression (SMOreg) model to identify the factors that have the greatest impact on OSC stability and PCE.44 Min et al. applied machine learning analysis to find the optimal donor–acceptor pairs for OSCs. 
They predicted PCE using five machine learning models—Linear Regression (LR), Multiple Linear Regression (MLR), Random Forest (RF), Artificial Neural Network (ANN), and Boosted Regression Tree (BRT)—on a dataset of 565 polymer donor/non-fullerene acceptor OSC pairs, achieving r values of 0.71 and 0.70 for the BRT and RF models, respectively.45

In recent years, deep learning, as a branch of machine learning, has developed rapidly, achieving remarkable results in natural language processing, image recognition, handling complex data and fitting intricate functions.46–52 It has been applied in the field of OSCs as well. Peng and Zhao used convolutional neural networks (CNNs), widely applied in deep learning, to build a model that used simplified molecular-input line-entry system (SMILES) strings as inputs to predict PCE53 and the highest occupied molecular orbital (HOMO) and lowest unoccupied molecular orbital (LUMO) energy levels, and to generate new NFA molecules.54 They later used CNNs to build another model that used molecular graphs as inputs to predict the HOMO and LUMO energy levels of new molecules.55 Moore developed a quantitative structure–property relationship model based on a CNN to predict the HOMO and LUMO energy levels of organic molecules usable in OSCs. The model used the SMILES strings of molecules as inputs, converted them into 2D RGB images, extracted features from the images using the network's convolutional layers, and then used a deep dense neural network to convert the features into energy levels.56

Long short-term memory (LSTM) networks, as one of the important methods in the field of deep learning, are a special type of recurrent neural network (RNN) designed to address the gradient vanishing and exploding problems that a standard RNN may encounter during learning.57 By introducing a “gate” mechanism (including an input gate, forget gate and output gate) and memory cells, LSTM can effectively control the input, retention and output of information. In previous studies, some descriptors used for predicting photovoltaic performance parameters through machine learning methods were too costly to compute for high-throughput screening, and the amount of data required for deep learning methods was excessively large. To address this issue, this research uses an LSTM-based deep learning prediction model with easily accessible molecular structure descriptors and a relatively small database. With the aid of the LSTM model, it is possible to simulate and evaluate material performance in a virtual environment, significantly reducing experimental costs and cycles, and accelerating the discovery and application of novel OSC materials.

In the process of constructing an LSTM-based deep learning prediction model to accelerate the discovery of novel OSC materials, relying solely on the predictive capability of the model is often insufficient. To make the model's decision-making process more transparent and to enhance its application value, the SHapley Additive exPlanations (SHAP) analysis method is employed to identify and interpret the importance of various structural descriptors within the model.58–63 SHAP is a model interpretation method developed based on the Shapley value from game theory. The Shapley value is a mathematical concept used to quantify each player's marginal contribution to the overall success in a cooperative game. In machine learning models, each feature (such as the structural descriptors of materials) can be viewed as a “player,” and the model's predictive outcome is analogous to the “overall success” of the game. By calculating the Shapley value for each feature, it is possible to quantify the contribution of that feature to the model's predictive outcome, thereby understanding its importance. The use of SHAP analysis not only provides interpretability for the LSTM model but, more importantly, reveals the impact of different structural descriptors on the performance of OSC materials. This is highly valuable for scientists and researchers. For instance, if the analysis shows that a particular molecular structural feature has a significantly positive impact on the PCE of OSCs, this feature can be prioritized in future material design and optimization, thereby more efficiently screening and discovering high-performance OSC materials.
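As a minimal illustration of how a Shapley value quantifies each feature's marginal contribution, the sketch below computes exact Shapley values for a toy three-descriptor linear model by enumerating all coalitions. The model, feature values, and baseline are hypothetical stand-ins, not the LSTM or descriptors of this work; the SHAP library approximates this computation efficiently for real models.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample by enumerating coalitions.

    predict  : callable mapping a feature vector to a scalar prediction
    x        : the sample's feature values
    baseline : reference values substituted for 'absent' features
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for S in combinations(others, size):
                # Classic Shapley coalition weight |S|!(n-|S|-1)!/n!
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_i = [x[j] if (j in S or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in S else baseline[j] for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi

# Toy model standing in for the trained predictor: a linear function of
# three descriptors with weights 2.0, 1.0 and -0.5.
model = lambda v: 2.0 * v[0] + 1.0 * v[1] - 0.5 * v[2]
phi = shapley_values(model, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
print(phi)  # ≈ [2.0, 2.0, -1.5]: for a linear model, phi_i = w_i * (x_i - baseline_i)
```

The Shapley values sum to the difference between the prediction for the sample and the prediction at the baseline, which is the additivity property SHAP exploits.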

To find high-performance novel OSC donor–acceptor materials, in this work, 547 completely different donor–acceptor pair molecular structures and the corresponding OSC performance parameters were collected. The collected molecular structures were converted into structural descriptors, which were screened and used as inputs for the LSTM model to predict OSC performance parameters. After tuning the hyperparameters, a model with good predictive performance was obtained, and the importance of the input descriptors was interpreted using the SHAP analysis method. Next, molecular design and virtual screening were conducted. The molecules in the database were systematically fragmented to create a fragment library. These fragments were then recombined to generate new OSC donor–acceptor materials. The tuned model was used to predict PCE, open-circuit voltage (VOC), short-circuit current density (JSC), and fill factor (FF) of these new OSC materials. Finally, high-performance novel OSC donor–acceptor materials were screened out.

2 Methods

2.1 Database

Compared to traditional fullerene acceptors, non-fullerene acceptors have more easily tunable molecular structures and energy levels. Through chemical synthesis, researchers can more precisely adjust the energy levels and other optoelectronic properties during molecular design, allowing non-fullerene acceptors to better match donor materials. In this study, the molecular structures and corresponding device performance parameters of 547 completely different donor–acceptor pairs in binary OSCs were collected from published papers retrieved from the Web of Science using the search query “TS = (organic solar cells)”. Each donor and acceptor molecular structure was converted into the corresponding SMILES string using RDKit,64 an open-source cheminformatics and machine learning library primarily used for processing and analyzing chemical data. These SMILES strings were then converted into 102 structural descriptors using RDKit. The microstructure of the active layer can be further optimized by controlling processing and experimental conditions, such as morphology, film thickness, and thermal annealing, to improve the performance of OSCs.65–68 However, such data are reported less consistently in the literature, making it challenging to predict and screen designed molecules with machine learning methods while also accounting for the experimental conditions and post-processing of the active layer. Though the dynamic process of active layer formation is important for optimizing OSC device performance, the donor and acceptor materials play the most fundamental and decisive role. Therefore, the choice of donor–acceptor materials dominates the performance of OSC devices, and process optimization is an important means of realizing this potential. Structural descriptors were chosen for their ease of acquisition, enabling high-throughput virtual screening.

Not all molecular descriptors and corresponding device performance parameters could be used to build the database, so data preprocessing was required. Many of the generated structural descriptors took the value zero for most molecules, and including them would degrade the predictive performance of the LSTM model. If more than 80% of a descriptor's values in the database were zero, that descriptor was removed, leaving 51 structural descriptors: 27 related to acceptors and 24 related to donors. Excessive redundant descriptors could introduce noise and affect the LSTM model's predictive performance. To minimize this impact, descriptors for donor and acceptor molecules were individually screened, and their correlations were analyzed using Pearson correlation coefficients (r). If several descriptors had pairwise r values greater than 0.9, only the descriptor most strongly correlated with PCE was retained. In total, 30 structural descriptors were obtained, with 19 related to acceptors and 11 related to donors. The meanings of the final 30 descriptors are shown in Tables S1 and S2 in the ESI. After obtaining the structural descriptors, some molecular pairs had different structures but identical descriptor values due to molecular similarity. In such cases, only the data with the highest PCE were retained, resulting in a final database of 465 donor–acceptor pairs.
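The two screening steps above (dropping descriptors that are zero in more than 80% of entries, then resolving highly correlated pairs by keeping the descriptor more strongly correlated with PCE) can be sketched as follows. The descriptor table here is synthetic, and the pairwise-resolution rule is one plausible reading of the procedure:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical stand-in for the descriptor table: columns are descriptors plus PCE.
df = pd.DataFrame({
    "desc_a": rng.normal(size=100),
    "desc_b": np.r_[np.zeros(85), rng.normal(size=15)],  # >80% zeros -> dropped
    "pce": rng.normal(loc=10, scale=3, size=100),
})
df["desc_c"] = df["desc_a"] * 0.99 + rng.normal(scale=0.01, size=100)  # redundant with desc_a

# Step 1: remove descriptors whose values are zero in more than 80% of rows.
desc_cols = [c for c in df.columns if c != "pce"]
keep = [c for c in desc_cols if (df[c] == 0).mean() <= 0.80]

# Step 2: among descriptor pairs with |r| > 0.9, retain only the one
# more strongly correlated with PCE.
corr = df[keep].corr().abs()
target_corr = df[keep].corrwith(df["pce"]).abs()
dropped = set()
for i, ci in enumerate(keep):
    for cj in keep[i + 1:]:
        if ci in dropped or cj in dropped:
            continue
        if corr.loc[ci, cj] > 0.9:
            dropped.add(ci if target_corr[ci] < target_corr[cj] else cj)
final = [c for c in keep if c not in dropped]
print(final)  # desc_b removed for sparsity; one of desc_a/desc_c removed for redundancy
```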

2.2 LSTM model

The LSTM model is a variant of the RNN. In traditional RNN models, issues like gradient vanishing and gradient exploding often occur. The LSTM model effectively addresses these problems by introducing a structure called the “memory cell”. The memory cell in the LSTM model consists of an input gate, a forget gate, an output gate, and a cell state. The “gate” is a structure that selectively allows information to pass through, composed of a sigmoid function and a pointwise multiplication operation. LSTM controls the addition or removal of information through “gates”, thereby achieving the functions of forgetting or remembering information. The output value of the sigmoid function lies in the range [0,1], where 0 represents complete discard and 1 represents complete retention. Besides the “gates”, the LSTM layer also has a crucial component called the cell state, which acts like a conveyor belt, transferring information from one cell to the next with minimal linear interactions with other parts. At each time step, the LSTM model determines the information to retain and forget, and the output for the current time step, based on the input and the memory state from the previous time step. Fig. 1 shows a schematic diagram of the LSTM layer operation process.
Fig. 1 Schematic diagram of the LSTM layer operation process.

The forget gate ft is a sigmoid function with the previous cell's hidden output ht−1 and the current cell's input xt as inputs, generating a value between [0,1] (which can be considered a probability) for each item in the previous cell's memory state Ct−1 to control the extent of forgetting the previous cell's state, as shown in eqn (1):

 
ft = σ(Wf[ht−1, xt] + bf) (1)
where Wf represents the forget gate's weights, bf represents the forget gate's bias, and σ represents the sigmoid function. The input gate it works with a tanh function to control the new information being added. The tanh function generates a new candidate vector, and the input gate generates a value between [0,1] for each item in the temporary cell state image file: d4ta04665j-t1.tif controlling how much new information is added. With the forget gate's output ft, controlling the degree of forgetting the previous cell, and the input gate's output it, controlling how much new information is added, the cell state is updated as shown in eqn (2) and (3):
 
it = σ(Wi[ht−1, xt] + bi) (2)
 
C̃t = tanh(WC[ht−1, xt] + bC) (3)
where Wi represents the input gate's weights, bi represents the input gate's bias, WC represents the cell state's weights, bC represents the cell state's bias, and σ represents the sigmoid function. Before the output gate, the current cell state Ct is calculated as shown in eqn (4):
 
Ct = ft × Ct−1 + it × C̃t (4)

The output gate ot controls how much of the current cell state is filtered. The cell state is activated, and the output gate generates a value between [0,1] for each item, controlling the degree of filtering the cell state, as shown in eqn (5) and (6):

 
ot = σ(Wo[ht−1, xt] + bo) (5)
 
ht = ot × tanh(Ct) (6)
where Wo represents the output gate's weights, bo represents the output gate's bias, and σ represents the sigmoid function.
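Eqn (1)–(6) can be collected into a single time-step function. The NumPy sketch below uses randomly initialized weights, with the 30 input descriptors and a hidden size of 150 for illustration; it is not the trained model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step following eqn (1)-(6).

    W and b hold weights/biases for the forget (f), input (i), candidate (c)
    and output (o) transforms; each weight has shape (hidden, hidden + n_features)
    and acts on the concatenation [h_{t-1}, x_t].
    """
    z = np.concatenate([h_prev, x_t])           # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])          # eqn (1): forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])          # eqn (2): input gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])      # eqn (3): candidate state
    c_t = f_t * c_prev + i_t * c_tilde          # eqn (4): cell-state update
    o_t = sigmoid(W["o"] @ z + b["o"])          # eqn (5): output gate
    h_t = o_t * np.tanh(c_t)                    # eqn (6): hidden output
    return h_t, c_t

rng = np.random.default_rng(0)
n_features, hidden = 30, 150                    # 30 descriptors, illustrative hidden size
W = {k: rng.normal(scale=0.1, size=(hidden, hidden + n_features)) for k in "fico"}
b = {k: np.zeros(hidden) for k in "fico"}
h, c = np.zeros(hidden), np.zeros(hidden)
h, c = lstm_step(rng.normal(size=n_features), h, c, W, b)
print(h.shape, c.shape)  # (150,) (150,)
```

Because the output gate and tanh are both bounded, every component of ht stays strictly inside (−1, 1), which is part of what stabilizes gradients compared with a plain RNN.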

In the model used in this study, a gated linear unit (glu) layer was defined,69 and an LSTM model was created, which included LSTM layers, glu layers, dropout layers, and linear layers. The parameters and hyperparameters of the model were set, the loss function and optimizer were defined, with mean squared error (MSE) used as the loss function and the Adam optimizer. Early stopping was employed to monitor the training process and evaluate the model's performance on the validation set to check for improvements. If improvement was not observed within a specified number of training iterations, the training was terminated early. During the training loop, the model underwent forward propagation, loss calculation, backpropagation, and optimization.
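A minimal PyTorch sketch of such an architecture and training loop, assuming one LSTM layer followed by a GLU, dropout, and a linear output head; the layer sizes, learning rate, and patience here are illustrative, not the tuned hyperparameters of this work:

```python
import torch
import torch.nn as nn

class LSTMRegressor(nn.Module):
    """Sketch of the described stack: LSTM -> GLU -> dropout -> linear."""
    def __init__(self, n_features=30, hidden=150, dropout=0.2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.glu = nn.GLU(dim=-1)                  # gated linear unit; halves the feature dim
        self.drop = nn.Dropout(dropout)
        self.fc = nn.Linear(hidden // 2, 1)
    def forward(self, x):                          # x: (batch, seq_len, n_features)
        out, _ = self.lstm(x)
        out = self.drop(self.glu(out[:, -1, :]))   # last time step -> gate -> dropout
        return self.fc(out).squeeze(-1)

model = LSTMRegressor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Dummy tensors in place of the descriptor data; early stopping monitors
# validation loss and halts after `patience` epochs without improvement.
x = torch.randn(64, 1, 30); y = torch.randn(64)
x_val = torch.randn(16, 1, 30); y_val = torch.randn(16)
best, patience, bad = float("inf"), 10, 0
for epoch in range(100):
    model.train(); opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward(); opt.step()
    model.eval()
    with torch.no_grad():
        val = loss_fn(model(x_val), y_val).item()
    if val < best - 1e-4:
        best, bad = val, 0
    else:
        bad += 1
        if bad >= patience:                        # early stopping
            break
print(round(best, 4))
```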

A detailed network structure diagram of the model used to predict PCE is provided in Fig. 2, which illustrates the backpropagation process of the LSTM. The main elements include network parameters (weights and biases), computation nodes, and the gradient accumulation process. The detailed explanations of each element are as follows. The lstm.weight_hh_l0, lstm.bias_hh_l0, and lstm.weight_ih_l0 are the parameters of the LSTM layer, and their shapes are (600, 150), (600), and (600, 30), respectively, indicating the dimensions of the parameter matrices. AccumulateGrad: this is a gradient accumulation node, indicating that the gradients corresponding to the parameters are incrementally accumulated during backpropagation. glu.linear1.weight and glu.linear1.bias: these are the weights and biases of the first fully connected linear layer, with shapes (150, 150) and (150), respectively. CudnnRnnBackward: this represents the gradient calculation of the RNN layer implemented using the cuDNN library in CUDA. SelectBackward and TBackward: these are backpropagation nodes for the select and transpose operations. AddmmBackward and SigmoidBackward: these represent the gradient calculation nodes for matrix multiplication and the sigmoid activation function. fc.weight and fc.bias: these are the weights and biases of the final fully connected layer (fc), with shapes (1, 150) and (1), respectively. MulBackward: this represents the backpropagation of the multiplication operation. NativeDropoutBackward: this represents the backpropagation of the dropout regularization layer. Ultimately, all these computations and gradient accumulations converge to a final output.


Fig. 2 LSTM network structure of the model used for PCE prediction.

For hyperparameter tuning, the grid search method was used to find the optimal hyperparameters, tuned separately for each of the four device performance parameters. After identifying the optimal model, it was evaluated using the MSE, the root mean squared error (RMSE), the mean absolute error (MAE), the Pearson correlation coefficient (r), and the coefficient of determination (R2), defined as follows:

 
MSE = (1/N)∑i(Ri − Pi)2 (7)
 
RMSE = √[(1/N)∑i(Ri − Pi)2] (8)
 
MAE = (1/N)∑i|Ri − Pi| (9)
 
r = ∑i(Ri − R̄)(Pi − P̄)/√[∑i(Ri − R̄)2·∑i(Pi − P̄)2] (10)
 
R2 = 1 − [(1/N)∑i(Ri − Pi)2]/var(Ri) (11)

Eqn (7)–(11) define the evaluation metrics, where N is the number of data points in the dataset; Ri and Pi represent the actual values and predicted values, respectively; R̄ and P̄ represent the means of the actual values and predicted values, respectively; and var(Ri) is the variance of the sample data. These metrics are used to discuss the accuracy of the trained models in predicting the performance of OSC devices.
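The metrics of eqn (7)–(11) can be computed directly; the actual and predicted values below are arbitrary example data, not results from this work:

```python
import numpy as np

def metrics(R, P):
    """Evaluation metrics of eqn (7)-(11) for actual values R and predictions P."""
    R, P = np.asarray(R, float), np.asarray(P, float)
    mse = np.mean((R - P) ** 2)                                     # eqn (7)
    rmse = np.sqrt(mse)                                             # eqn (8)
    mae = np.mean(np.abs(R - P))                                    # eqn (9)
    r = np.sum((R - R.mean()) * (P - P.mean())) / np.sqrt(
        np.sum((R - R.mean()) ** 2) * np.sum((P - P.mean()) ** 2))  # eqn (10)
    r2 = 1.0 - mse / np.var(R)                                      # eqn (11)
    return mse, rmse, mae, r, r2

R = [10.2, 12.5, 8.1, 15.3, 11.0]   # example "experimental" PCE values
P = [10.0, 12.9, 8.5, 14.8, 11.4]   # example predictions
mse, rmse, mae, r, r2 = metrics(R, P)
print(f"MSE={mse:.3f} RMSE={rmse:.3f} MAE={mae:.3f} r={r:.3f} R2={r2:.3f}")
```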

All model building and training were performed using PyTorch,70 which is a Python-based scientific computing package primarily designed to meet the needs of deep learning. It is one of the most popular tools in the field of deep learning. PyTorch offers a rich library of deep learning algorithms and a flexible design mechanism, supporting features such as automatic differentiation, dynamic computation graphs, and model visualization, enabling users to build and train models more easily and efficiently. All programming and execution were completed using PyTorch 1.12 and RDKit 2023.03.2 within the Python 3.9 environment on the Anaconda platform.

2.3 Design of new molecules

In the process of designing new OSC materials, the precise cutting and recombination of donor and acceptor molecules from the database enabled the creation of new molecular structures. Initially, donor and acceptor molecules from the database were segmented into three fundamental units: donor units (D), acceptor units (A), and π-spacer units (π). This segmentation is based on the chemical characteristics and functions of molecular structures, which aid in understanding and designing new organic photovoltaic materials. Donor units typically have low ionization potential, promoting electron donation; acceptor units usually have high electron affinity, facilitating electron acceptance; and π-spacer units provide the necessary conjugated structure, aiding in electron transport and charge separation. During this process, side chains and halogen atoms were deliberately excluded as independent units to maintain the integrity and representativeness of the skeleton. This segmentation strategy aims to extract and retain structural features crucial for photoelectric performance.

For donor molecules, 36 D, 22 π, and 33 A were obtained. For acceptor molecules, 44 D, 23 π, and 61 A were obtained. In recent years, researchers have been dedicated to exploring new molecular design and synthesis methods to improve the PCE and stability of OSCs. Among them, donor molecules with a D–π–A–π structure have been widely used in OSCs due to their broad absorption range, which helps in exciton dissociation and reduces electron and hole recombination, thereby improving PCE and charge carrier mobility.71 On the other hand, acceptor molecules with an A–π–D–π–A structure exhibit high design flexibility, allowing the tuning of optical absorption properties and energy levels by modifying the chemical structure, thereby optimizing device performance.72 After segmentation, donor molecules were combined according to the D–π–A–π format, resulting in 142,560 donor molecules, while acceptor molecules were symmetrically combined according to the A–π–D–π–A format, resulting in 61,732 symmetrical acceptor molecules. This design approach allows systematic exploration and generation of a large number of potential novel OSC materials.
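A sketch of the recombination step. Symmetric A–π–D–π–A acceptors mirror one A and one π on both sides, which with the 61 A, 23 π, and 44 D acceptor units above gives 61 × 23 × 44 = 61,732 combinations, matching the reported count; for D–π–A–π donors the two π positions are filled independently here, which is one plausible reading (the reported donor count suggests additional constraints were applied). Fragment names are placeholders, not real SMILES fragments:

```python
from itertools import product

# Hypothetical toy fragment libraries standing in for the extracted units.
D_units = ["D1", "D2", "D3"]
pi_units = ["p1", "p2"]
A_units = ["A1", "A2", "A3", "A4"]

# Symmetric A-pi-D-pi-A acceptors: one A and one pi chosen per molecule and
# mirrored on both sides, giving |A| * |pi| * |D| combinations.
acceptors = [(a, p, d, p, a) for a, p, d in product(A_units, pi_units, D_units)]

# D-pi-A-pi donors: the two pi positions are filled independently here.
donors = [(d, p1, a, p2) for d, p1, a, p2 in product(D_units, pi_units, A_units, pi_units)]

print(len(acceptors), len(donors))  # 24 and 48 for these toy fragment lists
```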

3 Results and discussion

3.1 Database analysis

The distribution of PCE, VOC, JSC, and FF in the database is shown in Fig. 3. The PCE values are primarily concentrated in the range of 3.00% to 8.00%, with a peak at 5.00–6.00%, where the count reaches 39. The maximum PCE value in the database is 18.50%. The JSC values mostly fall within the range of 8.00 mA cm−2 to 18.00 mA cm−2, with the highest count of 61 occurring in the 12.00–14.00 mA cm−2 range. The maximum JSC value in the database is 27.70 mA cm−2. For VOC, there is a very pronounced peak at 0.8–0.9 V, with a count of 185, indicating that the VOC values of most OSCs are very close to this level. The maximum VOC value in the database is 1.34 V. The FF values of most OSC devices are concentrated in the range of 55.00% to 75.00%, with the highest count of 79 samples falling within the 60.00–70.00% range. The maximum FF value in the database is 81.10%.
Fig. 3 Distribution of PCE, VOC, JSC, and FF in the database.

Heat maps analyzing the descriptors of input models for acceptor and donor molecules are shown in Fig. 4 and 5, respectively. In the heat maps, the color intensity represents the strength of the correlation, with red indicating a positive correlation, blue indicating a negative correlation, and deeper colors representing stronger correlations. The correlation coefficient values range from −1 (completely negative correlation) to +1 (completely positive correlation). In Heat Map_A, the correlation between acceptor descriptors and organic photovoltaic performance parameters is illustrated. The number of halogen groups in the acceptor (fr_halogen_A), the number of two or more rings in the acceptor (fr_bicyclic_A), and the number of ketone groups in the acceptor (fr_ketone_A) show strong positive correlations with PCE, and also notable positive correlations with JSC and FF, suggesting that these molecular features may significantly impact photovoltaic performance. In Heat Map_D, the correlation between donor descriptors and photovoltaic performance parameters is depicted. PCE and JSC have the strongest positive correlations with the number of rings contained in the donor molecule (RingCount_D). VOC does not show significant correlation with any donor or acceptor descriptors, with the highest correlation descriptor being the number of rings in the donor molecule (RingCount_D).


Fig. 4 Heatmap of acceptor descriptors and their correlations with PCE, VOC, JSC, and FF in the database.

Fig. 5 Heatmap of donor descriptors and their correlations with PCE, VOC, JSC, and FF in the database.

3.2 LSTM model results

In the research work focused on predicting and analyzing the performance of OSC materials, special attention was given to the distribution of PCE values to ensure that the models developed have good generalizability and accuracy. Initially, the molecular data were sorted according to their PCE values. To make the database division more representative, the sorted data were divided into 10 groups based on PCE values, ensuring that the training and test sets had a uniform distribution across the different PCE value ranges and that the test set reflected the overall data distribution. This approach aimed to evenly select samples from each PCE value range, reducing bias due to data unevenness. Then, within each group, 20% of the data were randomly sampled as the test set, with the remaining 80% used as the training set for subsequent model training. This method ensured that both the test and training sets maintained a uniform and consistent distribution across the entire PCE value range, which not only improved the effectiveness of model training but also ensured the model's generalization performance and reliability on unseen data. This data partitioning method is widely adopted in deep learning and machine learning to ensure data diversity and representativeness, thereby promoting a healthy model training process and enhancing the final model performance.73–75
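The stratified split described above can be sketched as follows, using synthetic PCE values in place of the 465-pair database; the exact 374/91 split in this work depends on how the per-group 20% fraction is rounded:

```python
import numpy as np

rng = np.random.default_rng(42)
pce = rng.uniform(1.0, 18.5, size=465)          # stand-in for the 465 PCE values

# Sort by PCE, cut into 10 equal-size groups, and sample 20% of each group
# as the test set so both sets cover the whole PCE range evenly.
order = np.argsort(pce)
groups = np.array_split(order, 10)
test_idx = np.concatenate([
    rng.choice(g, size=max(1, round(0.2 * len(g))), replace=False) for g in groups
])
train_idx = np.setdiff1d(order, test_idx)

print(len(train_idx), len(test_idx))            # roughly an 80/20 split
```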

After dividing the database into the training set (374 data points) and test set (91 data points), LSTM models were trained to predict the four device performance parameters: PCE, VOC, JSC, and FF. Using early stopping and grid search, the optimal model was identified while avoiding overfitting, resulting in a model with the best predictive performance and generalization ability. The hyperparameters of the LSTM model tuned for PCE, VOC, JSC, and FF are shown in Table S3. The MSE, RMSE, MAE, r, and R2 of the tuned LSTM model for these parameters are shown in Table 1. A smaller MSE value indicates higher prediction accuracy of the model. Since RMSE shares the same units as the prediction target, its results are easier to interpret. MAE is the average absolute difference between predicted and actual values. Compared to MSE or RMSE, MAE is less sensitive to outliers because it does not square the errors, thus reducing their impact on the overall error. MAE provides an intuitive measure of the magnitude of prediction errors, with smaller values indicating more accurate predictions. The correlation coefficient r measures the strength and direction of the linear relationship between two variables. In regression tasks, it can be used to assess the degree of correlation between predicted and actual values, ranging from −1 to 1, with values close to 1 or −1 indicating strong correlation and values close to 0 indicating no correlation. R2 reflects the goodness of fit of the model predictions to actual values, crucial in regression models. It is calculated from the ratio of the prediction error to the variance of the original data and can be interpreted as the proportion of the variance explained by the model. R2 ranges from 0 to 1, with values closer to 1 indicating higher explanatory power and better predictive performance.
For PCE, the high r values of 0.9446 in the training set and 0.9179 in the test set indicate a strong correlation between observed and predicted PCE values, with R2 values of 0.8916 in the training set and 0.8414 in the test set indicating good model accuracy. Low values of RMSE, MAE, and MSE further demonstrate the model's excellent precision. For JSC, the accuracy is similar to that of PCE, although there is a slight drop in precision, yet it still shows excellent predictive capability. The predictive ability for FF and VOC is lower than that for PCE and JSC but still performs well. Compared to previous work by other researchers, who used the RF model to predict PCE with an R2 level close to 0.7 and an r level around 0.8,36,37 this study demonstrates superior results.

Table 1 MSE, RMSE, MAE, r, and R2 of the LSTM model for PCE, VOC, JSC, and FF predictions
Device parameters Evaluation metrics Training set value Test set value
PCE r 0.9446 0.9179
R2 0.8916 0.8414
RMSE 1.4815 1.8105
MAE 1.0434 1.4189
MSE 2.1949 3.2778
JSC r 0.9438 0.9040
R2 0.8885 0.8138
RMSE 2.2724 3.0389
MAE 1.6955 2.2719
MSE 5.1639 9.2346
VOC r 0.7190 0.7239
R2 0.5108 0.5159
RMSE 0.0903 0.0981
MAE 0.0625 0.0745
MSE 0.0082 0.0096
FF r 0.7949 0.7801
R2 0.6235 0.5937
RMSE (%) 7.9367 8.4015
MAE (%) 6.0239 6.5947
MSE (%2) 58.3198 70.5858


The scatter plots in Fig. 6 compare experimental and predicted values for the training and test sets. Blue triangles represent training-set data points and red circles represent test-set data points; the x-axis denotes experimental values and the y-axis predicted values. The blue and red fitted lines show the fit between predicted and experimental values for the training and test sets, respectively; points falling along a line with a slope close to 1 indicate accurate predictions. Each plot is labelled with the R2 and r values of both sets, which quantify the model's performance while providing a visual picture of the relationship between predicted and experimental values. The close alignment of training- and test-set performance across the four plots indicates that the model does not overfit and generalizes well to new data. The high R2 and r values for all four parameters, together with the close clustering of points around the best-fit line, particularly in the PCE and JSC plots, demonstrate the model's strong predictive capability.


Fig. 6 Prediction of photovoltaic performance parameters PCE, VOC, JSC and FF of OSCs using the LSTM model on the training set (374 data points) and test set (91 data points). Blue indicates the prediction results on the training set and red indicates the prediction results on the test set. The corresponding R2 and r are given in the upper left corner. The blue and red regions indicate the error ranges of the corresponding fitted lines.

3.3 Validation of prediction accuracy and generalization performance

To confirm the model's accuracy, five donor–acceptor pairs were randomly selected from the database and their experimental values compared with the model predictions. Table 2 shows the accuracy verification results for PCE: for PM6:L8-BO,76 the experimental value is 18.50% and the prediction 16.88%; for PB[N][F]:Y6,77 14.10% versus 13.29%; for PM6:Y18,78 16.02% versus 16.28%; for PTB7-Th:DTC-F-F,79 7.53% versus 6.67%; and for PBDB-T:sp-mOEh-ITIC,80 6.44% versus 6.40%. The absolute error, i.e. the absolute difference between predicted and experimental values, is 1.62%, 0.81%, 0.26%, 0.86%, and 0.04% for these pairs, respectively. These small absolute errors indicate that the trained LSTM model has high predictive accuracy.
Table 2 PCE predictions for five donor–acceptor pairs within the database using the trained LSTM model
D:A Experimental PCE (%) Predictive PCE (%) Absolute error (%)
PM6:L8-BO 18.50 16.88 1.62
PB[N][F]:Y6 14.10 13.29 0.81
PM6:Y18 16.02 16.28 0.26
PTB7-Th:DTC-F-F 7.53 6.67 0.86
PBDB-T:sp-mOEh-ITIC 6.44 6.40 0.04


To validate the model's generalization ability, five reported donor–acceptor pairs outside the database were selected as shown in Table 3, including D18:L8-BO,81 PTQ10:ITIC-4F,82 PTB7-Th:Y6,83 PM6:ID-C6Ph-4F84 and PffBT4T-2OD:P(4CF8CH-PDI-TT).85 The PCE prediction model was used to predict these donor–acceptor pairs, and the absolute errors were 0.94%, 0.73%, 1.73%, 0.48%, and 0.67%, respectively, indicating that the trained model has good generalization ability. The prediction results of the VOC, JSC, and FF for the five donor–acceptor pairs both inside and outside the database are provided in Tables S4 and S5. The validation results indicate that the trained model has high accuracy in predicting the performance parameters of OSC devices and exhibits good generalization capability.

Table 3 PCE predictions for five donor–acceptor pairs outside the database using the trained LSTM model
D:A Experimental PCE (%) Predictive PCE (%) Absolute error (%)
D18:L8-BO 16.30 15.36 0.94
PTQ10:ITIC-4F 11.25 11.98 0.73
PTB7-Th:Y6 11.00 12.73 1.73
PM6:ID-C6Ph-4F 10.75 10.27 0.48
PffBT4T-2OD:P(4CF8CH-PDI-TT) 3.43 4.10 0.67


3.4 SHAP analysis

To explore the importance of the input descriptors for PCE, VOC, JSC, and FF, SHAP analysis was performed on the optimal model for each device performance parameter. Fig. 7 shows the SHAP importance analysis of the molecular structure descriptors for PCE in the optimal model. Fig. 7(a) is a bar chart of the mean absolute SHAP values of the features, indicating their average impact on the model output; the larger the mean value, the more significant the feature. Fig. 7(b) is a scatter plot in which each dot represents one input data point and its color encodes the feature value, with red typically indicating high values and blue low values. The position of a dot along the x-axis gives its SHAP value: positive values mean the feature pushes the model prediction up, while negative values push it down. SHAP plots for the VOC, JSC, and FF models are shown in Fig. S1–S3 in the ESI.
Fig. 7 SHAP importance analysis of the 30 molecular structure descriptors used in the LSTM model for PCE prediction. (a) Shows a bar chart, (b) shows a scatter plot.

From Fig. 7(a), the eight descriptors with the most significant impact on PCE prediction are fr_bicyclic_A (number of two or more rings in the acceptor), NumRotatableBonds_A (number of rotatable bonds in the acceptor molecule), fr_unbrch_alkane_A (number of unbranched aliphatic groups in the acceptor), NumAromaticCarbocycles_A (number of aromatic carbocyclic rings in the acceptor molecule), fr_halogen_A (number of halogen groups in the acceptor molecule), NumAliphaticCarbocycles_A (number of alicyclic alkyl rings in the acceptor molecule), fr_unbrch_alkane_D (number of unbranched aliphatic groups in the donor molecule), and fr_halogen_D (number of halogen groups in the donor molecule). Fig. 7(b) shows that fr_bicyclic_A, NumRotatableBonds_A, fr_halogen_A, NumAliphaticCarbocycles_A, and fr_halogen_D are positively correlated with PCE, while fr_unbrch_alkane_A, NumAromaticCarbocycles_A, and fr_unbrch_alkane_D are negatively correlated with PCE. The descriptor with the most significant impact on JSC is fr_bicyclic_A, which shows a clear positive correlation. Suthar's study pointed out that the number of bicyclic structures in a molecule has a significant positive impact on both PCE and JSC,86 consistent with the results of this study. Zhang, He and co-workers reported that, by changing the linear configuration of the alkyl substituents on the thiophene ring and using the polymer donor PBDB-TF, the PCE of BTIC-TCl-b with branched side chains reached 16.17%, significantly higher than that of BTIC-TCl-l with unbranched aliphatic chains.87 That result indirectly confirms the negative effect of fr_unbrch_alkane_A on PCE. For VOC, the most significant descriptor is fr_allylic_oxid_A (number of allylic oxide groups in the acceptor molecule), showing a negative correlation; for FF, it is NumRotatableBonds_A, showing a positive correlation.

3.5 New molecule design and device performance prediction

In this study, the molecules of the 547 collected donor–acceptor pairs were segmented into donor units, π-spacer units, and acceptor units, ignoring the effects of side chains and halogen atoms on the segmentation units. Virtual bonds were added at each cutting point to serve as connection points for fragment recombination, and the distinct molecular fragments were screened out. For donor molecules, 36 donor units, 22 π-spacer units, and 33 acceptor units were obtained; for acceptor molecules, 44 donor units, 23 π-spacer units, and 61 acceptor units were obtained. All fragments of the donor and acceptor molecules are shown in Fig. S4 and S5. By connecting the virtual bonds at the cutting points, 142,560 (33 × 36 × 12 × 10) donor molecules with D–π–A–π structures and 61,732 (44 × 23 × 61) symmetrical acceptor molecules with A–π–D–π–A structures were designed. Pairing every newly designed donor with every acceptor yielded 8,800,513,920 (142,560 × 61,732) donor–acceptor pairs. The molecular structure descriptors required by the machine learning model were calculated with the RDKit toolkit, and the trained model was used to predict the PCE, VOC, JSC, and FF of the newly designed donor–acceptor pairs.
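The combinatorial recombination step amounts to a Cartesian product over the fragment pools. The following sketch illustrates the mechanism with tiny hypothetical pools (placeholder fragment labels stand in for the real SMILES fragments with attachment points; the actual pool sizes and the screening that yields the 33 × 36 × 12 × 10 count are as described above):

```python
from itertools import product

# Tiny illustrative fragment pools; in the actual workflow the pools are
# the units in Fig. S4 and S5, and each fragment carries attachment points.
donor_units    = ["D1", "D2", "D3"]
pi_spacers     = ["pi1", "pi2"]
acceptor_units = ["A1", "A2"]

# D-pi-A-pi donor backbones: one choice from each pool per position
donors = [f"{d}-{p1}-{a}-{p2}"
          for d, p1, a, p2 in product(donor_units, pi_spacers,
                                      acceptor_units, pi_spacers)]

# Symmetric A-pi-D-pi-A acceptors: choosing (end group, spacer, core)
# fixes both halves, so a single triple product enumerates them all
acceptors = [f"{a}-{p}-{d}-{p}-{a}"
             for a, p, d in product(acceptor_units, pi_spacers, donor_units)]

# Every donor paired with every acceptor gives the candidate D:A library
pairs = [(dm, am) for dm in donors for am in acceptors]
print(len(donors), len(acceptors), len(pairs))  # 24 12 288
```

The library size therefore grows as the product of the pool sizes, which is why modest fragment counts already yield billions of candidate pairs.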

After prediction, high-performance donor–acceptor pairs were selected, and pairs already present in the database were removed. For PCE, 7632 donor–acceptor pairs with PCE greater than 18.00% were obtained, with the highest PCE being 18.52%; five pairs have PCE greater than 18.50%, and their structures are shown in Fig. 8. Each donor molecule contains halogen atoms, and each acceptor molecule has a fused-ring structure and also contains halogen atoms, consistent with the SHAP analysis. For VOC, 888 pairs with VOC greater than 1.40 V were obtained, with the highest being 1.43 V. For JSC, 17,767 pairs with JSC greater than 25.50 mA cm−2 were obtained, with the highest being 25.95 mA cm−2. For FF, 150 pairs with FF greater than 81.00% were obtained, with the highest being 82.22%. The structures of the pairs with the highest predicted values for each parameter are shown in Fig. 9.


Fig. 8 Five designed donor–acceptor pairs with PCE > 18.50%.

Fig. 9 Donor–acceptor pairs with the highest predicted values of VOC, JSC, and FF.

The SMILES strings of the donor–acceptor pairs and the corresponding prediction results for PCE, VOC, JSC, and FF are stored in the attached files. pre_PCE.csv contains the SMILES strings and PCE prediction results, pre_Voc.csv contains the SMILES strings and VOC prediction results, pre_Jsc.csv contains the SMILES strings and JSC prediction results, and pre_FF.csv contains the SMILES strings and FF prediction results.
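Screening candidates from these prediction files reduces to reading the CSV and filtering on the predicted value. A minimal sketch using only the standard library is shown below; the column names (`donor_smiles`, `acceptor_smiles`, `pred_PCE`) and the inline data are hypothetical stand-ins for the actual layout of pre_PCE.csv:

```python
import csv
import io

# Synthetic stand-in for pre_PCE.csv (placeholder labels instead of full
# SMILES strings); the real file stores SMILES and predicted PCE per pair.
raw = io.StringIO(
    "donor_smiles,acceptor_smiles,pred_PCE\n"
    "D_a,A_x,18.52\n"
    "D_b,A_y,17.40\n"
    "D_c,A_z,18.11\n"
)

reader = csv.DictReader(raw)
# Keep only candidate pairs above the 18.00% screening threshold,
# sorted from best to worst predicted PCE
hits = [row for row in reader if float(row["pred_PCE"]) > 18.00]
hits.sort(key=lambda row: float(row["pred_PCE"]), reverse=True)
print([row["pred_PCE"] for row in hits])  # ['18.52', '18.11']
```

Replacing `io.StringIO(raw)` with `open("pre_PCE.csv")` (and matching the file's real column names) would apply the same filter to the full prediction set.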

4 Conclusions

In this study, a database of 547 donor–acceptor pairs based on NFAs, covering OSC device performance and molecular structures, was constructed from published articles. The molecular structures were converted to SMILES strings and then to molecular structure descriptors; after screening, a database of 465 data sets with 30 structural descriptors was obtained. This database was divided into training and test sets and input into the constructed LSTM model. For each device performance parameter, the optimal hyperparameters were determined by grid search, yielding reliable predictions for PCE, VOC, JSC, and FF. The PCE prediction on the test set achieved an R2 of 0.84 and a Pearson correlation coefficient r of 0.92, giving a reliable predictive model. SHAP analysis was then used to assess the importance of the input descriptors, revealing that the number of rotatable chemical bonds and the number of two or more rings in the acceptor molecules have a significant positive correlation with PCE. Subsequently, the donor and acceptor molecules in the database were fragmented into donor units, π-bridge units, and acceptor units, from which 142,560 D–π–A–π structured donor molecules and 61,732 symmetric A–π–D–π–A structured NFA molecules were constructed. Pairing every donor with every acceptor generated 8,800,513,920 donor–acceptor pairs, which were converted into molecular structure descriptors and input into the LSTM model to predict PCE, VOC, JSC, and FF. As a result, 7632 donor–acceptor pairs with predicted PCE greater than 18.00% were identified, five of which have predicted PCE greater than 18.50%, with the highest reaching 18.52%.
This work significantly reduces the experimental costs and development period for new materials through extensive molecular design and virtual screening facilitated by the LSTM model, accelerating the discovery and application of novel OSC materials.

Data availability

The data that support the findings of this study are available within the ESI.

Author contributions

Long-Fei Lv: data curation (equal); investigation (equal); visualization (equal); writing – original draft (equal). Cai-Rong Zhang: conceptualization (equal); formal analysis (equal); funding acquisition (equal); investigation (equal); methodology (equal); project administration (equal); resources (equal); supervision (equal); validation (equal); writing – review & editing (equal). Rui Cao: data curation (equal); investigation (equal); visualization (equal); writing – original draft (equal). Xiao-Meng Liu: formal analysis (equal). Mei-Ling Zhang: formal analysis (equal). Ji-Jun Gong: formal analysis (equal). Zi-Jiang Liu: formal analysis (equal). You-Zhi Wu: formal analysis (equal). Hong-Shan Chen: formal analysis (equal); resources (equal).

Conflicts of interest

The authors have no conflicts to disclose.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 12264025). The authors are grateful for the help of Mr Miao Zhao and Ming Li and Miss Hai-Yuan Yu, Jin-Hong Li, Li Ma and Yu-Tong Ren.

Notes and references

  1. N. Armaroli and V. Balzani, Angew. Chem., Int. Ed., 2006, 46, 52–66 CrossRef .
  2. K. A. Mazzio and C. K. Luscombe, Chem. Soc. Rev., 2015, 44, 78–90 RSC .
  3. P. Cheng, G. Li, X. Zhan and Y. Yang, Nat. Photonics, 2018, 12, 131–142 CrossRef CAS .
  4. Y. Cui, P. Zhu, X. Liao and Y. Chen, J. Mater. Chem. C, 2020, 8, 15920–15939 RSC .
  5. O. Inganäs, Adv. Mater., 2018, 30, 1800388 CrossRef .
  6. L. Lu, T. Zheng, Q. Wu, A. M. Schneider, D. Zhao and L. Yu, Chem. Rev., 2015, 115, 12666–12731 CrossRef CAS PubMed .
  7. H. Chen, Y. Zou, H. Liang, T. He, X. Xu, Y. Zhang, Z. Ma, J. Wang, M. Zhang, Q. Li, C. Li, G. Long, X. Wan, Z. Yao and Y. Chen, Sci. China: Chem., 2022, 65, 1362–1373 CrossRef CAS .
  8. H. Liu, Y. Geng, Z. Xiao, L. Ding, J. Du, A. Tang and E. Zhou, Adv. Mater., 2024 DOI:10.1002/adma.202404660 .
  9. J. Hachmann, R. Olivares-Amaya, A. Jinich, A. L. Appleton, M. A. Blood-Forsythe, L. R. Seress, C. Román-Salgado, K. Trepte, S. Atahan-Evrenk, S. Er, S. Shrestha, R. Mondal, A. Sokolov, Z. Bao and A. Aspuru-Guzik, Energy Environ. Sci., 2014, 7, 698–704 RSC .
  10. I. Y. Kanal, S. G. Owens, J. S. Bechtel and G. R. Hutchison, J. Phys. Chem. Lett., 2013, 4, 1613–1623 CrossRef CAS .
  11. A. Mishra and P. Bäuerle, Angew. Chem., Int. Ed., 2012, 51, 2020–2067 CrossRef CAS PubMed .
  12. M. C. Scharber, D. Mühlbacher, M. Koppe, P. Denk, C. Waldauf, A. J. Heeger and C. J. Brabec, Adv. Mater., 2006, 18, 789–794 CrossRef CAS .
  13. T. Yagi, R. Satoh, Y. Yamada, H. Kang, H. Miyao and K. Sawa, J. Soc. Inf. Disp., 2012, 20, 526–532 CrossRef CAS .
  14. X. Jiaxuan, Cluster Comput., 2018, 22, 4829–4835 CrossRef .
  15. Y. Q. Pan and G. Y. Sun, ChemSusChem, 2019, 12, 4570–4600 CrossRef CAS .
  16. C. Yan, S. Barlow, Z. Wang, H. Yan, A. K. Y. Jen, S. R. Marder and X. Zhan, Nat. Rev. Mater., 2018, 3, 18003 CrossRef CAS .
  17. J. Zhang, H. S. Tan, X. Guo, A. Facchetti and H. Yan, Nat. Energy, 2018, 3, 720–731 CrossRef CAS .
  18. L. Ma, C. R. Zhang, M. L. Zhang, X. M. Liu, J. J. Gong, Y. H. Chen, Z. J. Liu, Y. Z. Wu and H. S. Chen, Adv. Theory Simul., 2023, 7, 2300624 CrossRef .
  19. H.-Y. Yu, C.-R. Zhang, M.-L. Zhang, X.-M. Liu, J.-J. Gong, Z.-J. Liu, Y.-Z. Wu and H.-S. Chen, New J. Chem., 2022, 46, 20204–20216 RSC .
  20. M. Zhao, C. R. Zhang, M. L. Zhang, X. M. Liu, J. J. Gong, Z. J. Liu, Y. H. Chen and H. S. Chen, Int. J. Quantum Chem., 2022, 123, e27047 CrossRef .
  21. Z. Gan, L. Wang, J. Cai, C. Guo, C. Chen, D. Li, Y. Fu, B. Zhou, Y. Sun, C. Liu, J. Zhou, D. Liu, W. Li and T. Wang, Nat. Commun., 2023, 14, 6297 CrossRef CAS PubMed .
  22. J. Song, C. Zhang, C. Li, J. Qiao, J. Yu, J. Gao, X. Wang, X. Hao, Z. Tang, G. Lu, R. Yang, H. Yan and Y. Sun, Angew. Chem., Int. Ed., 2024, 63, e202404297 CrossRef CAS PubMed .
  23. P. Wang, J. Zhang, D. Luo, J. Xue, L. Zhang, H. Mao, Y. Wang, C. Yu, W. Ma and Y. Chen, Adv. Funct. Mater., 2024 DOI:10.1002/adfm.202402680 .
  24. Q. Xie, X. Deng, C. Zhao, J. Fang, D. Xia, Y. Zhang, F. Ding, J. Wang, M. Li, Z. Zhang, C. Xiao, X. Liao, L. Jiang, B. Huang, R. Dai and W. Li, Angew. Chem., Int. Ed., 2024, 63, e202403015 CrossRef CAS PubMed .
  25. J. Wang, Z. Zheng, P. Bi, Z. Chen, Y. Wang, X. Liu, S. Zhang, X. Hao, M. Zhang, Y. Li and J. Hou, Natl. Sci. Rev., 2023, 10, nwad085 CrossRef CAS PubMed .
  26. H. Tian, Y. Ni, W. Zhang, Y. Xu, B. Zheng, S. Y. Jeong, S. Wu, Z. Ma, X. Du, X. Hao, H. Y. Woo, L. Huo, X. Ma and F. Zhang, Energy Environ. Sci., 2024, 17, 5173–5182 RSC .
  27. W. Xu, H. Tian, Y. Ni, Y. Xu, L. Zhang, F. Zhang, S. Wu, S. Y. Jeong, T. Huang, X. Du, X. Li, Z. Ma, H. Young Woo, J. Zhang, X. Ma, J. Wang and F. Zhang, Chem. Eng. J., 2024, 493, 152558 CrossRef CAS .
  28. L. Zhang, M. Zhang, Y. Ni, W. Xu, H. Zhou, S. Ke, H. Tian, S. Y. Jeong, H. Y. Woo, W.-Y. Wong, X. Ma and F. Zhang, ACS Mater. Lett., 2024, 6, 2964–2973 CrossRef CAS .
  29. H. Zhou, Y. Sun, M. Zhang, Y. Ni, F. Zhang, S. Y. Jeong, T. Huang, X. Li, H. Y. Woo, J. Zhang, W. Y. Wong, X. Ma and F. Zhang, Sci. Bull., 2024 DOI:10.1016/j.scib.2024.07.027 .
  30. X. Cai, Y. Chen, B. Sun, J. Chen, H. Wang, Y. Ni, L. Tao, H. Wang, S. Zhu, X. Li, Y. Wang, J. Lv, X. Feng, S. A. T. Redfern and Z. Chen, Nanoscale, 2019, 11, 8260–8269 RSC .
  31. C. Chen, Y. Zuo, W. Ye, X. Li, Z. Deng and S. P. Ong, Adv. Energy Mater., 2020, 10, 1903242 CrossRef CAS .
  32. Y. Chen, Z. Lao, B. Sun, X. Feng, S. A. T. Redfern, H. Liu, J. Lv, H. Wang and Z. Chen, ACS Mater. Lett., 2019, 1, 375–382 CrossRef CAS .
  33. S.-S. Wan, X. Xu, Z. Jiang, J. Yuan, A. Mahmood, G.-Z. Yuan, K.-K. Liu, W. Ma, Q. Peng and J.-L. Wang, J. Mater. Chem. A, 2020, 8, 4856–4867 RSC .
  34. A. Mahmood and J.-L. Wang, Energy Environ. Sci., 2021, 14, 90–105 RSC .
  35. J.-H. Li, C.-R. Zhang, M.-L. Zhang, X.-M. Liu, J.-J. Gong, Y.-H. Chen, Z.-J. Liu, Y.-Z. Wu and H.-S. Chen, Org. Electron., 2024, 125, 106988 CrossRef CAS .
  36. M. Li, C. R. Zhang, M. L. Zhang, J. J. Gong, X. M. Liu, Y. H. Chen, Z. J. Liu, Y. Z. Wu and H. S. Chen, Phys. Status Solidi A, 2024, 221, 2400008 CrossRef CAS .
  37. C.-R. Zhang, M. Li, M. Zhao, J.-J. Gong, X.-M. Liu, Y.-H. Chen, Z.-J. Liu, Y.-Z. Wu and H.-S. Chen, J. Appl. Phys., 2023, 134, 153104 CrossRef CAS .
  38. H. Sahu and H. Ma, J. Phys. Chem. Lett., 2019, 10, 7277–7284 CrossRef CAS PubMed .
  39. H. Sahu, W. Rao, A. Troisi and H. Ma, Adv. Energy Mater., 2018, 8, 1801032 CrossRef .
  40. H. Sahu, F. Yang, X. Ye, J. Ma, W. Fang and H. Ma, J. Mater. Chem. A, 2019, 7, 17480–17488 RSC .
  41. G. Han and Y. Yi, Angew. Chem., Int. Ed., 2022, 61, e202213953 CrossRef CAS PubMed .
  42. S. Nagasawa, E. Al-Naamani and A. Saeki, J. Phys. Chem. Lett., 2018, 9, 2639–2646 CrossRef CAS .
  43. W. Sun, Y. Zheng, K. Yang, Q. Zhang, A. A. Shah, Z. Wu, Y. Sun, L. Feng, D. Chen, Z. Xiao, S. Lu, Y. Li and K. Sun, Sci. Adv., 2019, 5, eaay4275 CrossRef CAS .
  44. T. W. David, H. Anizelli, T. J. Jacobsson, C. Gray, W. Teahan and J. Kettle, Nano Energy, 2020, 78, 105342 CrossRef CAS .
  45. Y. Wu, J. Guo, R. Sun and J. Min, npj Comput. Mater., 2020, 6, 120 CrossRef CAS .
  46. J. Huang, B. Li, J. Zhu and J. Chen, Multimed. Tool. Appl., 2017, 76, 20231–20247 CrossRef .
  47. H. Li, P. He, S. Wang, A. Rocha, X. Jiang and A. C. Kot, IEEE Trans. Inf. Forensics Secur., 2018, 13, 2639–2652 Search PubMed .
  48. Y. Liu, K. Wang, C. Zong and K.-Y. Su, Comput. Speech Lang., 2019, 55, 216 CrossRef .
  49. T. Lu, Y. Wang, R. Xu, W. Liu, W. Fang and Y. Zhang, Multimed. Tool. Appl., 2022, 81, 6305–6330 CrossRef .
  50. A. Majumdar, R. Singh and M. Vatsa, IEEE Trans. Pattern Anal. Mach. Intell., 2017, 39, 1273–1280 Search PubMed .
  51. R. Wadawadagi and V. Pagi, Artif. Intell. Rev., 2020, 53, 6155–6195 CrossRef .
  52. Z. Zhang, P. Luo, C. C. Loy and X. Tang, IEEE Trans. Pattern Anal. Mach. Intell., 2016, 38, 918–930 Search PubMed .
  53. D. Weininger, J. Chem. Inf. Comput. Sci., 1988, 28, 31–36 CrossRef CAS .
  54. S.-P. Peng and Y. Zhao, J. Chem. Inf. Model., 2019, 59, 4993–5001 CrossRef CAS PubMed .
  55. S.-P. Peng, X.-Y. Yang and Y. Zhao, Int. J. Mol. Sci., 2021, 22, 9099 CrossRef CAS PubMed .
  56. G. J. Moore, O. Bardagot and N. Banerji, Adv. Theory Simul., 2022, 5, 2100511 CrossRef CAS .
  57. S. Hochreiter and J. Schmidhuber, Neural Comput., 1997, 9, 1735–1780 CrossRef CAS PubMed .
  58. A. Datta, S. Sen and Y. Zick, Presented in Part at the 2016 IEEE Symposium on Security and Privacy (SP), 2016 Search PubMed .
  59. S. Lipovetsky and M. Conklin, Appl. Stoch Model Bus. Ind., 2001, 17, 319–330 CrossRef .
  60. M. T. Ribeiro, S. Singh and C. Guestrin, Presented in Part at the Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016 Search PubMed .
  61. E. Štrumbelj and I. Kononenko, Knowl. Inf. Syst., 2013, 41, 647–665 CrossRef .
  62. O. D. Suarez, S. Bach, A. Binder, G. Montavon, F. Klauschen, K.-R. Müller and W. Samek, PLoS One, 2015, 10, e0130140 CrossRef .
  63. A. Shrikumar, P. Greenside and A. Kundaje, arXiv, 2017, preprint, arXiv:1704.02685,  DOI:10.48550/arXiv.1704.02685.
  64. RDKit: Open-source Cheminformatics, https://www.rdkit.org/, accessed March 25, 2024 Search PubMed.
  65. G. Long, A. Li, R. Shi, Y. C. Zhou, X. Yang, Y. Zuo, W. R. Wu, U. S. Jeng, Y. Wang, X. Wan, P. Shen, H. L. Zhang, T. Yan and Y. Chen, Adv. Electron. Mater., 2015, 1, 1500217 CrossRef .
  66. G. Long, R. Shi, Y. Zhou, A. Li, B. Kan, W.-R. Wu, U. S. Jeng, T. Xu, T. Yan, M. Zhang, X. Yang, X. Ke, L. Sun, A. Gray-Weale, X. Wan, H. Zhang, C. Li, Y. Wang and Y. Chen, J. Phys. Chem. C, 2017, 121, 5864–5870 CrossRef CAS .
  67. G. Long, B. Wu, A. Solanki, X. Yang, B. Kan, X. Liu, D. Wu, Z. Xu, W. R. Wu, U. S. Jeng, J. Lin, M. Li, Y. Wang, X. Wan, T. C. Sum and Y. Chen, Adv. Energy Mater., 2016, 6, 1600961 CrossRef .
  68. Y. Zhou, G. Long, A. Li, A. Gray-Weale, Y. Chen and T. Yan, J. Mater. Chem. C, 2018, 6, 3276–3287 RSC .
  69. Y. N. Dauphin, A. Fan, M. Auli and D. Grangier, arXiv, 2016, preprint, arXiv:1612.08083,  DOI:10.48550/arXiv.1612.08083.
  70. A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Köpf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai and S. Chintala, arXiv, 2019, preprint, arXiv:1912.01703,  DOI:10.48550/arXiv.1912.01703.
  71. S. E. Ozturk, R. Isci, S. Faraji, B. Sütay, L. A. Majewski and T. Ozturk, Eur. Polym. J., 2023, 191, 112028 CrossRef CAS .
  72. H. Gao, C. Han, X. Wan and Y. Chen, Ind. Chem. Mater., 2023, 1, 60–78 RSC .
  73. H. A. Afan, A. Yafouz, A. H. Birima, A. N. Ahmed, O. Kisi, B. Chaplot and A. El-Shafie, Nat. Hazards, 2022, 112, 1527–1545 CrossRef .
  74. C. Lu, W. Ma, R. Wang, S. Deng and Y. Wu, Complex Intell. Systems, 2022, 9, 2081–2099 CrossRef .
  75. J. Sadaiyandi, P. Arumugam, A. K. Sangaiah and C. Zhang, Electronics, 2023, 12, 4423 CrossRef .
  76. Z. Chen, Q. Li, Y. Jiang, H. Lee, T. P. Russell and Y. Liu, J. Mater. Chem. A, 2022, 10, 16163–16170 RSC .
  77. Z. Cao, J. Chen, S. Liu, X. Jiao, S. Ma, J. Zhao, Q. Li, Y.-P. Cai and F. Huang, ACS Appl. Mater. Interfaces, 2020, 12, 9545–9554 CrossRef CAS PubMed .
  78. C. Zhang, J. Yuan, K. L. Chiu, H. Yin, W. Liu, G. Zheng, J. K. W. Ho, S. Huang, G. Yu, F. Gao, Y. Zou and S. K. So, J. Mater. Chem. A, 2020, 8, 8566–8574 RSC .
  79. J. Liao, P. Zheng, Z. Cai, S. Shen, G. Xu, H. Zhao and Y. Xu, Org. Electron., 2021, 89, 106026 CrossRef CAS .
  80. M. j. Sung, B. Park, J. Y. Choi, J. Kim, C. Sun, H. Kang, S. Kwon, S.-Y. Jang, Y.-H. Kim, K. Lee and S.-K. Kwon, Dyes Pigm., 2020, 180, 108369 CrossRef CAS .
  81. D. Li, N. Deng, Y. Fu, C. Guo, B. Zhou, L. Wang, J. Zhou, D. Liu, W. Li, K. Wang, Y. Sun and T. Wang, Adv. Mater., 2022, 35, 2208211 CrossRef PubMed .
  82. F. Feaugas, T. Nicolini, G. H. Roche, L. Hirsch, O. J. Dautel and G. Wantz, Sol. RRL, 2022, 7, 2200815 CrossRef .
  83. Y. Wang, M. B. Price, R. S. Bobba, H. Lu, J. Xue, Y. Wang, M. Li, A. Ilina, P. A. Hume, B. Jia, T. Li, Y. Zhang, N. J. L. K. Davis, Z. Tang, W. Ma, Q. Qiao, J. M. Hodgkiss and X. Zhan, Adv. Mater., 2022, 34, 2206717 CrossRef CAS PubMed .
  84. P. Wang, F. Bi, Y. Li, C. Han, N. Zheng, S. Zhang, J. Wang, Y. Wu and X. Bao, Adv. Funct. Mater., 2022, 32, 2200166 CrossRef CAS .
  85. L. Wang, M. Hu, Y. Zhang, Z. Yuan, Y. Hu, X. Zhao and Y. Chen, Polymer, 2022, 255, 125114 CrossRef CAS .
  86. R. Suthar, A. T and S. Karak, J. Mater. Chem. A, 2023, 11, 22248–22258 RSC .
  87. P. Tan, C. Cao, Y. Cheng, H. Chen, H. Lai, Y. Zhu, L. Han, J. Qu, N. Zheng, Y. Zhang and F. He, J. Mater. Chem. A, 2023, 11, 9538–9545 RSC .

Footnote

Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d4ta04665j

This journal is © The Royal Society of Chemistry 2024