### Evaluation index

The AF detection task in this paper is binary: 0 denotes a non-AF sample and 1 denotes an AF sample. Therefore, binary classification metrics are used to evaluate the results; they serve both as the criterion for selecting the hyperparameters of the network model and as the basis for comparison with other studies. The final loss on the test set is also reported when results are compared.

In this paper, the cross-entropy function is used as the loss function to compute the loss during training and on the final test set. The loss for a single sample is given by Eq. (4).

$$L = -[y\log \hat{y} + (1 - y)\log (1 - \hat{y})]$$

(4)

The cross-entropy loss is the standard loss function for binary classification models. From the maximum-likelihood perspective, the conditional probabilities of label 0 and label 1 can be combined into a single expression, shown in Eq. (5). To maximize the probability \(P(y|x)\) while preserving monotonicity, the logarithm is introduced, giving \(Loss = -\log P(y|x)\), from which the single-sample loss of Eq. (4) follows. For the large-sample setting of deep learning, the complete loss is obtained by summing the losses of the N samples, yielding the final loss of each training or test set.

$$P(y|x) = \hat{y}^{y} \cdot (1 - \hat{y})^{1 - y}$$

(5)
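The single-sample loss of Eq. (4), averaged over N samples, can be sketched in NumPy as follows (an illustrative implementation, not the paper's code; the clipping constant `eps` is an added numerical safeguard):

```python
import numpy as np

def bce_loss(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy of Eq. (4), averaged over N samples.

    y_true: array of 0/1 labels; y_pred: predicted probabilities.
    eps clips predictions away from 0 and 1 to avoid log(0).
    """
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    per_sample = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return per_sample.mean()

# A confident correct prediction gives a small loss; a confident wrong one a large loss.
print(bce_loss(np.array([1.0]), np.array([0.9])))  # ≈ 0.105
print(bce_loss(np.array([1.0]), np.array([0.1])))  # ≈ 2.303
```

Note how the loss grows without bound as the predicted probability for the true class approaches zero, which is exactly the maximum-likelihood behaviour described above.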

In addition to the cross-entropy loss, this study uses several quantitative indexes to guide parameter selection and evaluate the performance of the algorithm. The main evaluation indexes are described below.

#### Accuracy

Accuracy is the most intuitive and easily understood index: the ratio of the number of samples correctly classified by the algorithm to the total number of samples input to it. However, when the class distribution is strongly imbalanced or the data are heavily biased, accuracy cannot serve as the sole objective evaluation index, so other indexes are needed for a comprehensive assessment of algorithm performance.

#### Confusion matrix, sensitivity, specificity

In this study, atrial fibrillation samples are regarded as positive and non-atrial-fibrillation samples as negative. If an instance is positive and is predicted to be positive, it is a true positive (TP); conversely, if it is positive but incorrectly predicted as negative, it is a false negative (FN). Similarly, if an instance is negative and is predicted to be negative, it is a true negative (TN); an instance that is negative but predicted to be positive is a false positive (FP).

$$ACC = \frac{TN + TP}{TN + TP + FN + FP}$$

(6)

$$Sensitivity = \frac{TP}{TP + FN}$$

(7)

$$Specificity = \frac{TN}{TN + FP}$$

(8)

The confusion matrix is a visual tool for presenting classification results: each column gives the number of samples in a predicted category and each row the number in an actual category. In this study, in addition to the quantitative indexes above, the final confusion matrix on the test set is used as an intuitive evaluation of the network results.
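The counts and indexes of Eqs. (6)-(8) can be computed together from predicted and actual labels; a minimal NumPy sketch (illustrative, with made-up example labels):

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Confusion-matrix counts and the indexes of Eqs. (6)-(8).

    AF (positive) is labelled 1, non-AF (negative) 0.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    acc = (tp + tn) / (tp + tn + fp + fn)
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    # Rows are actual classes, columns predicted classes (negative first).
    confusion = [[tn, fp], [fn, tp]]
    return confusion, acc, sens, spec

# Toy example: 2 actual positives, 3 actual negatives.
conf, acc, se, sp = binary_metrics([1, 1, 0, 0, 0], [1, 0, 0, 0, 1])
```

On this toy example the matrix is `[[2, 1], [1, 1]]` with accuracy 0.6, sensitivity 0.5 and specificity 2/3, illustrating how a missed positive hurts sensitivity more when positives are few.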

### Data set and pre-processing

The data set selected in this study is the 12-lead ECG data set released for the China Physiological Signal Analysis Challenge (CPSC2018). For the needs of this research, the information from single leads among the 12 leads is used for analysis.

Several public databases are currently available for the detection of arrhythmia-related diseases. Among them, the MIT-BIH database is the most commonly used for detecting cardiac arrhythmias. It contains two-lead ECG data: 48 records from 47 patients sampled at 360 Hz, each lasting about 30 min and covering 15 types of arrhythmia. In addition, the MIT-BIH AF database is dedicated to AF detection research; it contains 25 long-duration ECG records, most of which exhibit paroxysmal AF.

The ECG signal itself is weak, and the data collected by ECG acquisition equipment contain noise, generally including power-line interference, electromyographic interference from the body, and baseline drift. Therefore, before the ECG signal is fed into the designed deep neural network, it must be pre-processed for noise reduction. Common noise-reduction methods include filter-based denoising and wavelet denoising.

Filter-based denoising is a common noise-reduction method based on frequency-domain analysis. A selected or designed filter separates the useful signal from the noise in the frequency domain, retaining the required frequency components or removing the unwanted ones. Common denoising filters include low-pass, high-pass and band-pass filters. A low-pass filter passes signal components below a chosen cutoff frequency and attenuates those above it; a high-pass filter does the opposite, suppressing components below the cutoff and passing those above it. A band-pass filter passes components within a certain frequency band and attenuates all others to a very low level.
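As an illustration of band-pass denoising for ECG, the sketch below uses a zero-phase Butterworth filter from SciPy. The cut-off frequencies (0.5-40 Hz) and the 500 Hz sampling rate are illustrative choices, not values taken from the paper:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_ecg(signal, fs, low=0.5, high=40.0, order=4):
    """Zero-phase Butterworth band-pass for ECG pre-processing.

    Attenuating components below ~0.5 Hz suppresses baseline drift;
    attenuating those above ~40 Hz reduces power-line and EMG noise.
    filtfilt applies the filter forward and backward, avoiding phase shift.
    """
    b, a = butter(order, [low, high], btype="bandpass", fs=fs)
    return filtfilt(b, a, signal)

fs = 500  # hypothetical sampling rate, Hz
t = np.arange(0, 2.0, 1 / fs)
# Synthetic test signal: a 5 Hz "ECG-like" tone plus 0.2 Hz baseline
# drift and 50 Hz power-line interference.
x = (np.sin(2 * np.pi * 5 * t)
     + 0.5 * np.sin(2 * np.pi * 0.2 * t)
     + 0.3 * np.sin(2 * np.pi * 50 * t))
clean = bandpass_ecg(x, fs)
```

After filtering, the 0.2 Hz drift and 50 Hz interference are strongly attenuated while the in-band 5 Hz component is preserved.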

### Network implementation details

The network scale should be kept small, and a key advantage of DenseNet is its small model size. The growth rate k denotes the number of feature channels added after each convolution layer in DenseNet; it is set to 12 in this study (k = 12), since a small growth rate effectively controls the network size while preserving learning capacity. At the same time, we set the initial learning rate to 0.1, the batch size to 64, and the number of training epochs to 300. The test results are shown in Table 1.

From Table 1 it can be seen that increasing the number of layers changes the results little, and the index values are hardly improved. When the number of dense blocks is 3, the network is trained sufficiently. Similarly, increasing the number of layers within each dense block does not improve the results and makes the network more complex. Therefore, weighing the experimental results against the network scale, three dense blocks are used in subsequent experiments, each with an 8-layer network structure.
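The channel bookkeeping of a dense block with growth rate k can be sketched as follows. This is only an illustration of the concatenation pattern (each "layer" is a random stand-in for the real BN-ReLU-Conv1d unit, and the 24 input channels are a made-up example), not the paper's network:

```python
import numpy as np

def dense_block(x, n_layers=8, k=12, seed=0):
    """Channel growth in a 1-D dense block (illustrative sketch).

    Each layer stands in for a convolution producing k new feature
    channels, which are concatenated with all previous features.
    x has shape (channels, length).
    """
    rng = np.random.default_rng(seed)
    for _ in range(n_layers):
        # Stand-in for a convolution over all accumulated features.
        new_features = rng.standard_normal((k, x.shape[1]))
        x = np.concatenate([x, new_features], axis=0)
    return x

x0 = np.zeros((24, 1000))  # hypothetical: 24 input channels, 1000 samples
out = dense_block(x0)      # 24 + 8 * 12 = 120 output channels
```

With k = 12 and 8 layers, each dense block adds only 96 channels, which is why a small growth rate keeps the model compact.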

The learning rate determines how fast the network parameters are adjusted during gradient descent and is a very important hyperparameter. In this study an automatically adjusted learning rate is used, with an initial value of 0.1; when training reaches one half and three quarters of the set number of epochs, the current learning rate is multiplied by 0.1 to achieve a better optimization effect. The convergence curves of the training and validation sets in Fig. 6 show that the network converges quickly, essentially before the 50th epoch. This fast convergence, together with the small scale of the network, further demonstrates its advantages. Meanwhile, the loss on the final test set reaches 0.0385, which also reflects the strength of the network and algorithm in this study.
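The step schedule described above (multiply by 0.1 at one half and three quarters of training) can be written as a small function; a sketch under the stated settings, not the paper's code:

```python
def step_lr(epoch, total_epochs=300, base_lr=0.1, factor=0.1):
    """Learning rate decayed by `factor` at 1/2 and 3/4 of training.

    Returns the learning rate to use for the given (0-indexed) epoch.
    """
    lr = base_lr
    if epoch >= total_epochs // 2:
        lr *= factor
    if epoch >= (3 * total_epochs) // 4:
        lr *= factor
    return lr

# For a 300-epoch run: 0.1 until epoch 150, 0.01 until 225, then 0.001.
```

The same shape of schedule is available in common frameworks (e.g. PyTorch's `MultiStepLR`), but the milestones here simply follow the 1/2 and 3/4 points stated above.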

Considering this rapid convergence, with the network converging around the 50th epoch, the number of epochs in subsequent training is set to 100 and the batch size to 64. The number of dense blocks is 3, with 8 network layers inside each block; the initial learning rate is 0.1, and the growth rate k is set to 12.

### Results analysis

The training convergence process of the network is shown in Fig. 7. The network converges very quickly: both the training and validation sets essentially begin to converge around the 50th epoch, where the training loss is close to 0 and the validation loss stabilizes below 0.1. As training proceeds, the accuracy rises rapidly and remains close to 1, which further demonstrates the stability and accuracy of the algorithm in this paper.

In this paper, the data of each of the 12 ECG leads are used in turn as network input to verify the algorithm; the results are shown in Table 2. Although performance varies between leads, the accuracy of every lead exceeds 95%, which shows that the algorithm is applicable to the data of each of the twelve leads. Good results are obtained regardless of which lead is used, and atrial fibrillation is detected best on the unipolar limb leads. The strong single-lead results in Table 2 also demonstrate the feasibility of applying the algorithm in portable mobile-health scenarios.

Sensitivity is slightly lower than the other two indicators in both networks. As the confusion matrix in Table 2 shows, this is because the data set contains fewer AF samples than non-AF samples, roughly in a 1:5 ratio. With so few positive samples, the denominator TP + FN in the sensitivity calculation is relatively small, so each misclassified positive weighs more heavily and the sensitivity result comes out slightly lower.

In conclusion, the one-dimensional DenseNet achieves high accuracy in the detection of atrial fibrillation, with sensitivity and specificity reaching 93.55% and 99.11%, respectively. The very high specificity indicates that a very high proportion of the negative samples is classified correctly, so the classification performance of the network is very good. The confusion matrix of the best classification result is shown in Fig. 8.

At the same time, we also compare with other advanced methods, including CNN-LSTM^{28} and OTE^{29}, on the CinC2017 Challenge data set. By accumulating the total numbers of true positives (TP), false negatives (FN) and false positives (FP), the F1 score of each category is calculated as the indicator of performance. The F1 score for category C is calculated as follows:

$${F}_{1c}=\frac{2TP}{2TP+FP+FN}$$

(9)

The average of the F1 scores for the normal-rhythm class (\(F_{1N}\)), the atrial-fibrillation class (\(F_{1A}\)) and the other-rhythm class (\(F_{1O}\)) is used as the total F1 score, which is calculated as:

$$F1=\frac{{F}_{1N}+{F}_{1A}+{F}_{1O}}{3}$$

(10)

It is worth noting that although the F1 score of the noise class is not included in the calculation, misclassifying noise samples into the other categories still affects the final score. Table 3 shows the tenfold cross-validation results of each model.
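The per-class and total F1 of Eqs. (9) and (10) translate directly into code; a sketch with made-up example counts (the numbers are illustrative, not from Table 3):

```python
def f1_per_class(tp, fp, fn):
    """F1 of Eq. (9), computed from pooled TP/FP/FN counts for one class."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def total_f1(counts):
    """Mean of the N, A and O class F1 scores, Eq. (10).

    counts: dict mapping class label to (TP, FP, FN); the noise class,
    if present in counts, is excluded as in the paper.
    """
    scores = [f1_per_class(*counts[c]) for c in ("N", "A", "O")]
    return sum(scores) / 3

# Hypothetical pooled counts per class.
counts = {"N": (900, 50, 40), "A": (120, 30, 25), "O": (400, 80, 90)}
score = total_f1(counts)
```

Because the counts are pooled over all folds before Eq. (9) is applied, classes with few samples (such as AF) are not averaged away, yet a weak class still drags down the unweighted mean in Eq. (10).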

From the results, the model performs better on the categories with more training samples (normal sinus rhythm, other rhythms) than on the category with fewer samples (AF rhythm). Figure 9 shows the loss and accuracy curves of the training phase: the training loss continues to decline, while the validation loss levels off after about 10 training cycles, after which the performance of the model is difficult to improve.