Abstract
One of the cornerstones of guaranteeing stable wind generation and electric power system operation is wind speed prediction. This research offers a method based on Particle Swarm Optimization (PSO) to optimize a Bidirectional Long Short-Term Memory network (BiLSTM) in order to improve wind speed prediction accuracy, taking into account the highly stochastic yet partly regular character of wind speed. First, the wind speed time series is subjected to Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN); the complexity of the wind speed pattern is reduced by decomposing it into components carrying different local feature information. A BiLSTM model incorporating an attention mechanism is then fitted to the decomposed data, and its parameters are optimized with the particle swarm algorithm, reducing predictive modeling errors. The component predictions are finally superimposed to obtain the final prediction. The empirical evidence shows that, in comparison with BiLSTM and other models, the CEEMDAN-PSO-BiLSTM-attention model decreases the RMSE (root mean square error) by 15%-44%, the MAE by 18%-45%, and the MAPE by 24%-52%, and improves the R2 by 1.4%-2.7%, verifying the validity of the CEEMDAN-PSO-BiLSTM-attention model for short-term wind speed prediction.
0 Introduction
The Global Wind Energy Council (GWEC) reported a record 117 GW of new wind power capacity installed globally in 2023, marking the highest annual growth to date [1]. Despite abundant wind resources, wind's unpredictable variability leads to unstable energy output, challenging power grid frequency regulation and voltage control. Exceeding voltage limits threatens grid stability [2], highlighting the critical need for accurate wind speed forecasting in power system management.
There are a number of wind speed prediction algorithms and models, including physical models, artificial intelligence models, statistical models and combined prediction models [3]. Physical models usually make predictions by solving the N-S equations from real-time data such as on-site wind speed and direction measurements and numerical weather prediction; AI models, encompassing both deep learning and supervised learning techniques such as decision trees and support vector machines, exploit the nonlinear relationships in wind speed data to build models [4]; statistical models predict wind speed from historical statistical data [5]; and the combined prediction model takes the prediction performance of each single model into account and establishes a statistical analysis model based on the prediction results of the individual models, improving the overall prediction.
Combined prediction is enhanced through the development of statistical analysis frameworks based on individual model outputs [6]. Long Short-Term Memory (LSTM) neural networks [7] were employed in comparative studies, demonstrating superior prediction accuracy over conventional time series methods, though constrained by high computational complexity and long training times. To address these limitations, Kisvari et al. [8] presented a Gated Recurrent Unit (GRU)-based architecture, with comparative analyses confirming GRU's advantages in both predictive performance and computational efficiency. While each methodology demonstrates scenario-specific adaptability, achieving optimal predictions with single-model architectures remains challenging. Sabri et al. [9] developed a CNN-GRU hybrid model wherein convolutional layers perform feature extraction to enhance prediction accuracy, while gated recurrent units maintain temporal information storage. Experimental validation confirmed this architecture's superiority over benchmark models, though GRU implementations require intensive hyper-parameter optimization across diverse data types.
Liu et al. [10] put forward a wind power prediction technique based on VMD and LSTM, and discovered that VMD processing greatly increased the wind power forecast accuracy. However, the computational complexity of VMD is high; especially for long time series, the computation time may be substantial. Liu et al. [11] employed empirical mode decomposition (EMD) to decompose the time series and subsequently used a radial basis function (RBF) neural network for prediction. This method reduces the difficulty of prediction and also enhances the accuracy of time series prediction. However, the EMD method is susceptible to mode mixing and endpoint effects. He et al. [12] combined the improved EEMD (ensemble empirical mode decomposition) with the LASSO-QRNN model to overcome the modal aliasing problem and enhance the prediction accuracy. However, interference noise may remain in the decomposed sequences, which affects the prediction accuracy. To address this problem, Xiong et al. [13] proposed to decompose the wind energy sequence by CEEMD (complete ensemble empirical mode decomposition), which effectively eliminates the interference noise in the decomposed sequences, thus increasing the wind power forecast accuracy. However, the disadvantage of CEEMD is that if the added white noise amplitude and the number of iterations are not properly selected, redundant intrinsic mode function (IMF) components are generated, which need to be restructured or further processed.
In this study, complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) is used to decompose the original wind speed sequence. During the decomposition process, the final reconstructed signal has less noise residue than the EEMD result, thus minimizing the number of sifting iterations. This is achieved by summing the IMF components of each order resulting from the white noise EMD decomposition. Meanwhile, the parameters of the BiLSTM model incorporating the attention mechanism are optimized by the particle swarm algorithm to enhance the forecasting accuracy. Simulation comparisons demonstrate the high prediction accuracy of the method presented in this research.
1 Basic Theory
1.1 CEEMDAN
The CEEMDAN algorithm enhances signal decomposition by introducing adaptive white noise to the original data, leveraging its properties to differentiate modal components across multiple scales. Initial preprocessing first addresses extreme values or monotonic trends. Adaptive noise sequences are constructed and added to the signal, generating multiple trial sequences. EMD decomposition is then performed on each set of sequences to derive the IMFs. These IMFs are aggregated and weighted, then re-decomposed via EMD. The process iteratively verifies convergence criteria: if unmet, the adaptive noise is adjusted; if satisfied, iterations continue until the IMF quantities stabilize. This method improves decomposition accuracy and stability through signal-adaptive noise modulation, ensuring minimal interference during modal separation [14]. It effectively solves the problem of the excessive number of ensemble averages required by the EEMD algorithm and improves decomposition efficiency. The steps of the CEEMDAN decomposition are as follows:
1) A new signal sequence is generated by adding white noise to the original wind speed signal x(t), i.e.

x_i(t) = x(t) + ε·ω_i(t), i = 1, 2, …, n    (1)

where ε is the amplitude of the white noise and ω_i(t) is the i-th added white noise sequence.
2) EMD decomposition of each new signal sequence yields the first modal component IMF_1 and the residual r_1(t), i.e.

IMF_1(t) = (1/n) Σ_{i=1}^{n} IMF_1^i(t)    (2)

r_1(t) = x(t) − IMF_1(t)    (3)

where n is the number of times white noise is added and IMF_1^i(t) is the first modal component of the i-th noisy sequence.
3) After adding white noise to the residual r_1(t), EMD decomposition is carried out to obtain the second modal component IMF_2 and the residual r_2(t), i.e.

IMF_2(t) = (1/n) Σ_{i=1}^{n} E_1(r_1(t) + ε·E_1(ω_i(t)))    (4)

r_2(t) = r_1(t) − IMF_2(t)    (5)

where E_1(·) denotes the first modal component extracted by EMD.
4) The original wind speed signal is thus decomposed as in Eq. (6), and the above steps are repeated for each additional white noise decomposition until the resulting residual cannot be further decomposed.

x(t) = Σ_{k=1}^{K} IMF_k(t) + r_K(t)    (6)

where K is the total number of modal components obtained.
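The outer loop of steps 1)-4) can be sketched in Python. This is a minimal illustration of the ensemble-averaging structure of Eqs. (1)-(6) only: the `extract_imf` helper is a crude stand-in (signal minus a 3-point moving average) for EMD's full sifting procedure, and all function names and parameter values here are illustrative assumptions, not the paper's implementation.

```python
import random

def extract_imf(signal):
    """Crude stand-in for EMD's first-IMF extraction: the signal minus a
    3-point moving average. Real CEEMDAN uses full EMD sifting here."""
    n = len(signal)
    imf = []
    for i in range(n):
        window = signal[max(0, i - 1):min(n, i + 2)]
        imf.append(signal[i] - sum(window) / len(window))
    return imf

def ceemdan_sketch(signal, eps=0.05, n_noise=20, max_imfs=5, seed=0):
    """Outer CEEMDAN loop: at each stage, ensemble-average the first IMF
    extracted from n_noise noise-perturbed copies of the residual
    (Eqs. 2 and 4), then update the residual (Eqs. 3 and 5)."""
    rng = random.Random(seed)
    residual = [float(s) for s in signal]
    imfs = []
    for _ in range(max_imfs):
        acc = [0.0] * len(residual)
        for _ in range(n_noise):
            # Eq. (1): add white noise of amplitude eps to the residual
            noisy = [r + eps * rng.gauss(0.0, 1.0) for r in residual]
            imf = extract_imf(noisy)
            acc = [a + v for a, v in zip(acc, imf)]
        mean_imf = [a / n_noise for a in acc]       # ensemble average
        imfs.append(mean_imf)
        residual = [r - m for r, m in zip(residual, mean_imf)]
    return imfs, residual  # Eq. (6): signal = sum(imfs) + residual
```

By construction the decomposition is exactly additive: summing the returned IMFs and the final residual reconstructs the input signal, mirroring Eq. (6).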
1.2 BiLSTM with the Introduction of Attention Mechanism
A unidirectional LSTM has limited learning efficiency and prediction accuracy because it can only mine feature information from past time series data and cannot exploit information from future time steps. BiLSTM [15] can not only make use of the information in the meteorological and hydrological data at past moments, but also learn the features of the time series at future moments by mining the time series data in both directions, which enhances the learning capability of the neural network prediction model and the utilization rate of the time series data, thus improving prediction precision. The principle of BiLSTM is shown in Fig. 1.
Fig.1Structure of BiLSTM
The attention mechanism [16] is a way for a model to focus on the essential components of the input data. When working with multivariate time series, the model can attend to the most relevant time points by assigning a weight to each input time point. In this way, the model can make accurate predictions even when data are missing or abnormal at certain time points.
Adding an attention layer to the basic BiLSTM network structure is the fundamental principle behind a BiLSTM model that incorporates an attention mechanism. This makes it possible to sample the input time series data according to its importance, and then feed the sampled data into the BiLSTM model as input data for training, modeling and prediction. A BiLSTM model with an attention mechanism can handle not only importance-based sampling, but also the long-term dependence of sequences on historical time steps. The BiLSTM in Fig. 1 introduces the attention mechanism by defining an attention layer, whose weights are denoted by W = (w_1, w_2, …, w_T). Through these attention weights W, the BiLSTM with the attention mechanism samples the importance of the input sequential data X = (x_1, x_2, …, x_T). The sampled data are defined as X′ = (x′_1, x′_2, …, x′_T), where x′_t = w_t·x_t, after which the importance-sampled data are fed into the BiLSTM network to obtain the prediction results.
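The importance-sampling step described above can be sketched as follows. This is a generic dot-product attention over time steps, not the paper's exact layer: the choice of a dot-product score against a query vector is an assumption, and in a trained model the query and weights would be learned rather than supplied.

```python
import math

def softmax(scores):
    """Numerically stable softmax: turns raw scores into weights summing to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weight(inputs, query):
    """Score each time step's feature vector against a query vector,
    softmax the scores into attention weights w_t, and scale each step
    by its weight (x'_t = w_t * x_t)."""
    scores = [sum(q * x for q, x in zip(query, step)) for step in inputs]
    weights = softmax(scores)
    weighted = [[w * x for x in step] for w, step in zip(weights, inputs)]
    return weighted, weights
```

The weighted sequence would then be fed to the BiLSTM; time steps most aligned with the query dominate, which is how attention suppresses irrelevant or abnormal time points.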
1.3 Particle Swarm Optimization
Several parameters in the BiLSTM algorithm affect the prediction accuracy of the model, such as the number of neurons, learning rate, batch size, number of training iterations and regularization parameters [17]. Optimization algorithms can be applied to tune these parameters: Particle Swarm Optimization (PSO) can be used to search for the optimal learning rate to speed up training and improve model performance, to help decide the appropriate number of neurons for the best performance, to find the most efficient number of iterations balancing training speed and performance, and to find the optimal batch size.
PSO is an evolutionary computation approach inspired by the foraging behaviour of bird flocks [18], extending the problem of a flock finding food in one-dimensional space to multi-dimensional space. Assume that the position of particle i in D-dimensional space is denoted as X_i = (x_i1, x_i2, …, x_iD), i = 1, 2, …, N, where N is the total number of particles; the flight velocity of particle i is denoted as V_i = (v_i1, v_i2, …, v_iD); the best position an individual particle has passed through is P_i = (p_i1, p_i2, …, p_iD); and the best position the whole swarm has passed through is P_g = (p_g1, p_g2, …, p_gD). The updating formulas for the flight velocity and position of a particle can be expressed as follows:
v_id(k+1) = a·v_id(k) + c1·rand·(p_id − x_id(k)) + c2·rand·(p_gd − x_id(k))    (7)

x_id(k+1) = x_id(k) + v_id(k+1)    (8)

where a denotes the inertia coefficient, which takes a non-negative value; c1 and c2 denote the learning factors; and rand is a random number generated in [0, 1]. The PSO optimization search process is as follows:
Step 1: Initialize the velocities and positions of the population, set the population size N and the number of iterations M, take each particle's current position as its historical individual best P_i, and take the best individual in the population as the current global best P_g.
Step 2: At each evolutionary step, the fitness function value of every particle is computed.
Step 3: P_i is updated when a particle's current fitness value is better than its historical individual optimum.
Step 4: P_g is updated when the current fitness value is better than the historical global optimum.
Step 5: The positions and flight velocities of the particles are updated according to Eqs. (7) and (8).
Step 6: Repeat Steps 2 to 5 to continue searching for the global optimum. Stop iterating after finding the global optimum position or reaching the maximum number of iterations.
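Steps 1-6 and Eqs. (7)-(8) can be sketched as a minimal PSO loop. This minimizes a user-supplied fitness function; the coefficient values, search bounds, and swarm size below are illustrative assumptions, not the settings used in the paper.

```python
import random

def pso(fitness, dim, n_particles=20, iters=100, a=0.7, c1=1.5, c2=1.5,
        lo=-5.0, hi=5.0, seed=0):
    """Minimal PSO following Eqs. (7)-(8) and Steps 1-6; `fitness` is minimized."""
    rng = random.Random(seed)
    # Step 1: initialize positions/velocities; individual and global bests
    X = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    V = [[0.0] * dim for _ in range(n_particles)]
    P = [x[:] for x in X]                        # individual best positions P_i
    p_fit = [fitness(x) for x in X]              # Step 2: evaluate fitness
    g_fit = min(p_fit)
    g = P[p_fit.index(g_fit)][:]                 # global best position P_g
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                # Eq. (7): velocity update with inertia a, learning factors c1, c2
                V[i][d] = (a * V[i][d]
                           + c1 * rng.random() * (P[i][d] - X[i][d])
                           + c2 * rng.random() * (g[d] - X[i][d]))
                # Eq. (8): position update
                X[i][d] += V[i][d]
            f = fitness(X[i])
            if f < p_fit[i]:                     # Step 3: update individual best
                p_fit[i], P[i] = f, X[i][:]
                if f < g_fit:                    # Step 4: update global best
                    g_fit, g = f, X[i][:]
    return g, g_fit                              # Step 6: best position and value
```

Running it on the sphere function `sum(v*v for v in x)` drives the best fitness close to zero, the known minimum at the origin.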
2 Designed Model
2.1 Prediction Model Construction
The flow diagram of the prediction model is shown in Fig. 2. The specific prediction steps can be described as follows:
Step 1: The original wind speed sequence is decomposed by CEEMDAN into IMF components carrying different local feature information, which smooths the wind speed series;
Step 2: The normalized IMF components are modeled using BiLSTM with the introduction of an attention mechanism, where the first 80% of the data are used for training and the last 20% for prediction;
Step 3: Parameters of the BiLSTM model such as the learning rate, number of hidden neurons and batch size are optimized using the PSO algorithm;
Fig.2Flowchart of wind speed prediction model
Step 4: Predictions are made for each component using the optimized BiLSTM model, and then all component predictions are summed and inverse-normalized to obtain the final prediction.
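The data-handling parts of Steps 2 and 4 can be sketched as two small helpers: the chronological 80/20 split, and the final recombination that sums the per-component predictions and inverts the min-max normalization. The function names are illustrative, and `x_min`/`x_max` are assumed to come from the training data's normalization.

```python
def train_test_split(series, train_frac=0.8):
    """Step 2: the first 80% of samples for training, the rest for testing
    (a chronological split, not a random one, since this is a time series)."""
    cut = int(len(series) * train_frac)
    return series[:cut], series[cut:]

def recombine(component_preds, x_min, x_max):
    """Step 4: sum the per-IMF predictions at each time step, then invert
    the min-max normalization to recover the wind speed scale."""
    summed = [sum(step) for step in zip(*component_preds)]
    return [s * (x_max - x_min) + x_min for s in summed]
```

`component_preds` here is a list of prediction series, one per IMF component, each on the normalized scale.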
2.2 Hyper-Parameter Optimization
The prediction accuracy of BiLSTM models is closely related to the values of the hyper-parameters. At present, hyper-parameter values are mostly chosen from experience, sometimes requiring multiple trials to achieve good prediction accuracy. In this paper, particle swarm optimization of the BiLSTM model hyper-parameters is adopted. Greff [19] pointed out that the learning rate and the number of hidden neurons are the two hyper-parameters with the greatest influence on the prediction performance of LSTM. Therefore, in this paper, the learning rate, the number of hidden neurons and the batch size are used as the search parameters for particle swarm optimization. The process is shown in Fig. 3.
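To use PSO for this search, each particle position must be decoded into the three hyper-parameters. One plausible mapping from a position in [0, 1]^3 is sketched below; the search ranges (learning rate 1e-4 to 1e-1, 16-128 hidden neurons, batch sizes 16-128) are illustrative assumptions and not the ranges used in the paper.

```python
def decode_particle(position):
    """Map a 3-D particle position in [0, 1]^3 to BiLSTM hyper-parameters.

    position[0] -> learning rate, searched on a log scale in [1e-4, 1e-1]
    position[1] -> number of hidden neurons, an integer in [16, 128]
    position[2] -> batch size, a power of two in {16, 32, 64, 128}
    All ranges are illustrative assumptions.
    """
    lr = 10 ** (-4 + 3 * position[0])
    hidden = int(16 + position[1] * (128 - 16))
    batch = 2 ** (4 + round(position[2] * 3))
    return lr, hidden, batch
```

The PSO fitness function would then train a BiLSTM-attention model with the decoded triple and return its validation error, so that lower fitness means better hyper-parameters.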
Fig.3PSO optimize BiLSTM-attention flowchart
3 Case Studies
In this article, short-term wind speed data from a wind turbine SCADA system in Turkey in 2018 are selected to constitute datasets A and B. In dataset A, the collection period is from 00:00 on 1 February 2018 to 23:50 on 28 February 2018, the total number of samples is 4032, and the sampling period is 10 min; in dataset B, the collection period is from 00:00 on 1 August 2018 to 23:30 on 31 August 2018, with a total of 1488 samples and a sampling period of 30 min. The first 80% of the samples are taken as the training set and the last 20% as the test set, respectively. The original wind speed sequences are shown in Figs. 4 and 5.
Fig.4Original wind speed series of dataset A
Fig.5Original wind speed series of dataset B
From Figs. 4 and 5, it can be observed that the amplitude of the data fluctuates considerably, so the wind speed series must be normalized as shown in Eq. (9) to improve the precision of the wind speed forecasting model.
x* = (x − X_min) / (X_max − X_min)    (9)

where x* is the wind speed value after data normalization, x is the original wind speed value, X_min is the minimum of the original wind speed data, and X_max is the maximum of the original wind speed data.
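Eq. (9) is a standard min-max normalization; a minimal sketch:

```python
def minmax_normalize(series):
    """Eq. (9): scale a wind speed series into [0, 1].
    Returns the scaled series plus X_min and X_max so predictions
    can later be mapped back to the original wind speed scale."""
    x_min, x_max = min(series), max(series)
    scaled = [(x - x_min) / (x_max - x_min) for x in series]
    return scaled, x_min, x_max
```

In practice X_min and X_max should be computed from the training set only and reused for the test set, so that test information does not leak into the scaling.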
The wind speed data are first decomposed by CEEMDAN, which adaptively adds white noise of a specific intensity at each decomposition stage to extract the IMFs through multiple ensemble averaging. The residual term is updated after each decomposition as the input to the next stage until the residual becomes a monotonic trend term. Taking dataset A as an example, Fig. 6 shows the wind speed sequence of dataset A after CEEMDAN decomposition. The original wind speed sequence is decomposed into nine IMF components (IMF0-IMF8) and one residual term (IMF9). As can be seen from Fig. 6, the curves become smoother as the decomposition order increases. Components IMF1-IMF5 have higher frequency and high volatility, while components IMF6-IMF9 are lower in frequency and smoother overall. The CEEMDAN decomposition therefore improves decomposition efficiency and reduces errors in subsequent wind speed prediction.
Fig.6CEEMDAN decomposition (dataset A)
Afterwards, a corresponding BiLSTM-attention forecasting model is built for each IMF component and the BiLSTM parameters are optimized using PSO. The particle swarm optimization algorithm is run according to the parameter settings and constraints above, and the specific parameters are shown in Table 1. The optimal hyper-parameters of each subsequence obtained after optimization are shown in Tables 2 and 3. Finally, the predicted values for each wind speed component are summed to give the final prediction.
For a better comparison of predictive effects, this paper adopts the root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE) and coefficient of determination (R2) as the evaluation indexes, which are calculated by the following formulas:
Table1PSO parameter setting
Table2Optimal hyper-parameters of each subsequence model of dataset A
Table3Optimal hyper-parameters of each subsequence model of dataset B
RMSE (root mean square error):

RMSE = sqrt( (1/k) Σ_{i=1}^{k} (y_i − ŷ_i)² )    (10)

MAE (mean absolute error):

MAE = (1/k) Σ_{i=1}^{k} |y_i − ŷ_i|    (11)

MAPE (mean absolute percentage error):

MAPE = (100%/k) Σ_{i=1}^{k} |(y_i − ŷ_i)/y_i|    (12)

R2 (coefficient of determination):

R² = 1 − [ Σ_{i=1}^{k} (y_i − ŷ_i)² ] / [ Σ_{i=1}^{k} (y_i − ȳ)² ]    (13)

where y_i is the true value, ŷ_i is the predicted value, ȳ is the mean of the true values, and k is the sample size used for training or testing.
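The four evaluation indexes of Eqs. (10)-(13) can be computed directly:

```python
import math

def metrics(actual, predicted):
    """Eqs. (10)-(13): RMSE, MAE, MAPE (in %), and R^2 over k samples.
    Assumes no actual value is zero (MAPE divides by y_i)."""
    k = len(actual)
    mean = sum(actual) / k
    sq_err = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    rmse = math.sqrt(sq_err / k)                                   # Eq. (10)
    mae = sum(abs(y - p) for y, p in zip(actual, predicted)) / k   # Eq. (11)
    mape = (100.0 / k) * sum(abs((y - p) / y)
                             for y, p in zip(actual, predicted))   # Eq. (12)
    r2 = 1.0 - sq_err / sum((y - mean) ** 2 for y in actual)       # Eq. (13)
    return rmse, mae, mape, r2
```

A perfect prediction yields RMSE = MAE = MAPE = 0 and R² = 1; R² can go negative when the model is worse than predicting the mean.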
3.1 Comparison to Combination Models
A BiLSTM model, a CEEMDAN-BiLSTM model, a CEEMDAN-BiLSTM-attention model, and a CEEMDAN-PSO-BiLSTM-attention model are respectively built and compared against the actual wind speed to verify their effectiveness and accuracy, as shown in Figs. 7-8. Among them, the number of BiLSTM hidden neurons is set to 32 and the batch size is 64. CEEMDAN divides the raw wind speed sequence into nine IMF components and a residual term.
Comparing the actual and predicted value curves of the models in Figs. 7 and 8, it can be observed that the predicted values of each model are close to the actual values, but the predicted curve of the CEEMDAN-PSO-BiLSTM-attention model has the largest overlap with the actual value curve.
Fig.7Comparison of predicted and actual wind speeds of dataset A
Fig.8Comparison of predicted and actual wind speeds of dataset B
Tables 4 and 5 show the comparison of performance metrics such as RMSE, MAE, MAPE, and R2 of several prediction methods on dataset A and dataset B, respectively. As shown in Table 4, the CEEMDAN-BiLSTM-attention model reduces the RMSE, MAE, and MAPE error metrics by 24.2%, 27.9%, and 20.8%, respectively, with respect to the CEEMDAN-BiLSTM model, and the R2 improves by 0.62%. The CEEMDAN-PSO-BiLSTM-attention model also reduces the RMSE, MAE, and MAPE error metrics by 15.5%, 18.0%, and 24.1%, respectively, relative to the CEEMDAN-BiLSTM-attention model, and the R2 improves by 1.41%.
Table5 similarly demonstrates that the CEEMDAN-PSO-BiLSTM-attention model outperforms other models. In conclusion, the prediction approach in this paper is superior to other prediction methods in terms of performance metrics of error.
Table4Comparison of performance metrics of dataset A
Table5Comparison of performance metrics of dataset B
3.2 Comparison to Classical Models
To further illustrate the merit of the model in this study, classical models are chosen for comparison experiments. The number of 1D convolution filters of the CNN is set to 64, the convolution kernel size is 3 × 3, and the number of GRU hidden neurons is 32. EMD decomposes the original wind speed data into 8 IMF components and a residual term, and the performance indexes of the several prediction approaches on dataset A and dataset B are given in Tables 6 and 7, respectively. As shown in Table 6, the RMSE, MAE, and MAPE error metrics of this paper's model are reduced by 44.1%, 44.9%, and 45.4%, respectively, in contrast with those of the CNN-GRU model, and the R2 improves by 2.6%; and the RMSE, MAE, and MAPE error metrics of this paper's model are reduced by 40.7%, 43.1%, and 52.5%, respectively, in contrast with those of the EMD-LSTM model, and the R2 increases by 2.3%. Table 7 additionally shows that the overall performance of this model is higher than that of the other models.
Table6Comparison of classical model performance metrics of dataset A
Table7Comparison of classical model performance metrics of dataset B
3.3 Multi-Step Prediction
Multi-step forecasting is an effective approach to test the precision of forecast models [20]. In many forecasting applications, it is necessary to predict the trend over a coming period; for example, multi-step forecasting of wind speed is of great value in wind farm applications, where multi-step prediction with a large step length can provide more time for grid adjustment. Therefore, the CEEMDAN-BiLSTM-attention model and the model in this paper are selected to perform three-step, five-step and ten-step forecasting experiments, respectively. The multi-step forecasting results of dataset A and dataset B are shown in Figs. 9 and 10, respectively.
Fig.9Multi-step prediction results of dataset A
The experimental results are shown in Tables 8-9. The results show that the accuracy of multi-step forecasting is lower than that of single-step prediction. The fundamental reason is that the accuracy of multi-step forecasting is affected by the step length: the forecast at each step depends on the forecast at the previous step, so even if the predictions of the first few steps are relatively accurate, the error may gradually accumulate over time, resulting in a larger overall prediction error. Therefore, the prediction errors at three, five and ten steps gradually increase. The model proposed in this article achieves higher performance metrics than the other models.
4 Conclusions
In this article, the CEEMDAN decomposition approach is adopted to process the fluctuating wind speed data, which are decomposed into smooth and regular signals to lessen the wind speed sequence's unpredictability and complexity. The decomposed subsequences are then modeled and predicted using BiLSTM with the introduction of an attention mechanism, which not only captures the long-term dependence on historical time steps in the sequence, but also handles importance-based sampling. In addition, the PSO algorithm is adopted to find the optimal key parameters of the BiLSTM, which decreases errors in the prediction model and improves prediction precision. The main observations drawn are as follows:
1) Compared with the ablation experiments in Section 3.1, the CEEMDAN-PSO-BiLSTM-attention model successfully reduces the RMSE by 15% to 46%, the MAE by 18% to 47% and the MAPE by 24% to 55%, and improves the R2 by 1.4% to 7.6%.
Fig.10Multi-step prediction results of dataset B
Table8Comparison of different predicted steps performance metrics of dataset A
Table9Comparison of different predicted steps performance metrics of dataset B
2) Compared with the traditional CNN-GRU and EMD-LSTM models, the CEEMDAN-PSO-BiLSTM-attention model reduces the prediction error and achieves better performance in short-term wind speed prediction. Specifically, the combined model reduces the RMSE by 38% to 47%, the MAE by 39% to 46% and the MAPE by 42% to 53%, and improves the R2 by 2.3% to 7.6%.
However, there are some remaining problems with the current BiLSTM-attention model; for example, the computation of the attention weights may be affected by the length of the input sequences, leading to overfitting or underfitting. These issues can be further explored in future research.