Journal of Harbin Institute of Technology (New Series)  2018, Vol. 25 Issue (2): 59-72  DOI: 10.11916/j.issn.1005-9113.16124

Citation 

Kun Yang, Cuanyan Feng, Jie Bai. Comprehensive Assessment of Pilot Mental Workload in Various Levels[J]. Journal of Harbin Institute of Technology (New Series), 2018, 25(2): 59-72. DOI: 10.11916/j.issn.1005-9113.16124.

Fund

Sponsored by the Tianjin Key Laboratory of Civil Aircraft Airworthiness and Maintenance in CAUC(Grant No.MJ-J-2012-07).

Corresponding author

Cuanyan Feng, E-mail:fengchuifengfly@qq.com

Article history

Received: 2016-06-02
Comprehensive Assessment of Pilot Mental Workload in Various Levels
Kun Yang1, Cuanyan Feng1,2, Jie Bai1     
1. Key Laboratory of Civil Aircraft Airworthiness and Maintenance, Civil Aviation University of China, Tianjin 300300, China;
2. School of Aeronautic Science and Engineering, Beihang University, Beijing 100191, China
Abstract: Physiological measures (fixation frequency and total fixation time), task performance (n-back accuracy) and subjective assessment (NASA Task Load Index, NASA-TLX) were used to measure Mental Workload (MW) at different levels, which were induced by a vision-related flight task combined with an auditory cognitive load. Sixteen healthy novice pilots were recruited to complete a monitoring task based on a Head-Up Display (HUD) and an auditory n-back task, which was used to manipulate the Mental Workload Level (MWL) in a flight simulation environment. In the experiment, fixation frequency, average saccade time, blink rate and average pupil diameter were sensitive to MW. In addition, a comprehensive method for assessing pilot mental workload based on multiple measures is proposed. Finally, a Fisher projection function based on the Fisher discrimination method and a three-level discriminant model based on the Bayes discrimination method were built. The original validation and cross-validation accuracies of both models were 97.92% and 95.83% respectively, so both models discriminated the Mental Workload Levels (MWLs) well.
Key words: mental workload     fixation frequency     Bayes discrimination method    
1 Introduction

The airworthiness standards for transport category airplanes, FAR 25.771(a)[1], issued by the Federal Aviation Administration, state that "each pilot compartment and its equipment must allow the minimum flight crew to perform their duties without unreasonable concentration or fatigue". A study[2] also showed that "consideration of human factors issues early on and throughout the design process will help to ensure that the displays and controls will support all flight crew functions, tasks, and decisions", and maintained that additional mental fatigue or overload runs contrary to the regulations. Excessive concentration of attention reduces pilots' situation awareness, which is a potential threat to the safe operation of the aircraft. The cockpit display interface is the connection between the aircraft instruments and the pilot, and the "T" layout rules for flight instruments in FAR 25 laid the foundation for cockpit display design. The Civil Aviation Administration of China (CAAC) has issued the "Head-Up Display application roadmap of Civil Aviation Administration of China (CAAC)", which indicates that all aircraft of domestic airlines should be equipped with a HUD before 2025. It is therefore essential to study human factors in the HUD instrument layout of the pilot compartment. FAR 25.1523 requires pilot workload to be considered when establishing the minimum flight crew, so that the crew is sufficient for safe operation. Pilot workload comprises physical and mental workload. As the level of automation and intelligence of cockpit instruments increases, the share of MW-related tasks for the pilot grows. Research on pilot MW is therefore urgently needed.

Mental workload can be defined as the relationship between the demands of a task and the operator's ability to meet those demands. A high MW is always accompanied by increases in cognitive processing, arousal and resource demand. A lower arousal level is likely to foreshadow a state of mental overload, in which the operator may experience dizziness, distraction, memory deterioration or other problems. Once a pilot is mentally overloaded, he will make more mistakes, which may even lead to accidents. In practice, an excessively high or low MW can cause operators to neglect critical information, and the discrimination of a high MWL is a key factor in guaranteeing operator safety[3]. Identifying abnormal variations under different MWLs is therefore important for preventing excessive cognitive load or attention deficiency in pilots, and thus for ensuring flight safety.

Subjective evaluation, sub-task assessment and physiological measurement are widely used to evaluate pilot MW. Our research combines all three to evaluate different MWLs, which are induced by a monitoring task combined with an auditory n-back task. Subjective evaluation takes place after the subject has completed a task. The Cooper-Harper Rating scale, the Subjective Workload Assessment Technique (SWAT) scale and the NASA-TLX scale are the common subjective evaluation tools. The Cooper-Harper scale is divided into ten grades and was the main subjective method for evaluating the workload of military pilots in the 1960s. The SWAT scale covers time load, mental effort load and psychological stress load. Compared with the Cooper-Harper Rating scale, the SWAT scale is more detailed and appears to be more accurate in discriminating workload. NASA later put forward the NASA-TLX scale, whose procedure is simpler than that of SWAT[4]. The NASA-TLX scale contains six dimensions: mental demand, physical demand, temporal demand, effort, performance and frustration[5]. The MWL is reflected in the total NASA-TLX score. The NASA-TLX scale has been applied widely in human factors research, for example in studies of typical flight crew operations[6], the correlation between team and individual workload[7], and emergency operating procedures in nuclear power plants[8]. The NASA-TLX scale is sensitive to changes in pilot MW[9]. Considering also the low cost, easy administration and other advantages of subjective evaluation, we applied it in our study. However, because subjective ratings can be affected by the subjects' emotions, performance assessment and physiological measurement were combined with it.

Performance assessment can be divided into primary task assessment and sub-task assessment. Response time and accuracy in the primary task or the sub-task are often effective evaluation indexes[10]. To overcome the disadvantages of primary task assessment, researchers prefer to introduce a sub-task. Sub-task assessment has limitations of its own, however; for example, it disturbs the primary task to some extent. Researchers are therefore focusing on the design of the sub-task, which can take the form of memory search, mental calculation, voice interaction and so on. Compared with the single resource theory[11], the multiple resource theory[12] is accepted by most researchers. Wanyan[9] manipulated pilot MW through the number of flight indicators to be monitored (0-9) and the frequency of abnormal information (1-2 s) in a simulated flight task. The sub-task can be designed as an auditory task, a visual task or a combination of both. A mental calculation task (silently counting down from a random three-digit number between 800 and 999) and an auditory task (repeating information in a complicated voice environment while driving) were set up in a study of car drivers[13]. The difficulty of the n-back task (with random single digits between 0 and 9) can be controlled and graded, and the task can be presented through different channels (visual or auditory), so many researchers use it to evaluate operators' mental fatigue[14-16]. Auditory interaction is in fact frequent during flight, especially between the pilot and the air traffic controller during approach. Given these advantages, we applied the n-back task to the study of pilot MW in a flight simulation environment.

Compared with the two methods above, physiological measurement (brain, eye and ECG indexes) appears to be more objective. Eye tracking is less intrusive and more mature than brain and ECG measurement, so researchers prefer eye indexes in their studies. Eye indexes include EOG data collected by a physiological recording system and eye signals gathered by an eye tracker. In a flight communication task[17] and a human-computer interaction task[18], researchers found that rapid pupil dilation was connected with the participant's cognitive load[16], and that the frequency of dilation increased with the task difficulty level. Meanwhile, when visual tasks were performed, blink rate tended to increase as cognitive load increased. Pupil diameter could detect the activation state of the subject in a web page interaction task, which suggests that pupil diameter might be an index of arousal[18]. Iqbal, Zheng and Bailey[19] found that pupil diameter was sensitive to task complexity once the motion sub-tasks were filtered out, and pupil diameter can also be used to estimate MW in simulated flight[9]. An instantaneous blink is correlated with the end of a cognitive process and occurs when no visual processing is taking place[20-21]. Task-related eye indexes are widely used in research on information processing, perception, memory, reasoning and thinking[22]. Considering these advantages, we analyzed eye indexes in detail in our pilot MW research.

The basic idea of the Fisher discrimination method is to overcome the "curse of dimensionality" caused by high-dimensional data: the data are projected into a low-dimensional space (e.g., onto a one-dimensional line), which makes them more concentrated[23]. The basic idea of the Bayes discrimination method is to minimize the average loss from misclassification, taking into account both the differences in prior probability among the populations and the different losses caused by misclassification, while making full use of the distribution of each population. Our research builds on these two ideas to establish a Fisher projection function and a Bayes discrimination model for identifying the MWLs. The Bayes model in this paper is based on the maximum a posteriori criterion (the prior probabilities of the target classes are used to calculate the posterior probabilities).

A statistical analysis was carried out on the eye indexes, which consisted of three parts: fixation, saccade and blink. Subjective assessment (NASA-TLX scores), performance evaluation (n-back accuracy) and physiological measurement (fixation frequency and total fixation time) were integrated to establish the discrimination models.

2 Materials and Methods

2.1 Apparatus

The flight environment was presented on a flight simulation platform in our laboratory, while fixation, saccade and blink data were recorded by a Tobii TX300 eye tracker at a sampling rate of 300 Hz. Before the experiment, the subjects were given practice time (about 30 min) to become familiar with the experimental procedures. During practice, all subjects had to complete the flight training task and the n-back task (a certain accuracy was required) in our flight simulation system. The subjects then performed an eye movement calibration in preparation for the actual experiment. The display interface and experimental scene are shown in Fig. 1.

Figure 1 HUD display interface and experimental scene

2.2 Participants

Sixteen male flying cadets (aged 20 to 25) from the Civil Aviation University of China were recruited for the study. All participants were right-handed, with normal or corrected vision and normal hearing, and were required to refrain from caffeinated or alcoholic beverages, smoking, medication and strenuous exercise for 12 h before the experiment. All subjects had a good knowledge of flight operations (each had 1 or 2 years of flight simulation experience) and passed the training session without difficulty. All subjects were familiar with the task procedures. After the experiment, all subjects received compensation.

2.3 Experimental Design

All subjects used the control stick to carry out a dynamic flight simulation task consisting of take-off, climb, cruise, approach and landing. During the cruise phase, the subjects were asked to perform a primary monitoring task, keeping an eye on the major flight parameters displayed on the HUD: heading, air speed, altitude, flight attitude, etc. Meanwhile, the subjects had to respond to the n-back tasks broadcast through a loudspeaker. Each n-back task lasted 2 min with a 30 s interval between tasks, and the whole experiment lasted about 20 min. Together, the monitoring task and the n-back tasks produced three levels of MW: low, medium and high MWL (the MWLs were manipulated through the auditory n-back tasks).

The main purpose of introducing the auditory n-back task into the flight simulation experiment was to simulate the complicated environment of a pilot's flight task, for example the interaction between the pilot and the air traffic controller (ATC), or communication with the pilot not flying (PNF). The sub-task also conforms to the multiple resource theory, according to which the remaining mental resources can be measured by sub-task performance. Here we used the n-back task to separate the MWLs, in order to understand how the different measures vary and to explore discrimination methods for predicting the MWL. The specific design is shown in Table 1.

Table 1 Design of flight simulation task

The delayed digit recall task (n-back) was adopted in our research. All subjects were required to respond to each randomly presented auditory stimulus (a single digit from 0 to 9) by immediately saying a digit out loud according to the following rules[15]. In the 0-back, 1-back and 2-back tasks, the subject repeats the last number presented, the number presented one before the last, and the number presented two before the last, respectively. The presentation time and the interval time were 1 s and 1.5 s. The different n-back tasks appeared in the same flight simulation. The workflow of the comprehensive assessment, including the analysis and modeling process, is shown in Fig. 2.
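The n-back response rule can be illustrated with a minimal Python sketch. The function names and the scoring convention (stimuli with no valid n-back target are excluded from the accuracy) are assumptions made for illustration; the source does not describe how responses were scored.

```python
def expected_responses(digits, n):
    """Correct spoken response for each stimulus in an n-back task.

    For 0-back the target is the digit just heard; for n-back it is the
    digit presented n positions earlier. Positions with no valid target
    (fewer than n earlier digits) are marked None.
    """
    return [digits[i - n] if i >= n else None for i in range(len(digits))]


def nback_accuracy(digits, responses, n):
    """Fraction of scorable stimuli answered correctly (assumed scoring rule)."""
    expected = expected_responses(digits, n)
    scored = [(e, r) for e, r in zip(expected, responses) if e is not None]
    if not scored:
        raise ValueError("no scorable stimuli for this n")
    correct = sum(1 for e, r in scored if e == r)
    return correct / len(scored)
```

For example, with the hypothetical stimulus sequence 3, 7, 1, 9 in a 1-back task, the correct responses to the second, third and fourth digits are 3, 7 and 1.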

Figure 2 Comprehensive assessment of MW evaluation

2.4 Data Recording and Analysis

Subjects were asked to complete the NASA-TLX scale within 10 min after finishing all the tasks in the experiment. The NASA-TLX scores were computed by weighting each dimension. n-back accuracy was used as the performance measure. Fixation, saccade and blink data were analyzed as physiological measures. Repeated measures analysis of variance (ANOVA) and post-hoc comparisons were used to analyze these data. Mauchly's Test of Sphericity was used to test the sphericity of the data. If sphericity could not be assumed, the Lower-bound correction (the most conservative one) was used to correct for the lack of sphericity.
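As an illustration of the weighting step, the sketch below follows the standard weighted NASA-TLX procedure, in which each of the six dimensions is rated from 0 to 100 and weighted by the number of times it is chosen in 15 pairwise comparisons. The source does not state the exact procedure used, so this procedure and the function name are assumptions.

```python
DIMENSIONS = (
    "mental demand", "physical demand", "temporal demand",
    "effort", "performance", "frustration",
)


def nasa_tlx_score(ratings, weights):
    """Weighted NASA-TLX score (assumed standard procedure).

    ratings: dict mapping each dimension to a 0-100 rating.
    weights: dict mapping each dimension to its tally from the 15
             pairwise comparisons (tallies must sum to 15).
    """
    if set(ratings) != set(DIMENSIONS) or set(weights) != set(DIMENSIONS):
        raise ValueError("ratings and weights must cover the six dimensions")
    total_weight = sum(weights.values())
    if total_weight != 15:
        raise ValueError("pairwise-comparison weights must sum to 15")
    return sum(ratings[d] * weights[d] for d in DIMENSIONS) / total_weight
```

The overall score stays on the same 0-100 scale as the individual ratings, which makes totals such as those reported for the three MWLs directly comparable.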

To identify the main effects of MW, the following hypotheses were tested at the α=0.05 significance level: H0: μ1=μ2=…=μm=μ; H1: μ1, μ2, …, μm are not all equal.

$ F = \frac{{{S_A}/\left( {m-1} \right)}}{{{S_E}/\left( {l-m} \right)}} \sim F\left( {m - 1, l - m} \right) $

where SA is the between-groups sum of squared deviations; SE is the within-groups sum of squared deviations; m is the number of MWLs; n is the number of subjects; and l=m×n. When F>F1-α(m-1, l-m), H0 is rejected, which means that the main effect of MW is significant; equivalently, the significance value P < 0.05.
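The F statistic in the formula above can be computed with a short Python sketch. Here S_A is taken as the between-groups sum of squares and S_E as the within-groups sum of squares, in line with their degrees of freedom in the formula. This is the one-way form exactly as written; the repeated measures ANOVA used for the results additionally removes between-subject variability, which this sketch does not do. The function name is hypothetical.

```python
def one_way_f(groups):
    """Return (S_A, S_E, F) for a list of groups of observations.

    S_A: between-groups sum of squares, with m - 1 degrees of freedom.
    S_E: within-groups sum of squares, with l - m degrees of freedom.
    """
    m = len(groups)                      # number of workload levels
    l = sum(len(g) for g in groups)      # total number of observations
    grand_mean = sum(sum(g) for g in groups) / l
    means = [sum(g) / len(g) for g in groups]
    s_a = sum(len(g) * (mu - grand_mean) ** 2 for g, mu in zip(groups, means))
    s_e = sum((x - mu) ** 2 for g, mu in zip(groups, means) for x in g)
    f = (s_a / (m - 1)) / (s_e / (l - m))
    return s_a, s_e, f
```

The resulting F is then compared with the critical value F1-α(m-1, l-m) to decide whether H0 is rejected.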

3 Experimental Results

3.1 Task Performance Analysis

n-back accuracy was used as the performance index to measure the remaining mental resources. The n-back accuracy under the low, medium and high MWLs was 100.00±0.00%, 97.66±2.81% and 87.57±10.98% respectively. At the α=0.05 significance level, single-factor repeated measures ANOVA showed a significant main effect of MW on n-back accuracy (F(1, 15)=18.908, P=0.001). Further paired comparisons indicated that n-back accuracy under the high MWL was significantly lower than under the medium MWL (P=0.001) and the low MWL (P < 0.001), and that n-back accuracy under the medium MWL was significantly lower than under the low MWL (P=0.004).

3.2 Subjective Evaluation

As MW changed from low to medium to high, the NASA-TLX scores increased accordingly (low: M=15.13, SD=4.62; medium: M=37.64, SD=9.98; high: M=69.79, SD=12.35). One-way repeated measures ANOVA showed a significant main effect of MW (F(2, 30)=153.08, P < 0.001). Paired comparisons showed that the NASA-TLX scores under the high MWL were significantly higher than under the medium MWL (P < 0.001) and the low MWL (P < 0.001), and that the scores under the medium MWL were significantly higher than under the low MWL (P < 0.001). The n-back accuracy and NASA-TLX results are shown in Table 2.

Table 2 Mean and standard deviation of n-back accuracy and NASA-TLX scores

3.3 Physiological Measurement

About 80%-90% of information is acquired through the eyes, which have three basic movements: fixation, saccade and blink. The concentration of fixations reflects the subject's level of attention, and a large amount of information is acquired and processed during fixation. During a saccade, spatiotemporal information about the stimulus can be obtained, but no clear image is formed in the eye. A saccade nevertheless allows the stimuli to be searched and selected rapidly, so that the information of interest falls on the fovea for further processing. A blink is an unconscious eye movement that occurs even without stimulation. Blinks do not take part in visual search, and no visual information is acquired during a blink. The cognitive process is accomplished through the alternation of these three basic eye movements. Many studies report that eye indexes are sensitive to MW, but no agreed conclusion exists. A recent study showed that fixation time and pupil diameter correlate significantly with the theoretically predicted MW value[24]. Considering the inherent connections among the eye movements, we carried out a detailed analysis in our cognitive MW environment to select the sensitive physiological indexes and to study the relations among the three basic eye movement indexes. The significance analysis is as follows.

A gaze that stayed at one point for more than 100 ms was defined as a fixation. Among the fixation indexes, fixation frequency showed a declining trend with increasing MWL (Table 3). Single-factor repeated measures ANOVA showed a significant main effect of MW (F(1, 15)=12.967, P=0.003). Post-hoc comparisons showed that fixation frequency under the high MWL was significantly lower than under the low MWL (P=0.007) and the baseline MWL (P=0.001); fixation frequency under the medium MWL was significantly lower than under the low MWL (P=0.005) and the baseline MWL (P=0.001); fixation frequency under the low MWL was significantly lower than under the baseline MWL (P=0.049); fixation frequency under the high MWL was lower than under the medium MWL, but the difference was not significant (P=0.248). The remaining mental resources decreased as the auditory n-back task continued, while the subjects still needed to obtain more information in the same time and so sustained a high visual workload. This was reflected in the significant change in fixation frequency, an index that is also associated with the efficiency of information processing[25].

Table 3 Mean and standard deviation of fixation indexes

Average fixation time is defined as the mean duration of a single fixation. With increasing MWL, the average fixation time first increased and then decreased (M=0.37, SD=0.12; M=0.42, SD=0.17; M=0.46, SD=0.19; M=0.38, SD=0.11). The main effect of MW on average fixation time was not significant (F(1, 15)=3.219, P=0.093). The average fixation time under the medium MWL was significantly higher than under the high and baseline MWLs (P=0.019, P=0.033), but no significant differences were found between the other MWLs (P>0.1).

Total fixation time is defined as the sum of all fixation durations during the task. Total fixation time showed a declining trend with increasing MWL (Table 3). The main effect of MW on total fixation time was significant (F(1, 15)=12.496, P=0.003). Total fixation time under the high MWL was significantly lower than under the medium, low and baseline MWLs (P < 0.001, P < 0.001, P=0.002); no significant differences were observed between the other MWLs (P>0.1).
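The three fixation indexes can be derived from a list of gaze dwell durations, as in the following minimal Python sketch. It assumes that dwell durations have already been extracted from the eye tracker data and that fixation frequency means the number of fixations per second of task time; neither detail is specified in the source, and the function name is hypothetical.

```python
FIXATION_THRESHOLD_S = 0.100  # a dwell longer than 100 ms counts as a fixation


def fixation_indexes(dwell_times_s, task_time_s):
    """Compute fixation frequency, total and average fixation time.

    dwell_times_s: durations (s) for which the gaze stayed at one point.
    task_time_s:   duration (s) of the task segment being analyzed.
    """
    fixations = [d for d in dwell_times_s if d > FIXATION_THRESHOLD_S]
    total = sum(fixations)
    count = len(fixations)
    return {
        "fixation_frequency": count / task_time_s,          # fixations per second (assumed unit)
        "total_fixation_time": total,                        # seconds
        "average_fixation_time": total / count if count else 0.0,
    }
```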

A saccade was defined as the movement of the fixation point from one area to another. Saccade frequency is defined as the number of saccades per second, and total saccade time as the sum of all saccade durations during the task. With increasing MWL, saccade frequency first decreased and then increased (Table 4). The main effect of MW on saccade frequency was not significant. For total saccade time, the main effect of MW was significant (F(3, 45)=4.463, P=0.008). Post-hoc comparisons indicated that total saccade time under the baseline MWL was higher than under the low, medium and high MWLs (P=0.029, P=0.011, P=0.026); no significant differences were observed between the other MWLs (P > 0.1). Average saccade time is the mean duration of a single saccade. Average saccade time showed a declining trend with increasing MWL (Table 4). Single-factor repeated measures ANOVA showed a significant main effect of MW (F(3, 45)=9.765, P < 0.001). Post-hoc comparisons indicated that average saccade time under the baseline MWL was higher than under the low, medium and high MWLs (P=0.035, P=0.002, P < 0.001), and that average saccade time under the low MWL was higher than under the medium and high MWLs (P=0.045, P=0.019); there was no significant difference between the medium and high MWLs (P > 0.1). In related work, peak saccade velocity decreased as cognitive workload increased during a simulated ATC task[26], and eye scanning velocity can predict performance in a mental alertness task[27]. During a visual task, the average peak saccade velocity increased with workload, which suggests that rapid saccades reflect increased workload[28].

Table 4 Mean and standard deviation of saccade indexes

For the blink indexes, blink rate, average blink time and average pupil diameter were analyzed. Blink rate showed a rising trend with increasing MWL (Table 5). Single-factor repeated measures ANOVA showed a significant main effect of MW (F(1, 15)=10.975, P=0.005). Paired comparisons showed that blink rate under the high MWL was significantly higher than under the medium, low and baseline MWLs (P=0.031, P=0.003, P=0.002), and that blink rate under the medium MWL was significantly higher than under the low and baseline MWLs (P=0.001, P=0.006); no significant difference was observed between the low and baseline MWLs (P > 0.1). With increasing MWL, average blink time first increased and then decreased (Table 5), and no significant main effect was found (P > 0.1). In a related study, blink rate tended to decrease when pilots performed tasks with a digital communication system compared with a language communication system, which indicated a high processing demand[29].

Table 5 Mean and standard deviation of blink indexes

The average pupil diameter showed a rising trend with increasing MWL (Table 5). Single-factor repeated measures ANOVA showed a significant main effect of MW (F(1, 15)=30.624, P < 0.001). Paired comparisons showed that the average pupil diameter under the high MWL was significantly greater than under the medium, low and baseline MWLs (P < 0.001), that the average pupil diameter under the medium MWL was significantly greater than under the low and baseline MWLs (P < 0.001), and that the average pupil diameter under the low MWL was greater than under the baseline MWL (P=0.028). It is widely recognized that an increase in average pupil diameter indicates a state of intensive workload[9].

According to the significance analysis, NASA-TLX scores, n-back accuracy, fixation frequency, average saccade time, blink rate and average pupil diameter were significantly sensitive to MW changes. Total fixation time, average fixation time and total saccade time were sensitive to MW changes. Saccade frequency and average blink time were not sensitive to MW changes. The sensitive indexes can be used for the discriminant model. To select the best indexes for modeling and to evaluate the discrimination methods, validity and correlation analyses were conducted as described below.

3.4 Validity of MW Evaluation Methods

3.4.1 Predictive validity of all indexes

As shown in Table 6, the NASA-TLX scores were, as expected, significantly correlated with MWL (r=0.92, P < 0.001). Here we assumed that MWL could be quantified on a linear scale as 0, 1, 2 and 3. A linear regression analysis was then carried out with NASA-TLX scores as the independent variable and MWL as the dependent variable. The results showed that the NASA-TLX scale explained 84.3% of the variance in MWL with a good model fit (F(1, 46)=252.721, P < 0.001). The performance index was negatively correlated with MWL; further regression analysis indicated that n-back accuracy explained only 35.8% of the variance in MWL, with a good model fit (F(1, 46)=27.257, P < 0.001).
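The link between the correlation coefficient, the explained variance and the F statistic in a single-predictor regression can be sketched in Python as follows. The sketch assumes that the reported percentages are adjusted R² values and that the sample size is n=48 (implied by the 46 error degrees of freedom); both are inferences from the reported numbers, not statements in the source, and the function names are hypothetical.

```python
def adjusted_r_squared(r, n, predictors=1):
    """Adjusted R^2 for a linear regression, given the multiple correlation r."""
    r2 = r * r
    return 1.0 - (1.0 - r2) * (n - 1) / (n - predictors - 1)


def regression_f(r, n):
    """F statistic with (1, n - 2) degrees of freedom for one predictor."""
    r2 = r * r
    return r2 / (1.0 - r2) * (n - 2)


# NASA-TLX versus MWL: r = 0.92, with n = 48 as an assumed sample size.
nasa_adj_r2 = adjusted_r_squared(0.92, 48)   # about 0.843
nasa_f = regression_f(0.92, 48)              # about 253.5 with the rounded r
```

With the rounded r=0.92, the adjusted R² comes out at about 0.843, which agrees with the reported 84.3%; the F value differs slightly from the reported 252.721 because r is rounded to two decimals.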

Table 6 Correlation between MW and n-back accuracy, NASA-TLX scores, Physiological indexes

For the physiological indexes, at the α=0.05 significance level, total saccade time was negatively correlated with MWL (r=-0.251, P=0.046). At the α=0.01 significance level, fixation frequency, total fixation time and average saccade time were negatively correlated with MWL, while blink rate and average pupil diameter were positively correlated with MWL. Further regression analysis indicated that fixation frequency, total fixation time, average saccade time, blink rate and average pupil diameter explained 22.4%, 16.0%, 15.7%, 16.0% and 22.1% of the variance in MWL respectively, with a good model fit (F(1, 62)=19.223, P < 0.001; F(1, 62)=13.029, P=0.001; F(1, 62)=12.764, P=0.001; F(1, 62)=12.962, P=0.001; F(1, 62)=18.826, P < 0.001). The other indexes were only weakly related to MWL.

3.4.2 Correlation analysis between evaluation measures

As analyzed above, NASA-TLX had a good model fit with MWL and is widely accepted for evaluating MW[8]. We analyzed the correlation between NASA-TLX and both n-back accuracy and the physiological indexes. At the α=0.01 significance level, NASA-TLX and n-back accuracy were significantly correlated (r=-0.570, P < 0.001). At the α=0.05 significance level, fixation frequency, total fixation time and blink rate were significantly correlated with NASA-TLX (r=-0.291, P=0.045; r=0.321, P=0.026; r=0.350, P=0.015). At the α=0.01 significance level, average pupil diameter was significantly correlated with NASA-TLX (r=0.433, P=0.002). A linear regression analysis was carried out with average pupil diameter, fixation frequency and total fixation time as the independent variables and NASA-TLX as the dependent variable; only 35.4% of the variance in NASA-TLX was explained, with a good model fit (F(1, 62)=9.578, P < 0.001).

We also analyzed the correlation between the physiological indexes and n-back accuracy; the results are shown in Table 7. None of the physiological indexes was significantly correlated with n-back accuracy.

Table 7 Correlation analysis of physiological indexes and n-back accuracy, NASA-TLX scores

3.4.3 Correlations analysis among eye indexes

The correlation analysis among the eye indexes showed that the fixation indexes were clearly correlated with the saccade indexes. At the α=0.05 significance level, total fixation time was negatively correlated with total saccade time; at the α=0.01 significance level, total saccade time had a high positive correlation with saccade frequency (r=0.865, P < 0.001). Average fixation time was negatively correlated with saccade frequency, total saccade time and blink rate, and positively correlated with total fixation time; its negative correlations with saccade frequency and total saccade time were high (r=-0.802, P < 0.001; r=-0.814, P < 0.001). A linear fit between average fixation time and total saccade time (Fig. 3) showed that total saccade time decreased as average fixation time increased (R2=0.663, fitted equation: y=-15.949x+16.084). The correlations are shown in Table 8.

Figure 3 Curve fitting result between average fixation time and total saccade time

Table 8 Correlations among eye indexes

4 Modeling

4.1 Fisher Projection Function

To overcome the "curse of dimensionality" caused by high-dimensional data, the data are projected into a low-dimensional space (for example, onto a one-dimensional line) so that they become more concentrated. This is the basic idea of the Fisher discrimination method, which is one of the common methods of state recognition in multivariate statistical analysis.

The underlying theory is based on the analysis of variance introduced by Fisher: it seeks a linear function of the original independent variables that maximizes the ratio of the between-groups sum of squared deviations (SA) to the within-groups sum of squared deviations (SE).

The formulas are expressed as:

$ {S_A} = \sum\limits_{i = 1}^r {\sum\limits_{j = 1}^{{n_i}} {{{(\overline {{X_i}}-\overline X )}^2}} } = \sum\limits_{i = 1}^r {{n_i}} {(\overline {{X_i}}-\overline X )^2} $
$ {S_E} = \sum\limits_{i = 1}^r {\sum\limits_{j = 1}^{{n_i}} {{{({X_{ij}}-\overline {{X_i}} )}^2}} } = \sum\limits_{i = 1}^r {{n_i}{S_i}^2} $

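where ni is the sample size of group i, r is the number of groups, Xij is the j-th sample of group i, and the overlined terms are the mean of group i and the overall mean.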

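Discrimination function: $ U = b^T X $, where X is the p×1 vector of independent variables, p is the number of independent variables, and b is the weight vector to be calculated, chosen so that the ratio $ \lambda(b) = \frac{b^T S_A b}{b^T S_E b} $ reaches its maximum value. In other words, the discrimination function defines the projection direction along which the between-groups separation SA is maximized relative to the within-groups scatter SE.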
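For reference, maximizing this ratio is a standard generalized eigenvalue problem. As a sketch (writing S_A and S_E for the between-groups and within-groups scatter matrices of the p indexes, and assuming S_E is invertible), setting the gradient of the ratio with respect to b to zero gives:

$ \lambda(b) = \frac{b^T S_A b}{b^T S_E b}, \qquad \frac{\partial \lambda}{\partial b} = 0 \;\Rightarrow\; S_A b = \lambda S_E b \;\Leftrightarrow\; S_E^{-1} S_A b = \lambda b $

With r groups there are at most min(p, r-1) nonzero eigenvalues, which is consistent with two discriminant functions being obtained for three MWLs.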

Using the Fisher discrimination method, we built projection functions to discriminate the MWLs. As described above, fixation frequency, total fixation time, NASA-TLX scores and n-back accuracy were all sensitive to MW, and they were also correlated with one another. We also tried other combinations of indexes to assess the MWLs, but the combination of these four indexes performed best. The Fisher projection functions were as follows:

$ {y_1} = 0.354{x_1} + 0.022{x_2}-0.099{x_3} + 6.165{x_4}-4.426 $ (1)
$ {y_2} =-1.464{x_1} + 0.016{x_2} + 0.022{x_3} + 11.373{x_4}-10.362 $ (2)

where x1, x2, x3 and x4 are the fixation frequency, total fixation time, NASA-TLX score and n-back accuracy respectively, and y1 and y2 are the unstandardized Fisher discrimination functions 1 and 2. Each sample was substituted into the projection functions, and its distances to the group centroids were calculated: group 1 (3.151, -0.236), group 2 (0.513, 0.385) and group 3 (-3.663, -0.149). The sample was assigned to the MWL whose centroid was nearest. The classification scatter plot is shown in Fig. 4.

Figure 4 Classification results

In Fig. 4, 0, 1 and 2 denote the low, medium and high MWL respectively, and the axes y1 and y2 are the unstandardized Fisher discrimination functions 1 and 2. The clearest separation was between the high and low MWLs. The medium MWL was clearly separated from the high MWL; it was also separated from the low MWL, although a few individual samples overlapped.

For example, substituting x1=2.44, x2=88.97, x3=42.67 and x4=0.95 into the projection functions gave y1=0.027 and y2=-2.645. The distances to the three group centroids were 3.945, 3.069 and 4.455, so the sample was assigned to the medium MWL, which matched its original level. The error rate of the original validation was 2.08% and the cross-validation error rate was 4.17%, the same as for the Bayes discrimination method.
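The nearest-centroid rule used in this example can be sketched in Python as follows. The coefficients and centroids are copied from Eqs. (1)-(2) and the text; the function names are hypothetical, and the mapping of group 1 to the low MWL and group 3 to the high MWL is an assumption (only group 2 is identified as the medium MWL by the worked example). With the coefficients as printed, y1 reproduces the reported 0.027 but y2 does not reproduce the reported -2.645, so the sketch also accepts the scores directly.

```python
import math

# Group centroids in the (y1, y2) plane; the low/high labels are assumed.
CENTROIDS = {
    "low": (3.151, -0.236),
    "medium": (0.513, 0.385),
    "high": (-3.663, -0.149),
}


def fisher_scores(x1, x2, x3, x4):
    """Evaluate Eqs. (1) and (2) with the coefficients as printed."""
    y1 = 0.354 * x1 + 0.022 * x2 - 0.099 * x3 + 6.165 * x4 - 4.426
    y2 = -1.464 * x1 + 0.016 * x2 + 0.022 * x3 + 11.373 * x4 - 10.362
    return y1, y2


def classify_by_scores(y1, y2):
    """Assign the sample to the group with the nearest centroid."""
    distances = {
        label: math.hypot(y1 - c1, y2 - c2)
        for label, (c1, c2) in CENTROIDS.items()
    }
    return min(distances, key=distances.get), distances


# Reported scores of the worked example.
label, distances = classify_by_scores(0.027, -2.645)
```

Using the reported scores, the three distances match the values given in the text and the sample is assigned to the medium MWL; evaluating the printed equations for the same sample also yields the medium MWL.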

Wilks' λ test was used to check the validity of Fisher discrimination functions 1 and 2; P < 0.001 at the 0.05 significance level indicated that the discrimination was effective. The significance results were P < 0.001 for function 1 and P = 0.340 for function 2. To check the contribution of function 2, the original data of the subjects were substituted into Fisher function 1 alone, and the results coincided with those of the model built from both functions.

4.2 Bayes Discrimination Model

Bayes discriminant analysis is a discrimination method based on conditional probability. Assuming that the researchers have some prior understanding of the population (the prior probability), the training samples are used to update the prior probability distribution and obtain the posterior probability. Under the discrimination criterion, a new sample is assigned to the population with the maximum posterior probability. The linear discrimination function minimizes the differences among samples within the same group while maximizing the differences among samples in different groups, and can achieve a high discrimination accuracy[23,30]. The same four indexes were used to build the Bayes model.

The three Bayes linear discrimination equations based on fixation frequency (x1), total fixation time (x2), NASA-TLX scores (x3) and n-back accuracy (x4) were as follows:

$ {z_0} = 6.922{x_1} + 0.459{x_2}-0.110{x_3} + 259.408{x_4}-159.632 $ (3)
$ {z_1} = 5.080{x_1} + 0.410{x_2} + 0.166{x_3} + 250.199{x_4}-149.598 $ (4)
$ {z_2} = 4.381{x_1} + 0.309{x_2} + 0.569{x_3} + 218.389{x_4}-132.105 $ (5)

The values of x1, x2, x3 and x4 were substituted into the discrimination equations to calculate the corresponding z0, z1 and z2 values, which represent the low, medium and high MWL, respectively. The sample was assigned to the group with the largest of z0, z1 and z2.
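The maximum-score rule can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, and the test sample is the one used in the Fisher worked example (x1=2.44, x2=88.97, x3=42.67, x4=0.95), whose original level was medium.

```python
# Coefficients of Eqs. (3)-(5), ordered as (x1, x2, x3, x4, constant).
BAYES_COEFFS = {
    "low": (6.922, 0.459, -0.110, 259.408, -159.632),    # z0
    "medium": (5.080, 0.410, 0.166, 250.199, -149.598),  # z1
    "high": (4.381, 0.309, 0.569, 218.389, -132.105),    # z2
}


def bayes_scores(x):
    """Return the discrimination score of each MWL for x = (x1, x2, x3, x4)."""
    return {
        level: sum(c * v for c, v in zip(coef[:4], x)) + coef[4]
        for level, coef in BAYES_COEFFS.items()
    }


def bayes_classify(x):
    """Assign the MWL whose discrimination score is largest."""
    scores = bayes_scores(x)
    return max(scores, key=scores.get)


sample = (2.44, 88.97, 42.67, 0.95)
print(bayes_scores(sample))
print(bayes_classify(sample))
```

For this sample the medium-level score z1 is the largest, so the Bayes equations agree with the nearest-centroid result from the Fisher functions.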

4.3 Validity Check of the Model

As shown in Table 9, the average discrimination and prediction accuracies of the original check method and the cross check method were 97.92% and 95.83%, respectively. Specifically, the discrimination accuracy for the low MWL was 100% under both methods; for the medium MWL it was 93.75% and 87.50%, respectively; and for the high MWL it was 100% under both methods. Only one sample of the medium MW group was misclassified (as low MW) in the original validation; two samples of the medium MW group were misclassified in the cross-validation, one as low MW and the other as high MW. The Bayes discrimination results were thus the same as those of the single Fisher function 1.

Table 9 Prediction results of MW in different levels
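The reported accuracies can be cross-checked from the misclassification counts given in the text. The sketch below rebuilds the two confusion matrices under the assumption of 16 samples per level (16 pilots at each of three MWLs, 48 samples in total), which is consistent with the reported percentages; the variable and function names are illustrative only.

```python
# Rows: true level; columns: predicted level, both ordered (low, medium, high).
# Assumption: 16 samples per level, i.e. 16 pilots at each of three MWLs.
ORIGINAL = [
    [16, 0, 0],
    [1, 15, 0],   # one medium sample classified as low
    [0, 0, 16],
]
CROSS = [
    [16, 0, 0],
    [1, 14, 1],   # one medium sample classified as low, one as high
    [0, 0, 16],
]


def per_level_accuracy(cm):
    """Fraction of each true level that was classified correctly."""
    return [row[i] / sum(row) for i, row in enumerate(cm)]


def overall_accuracy(cm):
    """Fraction of all samples that were classified correctly."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


for name, cm in (("original", ORIGINAL), ("cross", CROSS)):
    levels = [round(100 * a, 2) for a in per_level_accuracy(cm)]
    print(name, round(100 * overall_accuracy(cm), 2), levels)
```

With these counts, 47 of 48 and 46 of 48 samples are classified correctly, which corresponds to the 2.08% and 4.17% error rates.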

5 Discussions

5.1 Discussions of Three Types of Measurement

In the present study, subjective evaluation was significantly sensitive to MW. The high MWL induced by vision-related flight tasks combined with auditory cognitive load corresponded to higher NASA-TLX scores. Performance evaluation was also sensitive to the different MWLs, consistent with a former study: as task difficulty increased, the accuracy of the sub-tasks declined[16]. Task performance accuracy and the operator's self-assessment could assess the overall level of MW, but not changes in MWL.

For the visual task in flight, eye movements were more diagnostic[31]. In our study, compared with the low MWL, fixation frequency and total fixation time decreased significantly in the high MWL, which indicated that the pilots made more effort as the cognitive mental workload increased. In this condition, pilots could not monitor the main flight parameters displayed in the HUD, which would be a potential safety hazard in flight. Total saccade time under the baseline MWL was significantly higher than under the high MWL, which showed that our operators could not scan the flight parameters of the primary task when the auditory sub-task became more difficult. Pilot training should therefore include an effective saccade strategy based on the HUD layout to deal with emergency operations or mental overload induced by high cognitive load in the flight task.

As expected, blink rate and average pupil diameter were closely related to task difficulty. The higher blink rate in the high MWL showed more cognitive effort than in the other MWLs, in accordance with a previous study in real flight[32]. A sudden blink was related to the ending of cognitive processes. Our operators expended more effort as MW increased, which resulted in an inability to acquire the information on heading, air speed, altitude and flying attitude. One study[33] pointed out that the blink rate was determined by the overall visual demand. In the final VFR approach and landing phase, the blink rate of the pilots would increase significantly. As the demand on cognitive resources increased, pupil diameter increased significantly, consistent with studies of the forced-choice task[34] and of the communication load mission in a flight task.

5.2 Comparison of Single Assessment Index and Multi-index Assessment

The comprehensive evaluation model could predict the different MWLs accurately, whereas no single index performed as well. As shown in Table 10, the original verification indicated that the synthetic evaluation model discriminated the different MWLs very well (prediction accuracy 97.92%), compared with the single indexes fixation frequency (45.83%), total fixation time (47.92%), NASA-TLX scores (87.50%) and n-back accuracy (70.83%). The cross check method likewise showed that the comprehensive evaluation model had the highest discrimination and prediction accuracy (95.83%), compared with fixation frequency (45.83%), total fixation time (47.92%), NASA-TLX scores (87.50%) and n-back accuracy (68.75%). These results indicated that our model had higher discrimination and prediction accuracy than a model combining NASA-TLX scores, n-back accuracy, reaction time and the HRV time-domain index SDNN[35]. This was consistent with the study by Hogervorst[36]: for tasks with visual input, EEG performed best, followed by EOG indexes and the external physical measurements. Numerous studies have indicated that a single variable is inferior to a comprehensive assessment of multiple variables, and the latter is considered more precise and persuasive.

Table 10 Results of single assessment index and multidimensional synthetic assessment

5.3 Limitations

This experiment still has limitations. Although we gave consideration to the selection and training of the subjects, there was a disparity between our subjects and real pilots. Moreover, we used a flight simulation platform to simulate the dynamic flight environment, which differed somewhat from a real flight scenario. The correlations within the eye indexes also need further research. The discrimination and prediction model should be improved in a real flight environment.

6 Conclusions

We induced various MWLs in a flight simulation experiment. In our study, subjects were required to accomplish a primary monitoring task based on the HUD, which occupied the visual channel resources, and a sub-task that occupied the cognitive resources of the auditory channel. NASA-TLX scores, n-back accuracy and eye indexes were analyzed with single-factor repeated-measures ANOVA across the MWLs. Next, correlation analyses among and within them were carried out to evaluate the predictive validity and to select suitable indexes for further modeling. Finally, we built the Fisher projection functions and then the Bayes linear discrimination model from fixation frequency, total fixation time, NASA-TLX scores and n-back accuracy to discriminate the different MWLs. The conclusions are as follows:

1) In our flight simulation experiment, fixation frequency, average saccade time, blink rate and average pupil diameter were sensitive to MW: fixation frequency and average saccade time decreased significantly as MWL increased, while blink rate and average pupil diameter increased significantly. Total fixation time and total saccade time distinguished the high MWL from the other MWLs, and both decreased as MWL increased; average fixation time, saccade frequency and average blink time did not vary significantly with MW. There were also correlations among the fixation, saccade and blink indexes: average fixation time was highly negatively correlated with saccade frequency and total saccade time, and total saccade time was highly positively correlated with fixation frequency.

2) In our flight simulation task, the Fisher projection functions and the Bayes discrimination model were established to evaluate the various MWLs. Fixation frequency and total fixation time (physiological measurement), NASA-TLX scores (subjective assessment) and n-back accuracy (performance evaluation) were selected for the modeling. The discrimination results of the Fisher projection functions were the same as those of the Bayes linear discrimination model, and the average discrimination and prediction accuracies of the original check method and the cross check method were 97.92% and 95.83%, respectively.

References
[1] FAA. CFR Part 25. Airworthiness Standards: Transport Category Airplanes. Washington, DC: FAA, 2002.
[2] Yeh M, Jin Jo Y, Donovan C, et al. Human Factors Considerations in the Design and Evaluation of Flight Deck Displays and Controls. Washington: U.S. Department of Transportation, 2013.
[3] Durantin G, Gagnon J F, Tremblay S, et al. Using near infrared spectroscopy and heart rate variability to detect mental overload. Behavioural Brain Research, 2014, 259: 16-23. DOI:10.1016/j.bbr.2013.10.042
[4] Rubio S, Díaz E, Martín J, et al. Evaluation of subjective mental workload: A comparison of SWAT, NASA-TLX, and workload profile methods. Applied Psychology, 2004, 53(1): 61-86. DOI:10.1111/j.1464-0597.2004.00161.x
[5] Hart S G, Staveland L E. Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. Advances in Psychology, 1988, 52: 139-183. DOI:10.1016/S0166-4115(08)62386-9
[6] Bonner M A, Wilson G F. Heart rate measures of flight test and evaluation. The International Journal of Aviation Psychology, 2002, 12(1): 63-77. DOI:10.1207/S15327108IJAP1201_6
[7] Funke G J, Knott B A, Salas E, et al. Conceptualization and measurement of team workload a critical need. Human Factors: The Journal of the Human Factors and Ergonomics Society, 2012, 54(1): 36-51. DOI:10.1177/0018720811427901
[8] Gao Q, Wang Y, Song F, et al. Mental workload measurement for emergency operating procedures in digital nuclear power plants. Ergonomics, 2013, 56(7): 1070-1085. DOI:10.1080/00140139.2013.790483
[9] Wanyan X, Zhuang D, Zhang H. Improving pilot mental workload evaluation with combined measures. Bio-medical Materials and Engineering, 2014, 24(6): 2283-2290. DOI:10.3233/BME-141041
[10] Mun S, Kim E S, Park M C. Effect of mental fatigue caused by mobile 3D viewing on selective attention: An ERP study. International Journal of Psychophysiology, 2014, 94(3): 373-381. DOI:10.1016/j.ijpsycho.2014.08.1389
[11] Gopher D, Braune R. On the psychophysics of workload: Why bother with subjective measures?. Human Factors: The Journal of the Human Factors and Ergonomics Society, 1984, 26(5): 519-532. DOI:10.1177/001872088402600504
[12] Wickens C D. Engineering Psychology and Human Performance. New York: Harper Collins Publishers, 1992.
[13] Kohlmorgen J, Dornhege G, Braun M, et al. Improving human performance in a real operating environment through real-time mental workload detection. Massachusetts: Toward Brain-Computer Interfacing, 2007, 409-422.
[14] Mehler B, Reimer B, Coughlin J, et al. Impact of incremental increases in cognitive workload on physiological arousal and performance in young adult drivers. Transportation Research Record: Journal of the Transportation Research Board, 2009(2138): 6-12. DOI:10.3141/2138-02
[15] Mehler B, Reimer B, Coughlin J F. Sensitivity of physiological measures for detecting systematic variations in cognitive demand from a working memory task an on-road study across three age groups. Human Factors: The Journal of the Human Factors and Ergonomics Society, 2012, 54(3): 396-412. DOI:10.1177/0018720812442086
[16] Ayaz H, Willems B, Bunce B, et al. Cognitive workload assessment of air traffic controllers using optical brain imaging sensors. Advances in Understanding Human Performance: Neuroergonomics, Human Factors Design, and Special Populations. New York, 2010, 21-31. DOI:10.1201/EBK1439835012-4
[17] Casali J G, Wierwille W W. A comparison of rating scale, secondary-task, physiological, and primary-task workload estimation techniques in a simulated flight task emphasizing communications load. Human Factors: The Journal of the Human Factors and Ergonomics Society, 1983, 25(6): 623-641. DOI:10.1177/001872088302500602
[18] Di Stasi L L, Antolí A, Gea M, et al. A neuroergonomic approach to evaluating mental workload in hypermedia interactions. International Journal of Industrial Ergonomics, 2011, 41(3): 298-304. DOI:10.1016/j.ergon.2011.02.008
[19] Iqbal S T, Zheng X S, Bailey B P. Task-evoked pupillary response to mental workload in human-computer interaction. Proceedings of the 22nd Conference on Human Factors in Computing Systems (CHI 2004). Vienna, 2004, 1477-1480. DOI:10.1145/985921.986094
[20] Siegle G J, Ichikawa N, Steinhauer S. Blink before and after you think: blinks occur prior to and following cognitive load indexed by pupillary responses. Psychophysiology, 2008, 45(5): 679-687. DOI:10.1111/j.1469-8986.2008.00681.x
[21] Ichikawa N, Ohira H. Eyeblink activity as an index of cognitive processing: temporal distribution of eyeblinks as an indicator of expectancy in semantic priming 1, 2. Perceptual and Motor Skills, 2004, 98(1): 131-140. DOI:10.2466/PMS.98.1.131-140
[22] Beatty J. Task-evoked pupillary responses, processing load, and the structure of processing resources. Psychological Bulletin, 1982, 91(2): 276. DOI:10.1037//0033-2909.91.2.276
[23] Zhao L. Research and Improvement of Fisher Discrimination Analysis Method. Harbin: Northeast Forestry University, 2013.
[24] Xiao X, Wanyan X, Zhuang D M. Comprehensive evaluation model of multidimensional visual coding on display interface. Journal of Beijing University of Aeronautics and Astronautics, 2015, 41(6): 1012-1018. DOI:10.13700/j.bh.1001-5965.2014.0428
[25] Jiang B. Influence of Column Designing on University Students' Reading: Evidence from Eye Movement. Nanjing: Nanjing University, 2007.
[26] Di Stasi L L, Marchitto M, Antolí A, et al. Approximation of on-line mental workload index in ATC simulated multitasks. Journal of Air Transport Management, 2010, 16(6): 330-333. DOI:10.1016/j.jairtraman.2010.02.004
[27] Di Stasi L L, Antolí A, Cañas J J. Evaluating mental workload while interacting with computer-generated artificial environments. Entertainment Computing, 2013, 4(1): 63-69. DOI:10.1016/j.entcom.2011.03.005
[28] Bodala I P, Ke Y, Mir H, et al. Cognitive workload estimation due to vague visual stimuli using saccadic eye movements. Proceedings of the 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC 2014). Chicago, 2014, 2993-2996. DOI:10.1109/EMBC.2014.6944252
[29] Sirevaag E J, Kramer A F, Reisweber C D W M, et al. Assessment of pilot performance and mental workload in rotary wing aircraft. Ergonomics, 1993, 36(9): 1121-1140. DOI:10.1080/00140139308967983
[30] Zhang W, Dong W. SPSS Statistical Analysis Advanced Tutorials. Beijing: Higher Education Press, 2013, 311-317.
[31] Hankins T C, Wilson G F. A comparison of heart rate, eye activity, EEG and subjective measures of pilot mental workload during flight. Aviation, Space, and Environmental Medicine, 1998, 69(4): 360-367.
[32] Veltman J A. A comparative study of psychophysiological reactions during simulator and real flight. The International Journal of Aviation Psychology, 2002, 12(1): 33-48. DOI:10.1207/S15327108IJAP1201_4
[33] Wilson G F. An analysis of mental workload in pilots during flight using multiple psychophysiological measures. The International Journal of Aviation Psychology, 2002, 12(1): 3-18. DOI:10.1207/S15327108IJAP1201_2
[34] Juris M, Velden M. The pupillary response to mental overload. Physiological Psychology, 1977, 5(4): 421-424. DOI:10.3758/BF03337847
[35] Wei Z, Zhuang D, Wanyan X, et al. A model for discrimination and prediction of mental workload of aircraft cockpit display interface. Chinese Journal of Aeronautics, 2014, 27(5): 1070-1077. DOI:10.1016/j.cja.2014.09.002
[36] Hogervorst M A, Brouwer A M, van Erp J B F. Combining and comparing EEG, peripheral physiology and eye-related measures for the assessment of mental workload. Frontiers in Neuroscience, 2014, 8: 322. DOI:10.2478/s11756-009-0155-y