Task classification using eye-movement features (Coco & Keller, 2014, Journal of Vision)

Objectives:

  1. Demonstrate that tasks differ strongly in their underlying eye-movement statistics.
  2. Use the features of Greene, Liu, and Wolfe (2012) for the classification analysis.
  3. Show accurate classification with just two features (initiation time and mean saccade amplitude).

Abstract

The role of the task has received special attention in visual-cognition research because it can provide causal explanations of goal-directed eye-movement responses. The dependency between visual attention and task suggests that eye movements can be used to classify the task being performed. A recent study by Greene, Liu, and Wolfe (2012), however, fails to achieve accurate classification of visual tasks based on eye-movement features. In the present study, we hypothesize that tasks can be successfully classified when they differ with respect to the involvement of other cognitive domains, such as language processing. We extract the eye-movement features used by Greene et al. as well as additional features from the data of three different tasks: visual search, object naming, and scene description. First, we demonstrated that eye-movement responses make it possible to characterize the goals of these tasks. Then, we trained three different types of classifiers and predicted the task participants performed with an accuracy well above chance (a maximum of 88% for visual search). An analysis of the relative importance of features for classification accuracy reveals that just one feature, i.e., initiation time, is sufficient for above-chance performance (a maximum of 79% accuracy in object naming). Crucially, this feature is independent of task duration, which differs systematically across the three tasks we investigated. Overall, the best task classification performance was obtained with a set of seven features that included both spatial information (e.g., entropy of attention allocation) and temporal components (e.g., total fixation on objects) of the eye-movement record. This result confirms the task-dependent allocation of visual attention and extends previous work by showing that task classification is possible when tasks differ in the cognitive processes involved (purely visual tasks such as search vs. communicative tasks such as scene description).

Load libraries and data, source the functions needed

library(glmnet)  ## penalized multinomial regression
library(nnet)    ## multinomial log-linear model (multinom)
library(kernlab) ## support vector machine (ksvm)
library(e1071)   ## svm() with a built-in decision-boundary plot

setwd("E:/R/replicability/jov2014/")

source("MyCenter.R")
source("F-score.R")

data = databack = read.table("EMstatistics.txt", header = TRUE)
## keep a second copy of the dataset; databack is needed later for plotting

## show the names of the different features extracted
colnames(data)[4:ncol(data)]
##  [1] "Initiation.Time"            "Mean.Gaze"                 
##  [3] "Nr..Fixation"               "Mean.Saccade"              
##  [5] "Entropy"                    "Latency.Body"              
##  [7] "First.Fixation.Body"        "Total.Fixation.Body"       
##  [9] "Mean.Gaze.Body"             "Proportion.Fixation.Body"  
## [11] "Latency.Object"             "First.Fixation.Object"     
## [13] "Total.Fixation.Object"      "Mean.Gaze.Object"          
## [15] "Proportion.Fixation.Object" "Latency.Face"              
## [17] "First.Fixation.Face"        "Total.Fixation.Face"       
## [19] "Mean.Gaze.Face"             "Proportion.Fixation.Face"  
## [21] "Task"                       "Saliency"                  
## [23] "Turbulence"                 "Complexity"                
## [25] "Clutter"                    "Feature.Congestion"        
## [27] "Fixation.Area"

Features for classification

The full data set contains a total of 1,756 unique trials, divided across the three tasks as follows: search (580), description (600), naming (576). From the eye-movement data of each trial, we extract the seven features used by Greene et al. (2012): (a) number of fixations, (b) mean fixation duration, (c) mean saccade amplitude, (d) percent of image covered by fixations assuming a 1° circle around the fixation position, and proportion of dwell time on (e) faces, (f) bodies, and (g) objects. In addition to the Greene et al. (2012) feature set (GF), we extracted another set of 15 features (OF): (a) latency of first fixation, (b) first fixation duration, (c) mean fixation duration, and (d) total gaze duration, each computed on faces, bodies, and objects (four features for three regions, i.e., 12 features in total); (e) the initiation time, i.e., the time spent after scene onset before the first saccade is launched; (f) mean saliency at the fixation location; and (g) the entropy of the attentional landscape.
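
As an illustration of the less standard features, the entropy of the attentional landscape can be computed from a smoothed fixation map. The sketch below assumes a matrix amap of non-negative attention weights; it is not the original feature-extraction code.

## sketch: Shannon entropy of an attentional landscape; 'amap' is assumed
## to be a 2-D matrix of non-negative attention weights (e.g., a fixation
## map smoothed with a Gaussian); not the original extraction code
attention.entropy = function(amap) {
  p = amap / sum(amap)   ## normalize to a probability distribution
  p = p[p > 0]           ## drop zero cells; 0 * log(0) is treated as 0
  -sum(p * log2(p))      ## entropy in bits
}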

## select only the features used by Greene et al. (2012):
## 5 = Mean.Gaze, 6 = Nr..Fixation, 7 = Mean.Saccade,
## 13/18/23 = Proportion.Fixation on Body/Object/Face
## (the image-coverage feature, presumably Fixation.Area, is not used here)
dvs = c(5, 6, 7, 13, 18, 23)

depM = data[, 24] ## column 24 holds the task label (Task)
mat = data[, dvs]
mat = myCenter(mat) ## center each feature column

## give the column names back
if ( is.matrix(mat) ) colnames(mat) = colnames(data)[dvs]

data = data.frame(mat, depM)

predictors = names(data) ## define the predictors
predictors = predictors[-length(predictors)] ## drop depM, the last column
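
The sourced MyCenter.R is not shown here; below is a minimal sketch of a column-centering helper consistent with its use above (an assumption about the original, which might also scale the columns).

## sketch of what myCenter() presumably does: subtract the column mean
## from every feature column (the sourced MyCenter.R may differ)
myCenter.sketch = function(x) scale(x, center = TRUE, scale = FALSE)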

The classifiers

We used the features described above to train three different types of classifiers: a multinomial log-linear model (multinom from the nnet package; Venables & Ripley, 2002), a generalized linear model with penalized maximum likelihood (glmnet), and a support vector machine (ksvm from the kernlab package), using the default settings for all three models.

Prepare the set for training and testing (i.e., k-folds)

The classifiers were trained and tested using ten-fold cross-validation, in which the model is trained on 90% of the data and then tested on the remaining 10%; this process is repeated 10 times so that each fold serves as test data exactly once. This avoids overfitting the training data while still making maximal use of a small data set.

percent = 10 ## size of each test fold, as a percentage of the data
total = test.ind = seq(1, nrow(data), 1) ## test.ind shrinks as folds are drawn
nr.t = round( length(total)*percent/100 )  ## nr. of testing instances

fold = 10 ## number of folds
train.folds = test.folds = list()
keep = vector()

for (k in 1:fold){   ## create the 10 folds for cross-validation

    if (length(test.ind) >= nr.t) {
        tt = sample(test.ind, nr.t) ## the data may not divide evenly
                                    ## into 10 equal folds
        keep = c(keep, tt)
    ##  print ( c( length(tt), length(unique(tt) ) ) )
    } else {                        ## last fold: take all remaining indices
        tt = test.ind
    }
    
    ts = setdiff(total, tt)
  
    train.folds[[k]] = ts
    test.folds[[k]] = tt
    
    rm = vector()

    ## drop this fold's test indices from the remaining pool
    for (r in tt){ rm = c(rm, which(test.ind == r)) }

    test.ind = test.ind[-rm]
    
    # print( length(test.ind) )
  
}
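
For comparison, the same kind of folds can be built more compactly by shuffling the trial indices once and splitting them into ten roughly equal groups. This is a sketch, not the code used above; shuffled, fold.id, and the *.alt lists are illustrative names.

## sketch: shuffle the indices once, then split them into 'fold' roughly
## equal groups; fold sizes may differ by one when n is not divisible by 10
shuffled = sample(nrow(data))
fold.id = cut(seq_along(shuffled), breaks = fold, labels = FALSE)
test.folds.alt = split(shuffled, fold.id)
train.folds.alt = lapply(test.folds.alt,
                         function(tt) setdiff(seq_len(nrow(data)), tt))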


## initialize the misclassification matrices, where instances
## of inaccurate classifications are stored

classes = unique(data$depM)
FglmMis = FmmMis = FsvmMis = matrix(0, nrow = length(classes),
    ncol = length( classes ) )
colnames(FglmMis) = rownames(FglmMis) = classes
colnames(FmmMis)  = rownames(FmmMis)  = classes
colnames(FsvmMis) = rownames(FsvmMis) = classes

Run the classifiers and store the F-scores

We measured the accuracy of the classifiers using the F-score, the harmonic mean of precision and recall, defined as F = 2PR/(P + R). Precision (P) is the number of correctly classified instances over the total number of instances labeled as belonging to the class, defined as tp/(tp + fp). Here, tp is the number of true positives (i.e., instances of the class that were correctly identified as such), and fp is the number of false positives (i.e., instances that were labeled as members of the class even though they belong to a different class). Recall (R) is the number of correctly classified instances over the total number of instances in that class, defined as tp/(tp + fn), where fn is the number of false negatives, i.e., the number of instances that were labeled as nonmembers of the class even though they belong to it.
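
The Fscore() function used below comes from the sourced F-score.R, which is not shown here. The following is a minimal sketch of the per-class computation it presumably performs; the function name fscore.sketch and the exact output format are assumptions.

## sketch of a per-class precision/recall/F-score computation; the sourced
## Fscore() may differ in its exact output format
fscore.sketch = function(predicted, actual) {
  cls = sort(union(predicted, actual))
  conf = table(factor(actual, levels = cls),    ## rows: true class
               factor(predicted, levels = cls)) ## columns: predicted class
  tp = diag(conf)
  precision = tp / colSums(conf) ## tp / (tp + fp)
  recall    = tp / rowSums(conf) ## tp / (tp + fn)
  f = 2 * precision * recall / (precision + recall)
  list(scores = data.frame(class = cls, precision, recall, fscore = f),
       confusion = conf)
}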

ans = vector() ## store the F-scores of every fold and classifier

for (k in 1:fold){ ## for all the folds
  print(k)
  
  build = vector()
  
  train.ml = train = data[train.folds[[k]],]
  test = data[test.folds[[k]], ]
  
  ind.var = which(names(test) == "depM") ## get the col index of depM
  test.m = as.character(test[, ind.var])
  predictors = names( train.ml[1:(ncol(train.ml)-1)] )
  
  formula = as.formula ( paste("depM", "~", paste(predictors, collapse = "+"), sep = "") )
  
  ## Parameters are kept at their default values;
  ## there is room to optimize them
  
  ## run multinomial model (MM)
  
  ## set scene description ("Prod") as the reference level
  train.ml$depM = relevel(train.ml$depM, ref = "Prod")
  
  mmodel = multinom(formula, train.ml, maxit = 150, Hess = FALSE)
  mmtest = predict(mmodel, newdata = test, "probs") 
  
  ## run support vector machine (SVM)
  
  SVMmodel =  ksvm(formula, data = train, kernel = "rbfdot")
  SVMpredict = predict(SVMmodel, test)
  
  ## run penalized multinomial regression (labeled LASSO in the results)
  ## glmnet does not cope with NAs, so first find them in both the training
  ## and the test set; if any are found, remove the rows containing them
  
  y = train$depM
  glmdata = as.matrix( train[,1:(ncol(train)-1)] ) ## glmnet needs a matrix
  glmtest = as.matrix( test[, 1:(ncol(test)-1)] )
  
  ind = which(is.na(glmdata), arr.ind = TRUE)  ## rows with NAs in training
  if(length(ind) > 0){ glmdata = glmdata[-ind[,1],]; y = y[-ind[,1]]}
  ind = which(is.na(glmtest), arr.ind = TRUE)  ## rows with NAs in testing
  if(length(ind) > 0){ glmtest = glmtest[-ind[,1],]}
    
  ## note: alpha = 0 applies glmnet's ridge penalty; alpha = 1 would give
  ## the lasso proper
  glmnetmodel = glmnet(glmdata, y, family = "multinomial", alpha = 0,
                       standardize = F)  
  predglmnet = predict(glmnetmodel, glmtest, s = 1, type = "class" )
  
  ## calculate F-score and percentage accuracy
  ## convert the multinomial probabilities into the most likely class label
  classes = vector()
  
  for (r in 1:nrow(mmtest)){
    row.prob = which(mmtest[r,] == max(mmtest[r, ]) ) 
    classes = c(classes, names(row.prob))}
  
  resd = Fscore(classes, test.m)
  fscr = resd[[1]][, c(1,4)] ## keep the class label and F-score columns
  
  Fmm = cbind(fscr, rep("MM", nrow(fscr)))
  FmmMis = (FmmMis + resd[[3]])/2 ## running average of confusion matrices
  
  resd = Fscore(SVMpredict, test.m)
  fscr = resd[[1]][, c(1,4)]
  
  Fsvm = cbind(fscr, rep("SVM", nrow(fscr)))
  FsvmMis = (FsvmMis + resd[[3]])/2
  
  
  resd = Fscore(predglmnet, test.m)
  fscr = resd[[1]][, c(1,4)]
  
  Fglmnet = cbind(fscr, rep("LASSO", nrow(fscr)))
  FglmMis = (FglmMis + resd[[3]])/2
  
  ans = rbind(ans, 
              rbind(Fmm, Fglmnet, Fsvm, deparse.level = 0),
              deparse.level = 0)
  
}
## [1] 1
## # weights:  24 (14 variable)
## initial  value 1733.610192 
## iter  10 value 1162.991450
## iter  20 value 953.023085
## final  value 953.022364 
## converged
## [1] 2
## # weights:  24 (14 variable)
## initial  value 1733.610192 
## iter  10 value 1137.091917
## iter  20 value 948.257636
## final  value 948.257418 
## converged
## [1] 3
## # weights:  24 (14 variable)
## initial  value 1733.610192 
## iter  10 value 1139.579863
## iter  20 value 949.112855
## final  value 949.112471 
## converged
## [1] 4
## # weights:  24 (14 variable)
## initial  value 1734.708804 
## iter  10 value 1139.508123
## iter  20 value 954.992558
## final  value 954.992484 
## converged
## [1] 5
## # weights:  24 (14 variable)
## initial  value 1733.610192 
## iter  10 value 1142.935158
## iter  20 value 965.148661
## final  value 965.148618 
## converged
## [1] 6
## # weights:  24 (14 variable)
## initial  value 1734.708804 
## iter  10 value 1133.487055
## iter  20 value 934.207520
## iter  30 value 934.202819
## final  value 934.200371 
## converged
## [1] 7
## # weights:  24 (14 variable)
## initial  value 1733.610192 
## iter  10 value 1135.346320
## iter  20 value 938.919336
## final  value 938.919193 
## converged
## [1] 8
## # weights:  24 (14 variable)
## initial  value 1733.610192 
## iter  10 value 1144.234681
## iter  20 value 952.920489
## iter  30 value 952.916769
## final  value 952.915288 
## converged
## [1] 9
## # weights:  24 (14 variable)
## initial  value 1733.610192 
## iter  10 value 1132.412092
## iter  20 value 941.886780
## iter  30 value 941.884266
## final  value 941.882995 
## converged
## [1] 10
## # weights:  24 (14 variable)
## initial  value 1738.004641 
## iter  10 value 1135.677720
## iter  20 value 943.788200
## iter  30 value 943.785425
## final  value 943.784765 
## converged
colnames(ans) = c("task", "fscore", "classifier")

Show the mean accuracy of the different classifiers, overall and broken down by task

ans = as.data.frame(ans)
ans$fscore = as.numeric(as.matrix(ans$fscore))

str(ans)
## 'data.frame':    90 obs. of  3 variables:
##  $ task      : Factor w/ 3 levels "Naming","Prod",..: 2 1 3 2 1 3 2 1 3 1 ...
##  $ fscore    : num  0.63 0.739 0.872 0.636 0.739 ...
##  $ classifier: Factor w/ 3 levels "LASSO","MM","SVM": 2 2 2 1 1 1 3 3 3 2 ...
## overall
with(ans, tapply(fscore, list(classifier), mean))
##     LASSO        MM       SVM 
## 0.6689440 0.6842990 0.6917908
## divided by task
with(ans, tapply(fscore, list(classifier, task), mean))
##          Naming      Prod    Search
## LASSO 0.6869928 0.5745407 0.7452985
## MM    0.6989330 0.5980980 0.7558660
## SVM   0.7082063 0.6025014 0.7646648

All three classifiers achieve mean F-scores well above the chance level of one third, demonstrating that the task being performed can be accurately classified from eye-movement features; visual search is classified best.

Show that classification is possible with just two features, and plot the result

We test whether our result is robust to self-termination (i.e., whether it holds without features related to the amount of viewing time, which differs systematically across tasks) by training an SVM classifier on just two features: initiation time and mean saccade amplitude. The first feature reflects the cuing aspect of the task; the second, the spatial span covered as the scene is explored.

## restrict the range of the two features to improve plotting
colnames(databack)
##  [1] "subj"                       "trial"                     
##  [3] "cond"                       "Initiation.Time"           
##  [5] "Mean.Gaze"                  "Nr..Fixation"              
##  [7] "Mean.Saccade"               "Entropy"                   
##  [9] "Latency.Body"               "First.Fixation.Body"       
## [11] "Total.Fixation.Body"        "Mean.Gaze.Body"            
## [13] "Proportion.Fixation.Body"   "Latency.Object"            
## [15] "First.Fixation.Object"      "Total.Fixation.Object"     
## [17] "Mean.Gaze.Object"           "Proportion.Fixation.Object"
## [19] "Latency.Face"               "First.Fixation.Face"       
## [21] "Total.Fixation.Face"        "Mean.Gaze.Face"            
## [23] "Proportion.Fixation.Face"   "Task"                      
## [25] "Saliency"                   "Turbulence"                
## [27] "Complexity"                 "Clutter"                   
## [29] "Feature.Congestion"         "Fixation.Area"
databack = subset(databack, databack$Initiation.Time <= 600 & databack$Mean.Saccade <= 500)

Initiation = databack$Initiation.Time
Saccade = databack$Mean.Saccade

Task = as.character(databack$Task)
Task[which(Task == "Prod")] = "Description" 
## rename "Prod" to the clearer label "Description"
Task = as.factor(Task)
    
DF = data.frame(Initiation, Saccade, Task)
model = svm(Task ~ Saccade + Initiation, data = DF)

alphacol = c("#00CD0050","#FF000050","#0000FF50") ## semi-transparent colors (trailing "50" is the hex alpha)

plot(model, DF, svSymbol = 8, dataSymbol = 16, grid = 200,
     symbolPalette = c("green4", "red", "blue"),
     pch = c(16,15,3), col = alphacol)
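
To complement the decision-boundary plot, we can check the in-sample fit of the two-feature model directly. Note that this is not cross-validated, so the estimate is optimistic relative to the ten-fold results above.

## in-sample accuracy and confusion matrix of the two-feature SVM
## (fit and evaluated on the same data, hence an optimistic estimate)
pred = predict(model, DF)
mean(pred == DF$Task)
table(predicted = pred, actual = DF$Task)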