Scan patterns predict sentence production (Coco & Keller, 2012, CogSci)

Objectives:

  1. demonstrate cross-modal similarity between sentences and scan patterns
  2. show how similarity is computed
  3. plot the main result of the study and present the statistical analysis

Abstract

Most everyday tasks involve multiple modalities, which raises the question of how the processing of these modalities is coordinated by the cognitive system. In this paper, we focus on the coordination of visual attention and linguistic processing during speaking. Previous research has shown that objects in a visual scene are fixated before they are mentioned, leading us to hypothesize that the scan pattern of a participant can be used to predict what he or she will say. We test this hypothesis using a data set of cued scene descriptions of photo-realistic scenes. We demonstrate that similar scan patterns are correlated with similar sentences, within and between visual scenes; and that this correlation holds for three phases of the language production process (target identification, sentence planning, and speaking). We also present a simple algorithm that uses scan patterns to accurately predict associated sentences by utilizing similarity-based retrieval.

Load libraries and data

setwd("E:/R/replicability/cogsci2012/")
## set the path to the folder with data and functions

library(lme4)
## Loading required package: Matrix
## Loading required package: Rcpp
source('LCS.R') # the Longest Common Subsequence function
source("MyCenter.R") # center your variables to reduce colinearity
source("binmeasure.R")# a binning function for plotting

load('data.Rdata') # load the Rdata object
# list the objects loaded from the Rdata file, now available in the workspace
objects()
## [1] "binmeasure" "dataset"    "LCS"        "myCenter"   "sentences" 
## [6] "SP"
# dataset   -> contains all pairwise similarity scores for the SPs and sentences,
#              calculated using the Longest Common Subsequence (LCS) method.
# sentences -> an example of 6 sentences from our dataset
# SP        -> an example of the 6 scan patterns associated with those sentences

head(dataset)
##   Sb1 Tr1 Sb2 Tr2     LCS.V     LCS.L Region  Cue Clutter
## 1   1   1   5   1 1.0000000 0.4082483   Plan  Ani     Min
## 2   1   1   9   1 1.0000000 0.7071068   Plan  Ani     Min
## 3   1   1  13   1 0.7071068 0.3535534   Plan  Ani     Min
## 4   1   1  17   1 1.0000000 0.5000000   Plan  Ani     Min
## 5   1   1  21   1 0.7071068 0.2672612   Plan  Ani     Min
## 6   1   1   2   1 0.2581989 0.7071068   Plan Diff     Min
as.character(sentences$sentence)
## [1] "The woman is weighing herself"                                                                                                        
## [2] "The woman is weighing herself"                                                                                                        
## [3] "A woman weights herself upon a scale while a second scale lies to her right"                                                          
## [4] "The teddy is in the girl 's harms"                                                                                                    
## [5] "The girl resting upon the bed held her teddy tightly while another teddy sat on the ground"                                           
## [6] "There is a girl sitting on her bed hugging her teddy bear and another little girl next to her she looks like she wants the teddy bear"
as.character(SP$obj)
## [1] "woman-R,scale-R,woman-L,woman-R,woman-R,bath,bath"                                          
## [2] "woman-R,woman-R,woman-L,woman-L,mat,woman-L,woman-R,drawers,drawers,woman-R,woman-L,woman-L"
## [3] "woman-L,woman-L"                                                                            
## [4] "bed,bed,toys,bed,bed"                                                                       
## [5] "bed,teddy-L,teddy-R,girl-R,girl-R,pillow,girl-L,bed,girl-R,bed,girl-R"                      
## [6] "girl-R,girl-R,girl-L,bed,bed,girl-R"

Compute pair-wise similarities

We run two loops over the example sentences and scan patterns, applying the LCS algorithm and extracting the similarity scores. The indices of the loops are set such that we never compute the similarity of a sentence (or SP) with itself.
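
Before running the loops, it is worth seeing what the sourced LCS function computes. LCS.R itself is not shown here; below is a minimal sketch under the assumption that the similarity score is the length of the longest common subsequence divided by the geometric mean of the two sequence lengths (consistent with values such as 1/sqrt(2) ~ 0.707 in the output above).

lcs_similarity = function(a, b) { # hypothetical stand-in for the sourced LCS
  n = length(a); m = length(b)
  d = matrix(0, nrow = n + 1, ncol = m + 1) # d[i+1, j+1] = LCS length of a[1:i] and b[1:j]
  for (i in 1:n) {
    for (j in 1:m) {
      d[i + 1, j + 1] = if (a[i] == b[j]) d[i, j] + 1
                        else max(d[i, j + 1], d[i + 1, j])
    }
  }
  d[n + 1, m + 1] / sqrt(n * m) # normalized similarity in [0, 1]
}

lcs_similarity(c("woman-R", "scale-R"), c("woman-R", "woman-L")) # 0.5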

lcs = vector() # accumulator for the pairwise similarity rows

for (s1 in 1:(nrow(SP)-1)){
  for (s2 in (s1+1):nrow(SP)){

    print(c(s1, s2)) # track progress over the index pairs

    pair1 = SP[s1,1:2] # subject and trial IDs of the first element
    pair2 = SP[s2,1:2] # subject and trial IDs of the second element

    # split the scan patterns into sequences of fixated objects
    sp1 = unlist(strsplit(as.character(SP$obj[s1]), ","))
    sp2 = unlist(strsplit(as.character(SP$obj[s2]), ","))

    # split the sentences into sequences of words
    se1 = unlist(strsplit(as.character(sentences$sentence[s1]), " "))
    se2 = unlist(strsplit(as.character(sentences$sentence[s2]), " "))

    ## compute the LCS similarity in each modality
    lcs.v = LCS(sp1, sp2)$similarity # visual (scan-pattern) similarity
    lcs.l = LCS(se1, se2)$similarity # linguistic (sentence) similarity

    lcs = rbind(lcs, c(pair1, pair2, lcs.v, lcs.l), deparse.level = 0)

  }
}
## [1] 1 2
## [1] 1 3
## [1] 1 4
## [1] 1 5
## [1] 1 6
## [1] 2 3
## [1] 2 4
## [1] 2 5
## [1] 2 6
## [1] 3 4
## [1] 3 5
## [1] 3 6
## [1] 4 5
## [1] 4 6
## [1] 5 6
lcs = as.data.frame(matrix(unlist(lcs), ncol = 6)) ## flatten the matrix of lists into a plain data frame

str(lcs)
## 'data.frame':    15 obs. of  6 variables:
##  $ V1: num  1 1 1 1 1 11 11 11 11 16 ...
##  $ V2: num  1 1 1 1 1 1 1 1 1 1 ...
##  $ V3: num  11 16 1 3 12 16 1 3 12 1 ...
##  $ V4: num  1 1 8 8 8 1 8 8 8 8 ...
##  $ V5: num  0.436 0.267 0 0 0 ...
##  $ V6: num  1 0.2309 0.3162 0.1085 0.0861 ...
colnames(lcs) = c("Sb1", "Tr1", "Sb2", "Tr2", "LCS.V", "LCS.L")
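
For reference, the same table can be built more compactly with combn, using the same sourced LCS function (this assumes, as above, that the first two columns of SP hold the subject and trial IDs):

pairs = t(combn(nrow(SP), 2)) # all 15 unordered index pairs
lcs2 = data.frame(
  Sb1 = SP[pairs[,1], 1], Tr1 = SP[pairs[,1], 2],
  Sb2 = SP[pairs[,2], 1], Tr2 = SP[pairs[,2], 2],
  LCS.V = apply(pairs, 1, function(p) # scan-pattern similarity
    LCS(unlist(strsplit(as.character(SP$obj[p[1]]), ",")),
        unlist(strsplit(as.character(SP$obj[p[2]]), ",")))$similarity),
  LCS.L = apply(pairs, 1, function(p) # sentence similarity
    LCS(unlist(strsplit(as.character(sentences$sentence[p[1]]), " ")),
        unlist(strsplit(as.character(sentences$sentence[p[2]]), " ")))$similarity))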

With as few as 6 trials, we already obtain a correlation of 0.45 between visual and linguistic similarity (p < .1).

with(lcs, cor.test(LCS.V, LCS.L) )
## 
##  Pearson's product-moment correlation
## 
## data:  LCS.V and LCS.L
## t = 1.8144, df = 13, p-value = 0.09276
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.08152552  0.78175926
## sample estimates:
##     cor 
## 0.44951

Visualize the trend (Figure 3 in the manuscript)

First, we specify the arguments for the binning function: number is set to 2, indicating that there are two variables; unit is set to 0.1, so we bin in intervals of .1; and interval gives the range over which we bin (i.e., between 0 and 1).

number = 2; unit = 0.1; interval = c(0,1)

We extract subsets of the data divided by the SP region of interest, Dur (Production), Enc (Encoding), and Plan (Planning), and calculate means and standard errors.

Dur = subset(dataset, dataset$Region == "Dur"); 
Enc = subset(dataset, dataset$Region == "Enc"); 
Plan = subset(dataset, dataset$Region == "Plan")

Dur.LCS = binmeasure(interval, unit, Dur$LCS.L, Dur$LCS.V, number)[[3]] 
Enc.LCS = binmeasure(interval, unit, Enc$LCS.L, Enc$LCS.V, number)[[3]] 
Plan.LCS = binmeasure(interval, unit, Plan$LCS.L, Plan$LCS.V, number)[[3]] 
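
The sourced binmeasure is not shown here; as a rough stand-in, and assuming the third element of its return value holds the per-bin means of the second variable, the binning step amounts to:

bin_means = function(interval, unit, x, y) { # hypothetical stand-in for binmeasure(...)[[3]]
  breaks = seq(interval[1], interval[2], by = unit) # bin edges: 0, .1, ..., 1
  bins = cut(x, breaks = breaks, include.lowest = TRUE) # assign each x to a bin
  tapply(y, bins, mean, na.rm = TRUE) # mean of y within each bin of x
}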

Then, we obtain the range of the binned measures for scaling the y-axis of the plot, and draw our measures.

yrg = range(c(Dur.LCS, Enc.LCS, Plan.LCS))

plot(seq(.1,1,.1), Plan.LCS, col = "yellow3", 
     lty = 1, type = "l", lwd = 3, ylim = yrg, 
     ylab = "LCS.V", xlab = "LCS.L")
lines(seq(.1,1,.1), Enc.LCS, col = "green3", lty = 2, lwd = 3)
lines(seq(.1,1,.1), Dur.LCS, col = "red3", lty = 3, lwd = 3)

legend("topleft", c("Planning", "Encoding", "During"), 
       col = c("yellow3", "green3", "red3"), 
       lty = c(1,2,3), bty = "n", cex = 1.5, lwd = 2.2)

Compute the LME model reported in the paper (Table 2)

We use a backward model-selection procedure to obtain our final model, but since the running time is quite prohibitive, we report here only the results of the selected model. First, we create contrast-coded variables for Region (During, Encoding, Planning), Cue (Animate, Inanimate, Mixed), and Clutter (Minimal, Cluttered, Mixed), all three-level categorical variables.

We manually create a contrast matrix and set the reference levels to Region (Encoding), Cue (Inanimate), and Clutter (Minimal). Note that Clutter was not selected as significant during our model-selection procedure; we contrast it as well, for completeness. We also center the resulting contrast-coded variables.

The results might differ very slightly due to updates to the lme4 package.

contrast = rbind(c(-0.5,-0.5),c(0.5,0),c(0,0.5));

## coding Region

Region = matrix(0, ncol=2, nrow = nrow(dataset));
enc = which(dataset$Region == "Enc"); 
plan = which(dataset$Region == "Plan");
dur = which(dataset$Region == "Dur");

Region[enc,1] = contrast[1,1]; Region[enc,2] = contrast[1,2];
Region[plan,1] = contrast[2,1]; Region[plan,2] = contrast[2,2];
Region[dur,1] = contrast[3,1]; Region[dur,2] = contrast[3,2];
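
For illustration (not part of the original script), the Region coding above amounts to the following named contrast matrix; Cue and Clutter are coded analogously against their own reference levels:

contrast.named = matrix(c(-0.5, -0.5,  # Enc: the reference level
                           0.5,  0.0,  # Plan
                           0.0,  0.5), # Dur (Production)
                        ncol = 2, byrow = TRUE,
                        dimnames = list(c("Enc", "Plan", "Dur"),
                                        c("PlanvsEnc", "ProdvsEnc")))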

## coding Cue

Cue = matrix(0, ncol = 2, nrow = nrow(dataset));

ani = which(dataset$Cue  == "Ani"); 
ina = which(dataset$Cue == "Ina");
diff = which(dataset$Cue == "Diff");

Cue[ina,1] = contrast[1,1]; Cue[ina,2] = contrast[1,2];
Cue[ani,1] = contrast[2,1]; Cue[ani,2] = contrast[2,2];
Cue[diff,1] = contrast[3,1]; Cue[diff,2] = contrast[3,2];

## coding Clutter

Clutter = matrix(0, ncol = 2, nrow = nrow(dataset));
min = which(dataset$Clutter == "Min"); 
clu = which(dataset$Clutter == "Clu");
diff = which(dataset$Clutter == "Diff");

Clutter[diff,1] = contrast[3,1]; Clutter[diff,2] = contrast[3,2];
Clutter[min,1] = contrast[1,1]; Clutter[min,2] = contrast[1,2];
Clutter[clu,1] = contrast[2,1]; Clutter[clu,2] = contrast[2,2];

## center the variables
Region = myCenter(Region)
Cue = myCenter(Cue)
Clutter = myCenter(Clutter)
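
MyCenter.R is sourced above but not shown; assuming it simply subtracts the column means (the standard way to reduce collinearity between main effects and interactions), a minimal stand-in would be:

myCenter_sketch = function(x) { # hypothetical stand-in for the sourced myCenter
  if (is.matrix(x) || is.data.frame(x)) scale(x, center = TRUE, scale = FALSE)
  else x - mean(x, na.rm = TRUE) # vectors: subtract the mean
}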

Then, we take our dependent variable (LCS.V) and model it using LME as a simple random-intercepts-only model, with Participant and Item as random effects (adding slopes does not change the result). Note that there are two random variables each for participants and items because similarity scores are computed pairwise.

depM = dataset$LCS.V

Predictor = cbind(myCenter(dataset$LCS.L), Region, Cue, Clutter)
Random = dataset[,1:4] # the pairwise subject and trial IDs (Sb1, Tr1, Sb2, Tr2)
colnames(Predictor) = c("LCS.L","PlanvsEnc","ProdvsEnc",
                        "AnivsIna","MixvsIna", "CluvsMin","MixvsMin")

data = data.frame(depM, Predictor, Random)

## finally, build the model reported in the paper;
## it will take some time, as the dataset is large (~383,000 observations)

model = lmer(depM ~ LCS.L + PlanvsEnc + ProdvsEnc 
             + AnivsIna + MixvsIna +
               LCS.L:PlanvsEnc + 
               LCS.L:ProdvsEnc +
               LCS.L:AnivsIna +
               AnivsIna:PlanvsEnc +
               AnivsIna:ProdvsEnc +
               MixvsIna:ProdvsEnc 
             + (1 | Sb1) + (1 | Sb2)
             + (1 | Tr1) + (1 | Tr2),
             data = data)

summary(model)
## Linear mixed model fit by REML ['lmerMod']
## Formula: depM ~ LCS.L + PlanvsEnc + ProdvsEnc + AnivsIna + MixvsIna +  
##     LCS.L:PlanvsEnc + LCS.L:ProdvsEnc + LCS.L:AnivsIna + AnivsIna:PlanvsEnc +  
##     AnivsIna:ProdvsEnc + MixvsIna:ProdvsEnc + (1 | Sb1) + (1 |  
##     Sb2) + (1 | Tr1) + (1 | Tr2)
##    Data: data
## 
## REML criterion at convergence: -313787.2
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -5.6354 -0.6348 -0.2305  0.4795  6.2084 
## 
## Random effects:
##  Groups   Name        Variance  Std.Dev.
##  Sb1      (Intercept) 6.831e-05 0.008265
##  Sb2      (Intercept) 1.104e-04 0.010507
##  Tr1      (Intercept) 2.351e-03 0.048485
##  Tr2      (Intercept) 2.514e-03 0.050136
##  Residual             2.576e-02 0.160505
## Number of obs: 382973, groups:  Sb1, 24; Sb2, 24; Tr1, 24; Tr2, 24
## 
## Fixed effects:
##                      Estimate Std. Error t value
## (Intercept)         0.1706286  0.0145081   11.76
## LCS.L               0.5792896  0.0019754  293.25
## PlanvsEnc           0.0384516  0.0007563   50.84
## ProdvsEnc          -0.0188180  0.0007362  -25.56
## AnivsIna           -0.0016053  0.0007973   -2.01
## MixvsIna            0.0062046  0.0007174    8.65
## LCS.L:PlanvsEnc     0.3243245  0.0053400   60.73
## LCS.L:ProdvsEnc    -0.3585995  0.0051271  -69.94
## LCS.L:AnivsIna      0.0977684  0.0052455   18.64
## PlanvsEnc:AnivsIna  0.0070196  0.0022218    3.16
## ProdvsEnc:AnivsIna -0.0170437  0.0021685   -7.86
## ProdvsEnc:MixvsIna  0.0151661  0.0017606    8.61
## 
## Correlation of Fixed Effects:
##             (Intr) LCS.L  PlnvsE PrdvsE AnvsIn MxvsIn LCS.L:PlE LCS.L:PrE
## LCS.L       -0.007                                                       
## PlanvsEnc    0.002  0.013                                                
## ProdvsEnc   -0.001 -0.003 -0.397                                         
## AnivsIna     0.003 -0.086 -0.097  0.131                                  
## MixvsIna    -0.003  0.043  0.014 -0.013 -0.261                           
## LCS.L:PlnvE  0.000  0.089 -0.017  0.006  0.044 -0.031                    
## LCS.L:PrdvE  0.000 -0.029 -0.001 -0.037 -0.095  0.057 -0.423             
## LCS.L:AnvsI -0.001  0.009  0.043 -0.097 -0.041  0.073 -0.044     0.014   
## PlnvsEnc:AI -0.002  0.046  0.017 -0.047  0.089  0.099 -0.045     0.064   
## PrdvsEnc:AI  0.003 -0.100 -0.043  0.192 -0.047 -0.111  0.068    -0.138   
## PrdvsEnc:MI -0.001  0.050  0.011 -0.159 -0.077 -0.086 -0.031     0.059   
##             LCS.L:A PlE:AI PrE:AI
## LCS.L                            
## PlanvsEnc                        
## ProdvsEnc                        
## AnivsIna                         
## MixvsIna                         
## LCS.L:PlnvE                      
## LCS.L:PrdvE                      
## LCS.L:AnvsI                      
## PlnvsEnc:AI  0.029               
## PrdvsEnc:AI -0.034  -0.399       
## PrdvsEnc:MI  0.059   0.045 -0.303