Tutorial: Quantifying Interpersonal Dynamics using CRQA

Objectives:

a basic understanding of (C)RQA and its applicability to research in cognitive science, and apply it to interpersonal dynamics.
a primer, hands-on, of (C)RQA to behavioral data using the crqa() package in R.

Background

Interpersonal dynamics has become a growing topic in cognitive science because of its implications for our ‘social cognitive system.’ Advances in this research topic have been possible through the application of statistical methods providing a quantification for the dynamic structure of cognitive responses observed when individuals interact. Recurrence Quantification Analysis (RQA) is one such framework, as it makes possible to quantify how, and the extent to which, a signal is revisiting a similar state in time. In the context of the dyad, it can be used to reveal the phases during which interlocutor responses synchronize, and the strength of such synchronization.

Focus of this source code

R basics: Brief primer on Markdown, recap of basic R functions, and pre-processing of raw data
Categorical data: Auto and Cross Recurrence
Continuous data: Auto and Cross Recurrence

R basics

Very brief primer on Markdown and Rknit

Paragraphing: continuous = sign makes the title, single # makes a section, a double ## makes a paragraph

This is a quote

To compile html make, library(knitr) must be loaded, then either press Knit HTML (R-Studio) or knit('tutorial_knitted.Rmd'), with command line.

Hyperlinks: see the borrowed R Markdown example from here.

italic and bold.

A code chunk opens with three backticks and an r in curly bracket, and closes using three backticks.

1+1

## [1] 2

inline code is enclosed as 3.1415927.

Code can also be nested into numbered (or not) lists:

String split

strsplit('hello indented world', ' ')[[1]]

## [1] "hello"    "indented" "world"

Substitute strings

a = 'hate'
a

## [1] "hate"

sub('hate', 'love', a)

## [1] "love"

Equations are enclosed using the dollar sign: : \(\alpha+\beta=\gamma\)

Plots are simply embedded in the code chunk

hist(rnorm(10000), breaks = 500)

With these basics, you can write almost any R document in MarkDown.

Load libraries and data for the tutorial.

library(plyr)
library(crqa)

## Loading required package: Matrix
## Loading required package: tseriesChaos
## Loading required package: deSolve
## Loading required package: fields
## Loading required package: spam
## Loading required package: grid
## Spam version 1.0-1 (2014-09-09) is loaded.
## Type 'help( Spam)' or 'demo( spam)' for a short introduction 
## and overview of this package.
## Help for individual functions is also obtained by adding the
## suffix '.spam' to the function name, e.g. 'help( chol.spam)'.
## 
## Attaching package: 'spam'
## 
## The following objects are masked from 'package:base':
## 
##     backsolve, forwardsolve
## 
## Loading required package: maps
## 
## Attaching package: 'fields'
## 
## The following object is masked from 'package:plyr':
## 
##     ozone
## 
## Loading required package: plot3D
## Loading required package: pracma
## 
## Attaching package: 'pracma'
## 
## The following object is masked from 'package:deSolve':
## 
##     rk4
## 
## The following objects are masked from 'package:Matrix':
## 
##     expm, lu, tril, triu

setwd('~/Dropbox/Rick/tutorial_CRQA_cogsci2015/')
# projects/papers/RickMorenoRQA/tutorial_CRQA_cogsci2015/')  # change to your own path (after download)  
source('lorenzattractor.R') # unless you have the last version of crqa() 1.0.6
load('alldata.Rdata') # load the Rdata object
# take out of the Rdata object all datasets to be used

emotData = alldata$emotData # emotional states (parent, adolescent)
jepsen =  alldata$jepsen # lyrics from carly jepsen (raw text)
obama = alldata$obama # speech from barak obama (raw text)
eyemovement = alldata$eyemovement # from Richardson & Dale, 2005 (eye movements, speaker, listener)
bodymovement =  alldata$bodymovement # from Paxton & Dale, 2013 (bodily motion of conversants)

Categorical events over a time-course

Many transcription and video-coding systems (e.g., ELAN, etc.) utilize an event-interval coding scheme that specifies a given event and its start / end times.

Let’s consider, for example, data about emotional dynamics between parent and child during an interaction (Dr. Alexandra Main’s data, UC Merced); refer to emotData.

Initially, the emotional time line of the parent looked like this:

00:00:44:23,00:00:50:09,P Neutral,,,

The timestamps from start and end of the emotional state, the identifier for (P)arent and (A)dolescent, and the actual state. The data do not start at time 0s, but at 44s, and “parent humor” is the emotional event taking place in the interation.

The first thing one must do to start recurrence is to obtain a new data format which encodes events with regular time sampling.

Put it differently, recurrence assumes that you are inputting time series which represent regularly sampled measurements (either scalars or nominal identifiers).

This is just about the sole “pre-analysis” assumption that recurrence requires before stepping into its magical world.

(Minor note: In a sense, recurrence also requires the assumption of “exclusive” codes. In any given time series, we can only have only one measure or event at a given time; which means that all time points have a discrete code. However, in special cases, such as non-events, recurrence might not require it.)

So, we reshaped the data into a two-column second-by-second “time series”.

head(emotData)

##   s    parent adolescent
## 1 0 P Neutral    A Humor
## 2 1 P Neutral    A Humor
## 3 2 P Neutral    A Humor
## 4 3 P Neutral    A Humor
## 5 4 P Neutral    A Humor
## 6 5 P Neutral    A Humor

Now the left column shows time in seconds, starting from 0s. The file is tab-delimited and the second and third columns reflect the emotional events taking place in each person.

Pre-processing is not over. In fact, we need to code the data seeking for the “recurring states.”

Recode the data into event states

In order to determine state matches, we recode the time series by stripping the “A” and “P” prefixes so that we can play with the time series and leverage matching strings.

parent = as.character(emotData[,2]) # making it characters rather than factor list
adoles = as.character(emotData[,3]) 
# let's get emotion state out (string), stripping prefixes
for (i in 1:length(parent)) { 
  parent[i] = unlist(strsplit(parent[i]," "))[2] # space delimiter
  adoles[i] = unlist(strsplit(adoles[i]," "))[2]
} 
parent[1:4]

## [1] "Neutral" "Neutral" "Neutral" "Neutral"

adoles[1:4]

## [1] "Humor" "Humor" "Humor" "Humor"

Now let’s get the set of unique codes across parent and adolescent.

uniques = unique(c(parent,adoles))
uniques

##  [1] "Neutral"        "Tension"        "Domineering"    "Anger"         
##  [5] "Defensiveness"  "Elaboration"    "Backchannels"   "Sadness"       
##  [9] "Open-ended"     "Mirroring"      "Humor"          "Positive"      
## [13] "Direct"         "Identification" "Whining"        "Belligerence"  
## [17] "Sentence"

There are 17 unique codes. We can use the ordering of elements in this list to consistently renumber each of the time series we’re analyzing (parent / adolescent). Let’s use a lame little loop to do so.

pix = (1:length(parent))*0 # pix = parent's index in uniques
aix = (1:length(parent))*0 # default to 0's; aix = adolescent's
for (i in 1:length(parent)) {
  # which returns the index in uniques[j] of the parent[i] string
  pix[i] = which(uniques==parent[i]) 
  aix[i] = which(uniques==adoles[i]) 
} # below is a more efficient example of this with "apply"
pix[1:10]

##  [1] 1 1 1 1 1 1 2 2 2 2

aix[1:10]

##  [1] 11 11 11 11 11 11 11 11 11 11

Another solution would be to use the function checkts(), which takes as input two time-series of categorical nature, check whether their length is equal, and accept or reject according to a user specified threshold. Series can be made equal by padding the shortest, or chopping the longest. Finally, categorical states are recoded into numeric identifiers.

ans = checkts(parent, adoles, datatype = 'categorical', thrshd = 5, pad = FALSE)
recoded = ans[[1]]

pix = recoded[,1]
aix = recoded[,2]
# was the difference between the series smaller than the threshold?
ans[[2]]

## [1] TRUE

We can plot these changes now, if we want to eyeball the tendency for emotions to change during their interaction.

First we create an x-axis to unfold the states (realtime):

realtime = emotData$s
indnorm = seq(1, max(realtime), 42)
indnorm = c(indnorm, max(realtime))

Generate as many colors as there are unique states (look at the function rainbow())

cols = rainbow(length(uniques))

If we wish to plot the states with different colors plot(), we can implement a loop that goes through all points of the two series, identify their state, and color code accordingly on the plot as points().

par(mar = c(4, 5, 2, 2), font.lab = 2, cex.lab = 2.5, font.axis = 2, cex.axis = 2.2)

plot(realtime, rep(1, length(realtime)),
     lwd = 3, cex = .8,
     col = "gray", xaxt = "n", yaxt = "n",
     ylim = c(0, 3.5), xlab = "Time-Course (seconds)",
     ylab = "Interlocutor",
     type = "n", pch = 19
     )

mtext("Parent", at = 1, cex = 2, font = 2, line = 1, side = 2)
mtext("Adoles", at = 2.2, cex = 2, font = 2, line = 1, side = 2)

r = 2
for (r in 1:length(parent)){
    
    curP = as.character(parent[r])
    curS = as.character(adoles[r])

    indcP = which(uniques == curP)
    indcS = which(uniques == curS)

    points(realtime[r], 1, col = cols[indcP], cex = 2, pch = 16)
    points(realtime[r], 2, col = cols[indcS], cex = 2, pch = 16)

}

mtext(seq(1, max(realtime), 1)[indnorm], at = seq(1, max(realtime), 1)[indnorm], cex = 2, font = 2, line = 1, side = 1)

legend("top", uniques, col = cols, ncol = 3, cex = 1, bty = "n",
       pch = 16)

Recoding text in 2 more ways: characters, words

An obvious recoding of text data is in the form of characters or words. Let’s consider some sample text, namely, a State of the Union speech recently made by by Obama, and a groundbreaking song by Carly Lee Jepsen that we will put in your head.

# first we do some necessary pre-processing of the raw text

obama <- unlist(strsplit(as.character(obama), " ")) ## make it a vector of words
obama = gsub("[[:punct:]]", "", obama) ## get rid of punctuations
obama = obama[-which(obama == "")] ## and of empty spaces
obama = tolower(obama) ## lower case everything

# we split into long character vector; delimiter=''
chars = unlist(strsplit(obama,'')) 
# what are the unique characters? 
uniqChars = unique(chars)
# let's use the apply function from plyr  to loop and recode
charSeries = apply(data.frame(chars),1,function(x) which(x==uniqChars))
charSeries[1:20]

##  [1]  1  2  2  3  4  5  4  6  7  6  1  8  2  6  7  1  9  8  7 10

# here are our characters
plot(charSeries,type='b')

The ordering of these values is entirely meaningless. Characters appear in the “unique” vector in the order that they are found, and so in general one sees an increasing index across a text, only to asymptote at the point that the character set of the language is assimilated.

Let’s do the same thing for words. It’s easy. We just change the delimiter.

words = unlist(strsplit(obama,' '))
uniqWords = unique(words)
uniqWords

##   [1] "good"         "evening"      "tonight"      "i"           
##   [5] "can"          "report"       "to"           "the"         
##   [9] "american"     "people"       "and"          "world"       
##  [13] "that"         "united"       "states"       "has"         
##  [17] "conducted"    "an"           "operation"    "killed"      
##  [21] "osama"        "bin"          "laden"        "leader"      
##  [25] "of"           "al"           "qaeda"        "a"           
##  [29] "terrorist"    "whos"         "responsible"  "for"         
##  [33] "murder"       "thousands"    "innocent"     "men"         
##  [37] "women"        "children"     "it"           "was"         
##  [41] "nearly"       "10"           "years"        "ago"         
##  [45] "bright"       "september"    "day"          "darkened"    
##  [49] "by"           "worst"        "attack"       "on"          
##  [53] "in"           "our"          "history"      "images"      
##  [57] "911"          "are"          "seared"       "into"        
##  [61] "national"     "memory"       "hijacked"     "planes"      
##  [65] "cutting"      "through"      "cloudless"    "sky"         
##  [69] "twin"         "towers"       "collapsing"   "ground"      
##  [73] "black"        "smoke"        "billowing"    "up"          
##  [77] "from"         "pentagon"     "wreckage"     "flight"      
##  [81] "93"           "shanksville"  "pennsylvania" "where"       
##  [85] "actions"      "heroic"       "citizens"     "saved"       
##  [89] "even"         "more"         "heartbreak"   "destruction" 
##  [93] "yet"          "we"           "know"         "those"       
##  [97] "were"         "unseen"       "empty"        "seat"        
## [101] "at"           "dinner"       "table"        "who"         
## [105] "forced"       "grow"         "without"      "their"       
## [109] "mother"       "or"           "father"       "parents"     
## [113] "would"        "never"        "feeling"      "childs"      
## [117] "embrace"      "3000"         "taken"        "us"          
## [121] "leaving"      "gaping"       "hole"         "hearts"      
## [125] "11"           "2001"         "time"         "grief"       
## [129] "came"         "together"     "offered"      "neighbors"   
## [133] "hand"         "wounded"      "blood"        "reaffirmed"  
## [137] "ties"         "each"         "other"        "love"        
## [141] "community"    "country"      "no"           "matter"      
## [145] "what"         "god"          "prayed"       "race"        
## [149] "ethnicity"    "as"           "one"          "family"      
## [153] "also"         "resolve"      "protect"      "nation"      
## [157] "bring"        "committed"    "this"         "vicious"     
## [161] "justice"      "quickly"      "learned"      "attacks"     
## [165] "carried"      "out"          "organization" "headed"      
## [169] "which"        "had"          "openly"       "declared"    
## [173] "war"          "killing"      "innocents"    "around"      
## [177] "globe"        "so"           "went"         "against"     
## [181] "friends"      "allies"

wordSeries = apply(data.frame(words),1,function(x) which(x==uniqWords))
wordSeries[1:20]

##  [1]  1  2  3  4  5  6  7  8  9 10 11  7  8 12 13  8 14 15 16 17

plot(wordSeries,type='b')

Because there are many more words than there are characters, this will generally be seen across a large text as gradually increasing indices, but for our purposes this is meaningless (I suppose one could use this as a signal for the relative increase in lexical diversity in an exchange or block of text).

A nominal (or categorical) series like charSeries or wordSeries can be input immediately into the crqa() functions for analysis of categorical recurrence.

Later on, we will see how cross-recurrence can be computed on categorical time-series, but before going into it, let us delve into categorical auto-recurrence by looking at text, and compare the patterns observable in lyrics and speeches.

Categorical auto-recurrence

Recurrence quantification analysis has two basic steps. The first is ‘plotting the dynamics’ of a system in time, called the recurrence plot (RP). The second is quantifying points on that plot (which takes us to recurrence quantification: RQA).

This will be our first use of the crqa() function, using text from Obama’s speech.

We have a number of parameters to choose for both the RP and RQA. For now, we will set the parameters for you and introduce them incrementally across this code. For now, set the delay (or lag) to 1 (the series is categorical), embedding dimension to 1 (we are not interested in recurrence of bi-grams), and radius to .0001 (we want to match the same, numerically transformed, categorical state variable).

head(crqa)

##                                                                   
## 1 function (ts1, ts2, delay, embed, rescale, radius, normalize,   
## 2     mindiagline, minvertline, tw = 0, whiteline = F, recpt = F, 
## 3     side = "both", checkl = list(do = F))                       
## 4 {                                                               
## 5     v11 = v21 = NULL                                            
## 6     if (recpt == FALSE) {

## initialize the parameters

delay = 1; embed = 1; rescale = 1; radius = 0.0001;
normalize = 0; mindiagline = 2; minvertline = 2;
tw = 1 # Theiler window set to 1 to remove the Line of Incidence

res = crqa(obama, obama, delay, embed, rescale, radius, normalize,
                mindiagline, minvertline, tw, checkl =
                list(do = TRUE, datatype = "categorical", thrshd = 10))

RP = res$RP # take out the recurrence plot
RP = matrix(as.numeric(RP), nrow = length(obama)) # transform it for plotting

tstamp = seq(0, length(obama), 10)
cols = c("white","blue4")

par(mar = c(3.8, 3.8, 0.2,2), font.axis = 2, cex.axis = 1,
    font.lab = 2, cex.lab = 1.2)

plot(tstamp, tstamp, type = "n", xlab = "", ylab = "")

l = 1
for (l in 1:length(obama)){
  ind = which(RP[,l] == 1)
  points(rep(l,length(ind)), ind, cex = 1.2, col = "blue4", pch = 20)
  
}

mtext("Time (words)", at = mean(tstamp), side = 1, line = 2, cex = 1.2, font = 2)
mtext("Time (words)", at = mean(tstamp), side = 2, line = 2, cex = 1.2, font = 2)

Note that the RP variable that is generated by crqa() contains the recurrence matrix. We convert this sparse matrix (to save storing space) into a numeric matrix to plot recurrent vs. not recurrent points using a loop spanning the full matrix.

Now that we have a RP, we can quantify the number of points on the plot and the nature of their distribution. As described in the opening introduction, there are a number of useful measures we can extract to characterize the RP. These include RR (% of points), DET (% of points on diagonal lines), and so on. Here’s how we get these numbers:

unlist(res[1:9])

##         RR        DET     NRLINE       maxL          L       ENTR 
##  1.2973936 10.8516484 72.0000000  3.0000000  2.1944444  0.4926037 
##      rENTR        LAM         TT 
##  0.7106769  0.0000000        NaN

We observe a very low recurrence rate (RR = 1.29), which indicates variability in the lexicon of words used by Obama, and an average diagonal line of 2, which indicates that often only two words are sequentially repeated by Obama. Determinism is also pretty low, which could indicate an overall irregular repetitions in the text.

Importantly, we can’t really absolutely verify the significance of these measures, because we don’t know if this RR value is high or low until we have a comparison case. So now, let’s compare Obama speech, with the lyrics of a song by Carly Jepsen.

Below will be familiar to you, as we will simply process a new text file in order to produce a new RP for Carly Jepsen’s epic and heartwarming song ‘Call Me Maybe’

# first we do some necessary pre-processing of the raw text

jepsen <- unlist(strsplit(as.character(jepsen), " ")) ## make it a vector of words
jepsen = gsub("[[:punct:]]", "", jepsen) ## get rid of punctuations
jepsen = jepsen[-which(jepsen == "")] ## and of empty spaces
jepsen = tolower(jepsen) ## lower case everything

resJ = crqa(jepsen, jepsen, delay, embed, rescale, radius, normalize,
                mindiagline, minvertline, tw, checkl =
                  list(do = TRUE, datatype = "categorical", thrshd = 10))

RPJp = resJ$RP
RPJp = matrix(as.numeric(RPJp), nrow = length(jepsen))


par(mfrow = c(1,2), mar = c(3.8, 3.8, 0.2,2), font.axis = 2, cex.axis = 1,
    font.lab = 2, cex.lab = 1.2)

tstamp = seq(0, length(obama), 10)
cols = c("white","blue4")

plot(tstamp, tstamp, type = "n", xlab = "", ylab = "")

l = 1
for (l in 1:length(obama)){
  ind = which(RP[,l] == 1)
  points(rep(l,length(ind)), ind, cex = 1.2, col = "blue4", pch = 20)
  
}

mtext("Obama (words)", at = mean(tstamp), side = 1, line = 2, cex = 1.2, font = 2)
mtext("Obama (words)", at = mean(tstamp), side = 2, line = 2, cex = 1.2, font = 2)

tstamp = seq(0, length(jepsen), 10)
plot(tstamp, tstamp, type = "n", xlab = "", ylab = "")

l = 1
for (l in 1:length(jepsen)){
  # print(l)
  ind = which(RPJp[,l] == 1)
  points(rep(l,length(ind)), ind, cex = 1.2, col = "blue4", pch = 20)
  
}

mtext("Jepsen (words)", at = mean(tstamp), side = 1, line = 2, cex = 1.2, font = 2)
mtext("Jepsen (words)", at = mean(tstamp), side = 2, line = 2, cex = 1.2, font = 2)

# compare the results of two RPs
cbind( unlist(res[1:9]), unlist(resJ[1:9]) )

##              [,1]        [,2]
## RR      1.2973936   2.3481447
## DET    10.8516484  46.6395112
## NRLINE 72.0000000 226.0000000
## maxL    3.0000000  90.0000000
## L       2.1944444   8.1061947
## ENTR    0.4926037   1.8393225
## rENTR   0.7106769   0.7401978
## LAM     0.0000000   3.3604888
## TT            NaN   2.0000000

By comparing the two plots, we can already eyeball the difference in the texture between the two types of text. In particular, we find a higher RR in the lyrics of Jepsen, as compared to Obama’s speech. Also, the average diagonal line is now much longer, indicating repetition of longer sequences of words in Jepsen, as compared to Obama. As a result, there is also a higher determinism, i.e., Jepsen ‘system’ is more predictable than the Obama ‘system’.

Categorical cross-recurrence

In this section, we will construct: 1) a diagonal-wise recurrence, and 2) a windowed (time-course) cross-recurrence profile and 3) cross-recurrence over the whole plot. We also examine the role of different parameters (e.g., lag) on the recurrence observed, and look at limit cases to understand the impact of parameter setting.

Diagonal-wise recurrence profile (`drpdfromts`)

When we explore the recurrence between two time-series, we might be interested in the diagonal-wise recurrence profile, which is where the recurrence rate reflects the two time series visiting the same state at the same time (at the LOC), and on a small range of lags around it (the diagonals off the LOC). This provides a kind of ‘coupling function,’ akin to cross-correlation.

As data, we will use two eye-movement series of a narrative task, where a speaker describes the characters of a TV serial (e.g., Friends) to a listener (Richardson & Dale, 2005).

To compute the diagonal-wise cross-recurrence profile of two time-series we use the function drpdfromts. This function returns a recurrence profile with length equal to the number of lags considered, the maximal recurrence observed between the two series, and the lag at which it occurred.

head(drpdfromts)

##                                           
## 1 function (t1, t2, ws, datatype, radius) 
## 2 {                                       
## 3     drpd = vector()                     
## 4     if (datatype == "categorical") {    
## 5         t1 = as.character(as.matrix(t1))
## 6         t2 = as.character(as.matrix(t2))

The function takes 5 arguments:

ts1 and ts2 = the two time-series
ws = a constant indicating the range of lags (positive and negative) considered.
datatype = a character indicating whether the series are of continuous or categorical values.
radius = the distance to accept the points as recurrent.

The ws argument is basically setting how many seconds would our profile span. A value of 100, for example, would give us a span of +/- 3 seconds (each time-point represents a fixation 33~ms long).

The radius is used when the datatype is continuous, and it indicates the threshold distance to consider two points as recurring. If the series are categorical, but recoded as continuous values, you should set the value of the radius to be really small.

We run the function with speaker as ts1, and listener as ts2, a ws = 100, provide also the correct datatype, and a sensible radius.

speaker = eyemovement[, 1]
listener = eyemovement[, 2]

res = drpdfromts(speaker, listener, ws = 100, datatype = "continuous",
  radius = 0.000001)

str(res)

## List of 3
##  $ profile: num [1:201] 0.151 0.155 0.159 0.162 0.163 ...
##  $ maxrec : num 0.339
##  $ maxlag : int 126

The functions returns three outputs:

profile: the diagonal-wise recurrence-profile between the two series
maxrec: the maximal recurrence observed
maxlag: the lag at which maximal recurrence occurred

Plot the diagonal-wise recurrence profile

First, we extract the profile from res, and multiply it by 100 to represent recurrence in percentage.

profile = res$profile*100

In order to plot correctly the lags on our x-axis, we need to reconstruct the time-course of the lags from the ws information, and know that each sampling unit is 33 ms.

timecourse = round( seq(-3300,3300,33)/1000, digit = 1)

Let’s also take the lag at which maximal recurrence is observed.

mxlag = res$maxlag
## The timing in second of the lag
timecourse[mxlag]

## [1] 0.8

We visualize the diagonal-wise recurrence profile (y) over the lags examined (x), and overlay using abline() at which lag maximal recurrence occurred.

plot(timecourse, profile,type = "l",
     lwd = 5, xlab = "Lag (seconds)", ylab = "RR")
abline(v = timecourse[res$maxlag], lty = 2, col = "gray40", lwd = 3)
mtext(paste("Max Lag:", timecourse[res$maxlag], "sec."),
      at = timecourse[res$maxlag] + 1.2, side = 3, col = "red", line = -1.2,
      cex = .8)

We observe the expected pattern. The speaker’s gaze is ahead of the listener’s gaze by .8 seconds.

Explore the ws parameter of `drpfromts()`

The parameter ws controls the range of lags examined, with larger ws implying a wider range of lags explored.
Set ws to a small value, say 10 re-run the same code as before, and visualize it.

res = drpdfromts(speaker, listener, ws = 10, datatype = "categorical",
  radius = 0.000001)

For plotting, do exactly the same steps as before: - extract the profile and scale it to percentage - reconstruct the actual time-stamps of the lags

profile = res$profile * 100
timecourse = round( seq(-10*33, 10*33, 33)/1000, digit = 1)
plot(timecourse, res$profile,type = "l",
     lwd = 5, xlab = "Lag (seconds)", ylab = "RR")

We zoomed on a very narrow window of -330/330 ms which is not enough to capture the full underlying dynamics. We see increasing recurrence for positive lags, but we do not find a clear peak in it. Note, the ‘correct’ number of lags depend on the particular dataset examined.

The maximum ws that we could logically consider is the length of the sequence.

res = drpdfromts(speaker, listener, ws = 2000, datatype = "categorical",
  radius = 0.000001)

profile = res$profile * 100
timecourse = round( seq(-2000*33, 2000*33, 33)/1000, digit = 1)
plot(timecourse, res$profile,type = "l",
     lwd = 5, xlab = "Lag (seconds)", ylab = "RR")

The time series is only 66s so well outside there is simply nothing because there is no data. Try, if you want, to set the value to 4000, and visualize it. You would clearly see that after the 60 seconds of data recurrence drops to zero.

Note also, that when we lag +/- 60s, we are going out to the corners of the CRP. Thus the recurrence can get unstable and noisy due to the fewer number of points defining the profile.

The ts1 versus ts2 order when inputting `drpfromts()`

The direction adopted to lag the series relative to each other is arbitrarily defined, because we will span the whole range of possible lags (-/+). In dialogue research, the assumption is that speakers are usually ahead of the listeners, and so, we order the series such that recurrence increase for positive lags (speaker side).

Obviously, when we invert the input order speaker and listener, we will observe exactly the same pattern, but mirrored on the negative side.

res = drpdfromts(listener, speaker, ws = 100, datatype = "categorical",
                 radius = 0.000001)
timecourse = round( seq(-3300,3300,33)/1000, digit = 1)

plot(timecourse, res$profile,type = "l",
     lwd = 5, xlab = "Lag (seconds)", ylab = "RR")

Now the pattern is switched: the listener is behind the speaker by -0.8 ms (for max. recurrence to be observed). Note also that the max lag did not change value, just the sign.

timecourse[res$maxlag]

## [1] -0.8

Windowed diagonal-wise recurrence (`windowdrp`)

Diagonal-wise recurrence can also be computed across the time-course using function windowdrp().

Conceptually, we ‘roll’ an overlapping window of a fixed size over the two series. In this window, we lag the two series and calculate the recurrence observed. The recurrence value returned for each window is equal to the mean recurrence observed across all lags considered within the window. In practice, this analysis makes possible to track how recurrence (along the diagonal) evolves over the time-course.

head(windowdrp)

##                                                                 
## 1 function (x, y, step, windowsize, lagwidth, datatype, radius) 
## 2 {                                                             
## 3     x = as.vector(as.matrix(x))                               
## 4     y = as.vector(as.matrix(y))                               
## 5     points = seq(1, (length(x) - (windowsize) - 1), step)     
## 6     drpd = vector()

The function has 6 arguments:

x and y that are equivalent to ts1 and ts2 of drpdfromts
step = the interval used to move the window
windowsize = the number of time bins included in the window
lagwidth = the number of lags considered (equivalent to the ws parameter in drpdfromts)

We use a step of 20 and window size of 100 and a lag width of 40 to run the windowdrp.

step = 20; windowsize = 100; lagwidth = 40

res = windowdrp(speaker, listener, step, windowsize,
    lagwidth, datatype = "categorical", radius = 0.000001)

str(res)

## List of 3
##  $ profile: num [1:95] 0.0908 0.1107 0.1694 0.1462 0.1115 ...
##  $ maxrec : num 0.72
##  $ maxlag : int 86

The outputs are the same as those returned by drpdfromts, with the difference that now the recurrence stored in the profile output, is tracked over the actual time-course of the series, and not over a set of lags. The time-course, however, needs to be reconstructed using the information of step size, which is the unit we use to move the window.

profile = res$profile*100
timecourse = round( seq(1, length(profile), 1)*step*.033, digit = 1)

plot(timecourse, profile, type = "l", lwd = 5,
     xlab = "Time-course (seconds)", ylab = "RR")
abline(v = timecourse[res$maxlag], lty = 2, col = "gray40", lwd = 3)
mtext(paste("Max Recurrence:", timecourse[res$maxlag], "sec."),
      at = timecourse[res$maxlag] - 12, side = 3, col = "red", line = -.9,
      cex = .8)

This analysis shows that recurrence between speaker and listener increases as they speak, and it is maximal at the end of the narration.

Explore the parameters of `windowdrp()`

There are two main things you can observe on the recurrence when we modulate the parameters:

if we reduce the size of the window, we obtain a more spiky recurrence, whereas if we increase the size of it, we observe a smoother recurrence. To a lesser degree, the size of the step follows the same rationale.
as for drpdfromts(), the maximum lag width we can explore corresponds to the size of the window considered. If we take a lagwidth larger than the size of the window, we would just observe no recurrence.

step = 20; windowsize = 200; lagwidth = 100
res = windowdrp(speaker, listener, step, windowsize,
    lagwidth, datatype = "categorical", radius = 0.000001)

profile = res$profile*100
timecourse = round( seq(1, length(profile), 1)*step*.033, digit = 1)
plot(timecourse, profile, type = "l", lwd = 5,
     xlab = "Time-course (seconds)", ylab = "RR")

There is no ‘magic’ setting for these three parameters, and you should play around with steps and with windows of different size (to a lesser extend lag) to figure out what values fit best the data under examination.

Cross-recurrence over the whole plot using `crqa()`

To recap, with crqa we quantify the recurrence rate observed between two time-series, when lagged and embedded in different dimensions. From an original time series \(X(t)\), lagged copies \(X(t + \tau)\) are generated by introducing a lag unit \(\tau\) into the original time series. Embedding dimensions are obtained by multiplying \(\tau\) with a constant \(m\), \(X(t + m\tau)\).

head(crqa)

##                                                                   
## 1 function (ts1, ts2, delay, embed, rescale, radius, normalize,   
## 2     mindiagline, minvertline, tw = 0, whiteline = F, recpt = F, 
## 3     side = "both", checkl = list(do = F))                       
## 4 {                                                               
## 5     v11 = v21 = NULL                                            
## 6     if (recpt == FALSE) {

There are 12 arguments:

ts1 and ts2 = the two time-series
delay = a constant indicating the number of time points we use to lag the series (corresponds to the \(\tau\) of the equation).
embed = a constant indicating the interval of the lags (corresponds to the \(m\) of the equation). .
rescale = the distance matrix (on which recurrence is evaluated); if rescale = 0 (do nothing); if rescale = 1 (mean distance of entire matrix); if rescale = 2 (maximum distance of entire matrix).
radius: a constant used to decide when two points are close enough to be considered recurrent.
normalize: the time-series; if normalize = 0 (do nothing); if normalize = 1 (unit interval); if normalize = 2 (z-score).
mindiagline: A minimum diagonal length of recurrent points. The minimum should be 2, because two points defines a line.
minvertline: A minimum vertical length of recurrent points.
tw: The Theiler window parameter. If set to 0, we keep all points in the recurrence plot. A tw of 1 would remove the recurrent points of the main diagonal. From values > 1, recurrent points are removed at the diagonal, and the lines parallel to it (+/-).
whiteline: A logical flag to calculate (TRUE) or not (FALSE) empty vertical lines. Empty vertical lines are currently not used to compute any specific measure.
recpt: A logical flag indicating whether we are directly inputting a recurrence plot (TRUE), and we want to extract just the measures. If this parameter is TRUE, the recurrence plot should be introduced as ts1, and ts2 = NULL.
side: A string indicating whether recurrence measures should be calculated in the ‘upper’ triangle of the RP ‘lower’ triangle of the matrix, or ‘both’. LOC is automatically excluded for ‘upper’ and ‘lower’.
checkl: calls checkts(), and it performs an user-directed control of the series length. Default is do not check.

We build a cross-recurrence plot (CRP) of speaker and listener gazes with starting values of 1 for delay and embed, and a very small radius. We rescale the distance matrix to the mean, do not normalize the series, and accept lines only if they have a minimum length of 2 (both diagonal and vertical). We do not compute whitelines, are constructing the recpt ourselves, and set the Theiler window to 0.

delay = 1; embed =  1 ; rescale =  1; radius = 0.00001;
normalize = 0; minvertline = 2; mindiagline = 2; whiteline = FALSE;
recpt = FALSE; 
tw = 0 # now Theiler is set to 0 because we want to consider the Line of Coincidence

res = crqa(speaker, listener, delay, embed, rescale, radius,
    normalize, minvertline, mindiagline, tw,  whiteline, recpt)

Have a look at the structure of the results

str(res)

## List of 10
##  $ RR    : num 12.5
##  $ DET   : num 99
##  $ NRLINE: int 43852
##  $ maxL  : num 124
##  $ L     : num 11.3
##  $ ENTR  : num 3.2
##  $ rENTR : num 0.666
##  $ LAM   : num 99.7
##  $ TT    : num 20.6
##  $ RP    :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
##   .. ..@ i       : int [1:500916] 0 1 11 12 13 14 15 16 17 18 ...
##   .. ..@ p       : int [1:2001] 0 0 314 628 628 628 628 628 628 628 ...
##   .. ..@ Dim     : int [1:2] 2000 2000
##   .. ..@ Dimnames:List of 2
##   .. .. ..$ : NULL
##   .. .. ..$ : NULL
##   .. ..@ factors : list()

The function returns 10 outputs, the first 9 are measures of the cross-recurrence plot, while the last is the C/RP itself.

Plotting the CRP

CRP = res$RP
dim(CRP)

## [1] 2000 2000

CRP = matrix(as.numeric(CRP), nrow = nrow(CRP))

The CRP is a square matrix with dimensions equal to the length of the series. CRP[i.j] tells you how much recurrence was found when comparing the ith entry of speaker (ts1) and the jth entry of listener (ts2).

The matrix is stored in a sparse format (to save space on the memory RAM), and we make it numeric for plotting.

tstamp = seq(0, nrow(CRP), 10)
cols = c("white","blue4")

plot(tstamp, tstamp, type = "n", xlab = "", ylab = "")

l = 1
for (l in 1:nrow(CRP)){
  ind = which(CRP[,l] == 1)
  points(rep(l,length(ind)), ind, cex = 1.2, col = "blue4", pch = 20)
  
}

mtext("Speaker", at = mean(tstamp), side = 1, line = 2, cex = 1.2, font = 2)
mtext("Listener", at = mean(tstamp), side = 2, line = 2, cex = 1.2, font = 2)

tstamp = seq(0,60,6)
rtime = seq(0, 1, 0.1)
cols = c("white","blue4")

If you did not manage to recreate the axes, just leave the defaults (0,1).

Now, let’s plot the image of CRP, and remember that you need to suppress the plotting of the axes (have a look at the arguments xaxt and yaxt); and overlay the new axes (have a look at the argument mtext).

Interpreting the measures

The measure currently computed on the CRP by crqa() are:

RR: The percentage of recurrent points in the plot
DET: Proportion of recurrent points forming diagonal line structures
NRLINE: The total number of lines in the recurrence plot
maxL: The length of the longest diagonal line segment in the plot
L: The average length of diagonal line structures
ENTR: Shannon information entropy of diagonal lines (>2)
rENTR: entropy relative to the number of lines in plot.
LAM: Proportion of recurrent points forming vertical line structures
’TT`: The average length of vertical line structures

Check again the structure of res to see the values these measures take in the context of our data.

str(res)

## List of 10
##  $ RR    : num 12.5
##  $ DET   : num 99
##  $ NRLINE: int 43852
##  $ maxL  : num 124
##  $ L     : num 11.3
##  $ ENTR  : num 3.2
##  $ rENTR : num 0.666
##  $ LAM   : num 99.7
##  $ TT    : num 20.6
##  $ RP    :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
##   .. ..@ i       : int [1:500916] 0 1 11 12 13 14 15 16 17 18 ...
##   .. ..@ p       : int [1:2001] 0 0 314 628 628 628 628 628 628 628 ...
##   .. ..@ Dim     : int [1:2] 2000 2000
##   .. ..@ Dimnames:List of 2
##   .. .. ..$ : NULL
##   .. .. ..$ : NULL
##   .. ..@ factors : list()

Usually, these values are interpreted relatively to experimental conditions within an epxeriment. In general, %DET will be higher than %REC, with %DET often quite high (90% or higher) and %REC considerably lower (10% or less), so 12% would be considered relatively high.

Continuous auto-recurrence using the Lorenz Attractor model

We explore the properties of a Lorenz attractor, a classic example of a dynamical system (differential equations), which describes the motion of a possible particle which will neither converge to a steady state nor diverge to infinity, but rather stay in a bounded but ‘chaotically’ defined region (the attractor).

Simulated data using the Lorenz attractor system

head(lorenzattractor)

##                                               
## 1 function (numsteps, dt, sigma, r, b, plots) 
## 2 {                                           
## 3     require(plot3D)                         
## 4     x = matrix(0, nrow = numsteps, ncol = 3)
## 5     for (gn in 1:3) x[1, gn] = 10 * rnorm(1)
## 6     x0 = x[1, ]

The function takes 6 arguments: - numsteps = the number of simulated points - dt, sigma, r, b = parameters characterizing the motion of the system - plots = a logical flag indicating whether the resulting system should be plotted (TRUE) or not.

The output of the function is a matrix with the 3 dimensions within which the system unfolds.

Simulate some data using the standard parameters used by Lorenz, 1963: dt = .01; sigma = 10; r = 28; b = 8/3, on numsteps = 500. Call the variable returned by the function lorenz.

dt = .01; sigma = 10; r = 28; b = 8/3; numsteps = 500
plots = TRUE
lorenz = lorenzattractor(numsteps, dt, sigma, r, b, plots)

The Lorenz system has three dimensions (or states), as it can be seen from the plot. We want to reconstruct the recurrence pattern characterizing the periodicity of the system from one of its dimension. In practice, we want to study the pattern of auto-recurrence, i.e., how is the signal recurring with itself, and work out its overall dynamics.

delay = 1; embed =  1 ; rescale =  1; radius = 5;
normalize = 0; minvertline = 2; mindiagline = 2; whiteline = FALSE;
recpt = FALSE; tw = 0

res = crqa(lorenz[,1], lorenz[,1], delay, embed, rescale, radius,
    normalize, minvertline, mindiagline, tw,  whiteline, recpt)

We can use crqa() on the first dimension of the “lorenz” dimension, and input it as the argument of ts1 and ts2, and have a delay of 1 an embedding dimension of 1, and a radius of 5.

Print only the measures returned by crqa() from res

print(unlist(res[1:9]))

##           RR          DET       NRLINE         maxL            L 
##    3.5576000   71.9811109 1337.0000000  500.0000000    4.7883321 
##         ENTR        rENTR          LAM           TT 
##    1.7852255    0.5151072   84.2253204    4.0339257

To visualize the RQA of the Lorenz, we: 1) extract the RP from res, 2) convert it into a normal matrix, i.e., not sparse, 3) assign two discriminant colors for recurrent/non-recurrent point. We use image, instead of a loop, just to have a shorter code.

RP = res$RP

RP = matrix(as.numeric(RP), nrow = ncol(RP)) 
cols = c("white","blue4")

image(RP, xlab = "", ylab = "", col = cols)

We can observe that there are perpendicular recurrent structures (to the main diagonal) in the plot. This indicates that the delay and embedding dimension are not chosen correctly. In fact, we are using a delay of 1 and embedding of 1, which do not properly unfold the system. We should, instead, see only diagonal lines structures appearing when these parameters are correctly chosen.

Finding “optimal” parameters for `crqa()` on continuous time-series

We follow principles of phase-space reconstruction to obtain “optimal” starting parameters for delay, embed and radius:

Identify a delay at which mutual information drops, and starts to level off.
Determine embedding dimensions using false nearest neighbors, and checking when there is no information gain in adding more dimensions.
Determine a radius yielding a recurrence rate between 2-5.

Use mutual information of a time-series to find out optimal delay.

Suppose that we observe only one dimension of the Lorenz, \(X(t)\), and we want to reconstruct its phase-space. We want to find a delay \(\tau\) for \(X(t) = X(t + \tau)\) that it is not too short (the data is not distributed near the same line). Nor, we want a delay that is too long, which makes the coordinates of our time-series independent, and there is no information to gain.

Thus, we want to find that delay which minimize mutual information.

delays = mutual(lorenz[,1])

The function returns a vector (and a plot) with the average mutual information index (ami) of a give time series for a specified number of lags (or delays).

del = which(as.vector(delays) == min(delays))

We can see that local minimum for 20 delays (the default) examined is 17 (including 0). This is the delay which minimizes the interaction between points of the time series, and it implies that we need shift the time-series by 17 points to calculate recurrence, optimally.

Use false nearest neighbors to find out optimal embedding dimension.

The optimal embedding dimension is obtained using false nearest neighbors, whereby the distance between points of the series is calculated when projected into higher ‘embedded dimensions’. Remember that embedding is the constant \(m\) applied to the \(\tau\) when time-series are lagged: \(X[t + m\tau]\). The algorithm eliminates those points, initially close, that become separated in higher embedding dimensions. False neighbors are basically caught when their distance increases going from dimension \(m\) to \(m+1\).

false.nearest() is used to find out the optimal embedding dimension by specifying the maximum number of embeddings, the delay parameter, the Theiler window and escape factor (left respectively to the default 0 and 10) and the neighborhood diameter (the default of which is sd(series)/10.

embeds = false.nearest(lorenz[,1], m = 20, d = del, t = 0,
        rt = 10, eps = sd(lorenz[,1])/10)
plot(embeds)

print(embeds)

##                    m1           m2           m3           m4           m5
## fraction 7.881468e-01 2.386878e-01 8.333333e-03 0.000000e+00 0.000000e+00
## total    1.356600e+04 1.768000e+03 7.200000e+02 4.540000e+02 2.240000e+02
##                    m6           m7           m8           m9          m10
## fraction 0.000000e+00                                                    
## total    7.800000e+01 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
##                   m11          m12          m13          m14          m15
## fraction                                                                 
## total    0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
##                   m16          m17          m18          m19          m20
## fraction                                                                 
## total    0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00

The first row gives the percentage of false nearest neighbors found, the second gives that actual number of neighbors. By looking at the fraction, we can see that the 5th dimension is the one after which there is no more gain.

Find optimal radius

Once we know both the optimal lag and embedding dimension we want to obtain a radius that yields a recurrence rate between 2 and 5%.

To understand the parameter of radius, we create a loop going from a radius of 5 (yielding a recurrence of about 3.5%) up to a radius returning 100% recurrence.

We generate a sequence of 50 radi with increments equal to the sd() of the values observed in lorenz[,1].

sdev = sd(lorenz[,1])
mxvalue = 60*sdev
radi = seq(5,mxvalue,sdev)

We run a loop to visualize in a plot, these radi using crqa on lorenz[,1], and use the values for lag and embedding found above (i.e., 16 and 4).

lag = 16; embed = 4
recurrences = vector()
for (r in radi){

    # print(r)
    radius = r
    res = crqa(lorenz[,1], lorenz[,1], lag, embed, rescale, radius,
    normalize, minvertline, mindiagline, tw,  whiteline, recpt)

    currentrec = res$RR

    if (currentrec == 100){
        break
    } else {
        recurrences = c(recurrences, currentrec)

    }    
}

effradi = radi[1:length(recurrences)]

plot(effradi, recurrences, lwd = 4, type = "o",
     xlab = "size of radius", ylab = "recurrence (RR)")

Get the parameters setting using `optimizeParam()`

The arguments for optimizeParam should be provided in the format of a list(). In this list, the arguments we need to specify in a par (list) variable are:

lgM = maximum lag when calculating average mutual information
fnnpercent = the percentage of information gained, relative to the first embedding dimension, when higher embeddings are considered.
radiusspan = a constant setting the granularity of radius unit to explore (larger value = smaller units)
radiussample = the number of equally spaced units of the radius to explore. (Larger value = more units)
typeami = decide on how to consider improvements in mutual information:
If typeami == mindip, then take the minimum lag at which we find the minimum dip).
If typeami == maxlag, then we select the maximum lag at which minimum dip is observed, when comparing the three cases: mi(ts1), mi(ts2), ami(ts1/ts2).

Do not forget that we also need to specify the arguments for crqa(): normalize, rescale, mindiagline, minvertline, tw, whiteline, recpt. Note, the arguments delay,embed,radius are not inputted as arguments, because that’s what we are meant to achieve as result.

par = list(lgM =  20,
    fnnpercent = 10,  ## percentage of false neighbours excluded
                      ## when adding one more dimension, relative to
                      ## the first dimension.
    radiusspan = 100,       ## the span of radius examined, relative to
                           ## SD of the distance matrix 
                           ## (larger constant -> wider range of values)
    radiussample = 40, ## the number of samples of equally
                       ## spaced radius values to be examined
    normalize = 0, rescale = 1, mindiagline = 2, minvertline = 2,
    tw = 0, whiteline = FALSE, recpt = FALSE, typeami = "mindip")

res = optimizeParam(lorenz[,1], lorenz[,1], par, min.rec = 2, max.rec = 5)
print(res)

## $radius
## [1] 16.97335
## 
## $emddim
## [1] 2
## 
## $delay
## [1] 15

r = res$radius
d = res$delay
e = res$emddim

delay = d; embed =  e ; rescale =  1; radius = r;
normalize = 0; minvertline = 2; mindiagline = 2; whiteline = FALSE;
recpt = FALSE; tw = 0

res = crqa(lorenz[, 1], lorenz[, 1], delay, embed, rescale, radius,
    normalize, minvertline, mindiagline, tw,  whiteline, recpt)
unlist( res[1:9] )

##          RR         DET      NRLINE        maxL           L        ENTR 
##   4.6606441  99.7993250 577.0000000 485.0000000  18.9618718   3.4480938 
##       rENTR         LAM          TT 
##   0.8491915  99.7081091   8.7518014

RP = res$RP

RP = matrix(as.numeric(RP), nrow = ncol(RP)) 
cols = c("white","blue4")

image(RP, xlab = "", ylab = "", col = cols)

We have now cleared up a bit those structures perpendicular to the main diagonal, indicating that the embedding dimension and lag have been ‘better’ chosen.

There is no magic potion for finding out the parameter settings, and it is best to explore different solutions, as well as, try on a larger sample of the data to get the actual range distibution.