Chapter 1 Introduction

This is the supplemental material associated with our 2022 ECJ contribution entitled, A suite of diagnostic metrics for characterizing selection schemes. Preprint here.

1.1 About our supplemental material

This supplemental material is hosted on GitHub using GitHub pages. The source code and configuration files used to generate this supplemental material can be found in this GitHub repository. We compiled our data analyses and supplemental documentation into this nifty web-accessible book using bookdown.

Our supplemental material includes the following paper figures and statistics:

  • Exploitation rate results (Section 2)
  • Ordered exploitation results (Section 3)
  • Contradictory objectives results (Section 4)
  • Multi-path exploration results (Section 6.4)
  • Multi-valley crossing results (Section 2.5)

Additionally, our supplemental material includes the results from parameter tuning selection schemes:

  • Truncation selection (Section 6)
  • Tournament selection sharing (Section 7)
  • Genotypic fitness sharing (Section 8)
  • Phenotypic fitness sharing (Section 9)
  • Nondominated sorting (Section 10)
  • Novelty search (Section 11)

1.3 Research overview

Abstract:

Evolutionary algorithms typically consist of multiple interacting components, where each component influences an algorithm’s problem-solving abilities. Understanding how each component of an evolutionary algorithm influences problem-solving success can improve our ability to target particular problem domains. Benchmark suites provide insights into an evolutionary algorithm’s problem-solving capabilities, but benchmarking problems often have complex search space topologies, making it difficult to isolate and test an algorithm’s strengths and weaknesses. Our work focuses on diagnosing selection schemes, which identify individuals to contribute genetic material to the next generation, thus driving an evolutionary algorithm’s search strategy. We introduce four diagnostics for empirically testing the strengths and weaknesses of selection schemes: the exploitation rate diagnostic, ordered exploitation rate diagnostic, contradictory objectives diagnostic, and the multi-path exploration diagnostic. Each diagnostic is a handcrafted search space designed to isolate and measure the relative exploitation and exploration characteristics of selection schemes. Here, we use our diagnostics to evaluate six population selection methods: truncation selection, tournament selection, fitness sharing, lexicase selection, nondominated sorting, and novelty search. Expectedly, tournament and truncation selection excelled at gradient exploitation but poorly explored search spaces, while novelty search excelled at exploration but failed to exploit gradients. Fitness sharing performed poorly across all diagnostics, suggesting poor overall exploitation and exploration abilities. Nondominated sorting was best for maintaining diverse populations comprised of individuals inhabiting multiple optima, but struggled to effectively exploit gradients. Lexicase selection balanced search space exploration without sacrificing exploitation, generally performing well across diagnostics. Our work demonstrates the value of diagnostics for building a deeper understanding of selection schemes, which can then be used to improve or develop new selection methods.

1.4 Computer Setup

These analyses were conducted in the following computing environment:

##                _                                          
## platform       x86_64-pc-linux-gnu                        
## arch           x86_64                                     
## os             linux-gnu                                  
## system         x86_64, linux-gnu                          
## status         Patched                                    
## major          4                                          
## minor          2.2                                        
## year           2022                                       
## month          11                                         
## day            10                                         
## svn rev        83330                                      
## language       R                                          
## version.string R version 4.2.2 Patched (2022-11-10 r83330)
## nickname       Innocent and Trusting

1.5 Experimental setup

Setting up required variables variables.

## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:plyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
## -- Attaching packages --------------------------------------- tidyverse 1.3.2
## --
## v ggplot2 3.4.0     v purrr   1.0.1
## v tibble  3.1.8     v stringr 1.5.0
## v tidyr   1.3.0     v forcats 1.0.0
## v readr   2.1.3     
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::arrange()   masks plyr::arrange()
## x purrr::compact()   masks plyr::compact()
## x dplyr::count()     masks plyr::count()
## x dplyr::desc()      masks plyr::desc()
## x dplyr::failwith()  masks plyr::failwith()
## x dplyr::filter()    masks stats::filter()
## x dplyr::id()        masks plyr::id()
## x dplyr::lag()       masks stats::lag()
## x dplyr::mutate()    masks plyr::mutate()
## x dplyr::rename()    masks plyr::rename()
## x dplyr::summarise() masks plyr::summarise()
## x dplyr::summarize() masks plyr::summarize()
## Warning: The `size` argument of `element_rect()` is deprecated as of ggplot2 3.4.0.
## i Please use the `linewidth` argument instead.
# default variables
REPLICATES = 50
DIMENSIONALITY = 100

# selection scheme related stuff
ACRON = tolower(c('TRU','TOR','LEX','GFS','PFS','NDS','NOV','RAN'))
NAMES = c('Truncation (tru)','Tournament (tor)','Lexicase (lex)', 'Genotypic Fitness Sharing (gfs)','Phenotypic Fitness Sharing (pfs)','Nondominated Sorting (nds)','Novelty Search (nov)','Random (ran)')
SCHEME = c('TRUNCATION','TOURNAMENT','LEXICASE','FITSHARING_G','FITSHARING_P','NONDOMINATEDSORTING','NOVELTY','TOURNAMENT')
ORDER = c('Truncation (tru)','Tournament (tor)','Lexicase (lex)', 'Genotypic Fitness Sharing (gfs)','Phenotypic Fitness Sharing (pfs)','Nondominated Sorting (nds)','Novelty Search (nov)','Random (ran)')

# selection scheme parameters
TR_LIST = c(1, 2, 4, 8, 16, 32, 64, 128, 256)
TS_LIST = c(2, 4, 8, 16, 32, 64, 128, 256)
FS_LIST = c(0.0, 0.1, 0.3, 0.6, 1.2, 2.5, 5.0)
ND_LIST = c(0.0, 0.1, 0.3, 0.6, 1.2, 2.5, 5.0)
NS_LIST = c(1, 2, 4, 8, 15, 30)

# selection scheme parameter we are looking for
PARAM = c('8', '8', '0.0', '0.3', '0.3', '0.3', '15', '1')

# for diagnostic loops
DIAGNOSTIC = tolower(c('EXPLOITATION_RATE', 'ORDERED_EXPLOITATION', 'CONTRADICTORY_OBJECTIVES', 'MULTIPATH_EXPLORATION'))

# data diractory for gh-pages
DATA_DIR = '/opt/ECJ-2022-suite-of-diagnostics-for-selection-schemes/DATA-FINAL/'

######################################################################################
# go through each diagnostic and collect over time data for cross comparison (cc)
print('Collecting over time data...')
## [1] "Collecting over time data..."
## [1] "DIAGNOSTIC exploitation_rate"
## [1] "SCHEME: TRUNCATION"
## [1] "SCHEME: TOURNAMENT"
## [1] "SCHEME: LEXICASE"
## [1] "SCHEME: FITSHARING_G"
## [1] "SCHEME: FITSHARING_P"
## [1] "SCHEME: NONDOMINATEDSORTING"
## [1] "SCHEME: NOVELTY"
## [1] "SCHEME: TOURNAMENT"
## [1] "DIAGNOSTIC ordered_exploitation"
## [1] "SCHEME: TRUNCATION"
## [1] "SCHEME: TOURNAMENT"
## [1] "SCHEME: LEXICASE"
## [1] "SCHEME: FITSHARING_G"
## [1] "SCHEME: FITSHARING_P"
## [1] "SCHEME: NONDOMINATEDSORTING"
## [1] "SCHEME: NOVELTY"
## [1] "SCHEME: TOURNAMENT"
## [1] "DIAGNOSTIC contradictory_objectives"
## [1] "SCHEME: TRUNCATION"
## [1] "SCHEME: TOURNAMENT"
## [1] "SCHEME: LEXICASE"
## [1] "SCHEME: FITSHARING_G"
## [1] "SCHEME: FITSHARING_P"
## [1] "SCHEME: NONDOMINATEDSORTING"
## [1] "SCHEME: NOVELTY"
## [1] "SCHEME: TOURNAMENT"
## [1] "DIAGNOSTIC multipath_exploration"
## [1] "SCHEME: TRUNCATION"
## [1] "SCHEME: TOURNAMENT"
## [1] "SCHEME: LEXICASE"
## [1] "SCHEME: FITSHARING_G"
## [1] "SCHEME: FITSHARING_P"
## [1] "SCHEME: NONDOMINATEDSORTING"
## [1] "SCHEME: NOVELTY"
## [1] "SCHEME: TOURNAMENT"
cc_over_time$`Selection\nScheme` <- factor(cc_over_time$`Selection\nScheme`, levels = ORDER)
cc_over_time$acron <- factor(cc_over_time$acron, levels = ACRON)
cc_over_time$uni_str_pos = cc_over_time$uni_str_pos + cc_over_time$arc_acti_gene - cc_over_time$overlap
cc_over_time = subset(cc_over_time, select = -c(trt,pop_fit_avg,archive_cnt,pmin,pareto_cnt,arc_acti_gene,overlap))

cc_over_time_mvc$`Selection\nScheme` <- factor(cc_over_time$`Selection\nScheme`, levels = ORDER)
cc_over_time_mvc$acron <- factor(cc_over_time$acron, levels = ACRON)
cc_over_time_mvc$uni_str_pos = cc_over_time_mvc$uni_str_pos + cc_over_time_mvc$arc_acti_gene - cc_over_time_mvc$overlap
cc_over_time_mvc = subset(cc_over_time_mvc, select = -c(trt,pop_fit_avg,archive_cnt,pmin,pareto_cnt,arc_acti_gene,overlap))


######################################################################################
# go through each diagnostic and collect best over time for cross comparison (cc)
cc_best = data.frame()
cc_best_mvc = data.frame()
for(diagnostic in DIAGNOSTIC)
{
  print(paste('DIAGNOSTIC',diagnostic))
  for(i in 1:8)
  {
    print(paste('SCHEME:',SCHEME[i]))
    dir = paste(DATA_DIR,'NO-MVC/',SCHEME[i],'/best-',diagnostic,'-', tolower(SCHEME[i]), '.csv', sep = "", collapse = NULL)
    dir_mvc = paste(DATA_DIR,'MVC/',SCHEME[i],'/best-',diagnostic,'-', tolower(SCHEME[i]), '.csv', sep = "", collapse = NULL)

    # read csv
    df = read.csv(dir, header = TRUE, stringsAsFactors = FALSE)
    df_mvc = read.csv(dir_mvc, header = TRUE, stringsAsFactors = FALSE)

    # add names/tags
    df$acron = ACRON[i]
    df$`Selection\nScheme` = NAMES[i]
    df$diagnostic = diagnostic
    df = subset(df, select = -c(Diagnostic,SEL) )

    df_mvc$acron = ACRON[i]
    df_mvc$`Selection\nScheme` = NAMES[i]
    df_mvc$diagnostic = diagnostic
    df_mvc = subset(df_mvc, select = -c(Diagnostic,SEL) )

    # add to cc_over_time data frame
    if(i == 3)
    {
      cc_best = rbind(cc_best, df)
      cc_best_mvc = rbind(cc_best_mvc, df_mvc)
    }
    else
    {
      cc_best = rbind(cc_best, filter(df, trt == PARAM[i]))
      cc_best_mvc = rbind(cc_best_mvc, filter(df_mvc, trt == PARAM[i]))
    }
  }
  rm(df);  rm(df_mvc);  rm(dir);   rm(dir_mvc)
}
## [1] "DIAGNOSTIC exploitation_rate"
## [1] "SCHEME: TRUNCATION"
## [1] "SCHEME: TOURNAMENT"
## [1] "SCHEME: LEXICASE"
## [1] "SCHEME: FITSHARING_G"
## [1] "SCHEME: FITSHARING_P"
## [1] "SCHEME: NONDOMINATEDSORTING"
## [1] "SCHEME: NOVELTY"
## [1] "SCHEME: TOURNAMENT"
## [1] "DIAGNOSTIC ordered_exploitation"
## [1] "SCHEME: TRUNCATION"
## [1] "SCHEME: TOURNAMENT"
## [1] "SCHEME: LEXICASE"
## [1] "SCHEME: FITSHARING_G"
## [1] "SCHEME: FITSHARING_P"
## [1] "SCHEME: NONDOMINATEDSORTING"
## [1] "SCHEME: NOVELTY"
## [1] "SCHEME: TOURNAMENT"
## [1] "DIAGNOSTIC contradictory_objectives"
## [1] "SCHEME: TRUNCATION"
## [1] "SCHEME: TOURNAMENT"
## [1] "SCHEME: LEXICASE"
## [1] "SCHEME: FITSHARING_G"
## [1] "SCHEME: FITSHARING_P"
## [1] "SCHEME: NONDOMINATEDSORTING"
## [1] "SCHEME: NOVELTY"
## [1] "SCHEME: TOURNAMENT"
## [1] "DIAGNOSTIC multipath_exploration"
## [1] "SCHEME: TRUNCATION"
## [1] "SCHEME: TOURNAMENT"
## [1] "SCHEME: LEXICASE"
## [1] "SCHEME: FITSHARING_G"
## [1] "SCHEME: FITSHARING_P"
## [1] "SCHEME: NONDOMINATEDSORTING"
## [1] "SCHEME: NOVELTY"
## [1] "SCHEME: TOURNAMENT"
## [1] "DIAGNOSTIC exploitation_rate"
## [1] "SCHEME: TRUNCATION"
## [1] "SCHEME: TOURNAMENT"
## [1] "SCHEME: LEXICASE"
## [1] "SCHEME: FITSHARING_G"
## [1] "SCHEME: FITSHARING_P"
## [1] "SCHEME: NONDOMINATEDSORTING"
## [1] "SCHEME: NOVELTY"
## [1] "SCHEME: TOURNAMENT"
## [1] "DIAGNOSTIC ordered_exploitation"
## [1] "SCHEME: TRUNCATION"
## [1] "SCHEME: TOURNAMENT"
## [1] "SCHEME: LEXICASE"
## [1] "SCHEME: FITSHARING_G"
## [1] "SCHEME: FITSHARING_P"
## [1] "SCHEME: NONDOMINATEDSORTING"
## [1] "SCHEME: NOVELTY"
## [1] "SCHEME: TOURNAMENT"
## [1] "TRUNCATION"
## [1] "TOURNAMENT"
## [1] "FITSHARING_G"
## [1] "FITSHARING_P"
## [1] "NONDOMINATEDSORTING"
## [1] "NOVELTY"
# remove unused data

ss_over_time$uni_str_pos = ss_over_time$uni_str_pos + ss_over_time$arc_acti_gene - ss_over_time$overlap
ss_over_time = subset(ss_over_time, select = -c(pop_fit_avg,archive_cnt,pmin,pareto_cnt,arc_acti_gene,overlap))

ss_over_time_mvc$uni_str_pos = ss_over_time_mvc$uni_str_pos + ss_over_time_mvc$arc_acti_gene - ss_over_time_mvc$overlap
ss_over_time_mvc = subset(ss_over_time_mvc, select = -c(pop_fit_avg,archive_cnt,pmin,pareto_cnt,arc_acti_gene,overlap))

## tournament data frames
tor_ot <- data.frame()
tor_ot <- filter(ss_over_time, acron == 'tor' & trt != 1)
tor_ot$T <- factor(tor_ot$trt, levels = TS_LIST)
tor_ot <- subset(tor_ot, select = -c(acron,trt))

## truncation data frames
tru_ot <- data.frame()
tru_ot <- filter(ss_over_time, acron == 'tru')
tru_ot$T <- factor(tru_ot$trt, levels = TR_LIST)
tru_ot <- subset(tru_ot, select = -c(acron,trt))

## genotypic fitness sharing data frames
gfs_ot <- data.frame()
gfs_ot <- filter(ss_over_time, acron == 'gfs')
gfs_ot$Sigma <- factor(gfs_ot$trt, levels = FS_LIST)
gfs_ot <- subset(gfs_ot, select = -c(acron,trt))

## phenotypic fitness sharing data frames
pfs_ot <- data.frame()
pfs_ot <- filter(ss_over_time, acron == 'pfs')
pfs_ot$Sigma <- factor(pfs_ot$trt, levels = FS_LIST)
pfs_ot <- subset(pfs_ot, select = -c(acron,trt))


## nodominated sorting data frames
nds_ot <- data.frame()
nds_ot <- filter(ss_over_time, acron == 'nds')
nds_ot$Sigma <- factor(nds_ot$trt, levels = ND_LIST)
nds_ot <- subset(nds_ot, select = -c(acron,trt))

## novelty search data frames
nov_ot <- data.frame()
nov_ot <- filter(ss_over_time, acron == 'nov' & trt != 0)
nov_ot$K <- factor(nov_ot$trt, levels = NS_LIST)
nov_ot <- subset(nov_ot, select = -c(acron,trt))

## tournament data frames mvc
tor_ot_mvc <- data.frame()
tor_ot_mvc <- filter(ss_over_time_mvc, acron == 'tor' & trt != 1)
tor_ot_mvc$T <- factor(tor_ot_mvc$trt, levels = TS_LIST)
tor_ot_mvc <- subset(tor_ot_mvc, select = -c(acron,trt))

## truncation data frames mvc
tru_ot_mvc <- data.frame()
tru_ot_mvc <- filter(ss_over_time_mvc, acron == 'tru')
tru_ot_mvc$T <- factor(tru_ot_mvc$trt, levels = TR_LIST)
tru_ot_mvc <- subset(tru_ot_mvc, select = -c(acron,trt))

## genotypic fitness sharing data frames mvc
gfs_ot_mvc <- data.frame()
gfs_ot_mvc <- filter(ss_over_time_mvc, acron == 'gfs')
gfs_ot_mvc$Sigma <- factor(gfs_ot_mvc$trt, levels = FS_LIST)
gfs_ot_mvc <- subset(gfs_ot_mvc, select = -c(acron,trt))

## phenotypic fitness sharing data frames mvc
pfs_ot_mvc <- data.frame()
pfs_ot_mvc <- filter(ss_over_time_mvc, acron == 'pfs')
pfs_ot_mvc$Sigma <- factor(pfs_ot_mvc$trt, levels = FS_LIST)
pfs_ot_mvc <- subset(pfs_ot_mvc, select = -c(acron,trt))


## nodominated sorting data frames mvc
nds_ot_mvc <- data.frame()
nds_ot_mvc <- filter(ss_over_time_mvc, acron == 'nds')
nds_ot_mvc$Sigma <- factor(nds_ot_mvc$trt, levels = ND_LIST)
nds_ot_mvc <- subset(nds_ot_mvc, select = -c(acron,trt))

## novelty search data frames mvc
nov_ot_mvc <- data.frame()
nov_ot_mvc <- filter(ss_over_time_mvc, acron == 'nov' & trt != 0)
nov_ot_mvc$K <- factor(nov_ot_mvc$trt, levels = NS_LIST)
nov_ot_mvc <- subset(nov_ot_mvc, select = -c(acron,trt))

# clean up
rm(ss_over_time_mvc)
rm(ss_over_time)


######################################################################################
# go through each scheme and collect best data
ss_best = data.frame()
ss_best_mvc = data.frame()
for(i in 1:8)
{
  # add to cc_best data frame
  if(i == 3 | i == 8)
  {
    next
  }
  print(SCHEME[i])
  for(diagnostic in DIAGNOSTIC)
  {
    dir = paste(DATA_DIR,'NO-MVC/',SCHEME[i],'/best-',diagnostic,'-', tolower(SCHEME[i]), '.csv', sep = "", collapse = NULL)
    dir_mvc = paste(DATA_DIR,'MVC/',SCHEME[i],'/best-',diagnostic,'-', tolower(SCHEME[i]), '.csv', sep = "", collapse = NULL)

    # read csv
    df = read.csv(dir, header = TRUE, stringsAsFactors = FALSE)
    df_mvc = read.csv(dir_mvc, header = TRUE, stringsAsFactors = FALSE)

    # add names/tags
    df$acron = ACRON[i]
    df$diagnostic = diagnostic

    df_mvc$acron = ACRON[i]
    df_mvc$diagnostic = diagnostic

    ss_best = rbind(ss_best, df)
    ss_best_mvc = rbind(ss_best_mvc,df_mvc)
  }
  rm(df);  rm(df_mvc);  rm(dir);   rm(dir_mvc)
}
## [1] "TRUNCATION"
## [1] "TOURNAMENT"
## [1] "FITSHARING_G"
## [1] "FITSHARING_P"
## [1] "NONDOMINATEDSORTING"
## [1] "NOVELTY"
# removed unused data
ss_best = subset(ss_best, select = -c(gen))
ss_best = filter(ss_best, col == 'pop_fit_max' | col == 'pop_uni_obj')

ss_best_mvc = subset(ss_best_mvc, select = -c(gen))
ss_best_mvc = filter(ss_best_mvc, col == 'pop_fit_max' | col == 'pop_uni_obj')

## tournament data frames
tor_best <- data.frame()
tor_best <- filter(ss_best, acron == 'tor' & trt != 1)
tor_best$T <- factor(tor_best$trt, levels = TS_LIST)
tor_best <- subset(tor_best, select = -c(acron,trt))

## truncation data frames
tru_best <- data.frame()
tru_best <- filter(ss_best, acron == 'tru')
tru_best$T <- factor(tru_best$trt, levels = TR_LIST)
tru_best <- subset(tru_best, select = -c(acron,trt))

## genotypic fitness sharing data frames
gfs_best <- data.frame()
gfs_best <- filter(ss_best, acron == 'gfs')
gfs_best$Sigma <- factor(gfs_best$trt, levels = FS_LIST)
gfs_best <- subset(gfs_best, select = -c(acron,trt))

## phenotypic fitness sharing data frames
pfs_best <- data.frame()
pfs_best <- filter(ss_best, acron == 'pfs')
pfs_best$Sigma <- factor(pfs_best$trt, levels = FS_LIST)
pfs_best <- subset(pfs_best, select = -c(acron,trt))


## nodominated sorting data frames
nds_best <- data.frame()
nds_best <- filter(ss_best, acron == 'nds')
nds_best$Sigma <- factor(nds_best$trt, levels = ND_LIST)
nds_best <- subset(nds_best, select = -c(acron,trt))

## novelty search data frames
nov_best <- data.frame()
nov_best <- filter(ss_best, acron == 'nov' & trt != 0)
nov_best$K <- factor(nov_best$trt, levels = NS_LIST)
nov_best <- subset(nov_best, select = -c(acron,trt))

## tournament data frames mvc
tor_best_mvc <- data.frame()
tor_best_mvc <- filter(ss_best_mvc, acron == 'tor' & trt != 1)
tor_best_mvc$T <- factor(tor_best_mvc$trt, levels = TS_LIST)
tor_best_mvc <- subset(tor_best_mvc, select = -c(acron,trt))

## truncation data frames mvc
tru_best_mvc <- data.frame()
tru_best_mvc <- filter(ss_best_mvc, acron == 'tru')
tru_best_mvc$T <- factor(tru_best_mvc$trt, levels = TR_LIST)
tru_best_mvc <- subset(tru_best_mvc, select = -c(acron,trt))

## genotypic fitness sharing data frames mvc
gfs_best_mvc <- data.frame()
gfs_best_mvc <- filter(ss_best_mvc, acron == 'gfs')
gfs_best_mvc$Sigma <- factor(gfs_best_mvc$trt, levels = FS_LIST)
gfs_best_mvc <- subset(gfs_best_mvc, select = -c(acron,trt))

## phenotypic fitness sharing data frames mvc
pfs_best_mvc <- data.frame()
pfs_best_mvc <- filter(ss_best_mvc, acron == 'pfs')
pfs_best_mvc$Sigma <- factor(pfs_best_mvc$trt, levels = FS_LIST)
pfs_best_mvc <- subset(pfs_best_mvc, select = -c(acron,trt))


## nodominated sorting data frames mvc
nds_best_mvc <- data.frame()
nds_best_mvc <- filter(ss_best_mvc, acron == 'nds')
nds_best_mvc$Sigma <- factor(nds_best_mvc$trt, levels = ND_LIST)
nds_best_mvc <- subset(nds_best_mvc, select = -c(acron,trt))

## novelty search data frames mvc
nov_best_mvc <- data.frame()
nov_best_mvc <- filter(ss_best_mvc, acron == 'nov' & trt != 0)
nov_best_mvc$K <- factor(nov_best_mvc$trt, levels = NS_LIST)
nov_best_mvc <- subset(nov_best_mvc, select = -c(acron,trt))

# clean up
rm(ss_best_mvc)
rm(ss_best)


######################################################################################
# go through each scheme and collect satisfactory solution found

#Tournament
exp_dir = paste(DATA_DIR,'NO-MVC/TOURNAMENT/ssf-exploitation_rate-tournament.csv', sep = "", collapse = NULL)
ord_dir = paste(DATA_DIR,'NO-MVC/TOURNAMENT/ssf-ordered_exploitation-tournament.csv', sep = "", collapse = NULL)
# read csv
exp_df = read.csv(exp_dir, header = TRUE, stringsAsFactors = FALSE)
ord_df = read.csv(ord_dir, header = TRUE, stringsAsFactors = FALSE)
# remove data
exp_df = subset(exp_df, select = -c(SEL))
exp_df = filter(exp_df, trt != 1)
ord_df = subset(ord_df, select = -c(SEL))
ord_df = filter(ord_df, trt != 1)
# combine
tor_ssf = rbind(exp_df,ord_df)
tor_ssf$T = tor_ssf$trt
tor_ssf$T = factor(tor_ssf$T)

#Truncation
exp_dir = paste(DATA_DIR,'NO-MVC/TRUNCATION/ssf-exploitation_rate-truncation.csv', sep = "", collapse = NULL)
ord_dir = paste(DATA_DIR,'NO-MVC/TRUNCATION/ssf-ordered_exploitation-truncation.csv', sep = "", collapse = NULL)
# read csv
exp_df = read.csv(exp_dir, header = TRUE, stringsAsFactors = FALSE)
ord_df = read.csv(ord_dir, header = TRUE, stringsAsFactors = FALSE)
# remove data
exp_df = subset(exp_df, select = -c(SEL))
exp_df = filter(exp_df, trt != 1)
ord_df = subset(ord_df, select = -c(SEL))
ord_df = filter(ord_df, trt != 1)
# combine
tru_ssf = rbind(exp_df,ord_df)
tru_ssf$T = tru_ssf$trt
tru_ssf$T = factor(tru_ssf$T)

#final clean up
rm(i,exp_dir,ord_dir,exp_df,ord_df)