13. Identify Sex Linked Markers

1. Background
2. Export data from catalogue
3. Read input data
4. Calculate genotype counts
5. Identify sex linked loci
6. Plot on linkage maps
7. Plot marker density
8. Produce table of detected markers
9. Blast markers to NCBI’s database

1. Background

Now that I have a catalogue of genotypes produced with stacks, and linkage maps produced with lepmap3 I will try and identify sex linked markers in 40 adult individuals of known sex (20 males and 20 females). The catalogue was produced with the two linkage mapping parents and the 40 adult individuals. The adult individuals were sequenced in McKnight et al., 2019 with the same methodology used for the linkage mapping family.

To identify sex-linked markers, I will follow the methods in Jeffries et al. (2018) as well as Lambert et al. (2016).

In the first paper the authors used Rad-seq markers to identify sex-linked loci in european Ranid frogs. They then assess the position of these loci on a linkage map of Rana temporaria also produced with lepmap3, and match the linkage groups of the linkage map to those of Xenopus tropicalis. The second paper uses a similar approach to look at sex linked markers in a North American ranid frog.

This file mostly served to conduct exploratory analyses and see what would work and what wouldn’t. I finalise the sex linked analyes in the following file 3_2_IdentifySexLinkedLoci.Rmd.

2. Export data from catalogue

To identify the sex-linked markers we will be using only the 40 adult individuals, so we need to extract their called genotypes from the catalogue.

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.6     ✓ dplyr   1.0.7
## ✓ tidyr   1.2.0     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.1

## Warning: package 'tidyr' was built under R version 4.0.5

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(janitor)

## 
## Attaching package: 'janitor'

## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

I retained markers with a minor allele frequency of 5%, and present in > 40% of individual overall.

I then retained only monomorphic or bi-allelic alleles. The script I used to do so are detailed below.

#!/bin/bash

#PBS -N populations-array
#PBS -l ncpus=20
#PBS -l mem=5g

##### Change Directory #####
cd ~/ProjectDirectory/1_DataPrep

##### Load stacks #####
module load singularity

## note that if you want to run this yourself you need to adjust the file paths
log_file=0_logfiles/3_5_populations_sex_DM.oe
files_dir=3_calling/1_ustacks/
out_dir=3_calling/2_outputs/3_sex_DM/

##### run populations #####
## note that if you want to run this yourself you need to adjust the file paths
singularity run /sw/containers/stacks-2.55.sif populations -P $files_dir  -M 0_metadata/3_6_populations_sex_DM --out-path $out_dir -R 0.40 --min-maf 0.05 --vcf --genepop --plink &> $log_file

Followed by this vcftools script, called 3_6_vcftools_sex.sh.

#!/bin/bash

#PBS -N vcftools-sex
#PBS -l ncpus=1
#PBS -l mem=10g

##### Change Directory #####
cd ~/2021_LinkageMapSexLinkage/1_stacks

##### Load singularity #####
module load singularity

log_file=0_logfiles/3_6_vcftools_sex_DM.oe
files_dir=./3_calling/2_outputs/3_sex_DM

##### run vcftools #####
singularity run /sw/containers/vcftools-0.1.16.sif vcftools --vcf $files_dir/populations.snps.vcf --out $files_dir/vcftools_biall --min-alleles 1 --max-alleles 2 --012 &> $log_file

Finally, I converted the 3 012 outputs of vcftools into one file using this script.

#obtain FAM argument from first argument provided to script
GENO="$1"
INDV=${GENO}.indv
LOCI=${GENO}.pos

# remove first column
cut -d$'\t' -f2- $GENO > temp1

# add row names - sample ids
paste -d$'\t' $INDV temp1 > temp2

# replace tabs with underscores
sed -e 's/\t/_/g' $LOCI > temp3

# add 'sampleID' on top of locus id
echo "sampleID" > temp4
cat temp4 temp3 > temp5

#transpose locus ids by replacing newline with tab, then replace last tab with newline
awk 'BEGIN { ORS = "\t" } { print }' temp5 | sed 's/\t$/\n/' > temp6

# add locus id to 012 file
cat temp6 temp2 > ${GENO}.all

Which I then submitted with this command from the folder containing both the script and the input files

inputName="13_1_vcftools_sex_DM.012"
./0_AddNamesTo012.sh $inputName

3. Read input data

I now have an input file to identify sex linked loci. Let’s read it into R.

my.df <- read_delim("../3_SexLinked/13_1_vcftools_sex_DM_R40.012", delim = "\t")

## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   .default = col_double(),
##   sampleID = col_character()
## )
## ℹ Use `spec()` for the full column specifications.

Now we want to transpose the dataset to have individuals as columns.

my.df <- as_tibble(t(my.df),
                 rownames = "row_names") %>%
  row_to_names(row_number = 1)

## Warning: The `x` argument of `as_tibble.matrix()` must have unique column names if `.name_repair` is omitted as of tibble 2.0.0.
## Using compatibility `.name_repair`.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.

Remove all whitespaces produced by t.

my.df <- as_tibble(apply(my.df, 2, str_remove_all, " "))

Convert all remaining columns (which include 0,1,2 and -1) to numeric, except for the sampleID column (which actually contains markerIDs)

my.df <- my.df %>%
  mutate_at(vars(-("sampleID")),as.numeric) %>%
  rename(markerID = sampleID)

Save the resulting file

saveRDS(my.df,
        file = "../3_SexLinked/13_2_InputDataframe_sex_R40.RData")

Also as a tab delimited file for the permutation test

write_delim(my.df,
            file = "../3_SexLinked/13_2_InputDataframe_sex_R40.txt",
            delim = "\t")

4. Calculate genotype counts

my.df <- readRDS(file = "../3_SexLinked/13_2_InputDataframe_sex_R40.RData")

First we remove "_a" from the end of sample IDs, as we don’t need the replicate info anymore and it makes life harder. Remember, we have already retained one replicate per sample.

names(my.df) <- gsub(pattern = "_a",
                     replacement = "",
                     x = names(my.df))

names(my.df) <- gsub(pattern = "_b",
                     replacement = "",
                     x = names(my.df))

LS_male <- my.df %>%
  dplyr::select(markerID, ends_with("_M1"))

LS_female <- my.df %>%
  dplyr::select(markerID, ends_with("_F1"))

my.list <- list(LS_male,LS_female)

names(my.list) <- c("LS_male","LS_female")

Now we calculate number of homozygotes, heterozygotes and missing data, as well as proportions of these, and proportion of the 2 alleles.

my.met <- lapply(my.list, function(x) {
  
  markerID <- x %>% dplyr::select(markerID)
  tempx <- x %>% dplyr::select(-markerID)
  
  Zero <- rowSums(tempx==0)
  One <- rowSums(tempx==1) # heterozygote
  Two <- rowSums(tempx==2) # snp homozygote
  Mis <- rowSums(tempx== -1) #missing data
  
  y <- as_tibble(cbind(markerID,Zero,One,Two,Mis))
  
  y <- y %>% mutate(
    HomRef = Zero/(Zero+One+Two),
    Het = One/(Zero+One+Two),
    HomSnp = Two/(Zero+One+Two),
    pMis = Mis/(Zero+One+Two+Mis),
    Ref = ((2*Zero) + One)/(2*(Zero+One+Two)),
    Snp = ((2*Two) + One)/(2*(Zero+One+Two))
  )
  
  return(y)
})

We can now join male and female tables using the markerIDs.

LS <- full_join(my.met$LS_male, my.met$LS_female,
                by="markerID", suffix = c("_M","_F"))

rm(LS_female, LS_male, my.df)

5. Identify sex linked loci

Now we will try and identify sex markers using the three methods from Jeffries:

Method 1 - Y-specific SNP: thus X allele > 0.95 in one sex and > 0.4 and < 0.6 in the other sex
Method 2 - Y-specific SNP: homozygous in all individuals of one sex and heterozygous in at least 0.5 of the other sex.
Method 3 - Sex-limited RADtags: completely absent in one sex and present in at least 0.5 of the other sex.

And two methods from Lambert et al. (2016)

Method 4 - Original lambert: More than 75% heterozygotes in XY/ZW sex, and > 0.8 hom ref, < 0.2 het and < 0.1 hom snp in the opposite sex
Method 5 - Modified lambert: More than 55% heterozygotes in XY/ZW sex, and > 0.8 hom ref, < 0.2 het and 0 hom snp in the opposite sex

5.1. Method 1

XY sex linked markers

LS %>% filter(Ref_F > 0.9 & Ref_M > 0.4 & Ref_M < 0.6 & pMis_M < 0.4 & pMis_F < 0.4)

## # A tibble: 4 × 21
##   markerID Zero_M One_M Two_M Mis_M HomRef_M Het_M HomSnp_M pMis_M Ref_M Snp_M
##   <chr>     <dbl> <dbl> <dbl> <dbl>    <dbl> <dbl>    <dbl>  <dbl> <dbl> <dbl>
## 1 9618_32       5    12     2     0    0.263 0.632   0.105  0      0.579 0.421
## 2 16986_61      4    13     1     1    0.222 0.722   0.0556 0.0526 0.583 0.417
## 3 19156_57      6     5     4     4    0.4   0.333   0.267  0.211  0.567 0.433
## 4 32707_50      4    10     2     3    0.25  0.625   0.125  0.158  0.562 0.438
## # … with 10 more variables: Zero_F <dbl>, One_F <dbl>, Two_F <dbl>,
## #   Mis_F <dbl>, HomRef_F <dbl>, Het_F <dbl>, HomSnp_F <dbl>, pMis_F <dbl>,
## #   Ref_F <dbl>, Snp_F <dbl>

M1_XY <- LS %>%
  filter(Ref_F > 0.9 & Ref_M > 0.4 & Ref_M < 0.6 &
           pMis_M < 0.4 & pMis_F < 0.4) %>%
  add_column(Method="Method1") %>%
  add_column(SexDet="XY")

ZW sex linked markers

LS %>% filter(Ref_M > 0.9 & Ref_F > 0.4 & Ref_F < 0.6
              & pMis_M < 0.4 & pMis_F < 0.4)

## # A tibble: 1 × 21
##   markerID Zero_M One_M Two_M Mis_M HomRef_M Het_M HomSnp_M pMis_M Ref_M  Snp_M
##   <chr>     <dbl> <dbl> <dbl> <dbl>    <dbl> <dbl>    <dbl>  <dbl> <dbl>  <dbl>
## 1 28477_31     13     2     0     4    0.867 0.133        0  0.211 0.933 0.0667
## # … with 10 more variables: Zero_F <dbl>, One_F <dbl>, Two_F <dbl>,
## #   Mis_F <dbl>, HomRef_F <dbl>, Het_F <dbl>, HomSnp_F <dbl>, pMis_F <dbl>,
## #   Ref_F <dbl>, Snp_F <dbl>

M1_ZW <- LS %>% filter(Ref_M > 0.9 & Ref_F > 0.4 & Ref_F < 0.6
                       & pMis_M < 0.4 & pMis_F < 0.4) %>%
  add_column(Method="Method1") %>%
  add_column(SexDet="ZW")

5.2. Method 2

XY sex linked markers

LS %>% filter(HomRef_F==1 & Het_M >= 0.5 & pMis_M < 0.4 & pMis_F < 0.4)

## # A tibble: 10 × 21
##    markerID  Zero_M One_M Two_M Mis_M HomRef_M Het_M HomSnp_M pMis_M Ref_M Snp_M
##    <chr>      <dbl> <dbl> <dbl> <dbl>    <dbl> <dbl>    <dbl>  <dbl> <dbl> <dbl>
##  1 334_26         5    13     1     0    0.263 0.684   0.0526 0      0.605 0.395
##  2 334_36         6    13     0     0    0.316 0.684   0      0      0.658 0.342
##  3 4032_13        7    11     0     1    0.389 0.611   0      0.0526 0.694 0.306
##  4 4032_53        7    11     0     1    0.389 0.611   0      0.0526 0.694 0.306
##  5 16986_12       6    11     1     1    0.333 0.611   0.0556 0.0526 0.639 0.361
##  6 25072_33       6     8     1     4    0.4   0.533   0.0667 0.211  0.667 0.333
##  7 38607_48       8    10     0     1    0.444 0.556   0      0.0526 0.722 0.278
##  8 38866_47       7     7     0     5    0.5   0.5     0      0.263  0.75  0.25 
##  9 69834_64       4     9     0     6    0.308 0.692   0      0.316  0.654 0.346
## 10 207207_38      6     7     0     6    0.462 0.538   0      0.316  0.731 0.269
## # … with 10 more variables: Zero_F <dbl>, One_F <dbl>, Two_F <dbl>,
## #   Mis_F <dbl>, HomRef_F <dbl>, Het_F <dbl>, HomSnp_F <dbl>, pMis_F <dbl>,
## #   Ref_F <dbl>, Snp_F <dbl>

M2_XY <- LS %>% filter(HomRef_F==1 & Het_M >= 0.5
                       & pMis_M < 0.4 & pMis_F < 0.4) %>%
  add_column(Method="Method2") %>%
  add_column(SexDet="XY")

ZW sex linked markers

LS %>% filter(HomRef_M==1 & Het_F >= 0.5
              & pMis_M < 0.4 & pMis_F < 0.4)

## # A tibble: 0 × 21
## # … with 21 variables: markerID <chr>, Zero_M <dbl>, One_M <dbl>, Two_M <dbl>,
## #   Mis_M <dbl>, HomRef_M <dbl>, Het_M <dbl>, HomSnp_M <dbl>, pMis_M <dbl>,
## #   Ref_M <dbl>, Snp_M <dbl>, Zero_F <dbl>, One_F <dbl>, Two_F <dbl>,
## #   Mis_F <dbl>, HomRef_F <dbl>, Het_F <dbl>, HomSnp_F <dbl>, pMis_F <dbl>,
## #   Ref_F <dbl>, Snp_F <dbl>

5.3. Method 3

XY sex linked markers

LS %>% filter(pMis_F == 1 & pMis_M <= 0.5)

## # A tibble: 0 × 21
## # … with 21 variables: markerID <chr>, Zero_M <dbl>, One_M <dbl>, Two_M <dbl>,
## #   Mis_M <dbl>, HomRef_M <dbl>, Het_M <dbl>, HomSnp_M <dbl>, pMis_M <dbl>,
## #   Ref_M <dbl>, Snp_M <dbl>, Zero_F <dbl>, One_F <dbl>, Two_F <dbl>,
## #   Mis_F <dbl>, HomRef_F <dbl>, Het_F <dbl>, HomSnp_F <dbl>, pMis_F <dbl>,
## #   Ref_F <dbl>, Snp_F <dbl>

M3_XY <- LS %>%
  filter(pMis_F == 1 & pMis_M <= 0.5) %>%
  add_column(Method="Method3") %>%
  add_column(SexDet="XY")

ZW sex linked markers

LS %>% filter(pMis_M == 1 & pMis_F <= 0.5)

## # A tibble: 0 × 21
## # … with 21 variables: markerID <chr>, Zero_M <dbl>, One_M <dbl>, Two_M <dbl>,
## #   Mis_M <dbl>, HomRef_M <dbl>, Het_M <dbl>, HomSnp_M <dbl>, pMis_M <dbl>,
## #   Ref_M <dbl>, Snp_M <dbl>, Zero_F <dbl>, One_F <dbl>, Two_F <dbl>,
## #   Mis_F <dbl>, HomRef_F <dbl>, Het_F <dbl>, HomSnp_F <dbl>, pMis_F <dbl>,
## #   Ref_F <dbl>, Snp_F <dbl>

M3_ZW <- LS %>%
  filter(pMis_M == 1 & pMis_F <= 0.5) %>%
  add_column(Method="Method3") %>%
  add_column(SexDet="ZW")

5.4. Method 4

XY linked

LS %>% filter(Het_M > 0.75 &
                HomSnp_F < 0.1 &
                Het_F < 0.2 &
                HomRef_F > 0.8 &
                pMis_M < 0.4 &
                pMis_F < 0.4)

## # A tibble: 0 × 21
## # … with 21 variables: markerID <chr>, Zero_M <dbl>, One_M <dbl>, Two_M <dbl>,
## #   Mis_M <dbl>, HomRef_M <dbl>, Het_M <dbl>, HomSnp_M <dbl>, pMis_M <dbl>,
## #   Ref_M <dbl>, Snp_M <dbl>, Zero_F <dbl>, One_F <dbl>, Two_F <dbl>,
## #   Mis_F <dbl>, HomRef_F <dbl>, Het_F <dbl>, HomSnp_F <dbl>, pMis_F <dbl>,
## #   Ref_F <dbl>, Snp_F <dbl>

ZW linked

LS %>% filter(Het_F > 0.75 &
                HomSnp_M < 0.1 &
                Het_M < 0.2 &
                HomRef_M > 0.8 &
                pMis_M < 0.4 &
                pMis_F < 0.4)

## # A tibble: 0 × 21
## # … with 21 variables: markerID <chr>, Zero_M <dbl>, One_M <dbl>, Two_M <dbl>,
## #   Mis_M <dbl>, HomRef_M <dbl>, Het_M <dbl>, HomSnp_M <dbl>, pMis_M <dbl>,
## #   Ref_M <dbl>, Snp_M <dbl>, Zero_F <dbl>, One_F <dbl>, Two_F <dbl>,
## #   Mis_F <dbl>, HomRef_F <dbl>, Het_F <dbl>, HomSnp_F <dbl>, pMis_F <dbl>,
## #   Ref_F <dbl>, Snp_F <dbl>

5.5. Method 5

XY linked

LS %>% filter(Het_M > 0.55 &
                HomSnp_F == 0 &
                Het_F < 0.2 &
                HomRef_F > 0.8 &
                pMis_M < 0.4 &
                pMis_F < 0.4)

## # A tibble: 25 × 21
##    markerID Zero_M One_M Two_M Mis_M HomRef_M Het_M HomSnp_M pMis_M Ref_M Snp_M
##    <chr>     <dbl> <dbl> <dbl> <dbl>    <dbl> <dbl>    <dbl>  <dbl> <dbl> <dbl>
##  1 334_26        5    13     1     0    0.263 0.684   0.0526 0      0.605 0.395
##  2 334_36        6    13     0     0    0.316 0.684   0      0      0.658 0.342
##  3 650_68        8    10     0     1    0.444 0.556   0      0.0526 0.722 0.278
##  4 4032_13       7    11     0     1    0.389 0.611   0      0.0526 0.694 0.306
##  5 4032_53       7    11     0     1    0.389 0.611   0      0.0526 0.694 0.306
##  6 7194_9        7    11     0     1    0.389 0.611   0      0.0526 0.694 0.306
##  7 9618_32       5    12     2     0    0.263 0.632   0.105  0      0.579 0.421
##  8 15777_54      7    10     0     2    0.412 0.588   0      0.105  0.706 0.294
##  9 16986_12      6    11     1     1    0.333 0.611   0.0556 0.0526 0.639 0.361
## 10 16986_61      4    13     1     1    0.222 0.722   0.0556 0.0526 0.583 0.417
## # … with 15 more rows, and 10 more variables: Zero_F <dbl>, One_F <dbl>,
## #   Two_F <dbl>, Mis_F <dbl>, HomRef_F <dbl>, Het_F <dbl>, HomSnp_F <dbl>,
## #   pMis_F <dbl>, Ref_F <dbl>, Snp_F <dbl>

M5_XY <- LS %>% filter(Het_M > 0.55 &
                HomSnp_F == 0 &
                Het_F < 0.2 &
                HomRef_F > 0.8 &
                  pMis_M < 0.4 &
                  pMis_F < 0.4) %>%
  add_column(Method="Method5") %>%
  add_column(SexDet="XY")

ZW linked

LS %>% filter(Het_F > 0.55 &
                HomSnp_M == 0 &
                Het_M < 0.2 &
                HomRef_M > 0.8 &
                pMis_M < 0.4 &
                pMis_F < 0.4)

## # A tibble: 1 × 21
##   markerID Zero_M One_M Two_M Mis_M HomRef_M Het_M HomSnp_M pMis_M Ref_M  Snp_M
##   <chr>     <dbl> <dbl> <dbl> <dbl>    <dbl> <dbl>    <dbl>  <dbl> <dbl>  <dbl>
## 1 12700_56     14     3     0     2    0.824 0.176        0  0.105 0.912 0.0882
## # … with 10 more variables: Zero_F <dbl>, One_F <dbl>, Two_F <dbl>,
## #   Mis_F <dbl>, HomRef_F <dbl>, Het_F <dbl>, HomSnp_F <dbl>, pMis_F <dbl>,
## #   Ref_F <dbl>, Snp_F <dbl>

M5_ZW <- LS %>% filter(Het_F > 0.55 &
                HomSnp_M == 0 &
                Het_M < 0.2 &
                HomRef_M > 0.8 &
                  pMis_M < 0.4 &
                  pMis_F < 0.4) %>%
  add_column(Method="Method5") %>%
  add_column(SexDet="ZW")

Now we can merge all the sex-linked markers we identified into one dataframe

temp.list <- list(M1_XY, M1_ZW, M2_XY, M3_XY, M3_ZW,
                  M5_XY, M5_ZW)

my.sexLinked <- do.call(rbind, temp.list)

rm(temp.list, M1_XY, M1_ZW, M2_XY, M3_XY,
   M3_ZW, M5_XY, M5_ZW)

We now want to add linkage group data to these markers. So let’s import the linkage map object.

my.maps <- read_delim("../2_LinMap/10_4_linkageMap.txt",
                      delim = "\t")

## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   .default = col_double(),
##   SNPID = col_character(),
##   fasta = col_character(),
##   Segregation = col_character(),
##   tag = col_character()
## )
## ℹ Use `spec()` for the full column specifications.

Add a locus column

my.maps <- my.maps %>%
  rename(markerID = SNPID)

Now I want to add a Locus column to the dataframe of identified sex linked markers

my.sexLinked <- my.sexLinked %>%
  mutate(Locus = str_extract(markerID, "[^_]+"))

Now add LG, male position and female position information from matching markerIDs.

temp <- my.maps %>% dplyr::select(markerID,LG,male,female)

my.sexLinked <- left_join(my.sexLinked,temp,
                          by = "markerID")

rm(temp)

Now, add LG, male position and female position information from matching Locus IDs.

temp <- my.maps %>%
  dplyr::select(Locus,LG,male,female) %>%
  group_by(Locus) %>%
  summarise(LG = mean(LG),
            mean_F = mean(female),
            mean_M = mean(male))

temp <- temp %>%
  mutate(Locus = as.character(Locus))

my.sexLinked <- left_join(my.sexLinked, temp,
                          by = "Locus",
                          suffix = c("_marker",
                                     "_locus"))

rm(temp)

Save this dataset, containing all identified sexLinked markers, their marker ID and Locus ID, the counts and proportions of genotypes across individuals, the method they were identified with, the sex determination system they match (i.e., XY or ZW), and the linkage group they are found on.

write_delim(x = my.sexLinked,
            file = "../3_SexLinked/13_3_SexLinkedMarkers_DM_RD40.txt",
            delim = "\t")

Finally, let’s assess how many XY markers are identified by each method:

my.sexLinked %>%
  filter(Method=="Method1" & SexDet=="XY") %>%
  nrow()

## [1] 4

my.sexLinked %>%
  filter(Method=="Method2" & SexDet=="XY") %>%
  nrow()

## [1] 10

my.sexLinked %>%
  filter(Method=="Method5" & SexDet=="XY") %>%
  nrow()

## [1] 25

So we identified 4 for Method 1, 10 for Method 2 and 25 for Method 5.

How many unique markers although?

length(my.sexLinked %>% pull(markerID) %>% unique)

## [1] 31

And how many unique loci?

length(my.sexLinked %>% pull(Locus) %>% unique)

## [1] 25

Which markers were identified more than once?

as.data.frame(
  table(my.sexLinked %>% pull(markerID))
  ) %>% filter(Freq>1)

##        Var1 Freq
## 1  16986_12    2
## 2  16986_61    2
## 3  32707_50    2
## 4    334_26    2
## 5    334_36    2
## 6  38607_48    2
## 7   4032_13    2
## 8   4032_53    2
## 9  69834_64    2
## 10  9618_32    2

6. Plot on linkage maps

Finally, let’s plot these markers on the male and female linkage maps

rm(list = ls())

Read in list of sex-linked loci (from heterozygosity methods only)

my.sexLinked <- read_delim("../3_SexLinked/13_3_SexLinkedMarkers_DM_2.txt",
                           delim = "\t")

## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   .default = col_double(),
##   markerID = col_character(),
##   Method = col_character(),
##   SexDet = col_character()
## )
## ℹ Use `spec()` for the full column specifications.

Read in linkage map data

my.map <- read_delim("../2_LinMap/10_4_linkageMap.txt",
                     delim = "\t")

## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   .default = col_double(),
##   SNPID = col_character(),
##   fasta = col_character(),
##   Segregation = col_character(),
##   tag = col_character()
## )
## ℹ Use `spec()` for the full column specifications.

Let’s retain only markers on LG6 (there’s only one XY LG3 and one ZW on LG2) and also retain only LG6 from the linkage map

my.map <- my.map %>% filter(LG == 6)

my.sexLinked <- my.sexLinked %>%
  select(markerID,Locus,LG_locus) %>%
  filter(LG_locus == 6)

Of these markers, 5 are present twice as they were identified by more than one method. Let’s retain only unique occurrences

my.sexLinked <- my.sexLinked %>% pull(markerID) %>% unique()

Now, markers 334_36, 18329_57, 18329_64, 26687_36 and 30641_54 are not present in the linkage maps. But the respective loci are. We will thus highlight the markers from those same loci in their place

my.sexLinked <- replace(my.sexLinked,
                        my.sexLinked=="334_36", "334_8")
my.sexLinked <- replace(my.sexLinked,
                        my.sexLinked=="18329_64", "18329_41")
my.sexLinked <- replace(my.sexLinked,
                        my.sexLinked=="26687_36", "26687_53")
my.sexLinked <- replace(my.sexLinked,
                        my.sexLinked=="30641_54", "30641_43")

my.sexLinked <- my.sexLinked[!my.sexLinked=="18329_57"]

Now let’s turn our map file into the desired format for LinkageMapView

tmale <- my.map %>%
  select(LG,male,SNPID) %>%
  mutate(LG = "male") %>%
  rename(group = LG, position = male, locus = SNPID)
tfemale <- my.map %>%
  select(LG,female,SNPID) %>%
  mutate(LG = "female") %>%
  rename(group = LG, position = female, locus = SNPID)
taverage <- my.map %>%
  select(LG,sex.averaged,SNPID) %>%
  mutate(LG = "sex averaged") %>%
  rename(group = LG, position = sex.averaged, locus = SNPID)

my.map <- rbind(tmale,taverage,tfemale)

rm(tmale,tfemale,taverage)

Load LinkageMapView package

library(LinkageMapView)

And plot

lmv.linkage.plot(my.map %>% drop_na(),
                 outfile = "../3_SexLinked/13_4_LS_SexLinked_Mapped_DM.pdf",
                 showonly = my.sexLinked,
                 maxnbrcolsfordups = 1,
                 pdf.height = 8,
                 main = "Linkage group 6")

The resulting figure looks like this:

Linkage group 6 and sex-linked markers

7. Plot marker density

Now let’s produce plots of marker density for LG6 to add to the above figure (done later in photoshop).

library(grid)
library(gridExtra)

## 
## Attaching package: 'gridExtra'

## The following object is masked from 'package:dplyr':
## 
##     combine

library(tidyverse)

p1 <- ggplot(my.map %>% filter(group == "male"), aes(x = position)) + 
  geom_histogram(aes(y = ..density..),
                 colour = "grey", fill = "grey", alpha = 0.2, na.rm = TRUE, binwidth = 2) +
  geom_density(na.rm = TRUE, bw = "ucv") +
  scale_x_continuous(limits = c(0,170)) +
  scale_y_continuous(limits = c(0,0.22)) +
  theme_classic()

p2 <- ggplot(my.map %>% filter(group == "female"), aes(x = position)) + 
  geom_histogram(aes(y = ..density..),
                 colour = "grey", fill = "grey", alpha = 0.2, na.rm = TRUE, binwidth = 2) +
  geom_density(na.rm = TRUE, bw = "ucv") +
  scale_x_continuous(limits = c(0,170)) +
  scale_y_continuous(limits = c(0,0.22)) +
  theme_classic()

p3 <- grid.arrange(p1,p2, ncol=1)

## Warning in bw.ucv(x): minimum occurred at one end of the range

## Warning in bw.ucv(x): minimum occurred at one end of the range

8. Produce table of detected markers

Read in table of sex-linked markers.

my.sexLinked <- read_delim("../3_SexLinked/13_3_SexLinkedMarkers_DM_2.txt",
                           delim = "\t")

## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   .default = col_double(),
##   markerID = col_character(),
##   Method = col_character(),
##   SexDet = col_character()
## )
## ℹ Use `spec()` for the full column specifications.

Produce table and add row for Method 4 that didn’t identify any sex-linked markers.

my.sexLinked2 <- as.data.frame.matrix(
  table(my.sexLinked$Method,
        my.sexLinked$SexDet)) %>%
  rownames_to_column(var = "Method") %>%
  add_row(Method="Method4",XY=0, ZW=0,
          .before = 4)

Save table

write_csv(my.sexLinked2,
            file = "../3_SexLinked/13_7_LS_SexLinked_Table.csv")

9. Blast markers to NCBI’s database

Now let’s extract the consensus sequence for sex-linked markers detected with heterozygosity methods (all but Method 4), so that we can blast those sequences to the NCBI catalogue and assess whether they are coming from any known sex-linked gene.

Read in table of sex-linked markers.

my.sexLinked <- read_delim("../3_SexLinked/13_3_SexLinkedMarkers_DM_2.txt",
                           delim = "\t")

## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   .default = col_double(),
##   markerID = col_character(),
##   Method = col_character(),
##   SexDet = col_character()
## )
## ℹ Use `spec()` for the full column specifications.

Then let’s retain a list of unique marker IDs first.

my.ids <- my.sexLinked %>% pull(markerID) %>% unique()

Let’s extract Locus ID

my.ids <- gsub("_.*","", my.ids) %>% unique()

Next we want to save it as a single column, and as a tab delimited file

write_delim(as.data.frame(my.ids),
            file = "../3_SexLinked/13_9_HC_SexLinked_LociIDs.txt",
              delim = "\t",col_names = FALSE)

Now we extract the corresponding fasta sequences from the stacks catalogue with seqtk.

#!/bin/bash

#load seqtk
module load seqtk/1.3

#run seqtk for each family
seqtk subseq ./1_DataPrep/catalog.fa.gz ./3_SexLinked/13_9_HC_SexLinked_LociIDs.txt > ./3_SexLinked/13_9_HC_SexLinked_Loci.fa

I ran blast from NCBI’s website given the small amount of sequences. I saved the resulting search in the file 3_SexLinked/13_9_HC_SexLinked_Loci_BlastResults.xlsx. No biologically significant matches were found. Matches with an E-value below 1E-05 matched to Bufo gargarizans DAPP1, Bufo bufo IIB-like, Scleropages formosus Chromosome 6 and an uncharacterized locus in Bufo bufo. Additional matches with E-values above 1E-05 included NT5C1A, ATP13A5 and another IIB-like match.

13. Identify Sex Linked Markers

Lorenzo Bertola

Document last run on 2022-10-27

1. Background

2. Export data from catalogue

3. Read input data

4. Calculate genotype counts

5. Identify sex linked loci

5.1. Method 1

5.2. Method 2

5.3. Method 3

5.4. Method 4

5.5. Method 5

6. Plot on linkage maps

7. Plot marker density

8. Produce table of detected markers

9. Blast markers to NCBI’s database