LDlink is an interactive and powerful suite of web-based tools for querying germline variants in human population groups of interest to generate interactive tables and plots. All population genotype data originates from Phase 3 (Version 5) of the 1000 Genomes Project and variant RS numbers are indexed based on dbSNP 151.
LDlinkR is an R package developed to query and download results (internet access required) generated by LDlink web-based applications from the R console. LDlinkR accelerates genomic research by providing efficient and user-friendly functions to programmatically interrogate pairwise linkage disequilibrium from large lists of genetic variants.
Please see the online LDlink documentation for more information about understanding linkage disequilibrium (LD) and additional details about how LDlink calculates patterns of LD across a variety of ancestral human populations.
install.packages("LDlinkR")
install.packages("remotes")
remotes::install_github("CBIIT/LDlinkR")
LDlinkR depends on the following packages:
Following installation, attach the LDlinkR package with:
library(LDlinkR)
In order to access the LDlink API via LDlinkR, we use a personal access token. This is a common convention followed by many APIs and emulates the more familiar HTTPS username/password or SSH keys.
You will need to:
LDhap(snps = c("rs3", "rs4", "rs148890987"),
pop = "YRI",
token = "YourTokenHere123")
Optional:
However, the best security practice is to store your personal access token as an environment variable, where LDlinkR can find it and use it on your behalf but where it will not be accidentally shared with the public. Note: Modifying R startup files (such as .Renviron) is for advanced R users only. Modifying these files incorrectly could cause problems, so please proceed cautiously. Step-by-step instructions follow:
After retrieving your personal access token from your email, put your token in your .Renviron file. .Renviron is a hidden file that lives in your home directory. The easiest way to both find and edit the .Renviron file is with a function from the usethis package. From the R console, do:
usethis::edit_r_environ()
Your .Renviron file should open in your editor. Add a line that looks like this:
LDLINK_TOKEN=YourTokenHere123
Important: ensure you put a line break at the end by pressing the Enter/Return key.
Save and close the .Renviron file. Restart R, as environment variables are only loaded from .Renviron at the start of a new R session. Now, check to see that your token is available by entering:
Sys.getenv("LDLINK_TOKEN")
## [1] "YourTokenHere123"
You should see your personal access token print to the screen, as shown above. Now, LDlinkR function calls that pass Sys.getenv("LDLINK_TOKEN") as the token argument will use your personal access token in a private and secure way. This method will be used in the extended examples that follow.
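As a minimal sketch of this pattern, the helper below reads the token from the environment and stops with a clear message when it is missing, so a forgotten .Renviron entry is caught before any query is sent. The name get_ldlink_token is hypothetical and is not part of LDlinkR; only base R's Sys.getenv() is used.

```r
# Hypothetical helper (not an LDlinkR function): return the token stored in
# an environment variable, or stop early if the variable is unset or empty.
get_ldlink_token <- function(var = "LDLINK_TOKEN") {
  token <- Sys.getenv(var, unset = "")
  if (!nzchar(token)) {
    stop("Environment variable ", var,
         " is not set; add it to .Renviron and restart R.",
         call. = FALSE)
  }
  token
}

# Intended use (requires a valid token and internet access):
# LDhap(snps = c("rs3", "rs4"), pop = "YRI", token = get_ldlink_token())
```

Failing early in this way gives a more direct error than passing an empty string to an LDlinkR function.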
LDexpress(snps,
pop = "CEU",
tissue = "ALL",
r2d = "r2",
r2d_threshold = 0.1,
p_threshold = 0.1,
win_size = 500000,
token = NULL,
file = FALSE
)
Search if a list of genomic variants (or variants in LD with those variants) is associated with gene expression in tissues of interest. Quantitative trait loci data is downloaded from the GTEx Portal.
snps, between 1 - 10 variants, using an rsID or chromosome coordinate (e.g. “chr7:24966446”)
pop, a 1000 Genomes Project population, (e.g. YRI or CEU), multiple allowed, default = “CEU”. See the list_pop function in the utilities section below for available human populations and their abbreviation codes.
tissue, select from 1 - 54 non-diseased tissue sites collected for the GTEx project, multiple allowed. Acceptable user input is taken either from “tissue_name_ldexpress” or “tissue_abbrev_ldexpress” (tissue abbreviation) code listed in available GTEx tissue sites using the list_gtex_tissues function (e.g. “ADI_SUB” for Adipose Subcutaneous). Input is case sensitive. Default = “ALL” for all available tissue types.
r2d, select either “r2” for LD R2 (R-squared) or “d” for LD D’, default = “r2”.
r2d_threshold, R-squared or D’ (depends on ‘r2d’ user input parameter) threshold for LD filtering. Any variants within -/+ of the specified genomic window and R2 or D’ less than the threshold will be removed. Value needs to be in the range 0 to 1. Default value is 0.1.
p_threshold, define the eQTL significance threshold used for returning query results. Default value is 0.1 which returns all GTEx eQTL associations with P-value less than 0.1.
win_size, set genomic base pair window size for LD calculation. Specify a value greater than or equal to zero and less than or equal to 1000000 basepairs (bp). Default value is -/+ 500000 bp.
token, LDlink provided user access token is required, default = NULL, register for a free token on the LDlink web site.
file, optional character string naming a path and file for saving results. If file = FALSE, no file will be generated, default = FALSE
my_output <- LDexpress(snps = "rs4",
pop = c("YRI", "CEU"),
tissue = c("ADI_SUB", "ADI_VIS_OME"),
win_size = "500000",
token = Sys.getenv("LDLINK_TOKEN")
)
In the above example, output is a data frame stored in the variable my_output. See below.
head(my_output)
## Query RS_ID Position R2 D'
## 1 rs4 rs10637519 chr13:32430479 0.174249321651574 0.965976331360947
## 2 rs4 rs10637519 chr13:32430479 0.174249321651574 0.965976331360947
## 3 rs4 rs473641 chr13:32431244 0.174249321651574 0.965976331360947
## 4 rs4 rs473641 chr13:32431244 0.174249321651574 0.965976331360947
## 5 rs4 rs671746 chr13:32431263 0.174249321651574 0.965976331360947
## 6 rs4 rs671746 chr13:32431263 0.174249321651574 0.965976331360947
## Gene_Symbol Gencode_ID Tissue
## 1 RP1-257C22.2 ENSG00000279314.1 Adipose - Subcutaneous
## 2 RP1-257C22.2 ENSG00000279314.1 Adipose - Visceral (Omentum)
## 3 RP1-257C22.2 ENSG00000279314.1 Adipose - Subcutaneous
## 4 RP1-257C22.2 ENSG00000279314.1 Adipose - Visceral (Omentum)
## 5 RP1-257C22.2 ENSG00000279314.1 Adipose - Subcutaneous
## 6 RP1-257C22.2 ENSG00000279314.1 Adipose - Visceral (Omentum)
## Non_effect_Allele_Freq Effect_Allele_Freq Effect_Size P_value
## 1 G=0.565 GTC=0.435 0.225642 2.2578e-07
## 2 G=0.565 GTC=0.435 0.207161 1.0227e-05
## 3 A=0.565 G=0.435 0.225642 2.2578e-07
## 4 A=0.565 G=0.435 0.207161 1.0227e-05
## 5 C=0.565 T=0.435 0.226558 1.93289e-07
## 6 C=0.565 T=0.435 0.207161 1.0227e-05
my_output <- LDexpress(snps = c("rs345", "rs456"),
pop = "YRI",
tissue = "Adipose_Visceral_Omentum",
token = Sys.getenv("LDLINK_TOKEN")
)
In the above example, output is a data frame stored in the variable my_output. See below.
head(my_output)
## Query RS_ID Position R2 D' Gene_Symbol
## 1 rs345 rs12877069 chr13:32430415 0.222088835534214 1 RP1-257C22.2
## 2 rs345 rs10637519 chr13:32430479 0.10989010989011 1 RP1-257C22.2
## 3 rs345 rs473641 chr13:32431244 0.10989010989011 1 RP1-257C22.2
## 4 rs345 rs671746 chr13:32431263 0.10989010989011 1 RP1-257C22.2
## 5 rs345 rs9315146 chr13:32432193 0.222088835534214 1 RP1-257C22.2
## 6 rs345 rs657190 chr13:32432232 0.107871720116618 1 RP1-257C22.2
## Gencode_ID Tissue Non_effect_Allele_Freq
## 1 ENSG00000279314.1 Adipose - Visceral (Omentum) C=0.685
## 2 ENSG00000279314.1 Adipose - Visceral (Omentum) G=0.519
## 3 ENSG00000279314.1 Adipose - Visceral (Omentum) A=0.519
## 4 ENSG00000279314.1 Adipose - Visceral (Omentum) C=0.519
## 5 ENSG00000279314.1 Adipose - Visceral (Omentum) A=0.685
## 6 ENSG00000279314.1 Adipose - Visceral (Omentum) T=0.514
## Effect_Allele_Freq Effect_Size P_value
## 1 T=0.315 0.355769 6.11598e-05
## 2 GTC=0.481 0.207161 1.0227e-05
## 3 G=0.481 0.207161 1.0227e-05
## 4 T=0.481 0.207161 1.0227e-05
## 5 G=0.315 0.276884 2.20517e-08
## 6 C=0.486 0.207916 9.95318e-06
tail(my_output)
## Query RS_ID Position R2 D'
## 63 rs345 rs5802624 chr13:32499700 0.154247745609872 1
## 64 rs345 rs2521196 chr13:32514328 0.414333042720139 0.824864864864865
## 65 rs345 rs367507 chr13:32516037 0.414333042720139 0.824864864864865
## 66 rs345 rs203417 chr13:32525482 0.490379008746356 0.828571428571429
## 67 rs345 rs916756 chr13:32527529 0.14623556197823 0.785430463576159
## 68 rs456 rs2529051 chr7:24594306 0.129422301836095 0.586206896551724
## Gene_Symbol Gencode_ID Tissue
## 63 RP1-257C22.2 ENSG00000279314.1 Adipose - Visceral (Omentum)
## 64 RP1-257C22.2 ENSG00000279314.1 Adipose - Visceral (Omentum)
## 65 RP1-257C22.2 ENSG00000279314.1 Adipose - Visceral (Omentum)
## 66 RP1-257C22.2 ENSG00000279314.1 Adipose - Visceral (Omentum)
## 67 RP1-257C22.2 ENSG00000279314.1 Adipose - Visceral (Omentum)
## 68 DFNA5 ENSG00000105928.13 Adipose - Visceral (Omentum)
## Non_effect_Allele_Freq Effect_Allele_Freq Effect_Size P_value
## 63 C=0.398 CT=0.602 -0.252523 2.7687e-07
## 64 G=0.144 C=0.856 -0.418973 3.26939e-05
## 65 C=0.144 A=0.856 -0.418973 3.26939e-05
## 66 T=0.125 C=0.875 -0.413808 4.17407e-05
## 67 A=0.301 G=0.699 -0.460883 5.90627e-16
## 68 A=0.083 G=0.917 -0.134336 1.09618e-05
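The printed R2 and P_value columns look like character strings, so a typical next step is to convert them to numbers before filtering. The sketch below does this on a small hand-built data frame whose values are copied from the output above; the column types and the 5e-5 cutoff are assumptions made for illustration, and no API call is made.

```r
# Toy data frame mimicking three rows and three columns of the LDexpress
# output shown above. Storing R2 and P_value as character is an assumption.
my_output <- data.frame(
  RS_ID   = c("rs12877069", "rs10637519", "rs9315146"),
  R2      = c("0.222088835534214", "0.10989010989011", "0.222088835534214"),
  P_value = c("6.11598e-05", "1.0227e-05", "2.20517e-08"),
  stringsAsFactors = FALSE
)

# Convert the statistics to numeric so they can be compared and sorted
my_output$R2      <- as.numeric(my_output$R2)
my_output$P_value <- as.numeric(my_output$P_value)

# Keep associations below an illustrative P-value cutoff, strongest first
strong <- my_output[my_output$P_value < 5e-5, ]
strong <- strong[order(strong$P_value), ]
strong
```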
LDhap(snps, pop="CEU", token=NULL, file = FALSE)
Calculates population specific haplotype frequencies of all haplotypes observed for a list of query variants. Input is a list of variant RS numbers (concatenated list) and a population group.
snps, a list of between 1 - 30 variants, using an rsID or chromosome coordinate (e.g. “chr7:24966446”)
pop, a 1000 Genomes Project population, uses three letter population code, (e.g. YRI or CEU), multiple allowed, default = “CEU”
token, LDlink provided user access token is required, default = NULL, register for a free token on the LDlink web site.
file, optional character string naming a path and file for saving results. If file = FALSE, no file will be generated, default = FALSE
LDhap(snps = c("rs3", "rs4", "rs148890987"),
pop = "CEU",
token = Sys.getenv("LDLINK_TOKEN")
)
## rs148890987 rs3 rs4 Count Frequency
## 1 C C A 176 0.8889
## 2 T T G 11 0.0556
## 3 T C A 7 0.0354
## 4 C T G 4 0.0202
LDhap(snps = c("rs3", "rs4", "rs148890987"),
pop = c("YRI", "CEU"),
token = Sys.getenv("LDLINK_TOKEN")
)
## rs148890987 rs3 rs4 Count Frequency
## 1 C C A 355 0.8575
## 2 C T G 41 0.099
## 3 T T G 11 0.0266
## 4 T C A 7 0.0169
Output is a table of alleles, haplotype count and haplotype frequencies.
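To make the relationship between the two numeric columns concrete, the sketch below recomputes Frequency as Count divided by the total number of observed haplotypes, using the CEU counts printed in the first example. The data frame is typed in by hand for illustration; it is not fetched from LDlink.

```r
# Counts and frequencies copied from the CEU example output above
hap <- data.frame(
  Count     = c(176, 11, 7, 4),
  Frequency = c(0.8889, 0.0556, 0.0354, 0.0202)
)

# Each frequency is the haplotype count over the total haplotype count
total <- sum(hap$Count)
recomputed <- round(hap$Count / total, 4)

# TRUE when the recomputed values agree with the printed ones
all.equal(recomputed, hap$Frequency)
```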
LDmatrix(snps, pop = "CEU", r2d = "r2", token = NULL, file = FALSE)
Generates a data frame of pairwise linkage disequilibrium statistics. Input is a list of between 2 and 1,000 variants. Desired output can be based on estimates of R2 or D’.
snps, list of between 2 - 1,000 variants, using an rsID or chromosome coordinate (GRCh37/hg19) (e.g. “chr7:24966446”)
pop, a 1000 Genomes Project population, uses three letter population code, (e.g. YRI or CEU), multiple allowed, default = “CEU”
r2d, use either “r2” for pairwise R2 statistics or “d” for pairwise D’ statistics
token, LDlink provided user access token is required, default = NULL, register for a free token on the LDlink web site.
file, optional character string naming a path and file for saving results. If file = FALSE, no file will be generated, default = FALSE
LDmatrix(snps = c("rs496202", "rs11147477", "rs201578600"),
pop = "YRI", r2d = "r2",
token = Sys.getenv("LDLINK_TOKEN")
)
## RS_number rs496202 rs201578600 rs11147477
## 1 rs496202 1.000 0.660 0.504
## 2 rs201578600 0.660 1.000 0.786
## 3 rs11147477 0.504 0.786 1.000
LDmatrix(snps = c("chr13:32444611", "rs11147477", "rs201578600"),
pop = c("YRI", "CEU"), r2d = "d",
token = Sys.getenv("LDLINK_TOKEN")
)
## RS_number rs496202 rs201578600 rs11147477
## 1 rs496202 1.000 0.973 0.738
## 2 rs201578600 0.973 1.000 0.971
## 3 rs11147477 0.738 0.971 1.000
my_variants <- read.table("variant_list.txt")
my_variants
## V1
## 1 rs456
## 2 rs114
## 3 rs127
## 4 rs7805287
## 5 rs60676332
## 6 rs10239961
Then, call LDmatrix with:
LDmatrix(snps = my_variants[,1],
pop = c("YRI", "CEU"), r2d = "d",
token = Sys.getenv("LDLINK_TOKEN")
)
## RS_number rs60676332 rs7805287 rs127 rs456 rs10239961 rs114
## 1 rs60676332 1.000 0.094 0.180 0.151 0.363 0.148
## 2 rs7805287 0.094 1.000 0.818 0.789 0.464 0.710
## 3 rs127 0.180 0.818 1.000 0.929 0.912 0.886
## 4 rs456 0.151 0.789 0.929 1.000 1.000 0.963
## 5 rs10239961 0.363 0.464 0.912 1.000 1.000 0.459
## 6 rs114 0.148 0.710 0.886 0.963 0.459 1.000
Output is a table with rows and columns equal to the number of query variants and pairwise linkage disequilibrium statistics.
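Because the result is a data frame with the variant names in its first column, it can be handy to turn it into a named numeric matrix and list each variant pair once. The sketch below does this for the three-variant R2 example above; the data frame is typed in by hand and the 0.6 cutoff is an arbitrary illustration.

```r
# R2 values copied from the first LDmatrix example above
ld_df <- data.frame(
  RS_number   = c("rs496202", "rs201578600", "rs11147477"),
  rs496202    = c(1.000, 0.660, 0.504),
  rs201578600 = c(0.660, 1.000, 0.786),
  rs11147477  = c(0.504, 0.786, 1.000),
  stringsAsFactors = FALSE
)

# Drop the name column and use it as row names of a numeric matrix
ld_mat <- as.matrix(ld_df[, -1])
rownames(ld_mat) <- ld_df$RS_number

# The matrix is symmetric, so look only at the upper triangle and keep
# pairs whose R2 meets the illustrative cutoff
idx <- which(upper.tri(ld_mat) & ld_mat >= 0.6, arr.ind = TRUE)
ld_pairs <- data.frame(
  var1 = rownames(ld_mat)[idx[, "row"]],
  var2 = colnames(ld_mat)[idx[, "col"]],
  r2   = ld_mat[idx],
  stringsAsFactors = FALSE
)
ld_pairs
```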
LDpair(var1, var2, pop = "CEU", token = NULL, output = "table", file = FALSE)
Investigates potentially correlated alleles for a pair of variants. Input is two query variants and one or more 1000 Genomes Project reference populations of interest.
var1, the first RS number (rsID) or genomic coordinate (GRCh37/hg19) (e.g. “chr7:24966446”), must match a bi-allelic variant
var2, the second RS number or genomic coordinate, as above, must match a bi-allelic variant
pop, a 1000 Genomes Project reference population, uses three letter population code, (e.g. YRI or CEU), multiple allowed, default = “CEU”
token, LDlink provided user access token is required, default = NULL, register for a free token on the LDlink web site.
output, two output format options are available, “text”, which displays a two-by-two matrix displaying haplotype counts and allele frequencies along with other statistics, or “table”, which displays the same data in rows and columns, default = “table”
file, optional character string naming a path and file for saving results. If file = FALSE, no file will be generated, default = FALSE
output argument set to “text”
LDpair(var1 = "rs496202",
var2 = "rs11147477",
pop = "YRI",
token = Sys.getenv("LDLINK_TOKEN"),
output = "text"
)
## Query SNPs:
## rs496202 (chr13:32444611)
## rs11147477 (chr13:32509120)
##
## YRI Haplotypes:
## rs11147477
## C T
## -----------------
## C | 11 | 26 | 37 (0.171)
## rs496202 -----------------
## G | 173 | 6 | 179 (0.829)
## -----------------
## 184 32 216
## (0.852) (0.148)
##
## G_C: 173 (0.801)
## C_T: 26 (0.12)
## C_C: 11 (0.051)
## G_T: 6 (0.028)
##
## D': 0.7737
## R2: 0.5037
## Chi-sq: 108.8005
## p-value: <0.0001
##
## rs496202(C) allele is correlated with rs11147477(T) allele
## rs496202(G) allele is correlated with rs11147477(C) allele
No output argument option specified, using default “table”
LDpair(var1 = "rs496202",
var2 = "rs11147477",
pop = "YRI",
token = Sys.getenv("LDLINK_TOKEN")
)
## var1 var2 pops var1_pos var2_pos var1_a1 var1_a2
## 1 rs496202 rs11147477 YRI chr13:32444611 chr13:32509120 C G
## var1_a1_freq var1_a2_freq var2_a1 var2_a2 var2_a1_freq var2_a2_freq
## 1 0.171 0.829 C T 0.852 0.148
## d_prime r2 chisq p_val
## 1 0.7737 0.5037 108.8005 1e-04
## corr_alleles
## 1 rs496202(C)-rs11147477(T), rs496202(G)-rs11147477(C)
With the output argument set to “text”, the result is a two-by-two contingency table displaying haplotype counts and allele frequencies of the two query variants. Also displayed are calculated metrics of linkage disequilibrium including D prime (D’), R square (R2), and goodness-of-fit (Chi-square and p-value). The goodness-of-fit test checks for deviations from the haplotype frequencies expected based on allele frequencies. Correlated alleles are reported if linkage disequilibrium is present (R2 > 0.1). If the variants are in linkage equilibrium, no alleles are reported.
With the output argument set to “table”, the data from the two-by-two contingency table are returned as a data frame.
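The reported statistics can be checked by hand from the four haplotype counts. The sketch below applies the standard textbook definitions of D, D’ and R2 and the relation Chi-square = N x R2 to the YRI counts printed in the “text” example; it is an illustration of those definitions and is not taken from LDlink's own code.

```r
# Haplotype counts copied from the "text" output above
# (first letter: rs496202 allele, second letter: rs11147477 allele)
n_CC <- 11; n_CT <- 26; n_GC <- 173; n_GT <- 6
n <- n_CC + n_CT + n_GC + n_GT        # total number of haplotypes

p_A  <- (n_CC + n_CT) / n             # frequency of rs496202 C
p_B  <- (n_CC + n_GC) / n             # frequency of rs11147477 C
p_AB <- n_CC / n                      # frequency of the C_C haplotype

# D is the excess of the observed haplotype frequency over independence
D <- p_AB - p_A * p_B

# D' rescales |D| by its maximum possible value given the allele frequencies
D_max <- if (D < 0) {
  min(p_A * p_B, (1 - p_A) * (1 - p_B))
} else {
  min(p_A * (1 - p_B), (1 - p_A) * p_B)
}
d_prime <- abs(D) / D_max

# R2 is the squared correlation between the two loci
r2 <- D^2 / (p_A * (1 - p_A) * p_B * (1 - p_B))

# Chi-square for a two-by-two table equals N times R2
chisq <- n * r2

round(c(d_prime = d_prime, r2 = r2, chisq = chisq), 4)
```

Rounded to four decimals, these values agree with the D’, R2 and Chi-sq lines in the output above. D is negative here, which matches the report that the C allele of rs496202 is correlated with the T allele of rs11147477 rather than with C.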
LDpop(var1, var2, pop = "CEU", r2d = "r2", token = NULL, file = FALSE)
Investigates allele frequencies and linkage disequilibrium patterns across 1000G populations.
var1, the first RS number (rsID) or genomic coordinate (GRCh37/hg19) (e.g. “chr7:24966446”), must match a bi-allelic variant
var2, the second RS number or genomic coordinate, as above, must match a bi-allelic variant
pop, a 1000 Genomes Project reference population, uses three letter population code, (e.g. YRI or CEU), multiple allowed, default = “CEU”
r2d, use “r2” if desired output is based on estimated R2 or “d” if D’
token, LDlink provided user access token is required, default = NULL, register for a free token on the LDlink web site.
file, optional character string naming a path and file for saving results. If file = FALSE, no file will be generated, default = FALSE
LDpop(var1 = "rs496202",
var2 = "rs11147477",
pop = "YRI",
r2d = "r2",
token = Sys.getenv("LDLINK_TOKEN")
)
## Population N rs496202_Allele_Freq rs11147477_Allele_Freq R2 D'
## 1 YRI 108 G: 82.87%, C: 17.13% C: 85.19%, T: 14.81% 0.5037 0.7737
LDproxy(snp, pop = "CEU", r2d = "r2", token = NULL, file = FALSE)
Explore proxy and putative functional variants for a single query variant. Input is a single RS number and a population group. Depending on the number of query populations, this function could take some time to run.
snp, an RS number (rsID) or chromosome coordinate (GRCh37/hg19) (e.g. “chr7:24966446”), one per query, RS number must match a bi-allelic variant
pop, a 1000 Genomes Project reference population, uses three letter population code, (e.g. YRI or CEU), multiple allowed, default = “CEU”
r2d, use “r2” if desired output is based on estimated R2 or “d” if D’
token, LDlink provided user access token is required, default = NULL, register for a free token on the LDlink web site.
file, optional character string naming a path and file for saving results. If file = FALSE, no file will be generated, default = FALSE
my_proxies <- LDproxy(snp = "rs456",
pop = "YRI",
r2d = "r2",
token = Sys.getenv("LDLINK_TOKEN")
)
Output is a data frame stored in the variable my_proxies with 2455 rows and 10 columns of data.
head(my_proxies)
## RS_Number Coord Alleles MAF Distance Dprime R2
## 1 rs456 chr7:24962419 (G/C) 0.1944 0 1 1.0000
## 2 rs457 chr7:24962426 (T/C) 0.1944 7 1 1.0000
## 3 rs28475742 chr7:24964633 (G/T) 0.1944 2214 1 1.0000
## 4 rs123 chr7:24966446 (C/A) 0.1944 4027 1 1.0000
## 5 rs125 chr7:24959703 (C/T) 0.2037 -2716 1 0.9436
## 6 rs128 chr7:24958977 (C/T) 0.2037 -3442 1 0.9436
## Correlated_Alleles RegulomeDB Function
## 1 G=G,C=C 5 <NA>
## 2 G=T,C=C 5 <NA>
## 3 G=G,C=T 4 <NA>
## 4 G=C,C=A 1f <NA>
## 5 G=C,C=T 5 <NA>
## 6 G=C,C=T 7 <NA>
The output includes information on all variants within -/+ 500 Kb of the query variant that have a pairwise R2 value greater than 0.01.
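Since the full result can run to thousands of rows, a common follow-up is to keep only strong proxies near the query variant. The sketch below filters a hand-typed copy of the six rows shown above; the R2 cutoff of 0.95 is arbitrary, and treating the row with Distance equal to 0 as the query variant itself is an assumption based on the printed output.

```r
# A few columns of the head(my_proxies) output above, typed in by hand
my_proxies <- data.frame(
  RS_Number = c("rs456", "rs457", "rs28475742", "rs123", "rs125", "rs128"),
  Distance  = c(0, 7, 2214, 4027, -2716, -3442),
  R2        = c(1.0000, 1.0000, 1.0000, 1.0000, 0.9436, 0.9436),
  stringsAsFactors = FALSE
)

# Drop the query variant (distance 0) and keep proxies above the cutoff
is_query <- my_proxies$Distance == 0
strong_proxies <- my_proxies[!is_query & my_proxies$R2 >= 0.95, ]

# Sort by absolute distance so the nearest proxies come first
strong_proxies <- strong_proxies[order(abs(strong_proxies$Distance)), ]
strong_proxies
```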
LDproxy_batch(snp, pop = "CEU", r2d = "r2", token = NULL, append = FALSE)
Query LDproxy using a list of query variants. LDproxy_batch will make sequential queries, one query per variant. Concurrent queries are not permitted by the LDlink API. Output is saved as text file(s) to the current working directory. Depending on the number of query variants and reference populations selected, this function could take some time to run.
snp, a character string or data frame listing RS numbers (rsID) or chromosome coordinates (GRCh37/hg19) (e.g. “chr7:24966446”), one per line.
pop, a 1000 Genomes Project reference population, uses three letter population code, (e.g. YRI or CEU), multiple allowed, default = “CEU”
r2d, use “r2” if desired output is based on estimated R2 or “d” if D’
token, LDlink provided user access token is required, default = NULL, register for a free token on the LDlink web site.
append, a logical, if TRUE, output for each query variant is appended to a single text file and saved to the current working directory. If FALSE, output for each query variant is saved in its own text file with the query variant as the filename. Default value is FALSE.
The examples below use the default values for pop and r2d.
The list of query variants passed to LDproxy_batch can be stored as a character string.
LDproxy_batch(snp = c("rs456", "rs114", "rs127"),
token = Sys.getenv("LDLINK_TOKEN")
)
Or, a longer list of variants can be read into a data frame from a text file and passed into LDproxy_batch. The list should be in a simple text file, one query variant per line. For example:
my_variants <- read.table("variant_list.txt")
my_variants
## V1
## 1 rs456
## 2 rs114
## 3 rs127
## 4 rs7805287
## 5 rs60676332
## 6 rs10239961
Then, call LDproxy_batch with:
LDproxy_batch(snp = my_variants,
token = Sys.getenv("LDLINK_TOKEN")
)
Output not displayed. All output from LDproxy_batch is saved to a text file(s) in the current working directory.
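Because a batch run makes one query per variant, it can be worth checking the list before starting. The sketch below writes a small variant file to a temporary location, reads it back with read.table, and drops entries that do not look like an rsID or a chr:position coordinate. The helper is_valid_variant and its regular expressions are hypothetical; the formats LDlink actually accepts may be broader or stricter.

```r
# Write an illustrative variant list, one entry per line, to a temp file
variant_file <- tempfile(fileext = ".txt")
writeLines(c("rs456", "rs114", "chr7:24966446", "not_a_variant"), variant_file)

# Read it back the same way as in the example above
my_variants <- read.table(variant_file, stringsAsFactors = FALSE)

# Hypothetical check: rsID ("rs" + digits) or "chr<1-2 digits|X|Y>:<position>"
is_valid_variant <- function(x) {
  grepl("^rs[0-9]+$", x) | grepl("^chr([0-9]{1,2}|X|Y):[0-9]+$", x)
}

ok <- is_valid_variant(my_variants[, 1])
clean_variants <- my_variants[ok, , drop = FALSE]
clean_variants

# Intended use (requires a valid token and internet access):
# LDproxy_batch(snp = clean_variants, token = Sys.getenv("LDLINK_TOKEN"))
```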
LDtrait(snps,
pop = "CEU",
r2d = "r2",
r2d_threshold = 0.1,
win_size = 500000,
token = NULL,
file = FALSE
)
Search if a list of variants (or variants in LD with those variants) have been previously associated with a trait or disease. Trait and disease data is updated nightly from the GWAS Catalog.
snps, between 1 - 50 variants, using an rsID or chromosome coordinate (GRCh37) (e.g. “chr7:24966446”). All input variants must match a bi-allelic variant.
pop, a 1000 Genomes Project population, (e.g. YRI or CEU), multiple allowed, default = “CEU”. See the list_pop function in the utilities section below for available human populations and their abbreviation codes.
r2d, use “r2” to filter desired output from a threshold based on estimated LD R2 (R squared) or “d” for LD D’ (D-prime), default = “r2”.
r2d_threshold, R-squared or D’ (depends on ‘r2d’ user input parameter) threshold for LD filtering. Any variants within -/+ of the specified genomic window and R2 or D’ less than the threshold will be removed. Value needs to be in the range 0 to 1. Default value is 0.1.
win_size, set genomic base pair window size for LD calculation. Specify a value greater than or equal to zero and less than or equal to 1000000 basepairs (bp). Default value is -/+ 500000 bp.
token, LDlink provided user access token is required, default = NULL, register for a free token on the LDlink web site.
file, optional character string naming a path and file for saving results. If file = FALSE, no file will be generated, default = FALSE
LDtrait(snps = "rs456",
pop = c("YRI", "CEU"),
token = Sys.getenv("LDLINK_TOKEN")
)
The following is the output from the above function call.
## Query GWAS_Trait RS_Number Position_GRCh37
## 1 rs456 Highest math class taken (MTAG) rs10248878 chr7:24908737
## 2 rs456 Educational attainment (MTAG) rs457 chr7:24962426
## Alleles R2 D' Risk_Allele
## 1 C=0.174, T=0.826 0.412346020761246 0.920415224913495 0.5967
## 2 C=0.698, T=0.302 1 1 0.4495
## Effect_Size_95_CI Beta_or_OR P_value
## 1 0.0104 0.0071-0.0137 7e-10
## 2 0.0072 0.0047-0.0097 4e-08
LDtrait(snps = c("rs114", "rs496202", "rs345"),
pop = c("YRI", "CHB", "CEU"),
win_size = "750000",
token = Sys.getenv("LDLINK_TOKEN")
)
Output of the above function is below.
## Query GWAS_Trait RS_Number
## 1 rs114 Highest math class taken (MTAG) rs10248878
## 2 rs114 Educational attainment (MTAG) rs457
## 3 rs496202 Refractive error rs353
## 4 rs345 DNA methylation variation (age effect) rs203425
## 5 rs345 Facial morphology (factor 14, intercanthal width) rs799522
## Position_GRCh37 Alleles R2 D'
## 1 chr7:24908737 C=0.123, T=0.877 0.200231693692643 0.897255733792921
## 2 chr7:24962426 C=0.748, T=0.252 0.56312684849231 0.969967060647161
## 3 chr13:32454349 A=0.902, G=0.098 1 1
## 4 chr13:32468087 A=0.074, T=0.926 0.954994192799071 1
## 5 chr13:32514028 C=0.769, T=0.231 0.236284178064096 0.918763102725367
## Risk_Allele Effect_Size_95_CI Beta_or_OR P_value
## 1 0.5967 0.0104 0.0071-0.0137 7e-10
## 2 0.4495 0.0072 0.0047-0.0097 4e-08
## 3 <NA> <NA> <NA> 1e-12
## 4 NR <NA> <NA> 2e-08
## 5 0.1263 0.2157 0.12-0.31 6e-06
SNPchip(snps, chip = "ALL", token = NULL, file = FALSE)
Used to find commercial genotyping chip arrays for variants. Input is a list of between 1 - 5000 variants (one per line) and desired commercial chip arrays to search. Input variants do not need to be on the same chromosome.
snps, between 1 - 5,000 variants, using an rsID or chromosome coordinate (e.g. “chr7:24966446”)
chip, chip or arrays, platform code(s) for a SNP chip array, ALL_Illumina, ALL_Affy or ALL, default = ALL, use the list_chips utility (see below) to look up available commercial SNP chip arrays and their codes.
token, LDlink provided user access token is required, default = NULL, register for a free token on the LDlink web site.
file, optional character string naming a path and file for saving results. If file = FALSE, no file will be generated, default = FALSE
SNPchip(snps = c("rs3", "rs4", "rs148890987"),
chip = "ALL",
token = Sys.getenv("LDLINK_TOKEN")
)
## WARNING: The following RS number did not have any platforms found: rs148890987, rs3.
## RS_Number Position_GRCh37 A_SNP5.0 A_CHB2 A_250S A_SNP6.0
## 1 rs148890987 chr13:32403784 0 0 0 0
## 2 rs3 chr13:32446842 0 0 0 0
## 3 rs4 chr13:32447222 1 1 1 1
SNPchip(snps = c("rs3", "rs4", "rs148890987"),
chip = c("A_SNP5.0", "A_CHB2"),
token = Sys.getenv("LDLINK_TOKEN")
)
## WARNING: The following RS number did not have any platforms found: rs148890987, rs3.
## RS_Number Position_GRCh37 A_SNP5.0 A_CHB2
## 1 rs148890987 chr13:32403784 0 0
## 2 rs3 chr13:32446842 0 0
## 3 rs4 chr13:32447222 1 1
SNPchip(snps = c("rs3", "rs4", "rs148890987"),
chip = "ALL_Affy",
token = Sys.getenv("LDLINK_TOKEN")
)
## WARNING: The following RS number did not have any platforms found: rs148890987, rs3.
## RS_Number Position_GRCh37 A_SNP5.0 A_CHB2 A_250S A_SNP6.0
## 1 rs148890987 chr13:32403784 0 0 0 0
## 2 rs3 chr13:32446842 0 0 0 0
## 3 rs4 chr13:32447222 1 1 1 1
Output is a data frame of query variant rows (RS number), genomic coordinate (GRCh37) and genotyping chip array columns. The presence of a “1” designates the variant is present on the respective commercial genotyping array and a “0” indicates that it is not present on the genotyping array.
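With the presence/absence coding described above, counting how many arrays carry each variant is a row sum over the chip columns. The sketch below works on a hand-typed copy of the example output; treating every column other than RS_Number and Position_GRCh37 as a chip column is an assumption based on the printed table.

```r
# Hand-typed copy of the SNPchip example output above
chips <- data.frame(
  RS_Number       = c("rs148890987", "rs3", "rs4"),
  Position_GRCh37 = c("chr13:32403784", "chr13:32446842", "chr13:32447222"),
  A_SNP5.0        = c(0, 0, 1),
  A_CHB2          = c(0, 0, 1),
  A_250S          = c(0, 0, 1),
  A_SNP6.0        = c(0, 0, 1),
  stringsAsFactors = FALSE
)

# Everything except the two identifier columns is assumed to be a chip column
chip_cols <- setdiff(names(chips), c("RS_Number", "Position_GRCh37"))

# Count, per variant, the arrays on which it is present (coded as 1)
chips$n_arrays <- rowSums(chips[, chip_cols] == 1)

# Variants found on at least one array
on_any_array <- chips$RS_Number[chips$n_arrays > 0]
on_any_array
```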
SNPclip(snps, pop = "CEU", r2_threshold = "0.1", maf_threshold = "0.01", token = NULL, file = FALSE)
Prune a list of variants by linkage disequilibrium. Input is a list of variant RS numbers (one per line) and a population group.
snps, a list of between 1 - 5,000 variants, using an RS number (rsID) or chromosome coordinate (GRCh37) (e.g. “chr7:24966446”). All input variants must be on the same chromosome and match a bi-allelic variant.
pop, a 1000 Genomes Project reference population, uses three letter population code, (e.g. YRI or CEU), multiple allowed, default = “CEU”
r2_threshold, used to set the R2 threshold for LD pruning. One of each pair of variants with a R2 greater than the threshold is removed. Value needs to be in the range 0 to 1. Default value is 0.1.
maf_threshold, used to set minor allele frequency (MAF) threshold for LD pruning. Variants with a MAF less than or equal to the threshold are removed. Value needs to be in the range 0 to 1. Default value is 0.01.
token, LDlink provided user access token is required, default = NULL, register for a free token on the LDlink web site.
file, optional character string naming a path and file for saving results. If file = FALSE, no file will be generated, default = FALSE
SNPclip(snps = c("rs3", "rs4", "rs148890987", "rs115955931"),
pop = "YRI",
r2_threshold = "0.1",
maf_threshold = "0.01",
token = Sys.getenv("LDLINK_TOKEN")
)
## RS_Number Position Alleles
## 1 rs3 chr13:32446842 C=0.829, T=0.171
## 2 rs4 chr13:32447222 A=0.829, G=0.171
## 3 rs148890987 chr13:32403784 C=1.0, T=0.0
## 4 rs115955931 chr13:32130008 G=0.954, A=0.046
## Details
## 1 Variant kept.
## 2 Variant in LD with rs3 (R2=1.0), variant removed.
## 3 Variant MAF is 0.0, variant removed.
## 4 Variant kept.
The output table provides details including query variant RS number, genomic position, alleles, and whether the variant was kept or removed.
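To carry only the pruned set forward, the kept variants can be pulled out by matching on the Details column. The sketch below uses a hand-typed copy of two columns from the example output; matching on the word "kept" relies on the wording of the printed messages and is an assumption about the output format in general.

```r
# Two columns of the SNPclip example output above, typed in by hand
clipped <- data.frame(
  RS_Number = c("rs3", "rs4", "rs148890987", "rs115955931"),
  Details   = c("Variant kept.",
                "Variant in LD with rs3 (R2=1.0), variant removed.",
                "Variant MAF is 0.0, variant removed.",
                "Variant kept."),
  stringsAsFactors = FALSE
)

# Select variants whose Details message says they were kept
kept_variants <- clipped$RS_Number[grepl("kept", clipped$Details, fixed = TRUE)]
kept_variants
```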
list_chips()
Provides a data frame listing the names and abbreviation codes for available commercial SNP Chip Arrays from Illumina and Affymetrix.
list_chips()
## chip_code chip_name
## 1 A_Exome1A Affymetrix Axiom Exome 1A
## 2 A_Exome319 Affymetrix Axiom Exome 319
## 3 A_AFR Affymetrix Axiom GW AFR
## 4 A_ASI Affymetrix Axiom GW ASI
## 5 A_CHB2 Affymetrix Axiom GW CHB2
## 6 A_EAS Affymetrix Axiom GW EAS
## 7 A_EUR Affymetrix Axiom GW EUR
## 8 A_Hu Affymetrix Axiom GW Hu
## 9 A_Hu-CHB Affymetrix Axiom GW Hu-CHB
## 10 A_LAT Affymetrix Axiom GW LAT
## 11 A_DMETplus Affymetrix DMET Plus
## 12 A_10X Affymetrix Mapping 10K Xba142
## 13 A_250N Affymetrix Mapping 250K Nsp
## 14 A_250S Affymetrix Mapping 250K Sty
## 15 A_50H Affymetrix Mapping 50K Hind240
## 16 A_50X Affymetrix Mapping 50K Xba240
## 17 A_Onco Affymetrix OncoScan
## 18 A_OncoCNV Affymetrix OncoScan CNV
## 19 A_SNP5.0 Affymetrix SNP 5.0
## 20 A_SNP6.0 Affymetrix SNP 6.0
## 21 I_CardioMetab Illumina Cardio-MetaboChip
## 22 I_1M-D Illumina Human1M-Duov3
## 23 I_1M Illumina Human1Mv1
## 24 I_610-Q Illumina Human610-Quadv1
## 25 I_660W-Q Illumina Human660W-Quadv1
## 26 I_CNV-12 Illumina HumanCNV-12
## 27 I_CNV370-D Illumina HumanCNV370-Duov1
## 28 I_CNV370-Q Illumina HumanCNV370-Quadv3
## 29 I_CVD Illumina HumanCVDv1
## 30 I_Core-12 Illumina HumanCore-12v1
## 31 I_CoreE-12v1 Illumina HumanCoreExome-12v1
## 32 I_CoreE-12v1.1 Illumina HumanCoreExome-12v1.1
## 33 I_CoreE-24v1 Illumina HumanCoreExome-24v1
## 34 I_CoreE-24v1.1 Illumina HumanCoreExome-24v1.1
## 35 I_Cyto-12v2 Illumina HumanCytoSNP-12v2
## 36 I_Cyto-12v2.1 Illumina HumanCytoSNP-12v2.1
## 37 I_Cyto-12v2.1f Illumina HumanCytoSNP-12v2.1 FFPE
## 38 I_Exome-12 Illumina HumanExome-12v1.1
## 39 I_Exon510S Illumina HumanExon510Sv1
## 40 I_240S Illumina HumanHap240S
## 41 I_300-D Illumina HumanHap300-Duov2
## 42 I_300 Illumina HumanHap300v1
## 43 I_550v1 Illumina HumanHap550v1
## 44 I_550v3 Illumina HumanHap550v3
## 45 I_650Y Illumina HumanHap650Yv3
## 46 I_Immuno-24v1 Illumina HumanImmuno-24v1
## 47 I_Immuno-24v2 Illumina HumanImmuno-24v2
## 48 I_Linkage-12 Illumina HumanLinkage-12
## 49 I_Linkage-24 Illumina HumanLinkage-24
## 50 I_NS-12 Illumina HumanNS-12
## 51 I_O1-Q Illumina HumanOmni1-Quadv1
## 52 I_O1S-8 Illumina HumanOmni1S-8v1
## 53 I_O2.5-4 Illumina HumanOmni2.5-4v1
## 54 I_O2.5-8 Illumina HumanOmni2.5-8v1.2
## 55 I_O2.5E-8v1 Illumina HumanOmni2.5Exome-8v1
## 56 I_O2.5E-8v1.1 Illumina HumanOmni2.5Exome-8v1.1
## 57 I_O2.5E-8v1.2 Illumina HumanOmni2.5Exome-8v1.2
## 58 I_O2.5S-8 Illumina HumanOmni2.5S-8v1
## 59 I_O5-4 Illumina HumanOmni5-4v1
## 60 I_O5E-4 Illumina HumanOmni5Exome-4v1
## 61 I_OE-12 Illumina HumanOmniExpress-12v1
## 62 I_OE-12f Illumina HumanOmniExpress-12v1 FFPE
## 63 I_OE-24 Illumina HumanOmniExpress-24v1
## 64 I_OEE-8v1 Illumina HumanOmniExpressExome-8v1
## 65 I_OEE-8v1.1 Illumina HumanOmniExpressExome-8v1.1
## 66 I_OEE-8v1.2 Illumina HumanOmniExpressExome-8v1.2
## 67 I_OEE-8v1.3 Illumina HumanOmniExpressExome-8v1.3
## 68 I_OZH-8v1 Illumina HumanOmniZhongHua-8v1
## 69 I_OZH-8v1.1 Illumina HumanOmniZhongHua-8v1.1
## 70 I_OZH-8v1.2 Illumina HumanOmniZhongHua-8v1.2
## 71 I_Cyto850 Illumina Infinium CytoSNP-850K
## 72 I_100 Illumina Infinium Human100kv1
## 73 I_ME-Global-8 Illumina Infinium Multi-Ethnic Global-8
## 74 I_OncoArray Illumina Infinium OncoArray-500K
## 75 I_Psyc-24v1 Illumina Infinium PsychArray-24v1
## 76 I_Psyc-24v1.1 Illumina Infinium PsychArray-24v1.1
list_pop()
Provides a data frame listing the available reference populations from the 1000 Genomes Project, continental or super-populations (e.g. European, African, Admixed American) and sub-populations (e.g. Finnish, Gambian, Peruvian).
list_pop()
## pop_code super_pop_code pop_name
## 1 ALL ALL ALL POPULATIONS
## 2 AFR AFR AFRICAN
## 3 YRI AFR Yoruba in Ibadan, Nigera
## 4 LWK AFR Luhya in Webuye, Kenya
## 5 GWD AFR Gambian in Western Gambia
## 6 MSL AFR Mende in Sierra Leone
## 7 ESN AFR Esan in Nigera
## 8 ASW AFR Americans of African Ancestry in SW USA
## 9 ACB AFR African Carribbeans in Barbados
## 10 AMR AMR AD MIXED AMERICAN
## 11 MXL AMR Mexican Ancestry from Los Angeles, USA
## 12 PUR AMR Puerto Ricans from Puerto Rico
## 13 CLM AMR Colombians from Medellin, Colombia
## 14 PEL AMR Peruvians from Lima, Peru
## 15 EAS EAS EAST ASIAN
## 16 CHB EAS Han Chinese in Bejing, China
## 17 JPT EAS Japanese in Tokyo, Japan
## 18 CHS EAS Southern Han Chinese
## 19 CDX EAS Chinese Dai in Xishuangbanna, China
## 20 KHV EAS Kinh in Ho Chi Minh City, Vietnam
## 21 EUR EUR EUROPEAN
## 22 CEU EUR Utah Residents from North and West Europe
## 23 TSI EUR Toscani in Italia
## 24 FIN EUR Finnish in Finland
## 25 GBR EUR British in England and Scotland
## 26 IBS EUR Iberian population in Spain
## 27 SAS SAS SOUTH ASIAN
## 28 GIH SAS Gujarati Indian from Houston, Texas, USA
## 29 PJL SAS Punjabi from Lahore, Pakistan
## 30 BEB SAS Bengali from Bangladesh
## 31 STU SAS Sri Lankan Tamil from the UK
## 32 ITU SAS Indian Telugu from the UK
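The `pop_code` and `super_pop_code` columns shown above make it easy to look up which sub-populations belong to a given super-population. The following is a minimal sketch: `sub_pops()` is a hypothetical helper and not part of LDlinkR, and the small data frame is copied by hand from a few rows of the output above so that the example runs without the package. In practice, the result of `list_pop()` would be passed instead.

```r
# Hypothetical helper (not part of LDlinkR): return the sub-population codes
# that belong to one super-population, excluding the super-population row itself.
sub_pops <- function(pops, super) {
  keep <- pops$super_pop_code == super & pops$pop_code != super
  as.character(pops$pop_code[keep])
}

# Stand-in for list_pop(), hand-copied from a few rows of the output above.
pops <- data.frame(
  pop_code       = c("EUR", "CEU", "TSI", "FIN", "GBR", "IBS", "YRI"),
  super_pop_code = c("EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "AFR"),
  stringsAsFactors = FALSE
)

eur <- sub_pops(pops, "EUR")
print(eur)
```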
list_gtex_tissues()
Provides a data frame listing the GTEx full names, LDexpress full names (without spaces) and acceptable abbreviation codes of the 54 non-diseased tissue sites collected for the GTEx Portal and used as input for the LDexpress function.
list_gtex_tissues()
## tissue_name_gtex tissue_name_ldexpress
## 1 Adipose - Subcutaneous Adipose_Subcutaneous
## 2 Adipose - Visceral (Omentum) Adipose_Visceral_Omentum
## 3 Adrenal Gland Adrenal_Gland
## 4 Artery - Aorta Artery_Aorta
## 5 Artery - Coronary Artery_Coronary
## 6 Artery - Tibial Artery_Tibial
## 7 Bladder Bladder
## 8 Brain - Amygdala Brain_Amygdala
## 9 Brain - Anterior cingulate cortex (BA24) Brain_Anterior_cingulate_cortex_BA24
## 10 Brain - Caudate (basal ganglia) Brain_Caudate_basal_ganglia
## 11 Brain - Cerebellar Hemisphere Brain_Cerebellar_Hemisphere
## 12 Brain - Cerebellum Brain_Cerebellum
## 13 Brain - Cortex Brain_Cortex
## 14 Brain - Frontal Cortex (BA9) Brain_Frontal_Cortex_BA9
## 15 Brain - Hippocampus Brain_Hippocampus
## 16 Brain - Hypothalamus Brain_Hypothalamus
## 17 Brain - Nucleus accumbens (basal ganglia) Brain_Nucleus_accumbens_basal_ganglia
## 18 Brain - Putamen (basal ganglia) Brain_Putamen_basal_ganglia
## 19 Brain - Spinal cord (cervical c-1) Brain_Spinal_cord_cervical_c-1
## 20 Brain - Substantia nigra Brain_Substantia_nigra
## 21 Breast - Mammary Tissue Breast_Mammary_Tissue
## 22 Cells - Cultured fibroblasts Cells_Cultured_fibroblasts
## 23 Cells - EBV-transformed lymphocytes Cells_EBV_transformed_lymphocytes
## 24 Cervix - Ectocervix Cervix_Ectocervix
## 25 Cervix - Endocervix Cervix_Endocervix
## 26 Colon - Sigmoid Colon_Sigmoid
## 27 Colon - Transverse Colon_Transverse
## 28 Esophagus - Gastroesophageal Junction Esophagus_Gastroesophageal_Junction
## 29 Esophagus - Mucosa Esophagus_Mucosa
## 30 Esophagus - Muscularis Esophagus_Muscularis
## 31 Fallopian Tube Fallopian_Tube
## 32 Heart - Atrial Appendage Heart_Atrial_Appendage
## 33 Heart - Left Ventricle Heart_Left_Ventricle
## 34 Kidney - Cortex Kidney_Cortex
## 35 Kidney - Medulla Kidney_Medulla
## 36 Liver Liver
## 37 Lung Lung
## 38 Minor Salivary Gland Minor_Salivary_Gland
## 39 Muscle - Skeletal Muscle_Skeletal
## 40 Nerve - Tibial Nerve_Tibial
## 41 Ovary Ovary
## 42 Pancreas Pancreas
## 43 Pituitary Pituitary
## 44 Prostate Prostate
## 45 Skin - Not Sun Exposed (Suprapubic) Skin_Not_Sun_Exposed_Suprapubic
## 46 Skin - Sun Exposed (Lower leg) Skin_Sun_Exposed_Lower_leg
## 47 Small Intestine - Terminal Ileum Small_Intestine_Terminal_Ileum
## 48 Spleen Spleen
## 49 Stomach Stomach
## 50 Testis Testis
## 51 Thyroid Thyroid
## 52 Uterus Uterus
## 53 Vagina Vagina
## 54 Whole Blood Whole_Blood
## 55 Select All Tissues ALL
## tissue_abbrev_ldexpress
## 1 ADI_SUB
## 2 ADI_VIS_OME
## 3 ADR_GLA
## 4 ART_AOR
## 5 ART_COR
## 6 ART_TIB
## 7 BLA
## 8 BRA_AMY
## 9 BRA_ANT_CIN_COR_BA2
## 10 BRA_CAU_BAS_GAN
## 11 BRA_CER_HEM
## 12 BRA_CER
## 13 BRA_COR
## 14 BRA_FRO_COR_BA9
## 15 BRA_HIP
## 16 BRA_HYP
## 17 BRA_NUC_ACC_BAS_GAN
## 18 BRA_PUT_BAS_GAN
## 19 BRA_SPI_COR_CER_C-1
## 20 BRA_SUB_NIG
## 21 BRE_MAM_MAM_TIS
## 22 CEL_CUL_FIB
## 23 CEL_EBV_TRA_LYN
## 24 CER_ECT
## 25 CER_END
## 26 COL_SIG
## 27 COL_TRA
## 28 ESO_GAS_JUN
## 29 ESO_MUC
## 30 ESO_MUS
## 31 FAL_TUB
## 32 HEA_ATR
## 33 HEA_LEF
## 34 KID_COR
## 35 KID_MED
## 36 LIV
## 37 LUN
## 38 MIN_SAL_GLA
## 39 MUS_SKE
## 40 NER_TIB
## 41 OVA
## 42 PAN
## 43 PIT
## 44 PRO
## 45 SKI_NOT_SUN_EXP_SUP
## 46 SKI_SUN_EXP_LOW_LEG
## 47 SMA_INT_TER_ILE
## 48 SPL
## 49 STO
## 50 TES
## 51 THY
## 52 UTE
## 53 VAG
## 54 WHO_BLO
## 55 ALL
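Because the three columns describe the same tissues, the table can serve as a lookup from a GTEx full name to its abbreviation code. The sketch below assumes only the column names shown in the output above. `tissue_abbrev()` is a hypothetical helper, not an LDlinkR function, and the data frame is a hand-copied stand-in for `list_gtex_tissues()` so that the example runs without the package.

```r
# Hypothetical helper (not part of LDlinkR): map GTEx full tissue names to
# the abbreviation codes listed in the table, failing loudly on unknown names.
tissue_abbrev <- function(tissues, gtex_names) {
  idx <- match(gtex_names, tissues$tissue_name_gtex)
  if (anyNA(idx)) {
    stop("Unknown GTEx tissue name(s): ",
         paste(gtex_names[is.na(idx)], collapse = ", "))
  }
  as.character(tissues$tissue_abbrev_ldexpress[idx])
}

# Stand-in for list_gtex_tissues(), hand-copied from three rows of the output above.
tissues <- data.frame(
  tissue_name_gtex        = c("Adipose - Subcutaneous", "Liver", "Whole Blood"),
  tissue_name_ldexpress   = c("Adipose_Subcutaneous", "Liver", "Whole_Blood"),
  tissue_abbrev_ldexpress = c("ADI_SUB", "LIV", "WHO_BLO"),
  stringsAsFactors = FALSE
)

codes <- tissue_abbrev(tissues, c("Whole Blood", "Liver"))
print(codes)
```

Stopping on an unknown name, rather than returning `NA`, is a design choice in this sketch: it surfaces a misspelled tissue name before any query is sent.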
What if my access token doesn’t work?
df <- LDproxy(snp = "rs456", pop = "YRI", token = "123abc456789")
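When a token is rejected, one quick local check is whether R actually sees the value you expect. The sketch below assumes the token was stored in an environment variable. `token_looks_set()` is a hypothetical helper, and the variable name `LDLINK_TOKEN` is an assumption here, so substitute whatever name you used in your `.Renviron` file. The check only confirms that a non-empty value without stray whitespace is present. It cannot tell whether the LDlink server will accept the token.

```r
# Hypothetical helper (not part of LDlinkR): TRUE if the named environment
# variable holds a non-empty value that contains no whitespace.
# The default name "LDLINK_TOKEN" is an assumption, not taken from this section.
token_looks_set <- function(var = "LDLINK_TOKEN") {
  value <- Sys.getenv(var, unset = "")
  nzchar(value) && !grepl("[[:space:]]", value)
}

# Demonstration with throwaway variables so a real token is never touched.
Sys.setenv(DEMO_LDLINK_TOKEN = "YourTokenHere123")
ok <- token_looks_set("DEMO_LDLINK_TOKEN")

Sys.setenv(DEMO_LDLINK_TOKEN_BAD = "YourToken Here123")  # stray space
bad <- token_looks_set("DEMO_LDLINK_TOKEN_BAD")

print(c(ok = ok, bad = bad))
```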
Can I set a threshold or cut-off value for R2 or D' values?
df <- LDproxy("rs12027135", pop = "CEU", r2d = "r2", token = "YourTokenHere123")
new_df <- subset(df, R2 >= 0.8)
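Since the result is an ordinary data frame, any base R filter can be applied after the query, including a cut-off on both statistics at once. The sketch below uses a small mock data frame with invented values so that it runs without a token. The column name `R2` follows the `subset()` call above, while `Dprime` and `RS_Number` are assumed names that should be checked against the columns of your own result.

```r
# Mock stand-in for an LDproxy() result; the values are invented for illustration.
# Column names "RS_Number" and "Dprime" are assumptions; "R2" follows the call above.
df <- data.frame(
  RS_Number = c("rsA", "rsB", "rsC", "rsD"),
  Dprime    = c(1.00, 0.95, 0.60, 1.00),
  R2        = c(1.00, 0.85, 0.30, 0.45),
  stringsAsFactors = FALSE
)

# Cut-off on R2 only, as in the example above.
r2_only <- subset(df, R2 >= 0.8)

# Cut-off on D' only.
dprime_only <- subset(df, Dprime >= 0.9)

# Both cut-offs combined.
both <- subset(df, R2 >= 0.8 & Dprime >= 0.9)

print(both)
```

Note that `rsD` passes the D' cut-off but not the R2 cut-off, which illustrates why the two thresholds can select different variants.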
test <- read.table("variant_list.txt", header = FALSE)
LDmatrix(snps = test, pop = "CEU", r2d = "r2", token = "YourTokenHere123")
Error in LDmatrix(snps = test, pop = "CEU", r2d = "r2", token = "YourTokenHere123"), : Input is between 2 to 1000 variants.
test <- read.table("variant_list.txt", header = FALSE)
LDmatrix(snps = test[,1], pop = "CEU", r2d = "r2", token = "YourTokenHere123")
## RS_number rs60676332 rs7805287 rs127 rs456 rs10239961 rs114
## 1 rs60676332 1.000 0.008 0.013 0.017 0.286 0.039
## 2 rs7805287 0.008 1.000 0.980 0.882 0.170 0.614
## 3 rs127 0.013 0.980 1.000 0.900 0.167 0.632
## 4 rs456 0.017 0.882 0.900 1.000 0.177 0.722
## 5 rs10239961 0.286 0.170 0.167 0.177 1.000 0.008
## 6 rs114 0.039 0.614 0.632 0.722 0.008 1.000
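The difference between the failing and the working call comes down to what `read.table()` returns. It always returns a data frame, and the length of a one-column data frame is 1 (its number of columns), which is presumably why the variant-count check rejects it. Selecting the first column with `test[,1]` yields a plain vector with one element per variant. The sketch below shows this with a temporary file, so it does not need `variant_list.txt` or a token. The file contents are three rs numbers taken from the output above.

```r
# Write a small variant list to a temporary file, one rs number per line.
variant_file <- tempfile(fileext = ".txt")
writeLines(c("rs60676332", "rs7805287", "rs127"), variant_file)

# stringsAsFactors = FALSE keeps the column as character on R versions before 4.0.
test <- read.table(variant_file, header = FALSE, stringsAsFactors = FALSE)

is_df  <- is.data.frame(test)  # the object is a data frame, not a vector
n_cols <- length(test)         # counts columns, not variants

snps   <- test[, 1]            # first column as a plain character vector
n_snps <- length(snps)         # counts variants

print(c(n_cols = n_cols, n_snps = n_snps))
unlink(variant_file)
```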
What genome build does LDlink use for genomic coordinates?
How can I ask for help?
sessionInfo()
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.7
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
##
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] LDlinkR_1.1.2
##
## loaded via a namespace (and not attached):
## [1] httr_1.4.2 compiler_4.0.3 R6_2.5.0 magrittr_2.0.1 htmltools_0.5.1.1
## [6] tools_4.0.3 yaml_2.2.1 stringi_1.5.3 rmarkdown_2.6 knitr_1.31
## [11] stringr_1.4.0 digest_0.6.27 xfun_0.20 rlang_0.4.10 evaluate_0.14