simpleM


simpleM is a multiple testing correction method for genetic association studies using correlated SNPs. The software is written in R.

Citation:

Multiple testing corrections for imputed SNPs. Gao X. Genet Epidemiol. 2011 Apr;35(3):154-8.

Avoiding the high Bonferroni penalty in genome-wide association studies. Gao X, Becker LC, Becker DM, Starmer JD, Province MA. Genet Epidemiol. 2010 Jan;34(1):100-5.

A multiple testing correction method for genetic association studies using correlated single nucleotide polymorphisms. Gao X, Starmer J, Martin ER. Genet Epidemiol. 2008 May;32(4):361-9.

Data Format:
Genotypes are coded as 0, 1 or 2, the number of copies of the reference allele. Genotypes are separated by a single whitespace character. Rows are SNPs and columns are individuals. SNPs should be in their physical order (for calculating LD). Do NOT include SNP names or individual IDs. Missing values should be imputed, mainly to keep the correlation matrix positive semi-definite. There are several possible ways to fill in the missing values, e.g. using imputation software, K-nearest neighbor (KNN) imputation, or replacing them with the common allele genotypes.
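
As a rough illustration of the simplest option for filling in missing values, the R sketch below replaces each missing genotype with the most frequent observed genotype for that SNP. This is only one reading of "common allele genotypes", not code from simpleM itself. The matrix geno, its values and the helper fill_with_mode are hypothetical, and the sketch assumes missing values are stored as NA and that every SNP has at least one observed genotype.

```r
# Toy genotype matrix (hypothetical values): rows are SNPs, columns are individuals.
geno <- matrix(c(2, 1, NA, 2, 2,
                 0, NA, 0, 1, 0,
                 1, 1, 2, NA, 1),
               nrow = 3, byrow = TRUE)

# Hypothetical helper: replace the NAs of one SNP with that SNP's
# most frequent observed genotype.
fill_with_mode <- function(g) {
  counts <- table(g)  # table() drops NA by default
  g[is.na(g)] <- as.numeric(names(counts)[which.max(counts)])
  g
}

# apply() over rows returns the results as columns, so transpose back.
geno_filled <- t(apply(geno, 1, fill_with_mode))
print(geno_filled)
```

Imputation software or KNN will usually preserve LD structure better than this shortcut, so treat it as a fallback for data sets with very few missing values.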

Example data:
2 1 2 2 2 2 0
2 2 2 1 2 2 0
1 2 0 0 1 2 0
2 2 2 2 1 2 2
2 1 1 1 2 0 2
1 2 2 1 1 2 1
2 1 2 1 2 2 2
2 2 2 1 2 2 1
2 1 2 1 1 1 2
2 2 2 2 1 1 2
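
Before running simpleM, it can help to confirm that a file really follows this format. The R sketch below writes the example data above to a temporary file, reads it back as a matrix and checks the coding. The variable names snp and example_lines are hypothetical; fn_In only mirrors the variable name used in the downloaded program.

```r
# The ten example SNPs shown above, written to a temporary file for illustration.
example_lines <- c(
  "2 1 2 2 2 2 0",
  "2 2 2 1 2 2 0",
  "1 2 0 0 1 2 0",
  "2 2 2 2 1 2 2",
  "2 1 1 1 2 0 2",
  "1 2 2 1 1 2 1",
  "2 1 2 1 2 2 2",
  "2 2 2 1 2 2 1",
  "2 1 2 1 1 1 2",
  "2 2 2 2 1 1 2"
)
fn_In <- tempfile(fileext = ".txt")
writeLines(example_lines, fn_In)

# Read the file as an integer matrix: rows are SNPs, columns are individuals.
snp <- as.matrix(read.table(fn_In, header = FALSE, colClasses = "integer"))
dimnames(snp) <- NULL

# Every entry must be 0, 1 or 2. NA fails this check too, so missing values are caught.
stopifnot(all(snp %in% c(0L, 1L, 2L)))

cat("SNPs:", nrow(snp), "individuals:", ncol(snp), "\n")
```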

Download:
Please click the link: http://sourceforge.net/projects/simplem/files/simpleM_Ex.zip/download

How to use it:

You only need to change a single line in the downloaded R program to tell it where your data is stored. Search for the line: fn_In <- "D:/simpleM_Ex/snpSample.txt" and replace the file path and name with those of your own data file. simpleM outputs the effective number of independent tests.
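
One common way to use the effective number of independent tests is as the divisor in a Bonferroni-style correction, in place of the total number of SNPs. The R sketch below shows the arithmetic only. The values of alpha, n_snps and Meff are hypothetical and are not output from a real run.

```r
alpha  <- 0.05   # desired family-wise error rate (assumed)
n_snps <- 5000   # hypothetical total number of SNPs tested
Meff   <- 1200   # hypothetical effective number of tests reported by simpleM

# Per-SNP significance thresholds under each correction.
threshold_bonferroni <- alpha / n_snps
threshold_simpleM    <- alpha / Meff

cat("Bonferroni threshold:", threshold_bonferroni, "\n")
cat("Meff-based threshold:", threshold_simpleM, "\n")
```

Because Meff is no larger than the number of SNPs, the Meff-based threshold is never stricter than the plain Bonferroni threshold.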

License:
GPL version 2 or newer.

Frequently asked questions:
I got the following error message. What shall I do?
    Error in eigen(CLD) : infinite or missing values in 'x'
    In addition: Warning message:
    In cor(dt_My) : the standard deviation is zero
    > snpInBlk <- t(mySNP_nonmissing[myStart:numLoci, ])
    > MeffBlk <- inferCutoff(snpInBlk)
    Error in eigen(CLD) : infinite or missing values in 'x'
    In addition: Warning message:
    In cor(dt_My) : the standard deviation is zero
    > simpleMeff <- c(simpleMeff, MeffBlk)
    Error: object 'MeffBlk' not found

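Answer:
The above error is triggered by non-polymorphic SNPs in your data set, i.e. SNPs for which all individuals have the same genotype (all 0s, all 1s, or all 2s). Such a SNP has a standard deviation of zero, so cor() returns missing values for it and eigen() then fails on the correlation matrix.
Solution: filter your SNPs by minor allele frequency (MAF), e.g. keep only SNPs with MAF >= 0.05 or MAF >= 0.01.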
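
A minimal R sketch of such a filter is given below. It assumes the data has already been read into a matrix with rows as SNPs and columns as individuals, coded 0/1/2 with no missing values. The names geno, maf_cutoff and geno_filtered and the toy values are hypothetical and are not part of the simpleM program.

```r
# Toy genotype matrix (hypothetical values): rows are SNPs, columns are individuals.
geno <- matrix(c(2, 1, 2, 2, 2, 2, 0,
                 2, 2, 2, 2, 2, 2, 2,   # non-polymorphic: all 2s
                 1, 2, 0, 0, 1, 2, 0,
                 0, 0, 0, 0, 0, 0, 0),  # non-polymorphic: all 0s
               nrow = 4, byrow = TRUE)

# Frequency of the counted allele per SNP: each individual carries two alleles.
allele_freq <- rowMeans(geno) / 2

# The minor allele frequency is the smaller of the two allele frequencies.
maf <- pmin(allele_freq, 1 - allele_freq)

# Keep only SNPs at or above the chosen cutoff. Non-polymorphic SNPs have MAF 0.
maf_cutoff <- 0.05
geno_filtered <- geno[maf >= maf_cutoff, , drop = FALSE]

cat("Kept", nrow(geno_filtered), "of", nrow(geno), "SNPs\n")
```

The filtered matrix can then be written back out in the format described under Data Format, e.g. with write.table(geno_filtered, file, row.names = FALSE, col.names = FALSE).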

Feedback and Suggestions:
Xiaoyi Gao, ray.x.gao_at_gmail*dot*com