Wednesday, September 29, 2010

World Statistics Day - 20 October 2010

http://unstats.un.org/unsd/wsd/Default.aspx

On 20 October 2010, the World will celebrate the first World Statistics Day, to raise awareness of the many achievements of official statistics premised on the core values of service, professionalism and integrity.

The World Statistics Day (20 October 2010) logo is a stylized statistical chart encircled by five coloured wreaths symbolizing a dynamic, fast-changing world as represented by the five continents. The three blue and green bars at the center represent the three core principles of official statistics: Service, Professionalism and Integrity. Together and intertwined, they lay a strong foundation for the national and global statistical system.


Efficient Programming in R

Martin Morgan's slides on "Efficient R Programming" 
http://bioconductor.org/help/course-materials/2010/BioC2010/EfficientRProgramming.pdf

(reposting from Revolutions blog: http://blog.revolutionanalytics.com/2010/09/efficient-r-programming.html)

codebook function for R

codebook is a handy Stata command for getting a snapshot of a dataset. I couldn't find a counterpart in R's standard packages, but Frank Harrell's Hmisc package has a describe() function that does much the same thing.


## install
install.packages("Hmisc")
## load the library
library(Hmisc)


describe(data)
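
When Hmisc is not installed, a rougher snapshot can be put together in base R. The sketch below is only an illustration: codebook_lite is a made-up helper name (it is not part of Stata, Hmisc or base R), the toy data frame is invented, and the result is far less detailed than what describe() reports.

```r
## codebook_lite() is a hypothetical helper, not part of any package.
## It returns one row per variable: class, missing count, distinct values.
codebook_lite <- function(df) {
  data.frame(
    variable  = names(df),
    class     = vapply(df, function(col) class(col)[1], character(1)),
    n_missing = vapply(df, function(col) sum(is.na(col)), integer(1)),
    n_unique  = vapply(df, function(col) length(unique(col[!is.na(col)])), integer(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

## toy data, made up for illustration
toy <- data.frame(ptid = 1:4,
                  x1   = c(0.5, NA, 1.3, 0.5),
                  grp  = c("a", "b", "a", NA),
                  stringsAsFactors = FALSE)

cb <- codebook_lite(toy)
print(cb)

str(toy)      ## base R: compact view of the structure
summary(toy)  ## base R: per-column summaries
```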

Thursday, September 23, 2010

How to merge pdf files into one from command line

Put all the pdf files into one directory (I named them ch00.pdf, ch01.pdf, etc.), then run the following command from that directory. It works on Linux, Mac and Cygwin, provided Ghostscript is installed:
  gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=finished.pdf *.pdf

This method preserves the table of contents of each pdf file
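
One thing to watch when re-running the command: the shell expands *.pdf before gs starts, so a finished.pdf left over from a previous run ends up among the inputs. Below is a minimal sketch of a wrapper in R (the language used elsewhere on this blog) that builds the same Ghostscript arguments but leaves the output file out of the input list. gs_merge_args is a made-up helper name, and the actual call is left commented out because it assumes Ghostscript is installed and on the PATH.

```r
## gs_merge_args() is a hypothetical helper: it only builds the argument
## vector for the gs command shown above, it does not run anything.
gs_merge_args <- function(files, out = "finished.pdf") {
  inputs <- sort(setdiff(files, out))   ## ch00.pdf, ch01.pdf, ... in order
  c("-dBATCH", "-dNOPAUSE", "-q", "-sDEVICE=pdfwrite",
    paste0("-sOutputFile=", shQuote(out)), shQuote(inputs))
}

args <- gs_merge_args(c("ch01.pdf", "finished.pdf", "ch00.pdf"))
print(args)

## to actually merge (assumes Ghostscript is installed and on the PATH):
## system2("gs", gs_merge_args(list.files(pattern = "\\.pdf$")))
```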

How to add text at specified coordinates using ggplot2

Suppose we want to plot x1 vs. y1, two columns of a data.frame called data, and then add text directly on the graph.

> head(data[,c(1, 10:11)])
  ptid          x1       y1
1    1  0.51666667 2.142857
2    2  0.59999999 2.400000
3    3  1.35000000 3.000000
4    4 -0.08333333 2.142857
5    5          NA 2.500000
6    6  0.96666668 2.666667

Here is a code snippet:

ggplot(data, aes(x1, y1)) + geom_point() + geom_smooth(method="lm") +
       ggtitle("Approach 1") +    ## older ggplot2 releases used opts(title = ...)
       xlab("X1") + ylab("Y1") +
       geom_text(aes(x2,y2,label = texthere), 
          data.frame(x2=2, y2=2.8, texthere="Text Here"))

Note that we use two different data frames: one in ggplot() and another in geom_text(). The data frame passed to geom_text() specifies the coordinates (x2=2 and y2=2.8) and the label to place at that point.
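
For a single fixed label, ggplot2 also provides annotate(), which takes the coordinates and the text directly, so no second data frame is needed. A small sketch on simulated data (the data frame d and its values are made up here; they are not the data shown above):

```r
library(ggplot2)

## simulated stand-in for the real data (values are made up)
set.seed(1)
d <- data.frame(x1 = runif(50, 0, 3))
d$y1 <- 2 + 0.4 * d$x1 + rnorm(50, sd = 0.3)

## same plot as before, with annotate() in place of geom_text()
p <- ggplot(d, aes(x1, y1)) + geom_point() + geom_smooth(method = "lm") +
       xlab("X1") + ylab("Y1") +
       annotate("text", x = 2, y = 2.8, label = "Text Here")
p
```

The geom_text() version is still the better fit when there are several labels, since each row of the second data frame becomes one label.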


Wednesday, September 22, 2010

How to combine multiple plots with ggplot2 graphics

Here is a great article on how to put several ggplot2 plots on the same page (no, these are not facets). The arrange() function described there plays the same role as par(mfrow=c(nr,nc)) does for base graphics.

http://gettinggeneticsdone.blogspot.com/2010/03/arrange-multiple-ggplot2-plots-in-same.html

Deleting columns in a data.frame in R

To delete a single column (use one or the other, not both):
data["colname"] <- NULL
data <- data[, -grep("^colname$", names(data))]   ## anchors avoid matching e.g. "colname2"

To delete multiple columns:

# by column number (e.g. columns with indices c1, c2 and c3)
data <- data[,-c(c1,c2,c3)] 

# by column name (three equivalent ways, pick one):
data <- data[-match(c("var1","var2"), names(data))]
data <- data[-which(names(data) %in% c("var1", "var2"))]
data[c("var1","var2")] <- list(NULL)


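# all columns with the same root (e.g. variables "time1", "time2", ..., "timeN")
# "^time" matches only names that start with "time"; the logical index is safe when nothing matches
data <- data[!grepl("^time", names(data))]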
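
A quick check of these idioms on a toy data frame (the column names are made up). It also shows why a negative index built from grep() or which() is risky: when nothing matches, the index is empty and every column is dropped, whereas a logical index or setdiff() on the names leaves the data frame untouched.

```r
toy <- data.frame(id = 1:3, var1 = 4:6, time1 = 7:9, time2 = 10:12)

## drop by name; safe even if one of the names is absent
kept <- toy[setdiff(names(toy), c("var1", "nosuchcolumn"))]
names(kept)

## drop every column whose name starts with "time"
no_time <- toy[!grepl("^time", names(toy))]
names(no_time)

## pitfall: grep() finds nothing, so -integer(0) selects zero columns
oops <- toy[-grep("weight", names(toy))]
ncol(oops)   ## 0

## the same pattern with a logical index keeps everything
fine <- toy[!grepl("weight", names(toy))]
ncol(fine)   ## 4
```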


Monday, September 13, 2010

post-mortem debugging in R

## to start debugging (browser) right after an error
## set error option to recover


options(error=recover) ## default is NULL


After the error, select the frame (function call) you want to debug. There will be more than one to choose from if the error happened in a function called from the main one. Inside the browser:


where: prints the call stack, i.e. where in the function you are
ls(): lists all the local variables
print(variablename): prints the value of a variable


## to start debugging at a given point in the function
## insert browser()


myfunction = function(x)
{
   ...
   browser()
   ...
}


http://www.biostat.jhsph.edu/~rpeng/docs/R-debug-tools.pdf
http://www.stats.uwo.ca/faculty/murdoch/software/debuggingR/pmd.shtml

Friday, September 10, 2010

standard deviation given MAF using Hardy Weinberg Equilibrium

## compute variance of X with maf
## assuming Hardy Weinberg equilibrium

funvarx = function(maf)
  {
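    paa = maf^2             ## P(aa): two copies of the minor allele
    pAA = (1-maf)^2         ## P(AA): no copies of the minor allele
    pq2 = 2*maf*(1-maf)     ## P(Aa): heterozygote (not needed below)
    
    ## assuming X counts copies of the minor allele, X = 1 + I(aa) - I(AA), so
    ## Var(X) = Var(I(AA)) + Var(I(aa)) - 2*Cov(I(AA), I(aa)), with Cov = -pAA*paa
    varx = pAA*(1-pAA) + paa*(1-paa) + 2*pAA*paa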

    return(varx)
  }
funsdx = function(maf) return(sqrt(funvarx(maf)))

maf = 0.25
funsdx(maf)
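
As a sanity check: if X counts the copies of the minor allele (0, 1 or 2), then under Hardy-Weinberg equilibrium X is Binomial(2, maf), and its variance has the closed form 2*maf*(1-maf). The sketch below assumes that coding of X. It computes the variance directly from the three genotype probabilities and compares it with the closed form; hwe_var is a name made up for this check.

```r
## hwe_var() is a made-up name; X is assumed to be coded 0/1/2
## (number of minor alleles), with genotype probabilities from HWE
hwe_var <- function(maf) {
  x  <- c(0, 1, 2)
  pr <- c((1 - maf)^2, 2 * maf * (1 - maf), maf^2)
  ex <- sum(x * pr)            ## E[X] = 2 * maf
  sum((x - ex)^2 * pr)         ## Var[X] = E[(X - E[X])^2]
}

maf    <- c(0.05, 0.25, 0.5)
direct <- sapply(maf, hwe_var)
closed <- 2 * maf * (1 - maf)

all.equal(direct, closed)      ## the two computations agree
sqrt(closed)                   ## corresponding standard deviations
```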

Tuesday, September 7, 2010

get rid of annoying online ads

Try the Readability bookmarklet from the link below. It shows you just the main content of the page without the annoying ads around it.

http://lab.arc90.com/experiments/readability/

Friday, September 3, 2010

Apoptosis (or should we say apotosis?)

According to this introductory book (Molecular Biology Made Simple and Fun by David Clark), apoptosis is Greek for "dropping off", and if you want to fool other people into thinking you are educated, you must not pronounce the second "p".


Apparently, that's the mechanism by which tadpoles lose their tails when becoming frogs.




biostatistics vs lab research - you have to look at this

http://www.youtube.com/watch?v=PbODigCZqL8&feature=player_embedded#
