Wednesday, October 10, 2012

Looping through variable names in R

OK, most people probably know how to do this, but I have a mental block every time it comes up. If you have a list of variable names and want to apply the same operation to each of them, here are several ways to do it.

> set.seed(1234567)
> dat=data.frame(matrix(rbinom(100, 5,.3), ncol=5))
> head(dat)
  X1 X2 X3 X4 X5
1  2  3  1  2  1
2  2  1  1  3  0
3  3  3  2  1  2
4  0  1  2  2  2
5  2  1  0  1  1
6  1  1  3  2  2
> nms=names(dat)
> for(i in 1:length(nms)){
+   print(with(dat, eval(parse(text=paste("table(",nms[i],")")))))
+ }
X1
0 1 2 3 
3 7 6 4 
X2
0 1 2 3 4 5 
2 9 4 3 1 1 
X3
0 1 2 3 
4 8 6 2 
X4
 0  1  2  3 
 1 11  6  2 
X5
0 1 2 3 4 
3 7 8 1 1  
The innermost paste() call builds a character string containing the R expression to be evaluated:
 
> i=5 
> paste("table(",nms[i],")")
[1] "table( X5 )"
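The string is then parsed into an expression and evaluated by with(), which makes the columns of dat available as variables. Finally, because results are not printed automatically inside a for loop, print() has to be wrapped around the whole expression to display them. This approach is proposed here.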
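To make the three steps of that one-liner visible, here is a minimal sketch that runs them one at a time for i = 5. The variable names code_string, code_expr, result and direct are hypothetical, introduced only for illustration. The last part swaps in a different technique, indexing the column with [[ ]] and labelling the table through table()'s dnn argument, which should give the same table without generating code as text.

```r
# Sketch: the eval(parse()) one-liner from the loop, split into its steps.
set.seed(1234567)
dat <- data.frame(matrix(rbinom(100, 5, .3), ncol = 5))
nms <- names(dat)
i <- 5

code_string <- paste("table(", nms[i], ")")  # the string "table( X5 )"
code_expr   <- parse(text = code_string)     # an unevaluated expression object
result      <- eval(code_expr, envir = dat)  # evaluated with dat's columns visible as variables
print(result)

# Swapped-in alternative: index the column with [[ ]] instead of building code;
# dnn supplies the header that table(X5) otherwise takes from the symbol name.
direct <- table(dat[[nms[i]]], dnn = nms[i])
print(identical(result, direct))
```

Inside a real loop, the second form (print(table(dat[[nms[i]]], dnn = nms[i]))) avoids parse() entirely, which is usually easier to debug.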

Alternatively, one could use lapply() and avoid the loop as proposed by UCLA ATS here.

> with(dat, lapply(names(dat), 
+                  function(x){
+                    table(eval(substitute(tmp, list(tmp=as.name(x)))))
+                  }))
[[1]]

0 1 2 3 
3 7 6 4 

[[2]]

0 1 2 3 4 5 
2 9 4 3 1 1 

[[3]]

0 1 2 3 
4 8 6 2 

[[4]]

 0  1  2  3 
 1 11  6  2 

[[5]]

0 1 2 3 4 
3 7 8 1 1

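Note that this method doesn't print a name for each table. This can be fixed by using sapply(), the "user-friendly version and wrapper of lapply()", with the USE.NAMES=TRUE option (which is already the default for sapply(), so it can also be left out). Because the tables here have different lengths, sapply() cannot simplify them into a matrix and returns a named list.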

> with(dat, sapply(names(dat), 
+                  function(x){
+                    table(eval(substitute(tmp, list(tmp=as.name(x)))))
+                    }, 
+                  USE.NAMES=TRUE))
$X1

0 1 2 3 
3 7 6 4 

$X2

0 1 2 3 4 5 
2 9 4 3 1 1 

$X3

0 1 2 3 
4 8 6 2 

$X4

 0  1  2  3 
 1 11  6  2 

$X5

0 1 2 3 4 
3 7 8 1 1 
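The anonymous function passed to lapply() and sapply() above turns a name stored as a string into a symbol with as.name() and substitute(), then evaluates that symbol. A minimal sketch of that step for a single name follows; x, sym and tabs are hypothetical names used only here. The second part swaps in a simpler technique that is not the one from the UCLA ATS page: since a data frame is a list of columns, lapply() can iterate over the columns themselves, and the result comes back already named.

```r
set.seed(1234567)
dat <- data.frame(matrix(rbinom(100, 5, .3), ncol = 5))

x   <- "X2"                                     # a variable name stored as a string
sym <- substitute(tmp, list(tmp = as.name(x)))  # replaces tmp with the symbol X2
print(class(sym))                               # a symbol has class "name"
print(identical(eval(sym, dat), dat$X2))        # evaluating it inside dat yields the column

# Swapped-in alternative: iterate over the columns directly;
# the list names are taken from the column names X1, ..., X5.
tabs <- lapply(dat, table)
print(names(tabs))
```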


In Stata there is a dedicated command for this; one could simply use:
foreach var of varlist X1-X5 {
   tab `var'
}
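Stata's X1-X5 selects every variable positioned from X1 through X5 in the dataset. A rough R analogue, sketched under the assumption that the columns are stored in the intended order, finds the positions of the two endpoints with match() and loops over the names in between. The names from, to and vars are hypothetical.

```r
set.seed(1234567)
dat <- data.frame(matrix(rbinom(100, 5, .3), ncol = 5))

# Positional range of columns, similar in spirit to Stata's varlist X1-X5
from <- match("X1", names(dat))
to   <- match("X5", names(dat))
vars <- names(dat)[from:to]

for (v in vars) {
  # dnn labels each printed table with its variable name
  print(table(dat[[v]], dnn = v))
}
```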

Comments:

  1. The Stata snippet Masha provided is often usefully combined with the -ds- command. For example, suppose you want to loop through all string variables, all variables with value label "foo", or all variables whose variable label contains the word "shazam". To do so, you could use one of:

    ds, has(type string)
    ds, has(vallabel foo)
    ds, has(varlabel *shazam*)

    followed by

    foreach var in `r(varlist)' {
    [do something with macro var]
    }

    As you can see, -ds- saves its result in the macro r(varlist).

  2. Thanks for the reminder about the eval/parse/substitute functions, Masha. Although it doesn't illustrate the main point of your post, I thought I would note that the following code could be used to produce a similar result:

    apply(dat, 2, table)

    The '2' indicates that the 'table' function should be applied to each column of 'dat' (1 would be used to apply the function to rows). The result is similar to the lapply example you posted.

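One caveat about apply(dat, 2, table), based on my reading of the ?apply documentation: apply() first converts the data frame to a matrix, and when every call to FUN returns a vector of the same length it simplifies the result into a matrix instead of returning a list of tables. A small sketch with made-up data (dat2, via_apply and via_lapply are hypothetical names) shows that the column-specific value labels are then lost, while lapply() keeps them.

```r
# Hypothetical toy data: two columns with different values but the same
# number of distinct values, so every table has length 2.
dat2 <- data.frame(a = c(1, 1, 2), b = c(3, 4, 4))

via_apply  <- apply(dat2, 2, table)  # equal-length results are simplified into a matrix
via_lapply <- lapply(dat2, table)    # a named list of separate tables

print(via_apply)   # the values 3 and 4 of column b do not appear as labels
print(via_lapply)  # each table keeps its own values as labels
```

In the post's data the tables happen to have different lengths, so apply() returns a list there, but the result type depends on the data.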
