Wednesday, December 21, 2011
proc printto log="I:\sasods\printto_test.txt" print="I:\sasods\printto_test.txt";
run;
data numbers;
   input x y z;
   datalines;
14.2 25.2 96.8
10.8 51.6 96.8
9.5 34.2 138.2
8.8 27.6 83.2
11.5 49.4 287.0
6.3 42.0 170.7
;
run;
proc print data=numbers;
   title 'Listing of NUMBERS Data Set';
run;

proc printto;
run;
Running this creates the following file:
NOTE: PROCEDURE PRINTTO used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
3 data numbers;
4 input x y z;
NOTE: The data set WORK.NUMBERS has 6 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
14 proc print data=numbers;
15 title 'Listing of NUMBERS Data Set';
Listing of NUMBERS Data Set
Obs x y z
1 14.2 25.2 96.8
2 10.8 51.6 96.8
3 9.5 34.2 138.2
4 8.8 27.6 83.2
5 11.5 49.4 287.0
6 6.3 42.0 170.7
NOTE: There were 6 observations read from the data set WORK.NUMBERS.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
18 proc printto;
The last PROC PRINTTO resets to the default output and log destinations.
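One related note (my addition, not part of the original example): PROC PRINTTO appends to an existing file by default; the NEW option makes it overwrite instead:
proc printto log="I:\sasods\printto_test.txt"
             print="I:\sasods\printto_test.txt" new;  /* NEW replaces the file rather than appending */
run;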
Wednesday, December 14, 2011
1. Export the data set from SAS to a .csv file using PROC EXPORT:
proc export data=out
    outfile='C:\Clients\StataFromSAS\data1.csv'
    dbms=csv replace;
run;
2. Make a Stata .do file that reads in data1.csv and performs the desired analysis. Save it as 'C:\Clients\StataFromSAS\analysisStata.do'. For example:
insheet using "C:\Clients\StataFromSAS\data1.csv", clear
set linesize 100
3. In SAS, the X statement issues an operating-environment command from within a SAS session (e.g., it runs a DOS command under Windows). Since Stata can be run from a DOS prompt in batch mode, we can call the .do file from SAS using the X statement. First, change the DOS directory to where the Stata log file should go, and then execute analysisStata.do in batch mode (for more on Stata in batch mode, see appendix C.5 of the Stata [GS] manual). The following SAS statements accomplish this:
x '"C:\Program Files (x86)\Stata12\StataSE-64.exe" /e do "C:\Clients\StataFromSAS\analysisStata.do"';
4. Enjoy :)
Tuesday, December 13, 2011
In contrast to tools in other domain-specific data analysis languages like R, it features deeply integrated array axis indexing, which enables intuitive data alignment, pivoting and reshaping, joining and merging, and other standard relational data manipulations. While I've not looked at nor used the library at this point, if anyone would like to explore or try it out for their work, please let me know - I'd be happy to dive into it with you.
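To give a flavor of what integrated axis indexing means (a minimal sketch based on pandas' documented behavior, not an example from the announcement): two series with different indexes align themselves by label during arithmetic:
import pandas as pd

# values are matched by index label, not by position
s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20], index=['b', 'c'])
print(s1 + s2)   # 'a' -> NaN (no match in s2), 'b' -> 12, 'c' -> 23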
Sunday, December 11, 2011
Monday, November 14, 2011
> Is that a pun? Word 2011 is only available for Mac :)
Like I said, why not just upgrade?
> Find and Navigation Pane in Word 2010 does not work with Mendeley Word plugin
It seems to work fine for me with Word 2011. Why not just upgrade?
Sunday, November 13, 2011
I searched online a little, and found out it's because of the Mendeley Word plugin! I uninstalled it, and Find in Word now works. Sad but true. See here:
Monday, November 7, 2011
Friday, October 28, 2011
and here is the plot that I get:
varname=label" for each group. Clearly, this isn't the desired labeling because there is no need for the "varname=" part. I didn't find much on help on the web regarding how to deal with this (except this post which is cumbersome, to say the least). There also does not appear to be an option in sts graph or in legend() to change this behavior. Also, it's not the behavior of other plot types (e.g. graph bar age, over(iGroup) labels everything with just the value label).
So here is my solution - it is to define the value labels using global macros:
global iGroup1 "Group -/-"   // value inferred to complete the -/-, -/+, +/+ pattern
global iGroup2 "Group -/+"
global iGroup3 "Group +/+"
label def iGroup 1 "$iGroup1" 2 "$iGroup2" 3 "$iGroup3", modify
label val iGroup iGroup
stset DFS, fail(Relapse)
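The payoff of defining the labels through globals (the exact graph call is not in the excerpt, so this is my reading of the workaround) is that the same text can then be passed to legend() without retyping it, overriding the default "varname=label" entries:
sts graph, by(iGroup) legend(label(1 "$iGroup1") label(2 "$iGroup2") label(3 "$iGroup3"))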
Monday, October 17, 2011
Monday, October 10, 2011
Here is an example from a different field which, while it doesn't surprise me, does thrill me.
It's about the Foldit game developed at the University of Washington, and how non-experts decoded the structure of an AIDS-related protein.
A lay overview appears here:
A paper outlining how the experts structured the involvement of non-experts in a way that worked (and which may hold some simple lessons for broader application):
Sunday, October 9, 2011
- use a Macintosh (Snow Leopard or newer OS X);
- use source repositories (any of SVN, git, or Mercurial/hg)
And for those not interested in mapping their own, this related note is also interesting:
Tuesday, September 13, 2011
Be sure to use the limits and calendar icons.
Mobile app versions are available from the "full featured" site, run by ITA Software (which Google recently acquired): http://matrix.itasoftware.com/
Thursday, July 21, 2011
Monday, July 11, 2011
Tuesday, June 28, 2011
As the journal will only exist online, it offers an opportunity to create a journal and article format that will exploit the potential of new technologies to allow for improved data presentation. This is consistent with the research site structure we are only just beginning to explore.
Friday, June 24, 2011
Tuesday, May 31, 2011
Sunday, May 29, 2011
> Has anyone heard of Data Wrangler before?
Interesting -- they clearly have spent a lot of time thinking about
the interface (whether you agree with their approach or not). My
question is: who is their intended audience? It's possible that
people working in Excel (or similar) without any programming
experience might find something like this beneficial (this is an
empirical question). However, I doubt that anyone with any
programming experience (or solid knowledge of a data analysis package)
would find this useful. For instance, consider the example they
describe in their technical report. This dataset can easily be read
into Stata with
insheet using <filename>
gen state = regexs(1) if regexm(v1,"Reported crime in ([A-Za-z]+)")
replace state = state[_n-1] if mi(state)
drop if missing(v2)
destring v1, replace
or Python with
data = []
state = None
with open('<filename>') as f:
    for line in f:
        if line.startswith('Reported crime'):
            state = line.strip().replace('Reported crime in ', '')
        else:
            items = line.strip().split(',')
            data.append([state] + items)
Now, compare this to the code generated by Data Wrangler:
delete('Year starts with "Reported"')
which, I would argue, is both longer and considerably more difficult
to read (moreover, this code does not even read in the data file, nor
handle transferring the data into an environment (e.g., Stata) where
they can be analyzed). Now, I suppose that the whole point here is
that with Data Wrangler this can be done via a GUI, however in my case
I could definitely do this in Stata or Python faster, and when I'm
done, I have a routine that can more easily be used/extended to handle
subsequent (similar) datasets.
This strikes me as similar in some ways to Applescript. The idea was
to simplify scripting so that anyone could do it, but in the end non-
programmers still find it too difficult, and programmers prefer to use
a standard, more capable scripting language (e.g., bash, Python, Ruby, etc.).
Saturday, May 28, 2011
Report on Feb. Conference.
Has anyone heard of Data Wrangler before?
I've just tried this, and would like to hear from non-engineering/CS staff on their experience.
In a nutshell:
- Go to http://vis.stanford.edu/wrangler/
- Preview the video (for orientation)
- Try the app: http://vis.stanford.edu/wrangler/app/
- Look at the DataWrangler Blog for some important notes
In particular, they are looking at samples of how people are using their tool (so clean your data appropriately).
Also, note the "large data" comment:
- You should not (really) depend on the actual, bulk data conversion (see the "Export" link near the script that is developed, lower left)
- You really want to use a sample of your incoming data, so that Data Wrangler can generate a Python script for you to locally filter your actual data
RCG staff may be able to help you get started with this, if you need.
NOTES for RCG staff
- It will export CSV results (not so interesting);
- It will export your particular filtering steps as a program (very interesting)
I recommend a look at the design paper: http://vis.stanford.edu/files/2011-Wrangler-CHI.pdf
Friday, May 13, 2011
Friday, April 22, 2011
Wednesday, April 20, 2011
"popower computes the power for a two-tailed two sample comparison of ordinal outcomes under the proportional odds ordinal logistic model. The power is the same as that of the Wilcoxon test but with ties handled properly. posamsize computes the total sample size needed to achieve a given power. Both functions compute the efficiency of the design compared with a design in which the response variable is continuous. print methods exist for both functions. Any of the input arguments may be vectors, in which case a vector of powers or sample sizes is returned. These functions use the methods of Whitehead (1993)."
Whitehead J (1993): Sample size calculations for ordered categorical data. Stat in Med 12:2257–2271.
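A quick sketch of how these are called (illustrative numbers; argument names as in the Hmisc documentation):
library(Hmisc)
# assumed marginal cell probabilities for a 5-level ordinal outcome (must sum to 1)
p <- c(.2, .1, .2, .2, .3)
popower(p, odds.ratio = 1.5, n = 200)        # power for a total sample size of 200
posamsize(p, odds.ratio = 1.5, power = 0.8)  # total n needed for 80% power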
R in Action's author, Rob Kabacoff, also runs the Quick-R site referenced in Hacky's post:
Tuesday, April 19, 2011
Tuesday, April 12, 2011
Friday, April 1, 2011
# Wald z-tests for the fixed effects of a fitted mixed model
# (the wrapper name is mine; the body is as posted)
coefTable <- function(fit) {
    vc <- vcov(fit, useScale = FALSE)
    b <- fixef(fit)
    se <- sqrt(diag(vc))
    z <- b / se
    P <- 2 * (1 - pnorm(abs(z)))
    return(cbind(b, se, z, P))
}
This assumes normality of the estimated coefficients.
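For instance (using the sleepstudy data shipped with lme4, and coefTable as defined above):
library(lme4)
fit <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
coefTable(fit)   # estimate, SE, z, and normal-approximation p-value per fixed effect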
Wednesday, March 30, 2011
set -o vi # for vi mode
set -o emacs # for emacs mode
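To make the choice stick across sessions, the same line can go in your shell startup file, e.g.:
echo 'set -o vi' >> ~/.bashrc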
Emacs command line editing cheat-sheet:
Tuesday, March 29, 2011
But how do you do the complementary operation - change your terminal's directory to the Finder location?
Put this in your ~/.bash_login file, and then you can use "cdf" to change the terminal directory to the Finder window that last had focus:
# cdf: cd's to frontmost window of Finder
cdf() {
    currFolderPath=$( /usr/bin/osascript <<"EOT"
tell application "Finder"
    try
        set currFolder to (folder of the front window as alias)
    on error
        set currFolder to (path to desktop folder as alias)
    end try
    POSIX path of currFolder
end tell
EOT
)
    # ignore the cd if it's my desktop
    if [ "$currFolderPath" != "$HOME/Desktop/" ]; then
        echo "cd to \"$currFolderPath\""; cd "$currFolderPath"
    else
        echo "no finder path to cd to..."
    fi
}
One final piece to round this out: to open a terminal in the current Finder window, you can add "Go2Shell" to your Finder - see http://zipzapmac.blogspot.com/2013/07/go2shell-instant-terminal-window.html for more info.
Thursday, March 24, 2011
histnorm <- function(xx, ...) {
    h <- hist(xx, ...)                       # draw the histogram (counts scale)
    xfit <- seq(min(xx), max(xx), length = 40)
    yfit <- dnorm(xfit, mean(xx), sd(xx)) * diff(h$mids[1:2]) * length(xx)
    lines(xfit, yfit, col = "blue", lwd = 2) # overlay the scaled normal curve
}
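Usage, on some made-up data (extra arguments pass through to hist()):
set.seed(1)
histnorm(rnorm(200), breaks = 20, col = "gray")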
Sunday, February 20, 2011
Friday, February 11, 2011
Friday, January 14, 2011
Thursday, January 13, 2011
> coef(fit$model$corStruct, unconstrained = FALSE)
** without unconstrained = FALSE, coef returns the parameter on the transformed (unconstrained) scale used internally by the optimizer, not the correlation itself
Alternatively, intervals(fit)$corStruct gives a vector with confidence intervals
          lower    est.  upper
Phi1     -0.421  0.0384  0.482
attr(,"label")
[1] "Correlation structure:"