Monday, August 4, 2014

Monday, July 28, 2014

iPython notebook: slides, blog posts - live / interactive too, if you want

Create slides easily with IPython notebook by writing markdown cells in your notebook and generating the slides (reveal.js based):

The post, on a Nikola-based static blog, with video and instructions, is here:

The raw slide-show (should you want to see it full-screen) is here:

When viewing the slide-show, be sure to try some of the reveal.js keyboard shortcuts (such as ESC for an overview of the slides, 's' for speaker notes, or 'b' to darken the screen). Swipe and pinch should also work in Mobile Safari.
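To show what the slide markup looks like under the hood, here is a minimal sketch that writes a small slide deck as a notebook file. It assumes the current nbformat v4 JSON layout (2014-era notebooks used an older format version): each cell carries a `slideshow.slide_type` entry in its metadata, which is what the notebook's "Slideshow" cell toolbar sets. The helper names `markdown_cell` and `build_deck` and the file name `talk.ipynb` are hypothetical.

```python
import json

def markdown_cell(text, slide_type="slide"):
    """Return an nbformat-v4 markdown cell tagged for the slideshow.

    slide_type is one of the toolbar's values: "slide", "subslide",
    "fragment", "skip", "notes", or "-" (continue the current slide).
    """
    return {
        "cell_type": "markdown",
        "metadata": {"slideshow": {"slide_type": slide_type}},
        "source": text,
    }

def build_deck(slides):
    """Assemble a minimal notebook dict from (slide_type, markdown) pairs."""
    return {
        "cells": [markdown_cell(text, kind) for kind, text in slides],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }

deck = build_deck([
    ("slide", "# IPython slides\nWritten as markdown cells"),
    ("fragment", "* this bullet appears on the next keypress"),
    ("notes", "Speaker notes: press 's' in the slideshow to see these"),
    ("slide", "## Second slide"),
])

# Write the notebook to disk so nbconvert can turn it into slides.
with open("talk.ipynb", "w") as f:
    json.dump(deck, f, indent=1)
```

With a current Jupyter install, `jupyter nbconvert --to slides talk.ipynb --post serve` should render it as a reveal.js slideshow (in 2014 the same command was spelled `ipython nbconvert`).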

With the development of Colaboratory on Jupyter (the new direction for IPython development, with notebooks shared on Google Drive), you will also be able to have a live, generated IPython slideshow (see the SciPy 2014 video).

Friday, July 18, 2014

iPython Notebooks (collaborative real-time code sharing) on Google Drive

I heard about this a while ago:

Last week I saw the initial announcement and downloaded what is working now (the notebook lives in Google Docs, but for the time being the IPython executor runs on your laptop; the "in Chrome browser" instance wasn't working so well). This should become much more interesting once they get the IPython notebook engine running on Google App Engine (a work in progress, I believe).

You can read about it, and get it, here:

Getting an Educational Account on GitHub - Easy as pi

I just went to GitHub's education page to apply for “private” repository status.

I had to add and verify my UChicago email on my existing GitHub account (this was literally as fast as I could follow the link and reply to the confirmation email).

Then I got a note saying “we’ll get back to you within a week” — and immediately got this email:

Begin forwarded message:
Subject: Powerup get!
Date: July 18, 2014 at 2:40:29 PM CDT

Hey yarko, we have awesome news...
We've upgraded you to a Micro plan, which will be free for the next two years. After that, you'll get an email saying that your coupon is expiring. You can reapply for another coupon if you still have academic status. We don't have any collaboration limits, so any group projects you may encounter can be hosted via your account. If you need help getting started with Git and GitHub, check out:
Also, spread the word: we love giving educational discounts to students, teachers, administrators, and researchers! Please send them to
Have an Octotastic day!
- The GitHub Education Team

Friday, June 20, 2014

Standardized Effect Sizes for Power Calculations

Here is an excellent summary of how the various effect sizes are defined in power calculations - Cohen's d, Cohen's f and f-squared, and others. Courtesy of UCLA Institute for Digital Research and Education (IDRE).
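The three quantities named above can be written out directly. This is a minimal sketch using only Python's `statistics` module, assuming the usual pooled-SD definition of d and equal group sizes for Cohen's f; the function names are illustrative, and the UCLA page remains the reference for the exact definitions and their variants.

```python
import math
from statistics import mean, variance

def cohens_d(a, b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n1, n2 = len(a), len(b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)

def cohens_f(group_means, sigma_within):
    """Cohen's f for a one-way ANOVA, assuming equal group sizes:
    the SD of the group means around the grand mean, divided by
    the common within-group SD."""
    grand = mean(group_means)
    sigma_m = math.sqrt(sum((m - grand) ** 2 for m in group_means) / len(group_means))
    return sigma_m / sigma_within

def cohens_f2(r_squared):
    """Cohen's f-squared for regression: R^2 / (1 - R^2)."""
    return r_squared / (1 - r_squared)

print(cohens_d([5, 6, 7], [3, 4, 5]))      # 2.0
print(cohens_f([9, 11], sigma_within=2))   # 0.5
print(cohens_f2(0.5))                      # 1.0
```

For two equal-sized groups, f = d/2: means 9 and 11 with a common SD of 2 give d = 1 and, as printed above, f = 0.5.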

Tuesday, May 20, 2014

IPython Notebooks integrated with Google Docs


The downside? It runs in Google Chrome as a Native Client (NaCl) app, sandboxed rather than served.

Citable Code Repositories; free academic GitHub accounts

GitHub wants your research repositories (code or otherwise) to be citable, so they've made it possible to get Digital Object Identifiers (DOIs) for your repositories.

GitHub has also created a discount under which individual academic researchers can receive a free Micro plan with 5 private repos, while research groups can receive a free Silver plan with 20 private repos.
To set up an academic account on GitHub, first associate an academic email address with your account and then request a GitHub Education discount.

Read more about GitHub DOIs here:

Tuesday, May 6, 2014

Online course on "Reproducible Research"

Johns Hopkins has a 4-week MOOC on Coursera (a commercial MOOC delivery system) called "Reproducible Research", which looks at generating documentation and procedures within R.

The course is delivered by Roger Peng.

If you are interested in taking this, I would be happy to hear from you about it.

Note that UChicago has recently joined the edX consortium (also a MOOC delivery system, but open source; both edX and Coursera are Python/Django-based platforms).

It would perhaps be more interesting, and more accurate, to survey the essential elements of "Reproducible Research" and the existing tools for accomplishing each, with their strengths and weaknesses, as well as what is missing and still needed.

If you take this, please let me know and share your feedback and impressions.

Wednesday, March 26, 2014

Python for Informatics course (MOOC)

This 10-week course on Python basics starts April 10th, with a free eBook text and key course materials available for remix and reuse:

Monday, March 17, 2014

Code Citation for Research & Academia

MIT's handbook for students says:
What does it mean to “cite” a source?
In writing a computer program, it means: You use comments to credit the source of any code you adapted from an open source site or other external sources. Generally, providing a URL is sufficient. You also need to follow the terms of any open source license that applies to the code you are using.
Given the complexity and ever-changing nature of software, this is insufficient for research.

Clearly, code citation needs some education: what happens 5 years from now? Which branch or version were you citing? What were the specific versions of the ancillary code and libraries your solution depended on?
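One practical answer to the "which version?" questions is to record the exact commit and dependency versions alongside your results. This is a minimal sketch, assuming Python 3.8+ (for `importlib.metadata`) and that `git` is on the PATH; the function names and the dictionary layout are hypothetical, not any citation standard.

```python
import platform
import subprocess
from importlib import metadata

def package_versions(names):
    """Map each installed distribution name to its version (None if not installed)."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

def git_commit(repo_dir="."):
    """Return the commit hash checked out in repo_dir, or None if unavailable."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir, capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git missing, directory missing, or not a git repository
        return None
    return out.stdout.strip()

def provenance(packages, repo_dir="."):
    """Collect what a reader would need to re-run the code years later."""
    return {
        "python": platform.python_version(),
        "commit": git_commit(repo_dir),
        "packages": package_versions(packages),
    }
```

Calling, say, `provenance(["numpy", "pandas"])` at the end of an analysis and saving the result next to the outputs pins down the exact code and library versions a citation should point to.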

Here's a brief article on the need for citing code, now that research has become more computation- and data-intensive:


Wednesday, March 12, 2014

PyCon 2014 Talks - Chicago Warmup Session

ThoughtWorks has offered to host a series of talks by some of our local Chicago-area folks who are heading up on stage at PyCon 2014.

When: 7:00 PM on Wednesday, March 26. We'll have four great talks to share.

Where: 200 E Randolph St, Chicago, IL 60601

RSVP: at (required for access to the building).
  • Dave Beazley will present one of:
    • a tutorial, "Generators: The Final Frontier"
    • a conference talk, "Discovering Python."
  • Naomi Ceder, a conference talk:
    • "Farewell and Welcome Home: Python in Two Genders"
  • Pete Fein, a conference talk:
    • "Free Software, Free People"
  • Yarko Tymciurak, a lightning talk:
    • "Scrape: an interactive web scraping development environment using selenium"
Food and drink are sponsored by Rackspace.

Chicago mailing list

Sunday, March 9, 2014

Fast queries over large data

This past week at  I talked with Armando Fox about projects, MOOCs, and the connection/engagement of teams (including, but not limited to, students).

On the data front, this post from a little more than a year ago:

talks about Google's Dremel and quotes Armando:
‘Before Dremel, no one had really done a system that was that big and that fast. Usually, you have to do one or the other. The more you do one, the more you have to give up on the other. But with Dremel, they did both.’

The OpenDremel system, which the Wired article references, has been merged with Apache Drill, which has reached milestone 1.

While Drill is for interactive querying across HBase, MongoDB, and Cassandra, Apache Spark aims to significantly speed up MapReduce-style processing compared with plain Hadoop.
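To make the comparison concrete, here is the MapReduce model that Hadoop (and, in a more general form, Spark) distributes across a cluster, shrunk to a single-process Python sketch that counts words. It only illustrates the map, shuffle, and reduce stages; it is not Hadoop's or Spark's API, and the function names are hypothetical. Much of Spark's speedup comes from keeping intermediate data like these grouped pairs in memory between stages, rather than writing it to disk as Hadoop MapReduce does.

```python
from functools import reduce
from itertools import chain

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group the emitted values by key (on a cluster, by reducer)."""
    groups = {}
    for key, value in pairs:
        groups.setdefault(key, []).append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine each key's values, here by summing the counts."""
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}

documents = ["Drill queries data", "Spark speeds up MapReduce", "MapReduce on Hadoop"]
# Map each document independently, flatten the pairs, then shuffle and reduce.
counts = reduce_phase(shuffle(chain.from_iterable(map(map_phase, documents))))
print(counts["mapreduce"])  # 2
```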

Sunday, February 23, 2014

Wednesday, February 5, 2014

DARPA publishes all its open source code in one place

Thought I would share this article:

From that article:

The catalog launched with more than 60 projects, many of which have an emphasis on organizing large sets of data... and the MIT-developed dynamic language Julia.
If the research community responds well to the first iteration of the catalog, DARPA says it will continue to publish information about its projects, including software, publications, data, and experimental results.
The catalog listing is at - most of the source items appear to be hosted on GitHub, a few on (an open platform). There is also a catalog of papers.

Some of the software items were already known to me, so the only new thing there is the DARPA funding (?). Others are extensions of, or support for, known projects.

In some respects, it's an interesting collection.

Friday, January 10, 2014

Statistics in Medicine: Multiple hypothesis testing in genomics

A useful article just came out in Statistics in Medicine: a tutorial on multiple hypothesis testing in genomics. It describes methods from both conceptual and practical perspectives and includes some R code examples. Unfortunately, it does not discuss experimental design.
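Two of the most common corrections can be sketched in a few lines of Python (the article's own examples are in R; this is not taken from it): Bonferroni, which controls the family-wise error rate, and the Benjamini-Hochberg step-up procedure, which controls the false discovery rate. The function names and the example p-values are made up for illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate).

    Sort the p-values ascending, find the largest rank k with
    p_(k) <= k * alpha / m, and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            k = rank  # keep the largest passing rank (step-up)
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected

p = [0.035, 0.001, 0.205, 0.008, 0.039]
print(bonferroni(p))          # [False, True, False, True, False]
print(benjamini_hochberg(p))  # [True, True, False, True, True]
```

The example shows the step-up behavior: 0.035 fails its own threshold (rank 3, 0.03), but it is still rejected because the larger 0.039 passes at rank 4 (0.04).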

Thursday, January 9, 2014

Python Statistical Data Visualization

This might be of interest to some:

