Wednesday, December 11, 2013

On interactive python books, & hi-performing python=>javascript translators...

These may be of more interest to the programmers in our midst -

Python Data Structures - Interactive Book

One of a set of interactive books.

The source (should you want to write your own book this way) is at - note: it presents a book as a web2py application (web2py is notoriously non-Pythonic in its code base, but it is interactive and used in teaching web programming). In any case, the questions, the running code, and the other extensions do not appear to rely on web2py; they run in the browser, so you can have a reasonably interactive static site. For an example, see the overview page:

It's instructive to compare this way of having interactive python with (and blogging w/ same).

It's also instructive to compare this (rather complicated) way of running interactive python with the golang blogging tool (see, for example, - and the elegant source for the tool which accomplishes writing interactive code in a simple post -

Python to Javascript compiler - Wait, what?!

This may sound a little gimmicky, but it seems to be a simple parser - so even though it has only been in existence for under a month, the code seems like it should be functional.
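To give a feel for what a "simple parser" approach can look like, here is a minimal sketch of my own (not code from any of the projects mentioned here). It assumes Python 3.8+ and uses the standard `ast` module to turn a tiny subset of Python expressions into JavaScript source text. The names `to_js` and `translate` are made up for illustration.

```python
import ast

# Map Python AST operator node types to JavaScript operator tokens.
_BINOPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def to_js(node):
    """Translate a small subset of Python expression nodes to JavaScript text."""
    if isinstance(node, ast.Expression):
        return to_js(node.body)
    if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
        # Parenthesize every binary operation so precedence is explicit.
        return "(%s %s %s)" % (
            to_js(node.left), _BINOPS[type(node.op)], to_js(node.right))
    if isinstance(node, ast.Name):
        return node.id
    if (isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        return repr(node.value)
    if (isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and not node.keywords):
        # Plain positional calls only, e.g. f(x, 1).
        return "%s(%s)" % (node.func.id, ", ".join(to_js(a) for a in node.args))
    # Anything else is outside this toy subset.
    raise NotImplementedError("unsupported syntax: %s" % type(node).__name__)


def translate(source):
    """Parse one Python expression and return equivalent JavaScript text."""
    return to_js(ast.parse(source, mode="eval"))


print(translate("a + b * 2"))  # prints (a + (b * 2))
```

A real translator has to deal with statements, scoping, and the many places where Python and JavaScript semantics differ (division, truthiness, and so on), which this sketch ignores entirely.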

But it doesn't end there - there's competition afoot, with some pretty extreme performance claims:

which is understandable, since pythonium is a fork of pythonJS - and thus a little history (albeit slanted) might help:

Why ever write javascript?!

Friday, October 25, 2013

Thursday, October 17, 2013

Julia - analysis language, compared to R (psst: Python looks pretty good compared w/ R)

You might enjoy these links:

Efficiency and terseness, compared to R:

An example, in a blog post:

The source, and home page:

Take special note of Julia vs. Python vs. R performance here (I am surprised at Python vs. R):

(note: there is an IPython notebook backend for Julia)

and of an overview comparison with R:

John Myles White's talk about streaming data analysis, and managing memory:

Finally, here's a comparison of some data analysis tools:
  • Matlab (really Octave);
  • R
  • Julia
  • it seems they almost got to Python (just a mention)

Tuesday, October 15, 2013

Does this sound familiar? (a reality from any data, software or almost any other activity)

Apologies to MK - I was posting posts, and saw this draft from her, from earlier in the year (January).

It pointed to a good read - one that struck a particular chord.

In case she has rescinded her intent to share, I am solely responsible.

A good meta-parable, a good read:

ggplot for python & random forest regression in

If you already combine R and Python in IPython-notebooks, then you probably already use R-magic (calling R from ipython).

In that case, you may appreciate a port of ggplot2 to Python:

The immediately prior post compares performance and results of random forest regression in R and Python:

R and STATA in Statistical analysis of politics

Perhaps the proliferation of big data is leading the charge to analyze.

The question of any analysis lies in - what does your model say, and how valid is it?

Inspecting and running it yourself is always the bottom-line answer.

Here's one such (note sources for both R and STATA, if you browse the site):

This via a recent reference by Joshua Holland on -

I had imagined a site like this, only with some correlation to the purpose and benefactors of any particular bill (which would be easier if they were single topic items, of course). In any case, for all you "data" fans, here it is.

There are a few parts of this which I find interesting:

  • The Polarization of the Parties

The polarization diagram is best viewed in the context of this page, but since there are no tags to internally link to the various sections, I pulled out the image separately as a way to point to it.

Be sure to play with your own analysis - R-Studio ( or IPython Notebooks, with the "R magic" extension should do it ( and should help show the way).

Monday, September 9, 2013

Brief Architectural Overview of edX + getting started

At the Chicago Python Users Group this Thursday, 9/12  (details at 
  • An overview of edX by @yarkot(+Yarko Tymciurak)
    • A brief history of CBI (from 1960);
    • Python's solution carrying it forward;
    • current CBI / MOOC resurgence - and current options;
    • An architectural overview of edX
    • Getting started with edX
      • setting up a courseware development environment
      • setting up for platform development
        • grading modules, course plugins, theming
      • building your desktop environment for course development
    • Examples of extensions & graders
    • Where to go from here
Other presentations Thursday evening: 
  • Autoscale on Rackspace by @brian_curtin(+Brian Curtin ) 
  • Love: for techies: evolution, function, and leveraging for community building & problem solving @yarkot

rsvp at

Wednesday, August 14, 2013

Google Consumer Surveys

Google Consumer Surveys is a new online survey platform. Unlike traditional surveys, in which each participant is asked an entire questionnaire, Google Consumer Surveys asks just one or two questions in exchange for premium content access. The advantage: asking just a single question results in higher response rates. Demographics are inferred from browsing history, using the same algorithm as for ad placement.

For more detail on the methodology, see the white paper and the Pew Research Center evaluation.

Tuesday, August 13, 2013

Chicago: 10th International Conference on Health Policy Statistics, October 9-11, 2013

The 2013 International Conference on Health Policy Statistics (ICHPS), organized by the Health Policy Statistics Section of the ASA, will be held in Chicago on October 9-11, at the Palmer House Hilton. The early registration deadline is August 29th.

You can find registration and online program information on the ICHPS 2013 website.

Thursday, July 25, 2013

Quirrel compared to R

At OSCON today, John De Goes briefly presented a comparison between Quirrel & R.

Quirrel is a query language for unstructured data.  It uses an enhanced version of JSON, and will work well with MongoDB (binary JSON) databases.   Currently, a demo is available at

They expect to open-source it in the next month or so.   It will build / run "out of the box" on Linux / Mac computers.

Some side-by-side code comparisons appear in the slides on SlideDeck.

Wednesday, June 26, 2013

Wednesday, May 15, 2013

Poster Printing Options

Here are some places that I've found to get scientific posters printed:


I've either had a good experience and/or have heard good things about the first two places. The third and fourth places are right on campus. Don't know much about the fifth place.

Sunday, April 21, 2013

I suspect that some of you have already seen E.O. Wilson's recent WSJ editorial on math and science, but in case you haven't, you might want to check it out.

Friday, April 12, 2013

Data sets for the City of Chicago

Stumbled upon the City of Chicago data portal - lots and lots of data (267 data sets as of today), as well as maps are available:

Popular collections include Food Inspections, CTA datasets and Crime statistics. WOW.

Friday, March 29, 2013

PyCon-2013 Videos & Slides

For anyone interested in this year's Python Conference talks:

#pycon videos and slide decks

Data Science Summer Fellowship

From that page:

Who we're looking for
We're looking for PhD, masters, and advanced undergraduate students in the computer science, statistics, and the computational and quantitative sciences. If you're an amazing software engineer with a serious interest in data science, we want to hear from you too.
You’ll need statistics, programming, and data skills - but you don’t need to be an expert in every area.
Most of all, we want people who are passionate about using data for social impact.

I'm considering volunteering as a (programming) mentor - Early June to late August.

Thursday, February 14, 2013

Some useful R links

As I was searching for help on some R functions, I came across a very useful page from the National Parks Service (NPS) - Using R Statistical and Graphics Tools for Natural Resource Stewardship Science. Turns out NPS uses R for data management and report generation - how cool is that!

In particular, they have a very helpful overview of R's Data Import/Export, Data Manipulation, Statistical Formulas and several other advanced topics.

And as a bonus, here is a link to an R Reference Card - very handy.

Wednesday, February 6, 2013

Must See TV, er Notebooks

Too diverse and joyful not to share: IPython notebooks - a thing to behold.

Here are some interesting (some!) examples:

Thursday, January 24, 2013

CONSORT Statement Template

Ronán Conroy has created an OmniGraffle template based on the CONSORT guidelines:

A real timesaver when putting together a flow diagram for a study. Perhaps someone who doesn't work on a Mac (and therefore can't use OmniGraffle) could create a similar template in yEd?

-- Phil

Wednesday, January 23, 2013

Open Research Publications

Last November, I posted this link, but didn't publish the blog article - it seemed there needed to be more:

 More publishers move towards CC-BY licence for OA articles - Research Information

Several things have caught my attention since, suggesting a trend.

Is it a trend?  Is it likely to be useful?   How might it impact us?   How can it benefit us?

Publishing in an open way is one facet;

Software processes for research that is leverageable ("reproducible") are another topic - one I will say more about at a later date.
