Wednesday, March 26, 2014

Python for Informatics course (MOOC)

This 10-week course on the basics starts April 10th, with a free eBook text and key course material available for remix / reuse:

Monday, March 17, 2014

Code Citation for Research & Academia

MIT's handbook for students says:
What does it mean to “cite” a source?
In writing a computer program, it means: You use comments to credit the source of any code you adapted from an open source site or other external sources. Generally, providing a URL is sufficient. You also need to follow the terms of any open source license that applies to the code you are using.
Given the complexity and ever-changing nature of software, this is insufficient for research.

Clearly, citing code calls for some education - what happens 5 years from now?  Which branch / version were you citing?  What were the specific versions of the ancillary code and libraries your solution depended on?
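
One way to make those questions answerable later is to store a small record of the environment next to your results. The following is only a minimal sketch, assuming Python 3.8+ (for `importlib.metadata`) and a `git` command on the PATH; the helper names `git_commit` and `environment_record`, and the package names passed in, are hypothetical and not part of any citation standard.

```python
import json
import platform
import subprocess
from importlib import metadata


def git_commit():
    """Return the current git commit hash, or None if it cannot be determined."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None  # git is missing, or we are not inside a repository
    return result.stdout.strip()


def environment_record(packages):
    """Describe the interpreter, code version, and the named package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "commit": git_commit(),
        "packages": versions,
    }


# The package names here are examples only; list what your analysis imports.
record = environment_record(["pip", "setuptools"])
print(json.dumps(record, indent=2, sort_keys=True))
```

Saving this output alongside a paper's data gives a reader the commit and library versions to cite, rather than just a URL.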

Here's a brief article on the need for citing code as research has become more computation- and data-intensive:


Wednesday, March 12, 2014

PyCon 2014 Talks - Chicago Warmup Session

ThoughtWorks has offered to host a series of talks for some of our
local Chicago area folks heading up on stage at PyCon 2014.

When: 7:00 PM on Wednesday, March 26. We'll have four great talks to share.

Where: 200 E Randolph St, Chicago, IL 60601

RSVP: at (required for access to building).
  • Dave Beazley will present one of:
    • a tutorial, "Generators: The Final Frontier"
    • a conference talk, "Discovering Python"
  • Naomi Ceder, a conference talk:
    • "Farewell and Welcome Home: Python in Two Genders"
  • Pete Fein, a conference talk:
    • "Free Software, Free People"
  • Yarko Tymciurak, a lightning talk:
    • "Scrape: an interactive web scraping development environment using selenium"
Food and drink are sponsored by Rackspace.

Chicago mailing list

Sunday, March 9, 2014

Fast queries over large data

This past week at  I talked with Armando Fox about projects, MOOCs, and the connection/engagement of teams (including, but not limited to, students).

On a data front, this post from a little more than a year ago:

talks of Google's Dremel where Armando said:
‘Before Dremel, no one had really done a system that was that big and that fast. Usually, you have to do one or the other. The more you do one, the more you have to give up on the other. But with Dremel, they did both.’

The OpenDremel system, which the Wired article references, has been merged with Apache Drill, which has reached milestone-1.

While Drill is for searching across HBase, Mongo, and Cassandra, Apache Spark aims to make Map-Reduce-style processing significantly faster than simply using Hadoop.
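
For readers new to the Map-Reduce pattern mentioned above, here is a toy word count in plain Python. It is a single-process sketch of the idea only, not the Hadoop or Spark API; the function names and sample documents are made up for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group the emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine each key's values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


# Sample input, invented for this example.
documents = ["big and fast", "fast and big data"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts)
```

In a real cluster the map and reduce steps run in parallel across machines, and the shuffle moves data between them.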
