March 5, 2017

Python is the language of choice for APIs

“Software is eating the world” is a common refrain in Silicon Valley. When you dig a little deeper, you find that APIs make that happen. API stands for application programming interface. APIs are how software talks to other software. APIs are the special sauce that enables you to push a button on your phone to quickly summon a ridesharing service to your precise location.


Why is Python the language of choice for APIs? One reason is that it is readable. Python is designed to be easily read. Not only are the built-in parts of the language human readable, there are also strong cultural norms that Python code should be understood by everyone at first glance. Thus, more time is spent on "what" the code is trying to do than on "how" the code is written.

Another reason is “Low bar & high ceiling.” It is easy to be productive quickly, but the language is not inherently limiting. You can write your first Python program after a couple of hours. People can be proficient in Python within a week or two. Yet, it would take a lifetime to fully understand The Zen of Python. Python is often used to write code quickly to test new ideas, but that same code can be scaled to run large systems.

In particular, Python is the dominant language for data-centric APIs. The two packages that have swept through the Data Science community are Apache Spark and TensorFlow. Spark is a Big Data processing package and TensorFlow is a Deep Learning package. Both packages have APIs in several languages, but Python is common to both. However, neither package is natively implemented in Python! Apache Spark is implemented in Scala; TensorFlow is implemented in C++. Python is just a friendly user interface that manages the underlying complexity.

Python allows Data Scientists to focus on their work, not on the implementation details.

December 23, 2016

Digital sandwich for 21st Century education

Digital is a core aspect of students' lives and can no longer be an add-on to modern education. A digital course does not mean taking an existing course, converting everything to PDFs, and putting them on a static website. Courses should be digital-first, with digital elements enmeshed into the educational experience.

Here is what a "digital sandwich" looks like in my classroom:

1. Before class (digital)

I am, on my best day, a modestly good lecturer (I have other strengths that I leverage to create a learning environment for my students). The Internet has the world's best lectures. It would be a disservice to my students not to use the best available resources for them. My role as an instructor is to create a curated list of the best lectures.

I extend pre-classroom preparation materials to include book chapters, technical papers, and blog posts.

My secret sauce is digital workbooks. They are interactive companions that accompany the preparation materials. They are typically Jupyter Notebooks. There are embedded videos or links followed by questions and coding activities that check for understanding. This transforms the student from a passive observer into an active participant in the educational process. Since the workbooks are digital, they can be automatically scored (often with immediate feedback to the students).

2. In class (in-person)

I spend the limited and precious classroom time working through complex concepts, answering student questions, completing worked examples, and sharing personal anecdotes that connect the abstract concepts to the student experience. The goal is to provide the unique experience and social learning environment that cannot be had digitally.

3. After class (digital)

Much of the learning takes place outside the classroom. It takes time for a student to integrate new material into their previous understanding of the world. My goal is to continue to offer support and guidance after a student leaves the physical classroom.

The primary medium for digital support is group chat. As for everyone in Silicon Valley, my flavor of the month is Slack. In my experience, group chat is more effective and efficient than email or forums. Group chat is an informal conversation with lower stakes than email, and it offers the real-time engagement missing from forums. Group chat is also a platform for peer-to-peer learning. The many-to-many connections remove me as a bottleneck for answering questions. Group chat is also a frictionless medium to share interesting links that show the real-world application of classroom concepts.

Another aspect of post-class digital support is updating lecture materials and exercises. I demonstrate that the course is a joint production of the students and the instructional team. I update lecture materials and personalize exercises on the fly. I fix my (many) typos, add materials, and clarify exercises.

The digital sandwich is my interpretation of Blended Learning, a refinement of the Flipped Classroom idea.

December 9, 2016

Monte Carlo for the win

Faster Space-Efficient Algorithms for Subset Sum, k-Sum and Related Problems shows the power of probabilistic thinking.

Computer Science has decades of deterministic baggage to overcome to move forward, especially to handle the speed and size of modern data.

Data Science's superpower is uniting Computer Science fundamentals with Statistics to create novel solutions to theoretical and applied problems.

Academic disciplines change "one tombstone at a time". Data Science doesn't have an Old Guard telling the field what the rules are. There is freedom to play everywhere.

November 21, 2016

Word2vec algorithm: Made as simple as possible, but no simpler

Recently, I gave a 3-hour workshop and a 30-minute talk on word2vec. I would be delighted to talk about it again. LMK...

November 19, 2016

My life's mission



I'm on a mission to democratize technical data skills (i.e., math, statistics, programming, and data analysis), what today is bundled together as "Data Science".

Science and technology (in the broadest sense possible) are very effective in improving the quality of life for people. When smart people solve hard problems then share their solutions with the world, the world is a better and more interesting place. I hope to help that process by creating more highly skilled people who can both ask and answer the right questions.

I have been on this path for the last 13 years. I have done this as a Lecturer, Researcher, Tutor, Data Scientist, Mentor, Professor, etc. However, my mission transcends those labels of convenience.

Previously, I saw the traditional academic system as the greatest force multiplier for myself. I could reach and help change people in a deep way through specific and personal connections. However, the incentives of that system are badly bent (possibly broken). I have turned to a nontraditional academic system, including community organizing and Open Source software. For now, that ecosystem works better to help people learn and grow in "Data Science". In the future, there will be even better systems and I'll be there.

May 12, 2014

A couple of old model fitting tricks

I picked up many modeling tricks in graduate school while working in a computational neuroscience laboratory. We were doing machine learning, but we didn't know it at the time. We called it "automated model fitting." We used custom (and very finicky) algorithms in MATLAB. Now people are blessed with scikit-learn. These ideas might help out for edge cases:
  • Have good "fake" data. "Fake" data allows for testing of the algorithms. There is an art to dummy data: it should be noisy but still allow the models to converge.
  • Linearly transform the data so all dimensions are within the same order of magnitude. Some models have trouble with weighting noise parameters on different scales.
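
Both tricks can be sketched in a few lines. This is only an illustration under my own assumptions, using the Python standard library rather than the MATLAB code or scikit-learn mentioned above. The function names (make_fake_data, fit_line, standardize), the straight-line model, and all parameter values are hypothetical choices made for the example.

```python
import random
import statistics


def make_fake_data(n, slope, intercept, noise_sd, seed=0):
    """Draw noisy points from a known line so a fit can be checked against the truth."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [slope * x + intercept + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares for a single feature; returns (slope, intercept)."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def standardize(column):
    """Linearly rescale a column to zero mean and unit standard deviation."""
    mean = statistics.fmean(column)
    sd = statistics.pstdev(column)
    return [(value - mean) / sd for value in column]


# Trick 1: fit "fake" data whose true parameters are known in advance.
xs, ys = make_fake_data(n=500, slope=2.0, intercept=-1.0, noise_sd=0.5)
fitted_slope, fitted_intercept = fit_line(xs, ys)

# Trick 2: columns that differ by six orders of magnitude land on one scale.
small = standardize([0.001, 0.002, 0.003, 0.004])
large = standardize([1000.0, 2000.0, 3000.0, 4000.0])
```

If the fitted slope and intercept do not land close to the values used to generate the data, the fitting code is the suspect, not the data. Standardizing is one of several possible linear transforms; min-max scaling would serve the same purpose.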

May 5, 2014

My recent failures

I fail all the time. I recently failed at completing several MOOCs. I'm one of the unwashed masses that started but didn't finish.

Even by just attempting, I learned something from each course. Given that I value life-long learning, improvement is my measure of success. Completion is easier to quantify and a more commonly accepted measure of success. A "factory" model of education values completion. A half-completed commodity has little value. A factory stops work on a product and ships it. My professional life is continuous delivery. Any incremental improvement adds value to a continuous delivery system. I picked up new viewpoints on existing concepts that improved my understanding of the world through "failing" at MOOCs. I didn't get credit but I got value.