Emacs for Data Science

Robert Vesco

June 18, 2015

Robert Vesco
Data Scientist, Bloomberg LP
Insight Fellow, 2015
University of Maryland
Robert Vesco is an alumnus of the January 2015 session of Insight in New York City. He recently received his Ph.D. in Management from the University of Maryland. In the following post, which originally appeared on his personal blog, Robert discusses Emacs as a tool for data scientists. Robert is now a data scientist at Bloomberg LP.

If you want an editor that works with R, Python, SAS, Stata, SQL and almost any other data science language, if you want an editor with IDE-like features, if you want an editor that works on any platform as well as in the terminal, if you're a fan of literate programming, or if you want an editor that is highly customizable and will be around after most editors have come and gone, then you'd be hard pressed to find anything better than Emacs.

Each programming language has a text editor or IDE that is well suited for that language. If you work exclusively in R, you might want to work in RStudio. If you work in Python, you might be tempted by Spyder. Chances are there is a specialized IDE for whatever language you typically work in. But that's the rub. What if you want to work in another language? Or combine languages? You end up using several IDEs, but not knowing any of them well. Plus, once they fall out of favor or stop being updated, your hard-won knowledge is lost. At the other end of the spectrum there are text editors like Notepad++ and Sublime Text. These work with just about any language you can imagine, and with some add-ons you can get additional features, but they tend to be limited to certain platforms and customization is often non-trivial.

A modern data scientist often has to work on multiple platforms with multiple languages. Some projects may be in R, others in Python. Or perhaps you have to work on a cluster with no GUI. Or maybe you need to write papers with LaTeX. You can do all that with Emacs and customize it to do whatever you like. I won't lie though. The learning curve can be steep, but I think the investment is worth it.

Below are some key features that I think make Emacs an excellent editor for any data scientist.

IDE-like features

For most programming languages, you get out-of-the-box syntax highlighting. Packages like ESS and Elpy provide additional features like autocompletion, documentation and debugging capabilities. The number of IDE features available will vary by language, but at minimum there is probably syntax highlighting and some form of autocompletion.

Figure 1: "Autocompletion"



One of the things that I enjoy is easy access to help and function parameters … which often also come with autocomplete.

Figure 2: "Help for Functions"



Figure 3: "Parameter help for Function"



Enough with the print statements already. Debug that R and Python code interactively!

Figure 4: "Interactive debugging with conditional breakpoint"



One of the features that first sold me on Emacs was interactive commands. With a keyboard shortcut you can send a buffer, function, paragraph or line to the interpreter. Let me be clear: you don't even have to highlight the code. This saves you a ton of time when you're doing statistical analysis.[1]

Figure 5: "Interactive Commands"
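If your language's package doesn't ship a command like this, a rough version is not hard to write yourself. The sketch below is only an illustration, not how ESS or Elpy do it: `my/repl-buffer-name` and `my/send-line-or-region` are made-up names, and it assumes the interpreter is already running in a comint buffer called `*R*`.

```elisp
;; Sketch: send the active region, or the current line when no region
;; is active, to an interpreter running in a comint buffer.
;; `my/repl-buffer-name' and `my/send-line-or-region' are hypothetical.
(require 'comint)

(defvar my/repl-buffer-name "*R*"
  "Name of the buffer whose process receives the code (an assumption).")

(defun my/send-line-or-region ()
  "Send the active region, or the current line, to the REPL process."
  (interactive)
  (let* ((proc (get-buffer-process my/repl-buffer-name))
         (region-p (use-region-p))
         (code (if region-p
                   (buffer-substring-no-properties (region-beginning)
                                                   (region-end))
                 (buffer-substring-no-properties (line-beginning-position)
                                                 (line-end-position)))))
    (unless proc
      (user-error "No process running in %s" my/repl-buffer-name))
    (comint-send-string proc (concat code "\n"))
    ;; After sending a single line, move down so that repeated key
    ;; presses step through the script.
    (unless region-p
      (forward-line 1))))

;; C-c followed by a letter is reserved for user bindings.
(global-set-key (kbd "C-c j") #'my/send-line-or-region)
```

In practice you would rarely need this for R or Python, since the language packages already provide their own, more polished, send commands.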



SQL Too

Do you work with databases? Many of the same benefits mentioned above also apply to SQL. Work with SQLite, PostgreSQL, MySQL and other databases interactively. Do you have a long SQL statement you are debugging? No problem. Iterate quickly.

Figure 6: "Interactive SQL"
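As a rough idea of what the setup can look like, here is a minimal sketch using the built-in sql.el library. The connection name, server, user and database below are placeholders, not a real configuration.

```elisp
;; Sketch: register a named connection for the built-in sql.el.
;; All connection details here are hypothetical placeholders.
(require 'sql)

(setq sql-connection-alist
      '((analytics-dev
         (sql-product 'postgres)
         (sql-server "localhost")
         (sql-port 5432)
         (sql-user "analyst")
         (sql-database "analytics"))))

;; M-x sql-connect RET analytics-dev RET opens an interactive session.
;; From a buffer in sql-mode, C-c C-r (sql-send-region) and
;; C-c C-c (sql-send-paragraph) send code to that session.
```

With one connection defined per database, switching between them is a matter of calling `sql-connect` again with a different name.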



Org mode / Literate Programming

Do you write publications? Do you want to keep your code and paper together? Are you a believer in reproducible research? With Emacs you can put any language you want in your document. RStudio allows this too, but there you're limited to just R and LaTeX.

Figure 7: "Literate Programming: Code & Stata"
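Code blocks in a document are only executed for languages you have switched on. A minimal sketch of that switch, assuming Org mode is installed and that ESS is available for the R entry (Org's R support relies on it), might look like this:

```elisp
;; Sketch: enable execution of code blocks for a few languages.
;; Which languages to list is a personal choice; these are examples.
(require 'org)

(org-babel-do-load-languages
 'org-babel-load-languages
 '((emacs-lisp . t)
   (R . t)
   (python . t)
   (sql . t)))
```

Other languages are handled by adding further entries to the list, provided the corresponding support library is installed.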



Do you need LaTeX? No problem.

    #+BEGIN_LaTeX
    \frac{3}{4}
    #+END_LaTeX

The key to this magic is a monster package called Org mode. It is one of Emacs's killer features. You can also use it to organize your code … or your life.

$$\frac{3}{4}$$

Terminal/remote editing

Sometimes you need to remote into a server. Or perhaps you are working on a cluster with no GUI and you need to interactively debug your scripts. Emacs runs in a terminal too, with the same key bindings and packages.

Figure 8: "Works in the terminal just as well"
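Besides starting Emacs inside the terminal with `emacs -nw`, you can also stay in your local Emacs and open remote files through TRAMP, which ships with Emacs. The sketch below is only an illustration: the user, host and path are invented, and `my/open-cluster-script` is a made-up helper name.

```elisp
;; Sketch: open a file on a remote machine over SSH with TRAMP.
;; The user, host and path are hypothetical placeholders.
(require 'tramp)

(setq tramp-default-method "ssh")

(defun my/open-cluster-script ()
  "Open a (hypothetical) analysis script on a remote machine."
  (interactive)
  (find-file "/ssh:analyst@cluster.example.org:/home/analyst/model.R"))
```

The same `/ssh:user@host:/path` syntax works when typed directly at the `C-x C-f` prompt.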



Interacting with the shell

Is there a terminal command you wish you could run? In Emacs you can run terminal commands easily. But what makes this feature super cool is that it can operate on your text. You can select a region of code, send it to a terminal command and have its stdout replace the text in your buffer!

Figure 9: "Using sed to find and replace text in the buffer"
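Interactively this is `C-u M-|` (`shell-command-on-region` with a prefix argument). If you run the same command often, you can wrap it in a function. The sketch below assumes `sed` is on your path; the function name and the particular substitution are just examples.

```elisp
;; Sketch: pipe the selected region through sed and replace the region
;; with the command's output. `my/semicolons-to-commas' is a
;; hypothetical name, and the substitution is only an example.
(defun my/semicolons-to-commas (start end)
  "Replace every semicolon in the region with a comma, using sed."
  (interactive "r")
  ;; Arguments: START END COMMAND OUTPUT-BUFFER REPLACE.
  ;; A non-nil REPLACE puts the output in place of the region.
  (shell-command-on-region start end "sed 's/;/,/g'" nil t))
```

Any command that reads stdin and writes stdout (sort, awk, a Python script) can be dropped in place of the sed call.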



Rectangle Editing

Data scientists often work with tabular data. Sometimes you may want to delete or move a column around. Or perhaps there is a block of white space you need to change.

Figure 10: "Using rectangle mode to alter blocks of text"
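The interactive route is `C-x SPC` to mark a rectangle (Emacs 24.4 and later), then `C-x r k` to kill it, `C-x r y` to yank it somewhere else, or `C-x r t` to replace it with text. The same commands can be called from Lisp. Here is a small self-contained sketch on a made-up table; `my/demo-delete-column` is a hypothetical name.

```elisp
;; Sketch: delete the middle column of a small, made-up table using the
;; built-in rectangle commands.
(require 'rect)

(defun my/demo-delete-column ()
  "Return a toy table as a string with its `age' column removed."
  (with-temp-buffer
    (insert "id  age  city\n"
            "1   34   NYC\n"
            "2   27   SF\n")
    ;; One corner of the rectangle: first line, column 4.
    (goto-char (point-min))
    (move-to-column 4)
    (let ((start (point)))
      ;; Opposite corner: third line, column 9.
      (forward-line 2)
      (move-to-column 9)
      (delete-rectangle start (point)))
    (buffer-string)))
```

Evaluating `(my/demo-delete-column)` returns the table with only the id and city columns left.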



Everything at your fingertips

Emacs has numerous packages that allow you to search and find files, functions and anything else that you can imagine. But by far the best is Helm. With just a few keys you can instantly find what you are looking for. I couldn't do it justice, but this demo gives you a taste of the amazing things it can do.

http://tuhdo.github.io/helm-intro.html
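If you want to try it, a minimal setup might look like the sketch below. It assumes Helm was installed through the package manager (so its commands are autoloaded), and the key choices simply replace a few standard commands with their Helm counterparts.

```elisp
;; Sketch: turn on Helm and route a few everyday commands through it.
;; Assumes the helm package is already installed.
(require 'helm)

(helm-mode 1)

(global-set-key (kbd "M-x")     #'helm-M-x)        ; run commands
(global-set-key (kbd "C-x C-f") #'helm-find-files) ; open files
(global-set-key (kbd "C-x b")   #'helm-mini)       ; buffers and recent files
```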

Any feature you want

Perhaps you're wedded to Sublime's multiple cursors? You can get them: http://emacsrocks.com/. Or perhaps you're a long-time Vim user? Evil Mode gives you the editing power of Vim with the utility of Emacs. If you're a git user, Emacs has Magit, which makes working with git a joy. If there is something that Emacs doesn't have, check for packages. Failing that, Emacs is the most customizable editor you will find, so you can build it yourself. Almost everything about it can be made to fit your particular workflow.
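For a sense of how little configuration these take, here is a sketch of an init-file fragment. It assumes the evil, magit and multiple-cursors packages are already installed; the key bindings are common choices, not requirements.

```elisp
;; Sketch: enable a few popular packages (assumed to be installed).

;; Vim-style modal editing.
(require 'evil)
(evil-mode 1)

;; Open the Magit status buffer for the current repository.
(global-set-key (kbd "C-x g") #'magit-status)

;; Multiple cursors: one cursor per line of the region, or one more
;; cursor at the next occurrence of the selected text.
(global-set-key (kbd "C-S-c C-S-c") #'mc/edit-lines)
(global-set-key (kbd "C->")         #'mc/mark-next-like-this)
```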

30+ years old and a large user base

Emacs has been around a long time. Code that was written a decade ago mostly still works. And every year it's getting better. Emacs 24 in particular is amazing. If you tried Emacs years ago, you should give it another try. It now has package management built in, so you can easily add and try out packages. Importantly, there is no sign that Emacs is going away anytime soon, and it's free. It will likely be around for at least another decade, if not more.
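To give a feel for the built-in package manager, here is a minimal sketch that adds the MELPA archive and installs a handful of packages if they are missing. The package list is only an example, and `my/packages` is a made-up variable name.

```elisp
;; Sketch: bootstrap a few packages with the built-in package.el.
;; The list of packages is an example, not a recommendation.
(require 'package)

(add-to-list 'package-archives
             '("melpa" . "https://melpa.org/packages/") t)
(package-initialize)

(defvar my/packages '(ess elpy magit helm)
  "Packages to install on startup if they are missing (hypothetical).")

;; Fetch the archive contents once, on a fresh install.
(unless package-archive-contents
  (package-refresh-contents))

(dolist (pkg my/packages)
  (unless (package-installed-p pkg)
    (package-install pkg)))
```

You can also browse and install packages interactively with `M-x list-packages`.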

So what are the downsides?

Legacy code on the intertubes confuses people

Emacs has been around a long time. Emacs 24 was a huge improvement, but it also broke a lot of things. The same goes for Org mode between versions 7 and 8. A lot of stuff on the intertubes will lead you astray and leave you frustrated if you're not aware of this.

Emacs-lisp for customization

I actually enjoy working with Lisp because it is so different from the other languages I work with. However, many others would prefer using a language like Python.

Not noob friendly

Emacs is not for the faint of heart. Depending on where you install it, you may have no GUI to guide you at all. And even if you do, it's likely to be spartan. Moreover, while it can be customized quickly to meet your needs, people who are starting off with Emacs may fail to see its appeal.

To help ease this process though, there are several starter packages to enable useful features out-of-the-box. For scientists, Kieran Healy's starter package might be useful: http://kieranhealy.org/resources/emacs-starter-kit/

Another useful package is Prelude: https://github.com/bbatsov/prelude

If you're on a Mac, I've heard Aquamacs will keep you warm and comfy: http://aquamacs.org/

Most of these will give you the power of Emacs, quickly. Personally, I prefer to build my Emacs up from scratch so it does what I want it to do and no more. But these packages are great ways to get a feel for its power.

Multiple packages

For data scientists, Emacs comes with many tools out of the box, but there are a variety of packages that are focused on specific languages. For those working in R, Stata, Julia or SAS, ESS (http://ess.r-project.org/) is essential. It provides a whole framework for working with statistical applications.

Unfortunately, if you decide you want to work with Python or Scala, be prepared to experiment with several different packages.

For instance, while Emacs has basic Python support, you probably want linting, refactoring or other useful features. Many packages have tried to implement these features, some better than others. Personally, I like Elpy, https://github.com/jorgenschaefer/elpy, but it's not perfect. For IPython, there is the IPython Notebook for Emacs, https://github.com/millejoh/emacs-ipython-notebook/, or IPython for Org mode, https://github.com/gregsexton/ob-ipython.
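Once you have picked your packages, switching them on is usually the easy part. The sketch below assumes ESS and Elpy are already installed. The exact setup lines have changed between versions of both packages, so treat this as a starting point and check each package's documentation.

```elisp
;; Sketch: enable ESS (R, Stata, Julia, SAS) and Elpy (Python).
;; Both packages are assumed to be installed already.

;; Load ESS so that R and other statistics files open in its modes.
(require 'ess-site)

;; Turn on Elpy's Python development features.
(elpy-enable)
```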

So while there is likely a package, or several, for any language you want, the downside of options is that you have to wade through them. It can be painful sometimes.

What am I missing?

While I tried to include most of the features that I think would appeal to data scientists, let me know if I missed any killer feature and I'll try to include it here. https://twitter.com/robertvesco

Footnotes:

[1] Like many other features, this will depend on the package you install. That said, it's easy to implement this feature for your favorite language.
