Vim for datascience


There are plenty of tutorials out there on making Python and vim work together beautifully. The point of this one is to provide a few simple lines to add to your .vimrc file, without worrying too much about installing more (vim) packages. Having struggled myself to get these lines working, I will explain what each of them means.

If you have more tricks for your common datascience tasks in vim, let me know, I will add them!

Introduction

Summary

Here are the things you can do with the following settings:

  • Map the common imports to keypresses,
  • Preview the contents of a csv file in a vim pane,
  • Format JSON files with jq or Python files with autopep8,
  • Quickly add print() statements,
  • Fix the usual copy-paste into vim (bonus).

If you are familiar with vim, you will know that you can do pretty much everything with a sequence of keypresses. Recording these keypresses and mapping them to a single key simply factors out whatever you do repeatedly ;)

Requirements

  • Python packages: pandas, autopep8, numpy.
  • Packages: jq.
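Before adding the mappings, it can help to check that these requirements are actually available. The following is a minimal sketch and not part of the original setup: the function name check_requirements and the "python:"/"tool:" key prefixes are made up for illustration, and it relies only on the standard library (importlib.util.find_spec and shutil.which).

```python
import importlib.util
import shutil


def check_requirements(modules=("pandas", "numpy", "autopep8"),
                       tools=("jq", "autopep8")):
    """Return a dict mapping each requirement to True (found) or False (missing)."""
    status = {}
    for name in modules:
        # find_spec returns None when a top-level module cannot be found.
        status[f"python:{name}"] = importlib.util.find_spec(name) is not None
    for name in tools:
        # shutil.which returns None when the executable is not on PATH.
        status[f"tool:{name}"] = shutil.which(name) is not None
    return status


if __name__ == "__main__":
    for requirement, found in check_requirements().items():
        print(f"{requirement}: {'ok' if found else 'MISSING'}")
```

autopep8 is checked twice on purpose: once as an importable package and once as a command, since the mapping further down calls it from the shell.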

Data preview

The function in action

I start with the hardest but most satisfying command:

autocmd FileType python map <C-F9> va"y:!python -c "import pandas as pd; df=pd.read_csv('<C-R>"', nrows=5); print(df)" > tmp.tmp<CR>:sv tmp.tmp<CR>:resize 8<CR>

It shows the first five rows of the .csv file whose path is inside the quotes surrounding the cursor, in a new vim pane.

Details

autocmd FileType python is just saying that the mapping which follows will only apply to python files. This avoids accidental application to other languages.

map <C-F9> means map Ctrl + F9 to the following sequence of keypresses

va"y is a way to tell vim:

  • select v
  • around a
  • quotes "
  • copy y (to register)

:! runs a shell command from within vim, as if you had typed it in your terminal.

python -c "import pandas as pd; df=pd.read_csv('<C-R>"', nrows=5); print(df)" is a Python one-liner. The only trick here is <C-R>", which pastes the contents of vim's default (unnamed) register into the command line, that is, what we stored when “pressing” va"y.

> tmp.tmp<CR>:sv tmp.tmp<CR>:resize 8<CR> redirects the output of the Python print statement to a temporary file (tmp.tmp), opens that file read-only in a new split (with :sv) and resizes the split to 8 lines.
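Since va" yanks the quote characters along with the path, it is worth checking why the resulting command is still valid. The sketch below rebuilds the command line outside vim. It is an illustration only: build_preview_command is a hypothetical helper that mimics what <C-R>" does, and shlex.split stands in for the shell's quote handling, which it only approximates.

```python
import shlex

# Shell command typed by the mapping; {register} marks the spot where
# <C-R>" pastes the contents of vim's unnamed register.
TEMPLATE = (
    'python -c "import pandas as pd; '
    "df=pd.read_csv('{register}', nrows=5); "
    'print(df)" > tmp.tmp'
)


def build_preview_command(register):
    """Substitute the yanked text into the command, as <C-R>" would."""
    return TEMPLATE.format(register=register)


def python_source_seen_by_shell(command):
    """Return the argument following -c after shell-style quote removal."""
    argv = shlex.split(command)
    return argv[argv.index("-c") + 1]


if __name__ == "__main__":
    # va"y yanks the quotes too, so the register holds "data.csv" with quotes.
    command = build_preview_command('"data.csv"')
    print(command)
    print(python_source_seen_by_shell(command))
```

With or without the surrounding quotes in the register, the Python source ends up the same here, because the yanked double quotes simply close and reopen the shell string. A path containing spaces would not survive this.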

Beautifying files

Python

This one needs autopep8 installed. Otherwise, it will just remove everything in the file you are editing…

autocmd FileType python map <F4> :!autopep8 --in-place --aggressive %<CR>

It will format your Python script in place so that it follows the PEP 8 style guide.

JSON

This one needs to have jq installed. It is a tool to manipulate JSON files easily and I strongly recommend using it.

autocmd FileType json map <F4> :%! jq .<CR>

Upon pressing <F4>, it will indent your file beautifully.
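If jq is not available, Python's standard json module can produce a similar result. This is a rough sketch rather than a drop-in replacement: pretty_json is a hypothetical helper, and the two-space indent is chosen to resemble jq's default output, which can still differ in details such as number formatting.

```python
import json


def pretty_json(text, indent=2):
    """Parse a JSON document and return it re-indented, keys in original order."""
    data = json.loads(text)
    # ensure_ascii=False keeps non-ASCII characters readable instead of \u escapes.
    return json.dumps(data, indent=indent, ensure_ascii=False)


if __name__ == "__main__":
    print(pretty_json('{"name": "vim", "tags": ["editor", "modal"], "stars": 5}'))
```

The standard library also ships a command-line version of this, python -m json.tool, which could be used as the filter in a similar mapping.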

Python

Execution

To quickly execute the script I am working on, I use these two lines (they work whether I am in normal or insert mode):

autocmd FileType python map <F5> :wall!<CR>:!python %<CR>
autocmd FileType python imap <F5> <Esc>:wall!<CR>:!python %<CR>

It is ideal when you are used to testing your classes like this:

from collections import defaultdict

class MarkovLikelihood:

    def __init__(self, alpha):
        self.alpha_ = alpha
        self.transition_counts_ = defaultdict(lambda: 0)
        self.start_counts = defaultdict(lambda: 1)

    def fit(self, sentences):
        for sentence in sentences:
            self.update_(sentence)
        return self
    
    def update_(self, sentence):
        words = sentence.split(' ')
        for w1, w2 in self.pairwise_(words):
            self.transition_counts_[f"{w1}_{w2}"] += 1
            self.start_counts[w1] += 1

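    def pairwise_(self, words):
        # Consecutive (overlapping) pairs: (w0, w1), (w1, w2), ...
        # zip(a, a) on a single iterator would skip every other transition.
        return zip(words, words[1:])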

    def predict(self, sentence):
        res = 1
        words = sentence.split(' ')
        for w1, w2 in self.pairwise_(words):
            res *= (self.transition_counts_[f"{w1}_{w2}"] + self.alpha_) / self.start_counts[w1]

        return res
    
if __name__ == "__main__":
    ml = MarkovLikelihood(0.5)
    sentences = [
        "I ate dinner.",
        "We had a three-course meal.",
        "Brad came to dinner with us.",
        "He loves fish tacos.",
        "In the end, we all felt like we ate too much.",
        "We all agreed; it was a magnificent evening."]

    ml.fit(sentences)

    res = ml.predict("dinner with tacos")
    print(res)
    res = ml.predict("I love tennis")
    print(res)
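How the words are paired determines which transitions get counted. The sketch below is an illustration and not part of the class: it contrasts zipping a single iterator with itself, which yields disjoint pairs, with zipping the list against its shifted copy, which yields every consecutive pair, as a first-order Markov model usually expects. Both function names are hypothetical.

```python
def disjoint_pairs(words):
    """Pair items two by two: (w0, w1), (w2, w3), ... Each word is used once."""
    it = iter(words)
    # Both arguments are the same iterator, so each pair consumes two items.
    return list(zip(it, it))


def consecutive_pairs(words):
    """Overlapping pairs: (w0, w1), (w1, w2), ... One pair per transition."""
    return list(zip(words, words[1:]))


if __name__ == "__main__":
    words = "we all felt like we ate too much".split(" ")
    print(disjoint_pairs(words))
    print(consecutive_pairs(words))
```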

Imports

The following two lines insert the most common imports with a couple of keypresses:

autocmd FileType python map <C-F10> ggiimport pandas as pd<CR>import numpy as np<CR>np.random.seed(0)<CR><Esc>
autocmd FileType python map <C-F11> ggiimport matplotlib.pyplot as plt<CR>import seaborn as sns<CR><Esc>

Pressing <C-F10> and then <C-F11> adds the following to the Python file you are working on. Note that gg places the cursor at the top of the file first, which is why the second set of imports ends up above the first.

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
np.random.seed(0)

Quick print

autocmd FileType python map <C-p> viwyoprint(<Esc>pA)<Esc>

If your cursor is on a word (say my_variable), it adds a print(my_variable) statement on a new line below the current one. Useful for debugging.
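The effect of the mapping can be emulated on a plain string, which makes it easier to see what ends up inside print(). This is a rough sketch under assumptions: quick_print is a hypothetical function, the regular expression \w+ only approximates what vim treats as a word, and the indentation vim may add to the new line is ignored.

```python
import re


def quick_print(line, col):
    """Return the print statement for the word under column `col` (0-based).

    Rough emulation of viwy followed by oprint(<Esc>pA)<Esc>.
    """
    for match in re.finditer(r"\w+", line):
        if match.start() <= col < match.end():
            return f"print({match.group()})"
    raise ValueError("cursor is not on a word")


if __name__ == "__main__":
    # Cursor somewhere on my_variable.
    print(quick_print("    my_variable = load()", 6))
```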

Fixing copy and paste

" Automatically set paste mode in Vim when pasting in insert mode
" https://coderwall.com/p/if9mda/automatically-set-paste-mode-in-vim-when-pasting-in-insert-mode
let &t_SI .= "\<Esc>[?2004h"
let &t_EI .= "\<Esc>[?2004l"

inoremap <special> <expr> <Esc>[200~ XTermPasteBegin()

function! XTermPasteBegin()
  set pastetoggle=<Esc>[201~
  set paste
  return ""
endfunction
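The snippet relies on bracketed paste mode: once it is enabled with the escape sequence ESC[?2004h, a supporting terminal wraps pasted text between ESC[200~ and ESC[201~, which lets the editor tell pasted text from typed keys. The sketch below only illustrates that framing on a string. split_pasted is a hypothetical helper, not something vim exposes.

```python
PASTE_START = "\x1b[200~"  # sent by the terminal before pasted text
PASTE_END = "\x1b[201~"    # sent by the terminal after pasted text


def split_pasted(stream):
    """Split terminal input into (typed, pasted) text using the paste markers."""
    typed, pasted = [], []
    rest = stream
    while PASTE_START in rest:
        before, _, rest = rest.partition(PASTE_START)
        typed.append(before)
        # Everything up to the end marker was pasted, not typed.
        chunk, _, rest = rest.partition(PASTE_END)
        pasted.append(chunk)
    typed.append(rest)
    return "".join(typed), "".join(pasted)


if __name__ == "__main__":
    print(split_pasted("ab\x1b[200~def f():\n    pass\x1b[201~cd"))
```

In the vim snippet, the start marker triggers XTermPasteBegin(), which turns paste mode on, and the end marker is registered as pastetoggle so that it turns paste mode off again.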
