Pip

pypi.org pip:

“pip is the package installer for Python. You can use pip to install packages from the Python Package Index and other indexes.”

Install an old version

For example, to install pandas 0.24.2:

python3 -m pip install --user pandas==0.24.2

or

pip3 install --user pandas==0.24.2

Control flow

dotnetperls: not python

Object types

type() returns the type of an object. In Python 3 it prints as <class '...'>:

i = 1
print(type(i))
# <class 'int'>
x = 1.2
print(type(x))
# <class 'float'>
t = (1,2)
print(type(t))
# <class 'tuple'>
l = [1,2]
print(type(l))
# <class 'list'>

List

Remove an element from a list of strings

myList = ['a', 'b', 'c', 'd']
myList.remove('c')
print(myList)
# ['a', 'b', 'd']
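Note that remove() deletes only the first matching element and raises a ValueError when the element is absent. A minimal sketch of the difference, using a list comprehension to drop every occurrence (the list contents are only an example):

```python
my_list = ['a', 'b', 'c', 'b']

# remove() deletes only the first 'b'
first_removed = my_list.copy()
first_removed.remove('b')
print(first_removed)  # ['a', 'c', 'b']

# A list comprehension keeps everything except 'b'
all_removed = [item for item in my_list if item != 'b']
print(all_removed)  # ['a', 'c']

# remove() raises ValueError when the element is missing
try:
    my_list.remove('z')
except ValueError as error:
    print('ValueError:', error)
```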

List of tuples

How to flatten a list of tuples

nested_list = [(1, 2, 4), (0, 9)]

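Using reduce (in Python 3, reduce has to be imported from functools):

from functools import reduce
reduce(lambda x, y: x + y, map(list, nested_list))
# [1, 2, 4, 0, 9]

Using itertools.chain:

import itertools
list(itertools.chain.from_iterable(nested_list))
# [1, 2, 4, 0, 9]

Using extend:

flat_list = []
for a_tuple in nested_list:
    flat_list.extend(a_tuple)  # extend accepts any iterable, including a tuple
flat_list
# [1, 2, 4, 0, 9]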
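A nested list comprehension is another common way to flatten one level of nesting. A short sketch on the same data:

```python
nested_list = [(1, 2, 4), (0, 9)]

# Outer loop over the tuples, inner loop over the items of each tuple
flat_list = [item for a_tuple in nested_list for item in a_tuple]
print(flat_list)  # [1, 2, 4, 0, 9]
```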

Set

Instances of set provide the following operations:

issubset(other)
set <= other

    Test whether every element in the set is in other.

set < other

    Test whether the set is a proper subset of other, that is, set <= other and set != other.

issuperset(other)
set >= other

    Test whether every element in other is in the set.

set > other

    Test whether the set is a proper superset of other, that is, set >= other and set != other.

union(*others)
set | other | ...

    Return a new set with elements from the set and all others.

intersection(*others)
set & other & ...

    Return a new set with elements common to the set and all others.

difference(*others)
set - other - ...

    Return a new set with elements in the set that are not in the others.

symmetric_difference(other)
set ^ other

    Return a new set with elements in either the set or other but not both.
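A quick illustration of these operators on two small example sets:

```python
a = {1, 2, 3}
b = {2, 3, 4}

print(a | b)  # union: {1, 2, 3, 4}
print(a & b)  # intersection: {2, 3}
print(a - b)  # difference: {1}
print(a ^ b)  # symmetric difference: {1, 4}

# Subset tests: <= allows equality, < requires a proper subset
print({2, 3} <= a, {2, 3} < a, a < a)  # True True False
```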

ipython

Enter these magic commands at the IPython prompt to reload imported modules automatically while you are coding:

%load_ext autoreload
%autoreload 2

Debugging in ipython

Once an error occurs at the IPython prompt, type %debug to open the post-mortem debugger. You can then move up the stack trace with:

`u`

Move down the stack trace with:

`d` 

Show code context of the error:

`l` 

Print the arguments of the current function (use p variable_name to print any other variable):

`a` 

Jupyter notebooks

Install Jupyter

To install Jupyter notebooks on python3:

pip3 install jupyter notebook

Then start the notebook server:

jupyter notebook

Table of contents

Use the Table of Contents extension: jupyter table of contents extension

Convert a notebook to markdown or HTML with nbconvert:

jupyter nbconvert --to markdown mynotebook.ipynb
jupyter nbconvert --to html mynotebook.ipynb

Help in a jupyter notebook

To get help on a function, enter function_name? in a cell. Quick help can also be obtained by pressing SHIFT + TAB.

Download data from a jupyter notebook

I wrote this CSV download function in a Stack Overflow answer:

import os
from IPython.display import FileLink, display

def csv_download_link(df, csv_file_name, delete_prompt=True):
    """Display a download link to load a data frame as csv from within a Jupyter notebook"""
    df.to_csv(csv_file_name, index=False)
    display(FileLink(csv_file_name))
    if delete_prompt:
        # Wait for the user, then delete the temporary csv file
        input('Press enter to delete the file after you have downloaded it.')
        os.remove(csv_file_name)

To get a link to a csv file, enter the above function and the code below in a Jupyter notebook cell:

csv_download_link(df, 'df.csv')

How to use notebooks in a revision control system like git

Install jupytext (https://github.com/mwouts/jupytext):

python3 -m pip install --user jupytext

Generate a Jupyter configuration file and open it in an editor:

python3 -m jupyter notebook --generate-config
vim ~/.jupyter/jupyter_notebook_config.py

Add this line:

c.NotebookApp.contents_manager_class = "jupytext.TextFileContentsManager"

And also this line if you always want to pair notebooks with their markdown counterparts:

c.ContentsManager.default_jupytext_formats = "ipynb,md"

Then install and enable the jupytext notebook extension:

python3 -m jupyter nbextension install jupytext --py --user
python3 -m jupyter nbextension enable  jupytext --py --user

Pair a given notebook with a markdown or Python file and keep both in sync:

# Markdown sync
jupytext --set-formats ipynb,md --sync ~/repos/example_repos/notebooks/test.ipynb
# Python sync
jupytext --set-formats ipynb,py --sync ~/repos/example_repos/notebooks/test.ipynb

Functions

Functions in Python are defined with the def keyword:

def add_one(x):
    return x + 1

add_one(1)
# 2

Decorators

A decorator wraps a function inside another function. It is useful to repeat the same pattern of behaviour around several functions.

I have used decorators to cache function outputs along a data processing pipeline.

Math

How to do maths in python 3 with operators

2 to the power of 3

2**3
# 8

Floor division

5//3
# 1

Modulo

5%3
# 2
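Floor division rounds towards negative infinity, so with negative numbers the result of // differs from truncation, and % takes the sign of the divisor. divmod() returns both results at once:

```python
print(-5 // 3)       # -2, since -5 / 3 = -1.67 is rounded down
print(-5 % 3)        # 1, because (-5 // 3) * 3 + 1 == -5
print(divmod(5, 3))  # (1, 2)
print(5 / 3)         # 1.6666666666666667, true division always returns a float
```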

Numpy

All examples below assume that numpy is imported as np:

import numpy as np

Random vectors or matrices (the values change at every call):

np.random.random([2,3])
# array([[0.8808173 , 0.95042477, 0.3919726 ],
#        [0.34651036, 0.34365034, 0.94717281]])
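To get the same random numbers on every run, seed a generator. A minimal sketch, assuming NumPy 1.17 or later where numpy.random.default_rng is available; the seed 42 is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # seeded generator, reproducible output
m = rng.random((2, 3))                # uniform floats in [0, 1)
print(m.shape)  # (2, 3)

# A new generator with the same seed reproduces the same matrix
same = np.random.default_rng(seed=42).random((2, 3))
print(np.array_equal(m, same))  # True
```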

Matrix of zeroes

np.zeros([2,2])
#array([[0., 0.],
#       [0., 0.]])

Create a vector

a = np.array([1,2,3])

Create a matrix

b = np.array([[1,2,3],[5,6,6]])

Shape

a.shape
# (3,)
b.shape
# (2, 3)

Transpose

b.transpose()
# array([[1, 5],
#        [2, 6],
#        [3, 6]])
c = b.transpose()

Matrix multiplication

np.matmul(a,c)
# array([14, 35])

Note that the * operator, by contrast, performs element-wise multiplication:

b * b
# array([[ 1,  4,  9],
#        [25, 36, 36]])
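Since Python 3.5 the @ operator is a shorthand for np.matmul. The same a and b are redefined here so the snippet runs on its own:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([[1, 2, 3], [5, 6, 6]])
c = b.transpose()  # also available as b.T

print(a @ c)    # [14 35], same as np.matmul(a, c)
print(b @ b.T)  # matrix product: [[14 35] [35 97]]
print(b * b)    # element-wise product, not a matrix product
```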

Math functions in numpy, applied element-wise to an array x:

np.cos(x)
np.sin(x)
np.tan(x)
np.exp(x)
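A short sketch of these functions applied to whole arrays. Results are floating point, so values such as cos(pi/2) come out as tiny numbers close to 0 rather than exactly 0:

```python
import numpy as np

angles = np.array([0, np.pi / 2, np.pi])
print(np.cos(angles))            # approximately [1, 0, -1]
print(np.sin(angles))            # approximately [0, 1, 0]
print(np.exp(np.array([0, 1])))  # approximately [1, 2.718]
```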

min and max

x = np.array([1,2,3,4,5,-7,10,-8])
x.max()
# 10
x.min()
# -8

Linear algebra functions live in numpy.linalg. For example, the norm of a matrix or vector:

np.linalg.norm(x)
# 16.3707055437449
np.linalg.norm(np.array([3,4]))
# 5.0
np.linalg.norm(a)
# 3.7416573867739413
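By default, np.linalg.norm of a vector is the Euclidean (L2) norm: the square root of the sum of the squared elements. For x above, that is sqrt(1 + 4 + 9 + 16 + 25 + 49 + 100 + 64) = sqrt(268), about 16.37. A quick check, plus the L1 norm through the ord parameter:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, -7, 10, -8])
by_hand = np.sqrt(np.sum(x ** 2))  # sqrt(268)
print(np.isclose(by_hand, np.linalg.norm(x)))  # True
print(np.linalg.norm(x, ord=1))  # L1 norm, sum of absolute values: 40.0
```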

Objects

Inheritance and composition

In the example below, Car and Boat inherit from Vehicle, while a Garage holds a list of vehicles (composition):

###############################################################################
class Vehicle(object):

    def __init__(self, color, speed_max, garage=None):
        # Default attributes #
        self.color = color
        self.speed_max = speed_max
        self.garage = garage
        self.position = None

    def paint(self, new_color):
        # Change the color of this vehicle #
        self.color = new_color

    def go_back_home(self):
        # Move to the location of the garage this vehicle belongs to #
        self.position = self.garage.location

###############################################################################
class Car(Vehicle):

    def open_door(self):
        pass

###############################################################################
class Boat(Vehicle):

    def open_ballast(self):
        pass

###############################################################################
class Garage(object):

    def __init__(self, all_vehicles):
        self.all_vehicles = all_vehicles

    def mass_paint(self, new_color):
        # Paint every vehicle in the garage #
        for v in self.all_vehicles:
            v.paint(new_color)

    def build_car(self, color):
        # Create a car that belongs to this garage #
        new_car = Car(color, 90, self)
        self.all_vehicles.append(new_car)
        return new_car

    @property
    def location(self):
        return '10, 18'

###############################################################################
honda = Car('blue', 60)
gorgeoote = Boat('red', 30)
honda.paint('purple')

mike = Garage([honda, gorgeoote])

mike.mass_paint('green')

sport_car = mike.build_car('red')
sport_car.go_back_home()  # works because sport_car.garage is mike

Pandas data frames

Creating a pandas data frame

You can create a data frame by passing a dictionary:

import pandas
pandas.DataFrame({'a':range(0,3),'b':range(3,6)})
       a  b
    0  0  3
    1  1  4
    2  2  5

Joining and merging

Stack Overflow: Pandas merging
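A minimal sketch of DataFrame.merge; the column names and values are made up for illustration. The how argument works like a SQL join ('inner' by default, or 'left', 'right', 'outer'):

```python
import pandas

left = pandas.DataFrame({'key': ['a', 'b', 'c'], 'x': [1, 2, 3]})
right = pandas.DataFrame({'key': ['b', 'c', 'd'], 'y': [20, 30, 40]})

# Inner join: only keys present in both data frames
inner = left.merge(right, on='key')
print(inner)

# Left join: all rows of left, NaN where right has no matching key
left_join = left.merge(right, on='key', how='left')
print(left_join)
```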

Reshaping

The pandas user guide on reshaping gives several examples using melt (which makes it easy to rename the “variable” and “value” columns) or stack (designed to work together with MultiIndex objects).
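A small sketch of melt and its inverse pivot, on a made-up wide data frame. The var_name and value_name parameters rename the default “variable” and “value” columns:

```python
import pandas

wide = pandas.DataFrame({'id': [1, 2], 'height': [150, 160], 'weight': [50, 60]})

# Wide to long: one row per (id, measure) pair
long = wide.melt(id_vars='id', var_name='measure', value_name='amount')
print(long)

# Long back to wide
back = long.pivot(index='id', columns='measure', values='amount')
print(back)
```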

Replacement

Python pandas equivalent for replace

import pandas
s = pandas.Series(["ape", "monkey", "seagull"])
s.replace(["ape", "monkey"], ["lion", "panda"])
# values: lion, panda, seagull
s.replace("a", "x", regex=True)
# values: xpe, monkey, sexgull
s.replace({"ape": "lion", "monkey": "panda"})
# values: lion, panda, seagull
pandas.Series(["bla", "bla"]).replace("a", "i", regex=True)
# values: bli, bli

Selection with loc, iloc, query and isin

loc

.loc is primarily label based, but may also be used with a boolean array.

I copied the examples below from the pandas documentation of pandas.DataFrame.loc.

Create an example data frame

import pandas
df = pandas.DataFrame([[1, 2], [4, 5], [7, 8]],
                      index=['cobra', 'viper', 'sidewinder'],
                      columns=['max_speed', 'shield'])

List of index labels

In :  df.loc[['viper', 'sidewinder']]
Out:
                    max_speed  shield
        viper               4       5
        sidewinder          7       8

Slice with labels for row and labels for columns.

In :  df.loc['cobra':'viper', 'max_speed':'shield']
Out:
               max_speed  shield
        cobra          1       2
        viper          4       5

Conditional that returns a boolean Series

In :  df.loc[df['shield'] > 6]
Out:
                     max_speed  shield
         sidewinder          7       8

Set value for all items matching the list of labels

In : df.loc[['viper', 'sidewinder'], ['shield']] = 50

In : df
Out:
                    max_speed  shield
        cobra               1       2
        viper               4      50
        sidewinder          7      50

Another example using integers for the index

In : df = pandas.DataFrame([[1, 2], [4, 5], [7, 8]],
                           index=[7, 8, 9],
                           columns=['max_speed', 'shield'])

Slice with integer labels for rows. Note that both the start and stop of the slice are included.

In :  df.loc[8:9]
Out:
      max_speed  shield
   8          4       5
   9          7       8

iloc

.iloc is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array.

Create a sample data frame:

In : example = [{'a': 1, 'b': 2, 'c': 3, 'd': 4},
                {'a': 100, 'b': 200, 'c': 300, 'd': 400},
                {'a': 1000, 'b': 2000, 'c': 3000, 'd': 4000 }]
      df = pandas.DataFrame(example)

In : df
Out: 
            a     b     c     d
      0     1     2     3     4
      1   100   200   300   400
      2  1000  2000  3000  4000

Index with a slice object. Note that it doesn’t include the upper bound.

In :  df.iloc[0:2]
Out: 
          a    b    c    d
     0    1    2    3    4
     1  100  200  300  400

With lists of integers.

In : df.iloc[[0, 2], [1, 3]]
Out: 
            b     d
      0     2     4
      2  2000  4000

With slice objects.

In : df.iloc[1:3, 0:3]
Out: 
            a     b     c
      1   100   200   300
      2  1000  2000  3000

With a boolean array whose length matches the columns.

In : df.iloc[:, [True, False, True, False]]
Out: 
            a     c
      0     1     3
      1   100   300
      2  1000  3000

Query

Query the columns of a Data Frame with a boolean expression.

df = pandas.DataFrame({'A': range(1, 6),
                       'B': range(10, 0, -2),
                       'C': range(10, 5, -1)})
df.query("A > B")
#    A  B  C
# 4  5  2  6

Query using a variable:

limit = 3
df.query("A > @limit")
#    A  B  C
# 3  4  4  7
# 4  5  2  6

isin

Use a list of values to select rows (the three lines below are equivalent):

df = pandas.DataFrame({'A': [5,6,3,4], 'B': [1,2,3,5]})
df[df['A'].isin([3, 6])]
df.loc[df['A'].isin([3, 6])]
df.query("A in [3,6]")
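To exclude the listed values instead, negate the boolean mask with ~, or use not in inside query. A short sketch on the same data frame:

```python
import pandas

df = pandas.DataFrame({'A': [5, 6, 3, 4], 'B': [1, 2, 3, 5]})

kept = df[~df['A'].isin([3, 6])]        # rows where A is neither 3 nor 6
also_kept = df.query("A not in [3, 6]")  # same selection with query
print(kept)
```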

String operations

Concatenate all values in a character vector (a Series of strings):

s = pandas.Series(['a', 'b', 'c'])
s.str.cat()
# 'abc'

Replace elements in a character vector:

s.replace('a', 'b', regex=True)

Difference between 2 data frames

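Two methods. Using merge, keep the rows of df2 that are not in df1:

merged = df1.merge(df2, indicator=True, how='outer')
merged[merged['_merge'] == 'right_only']

Using drop_duplicates, keep the rows that appear in only one of the two data frames:

newdf = pandas.concat([df1, df2]).drop_duplicates(keep=False)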
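A small runnable sketch of both methods; df1 and df2 are made-up data frames with the same columns. The merge method returns only the rows of df2 missing from df1, while the concat method returns the rows present in exactly one of the two (note that it also drops rows duplicated within a single data frame):

```python
import pandas

df1 = pandas.DataFrame({'x': [1, 2, 3], 'y': ['a', 'b', 'c']})
df2 = pandas.DataFrame({'x': [2, 3, 4], 'y': ['b', 'c', 'd']})

# Method 1: merge with an indicator column, keep rows only in df2
merged = df1.merge(df2, indicator=True, how='outer')
only_in_df2 = merged[merged['_merge'] == 'right_only']
print(only_in_df2)

# Method 2: symmetric difference with concat + drop_duplicates
newdf = pandas.concat([df1, df2]).drop_duplicates(keep=False)
print(newdf)
```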

Plot

Matplotlib

All matplotlib examples require the following imports:

%matplotlib inline
from matplotlib import pyplot
pyplot.style.use('seaborn-v0_8-whitegrid')  # named 'seaborn-whitegrid' before matplotlib 3.6
import numpy
import numpy

Simple plot changing the figure size and the axes limit with pyplot

pyplot.rcParams['figure.figsize'] = [10, 10]
fig = pyplot.figure()
ax = pyplot.axes()
x = numpy.linspace(-1.5, 1.5, 1000)
ax.plot(x, 1-3*x)
ax.set_xlim(-6, 6)
ax.set_ylim(-6, 6)

Seaborn

All seaborn examples below require the following imports and datasets:

import seaborn
iris = seaborn.load_dataset("iris")
tips = seaborn.load_dataset("tips") 
from matplotlib import pyplot

Axes limit

Set limits on one axis in a Seaborn plot:

p = seaborn.scatterplot(x='petal_length', y='petal_width', hue='species', data=iris)
p.set(ylim=(-2, None))

In a Seaborn FacetGrid (see How to set xlim and ylim in seaborn facet grid):

g = seaborn.FacetGrid(tips, col="time", row="smoker")
g = g.map(pyplot.hist, "total_bill")
g.set(ylim=(0, None)) 

Title

Use set_title to add a title:

(seaborn
 .scatterplot(x="total_bill", y="tip", data=tips)
 .set_title('Progression of tips along the bill amount')
)

Set a common title for grid plots

g = seaborn.FacetGrid(tips, col="time", row="smoker")
g = g.map(pyplot.hist, "total_bill")
g.fig.suptitle("I don't smoke and I don't tip.")

If the title overlaps the subplots, you might need to use fig.subplots_adjust() like this:

g.fig.subplots_adjust(top=.95)

Figure size

set_figwidth and set_figheight work well for grid objects.

g = seaborn.FacetGrid(tips, col="time", row="smoker")
g = g.map(pyplot.hist, "total_bill")
g.fig.set_figwidth(10)
g.fig.set_figheight(10) 

Mentioned as a comment under this answer

R and python

See also the section above on pandas data frame / comparison with R.

Migrating from R to python: Python is a full-fledged programming language, but it lacks built-in statistical and plotting libraries. Vectors are an afterthought in Python: most functionality can be reproduced using operator overloading, but some of it looks clumsy.

Pandas comparison with R

“Tidyverse allows a mix of quoted and unquoted references to variable names. In my (in)experience, the convenience this brings is accompanied by equal consternation. It seems to me a lot of the problems solved by tidyeval would not exist if all variables were quoted all the time, as in pandas, but there are likely deeper truths I’m missing here…”

R data frame to be used for examples:

df = data.frame(x = 1:3, y = c('a','b','c'), stringsAsFactors = FALSE)

Pandas data frame to be used for examples:

df = pandas.DataFrame({'x' : [1,2,3], 'y' : ['a','b','c']})

Base R | pandas | SO questions
gsub | replace(regex=True) | gsub in pandas
df[df$y %in% c('a','b'),] | df[df['y'].isin(['a','b'])] | list of values to select a row
nrow(df) and dim(df) | len(df) and df.shape | row count of a data frame
rbind | pandas.concat | Pandas version of rbind
dput(df) | df.to_dict() | Print pandas data frame for reproducible example

The mapping of tidyverse to pandas is:

tidyverse | pandas | SO questions
mutate | assign |
select | filter |
rename | rename |
filter | query |
arrange | sort_values |
group_by | groupby |
summarize | agg |
gather | melt |
spread | pivot |
unnest | explode | unnest in pandas
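As an illustration of this mapping, here is a sketch of how a dplyr chain such as df %>% mutate(z = x * 2) %>% filter(x > 1) %>% arrange(desc(x)) could translate to pandas, using the example data frame above. The group column in the second part is hypothetical, added only to have something to group by:

```python
import pandas

df = pandas.DataFrame({'x': [1, 2, 3], 'y': ['a', 'b', 'c']})

# mutate -> assign, filter -> query, arrange(desc()) -> sort_values(ascending=False)
result = (df
          .assign(z=lambda d: d['x'] * 2)
          .query('x > 1')
          .sort_values('x', ascending=False))
print(result)

# group_by + summarize -> groupby + agg with named aggregations
summary = (df
           .assign(group=['g1', 'g1', 'g2'])  # hypothetical grouping column
           .groupby('group')
           .agg(total=('x', 'sum'), n=('x', 'count')))
print(summary)
```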

Methods to use inside the .groupby().agg() method:

String

SO answer providing various ways to concatenate python strings.
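A few common ways, as a short sketch (the names are only examples). join is the usual choice when concatenating many strings from a list:

```python
first, last = 'Ada', 'Lovelace'

with_plus = first + ' ' + last           # + operator
with_join = ' '.join([first, last])      # str.join on a list of strings
with_fstring = f'{first} {last}'         # f-string (Python 3.6+)
with_format = '{} {}'.format(first, last)  # str.format

print(with_plus)  # Ada Lovelace
```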

Statistics

Scaling

Feature scaling with scikit learn

  • StandardScaler
  • MinMaxScaler
  • RobustScaler
  • Normalizer
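A minimal sketch of the first two scalers on made-up data (3 samples, 2 features). StandardScaler centres each column to mean 0 and unit variance; MinMaxScaler maps each column to [0, 1] by default. In a real pipeline the scaler would be fitted on the training set only and then reused with transform() on the test set.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 60.0]])  # made-up data

standardized = StandardScaler().fit_transform(X)  # each column: mean 0, std 1
rescaled = MinMaxScaler().fit_transform(X)        # each column: min 0, max 1

print(standardized.mean(axis=0))  # approximately [0, 0]
print(rescaled[:, 0])             # 0, 0.5, 1
```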

Web Frameworks

Flask vs. Django

Note: Flask evolution into Quart to support asyncio. This last link contains a nice, simple example of how asyncio works, with a simulated delay to fetch a web page.
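The following is not the example from that link but my own minimal sketch of the same idea: asyncio.sleep() stands in for network latency in a hypothetical fetch_page coroutine, so the three fake downloads wait concurrently instead of one after the other. No real requests are made.

```python
import asyncio
import time

async def fetch_page(url, delay):
    """Hypothetical fetch: sleep to simulate network latency."""
    await asyncio.sleep(delay)
    return f'content of {url}'

async def main():
    urls = ['page1', 'page2', 'page3']  # placeholder names
    # gather() runs the coroutines concurrently and keeps the input order
    return await asyncio.gather(*(fetch_page(u, 0.1) for u in urls))

start = time.perf_counter()
pages = asyncio.run(main())
elapsed = time.perf_counter() - start
print(pages)
print(f'{elapsed:.2f} s')  # close to 0.1 s rather than 0.3 s, since the waits overlap
```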