Python Basics

General information on using Python for Data Science and Machine Learning

Python Basics

Based on this Cognitive Class Course

Labs

Jupyter Notebooks with Examples on these can be found in the labs folder

The Labs are from this Cognitive Class Course and are under the MIT License

Types

Hello World

We can simply print out a string in Python as follows

print('Hello, World!')

Python Version

We can check our version as follows

import sys
print(sys.version)

The sys module is a built-in module that has many system specific parameters and functions

Comments

Comments can be done by using the #

# Python comments

Docstrings

Python also supports docstrings, which appear as the first statement inside a function, class, or module. They are written as follows

def hello():
    '''
    This function will say hello
    It also takes no input arguments
    '''
    return 'Hello'
hello()

Also note that Python uses ' and " to mean the same thing

Types of Objects

Python is object oriented and dynamically typed. We can get the type of a value in Python with the type function

type(12) # int
type(2.14) # float
type("Hello") # str
type(True) # bool
type(False) # bool

We can get information about a type using attributes of the sys module, for example

sys.float_info

Type Conversion

We can use the following to convert between types

float(2)
int(1.1)
int('1')
str(1)
str(1.1)
int(True) # 1
int(False) # 0
float(True) # 1.0
bool(1) # True

Expressions

Expressions in python can include integers, floats, and strings, depending on the operation

We can do the following

1 + 2 # addition
1 - 2 # subtraction
1 / 2 # division
1 // 2 # integer division

Integer division (also called floor division) rounds the result down to the nearest integer, so 1 // 2 is 0 and -1 // 2 is -1

It is also helpful to note that Python will obey BODMAS
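As a quick check of the division operators and of operator precedence, the following sketch may help. The variable names are illustrative and do not come from the course material.

```python
# True division always returns a float
true_div = 7 / 2             # 3.5

# Floor division rounds down, toward negative infinity
floor_pos = 7 // 2           # 3
floor_neg = -7 // 2          # -4

# Multiplication binds tighter than addition; brackets override that
no_brackets = 2 + 3 * 4      # 14
with_brackets = (2 + 3) * 4  # 20

print(true_div, floor_pos, floor_neg, no_brackets, with_brackets)
```

Note that -7 // 2 gives -4 rather than -3, because the result is rounded down and not toward zero.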

Variables

Variables can simply be assigned without being declared first, and are dynamically typed

x = 2
y = x / 2

x = 2 + 4
x = 'Hello'

In a notebook we can simply evaluate the value of a variable or expression by placing it as the last line of a cell

Strings

Defining Strings

Strings can be defined with either ' or ", and can be a combination of any characters

'Hello World'
'H3110 Wor!$'
"Hello World"

Indexing

Strings are simply an ordered sequence of characters. We can index them like any other sequence with [] as follows

name = 'John'
name[0] # J
name[3] # n

We can also index negatively as follows

name = 'John'
name[-1] # n
name[-4] # J

Length

We can get the length of a string with len()

len(name) # 4

Slicing

We can slice strings as follows

name = 'John Smith'
name[0:4] # John
name[5:7] # Sm

Or generally as

string[start:end]

Stride

We can also input the stride, which will select every nth value within a certain range

string[::stride]
string[start:stop:stride]

For example

name[::3] # Jnmh
name[0:4:2] # Jh

Concatenation

We can concatenate strings as follows

text = 'Hello'
text + text # HelloHello
text * 3 # HelloHelloHello

Escape Characters

At times we may need to escape some characters in a Python string. The escape sequences are as follows

Character         Escape
newline           \<NEW LINE>
\                 \\
'                 \'
"                 \"
ASCII Bell        \a
ASCII Backspace   \b
ASCII FF          \f
ASCII LF          \n
ASCII CR          \r
ASCII Tab         \t
ASCII VT          \v
Octal Character   \ooo
Hex Character     \xhh

We can also write multi-line strings with """ or '''

If we have a string that would otherwise need escaping, we can use a raw string literal, written with an r prefix, as follows

text = r'\%\n\n\t'
print(text) # \%\n\n\t
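To make the difference between multi-line, escaped, and raw strings concrete, here is a small sketch. The path used is a made-up example and is only there to show how backslashes behave.

```python
# Triple quotes keep the line breaks that appear in the source
multi_line = """first line
second line"""

# A raw string keeps backslashes as literal characters
raw_path = r'C:\new\table'

# The same text written with escaped backslashes
escaped_path = 'C:\\new\\table'

# r'\n' is two characters (backslash and n), '\n' is one (a line feed)
raw_length = len(r'\n')
escaped_length = len('\n')

print(multi_line)
print(raw_path)
print(raw_path == escaped_path)
print(raw_length, escaped_length)
```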

String Operations

We have a variety of string operations such as

text = 'Hello'
text.upper() # HELLO
text.lower() # hello
text.replace('Hel', 'Jel') # Jello
text.find('l') # 2
text.find('ell') # 1
text.find('asnfoan') # -1

Tuples

Define

A tuple lets us store an ordered collection of values, which can be of different types. It can be defined as follows

my_tuple = ('Hello', 3, 0.14)
type(my_tuple) # tuple

A key thing about tuples is that they are immutable. We can reassign the entire tuple, but not change its values

Indexing

We can index a tuple the same way as a string or list using positive or negative indexing

my_tuple[1] # 3
my_tuple[-2] # 3

Concatenation

We can also concatenate tuples

my_tuple += ('pies', 'are', 3.14)
my_tuple # ('Hello', 3, 0.14, 'pies', 'are', 3.14)

Slice and Stride

We can slice and stride as usual with

my_tuple[start:end]
my_tuple[::2]
my_tuple[0:4:2]

Sorting

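We can sort a tuple with the sorted function, provided its elements can be compared with one another

numbers = (3, 1, 2)
sorted(numbers) # [1, 2, 3]

The sorted function returns a new list and leaves the tuple unchanged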
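The following sketch shows sorted applied to a tuple, including how to get a tuple back. The ratings values are hypothetical example data.

```python
ratings = (10, 9, 6, 5, 10, 8)   # hypothetical example data

# sorted returns a new list in ascending order
ratings_sorted = sorted(ratings)

# reverse=True sorts in descending order
ratings_desc = sorted(ratings, reverse=True)

# Convert back to a tuple if a tuple is needed
back_to_tuple = tuple(ratings_sorted)

print(ratings_sorted)
print(ratings_desc)
print(ratings)   # the original tuple is unchanged
```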

Nesting

Since tuples can hold anything, they can also hold tuples

my_tuple = ('hello', 4)
my_tuple2 = (my_tuple, 'bye')

We can access elements of tuples with double indexing as follows

my_tuple2[0][1] # 4

Lists

Defining

A list is an easy way for us to store data of any form, such as numbers, strings, tuples, and lists

Lists are mutable and have many operations that enable us to work with them more easily

my_list = [1,2,3,'Hello']

Indexing

Lists can also be indexed using the usual method both negatively and positively

my_list[1] # 2
my_list[-1] # Hello

Operations

Slice and Stride

my_list[start:end] # slicing
my_list[::stride]
my_list[start:end:stride]

Extend

Extend will add each element of its argument to the end of the list

my_list = [1,2]
my_list.extend([3, 4])
my_list # [1, 2, 3, 4]

Append

Append will add its argument as a single element at the end of the list

my_list = [1,2]
my_list.append([3, 4])
my_list # [1, 2, [3, 4]]

Modify an element

List elements can be modified by referencing the index

my_list = [1,2]
my_list[1] = 3
my_list # [1,3]

Delete an Element

We can delete an element by index with the del statement

my_list = [1,2,3]
del my_list[1]
my_list # [1,3]

String Splitting

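We can split a string into a list of substrings as follows. With no argument, split separates on whitespace

my_list = 'hello world'.split()
my_list # ['hello', 'world']

my_list = 'hello, world, !'.split(', ')
my_list # ['hello', 'world', '!']

To get the individual characters of a string we can use list instead

list('hello') # ['h', 'e', 'l', 'l', 'o']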

Cloning

Variables in Python hold references to lists, so assigning a list to a new name does not copy it. If we want to clone a list we can take a full slice, which makes a shallow copy

new_list = my_list[:]
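The difference between a second name for the same list and a clone can be seen in this short sketch. The names original, alias, and clone are illustrative only.

```python
original = [1, 2, 3]

alias = original      # a second name for the same list object
clone = original[:]   # a new list with the same elements (shallow copy)

# Changing the original is visible through the alias but not the clone
original[0] = 'changed'

same_object = alias is original
different_object = clone is not original

print(alias)
print(clone)
print(same_object, different_object)
```

Because the copy is shallow, nested lists inside the clone would still be shared with the original.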

Sets

A set is a collection of unique objects in Python. Sets will automatically remove duplicate items

Defining a Set

my_set = {1, 2, 3, 1, 2}
my_set # {1, 2, 3}

Set Operations

Set from a List

We can create a set from a list with the set function

my_set = set(my_list)

Add Element

We can add elements to a set with

my_set.add("New Element")

If the element already exists nothing will happen

Remove Element

We can remove an element from a set with remove, which raises a KeyError if the element is not present

my_set.remove("New Element")

Check if Element is in Set

We can check if an element is in a set by using the in operator, which returns a bool

"New Element" in my_set # False

Set Logic

When using sets we can compare them with one another

Intersection

We can find the intersection between sets with & or with the intersection function

set_1 & set_2
set_1.intersection(set_2)

Difference

We can find the difference of one set relative to another set with

set_1.difference(set_2)

Which will give us the elements that set_1 has that set_2 does not

Union

We can get the union of two sets with

set_1.union(set_2)

Superset

We can check if one set is a superset of another with

set_1.issuperset(set_2)

Subset

We can check if one set is a subset of another with

set_1.issubset(set_2)
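The set operations above can be tried on two small sets. This is a minimal sketch with made-up values, and sorted is used when printing because sets have no guaranteed display order.

```python
set_1 = {'rock', 'pop', 'soul'}
set_2 = {'pop', 'soul', 'jazz'}

common = set_1 & set_2                # elements in both sets
only_in_1 = set_1.difference(set_2)   # elements in set_1 but not in set_2
combined = set_1.union(set_2)         # elements in either set

# Subset and superset checks return a bool
pop_is_subset = {'pop'}.issubset(set_1)
set_1_is_superset = set_1.issuperset({'pop', 'soul'})

print(sorted(common))
print(sorted(only_in_1))
print(sorted(combined))
print(pop_is_subset, set_1_is_superset)
```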

Dictionaries

Dictionaries are like lists, but store data by a key instead of an index

Keys can be strings, numbers, or any immutable object such as a tuple

Defining

We can define a dictionary as a set of key-value pairs

my_dictionary = {"key1": 1, "key2": "2", "key3": [3, 3, 3], "key4": (4, 4, 4), ('key5'): 5, (0, 1): 6, 92: 'hello'}

Accessing a Value

We can access a value by using its key, such as

my_dictionary['key1'] # 1
my_dictionary[(0,1)] # 6
my_dictionary[92] # 'hello'

Get All Keys

We can get all the keys in a dictionary as follows

my_dictionary.keys()

Append a Key

Key-value pairs can be added to a dictionary as follows

my_dictionary['New Key'] = 'new value'

Delete an Entry

We can delete an entry by key using

del my_dictionary['New Key']

Verify that Key is in Dictionary

We can use the in operator to check if a key exists in a dictionary

'My Key' in my_dictionary
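Putting the dictionary operations together, a short sketch might look like the following. The inventory dictionary and its contents are hypothetical, and the get method is shown as one way to read a key that may be missing.

```python
inventory = {'apples': 3, 'pears': 5}   # hypothetical example data

# Add a new key-value pair
inventory['plums'] = 7

# Delete an entry by key
del inventory['apples']

# Check for keys with the in operator
has_apples = 'apples' in inventory
has_plums = 'plums' in inventory

# get returns a default instead of raising KeyError for a missing key
apple_count = inventory.get('apples', 0)

remaining_keys = list(inventory.keys())

print(has_apples, has_plums, apple_count)
print(remaining_keys)
```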

Conditions and Branching

Comparison Operators

We have a few different comparison operators which will produce a boolean based on their condition

Operation              Operator  i = 1
equal                  ==        i == 1
not equal              !=        i != 0
greater than           >         i > 0
less than              <         i < 2
greater than or equal  >=        i >= 0 and i >= 1
less than or equal     <=        i <= 2 and i <= 1

Logical Operators

Python has the following logical operators

Operation  Operator  i = 1
and        and       i == 1 and i < 2
or         or        i == 1 or i == 2
not        not       not(i != 0)

String Comparison

When checking for equality Python will check if the strings are the same

'hello' != 'bye' # True

Strings are compared by the character codes of their characters. For example, 'B' > 'A' because the ASCII code for B is 66 and for A is 65

When comparing longer strings, the comparison is done character by character from left to right, and the first differing character decides the result
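The built-in ord function returns the character code that these comparisons rely on. The sketch below uses illustrative variable names to show a few typical cases.

```python
# Character codes for a few letters
code_A = ord('A')   # 65
code_B = ord('B')   # 66
code_a = ord('a')   # 97

# Single characters compare by their codes
b_after_a = 'B' > 'A'

# Longer strings compare character by character from the left
third_char_decides = 'abc' < 'abd'

# A string that is a prefix of another compares as smaller
prefix_is_smaller = 'ab' < 'abc'

# Uppercase letters have lower codes than lowercase letters
upper_before_lower = 'Zebra' < 'apple'

print(code_A, code_B, code_a)
print(b_after_a, third_char_decides, prefix_is_smaller, upper_before_lower)
```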

Branching

Branching allows us to run different statements depending on a condition

If

The if statement will only run the code that forms part of its block if the condition is true

i = 0
if i == 0:
  print('Hello')

If-Else

An if-else can be done as follows

i = 0
if i == 1:
  print('Hello')
else:
  print('Bye')

Elif

If we want to have multiple if conditions, but only have the first one that is true be executed we can do

i = 0
if i == 1:
  print('Hello')
elif i == 0:
  print('Hello World')
elif i > 1:
  print('Hello World!!')
else:
  print('Bye')

Loops

For Loops

A for loop in Python iterates through the items of a sequence, such as a list, and executes its code block once for each item

loop_vals = [1,6,2,9]
for i in loop_vals:
  print(i)
# prints 1, 6, 2 and 9 on separate lines

Range

If we want to iterate through the values without using a predefined list, we can use the range function to generate a sequence of values for us to iterate through

The range function works as follows

range([start,] stop[, step])
# start, start + step, start + 2*step, ... up to but not including stop

The range function only requires the stop value. The other two are optional, and the stop value is not inclusive. In Python 3, range returns a lazy range object, so we wrap it in list to see its values

list(range(5)) # [0, 1, 2, 3, 4]
list(range(5, 10)) # [5, 6, 7, 8, 9]
list(range(5, 10, 2)) # [5, 7, 9]

Using this we can iterate through the values of our list as follows

loop_vals = [1,6,2,9]
for i in range(len(loop_vals)):
  print(loop_vals[i])
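When both the position and the value are needed, the index-based loop above can also be written with the built-in enumerate function. This is a minimal sketch, and the pairs list is only there to make the result easy to inspect.

```python
loop_vals = [1, 6, 2, 9]

pairs = []
# enumerate yields (index, value) tuples, so no manual indexing is needed
for index, value in enumerate(loop_vals):
    pairs.append((index, value))
    print(index, value)
```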

While Loops

While loops will keep running for as long as their condition is true

i = 0
while i < 10:
  print(i)
  i += 1
# 0 1 2 3 4 5 6 7 8 9

Functions

Defining

Functions in Python are defined and called as follows

def hello():
  print('Hello')

hello() # Hello

We can have arguments in our function

def my_print(arg1, arg2):
  print(arg1, arg2)

my_print('Hello', 'World') # Hello World

Functions can also return values

def my_sum(val1, val2):
  answer = val1 + val2
  return answer

my_sum(1,2) # 3

A function can also have a variable number of arguments such as

def sum_all(*vals):
  return sum(vals)

sum_all(1,2,3) # 6

The vals object will be taken in as a tuple

Function input arguments can also have default values as follows

def has_default(arg1 = 4):
  print(arg1)

has_default() # 4
has_default(5) # 5

Or with multiple arguments

def has_defaults(arg1, arg2 = 4):
  print(arg1, arg2)

has_defaults(5) # 5 4
has_defaults(5,6) # 5 6

Help

We can get help about a function by calling the help function

help(print)

This will give us help about the print function

Scope

Functions have access to variables that are globally defined, as well as their own local scope. Locally defined variables are not accessible from outside the function unless we declare them as global as follows

def make_global():
  global global_var
  global_var = 5

make_global()
global_var # 5

Note that global_var will not be defined until our function has been called at least once
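The sketch below contrasts reading a global, shadowing it with a local, and rebinding it with the global statement. The function and variable names are hypothetical, and the code is assumed to run at module level, as it would in a notebook cell or script.

```python
counter = 0   # module-level (global) variable

def read_only():
    # Reading a global needs no declaration
    return counter + 1

def shadow():
    # Assignment without `global` creates a separate local variable
    counter = 100
    return counter

def increment():
    # `global` makes the assignment target the module-level name
    global counter
    counter += 1

shadow_result = shadow()
after_shadow = counter      # the global is untouched by shadow()

increment()
after_increment = counter   # the global was changed by increment()

print(shadow_result, after_shadow, after_increment, read_only())
```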

Objects and Classes

Defining a Class

We can define a class Circle which has a constructor, a radius and a colour as well as a function to increase its radius and to plot the Circle

We make use of matplotlib to plot our circle here

import matplotlib.pyplot as plt
%matplotlib inline  

class Circle(object):
  
  def __init__(self, radius=3, color='blue'):
    self.radius = radius
    self.color = color
  
  def add_radius(self, r):
    self.radius += r
    return self.radius

  def draw_circle(self):      
    plt.gca().add_patch(plt.Circle((0, 0), radius=self.radius, fc=self.color))
    plt.axis('scaled')
    plt.show()  

Instantiating an Object

We can create a new Circle object by using the class's constructor

red_circle = Circle(10, 'red')

Interacting with our Object

We can use the dir function to get a list of all the methods on an object, many of which are defined by Python already

dir(red_circle)

We can get our object's property values by simply referring to them

red_circle.color # red
red_circle.radius # 10

We can also manually change the object's properties with

red_circle.color = 'pink'

We can call our object's functions the same way

red_circle.add_radius(10) # 20
red_circle.radius # 20

The red_circle can be plotted by calling the draw_circle function
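The same pattern of constructor, attributes, and methods can be tried without a plotting library. The Rectangle class below is a hypothetical example and is not part of the course labs.

```python
class Rectangle:
    """Hypothetical example class with default constructor arguments."""

    def __init__(self, width=2, height=3, color='blue'):
        self.width = width
        self.height = height
        self.color = color

    def scale(self, factor):
        # Multiply both sides by factor and return the new dimensions
        self.width *= factor
        self.height *= factor
        return self.width, self.height

    def area(self):
        return self.width * self.height


default_rect = Rectangle()           # uses the default values
red_rect = Rectangle(4, 5, 'red')    # overrides the defaults

red_rect.color = 'pink'              # attributes can be reassigned directly
default_area = default_rect.area()
scaled_size = red_rect.scale(2)
scaled_area = red_rect.area()

print(default_area, scaled_size, scaled_area, red_rect.color)
```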

Reading Files

Note that the preferred method for reading files is using with

Open

We can use the built-in open function to read a file which will provide us with a File object

example1 = '/data/test.txt'
file1 = open(example1,'r')

The 'r' sets open to read mode, for write mode we can use 'w', and 'a' for append mode

Properties

File objects have some properties such as

file1.name
file1.mode

Read

We can read the file contents to a string with the following

file_content = file1.read()

Close

Lastly we need to close our File object with

file1.close()

We can verify that the file is closed with

file1.closed # True

With

A better way to read files is by using the with statement, which will automatically close the file even if we encounter an exception

with open(example1) as file1:
  file_content = file1.read()

We can also read the file in by pieces either based on characters or on lines

Read File by Characters

We can read the first four characters with

with open(example1,'r') as file1:
  content = file1.read(4)

Note that the file object keeps track of its position, so each call to read() continues from where the previous one stopped instead of starting over. We can therefore read the first seven characters like so

with open(example1,'r') as file1:
  content = file1.read(4)
  content += file1.read(3)

Read File by Lines

Our File object can be iterated over like a list, with each line of the file as an element

We can read our file by lines as follows

with open(example1,'r') as file1:
  content = file1.readline()

We can read each line of our file into a list by iterating over the file object like so

content = []
with open(example1,'r') as file1:
  for line in file1:
    content.append(line)

Or with the readlines function like so

with open(example1, 'r') as file1:
  content = file1.readlines()
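To see how the different read calls share one file position, the following sketch first writes a small throwaway file and then reads it back in pieces. The file name and its contents are made up for the example, and tempfile is used only so that the code runs anywhere.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained
tmp_dir = tempfile.mkdtemp()
example_path = os.path.join(tmp_dir, 'example.txt')
with open(example_path, 'w') as f:
    f.write('line one\nline two\nline three\n')

with open(example_path, 'r') as f:
    first_four = f.read(4)        # first four characters
    next_three = f.read(3)        # continues where the last read stopped
    rest_of_line = f.readline()   # remainder of the current line
    remaining = f.readlines()     # all lines that are left, as a list

is_closed = f.closed              # the with block closed the file

print(repr(first_four), repr(next_three), repr(rest_of_line))
print(remaining)
print(is_closed)
```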

Writing Files

We can also make use of open to write content to a file as follows

out_path = 'data/output.txt'
with open(out_path, 'w') as out_file:
  out_file.write('content')

Unlike print, the write function does not add a newline, it writes exactly the string we pass to it. If we want to write multiple lines to our file we need to add the line breaks ourselves as follows

content = ['Line 1 content', 'Line 2 content', 'Line 3 content']
with open(out_path, 'w') as out_file:
  for line in content:
    out_file.write(line + '\n')
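The sketch below writes several lines, appends one more with 'a' mode, and reads everything back to confirm the result. The temporary path and the line contents are assumptions made for the example.

```python
import os
import tempfile

lines = ['Line 1 content', 'Line 2 content', 'Line 3 content']
out_path = os.path.join(tempfile.mkdtemp(), 'output.txt')

# 'w' creates the file, or truncates it if it already exists
with open(out_path, 'w') as out_file:
    out_file.write('\n'.join(lines) + '\n')

# 'a' adds to the end of the file without removing what is there
with open(out_path, 'a') as out_file:
    out_file.write('Line 4 content\n')

# Read the file back and drop the trailing newlines
with open(out_path) as in_file:
    read_back = in_file.read().splitlines()

print(read_back)
```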

Copy a File

We can copy data from one file to another by simultaneously reading and writing between the files

with open('readfile.txt','r') as readfile:
  with open('newfile.txt','w') as writefile:
    for line in readfile:
      writefile.write(line)

Pandas

Pandas is a library that is useful for working with data as a DataFrame in Python

Importing Pandas

The Pandas library will need to be installed and then imported into our notebook as

import pandas as pd

Creating a DataFrame

We can create a new DataFrame in Pandas as follows

df = pd.DataFrame({'Name':['John','Jack','Smith','Jenny','Maria'],
                'Age':[23,12,34,13,42],
                'Height':[1.2,2.3,1.1,1.6,0.5]})

Read CSV as DataFrame

We can read a csv as a DataFrame with Pandas by doing the following

csv_path ='data.csv'
df = pd.read_csv(csv_path)

Read XLSX as DataFrame

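We need to install an additional dependency to do this first, and then read the file with the pd.read_excel function. Recent versions of Pandas use openpyxl for .xlsx files, while xlrd only handles the older .xls format

!pip install openpyxl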
xlsx_path = 'data.xlsx'
df = pd.read_excel(xlsx_path)

View DataFrame

We can view the first few lines of our DataFrame as follows

df.head()

Assume our data looks like the following

Name Age Height
0 John 23 1.2
1 Jack 12 2.3
2 Smith 34 1.1
3 Jenny 13 1.6
4 Maria 42 0.5

Working with DataFrame

Assigning Columns

We can read the data from a specific column as follows

ages = df[['Age']]
Age
0 23
1 12
2 34
3 13
4 42

We can also assign multiple columns

age_vs_height = df[['Age', 'Height']]
Age Height
0 23 1.2
1 12 2.3
2 34 1.1
3 13 1.6
4 42 0.5

Reading Cells

We can read a specific cell in one of two ways. The iloc indexer allows us to access a cell with the row and column positions, and the loc indexer lets us do this with the row label and column name

df.iloc[1,2] # 2.3
df.loc[1, 'Height'] # 2.3

Slicing

We can also do slicing using loc and iloc as follows. Note that iloc slices exclude the end position, while loc slices include the end label

df.iloc[1:3, 0:2]
Name Age
1 Jack 12
2 Smith 34
df.loc[0:2, 'Age':'Height']
Age Height
0 23 1.2
1 12 2.3
2 34 1.1

Saving Data to CSV

Using Pandas, we can save our DataFrame to a CSV with

df.to_csv('my_dataframe.csv')

Arrays

The NumPy library allows us to work with arrays the same way we would mathematically. In order to use NumPy we need to import it as follows

import numpy as np

Arrays are similar to lists but are fixed size, and each element is of the same type

1D Arrays

Defining an Array

We can simply define an array as follows

a = np.array([1,2,3]) # casting a list to array

Types

An array can only store data of a single type, we can find the type of the data in an array with

a.dtype

Manipulating Values

We can easily manipulate values in an array by changing them as we would in a list. The same can be done with slicing and striding operations

a = np.array([1,2,3]) # array([1,2,3])
a[1] = 5 # array([1,5,3])
b = a[1:3] # array([5,3])

We can also use a list to select specific indexes and even assign values to those indexes

a = np.array([1,2,3]) # array([1,2,3])
select = [1,2]
b = a[select] # array([2,3])
a[select] = 0 # array([1,0,0])

Attributes

An array has various properties and functions such as

a = np.array([1,2,3])
a.size # size
a.ndim # number of dimensions
a.shape # shape
a.mean() # mean of values
a.max() # max value
a.min() # min value

Array Operations

We have a few different operations on arrays such as

u = np.array([1,0])
v = np.array([0,1])
u+v # vector addition
u*v # elementwise multiplication
np.dot(u,v) # dot product
np.cross(u,v) # cross product
u.T # transpose (has no effect on a 1D array)

Linspace

The linspace function can be used to generate an array with values over a specific interval

np.linspace(start, end, num=divisions)
np.linspace(-2,2,num=5) # array([-2., -1.,  0.,  1.,  2.])
np.linspace(0,2*np.pi,num=10)
# array([0.        , 0.6981317 , 1.3962634 , 2.0943951 , 2.7925268 ,
#        3.4906585 , 4.1887902 , 4.88692191, 5.58505361, 6.28318531])

Plotting Values

We can apply a function to these values by using array operations, such as those mentioned above as well as others like

x = np.linspace(0,2*np.pi, num=100)
y = np.sin(x) + np.cos(x)

2D Arrays

Defining a 2D Array

Two dimensional Arrays can be defined by a list that contains nested lists of the same size as follows

a = np.array([[11,12,13],[21,22,23],[31,32,33]])

We can similarly make use of the previously defined array operations

Accessing Values

Values in a 2D array can be indexed in either one of two ways

a[1,2] # 23
a[1][2] # 23

Slicing

We can perform slicing as follows

a[0][0:2] # array([11, 12])
a[0:2,2] # array([13, 23])

Mathematical Operations

We can perform the usual mathematical operations with 2D arrays as with 1D

Dancing Man

The following script will make a dancing man if run in Jupyter, because why not

from IPython.display import display, clear_output
import time

val1 = '(•_•)\n<)   )╯\n/    \\'
val2 = '\\(•_•)\n(   (>\n/    \\'
val3 = '(•_•)\n<)   )>\n/    \\'

while True:
    for pos in [val1, val2, val3]:
        clear_output(wait=True)
        print(pos)
        time.sleep(0.6)