Python Basics
General information on using Python for Data Science and Machine Learning
Based on this Cognitive Class Course
Labs
Jupyter notebooks with examples on these topics can be found in the labs folder
The Labs are from this Cognitive Class Course and are under the MIT License
Types
Hello World
We can simply print out a string in Python as follows
print('Hello, World!')
Python Version
We can check our version as follows
import sys
print(sys.version)
The sys module is a built-in module that provides many system-specific parameters and functions
Comments
Comments start with the # character, and everything after it on that line is ignored
# Python comments
Docstrings
Python also supports docstrings, which are string literals placed as the first statement of a function or class body, or at the top of a module. They are written as follows
def hello():
    '''
    This function will say hello
    It also takes no input arguments
    '''
    return 'Hello'
hello()
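Python keeps a docstring on the function object, so it can be read back at runtime. A small sketch, repeating the hello function from above and using the standard inspect module to tidy the indentation (help(hello) shows the same text):

```python
import inspect

def hello():
    '''
    This function will say hello
    It also takes no input arguments
    '''
    return 'Hello'

# The raw docstring, including its indentation, is stored as __doc__
print(hello.__doc__)

# inspect.getdoc strips the common indentation and surrounding blank lines
print(inspect.getdoc(hello))
```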
Also note that Python treats ' and " the same way, so either can be used to delimit a string
Types of Objects
Python is object oriented and dynamically typed. We can get the type of a value in Python with the type function
type(12) # int
type(2.14) # float
type("Hello") # str
type(True) # bool
type(False) # bool
We can get information about how a type is represented on the current system from attributes of the sys module, for example
sys.float_info
Type Conversion
We can use the following to convert between types
float(2) # 2.0
int(1.1) # 1
int('1') # 1
str(1) # '1'
str(1.1) # '1.1'
int(True) # 1
int(False) # 0
float(True) # 1.0
bool(1) # True
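A few conversions behave in ways that may be surprising: int() drops the fractional part of a float (truncating towards zero) instead of rounding, and it cannot parse a string that contains a decimal point. The sketch below checks these cases with illustrative values:

```python
# int() truncates towards zero; it does not round
print(int(1.9))    # 1
print(int(-1.9))   # -1

# int() parses integer strings, but not decimal strings
print(int('42'))   # 42
try:
    int('1.1')
except ValueError as err:
    print('ValueError:', err)

# Converting to float first works for decimal strings
print(int(float('1.1')))  # 1

# bool() is False for zero and empty values, True otherwise
print(bool(0), bool(''), bool('0'))  # False False True
```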
Expressions
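Expressions in Python can combine integers, floats, and strings, depending on the operation
We can do the following
1 + 2 # addition, 3
1 - 2 # subtraction, -1
1 / 2 # division, 0.5 (always a float)
1 // 2 # floor division, 0
Floor division rounds the result down to the nearest whole number (towards negative infinity) rather than to the nearest integer
Python also follows the usual order of operations (BODMAS), so multiplication and division are evaluated before addition and subtraction unless brackets are used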
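The effect of flooring is easiest to see with a negative operand, where // rounds away from zero; % gives the matching remainder. The sketch below, with arbitrary example numbers, also shows operator precedence, including ** (exponentiation), which binds tighter than *:

```python
# Floor division rounds towards negative infinity
print(7 // 2)    # 3
print(-7 // 2)   # -4, the floor of -3.5
print(-7 % 2)    # 1, so (-7 // 2) * 2 + (-7 % 2) == -7

# Multiplication is evaluated before addition unless brackets are used
print(1 + 2 * 3)    # 7
print((1 + 2) * 3)  # 9

# Exponentiation binds tighter than multiplication
print(2 * 3 ** 2)   # 18
```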
Variables
Variables do not need to be declared before they are assigned, and they are dynamically typed, so the same name can later refer to a value of a different type
x = 2
y = x / 2
x = 2 + 4
x = 'Hello'
In a notebook we can simply evaluate the value of a variable or expression by placing it as the last line of a cell
Strings
Defining Strings
Strings can be defined with either ' or ", and can contain any combination of characters
'Hello World'
'H3110 Wor!$'
"Hello World"
Indexing
Strings are ordered sequences of characters, so we can index them like any other sequence using [] as follows
name = 'John'
name[0] # J
name[3] # n
We can also index negatively as follows
name = 'John'
name[-1] # n
name[-4] # J
Length
We can get the length of a string with len()
len(name) # 4
Slicing
We can slice strings as follows
name = 'John Smith'
name[0:4] # John
name[5:7] # Sm
Or, in general, as follows, where the character at start is included and the character at end is excluded
string[start:end]
Stride
We can also specify a stride, which selects every nth character within a certain range
string[::stride]
string[start:stop:stride]
For example
name[::3] # Jnmh
name[0:4:2] # Jh
Concatenation
We can concatenate strings as follows
text = 'Hello'
text + text # HelloHello
text * 3 # HelloHelloHello
Escape Characters
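Some characters need to be escaped with a backslash inside a Python string. The escape sequences are as follows
Character | Escape |
---|---|
Backslash and newline ignored (line continuation) | \<newline> |
Backslash (\) | \\ |
Single quote (') | \' |
Double quote (") | \" |
ASCII Bell (BEL) | \a |
ASCII Backspace (BS) | \b |
ASCII Formfeed (FF) | \f |
ASCII Linefeed (LF) | \n |
ASCII Carriage Return (CR) | \r |
ASCII Horizontal Tab (TAB) | \t |
ASCII Vertical Tab (VT) | \v |
Character with octal value ooo | \ooo |
Character with hex value hh | \xhh |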
We can also write multi-line strings using triple quotes, either """ or '''
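If we have a string that would otherwise need escaping, we can use a raw string literal by prefixing it with r, so that backslashes are treated as ordinary characters
text = r'\%\n\n\t'
text # '\\%\\n\\n\\t' (the notebook display shows each backslash escaped)
print(text) # \%\n\n\t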
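A quick way to see the difference between an escaped string and a raw string is to compare their lengths: '\n' is a single newline character, while r'\n' is a backslash followed by n. The sketch also shows a triple-quoted string keeping its line break; the Windows-style path is only an illustrative value:

```python
escaped = '\n'
raw = r'\n'
print(len(escaped), len(raw))  # 1 2

# A triple-quoted string keeps its line breaks as newline characters
multi = '''Line one
Line two'''
print(multi.split('\n'))  # ['Line one', 'Line two']

# Raw strings avoid accidental escapes such as \n and \t in paths
path = r'C:\new\table'
print(path)  # C:\new\table
```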
String Operations
We have a variety of string operations such as
text = 'Hello'
text.upper() # HELLO
text.lower() # hello
text.replace('He', 'Je') # Jello
text.find('l') # 2
text.find('ell') # 1
text.find('asnfoan') # -1
Tuples
Define
A tuple is an ordered sequence that can store values of different types. We can define one as follows
my_tuple = ('Hello', 3, 0.14)
type(my_tuple) # tuple
A key thing about tuples is that they are immutable. We can reassign the entire tuple, but not change its values
Indexing
We can index a tuple the same way as a string or list using positive or negative indexing
my_tuple[1] # 3
my_tuple[-2] # 3
Concatenation
We can also concatenate tuples
my_tuple += ('pies', 'are', 3.14)
my_tuple # ('Hello', 3, 0.14, 'pies', 'are', 3.14)
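Immutability means that assigning to an element raises a TypeError. The += above does not modify the tuple either: it builds a new tuple and rebinds the name, which we can observe by comparing id() values before and after. A minimal sketch:

```python
my_tuple = ('Hello', 3, 0.14)
original_id = id(my_tuple)

try:
    my_tuple[1] = 4  # tuples do not support item assignment
except TypeError as err:
    print('TypeError:', err)

# Concatenation creates a new tuple object and rebinds the name to it
my_tuple += ('pies', 'are', 3.14)
print(my_tuple)                     # ('Hello', 3, 0.14, 'pies', 'are', 3.14)
print(id(my_tuple) == original_id)  # False
```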
Slice and Stride
We can slice and stride as usual with
my_tuple[start:end]
my_tuple[::2]
my_tuple[0:4:2]
Sorting
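We can sort the elements of a tuple with the sorted function, as long as the elements can be compared with one another (a mix of strings and numbers raises a TypeError)
sorted((3, 1, 2)) # [1, 2, 3]
The sorted function returns a new list and leaves the original tuple unchanged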
Nesting
Since tuples can hold anything, they can also hold tuples
my_tuple = ('hello', 4)
my_tuple2 = (my_tuple, 'bye')
We can access elements of tuples with double indexing as follows
my_tuple2[0][1] # 4
Lists
Defining
A list is an easy way for us to store data of any form, such as numbers, strings, tuples, and lists
Lists are mutable and have many operations that enable us to work with them more easily
my_list = [1,2,3,'Hello']
Indexing
Lists can also be indexed using the usual method both negatively and positively
my_list[1] # 2
my_list[-1] # Hello
Operations
Slice and Stride
my_list[start:end] # slicing
my_list[::stride]
my_list[start:end:stride]
Extend
extend adds each element of the given list to the end of the list
my_list = [1,2]
my_list.extend([3, 4])
my_list # [1, 2, 3, 4]
Append
append adds its argument as a single element at the end of the list, so appending a list nests it
my_list = [1,2]
my_list.append([3, 4])
my_list # [1, 2, [3, 4]]
Modify an element
List elements can be modified by referencing the index
my_list = [1,2]
my_list[1] = 3
my_list # [1,3]
Delete an Element
We can delete an element by its index with the del statement
my_list = [1,2,3]
del my_list[1]
my_list # [1, 3]
String Splitting
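We can split a string into a list with split. With no argument it splits on whitespace; with an argument it splits on that separator, keeping any surrounding spaces
my_list = 'hello world !'.split()
my_list # ['hello', 'world', '!']
my_list = 'hello, world, !'.split(',')
my_list # ['hello', ' world', ' !']
To split a string into its individual characters, use list('hello'), which gives ['h', 'e', 'l', 'l', 'o']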
Cloning
Variables hold references to lists, so assigning a list to a new name does not copy it. If we want to clone a list, we can take a full slice, which creates a new (shallow) copy
new_list = my_list[:]
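Comparing a plain assignment with a slice copy: changes made through an alias show up through the original name, while the clone is unaffected, except for nested lists, which the slice does not copy. A short sketch with illustrative values:

```python
my_list = [1, 2, [3, 4]]

alias = my_list      # the same list object under a second name
clone = my_list[:]   # a new outer list (shallow copy)

my_list[0] = 100
print(alias[0])   # 100, the alias sees the change
print(clone[0])   # 1, the clone does not

# The nested list is shared, because the copy is shallow
my_list[2].append(5)
print(clone[2])   # [3, 4, 5]
```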
Sets
A set is an unordered collection of unique objects in Python; duplicate items are removed automatically
Defining a Set
my_set = {1, 2, 3, 1, 2}
my_set # {1, 2, 3}
Set Operations
Set from a List
We can create a set from a list with the set function
my_set = set(my_list)
Add Element
We can add elements to a set with
my_set.add("New Element")
If the element already exists nothing will happen
Remove Element
We can remove an element from a set with remove, which raises a KeyError if the element is not present
my_set.remove("New Element")
Check if Element is in Set
We can check if an element is in a set by using in, which returns a bool
"New Element" in my_set # False
Set Logic
When using sets we can compare them with one another
Intersection
We can find the intersection between sets with & or with the intersection method
set_1 & set_2
set_1.intersection(set_2)
Difference
We can find the difference of one set relative to another with
set_1.difference(set_2)
which gives us the elements that set_1 has and set_2 does not
Union
We can get the union of two sets with
set_1.union(set_2)
Superset
We can check if one set is a superset of another with
set_1.issuperset(set_2)
Subset
We can check if one set is a subset of another with
set_1.issubset(set_2)
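With two small example sets (their contents are arbitrary), the operations above give the following results:

```python
set_1 = {1, 2, 3, 4}
set_2 = {3, 4}

print(set_1 & set_2)            # {3, 4}
print(set_1.difference(set_2))  # {1, 2}
print(set_1.union(set_2))       # {1, 2, 3, 4}
print(set_1.issuperset(set_2))  # True
print(set_2.issubset(set_1))    # True
```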
Dictionaries
Dictionaries are like lists, but store data by a key instead of an index
Keys can be strings, numbers, or any immutable object such as a tuple
Defining
We can define a dictionary as a set of key-value pairs
my_dictionary = {"key1": 1, "key2": "2", "key3": [3, 3, 3], "key4": (4, 4, 4), ('key5'): 5, (0, 1): 6, 92: 'hello'}
Accessing a Value
We can access a value by using its key, such as
my_dictionary['key1'] # 1
my_dictionary[(0,1)] # 6
my_dictionary[92] # 'hello'
Get All Keys
We can get all the keys in a dictionary as follows
my_dictionary.keys()
Add a Key
Key-value pairs can be added to a dictionary by assigning a value to a new key
my_dictionary['New Key'] = 'new value'
Delete an Entry
We can delete an entry by key with the del statement
del my_dictionary['New Key']
Verify that Key is in Dictionary
We can use the in
operator to check if a key exists in a dictionary
'My Key' in my_dictionary
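Looking up a key that is not in the dictionary raises a KeyError, which is one reason to check with in first. A short sketch combining the operations above on a cut-down version of the example dictionary:

```python
my_dictionary = {'key1': 1, (0, 1): 6, 92: 'hello'}

my_dictionary['New Key'] = 'new value'  # add an entry
print(list(my_dictionary.keys()))  # ['key1', (0, 1), 92, 'New Key']

del my_dictionary['New Key']       # remove it again
print('New Key' in my_dictionary)  # False

try:
    my_dictionary['New Key']
except KeyError:
    print('KeyError: key not found')
```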
Conditions and Branching
Comparison Operators
We have a few different comparison operators which will produce a boolean based on their condition
Operation | Operator | Example with i = 1 |
---|---|---|
equal | == | i == 1 |
not equal | != | i != 0 |
greater than | > | i > 0 |
less than | < | i < 2 |
greater than or equal | >= | i >= 0 and i >= 1 |
less than or equal | <= | i <= 2 and i <= 1 |
Logical Operators
Python has the following logical operators
Operation | Operator | Example with i = 1 |
---|---|---|
and | and | i == 1 and i < 2 |
or | or | i == 1 or i == 2 |
not | not | not(i != 0) |
String Comparison
When checking for equality Python will check if the strings are the same
'hello' != 'bye' # True
Comparing strings with < or > is based on the Unicode code point of each character (the same as the ASCII code for ASCII characters), so 'B' > 'A' because the code for B is 66 and the code for A is 65
Strings are compared character by character from the start, and the first pair of characters that differ decides the result
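The built-in ord function returns a character's code point. A few illustrative comparisons, including a prefix sorting before the longer string and uppercase ASCII letters sorting before lowercase ones:

```python
print(ord('A'), ord('B'), ord('a'))  # 65 66 97
print('B' > 'A')            # True
print('apple' < 'banana')   # True, decided by 'a' < 'b'
print('Zebra' < 'apple')    # True, since 'Z' (90) < 'a' (97)
print('app' < 'apple')      # True, a prefix sorts first
```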
Branching
Branching allows us to run different statements depending on a condition
If
The if statement will only run the code that forms part of its block if the condition is true
i = 0
if i == 0:
    print('Hello')
If-Else
An if-else can be done as follows
i = 0
if i == 1:
    print('Hello')
else:
    print('Bye')
Elif
If we want to test several conditions but only run the block of the first one that is true, we can use elif
i = 0
if i == 1:
    print('Hello')
elif i == 0:
    print('Hello World')
elif i > 1:
    print('Hello World!!')
else:
    print('Bye')
Loops
For Loops
A for loop in Python iterates over the items of a sequence such as a list, running its code block once for each item
loop_vals = [1,6,2,9]
for i in loop_vals:
    print(i)
# prints 1, 6, 2 and 9, each on its own line
Range
If we want to iterate through values without using a predefined list, we can use the range function to generate a sequence of values for us to iterate through
The range function can be called in the following forms
range(stop)
range(start, stop[, step])
It produces start, start + step, start + 2*step, and so on, up to but not including stop
Only the stop value is required; start defaults to 0 and step defaults to 1. In Python 3, range returns a lazy range object rather than a list, so we wrap it in list() to see its values
list(range(5)) # [0, 1, 2, 3, 4]
list(range(5, 10)) # [5, 6, 7, 8, 9]
list(range(5, 10, 2)) # [5, 7, 9]
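A short sketch checking the values and length of a range, plus a negative step that counts down:

```python
r = range(5, 10, 2)
print(r)        # range(5, 10, 2)
print(list(r))  # [5, 7, 9]
print(len(r))   # 3

# A negative step counts down; stop is still excluded
print(list(range(10, 0, -3)))  # [10, 7, 4, 1]
```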
Using this, we can iterate through the values of our list by index as follows
loop_vals = [1,6,2,9]
for i in range(len(loop_vals)):
    print(loop_vals[i])
While Loops
A while loop repeats its code block for as long as its condition remains true
i = 0
while i < 10:
    print(i)
    i += 1  # Python has no ++ operator
# prints 0 to 9, each on its own line
Functions
Defining
Functions in Python are defined and called as follows
def hello():
    print('Hello')
hello() # Hello
We can have arguments in our function
def my_print(arg1, arg2):
    print(arg1, arg2)
my_print('Hello', 'World') # Hello World
Functions can also return values
def my_sum(val1, val2):
    answer = val1 + val2
    return answer
my_sum(1,2) # 3
A function can also have a variable number of arguments such as
def sum_all(*vals):
    return sum(vals)
sum_all(1,2,3) # 6
The vals object will be taken in as a tuple
Function input arguments can also have default values as follows
def has_default(arg1=4):
    print(arg1)
has_default() # 4
has_default(5) # 5
Or with multiple arguments
def has_defaults(arg1, arg2=4):
    print(arg1, arg2)
has_defaults(5) # 5 4
has_defaults(5,6) # 5 6
Help
We can get help about a function by calling the help function
help(print)
Will give us help about the print function
Scope
Functions have access to variables that are defined globally, as well as to their own local scope. Variables defined inside a function are not accessible from outside it unless we declare them global as follows
def make_global():
    global global_var
    global_var = 5
make_global()
global_var # 5
Note that global_var will not be defined until the function has been called at least once
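The other half of the scope rule is that a variable assigned inside a function without global stays local to it. A minimal sketch; the names counter, make_local and increment are hypothetical:

```python
counter = 0  # a module-level (global) variable

def make_local():
    local_var = 5  # only exists while the function runs
    return local_var

def increment():
    global counter  # rebind the module-level name instead of creating a local
    counter += 1

make_local()
try:
    print(local_var)
except NameError:
    print('local_var is not defined outside the function')

increment()
increment()
print(counter)  # 2
```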
Objects and Classes
Defining a Class
We can define a class Circle which has a constructor, a radius and a colour, as well as methods to increase its radius and to plot the circle
We make use of matplotlib to plot our circle here
import matplotlib.pyplot as plt
# Jupyter notebook magic to show plots inline; omit it in a plain Python script
%matplotlib inline
class Circle(object):
    def __init__(self, radius=3, color='blue'):
        self.radius = radius
        self.color = color
    def add_radius(self, r):
        self.radius += r
        return self.radius
    def draw_circle(self):
        plt.gca().add_patch(plt.Circle((0, 0), radius=self.radius, fc=self.color))
        plt.axis('scaled')
        plt.show()
Instantiating an Object
We can create a new Circle object by calling the class's constructor
red_circle = Circle(10, 'red')
Interacting with our Object
We can use the dir function to get a list of all the attributes and methods of an object, many of which Python defines automatically
dir(red_circle)
We can get our object's property values by simply referring to them
red_circle.color # red
red_circle.radius # 10
We can also manually change the object's properties with
red_circle.color = 'pink'
We can call our object's functions the same way
red_circle.add_radius(10) # 20
red_circle.radius # 20
We can plot red_circle by calling its draw_circle method, red_circle.draw_circle()
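Each object keeps its own attribute values, and the constructor defaults apply when arguments are omitted. The sketch below is a cut-down version of the Circle class without draw_circle, so it runs without matplotlib:

```python
class Circle:
    def __init__(self, radius=3, color='blue'):
        self.radius = radius
        self.color = color

    def add_radius(self, r):
        self.radius += r
        return self.radius

default_circle = Circle()       # uses radius=3 and color='blue'
red_circle = Circle(10, 'red')

red_circle.add_radius(10)
print(red_circle.radius)      # 20
print(default_circle.radius)  # 3, unaffected by the other object
print(default_circle.color)   # blue
```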
Reading Files
Note that the preferred method for reading files is using with
Open
We can use the built-in open function to open a file, which gives us a File object
example1 = '/data/test.txt'
file1 = open(example1,'r')
The 'r' argument sets open to read mode; we can use 'w' for write mode and 'a' for append mode
Properties
File objects have some properties, such as
file1.name
file1.mode
Read
We can read the file contents to a string with the following
file_content = file1.read()
Close
Lastly, we need to close our File object by calling
file1.close()
We can verify that the file is closed with
file1.closed # True
With
A better way to read files is with the with statement, which closes the file automatically, even if an exception is raised
with open(example1) as file1:
    file_content = file1.read()
We can also read the file in by pieces either based on characters or on lines
Read File by Characters
We can read the first four characters with
with open(example1,'r') as file1:
    content = file1.read(4)
Each call to read() continues from where the previous call stopped rather than starting over, so we can read the first seven characters like so
with open(example1,'r') as file1:
    content = file1.read(4)
    content += file1.read(3)
Read File by Lines
A File object can be iterated over line by line, much like a list whose elements are the lines of the file
We can read a single line with the readline method
with open(example1,'r') as file1:
    content = file1.readline()
We can read each line of our file into a list by looping over the file object like so
content = []
with open(example1,'r') as file1:
    for line in file1:
        content.append(line)
Or with the readlines method like so
with open(example1, 'r') as file1:
    content = file1.readlines()
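To compare these reading methods side by side, the sketch below first writes a small temporary file with the standard tempfile module (its contents and location are illustrative, standing in for /data/test.txt) and then reads it back each way:

```python
import os
import tempfile

# Create a throwaway file with three lines
tmp_dir = tempfile.mkdtemp()
example1 = os.path.join(tmp_dir, 'test.txt')
with open(example1, 'w') as f:
    f.write('first line\nsecond line\nthird line\n')

with open(example1, 'r') as file1:
    chunk = file1.read(4)   # 'firs'
    chunk += file1.read(3)  # continues from position 4: 't l'
print(chunk)  # first l

with open(example1, 'r') as file1:
    first = file1.readline()  # 'first line\n', newline included

with open(example1, 'r') as file1:
    lines = file1.readlines()
print(lines)  # ['first line\n', 'second line\n', 'third line\n']
```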
Writing Files
We can also make use of open to write content to a file as follows
out_path = 'data/output.txt'
with open(out_path, 'w') as out_file:
    out_file.write('content')
The write method writes exactly the string it is given and does not add a line break. Each call continues from where the previous one stopped, so to write multiple lines to our file we need to add the newline characters ourselves as follows
content = ['Line 1 content', 'Line 2 content', 'Line 3 content']
with open(out_path, 'w') as out_file:
    for line in content:
        out_file.write(line + '\n')
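A sketch that writes the three lines to a temporary file (an illustrative stand-in for data/output.txt, created with the standard tempfile module) and reads them back to confirm each one landed on its own line:

```python
import os
import tempfile

content = ['Line 1 content', 'Line 2 content', 'Line 3 content']
out_path = os.path.join(tempfile.mkdtemp(), 'output.txt')

with open(out_path, 'w') as out_file:
    for line in content:
        out_file.write(line + '\n')  # add the line break explicitly

with open(out_path, 'r') as in_file:
    lines = in_file.read().splitlines()
print(lines)  # ['Line 1 content', 'Line 2 content', 'Line 3 content']
```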
Copy a File
We can copy data from one file to another by simultaneously reading and writing between the files
with open('readfile.txt','r') as readfile:
    with open('newfile.txt','w') as writefile:
        for line in readfile:
            writefile.write(line)
Pandas
Pandas is a library that is useful for working with data as a DataFrame in Python
Importing Pandas
The Pandas library will need to be installed and then imported into our notebook as
import pandas as pd
Creating a DataFrame
We can create a new DataFrame in Pandas as follows
df = pd.DataFrame({'Name':['John','Jack','Smith','Jenny','Maria'],
'Age':[23,12,34,13,42],
'Height':[1.2,2.3,1.1,1.6,0.5]})
Read CSV as DataFrame
We can read a csv as a DataFrame with Pandas by doing the following
csv_path ='data.csv'
df = pd.read_csv(csv_path)
Read XLSX as DataFrame
We need to install an additional dependency first (pandas reads .xlsx files with the openpyxl package), and then read the file with the pd.read_excel function
!pip install openpyxl
xlsx_path = 'data.xlsx'
df = pd.read_excel(xlsx_path)
View DataFrame
We can view the first few lines of our DataFrame as follows
df.head()
Assume our data looks like the following
| | Name | Age | Height |
|---|---|---|---|
| 0 | John | 23 | 1.2 |
| 1 | Jack | 12 | 2.3 |
| 2 | Smith | 34 | 1.1 |
| 3 | Jenny | 13 | 1.6 |
| 4 | Maria | 42 | 0.5 |
Working with DataFrame
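Selecting Columns
We can read the data from a specific column as follows. Column names are case-sensitive, and the double brackets return a DataFrame rather than a Series
ages = df[['Age']]
| | Age |
|---|---|
| 0 | 23 |
| 1 | 12 |
| 2 | 34 |
| 3 | 13 |
| 4 | 42 |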
We can also select multiple columns
age_vs_height = df[['Age', 'Height']]
| | Age | Height |
|---|---|---|
| 0 | 23 | 1.2 |
| 1 | 12 | 2.3 |
| 2 | 34 | 1.1 |
| 3 | 13 | 1.6 |
| 4 | 42 | 0.5 |
Reading Cells
We can read a specific cell in one of two ways. The iloc indexer accesses a cell by its row and column positions, while loc uses the row label and column name
df.iloc[1,2] # 2.3
df.loc[1, 'Height'] # 2.3
Slicing
We can also slice with loc and iloc as follows. Note that iloc excludes the end position, while loc includes the end label
df.iloc[1:3, 0:2]
| | Name | Age |
|---|---|---|
| 1 | Jack | 12 |
| 2 | Smith | 34 |
df.loc[0:2, 'Age':'Height']
| | Age | Height |
|---|---|---|
| 0 | 23 | 1.2 |
| 1 | 12 | 2.3 |
| 2 | 34 | 1.1 |
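A sketch with the example DataFrame from above, assuming pandas is installed, that checks the row counts produced by the two slicing styles and the return types of single versus double brackets:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jack', 'Smith', 'Jenny', 'Maria'],
                   'Age': [23, 12, 34, 13, 42],
                   'Height': [1.2, 2.3, 1.1, 1.6, 0.5]})

by_position = df.iloc[0:2]  # rows at positions 0 and 1
by_label = df.loc[0:2]      # rows labelled 0, 1 and 2
print(len(by_position), len(by_label))  # 2 3

# Single brackets give a Series, double brackets a DataFrame
print(type(df['Age']).__name__)    # Series
print(type(df[['Age']]).__name__)  # DataFrame
```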
Saving Data to CSV
Using Pandas, we can save our DataFrame to a CSV with
df.to_csv('my_dataframe.csv')
Arrays
The NumPy library lets us work with arrays much as we would mathematically. To use NumPy we need to import it as follows
import numpy as np
Arrays are similar to lists but are fixed size, and each element is of the same type
1D Arrays
Defining an Array
We can simply define an array as follows
a = np.array([1,2,3]) # casting a list to array
Types
An array can only store data of a single type; we can find the type of the data in an array with
a.dtype
Manipulating Values
We can change values in an array by indexing, just as we would with a list. Slicing and striding work the same way too
a = np.array([1,2,3]) # array([1, 2, 3])
a[1] = 5 # a is now array([1, 5, 3])
b = a[1:3] # array([5, 3])
We can also use a list of indexes to select specific elements, and even assign values to those positions
a = np.array([1,2,3]) # array([1, 2, 3])
select = [1,2]
b = a[select] # array([2, 3])
a[select] = 0 # a is now array([1, 0, 0])
Attributes
An array has various properties and functions such as
a = np.array([1,2,3])
a.size # size
a.ndim # number of dimensions
a.shape # shape
a.mean() # mean of values
a.max() # max value
a.min() # min value
Array Operations
We have a few different operations on arrays such as
u = np.array([1,0])
v = np.array([0,1])
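u+v # vector addition, array([1, 1])
u*v # element-wise multiplication, array([0, 0])
np.dot(u,v) # dot product, 0
np.cross(u,v) # cross product (2-element input is deprecated as of NumPy 2.0; 3-element vectors are preferred)
u.T # transpose; this has no effect on a 1D array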
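With u = [1, 0] and v = [0, 1], the operations print the results below; the cross product uses 3-element vectors x and y (illustrative names) instead, the form np.cross accepts without deprecation warnings in recent NumPy versions:

```python
import numpy as np

u = np.array([1, 0])
v = np.array([0, 1])

print(u + v)         # [1 1]
print(u * v)         # [0 0], element-wise
print(np.dot(u, v))  # 0, the vectors are orthogonal
print(u.T)           # [1 0], unchanged for a 1D array

# Cross product of 3-element vectors: x cross y gives z
x = np.array([1, 0, 0])
y = np.array([0, 1, 0])
print(np.cross(x, y))  # [0 0 1]
```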
Linspace
The linspace function generates an array of evenly spaced values over an interval, where num is the number of values to generate and the stop value is included by default
np.linspace(start, stop, num=number_of_values)
np.linspace(-2,2,num=5) # array([-2., -1., 0., 1., 2.])
np.linspace(0,2*np.pi,num=10)
# array([0. , 0.6981317 , 1.3962634 , 2.0943951 , 2.7925268 ,
# 3.4906585 , 4.1887902 , 4.88692191, 5.58505361, 6.28318531])
Plotting Values
We can apply functions such as np.sin and np.cos to these values element-wise, for example to prepare data for plotting
x = np.linspace(0,2*np.pi, num=100)
y = np.sin(x) + np.cos(x)
2D Arrays
Defining a 2D Array
Two dimensional Arrays can be defined by a list that contains nested lists of the same size as follows
a = np.array([[11,12,13],[21,22,23],[31,32,33]])
We can similarly make use of the previously defined array operations
Accessing Values
Values in a 2D array can be indexed in either one of two ways
a[1,2] # 23
a[1][2] # 23
Slicing
We can perform slicing as follows
a[0][0:2] # array([11, 12])
a[0:2,2] # array([13, 23])
Mathematical Operations
We can perform the usual mathematical operations with 2D arrays as with 1D
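A few of these operations applied to the 2D array a from above. The slice a[:, 1] (a form not used above) takes column 1 from every row:

```python
import numpy as np

a = np.array([[11, 12, 13], [21, 22, 23], [31, 32, 33]])

print(a.shape)   # (3, 3)
print(a[:, 1])   # [12 22 32], column 1 of every row
print(a[1, :])   # [21 22 23], row 1
print(a + a)     # element-wise addition
print(a.mean())  # 22.0
```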
Dancing Man
The following script will make a dancing man if run in Jupyter, because why not
from IPython.display import display, clear_output
import time
val1 = '(•_•)\n<) )╯\n/ \\'
val2 = '\\(•_•)\n( (>\n/ \\'
val3 = '(•_•)\n<) )>\n/ \\'
# Loops forever; interrupt the kernel to stop it
while True:
    for pos in [val1, val2, val3]:
        clear_output(wait=True)
        print(pos)
        time.sleep(0.6)