Data Science Overview and Introduction to Python (Part I)
Week 1
Middlesex University Dubai. Winter ‘23, CST4050 v1.0
Instructor: Dr. Ivan Reznikov
Plan
Terminology: Data Science vs Data Analytics, Data Engineering, Machine Learning, Business Intelligence, Artificial Intelligence, etc.
Python: scripts vs notebook
Python: data types
Terminology
Data Science vs All: Terminology
Analysis: focused on the PAST (past practice). Analytics: focused on the FUTURE.
Data Analysis ≠ Data Analytics
Business Analysis ≠ Business Analytics
… Analysis ≠ … Analytics
Data Science vs All
Data Science vs All
1* - Case Study: you analyze cases of companies in the past
2* - Quantitative Analytics: experts believe that X will increase Y
3* - Preliminary Data Report: rough numbers from prior research
4* - Signal Processing, Fourier Transform
Data Science vs All
1* - Reporting
2* - Real-time dashboards
Data Science vs All
1* - Data Systematization
2* - Dashboards and Reporting Visuals
3* - Sales Forecasting via Statistics, Linear Regression
Data Science vs All
1* - Fraud Prevention, ChatGPT
2* - Knowledge modelling, symbolic reasoning
Machine Learning vs AI
Data Science vs All: Whole picture
Data Science vs All: Recent Trend
Python basics
Why choose Python for Data Science?
● Relatively Simple Software Development Process
● Wide Range of Open-Source Frameworks and Tools
● Compatibility with Platforms and Systems
● Readable and Maintainable Code
● Large, Active Community
Python: scripts vs notebook
Script (script.py):
# some script.py code
def square(a: int):
    return a*a

def main():
    random_number = 42
    print(square(random_number))

main()

user: C:/py_project$ python script.py
1764

Notebook:
In [1]: # some notebook code
In [2]: def square(a: int):
            return a*a
In [3]: def main():
            print(square(42))
In [4]: main()
1764
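A common refinement of the script above (a sketch, not the course's own code) is the `if __name__ == "__main__":` guard. With it, `main()` runs only when the file is executed with `python script.py`, while a notebook can still `import script` and reuse `square` without triggering the print.

```python
# script.py: a sketch of the script above with a main guard.

def square(a: int) -> int:
    """Return a multiplied by itself."""
    return a * a


def main() -> None:
    random_number = 42
    print(square(random_number))  # prints 1764


# __name__ equals "__main__" only when this file is run directly,
# not when it is imported (e.g. `from script import square` in a notebook).
if __name__ == "__main__":
    main()
```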
Python: scripts vs notebook
Script:
- organized and structured
- reproducible
- easy to debug
- production-ready
Notebook:
- fast development and exploration
- easy visualization and presentation
- experiment-friendly
Install Python, pip, and Jupyter Notebook
Install Python: https://www.python.org/downloads/
pip is the Python package manager. Install pip: https://pip.pypa.io/en/stable/installation/
Install Jupyter Notebook: https://jupyter.org/install
> pip install notebook
To use live Jupyter notebooks
Google Colab (for Google account users): https://colab.research.google.com/
Official Jupyter website: https://jupyter.org/try
MyBinder: https://mybinder.org/v2/gh/ipython/ipython-in-depth/master?filepath=binder/Index.ipynb
Python coding
Plan
Python Basics:
– Conditions and Logic
– Loops
– Variables
– Error Handling
– Functions
– Comments
– Modules and Packages
– Math
– Basic Data Types:
● String
● Integer
● Float
● Boolean
● List
● Dictionary
Variables
A variable is like a labelled box: the box some_answer holds the value 42.
some_answer = 42
print(some_answer)
>>>42
print(type(some_answer))
>>><class 'int'>
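Python infers a variable's type from the value it currently refers to, so the same name can later point to a value of another type. A minimal sketch (the string value is illustrative):

```python
some_answer = 42
print(type(some_answer))  # <class 'int'>

# Rebinding the name to a value of another type is allowed (dynamic typing).
some_answer = "forty-two"
print(type(some_answer))  # <class 'str'>

# isinstance() is the usual way to test a type inside a program.
print(isinstance(some_answer, str))  # True
print(isinstance(some_answer, int))  # False
```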
Naming variables
Can you guess what is in each box? The boxes are labelled: cat, c, var_1
Variables: Naming convention
Bad examples:
'l' (lowercase letter “el”), 'O' (uppercase letter “oh”), 'I' (uppercase letter “eye”)
'some_variable'
'NOT_GLOBAL_VAR'
'Averagecamelsizeindesert'
'number-of-cans-in-a-sixpack'
'xxrstaqwe'
Good examples:
temp_celsius, station_id, current_timestamp (snake_case, the PEP 8 convention for Python variables)
----
tempCelsius, stationID, currentTimestamp (camelCase)
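Style aside, some names are simply invalid: a hyphen is read as a minus sign, a name cannot start with a digit, and keywords are reserved. The built-in `str.isidentifier()` and the standard-library `keyword.iskeyword()` can check this. A sketch, where `1st_value` and `class` are hypothetical extra examples not taken from the slide:

```python
import keyword

# Names from the naming slides, plus two hypothetical extras: "1st_value" and the keyword "class".
candidates = ["temp_celsius", "stationID", "number-of-cans-in-a-sixpack", "var_1", "1st_value", "class"]

usable_names = {}
for name in candidates:
    # A variable name must be a valid identifier and must not be a reserved keyword.
    usable_names[name] = name.isidentifier() and not keyword.iskeyword(name)
    print(f"{name!r:32} usable: {usable_names[name]}")
```

Note that `stationID` is usable: validity and style are separate questions.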
Basic operations with variables: Create, Read, Update, Delete
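The four operations map onto plain Python statements: assignment creates and updates a name, using the name reads it, and `del` deletes it. A minimal sketch (the variable and its values are illustrative):

```python
temp_celsius = 21.5              # Create: bind the name to a value
print(temp_celsius)              # Read: 21.5
temp_celsius = temp_celsius + 2  # Update: rebind the name to a new value
print(temp_celsius)              # 23.5
del temp_celsius                 # Delete: remove the name

try:
    print(temp_celsius)          # reading a deleted name fails
except NameError as err:
    error_message = str(err)
    print("NameError:", error_message)  # NameError: name 'temp_celsius' is not defined
```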
Comments
# This is a comment before some code
print("Hello Python!")
print("Winter is coming!")  # this is an in-line comment
'''
multi-row
comment
'''
Comments make code easier to understand, very often for a later version of yourself.
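Strictly speaking, a triple-quoted block is a string literal, not a comment: on its own line Python treats it as an unused expression, but as the first statement of a function, class, or module it becomes a docstring that `help()` can display. A minimal sketch:

```python
def square(a: int) -> int:
    """Return a multiplied by itself."""  # docstring: stored as square.__doc__
    # A '#' comment like this one is ignored by the interpreter entirely.
    return a * a


print(square.__doc__)  # Return a multiplied by itself.
print(square(3))       # 9
```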
Data Type: String
my_string = "Burj Khalifa"
Slicing syntax: my_string[start:stop:step]
my_string[0:4] >>>"Burj"
my_string[2] >>>"r"
my_string[-1] >>>"a"
my_string[2:-5] >>>"rj Kh"
my_string[::3] >>>"Bjhi"
Speed Task: what do these return?
my_string[5:]
my_string[4]
my_string[-0]
my_string[3:9:2]
my_string[::-1]
Data Type: String
my_string = "Burj Khalifa"
Slicing syntax: my_string[start:stop:step]
Speed Task answers:
my_string[0:4] >>>"Burj"
my_string[5:] >>>"Khalifa"
my_string[2] >>>"r"
my_string[4] >>>" "
my_string[-1] >>>"a"
my_string[-0] >>>"B" (-0 equals 0, so this is the first character)
my_string[2:-5] >>>"rj Kh"
my_string[3:9:2] >>>"jKa"
my_string[::3] >>>"Bjhi"
my_string[::-1] >>>"afilahK jruB"
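To check the speed-task answers, it helps to print each character next to its positive and negative index; a slice then selects indices the same way `range()` would for a positive step. A short sketch:

```python
my_string = "Burj Khalifa"

# Each character with its positive and negative index.
for i, ch in enumerate(my_string):
    print(f"{i:>2} {i - len(my_string):>3} {ch!r}")

# s[start:stop:step] takes the indices start, start+step, ... that are still < stop,
# exactly like range(start, stop, step) for a positive step.
print(list(range(3, 9, 2)))       # [3, 5, 7]
print(my_string[3:9:2])           # jKa
print(my_string[slice(3, 9, 2)])  # jKa (an explicit slice object is equivalent)
```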
Data Type: Updating Strings
String concatenation:
string_one = "Expo2020 took"
string_two = "place in 2021"
string_three = string_one + string_two
>>>Expo2020 tookplace in 2021
string_three = string_three + " " + "in Dubai"
>>>Expo2020 tookplace in 2021 in Dubai
string_three += "!"
>>>Expo2020 tookplace in 2021 in Dubai!
Note that + does not insert spaces: the space missing in "tookplace" has to be added explicitly.
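Two common alternatives to chaining `+`, sketched with the slide's strings (the variable `city` is a hypothetical addition): `str.join()` places a separator between the parts, and an f-string (Python 3.6+) fills variables into a template.

```python
string_one = "Expo2020 took"
string_two = "place in 2021"

# str.join inserts the separator between the parts, so no space is forgotten.
joined = " ".join([string_one, string_two, "in Dubai!"])
print(joined)    # Expo2020 took place in 2021 in Dubai!

# An f-string places variables directly into a template.
city = "Dubai"   # hypothetical extra variable for illustration
sentence = f"{string_one} {string_two} in {city}!"
print(sentence)  # Expo2020 took place in 2021 in Dubai!
```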
String methods
dir(my_string)
>>>['__add__', '__class__', '__contains__', …, '__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isascii', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'removeprefix', 'removesuffix', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
(Output from Python 3.9 or later, with most of the double-underscore methods elided.)
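A few of the listed methods in action. Because strings are immutable, each call returns a new value and leaves `my_string` unchanged; `help(str.split)` prints the documentation for any single method. A sketch:

```python
my_string = "Burj Khalifa"

print(my_string.upper())                   # BURJ KHALIFA
print(my_string.lower())                   # burj khalifa
print(my_string.split())                   # ['Burj', 'Khalifa']
print(my_string.replace("Burj", "Tower"))  # Tower Khalifa
print(my_string.find("K"))                 # 5 (index of the first match, -1 if absent)
print(my_string.count("a"))                # 2
print(my_string.startswith("Burj"))        # True

# The original string is untouched by all of the calls above.
print(my_string)                           # Burj Khalifa
```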
Data Type: Integers and Floats
Integer: whole numbers
my_number = 5
type(my_number)
>>><class 'int'>
int(5.0)
>>>5
my_number += 1
>>>6
Float: numbers with a fractional (decimal) part
my_number = 5.0
type(my_number)
>>><class 'float'>
float(5)
>>>5.0
del my_number (removes the name; nothing is printed)
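A short sketch of how `int` and `float` interact in arithmetic, including two common surprises: `/` always produces a float, and `0.1 + 0.2` is not exactly `0.3`.

```python
quotient = 7 / 2         # true division always returns a float
floor_quotient = 7 // 2  # floor division of two ints returns an int
remainder = 7 % 2        # remainder after floor division
power = 2 ** 10          # exponentiation
truncated = int(3.9)     # int() truncates toward zero; it does not round
rounded = round(3.9)     # round() returns the nearest int
mixed = 5 + 5.0          # mixing int and float produces a float

print(quotient, floor_quotient, remainder, power)  # 3.5 3 1 1024
print(truncated, rounded, mixed)                   # 3 4 10.0
print(0.1 + 0.2)  # 0.30000000000000004 (floats are binary approximations)
```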
Data Type: Boolean
Boolean values represent True/False statements. They are often used to check for None and for empty strings, lists, and other objects.
my_bool = 2 + 2 == 5
my_bool, type(my_bool)
>>>(False, <class 'bool'>)
bool(False), bool(None), bool(0), bool(""), bool(()), bool([]), bool({})
>>>(False, False, False, False, False, False, False)
bool(True), bool(1), bool("a"), bool(("b",)), bool(["c"]), bool({"d": "e"})
>>>(True, True, True, True, True, True)
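In practice these rules are applied implicitly: `if` and `while` call `bool()` on their condition. A minimal sketch (the helper `describe` is hypothetical), which also shows why `is None` is preferred when `None` must be told apart from `0` or `""`:

```python
def describe(value) -> str:
    """Hypothetical helper: report how an if-statement would treat value."""
    if value:  # same as: if bool(value):
        return "truthy"
    return "falsy"


for value in [0, 1, "", "a", [], ["c"], {}, {"d": "e"}, None]:
    print(f"{value!r:12} {describe(value)}")

count = 0
print(not count)      # True: 0 is falsy
print(count is None)  # False: but 0 is not None
```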
Data Type: List
my_list = [4, 8, 15, 16, 23, 42]
my_list[start:stop:step]
Reading:
my_list[0] >>>4
my_list[-1] >>>42
my_list[1:4] >>>[8, 15, 16]
my_list[3:] >>>[16, 23, 42]
my_list[::-1] >>>[42, 23, 16, 15, 8, 4]
my_list[:] >>>[4, 8, 15, 16, 23, 42]
Updating:
my_list[0] = 1 >>>[1, 8, 15, 16, 23, 42]
my_list = my_list + [100] >>>[1, 8, 15, 16, 23, 42, 100]
del my_list[-2] >>>[1, 8, 15, 16, 23, 100]
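One point worth seeing in code: `my_list[:]` makes a new (shallow) copy, while plain assignment only adds a second name for the same list. A sketch (the names `alias` and `snapshot` are illustrative):

```python
my_list = [4, 8, 15, 16, 23, 42]
alias = my_list        # a second name for the SAME list object
snapshot = my_list[:]  # a full slice builds a NEW list (shallow copy)

my_list[0] = 1         # in-place update, visible through every name for this list
print(alias)           # [1, 8, 15, 16, 23, 42]
print(snapshot)        # [4, 8, 15, 16, 23, 42]

my_list = my_list + [100]  # "+" creates a new list and rebinds my_list only
print(alias)               # [1, 8, 15, 16, 23, 42]
print(my_list is alias)    # False
```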
List methods
dir(my_list)
>>>['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']
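Most of the public list methods change the list in place. A sketch that applies several of them in turn to the slide's list (the added values are arbitrary):

```python
my_list = [4, 8, 15, 16, 23, 42]

my_list.append(100)     # add one item at the end
my_list.extend([7, 3])  # add several items at the end
my_list.insert(0, 0)    # insert the value 0 at index 0
my_list.remove(15)      # remove the first occurrence of the value 15
last = my_list.pop()    # remove and return the last item
my_list.sort()          # sort in place; sort() itself returns None

print(my_list)                              # [0, 4, 7, 8, 16, 23, 42, 100]
print(last)                                 # 3
print(my_list.index(42), my_list.count(8))  # 6 1
```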
Data Type: Tuple
my_tuple = (16, 2, 4, 8)
Tuples are immutable: their items cannot be reassigned, added, or removed. (A mutable item inside a tuple, such as a list, can still be modified in place.)
my_tuple[0] = 0
>>>TypeError: 'tuple' object does not support item assignment
my_tuple.append(0)
>>>AttributeError: 'tuple' object has no attribute 'append'
Tuples are ordered: they keep the order in which the data was inserted, and that order won't change.
my_tuple[-1] >>>8
my_tuple[1:3] >>>(2, 4)
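Two everyday uses that follow from tuples being ordered and immutable, shown as a sketch: unpacking a tuple into separate names, and using a tuple as a dictionary key, which a list cannot be. The coordinates below are illustrative values, not data from the course.

```python
my_tuple = (16, 2, 4, 8)

# Unpacking: one name per item, in order.
first, second, third, fourth = my_tuple
print(first, fourth)   # 16 8

# Swapping two variables packs and unpacks a tuple behind the scenes.
first, second = second, first
print(first, second)   # 2 16

# Tuples of immutable items are hashable, so they can be dictionary keys.
tower_by_location = {(25.197, 55.274): "Burj Khalifa"}  # illustrative coordinates
print(tower_by_location[(25.197, 55.274)])              # Burj Khalifa

try:
    {[25.197, 55.274]: "Burj Khalifa"}  # a list is mutable, hence unhashable
except TypeError as err:
    print("TypeError:", err)            # reports that a list is unhashable
```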
Data Type: Set
my_set = {1, 2, 4, 8, 16, 32}
Sets contain unique values; duplicate values are not allowed.
my_set = {16, 1, 2, 32, 2, 2, 4, 8, 16, 1, 1}
>>>{1, 2, 4, 8, 16, 32}
Sets are mutable: elements can be added or removed, but each element must itself be hashable (in practice, an immutable value such as a number, string, or tuple).
my_set.add(5.0)
my_set.add(False)
>>>{False, 1, 2, 4, 5.0, 8, 16, 32}
Sets are unordered: they don't maintain the order of data insertion, so the displayed order may differ.
my_list = [8, 2, 2, 3, 1, 1, 4, 5, 5, 7, 8]
my_set = set(my_list)
>>>{1, 2, 3, 4, 5, 7, 8}
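Sets are most useful for de-duplication, fast membership tests, and set algebra. A sketch using the slide's list (the set `evens` is illustrative); `sorted()` is used only to make the printed order predictable:

```python
my_list = [8, 2, 2, 3, 1, 1, 4, 5, 5, 7, 8]
my_set = set(my_list)
print(len(my_list), len(my_set))  # 11 7

print(5 in my_set)                # True: membership tests are fast on sets

evens = {2, 4, 6, 8}
print(sorted(my_set & evens))     # [2, 4, 8]                 intersection
print(sorted(my_set | evens))     # [1, 2, 3, 4, 5, 6, 7, 8]  union
print(sorted(my_set - evens))     # [1, 3, 5, 7]              difference

my_set.discard(7)                 # remove an element if present (no error if missing)
print(sorted(my_set))             # [1, 2, 3, 4, 5, 8]

# 1 == True and both hash the same, so a set keeps only one of them.
print({1, True})                  # {1}
```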
Data Type: Dictionary
General form: {"key": "value"}
my_dictionary = {
    "name": "Burj Khalifa",
    "height_m": 830,
    "completed": True
}
my_dictionary["name"]
>>>"Burj Khalifa"
my_dictionary.keys()
>>>dict_keys(['name', 'height_m', 'completed'])
my_dictionary.values()
>>>dict_values(['Burj Khalifa', 830, True])
Keys are unique within a dictionary.
Dictionaries keep the insertion order of their keys (guaranteed since Python 3.7); they are not sorted by key.
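Reading from a dictionary has a few variants worth knowing. A sketch using the slide's dictionary, where `"architect"` is a hypothetical key that is deliberately absent:

```python
my_dictionary = {"name": "Burj Khalifa", "height_m": 830, "completed": True}

# .get() returns a default for a missing key instead of raising an error.
architect = my_dictionary.get("architect", "unknown")  # "architect" is a hypothetical key
print(architect)                    # unknown

print("height_m" in my_dictionary)  # True: "in" checks the keys

# .items() yields (key, value) pairs in insertion order.
for key, value in my_dictionary.items():
    print(f"{key}: {value}")
# name: Burj Khalifa
# height_m: 830
# completed: True

try:
    my_dictionary["architect"]      # square brackets raise KeyError for a missing key
except KeyError as err:
    print("KeyError:", err)         # KeyError: 'architect'
```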
Data Type: Dictionary
my_dictionary = {
    "name": "Burj Khalifa",
    "height_m": 830,
    "completed": True
}
my_dictionary['height_m'] = 829.8
my_dictionary['hashtags'] = ["#burjkhalifa", "#dubai", "#uae", "#dubaimall"]
del my_dictionary['completed']
>>>{
    "name": "Burj Khalifa",
    "height_m": 829.8,
    "hashtags": ["#burjkhalifa", "#dubai", "#uae", "#dubaimall"]
}
Dictionary methods
dir(my_dictionary)
>>>['__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'clear', 'copy', 'fromkeys', 'get', 'items', 'keys', 'pop', 'popitem', 'setdefault', 'update', 'values']
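A few of these methods in a sketch that mirrors the update slide (the key `city` is an illustrative addition): `update()` merges another dictionary, `pop()` deletes a key and returns its value, and `setdefault()` inserts a default only when the key is missing.

```python
my_dictionary = {"name": "Burj Khalifa", "height_m": 830, "completed": True}

my_dictionary.update({"height_m": 829.8, "city": "Dubai"})  # overwrite one key, add another
removed = my_dictionary.pop("completed")                    # delete a key and return its value
my_dictionary.setdefault("hashtags", []).append("#dubai")  # create the list only if missing

print(removed)        # True
print(my_dictionary)
# {'name': 'Burj Khalifa', 'height_m': 829.8, 'city': 'Dubai', 'hashtags': ['#dubai']}
```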
Data Type comparison
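One property that separates these built-in types can be checked directly in code. The sketch below assumes that hashability is a fair proxy for immutability among these built-ins: the immutable types (str, int, float, bool, and tuples of immutable items) are hashable and can serve as set elements or dictionary keys, while the mutable ones (list, set, dict) are not. The example values are taken from earlier slides.

```python
examples = {
    "str": "Burj",
    "int": 42,
    "float": 4.2,
    "bool": True,
    "tuple": (16, 2, 4, 8),
    "list": [4, 8, 15],
    "set": {1, 2, 4},
    "dict": {"name": "Burj Khalifa"},
}

hashable = {}
for type_name, value in examples.items():
    try:
        hash(value)  # succeeds for these immutable built-ins, fails for the mutable ones
        hashable[type_name] = True
    except TypeError:
        hashable[type_name] = False
    print(f"{type_name:6} hashable (usable as set element / dict key): {hashable[type_name]}")
```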