CASE STUDY -1
… guage for different contents.
… he last 30 days?
… nt in them.
… w will you display duplicates from the table?
APPROACH: CASE STUDY -2
Table-1: users
This table includes one row per user, with descriptive information about that user's account.
Table-2: events
This table includes one row per event, where an event is an action that a user has taken. These events include login events, messaging events, search events, events logged as users progress through a signup funnel, and events around received emails.
Table-3: email_events
This table contains events specific to the sending of emails. It is similar in structure to the events table above.
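As a rough illustration of how the three tables relate, here is a minimal sketch using Python's built-in sqlite3 module rather than the MySQL 5.7 instance used in the project. All column names and sample rows are hypothetical; the source only names the tables and what one row represents.

```python
import sqlite3

# Hypothetical, simplified columns: the source only names the three tables.
SCHEMA = """
CREATE TABLE users (user_id INTEGER PRIMARY KEY, created_at TEXT, state TEXT);
CREATE TABLE events (user_id INTEGER, occurred_at TEXT, event_type TEXT, event_name TEXT);
CREATE TABLE email_events (user_id INTEGER, occurred_at TEXT, action TEXT);
"""


def build_demo_db():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO users VALUES (?, ?, ?)",
        [(1, "2014-05-01", "active"), (2, "2014-05-02", "active")],
    )
    conn.executemany(
        "INSERT INTO events VALUES (?, ?, ?, ?)",
        [
            (1, "2014-05-03", "engagement", "login"),
            (1, "2014-05-03", "engagement", "search"),
            (2, "2014-05-04", "signup_flow", "create_user"),
        ],
    )
    conn.executemany(
        "INSERT INTO email_events VALUES (?, ?, ?)",
        [(1, "2014-05-05", "sent_weekly_digest")],
    )
    return conn


def events_per_user(conn):
    """One row per user with the number of events that user generated."""
    return conn.execute(
        "SELECT u.user_id, COUNT(e.user_id) "
        "FROM users u LEFT JOIN events e ON e.user_id = u.user_id "
        "GROUP BY u.user_id ORDER BY u.user_id"
    ).fetchall()
```

The users table holds one row per account, while events and email_events hold many rows per user, so most questions reduce to joining on the user identifier and aggregating.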
TECH-STACK USED
• TO CREATE THIS PROJECT I USED DB-FIDDLE (DATABASE: MYSQL V5.7).
• I USED IT BECAUSE IT IS ONLINE AND FAST.
INSIGHTS
• I UNDERSTOOD WHERE TO USE MY DATA ANALYSIS SKILLS.
• I UNDERSTOOD HOW TO WRITE CODE ACCORDING TO THE QUESTIONS.
• I UNDERSTOOD WHAT TYPE OF QUESTIONS I WILL GET IN FUTURE.
• WHILE DOING THE PROJECT I FACED MANY ERRORS; I THEN REVISITED THE RELEVANT POINTS AND SOLVED THE ERRORS.
RESULTS
• AFTER COMPLETING THIS PROJECT, I AM VERY CONFIDENT ABOUT USING SQL.
• THIS PROJECT HELPED ME TO RECALL EVERYTHING I HAD LEARNED AND HELPED ME TO THINK ABOUT HOW I CAN DO BETTER IN ANALYSIS.
• IT HELPED ME TO GAIN CONFIDENCE IN CREATING A DATABASE.
• I AM VERY HAPPY THAT I LEARNED A NEW LANGUAGE AND LEARNED HOW TO CREATE A DATABASE.
THANK YOU
HIRING PROCESS ANALYTICS
ADVITYA SINGH
AGENDA
• Project Description
• Approach
• Tech-Stack Used
• Insights
• Result
Project description
The hiring process is a fundamental and most important function of a company. Through it, MNCs get to know the major underlying trends in hiring. Trends such as the number of rejections, number of interviews, types of jobs, vacancies etc. are important for a company to analyse before hiring freshers or any other individual.
As a Data Analyst, your job is to go through these trends and draw insights from them for the hiring department to work upon.
You are working for an MNC such as Google as a lead Data Analyst. The company has provided the data records of its previous hirings and has asked you to answer certain questions by making sense of that data.
You are given a dataset of a company containing the details of people who registered for a particular post in a department of this company. You are required to use your knowledge of statistics and different formulas in Excel to draw the necessary conclusions about the company.
Use the below steps for EDA:
1. Understanding data columns and data
2. Checking for missing data
3. Clubbing columns with multiple categories
4. Checking for outliers
5. Removing outliers
6. Drawing Data Summary
Approach
Task A: Hiring: The process of taking people into an organization for different kinds of positions.
Your task: How many males and females are hired?
Task B: Average Salary: Adding all the salaries for a select group of employees and then dividing the sum by the number of employees in the group.
Your task: What is the average salary offered in this company?
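Tasks A and B were done in Excel. The same two calculations can be sketched in Python as follows; the field names (gender, status, salary) and the four sample records are made up for illustration and are not taken from the project dataset.

```python
from collections import Counter
from statistics import mean

# Hypothetical records: field names and values are assumptions.
applicants = [
    {"gender": "Male", "status": "Hired", "salary": 50000},
    {"gender": "Female", "status": "Hired", "salary": 60000},
    {"gender": "Male", "status": "Rejected", "salary": 45000},
    {"gender": "Female", "status": "Hired", "salary": 70000},
]


def hired_by_gender(rows):
    """Task A: count hired applicants per gender."""
    return Counter(r["gender"] for r in rows if r["status"] == "Hired")


def average_salary(rows):
    """Task B: sum of salaries divided by the number of records."""
    return mean(r["salary"] for r in rows)
```

Here the average is taken over every record; if the question is read as "salary offered to hired candidates only", the rows would be filtered on status first.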
Approach
Task C: Class Intervals: The class interval is the difference between the upper class limit and the lower class limit.
Your task: Draw the class intervals for salary in the company.
Task D: Charts and Plots: This is one of the most important parts of analysis, used to visualize the data.
Your task: Draw a Pie Chart / Bar Graph (or any other graph) to show the proportion of people working in different departments.
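To make the class-interval definition in Task C concrete, here is a minimal sketch that counts salaries into equal-width classes. The width of 10000 and the sample salaries are assumptions chosen only for the example.

```python
def class_intervals(values, width):
    """Count values into [lower, lower + width) classes of equal width."""
    counts = {}
    for v in values:
        lower = (v // width) * width          # lower class limit
        key = (lower, lower + width)          # (lower limit, upper limit)
        counts[key] = counts.get(key, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical salaries, not taken from the project dataset.
salaries = [12000, 18000, 25000, 31000, 39000, 40000]
table = class_intervals(salaries, 10000)
```

Each key is a (lower limit, upper limit) pair whose difference is the class interval, and each value is the frequency of that class.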
Approach
Task E: Charts: Use different charts and graphs to perform the task of representing the data.
Your task: Represent the different post tiers using a chart/graph.
Tech-stack used
To create this project I used MS Excel, including pivot tables.
INSIGHTS
• I understood where to use my Excel analysis skills.
• I understood how to write Excel formulae according to the questions.
• I understood what type of questions I will get in future.
• I also worked with pivot tables, an effective tool.
Results
After completing this project, I am very confident about using MS Excel tools.
It helped me to gain confidence in analyzing a hiring process database.
I am very happy that I learned effective MS Excel tools.
1. Project Description:
A dataset having various columns of IMDB Movies data was provided. The data contains a lot of information on movies produced the world over across the decades, in numerous languages and genres, with various cast, crew and directors. The dataset had information about the user reviews, and the IMDB score is generated on that basis. Information from other social media platforms, such as Facebook likes for a particular title as well as for an actor, was also provided.
(a) Problem Statement and Aim
This problem could be approached by analysing the IMDB Movies data and generating insights with respect to the following aspects :-
(i) Data cleaning process.
(ii) Movies with highest profit.
(iii) IMDB top 250 movies.
(iv) Best Directors.
(v) Popular genres.
(vi) Actor specific movie extraction.
(vii) Find the critic-favourite and audience-favourite actors.
(viii) Create interactive Dashboard.
By using data analysis techniques, we can analyse the above aspects and draw insights about the top performing movies, actors and directors on the basis of user reviews.
(b) Description of the data sources used in the project.
The dataset contains a large amount of information from the website IMDB regarding the movies, actors, directors, genre, budget, reviews, Facebook likes, color, language, country of origin etc. It was collected and made available on Final Project-1 : IMDB MOVIE ANAYSIS. The brief overview of the dataset is as under:
(i) Number of observations: 1,38,535
(ii) Number of variables: 28
(iii) File type: CSV (Comma Separated Values)
This dataset could be useful for a variety of data analysis tasks, such as:
(i) Movies with highest profit.
(ii) IMDB top 250 movies.
(iii) Best Directors.
(iv) Popular genres.
(v) Actor specific movie extraction.
(vi) Find the critic-favourite and audience-favourite actors.
(vii) Create interactive Dashboard.
(e) Description of the data cleaning and pre-processing steps performed on the data.
After studying the data, the following process and tools in Excel were used to clean the data :-
(i) Blank cells. A number of blank cells were found in various columns like color, actor name, user reviews, gross, plot, etc. These were selected with the shortcut key F5 - Go To Special - Blanks; after highlighting the blanks, Ctrl(-) was used to delete all blank rows. Left in place, the blanks would lead to errors in later analysis and pivot table creation.
(ii) Dropping Columns. The columns and data not required for analysis were dropped to reduce the data load during creation of pivot tables, thereby shortening the long pivot table field list.
(iii) Splitting column data. The data in the column named "Genres" had multiple values separated by a special character. Split Text to Columns with Multiple Delimiters, using the TEXTSPLIT Excel formula, was carried out for ease of analysis.
(f) Any assumptions made during the project.
No assumptions were made during the process of analysis.
2. Approach: The analytical methods adopted for carrying out each of the analysis tasks given in the project are discussed in the succeeding paras :-
TASK ANALYSIS
Insight Required: Data cleaning is one of the most important steps to perform before moving forward with the analysis. Use your knowledge learned till now to do this. (Dropping columns, removing null values, etc.)
Task A : Carry out data cleaning as per the learned tools in Excel.
ANALYSIS OF TASK A: After studying the data, the following process and tools in Excel were used to clean the data :-
(i) Blank cells. A number of blank cells were found in various columns like color, actor name, user reviews, gross, plot, etc. These were selected with the shortcut key F5 - Go To Special - Blanks; after highlighting the blanks, Ctrl(-) was used to delete all blank rows. Left in place, the blanks would lead to errors in later analysis and pivot table creation.
(ii) Dropping Columns. The columns and data not required for analysis were dropped to reduce the data load during creation of pivot tables, thereby shortening the long pivot table field list.
(iii) Splitting column data. The data in the column named "Genres" had multiple values separated by a special character. Split Text to Columns with Multiple Delimiters, using the TEXTSPLIT Excel formula, was carried out for ease of analysis.
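The three cleaning steps above (remove rows with blanks, drop unneeded columns, split the Genres column) were carried out in Excel. A rough Python equivalent is sketched below. The column names, the sample rows and the "|" delimiter are all assumptions; the source only says the genres were separated by a special character.

```python
import csv
import io

# Hypothetical sample in CSV form; column names are assumptions.
RAW = """movie_title,genres,gross,budget,color
Movie A,Action|Adventure,100,40,Color
Movie B,Drama,,30,Color
Movie C,Comedy|Romance,80,20,Color
"""

KEEP = ["movie_title", "genres", "gross", "budget"]


def clean(text, keep=KEEP, delimiter="|"):
    """Apply the three cleaning steps in the order described in the text."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        # (i) Blank cells: skip any row that has an empty value.
        if any(value.strip() == "" for value in row.values()):
            continue
        # (ii) Dropping columns: keep only the columns needed for analysis.
        row = {name: row[name].strip() for name in keep}
        # (iii) Splitting column data: one list entry per genre.
        row["genres"] = row["genres"].split(delimiter)
        cleaned.append(row)
    return cleaned


rows = clean(RAW)
```

Movie B is removed because its gross value is blank, and the color column no longer appears in the cleaned rows.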
Insight Required: The task requires analysing the movies with the highest profit.
Task B : Create a new column called profit which contains the difference of the two columns gross and budget. Sort the data using the profit column as reference. Plot profit (y-axis) vs budget (x-axis) and observe the outliers using the appropriate chart type.
ANALYSIS OF TASK B: The profit analysis was worked out as under :-
Pivot Table - The analysis was carried out by creating a pivot table, with BUDGET and GROSS taken from the field list. The PROFIT column was computed by generating a new measure (difference of gross and budget) in the value field settings. The same was plotted on the X and Y axes as tasked.
Outliers - A few values differ significantly from the rest of the profit values generated. They are as under :-
-12213298588
-4199788333
-2499804112
-2397701809
-2127109510
Insight - The analysis clearly indicated the movies with the highest profit. The same was shown by a scatter plot indicating the outliers clearly. To work out the top profit generating movies, the top 10 movies were sorted by highest profit and shown in tabular form and as a bar chart.
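The profit column and the top-N sort were built with a pivot table in the project. The same calculation can be sketched in Python as below; the three movies and their figures are invented for the example.

```python
# Hypothetical rows: titles and amounts are made up.
movies = [
    {"title": "A", "gross": 500, "budget": 200},
    {"title": "B", "gross": 100, "budget": 900},
    {"title": "C", "gross": 750, "budget": 250},
]


def with_profit(rows):
    """Add a profit field defined as gross minus budget."""
    return [{**r, "profit": r["gross"] - r["budget"]} for r in rows]


def top_by_profit(rows, n=10):
    """Sort by profit, highest first, and keep the first n rows."""
    ranked = sorted(with_profit(rows), key=lambda r: r["profit"], reverse=True)
    return ranked[:n]


top_two = top_by_profit(movies, n=2)
```

Rows with a large negative profit, such as movie B here, are the kind of points that stand out as outliers on a profit vs budget scatter plot.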
Insight Required: The task requires finding the top 250 IMDB movies based on the IMDB ratings and segregating English language and non-English language movies.
Task C: Find the top 250 movies based on the IMDB ratings and also segregate English language and non-English language movies.
ANALYSIS OF TASK C: The top IMDB 250 and foreign movies based on IMDB rating were worked out as under :-
IMDB TOP 250 - The analysis was carried out by creating a pivot table. Movie title, language and imdb score data were used to tabulate the results. Ranking was carried out using advanced functions such as =SEQUENCE(COUNTA(A6:A255),1,1,1).
TOP NON ENGLISH MOVIES - Using the FILTER function, non-English movies were filtered out and stacked as per the IMDB ratings.
Insight - The analysis clearly indicated the top 250 movies across all languages based on IMDB rankings. Foreign movies in languages other than English were also listed separately.
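The ranking and the non-English filter were done with a pivot table, SEQUENCE and FILTER in Excel. A minimal Python sketch of the same idea follows; the field names and the four sample movies are assumptions.

```python
# Hypothetical rows: titles, languages and scores are made up.
movies = [
    {"title": "A", "language": "English", "imdb_score": 8.1},
    {"title": "B", "language": "Hindi", "imdb_score": 8.7},
    {"title": "C", "language": "English", "imdb_score": 9.0},
    {"title": "D", "language": "French", "imdb_score": 7.5},
]


def rank_by_score(rows, n=250):
    """Sort by score, highest first, keep n rows and number them from 1."""
    ordered = sorted(rows, key=lambda r: r["imdb_score"], reverse=True)[:n]
    return [{"rank": i, **r} for i, r in enumerate(ordered, start=1)]


def non_english(rows):
    """Keep only rows whose language is not English."""
    return [r for r in rows if r["language"] != "English"]


top = rank_by_score(movies, n=3)
foreign_top = rank_by_score(non_english(movies))
```

The rank number plays the role of the SEQUENCE column, and the non-English list is ranked separately after filtering.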
Insight Required: Find out the top 10 directors for whom the mean of imdb_score is the highest and store them in a new column top10director. In case of a tie in IMDb score between two directors, sort them alphabetically.
Task D: Find out the top 10 directors for whom the mean of imdb_score is the highest and store them in a new column top10director.
[Bar chart: IMDB RANK TOP 10 DIRECTORS. Directors shown: Charles Chaplin, Tony Kaye, Alfred Hitchcock, Damien Chazelle, Majid Majidi, Ron Fricke, Sergio Leone, Christopher Nolan, Asghar Farhadi, Marius A. Markevicius.]
ANALYSIS OF TASK D: The top 10 directors for whom the mean of imdb_score is the highest were worked out and stored in a new column top10director :-
TOP 10 DIRECTORS - The analysis was carried out by performing the following steps :-
1. Work out the top 10 directors with the data fields director, imdb score and movie title in a pivot table.
2. Change the value field setting of IMDB_SCORE to Average to work out the mean. Also take the count of movie title.
3. Use a filter to sort by highest average imdb score and use the index function for ranking.
4. Show the table data in a bar chart.
Insight - With the derived data, the top 10 directors on the basis of mean imdb score were depicted. Functions like pivot table, filter, sorting and indexing were used.
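The mean-score ranking with an alphabetical tie-break can be sketched in Python as follows. This is only an illustration of the rule stated in the task, not the pivot-table procedure used in the project; the director labels and scores are placeholders.

```python
from collections import defaultdict
from statistics import mean

# Placeholder rows: director labels and scores are made up.
rows = [
    {"director": "Dir B", "imdb_score": 8.0},
    {"director": "Dir A", "imdb_score": 8.0},
    {"director": "Dir C", "imdb_score": 9.0},
    {"director": "Dir C", "imdb_score": 7.0},
    {"director": "Dir D", "imdb_score": 9.0},
]


def top_directors(rows, n=10):
    """Rank directors by mean score; ties are broken alphabetically."""
    scores = defaultdict(list)
    for r in rows:
        scores[r["director"]].append(r["imdb_score"])
    means = {name: mean(values) for name, values in scores.items()}
    # Negative mean sorts highest first; the name decides ties.
    return sorted(means.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


top3 = top_directors(rows, n=3)
```

Dir A, Dir B and Dir C all have a mean of 8.0, so they appear in alphabetical order after Dir D, and the cut at n drops Dir C.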
Insight Required: Find popular genres based on the previously gained experience of data analysis.
Task E: Find popular genres of movies based on the count of genres.
ANALYSIS OF TASK E: Popular genres of movies based on the count of genres were worked out as under :-
POPULAR GENRES - The analysis was carried out by performing the following steps :-
1. Work out the POPULAR GENRES with the data field GENRES, and COUNT OF GENRES in the value field of the pivot table.
2. Change the value field setting of genre to COUNT.
3. Use a filter to sort by highest COUNT OF GENRES.
4. Show the table data in a bar chart.
Insight - With the derived data, POPULAR GENRES on the basis of HIGHEST COUNT OF GENRES were depicted. Functions like pivot table, filter, sorting and indexing were used.
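Counting genres after splitting the combined genre strings can be sketched as below. The "|" delimiter and the sample strings are assumptions, since the source does not state which separator the Genres column uses.

```python
from collections import Counter


def genre_counts(genre_strings, delimiter="|"):
    """Split each genre string and count how often each genre occurs."""
    counts = Counter()
    for text in genre_strings:
        counts.update(g.strip() for g in text.split(delimiter) if g.strip())
    # most_common() returns (genre, count) pairs, highest count first.
    return counts.most_common()


# Hypothetical values for the genres column.
sample = ["Action|Drama", "Drama", "Comedy|Drama", "Action"]
popular = genre_counts(sample)
```

The resulting list is already sorted by count, which corresponds to step 3 of the pivot-table procedure.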
Insight Required: Extraction of movie titles based on the actor names 'Meryl Streep', 'Leonardo DiCaprio', and 'Brad Pitt'.
Task F(i) : Create three new columns, namely Meryl_Streep, Leo_Caprio, and Brad_Pitt, and extract movie titles based on the actor names 'Meryl Streep', 'Leonardo DiCaprio', and 'Brad Pitt'. Append the data in rows to columns and name it as combined.
ANALYSIS OF TASK F(i) : The extraction of movie titles by actor was worked out as under :-
ACTOR-SPECIFIC MOVIE EXTRACTION - The analysis was carried out by performing the following steps :-
1. Three new columns were created as tasked.
2. The formula =IFERROR(INDEX($B$5:$B$3725,SMALL(IF($A$5:$A$3725=$H$4,ROW($B$5:$B$3725)-MIN(ROW($B$5:$B$3725))+1),ROWS($H$5:H5))),"") was used to extract data from the Actor_1 column for all three actors.
3. Append these three columns into one column, with all rows of data in one column, using the array function =CHOOSE({1,2,3}, H5:H15, I5:I25, J5:J21). Name it as COMBINED.
Insight - With the derived data, the use of ARRAY FUNCTIONS in Excel was practised.
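The INDEX/SMALL/IF formula above returns, one row at a time, every title whose Actor_1 value matches the name in the header cell. A minimal Python sketch of the same lookup and of the combined column follows; the field names and the placeholder titles are assumptions.

```python
# Placeholder rows: titles are invented, actor names come from the task.
rows = [
    {"actor_1_name": "Meryl Streep", "movie_title": "Title 1"},
    {"actor_1_name": "Brad Pitt", "movie_title": "Title 2"},
    {"actor_1_name": "Leonardo DiCaprio", "movie_title": "Title 3"},
    {"actor_1_name": "Meryl Streep", "movie_title": "Title 4"},
    {"actor_1_name": "Someone Else", "movie_title": "Title 5"},
]

ACTORS = ["Meryl Streep", "Leonardo DiCaprio", "Brad Pitt"]


def titles_for(rows, actor):
    """All movie titles whose first-billed actor matches the given name."""
    return [r["movie_title"] for r in rows if r["actor_1_name"] == actor]


def combined(rows, actors=ACTORS):
    """Stack the per-actor title lists into a single list."""
    out = []
    for actor in actors:
        out.extend(titles_for(rows, actor))
    return out


per_actor = {actor: titles_for(rows, actor) for actor in ACTORS}
all_titles = combined(rows)
```

The per_actor dictionary corresponds to the three new columns, and all_titles to the COMBINED column.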
Insight Required: Find the critic-favourite and audience-favourite actors, and the number of voted users by decade.
Task F(ii) & (iii) : Find the critic-favourite and audience-favourite actors, and the number of voted users by decade.
ANALYSIS OF TASK F(ii) & (iii): The analysis was worked out as under :-
Critic Fav & Num Fav Actors - The analysis was carried out by performing the following steps :-
1. Work out the mean of Critic Fav & Num Fav by pivot table, with the average in the value field.
2. Change the value field setting of Critic Fav & Num Fav to AVERAGE.
3. Use a filter to sort the top three actors on the basis of the mean data.
4. Show the table data in a bar chart.
Num Users by Decade - The analysis was carried out by performing the following steps :-
1. Work out title_year in a column by pivot table, with Num_voted_user in the value field as sum.
2. Change the value field setting of Num_voted_user to sum.
3. Use VLOOKUP with a custom table showing the range for each decade as given in the task.
4. Show the table data in a bar chart.
Insight - With the derived data for Critic Fav & Num Fav Actors and Num Users by Decade, the pivot table, advanced sorting and VLOOKUP functions were used for the desired results.
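The decade grouping was done in the project with VLOOKUP against a custom range table. The sketch below swaps in integer division to derive the decade instead, which gives the same buckets when each decade runs from a year ending in 0 to a year ending in 9. Field names and sample values are assumptions.

```python
from collections import defaultdict

# Hypothetical rows: years and vote counts are made up.
rows = [
    {"title_year": 1994, "num_voted_users": 100},
    {"title_year": 1999, "num_voted_users": 50},
    {"title_year": 2008, "num_voted_users": 30},
]


def votes_by_decade(rows):
    """Sum the number of voted users per decade of release."""
    totals = defaultdict(int)
    for r in rows:
        decade = (r["title_year"] // 10) * 10   # 1994 -> 1990
        totals[decade] += r["num_voted_users"]
    return dict(sorted(totals.items()))


by_decade = votes_by_decade(rows)
```

If the task defines decades with different boundaries, a lookup table of ranges, as in the VLOOKUP approach, would be needed instead of the division.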
Building the …
An interactive dashboard was created in Excel to d… screenshot is shown as under :-