Introduction
Google BigQuery is a secure, accessible, fully managed, pay-as-you-go, serverless, multi-cloud data warehouse Platform as a Service (PaaS) offered by Google Cloud Platform. It helps generate useful insights from big data that can assist business stakeholders in effective decision-making. Google BigQuery provides built-in machine learning capabilities and a SQL query engine for writing SQL queries to analyze big datasets. We can develop a secure and highly accessible data warehouse using Google BigQuery.
Udemy is one of the most popular online learning platforms. Udemy provides high-quality learning content in design, marketing, development, finance & accounting, IT & software, photography & video, health & fitness, office productivity, etc., in different languages. Udemy is an important source of knowledge for many students, freelancers, and working professionals. Udemy is one of the best platforms to learn Python and React and to prepare for AWS and Azure certifications. However, learners might be interested in taking courses from instructors more aligned to their job titles, courses taken by many users, and courses by certified developers such as AWS certified, Salesforce certified, and so on. To address this problem, we will build a data warehouse for exploring Udemy course trends and insights using Google BigQuery.
![](https://cdn.analyticsvidhya.com/wp-content/uploads/2023/03/Screenshot-2023-03-31-at-8.10.51-PM.png)
Almost all major cloud service providers, such as Google, Amazon, and Microsoft, now provide data warehouse tools. Cloud-based data warehouse tools are highly scalable and offer disaster recovery. Using a data warehouse, we can store and analyze a large amount of data and produce useful insights with the help of data visualizations and reports. Well-designed data warehouses deliver high-quality data, improve query performance by properly defining data types, support data mining, artificial intelligence, etc., and help in making smarter decisions.
This article discusses the process of building a data warehouse for exploring Udemy course trends and insights using Google BigQuery, which will help us with tasks such as classifying courses based on instructor job titles, finding the average rating of all the courses of an instructor, etc.
Learning Objectives
In this article, we will learn:
- How to build a data warehouse using Google BigQuery
- How to use the Google BigQuery Sandbox
- How to create datasets and tables in BigQuery
- How to query Udemy data in the BigQuery SQL query engine
This article was published as a part of the Data Science Blogathon.
Project Description
We will create tables inside a dataset in the BigQuery SQL query engine from the downloaded data. After creating the tables, we will format the table schema and perform data cleaning. We can then query the imported data to generate useful insights such as classifying courses based on instructor job titles, identifying courses with the highest ratings, instructors whose courses have good ratings, etc.
Currently, we have data from only one source, and we are importing CSV-format data through batch ingestion using the Google Cloud Platform UI. We can also import data from multiple sources such as Cloud Storage, Azure Storage Account, etc. Apart from importing data through the Google Cloud Platform UI, users can also import data using the CLI and REST APIs, or using data pipeline options such as Cloud Dataflow, Cloud Dataproc, etc. Google BigQuery also supports file formats such as Parquet, Avro, etc., for data loading and processing. Developers can also save, share, and schedule queries in the SQL query engine.
![](https://av-eks-blogoptimized.s3.amazonaws.com/Source_Udemy-thumbnail_webp-600x300.png)
By querying Udemy data, users can decide which courses they should purchase based on course duration, course ratings, instructor job titles, course popularity, etc. Users can save and share these queries. Users can also save the results of these queries to create dashboards using Power BI, Looker Studio, Tableau, etc. Users can also extract more data from Udemy using web scraping techniques and ingest it into the Google BigQuery SQL query engine to keep the data updated so that users can get more accurate results.
Problem Statement
In this article, we will use the Udemy Courses Data 2023 dataset from Kaggle to develop a data warehouse for exploring Udemy course trends and insights using Google BigQuery. This will help us with tasks such as classifying courses based on instructor job titles, finding the average rating of all the courses of an instructor, classifying courses based on the number of lectures in the course, identifying recently published and modified courses on Udemy, etc.
As already discussed, we can extract more data from Udemy using web scraping techniques, as new courses and instructors keep being added to the Udemy platform. We will create tables inside the dataset in the BigQuery SQL query engine to import the courses and instructor data downloaded from Kaggle. After table creation, we will perform data cleaning and table schema formatting.
![Problem Statement](https://av-eks-blogoptimized.s3.amazonaws.com/Source_Google_Cloud-thumbnail_webp-600x300.png)
We can save, share, and schedule queries in the SQL query engine. Apart from this, we can also save the results of query execution so that they can be used to create dashboards using Power BI, Looker Studio, Tableau, etc. This project aims to develop a data warehouse using Udemy data. By querying it, users can identify recently published and modified courses on Udemy, classify courses based on course duration and course ratings, identify the average rating of all the courses of an instructor, classify courses based on the number of lectures in the course, etc.
Prerequisites
Below are some prerequisites to undertake this project:
- Understanding of Data Warehouses: In this project, we will build a data warehouse to explore Udemy course trends and insights using Google BigQuery. Therefore, understanding what a data warehouse is, why it is useful, and which data warehouse services various cloud vendors provide is important.
- Experience with Google Cloud Platform: We will use Google BigQuery, a data warehouse service available inside the Google Cloud Platform. So, experience with the Google Cloud Platform is important to easily navigate the platform and understand the resource creation process, roles & access permissions, etc.
- Experience with SQL queries: We will be writing queries in the SQL query engine to generate useful insights, such as classifying courses based on instructor job titles, identifying courses with the highest ratings, instructors whose courses have good ratings, etc.
- Familiarity with Udemy and Kaggle: Understanding what Kaggle is, how it is useful for downloading datasets, and basic familiarity with the online learning platform Udemy will be helpful while developing the project.
- Understanding of Google BigQuery: As this project uses Google BigQuery for creating a data warehouse, it would be useful to have an understanding of Google BigQuery's common data operations, concepts, and techniques.
Knowing about the Dataset
In this article, we will use the Udemy Courses Data 2023 dataset from Kaggle. The dataset can be downloaded by visiting https://www.kaggle.com/datasets/ankushbisht005/udemy-courses-data-2023. The purpose of using this dataset is to identify recently published and modified courses on Udemy, classify courses based on course duration and course ratings, identify the average rating of all the courses of an instructor, classify courses based on the number of lectures in the course, etc.
The Udemy Courses Data 2023 dataset has two files named courses.csv and instructors.csv. courses.csv contains information related to the Udemy courses, and instructors.csv contains information related to the Udemy instructors. courses.csv contains 11 columns and 83,105 rows. instructors.csv contains 10 columns and 32,234 rows. courses.csv contains the instructors_id column, which gives the id of the instructor of the course. The instructors_id column is used to form the relation between courses.csv and instructors.csv.
![Knowing about the dataset](https://av-eks-blogoptimized.s3.amazonaws.com/Source_Kaggle-thumbnail_webp-600x300.png)
courses.csv contains the unique id of the course, the course title, course rating, course duration, the number of lectures in the Udemy course, the URL of the course, the creation date of the course, the date on which the course was last modified, the number of reviews of the course, and the id of the course instructor. instructors.csv contains the unique id of the instructor, the name of the course instructor, the display name of the course instructor, the title of the course instructor, the job title of the course instructor, the instructor class, the URL of the instructor, the initials of the course instructor, a 50 x 50 image of the instructor, and a 100 x 100 image of the instructor. To learn more about the dataset, visit https://www.kaggle.com/datasets/ankushbisht005/udemy-courses-data-2023.
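The relation between the two files can be sketched as a join on instructors_id. The snippet below is a minimal illustration that uses Python's built-in sqlite3 module rather than BigQuery. The rows are made up, and only the id, title, display_name, and instructors_id columns follow the dataset description above.

```python
import sqlite3

# In-memory database with two tiny, made-up tables mirroring the CSV files.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE instructors (id INTEGER PRIMARY KEY, display_name TEXT);
CREATE TABLE courses (id INTEGER PRIMARY KEY, title TEXT, instructors_id INTEGER);
INSERT INTO instructors VALUES (1, 'Instructor A'), (2, 'Instructor B');
INSERT INTO courses VALUES
    (10, 'Course X', 1),
    (11, 'Course Y', 1),
    (12, 'Course Z', 2);
""")

# courses.instructors_id points at instructors.id, so a join pairs each
# course with the instructor who teaches it.
rows = conn.execute("""
    SELECT c.title, i.display_name
    FROM courses AS c
    JOIN instructors AS i ON i.id = c.instructors_id
    ORDER BY c.id
""").fetchall()

for title, instructor in rows:
    print(title, "->", instructor)
```

One instructor can own several courses, so the relation is one-to-many from instructors to courses.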
Approach to the Project
In this project, we will use the Udemy Courses Data 2023 dataset from Kaggle to develop a data warehouse for exploring Udemy course trends and insights using Google BigQuery. This will help us with tasks such as classifying courses based on instructor job titles, finding the average rating of all the courses of an instructor, classifying courses based on the number of lectures in the course, identifying recently published and modified courses on Udemy, etc.
Follow the steps below to create a data warehouse using the Udemy Courses Data 2023 dataset from Kaggle:
Step 1: Create a New Project using BigQuery Sandbox
To work with Google BigQuery, developers can either create an account on the Google Cloud Platform or use the Google BigQuery Sandbox. I will use the Google BigQuery Sandbox in this article to create a data warehouse. A project is used for organizing all the Google Cloud resources in GCP. Using Identity and Access Management, we can specify which user is allowed to access which resources in a project.
Visit the link below to use the Google BigQuery Sandbox: https://console.cloud.google.com/bigquery
Now, follow the steps described below:
1. Click on NEW PROJECT, then provide the Project Name as Udemy-Project and the Location on the next screen. Click CREATE.
![Step 1: Create a new Project using Big Query Sandbox](https://av-eks-blogoptimized.s3.amazonaws.com/i1_4UH0cLU-thumbnail_webp-600x300.png)
![](https://av-eks-blogoptimized.s3.amazonaws.com/i2_vAU0ZYN-thumbnail_webp-600x300.png)
2. Udemy-Project is successfully created. Select Udemy-Project to view the project and manage user permissions and resources inside the project.
![Google Big Query | trends](https://av-eks-blogoptimized.s3.amazonaws.com/i3_vdB1k4W-thumbnail_webp-600x300.png)
Step 2: Download the Dataset from Kaggle and Save it on the Local Machine
Go to https://www.kaggle.com/datasets/ankushbisht005/udemy-courses-data-2023 and click Download. After unzipping the downloaded zip file, you will find two CSV files named courses.csv and instructors.csv. courses.csv contains information related to the Udemy courses, and instructors.csv contains information related to the Udemy instructors. courses.csv contains 11 columns and 83,105 rows. instructors.csv contains 10 columns and 32,234 rows. The instructors_id column is used to form the relation between courses.csv and instructors.csv.
![Google Big Query](https://av-eks-blogoptimized.s3.amazonaws.com/Source_Kaggle_YPkFt08-thumbnail_webp-600x300.png)
Step 3: Creating a Dataset Inside the Google BigQuery Resource
Follow the steps described below to create a dataset inside Google BigQuery:
1. Select the name of the Project -> BigQuery in the resources card -> Click Create dataset.
![Google Big Query](https://av-eks-blogoptimized.s3.amazonaws.com/i5_I19c0o0-thumbnail_webp-600x300.png)
2. Provide Udemy_dataset as the Dataset ID, choose Region as the Location type, choose asia-south1 (Mumbai) as the Region, and enable table expiration.
![](https://av-eks-blogoptimized.s3.amazonaws.com/i6_iJhuO22-thumbnail_webp-600x300.png)
3. Click CREATE DATASET.
![CREATE DATASET](https://av-eks-blogoptimized.s3.amazonaws.com/i7_Zv8UBSN-thumbnail_webp-600x300.png)
Step 4: Create Tables in the Dataset Inside the Google BigQuery Resource
Follow the steps described below to create tables in the dataset inside Google BigQuery:
1. Select the Udemy_dataset dataset -> Create table.
![trends](https://av-eks-blogoptimized.s3.amazonaws.com/i8_C14rBkJ-thumbnail_webp-600x300.png)
2. Choose to create the table from Upload, select the courses.csv file downloaded from Kaggle, select CSV as the file format, provide courses as the table name, choose Native table as the table type, choose Auto detect for the schema, and set the partition and cluster settings as per our requirements. In the Advanced options, provide 1 as the number of header rows to skip and choose the Encryption as per the requirement. Click CREATE TABLE.
![Google big Query](https://av-eks-blogoptimized.s3.amazonaws.com/i9_Smyo9ig-thumbnail_webp-600x300.png)
3. Now, again select the Udemy_dataset dataset -> Create table. Choose to create the table from Upload, select the instructors.csv file downloaded from Kaggle, select CSV as the file format, provide instructors as the table name, choose Native table as the table type, choose Auto detect for the schema, and set the partition and cluster settings as per our requirements. In the Advanced options, provide 1 as the number of header rows to skip and choose the Encryption as per the requirement. Click CREATE TABLE.
![](https://av-eks-blogoptimized.s3.amazonaws.com/i10_eFCSeFG-thumbnail_webp-600x300.png)
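The same upload choices can also be expressed on the command line. The sketch below only builds the argument list for the bq tool's load command and prints it; it does not run anything. The flags --source_format, --skip_leading_rows, and --autodetect correspond to the CSV format, header rows to skip, and schema auto-detection chosen in the UI. The helper name build_bq_load_command is hypothetical, and the file paths assume the CSV files sit in the current directory.

```python
import shlex

def build_bq_load_command(dataset, table, csv_path):
    """Build (but do not run) a `bq load` command for a local CSV file."""
    return [
        "bq", "load",
        "--source_format=CSV",      # file format: CSV
        "--skip_leading_rows=1",    # header rows to skip: 1
        "--autodetect",             # schema: auto detect
        f"{dataset}.{table}",       # destination table
        csv_path,                   # local file to upload (assumed path)
    ]

for name in ("courses", "instructors"):
    cmd = build_bq_load_command("Udemy_dataset", name, f"{name}.csv")
    print(shlex.join(cmd))
```

Printing the command instead of executing it keeps the sketch safe to run without Google Cloud credentials.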
Step 5: Verifying Table Schemas and Previewing Data
Go to the courses table, and cross-verify the field name, type, and mode in the SCHEMA tab. View the row access policies of the courses table and edit the table schema, if required. View the table information in the DETAILS tab and edit the details in case of corrections. We can also preview, copy, refresh, and share the data. Similarly, go to the instructors table, and cross-verify the field name, type, and mode in the SCHEMA tab. View the row access policies of the instructors table and edit the table schema if required.
![](https://av-eks-blogoptimized.s3.amazonaws.com/i11_BQqVogu-thumbnail_webp-600x300.png)
![](https://av-eks-blogoptimized.s3.amazonaws.com/i12_nMYEj83-thumbnail_webp-600x300.png)
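A complementary check can be done locally before uploading, by comparing the CSV header against the columns the queries need. The sketch below assumes the course column names id, title, rating, num_reviews, num_published_lectures, created, last_update_date, duration, and instructors_id. The real file has more columns, and the sample text is made up, with duration left out on purpose.

```python
import csv
import io

# Columns assumed to be needed by the course queries (the real file has more).
EXPECTED_COURSE_COLUMNS = {
    "id", "title", "rating", "num_reviews", "num_published_lectures",
    "created", "last_update_date", "duration", "instructors_id",
}

def missing_columns(csv_file, expected):
    """Return the expected column names that are absent from the CSV header."""
    header = next(csv.reader(csv_file))
    return sorted(expected - set(header))

# Made-up sample standing in for courses.csv; "duration" is missing on purpose.
sample = io.StringIO(
    "id,title,rating,num_reviews,num_published_lectures,"
    "created,last_update_date,instructors_id\n"
    "10,Course X,4.6,12000,80,2020-01-01,2023-01-01,1\n"
)
print(missing_columns(sample, EXPECTED_COURSE_COLUMNS))
```

With a real download, the same function could be called on open("courses.csv", newline="", encoding="utf-8").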
Step 6: Exploring Udemy Course Trends and Insights by Querying the Data
To see 5000 records from the courses table, execute the query below in the SQL query engine:
SELECT * FROM `udemy-project-381211.Udemy_dataset.courses` LIMIT 5000
![trends](https://av-eks-blogoptimized.s3.amazonaws.com/i1_JxOczg3-thumbnail_webp-600x300.png)
To see 5000 records from the instructors table, execute the query below in the SQL query engine:
SELECT * FROM `udemy-project-381211.Udemy_dataset.instructors` LIMIT 5000
![trends](https://av-eks-blogoptimized.s3.amazonaws.com/i2_Rf2jmeu-thumbnail_webp-600x300.png)
A. Find the title of all courses whose ratings are greater than 4.5 and for which more than 10000 people have given a rating. Display these courses in decreasing order of course rating and creation date.
SELECT title AS course_title FROM `udemy-project-381211.Udemy_dataset.courses`
WHERE rating>4.5 AND num_reviews>10000
ORDER BY rating DESC, created DESC
![](https://av-eks-blogoptimized.s3.amazonaws.com/i3_B5Ih5KB-thumbnail_webp-600x300.png)
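The filter-then-sort logic of query A can be tried on a few made-up rows without a BigQuery project. This minimal sketch uses Python's built-in sqlite3 module, so the engine is SQLite rather than BigQuery. The table name and columns mirror the query, and all data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE courses (title TEXT, rating REAL, num_reviews INTEGER, created TEXT);
INSERT INTO courses VALUES
    ('Course A', 4.7, 15000, '2021-05-01'),
    ('Course B', 4.7, 20000, '2022-03-01'),
    ('Course C', 4.9,  9000, '2023-01-01'),  -- too few reviews
    ('Course D', 4.4, 50000, '2020-01-01'),  -- rating too low
    ('Course E', 4.8, 11000, '2019-07-01');
""")

# Keep highly rated, heavily reviewed courses; sort by rating, then by
# creation date, both descending.
titles = [row[0] for row in conn.execute("""
    SELECT title AS course_title
    FROM courses
    WHERE rating > 4.5 AND num_reviews > 10000
    ORDER BY rating DESC, created DESC
""")]
print(titles)
```

Courses A and B share a rating, so the second sort key decides their order: the more recently created Course B comes first.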
B. Find the details of the 10 most recently created Udemy courses.
SELECT * FROM `udemy-project-381211.Udemy_dataset.courses`
ORDER BY created DESC
LIMIT 10
![Google big Query](https://av-eks-blogoptimized.s3.amazonaws.com/i4_jOMoLK3-thumbnail_webp-600x300.png)
C. Find the details of the 10 most recently modified Udemy courses.
SELECT * FROM `udemy-project-381211.Udemy_dataset.courses`
ORDER BY last_update_date DESC
LIMIT 10
![](https://av-eks-blogoptimized.s3.amazonaws.com/ic_dTL5ydt-thumbnail_webp-600x300.png)
D. Find the details of the JavaScript courses whose ratings are greater than 4 and for which more than 20000 people have given a rating.
SELECT * FROM `udemy-project-381211.Udemy_dataset.courses`
WHERE title LIKE '%JavaScript%' AND
rating>4 AND num_reviews>20000
![](https://av-eks-blogoptimized.s3.amazonaws.com/id_o3eL1DM-thumbnail_webp-600x300.png)
E. Display the title, rating, and number of lectures of the Udemy React courses that have more than 50 course lectures.
SELECT title AS course_title, rating AS course_rating, num_published_lectures AS course_lectures
FROM `udemy-project-381211.Udemy_dataset.courses`
WHERE title LIKE '%React%' AND
num_published_lectures>50
![](https://av-eks-blogoptimized.s3.amazonaws.com/ie_44gDxj9-thumbnail_webp-600x300.png)
F. Find the number of courses and the instructor name for instructors who have at least one course rated above the average rating of all courses.
SELECT COUNT(courses.id), instructors.name
FROM `Udemy_dataset.instructors` instructors
LEFT JOIN `Udemy_dataset.courses` courses
ON instructors.id = courses.instructors_id
WHERE courses.instructors_id IN
(SELECT instructors_id FROM `Udemy_dataset.courses`
WHERE rating >(SELECT AVG(rating) FROM `Udemy_dataset.courses`))
GROUP BY instructors.name
![](https://av-eks-blogoptimized.s3.amazonaws.com/if_qBdTiFq-thumbnail_webp-600x300.png)
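What query F actually counts is easy to misread, so here is a minimal sketch of the same shape of query on made-up rows, run with Python's built-in sqlite3 module rather than BigQuery. The column names id, name, rating, and instructors_id are assumed to match the dataset. An instructor qualifies by having at least one above-average course, and then all of that instructor's courses are counted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE instructors (id INTEGER, name TEXT);
CREATE TABLE courses (id INTEGER, rating REAL, instructors_id INTEGER);
INSERT INTO instructors VALUES (1, 'Instructor A'), (2, 'Instructor B');
-- The average rating over all four courses is 3.725.
INSERT INTO courses VALUES
    (10, 4.8, 1),   -- above average
    (11, 3.0, 1),   -- below average, but still counted for Instructor A
    (12, 3.5, 2),
    (13, 3.6, 2);
""")

result = conn.execute("""
    SELECT COUNT(c.id), i.name
    FROM instructors AS i
    LEFT JOIN courses AS c ON i.id = c.instructors_id
    WHERE c.instructors_id IN (
        SELECT instructors_id FROM courses
        WHERE rating > (SELECT AVG(rating) FROM courses)
    )
    GROUP BY i.name
""").fetchall()
print(result)
```

Instructor B is dropped because none of that instructor's courses is above the overall average. Grouping by name would also merge two different instructors who share a name, so grouping by instructor id may be safer on real data.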
G. Display the instructor name and title of the Udemy courses created by people whose job title is web developer and whose course ratings are greater than 4.2.
SELECT instructors.display_name, courses.title AS course_title
FROM `Udemy_dataset.instructors` instructors
LEFT JOIN `Udemy_dataset.courses` courses
ON instructors.id = courses.instructors_id
WHERE instructors.job_title LIKE '%Web developer%' AND courses.rating>4.2
![trends](https://av-eks-blogoptimized.s3.amazonaws.com/ig_Py2AhKR-thumbnail_webp-600x300.png)
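Several queries in this step combine a LEFT JOIN with a WHERE condition on a column of the right-hand table. The minimal sketch below, using Python's built-in sqlite3 module and made-up rows, suggests that such a filter removes the unmatched instructors again, so the result matches an INNER JOIN. The helper name run is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE instructors (id INTEGER, display_name TEXT, job_title TEXT);
CREATE TABLE courses (title TEXT, rating REAL, instructors_id INTEGER);
INSERT INTO instructors VALUES
    (1, 'Instructor A', 'Web developer'),
    (2, 'Instructor B', 'Web developer');   -- has no courses
INSERT INTO courses VALUES ('Course X', 4.5, 1);
""")

def run(join_type, extra_filter=""):
    """Run the query with the given join type and optional extra condition."""
    sql = f"""
        SELECT i.display_name, c.title
        FROM instructors AS i
        {join_type} JOIN courses AS c ON i.id = c.instructors_id
        WHERE i.job_title LIKE '%Web developer%' {extra_filter}
        ORDER BY i.id
    """
    return conn.execute(sql).fetchall()

left_all = run("LEFT")                                  # keeps Instructor B with a NULL title
left_filtered = run("LEFT", "AND c.rating > 4.2")       # NULL rating fails the filter
inner_filtered = run("INNER", "AND c.rating > 4.2")
print(left_all, left_filtered, inner_filtered, sep="\n")
```

One dialect difference to keep in mind: SQLite's LIKE ignores ASCII case by default, while BigQuery's LIKE is case-sensitive, so on BigQuery '%Web developer%' would not match a job title written as 'web developer'.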
H. Display the course title, instructor name, rating, and course duration of the Udemy courses where the course duration is greater than 40 minutes, 40 hours, or 40 questions.
SELECT courses.title AS course_title,
instructors.display_name AS course_instructor, courses.rating, courses.duration
FROM `Udemy_dataset.instructors` instructors
LEFT JOIN `Udemy_dataset.courses` courses
ON instructors.id = courses.instructors_id
WHERE
CASE WHEN courses.duration LIKE '%.%'
THEN CAST(LEFT(courses.duration, STRPOS(courses.duration,'.')-1) AS FLOAT64)>40
WHEN courses.duration LIKE '%total%'
THEN CAST(LEFT(courses.duration, STRPOS(courses.duration,'t')-1) AS FLOAT64)>40
WHEN courses.duration LIKE '%ques%'
THEN CAST(LEFT(courses.duration, STRPOS(courses.duration,'q')-1) AS FLOAT64)>40
END
![](https://av-eks-blogoptimized.s3.amazonaws.com/ih_GsMnwZR-thumbnail_webp-600x300.png)
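The CASE expression in query H extracts the number at the start of a free-text duration. The minimal Python sketch below shows the same idea with a regular expression. The sample strings such as "52.5 total hours" and "120 questions" are assumptions about how durations look, inferred from the LIKE patterns in the query, and the function names are hypothetical.

```python
import re

# A leading integer or decimal number, ignoring any leading whitespace.
_LEADING_NUMBER = re.compile(r"^\s*(\d+(?:\.\d+)?)")

def duration_value(text):
    """Return the leading numeric value of a duration string, or None."""
    match = _LEADING_NUMBER.match(text or "")
    return float(match.group(1)) if match else None

def exceeds(text, threshold=40):
    """True when the duration's leading number is greater than the threshold."""
    value = duration_value(text)
    return value is not None and value > threshold

for sample in ("52.5 total hours", "45 total mins", "120 questions", "1.5 total hours", ""):
    print(repr(sample), duration_value(sample), exceeds(sample))
```

Unlike the first CASE branch, which keeps only the digits before the decimal point, this sketch keeps the fractional part, so a value such as 40.5 would count as greater than 40. Like the query, it compares the number without regard to the unit.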
I. Display the instructor name and title of the Udemy courses created by certified developers.
SELECT courses.title AS course_title, instructors.display_name AS course_instructor
FROM `Udemy_dataset.instructors` instructors
LEFT JOIN `Udemy_dataset.courses` courses
ON instructors.id = courses.instructors_id
WHERE instructors.job_title LIKE '%certified%'
![Google big Query](https://av-eks-blogoptimized.s3.amazonaws.com/ii_nEHAdQC-thumbnail_webp-600x300.png)
J. Discover all of the distinct job titles of Udemy course instructors.
SELECT DISTINCT instructors.job_title
FROM `Udemy_dataset.instructors` instructors
![Google big Query | trends](https://av-eks-blogoptimized.s3.amazonaws.com/ij_JnJxzU1-thumbnail_webp-600x300.png)
K. Find the title, rating, and instructor of all courses whose ratings are greater than 4 and for which more than 17000 people have given a rating. Display these courses in decreasing order of course rating.
SELECT courses.title AS course_title, instructors.display_name AS course_instructor, courses.rating
FROM `Udemy_dataset.instructors` instructors
LEFT JOIN `Udemy_dataset.courses` courses
ON instructors.id = courses.instructors_id
WHERE courses.rating > 4 AND courses.num_reviews > 17000
ORDER BY courses.rating DESC
![Google big Query | trends](https://av-eks-blogoptimized.s3.amazonaws.com/ik_FtDllkr-thumbnail_webp-600x300.png)
L. Find the details of the 20 most recently created Azure Udemy courses.
SELECT * FROM `udemy-project-381211.Udemy_dataset.courses`
WHERE title LIKE '%Azure%'
ORDER BY created DESC
LIMIT 20
![Google big Query](https://av-eks-blogoptimized.s3.amazonaws.com/il_6m5Lzl0-thumbnail_webp-600x300.png)
M. Find the details of the 15 most recently created AWS Udemy courses.
SELECT * FROM `udemy-project-381211.Udemy_dataset.courses`
WHERE title LIKE '%AWS%'
ORDER BY created DESC
LIMIT 15
![Google big Query | trends](https://av-eks-blogoptimized.s3.amazonaws.com/im_Mk94MZj-thumbnail_webp-600x300.png)
N. Display all the details of the Udemy SAS courses that have between 112 and 156 course lectures, in increasing order of course title.
SELECT * FROM `udemy-project-381211.Udemy_dataset.courses`
WHERE title LIKE '%SAS %' AND
num_published_lectures BETWEEN 112 AND 156
ORDER BY title
![Google big Query](https://av-eks-blogoptimized.s3.amazonaws.com/in_kxEE8XY-thumbnail_webp-600x300.png)
O. Display the instructor name, title, rating, and number of reviews of the top two Udemy Azure Data Factory courses based on the number of course reviews and course ratings.
SELECT courses.title AS course_title,
instructors.display_name AS course_instructor, courses.rating, courses.num_reviews
FROM `Udemy_dataset.instructors` instructors
LEFT JOIN `Udemy_dataset.courses` courses
ON instructors.id = courses.instructors_id
WHERE courses.title LIKE '%Azure Data Factory %'
ORDER BY courses.num_reviews DESC, courses.rating DESC
LIMIT 2
![Google big Query](https://av-eks-blogoptimized.s3.amazonaws.com/io_tleVWuw-thumbnail_webp-600x300.png)
P. Display the instructor name, title, rating, and number of reviews of the best Udemy Salesforce course based on the number of course reviews and course ratings.
SELECT courses.title AS course_title, instructors.display_name AS course_instructor,
courses.rating, courses.num_reviews
FROM `Udemy_dataset.instructors` instructors
LEFT JOIN `Udemy_dataset.courses` courses
ON instructors.id = courses.instructors_id
WHERE courses.title LIKE '%Salesforce %'
ORDER BY courses.num_reviews DESC, courses.rating DESC
LIMIT 1
![Google big Query](https://av-eks-blogoptimized.s3.amazonaws.com/ip_rKxlp2h-thumbnail_webp-600x300.png)
Key Trends and Insights Discovered While Exploring the Udemy Courses Data
From the above, we know how to build a data warehouse for exploring Udemy course trends and insights using Google BigQuery. Below are some key trends and insights discovered while exploring the Udemy courses data:
1. The most popular JavaScript courses have an average rating greater than 4.6.
2. Only 34 Udemy courses are created by instructors whose job title is web developer and whose course ratings are greater than 4.2.
3. Almost 150 Udemy courses are created by AWS, Azure, GCP, or Salesforce-certified developers.
4. Ramesh Retnasamy created the most popular Azure Data Factory course on Udemy.
5. Recently created Azure and AWS courses are very popular on Udemy.
6. Udemy users prefer to enroll in SAS courses with about 100-150 lectures and good ratings.
Conclusion
In this article, we have seen how to build a data warehouse for exploring Udemy course trends and insights using Google BigQuery. A data warehouse stores and analyzes a large amount of data and produces useful insights with the help of data visualizations and reports. We have seen how to create a table in Google BigQuery by importing data downloaded from Kaggle. We also understood how to relate tables to understand the data better. We looked at how to analyze the data with the help of queries to get meaningful insights. Below are the major takeaways from the above article:
- We have seen how we can create tables in Google BigQuery.
- We understood how to query data in the BigQuery SQL query engine.
- We have also identified details of the Udemy courses created by people whose job title is web developer and whose course ratings are greater than 4.2.
- We have also seen how many courses on Udemy are created by certified developers.
- We have found the newly created Azure and AWS courses on Udemy on the basis of the trends.
- Apart from that, we have also seen other course trends on Udemy by exploring Udemy data inside the SQL query engine.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author's discretion.