GCP Data Engineer
1. Introduction to GCP, Python Basics, and Linux Fundamentals
· Introduction to Google Cloud Platform (GCP)
  · Overview of GCP services
  · Creating a GCP account and setting up the project
· Python Basics
  · Variables, data types, and operators
  · Control structures (if-else, loops) and functions
  · Working with Python libraries
· Linux Fundamentals
  · Linux distributions and installation
  · Basic Linux commands
  · File system structure and navigation
2. Advanced Python, Linux Concepts, and Introduction to DBMS
· Python
  · Working with data structures: lists, tuples, dictionaries, and sets
  · Exception handling
  · File handling and operations
· Linux
  · File permissions and ownership
  · Text editors: Vim and Nano
  · Basic shell scripting
· Introduction to Database Management Systems (DBMS)
  · Types of DBMS: Relational, NoSQL, and NewSQL
  · Overview of SQL and NoSQL databases
  · Database design and normalization
3. Python Libraries for Data Engineering and GCP Introduction
· NumPy
  · Introduction to NumPy and arrays
  · Array operations and functions
· pandas
  · Introduction to pandas and data manipulation
  · Series and DataFrame objects
  · Handling missing data and data cleaning
· Introduction to GCP Data Engineering Services
  · Cloud Storage
  · BigQuery
4. SQL and NoSQL Databases
· SQL Databases
  · SQL syntax: SELECT, INSERT, UPDATE, DELETE, JOIN, and GROUP BY
  · Indexes and performance optimization
  · Database transactions and ACID properties
· NoSQL Databases
  · Overview of NoSQL database types: Document, Key-Value
5. Data Processing with Dataflow, Apache Beam, and Dataproc
· Introduction to Dataflow and Apache Beam
  · Understanding Dataflow and Apache Beam concepts
  · Building data pipelines with Apache Beam
· Python SDK for Apache Beam
  · Installation and setup
  · Creating simple pipelines
  · Windowing and time-based processing
· Introduction to Dataproc
  · Overview of Hadoop, Spark, and the Hadoop ecosystem
  · Creating and managing Dataproc clusters
  · Running Hadoop and Spark jobs on Dataproc
    · Submitting jobs using the Cloud SDK
    · Monitoring job progress
6. Data Transformation, Integration with DBMS, and Final Project
· Data Transformation with Cloud Data Fusion
  · Introduction to Cloud Data Fusion
    · Overview and use cases
    · Creating and managing instances
  · Building data pipelines with Cloud Data Fusion
    · Pipeline design and development
    · Deploying and monitoring pipelines
· Integration with SQL and NoSQL Databases
  · Connecting to SQL databases (Cloud SQL) using Python
  · Connecting to NoSQL databases (Firestore, MongoDB, Cassandra) using Python
  · Data import and export between GCP services and external databases
· Final Project
  · Project briefing and requirements
  · Project implementation
  · Project presentation and evaluation
7. Data Ingestion, Real-Time Data Processing, and Machine Learning
· Data Ingestion using Cloud Pub/Sub
  · Introduction to Cloud Pub/Sub
  · Creating topics and subscriptions
  · Publishing and consuming messages using Python
· Real-Time Data Processing with Cloud Dataflow and Apache Beam
  · Real-time data processing concepts
    · Streaming vs. batch processing
    · Windowing and watermarks
  · Real-time data processing with Apache Beam
    · Developing streaming pipelines
    · Integrating with Cloud Pub/Sub
· Machine Learning with BigQuery ML and AI Platform (now Vertex AI)