Azure Data Engineer
1. Introduction to Azure, Python Basics, and Linux Fundamentals
· Introduction to Azure
  · Overview of Azure services
  · Creating an Azure account and setting up a project
· Python Basics
  · Variables, data types, and operators
  · Control structures: if-else, loops, and functions
  · Working with Python libraries
· Linux Fundamentals
  · Linux distributions and installation
  · Basic Linux commands
  · File system structure and navigation
2. Advanced Python, Linux Concepts, and Introduction to DBMS
· Python
  · Working with data structures: lists, tuples, dictionaries, and sets
  · Exception handling
  · File handling and operations
· Linux
  · File permissions and ownership
  · Text editors: Vim and Nano
  · Basic shell scripting
· Introduction to Database Management Systems (DBMS)
  · Types of DBMS: Relational, NoSQL, and NewSQL
  · Overview of SQL and NoSQL databases
  · Database design and normalization
3. Python Libraries for Data Engineering and Azure Introduction
· NumPy
  · Introduction to NumPy and arrays
  · Array operations and functions
· pandas
  · Introduction to pandas and data manipulation
  · Series and DataFrame objects
  · Handling missing data and data cleaning
· Introduction to Azure Data Engineering Services
  · Azure Blob Storage
  · Azure Data Lake Storage
  · Azure Synapse Analytics
4. SQL and NoSQL Databases
· SQL Databases
  · SQL syntax: SELECT, INSERT, UPDATE, DELETE, JOIN, and GROUP BY
  · Indexes and performance optimization
  · Database transactions and ACID properties
· NoSQL Databases
  · Overview of NoSQL database types: Document, Key-Value, Column-Family, and Graph
  · Introduction to popular NoSQL databases: MongoDB, Cassandra, Redis, and Neo4j
  · Use cases and trade-offs
5. Data Processing with Azure Data Factory, Databricks, and HDInsight
· Introduction to Azure Data Factory
  · Understanding Azure Data Factory concepts
  · Building data pipelines with Azure Data Factory
· Azure Databricks
  · Introduction to Azure Databricks and Apache Spark
  · Creating and managing Databricks clusters
  · Running Spark jobs on Databricks
· Azure HDInsight
  · Overview of Hadoop, Hive, and the Hadoop ecosystem
  · Creating and managing HDInsight clusters
  · Running Hadoop and Hive jobs on HDInsight
6. Data Ingestion, Real-Time Data Processing, and Machine Learning
· Introduction to Azure Event Hubs
  · Creating Event Hubs namespaces and instances
  · Publishing and consuming messages using Python
· Real-Time Data Processing with Azure Stream Analytics
  · Introduction to Azure Stream Analytics
  · Developing streaming jobs
  · Integrating with Event Hubs and other sources
· Machine Learning with Azure Machine Learning Service
  · Overview of Azure Machine Learning Service
  · Creating and managing workspaces
  · Training and deploying models
  · Serving predictions
Data Transformation, Integration with DBMS, and Final Project
7. Data Transformation with Azure Data Factory and Mapping Data Flows
· Introduction to Mapping Data Flows
· Overview and use cases
· Creating and managing data flows in Azure Data Factory
· Data flow design and development
· Deploying and monitoring data flows
8. Integration with SQL and NoSQL Databases
· Connecting to SQL databases (Azure SQL Database) using Python
· Connecting to NoSQL databases (MongoDB) using Python
· Data import and export between Azure services and external databases
· Final Project
  · Project briefing and requirements
  · Project implementation
  · Project presentation and evaluation