AWS

Data & Analytics

Updated 23/06/2026

  • #aws
  • #analytics

Data & Analytics

Maarek SAA-C03 Slides v45 — Chapter 20. Personal study extract.

Key content

  • Amazon Athena
  • Commonly used with Amazon QuickSight for reporting/dashboards
  • Use cases: query VPC Flow Logs, ELB Logs, CloudTrail trails, etc.
  • Diagram: S3 Bucket (load data) → Amazon Athena (Query & Analyze) → Amazon QuickSight (Reporting & Dashboards)
  • Amazon Athena – Performance Improvement
  • S3 path layout, one partition column per level: /<PARTITION_COLUMN_NAME>=<VALUE>/<PARTITION_COLUMN_NAME>=<VALUE>/<PARTITION_COLUMN_NAME>=<VALUE>/etc…
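A minimal sketch of how such a partitioned prefix could be assembled. The function name, the table prefix, and the year/month/day columns are hypothetical and chosen only for illustration; they are not taken from the slides.

```python
from typing import Mapping


def partition_prefix(table_prefix: str, partitions: Mapping[str, object]) -> str:
    """Build an S3 key prefix with one <column>=<value> segment per partition column."""
    segments = [f"{column}={value}" for column, value in partitions.items()]
    return "/".join([table_prefix.strip("/"), *segments]) + "/"


# Hypothetical table location and partition columns (assumptions for this example).
prefix = partition_prefix("pathToTable", {"year": 1991, "month": 1, "day": 1})
print(prefix)  # pathToTable/year=1991/month=1/day=1/
```

A query that filters on the partition columns only needs to read objects under the matching prefixes, which is where the performance gain comes from.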
  • Amazon Athena – Federated Query
  • Run SQL queries across data stored in relational, non-relational, object, and custom data sources (AWS or on-premises)
  • Uses Data Source Connectors that run on AWS Lambda to run Federated Queries (e.g., CloudWatch Logs, DynamoDB, RDS, …)
  • Diagram: Amazon Athena ↔ Lambda (Data Source Connector) ↔ data sources: ElastiCache, DocumentDB, DynamoDB, Redshift, HBase in EMR, MySQL, Aurora, SQL Server, Database (On-Premises)
  • Diagram also shows: S3 Bucket
  • Redshift Overview
  • OLAP: online analytical processing (analytics and data warehousing)
  • Redshift Cluster
  • Leader node: query planning, results aggregation
  • Compute nodes: performing the queries, send results to leader
  • Provision the node size in advance
  • Reserved Instances can be used for cost savings
  • Diagram: Query (SELECT COUNT (*), … FROM MY_TABLE GROUP BY …) → JDBC/ODBC → Amazon Redshift Cluster: Leader Node → Compute Nodes
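The split between leader and compute nodes can be pictured as a scatter-gather over a grouped count. This is only an in-memory analogy with hypothetical data slices, not how Redshift is implemented.

```python
from collections import Counter
from typing import Iterable, List


def compute_node_count(rows: Iterable[str]) -> Counter:
    """A compute node counts rows per group key over its own slice of the data."""
    return Counter(rows)


def leader_aggregate(partials: List[Counter]) -> Counter:
    """The leader merges the partial counts sent back by the compute nodes."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical data slices, one per compute node (assumption for this example).
slices = [["a", "b", "a"], ["b", "c"], ["a"]]
result = leader_aggregate([compute_node_count(s) for s in slices])
print(dict(result))
```

Each node only touches its own slice, and the leader only handles the much smaller partial results.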
  • Redshift – Snapshots & DR
  • Multi-AZ mode is available for some clusters
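  • Snapshots are point-in-time backups of a cluster, stored internally in S3
  • Snapshots are incremental (only what has changed is saved)
  • Automated snapshots: taken on a schedule. Set retention between 1 and 35 days
  • Redshift can be configured to automatically copy snapshots (automated or manual) of a cluster to another AWS Region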
  • Diagram: Region (us-east-1): Redshift Cluster (Original) → Take Snapshot → Cluster Snapshot → Copy (Automated / Manual) → Region (eu-west-1): Copied Snapshot → Restore → Redshift Cluster (New)
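A hedged sketch of configuring the cross-Region copy with boto3's Redshift client (`enable_snapshot_copy`). The cluster identifier and the 7-day retention are hypothetical, and applying the 1 to 35 day range to the copied snapshots is an assumption carried over from the retention note above.

```python
def snapshot_copy_params(cluster_id: str, destination_region: str, retention_days: int) -> dict:
    """Build the request parameters, rejecting retention outside the assumed 1-35 day range."""
    if not 1 <= retention_days <= 35:
        raise ValueError("retention_days must be between 1 and 35")
    return {
        "ClusterIdentifier": cluster_id,
        "DestinationRegion": destination_region,
        "RetentionPeriod": retention_days,
    }


def enable_cross_region_copy(params: dict, source_region: str) -> None:
    """Send the request; needs boto3 and AWS credentials, so it is not called below."""
    import boto3  # imported lazily so the parameter builder works without boto3 installed

    client = boto3.client("redshift", region_name=source_region)
    client.enable_snapshot_copy(**params)


# Hypothetical cluster name; the Regions mirror the diagram (us-east-1 to eu-west-1).
params = snapshot_copy_params("my-cluster", "eu-west-1", 7)
print(params)
```

Calling `enable_cross_region_copy(params, "us-east-1")` would apply the setting on the source cluster.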
  • Loading data into Redshift: large inserts are MUCH better
  • Amazon Kinesis Data Firehose → Amazon Redshift Cluster (through S3 copy)
  • S3 using COPY command: S3 Bucket (mybucket) → Amazon Redshift Cluster, over the Internet (without Enhanced VPC Routing) or through the VPC (with Enhanced VPC Routing)
  • copy customer from 's3://mybucket/mydata' iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole';
  • EC2 Instance with JDBC driver → Amazon Redshift Cluster: better to write data in batches
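To illustrate why batches are preferred over row-by-row inserts, here is a small sketch that groups rows into multi-row INSERT statements. The column names, the sample rows, and the `%s` placeholder style (as used by common PostgreSQL-compatible Python drivers) are assumptions; nothing is sent to a database.

```python
from typing import Iterator, List, Sequence, Tuple


def batched(rows: Sequence[Tuple], batch_size: int) -> Iterator[Sequence[Tuple]]:
    """Yield consecutive slices of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def multi_row_insert(table: str, columns: Sequence[str], n_rows: int) -> str:
    """Build one parameterized INSERT with n_rows value groups."""
    group = "(" + ", ".join(["%s"] * len(columns)) + ")"
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES " + ", ".join([group] * n_rows)


# Hypothetical rows for the customer table.
rows = [(1, "Ann"), (2, "Bob"), (3, "Cid"), (4, "Dee"), (5, "Eve")]
statements: List[Tuple[str, list]] = []
for batch in batched(rows, batch_size=2):
    sql = multi_row_insert("customer", ["id", "name"], len(batch))
    flat_params = [value for row in batch for value in row]
    statements.append((sql, flat_params))

print(len(statements))  # 3 statements instead of 5 single-row inserts
```

For bulk loads, the COPY command from S3 shown above remains the better fit; batching mainly helps when writing through a JDBC/ODBC-style connection.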
  • Redshift Spectrum
  • Query data that is already in S3 without loading it
  • Must have a Redshift cluster available to start the query
  • The query is then submitted to thousands of Redshift Spectrum nodes
  • Diagram: Query (SELECT COUNT (*), … FROM S3.EXT_TABLE GROUP BY …) → JDBC/ODBC → Amazon Redshift Cluster: Leader Node → Compute Nodes → Redshift Spectrum nodes (1, 2, …, N) → Amazon S3
  • Amazon OpenSearch Service
  • OpenSearch patterns
  • DynamoDB
  • CRUD → DynamoDB Table → DynamoDB Stream → Lambda Function → Amazon OpenSearch
  • API to search items (Amazon OpenSearch); API to retrieve items (DynamoDB Table)
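A sketch of what the Lambda function in this pattern might do: turn DynamoDB Stream records into OpenSearch bulk-API action lines. The index name, the key-joining rule for `_id`, and the limited type handling are assumptions; the sketch also assumes the stream is configured to include new images. It builds the payload only and sends nothing.

```python
import json
from typing import Any, Dict, List


def from_dynamodb_json(image: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Flatten DynamoDB-typed attributes; only S, N and BOOL are handled in this sketch."""
    out: Dict[str, Any] = {}
    for name, typed in image.items():
        if "S" in typed:
            out[name] = typed["S"]
        elif "N" in typed:
            out[name] = float(typed["N"])
        elif "BOOL" in typed:
            out[name] = typed["BOOL"]
    return out


def to_bulk_lines(event: Dict[str, Any], index: str) -> List[str]:
    """Map each stream record to bulk actions: index on INSERT/MODIFY, delete on REMOVE."""
    lines: List[str] = []
    for record in event["Records"]:
        keys = from_dynamodb_json(record["dynamodb"]["Keys"])
        doc_id = "|".join(str(v) for v in keys.values())
        if record["eventName"] == "REMOVE":
            lines.append(json.dumps({"delete": {"_index": index, "_id": doc_id}}))
        else:
            lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
            lines.append(json.dumps(from_dynamodb_json(record["dynamodb"]["NewImage"])))
    return lines


# Hypothetical stream event with one insert and one delete.
event = {"Records": [
    {"eventName": "INSERT", "dynamodb": {"Keys": {"id": {"S": "42"}},
                                         "NewImage": {"id": {"S": "42"}, "title": {"S": "hello"}}}},
    {"eventName": "REMOVE", "dynamodb": {"Keys": {"id": {"S": "42"}}}},
]}
lines = to_bulk_lines(event, "items")
print("\n".join(lines))
```

The search API then queries OpenSearch for matching ids, and the full items are retrieved from the DynamoDB table.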
  • OpenSearch patterns
  • CloudWatch Logs
  • Real time: CloudWatch Logs → Subscription Filter → Lambda Function (managed by AWS) → Amazon OpenSearch
  • Near real time: CloudWatch Logs → Subscription Filter → Kinesis Data Firehose → Amazon OpenSearch
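In the real-time path the Lambda function is managed by AWS, so the following is only a sketch of what a subscription-filter consumer sees. A Lambda subscription delivers a base64-encoded, gzip-compressed JSON payload under `awslogs.data`; the log group name, the sample message, and the output document shape are hypothetical.

```python
import base64
import gzip
import json
from typing import Any, Dict, List


def decode_subscription_event(event: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Decode a CloudWatch Logs subscription payload into one document per log event."""
    raw = gzip.decompress(base64.b64decode(event["awslogs"]["data"]))
    payload = json.loads(raw)
    return [
        {"log_group": payload["logGroup"], "timestamp": e["timestamp"], "message": e["message"]}
        for e in payload["logEvents"]
    ]


# Build a hypothetical event the same way CloudWatch Logs encodes it.
sample = {
    "logGroup": "/hypothetical/app",
    "logEvents": [{"id": "1", "timestamp": 1700000000000, "message": "hello"}],
}
encoded = base64.b64encode(gzip.compress(json.dumps(sample).encode("utf-8"))).decode("ascii")
docs = decode_subscription_event({"awslogs": {"data": encoded}})
print(docs)
```

The decoded documents are what would be indexed into OpenSearch; in the near-real-time path, Kinesis Data Firehose buffers and delivers the records instead.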
  • OpenSearch patterns
  • Kinesis Data Streams & Kinesis Data Firehose
  • Diagram (partial): Kinesis Data Streams → Kinesis Data Firehose (near real time) → Amazon OpenSearch, with a Lambda Function for data transformation
  • Amazon

…186 more lines in source.

Study checklist