AWS
Data & Analytics
Updated 23/06/2026
- #aws
- #analytics
Maarek SAA-C03 Slides v45 — Chapter 20. Personal study extract.
Key content
- Data & Analytics
- Amazon Athena
- Commonly used with Amazon QuickSight for reporting/dashboards
- Use cases: analyze and query VPC Flow Logs, ELB Logs, CloudTrail trails, etc.
- Diagram: S3 Bucket (load data) → Amazon Athena (query & analyze) → Amazon QuickSight (reporting & dashboards)
- Amazon Athena – Performance Improvement
- Partition layout in S3, one path segment per partition column: /<PARTITION_COLUMN_NAME>=<VALUE>/<PARTITION_COLUMN_NAME>=<VALUE>/<PARTITION_COLUMN_NAME>=<VALUE>/etc…
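The partition layout above can be sketched as a small path builder. This is a minimal illustration, not Athena's own tooling: the function name `partition_prefix` and the bucket and table names are hypothetical. The idea is that when each partition column is its own `column=value` path segment, a query that filters on those columns can skip the prefixes that do not match.

```python
def partition_prefix(table_prefix, partitions):
    """Build a partitioned S3 prefix of the form .../<col>=<value>/<col>=<value>/."""
    # One "column=value" path segment per partition column, in the given order.
    segments = [f"{column}={value}" for column, value in partitions]
    return table_prefix.rstrip("/") + "/" + "/".join(segments) + "/"


# Hypothetical bucket and table names, for illustration only.
prefix = partition_prefix(
    "s3://example-bucket/flights",
    [("year", 1991), ("month", 1), ("day", 1)],
)
print(prefix)
```

Ordering the columns from coarse to fine (year, then month, then day) keeps related objects under a shared prefix.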
- Amazon Athena – Federated Query
- Run SQL queries across data stored in relational, non-relational, object, and custom data sources (AWS or on-premises)
- Uses Data Source Connectors that run on AWS Lambda to run Federated Queries (e.g., CloudWatch Logs, DynamoDB, RDS, …)
- Diagram: Amazon Athena (with S3 Bucket) → Lambda (Data Source Connector) → ElastiCache, DocumentDB, DynamoDB, Redshift, HBase in EMR, MySQL, Aurora, SQL Server, Database (On-Premises)
- Redshift Overview
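- OLAP: online analytical processing (analytics and data warehousing)
- Redshift Cluster
- Leader node: query planning, results aggregation
- Compute nodes: performing the queries, send results to leader
- Node size is provisioned in advance
- Reserved Instances can be used for cost savings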
- Diagram: Query (SELECT COUNT (*), … FROM MY_TABLE GROUP BY …) → JDBC/ODBC → Amazon Redshift Cluster: Leader Node → Compute Nodes
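The leader/compute split above can be pictured with a toy scatter-gather model. This is only a sketch of the idea, not how Redshift is implemented: the functions `compute_node` and `leader_node`, the `category` column, and the in-memory slices are all assumptions made for illustration.

```python
from collections import Counter


def compute_node(rows):
    """A compute node runs the query on its own slice and returns partial counts."""
    return Counter(row["category"] for row in rows)


def leader_node(slices):
    """The leader hands each slice to a compute node, then aggregates the results."""
    total = Counter()
    for rows in slices:
        total += compute_node(rows)
    return dict(total)


# Two hypothetical data slices, one per compute node.
slices = [
    [{"category": "a"}, {"category": "b"}],
    [{"category": "a"}],
]
result = leader_node(slices)
print(result)
```

The model mirrors a `SELECT COUNT(*) ... GROUP BY` query: partial counts are computed per slice, and only the small partial results travel back to the leader.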
- Redshift – Snapshots & DR
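- Snapshots are point-in-time backups of clusters, stored internally in S3
- Snapshots are incremental (only what has changed is saved)
- Automated snapshots can run on a schedule. Set retention between 1 and 35 days
- Redshift can be configured to automatically copy snapshots (automated or manual) of a cluster to another AWS Region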
- Diagram: Region (us-east-1): Redshift Cluster (Original) → Take Snapshot → Cluster Snapshot → Automated / Manual Copy → Region (eu-west-1): Copied Snapshot → Restore → Redshift Cluster (New)
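The cross-Region copy in the diagram can be sketched as a request builder. This is a minimal sketch under stated assumptions: the parameter names follow the boto3 Redshift client's `enable_snapshot_copy` call, the helper `snapshot_copy_request` and the cluster name are hypothetical, and the code only builds and validates the arguments without contacting AWS.

```python
def snapshot_copy_request(cluster_id, destination_region, retention_days):
    """Build keyword arguments for a cross-Region snapshot copy request."""
    # Retention for copied automated snapshots must stay within 1 to 35 days.
    if not 1 <= retention_days <= 35:
        raise ValueError("retention must be between 1 and 35 days")
    return {
        "ClusterIdentifier": cluster_id,
        "DestinationRegion": destination_region,
        "RetentionPeriod": retention_days,
    }


# Hypothetical cluster name; Regions match the diagram above.
request = snapshot_copy_request("my-cluster", "eu-west-1", 7)
print(request)

# With boto3 installed and credentials configured, the call would look like:
#   boto3.client("redshift", region_name="us-east-1").enable_snapshot_copy(**request)
```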
- Loading data into Redshift: Large inserts are MUCH better
- Amazon Kinesis Data Firehose: Amazon Kinesis Data Firehose → Amazon Redshift Cluster (through S3 copy)
- S3 using COPY command: S3 Bucket (mybucket) → Amazon Redshift Cluster, over the Internet without Enhanced VPC Routing, through the VPC with Enhanced VPC Routing
- copy customer from 's3://mybucket/mydata' iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole';
- EC2 Instance, JDBC driver: EC2 Instance → Amazon Redshift Cluster; better to write data in batches
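The advice to write data in batches can be illustrated by grouping rows into multi-row INSERT statements. This is a sketch of the batching idea in Python rather than with the JDBC driver the notes mention. The helpers `batched` and `multi_row_insert`, the `id` and `name` columns, and the `%s` placeholder style (as used by common Python PostgreSQL drivers) are assumptions.

```python
def batched(rows, batch_size):
    """Yield consecutive lists of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def multi_row_insert(table, columns, batch):
    """Return one parameterized INSERT statement plus its flat parameter list."""
    row_placeholder = "(" + ", ".join(["%s"] * len(columns)) + ")"
    sql = (
        f"INSERT INTO {table} ({', '.join(columns)}) VALUES "
        + ", ".join([row_placeholder] * len(batch))
    )
    # Flatten the rows so the parameters line up with the placeholders.
    params = [value for row in batch for value in row]
    return sql, params


rows = [(1, "a"), (2, "b"), (3, "c")]
statements = [multi_row_insert("customer", ["id", "name"], b) for b in batched(rows, 2)]
for sql, params in statements:
    print(sql, params)
```

Three rows with a batch size of two produce two statements instead of three single-row inserts. For large volumes, the COPY command from S3 shown above is still the bulk-loading route the notes point to.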
- Redshift Spectrum
- Query data that is already in S3 without loading it
- Must have a Redshift cluster available to start the query
- The query is then submitted to thousands of Redshift Spectrum nodes
- Diagram: Query (SELECT COUNT (*), … FROM S3.EXT_TABLE GROUP BY …) → JDBC/ODBC → Amazon Redshift Cluster: Leader Node → Compute Nodes → Redshift Spectrum nodes (1, 2, …, N) → Amazon S3
- Amazon OpenSearch Service
- OpenSearch patterns: DynamoDB
- DynamoDB Table (CRUD) → DynamoDB Stream → Lambda Function → Amazon OpenSearch
- API to retrieve items (DynamoDB Table); API to search items (Amazon OpenSearch)
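The Lambda step in the DynamoDB pattern can be sketched as a pure function that turns stream records into lines for the OpenSearch `_bulk` API. This is a minimal sketch under stated assumptions: the table has a single-attribute key, the stream includes new images, attributes are scalar, and the function name `stream_to_bulk` and the index name `items` are hypothetical. It only builds the request body and does not send it.

```python
import json


def stream_to_bulk(event, index):
    """Translate DynamoDB Stream records into newline-delimited _bulk actions."""
    lines = []
    for record in event["Records"]:
        # Keys look like {"id": {"S": "42"}}; take the single typed value.
        key_attr = next(iter(record["dynamodb"]["Keys"].values()))
        doc_id = next(iter(key_attr.values()))
        if record["eventName"] == "REMOVE":
            lines.append(json.dumps({"delete": {"_index": index, "_id": doc_id}}))
        else:  # INSERT or MODIFY: (re)index the new image
            image = record["dynamodb"]["NewImage"]
            # Flatten {"name": {"S": "x"}} into {"name": "x"} (scalar types only).
            doc = {name: next(iter(typed.values())) for name, typed in image.items()}
            lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
            lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


event = {
    "Records": [
        {
            "eventName": "INSERT",
            "dynamodb": {
                "Keys": {"id": {"S": "42"}},
                "NewImage": {"id": {"S": "42"}, "name": {"S": "widget"}},
            },
        },
        {"eventName": "REMOVE", "dynamodb": {"Keys": {"id": {"S": "7"}}}},
    ]
}
body = stream_to_bulk(event, "items")
print(body)
```

Keeping DynamoDB as the source of truth and OpenSearch as a derived search index matches the split in the notes: retrieve items from the table, search items through OpenSearch.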
- OpenSearch patterns: CloudWatch Logs
- Real time: CloudWatch Logs → Subscription Filter → Lambda Function (managed by AWS) → Amazon OpenSearch
- Near real time: CloudWatch Logs → Subscription Filter → Kinesis Data Firehose → Amazon OpenSearch
- OpenSearch patterns: Kinesis Data Streams & Kinesis Data Firehose
- Kinesis Data Streams → Kinesis Data Firehose (near real time, with data transformation by a Lambda Function) → Amazon OpenSearch
- Amazon
…186 more lines in source.
Study checklist
- Read chapter once in English (no full translation)
- Add 7–10 terms → /admin/aws-english/vocab
- Practice 5 questions → /admin/aws-english/reader (tags: aws, analytics)
- SRS review → /flashcards/aws-english