Tag Archives: aws

Overview of AWS for Business Analysts

23 Jan

Purpose of the blog

I’ve written this blog as a non-techie guide to AWS.

If you’ve ever been in a meeting where people mentioned ‘EC2‘, ‘ELB‘, ‘RDS‘ and gone … ‘WFH‘ then this blog post is for you.

It’ll provide:

  • an overview of AWS (what is it, why its popular, how its used)
  • typical AWS architecture for a project – including key terms (VPN, Regions, AZs)
  • cheat sheet of other key terms (EBS, EFS etc)

Overview of AWS

What is AWS

AWS is the most popular cloud platform in the world. It’s owned by Amazon & is almost as large as the next 2 cloud providers combined (Google Cloud + Microsoft’s Azure).

In a nutshell – AWS allows companies to use on-demand cloud computing from Amazon. Customers can easily access servers, storage, databases and a huge set of application services using a pay-as-you-go model.

TL;DR: AWS is a cloud platform (owned by Amazon) used by companies to host and manage services in the cloud.

Why companies use it

Historically companies have owned their own IT infrastructure (e.g. servers / routers / storage). This has an overhead in terms of maintenance. It meant companies had to pay large amounts of money to own their infrastructure – even if that infrastructure was barely used certain times (e.g. at 3am). Companies also struggled to ramp up the infrastructure if demand suddenly went up (e.g. viral video on a website).

AWS & the cloud in general helps companies with that situation. It has 5 main benefits:

  1. Pay for what you use
  2. Scale the infrastructure to meet the demand
  3. Resiliency (e.g. if a data centre goes down)
  4. Cheaper (by leveraging the purchasing scale of Amazon)
  5. Removes the need to own and manage your own data centres

TL;DR: AWS allows companies to only pay for the infrastructure they use. It also allows companies to quickly ramp up & ramp down infrastructure depending on demand.

How companies use it

There’s 3 main cloud computing models. Most companies use IaaS.

  1. Infrastructure as a Service (IaaS) – provides access to networking features, computers (virtual or dedicated hardware) and data storage. This provides the greatest flexibility as you control the software / IT resources. With this model you get the kit but you manage it
  2. Platform as a Service (PaaS) – removes the need for your organisation to manage the infrastructure (hardware and operating systems). You don’t have to worry about software updates, resource procurement & capacity planning. With this model there’s even less to do – you just deploy / manage your own application (e.g. your website code)
  3. Software as a Service (SaaS) – provides you with a product that is run and managed by AWS. In this model you don’t need to worry about the infrastructure OR the service

If Amazon provides a suitable managed service, then it’s often cheaper to use PaaS rather than IaaS – because you don’t need to build and manage the service yourself.

A note about cloud deployment models …. broadly speaking there’s two models & most companies operate as “Hybrid“:

  1. Cloud = application is fully deployed in the cloud. All parts of the application run in the cloud
  2. Hybrid = connects infrastructure & applications between cloud-based resources and non-cloud based resources. This is typically used when legacy resources were built on-prem & its too complex to move them (e.g. part of an application was built years ago), or because the company doesn’t want certain information in the cloud (e.g. privileged customer information)

TL;DR: Most companies use AWS to provision infrastructure (IaaS). Amazon also offer PaaS and SaaS. PaaS means Amazon manage the platform (e.g. hardware / OS). SaaS means Amazon provides the product / service as well as the infrastructure.

Typical architecture

Region / Availability Zone

AWS has multiple Regions around the world. A Region is a geographic location (e.g. London, Ireland). You will typically deploy your application to one Region (e.g. London).

An Availability Zone is a data centre. A Region will have multiple Availability Zones. This means if one Availability Zone (AZ) fails, the other one(s) will keep running so you have resiliency. If you deploy to the London region – you will be in 3 AZs.

TL;DR Your application is likely to be hosted in 1 Region (London). Across 3 Availability Zones

VPC / subnet

A VPC (Virtual Private Cloud) is your own chunk of the cloud. It allows you to create your own network in the cloud.

Essentially a VPC is a subsection of the cloud – allowing you more control. You control what traffic goes in and out of the network.

A VPC sits at the region level. You can leverage any of the Availability Zones to create your virtual machines (e.g. EC2 instances) and other services.

Within a VPC you can create subnets – which are isolated parts of the network. You can create many subnets in an AZ. Subnets are just a way to divide up your VPC. A subnet exists at the AZ level. You can have public or private subnets (or both).

The main AWS Services inside a VPC are: EC2, RDS, ELB. Although most things can now sit in a VPC.

TL;DR: You’ll likely have 1 VPC (Virtual Private Cloud) in London & it will span all 3 AZs. A VPC gives your company an isolated part of AWS. You will create subnets to break-up the VPC into smaller chunks.

Internet Gateway = configures incoming and outgoing traffic to your VPC. It’s attached to the VPC & allows it to communicate with the Internet.

Route Table = Each VPC has a route table which makes the routing decision. Used to determine where network traffic is directed.

NACL = Acts as a firewall at the subnet level. Controls traffic coming in and out of a subnet. You can associate multiple subnets with a single NACL. There are 2 levels of firewall in a VPC: Network access control list (NACL) = at a subnet level. Security group = At an EC2 instance level.

Subnet = a subnetwork inside a VPC. It exists in 1 AZ. You can assign it an IP range & it allows you to control access to resources (e.g. you could create a private subnet for a DB and ensure its only accessible by the VPC).

NAT (not represented in the diagram) = Network address translation. NATs are devices which sit on the public subnet and can talk to the Internet on behalf of EC2 which are on private instances.

Every VPC comes with a private IP address range which is called CIDR (classless inter-domain routing). A VPC comes with a default local router that routes the traffic within a VPC.

Key concepts

EC2 / EBS / AMI – server, storage, machine image

Elastic Compute Cloud (EC2) is a virtual machine in the cloud. You can run applications on it. It’s a bit like having a computer. It’s at an AZ level.

You install an image on the EC2 instance (e.g. Windows or Linux) & chose the size (CPU / memory / storage).

Storage is not persisted on an EC2 (e.g. if you delete an EC2 instance the storage is lost), so you will need EBS.

EBS = Elastic Block Storage. It’s like a hard drive & is local to an EC2 instance. This means it’s at an AZ level. You use it for storing things like the EC2 Operating System. It behaves like a raw, unformatted block device & is used for persistent storage.

AMI = Amazon Machine Image. A template that contains the software configuration (e.g. OS, application, server) required to launch your EC2 instance.

TL;DR You will spin up EC2 instances on your subnets. EC2 instances are like computers (with OS, CPU, memory storage)& you can run your application on them. EBS is storage attached to an EC2 instance. AMI is a template for launching EC2 instances.

ELB, Autoscaling & CloudWatch – load balancing, scaling, monitoring

Elastic Load Balancer (ELB) allows you to balance incoming traffic across multiple EC2 instances. It allows you to route traffic across EC2 instances so that they’re not overwhelmed.

Autoscaling adds capacity on the fly to ELB. Autoscaling increases or decreases the number of EC2 instances based on a scaling policy. Autoscaling will increase instances when a threshold value is exceeded and remove instances when they are not being utilised.

Cloudwatch is a monitoring service. It monitors the health of resources and applications. If an action is to be taken it will trigger the appropriate resources via alarms. Cloudwatch triggers the autoscaling.

TL;DR: Elastic Load Balancer (ELB) distributes traffic across your existing EC2 instances. Cloudwatch monitors the service & triggers autoscaling. Autoscaling will perform scaling up or down of EC2 instances.

IAM – access management

IAM = Identity and Access Management. This is where you manage access to AWS resources (e.g. S3 bucket) & the actions that can be performed (e.g. create S3 bucket). It’s commonly used to manage users, groups, IAM Access Policies & roles. You can use IAM roles for example to grant applications permissions to AWS resources.

IAM is set at a global level (above region level – essentially at an AWS account level).

TL;DR: IAM is where you manage access to computing, storage, database & application services. You can decide what resources a user or application can access, and what actions they can perform.

ELK – analytics, data processing & visualisation

ELK Elasticsearch + Logstash + Kibana. It’s often used to aggregate and analyse the logs from all your systems.

Elasticsearch is a search and analytics engine. Logstash is used for data processing; Logstash ingests data from multiple sources, transforms it & sends it to Elasticsearch. Kibana lets you view data with charts and graphs. Here’s an example from Kibana:

Elastic Stack is the next evolution of ELK. It includes Beats:

  • Beats = lightweight, single purpose data shippers. Sits on your server and sends data to Logstash or Elasticsearch
  • Example Beats include: Filebeat (ships logs and other data), Metricbeat (ships metric data), Packetbeat (ships network data)

As a note – there is an Amazon-managed elastic service called ‘Amazon OpenSearch Service’.

TL;DR ELK lets you analyse logs and visualise them on a dashboard. You can see errors, volumes, performance (& more) for your service. Elastic Stack is ELK + Beats (data shippers).

Bringing it all together

Example 1 – VPC in 1 region, 3 AZs, with multiple subnets

Here we have a VPC spanning 3 AZs. This VPC could be in the London Region.

To segment the VPC into smaller networks – they have setup private and public subnets. Each subnet is likely to have EC2 instances / DB instances in them.

Example 2 – VPC in 1 region, 2 AZs, with multiple subnets (EC2 and DB instances)

In this example you have a VPC in 1 Region across 2 AZs. You can see that they’ve setup public subnets (to connect to the Internet) and private subnets (for EC2 instances and to host a DB with private information). The IGW (Internet Gateway) is attached to the VPC; the Internet Gateway is controlling incoming & outgoing traffic and allows the VPC to communicate with the Internet.

There is an Elastic Load Balancer (ELB) which is being used to balance incoming traffic across EC2 instances – so that the EC2 instances are not overwhelmed. It’s not shown here – but they may also be using Cloudwatch and Autoscaling to increase / decrease the number of EC2 instances depending on traffic.

Example 3 – VPC that’s extending out to an S3 bucket

This is a more detailed version of example 2. In this example you can see they’re connecting to an S3 bucket (let’s say to upload and download photos). Because the S3 bucket is available on the internet – the EC2 instance could go via the public subnet (via NAT > IGW > S3). However they’ve put a VPC (Endpoint) Gateway inplace.

If you have your S3 bucket or Dynamo DB in the same region you can use the “VPC Endpoint service” to reach them without going via the internet. If you want to access any other AWS services (e.g. SQS, Cloudwatch, SNS, SES) that’s not S3 or Dynamo DB – you can use “VPC endpoint interfaces”.

Example 4 – Multiple VPCs, VPC peering, transit gateway, VPN tunnels and direct connects

Looking at the right hand side of the image. In this design there are multiple VPCs.

One big application may be across multiple VPCs. VPC peering allows one VPC to talk to another using a dedicated and private network. They can be in the same AWS region or a different AWS region. It means you don’t have to talk over public internet but via AWS managed connectivity. HOWEVER this is VPC-to-VPC and if you have many VPCs this becomes complex because its 1:1 connection between VPCs.

If you want to connect hundreds of VPCs you can use a transit gateway. With this design all VPCs connect to a transit gateway + the transit gateway can connect to any VPC (it acts like a hub).

There is a 3rd way to connect a VPC to another VPC – if you don’t want to expose all the machines in one VPC (e.g. if its a SAAS product). It’s not represented in this diagram but if you only want to expose 1 service you can use “private link”. Which allows the Network load balancer of one VPC to connect to the VPC Endpoint Interface.

Finally – in the bottom right you can see a Virtual Private Gateway. This allows your VPC to connect to your on-prem network or your on-prem data centre. It can enable connectivity using VPN tunnels or a dedicated connection called AWS direct connect (the latter gives more bandwidth reliability). Essentially its used for hybrid connectivity – where some of your workloads are on premise & some are in AWS.

Appendices – more information

Note on storage options – EBS, S3, EFS

There are several storage options (https://aws.amazon.com/products/storage/). Three examples are:

EBS (Elastic Block Storage) = Block Storage. It can only be used by EC2 instances & is local to the EC2 instance. It’s like a hard drive & used for things like the EC2 Operating System. Exists at an AZ level.

S3 (Simple Storage Service) = Object Storage. Essentially a bucket where you can store things – S3 can be accessed over the internet. S3 is flat storage (there’s no hierarchy). It offers unlimited storage. Used for uploading and sharing files like images/videos, log files & data backups etc.

EFS (Elastic File System) = File Storage. It’s shared between EC2 instances. It allows a hierarchical structure. It’s at a region level and can be accessed across multiple AZs. Used for web serving, data analytics etc.

Note on DB options

There are eleven database services (https://aws.amazon.com/products/databases/). These include:

  • RDS = Service for relational databases
  • DynamoDB = NoSQL DB
  • ElastiCache = Used for DB caching (Redis and Memcached engines)

Note on caching

Cloudfront is the AWS CDN Service. It means static content (e.g. video or images) can be cached at each location to reduce latency. It stores data in ‘edge locations’.

Note on global / region / VPC / AZ level

Some AWS services are at an account level. IAM(Identity and access management), billing, Route 53. They are global and affect all regions & all services that work below it.

Some are at a region level e.g. S3, CDN, Dyanmo DB, SNS, API Gateway, Lambda. These services are managed by AWS – they’re in your region but not in your VPC.

Some are at a VPC level

Some are at a AZ level e.g. EC2, RDS.