Storage Backends
A Teleport cluster stores different types of data in different locations. By default everything is stored in a local directory on the Auth Service host.
For self-hosted Teleport deployments, you can configure Teleport to integrate with other storage types based on the nature of the stored data (size, read/write ratio, mutability, etc.).
Data type | Description | Supported storage backends |
---|---|---|
core cluster state | Cluster configuration (e.g. users, roles, auth connectors) and identity (e.g. certificate authorities, registered nodes, trusted clusters). | Local directory (SQLite), etcd, PostgreSQL, AWS DynamoDB, GCP Firestore |
audit events | JSON-encoded events from the audit log (e.g. user logins, RBAC changes) | Local directory, PostgreSQL, AWS DynamoDB, GCP Firestore |
session recordings | Raw terminal recordings of interactive user sessions | Local directory, AWS S3 (and any S3-compatible product), GCP Cloud Storage, Azure Blob Storage |
teleport instance state | ID and credentials of a non-auth teleport instance (e.g. node, proxy) | Local directory |
Cluster state
Cluster state is stored in a central storage location configured by the Auth Service. The cluster state includes:
- Agent and Proxy Service membership information, including offline/online status.
- List of active sessions.
- List of locally stored users.
- RBAC configuration (roles and permissions).
- Dynamic configuration.
There are two ways to achieve High Availability. The first is to "outsource" this function to the infrastructure, for example by using highly available network-based disk volumes (similar to AWS EBS) and migrating a failed VM to a new host. In this scenario, there is nothing Teleport-specific to be done.
If High Availability cannot be provided by the infrastructure (perhaps you're running Teleport on a bare metal cluster), you can still configure Teleport to run in a highly available fashion.
Teleport Enterprise Cloud takes care of this setup for you so you can provide secure access to your infrastructure right away.
Get started with a free trial of Teleport Enterprise Cloud.
Auth Service State
To run multiple instances of the Teleport Auth Service, you must first switch to one of the high-availability storage backends listed below.
Once you have a high-availability storage backend and multiple instances of the Auth Service running, you'll need to create a load balancer to evenly distribute traffic to all Auth Service instances and provide a single point of entry for all components that need to communicate with the Auth Service. Use the address of the load balancer in the `auth_server` field when configuring other components of Teleport.
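For example, an agent's `teleport.yaml` might point `auth_server` at the load balancer; the address below is a placeholder:

```yaml
# Hypothetical agent configuration: auth_server points at the load
# balancer that fronts the Auth Service instances.
teleport:
  auth_server: auth-lb.example.com:3025
```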
Configure your load balancer to use Layer 4 (TCP) load balancing, round-robin load balancing, and a 300 second idle timeout.
With multiple instances of the Auth Service running, special attention needs to be paid to keeping their configuration identical. Settings like `cluster_name`, `tokens`, `storage`, etc. must be the same.
Proxy Service State
The Teleport Proxy Service is stateless, which makes running multiple instances trivial.
If using the default configuration, configure your load balancer to forward port `3080` to the servers that run the Teleport Proxy Service. If you have configured your Proxy Service to not use TLS Routing and/or are using non-default ports, you will need to configure your load balancer to forward the ports you specified for `listen_addr`, `tunnel_listen_addr`, and `web_listen_addr` in `teleport.yaml`.
Configure your load balancer to use Layer 4 (TCP) load balancing, round-robin load balancing, and a 300 second idle timeout.
If you terminate TLS with your own certificate for `web_listen_addr` at your load balancer, you'll need to run Teleport with the `--insecure-no-tls` flag.
If your load balancer supports HTTP health checks, configure it to hit the `/readyz` diagnostics endpoint on machines running Teleport. This endpoint must be enabled by passing the `--diag-addr` flag to `teleport start`:

```sh
teleport start --diag-addr=0.0.0.0:3000
```
The `/readyz` endpoint will reply `{"status":"ok"}` if the Teleport service is running without problems.
The endpoint must be exposed on a proxy interface for the load balancer health checks to succeed. You should only do this on Proxy Service instances, and ensure that port 3000 is exposed only to the load balancers, not to the public internet. For other services, continue to use the 127.0.0.1 local loopback interface.
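The load balancer requirements above (Layer 4, round-robin, 300-second idle timeout, `/readyz` health checks) can be sketched with HAProxy; the backend addresses below are placeholders, and this is an illustrative fragment rather than a complete configuration:

```
# Hypothetical HAProxy fragment: TCP round-robin with a 300s idle
# timeout and HTTP health checks against Teleport's /readyz endpoint.
defaults
    mode tcp
    timeout connect 5s
    timeout client 300s
    timeout server 300s

frontend teleport_proxy
    bind *:3080
    default_backend teleport_proxies

backend teleport_proxies
    balance roundrobin
    # Health-check each node's diagnostics endpoint (enabled with
    # --diag-addr=0.0.0.0:3000) instead of the proxied port itself.
    option httpchk GET /readyz
    server proxy1 10.0.0.1:3080 check port 3000
    server proxy2 10.0.0.2:3080 check port 3000
```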
Below, we cover how to use the etcd, PostgreSQL, DynamoDB, and Firestore storage backends to make Teleport highly available.
Etcd
Teleport can use etcd as a storage backend to achieve highly available deployments. You must take steps to protect access to etcd in this configuration because that is where Teleport secrets like keys and user records will be stored.

etcd can currently only be used to store Teleport's internal database in a highly available way. This allows you to run multiple Auth Service instances in your cluster for a High Availability deployment, but it will not also store Teleport audit events for you in the way that DynamoDB or Firestore will. etcd is not designed to handle large volumes of time-series data like audit events.
To configure Teleport to use etcd as a storage backend:

- Make sure you are using etcd version 3.3 or newer.
- Follow etcd's cluster hardware recommendations. In particular, use SSDs or high-performance virtualized block device storage for best performance.
- Install etcd and configure peer and client TLS authentication using the etcd security guide.
  - You can use this script provided by etcd if you don't already have a TLS setup.
- Configure all Teleport Auth Service instances to use etcd in the "storage" section of the config file as shown below.
- Deploy several Auth Service instances connected to the etcd backend.
- Deploy several Proxy Service instances that have `auth_server` pointed to the Auth Service instances to connect to.
```yaml
teleport:
  storage:
    type: etcd

    # List of etcd peers to connect to:
    peers: ["https://172.17.0.1:4001", "https://172.17.0.2:4001"]

    # Required path to TLS client certificate and key files to connect to etcd.
    #
    # To create these, follow
    # https://coreos.com/os/docs/latest/generate-self-signed-certificates.html
    # or use the etcd-provided script
    # https://github.com/etcd-io/etcd/tree/master/hack/tls-setup.
    tls_cert_file: /var/lib/teleport/etcd-cert.pem
    tls_key_file: /var/lib/teleport/etcd-key.pem

    # Optional file with a trusted CA certificate
    # used to authenticate etcd nodes.
    #
    # If you used the script above to generate the client TLS certificate,
    # this CA certificate should be one of the other generated files.
    tls_ca_file: /var/lib/teleport/etcd-ca.pem

    # Alternative password-based authentication, if not using TLS client
    # certificates.
    #
    # See https://etcd.io/docs/v3.4.0/op-guide/authentication/ for setting
    # up a new user.
    username: username
    password_file: /mnt/secrets/etcd-pass

    # etcd key (location) under which Teleport will store its state.
    # Make sure it ends with a '/'!
    prefix: /teleport/

    # NOT RECOMMENDED: enables insecure etcd mode in which self-signed
    # certificates will be accepted.
    insecure: false

    # Optionally sets the limit on the client message size.
    # This is usually used to increase the default, which is 2MiB
    # (1.5MiB server default + gRPC overhead bytes).
    # Make sure this does not exceed the value for the etcd
    # server specified with `--max-request-bytes` (1.5MiB by default).
    # Keep the two values in sync.
    #
    # See https://etcd.io/docs/v3.4.0/dev-guide/limit/ for details.
    #
    # This bumps the size to 15MiB as an example:
    etcd_max_client_msg_size_bytes: 15728640
```
PostgreSQL
PostgreSQL cluster state and audit log storage is available starting from Teleport 13.3.
Teleport can use PostgreSQL as a storage backend to achieve high availability. You must take steps to protect access to PostgreSQL in this configuration because that is where Teleport secrets like keys and user records will be stored. The PostgreSQL backend supports two types of Teleport data:
- Cluster state
- Audit log events
The PostgreSQL backend requires PostgreSQL 13 or later and, for the cluster state only, the wal2json logical decoding plugin. The plugin is available in packages for all stable versions in the PostgreSQL Apt and Yum repositories for Debian- and RPM-based Linux distributions respectively, or it can be compiled following the instructions provided in its repository. The plugin is pre-installed, with no extra steps to take, in Azure Database for PostgreSQL.
Teleport needs separate databases for the cluster state and the audit log, and it will attempt to create them if given permissions to do so; it will also set up the database schemas as needed, so we recommend giving the user ownership over the databases.
The PostgreSQL backend for cluster state relies on the ability to use logical decoding to get a stream of changes from the database. Because of that, the `wal_level` parameter must be set to `logical`, and `max_replication_slots` must be set to at least as many Teleport Auth Service instances as you'll be running (a higher number is recommended, to account for network conditions).
The Teleport Auth Service needs to be able to create a replication slot when starting and when reestablishing a new connection to the PostgreSQL cluster, and any long-running transaction will prevent that. It's therefore only advisable to store the Teleport cluster state on a shared PostgreSQL cluster if the other workloads on the cluster only consist of short-lived transactions.
`wal_level` can only be set at server start, so it should be set in `postgresql.conf`:

```conf
# the default value for wal_level is replica
wal_level = logical
# the default value for max_replication_slots is 10
max_replication_slots = 10
```
In addition, the database user must have the `REPLICATION` role attribute, allowing it to initiate replication. In the `psql` shell:

```sql
postgres=# CREATE USER new_user WITH REPLICATION;
CREATE ROLE
postgres=# ALTER ROLE existing_user WITH LOGIN REPLICATION;
ALTER ROLE
```
Since replication permissions allow for essentially full read access over the entire cluster (with a physical replication connection) or to all databases that the user can connect to, it's recommended to prevent the user from opening replication connections, or from connecting to databases other than the ones used for Teleport, if the PostgreSQL cluster is shared between Teleport and other applications.
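On a shared cluster, one way to apply this restriction is in `pg_hba.conf`. This is a hypothetical fragment assuming a `teleport` user and the database names used elsewhere on this page; pg_hba rules are matched first to last, and the `replication` keyword matches physical replication connections:

```
# Reject physical replication connections from the teleport user, and
# only let it connect to its own databases over TLS with a client cert.
hostssl replication      teleport ::/0      reject
hostssl replication      teleport 0.0.0.0/0 reject
hostssl teleport_backend teleport ::/0      cert
hostssl teleport_backend teleport 0.0.0.0/0 cert
hostssl teleport_audit   teleport ::/0      cert
hostssl teleport_audit   teleport 0.0.0.0/0 cert
```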
For convenience, Teleport will attempt to grant itself the `REPLICATION` role attribute, to accommodate the ability of some managed services (such as Azure Database for PostgreSQL) to create superuser accounts through their API; this should only be leveraged if the entire PostgreSQL cluster is dedicated to Teleport.
To configure Teleport to use PostgreSQL:

- Configure all Teleport Auth Service instances to use the PostgreSQL backend in the `storage` section of `teleport.yaml` as shown below.
- Deploy several Auth Service instances connected to the PostgreSQL storage backend.
- Deploy several Proxy Service nodes.
- Make sure that the Proxy Service instances and all Teleport agent services that connect directly to the Auth Service have the `auth_server` configuration setting populated with the address of a load balancer for Auth Service instances.
```yaml
teleport:
  storage:
    type: postgresql

    # conn_string is a libpq-compatible connection string (see
    # https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-CONNSTRING);
    # pool_max_conns is an additional parameter that determines the maximum
    # number of connections in the connection pool used for the cluster state
    # database (the change feed uses an additional connection), defaulting to
    # a value that depends on the number of available CPUs.
    #
    # If your certificates are not stored at the default ~/.postgresql
    # location, you will need to specify them with the sslcert, sslkey, and
    # sslrootcert parameters.
    conn_string: postgresql://user_name@database-address/teleport_backend?sslmode=verify-full&pool_max_conns=20

    # In certain managed environments it can be necessary or convenient to
    # use a different user or different settings for the connection used
    # to set up and make use of logical decoding. If specified, Teleport
    # will use the connection string in change_feed_conn_string for that,
    # instead of the one in conn_string. Available in Teleport 13.4 and later.
    change_feed_conn_string: postgresql://replication_user_name@database-address/teleport_backend?sslmode=verify-full

    # An audit_events_uri with a scheme of postgresql:// will use the
    # PostgreSQL backend for audit log storage; the URI is a libpq-compatible
    # connection string just like the cluster state conn_string, but cannot be
    # specified as key=value pairs. It's possible to specify completely
    # different PostgreSQL clusters for cluster state and audit log.
    #
    # If your certificates are not stored at the default ~/.postgresql
    # location, you will need to specify them with the sslcert, sslkey, and
    # sslrootcert parameters.
    audit_events_uri:
      - postgresql://user_name@database-address/teleport_audit?sslmode=verify-full
```
Audit log events are periodically deleted after a default retention period of 8766 hours (one year). It's possible to select a different retention period, or to disable the cleanup entirely, by specifying the `retention_period` or `disable_cleanup` parameters in the fragment of the URI:
```yaml
teleport:
  storage:
    audit_events_uri:
      - postgresql://user_name@database-address/teleport_audit?sslmode=verify-full#disable_cleanup=false&retention_period=2160h
```
Authentication
We strongly recommend using client certificates to authenticate Teleport to PostgreSQL, as well as enforcing the use of TLS and verifying the server certificate on the client side.
You will need to update your `pg_hba.conf` file to include the following lines to ensure that connections from Teleport use client certificates. See The `pg_hba.conf` file in the PostgreSQL documentation for more details.

```conf
# TYPE  DATABASE USER CIDR-ADDRESS METHOD
hostssl teleport all  ::/0         cert
hostssl teleport all  0.0.0.0/0    cert
```
If the use of passwords is unavoidable, we recommend configuring them in the `~/.pgpass` file rather than storing them in Teleport's configuration file.
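The `~/.pgpass` format is `hostname:port:database:username:password`, one entry per line; the values below are placeholders:

```
# ~/.pgpass must be owned by the user running Teleport, with mode 0600.
database-address:5432:teleport_backend:user_name:example-password
database-address:5432:teleport_audit:user_name:example-password
```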
Azure AD authentication
If you are running Teleport on Azure, Teleport can make use of Azure AD authentication to connect to an Azure Database for PostgreSQL server without having to manage any secrets:
```yaml
teleport:
  storage:
    type: postgresql
    conn_string: postgresql://user_name@database-name.postgres.database.azure.com/teleport_backend?sslmode=verify-full&pool_max_conns=20
    auth_mode: azure
    audit_events_uri:
      - postgresql://user_name@database-name.postgres.database.azure.com/teleport_audit?sslmode=verify-full#auth_mode=azure
```
When `auth_mode` is set to `azure`, Teleport will automatically fetch short-lived tokens from the credentials available to it, to be used as database passwords. The database user must be configured to allow connections using Azure AD.
Teleport will make use of the Azure AD credentials specified by environment variables, Azure AD Workload Identity credentials, or managed identity credentials.
Google Cloud IAM authentication
If you are running Teleport on Google Cloud, Teleport can make use of IAM Authentication to connect to a GCP Cloud SQL for PostgreSQL instance without having to manage any secrets:
```yaml
teleport:
  storage:
    type: postgresql
    auth_mode: gcp-cloudsql
    # GCP connection name has the format <project>:<location>:<instance>.
    gcp_connection_name: project:location:instance
    # The type of IP address to use for connecting to the Cloud SQL instance.
    # Valid options are:
    # - "" (defaults to "public")
    # - "public"
    # - "private"
    # - "psc" (for Private Service Connect)
    gcp_ip_type: public
    # Leave host and port empty as they are not required.
    conn_string: postgresql://service-account@project.iam@/teleport_backend
    audit_events_uri:
      - postgresql://service-account@project.iam@/teleport_audit#auth_mode=gcp-cloudsql&gcp_connection_name=project:location:instance&gcp_ip_type=public
```
To enable IAM authentication and logical replication for Cloud SQL, make sure the flags `cloudsql.iam_authentication` and `cloudsql.logical_decoding` are set to `on` for the Cloud SQL instance. The database user must also have the `REPLICATION` role attribute for using the logical decoding features. See Set up logical replication and decoding for more details.
In order for Teleport to use the Cloud SQL Go Connector with IAM authentication, the service account of the target database user must have the "Cloud SQL Client" (`roles/cloudsql.client`) and "Cloud SQL Instance User" (`roles/cloudsql.instanceUser`) roles assigned.
Teleport will make use of the credentials specified through the `GOOGLE_APPLICATION_CREDENTIALS` environment variable, Workload Identity Federation with service account impersonation, or service account credentials attached to VMs.
If the service account used in the PostgreSQL connection string is different from the service account of the default credentials, Teleport will impersonate the service account used in the connection string as a Service Account Token Creator using the default credentials.
Development
If you are not ready to connect Teleport to a production instance of PostgreSQL, you can use the following instructions to set up a throwaway instance of PostgreSQL using Docker.
First copy the following script to disk and run it to generate the CA, client certificate, and server certificate used by Teleport and PostgreSQL to establish a secure mutually authenticated connection:
```bash
#!/bin/bash

# Create the certs directory.
mkdir -p ./certs
cd certs/

# Create CA key and self-signed certificate.
openssl genpkey -algorithm RSA -out ca.key
openssl req -x509 -new -key ca.key -out ca.crt -subj "/CN=root"

# Function to create certificates.
create_certificate() {
  local name="$1"
  local dns_name="$2"

  openssl genpkey \
    -algorithm RSA \
    -out "${name}.key"

  openssl req -new \
    -key "${name}.key" \
    -out "${name}.csr" \
    -subj "/CN=${dns_name}"

  openssl x509 -req \
    -in "${name}.csr" \
    -CA ca.crt \
    -CAkey ca.key \
    -out "${name}.crt" \
    -extfile <(printf "subjectAltName=DNS:${dns_name}") \
    -CAcreateserial

  chmod 0600 "${name}.key"
}

# Create client certificate with SAN.
create_certificate "client" "teleport"

# Create server certificate with SAN.
create_certificate "server" "localhost"

echo "Certificates and keys generated successfully."
```
Next, create a `Dockerfile` using the official PostgreSQL Docker image and add `wal2json` to it:

```dockerfile
FROM postgres:15.0
RUN apt-get update
RUN apt-get install -y postgresql-15-wal2json
```
Create an `init.sql` file that will ensure the Teleport user is created upon startup of the container:

```sql
CREATE USER teleport WITH REPLICATION CREATEDB;
```
Create a `pg_hba.conf` file to enforce certificate-based authentication for connections to PostgreSQL:

```conf
# TYPE  DATABASE USER CIDR-ADDRESS METHOD
local   all      all               trust
hostssl all      all  ::/0         cert
hostssl all      all  0.0.0.0/0    cert
```
Create a `postgresql.conf` file that configures the WAL level and the certificates used for authentication:

```conf
listen_addresses = '*'
port = 5432
max_connections = 20
shared_buffers = 128MB
temp_buffers = 8MB
work_mem = 4MB
wal_level = logical
max_replication_slots = 10
ssl = on
ssl_ca_file = '/certs/ca.crt'
ssl_cert_file = '/certs/server.crt'
ssl_key_file = '/certs/server.key'
```
Start the PostgreSQL container with the following command:

```sh
docker run --rm --name postgres \
  -e POSTGRES_DB=db \
  -e POSTGRES_USER=user \
  -e POSTGRES_PASSWORD=password \
  -v $(pwd)/data:/var/lib/postgresql/data \
  -v $(pwd)/certs:/certs \
  -v $(pwd)/postgresql.conf:/etc/postgresql/postgresql.conf \
  -v $(pwd)/pg_hba.conf:/etc/postgresql/pg_hba.conf \
  -v $(pwd)/init.sql:/docker-entrypoint-initdb.d/init.sql \
  -p 5432:5432 \
  $(docker build -q .) \
  postgres \
  -c hba_file=/etc/postgresql/pg_hba.conf \
  -c config_file=/etc/postgresql/postgresql.conf
```
Lastly, update the `storage` section in `teleport.yaml` to use PostgreSQL and start Teleport:

```yaml
teleport:
  storage:
    type: postgresql
    conn_string: "postgresql://teleport@localhost:5432/teleport_backend?sslcert=/path/to/certs/client.crt&sslkey=/path/to/certs/client.key&sslrootcert=/path/to/certs/ca.crt&sslmode=verify-full&pool_max_conns=20"
```
S3 (Session Recordings)
Teleport supports using S3 as a backend for both session recordings and audit logs. S3 cannot be used as the cluster state backend. This section covers the use of S3 as a session recording backend. For information on using S3 for audit logs, see the Athena section.
S3 buckets must have versioning enabled, which ensures that a session log cannot be permanently altered or deleted. Teleport will always look at the oldest version of a recording.
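If you create the bucket yourself rather than letting the Auth Service create it, you can enable versioning with the AWS CLI; this is a sketch, and the bucket name is a placeholder:

```
# Hypothetical example: enable versioning on a pre-created bucket.
aws s3api put-bucket-versioning \
  --bucket your-sessions-bucket \
  --versioning-configuration Status=Enabled
```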
Authenticating to AWS
The Teleport Auth Service must be able to read AWS credentials in order to authenticate to S3.
Grant the Teleport Auth Service access to credentials that it can use to authenticate to AWS. If you are running the Teleport Auth Service on an EC2 instance, you may use the EC2 Instance Metadata Service method. Otherwise, you must use environment variables:
Teleport will detect when it is running on an EC2 instance and use the Instance Metadata Service to fetch credentials.
The EC2 instance should be configured to use an EC2 instance profile. For more information, see: Using Instance Profiles.
Teleport's built-in AWS client reads credentials from the following environment variables:
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_DEFAULT_REGION
When you start the Teleport Auth Service, the service reads environment variables from a file at the path `/etc/default/teleport`. Obtain these credentials from your organization. Ensure that `/etc/default/teleport` has the following content, replacing the values of each variable:

```conf
AWS_ACCESS_KEY_ID=00000000000000000000
AWS_SECRET_ACCESS_KEY=0000000000000000000000000000000000000000
AWS_DEFAULT_REGION=<YOUR_REGION>
```
Teleport's AWS client loads credentials from different sources in the following order:
- Environment Variables
- Shared credentials file
- Shared configuration file (Teleport always enables shared configuration)
- EC2 Instance Metadata (credentials only)
While you can provide AWS credentials via a shared credentials file or shared configuration file, you will need to run the Teleport Auth Service with the `AWS_PROFILE` environment variable assigned to the name of your profile of choice.
If you have a specific use case that the instructions above do not account for, consult the documentation for the AWS SDK for Go for a detailed description of credential loading behavior.
Configuring the S3 backend
Below is an example of how to configure the Teleport Auth Service to store the recorded sessions in an S3 bucket.
```yaml
teleport:
  storage:
    # The region setting sets the default AWS region for all AWS services
    # Teleport may consume (DynamoDB, S3).
    region: us-east-1

    # Path to the S3 bucket to store the recorded sessions in.
    audit_sessions_uri: "s3://Example_TELEPORT_S3_BUCKET/records"

    # Teleport obtains credentials from the standard provider chain:
    # an assumed IAM role or .aws/credentials in the home folder.
```
You can add optional query parameters to the S3 URL. The Teleport Auth Service reads these parameters to configure its interactions with S3:

`s3://bucket/path?region=us-east-1&endpoint=mys3.example.com&insecure=false&disablesse=false&acl=private&use_fips_endpoint=true`

- `region=us-east-1` - sets the AWS region to use.
- `endpoint=mys3.example.com` - connects to a custom S3 endpoint. Optional.
- `insecure=true` - set to `true` or `false`. If `true`, TLS will be disabled. The default value is `false`.
- `disablesse=true` - set to `true` or `false`. The Auth Service checks this value before uploading an object to an S3 bucket. If this is `false`, the Auth Service will set the server-side encryption configuration of the upload to use AWS Key Management Service and, if `sse_kms_key` is set, configure the upload to use this key. If this value is `true`, the Auth Service will not set an explicit server-side encryption configuration for the object upload, meaning that the upload will use the bucket-level server-side encryption configuration.
- `sse_kms_key=kms_key_id` - if set to a valid AWS KMS CMK key ID, all objects uploaded to S3 will be encrypted with this key (as long as `disablesse` is `false`). Details can be found below.
- `acl=private` - sets the canned ACL to use. Must be one of the predefined ACL values.
- `use_fips_endpoint=true` - configures S3 FIPS endpoints.
S3 IAM policy
On startup, the Teleport Auth Service checks whether the S3 bucket you have configured for session recording storage exists. If it does not, the Auth Service attempts to create and configure the bucket.
The IAM permissions that the Auth Service requires to manage its session recording bucket depend on whether you expect to create the bucket yourself or enable the Auth Service to create and configure it for you:
Note that Teleport will only use S3 buckets with versioning enabled. This ensures that a session log cannot be permanently altered or deleted, as Teleport will always look at the oldest version of a recording.
You'll need to replace these values in the policy example below:
Placeholder value | Replace with |
---|---|
your-sessions-bucket | Name to use for the Teleport S3 session recording bucket |
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "BucketActions",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucketVersions",
        "s3:ListBucketMultipartUploads",
        "s3:ListBucket",
        "s3:GetEncryptionConfiguration",
        "s3:GetBucketVersioning"
      ],
      "Resource": "arn:aws:s3:::your-sessions-bucket"
    },
    {
      "Sid": "ObjectActions",
      "Effect": "Allow",
      "Action": [
        "s3:GetObjectVersion",
        "s3:GetObjectRetention",
        "s3:GetObject",
        "s3:PutObject",
        "s3:ListMultipartUploadParts",
        "s3:AbortMultipartUpload"
      ],
      "Resource": "arn:aws:s3:::your-sessions-bucket/*"
    }
  ]
}
```
You'll need to replace these values in the policy example below:
Placeholder value | Replace with |
---|---|
your-sessions-bucket | Name to use for the Teleport S3 session recording bucket |
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "BucketActions",
      "Effect": "Allow",
      "Action": [
        "s3:PutEncryptionConfiguration",
        "s3:PutBucketVersioning",
        "s3:ListBucketVersions",
        "s3:ListBucketMultipartUploads",
        "s3:ListBucket",
        "s3:GetEncryptionConfiguration",
        "s3:GetBucketVersioning",
        "s3:CreateBucket"
      ],
      "Resource": "arn:aws:s3:::your-sessions-bucket"
    },
    {
      "Sid": "ObjectActions",
      "Effect": "Allow",
      "Action": [
        "s3:GetObjectVersion",
        "s3:GetObjectRetention",
        "s3:*Object",
        "s3:ListMultipartUploadParts",
        "s3:AbortMultipartUpload"
      ],
      "Resource": "arn:aws:s3:::your-sessions-bucket/*"
    }
  ]
}
```
S3 Server Side Encryption
Teleport supports using a custom AWS KMS Customer Managed Key for encrypting objects uploaded to S3. This allows you to restrict who can read objects like session recordings separately from those that have read access to a bucket by restricting key access.
The `sse_kms_key` parameter above can be set to any valid KMS CMK ID corresponding to a symmetric, standard-spec KMS key.
Example template KMS key policies are provided below for common use cases. IAM users do not have access to any key by default; permissions have to be explicitly granted in the policy.
Encryption/Decryption
This policy allows an IAM user to encrypt and decrypt objects, which allows a cluster's Auth Service instances to write and play back session recordings.

Replace `[iam-key-admin-arn]` with the IAM ARN of the user(s) that should have administrative key access and `[auth-node-iam-arn]` with the IAM ARN of the user the Teleport Auth Service instances are using.
```json
{
  "Id": "Teleport Encryption and Decryption",
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Teleport CMK Admin",
      "Effect": "Allow",
      "Principal": {
        "AWS": "[iam-key-admin-arn]"
      },
      "Action": "kms:*",
      "Resource": "*"
    },
    {
      "Sid": "Teleport CMK Auth",
      "Effect": "Allow",
      "Principal": {
        "AWS": "[auth-node-iam-arn]"
      },
      "Action": [
        "kms:Encrypt",
        "kms:Decrypt",
        "kms:ReEncrypt*",
        "kms:GenerateDataKey*",
        "kms:DescribeKey"
      ],
      "Resource": "*"
    }
  ]
}
```
Encryption/Decryption with separate clusters
This policy allows specifying separate IAM users for encryption and decryption. This can be used to set up a multi-cluster configuration where the main cluster cannot play back session recordings but can only write them. A separate cluster authenticating as a different IAM user with decryption access can be used for playing back the session recordings.

Replace `[iam-key-admin-arn]` with the IAM ARN of the user(s) that should have administrative key access, `[auth-node-write-arn]` with the IAM ARN of the user the main write-only cluster's Auth Service instances are using, and `[auth-node-read-arn]` with the IAM ARN of the user used by the read-only cluster.
For this to work, the second cluster has to be connected to the same audit log as the main cluster, which is needed to discover session recordings.
```json
{
  "Id": "Teleport Separate Encryption and Decryption",
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Teleport CMK Admin",
      "Effect": "Allow",
      "Principal": {
        "AWS": "[iam-key-admin-arn]"
      },
      "Action": "kms:*",
      "Resource": "*"
    },
    {
      "Sid": "Teleport CMK Auth Encrypt",
      "Effect": "Allow",
      "Principal": {
        "AWS": "[auth-node-write-arn]"
      },
      "Action": [
        "kms:Encrypt",
        "kms:ReEncrypt*",
        "kms:GenerateDataKey*",
        "kms:DescribeKey"
      ],
      "Resource": "*"
    },
    {
      "Sid": "Teleport CMK Auth Decrypt",
      "Effect": "Allow",
      "Principal": {
        "AWS": "[auth-node-read-arn]"
      },
      "Action": [
        "kms:Decrypt",
        "kms:DescribeKey"
      ],
      "Resource": "*"
    }
  ]
}
```
ACL example: transferring object ownership
If you are uploading from AWS account `A` to a bucket owned by AWS account `B` and want `B` to take ownership of the objects, you can take one of two approaches.
Without ACLs
If ACLs are disabled, object ownership will be set to `Bucket owner enforced` and no action will be needed.
With ACLs
- Set object ownership to `Bucket owner preferred` (under Permissions in the management console).
- Add `acl=bucket-owner-full-control` to `audit_sessions_uri`.

To enforce the ownership transfer, set `B`'s bucket policy to only allow uploads that include the `bucket-owner-full-control` canned ACL.
```json
{
  "Version": "2012-10-17",
  "Id": "[id]",
  "Statement": [
    {
      "Sid": "[sid]",
      "Effect": "Allow",
      "Principal": {
        "AWS": "[ARN of account A]"
      },
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::BucketName/*",
      "Condition": {
        "StringEquals": {
          "s3:x-amz-acl": "bucket-owner-full-control"
        }
      }
    }
  ]
}
```
For more information, see the AWS Documentation.
DynamoDB
If you are running Teleport on AWS, you can use DynamoDB as a storage backend to achieve High Availability. The DynamoDB backend supports two types of Teleport data:
- Cluster state
- Audit log events
Teleport uses DynamoDB and DynamoDB Streams endpoints for its storage backend management.
DynamoDB cannot store recorded sessions. You are advised to use AWS S3 for that, as shown above.
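As a sketch of how the pieces fit together (the table and bucket names are placeholders, and key names should be checked against the Teleport configuration reference), a DynamoDB-backed `storage` section might look like:

```yaml
teleport:
  storage:
    type: dynamodb
    region: us-east-1
    # DynamoDB table for cluster state; the Auth Service creates it
    # if it does not exist and its IAM permissions allow.
    table_name: teleport_state
    # Audit events go to a separate DynamoDB table.
    audit_events_uri: dynamodb://teleport_events
    # Session recordings cannot live in DynamoDB; use S3.
    audit_sessions_uri: s3://Example_TELEPORT_S3_BUCKET/records
```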
Authenticating to AWS
The Teleport Auth Service must be able to read AWS credentials in order to authenticate to DynamoDB.
Grant the Teleport Auth Service access to credentials that it can use to authenticate to AWS. If you are running the Teleport Auth Service on an EC2 instance, you may use the EC2 Instance Metadata Service method. Otherwise, you must use environment variables:
Teleport will detect when it is running on an EC2 instance and use the Instance Metadata Service to fetch credentials.
The EC2 instance should be configured to use an EC2 instance profile. For more information, see: Using Instance Profiles.
Teleport's built-in AWS client reads credentials from the following environment variables:
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_DEFAULT_REGION
When you start the Teleport Auth Service, the service reads environment variables from a file at the path `/etc/default/teleport`. Obtain these credentials from your organization. Ensure that `/etc/default/teleport` has the following content, replacing the values of each variable:

```conf
AWS_ACCESS_KEY_ID=00000000000000000000
AWS_SECRET_ACCESS_KEY=0000000000000000000000000000000000000000
AWS_DEFAULT_REGION=<YOUR_REGION>
```
Teleport's AWS client loads credentials from different sources in the following order:
- Environment Variables
- Shared credentials file
- Shared configuration file (Teleport always enables shared configuration)
- EC2 Instance Metadata (credentials only)
While you can provide AWS credentials via a shared credentials file or shared
configuration file, you will need to run the Teleport Auth Service with the AWS_PROFILE
environment variable assigned to the name of your profile of choice.
If you have a specific use case that the instructions above do not account for, consult the documentation for the AWS SDK for Go for a detailed description of credential loading behavior.
The IAM role that the Teleport Auth Service authenticates as must have the policies specified in the next section.
IAM policies
Make sure that the IAM role assigned to Teleport is configured with sufficient access to DynamoDB.
On startup, the Teleport Auth Service checks whether the DynamoDB table you have specified in its configuration file exists. If the table does not exist, the Auth Service attempts to create one.
The IAM permissions that the Auth Service requires to manage DynamoDB tables depend on whether you expect to create the tables yourself or enable the Auth Service to create and configure them for you:
If you choose to manage DynamoDB tables yourself, you must take the following steps, which we will explain in more detail below:
- Create a cluster state table.
- Create an audit event table.
- Create an IAM policy and attach it to the Teleport Auth Service's IAM identity.
Create a cluster state table
The cluster state table must have the following attribute definitions:
Name | Type |
---|---|
HashKey | S |
FullPath | S |
The table must also have the following key schema elements:
Name | Type |
---|---|
HashKey | HASH |
FullPath | RANGE |
Create an audit event table
The audit event table must have the following attribute definitions:
Name | Type |
---|---|
SessionID | S |
EventIndex | N |
CreatedAtDate | S |
CreatedAt | N |
The table must also have the following key schema elements:
Name | Type |
---|---|
CreatedAtDate | HASH |
CreatedAt | RANGE |
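If you manage the tables yourself, the definitions above can be sketched in Terraform. This is a hedged example: the table names mirror the IAM policy placeholders below, and the billing mode and secondary index name are assumptions; the index exists so that every declared attribute participates in a key schema, as DynamoDB requires.

```hcl
# Sketch: self-managed DynamoDB tables matching the attribute definitions
# and key schemas described above. Adjust names to your environment.
resource "aws_dynamodb_table" "teleport_backend" {
  name         = "teleport-helm-backend"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "HashKey"
  range_key    = "FullPath"

  attribute {
    name = "HashKey"
    type = "S"
  }
  attribute {
    name = "FullPath"
    type = "S"
  }

  # Optional: continuous backups for point-in-time recovery.
  point_in_time_recovery {
    enabled = true
  }
}

resource "aws_dynamodb_table" "teleport_events" {
  name         = "teleport-helm-events"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "CreatedAtDate"
  range_key    = "CreatedAt"

  attribute {
    name = "CreatedAtDate"
    type = "S"
  }
  attribute {
    name = "CreatedAt"
    type = "N"
  }
  attribute {
    name = "SessionID"
    type = "S"
  }
  attribute {
    name = "EventIndex"
    type = "N"
  }

  # DynamoDB requires every declared attribute to appear in a key schema,
  # so SessionID and EventIndex back a secondary index. The index name
  # here is a placeholder assumption.
  global_secondary_index {
    name            = "session-events"
    hash_key        = "SessionID"
    range_key       = "EventIndex"
    projection_type = "ALL"
  }
}
```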
Create and attach an IAM policy
Create the following IAM policy and attach it to the Teleport Auth Service's IAM identity.
You'll need to replace these values in the policy example below:
Placeholder value | Replace with |
---|---|
us-west-2 | AWS region |
1234567890 | AWS account ID |
teleport-helm-backend | DynamoDB table name to use for the Teleport backend |
teleport-helm-events | DynamoDB table name to use for the Teleport audit log (must be different from the backend table) |
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "ClusterStateStorage",
"Effect": "Allow",
"Action": [
"dynamodb:BatchWriteItem",
"dynamodb:UpdateTimeToLive",
"dynamodb:PutItem",
"dynamodb:DeleteItem",
"dynamodb:Scan",
"dynamodb:Query",
"dynamodb:DescribeStream",
"dynamodb:UpdateItem",
"dynamodb:DescribeTimeToLive",
"dynamodb:DescribeTable",
"dynamodb:GetShardIterator",
"dynamodb:GetItem",
"dynamodb:ConditionCheckItem",
"dynamodb:UpdateTable",
"dynamodb:GetRecords",
"dynamodb:UpdateContinuousBackups"
],
"Resource": [
"arn:aws:dynamodb:us-west-2:1234567890:table/teleport-helm-backend",
"arn:aws:dynamodb:us-west-2:1234567890:table/teleport-helm-backend/stream/*"
]
},
{
"Sid": "ClusterEventsStorage",
"Effect": "Allow",
"Action": [
"dynamodb:BatchWriteItem",
"dynamodb:UpdateTimeToLive",
"dynamodb:PutItem",
"dynamodb:DescribeTable",
"dynamodb:DeleteItem",
"dynamodb:GetItem",
"dynamodb:Scan",
"dynamodb:Query",
"dynamodb:UpdateItem",
"dynamodb:DescribeTimeToLive",
"dynamodb:UpdateTable",
"dynamodb:UpdateContinuousBackups"
],
"Resource": [
"arn:aws:dynamodb:us-west-2:1234567890:table/teleport-helm-events",
"arn:aws:dynamodb:us-west-2:1234567890:table/teleport-helm-events/index/*"
]
}
]
}
Note that you can omit the `dynamodb:UpdateContinuousBackups` permission if you disable continuous backups.
If you instead enable the Auth Service to create and configure DynamoDB tables for you, attach the following policy, which additionally includes the `dynamodb:CreateTable` permission. You'll need to replace these values in the policy example below:
Placeholder value | Replace with |
---|---|
us-west-2 | AWS region |
1234567890 | AWS account ID |
teleport-helm-backend | DynamoDB table name to use for the Teleport backend |
teleport-helm-events | DynamoDB table name to use for the Teleport audit log (must be different from the backend table) |
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "ClusterStateStorage",
"Effect": "Allow",
"Action": [
"dynamodb:BatchWriteItem",
"dynamodb:UpdateTimeToLive",
"dynamodb:PutItem",
"dynamodb:DeleteItem",
"dynamodb:Scan",
"dynamodb:Query",
"dynamodb:DescribeStream",
"dynamodb:UpdateItem",
"dynamodb:DescribeTimeToLive",
"dynamodb:CreateTable",
"dynamodb:DescribeTable",
"dynamodb:GetShardIterator",
"dynamodb:GetItem",
"dynamodb:ConditionCheckItem",
"dynamodb:UpdateTable",
"dynamodb:GetRecords",
"dynamodb:UpdateContinuousBackups"
],
"Resource": [
"arn:aws:dynamodb:us-west-2:1234567890:table/teleport-helm-backend",
"arn:aws:dynamodb:us-west-2:1234567890:table/teleport-helm-backend/stream/*"
]
},
{
"Sid": "ClusterEventsStorage",
"Effect": "Allow",
"Action": [
"dynamodb:CreateTable",
"dynamodb:BatchWriteItem",
"dynamodb:UpdateTimeToLive",
"dynamodb:PutItem",
"dynamodb:DescribeTable",
"dynamodb:DeleteItem",
"dynamodb:GetItem",
"dynamodb:Scan",
"dynamodb:Query",
"dynamodb:UpdateItem",
"dynamodb:DescribeTimeToLive",
"dynamodb:UpdateTable",
"dynamodb:UpdateContinuousBackups"
],
"Resource": [
"arn:aws:dynamodb:us-west-2:1234567890:table/teleport-helm-events",
"arn:aws:dynamodb:us-west-2:1234567890:table/teleport-helm-events/index/*"
]
}
]
}
Configuring the DynamoDB backend
To configure Teleport to use DynamoDB:
- Configure all Teleport Auth Service instances to use the DynamoDB backend in the `storage` section of `teleport.yaml` as shown below.
- Make sure Auth Service instances can reach the DynamoDB and DynamoDB Streams endpoints.
- Deploy up to two Auth Service instances connected to the DynamoDB storage backend.
- Deploy several Proxy Service instances.
- Make sure that all Teleport resource services have the `auth_servers` configuration setting populated with the addresses of your cluster's Auth Service instances.
AWS can throttle DynamoDB if more than two processes are reading from the same stream's shard simultaneously, so you must not deploy more than two Auth Service instances that read from a DynamoDB backend. For details on DynamoDB Streams, read the AWS documentation.
teleport:
storage:
type: dynamodb
# Region location of dynamodb instance, https://docs.aws.amazon.com/en_pv/general/latest/gr/rande.html#ddb_region
region: us-east-1
# Name of the DynamoDB table. If it does not exist, Teleport will create it.
table_name: Example_TELEPORT_DYNAMO_TABLE_NAME
# This setting configures Teleport to send the audit events to three places:
# To keep a copy in DynamoDB, a copy on a local filesystem, and also output the events to stdout.
# NOTE: The DynamoDB events table has a different schema to the regular Teleport
# database table, so attempting to use the same table for both will result in errors.
# When using highly available storage like DynamoDB, you should make sure that the list always specifies
# the High Availability storage method first, as this is what the Teleport web UI uses as its source of events to display.
audit_events_uri: ['dynamodb://events_table_name', 'file:///var/lib/teleport/audit/events', 'stdout://']
# This setting configures Teleport to save the recorded sessions in an S3 bucket:
audit_sessions_uri: s3://Example_TELEPORT_S3_BUCKET/records
# By default, Teleport stores audit events with an AWS TTL of 1 year.
# This value can be configured as shown below. If set to 0 seconds, TTL is disabled.
retention_period: 365d
# Enables either Pay Per Request or Provisioned billing for the DynamoDB table. Set when Teleport creates the table.
# Possible values: "pay_per_request" and "provisioned"
# default: "pay_per_request"
billing_mode: "pay_per_request"
# continuous_backups is used to optionally enable continuous backups.
# default: false
continuous_backups: true
- Replace `us-east-1` and `Example_TELEPORT_DYNAMO_TABLE_NAME` with your own settings. Teleport will create the table automatically.
- `Example_TELEPORT_DYNAMO_TABLE_NAME` and `events_table_name` must be different DynamoDB tables. The schema is different for each, and using the same table name for both will result in errors.
- The audit log settings above are optional. If specified, Teleport will store the audit log in DynamoDB, and session recordings must be stored in an S3 bucket, i.e. both `audit_xxx` settings must be present. If they are not set, Teleport defaults to a local file system for the audit log, i.e. `/var/lib/teleport/log` on an Auth Service instance.
The optional query parameters shown below control how Teleport interacts with a DynamoDB endpoint.
dynamodb://events_table_name?region=us-east-1&endpoint=dynamo.example.com&use_fips_endpoint=true
- `region=us-east-1` - set the AWS region to use.
- `endpoint=dynamo.example.com` - connect to a custom DynamoDB endpoint.
- `use_fips_endpoint=true` - use the DynamoDB FIPS endpoints.
DynamoDB Continuous Backups
When setting up DynamoDB it's important to enable backups so that cluster state can be restored if needed from a snapshot in the past.
DynamoDB On-Demand
For best performance, we recommend using On-Demand mode instead of configuring capacity manually via Provisioned mode. This prevents DynamoDB throttling, caused by underestimated or unexpectedly increased usage, from impacting Teleport.
Configuring AWS FIPS endpoints
This config option applies to Amazon S3 and Amazon DynamoDB.
Set `use_fips_endpoint` to `true` or `false`. If `true`, FIPS DynamoDB endpoints will be used. If `false`, standard DynamoDB endpoints will be used. If unset, the AWS environment variable `AWS_USE_FIPS_ENDPOINT` determines which endpoint is used. FIPS endpoints will also be used if Teleport is run with the `--fips` flag.
Config option priority is applied in the following order:
- Setting the `use_fips_endpoint` query parameter as shown above
- Using the `--fips` flag when running Teleport
- Setting the `AWS_USE_FIPS_ENDPOINT` environment variable
Setting this environment variable to true enables FIPS endpoints for all AWS resource types. Some FIPS endpoints are not supported in certain regions or environments, or are only supported in GovCloud.
Athena
The Athena audit log backend is available starting from Teleport v14.0.
If you are running Teleport on AWS, you can use the Athena audit log backend, which stores audit events as Parquet files on S3 and queries them with Athena, to achieve high availability. The Athena backend supports only one type of Teleport data: audit events.
The Athena audit backend scales better and offers better search performance than DynamoDB.
The Athena audit logs are eventually consistent. It may take up to one minute
(depending on the batchMaxInterval
setting and event load) until you can view
events in the Teleport Web UI.
Infrastructure setup
The Auth Service uses an SQS queue subscribed to an SNS topic for event publishing. A single Auth Service instance reads events in batches from SQS, converts them into Parquet format, and sends the resulting data to S3. During queries, the Athena engine searches for events on S3, reading metadata from a Glue table.
You can set up the required infrastructure to support the Athena backend with the following Terraform script:
variable "aws_region" {
description = "AWS region"
default = "us-west-2"
}
variable "sns_topic_name" {
description = "Name of the SNS topic used for publishing audit events"
}
variable "sqs_queue_name" {
description = "Name of the SQS queue used for subscription for audit events topic"
}
variable "sqs_dlq_name" {
description = "Name of the SQS Dead-Letter Queue used for handling unprocessable events"
}
variable "max_receive_count" {
description = "Number of times a message can be received before it is sent to the DLQ"
default = 10
}
variable "kms_key_alias" {
description = "The alias of a custom KMS key"
}
variable "long_term_bucket_name" {
description = "Name of the long term storage bucket used for storing audit events"
}
variable "transient_bucket_name" {
description = "Name of the transient storage bucket used for storing query results and large events payloads"
}
variable "database_name" {
description = "Name of Glue database"
}
variable "table_name" {
description = "Name of Glue table"
}
variable "workgroup" {
description = "Name of Athena workgroup"
}
variable "workgroup_max_scanned_bytes_per_query" {
description = "Limit per query of max scanned bytes"
default = 1073741824 # 1GB
}
# The search_event_limiter variables configure a rate limit on top of the
# search events API to prevent increased costs in case of aggressive API use.
# In the current version, the Athena audit logger is not prepared for polling of the API.
# Burst=20, time=1m and amount=5 means that you can make 20 requests without any
# throttling; subsequent requests will be throttled, and tokens will be added to
# the rate limit bucket at a rate of 5 every 1m.
variable "search_event_limiter_burst" {
description = "Number of tokens available for rate limit used on top of search event API"
default = 20
}
variable "search_event_limiter_time" {
description = "Duration between the addition of tokens to the bucket for rate limit used on top of search event API"
default = "1m"
}
variable "search_event_limiter_amount" {
description = "Number of tokens added to the bucket during specific interval for rate limit used on top of search event API"
default = 5
}
variable "access_monitoring_trusted_relationship_role_arn" {
description = "AWS Role ARN that will be used to configure trusted relationship between provided role and Access Monitoring role allowing to assume Access Monitoring role by the provided role"
default = ""
}
variable "access_monitoring" {
description = "Enabled Access Monitoring"
type = bool
default = false
}
variable "access_monitoring_prefix" {
description = "Prefix for resources created by Access Monitoring"
default = ""
}
provider "aws" {
region = var.aws_region
}
data "aws_caller_identity" "current" {}
resource "aws_kms_key" "audit_key" {
description = "KMS key for Athena audit log"
enable_key_rotation = true
}
resource "aws_kms_key_policy" "audit_key_policy" {
key_id = aws_kms_key.audit_key.id
policy = jsonencode({
Statement = [
{
Action = [
"kms:*"
]
Effect = "Allow"
Principal = {
AWS = data.aws_caller_identity.current.account_id
}
Resource = "*"
Sid = "Default Policy"
},
{
Action = [
"kms:GenerateDataKey",
"kms:Decrypt"
]
Effect = "Allow"
Principal = {
Service = "sns.amazonaws.com"
}
Resource = "*"
Sid = "SnsUsage"
Condition = {
StringEquals = {
"aws:SourceAccount" = data.aws_caller_identity.current.account_id
}
ArnLike = {
"aws:SourceArn" : aws_sns_topic.audit_topic.arn
}
}
},
]
Version = "2012-10-17"
})
}
resource "aws_kms_alias" "audit_key_alias" {
name = "alias/${var.kms_key_alias}"
target_key_id = aws_kms_key.audit_key.key_id
}
resource "aws_sns_topic" "audit_topic" {
name = var.sns_topic_name
kms_master_key_id = aws_kms_key.audit_key.arn
}
resource "aws_sqs_queue" "audit_queue_dlq" {
name = var.sqs_dlq_name
kms_master_key_id = aws_kms_key.audit_key.arn
kms_data_key_reuse_period_seconds = 300
message_retention_seconds = 604800 // 7 days, which is three days longer than the SQS default of 4 days
}
resource "aws_sqs_queue" "audit_queue" {
name = var.sqs_queue_name
kms_master_key_id = aws_kms_key.audit_key.arn
kms_data_key_reuse_period_seconds = 300
redrive_policy = jsonencode({
deadLetterTargetArn = aws_sqs_queue.audit_queue_dlq.arn
maxReceiveCount = var.max_receive_count
})
}
resource "aws_sns_topic_subscription" "audit_sqs_target" {
topic_arn = aws_sns_topic.audit_topic.arn
protocol = "sqs"
endpoint = aws_sqs_queue.audit_queue.arn
raw_message_delivery = true
}
data "aws_iam_policy_document" "audit_policy" {
statement {
actions = [
"SQS:SendMessage",
]
effect = "Allow"
principals {
type = "Service"
identifiers = ["sns.amazonaws.com"]
}
resources = [aws_sqs_queue.audit_queue.arn]
condition {
test = "ArnEquals"
variable = "aws:SourceArn"
values = [aws_sns_topic.audit_topic.arn]
}
}
}
resource "aws_sqs_queue_policy" "audit_policy" {
queue_url = aws_sqs_queue.audit_queue.url
policy = data.aws_iam_policy_document.audit_policy.json
}
resource "aws_s3_bucket" "long_term_storage" {
bucket = var.long_term_bucket_name
force_destroy = true
# In production, we recommend enabling object lock to provide deletion protection.
object_lock_enabled = false
}
resource "aws_s3_bucket_server_side_encryption_configuration" "long_term_storage" {
bucket = aws_s3_bucket.long_term_storage.id
rule {
apply_server_side_encryption_by_default {
kms_master_key_id = aws_kms_key.audit_key.arn
sse_algorithm = "aws:kms"
}
bucket_key_enabled = true
}
}
resource "aws_s3_bucket_ownership_controls" "long_term_storage" {
bucket = aws_s3_bucket.long_term_storage.id
rule {
object_ownership = "BucketOwnerEnforced"
}
}
resource "aws_s3_bucket_versioning" "long_term_storage" {
bucket = aws_s3_bucket.long_term_storage.id
versioning_configuration {
status = "Enabled"
}
}
resource "aws_s3_bucket_public_access_block" "long_term_storage" {
bucket = aws_s3_bucket.long_term_storage.id
block_public_acls = true
block_public_policy = true
ignore_public_acls = true
restrict_public_buckets = true
}
resource "aws_s3_bucket" "transient_storage" {
bucket = var.transient_bucket_name
force_destroy = true
# In production, we recommend enabling a lifecycle configuration to clean up transient data.
}
resource "aws_s3_bucket_server_side_encryption_configuration" "transient_storage" {
bucket = aws_s3_bucket.transient_storage.id
rule {
apply_server_side_encryption_by_default {
kms_master_key_id = aws_kms_key.audit_key.arn
sse_algorithm = "aws:kms"
}
bucket_key_enabled = true
}
}
resource "aws_s3_bucket_ownership_controls" "transient_storage" {
bucket = aws_s3_bucket.transient_storage.id
rule {
object_ownership = "BucketOwnerEnforced"
}
}
resource "aws_s3_bucket_versioning" "transient_storage" {
bucket = aws_s3_bucket.transient_storage.id
versioning_configuration {
status = "Enabled"
}
}
resource "aws_s3_bucket_public_access_block" "transient_storage" {
bucket = aws_s3_bucket.transient_storage.id
block_public_acls = true
block_public_policy = true
ignore_public_acls = true
restrict_public_buckets = true
}
resource "aws_glue_catalog_database" "audit_db" {
name = var.database_name
}
resource "aws_glue_catalog_table" "audit_table" {
name = var.table_name
database_name = aws_glue_catalog_database.audit_db.name
table_type = "EXTERNAL_TABLE"
parameters = {
"EXTERNAL" = "TRUE",
"projection.enabled" = "true",
"projection.event_date.type" = "date",
"projection.event_date.format" = "yyyy-MM-dd",
"projection.event_date.interval" = "1",
"projection.event_date.interval.unit" = "DAYS",
"projection.event_date.range" = "NOW-4YEARS,NOW",
"storage.location.template" = format("s3://%s/events/$${event_date}/", aws_s3_bucket.long_term_storage.bucket)
"classification" = "parquet"
"parquet.compression" = "SNAPPY",
}
storage_descriptor {
location = format("s3://%s", aws_s3_bucket.long_term_storage.bucket)
input_format = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat"
output_format = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat"
ser_de_info {
name = "example"
parameters = { "serialization.format" = "1" }
serialization_library = "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
}
columns {
name = "uid"
type = "string"
}
columns {
name = "session_id"
type = "string"
}
columns {
name = "event_type"
type = "string"
}
columns {
name = "event_time"
type = "timestamp"
}
columns {
name = "event_data"
type = "string"
}
columns {
name = "user"
type = "string"
}
}
partition_keys {
name = "event_date"
type = "date"
}
}
resource "aws_athena_workgroup" "workgroup" {
name = var.workgroup
force_destroy = true
configuration {
bytes_scanned_cutoff_per_query = var.workgroup_max_scanned_bytes_per_query
engine_version {
selected_engine_version = "Athena engine version 3"
}
result_configuration {
output_location = format("s3://%s/results", aws_s3_bucket.transient_storage.bucket)
encryption_configuration {
encryption_option = "SSE_KMS"
kms_key_arn = aws_kms_key.audit_key.arn
}
}
}
}
output "athena_url" {
value = format("athena://%s.%s?%s",
aws_glue_catalog_database.audit_db.name,
aws_glue_catalog_table.audit_table.name,
join("&", [
format("topicArn=%s", aws_sns_topic.audit_topic.arn),
format("largeEventsS3=s3://%s/large_payloads", aws_s3_bucket.transient_storage.bucket),
format("locationS3=s3://%s/events", aws_s3_bucket.long_term_storage.bucket),
format("workgroup=%s", aws_athena_workgroup.workgroup.name),
format("queueURL=%s", aws_sqs_queue.audit_queue.url),
format("queryResultsS3=s3://%s/query_results", aws_s3_bucket.transient_storage.bucket),
format("limiterBurst=%d", var.search_event_limiter_burst),
format("limiterRefillAmount=%s", var.search_event_limiter_amount),
format("limiterRefillTime=%s", var.search_event_limiter_time),
])
)
}
Configuring the Athena audit log backend
To configure Teleport to use Athena:
- Make sure you are using Teleport version 14.0.0 or newer.
- Prepare the required infrastructure as described above.
- Specify an Athena URL inside the `audit_events_uri` array in your Teleport configuration file:
teleport:
storage:
# This setting configures Teleport to keep a copy of the audit log in Athena
# and a copy on a local filesystem, and also to output the events to stdout.
audit_events_uri:
# More details about the full Athena URL are shown below.
- 'athena://database.table?params'
- 'file:///var/lib/teleport/audit/events'
- 'stdout://'
Here is an example of an Amazon Athena URL within the `audit_events_uri` configuration field:
athena://db.table?topicArn=arn:aws:sns:region:account_id:topic_name&largeEventsS3=s3://transient/large_payloads&locationS3=s3://long-term/events&workgroup=workgroup&queueURL=https://sqs.region.amazonaws.com/account_id/queue_name&queryResultsS3=s3://transient/query_results
The URL hostname consists of `database.table`, which points to the Glue database and table used by the Athena audit logger.
Other parameters are specified as query parameters within the Athena URL.
The following parameters are required:
Parameter name | Example value | Description |
---|---|---|
topicArn | arn:aws:sns:region:account_id:topic_name | ARN of SNS topic where events are published |
locationS3 | s3://long-term/events | S3 bucket used for long-term storage |
largeEventsS3 | s3://transient/large_payloads | S3 bucket used for transient storage for large events |
queueURL | https://sqs.region.amazonaws.com/account_id/queue_name | SQS URL used for a subscription to an SNS topic |
workgroup | workgroup_name | Athena workgroup used for queries |
queryResultsS3 | s3://transient/results | S3 bucket used for transient storage for query results |
The following parameters are optional:
Parameter name | Example value | Description |
---|---|---|
region | us-east-1 | AWS region. If empty, defaults to one from the AuditConfig or ambient AWS credentials |
batchMaxItems | 20000 | defines the maximum number of events allowed for a single Parquet file (default 20000) |
batchMaxInterval | 1m | defines the maximum interval used to buffer incoming data before creating a Parquet file (default 1m) |
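For example, the optional parameters can be appended to the same Athena URL to tune batching latency. The values below are illustrative placeholders based on the tables above:

```yaml
teleport:
  storage:
    audit_events_uri:
      # batchMaxInterval=30s flushes Parquet files twice as often as the
      # default 1m, lowering the delay before events become searchable.
      - 'athena://audit_db.audit_table?topicArn=arn:aws:sns:us-east-1:1234567890:audit-sns&locationS3=s3://long-term/events&largeEventsS3=s3://transient/large_payloads&queueURL=https://sqs.us-east-1.amazonaws.com/1234567890/audit-sqs&workgroup=audit_workgroup&queryResultsS3=s3://transient/query_results&region=us-east-1&batchMaxItems=20000&batchMaxInterval=30s'
```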
Authenticating to AWS
The Teleport Auth Service must be able to read AWS credentials in order to authenticate to Athena.
Grant the Teleport Auth Service access to credentials that it can use to authenticate to AWS. If you are running the Teleport Auth Service on an EC2 instance, you may use the EC2 Instance Metadata Service method. Otherwise, you must use environment variables:
Teleport will detect when it is running on an EC2 instance and use the Instance Metadata Service to fetch credentials.
The EC2 instance should be configured to use an EC2 instance profile. For more information, see: Using Instance Profiles.
Teleport's built-in AWS client reads credentials from the following environment variables:
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_DEFAULT_REGION
When you start the Teleport Auth Service, the service reads environment variables from a file at the path `/etc/default/teleport`. Obtain these credentials from your organization. Ensure that `/etc/default/teleport` has the following content, replacing the values of each variable:
AWS_ACCESS_KEY_ID=00000000000000000000
AWS_SECRET_ACCESS_KEY=0000000000000000000000000000000000000000
AWS_DEFAULT_REGION=<YOUR_REGION>
Teleport's AWS client loads credentials from different sources in the following order:
- Environment Variables
- Shared credentials file
- Shared configuration file (Teleport always enables shared configuration)
- EC2 Instance Metadata (credentials only)
While you can provide AWS credentials via a shared credentials file or shared
configuration file, you will need to run the Teleport Auth Service with the AWS_PROFILE
environment variable assigned to the name of your profile of choice.
If you have a specific use case that the instructions above do not account for, consult the documentation for the AWS SDK for Go for a detailed description of credential loading behavior.
The IAM role that the Teleport Auth Service authenticates as must have the policies specified in the next section.
IAM policies
Make sure that the IAM role assigned to Teleport is configured with sufficient access to Athena. Below you can find the IAM permissions that the Auth Service requires to use Athena Audit logs as an audit event backend.
You'll need to replace these values in the policy example below:
Placeholder value | Replace with |
---|---|
eu-central-1 | AWS region |
1234567890 | AWS account ID |
audit-long-term | S3 bucket used for long-term storage |
audit-transient | S3 bucket used for transient storage |
audit-sqs | SQS queue name |
audit-sns | SNS topic name |
kms_id | KMS key ID used for server-side encryption of SNS/SQS/S3 |
audit_db | Glue database used for audit logs |
audit_table | Glue table used for audit logs |
audit_workgroup | Athena workgroup used for audit logs |
{
"Version": "2012-10-17",
"Statement": [
{
"Action": [
"s3:ListBucketMultipartUploads",
"s3:GetBucketLocation",
"s3:ListBucketVersions",
"s3:ListBucket"
],
"Effect": "Allow",
"Resource": [
"arn:aws:s3:::audit-transient",
"arn:aws:s3:::audit-long-term"
],
"Sid": "AllowListingMultipartUploads"
},
{
"Action": [
"s3:PutObject",
"s3:ListMultipartUploadParts",
"s3:GetObjectVersion",
"s3:GetObject",
"s3:DeleteObjectVersion",
"s3:DeleteObject",
"s3:AbortMultipartUpload"
],
"Effect": "Allow",
"Resource": [
"arn:aws:s3:::audit-transient/results/*",
"arn:aws:s3:::audit-transient/large_payloads/*",
"arn:aws:s3:::audit-long-term/events/*"
],
"Sid": "AllowMultipartAndObjectAccess"
},
{
"Action": "sns:Publish",
"Effect": "Allow",
"Resource": "arn:aws:sns:eu-central-1:1234567890:audit-sns",
"Sid": "AllowPublishSNS"
},
{
"Action": [
"sqs:ReceiveMessage",
"sqs:DeleteMessage"
],
"Effect": "Allow",
"Resource": "arn:aws:sqs:eu-central-1:1234567890:audit-sqs",
"Sid": "AllowReceiveSQS"
},
{
"Action": [
"glue:GetTable",
"athena:StartQueryExecution",
"athena:GetQueryResults",
"athena:GetQueryExecution"
],
"Effect": "Allow",
"Resource": [
"arn:aws:glue:eu-central-1:1234567890:table/audit_db/audit_table",
"arn:aws:glue:eu-central-1:1234567890:database/audit_db",
"arn:aws:glue:eu-central-1:1234567890:catalog",
"arn:aws:athena:eu-central-1:1234567890:workgroup/audit_workgroup"
],
"Sid": "AllowAthenaQuery"
},
{
"Action": [
"kms:GenerateDataKey",
"kms:Decrypt"
],
"Effect": "Allow",
"Resource": "arn:aws:kms:eu-central-1:1234567890:key/kms_id",
"Sid": "AllowAthenaKMSUsage"
}
]
}
Migration from Dynamo to the Athena audit logs backend
Migration is only needed if you used Amazon DynamoDB for audit logs and you want to keep old data.
Migration consists of the following steps:
- Set up Athena infrastructure
- Dual write to both DynamoDB and Athena, and query from DynamoDB
- Migrate old data from DynamoDB to Athena
- Dual write to both DynamoDB and Athena, and query from Athena
- Disable writing to DynamoDB
In the Teleport storage configuration, `audit_events_uri` accepts multiple URLs, which configure connections to different audit loggers. If more than one URL is provided, events are written to every audit system, and queries are executed against the first one.
If anything goes wrong during migration steps 1-4, roll back to the Amazon DynamoDB solution by making its URL the first value in the `audit_events_uri` field and removing the Athena URL.
Each of these steps is explained in more detail below.
Dual write to both DynamoDB and Athena, and query from DynamoDB
The second step of migration requires setting the following configuration:
teleport:
storage:
audit_events_uri:
- 'dynamodb://events_table_name'
- 'athena://db.table?otherQueryParams'
After an Auth Service instance restarts, verify that Parquet files are stored in the S3 bucket specified by the `locationS3` parameter.
Migrate old data from DynamoDB to Athena
This step requires a client machine to export data from Amazon DynamoDB and publish it to the Athena logger. We recommend using, for example, an EC2 instance with a disk at least twice the size of the Amazon DynamoDB table.
Instructions for how to use the migration tool can be found on GitHub.
You should set `exportTime` to the time when dual writing began.
We recommend running your first migration with the `-dry-run` flag because it validates the exported data. If no errors are reported, proceed to a real migration without the `-dry-run` flag.
Dual write to both DynamoDB and Athena, and query from Athena
Change the order of the audit_events_uri
values in your Teleport
configuration file:
teleport:
storage:
audit_events_uri:
- 'athena://db.table?otherQueryParams'
- 'dynamodb://events_table_name'
When the Auth Service is restarted, you should verify that events are visible on the Audit Logs page.
Disable writing to DynamoDB
Disabling writing to DynamoDB means that you won't be able to roll back to DynamoDB without losing data. Dual writing to both Athena and DynamoDB does not have a significant performance impact, and it's recommended to keep dual writing for some time, even if your system already executes queries from Athena.
To disable writing to DynamoDB, remove the DynamoDB URL from the `audit_events_uri` array.
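After this final step, the storage configuration from the previous section reduces to the Athena URL alone:

```yaml
teleport:
  storage:
    audit_events_uri:
      # Only the Athena logger remains; DynamoDB is no longer written to.
      - 'athena://db.table?otherQueryParams'
```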
GCS
Google Cloud Storage (GCS) can be used as storage for recorded sessions. GCS cannot store the audit log or the cluster state. Below is an example of how to configure a Teleport Auth Service to store the recorded sessions in a GCS bucket.
teleport:
storage:
# Path to GCS to store the recorded sessions in.
audit_sessions_uri: 'gs://$BUCKET_NAME/records?projectID=$PROJECT_ID&credentialsPath=$CREDENTIALS_PATH'
We recommend creating a bucket in Dual-Region mode with the Standard storage class to ensure cluster performance and high availability.
Replace the following variables in the above example with your own values:
- `$BUCKET_NAME` with the name of the desired GCS bucket. If the bucket does not exist, it will be created. Ensure the following permissions are granted for the given bucket:
  - `storage.buckets.get`
  - `storage.objects.create`
  - `storage.objects.get`
  - `storage.objects.list`
  - `storage.objects.update`
  - `storage.objects.delete`

  `storage.objects.delete` is required in order to clean up multipart files after they have been assembled into the final blob. If the bucket does not exist, please also ensure that the `storage.buckets.create` permission is granted.
- `$PROJECT_ID` with a GCS-enabled GCP project.
- `$CREDENTIALS_PATH` with the path to a JSON-formatted GCP credentials file configured for a service account applicable to the project.
Firestore
If you are running Teleport on GCP, you can use Firestore as a storage backend to achieve high availability. The Firestore backend supports two types of Teleport data:
- Cluster state
- Audit log events
Firestore cannot store session recordings. You are advised to use Google Cloud Storage (GCS) for those, as shown above. To configure Teleport to use Firestore:
- Configure all Teleport Auth Service instances to use the Firestore backend in the `storage` section of `teleport.yaml` as shown below.
- Deploy several Auth Service instances connected to the Firestore storage backend.
- Deploy several Proxy Service instances.
- Make sure that all Teleport resource services have the `auth_servers` configuration setting populated with the addresses of your cluster's Auth Service instances, or use a load balancer for Auth Service instances in high availability mode.
```yaml
teleport:
  storage:
    type: firestore
    # Project ID https://support.google.com/googleapi/answer/7014113?hl=en
    project_id: Example_GCP_Project_Name
    # Name of the Firestore table.
    collection_name: Example_TELEPORT_FIRESTORE_TABLE_NAME
    # An optional database ID to use. If not provided, the default
    # database for the project is used.
    database_id: Example_TELEPORT_FIRESTORE_DATABASE_ID
    credentials_path: /var/lib/teleport/gcs_creds
    # This setting configures Teleport to send the audit events to three places:
    # a copy in Firestore, a copy on the local filesystem, and stdout.
    # NOTE: The Firestore events table has a different schema from the regular
    # Teleport database table, so attempting to use the same table for both
    # will result in errors.
    # When using highly available storage like Firestore, make sure that the list
    # always specifies the High Availability storage method first, as this is
    # what the Teleport web UI uses as its source of events to display.
    audit_events_uri: ['firestore://Example_TELEPORT_FIRESTORE_EVENTS_TABLE_NAME?projectID=$PROJECT_ID&credentialsPath=$CREDENTIALS_PATH&databaseID=$DATABASE_ID', 'file:///var/lib/teleport/audit/events', 'stdout://']
    # This setting configures Teleport to save the recorded sessions in GCS:
    audit_sessions_uri: gs://Example_TELEPORT_GCS_BUCKET/records
```
- Replace `Example_GCP_Project_Name` and `Example_TELEPORT_FIRESTORE_TABLE_NAME` with your own settings. Teleport will create the table automatically. `Example_TELEPORT_FIRESTORE_TABLE_NAME` and `Example_TELEPORT_FIRESTORE_EVENTS_TABLE_NAME` must be different Firestore tables; the schema is different for each, and using the same table name for both will result in errors.
- The GCP authentication setting above can be omitted if the machine itself is running on a GCE instance with a Service Account that has access to the Firestore table.
- The audit log settings above are optional. If specified, Teleport will store the audit log in Firestore and the session recordings must be stored in a GCS bucket, i.e. both `audit_xxx` settings must be present. If they are not set, Teleport will default to a local filesystem for the audit log, i.e. `/var/lib/teleport/log` on an Auth Service instance.
Azure Blob Storage
Azure Blob Storage for session storage is available starting from Teleport 13.3.
Azure Blob Storage can be used as storage for recorded sessions. Azure Blob Storage cannot store the audit log or the cluster state. Below is an example of how to configure a Teleport Auth Service instance to store the recorded sessions in an Azure Blob Storage storage account.
```yaml
teleport:
  storage:
    audit_sessions_uri: azblob://account-name.blob.core.windows.net
```
Teleport makes use of two containers in the account, whose names default to `inprogress` and `session`, but they can be configured with parameters in the fragment of the URI:
```yaml
teleport:
  storage:
    audit_sessions_uri: azblob://account-name.blob.core.windows.net#session_container=session_container_name&inprogress_container=inprogress_container_name
```
Permissions
Teleport needs the following permissions on the containers:

- `Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read`
- `Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write`
- `Microsoft.Storage/storageAccounts/blobServices/containers/blobs/delete` (only on the `inprogress` container)
In addition, Teleport will check if the containers exist at startup, and it will attempt to create them if they can't be confirmed to exist; granting Teleport `Microsoft.Storage/storageAccounts/blobServices/containers/read` will allow for checking, and `Microsoft.Storage/storageAccounts/blobServices/containers/write` will allow for creating them.
It's highly recommended to set up a time-based retention policy for the `session` container, as well as a lifecycle management policy, so that recordings are kept in an immutable state for a given period and then deleted. Teleport will not delete recordings automatically.
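A lifecycle management rule that deletes recordings after a retention window might look roughly like this (a sketch using Azure's management-policy JSON; the rule name, `session/` prefix, and 365-day window are assumptions to adapt to your setup):

```json
{
  "rules": [
    {
      "enabled": true,
      "name": "delete-old-session-recordings",
      "type": "Lifecycle",
      "definition": {
        "filters": {
          "blobTypes": ["blockBlob"],
          "prefixMatch": ["session/"]
        },
        "actions": {
          "baseBlob": {
            "delete": { "daysAfterModificationGreaterThan": 365 }
          }
        }
      }
    }
  ]
}
```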
With a time-based retention policy in place, it's safe to give Teleport the "Storage Blob Data Contributor" role scoped to the containers, instead of having to define a custom role for it.
Authentication
Teleport will make use of the Azure AD credentials specified by environment variables, Azure AD Workload Identity credentials, or managed identity credentials.
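For example, to authenticate with service principal credentials via environment variables, you can export the standard Azure SDK variables before starting Teleport (all values below are placeholders):

```shell
# Placeholders: substitute your own tenant, client ID, and secret.
export AZURE_TENANT_ID=00000000-0000-0000-0000-000000000000
export AZURE_CLIENT_ID=11111111-1111-1111-1111-111111111111
export AZURE_CLIENT_SECRET='example-client-secret'
```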
SQLite
The Auth Service uses the SQLite backend when no `type` is specified in the storage section of the Teleport configuration file, or when `type` is set to `sqlite` or `dir`. The SQLite backend is not designed for high throughput and is not capable of serving the needs of Teleport's High Availability configurations. If you are planning to use SQLite as your backend, scale your cluster slowly and monitor the number of warning messages in the Auth Service's logs that say `SLOW TRANSACTION`, as these are a sign that the cluster has outgrown the capabilities of the SQLite backend.
As a stopgap measure until you can migrate the cluster to an HA-capable backend, you can configure the SQLite backend to reduce the amount of disk synchronization, in exchange for less resilience against system crashes or power loss. For an explanation of what these options mean, see the official SQLite docs. No matter the configuration, we recommend you take regular backups of your cluster state.
To reduce disk synchronization:
```yaml
teleport:
  storage:
    type: sqlite
    sync: NORMAL
```
To disable disk synchronization altogether:
```yaml
teleport:
  storage:
    type: sqlite
    sync: "OFF"
```
When running on a filesystem that supports file locks (i.e. a local filesystem, not a networked one) it's possible to also configure the SQLite database to use Write-Ahead Logging (see the official docs on WAL mode) for significantly improved performance without sacrificing reliability:
```yaml
teleport:
  storage:
    type: sqlite
    sync: NORMAL
    journal: WAL
```
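These settings correspond to standard SQLite pragmas. A minimal Python sketch, independent of Teleport, showing the equivalent `journal_mode` and `synchronous` pragmas on a throwaway database:

```python
import os
import sqlite3
import tempfile

# Use a throwaway database on a local filesystem; WAL mode relies on
# file locking, which networked filesystems may not support.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = sqlite3.connect(path)

# Equivalent of `journal: WAL` and `sync: NORMAL` in the Teleport config.
journal = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
conn.execute("PRAGMA synchronous=NORMAL")
sync = conn.execute("PRAGMA synchronous").fetchone()[0]

print(journal)  # wal
print(sync)     # 1 (SQLite encodes NORMAL as 1)
```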
The SQLite backend and other required data will be written to the Teleport data directory. By default, Teleport's data directory is `/var/lib/teleport`. To modify the location, set the `data_dir` value within the Teleport configuration file.
```yaml
teleport:
  data_dir: /var/lib/teleport_data
```
CockroachDB
Use of the CockroachDB storage backend requires Teleport Enterprise.
Teleport can use CockroachDB as a storage backend to achieve high availability and survive regional failures. You must take steps to protect access to CockroachDB in this configuration because that is where Teleport secrets like keys and user records will be stored.
At a minimum you must configure CockroachDB to allow Teleport to create tables. Teleport will create the database if given permission to do so but this is not required if the database already exists.
```sql
CREATE DATABASE database_name;
CREATE USER database_user;
GRANT CREATE ON DATABASE database_name TO database_user;
```
You must also enable change feeds in CockroachDB's cluster settings. Teleport will configure this setting itself if granted `SYSTEM MODIFYCLUSTERSETTING`.
```sql
SET CLUSTER SETTING kv.rangefeed.enabled = true;
```
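To confirm the setting took effect, you can read it back with standard CockroachDB syntax:

```sql
SHOW CLUSTER SETTING kv.rangefeed.enabled;
```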
There are several ways to deploy and configure CockroachDB, the details of which are not in scope for this guide. To learn about deploying CockroachDB, see CockroachDB's deployment options. To learn about how to configure multi-region survival goals, see multi-region survival goals.
To configure Teleport to use CockroachDB as a storage backend:
- Configure all Teleport Auth Service instances to use the CockroachDB backend in the `storage` section of `teleport.yaml` as shown below.
- Deploy several Auth Service instances connected to the CockroachDB storage backend.
- Deploy several Proxy Service instances.
- Make sure that the Proxy Service instances and all Teleport agent services that connect directly to the Auth Service have the `auth_server` configuration setting populated with the address of a load balancer for Auth Service instances.
```yaml
teleport:
  storage:
    type: cockroachdb
    # conn_string is a required parameter. It is a PostgreSQL connection string used
    # to connect to CockroachDB using the PostgreSQL wire protocol. Client
    # parameters may be specified using the URL. For a detailed list of available
    # parameters see https://www.cockroachlabs.com/docs/stable/connection-parameter
    #
    # If your certificates are not stored at the default ~/.postgresql
    # location, you will need to specify them with the sslcert, sslkey, and
    # sslrootcert parameters.
    #
    # pool_max_conns is an additional parameter that determines the maximum
    # number of connections in the connection pool used for the cluster state
    # database (the change feed uses an additional connection), defaulting to
    # a value that depends on the number of available CPUs.
    conn_string: postgresql://user_name@database-address/teleport_backend?sslmode=verify-full&pool_max_conns=20
    # change_feed_conn_string is an optional parameter. When unspecified, Teleport
    # will default to using the same value specified for conn_string. It may be used
    # to configure Teleport to use a different user or connection parameters when
    # establishing a change feed connection.
    #
    # If your certificates are not stored at the default ~/.postgresql
    # location, you will need to specify them with the sslcert, sslkey, and
    # sslrootcert parameters.
    change_feed_conn_string: postgresql://user_name@database-address/teleport_backend?sslmode=verify-full
    # ttl_job_cron is an optional parameter which configures the interval at which
    # CockroachDB will expire backend items based on their time to live. By default
    # this is configured to run every 20 minutes. This is used by Teleport to clean
    # up old resources that are no longer connected to or needed by Teleport. Note
    # that configuring this to run more frequently may have performance
    # implications for CockroachDB.
    ttl_job_cron: '*/20 * * * *'
```