
Learning Notes #55 – API Keys and Tokens

14 January 2025 at 05:27

Tokens and API keys are foundational tools for secure communication between systems, enabling authentication, authorization, and access control.

What Are Tokens?

Tokens are digital objects that represent a specific set of permissions or claims. They are often used in authentication and authorization processes to verify a user’s identity or grant access to resources. Tokens can be time-bound and carry information like:

  1. User Identity: Information about the user or system initiating the request.
  2. Scope of Access: Details about what actions or resources the token permits.
  3. Validity Period: Start and expiry times for the token.

Common Types of Tokens:

  • JWT (JSON Web Tokens): Compact, URL-safe tokens consisting of a header, payload, and signature (see the sketch after this list).
  • Opaque Tokens: Tokens without embedded information; they require validation against a server.
  • Refresh Tokens: Used to obtain a new access token when the current one expires.
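
As a rough illustration of that structure, here is a minimal sketch in Python (standard library only; the secret and claims are made up for the example) that builds an HS256-signed token and splits it back into its three dot-separated parts:

import base64, hashlib, hmac, json, time

def b64url(data: bytes) -> str:
    # JWTs use unpadded, URL-safe base64
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

secret = b"demo-secret"  # hypothetical signing key
header = {"alg": "HS256", "typ": "JWT"}
payload = {"sub": "user-123", "scope": "read", "exp": int(time.time()) + 3600}

signing_input = b64url(json.dumps(header).encode()) + "." + b64url(json.dumps(payload).encode())
signature = b64url(hmac.new(secret, signing_input.encode(), hashlib.sha256).digest())
token = signing_input + "." + signature

# A JWT is three base64url segments: header.payload.signature
print(token.split("."))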

What Are API Keys?

API keys are unique identifiers used to authenticate applications or systems accessing APIs. They are simple to use and act as a credential to allow systems to make authorized API calls.

Key Characteristics:

  • Static Credential: Unlike tokens, API keys typically do not expire unless explicitly revoked.
  • Simple to Use: They are easy to implement and often passed in headers or query parameters.
  • Application-Specific: Keys are tied to specific applications rather than user accounts.

Functionalities and Usage

Both tokens and API keys enable secure interaction between systems, but the right choice depends on the scenario.

1. Authentication

  • Tokens: Often used for user authentication in web apps and APIs.
    • Example: A JWT issued after login is included in subsequent API requests to validate the user’s session.
  • API Keys: Authenticate applications rather than users.
    • Example: A weather app uses an API key to fetch data from a weather API (both request styles are sketched below).
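
A hedged sketch of both request styles, assuming the third-party requests library and using hypothetical endpoints and credentials:

import requests  # assumes the requests HTTP library is installed

# Hypothetical endpoints and credentials, for illustration only
jwt_token = "eyJhbGciOi..."  # e.g. the JWT issued after login
api_key = "my-weather-api-key"

# Token-based: the JWT travels in the Authorization header on every request
profile = requests.get(
    "https://api.example.com/me",
    headers={"Authorization": "Bearer " + jwt_token},
)

# Key-based: many APIs accept the key as a header or a query parameter
weather = requests.get(
    "https://weather.example.com/v1/forecast",
    params={"city": "Chennai", "api_key": api_key},
)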

2. Authorization

  • Tokens: Define user-specific permissions and roles.
    • Example: A token allows read-only access to specific resources for a particular user (see the sketch after this list).
  • API Keys: Grant access to predefined resources for the application.
    • Example: An API key allows access to public datasets but restricts write operations.
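
A minimal sketch of such a check, assuming the token's claims have already been verified and carry a space-separated scope string (a common convention, e.g. in OAuth 2.0):

# Claims extracted from an already-verified token (hypothetical values)
claims = {"sub": "user-123", "scope": "reports:read datasets:read"}

def is_allowed(claims: dict, required_scope: str) -> bool:
    # Grant access only if the token explicitly carries the required scope
    return required_scope in claims.get("scope", "").split()

print(is_allowed(claims, "reports:read"))   # True  - read allowed
print(is_allowed(claims, "reports:write"))  # False - write was never granted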

3. Rate Limiting and Monitoring

Both tokens and API keys can be used to:

  • Enforce usage limits.
  • Monitor and log API usage for analytics and security (a toy per-credential limiter is sketched below).
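
As a toy illustration of per-credential rate limiting (a fixed one-minute window, kept in memory; real systems usually track counters in a shared store such as Redis):

import time
from collections import defaultdict

WINDOW_SECONDS = 60
MAX_REQUESTS = 100

# Request timestamps per credential (token subject or API key), in memory
_requests: dict[str, list[float]] = defaultdict(list)

def allow_request(credential: str) -> bool:
    # Keep only timestamps inside the current window, then check the quota
    now = time.time()
    recent = [t for t in _requests[credential] if now - t < WINDOW_SECONDS]
    if len(recent) >= MAX_REQUESTS:
        _requests[credential] = recent
        return False
    recent.append(now)
    _requests[credential] = recent
    return True

print(allow_request("my-api-key"))  # True until the quota is exhausted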

Considerations for Secure Implementation

1. For Tokens

  • Use HTTPS: Always transmit tokens over HTTPS to prevent interception.
  • Implement Expiry: Set reasonable expiry times to minimize risks.
  • Adopt Refresh Tokens: Allow users to obtain new tokens securely when access tokens expire.
  • Validate Signatures: For JWTs, validate the signature to ensure the token’s integrity (a standard-library sketch follows this list).
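
A minimal verification sketch for the HS256 token format built earlier, using only the standard library (an illustration under that assumption; production code should rely on a maintained library such as PyJWT):

import base64, hashlib, hmac, json, time

def b64url_decode(segment: str) -> bytes:
    # Re-add the padding that JWTs strip
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))

def verify_hs256(token: str, secret: bytes) -> dict:
    signing_input, _, signature = token.rpartition(".")
    expected = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    # compare_digest avoids leaking information through timing differences
    if not hmac.compare_digest(expected, b64url_decode(signature)):
        raise ValueError("invalid signature")
    payload = json.loads(b64url_decode(signing_input.split(".")[1]))
    # Enforce the expiry claim as well as the signature
    if payload.get("exp", 0) < time.time():
        raise ValueError("token expired")
    return payload

# e.g. claims = verify_hs256(token, b"demo-secret") for the token built earlier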

2. For API Keys

  • Restrict IP Usage: Limit the key’s use to specific IPs or networks (see the sketch after this list).
  • Set Permissions: Assign the minimum required permissions for the API key.
  • Regenerate Periodically: Refresh keys periodically to mitigate risks.
  • Monitor Usage: Track API key usage for anomalies and revoke compromised keys promptly.
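
A sketch of how a server might enforce the first two points, assuming each key is stored with an IP allow-list and a permission set (the store and its values are invented for the example):

import ipaddress

# Hypothetical key store: each key carries allowed networks and permissions
API_KEYS = {
    "my-weather-api-key": {
        "networks": ["10.0.0.0/8", "192.168.1.0/24"],
        "permissions": {"datasets:read"},
    },
}

def check_api_key(key: str, client_ip: str, permission: str) -> bool:
    record = API_KEYS.get(key)
    if record is None:
        return False
    # Reject calls from outside the key's allowed networks
    ip = ipaddress.ip_address(client_ip)
    in_network = any(ip in ipaddress.ip_network(net) for net in record["networks"])
    return in_network and permission in record["permissions"]

print(check_api_key("my-weather-api-key", "10.1.2.3", "datasets:read"))  # True
print(check_api_key("my-weather-api-key", "8.8.8.8", "datasets:read"))   # False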

3. For Both

  • Avoid Hardcoding: Never embed tokens or keys in source code. Use environment variables or secure vaults (see the sketch after this list).
  • Audit and Rotate: Regularly audit and rotate keys and tokens to maintain security.
  • Educate Users: Ensure users and developers understand secure handling practices.
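
For instance, a minimal pattern for keeping credentials out of source code (the variable name WEATHER_API_KEY is arbitrary):

import os

# Read the key from the environment; fail fast if it is missing
API_KEY = os.environ.get("WEATHER_API_KEY")
if API_KEY is None:
    raise RuntimeError("WEATHER_API_KEY is not set")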

Learning Notes #50 – Fixed Partition Pattern | Distributed Pattern

9 January 2025 at 16:51

Today, I learnt about fixed partitioning, which balances data among servers without large-scale movement of data. In this blog, I jot down notes on how fixed partitioning helps solve this problem.

This entire blog is inspired by https://www.linkedin.com/pulse/distributed-systems-design-pattern-fixed-partitions-retail-kumar-v-c34pc/?trackingId=DMovSwEZSfCzKZEKa7yJrg%3D%3D

Problem Statement

In a distributed key-value store, data items need to be mapped to a set of cluster nodes to ensure efficient storage and retrieval. The system must satisfy the following requirements:

  1. Uniform Distribution: Data should be evenly distributed across all cluster nodes to avoid overloading any single node.
  2. Deterministic Mapping: Given a data item, the specific node responsible for storing it should be determinable without querying all the nodes in the cluster.

A common approach to achieve these goals is to use hashing with a modulo operation. For example, if there are three nodes in the cluster, the key is hashed, and the hash value modulo the number of nodes determines the node to store the data. However, this method has a critical drawback:

Rebalancing Issue: When the cluster size changes (e.g., nodes are added or removed), the mapping for most keys changes. This requires the system to move almost all the data to new nodes, leading to significant overhead in terms of time and resources, especially when dealing with large data volumes.

Challenge: How can we design a mapping mechanism that minimizes data movement during cluster size changes while maintaining uniform distribution and deterministic mapping?
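
A quick sketch of the problem in Python (SHA-256 stands in for whatever hash function the store uses), counting how many keys change nodes when the cluster grows from 3 to 4:

import hashlib

def node_for(key: str, num_nodes: int) -> int:
    # Deterministic hash -> node index; Python's built-in hash() is
    # randomized per process, so a stable hash like SHA-256 is used instead
    digest = hashlib.sha256(key.encode()).hexdigest()
    return int(digest, 16) % num_nodes

keys = ["user-" + str(i) for i in range(10_000)]
moved = sum(node_for(k, 3) != node_for(k, 4) for k in keys)
print(f"{moved / len(keys):.0%} of keys move when going from 3 to 4 nodes")
# Roughly 75% of keys land on a different node after the resize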

Solution

This is where Fixed Partitioning comes in.

What Is Fixed Partitioning?

This pattern organizes data into a predefined number of fixed partitions that remain constant over time. Data is assigned to these partitions using a hashing algorithm, ensuring that the mapping of data to partitions is permanent. The system separates the fixed partitioning of data from the physical servers managing these partitions, enabling seamless scaling.

Key Features of Fixed Partitioning

  1. Fixed Number of Partitions
    • The number of partitions is determined during system initialization (e.g., 8 partitions).
    • Data is assigned to these partitions based on a consistent hashing algorithm.
  2. Stable Data Mapping
    • Each piece of data is permanently mapped to a specific partition.
    • This eliminates the need for large-scale data reshuffling when scaling the system.
  3. Adjustable Partition-to-Server Mapping
    • Partitions can be reassigned to different servers as the system scales.
    • Only the physical location of the partitions changes; the fixed mapping remains intact (see the sketch after this list).
  4. Balanced Load Distribution
    • Partitions are distributed evenly across servers to balance the workload.
    • Adding new servers involves reassigning partitions without moving or reorganizing data within the partitions.
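
A sketch of the two-level mapping under these assumptions (8 fixed partitions, hash-based assignment; the server names are invented):

import hashlib

NUM_PARTITIONS = 8  # fixed at system initialization, never changes

def partition_for(key: str) -> int:
    # Key -> partition is permanent because NUM_PARTITIONS never changes
    return int(hashlib.sha256(key.encode()).hexdigest(), 16) % NUM_PARTITIONS

# Partition -> server is the only mapping that changes when scaling
partition_to_server = {0: "server_1", 1: "server_1", 2: "server_2", 3: "server_2",
                       4: "server_3", 5: "server_3", 6: "server_4", 7: "server_4"}

def server_for(key: str) -> str:
    return partition_to_server[partition_for(key)]

print(partition_for("account-12345"), server_for("account-12345"))

# Scale out: hand partition 5 to a new server; no key changes partition
partition_to_server[5] = "server_5"
print(partition_for("account-12345"), server_for("account-12345"))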

Naive Example

We have a banking system with transactions stored in 8 fixed partitions, distributed based on a customer’s account ID.


CREATE TABLE transactions (
    id SERIAL PRIMARY KEY,
    account_id INT NOT NULL,
    transaction_amount NUMERIC(10, 2) NOT NULL,
    transaction_date DATE NOT NULL
) PARTITION BY HASH (account_id);

1. Creating Partitions


DO $$
BEGIN
    FOR i IN 0..7 LOOP
        EXECUTE format(
            'CREATE TABLE transactions_p%s PARTITION OF transactions FOR VALUES WITH (modulus 8, remainder %s);',
            i, i
        );
    END LOOP;
END $$;

This creates 8 partitions (transactions_p0 to transactions_p7), assigned by the remainder of the hash of account_id modulo 8.

2. Inserting Data

When inserting data into the transactions table, PostgreSQL automatically places it into the correct partition based on the account_id.


INSERT INTO transactions (account_id, transaction_amount, transaction_date)
VALUES (12345, 500.00, '2025-01-01');

The hash of account_id 12345, taken modulo 8, determines the target partition (e.g., transactions_p5).

3. Querying Data

Querying the base table works transparently across all partitions.


SELECT * FROM transactions WHERE account_id = 12345;

PostgreSQL automatically routes the query to the correct partition.

4. Scaling by Adding Servers

Initial Setup:

Suppose we have 4 servers managing the partitions:

  • Server 1: transactions_p0, transactions_p1
  • Server 2: transactions_p2, transactions_p3
  • Server 3: transactions_p4, transactions_p5
  • Server 4: transactions_p6, transactions_p7

Adding a New Server:

When a 5th server is added, partitions are reassigned so that the load stays balanced while moving as few partitions as possible:

  • Server 1: transactions_p0, transactions_p1
  • Server 2: transactions_p2, transactions_p3
  • Server 3: transactions_p4
  • Server 4: transactions_p6
  • Server 5: transactions_p5, transactions_p7

Partition Migration

  • During the migration, transactions_p5 is copied from Server 3 to Server 5, and transactions_p7 from Server 4 to Server 5.
  • Once the migration is complete, Server 5 becomes responsible for those two partitions; the other six partitions never move.

Benefits:

  1. Minimal Data Movement – When scaling, only the partitions being reassigned are copied to new servers. Data within partitions remains stable.
  2. Optimized Performance – Queries are routed directly to the relevant partition, minimizing scan times.
  3. Scalability – Adding servers is straightforward, as it involves reassigning partitions, not reorganizing data.

What happens when a new server is added, then? Don’t we need to copy the data?

When a partition is moved to a new server (e.g., partition_b from server_A to server_B), the data in the partition must be copied to the new server. However,

  1. The copying is limited to the partition being reassigned.
  2. No data within the partition is reorganized.
  3. Once the partition is fully migrated, the original copy is typically deleted.

For example, in PostgreSQL:

  • Export the partition:

pg_dump -t partition_b -h server_A -U postgres > partition_b.sql

  • Import on the new server:

psql -h server_B -U postgres -d mydb < partition_b.sql
