
Welcome to Haposoft Blog

Explore our blog for fresh insights, expert commentary, and real-world examples of project development that we're eager to share with you.
Mar 24, 2026
15 min read
AWS Containers at Scale: Choosing Between ECS, EKS, and Fargate for Microservices Growth
Running containers on AWS is straightforward. Operating microservices at scale is not. As systems grow from a handful of services to dozens or hundreds, the real challenges shift to networking, deployment safety, scaling strategy, and cost control. The choices you make between Amazon ECS, Amazon EKS, and AWS Fargate will directly shape how your platform behaves under load, how fast you can ship, and how much you pay each month. This article delves into practical solutions for building a robust AWS container platform.

The Scalability Challenges of Large-Scale Microservices

In practice, microservices do not become difficult because of containers themselves, but because of what happens around them as the system grows. A setup that works well with a few services often starts to break down when the number of services increases, traffic becomes less predictable, and deployments happen continuously across teams. What used to be a straightforward architecture gradually turns into a system that requires coordination across multiple layers, from networking to deployment and scaling.

Microservices are widely adopted because they solve real problems at the application level. They allow teams to move faster and avoid tight coupling between components, while also making it easier to scale specific parts of the system instead of everything at once. In most modern systems, these are not optional advantages but baseline expectations:

- Ability to scale based on unpredictable traffic patterns
- Independent deployment of each service
- Reduced blast radius when failures occur
- Consistent runtime environments across teams

Those benefits remain valid, but they also introduce a different kind of complexity. As the number of services grows, the system stops being about individual services and starts behaving like a distributed platform.
At this point, the core challenges shift away from “running containers” and move into areas that require more deliberate design:

- Service-to-service networking in a dynamic cloud environment
- CI/CD pipelines that can handle dozens or hundreds of services
- Autoscaling at both application and infrastructure levels
- Balancing operational overhead with long-term portability

These are not edge cases but standard problems in any large-scale microservices system. AWS addresses them through a combination of Amazon ECS, Amazon EKS, and AWS Fargate, each offering a different trade-off between simplicity, control, and operational responsibility. The goal is not to choose one blindly, but to use them in a way that keeps the system scalable without introducing unnecessary complexity.

ECS, EKS, and Fargate: A Strategic Choice Analysis

Selecting between Amazon ECS, Amazon EKS, and AWS Fargate is not just a technical comparison. It directly affects how your microservices are deployed, scaled, and operated over time. In real-world systems, this decision determines how much infrastructure your team needs to manage, how flexible your architecture can be, and how easily you can adapt as requirements change. For teams working with AWS container orchestration, the goal is not to pick the most powerful tool, but the one that aligns with their operational model.

Amazon ECS: Simplicity and Power of AWS-Native

ECS is designed with an "AWS-First" philosophy. It abstracts the complexity of managing orchestrator components.

Amazon ECS is designed for teams that want to focus on building applications rather than managing orchestration layers. It integrates tightly with AWS services, which makes it a natural choice for systems that are already fully built on AWS. Instead of dealing with cluster-level complexity, teams can define tasks and services directly, keeping the operational model relatively simple even as the system grows.
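To make "define tasks and services directly" more concrete, here is a minimal sketch of an ECS task definition built as a plain Python dictionary. The service name, image URI, account ID, role ARN, CPU/memory sizes, and port are all hypothetical placeholders rather than values from this article. The field names follow the ECS task definition format, so a dictionary like this could, for example, be passed as keyword arguments to boto3's `register_task_definition`.

```python
def build_task_definition(service_name: str, image: str, task_role_arn: str) -> dict:
    """Build a minimal ECS task definition for a single-container service.

    The CPU, memory, and port values below are illustrative assumptions.
    """
    return {
        "family": service_name,
        # awsvpc gives each task its own network interface inside the VPC
        "networkMode": "awsvpc",
        # IAM role assumed by the application code running in the task
        "taskRoleArn": task_role_arn,
        "cpu": "256",
        "memory": "512",
        "containerDefinitions": [
            {
                "name": service_name,
                "image": image,
                "essential": True,
                "portMappings": [{"containerPort": 8080, "protocol": "tcp"}],
            }
        ],
    }


# Hypothetical example values
task_def = build_task_definition(
    service_name="order-service",
    image="123456789012.dkr.ecr.us-east-1.amazonaws.com/order-service:1.0.0",
    task_role_arn="arn:aws:iam::123456789012:role/order-service-task-role",
)
print(task_def["family"], len(task_def["containerDefinitions"]))
```

Keeping the definition as data makes it easy to generate one per service from a shared template, which is one reason this model stays manageable as the number of services grows.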
In practice, ECS works well because it removes unnecessary layers while still providing enough control for most production workloads:

- Fine-grained IAM roles at the task level for secure service access
- Typically faster task startup compared to Kubernetes-based systems
- Native integration with ALB, CloudWatch, and other AWS services

This makes ECS a strong option for teams deploying microservices on AWS without needing advanced customization in networking or orchestration.

Amazon EKS: Global Standardization and Flexibility

EKS brings the power of the open-source community to AWS.

Amazon EKS brings Kubernetes into the AWS ecosystem, which changes the equation entirely. Instead of a simplified AWS-native model, EKS provides a standardized platform that is widely used across cloud providers. This is especially important for teams that need portability or already have experience with Kubernetes.

The strength of EKS lies in its ecosystem and extensibility. It allows teams to integrate advanced tools and patterns that are not available in simpler orchestration models:

- GitOps workflows using tools like Argo CD
- Service mesh integration for advanced traffic control
- Advanced autoscaling with tools like Karpenter

For teams evaluating Kubernetes on AWS (EKS), the trade-off is clear: more flexibility comes with more operational responsibility. EKS is powerful, but it requires a deeper understanding of how Kubernetes components work together in production.

AWS Fargate: Redefining Serverless Operations

AWS Fargate takes a different approach by removing infrastructure management entirely. Instead of provisioning EC2 instances or managing cluster capacity, teams can run containers directly without worrying about the underlying compute layer. This makes it particularly attractive for workloads that need to scale quickly without additional operational burden. Fargate is not an orchestrator, but a compute engine that can be used with both ECS and EKS.
Its value becomes clear in scenarios where simplicity and speed are more important than deep customization:

- No need to manage servers, patch the OS, or handle capacity planning
- Per-task or per-pod scaling without cluster management
- Strong isolation at the infrastructure level

For teams evaluating AWS Fargate use cases, the limitation is that lower control over the runtime environment may not fit highly customized workloads. However, for many microservices architectures, that trade-off is acceptable in exchange for reduced operational overhead.

Comparison Table: ECS vs. EKS vs. Fargate

There is no universal answer to ECS vs. EKS vs. Fargate. The decision depends on how your system is expected to evolve and how much complexity your team can realistically handle. In many cases, teams do not choose just one, but combine them based on workload requirements.

| Criteria | Amazon ECS | Amazon EKS | AWS Fargate |
| Infrastructure Management | Low (AWS manages control plane) | Medium (User manages add-ons/nodes) | None (Fully Serverless) |
| Customizability | Medium (AWS API-driven) | Very High (Kubernetes CRDs) | Low (Limited root/kernel access) |
| Scalability | Very Fast | Depends on Node Provisioner (e.g., Karpenter) | Fast (Per Task/Pod) |
| Use Case | AWS-centric workflows | Multi-cloud & complex CNCF tools | Zero-ops, event-driven workloads |

Designing Networking for Microservices on AWS

In microservices systems, networking is not just about connectivity. It determines how services communicate, how traffic is controlled, and how costs scale over time. As the number of services increases, small inefficiencies in network design can quickly become operational issues. A production-ready setup on AWS focuses on clarity in traffic flow and minimizing unnecessary exposure.

VPC Segmentation

A proper VPC structure starts with separating public and private subnets, where each layer has a clear and limited responsibility.
This is essential to prevent unnecessary exposure and to maintain control over traffic flow as the system grows.

- Public Subnets: Used only for Application Load Balancers (ALB) and NAT Gateways. Containers should never be placed in this layer, as it exposes workloads directly to the internet and breaks the security boundary.
- Private Subnets: Host ECS tasks or EKS pods, where application services actually run. These workloads are not directly accessible from the internet. When they need external access, such as downloading libraries or calling APIs, traffic is routed through the NAT Gateway.
- VPC Endpoints (key optimization): Instead of routing traffic through the NAT Gateway, which adds data transfer cost, use Gateway Endpoints for S3 and DynamoDB, and Interface Endpoints for ECR, CloudWatch, and other services.

Using endpoints keeps traffic inside the AWS network and can significantly reduce internal data transfer costs, in some cases up to 80%.

Service-to-Service Communication

In a dynamic container environment, IP addresses are constantly changing as services scale or are redeployed. Because of this, communication cannot rely on static addressing and must be handled through service discovery.

- With ECS: Use AWS Cloud Map to register services and expose them via internal DNS (e.g. order-service.local).
- With EKS: Use CoreDNS, which is built into Kubernetes, to resolve service names within the cluster.

For more advanced traffic control, especially during deployments, a service mesh layer can be introduced:

- App Mesh: Enables traffic routing based on rules, such as sending a percentage of traffic to a new version (e.g. 10% to a new deployment).

This approach ensures that services can communicate reliably even as infrastructure changes, while also allowing controlled rollouts and reducing deployment risk.

CI/CD: Automation and Zero-Downtime Strategies

As the number of services increases, manual deployment quickly becomes a bottleneck.
In a microservices system, changes happen continuously across multiple services, so the deployment process needs to be automated, consistent, and safe by default. A well-designed CI/CD pipeline is not just about speed, but about reducing risk and ensuring that each release does not affect system stability.

Standard Pipeline Flow

A typical pipeline for CI/CD in microservices on AWS follows a sequence of steps that ensure code quality, security, and deployment reliability. Each stage serves a specific purpose and should be automated end-to-end.

- Code Commit & Validation: When code is pushed, the system runs unit tests and static analysis to detect errors early. This prevents broken code from entering the build stage.
- Build & Containerization: The application is packaged into a Docker image. This ensures consistency between environments and standardizes how services are deployed.
- Security Scanning: Images are scanned using Amazon ECR Image Scanning to detect known vulnerabilities (CVEs) in base images or dependencies. This step is important to prevent security issues from reaching production.
- Deployment: The new version is deployed using AWS CodeDeploy or integrated deployment tools. At this stage, the system must ensure that updates do not interrupt running services.

This pipeline ensures that every change goes through the same process, reducing variability and making deployments predictable even when multiple services are updated at the same time.

Blue/Green Deployment Strategy

In microservices environments, deployment strategy matters as much as the pipeline itself. Updating services directly using rolling updates can introduce risk, especially when changes affect service behavior or dependencies.

Blue/Green deployment addresses this by creating two separate environments:

- Blue environment: Current production version
- Green environment: New version being deployed

Instead of updating in place, the new version is deployed fully in parallel.
Traffic is only switched to the Green environment after it passes health checks and validation. If any issue occurs, traffic can be immediately routed back to the Blue environment without redeploying.

This approach provides several advantages:

- Zero-downtime deployments for user-facing services
- Immediate rollback without rebuilding or redeploying
- Safer testing in production-like conditions before full release

For systems running microservices on AWS, Blue/Green deployment is one of the most reliable ways to reduce deployment risk while maintaining availability.

Autoscaling: Optimizing Resources and Real-World Costs

Autoscaling in microservices is not just about adding more resources when traffic increases. In practice, it is about deciding what to scale, when to scale, and based on which signals. If scaling is configured too simply, the system either reacts too late under load or wastes resources during normal operation.

On AWS, autoscaling typically happens at two levels: the application layer and the infrastructure layer. These two layers need to work together. Scaling containers without enough underlying capacity leads to bottlenecks, while scaling infrastructure without demand leads to unnecessary cost.

Application-Level Scaling

At the application level, scaling is usually based on how services behave under load rather than just raw resource usage. While CPU and memory are common metrics, they often do not reflect real demand in microservices systems. For example, a service processing queue messages may appear idle in terms of CPU while a large backlog is still waiting to be processed.

A more reliable approach is to scale based on metrics that are closer to actual traffic. This includes request count per target, response latency, or the number of messages waiting in a queue. These signals allow the system to react earlier and more accurately to changes in demand.

Instead of relying only on CPU thresholds, a typical setup combines multiple signals:

- Request-based metrics (e.g. requests per target)
- Queue-based metrics (e.g. SQS backlog)
- Custom CloudWatch metrics tied to business logic

Infrastructure-Level Scaling

At the infrastructure level, the goal is to ensure that there is always enough capacity for containers to run, without overprovisioning resources. When using EC2-backed clusters, this becomes a scheduling problem: containers may be ready to run, but no suitable instance is available.

This is where tools like Karpenter or Cluster Autoscaler are used. Instead of scaling nodes based on predefined rules, they react to actual demand from pending workloads. When pods cannot be scheduled, new instances are created automatically, often selecting the most cost-efficient option available.

In practice, this approach introduces two important improvements. First, capacity is provisioned only when needed, which reduces idle resources. Second, instance selection can be optimized based on price and workload requirements, including the use of Spot Instances where appropriate.

The result is a system that scales more flexibly and uses infrastructure more efficiently, especially in environments with variable or unpredictable traffic patterns.

Best Practices for Production-Grade Microservices on AWS

At scale, stability does not come from one decision, but from a set of consistent practices applied across all services. These practices are not complex, but they are what keep systems predictable as traffic increases and deployments become more frequent.

Keep the system immutable

Containers should be treated as immutable units. Once deployed, they should not be modified in place. Any change, whether configuration, dependency, or code, should go through the build pipeline and result in a new image. This ensures that what runs in production is always reproducible and consistent with what was tested.
- Do not SSH into containers to fix issues
- Rebuild and redeploy instead of patching in production

Handle shutdowns properly

Scaling and deployments continuously create and remove containers. If services are terminated too quickly, in-flight requests can be dropped, leading to intermittent errors that are difficult to trace. This small detail has a direct impact on user experience during deployments and scaling events.

- Configure a stop timeout (typically 30–60 seconds)
- Allow services to finish ongoing requests
- Close database and external connections gracefully

Centralize logging and observability

Containers are ephemeral, so logs stored inside them are not reliable. All logs and metrics should be sent to a centralized system where they can be analyzed over time.

- Push logs to CloudWatch Logs or a centralized logging stack
- Use metrics and tracing to understand system behavior
- Enable container-level monitoring (e.g. Container Insights)

Implement meaningful health checks

A running container does not always mean a healthy service. Health checks should reflect whether the service can actually handle requests.

- Expose a /health endpoint
- Verify connections to critical dependencies (database, cache)
- Avoid relying only on process-level checks

Accurate health checks allow load balancers and orchestrators to make better routing decisions.

Apply basic security hardening

Security should be part of the default setup, not an afterthought. Simple configurations can significantly reduce risk without adding complexity.

- Run containers as non-root users
- Use read-only root filesystems where possible
- Restrict permissions using IAM roles

Conclusion

The choice between ECS, EKS, and Fargate comes down to one thing: how much complexity your team can handle. ECS is simple and AWS-native. EKS is powerful but demands Kubernetes expertise. Fargate removes infrastructure entirely.
In practice, most production systems mix them—using the right tool for each workload instead of committing to a single orchestrator. Haposoft helps you get this right. We design and deploy AWS container platforms that scale, stay secure, and don't waste your money. ECS, EKS, Fargate—we know when to use what, and more importantly, when not to.
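As a small worked example of the queue-based scaling signal described under Application-Level Scaling, the sketch below turns an SQS-style backlog into a desired task count using a "backlog per task" calculation. The function name, the latency target, the processing time, and the task limits are all assumptions chosen for illustration; in a real setup the resulting value would more likely be published as a custom CloudWatch metric that drives a scaling policy, rather than being applied directly.

```python
import math


def desired_task_count(
    queue_backlog: int,
    avg_seconds_per_message: float,
    target_latency_seconds: float,
    min_tasks: int = 1,
    max_tasks: int = 50,
) -> int:
    """Estimate how many tasks are needed to drain a queue within a latency target.

    All thresholds here are hypothetical; tune them per service.
    """
    if avg_seconds_per_message <= 0 or target_latency_seconds <= 0:
        raise ValueError("processing time and latency target must be positive")

    # Messages a single task can work through while staying inside the target
    acceptable_backlog_per_task = target_latency_seconds / avg_seconds_per_message

    needed = math.ceil(queue_backlog / acceptable_backlog_per_task)

    # Clamp to the configured bounds so scaling never goes to zero or runs away
    return max(min_tasks, min(max_tasks, needed))


# 1,500 waiting messages, 0.5 s per message, 60 s acceptable delay
print(desired_task_count(1500, avg_seconds_per_message=0.5, target_latency_seconds=60))
```

Note that CPU usage never appears in the calculation: a service with an idle CPU and a growing backlog would still be scaled out, which is the point of using a demand-side metric.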
Nov 26, 2025
10 min read
Submit App To Google Play Without Rejection: Handling Closed Testing Failures
When you submit an app to Google Play, most early failures surface in Closed Testing, not the final review. What we share here comes from real testing practice, and it’s what made handling those failures predictable for us.

What Google Play Closed Testing Is

Closed Testing is where Google first checks your app using real user activity, so it is important to understand what this stage actually requires.

Where Closed Testing Fits in the Submission Process

When you submit an app to Google Play, it doesn’t go straight to the final review. Before reaching that stage, every build must pass through Google Play’s testing tracks: Internal Testing → Closed Testing → Open Testing. Closed Testing sits in the middle of this flow and is the first point where Google expects real usage from real users. If the app fails here, it never reaches the actual “Submit for Review” step. That’s why many teams face repeated rejections without realizing the root cause comes from this stage, not the final review.

Google Play Closed Testing in Simple Terms

Google Play Closed Testing is a private release track where your app is shared with a small group of testers you select. These testers install the real build you intend to ship and use it in everyday conditions. The goal is straightforward: Google wants to see whether the app behaves like a complete product when real people interact with it.

In this controlled environment, Google observes how users move through your features, how data is handled, and whether the experience matches what you describe in your Play Console settings. This is essentially Google’s early check to confirm that the app is stable, transparent, and built for genuine use, not just something assembled to pass review.

What Google Expects During Closed Testing

The core function of Google Play Closed Testing is to verify authenticity. Google wants evidence that your app is functional, transparent, and ready for real users, not a rushed build created solely to pass review.
To make this evaluation, Google looks for a few key signals:

- Real testers using real, active Google accounts
- Real usage patterns, not one-off opens or artificial interactions
- Consistent engagement over time, typically around 14 days for most app types
- Actions inside your core features, not empty screens or placeholder flows
- Behavior that aligns with your Data Safety, privacy details, and feature declarations
- Evidence that the app is “alive”, such as logs, events, and navigation patterns generated from authentic interactions

Google began tightening its review standards in 2023 after more unfinished and auto-generated apps started slipping into the submission flow. Instead of relying only on manual checks, Google now leans heavily on the activity recorded during Closed Testing to understand how an app performs under real use. This gives the review team a clearer picture of stability, data handling, and readiness, making Closed Testing a much more decisive step in whether an app moves forward.

Why Google Play Closed Testing Is So Hard to Pass

Most teams fail Closed Testing because their testing behavior doesn’t match the actual evaluation signals Google uses. The table below compares real developer mistakes with Google’s real criteria, so you can see exactly why each issue leads to rejection.

| Common Issues During Testing | What Google Actually Checks |
| Teams treat Closed Testing like internal QA. Testers only tap around the interface and rarely complete real user journeys. | Google checks full, natural flows. It expects onboarding → core action → follow-up action. Shallow tapping does not confirm real functionality, so Google marks the test as lacking behavioral proof. |
| Testers open the app once or twice and stop. Most activity happens on day 1, then engagement drops to zero. | Google checks multi-day usage patterns. It needs recurring activity to evaluate stability and real adoption. One-off launches look like artificial or incomplete testing → fail. |
| Core features remain untouched because testers don’t find or understand them. Navigation confusion prevents users from triggering important flows. | Google checks whether declared core features are actually used. If users don’t naturally reach those flows, Google cannot validate them → flagged as “unverified behavior.” |
| Permissions are declared but no tester enters flows that use them. For example, camera, location, contacts, or other data-related actions never get triggered. | Google cross-checks declared permissions with real behavior. If a permission never activates during testing, Google treats the Data Safety form as unverifiable → extremely high rejection rate. |
| Engagement collapses after the first day. Testers lose interest quickly, resulting in long periods of zero activity. | Google checks consistency over time (≈14 days). When usage dies early, the system sees weak, unreliable activity that does not resemble real-world usage → rejection. |

Passing Google Play Closed Testing: A Real Case Study

Closed Testing turned out to be far stricter than we expected. What looked like a simple pre-release step quickly became the most decisive part of the review, and our team had to learn this the hard way, through three consecutive rejections before finally getting approved.

The Three Issues That Held Us Back in Closed Testing

These were the three recurring problems that blocked our app from moving past Google Play’s Closed Testing stage.

Issue 1: Having Testers, but Not Enough “Real” Activity

In the first attempt, we only invited one person to join the test, so the app barely generated any meaningful activity. Most of the usage stopped at simple screen opens, and none of the core features were exercised in a way Google could evaluate. With such a small and shallow pattern, the system couldn't treat it as real user participation. The build was rejected right away for not meeting the minimum level of authentic activity.
Issue 2: Misunderstanding the “14-Day Activity” Requirement

For the second round, we expanded the group to twelve testers, but most of them stopped using the app after just a few days. The remaining period showed almost no engagement, which meant the full 14-day window Google expects was never actually covered. Although the number of testers looked correct, the lack of continuous usage made the test inconclusive. Google dismissed the submission because the activity dropped off too early.

Issue 3: No Evidence of Real Activity (Logs, Tracking, or Records)

By the third attempt, we finally kept twelve testers active for the entire duration, but we failed to capture what they did. There were no logs showing feature flows, no tracking to confirm event sequences, and no recordings for actions tied to sensitive permissions. From Google's viewpoint, the numbers in the dashboard had nothing to support them. Without verifiable evidence, the review team treated the activity as unreliable and rejected the build again.

What Finally Helped Us Pass Google Play Closed Testing

To fix the issues in the earlier attempts, the team reorganized the entire test instead of adding more testers at random. Everything was structured so Google could see consistent, authentic behaviour from real users.

A larger tester group created a more reliable activity curve

The previous rounds didn’t generate enough meaningful activity, so we increased the number of people involved. The larger group created a more natural engagement pattern that gave Google more complete usage signals to review.

Extending the testing period from 14 to 17 consecutive days

To avoid the early drop-off that hurt our earlier attempts, we kept the test running a little longer than the minimum 14 days. The longer duration prevented mid-test gaps and helped Google see continuous interaction across multiple days.
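One way a team might catch mid-test gaps like these early is to check its own activity records day by day. The sketch below is a hypothetical internal monitoring helper, not part of any Google tooling: the function name, the data shape (tester ID mapped to the set of dates with recorded activity), and the `min_active_testers` threshold are all assumptions for illustration, and they say nothing about how Google itself evaluates activity.

```python
from datetime import date, timedelta


def find_activity_gaps(
    activity_by_tester: dict[str, set[date]],
    start: date,
    window_days: int = 14,
    min_active_testers: int = 1,
) -> list[date]:
    """Return the days in the test window with too few active testers."""
    gaps = []
    for offset in range(window_days):
        day = start + timedelta(days=offset)
        # Count testers who have at least one recorded event on this day
        active = sum(1 for days in activity_by_tester.values() if day in days)
        if active < min_active_testers:
            gaps.append(day)
    return gaps


# Hypothetical data: one tester active every day, one who stops after day 3
start = date(2025, 1, 1)
activity = {
    "tester-a": {start + timedelta(days=i) for i in range(14)},
    "tester-b": {start + timedelta(days=i) for i in range(3)},
}

gaps = find_activity_gaps(activity, start, window_days=14, min_active_testers=2)
print(f"{len(gaps)} day(s) below the threshold, first on {gaps[0]}")
```

Running a check like this daily during the test makes a drop-off visible while there is still time to re-engage testers, instead of after the window has closed.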
Introducing a detailed daily checklist so testers covered the right flows

Instead of letting everyone tap around freely, we provided a short list of the core actions Google needed to observe. A clear checklist guided testers through specific actions each day, producing consistent evidence for the features Google needed to verify.

Enabling device-level tracking and full system logs

Earlier data was too thin to validate behaviour, so we enabled device-level tracking and full system logs that we could review and later align with Google’s dashboard. This fixed the “invisible activity” issue from the earlier rounds and gave the review team something concrete to validate.

Having testers record short videos of their actions

Some flows involving permissions weren’t reflected clearly in logs, so testers recorded short clips when performing these tasks. These videos provided direct confirmation of how camera, file access and upload flows worked.

Adding small features and content to encourage natural engagement

The previous builds didn’t encourage repeated use, so we added minor features and content updates to create more realistic daily engagement. These adjustments helped testers interact with the app in a way that resembled real usage, not surface-level taps.

Release Access Form: A Commonly Overlooked Step in the Approval Process

After Closed Testing is completed, Google requires developers to submit the Release Access Form before the app can move forward in the publishing process. It sounds simple, but the way this form is written has a direct influence on the final review. Taking the form seriously, paired with the testing evidence we had already prepared, helped our final submission go through smoothly on the fourth attempt.

Here’s what became clear when we worked through it:

- The answers must reflect the real behaviour of the app, especially the sections on intended use and where user data comes from. Any mismatch creates doubt.
- Google expects clear descriptions of features, user actions and the scope of testing. Vague explanations often slow the process down.
- Looking at how other developer communities handled this form helped us understand the phrasing that aligns with Google’s criteria.

Final Thoughts

Closed Testing is ultimately about proving that your app behaves like a real, ready-to-ship product. Most teams lose time because they only react after a rejection; we prevent 80% of those rejections long before you ship. If you want fewer surprises and a tighter, lower-risk review cycle, talk to us and Haposoft will run the entire review cycle for you.
©Haposoft 2025. All rights reserved