fix: resolve gRPC DNS resolution issues in Kubernetes #8384 (#8387)

* fix: resolve gRPC DNS resolution issues in Kubernetes #8384

- Replace direct `grpc.NewClient` calls with `pb.GrpcDial` for consistent connection establishment
- Fix async DNS resolution behavior in K8s with `ndots:5`
- Ensure high-level components use the established `pb.GrpcDial` helper for reliable networking
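
For readers unfamiliar with `ndots:5`, the search-list behavior referred to above can be sketched roughly as follows. This is a simplified model of how a glibc-style stub resolver orders its lookups, not code from this change. The function name `candidateNames` and the search domains are hypothetical and chosen only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// candidateNames is a hypothetical helper that models the order in which a
// glibc-style stub resolver tries names, given the ndots threshold and the
// search list from /etc/resolv.conf. It is a simplified sketch.
func candidateNames(name string, ndots int, search []string) []string {
	// A trailing dot marks the name as fully qualified: no search list applies.
	if strings.HasSuffix(name, ".") {
		return []string{name}
	}
	var expanded []string
	for _, domain := range search {
		expanded = append(expanded, name+"."+domain)
	}
	if strings.Count(name, ".") >= ndots {
		// Enough dots: try the name as-is first, then the search list.
		return append([]string{name}, expanded...)
	}
	// Too few dots: walk the search list first, then try the name as-is.
	return append(expanded, name)
}

func main() {
	// Assumed search list, typical of a pod in the "default" namespace.
	search := []string{"default.svc.cluster.local", "svc.cluster.local", "cluster.local"}
	for _, n := range candidateNames("my-service.my-namespace", 5, search) {
		fmt.Println(n)
	}
}
```

With `ndots:5`, a name such as `my-service.my-namespace` has fewer than five dots, so every search suffix is tried before the bare name.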

* refactor: refine gRPC DNS fix and add documentation

- Use instance's grpcDialOption in BrokerClient.ConfigureTopic
- Add detailed comments to GrpcDial explaining Kubernetes DNS resolution rationale

* fix: ensure proper context propagation in broker_client gRPC calls

- Pass the provided `ctx` to `pb.GrpcDial` in `ConfigureTopic` and `GetUnflushedMessages`
- Ensures that timeouts and cancellations are correctly honored during connection establishment

* docs: refine gRPC resolver documentation and cleanup dead code

- Enhance documentation for `GrpcDial` with an explicit warning that `resolver.SetDefaultScheme("passthrough")` mutates global state
- Recommend the `passthrough:///` target prefix as the primary migration path to `grpc.NewClient`
- Remove dead commented-out code for `grpc.WithBlock()` and `grpc.WithTimeout()`
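
The `passthrough:///` migration path mentioned above could look roughly like the sketch below. The helper `passthroughTarget` is hypothetical and not part of this change. It only detects targets written in the `scheme://` form, which is an assumption made to keep the example short. The returned string would be the target handed to `grpc.NewClient`.

```go
package main

import (
	"fmt"
	"strings"
)

// passthroughTarget is a hypothetical helper that prefixes a plain host:port
// address with the "passthrough:///" scheme so that the address is handed to
// the dialer unchanged. Targets that already contain "://" are returned as-is.
func passthroughTarget(address string) string {
	if strings.Contains(address, "://") {
		return address // already carries an explicit scheme, e.g. "dns:///..."
	}
	return "passthrough:///" + address
}

func main() {
	fmt.Println(passthroughTarget("my-service:8080"))        // passthrough:///my-service:8080
	fmt.Println(passthroughTarget("dns:///my-service:8080")) // dns:///my-service:8080
}
```

Unlike `resolver.SetDefaultScheme("passthrough")`, a per-target prefix leaves global resolver state untouched, which is presumably why the documentation recommends it first.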
Chris Lu
2026-02-19 15:46:02 -08:00
committed by GitHub
parent e9c45144cf
commit a2005cb2a6
3 changed files with 18 additions and 10 deletions


@@ -82,13 +82,22 @@ func NewGrpcServer(opts ...grpc.ServerOption) *grpc.Server {
	return grpc.NewServer(options...)
}
// GrpcDial establishes a gRPC connection.
// IMPORTANT: This function intentionally uses the deprecated grpc.DialContext/grpc.Dial behavior
// to preserve the "passthrough" resolver semantics required for Kubernetes ndots/search-domain DNS behavior.
// This allows kube DNS suffixes to be correctly appended by the OS resolver.
//
// Switching to grpc.NewClient (which defaults to the "dns" resolver) would break this behavior
// in environments with ndots:5 and many-dot hostnames.
//
// Safe alternatives if switching to grpc.NewClient:
// 1. Prefix the target with "passthrough:///" (e.g., "passthrough:///my-service:8080"). This is the recommended primary migration path.
// 2. Call resolver.SetDefaultScheme("passthrough") exactly once during init().
// WARNING: This is NOT thread-safe, and mutates global resolver state affecting all grpc.NewClient calls in the process.
func GrpcDial(ctx context.Context, address string, waitForReady bool, opts ...grpc.DialOption) (*grpc.ClientConn, error) {
	// opts = append(opts, grpc.WithBlock())
	// opts = append(opts, grpc.WithTimeout(time.Duration(5*time.Second)))
	var options []grpc.DialOption
	options = append(options,
		// grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithDefaultCallOptions(
			grpc.MaxCallSendMsgSize(Max_Message_Size),
			grpc.MaxCallRecvMsgSize(Max_Message_Size),