The condition was inverted - it was caching lookups with errors
instead of successful lookups. This caused every replicated write
to make a gRPC call to master for volume location lookup, resulting
in ~1 second latency for writeToReplicas.
The bug particularly affected TTL volumes because:
- More unique volumes are created (separate pools per TTL)
- Volumes expire and get recreated frequently
- Each new volume requires a fresh lookup (cache miss)
- Higher volume churn = more cache misses = more master lookups
With this fix, successful lookups are cached for 10 minutes,
reducing replication latency from ~1s to ~10ms for cached volumes.
* Added context for the MasterClient's methods to avoid endless loops
* Returned WithClient function. Added WithClientCustomGetMaster function
* Hid unused ctx arguments
* Using a common context for the KeepConnectedToMaster and WaitUntilConnected functions
* Changed the context termination check in the tryConnectToMaster function
* Added a child context to the tryConnectToMaster function
* Added a common context for KeepConnectedToMaster and WaitUntilConnected functions in benchmark