
I’m accessing the Google Cloud Datastore from my PHP App Engine instance using the official google-cloud-php library.

I’m consistently seeing upwards of 0.35 seconds of latency per query, even for simple queries with fewer than 100 entities in the datastore.

My web app needs to make 4 or so consecutive datastore queries per request, which makes datastore entirely unusable (consistently 1.5 to 3 seconds of latency per page load).

Am I missing something?


Here’s how I connect to the datastore:

        // Same issue even without 'authCache' (a memcached wrapper).
        $authCache = new DatastoreAuthCache();
        $datastore = new DatastoreClient([
            'projectId' => AppIdentityService::getApplicationId(),
            'authCache' => $authCache
        ]);
        Datastore::$ds = $datastore;

Here are two examples of my queries:

    // Lookup by keys.
    $ds = Datastore::get();
    $queryResults = $ds->lookupBatch($keys);
    $rows = keyValue($queryResults, "found");

    // Query by fields.
    $query = $ds->query()
        ->kind(self::EntityName)
        ->filter('owner', '=', $a)
        ->filter('target', '=', $b)
        ->limit(1)
        ->keysOnly();

    $results = $ds->runQuery($query);
    foreach ($results as $entity) {
        return $entity;
    }

Is this level of latency to be expected? I can cache some results, but not all, so I’m hoping this is an issue on my end.

Here’s what I’ve already tried to improve the latency:

  • Added ‘authCache’ handler to cache datastore API tokens (no impact)

  • Confirmed datastore and app engine instance are in the same region

  • Confirmed that index.yaml is set up correctly

  • Confirmed that latency is due to datastore calls and not business logic

  • Other database backends are working fine (Cloud SQL server returns in < 0.1 seconds). The local datastore emulator also returns in <0.01 seconds.

What can I do to improve this latency?

2 Answers


  1. I’m not sure this qualifies as an answer, but I’ll try to help you debug it using Stackdriver.

    Take a look at your logs in the Cloud Console – https://console.cloud.google.com/logs/viewer

    Look for the slow handler.

    [Screenshot: App Engine log message]

    Hover over the latency column and click. This will take you to Stackdriver’s distributed tracing system. It’s possible this will confirm that Datastore is slow, but hopefully it will shed some light on something else that’s causing the slowness.

    It will look something like this:

    [Screenshot: Stackdriver tracing view]
  2. The biggest bottleneck is establishing a connection with Datastore, which can take up to 200 ms (this is where the auth cache helps). Unfortunately this is bad news for PHP, because it cannot keep a permanent connection open: the Datastore client needs to reconnect on every request.

    Optimising gets even more difficult because the client relies heavily on lazy loading. What seems to work best is reusing the same Datastore client instance for all requests.
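
    A minimal sketch of that reuse idea, assuming a hypothetical `ClientRegistry` helper (it is not part of google-cloud-php). The factory runs once, and every later Datastore call within the same PHP request gets the same instance back:

    ```php
    <?php
    // Hypothetical helper: builds each named client once and hands back
    // the same instance on every later call within this PHP request.
    final class ClientRegistry
    {
        private static $instances = [];

        public static function get(string $name, callable $factory)
        {
            if (!array_key_exists($name, self::$instances)) {
                // First call: run the (possibly expensive) factory and keep the result.
                self::$instances[$name] = $factory();
            }
            return self::$instances[$name];
        }
    }
    ```

    The factory would be a closure returning `new DatastoreClient([...])` configured as in the question, so that no code path constructs a second client by accident.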

    Limiting the number of filters speeds up queries; instead, retrieve larger chunks of data and filter them locally. Something like Redis helps here, since it can also double as a data cache.
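
    A rough sketch of the fetch-once, filter-locally approach. The function names are hypothetical, the rows are assumed to be plain associative arrays, and a PHP array stands in for Redis as the cache:

    ```php
    <?php
    // Hypothetical helper: returns the cached rows for $key, calling $fetch
    // (for example a broad Datastore query) only on a cache miss.
    function cachedRows(array &$cache, string $key, callable $fetch): array
    {
        if (!array_key_exists($key, $cache)) {
            $cache[$key] = $fetch();
        }
        return $cache[$key];
    }

    // Keeps only the rows whose fields strictly equal every value in $criteria.
    function filterRows(array $rows, array $criteria): array
    {
        return array_values(array_filter($rows, function (array $row) use ($criteria) {
            foreach ($criteria as $field => $expected) {
                if (!array_key_exists($field, $row) || $row[$field] !== $expected) {
                    return false;
                }
            }
            return true;
        }));
    }
    ```

    With the question's second query, this would mean fetching all rows for one `owner` once and then calling `filterRows($rows, ['target' => $b])` in PHP. Whether that beats a filtered query depends on how large the chunk is.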

    Batching updates so that they are picked up by a cron service can also help release the request more quickly. Notifications can be pushed to a websocket or picked up on a subsequent request.
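
    One way the batching could look, as a sketch only: `WriteBuffer` is a hypothetical class that collects pending writes and passes them to a single callable. Where the queue is stored between the web request and the cron run (a task queue, a Redis list, and so on) is left open here.

    ```php
    <?php
    // Hypothetical buffer: collects pending writes and hands them to one batch call.
    final class WriteBuffer
    {
        private $pending = [];
        private $flusher;

        public function __construct(callable $flusher)
        {
            $this->flusher = $flusher;
        }

        // Queues one item without contacting the backend.
        public function add($item): void
        {
            $this->pending[] = $item;
        }

        // Sends everything queued so far in a single call; returns the item count.
        public function flush(): int
        {
            if ($this->pending === []) {
                return 0;
            }
            $items = $this->pending;
            $this->pending = [];
            call_user_func($this->flusher, $items);
            return count($items);
        }
    }
    ```

    The flusher could wrap the client's `upsertBatch()` call, so that many small writes become one round trip made outside the user-facing request.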

    You didn’t mention whether you are using gRPC. DatastoreClient uses gRPC by default if the extension is installed; otherwise it falls back to REST, which is significantly slower.

    To check if you have grpc installed:

    php -m|grep grpc
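
    The same check can be made from inside the app, which is useful when the App Engine runtime is configured differently from your shell. This is a sketch: `expectedTransport()` is a hypothetical helper that only looks at the extension, while the library may check additional dependencies before choosing gRPC.

    ```php
    <?php
    // Hypothetical helper: guesses the transport from the grpc extension alone.
    function expectedTransport(): string
    {
        return extension_loaded('grpc') ? 'grpc' : 'rest';
    }

    echo 'grpc extension loaded: ', extension_loaded('grpc') ? 'yes' : 'no', PHP_EOL;
    echo 'expected transport: ', expectedTransport(), PHP_EOL;
    ```

    If the extension is present but you suspect REST is still being used, the client configuration also accepts a `transport` option (`grpc` or `rest`) in the versions I know of; check the documentation for your installed version.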
    

    The only other advice I can think of is indexes, but these only help with large datasets. You should also try testing in another data centre; the one you are using may be congested.
