Zone Analytics Colos Endpoint to GraphQL Analytics
This guide shows how you might migrate from the deprecated (and soon to be sunset) zone analytics API to the GraphQL API. It provides an example for a plausible use-case of the colos endpoint, then shows how that use-case is translated to the GraphQL API. It also explores features of the GraphQL API that make it more powerful than the API it replaces.
In this example, we want to calculate the number of requests for a particular colo, broken down by the hour in which the requests occurred. Referring to the zone analytics colos endpoint, we can construct a curl command that retrieves the data from the API.
curl -H "Authorization: Bearer $API_TOKEN" "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/analytics/colos?since=2020-12-10T00:00:00Z" > colos_endpoint_output.json
This query says:
- Given an API_TOKEN which has Analytics Read access to ZONE_ID.
- Fetch colos analytics for ZONE_ID with a time range that starts on 2020-12-10T00:00:00Z (since parameter) to now.
The question that we want to answer is: “What is the number of requests for ZRH per hour?” Using the colos endpoint response data and some wrangling by jq, we can answer that question with this command:
cat colos_endpoint_output.json | jq -c '.result[] | {colo_id: .colo_id, timeseries: .timeseries[]} | {colo_id: .colo_id, timeslot: .timeseries.since, requests: .timeseries.requests.all, bandwidth: .timeseries.bandwidth.all} | select(.requests > 0) | select(.colo_id == "ZRH") '
This jq command is complex, so we can break it down:
.result[]
This means that the result array is split into individual json lines.
{colo_id: .colo_id, timeseries: .timeseries[]}
This breaks each JSON line into multiple JSON lines. Each resulting line contains a colo_id and one element of the timeseries array.
{colo_id: .colo_id, timeslot: .timeseries.since, requests: .timeseries.requests.all, bandwidth: .timeseries.bandwidth.all}
This flattens out the data we are interested in that is inside the timeseries object of each line.
select(.requests > 0) | select(.colo_id == "ZRH")
This selects only lines that contain more than 0 requests and whose colo_id is ZRH.
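The pipeline steps above can also be sketched in Python, which may make the flatten-then-filter logic easier to follow. This is a minimal sketch, not part of the original guide: the sample dictionary is hypothetical, its values are made up, and its shape is only inferred from the fields that the jq command reads (result, colo_id, timeseries, since, requests.all, bandwidth.all). The function name requests_per_hour is likewise an illustrative choice.

```python
import json

# Hypothetical sample shaped like the colos endpoint response.
# The values are made up; only the field names come from the jq command.
sample = {
    "result": [
        {
            "colo_id": "ZRH",
            "timeseries": [
                {"since": "2020-12-10T00:00:00Z",
                 "requests": {"all": 12}, "bandwidth": {"all": 3400}},
                {"since": "2020-12-10T01:00:00Z",
                 "requests": {"all": 0}, "bandwidth": {"all": 0}},
            ],
        },
        {
            "colo_id": "AMS",
            "timeseries": [
                {"since": "2020-12-10T00:00:00Z",
                 "requests": {"all": 7}, "bandwidth": {"all": 900}},
            ],
        },
    ]
}


def requests_per_hour(response, colo_id):
    """Flatten each colo's timeseries and keep non-empty rows for one colo."""
    rows = []
    for colo in response["result"]:          # .result[]
        for slot in colo["timeseries"]:      # .timeseries[]
            row = {
                "colo_id": colo["colo_id"],
                "timeslot": slot["since"],
                "requests": slot["requests"]["all"],
                "bandwidth": slot["bandwidth"]["all"],
            }
            # select(.requests > 0) | select(.colo_id == "ZRH")
            if row["requests"] > 0 and row["colo_id"] == colo_id:
                rows.append(row)
    return rows


for row in requests_per_hour(sample, "ZRH"):
    print(json.dumps(row))
```

Note that every colo and every time slot still has to be downloaded before the filter runs, which is the inefficiency the GraphQL API avoids.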
The final data we get is one JSON line per hour, each containing the colo_id, timeslot, requests, and bandwidth fields.
How do we get the same result using the GraphQL API?
The GraphQL API allows us to be much more specific about the data that we want to retrieve. While the colos endpoint forces us to retrieve all the information about the breakdown of requests and bandwidth per colo, using the GraphQL API allows us to fetch only the information we are interested in.
The data we want is about HTTP requests. Hence, we use the canonical source for HTTP request data, httpRequestsAdaptiveGroups. This node in the GraphQL API allows you to filter and group by almost any dimension of an HTTP request imaginable. It is Adaptive, so responses will be fast, since it is driven by our ABR technology.
The following is a GraphQL API query to retrieve the data we need to answer the question: “What is the number of requests for ZRH per hour?”
{viewer {zones(filter: {zoneTag: "$ZONE_TAG"}) {httpRequestsAdaptiveGroups(filter: {datetime_gt: "2020-12-10T00:00:00Z", coloCode: "ZRH"}, limit: 10000, orderBy: [datetimeHour_ASC]) {count sum {edgeResponseBytes} avg {sampleInterval} dimensions {datetimeHour coloCode}}}}}
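Then, with the query saved in coloGroups.json as the query field of a JSON request body, we can run it with curl. The Authorization header is double-quoted so that the shell expands $API_TOKEN:
curl -X POST -H "Authorization: Bearer $API_TOKEN" https://api.cloudflare.com/client/v4/graphql -d "@./coloGroups.json" > graphqlColoGroupsResponse.json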
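The request can also be assembled programmatically. The following is a minimal sketch, assuming the endpoint accepts the usual GraphQL-over-HTTP body, a JSON object with the query text under a query key. The function name build_request, the placeholder token and zone tag, and the explicit Content-Type header are illustrative assumptions rather than anything from this guide. The sketch only builds the request; sending it (for example with urllib.request.urlopen) is left out so that it runs without network access or credentials.

```python
import json
import urllib.request

GRAPHQL_URL = "https://api.cloudflare.com/client/v4/graphql"

# Same query as above, with a %s placeholder where the zone tag goes.
QUERY_TEMPLATE = (
    '{viewer {zones(filter: {zoneTag: "%s"}) {'
    'httpRequestsAdaptiveGroups('
    'filter: {datetime_gt: "2020-12-10T00:00:00Z", coloCode: "ZRH"}, '
    'limit: 10000, orderBy: [datetimeHour_ASC]) {'
    'count sum {edgeResponseBytes} avg {sampleInterval} '
    'dimensions {datetimeHour coloCode}}}}}'
)


def build_request(api_token, zone_tag):
    """Build (but do not send) the POST request for the GraphQL endpoint."""
    # Assumption: the body is a JSON object with the query under "query".
    body = json.dumps({"query": QUERY_TEMPLATE % zone_tag}).encode("utf-8")
    return urllib.request.Request(
        GRAPHQL_URL,
        data=body,
        headers={
            "Authorization": "Bearer " + api_token,
            "Content-Type": "application/json",  # assumed, not from the guide
        },
        method="POST",
    )


# Placeholder credentials, for illustration only.
req = build_request("example-token", "example-zone-tag")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Substituting the zone tag in code sidesteps a pitfall of the file-based approach: a literal $ZONE_TAG inside a JSON file is not expanded by the shell.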
We can answer our question in the same way as before using jq:
cat graphqlColoGroupsResponse.json | jq -c '.data.viewer.zones[] | .httpRequestsAdaptiveGroups[] | {colo_id: .dimensions.coloCode, timeslot: .dimensions.datetimeHour, requests: .count, bandwidth: .sum.edgeResponseBytes}'
This command is much simpler than what we had before, because the data returned by the GraphQL API is more specific than what is returned by the colos endpoint.
Still, it is worth explaining the command since it will help to understand some of the concepts underlying the GraphQL API.
.data.viewer.zones[]
The format of a GraphQL response is very similar to the query. A successful response always contains a data object which wraps the data in the response. A query will always have a viewer object which represents your user. Then, we unwrap the zones objects, one per line. Our query only has one zone (since this is how we chose to do it), but a query could have multiple zones as well.
.httpRequestsAdaptiveGroups[]
The httpRequestsAdaptiveGroups field is a list, where each datapoint in the list represents a combination of the dimensions that were selected, along with the aggregation that was selected for that combination of the dimensions. Here, we unwrap each of the datapoints, one per row.
{colo_id: .dimensions.coloCode, timeslot: .dimensions.datetimeHour, requests: .count, bandwidth: .sum.edgeResponseBytes}
This is straightforward: it just selects the attributes of each datapoint that we are interested in, in the format which we used previously in the colos endpoint.
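For comparison with the jq command, the same unwrapping can be sketched in Python. The sample response below is hypothetical: its values are made up and its shape is only inferred from the fields requested in the query (count, sum.edgeResponseBytes, avg.sampleInterval, and the two dimensions). The function name flatten_groups is an illustrative choice.

```python
import json

# Hypothetical sample shaped like the GraphQL response; values are made up.
sample = {
    "data": {
        "viewer": {
            "zones": [
                {
                    "httpRequestsAdaptiveGroups": [
                        {
                            "count": 12,
                            "sum": {"edgeResponseBytes": 3400},
                            "avg": {"sampleInterval": 1},
                            "dimensions": {
                                "datetimeHour": "2020-12-10T00:00:00Z",
                                "coloCode": "ZRH",
                            },
                        },
                        {
                            "count": 5,
                            "sum": {"edgeResponseBytes": 800},
                            "avg": {"sampleInterval": 1},
                            "dimensions": {
                                "datetimeHour": "2020-12-10T01:00:00Z",
                                "coloCode": "ZRH",
                            },
                        },
                    ]
                }
            ]
        }
    }
}


def flatten_groups(response):
    """Turn each datapoint into the row format used for the colos endpoint."""
    rows = []
    for zone in response["data"]["viewer"]["zones"]:      # .data.viewer.zones[]
        for group in zone["httpRequestsAdaptiveGroups"]:  # one datapoint each
            rows.append({
                "colo_id": group["dimensions"]["coloCode"],
                "timeslot": group["dimensions"]["datetimeHour"],
                "requests": group["count"],
                "bandwidth": group["sum"]["edgeResponseBytes"],
            })
    return rows


for row in flatten_groups(sample):
    print(json.dumps(row))
```

No filtering step is needed here, because the colo and the time range were already restricted in the query's filter argument.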
The GraphQL API is a very powerful tool, as you can filter and group the data by many dimensions. The colos endpoint in the Zone Analytics API has no equivalent feature.