Instagram Basic Display API Pagination

CDS
January 19, 2020

Is there any way to page through the media results returned by the Instagram Basic Display API? I’ve read the following documentation, but it doesn’t include any examples of pagination:

I would like to limit the media returned in the response e.g. media 1-15 for the first call, then get the next set e.g. 16-30 in the next call.

TIA

Answers

Chosen as BEST ANSWER
- CDS
- January 20, 2020 at 9:21 pm
I found an answer by experimenting with the pagination parameters described in this documentation: https://developers.facebook.com/docs/graph-api/using-graph-api#paging

Currently, the Basic Display API returns the 20 most recent media items by default. To return more or fewer than this, add a limit parameter to the URL:

https://graph.instagram.com/{user-id}/media?fields={media-fields-you-want-to-return}&access_token={access-token}&limit={number-of-media-you-want-to-return}
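As a rough illustration of the URL template above, here is a minimal Python sketch that assembles the first-page request. The helper name `build_media_url`, the user ID, and the token value are placeholders made up for this example; only the endpoint and the `fields`, `access_token`, and `limit` parameters come from the URL above.

```python
from urllib.parse import urlencode

GRAPH_BASE = "https://graph.instagram.com"

def build_media_url(user_id, access_token, fields, limit):
    """Return the first-page URL for a user's media (hypothetical helper)."""
    query = urlencode(
        {
            "fields": ",".join(fields),  # comma-separated list of media fields
            "access_token": access_token,
            "limit": limit,              # number of media items per page
        },
        safe=",",                        # keep the commas in "fields" readable
    )
    return f"{GRAPH_BASE}/{user_id}/media?{query}"

# Placeholder user ID and token, for illustration only.
url = build_media_url("USER_ID", "ACCESS_TOKEN", ["id", "media_type", "media_url"], 15)
print(url)
```

With a limit of 15, the first response should hold media 1-15, which matches the page size asked about in the question.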

To paginate, you need a "next" endpoint to call. To try this out, set the limit on your first call to fewer than the number of media items you have. The response should then include three pagination values:
```
"paging": {
    "cursors": {
        "before": "abc",
        "after": "def"
    },
    "next": "ghi"
}
```
Now add your "next" endpoint to the original URL above:

https://graph.instagram.com/{user-id}/media?fields={media-fields-you-want-to-return}&access_token={access-token}&limit={number-of-media-you-want-to-return}&next={next-endpoint}
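For comparison, here is a small Python sketch of two ways the paging object can be used, based on the Graph API paging documentation linked above: there, "next" is described as a complete URL that can be requested as-is, and the "after" cursor can be sent as an `after` query parameter. The function names, `USER_ID`, and `ACCESS_TOKEN` are assumptions for this example, and the page data reuses the placeholder values from the sample response.

```python
from urllib.parse import urlencode

def next_url(page):
    """Return the ready-made "next" URL, or None on the last page."""
    return (page.get("paging") or {}).get("next")

def after_cursor_url(endpoint, params, page):
    """Rebuild the request with the "after" cursor, or return None on the last page."""
    paging = page.get("paging") or {}
    after = (paging.get("cursors") or {}).get("after")
    # A page may carry cursors without a "next" link, so "next" decides whether to continue.
    if "next" not in paging or after is None:
        return None
    return endpoint + "?" + urlencode({**params, "after": after}, safe=",")

# Placeholder values taken from the sample response above.
page = {"paging": {"cursors": {"before": "abc", "after": "def"}, "next": "ghi"}}
params = {"fields": "id,media_type", "access_token": "ACCESS_TOKEN", "limit": 15}
print(next_url(page))
print(after_cursor_url("https://graph.instagram.com/USER_ID/media", params, page))
```

Requesting the "next" value directly is the simpler of the two, since it already carries the fields, token, and limit from the original call.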

(Edit)

I was not able to get things working with the answer by CDS. Instead, I used an approach that looks for the "next" element in the returned JSON string and uses that URL directly.

In my case, I have built a Storage Access Framework implementation for Instagram, and so here is the flow:

In the "add rows" call that the SAF makes to my provider, I do the initial query of Instagram:

 instagramQueryResult = queryInstagramAccount(instagramUserID, null); // Initially no "next" url

This method in turn looks like:

private JSONObject queryInstagramAccount(String instagramUserID, String nextPageUrl) {
    String instagramToken = InTouchUtils.getInstagramAccessToken();
    if ( instagramToken == null || DEFAULT_MEDIA_SERVICE_ACCESS_TOKEN_DEFAULT.equals(instagramToken)) {
        return null;
    }
    // Returned from Instagram
    String instagramRetval = null;
    // What we send back from this method - normalized list of media plus any pagination data.
    JSONObject returnResult = null;
    // Used to build a normalized array of media objects, flattening out "CAROUSEL_ALBUM" return types
    JSONArray dataArray = new JSONArray(), returnedArray = null;
    // Initial response from Instagram as JSON prior to normalization
    JSONObject instagramJSONResult = null;
    // Parameters for the Volley call
    HashMap<String,String> params = new HashMap<>();
    // Reuse the token fetched and validated at the top of the method
    params.put(INSTAGRAM_ACCESSTOKEN_KEY, instagramToken);

    // Build the query string
    String url = null;
    if ( nextPageUrl == null ) {
        url = INSTAGRAM_GRAPH_URI + instagramUserID + MEDIA_MEDIA_EDGE;
        String fieldsString = MEDIA_ID_KEY + "," +
                MEDIA_TYPE_KEY + "," +
                MEDIA_URL_KEY + "," +
                MEDIA_THUMBNAIL_URL_KEY + "," +
                MEDIA_UPDATED_TIME_KEY;
        params.put(MEDIA_LIMIT_KEY, Long.toString(batchSize));
        params.put(MEDIA_FIELDS_KEY, fieldsString);
    } else {
        // We've been given the fully created url to use
        url = nextPageUrl;
        params = null;
    }

    try {
        instagramRetval = InTouchUtils.callWebsiteFunction(url, params);
        instagramJSONResult = new JSONObject(instagramRetval);
        returnedArray = instagramJSONResult.getJSONArray(MEDIA_DATA_ARRAY);
        if ( returnedArray.length() == 0) {
            return null;
        }
        for ( int i = 0; i < returnedArray.length(); i++) {
            JSONObject o = returnedArray.getJSONObject(i);
            // this result could have types IMAGE, VIDEO or CAROUSEL_ALBUM. The latter type
            // needs a subsequent call to get the children info
            if (o.getString(MEDIA_TYPE_KEY).equals(MEDIA_TYPE_CAROUSEL)) {
                // Here we need to make a separate call to get the carousel detail
                String mediaID = null;
                try {
                    mediaID = o.getString(MEDIA_ID_KEY);
                    String childrenEdgeUrl = INSTAGRAM_GRAPH_URI + mediaID + MEDIA_CHILDREN_EDGE;
                    params = new HashMap<>();
                    params.put(INSTAGRAM_ACCESSTOKEN_KEY, instagramToken);
                    String mediafieldsString = MEDIA_ID_KEY + "," +
                            MEDIA_TYPE_KEY + "," +
                            MEDIA_URL_KEY + "," +
                            MEDIA_THUMBNAIL_URL_KEY + "," +
                            MEDIA_UPDATED_TIME_KEY;
                    params.put(MEDIA_FIELDS_KEY, mediafieldsString);
                    String carouselRetval = InTouchUtils.callWebsiteFunction(childrenEdgeUrl, params);
                    JSONObject carouselJSON = new JSONObject(carouselRetval);
                    // Cycle through these entries
                    JSONArray carouselData = carouselJSON.getJSONArray(MEDIA_DATA_ARRAY);
                    if ( carouselData != null && carouselData.length() > 0) {
                        for ( int x = 0; x < carouselData.length(); x++) {
                            dataArray.put(carouselData.getJSONObject(x));
                        }
                    }

                } catch (Exception e) {
                    Timber.d("Lifecycle: Exception processing carousel entry with ID %s, message: %s", mediaID, e.getMessage());
                }
            } else {
                // Add to dataArray
                dataArray.put(o);
            }
        }

    } catch (Exception e) {
        Timber.e("Exception getting Instagram info: %s", e.getMessage());
        return null;
    } finally  {
        returnedArray = null;
        instagramRetval = null;
    }

    // See if there is pagination
    JSONObject pagingObject = null;
    try {
        pagingObject = instagramJSONResult.getJSONObject(MEDIA_PAGING_KEY);
    } catch (JSONException e) {
        // No paging returned, no problem
        pagingObject = null;
    }
    returnResult = new JSONObject();
    try {
        returnResult.put(MEDIA_DATA_ARRAY, dataArray);
        if ( pagingObject != null ) {
            returnResult.put(MEDIA_PAGING_KEY, pagingObject);
        }
    } catch (JSONException e) {
        Timber.d("Lifecycle: exception gathering instagram data: %s", e.getMessage());
        returnResult = null;
    } finally {
        instagramJSONResult = null;
    }
    return returnResult;
}

The initial check compares the token against the constant DEFAULT_MEDIA_SERVICE_ACCESS_TOKEN_DEFAULT, which is initialized elsewhere in my DocumentsProvider. If the token still has that default value, the user hasn’t entered their Instagram credentials yet, so in that case I bail out.

Where you see calls to "InTouchUtils", this is my own class that encapsulates a bunch of utility functions, like using Volley to make the web API calls.

This method gets called from a couple of places in the DocumentsProvider, so one of its parameters is nextPageUrl. If nextPageUrl is null, we construct the default URL, which calls the Media "Edge" API for the given user. In that case the method puts the limit and the Instagram access token (both defined in the preferences of my app) into the params HashMap, along with the fields string.

Note that if nextPageUrl IS passed in, then I completely bypass creating this url, and simply use nextPageUrl instead.

Here is the callWebsiteFunction code from InTouchUtils that uses Volley in synchronous mode to make the website API call (this entire code sample is already being run on a separate thread, and I have given INTERNET permission in my app):

public static String callWebsiteFunction(String url, HashMap<String, String> params) throws Exception {
    return callWebsiteFunction(url, params, VOLLEY_REQUEST_DEFAULT_TIMEOUT);
}

public static String callWebsiteFunction(String url, HashMap<String, String> params, int timeoutInSeconds) throws Exception {
    RequestFuture<String> future = RequestFuture.newFuture();
    String newUrl = null;
    if ( params != null ) {
        newUrl = InTouchUtils.createGetRequestUrl(url, params);
    } else {
        newUrl = url;
    }
    String result = null;
    StringRequest request =
            new StringRequest(Request.Method.GET,
                    newUrl,
                    future,
                    new Response.ErrorListener() {
                        @Override
                        public void onErrorResponse(VolleyError error) {
                            Timber.e("Got VolleyError: %s", error.getMessage());
                        }
                    }) {

            };

    InTouchUtils.addToRequestQueue(request);
    try {
        // Using a blocking volley request
        // See SO: https://stackoverflow.com/questions/16904741/can-i-do-a-synchronous-request-with-volley
        try {
            result = future.get(timeoutInSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Timber.e("Got Interrupted Exception attempting Volley request: %s", e.getMessage());
        } catch (ExecutionException e) {
            Timber.e("Got Execution Exception attempting Volley request: %s", e.getMessage());
        } catch (TimeoutException e) {
            Timber.e("Got Timeout Exception attempting Volley request: %s", e.getMessage());
        }
    } catch (Exception e) {
        Timber.e("Got General Exception");
        throw e;
    }
    return result;
}

Now that I have a result, I can process it. The first thing to do is convert the string to a JSONObject so I can start to parse it. Then see if I got back a JSONArray of media items by parsing out the "data" key (constant MEDIA_DATA_ARRAY in my code).

For my purposes, what I want to do is normalize my returned data to a complete list of images and/or videos – so I have to check to see if what got returned is a CAROUSEL_ALBUM type, and if so I make another call to get the media children of that CAROUSEL.

Ultimately, I repackage all the media entries, plus any pagination that was returned from Instagram, and return that to the caller.
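The normalization step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not a port of the Java method: `normalize_media` and `fetch_children` are hypothetical names, and `fetch_children` stands in for the extra call to the children edge of a CAROUSEL_ALBUM entry.

```python
def normalize_media(page, fetch_children):
    """Flatten CAROUSEL_ALBUM entries into their children and keep any paging data."""
    flat = []
    for item in page.get("data", []):
        if item.get("media_type") == "CAROUSEL_ALBUM":
            try:
                # One extra request per carousel to get its child media
                flat.extend(fetch_children(item["id"]))
            except Exception as exc:
                # Skip a carousel that fails rather than dropping the whole page
                print(f"Skipping carousel {item.get('id')}: {exc}")
        else:
            flat.append(item)
    result = {"data": flat}
    if "paging" in page:
        result["paging"] = page["paging"]
    return result

# Fake data and a fake children lookup, for illustration only.
def fake_children(media_id):
    return [{"id": f"{media_id}-1", "media_type": "IMAGE"},
            {"id": f"{media_id}-2", "media_type": "VIDEO"}]

page = {
    "data": [{"id": "10", "media_type": "IMAGE"},
             {"id": "20", "media_type": "CAROUSEL_ALBUM"}],
    "paging": {"next": "https://example.invalid/next-page"},
}
normalized = normalize_media(page, fake_children)
print([m["id"] for m in normalized["data"]])
```

Passing the children lookup in as a function keeps the flattening logic separate from the HTTP layer, so it can be exercised without network access.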

Now back in the caller, I can inspect what I got, and see if I have pagination going on, in particular a "next" url.

If I don’t have one, then I reset the SAF "loading" flag (this is an SAF thing that causes an indeterminate progress bar to show or not show back in the file chooser while your provider is doing work fetching more entries) and I’m done. Notice that "I don’t have one" means that EITHER the "paging" element or the "next" element is not present. This is because you may get no paging element at all, or you may get a paging element with no "next" element within it.

If I do, I indicate to the SAF that I am "loading", then I start a thread ("BatchFetcher") that essentially loops doing the same call to query Instagram, but passing in the "next" url as long as it finds one:

            if (instagramQueryResult == null || instagramQueryResult.length() == 0) {
                // Nothing in instagram for this user
                Timber.d( "addRowstoQueryChildDocumentsCursor: I called queryInstagramAccount() but nothing was there!");
                return;
            }
            JSONArray data = null;
            try {
                data = instagramQueryResult.getJSONArray(MEDIA_DATA_ARRAY);
                if ( data.length() == 0) {
                    return;
                }
            } catch (JSONException e) {
                // No data, nothing to do
                Timber.d("Lifecycle: Found no media data for user, exception was: %s", e.getMessage());
                return;
            }
            JSONObject paging = null;
            String nextUrl = null;
            try {
                paging = instagramQueryResult.getJSONObject(MEDIA_PAGING_KEY);
                // If we get here, test to see if we have a "next" node. If so, that's what
                // we need to query, otherwise we are done.
                nextUrl = paging.getString(MEDIA_NEXT_KEY);
            } catch (JSONException e) {
                // No paging
                paging = null;
                nextUrl = null;
            }

            Timber.d( "addRowstoQueryChildDocumentsCursor: New query fetch got %d entries.", data.length());
            if ( paging == null || nextUrl == null) {
                // We are done - add these to cache and cursor and clear loading flag
                populateResultsToCacheAndCursor(data, cursor);
                clearCursorLoadingNotification(cursor);
                Timber.d( "addRowstoQueryChildDocumentsCursor: Directory retrieval is complete for parentDocumentId: " +
                        parentDocumentId +
                        " took " +
                        (System.currentTimeMillis()- startTimeForDirectoryQuery)+"ms.");

            } else {
                // Store our results to both the cache and cursor - cursor for the initial return,
                // cache for when we come back after the Thread finishes
                populateResultsToCacheAndCursor(data, cursor);
                // Set the getExtras()
                setCursorForLoadingNotification(cursor);
                // Register this cursor with the Resolver to get notified by Thread so Cursor will then notify loader to re-load
                Timber.d( "addRowstoQueryChildDocumentsCursor: registering cursor for notificationUri on: %s and starting BatchFetcher.", getChildDocumentsUri(parentDocumentId).toString());
                cursor.setNotificationUri(getContext().getContentResolver(),getChildDocumentsUri(parentDocumentId));
                // Start new thread
                batchFetcher = new BatchFetcher(parentDocumentId, nextUrl);
                batchFetcher.start();
            }

The thread "batchFetcher" checks the return value for media items and continues in a loop until no more entries are found, no more "next" URL is returned from Instagram, or it is interrupted.
It populates an internal cache, which is read on each subsequent request from the SAF to my provider. Once there is nothing more to fetch, the "loading" aspect of the cursor is reset and the SAF stops requesting data from my provider.
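To make the loop easier to follow outside of Android, here is a rough Python sketch of the same idea: a background thread that keeps following "next" URLs and appends to a shared cache until there is nothing left or it is cancelled. The class layout, attribute names, and the `fetch_page` callable are assumptions for this sketch; the SAF cursor notifications are left out.

```python
import threading

class BatchFetcher(threading.Thread):
    """Follow "next" URLs in the background and append each page's media to a cache."""

    def __init__(self, first_next_url, fetch_page, cache):
        super().__init__(daemon=True)
        self.first_next_url = first_next_url
        self.fetch_page = fetch_page          # callable: url -> parsed page dict (or None)
        self.cache = cache                    # shared list read by the caller
        self.cache_lock = threading.Lock()
        self.cancelled = threading.Event()    # set by cancel() to stop early
        self.finished_loading = threading.Event()

    def cancel(self):
        self.cancelled.set()

    def run(self):
        url = self.first_next_url
        try:
            while url and not self.cancelled.is_set():
                page = self.fetch_page(url) or {}
                items = page.get("data") or []
                if not items:
                    break                     # no more entries
                with self.cache_lock:
                    self.cache.extend(items)
                # Missing "paging" or "next" means this was the last page
                url = (page.get("paging") or {}).get("next")
        finally:
            self.finished_loading.set()       # the caller can now clear its "loading" flag

# Fake pages keyed by URL, for illustration only.
pages = {
    "page2": {"data": [{"id": "3"}, {"id": "4"}], "paging": {"next": "page3"}},
    "page3": {"data": [{"id": "5"}]},
}
cache = []
fetcher = BatchFetcher("page2", pages.get, cache)
fetcher.start()
fetcher.join()
print([m["id"] for m in cache])
```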

Here is a simple Python function I created based on @CDS's answer.

import requests

def get_user_data2(user_id, access_token, max_limit=100):
    # Comma-separated list of media fields
    fields = 'caption,id,username,media_type'
    all_posts = []
    paging_url = f'https://graph.instagram.com/{user_id}/media?fields={fields}&access_token={access_token}&limit={max_limit}'

    while paging_url is not None:
        r = requests.get(paging_url)
        r.raise_for_status()  # stop on an HTTP error instead of failing on a missing 'data' key
        page = r.json()
        all_posts.extend(page['data'])
        # 'paging' or 'next' is absent on the last page, which ends the loop
        paging_url = page.get('paging', {}).get('next')

    return all_posts
