Automatically Find & Re-post Popular Instagram Content with Python

Disclaimer: Most of this code was obtained from other tutorials; I do not take credit for writing the Selenium code. The purpose of this tutorial is to demonstrate Instagram automation, not to encourage posting or re-posting other people's work. Unfortunately, I have been unable to find the original author of the Autogram code. If anyone knows the author, please let me know so that I can give them credit for their awesome work. The Jupyter notebook containing all the code is available here.

What does it do?

The script takes a keyword from the user and, using it as a hashtag, retrieves public Instagram posts. It then sorts those posts by number of likes and downloads the post with the most likes so that it can be re-posted later. It then pulls any hashtags from the caption, finds other hashtags being used with these hashtags on Twitter and Instagram, and uses them in the caption, along with credit to the original poster of the selected post. Finally, the script opens Instagram, logs into the user's account and uploads the picture along with the caption.

Import the required libraries and set up the variables:

import requests
import urllib.request
import urllib.parse
import urllib.error
from bs4 import BeautifulSoup
import ssl
import json
from IPython.display import Image
import re
import time
import os  # needed later for os.path.normpath when uploading
import autoit
from selenium import webdriver
from selenium.webdriver.chrome.options import *
from selenium.webdriver.common.keys import Keys
import operator
import tweepy as tw
import bs4

# Twitter API credentials (replace the placeholders with your own)
consumer_key = 'your-consumer-key'
consumer_secret = 'your-consumer-secret'
access_token = 'your-access-token'
access_token_secret = 'your-access-token-secret'

# Authenticate with Twitter and create the API client
auth = tw.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tw.API(auth, wait_on_rate_limit=True)

Get input from user:

Here we will ask the user for the key word. This key word will be used as a hashtag to find relevant Instagram posts.

key_word = input('Please enter your key word:')

Find the Instagram photo to re-post and download it:

The following function uses Instagram's explore page to retrieve posts for the key word provided by the user.

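def get_posts(key_word):
    # Fetch the public explore page for the hashtag
    url = 'https://www.instagram.com/explore/tags/' + key_word + '/'
    html = urllib.request.urlopen(url).read()
    soup = BeautifulSoup(html, 'html.parser')
    # The page data is embedded as JSON in a <script> tag starting with window._sharedData.
    # The "t and" guard skips script tags that have no text.
    script = soup.find('script', text=lambda t: t and t.startswith('window._sharedData'))
    page_json = script.text.split(' = ', 1)[1].rstrip(';')
    posts = json.loads(page_json)
    # Return the list of top posts for this hashtag
    return posts['entry_data']['TagPage'][0]['graphql']['hashtag']['edge_hashtag_to_top_posts']['edges']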
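
To see what the parsing step does without contacting Instagram, here is a minimal offline sketch. The script text below is made up for illustration and only mimics the shape that get_posts expects; the helper name parse_shared_data is hypothetical, and Instagram's real page layout may differ or change at any time.

```python
import json

# Made-up payload that mimics the nesting get_posts() walks through
sample_payload = {
    'entry_data': {'TagPage': [{'graphql': {'hashtag': {
        'edge_hashtag_to_top_posts': {'edges': [
            {'node': {'shortcode': 'abc123'}},
        ]},
    }}}]},
}

# Text of the <script> tag as it would appear in the page source
script_text = 'window._sharedData = ' + json.dumps(sample_payload) + ';'


def parse_shared_data(script_text):
    """Drop the 'window._sharedData = ' prefix and trailing ';', then parse the JSON."""
    page_json = script_text.split(' = ', 1)[1].rstrip(';')
    return json.loads(page_json)


data = parse_shared_data(script_text)
edges = data['entry_data']['TagPage'][0]['graphql']['hashtag']['edge_hashtag_to_top_posts']['edges']
print(edges[0]['node']['shortcode'])
```

Splitting on the first ' = ' only (the second argument to split) matters because the JSON itself could contain that sequence inside a caption.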

The following function will use the above function to get a list of Instagram posts, and then find the one with the most likes to return.

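def get_top_post(key_word):
    posts = get_posts(key_word)
    l = []
    for item in posts:
        d = {}
        d['likes'] = item['node']['edge_liked_by']['count']
        d['url'] = item['node']['display_url']
        d['urlcode'] = item['node']['shortcode']
        d['owner'] = item['node']['owner']['id']
        # A post may have no caption, in which case the edges list is empty
        caption_edges = item['node']['edge_media_to_caption']['edges']
        d['caption'] = caption_edges[0]['node']['text'] if caption_edges else ''
        l.append(d)
    # Sort once, after all posts are collected, with the most-liked post first
    l.sort(key=operator.itemgetter('likes'), reverse=True)
    return l[0]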

This function returns a dictionary containing attributes of the top post.
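
As a rough illustration of that dictionary and of the selection step, here is a small sketch on made-up data. The values are invented; only the keys follow the function above.

```python
import operator

# Invented posts with the same keys that get_top_post() builds
sample_posts = [
    {'likes': 120, 'url': 'https://example.com/a.jpg', 'urlcode': 'aaa', 'owner': '1', 'caption': 'first'},
    {'likes': 950, 'url': 'https://example.com/b.jpg', 'urlcode': 'bbb', 'owner': '2', 'caption': 'second'},
    {'likes': 430, 'url': 'https://example.com/c.jpg', 'urlcode': 'ccc', 'owner': '3', 'caption': ''},
]

# Same approach as the tutorial: sort by likes, descending, and take the first item
ranked = sorted(sample_posts, key=operator.itemgetter('likes'), reverse=True)
top = ranked[0]

# Equivalent alternative when only the single best post is needed
same_top = max(sample_posts, key=operator.itemgetter('likes'))

print(top['urlcode'], top['likes'])
```

Sorting is handy if you later want the second or third most-liked post as a fallback; max is enough when only the top one matters.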

Download the photo:

We write a function that takes a URL, downloads the photo at that URL and saves it as download.jpg.

def download_image(url):
    # Save the image bytes to download.jpg in the current working directory
    with open('download.jpg', 'wb') as f:
        f.write(requests.get(url).content)

The URL for this function will come from the dictionary containing attributes of the top post.

Building the caption:

I want to credit the original uploader, so I need their username. Our post dictionary contains the post's shortcode under the 'urlcode' key. The following function takes that shortcode and returns the username of the original uploader.

def get_owner(shortcode):
    # Fetch the page of the individual post
    url = 'https://www.instagram.com/p/' + shortcode + '/'
    html = urllib.request.urlopen(url).read()
    soup = BeautifulSoup(html, 'html.parser')
    # The "t and" guard skips script tags that have no text
    script = soup.find('script', text=lambda t: t and t.startswith('window._sharedData'))
    page_json = script.text.split(' = ', 1)[1].rstrip(';')
    post = json.loads(page_json)
    return post['entry_data']['PostPage'][0]['graphql']['shortcode_media']['owner']['username']

Another thing we want to add to our caption is relevant hashtags. The following functions do that. I have written detailed tutorials on how this code works (Part 1, Part 2).

def return_all_hashtags(tweets, key_word):
    # Collect every hashtag in the given texts, except the key word itself
    all_hashtags = []
    for tweet in tweets:
        for word in tweet.split():
            if word.startswith('#') and word.lower() != '#' + key_word.lower():
                all_hashtags.append(word.lower())
    return all_hashtags

def extract_shared_data(doc):
    # Find the <script> tag that holds window._sharedData and parse its JSON
    for script_tag in doc.find_all("script"):
        if script_tag.text.startswith("window._sharedData ="):
            shared_data = re.sub(r"^window\._sharedData = ", "", script_tag.text)
            shared_data = re.sub(r";$", "", shared_data)
            shared_data = json.loads(shared_data)
            return shared_data

def get_hashtags(key_word):
    # Tweets that use the hashtag
    # (in Tweepy 4.0 and later, api.search is named api.search_tweets)
    tweets = tw.Cursor(api.search,
                       q='#' + key_word,
                       lang="en").items(200)
    tweets_list = []
    for tweet in tweets:
        tweets_list.append(tweet.text)

    # Captions of Instagram posts that use the hashtag
    url_string = "https://www.instagram.com/explore/tags/%s/" % key_word
    response = bs4.BeautifulSoup(requests.get(url_string).text, "html.parser")
    shared_data = extract_shared_data(response)
    media = shared_data['entry_data']['TagPage'][0]['graphql']['hashtag']['edge_hashtag_to_media']['edges']
    captions = []
    for post in media:
        if post['node']['edge_media_to_caption']['edges'] != []:
            captions.append(post['node']['edge_media_to_caption']['edges'][0]['node']['text'])

    # Count each hashtag and return them sorted, most frequent first
    all_tags = return_all_hashtags(tweets_list + captions, key_word)
    frequency = {}
    for item in set(all_tags):
        frequency[item] = all_tags.count(item)
    return {k: v for k, v in sorted(frequency.items(), key=lambda item: item[1], reverse=True)}
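
The counting part of this step can be tried on its own, without Twitter or Instagram access. The sketch below is one possible variant that uses collections.Counter instead of the manual frequency dictionary; the function name count_hashtags and the sample texts are made up for illustration.

```python
from collections import Counter


def count_hashtags(texts, key_word):
    """Count co-occurring hashtags, ignoring the searched key word itself."""
    skip = '#' + key_word.lower()
    tags = [
        word.lower()
        for text in texts
        for word in text.split()
        if word.startswith('#') and word.lower() != skip
    ]
    # most_common() returns (tag, count) pairs sorted by count, highest first
    return dict(Counter(tags).most_common())


sample_texts = [
    'Sunset #travel #beach',
    'Love the #Beach #travel #sun',
    '#travel #beach #sun #sand',
]
print(count_hashtags(sample_texts, 'travel'))
# {'#beach': 3, '#sun': 2, '#sand': 1}
```

Counter makes a single pass over the list, whereas calling all_tags.count(item) for every distinct tag rescans the list each time. For a few hundred tweets either approach is fast enough.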

Binding it all in one function:

Now that we have written all the helper functions, we can write one function that makes a post using all our helper functions. This function finds the top post, downloads the image, builds a caption that credits the original uploader and lists relevant hashtags, and returns the caption.

def make_post(key_word):
    tag = clean_input(key_word)
    top_post = get_top_post(tag)
    download_image(top_post['url'])
    caption = 'Repost from @'+get_owner(top_post['urlcode']) + ' \n' + ', '.join(list(get_hashtags(tag).keys())[:10])
    return caption
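
Note that make_post calls clean_input, which is not defined anywhere in this post. The sketch below is only a guess at what such a helper might do, assuming its job is to turn free-form user input into a bare hashtag (no '#', no spaces, lower case). The actual helper in the notebook may behave differently.

```python
import re


def clean_input(key_word):
    """Hypothetical helper: normalize user input into a bare hashtag string."""
    # Trim and lower-case, then drop every character that is not a
    # word character (letter, digit or underscore)
    return re.sub(r'[^\w]', '', key_word.strip().lower())


print(clean_input(' #Street Food '))
# streetfood
```

Hashtags cannot contain spaces or punctuation, so removing them before building the explore URL avoids requesting a page that does not exist.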

Now that we have downloaded the picture that we want to re-post and have built a caption, we need the script/code to upload it to Instagram.

Autogram:

The Autogram class contains all the code required for automatically loading Instagram, logging in and uploading the picture. The Autogram class code can be found here. I did not write this code, so I have decided not to go over it. However, as mentioned above, the whole code is available here.

The following code opens a new Chrome window and logs into Instagram.

ig = Autogram('your-instagram-username', 'your-instagram-password')
ig.open_instagram()
ig.login()
ig.popup_close_save_login_info()
ig.popup_close_turn_on_notifications()
ig.popup_close_add_to_home_screen()

I usually like to watch it load and log in. Instagram can be unpredictable about the pop-ups it shows on login, so the script might miss an unexpected pop-up and you might have to make one or two clicks manually.

Once it is logged in, you can execute the following script to upload the picture.

ig.upload_image(os.path.normpath(r'full-path-to-folder\download.jpg'), description=make_post(key_word))
ig.popup_close_turn_on_notifications()

Improvements:

There is definitely a lot that can be improved in this script. The first thing that comes to mind is that we currently use a fixed path and file name (download.jpg) for the file that is downloaded and then uploaded. To make the script more universal, the path can be determined programmatically and then re-used during the upload.
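
One way this could look is sketched below. The function names build_image_path and save_image are hypothetical, as is the choice to name the file after the post's shortcode. The demo writes fake bytes to a temporary folder instead of downloading anything.

```python
import os
import tempfile


def build_image_path(folder, shortcode):
    """Return an absolute path of the form <folder>/<shortcode>.jpg."""
    return os.path.abspath(os.path.join(folder, shortcode + '.jpg'))


def save_image(image_bytes, folder, shortcode):
    """Write the bytes to disk and return the path so the upload step can re-use it."""
    path = build_image_path(folder, shortcode)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, 'wb') as f:
        f.write(image_bytes)
    return path


# Demo with placeholder bytes in a temporary folder
demo_folder = tempfile.mkdtemp()
saved_path = save_image(b'not-a-real-image', demo_folder, 'abc123')
print(saved_path)
```

In the tutorial's flow, the bytes would come from requests.get(url).content, and the returned path would be passed to ig.upload_image in place of the hard-coded one.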

Another improvement would be to fully automate the Selenium driver so that it closes all the pop-ups that Instagram throws at you. If that can be achieved, the whole script can run "headless", meaning you will not see the Chrome window loading Instagram and everything will happen in the background. However, I do enjoy watching Chrome go on its own.
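
For reference, a headless Chrome driver is usually configured through Selenium's Options object, roughly as sketched below. This is an untested outline rather than a drop-in change: the helper names are hypothetical, it is not known whether the Autogram class accepts a pre-built driver, and the autoit import suggests the upload step drives a native file dialog, which may not work without a visible window.

```python
def headless_arguments(width=1280, height=900):
    """Command-line switches commonly passed to Chrome for a headless run."""
    return ['--headless', '--window-size=%d,%d' % (width, height)]


def build_headless_driver():
    """Create a Chrome driver that runs without opening a visible window."""
    # Imported here so headless_arguments() can be used without Selenium installed
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for argument in headless_arguments():
        options.add_argument(argument)
    return webdriver.Chrome(options=options)


print(headless_arguments())
```

Setting an explicit window size is a common precaution, because pages can render a different layout when the headless window falls back to a small default size.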
