Faking and rotating User Agents in Python

What is a User-Agent?

Before we start, note that we will be using Python 3; on earlier versions the code might differ.

A User-Agent is a string that your browser sends to a website to identify itself. It usually contains the application type, the software name and version, and the operating system. Websites use this data to display content correctly and optimize performance.
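
A quick way to see what your own client sends is to request https://httpbin.org/user-agent, which simply echoes the header back. A minimal sketch, assuming the Requests library is installed:

import requests

# httpbin.org/user-agent echoes back the User-Agent header it received
response = requests.get('https://httpbin.org/user-agent')
# with no custom header this prints the library default,
# something like {'user-agent': 'python-requests/2.x.x'}
print(response.json())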

So why change it?

If you are reading this, you probably know that a scraper that does not account for its user agent gets blocked very quickly. Bypassing this problem is not complicated once you know how to rotate your user agents.

How to set your user agent?

There are two commonly used ways:
1. Python Requests:

import requests

url = 'YOUR_DESIRED_URL'
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0'

# pass the User-Agent through the headers argument
headers = {'User-Agent': user_agent}
response = requests.get(url, headers=headers)
html = response.content
print(html)

2. Urllib:

import urllib.request

url = 'https://httpbin.org/user-agent'
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0'

# build a Request object that carries the custom User-Agent header
request = urllib.request.Request(url, headers={'User-Agent': user_agent})
response = urllib.request.urlopen(request)
html = response.read()
print(html)

Before rotating

There are still a few very important things to keep in mind.

First, it is basically mandatory nowadays to keep your user agents up to date. New browser versions are released frequently, and outdated versions can flag your scraper to the website.

Second, while rotating User-Agents helps your scraper avoid getting blocked, it is almost never enough on its own, and that’s where our proxy list comes in handy. With both REST and GraphQL APIs, it is easy to integrate into any code. Sending each request with a different User-Agent plus a different IP address provides maximum protection for your scraper and makes it “invisible” to most webpages.
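
As a rough sketch of how the two can be combined, Requests lets you send each request through a proxy via the proxies argument while also setting a rotated User-Agent. The proxy address below is a placeholder, not a real endpoint:

import requests
import random

url = 'YOUR_DESIRED_URL'
user_agent_list = [USER-AGENT-LIST]            # see the list in the next section
proxy_list = ['http://PROXY_HOST:PROXY_PORT']  # placeholder proxy addresses

# pick a different identity for every request
user_agent = random.choice(user_agent_list)
proxy = random.choice(proxy_list)

headers = {'User-Agent': user_agent}
proxies = {'http': proxy, 'https': proxy}

# the request goes out with both a rotated User-Agent and a different IP
response = requests.get(url, headers=headers, proxies=proxies)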

User-Agent list

For successful rotation you will need not only a proxy list, but also a good User-Agent list. If you are feeling lazy, the Python library "fake-useragent" might come in handy (see the sketch after the list below); nevertheless, we recommend creating the list yourself and keeping it updated. Here is something to start with:

user_agent_list = [
###Chrome
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36',
    'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36',
    'Mozilla/5.0 (Windows NT 5.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36',
    'Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36',
    'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36',
    'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36',
    'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36',
###Internet Explorer
    'Mozilla/4.0 (compatible; MSIE 9.0; Windows NT 6.1)',
    'Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko',
    'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)',
    'Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko',
    'Mozilla/5.0 (Windows NT 6.2; WOW64; Trident/7.0; rv:11.0) like Gecko',
    'Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko',
    'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)',
    'Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko',
    'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)',
    'Mozilla/5.0 (Windows NT 6.1; Win64; x64; Trident/7.0; rv:11.0) like Gecko',
    'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)',
    'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)',
    'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)'
]
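
If you would rather use the "fake-useragent" library mentioned above instead of maintaining your own list, a minimal sketch looks roughly like this (install it with pip; attribute names may vary slightly between versions):

from fake_useragent import UserAgent
import requests

ua = UserAgent()

# ua.random returns a randomly chosen, reasonably current User-Agent string
headers = {'User-Agent': ua.random}
response = requests.get('https://httpbin.org/user-agent', headers=headers)
print(response.json())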

Rotating using Requests

import requests
import random

url = 'YOUR_DESIRED_URL'
user_agent_list = [USER-AGENT-LIST]  # paste the list from above

for i in range(PLEASE_SPECIFY_YOUR_RANGE):
    # pick a random User-Agent for every request
    user_agent = random.choice(user_agent_list)
    headers = {'User-Agent': user_agent}
    response = requests.get(url, headers=headers)
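
One way to check that the rotation actually works is to point the same loop at https://httpbin.org/user-agent, which echoes the header back. A small sketch using two entries from the list above:

import requests
import random

# a couple of entries from the list above, just for the demonstration
user_agent_list = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko',
]

for i in range(5):
    user_agent = random.choice(user_agent_list)
    headers = {'User-Agent': user_agent}
    response = requests.get('https://httpbin.org/user-agent', headers=headers)
    # each iteration should print whichever User-Agent was chosen for that request
    print(response.json())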

Rotating using Urllib

import urllib.request
import random

url = 'YOUR_DESIRED_URL'
user_agent_list = [USER-AGENT-LIST]  # paste the list from above

for i in range(PLEASE_SPECIFY_YOUR_RANGE):
    # pick a random User-Agent for every request
    user_agent = random.choice(user_agent_list)
    headers = {'User-Agent': user_agent}
    request = urllib.request.Request(url, headers=headers)
    response = urllib.request.urlopen(request)
    html = response.read()