How to Extract Google Ad Results Using Python?

X-Byte Enterprise Crawling

Nov 30, 2021

There are two kinds of ad results, shopping ads and standard website ads, and each has a different layout.

Logic:

Import the required libraries.
Add a user-agent to imitate a real user's visit.
Enter the search query.
Get the HTML response.
Parse the HTML code.
Find the containers that hold the data you want to extract.
Iterate over them until none are left.
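The last three steps (parsing the HTML, locating the containers, and iterating over them) can be tried offline before sending any request to Google. The sketch below runs them on a small hand-written HTML string. The markup, the `ad-title` class, and the `extract_titles` helper are hypothetical stand-ins for a real results page, whose class names change often.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a saved Google results page.
SAMPLE_HTML = """
<div class="ad-title"><a href="/aclk?id=1">First ad</a></div>
<div class="ad-title"><a href="/aclk?id=2">Second ad</a></div>
<div class="organic">Not an ad</div>
"""


def extract_titles(html, selector):
    """Parse HTML and return the text of every element matching a CSS selector."""
    # 'html.parser' ships with Python, so this sketch works without lxml.
    soup = BeautifulSoup(html, "html.parser")
    # Iterate over every matching container until none are left.
    return [element.get_text(strip=True) for element in soup.select(selector)]


print(extract_titles(SAMPLE_HTML, "div.ad-title"))  # ['First ad', 'Second ad']
```

Because parsing is kept separate from fetching, the same function can be pointed at a live response once the selector has been checked against real markup.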

Google might block your requests if:

It recognizes the request as coming from a script, e.g. through the default python-requests user-agent.
Too many requests come from a single IP address.
The script doesn't behave like a human, which is essentially what all of the above amounts to.

There are several ways to reduce the chance of Google blocking your script:

Use a referer header or python-requests Session objects.
Use custom headers, such as a User-Agent, and rotate through a list of different user agents.
Use headless browsers or browser automation frameworks like Pyppeteer or Selenium.
Use proxies and rotate them.
Use CAPTCHA-solving services.
Add delays between requests to slow them down.
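Several of these techniques (a Session object, a referer header, user-agent rotation, optional proxies, and random delays) can be combined in a few lines of requests code. This is a minimal sketch, not a guarantee against blocking: the `make_session` and `polite_get` helpers, the user-agent strings, and the proxy address are illustrative assumptions, and headless browsers and CAPTCHA services are not covered.

```python
import random
import time

import requests

# Example desktop user agents (assumed; replace with current strings).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3538.102 Safari/537.36 Edge/18.19582",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1 Safari/605.1.15",
]


def make_session(proxies=None):
    """Create a Session that reuses connections and cookies across requests."""
    session = requests.Session()
    session.headers.update({
        "User-Agent": random.choice(USER_AGENTS),
        # The HTTP header name is spelled "Referer".
        "Referer": "https://www.google.com/",
    })
    if proxies:
        # e.g. {"https": "http://proxy.example:8080"} (placeholder address)
        session.proxies.update(proxies)
    return session


def polite_get(session, url, params, min_delay=2.0, max_delay=5.0):
    """Rotate the user agent, wait a random interval, then send the request."""
    session.headers["User-Agent"] = random.choice(USER_AGENTS)
    time.sleep(random.uniform(min_delay, max_delay))
    return session.get(url, params=params, timeout=10)


session = make_session()
print(session.headers["User-Agent"] in USER_AGENTS)  # True
```

A call such as `polite_get(session, "https://www.google.com/search", {"q": "coffee buy"})` then returns a normal `requests.Response`, so its `.text` can be passed to BeautifulSoup as in the scripts below.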

Shopping Ads

```python
import urllib.parse

import requests
from bs4 import BeautifulSoup

# Adding a user-agent (the default user-agent from the requests library is 'python-requests')
# https://github.com/psf/requests/blob/589c4547338b592b1fb77c65663d8aa6fbb7e38b/requests/utils.py#L808-L814
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3538.102 Safari/537.36 Edge/18.19582"
}

# Search query ('q' is passed through params, so it is not repeated in the URL)
params = {'q': 'coffee buy'}

# Getting the HTML response
html = requests.get('https://www.google.com/search', headers=headers, params=params).text

# Parsing the HTML with BeautifulSoup (the lxml parser must be installed)
soup = BeautifulSoup(html, 'lxml')

# Looking for the containers that hold the data (find_all() is the modern name of findAll())
for container in soup.find_all('div', class_='RnJeZd top pla-unit-title'):
    # Scraping the title
    title = container.text
    # Beginning of the link to join afterwards
    start_of_link = 'https://www.googleadservices.com/pagead'
    # Scraping the end of the link (skip containers without a link)
    anchor = container.find('a')
    if anchor is None:
        continue
    end_of_link = anchor['href']
    # Joining the relative href with the absolute base URL
    ad_link = urllib.parse.urljoin(start_of_link, end_of_link)
    # Printing each title and link on a new line
    print(f'{title}\n{ad_link}\n')

# Output
# Jot Ultra Coffee Triple | Ultra Concentrated
# https://www.googleadservices.com/aclk?sa=l&ai=DChcSEwiP0dmfvcbwAhX48OMHHYyRBuoYABABGgJ5bQ&sig=AOD64_0x-PlrWek-JFlDTSo7E9Z7YhUOjg&ctype=5&q=&ved=2ahUKEwjhr9GfvcbwAhXHQs0KHQCbCAUQww96BAgCED4&adurl=
#
# MUD\WTR | A Healthier Coffee Alternative, 30 servings
# https://www.googleadservices.com/aclk?sa=l&ai=DChcSEwiP0dmfvcbwAhX48OMHHYyRBuoYABAJGgJ5bQ&sig=AOD64_3gltZJ6kPrxic5o8yUO5cuJrHXnw&ctype=5&q=&ved=2ahUKEwjhr9GfvcbwAhXHQs0KHQCbCAUQww96BAgCEEg&adurl=
#
# Jot Ultra Coffee Double | 2 bottles = 28 cups
# https://www.googleadservices.com/aclk?sa=l&ai=DChcSEwiP0dmfvcbwAhX48OMHHYyRBuoYABAHGgJ5bQ&sig=AOD64_3hD0JWZSLr8NUgoTW5K0HMzdFvng&ctype=5&q=&ved=2ahUKEwjhr9GfvcbwAhXHQs0KHQCbCAUQww96BAgCEE4&adurl=
```
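The shopping-ads script relies on `urllib.parse.urljoin` to turn each ad's `href` into a full URL. This short check uses made-up hrefs to show why the printed links start with `https://www.googleadservices.com/aclk` and not `.../pagead/aclk`:

```python
from urllib.parse import urljoin

base = "https://www.googleadservices.com/pagead"

# An href starting with "/" replaces the whole path of the base URL.
print(urljoin(base, "/aclk?sa=l&adurl="))
# → https://www.googleadservices.com/aclk?sa=l&adurl=

# A relative href replaces only the last path segment ("pagead").
print(urljoin(base, "aclk?sa=l&adurl="))
# → https://www.googleadservices.com/aclk?sa=l&adurl=

# An absolute href is returned unchanged.
print(urljoin(base, "https://example.com/landing"))
# → https://example.com/landing
```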

Note: Sometimes the script returns no results because Google didn't show any ads for that request. Just run it again.
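Instead of rerunning the script by hand, the scraping code can be wrapped in a function and retried a few times with a pause in between. In this sketch, `retry_until_results` is a hypothetical helper, and the `scrape` argument stands for any function that returns a list of results, such as the shopping-ads loop above refactored to collect its titles and links into a list.

```python
import time


def retry_until_results(scrape, attempts=3, delay=5.0):
    """Call scrape() until it returns a non-empty list or attempts run out."""
    for attempt in range(1, attempts + 1):
        results = scrape()
        if results:
            return results
        if attempt < attempts:
            # Wait before retrying so the retries are not sent in a burst.
            time.sleep(delay)
    return []


# Demo with a stand-in scraper that finds ads only on the third call.
responses = iter([[], [], ["Jot Ultra Coffee Triple"]])
print(retry_until_results(lambda: next(responses), delay=0))  # ['Jot Ultra Coffee Triple']
```

Keeping the delay between retries also follows the earlier advice about slowing requests down.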

Standard Website Ads

```python
import requests
from bs4 import BeautifulSoup

# Adding a user-agent to imitate a real user's visit
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3538.102 Safari/537.36 Edge/18.19582"
}

# Search query ('q' is passed through params, so it is not repeated in the URL)
params = {'q': 'coffee buy'}

# HTML response
html = requests.get('https://www.google.com/search', headers=headers, params=params).text

# Parsing the HTML with BeautifulSoup (the lxml parser must be installed)
soup = BeautifulSoup(html, 'lxml')

# Looking for the containers that hold the needed data and iterating over them
for container in soup.find_all('span', class_='Zu0yb LWAWHf qzEoUe'):
    # Using .text because this 'span' contains nothing but the displayed link
    ad_link = container.text
    # Printing links
    print(ad_link)

# Output
# https://www.coffeeam.com/
# https://www.sfbaycoffee.com/
# https://www.onyxcoffeelab.com/
# https://www.enjoybettercoffee.com/
# https://www.klatchroasting.com/
# https://www.pachamamacoffee.com/
# https://www.bulletproof.com/
```

Use Google Ads Results API

Alternatively, you can get the same data from X-Byte's Google Ad Results API. The difference is that you don't have to solve CAPTCHAs when you send many requests or find proxies yourself, which reduces development complexity and makes the data easier to work with.

This is a paid API.

Code to integrate:

```python
import os

from serpapi import GoogleSearch

params = {
    "engine": "google",
    "q": "kitchen table",
    "api_key": os.getenv("API_KEY"),
    "no_cache": "true"  # add this param if it throws an error
}

search = GoogleSearch(params)
results = search.get_dict()

# .get() avoids a KeyError when no ads were shown for the query
for ad in results.get('ads', []):  # shopping ads -> ['shopping_results']
    ad_link = ad['tracking_link']  # shopping ads -> ['link']
    print(ad_link)

# Output for regular ads
# https://www.google.com/aclk?sa=l&ai=DChcSEwje1bnojtHwAhWRhMgKHY0kC1oYABAPGgJxdQ&ae=2&sig=AOD64_2ZH32FlwxW1XqO9V49i2L8J5qy2A&q&adurl
# https://www.google.com/aclk?sa=l&ai=DChcSEwje1bnojtHwAhWRhMgKHY0kC1oYABAMGgJxdQ&ae=2&sig=AOD64_2l1PVJAqbVmrcu8UpkGPVk-VK3UA&q&adurl
# https://www.google.com/aclk?sa=l&ai=DChcSEwje1bnojtHwAhWRhMgKHY0kC1oYABAQGgJxdQ&sig=AOD64_2DDuyRZUcFi04jfneAzwnOQBuLtw&q&adurl

# Output for shopping ads
# https://www.google.com/aclk?sa=l&ai=DChcSEwijuI27jtHwAhVA5uMHHUUWAWkYABAEGgJ5bQ&ae=2&sig=AOD64_2zCyytR6tDeB3BjdOX5sFQQKwOAA&ctype=5&q=&ved=2ahUKEwjh9oO7jtHwAhUId6wKHa8mByUQ5bgDegQIARA8&adurl=
# https://www.google.com/aclk?sa=l&ai=DChcSEwijuI27jtHwAhVA5uMHHUUWAWkYABAFGgJ5bQ&ae=2&sig=AOD64_2HeGVTNF91vkSHjg-wRDtC1ouATw&ctype=5&q=&ved=2ahUKEwjh9oO7jtHwAhUId6wKHa8mByUQ5bgDegQIARBI&adurl=
# https://www.google.com/aclk?sa=l&ai=DChcSEwijuI27jtHwAhVA5uMHHUUWAWkYABAGGgJ5bQ&ae=2&sig=AOD64_1n4ztvwQxiSMInwgntgY-WyVc2eQ&ctype=5&q=&ved=2ahUKEwjh9oO7jtHwAhUId6wKHa8mByUQ5bgDegQIARBY&adurl=
```
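The integration code handles one ad type at a time, and its comments say to switch keys for shopping ads. Both types can be collected in a single pass. This sketch assumes the key names given in those comments (`ads` with `tracking_link`, and `shopping_results` with `link`); `extract_ad_links` is a hypothetical helper, and the sample dictionary below is made up to stand in for the output of `get_dict()`.

```python
def extract_ad_links(results):
    """Collect ad links from a results dict shaped like get_dict() output."""
    links = []
    # Regular text ads: results['ads'], each with a 'tracking_link'.
    for ad in results.get("ads", []):
        links.append(ad["tracking_link"])
    # Shopping ads: results['shopping_results'], each with a 'link'.
    for ad in results.get("shopping_results", []):
        links.append(ad["link"])
    return links


# Made-up sample data standing in for search.get_dict()
sample = {
    "ads": [{"tracking_link": "https://www.google.com/aclk?ad=1"}],
    "shopping_results": [{"link": "https://www.google.com/aclk?shop=1"}],
}
print(extract_ad_links(sample))
# → ['https://www.google.com/aclk?ad=1', 'https://www.google.com/aclk?shop=1']
```

Using `.get()` with an empty-list default means a query that returned no ads simply yields an empty list.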

If you have any questions, if something isn't working properly, or if you need other code written, feel free to contact X-Byte Enterprise Crawling or ask for a free quote!

Originally published at https://www.xbyte.io.
