How is Web Scraping Used to Extract MercadoLibre Data Using Selenium?

X-Byte Enterprise Crawling
4 min read · Dec 20, 2021

This blog will show you how to use Selenium, a powerful library whose main feature is the ability to automate interaction with any website. Because most websites nowadays are dynamic, their content does not load in a flat, static order.

Learn the following steps:

  • How to acquire the automated ChromeDriver
  • How to import the required libraries from Selenium
  • The different modes for finding elements in Selenium
  • The different options for retrieving the HTML paths for each element
  • How to use it all on the MercadoLibre website

First, make sure to download the automated web driver from the following page.

https://chromedriver.chromium.org/downloads

After downloading the ChromeDriver, you can create a new notebook in Jupyter and import the relevant libraries and settings:

!pip install selenium --upgrade

import jovian
import pandas as pd
import random
from time import sleep
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException

The last four imports are crucial: they allow us to wait for elements to load and to handle missing elements gracefully, avoiding errors while retrieving information and managing timing.
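As a quick illustration of the timing side, the helper below (my own addition, not from the original article) wraps the same random-delay pattern used throughout the tutorial, pausing a random interval between actions to mimic human browsing:

```python
import random
from time import sleep

def polite_pause(low=2.0, high=3.0):
    """Sleep for a random interval between `low` and `high` seconds."""
    delay = random.uniform(low, high)
    sleep(delay)
    return delay

# Tiny bounds here just for demonstration; the article uses 2.0–3.0 seconds
pause = polite_pause(0.01, 0.02)
print(0.01 <= pause <= 0.02)  # True
```

Randomizing the delay (rather than sleeping a fixed amount) makes the request pattern look less robotic and is gentler on the target site.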

Inspecting the Website

Ways of Getting Elements

find_element_by_xpath
find_element_by_css_selector
find_element_by_name
find_element_by_id
find_element_by_class_name
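To make the XPath idea concrete without needing a browser, here is a small illustration using Python's standard-library xml.etree (the HTML fragment is invented for the example; Selenium's find_element_by_xpath applies the same kind of expression to the live page):

```python
import xml.etree.ElementTree as ET

# A tiny invented fragment mimicking MercadoLibre's category menu
html = """
<div>
  <p>Celulares</p>
  <p>Computación</p>
  <span id="newCookieDisclaimerButton">Aceptar</span>
</div>
"""
root = ET.fromstring(html)

# Same idea as driver.find_element_by_xpath('//p[text()="Computación"]')
node = root.find('.//p[.="Computación"]')
print(node.text)  # Computación

# Selecting by id attribute, comparable to the CSS selector "#newCookieDisclaimerButton"
button = root.find('.//span[@id="newCookieDisclaimerButton"]')
print(button.text)  # Aceptar
```

ElementTree supports only a subset of XPath, but it is enough to show how predicates like `[.="…"]` and `[@id="…"]` narrow a search down to one element.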

Follow the step-by-step guide below to get the required information.

# To use Selenium we must call the driver; it is also recommended to accept the cookies
driver = webdriver.Chrome('/Users/nicolasbenavides/Downloads/chromedriver')
driver.get('https://www.mercadolibre.com.co/')
sleep(random.uniform(2.0, 3.0))
WebDriverWait(driver, 20).until(
    EC.element_to_be_clickable((By.CSS_SELECTOR, "button#newCookieDisclaimerButton"))
).click()
  • It is critical to locate the downloaded driver and copy its path, as shown above.
  • As you may be aware, most websites set cookies, and we must accept the cookie disclaimer that appears for us.
# To use Selenium we must call the driver; it is also recommended to accept the cookies
driver = webdriver.Chrome('/Users/nicolasbenavides/Downloads/chromedriver')
driver.get('https://www.mercadolibre.com.co/')
sleep(random.uniform(2.0, 3.0))
WebDriverWait(driver, 20).until(
    EC.element_to_be_clickable((By.CSS_SELECTOR, "button#newCookieDisclaimerButton"))
).click()

# Click on the "Computación" category
driver.find_element_by_xpath('//p[text()="Computación"]').click()
sleep(random.uniform(2.0, 3.0))

Let’s get the laptop information. To do so, we’ll look for the text label in the Computación box.

# To use Selenium we must call the driver; it is also recommended to accept the cookies
driver = webdriver.Chrome('/Users/nicolasbenavides/Downloads/chromedriver')
driver.get('https://www.mercadolibre.com.co/')
sleep(random.uniform(2.0, 3.0))
WebDriverWait(driver, 20).until(
    EC.element_to_be_clickable((By.CSS_SELECTOR, "button#newCookieDisclaimerButton"))
).click()

# Click on PC's
driver.find_element_by_xpath('//p[text()="Computación"]').click()
sleep(random.uniform(2.0, 3.0))

# Click on the "Portátiles" (laptops) box
driver.find_element_by_xpath('//h3[text()="Portátiles"]').click()

# Create the list where the chosen elements will be stored and define the next-click button
main_list = []
boton = driver.find_element_by_xpath('//span[text()="Siguiente"]')

In the last two lines of code, we define the list that will store the information, as well as the button to click for the next page.

# In this step we create a loop to access the information depending on the boxes' content and their class link
for i in range(3):
    main_box = driver.find_elements_by_xpath('//div[@class="ui-search-result__content-wrapper"]')
    for pc in main_box:
        precio = pc.find_element_by_xpath('.//span[@class="price-tag-fraction"]').get_attribute("innerHTML")
        descripcion = pc.find_element_by_xpath('.//h2[@class="ui-search-item__title"]').get_attribute("innerHTML")
        try:
            vat_tag = pc.find_element_by_xpath('.//span[@class="ui-search-styled-label ui-search-item__highlight-label__text"]').get_attribute("innerHTML")
            if vat_tag == 'MÁS VENDIDO':
                # "MÁS VENDIDO" is a best-seller badge, not a VAT label, so discard it
                vat_tag = None
        except NoSuchElementException:
            vat_tag = None
        main_list.append({'Product Description': descripcion, 'Price': precio, 'VAT exclusion': vat_tag})
    if i < 2:
        # Advance to the next results page and re-find the "Siguiente" button
        boton.click()
        sleep(random.uniform(2.0, 3.0))
        boton = driver.find_element_by_xpath('//span[text()="Siguiente"]')

This block of code does the following:

It sets up a loop over a range of 3 pages to get the price and description of each product in the intended search.

It determines whether the product is exempt from Value-added Tax (VAT).

To obtain all of the required information, the XPath must be written by hand; simply copying the XPath from the browser's inspector is not enough.

Without catching NoSuchElementException, the loop would crash on products that carry no VAT label, and the information could not be retrieved.
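One detail worth noting: the precio values scraped above are strings, and Colombian listings typically use dots as thousands separators (e.g. "2.599.900"). A small helper (my own addition, not part of the original code) can convert them to integers for later analysis:

```python
def parse_price(raw: str) -> int:
    """Convert a dotted Colombian-format price string like '2.599.900' to an integer."""
    return int(raw.replace(".", ""))

print(parse_price("2.599.900"))  # 2599900
print(parse_price("950"))       # 950
```

You could apply this either inside the scraping loop or afterwards on the collected list.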


Finally, we could use the stored data to create a DataFrame:

df_mercadolibre = pd.DataFrame(main_list)
df_mercadolibre
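With a couple of sample rows (invented here purely for illustration), the same DataFrame step also makes it easy to clean the prices and export the results:

```python
import pandas as pd

# Hypothetical sample of what main_list might contain after scraping
main_list = [
    {'Product Description': 'Portátil Lenovo IdeaPad 3', 'Price': '1.899.900', 'VAT exclusion': None},
    {'Product Description': 'MacBook Air M1', 'Price': '4.599.000', 'VAT exclusion': 'IVA excluido'},
]

df_mercadolibre = pd.DataFrame(main_list)
# Convert the dotted Colombian price strings into integers for analysis
df_mercadolibre['Price'] = df_mercadolibre['Price'].str.replace('.', '', regex=False).astype(int)
df_mercadolibre.to_csv('mercadolibre_laptops.csv', index=False)
print(df_mercadolibre['Price'].max())  # 4599000
```

Writing the cleaned frame to CSV keeps a snapshot of the scrape, so the notebook does not need to hit the site again to redo the analysis.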
  • To begin, we installed and imported the libraries required to run Selenium.
  • Using Selenium, we learned the basic commands to find elements with XPath.
  • We defined waiting times so that the page data could load before scraping.
  • We created loops to collect all of the laptops’ descriptions and prices.
  • The desired data was stored in the list we constructed.
  • Finally, the DataFrame was built with Pandas.

If you are in search of someone who can scrape MercadoLibre data using Selenium, contact X-Byte Enterprise Crawling today or request a quote!

Originally published at https://www.xbyte.io.
