How to Extract Amazon Data Without an API?

X-Byte Enterprise Crawling
5 min read · Sep 23, 2021

We all know that Amazon has its own API, but sometimes it's better not to use it (request limits are a big reason), so let's see how to extract Amazon data without an API!

First, we need to install Python. Python has many web scraping packages, such as Selenium and BeautifulSoup; in this tutorial we will use Selenium.
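Before diving in, it's worth verifying the environment. Here is a rough sketch of a quick check (it assumes the packages were installed with pip as selenium, pandas, and webdriver-manager):

#QUICK ENVIRONMENT CHECK (A SKETCH; ASSUMES "pip install selenium pandas webdriver-manager")
import selenium
import pandas as pd
print(selenium.__version__)
print(pd.__version__)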

Next, let's head over to Amazon's site and look at a product. We'll use a laptop as the example.

You can see that there is a lot of data in the screenshot alone, including the title, ratings, price, and more. Let's say we want to extract the title, price, and rating; here is how we will scrape them.

Starting with the Python environment, we want to import the Selenium package, the pandas package, and a webdriver manager for the browser. We do that with the following lines of code:

#IMPORT THESE PACKAGES
import selenium
from selenium import webdriver
import pandas as pd

#OPTIONAL PACKAGE, BUT MAYBE NEEDED
from webdriver_manager.chrome import ChromeDriverManager

Next, we want to install and declare the driver, and point it to the website; the driver is the web browser we are using. To do that, use this code:

#THIS INITIALIZES THE DRIVER (AKA THE WEB BROWSER)
driver = webdriver.Chrome(ChromeDriverManager().install())

#THIS PRETTY MUCH TELLS THE WEB BROWSER WHICH WEBSITE TO GO TO
driver.get('https://www.amazon.com/Acer-Display-Graphics-Keyboard-A515-43-R19L/dp/B07RF1XD36/ref=sr_1_3?dchild=1&keywords=laptop&qid=1618857971&sr=8-3')
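As a side note (not part of the original walkthrough), if you don't want a visible browser window to open, Chrome can be started in headless mode. This is a rough sketch that assumes the same Selenium 3-style setup as above:

#OPTIONAL: RUN CHROME WITHOUT A VISIBLE WINDOW (HEADLESS MODE)
options = webdriver.ChromeOptions()
options.add_argument('--headless')
driver = webdriver.Chrome(ChromeDriverManager().install(), options=options)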

The driver.get function tells the browser which website we want to go to. After that, we want to declare variables, such as Title and Price, that will hold the text values scraped from the website (this will make sense in a second). We then use the Selenium function driver.find_element_by_xpath() to get the text from the website and store it in those variables. Here is how we do that:

#TITLE OF PRODUCT
Title = driver.find_element_by_xpath('PASTE THE FULL XPATH HERE').text

#PRICE OF PRODUCT
Price = driver.find_element_by_xpath('PASTE THE FULL XPATH HERE').text

#NUMBER OF RATINGS
Rating = driver.find_element_by_xpath('PASTE THE FULL XPATH HERE').text
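One caveat worth flagging: find_element_by_xpath() works in Selenium 3, but it is deprecated in Selenium 4 and removed in later 4.x releases. If you have a newer Selenium installed, the equivalent call looks like this (a small sketch, with the same placeholder XPath as above):

#SELENIUM 4+ EQUIVALENT OF find_element_by_xpath
from selenium.webdriver.common.by import By
Title = driver.find_element(By.XPATH, 'PASTE THE FULL XPATH HERE').text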

After that, we want to get the full XPath. To do that, open the web browser, go to the product page, right-click on the title text and click Inspect > look at the highlighted portion of the inspector console > right-click on it and click Copy > then click Copy full XPath, as shown in the image below:

Then, we want to paste the XPath between the quotes inside the Title variable above. That variable will look something like this:

Title = driver.find_element_by_xpath('/html/body/div[2]/div[3]/div[9]/div[4]/div[4]/div[1]/div/h1/span').text

Wonderful! Now let's do the same thing for the price: right-click on the price text, click Inspect > look at the highlighted part in the inspector console > right-click on it and click Copy > then click Copy full XPath, as shown in the image below:

After that, paste it into the Price variable given above. The Price variable will look like this:

Price = driver.find_element_by_xpath('/html/body/div[2]/div[3]/div[9]/div[4]/div[4]/div[10]/div[1]/div/table/tbody/tr/td[2]/span[1]').text

Finally, let's do the same for the number of ratings: right-click on the rating text, click Inspect > look at the highlighted part in the inspector console > right-click on it and click Copy > then click Copy full XPath, as shown in the image below:

Rating = driver.find_element_by_xpath('/html/body/div[2]/div[3]/div[8]/div[4]/div[4]/div[3]/div/span[3]/a/span').text

Great! Now we just need to create an empty pandas DataFrame with those variable names as columns, so we can append our scraped values to it. To do this, use the following code:

#CREATES AN EMPTY DATAFRAME
data1 = {'Title':[], 'Price':[], 'Rating':[]}
fulldf = pd.DataFrame(data1)

Nearly done! Now we gather the scraped values into a row and append that row to the pandas DataFrame. Use the following lines of code to do that:

#APPENDING THE DATA PULLED FROM ABOVE INTO THE EXISTING DATAFRAME
row = [Title, Price, Rating]
fulldf.loc[len(fulldf)] = row
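If you later want more than one product, the same pattern can be looped over a list of product URLs, appending one row per page. This is just a sketch; the URLs and XPaths below are placeholders, not real pages:

#SKETCH: SCRAPE SEVERAL PRODUCT PAGES (PLACEHOLDER URLS AND XPATHS)
urls = [
    'https://www.amazon.com/dp/PLACEHOLDER1',
    'https://www.amazon.com/dp/PLACEHOLDER2',
]
for url in urls:
    driver.get(url)
    title = driver.find_element_by_xpath('PASTE TITLE XPATH HERE').text
    price = driver.find_element_by_xpath('PASTE PRICE XPATH HERE').text
    rating = driver.find_element_by_xpath('PASTE RATING XPATH HERE').text
    fulldf.loc[len(fulldf)] = [title, price, rating]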

Amazing! That is all of the code (with a few extras) we have developed in this project:

#IMPORT THESE PACKAGES
import selenium
from selenium import webdriver
import pandas as pd

#OPTIONAL PACKAGE, BUT MAYBE NEEDED
from webdriver_manager.chrome import ChromeDriverManager

#THIS INITIALIZES THE DRIVER (AKA THE WEB BROWSER)
driver = webdriver.Chrome(ChromeDriverManager().install())

#THIS PRETTY MUCH TELLS THE WEB BROWSER WHICH WEBSITE TO GO TO
driver.get('https://www.amazon.com/Acer-Display-Graphics-Keyboard-A515-43-R19L/dp/B07RF1XD36/ref=sr_1_3?dchild=1&keywords=laptop&qid=1618857971&sr=8-3')

#TITLE OF PRODUCT
Title = driver.find_element_by_xpath('/html/body/div[2]/div[3]/div[9]/div[4]/div[4]/div[1]/div/h1/span').text

#PRICE OF PRODUCT
Price = driver.find_element_by_xpath('/html/body/div[2]/div[3]/div[9]/div[4]/div[4]/div[10]/div[1]/div/table/tbody/tr/td[2]/span[1]').text

#NUMBER OF RATINGS
Rating = driver.find_element_by_xpath('/html/body/div[2]/div[3]/div[8]/div[4]/div[4]/div[3]/div/span[3]/a/span').text

#PRINTS OUT THE DATA PULLED FROM ABOVE
print(Title)
print(Price)
print(Rating)

#CREATES AN EMPTY DATAFRAME
data1 = {'Title':[], 'Price':[], 'Rating':[]}
fulldf = pd.DataFrame(data1)

#APPENDING THE DATA PULLED FROM ABOVE INTO THE EXISTING DATAFRAME
row = [Title, Price, Rating]
fulldf.loc[len(fulldf)] = row
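One practical refinement that is not in the original code: Amazon pages can take a moment to render, so an explicit wait before reading an element is often more reliable than grabbing it immediately. Here is a minimal sketch, assuming the same title XPath as above and a 10-second timeout:

#SKETCH: WAIT UP TO 10 SECONDS FOR THE TITLE ELEMENT TO APPEAR
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 10)
title_element = wait.until(EC.presence_of_element_located(
    (By.XPATH, '/html/body/div[2]/div[3]/div[9]/div[4]/div[4]/div[1]/div/h1/span')))
print(title_element.text)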

There are two main ways to run this program: run it as a .py file from the command prompt or terminal, or run it line by line. Either way, when you run the program you will see a Chrome browser pop up on the screen and navigate to the Amazon product page, and the title, price, and rating will print in the Python console!
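As an optional follow-up (not covered in the original steps), you can save the collected rows to a CSV file and close the browser when you are done; the file name here is just an arbitrary choice:

#SAVE THE DATAFRAME TO A CSV FILE AND CLOSE THE BROWSER
fulldf.to_csv('amazon_products.csv', index=False)
driver.quit()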

Congratulations! You have extracted data from Amazon! You can look into different ways to improve the project, such as building a front end with Streamlit. We hope you have enjoyed reading this blog! If you have any thoughts, suggestions, or comments, just write them in the comments section below. You can also contact us for all your Amazon data scraping requirements or ask for a free quote!

Originally published at https://www.xbyte.io.

