How to Create a Dataset for Sentiment Analysis by Web Scraping Google Play App Reviews

X-Byte Enterprise Crawling
10 min read · Sep 15, 2021

Let us learn to create a dataset for Sentiment Analysis by scraping reviews and ratings for Android applications. You will convert the application and review information into Data Frames and save the content to CSV files.

Running the code in a notebook (Google Colab)

Installing necessary packages and setting up the imports

# Notebook cell: install the scraper package (run once)
!pip install -q google-play-scraper

import json
import pandas as pd
from tqdm import tqdm

import seaborn as sns
import matplotlib.pyplot as plt

from pygments import highlight
from pygments.lexers import JsonLexer
from pygments.formatters import TerminalFormatter

from google_play_scraper import Sort, reviews, app

# Notebook-only settings: inline, high-resolution plots
%matplotlib inline
%config InlineBackend.figure_format='retina'

sns.set(style='whitegrid', palette='muted', font_scale=1.2)

You want to hear what others think about your app, and both positive and negative feedback are useful. Negative reviews in particular can expose crucial features that are missing or service outages, especially when the same complaint shows up frequently.

Fortunately for us, Google Play provides a wide array of apps, ratings, and reviews. By using the google-play-scraper package, we can scrape app information and reviews.

You have a lot of options when it comes to apps to evaluate. Various app categories, on the other hand, have different target audiences, domain-specific peculiarities, and so on. Let’s begin with the basics.

We want applications that have been around for a while, so that feedback has accumulated organically. We also want to limit the influence of advertising campaigns on the reviews as much as is feasible. Because apps are continually updated, the date of each review is critical.

In an ideal world, you’d gather every conceivable review and work with it. However, data is frequently limited in the real world (too large, inaccessible, etc.). As a result, we’ll do our best.

Let’s pick several apps from the Productivity category that satisfy these criteria. We’ll use AppAnnie to select some of the most popular apps in the United States:

app_packages = [
    'com.anydo',
    'com.todoist',
    'com.ticktick.task',
    'com.habitrpg.android.habitica',
    'cc.forestapp',
    'com.oristats.habitbull',
    'com.levor.liferpgtasks',
    'com.habitnow',
    'com.microsoft.todos',
    'prox.lab.calclock',
    'com.gmail.jmartindev.timetune',
    'com.artfulagenda.app',
    'com.tasks.android',
    'com.appgenix.bizcal',
    'com.appxy.planner'
]

Let us scrape the information for every app.

app_infos = []

for ap in tqdm(app_packages):
    info = app(ap, lang='en', country='us')
    # drop the 'comments' entry; reviews are scraped separately below
    del info['comments']
    app_infos.append(info)
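If one package ID is wrong or an app has been removed from the store, the loop above stops with an exception. A minimal sketch of a more forgiving loop follows. The names collect_infos and fake_fetch are hypothetical, and the fetch function is passed in as a parameter so that the example runs without network access; in the notebook you would pass a small wrapper around app(...) instead.

```python
def collect_infos(package_ids, fetch):
    """Call fetch(package_id) for each ID; return (results, failed_ids)."""
    results, failed = [], []
    for package_id in package_ids:
        try:
            results.append(fetch(package_id))
        except Exception as exc:  # broad on purpose: the exact error type is library-specific
            print(f"Skipping {package_id}: {exc}")
            failed.append(package_id)
    return results, failed


# Stand-in for the real scraper call (hypothetical, for illustration only).
def fake_fetch(package_id):
    if package_id == 'com.example.missing':
        raise ValueError('app not found')
    return {'appId': package_id}


infos, failed = collect_infos(['com.anydo', 'com.example.missing'], fake_fetch)
print(len(infos), failed)
```

With the real scraper, fetch could be something like lambda ap: app(ap, lang='en', country='us'), and the failed list tells you which packages to re-check by hand.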

We were able to obtain information for all 15 apps. Let’s develop a helper function to improve the printing of JSON objects:

def print_json(json_object):
    json_str = json.dumps(
        json_object,
        indent=2,
        sort_keys=True,
        default=str
    )
    print(highlight(json_str, JsonLexer(), TerminalFormatter()))

Here’s an example of app data from the list:

print_json(app_infos[0])

{
  "adSupported": null,
  "androidVersion": "Varies",
  "androidVersionText": "Varies with device",
  "appId": "com.anydo",
  "containsAds": null,
  "contentRating": "Everyone",
  "contentRatingDescription": null,
  "currency": "USD",
  "description": "\ud83c\udfc6 Editor's Choice by Google \r\n\r\nAny.do is a To Do List, Calendar, Planner, Tasks & Reminders App That Helps Over 25M People Stay Organized and Get More Done. \r\n\r\n\ud83e\udd47 \"It\u2019s A MUST HAVE PLANNER & TO DO LIST APP\" (NYTimes, USA TODAY, WSJ & Lifehacker). \r\n\r\nAny.do is a free to-do list, planner & calendar app for managing and organizing your daily tasks, to-do lists, notes, reminders, checklists, calendar events, grocery lists and more. \r\n\r\n\ud83d\udcc5 Organize Your Tasks & To-Do List in Seconds \r\n\r\n\u2022 ADVANCED CALENDAR & DAILY PLANNER - Keep your to-do list and calendar events always at hand with our calendar widget. Any.do to-do list & planner support daily calendar view, 3-day Calendar view, Weekly calendar view & agenda view, with built-in reminders. Review and organize your calendar events and to do list side by side. \r\n\r\n\u2022 SYNCS SEAMLESSLY - Keeps all your to do list, tasks, reminders, notes, calendar & agenda always in sync so you\u2019ll never forget a thing. Sync your phone\u2019s calendar, google calendar, Facebook events, outlook calendar or any other calendar so you don\u2019t forget an important event. \r\n\r\n\u2022 SET REMINDERS - One time reminders, recurring reminders, Location reminders & voice reminders. NEW! Easily create tasks and get reminders in WhatsApp. \r\n\r\n\u2022 WORK TOGETHER - Share your to do list and assign tasks with your friends, family & colleagues from your task list to collaborate and get more done. \r\n\r\n--- \r\n\r\nALL-IN-ONE PLANNER & CALENDAR APP FOR GETTING THINGS DONE\r\nCreate and set reminders with voice to your to do list. \r\nFor better task management flow we added a calendar integration to keep your agenda always up to date. \r\nFor better productivity, we added recurring reminders, location reminders, one-time reminder, sub-tasks, notes & file attachments. \r\nTo keep your to do list up to date, we\u2019ve added a daily planner and focus mode. \r\n\r\nINTEGRATIONS\r\nAny.do To do list, Calendar, planner & Reminders Integrates with Google Calendar, Outlook, WhatsApp, Slack, Gmail, Google Tasks, Evernote, Trello, Wunderlist, Todoist, Zapier, Asana, Microsoft to-do, Salesforce, OneNote, Google Assistant, Amazon Alexa, Office 365, Exchange, Jira & More. \r\n\r\nTO DO LIST, CALENDAR, PLANNER & REMINDERS MADE SIMPLE\r\nDesigned to keep you on top of your to do list, tasks and calendar events with no hassle. With intuitive drag and drop of tasks, swiping to mark to-do's as complete, and shaking your device to remove completed from your to do list - you can stay organized and enjoy every minute of it. \r\n\r\nPOWERFUL TO DO LIST TASK MANAGEMENT\r\nAdd a to do list item straight from your email / Gmail / Outlook inbox by forwarding do@Any.do. Attach files from your computer, Dropbox, or Google Drive to your to- tasks. \r\n\r\nDAILY PLANNER & LIFE ORGANIZER\r\nAny.do is a to do list, a calendar, an inbox, a notepad, a checklist, task list, a board for post its or sticky notes, a task & project management tool, a reminder app, a daily planner, a family organizer, an agenda, a bill planner and overall the simplest productivity tool you will ever have. \r\n\r\nSHARE LISTS, ASSIGN & ORGANIZE TASKS\r\nTo plan & organize projects has never been easier. Now you can share lists between family members, assign tasks to each other, chat and much more. Any.do will help you and the people around you stay in-sync and get reminders so that you can focus on what matters, knowing you had a productive day and crossed off your to do list. \r\n\r\nGROCERY LIST & SHOPPING LIST\r\nAny.do task list, calendar, agenda, reminders & planner is also great for shopping lists at the grocery store. Simply create a list on Any.do, share it with your loved ones and see them adding their shopping items in real-time.",
  "descriptionHTML": "\ud83c\udfc6 Editor's Choice by GoogleAny.do is a To Do List, Calendar, Planner, Tasks & Reminders App That Helps Over 25M People Stay Organized and Get More Done.\ud83e\udd47 \"It\u2019s A MUST HAVE PLANNER & TO DO LIST APP\" (NYTimes, USA TODAY, WSJ & Lifehacker).Any.do is a free to-do list, planner & calendar app for managing and organizing your daily tasks, to-do lists, notes, reminders, checklists, calendar events, grocery lists and more.\ud83d\udcc5 Organize Your Tasks & To-Do List in Seconds\u2022 ADVANCED CALENDAR & DAILY PLANNER - Keep your to-do list and calendar events always at hand with our calendar widget. Any.do to-do list & planner support daily calendar view, 3-day Calendar view, Weekly calendar view & agenda view, with built-in reminders. Review and organize your calendar events and to do list side by side.\u2022 SYNCS SEAMLESSLY - Keeps all your to do list, tasks, reminders, notes, calendar & agenda always in sync so you\u2019ll never forget a thing. Sync your phone\u2019s calendar, google calendar, Facebook events, outlook calendar or any other calendar so you don\u2019t forget an important event.\u2022 SET REMINDERS - One time reminders, recurring reminders, Location reminders & voice reminders. NEW! Easily create tasks and get reminders in WhatsApp.\u2022 WORK TOGETHER - Share your to do list and assign tasks with your friends, family & colleagues from your task list to collaborate and get more done. ---ALL-IN-ONE PLANNER & CALENDAR APP FOR GETTING THINGS DONE Create and set reminders with voice to your to do list. For better task management flow we added a calendar integration to keep your agenda always up to date. For better productivity, we added recurring reminders, location reminders, one-time reminder, sub-tasks, notes & file attachments. To keep your to do list up to date, we\u2019ve added a daily planner and focus mode. INTEGRATIONS Any.do To do list, Calendar, planner & Reminders Integrates with Google Calendar, Outlook, WhatsApp, Slack, Gmail, Google Tasks, Evernote, Trello, Wunderlist, Todoist, Zapier, Asana, Microsoft to-do, Salesforce, OneNote, Google Assistant, Amazon Alexa, Office 365, Exchange, Jira & More. TO DO LIST, CALENDAR, PLANNER & REMINDERS MADE SIMPLE Designed to keep you on top of your to do list, tasks and calendar events with no hassle. With intuitive drag and drop of tasks, swiping to mark to-do's as complete, and shaking your device to remove completed from your to do list - you can stay organized and enjoy every minute of it. POWERFUL TO DO LIST TASK MANAGEMENT Add a to do list item straight from your email / Gmail / Outlook inbox by forwarding do@Any.do. Attach files from your computer, Dropbox, or Google Drive to your to- tasks. DAILY PLANNER & LIFE ORGANIZER Any.do is a to do list, a calendar, an inbox, a notepad, a checklist, task list, a board for post its or sticky notes, a task & project management tool, a reminder app, a daily planner, a family organizer, an agenda, a bill planner and overall the simplest productivity tool you will ever have. SHARE LISTS, ASSIGN & ORGANIZE TASKS To plan & organize projects has never been easier. Now you can share lists between family members, assign tasks to each other, chat and much more. Any.do will help you and the people around you stay in-sync and get reminders so that you can focus on what matters, knowing you had a productive day and crossed off your to do list. GROCERY LIST & SHOPPING LIST Any.do task list, calendar, agenda, reminders & planner is also great for shopping lists at the grocery store. Simply create a list on Any.do, share it with your loved ones and see them adding their shopping items in real-time.",
  "developer": "Any.do Calendar & To-Do List",
  "developerAddress": "Any.do Inc.\n\n6 Agripas Street, Tel Aviv\n6249106 ISRAEL",
  "developerEmail": "feedback+androidtodo@any.do",
  "developerId": "5304780265295461149",
  "developerInternalID": "5304780265295461149",
  "developerWebsite": "https://www.any.do",
  "free": true,
  "genre": "Productivity",
  "genreId": "PRODUCTIVITY",
  "headerImage": "https://lh3.googleusercontent.com/dZknnlk1LM8fYS3wjOvVHOmWKOGH1HAe691Yuh7LAeBj6a730A1CQqZnXxjNahAYUFFw",
  "histogram": [
    27291,
    9246,
    13735,
    29904,
    262997
  ],
  "icon": "https://lh3.googleusercontent.com/zgOLUXCHkF91H8xuMTMLT17smwgLPwSBjUlKVWF-cZRFjlv-Uvtman7DiHEii54fbEE",
  "installs": "10,000,000+",
  "minInstalls": 10000000,
  "offersIAP": true,
  "price": 0,
  "privacyPolicy": "https://www.any.do/privacy",
  "ratings": 343174,
  "recentChanges": "Faster and smoother for better user experience!",
  "recentChangesHTML": "Faster and smoother for better user experience!",
  "released": "Nov 10, 2011",
  "reviews": 122170,
  "score": 4.43388,
  "screenshots": [
    "https://lh3.googleusercontent.com/C-L3_FPMlKVrZItAORaszhnQzlzMyXcqF_-oGaabHm_OnwUW1jz02BXBVSKi0HRUtQ",
    "https://lh3.googleusercontent.com/uAP6G5ANQcgVs4Uj6yrcsAo4OUhejTJRVCXOxnAVA5Efit_OtAnrOYyL1SUHj1rv",
    "https://lh3.googleusercontent.com/AI5mLFu0Atsl0km2FO9_IwJXNy_1q1_X6Ua3EVMZNedp0dsDToDRaWQ1UDvI6mb1-I0",
    "https://lh3.googleusercontent.com/bYCAn3mjgB4ugSY0PL-PCcMBfbvXCSFkzL-pLSIIbZ8sQByQPerHboPQ2fA126K4LDtU",
    "https://lh3.googleusercontent.com/u-dX4lpTepsvXs33ds4xxYpApuGS4JBAEb0UsvY_fPbptxnF0QxaKNW0-tJVXaP8a1E",
    "https://lh3.googleusercontent.com/qvUz_9IXHQd6FSLUALZo8NKLx-s4uDGyElPOGRsU28TCEficQc0BoNRloRRLqUkH2A",
    "https://lh3.googleusercontent.com/tEyGs6MGlY97ccLc4c_HxV9xNOpsvwQyHz6uGAezkVtxm1ydAaTj5EZSUgqlg69qrrk",
    "https://lh3.googleusercontent.com/StN0i2BskOs6HCfaPO0DMBOCQMCag3okWVI_SlFJtMytwbgNMBnD5i9hbSqdNlGxffmn",
    "https://lh3.googleusercontent.com/GRKqWfo-PLzCKwpgZ8fej4PGsUp1q9eM5a3LQeiYCOW-KUpCOIHXOp3mteZWbJ-pz4My",
    "https://lh3.googleusercontent.com/pFQQ_qi8u92duWCNXpEcNKpH2lVpD_hFd5f-UlTP_f6wft3YyYLMzwLitxt-UI6G8vs",
    "https://lh3.googleusercontent.com/AoeCU6bT1x0eHRvJwvQyOSKJ31oSayox959qMNVaSzz3uN9bvk1cGek5zyRDe1BdtA",
    "https://lh3.googleusercontent.com/vICme1f4J9vFt8wY3xBY-LshGgYyvSbsa4TLJyEtNsy0alUI0i9oMQVq8oJ4l_yR1Aw",
    "https://lh3.googleusercontent.com/7sn9m__iVM-peiG6_jkKBuE-QVH_xDaycF_oR1XJlwcAC45ybNZ_Exor09ENOJ41Q2U",
    "https://lh3.googleusercontent.com/9I_m2ZXgPtiU4Po4cw_cyIaEpZxynxQ1n3YkhFgakATfbu63a8_f8vGQDxKOHYITzew"
  ],
  "size": "Varies with device",
  "summary": "Task Manager \u2705 Organizer \ud83d\udcc5 Agenda \ud83d\udcdd Daily Reminders \ud83d\udd14 All-in-One Simple App.",
  "summaryHTML": "Task Manager \u2705 Organizer \ud83d\udcc5 Agenda \ud83d\udcdd Daily Reminders \ud83d\udd14 All-in-One Simple App.",
  "title": "Any.do: To do list, Calendar, Planner & Reminders",
  "updated": 1586258773,
  "url": "https://play.google.com/store/apps/details?id=com.anydo&hl=en&gl=us",
  "version": "Varies with device",
  "video": "https://www.youtube.com/embed/2nkllLD0x6o?ps=play&vq=large&rel=0&autohide=1&showinfo=0",
  "videoImage": "https://i.ytimg.com/vi/2nkllLD0x6o/hqdefault.jpg"
}

This offers a great deal of information, such as the total number of ratings, the number of reviews, and the number of ratings for each score (1 to 5). Let’s set all of that aside and have a look at their lovely icons:
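As a quick check on those fields, the average score can be recomputed from the histogram. The sketch below hard-codes the histogram values from the Any.do output above and assumes the list is ordered from 1 star to 5 stars; that ordering is an assumption, although it is consistent with the reported score.

```python
# Histogram copied from the Any.do output above; assumed order: 1 star .. 5 stars.
histogram = [27291, 9246, 13735, 29904, 262997]


def score_shares(hist):
    """Return the fraction of ratings that fall on each star value."""
    total = sum(hist)
    return [count / total for count in hist]


def mean_score(hist):
    """Weighted average of the star values 1..5."""
    total = sum(hist)
    return sum(stars * count for stars, count in enumerate(hist, start=1)) / total


shares = score_shares(histogram)
average = mean_score(histogram)
print([round(s, 3) for s in shares])
print(round(average, 5))
```

The recomputed average lands very close to the reported score of 4.43388, and the shares show how heavily the ratings lean toward 5 stars. That skew is the reason the reviews are later requested score by score rather than in bulk.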

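def format_title(title):
    # keep the part before the first ':' (or '-' if there is no ':'), capped at 10 characters
    sep_index = title.find(':') if title.find(':') != -1 else title.find('-')
    if sep_index != -1:
        title = title[:sep_index]
    return title[:10]

fig, axs = plt.subplots(2, len(app_infos) // 2, figsize=(14, 5))

for i, ax in enumerate(axs.flat):
    ai = app_infos[i]
    # note: recent Matplotlib releases may no longer accept a URL here;
    # in that case download the image first and pass the file to imread
    img = plt.imread(ai['icon'])
    ax.imshow(img)
    ax.set_title(format_title(ai['title']))
    ax.axis('off')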
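One detail worth noting: with 15 apps, len(app_infos) // 2 gives 7 columns, so the 2 x 7 grid has room for only 14 icons and the last app is not drawn. A minimal sketch of one way to size the grid so that every app fits, using a hypothetical helper grid_shape that is not part of the original code:

```python
import math


def grid_shape(n_items, n_rows=2):
    """Return (rows, cols) with enough cells to hold n_items."""
    n_cols = math.ceil(n_items / n_rows)
    return n_rows, n_cols


rows, cols = grid_shape(15)
print(rows, cols)  # grid dimensions for 15 icons

spare_cells = rows * cols - 15
print(spare_cells)  # cells that stay empty and whose axes should be hidden
```

Passing these values to plt.subplots(rows, cols, ...) and looping over zip(axs.flat, app_infos) would draw all 15 icons; any unused axis still needs ax.axis('off') so that it does not show empty ticks.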

By transforming the JSON objects into a Pandas data frame and storing the output to a CSV file, we can save the app information for later:

app_infos_df = pd.DataFrame(app_infos)
app_infos_df.to_csv('apps.csv', index=None, header=True)

To get a balanced dataset, with roughly the same number of reviews for each score, you can use the scraping package’s option to filter reviews by score. To get a representative sample of the reviews for every app, you can sort the reviews by relevance, which surfaces the ones Google Play ranks as most important, and also take a batch of the newest ones.

app_reviews = []

for ap in tqdm(app_packages):
    for score in range(1, 6):
        for sort_order in [Sort.MOST_RELEVANT, Sort.NEWEST]:
            rvs, _ = reviews(
                ap,
                lang='en',
                country='us',
                sort=sort_order,
                # request twice as many 3-star reviews as for any other score
                count=200 if score == 3 else 100,
                filter_score_with=score
            )
            for r in rvs:
                # record which sort order and which app each review came from
                r['sortOrder'] = 'most_relevant' if sort_order == Sort.MOST_RELEVANT else 'newest'
                r['appId'] = ap
            app_reviews.extend(rvs)
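Because every app and score is queried under two sort orders, the same review can be returned twice, once as most relevant and once as newest. A minimal sketch for dropping such repeats follows. It assumes that the combination of appId, userName, and at identifies a review; that key is an assumption made for illustration, and the sample reviews below are made up.

```python
def drop_duplicate_reviews(review_list):
    """Keep the first occurrence of each (appId, userName, at) combination."""
    seen = set()
    unique = []
    for review in review_list:
        key = (review['appId'], review['userName'], review['at'])
        if key not in seen:
            seen.add(key)
            unique.append(review)
    return unique


# Made-up sample: the first two entries are the same review under two sort orders.
sample = [
    {'appId': 'com.anydo', 'userName': 'A', 'at': '2020-04-05 22:25:57', 'sortOrder': 'most_relevant'},
    {'appId': 'com.anydo', 'userName': 'A', 'at': '2020-04-05 22:25:57', 'sortOrder': 'newest'},
    {'appId': 'com.todoist', 'userName': 'B', 'at': '2020-04-06 10:00:00', 'sortOrder': 'newest'},
]

unique_reviews = drop_duplicate_reviews(sample)
print(len(unique_reviews))
```

Whether to deduplicate at all is a design choice: keeping both copies preserves the sortOrder information, while dropping repeats avoids giving some reviews double weight when a model is trained on the data.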

We’ve included the app id and sort order in each review. As an example, consider the following:

print_json(app_reviews[0])

{
  "appId": "com.anydo",
  "at": "2020-04-05 22:25:57",
  "content": "Update: After getting a response from the developer I would change my rating to 0 stars if possible. These guys hide behind confusing and opaque terms and refuse to budge at all. I'm so annoyed that my money has been lost to them! Really terrible customer experience. Original: Be very careful when signing up for a free trial of this app. If you happen to go over they automatically charge you for a full years subscription and refuse to refund. Terrible customer experience and the app is just OK.",
  "repliedAt": "2020-04-07 14:09:03",
  "replyContent": "Our policy and TOS are completely transparent and can be found in the Help Center and our main page. In addition, payment can only be made upon the user's authorization via the app and Google Play. We provide users with a full 7 days trial to test the app with an additional 48 hours for a refund, along with priority support for all issues.",
  "reviewCreatedVersion": "4.17.0.3",
  "score": 1,
  "sortOrder": "most_relevant",
  "thumbsUpCount": 37,
  "userImage": "https://lh3.googleusercontent.com/a-/AOh14GiHdfNEu1DwwcJ6yNyju8Yvn4JwjpzuXvD74aVmDA",
  "userName": "Andrew Thomas"
}

repliedAt and replyContent hold the developer’s response to the review. They can be missing when the developer has not replied.
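When preparing the data, it can help to make the presence of a developer reply explicit instead of relying on empty fields. A small sketch, assuming that a missing reply shows up as None or as an absent key; has_reply is a hypothetical field name, not something the scraper returns, and the second sample review is made up.

```python
def add_reply_flag(review):
    """Return a copy of the review with a boolean 'has_reply' field."""
    flagged = dict(review)
    flagged['has_reply'] = bool(review.get('replyContent'))
    return flagged


with_reply = {
    'score': 1,
    'content': 'Terrible customer experience and the app is just OK.',
    'replyContent': 'Our policy and TOS are completely transparent.',
    'repliedAt': '2020-04-07 14:09:03',
}
# Made-up review without a developer reply.
without_reply = {'score': 5, 'content': 'Great app.', 'replyContent': None, 'repliedAt': None}

print(add_reply_flag(with_reply)['has_reply'])
print(add_reply_flag(without_reply)['has_reply'])
```

The function copies the dictionary rather than modifying it in place, so the scraped reviews stay untouched.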

Let’s check how many reviews were collected:

len(app_reviews)

Saving reviews to CSV file:

app_reviews_df = pd.DataFrame(app_reviews)
app_reviews_df.to_csv('reviews.csv', index=None, header=True)
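The saved scores still need to be turned into sentiment labels before a model can be trained on them. One common convention, used here purely as an assumption since the article does not fix a mapping, treats 1 and 2 stars as negative, 3 stars as neutral, and 4 and 5 stars as positive:

```python
from collections import Counter


def to_sentiment(score):
    """Map a 1-5 star score to a sentiment label (assumed convention)."""
    if score <= 2:
        return 'negative'
    if score == 3:
        return 'neutral'
    return 'positive'


scores = [1, 2, 3, 3, 4, 5]  # made-up scores for illustration
labels = [to_sentiment(s) for s in scores]
print(Counter(labels))
```

Under this mapping, requesting twice as many 3-star reviews (count=200) keeps the neutral class about the same size as the other two, which each draw on two score values. With the DataFrame from above, the same function could be applied with app_reviews_df['score'].apply(to_sentiment).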

Looking to scrape Google Play App reviews? Contact X-Byte Enterprise Crawling now and request a quote.

Originally published at https://www.xbyte.io.

