0 votes
in Education by (1.7m points)
I am trying to scrape the data off the webpage below, using Selenium in Python 3:

https://www.whoscored.com/Matches/1285051/Live/England-Premier-League-2018-2019-West-Ham-Huddersfield

Viewing the page source (in Chrome: view-source:https://www.whoscored.com/Matches/1285051/Live/England-Premier-League-2018-2019-West-Ham-Huddersfield) shows several JSON objects embedded in the text. I want to extract the first and largest one, which is assigned to 'var matchCentreData'. A snippet is shown below:

<script type="text/javascript">
var matchCentreData = {"playerIdNameDictionary":{"14244":"Pablo Zabaleta",
   "89998":"Manuel Lanzini","34693":"Marko Arnautovic","93026":"Felipe Anderson",
   "300359":"Issa Diop","122980"

I am able to scrape the entire page source, but I am struggling to extract only the JSON above. Any help would be much appreciated!


1 Answer

0 votes
by (1.7m points)
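You don't need to parse the page source for this. matchCentreData is a JavaScript variable on the page, so you can ask the browser for it directly: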

page_json = driver.execute_script("return JSON.stringify(matchCentreData)")

# page_json is a JSON string; do what you want with it (e.g. json.loads(page_json) to get a dict).
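
For what it's worth, here is a minimal sketch of turning that string into Python data. It assumes `driver` is a Selenium WebDriver that has already loaded the match page; the helper name `load_match_centre_data` is made up for illustration and is not part of Selenium.

```python
import json


def load_match_centre_data(driver):
    """Return the page's matchCentreData variable as Python data.

    `driver` is assumed to be a Selenium WebDriver that has already
    loaded the match page. The helper name is hypothetical.
    """
    # JSON.stringify runs in the browser and hands back plain JSON text.
    raw = driver.execute_script("return JSON.stringify(matchCentreData)")
    # json.loads turns that text into nested dicts and lists.
    return json.loads(raw)
```

After that, something like load_match_centre_data(driver)["playerIdNameDictionary"] should give you the player id to name mapping shown in the question's snippet.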

This worked for me just now. If you want both this data and the page HTML, run this step alongside your existing page-source logic. There is no need to extract the JSON from the page source when you can get it this way.
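
If you would still rather pull it out of the page source you already have (for example, when no browser session is available), a rough sketch using only the standard library could look like the one below. It assumes the value assigned to matchCentreData is valid JSON, which the snippet in the question suggests but which I have not verified for the whole object; `extract_js_object` is a hypothetical helper name.

```python
import json
import re


def extract_js_object(page_source, var_name="matchCentreData"):
    """Parse the JSON value assigned to `var_name` in the page source.

    Assumes the assigned value is valid JSON; a JavaScript literal with
    unquoted keys or trailing commas would make json raise an error.
    """
    # Find "var <name> =" and note where the assigned value begins.
    match = re.search(r"var\s+" + re.escape(var_name) + r"\s*=\s*", page_source)
    if match is None:
        raise ValueError(f"{var_name} not found in page source")
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (the trailing ";" and the rest of the page).
    data, _end = json.JSONDecoder().raw_decode(page_source, match.end())
    return data
```

With Selenium this would be called as extract_js_object(driver.page_source). The point of raw_decode here is that you don't have to work out where the object ends with a regex or by counting braces.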
...