I am trying to connect to a remote site via HTTPS and download some information. I am doing this:

    library("httr")
    library("XML")
    library(RCurl)

    url <- c("https://salesweb.civilview.com/Sales/SalesSearch?countyId=3")
    file <- getURL(url, ssl.verifyhost = 0L, ssl.verifypeer = 0L)

Each row has a "Details" link that gives more information on that record. I need to download the page, go into each "Details" section, and merge it with the initial data set. How can I do this?

1 Answer

If I've understood your question, you'd like to retrieve the data from the main table at the https://salesweb.civilview.com/Sales/SalesSearch?countyId=3 url, as well as the details data for each of the records listed there.

As an example, I wrote some code that retrieves the data from the main page into a structured dataframe, in which the first column is the url of the details record.

    # load libraries
    library(rvest)
    library(tidyverse)

    # assign url
    url <- "https://salesweb.civilview.com/Sales/SalesSearch?countyId=3"

    # extract td tags contents
    readUrlHtml <- read_html(url) %>% html_nodes("td")

    # create empty dataframe
    df <- data.frame(Details = character(),
                     Sheriff = character(),
                     SalesDate = character(),
                     Plaintiff = character(),
                     Defendant = character(),
                     Address = character(),
                     stringsAsFactors = FALSE)

    # loop to harvest the data (each table row is six consecutive td cells)
    j <- 1
    for (i in seq_len(length(readUrlHtml) %/% 6)) {
      # the PropertyId is cut out of the cell's HTML by character position,
      # so this line has to be adjusted if the page markup changes
      df[i, "Details"] <- paste0("https://salesweb.civilview.com/Sales/SaleDetails?PropertyId=",
                                 substr(readUrlHtml[j], 65, 73))
      df[i, "Sheriff"] <- readUrlHtml[j + 1] %>% html_text()
      df[i, "SalesDate"] <- readUrlHtml[j + 2] %>% html_text()
      df[i, "Plaintiff"] <- readUrlHtml[j + 3] %>% html_text()
      df[i, "Defendant"] <- readUrlHtml[j + 4] %>% html_text()
      df[i, "Address"] <- readUrlHtml[j + 5] %>% html_text()
      j <- j + 6
    }

    # values check
    df[1, ]
    df[50, ]
    df[525, ]

With the rvest package you'll also be able to retrieve the data of the details pages and save it in a new dataframe.

EDIT 2019-03-29

In order to retrieve the details data you need to save the cookie information from the main url. Once that is done, you can create a new dataframe to store the details data. This is shown in the updated version of the code.

1) the new library httr is used to retrieve the cookie data
2) the details data being retrieved is the part inside the red rectangle in the screenshot (to retrieve the remaining data I suggest creating a new dataframe to store it, but I guess this will greatly increase the amount of time needed to process all the data!)
3) the two dataframes df and dfDetails may be merged by using the Details key

    # load libraries
    library(rvest)
    library(tidyverse)
    library(httr) # new library

    # assign url
    url <- "https://salesweb.civilview.com/Sales/SalesSearch?countyId=3"

    # extract td tags contents
    readUrlHtml <- read_html(url) %>% html_nodes("td")

    # create empty dataframe
    df <- data.frame(Details = character(),
                     Sheriff = character(),
                     SalesDate = character(),
                     Plaintiff = character(),
                     Defendant = character(),
                     Address = character(),
                     stringsAsFactors = FALSE)

    # loop to harvest the data (each table row is six consecutive td cells)
    j <- 1
    for (i in seq_len(length(readUrlHtml) %/% 6)) {
      # the PropertyId is cut out of the cell's HTML by character position,
      # so this line has to be adjusted if the page markup changes
      df[i, "Details"] <- paste0("https://salesweb.civilview.com/Sales/SaleDetails?PropertyId=",
                                 substr(readUrlHtml[j], 65, 73))
      df[i, "Sheriff"] <- readUrlHtml[j + 1] %>% html_text()
      df[i, "SalesDate"] <- readUrlHtml[j + 2] %>% html_text()
      df[i, "Plaintiff"] <- readUrlHtml[j + 3] %>% html_text()
      df[i, "Defendant"] <- readUrlHtml[j + 4] %>% html_text()
      df[i, "Address"] <- readUrlHtml[j + 5] %>% html_text()
      j <- j + 6
    }

    # values check
    df[1, ]
    df[50, ]
    df[525, ]

    ## UPDATED SECTION TO RETRIEVE THE URLS DETAILS ##

    # retrieve the session cookies by requesting the main page
    urlInfos <- GET(url)

    # build a named vector (cookie name -> cookie value) to send with each details request;
    # the original call wrapped urlInfos$cookies[6] in backticks, which sent that
    # literal text as the cookie name instead of the real name
    sessionCookies <- setNames(urlInfos$cookies$value, urlInfos$cookies$name)

    # create empty details dataframe
    dfDetails <- data.frame(Details = character(),
                            Sheriff = character(),
                            CourtCase = character(),
                            SalesDate = character(),
                            Plaintiff = character(),
                            Defendant = character(),
                            Address = character(),
                            Description = character(),
                            ApproxUpset = character(),
                            Attorney = character(),
                            AttorneyPhone = character(),
                            stringsAsFactors = FALSE)

    # loop to harvest the details
    # takes a while to retrieve all records! (5-6 mins)
    # for (i in 1:3) # loop through a few records for testing purposes
    for (i in seq_along(df$Details)) {
      responseDetail <- GET(df[i, "Details"], set_cookies(.cookies = sessionCookies))
      readUrlHtmlDetail <- read_html(responseDetail) %>% html_nodes("td")

      # the even-numbered td cells hold the values (the odd ones hold the labels)
      dfDetails[i, "Details"] <- df[i, "Details"]
      dfDetails[i, "Sheriff"] <- readUrlHtmlDetail[2] %>% html_text()
      dfDetails[i, "CourtCase"] <- readUrlHtmlDetail[4] %>% html_text()
      dfDetails[i, "SalesDate"] <- readUrlHtmlDetail[6] %>% html_text()
      dfDetails[i, "Plaintiff"] <- readUrlHtmlDetail[8] %>% html_text()
      dfDetails[i, "Defendant"] <- readUrlHtmlDetail[10] %>% html_text()
      dfDetails[i, "Address"] <- readUrlHtmlDetail[12] %>% html_text()
      # note: the Description column is declared above but is not filled by this loop
      dfDetails[i, "ApproxUpset"] <- readUrlHtmlDetail[14] %>% html_text()
      dfDetails[i, "Attorney"] <- readUrlHtmlDetail[16] %>% html_text()
      dfDetails[i, "AttorneyPhone"] <- readUrlHtmlDetail[18] %>% html_text()
    }

    # values detail check
    dfDetails[1, ]
    dfDetails[50, ]
    dfDetails[525, ]
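
The answer mentions that df and dfDetails may be merged on the Details key but does not show that step. Below is a minimal sketch of one way to do it with base R's merge(). The two small data frames are made-up stand-ins (the example.org urls and all values are hypothetical, not data from the site), and the ".detail" suffix is just an illustrative choice for columns that exist in both frames.

```r
# Toy stand-ins for df and dfDetails; every value here is invented for illustration
df <- data.frame(
  Details = c("https://example.org/SaleDetails?PropertyId=1",
              "https://example.org/SaleDetails?PropertyId=2"),
  Sheriff = c("F-001", "F-002"),
  stringsAsFactors = FALSE
)
dfDetails <- data.frame(
  Details = c("https://example.org/SaleDetails?PropertyId=2",
              "https://example.org/SaleDetails?PropertyId=1"),
  Sheriff = c("F-002", "F-001"),
  Attorney = c("Firm B", "Firm A"),
  stringsAsFactors = FALSE
)

# Left join on the shared key: every row of df is kept and the detail columns are appended.
# Columns present in both frames (here Sheriff) would otherwise get .x/.y suffixes,
# so the copy coming from dfDetails is tagged with ".detail" instead.
merged <- merge(df, dfDetails, by = "Details", all.x = TRUE,
                suffixes = c("", ".detail"))

print(merged)
```

With the real data frames, the overlapping columns (Sheriff, SalesDate, Plaintiff, Defendant, Address) would each appear twice in the result; if the duplicates are not needed, they can be dropped from dfDetails before merging.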
