I am trying to open a zip file and iterate through the PDFs inside it. I want to extract a certain portion of the text from each PDF. I am using the following code:
    import re
    import zipfile

    import pdftotext

    def get_text(part):
        # Create path
        path = f'C:\\Users\\user\\Data\\Part_{part}.zip'
        with zipfile.ZipFile(path) as data:
            listdata = data.namelist()
            onlypdfs = [k for k in listdata if '_2018' in k or '_2019' in k or '_2020' in k or '_2021' in k or '_2022' in k]
            for file in onlypdfs:
                with data.open(file, "r") as f:
                    # Get the pdf
                    pdffile = pdftotext.PDF(f)
                    text = ("\n\n".join(pdffile))
                    # Remove the newline characters
                    text = text.replace('\r\n', ' ')
                    text = text.replace('\r', ' ')
                    text = text.replace('\n', ' ')
                    text = text.replace('\x0c', ' ')
                    # Get the text that will talk about what I want
                    try:
                        text2 = re.findall(r'FEES (.+?) Types', text, re.IGNORECASE)[-1]
                    except:
                        text2 = 'PROBLEM'
                    # Return the file name and the text
                    return file, text2
Then in the next line I am running:
    info = []
    for i in range(1, 2):
        info.append(get_text(i))
    info
My output contains only the first file and its text, even though the zip file holds 4 PDFs. Ideally, I want to iterate through all 30+ zip files, but I am having trouble with just one. I've seen this question asked before, but the solutions didn't fit my problem. Is it something to do with the with statement?
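A likely cause, assuming the `return` sits inside the `for` loop as the output suggests: `return` ends the function on the first iteration, so only one PDF is ever processed. The `with` statement is probably not the culprit. Below is a minimal sketch of collecting one result per file and returning after the loop. The names `get_texts`, `extract_fees`, and `read_text` are hypothetical, not from the original code. The demo builds an in-memory zip of plain-text members so it runs without `pdftotext`, and it collapses all whitespace runs to a single space, which is slightly stricter than the chained `replace` calls in the question.

```python
import io
import re
import zipfile

YEARS = ("_2018", "_2019", "_2020", "_2021", "_2022")
FEES_PATTERN = re.compile(r"FEES (.+?) Types", re.IGNORECASE)


def extract_fees(text):
    """Return the last 'FEES ... Types' match, or 'PROBLEM' if none is found."""
    # Collapse newlines, form feeds, and repeated spaces into single spaces.
    text = " ".join(text.split())
    matches = FEES_PATTERN.findall(text)
    return matches[-1] if matches else "PROBLEM"


def get_texts(archive, read_text):
    """Return a list of (member name, extracted text) for every matching member.

    `archive` is a path or file-like object; `read_text` is a caller-supplied
    function (an assumption of this sketch) that turns an open member into a string.
    """
    results = []
    with zipfile.ZipFile(archive) as data:
        for name in data.namelist():
            if not any(year in name for year in YEARS):
                continue
            with data.open(name, "r") as f:
                results.append((name, extract_fees(read_text(f))))
    # Returning here, after the loop, lets every member be processed.
    return results


# Demo with an in-memory zip of plain-text stand-ins for the PDFs.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("a_2018.pdf", "FEES 10 dollars Types")
    z.writestr("b_2019.pdf", "Fees\n20 dollars\nTypes")
    z.writestr("c_2017.pdf", "FEES 30 dollars Types")  # filtered out by year
    z.writestr("d_2020.pdf", "no match here")
buf.seek(0)

results = get_texts(buf, lambda f: f.read().decode("utf-8"))
print(results)
```

With real PDFs, `read_text` could be something like `lambda f: "\n\n".join(pdftotext.PDF(f))`, assuming `pdftotext.PDF` accepts the zip member object as it does in the question. Since the function now returns a list, the outer loop would use `info.extend(...)` rather than `info.append(...)`; note also that `range(1, 2)` yields only `1`, so the range has to be widened to cover the other zip files.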