Web scraping, regex and iteration in Python


I have the following URL: 'http://www.alriyadh.com/file/278?&page=1'. I want to write a regex that matches the URLs from page=2 up to page=12.

For example, I need 'http://www.alriyadh.com/file/278?&page=4', but not page=14.

I reckon it would work with a function that iterates over the specified pages (2 to 12) and collects all the URLs within them. I have tried the regex '.*?=[2-9]', but it does not work.
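
If you do want to filter a list of existing URLs with a regex, the pattern needs an explicit alternative for 10 to 12 and an end anchor. '.*?=[2-9]' only checks the first digit after '=', so it accepts page=24 (its '=2' prefix matches) and rejects page=10, 11 and 12. A minimal sketch, assuming the page number is always the last thing in the URL; `PAGE_2_TO_12` and `wanted` are hypothetical names:

```python
import re

# Matches page=2 through page=12 only, at the very end of the URL.
# [2-9] covers the single digits 2..9; 1[0-2] covers 10, 11 and 12.
# The trailing $ stops page=14 or page=24 from matching on a prefix.
PAGE_2_TO_12 = re.compile(r"[?&]page=(?:[2-9]|1[0-2])$")


def wanted(url: str) -> bool:
    """Return True if url points at pages 2..12 of the listing."""
    return PAGE_2_TO_12.search(url) is not None


# Build pages 1..15 and keep only the ones the pattern accepts.
urls = [f"http://www.alriyadh.com/file/278?&page={n}" for n in range(1, 16)]
selected = [u for u in urls if wanted(u)]
```

Here `selected` ends up holding the 11 URLs for pages 2 through 12. When you generate the URLs yourself, though, a plain loop over the page numbers avoids the regex entirely.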

My aim is to get the content of those URLs using the newspaper package. I want the data for research.

Thanks in advance.

This does not require a regex; a simple loop over a preset range of page numbers will do.

    import requests
    from bs4 import BeautifulSoup as bs

    url = 'http://www.alriyadh.com/file/278?&page='

    # range(2, 13) yields 2..12; the stop value is excluded
    for page in range(2, 13):
        html = requests.get(url + str(page)).text
        soup = bs(html, 'html.parser')
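
To go one step further and collect the links on each listing page, something like the following could work. It is a sketch under assumptions: it uses the standard library's `html.parser` instead of BeautifulSoup so it runs without extra packages, it treats every same-host `<a href>` as a candidate (the site's actual article-link markup is not known here, so you would likely narrow this), and `page_urls`, `LinkCollector` and `extract_links` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

BASE = "http://www.alriyadh.com/file/278?&page="


def page_urls(first: int = 2, last: int = 12) -> list[str]:
    """Build the listing URLs for pages first..last inclusive."""
    # range() excludes its stop value, hence last + 1.
    return [BASE + str(n) for n in range(first, last + 1)]


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against base_url."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Turn relative links like "/article/1" into absolute URLs.
                self.links.append(urljoin(self.base_url, href))


def extract_links(html: str, base_url: str) -> list[str]:
    """Return unique same-host links from html, in first-seen order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    host = urlparse(base_url).netloc
    # dict.fromkeys drops duplicates while keeping insertion order.
    same_host = (u for u in parser.links if urlparse(u).netloc == host)
    return list(dict.fromkeys(same_host))
```

In practice you would fetch each page with `requests.get(url).text` for `url in page_urls()`, pass the HTML to `extract_links`, and hand each resulting link to newspaper, e.g. `Article(link)` followed by `.download()` and `.parse()`, then read `.text`.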
