Web scraping, regex and iteration in Python
I have the following URL: 'http://www.alriyadh.com/file/278?&page=1'. I want to write a regex that matches the URLs from page=2 through page=12.
For example, I need 'http://www.alriyadh.com/file/278?&page=4', but not page=14.
I reckon this needs a function that iterates over the specified pages (2 to 12) and accesses the URLs within them. I have tried the regex '.*?=[2-9]', but it does not work.
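The attempted pattern '.*?=[2-9]' fails because [2-9] matches exactly one digit. It can never match page=10, page=11 or page=12, and because nothing is required after that digit it also accepts page=20 or page=25. If a regex is still wanted, for example to filter a list of links that has already been collected, the following is a minimal sketch. It assumes the page number is a query parameter named page, preceded by ? or & and followed by & or the end of the URL; the helper name wanted() is hypothetical.

```python
import re

# "page=" preceded by ? or &, then 2-9 or 10-12, then & or the end of the URL.
PAGE_2_TO_12 = re.compile(r"[?&]page=(?:[2-9]|1[0-2])(?:&|$)")

BASE = "http://www.alriyadh.com/file/278?&page="


def wanted(url: str) -> bool:
    """True if the URL's page parameter is between 2 and 12 inclusive."""
    return PAGE_2_TO_12.search(url) is not None


candidates = [BASE + str(n) for n in range(1, 31)]
selected = [u for u in candidates if wanted(u)]
print(len(selected), selected[0], selected[-1])
# 11 http://www.alriyadh.com/file/278?&page=2 http://www.alriyadh.com/file/278?&page=12
```

The alternation (?:[2-9]|1[0-2]) has to spell out the allowed values digit by digit, which is why numeric ranges get awkward in regexes; extracting the digits and checking int(value) against range(2, 13) is often clearer.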
My aim is to get the content of those URLs using the newspaper package; I want the data for research.
Thanks in advance.
This does not require a regex; a simple loop over a preset range will do:
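```python
import requests
from bs4 import BeautifulSoup as bs

url = 'http://www.alriyadh.com/file/278?&page='

# range(2, 13) yields 2, 3, ..., 12; the stop value 13 is excluded
for page in range(2, 13):
    html = requests.get(url + str(page)).text
    soup = bs(html, 'html.parser')  # parse each listing page
```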
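The loop fetches each listing page but stops at parsing it. To reach the stated goal of passing the linked articles to newspaper, one possible continuation is the following sketch. To keep the link extraction dependency-free it swaps BeautifulSoup for the standard library's html.parser.HTMLParser and requests for urllib.request. The names extract_links, page_urls, crawl and the keep filter are hypothetical. Which links on a listing page are actually articles depends on the site's markup, which is not known here, so keep accepts every link by default.

```python
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urljoin
from urllib.request import urlopen

BASE = "http://www.alriyadh.com/file/278?&page="


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names; attrs is a list of (name, value) pairs
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def extract_links(html: str, base_url: str) -> list[str]:
    """Return the absolute URLs of all <a href> links, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


def page_urls(first: int = 2, last: int = 12) -> list[str]:
    """Listing-page URLs for pages first..last inclusive."""
    return [BASE + str(page) for page in range(first, last + 1)]


def crawl(keep: Callable[[str], bool] = lambda url: True) -> dict[str, str]:
    """Fetch pages 2-12, run every kept link through newspaper, return {url: text}.

    Needs network access and the newspaper package (assumed: newspaper3k).
    """
    from newspaper import Article  # imported lazily so the helpers work without it

    texts: dict[str, str] = {}
    for listing in page_urls():
        with urlopen(listing) as resp:
            charset = resp.headers.get_content_charset() or "utf-8"
            html = resp.read().decode(charset, errors="replace")
        for link in extract_links(html, listing):
            if link in texts or not keep(link):
                continue
            article = Article(link)
            try:
                article.download()
                article.parse()
            except Exception:  # a failed download should not abort the whole crawl
                continue
            texts[link] = article.text
    return texts
```

Usage might look like texts = crawl(keep=lambda u: "/article/" in u), where "/article/" is only a placeholder guess at the site's article URL pattern and would need to be checked against the real pages.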