Web scraping, regex and iteration in Python


I have the following URL, 'http://www.alriyadh.com/file/278?&page=1', and I want to write a regex to access the URLs from page=2 through page=12.

For example, a URL I need is 'http://www.alriyadh.com/file/278?&page=4', but not page=14.

I reckon the way to do this is a function that iterates over the specified pages and accesses all the URLs within them. I have tried the regex '.*?=[2-9]', but it does not work.

My aim is to get the content of those URLs using the newspaper package. I want the data for my research.

Thanks in advance.

This does not require a regex; a simple preset loop will do.

import requests
from bs4 import BeautifulSoup as bs

url = 'http://www.alriyadh.com/file/278?&page='

# range(2, 13) yields pages 2 through 12
for page in range(2, 13):
    html = requests.get(url + str(page)).text
    soup = bs(html, 'html.parser')
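
If a regex is still wanted, for instance to filter URLs that were collected some other way, here is a minimal sketch. The pattern '.*?=[2-9]' fails because it only looks at the first digit after '=': it never matches page=10 to page=12, and since it is not anchored it would accept page=25. The names BASE_URL, PAGE_RE, page_urls and is_wanted_page are made up for illustration and are not part of the question or any library.

```python
import re

# Base URL taken from the question; the page number is appended to it.
BASE_URL = 'http://www.alriyadh.com/file/278?&page='

# Matches page=2 .. page=12 only: a single digit 2-9, or 10, 11, 12.
# The trailing $ rejects longer numbers such as page=25 or page=120.
PAGE_RE = re.compile(r'[?&]page=(?:[2-9]|1[0-2])$')


def page_urls(first=2, last=12):
    """Build the list of page URLs without any regex (hypothetical helper)."""
    return [BASE_URL + str(page) for page in range(first, last + 1)]


def is_wanted_page(url):
    """Return True if the URL ends in page=2 .. page=12 (hypothetical helper)."""
    return PAGE_RE.search(url) is not None


if __name__ == '__main__':
    for candidate in page_urls(1, 14):
        print(candidate, is_wanted_page(candidate))
```

Generating the URLs with range is usually simpler than filtering them with a pattern, which is why the loop above is enough when the page numbers are known in advance.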
