Web scraping, regex and iteration in Python


I have the following URL: 'http://www.alriyadh.com/file/278?&page=1'. I want to write a regex that accesses the URLs from page=2 through page=12.

For example, a URL I need is 'http://www.alriyadh.com/file/278?&page=4', but not page=14.

I reckon this would work with a function that iterates over the specified pages and accesses the URLs within them. I have tried a regex, but it does not work: '.*?=[2-9]'
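For reference, `[2-9]` matches a single digit only, so the attempted pattern can never cover 10 through 12. With no end anchor it also accepts the first digit of 20 through 99 when used with `re.search` or `re.match`. Below is a minimal sketch of a pattern that accepts exactly page=2 through page=12. The names `PAGE_RE` and `is_wanted_page` are made up for illustration.

```python
import re

# Hypothetical pattern name. Accepts "page=" followed by 2-9 or 10-12,
# and requires the number to be the last thing in the URL.
PAGE_RE = re.compile(r'[?&]page=(?:[2-9]|1[0-2])$')

def is_wanted_page(url):
    """Return True if the URL ends in page=2 through page=12."""
    return PAGE_RE.search(url) is not None

base = 'http://www.alriyadh.com/file/278?&page='

# Check pages 1..20 and keep only the page numbers the pattern accepts
wanted = [n for n in range(1, 21) if is_wanted_page(base + str(n))]
print(wanted)
```

The `$` anchor is what keeps page=14 or page=20 from matching on their first digit.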

My aim is to get the content from those URLs using the newspaper package. I want the data for research.

Thanks in advance.

This does not require a regex; a simple loop over a preset range will do.

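    import requests
    from bs4 import BeautifulSoup as bs

    url = 'http://www.alriyadh.com/file/278?&page='

    # Pages 2 through 12 inclusive (range excludes its upper bound)
    for page in range(2, 13):
        html = requests.get(url + str(page)).text
        # Name the parser explicitly to avoid bs4's "no parser specified" warning
        soup = bs(html, 'html.parser')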
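The loop above stops at parsing each listing page. To reach the URLs within those pages, the links still have to be pulled out of the HTML. Below is a rough sketch of that step. It swaps in the standard library's `html.parser` for BeautifulSoup so that it runs without third-party packages. The `LinkCollector` class, the `extract_links` helper, and the sample markup (including the `/article/100` path) are all made up for illustration; the real page structure is not known here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href attribute of every <a> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        href = dict(attrs).get('href')
        if href:
            # Resolve relative links against the page they came from
            self.links.append(urljoin(self.base_url, href))

def extract_links(html, base_url):
    """Hypothetical helper: return all link targets found in the HTML."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links

# Made-up markup standing in for one downloaded listing page
sample_html = (
    '<div><a href="/article/100">One</a> '
    '<a href="http://example.com/x">Two</a> '
    '<a>no href</a></div>'
)
page_url = 'http://www.alriyadh.com/file/278?&page=2'
links = extract_links(sample_html, page_url)
print(links)
```

Inside the loop, `html` would take the place of `sample_html` and `url + str(page)` the place of `page_url`. Each collected URL could then be handed to the newspaper package, for example through its `Article` class (`download()`, `parse()`, then the `text` attribute), after filtering out navigation links that are not articles.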
