Web scraping, regex and iteration in Python

July 15, 2013

I have the following URL: 'http://www.alriyadh.com/file/278?&page=1'. How can I write a regex to access the URLs from page=2 to page=12?

For example, I need the URL 'http://www.alriyadh.com/file/278?&page=4', but not page=14.

I reckon a function that iterates over the specified pages (page=2 through page=12, i.e. 11 pages) and accesses the URLs within them would work. I have tried the regex '.*?=[2-9]', but it does not work.
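The attempted pattern '.*?=[2-9]' allows only a single digit after '=', so it can never match page=10 to page=12, and because nothing anchors the end of the string it also accepts URLs such as page=24. A minimal sketch of a pattern that matches exactly pages 2 to 12, using Python's standard `re` module; `PAGE_RE` and `wanted` are illustrative names, not part of the original question:

```python
import re

# "[2-9]" covers pages 2..9 and "1[0-2]" covers 10..12.
# The "$" anchor rejects longer numbers such as page=14 or page=24.
PAGE_RE = re.compile(r"page=(?:[2-9]|1[0-2])$")


def wanted(url):
    """Return True if the URL ends with page=2 ... page=12."""
    return PAGE_RE.search(url) is not None


base = "http://www.alriyadh.com/file/278?&page="
for n in (1, 2, 4, 12, 13, 14, 24):
    print(n, wanted(base + str(n)))
```

A regex only filters URLs you already have; to generate the 11 page URLs in the first place, a plain loop is simpler.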

My aim is to get the content of those URLs using the newspaper package; I want the data for research.
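A minimal sketch of the newspaper step, assuming the newspaper3k package (the Python 3 release of newspaper) and its `Article` class with `download()` and `parse()`. `fetch_article` and `fetch_all` are hypothetical helper names, and `language="ar"` is an assumption based on the site being in Arabic:

```python
def fetch_article(url, language="ar"):
    """Download one article and return its URL, title and body text."""
    # Imported inside the function so the rest of this sketch loads
    # without newspaper3k installed (pip install newspaper3k).
    from newspaper import Article

    article = Article(url, language=language)  # "ar" assumed: the site is Arabic
    article.download()  # fetch the page HTML over the network
    article.parse()     # extract title, body text, authors, etc.
    return {"url": url, "title": article.title, "text": article.text}


def fetch_all(urls, fetch=fetch_article):
    """Fetch every URL, skipping pages that fail instead of aborting the run."""
    results = []
    for url in urls:
        try:
            results.append(fetch(url))
        except Exception as exc:  # e.g. newspaper's ArticleException on a failed download
            print(f"skipping {url}: {exc}")
    return results
```

The `fetch` parameter lets you swap in another extractor, or a stub when testing offline; in real use you would call `fetch_all(article_urls)` on the article links collected from the listing pages.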

Thanks in advance.

This does not require a regex; a simple loop over a preset range of page numbers will do.

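    import requests
    from bs4 import BeautifulSoup as bs

    url = 'http://www.alriyadh.com/file/278?&page='

    # range(2, 13) yields 2, 3, ..., 12; the stop value 13 is excluded
    for page in range(2, 13):
        html = requests.get(url + str(page)).text
        # Name the parser explicitly to avoid BeautifulSoup's "no parser specified" warning
        soup = bs(html, 'html.parser')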
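The loop above fetches each listing page but stops at building `soup`. To reach the URLs within those pages, one option is to collect every link on each page. A minimal sketch using only the standard library's `html.parser` and `urllib.parse` (with BeautifulSoup, `soup.find_all('a', href=True)` yields the same anchors); `LinkCollector` and `extract_links` are hypothetical names, and which of the collected links are actual articles is site-specific, so any filtering is an assumption to check against the real page markup:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the absolute URL of every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links such as "/file/123" against the page URL
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return the links on one page, de-duplicated, in page order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return list(dict.fromkeys(parser.links))


# Usage with the loop from the answer (needs requests and network access):
# import requests
# base = 'http://www.alriyadh.com/file/278?&page='
# for page in range(2, 13):
#     page_url = base + str(page)
#     html = requests.get(page_url).text
#     print(extract_links(html, page_url))
```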
