regex - grep string between two other strings as delimiters -


I have to report on how many times a CSS class appears in the content of our pages (over 10k pages). The trouble is, the header and footer contain the class, so grep returns every single page.

So, how do I grep only the content?

Edit: I am looking for whether a page has list-unstyled between <main> and </main>.

So should I use a regular expression with grep? Or do I need to use PowerShell to have more functionality?

I have grep at my disposal and PowerShell, but I could use portable software if that is an option.

Ideally, I would get a report (.txt or .csv) with the pages and line numbers where the class shows up, but just a list of the pages would suffice.

Edit: progress

I have this in PowerShell:

$files = Get-ChildItem -Recurse -Path w:\test\york\ -Filter *.html
foreach ($file in $files) {
    $htmlfile = [System.IO.File]::ReadAllText($file.FullName)
    $regex = "(?m)<main([\w\W]*)</main>"
    if ($htmlfile -match $regex) {
        $middle = $matches[1]
        [regex]::Matches($middle, "list-unstyled")
        Write-Host $file.FullName has matches in middle:
    }
}

which I run with the command .\findstr.ps1 | Export-Csv c:\tools\text.csv

It outputs the filename and path string in the console, but does not add it to the CSV. How can that be added in?
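A likely reason the file name is missing from the CSV: Write-Host writes to the console host rather than to the pipeline, so the file name never reaches Export-Csv. Below is a minimal sketch of one way around that, which emits an object per file instead. The function name Find-ClassInMain and the File and Count property names are hypothetical, and the sketch assumes each page has a single <main> element.

```powershell
# Hypothetical helper: emits one object per file whose <main> block contains the class.
function Find-ClassInMain {
    param(
        [string]$Path,
        [string]$ClassName = 'list-unstyled'
    )
    Get-ChildItem -Recurse -Path $Path -Filter *.html | ForEach-Object {
        $html = [System.IO.File]::ReadAllText($_.FullName)
        # (?s) lets . match newlines; .*? stops at the first </main>
        if ($html -match '(?s)<main\b.*?</main>') {
            $count = [regex]::Matches($Matches[0], [regex]::Escape($ClassName)).Count
            if ($count -gt 0) {
                # Objects go to the pipeline, so Export-Csv can serialize them
                [pscustomobject]@{ File = $_.FullName; Count = $count }
            }
        }
    }
}
```

It could then be called as Find-ClassInMain -Path w:\test\york\ | Export-Csv c:\tools\text.csv -NoTypeInformation.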

What Ansgar Wiechers' answer says is good advice. Don't string search HTML files. I don't have a problem with it, but it is worth noting that not all HTML files are the same and regex searches can produce flawed results. If tools exist that are aware of the file content structure, you should use them.

I would take a simple approach that reports all files that have enough occurrences of the text list-unstyled among the HTML files in a given directory. You expect there to be 2? So if more than that show up, then there are enough. I would have done a more complicated regex solution, but since you want the line numbers as well I came up with this compromise.

$pattern = "list-unstyled"
Get-ChildItem c:\temp -Recurse -Filter *.html |
    Select-String $pattern |
    Group-Object Path |
    Where-Object{$_.Count -gt 2} |
    ForEach-Object{
        $props = @{
            File = $_.Group | Select-Object -First 1 -ExpandProperty Path
            PatternFound = ($_.Group | Select-Object -ExpandProperty LineNumber) -join ";"
        }

        New-Object -TypeName PSCustomObject -Property $props
    }

Select-String is a grep-like tool that can search files for a string. It reports the line number where each match was located in the file, which is why I am using it here.

You should get output that looks like this on your PowerShell console.

File                    PatternFound
----                    ------------
c:\temp\content.html    4;11;54

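Where 4, 11 and 54 are the lines where the text was found. The code filters out results where the count of matching lines is less than 3. So if you expect the class once in the header and once in the footer, those results should be excluded.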
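If the count-based filter is too coarse, a line-by-line scan can keep only the line numbers that fall between the opening and closing main tags. This is a rough sketch rather than a real HTML parser. The function name Get-MainLineNumber is hypothetical, and the sketch assumes the <main ...> and </main> tags sit on their own lines, so a class on the same line as either tag may be miscounted.

```powershell
# Hypothetical helper: returns the line numbers where $ClassName appears inside <main>...</main>.
function Get-MainLineNumber {
    param(
        [string]$File,
        [string]$ClassName = 'list-unstyled'
    )
    $inMain = $false
    $lineNumber = 0
    foreach ($line in [System.IO.File]::ReadLines($File)) {
        $lineNumber++
        # Start counting once the opening tag has been seen
        if ($line -match '<main\b') { $inMain = $true }
        if ($inMain -and $line -match [regex]::Escape($ClassName)) { $lineNumber }
        # Stop counting after the closing tag
        if ($line -match '</main>') { $inMain = $false }
    }
}
```

Calling it for each file from Get-ChildItem and joining the returned numbers with ";" would give the same File and PatternFound columns as above, limited to the main content, and the resulting objects can be piped to Export-Csv.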

