List lines from partial match in file
Posted: 01 Mar 2018 09:16
Hello DOStips,
I am looking for some advice on how to manage wget.
We are currently downloading with wget from a list of URLs in a file called urls.txt.
The log shows both successful and failed downloads.
I made a working batch command to list the failures:
findstr /i "ERROR:" c:\urlstrip\test.txt >c:\urlstrip\certificate_errors.txt
ERROR: cannot verify fas.org's certificate, issued by `/C=US/ST=Arizona/L=Scottsdale/O=GoDaddy.com, Inc./OU=http://certs.godaddy.com/repository//CN=Go Daddy Secure Certificate Authority - G2':
ERROR: cannot verify www.berghahnjournals.com's certificate, issued by `/C=US/ST=Arizona/L=Scottsdale/O=GoDaddy.com, Inc./OU=http://certs.godaddy.com/repository//CN=Go Daddy Secure Certificate Authority - G2':
Where I am struggling is making a new list of ONLY the domains that failed (without the extra commentary), in a file (failed_domains.txt), like so:
fas.org
www.berghahnjournals.com
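One way to pull just the host names out of those error lines is a for /f loop. This is a minimal sketch, assuming every error line follows the pattern shown above, so the domain is always the fourth space-delimited token with a trailing 's to strip (the file names match the ones in my findstr command):

```bat
@echo off
setlocal EnableDelayedExpansion
if exist c:\urlstrip\failed_domains.txt del c:\urlstrip\failed_domains.txt

rem Domain is the 4th space-delimited token: ERROR: cannot verify fas.org's ...
for /f "tokens=4" %%A in ('findstr /i /c:"ERROR: cannot verify" c:\urlstrip\test.txt') do (
    set "dom=%%A"
    rem Strip the trailing 's that wget appends to the host name
    >>c:\urlstrip\failed_domains.txt echo(!dom:~0,-2!
)
```

If the same domain fails more than once you will get duplicate lines in failed_domains.txt, but duplicates are harmless for matching purposes.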
...hopefully we can then use failed_domains.txt to create a new file containing only the URLs from urls.txt whose domains are also on the failed_domains.txt list:
https://fas.org/irp/agency/dod/ig020907.pdf
https://fas.org/irp/agency/dod/ig020907.pdf
https://www.berghahnjournals.com/downlo ... 480111.pdf
etc.
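For this second part, findstr can read its search strings from a file with the /g: switch. A sketch, assuming the file names used above (failed_urls.txt is just an assumed output name; /l forces literal matching, since the dots in the domain names would otherwise act as regex wildcards):

```bat
findstr /i /l /g:c:\urlstrip\failed_domains.txt c:\urlstrip\urls.txt >c:\urlstrip\failed_urls.txt
```

Any line of urls.txt that contains one of the failed domains as a substring is written to failed_urls.txt.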
Any help, even on part of it would be really appreciated.