How to extract data from website?
Moderator: DosItHelp
-
- Posts: 118
- Joined: 02 Apr 2017 06:11
How to extract data from website?
Hey Guys!
I want to extract all the links beginning with "http://www.mediafire.com" from this file: http://www.mediafire.com/file/4yks2b0u18auy69/Doc.txt
I tried using the findstr and find commands, but they didn't do the trick.
Help plz!
Thanks,
PaperTronics
Last edited by PaperTronics on 21 May 2017 01:14, edited 1 time in total.
Re: How to extract data from website?
You need a utility that supports regular expressions better than FINDSTR does. Either use a third-party tool or, failing that, I'm virtually certain dbenham's JREPL hybrid batch script will work too.
viewtopic.php?f=3&t=6044
Steffen
-
- Posts: 16
- Joined: 25 Feb 2017 12:55
- Location: Russia
Re: How to extract data from website?
Code: Select all
grep -P -o "http\:\/\/www\.mediafire\.com[^\x22]*" Doc.txt
or
Code: Select all
type Doc.txt | geturls | find "mediafire"
geturls.zip (~32 KB) is available here: http://ss64.net/westlake/nt/index.html
Last edited by igor_andreev on 01 May 2017 06:57, edited 1 time in total.
Re: How to extract data from website?
Using JREPL
Code: Select all
@echo off &setlocal
cmd /c ""jrepl.bat" "\bhttp://www\.mediafire\.com[^^\x22]*" "" /F "Doc.txt" /I /MATCH"
pause
also possible
Code: Select all
@jrepl.bat "\bhttp://www\.mediafire\.com[^\x22]*" "" /F "Doc.txt" /O "mediafire.txt" /I /MATCH
Steffen
Re: How to extract data from website?
Code: Select all
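@echo off
setlocal enableDelayedExpansion
rem Read url.txt line by line
for /f "tokens=*" %%i in (url.txt) do (
    set "line=%%i"
    rem Peel off up to 20 space-delimited tokens per line
    for /l %%k in (1 1 20) do (
        for /F "tokens=1* delims= " %%A in ("!line!") do (
            set "nextToken=%%A"
            rem Offset 7 skips the http:// prefix; echo the token if the host matches
            if "!nextToken:~7,17!" == "www.mediafire.com" echo %%A
            set "line=%%B"
        )
    )
)
endlocal
exit /b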
"url.txt" file:
Code: Select all
This is line 1 ab: http://www.mediafire.com/file/1yks2b0u18auy01/Doc.htm This is line 1 end.
This is line 2: http://www.abc.com/file/4yks2b0u18auy69/Doc.txt
This is line 3 ab cd: http://www.mediafire.com/file/2yks2b0u18auy02/Doc.bmp This is line 2 end.
This is line 4: http://www.def.com/file/4yks2b0u18auy69/Doc.txt
This is line 5 ab cd ef: http://www.mediafire.com/file/3yks2b0u18auy03/Doc.gif This is line 3 end.
This is line 6: http://www.ghi.com/file/4yks2b0u18auy69/Doc.txt
This is line 7 ab cd ef gh: http://www.mediafire.com/file/4yks2b0u18auy04/Doc.jpg This is line 4 end.
This is line 8: http://www.jkl.com/file/4yks2b0u18auy69/Doc.txt
This is line 9 ab cd ef gh ij: http://www.mediafire.com/file/5yks2b0u18auy05/Doc.png This is line 5 end.
This is line 10: http://www.mno.com/file/4yks2b0u18auy69/Doc.txt
This is line 11 ab cd ef gh ij kl: http://www.mediafire.com/file/6yks2b0u18auy06/Doc.tif This is line 6 end.
This is line 12: http://www.pqr.com/file/4yks2b0u18auy69/Doc.txt
This is line 13 ab cd ef gh ij kl mn: http://www.mediafire.com/file/7yks2b0u18auy07/Doc.docx This is line 7 end.
This is line 14: http://www.stu.com/file/4yks2b0u18auy69/Doc.txt
This is line 15 ab cd ef gh ij kl mn op: http://www.mediafire.com/file/8yks2b0u18auy08/Doc.xlsx This is line 8 end.
This is line 16: http://www.wxy.com/file/4yks2b0u18auy69/Doc.txt
This is line 17 ab cd ef gh ij kl mn op qr: http://www.mediafire.com/file/9yks2b0u18auy09/Doc.ptsx This is line 9 end.
This is line 18: http://www.zab.com/file/4yks2b0u18auy69/Doc.txt
This is line 19 ab cd ef gh ij kl mn op qr st: http://www.mediafire.com/file/10yks2b0u18auy10/Doc.txt This is line 10 end.
This is line 20: http://www.zde.com/file/4yks2b0u18auy69/Doc.txt
Last edited by Thor on 03 May 2017 17:09, edited 3 times in total.
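For readers who find the nested FOR loops hard to follow, here is a rough sketch of the same token-scanning idea in plain JavaScript. It is only an illustration: the function name scanTokens and the two sample lines (taken from url.txt above) are assumptions for the example, not part of Thor's code.

```javascript
// Hypothetical helper mirroring the batch loop: split each line on spaces
// and keep the tokens whose characters at offsets 7..23 spell the host name.
function scanTokens(text) {
  var host = "www.mediafire.com"; // 17 characters
  var hits = [];
  var lines = text.split(/\r?\n/);
  for (var i = 0; i < lines.length; i++) {
    var tokens = lines[i].split(" ");
    for (var k = 0; k < tokens.length; k++) {
      // slice(7, 24) plays the role of !nextToken:~7,17! in the batch code:
      // skip the 7 characters of "http://", then take the next 17.
      if (tokens[k].slice(7, 24) === host) {
        hits.push(tokens[k]);
      }
    }
  }
  return hits;
}

// Two lines copied from the url.txt sample.
var urlLines =
  "This is line 1 ab: http://www.mediafire.com/file/1yks2b0u18auy01/Doc.htm This is line 1 end.\n" +
  "This is line 2: http://www.abc.com/file/4yks2b0u18auy69/Doc.txt";

var tokenHits = scanTokens(urlLines);
console.log(tokenHits.join("\n"));
```

Unlike the batch version, this sketch does not stop after 20 tokens per line. Like the batch version, it never checks that the first seven characters really are "http://", and it only works when links are separated from the surrounding text by spaces.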
Re: How to extract data from website?
The 3-line Batch file below (save it with a .BAT extension) takes less than 1 second to generate the output file with the 56 result lines from your data:
Code: Select all
@set @a=0 // & cscript //nologo //E:JScript "%~F0" < Doc.txt > output.txt & goto :EOF
var search = /http:\/\/www\.mediafire\.com[^"]*/g, file = WScript.StdIn.ReadAll(), match;
while ( match = search.exec(file) ) WScript.Stdout.WriteLine(match[0]);
Output:
Code: Select all
http://www.mediafire.com/file/dbu0pgraknjfma3/Snaper_1.0_By_Lego_Stoppro.zip
http://www.mediafire.com/file/wo15pswxydfkaa5/Hover_Test.zip
http://www.mediafire.com/download/a9yyp9vnmlmhxal/Example_1.zip
. . . . .
http://www.mediafire.com/download/dpm0yti5f8q29fh/swap_Mouse_Buttons.zip
http://www.mediafire.com/download/d1vu3csnlh6i2yi/Rights_Modifier_by_Kvc.zip
http://www.mediafire.com/view/c0cge2ks8i676n2/Hiding_data.bat
Antonio
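The regular expression does all the work in the hybrid script above, so it may help to see it on its own. The following is a minimal sketch in plain JavaScript; the function name extractMediafireLinks and the sample HTML are made up for illustration and merely stand in for the contents of Doc.txt.

```javascript
// Hypothetical helper; the name is not from the thread.
function extractMediafireLinks(text) {
  // Same pattern as in the hybrid script: match the prefix, then everything
  // up to (but not including) the next double quote.
  var search = /http:\/\/www\.mediafire\.com[^"]*/g;
  var links = [];
  var match;
  while ((match = search.exec(text)) !== null) {
    links.push(match[0]);
  }
  return links;
}

// Made-up sample input standing in for Doc.txt.
var sampleDoc =
  '<a href="http://www.mediafire.com/file/abc123/One.zip">One</a>\n' +
  '<a href="http://www.example.com/other.zip">Other</a>\n' +
  '<a href="http://www.mediafire.com/download/def456/Two.zip">Two</a>';

var foundLinks = extractMediafireLinks(sampleDoc);
console.log(foundLinks.join("\n"));
```

Note that [^"] also matches line breaks, so the pattern relies on every link being closed by a double quote, as it is inside an href attribute. On plain text without quotes, a match would run on until the next quote or the end of the input.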
Re: How to extract data from website?
@Thor: Nice coding but it's kind of slow.
@Aacini: Your example isn't working. I've put it in the same folder as Doc.txt. Am I doing something wrong?
Re: How to extract data from website?
PaperTronics wrote:@Thor: Nice coding but it's kind of slow.
Try my code again; it should run pretty decently now.
Re: How to extract data from website?
PaperTronics wrote:@Aacini: Your example isn't working. I've put it in the same folder as Doc.txt. Am I doing something wrong?
Did you save the code with a .BAT extension? Did you check whether the output.txt file was created? You may also test it after removing the "> output.txt" part. If it still doesn't work, please copy the output from the command-line window and paste it here...
Antonio
Re: How to extract data from website?
Aacini wrote:
Did you save the code with a .BAT extension? Did you check whether the output.txt file was created? You may also test it after removing the "> output.txt" part. If it still doesn't work, please copy the output from the command-line window and paste it here...
Antonio
I wasn't able to read clearly since CMD was shutting down every time because of the error. I saved it with .BAT extension and output.txt was just a blank file. CMD says something like "Conditional Compiling is turned off"
PaperTronics
Re: How to extract data from website?
Thor wrote:Try my code again; it should run pretty decently now.
It did get a slight bit faster.
Re: How to extract data from website?
PaperTronics wrote:I wasn't able to read clearly since CMD was shutting down every time because of the error. I saved it with .BAT extension and output.txt was just a blank file. CMD says something like "Conditional Compiling is turned off"
PaperTronics
A couple of points here:
First of all, you should run any problematic Batch file from an open cmd.exe window (how to open one varies by Windows version): execute a CD command to change to the directory where the Batch file is, then run it by entering its name. This way any message remains on the screen, so you can copy it: right-click -> Mark, select the desired text with the Shift key or the left mouse button, and press Enter to finish. After that, you can paste the text here. Do NOT run the Batch file from Explorer by double-clicking it.
According to the documentation, this error should not occur:
the documentation wrote:Conditional compilation is activated by using the @cc_on statement, or using an @if or @set statement.
Please try this version of the code:
Code: Select all
@if (@CodeSection == @Batch) @then
@echo off
cscript //nologo //E:JScript "%~F0" < Doc.txt > output.txt
goto :EOF
@end
var search = /http:\/\/www\.mediafire\.com[^"]*/g, file = WScript.StdIn.ReadAll(), match;
while ( match = search.exec(file) ) WScript.Stdout.WriteLine(match[0]);
If it still doesn't work, post the output from the command-line window...
Antonio
Re: How to extract data from website?
The error states
Code: Select all
C:\Users\pratik\Desktop\BatchStore\DummyBase.bat(1, 6) Microsoft JScript compilation error: Conditional compilation is turned off
Re: How to extract data from website?
Hi
Just give this batch file a try:
Code: Select all
@echo off
Title Extract Mediafire href links by Hackoo 2017
mode con cols=70 lines=3 & color 9E
Set "vbsfile=%tmp%\%~n0.vbs"
Set "InputFile=Doc.txt"
Set "OutPutFile=All_Links.txt"
set "MediaFireLinks=MediaFireLinks.txt"
echo(
echo Please wait a while ... Extracting is in progress ...
Call :ExtractLinks "%InputFile%" "%OutPutFile%"
Type "%OutPutFile%" | find /i "mediafire" > "%MediaFireLinks%"
start "" "%MediaFireLinks%"
exit
::****************************************************
:ExtractLinks <InputData> <OutPutData>
(
echo InputFile = wscript.Arguments(0^)
echo OutPutFile = wscript.Arguments(1^)
echo Call ExtractLinks(InputFile,OutPutFile^)
echo Function ExtractLinks(inputfile,outfile^)
echo Set fso = CreateObject("Scripting.FileSystemObject"^)
echo Set Link = fso.OpenTextFile(OutPutFile,2,True,-1^)
echo Set f = Fso.OpenTextFile(InputFile,1^)
echo Data = f.ReadAll
echo Set reLink = New RegExp
echo reLink.Global = True
echo reLink.IgnoreCase = True
echo reLink.Pattern = "<a\b[^>]*\bhref=(?:([""'])([\s\S]+?)\1|([^\s>]*))[^>]*>([\s\S]+?)</a>"
echo Set reText = New RegExp
echo reText.GLobal = True
echo reText.Pattern = "<[^>]*>"
echo For Each Match in reLink.Execute(Data^)
echo HREF = Match.SubMatches(1^) ^& Match.SubMatches(2^)
echo 'InnerText = reText.Replace(Match.SubMatches(3^), ""^)
echo Link.WriteLine HREF
echo Next
echo End Function
)>"%vbsfile%"
cscript /nologo "%vbsfile%" "%~1" "%~2"
exit /b
::**********************************************************************************
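The heart of the script above is the reLink pattern, so here is a small sketch of how its capture groups behave, written in plain JavaScript rather than VBScript. The function name extractHrefs and the sample markup are assumptions made up for this illustration. Since VBScript's SubMatches collection is zero-based, SubMatches(1) and SubMatches(2) correspond to groups 2 and 3 here.

```javascript
// Hypothetical helper; it uses the same pattern as reLink in the batch file.
function extractHrefs(html) {
  // Group 1: the opening quote, group 2: a quoted href value,
  // group 3: an unquoted href value, group 4: the text between <a> and </a>.
  var reLink = /<a\b[^>]*\bhref=(?:(["'])([\s\S]+?)\1|([^\s>]*))[^>]*>([\s\S]+?)<\/a>/gi;
  var hrefs = [];
  var m;
  while ((m = reLink.exec(html)) !== null) {
    // Only one of groups 2 and 3 takes part in a match, so concatenating
    // them yields the href either way (the VBScript code does the same).
    hrefs.push((m[2] || "") + (m[3] || ""));
  }
  return hrefs;
}

// Made-up markup with one quoted and one unquoted href.
var samplePage =
  '<p><a href="http://www.mediafire.com/file/abc123/One.zip">First <b>link</b></a></p>' +
  '<p><a class=x href=http://www.example.com/two>Two</a></p>';

var pageHrefs = extractHrefs(samplePage);
console.log(pageHrefs.join("\n"));
```

In JavaScript a group that did not take part in the match is undefined, hence the `|| ""` fallbacks; this is an adaptation for the sketch and not something the VBScript version needs.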
Re: How to extract data from website?
Hi
Here is another tweaked version that extracts all links from the source code of a website and can also filter them by search strings such as ("Mediafire" "Aacini" "Thebateam"). If you want to extract the InnerText as well, just uncomment the part of this line after HREF (remove the single quote), so that
Code: Select all
Link.WriteLine HREF '^& " ========> " ^& InnerText
becomes
Code: Select all
Link.WriteLine HREF ^& " ========> " ^& InnerText
or, to write only the link, simply write it like this:
Code: Select all
Link.WriteLine HREF
So the whole code of ExtractLinks.bat is:
Code: Select all
@echo off
Title Extracting HREF links from website source code by Hackoo 2017
REM Extract all links from source code of a website, and also, can be filtered by string to be searched
mode con cols=75 lines=3 & color 9E
set "vbsfile=%tmp%\%~n0.vbs"
set "InputFile=Doc.txt"
If Not exist "%InputFile%" (
Color 0C
echo(
echo The file "%InputFile%" does not exist. Please check it and re-run this batch file.
pause>nul
exit
)
Set "OutPutFile=All_Links.txt"
set Filter_Strings="Mediafire" "Aacini" "Thebateam"
echo(
echo Please Wait a While ... Extracting Links is in Progress ....
Call :ExtractLinks "%InputFile%" "%OutPutFile%"
For %%a in (%Filter_Strings%) Do (
Type "%OutPutFile%" | find /I "%%~a" > "%~dp0%%~a_Links.txt"
If exist "%~dp0%%~a_Links.txt" Start "" "%~dp0%%~a_Links.txt"
)
start "" "%OutPutFile%" & Exit
::*************************************************************************************************
:ExtractLinks <InputFile> <OutPutFile>
(
echo InputFile = Wscript.Arguments(0^)
echo OutPutFile = Wscript.Arguments(1^)
echo Call ExtractLinks(InputFile,OutPutFile^)
echo '-------------------------------------------------------------------------------------------
echo Function ExtractLinks(InputFile,OutPutFile^)
echo Set fso = CreateObject("Scripting.FileSystemObject"^)
echo Set f = Fso.OpenTextFile(InputFile,1^)
echo Set Link = fso.OpenTextFile(OutPutfile,2,True,-1^)
echo Data = f.ReadAll
echo Set reLink = New RegExp
echo reLink.Global = True
echo reLink.IgnoreCase = True
echo reLink.Pattern = "<a\b[^>]*\bhref=(?:([""'])([\s\S]+?)\1|([^\s>]*))[^>]*>([\s\S]+?)</a>"
echo Set reText = New RegExp
echo reText.GLobal = True
echo reText.Pattern = "<[^>]*>"
echo For Each Match in reLink.Execute(Data^)
echo HREF = Match.SubMatches(1^) ^& Match.SubMatches(2^)
echo InnerText = reText.Replace(Match.SubMatches(3^), ""^)
echo 'If you want to extract the InnerText just uncomment this line after HREF (get rid from quote^)
echo Link.WriteLine HREF '^& " ========> " ^& InnerText
echo Next
echo End Function
echo '-------------------------------------------------------------------------------------------
)>"%vbsfile%"
Cscript /nologo "%vbsfile%" "%~1" "%~2"
exit /b
::*************************************************************************************************
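The filtering step and the InnerText clean-up in the batch file above can be sketched in a few lines of plain JavaScript. This is only an illustration under assumptions: the names groupByFilter and stripTags and the sample links are invented for the example, and the result is an in-memory object instead of the *_Links.txt files the batch file writes.

```javascript
// Hypothetical helper: group links by case-insensitive substring, roughly
// what `find /I` does for each entry of Filter_Strings in the batch file.
function groupByFilter(links, filters) {
  var groups = {};
  for (var i = 0; i < filters.length; i++) {
    var needle = filters[i].toLowerCase();
    var hits = [];
    for (var k = 0; k < links.length; k++) {
      if (links[k].toLowerCase().indexOf(needle) !== -1) {
        hits.push(links[k]);
      }
    }
    // Key named after the file the batch version would create.
    groups[filters[i] + "_Links.txt"] = hits;
  }
  return groups;
}

// Hypothetical helper: remove tags from link text, as the reText pattern does.
function stripTags(html) {
  return html.replace(/<[^>]*>/g, "");
}

// Made-up sample links.
var allLinks = [
  "http://www.mediafire.com/file/abc123/One.zip",
  "http://www.example.com/aacini/two.html",
  "http://www.example.com/three.html"
];

var linkGroups = groupByFilter(allLinks, ["Mediafire", "Aacini", "Thebateam"]);
console.log(linkGroups["Mediafire_Links.txt"].join("\n"));
console.log(stripTags("First <b>link</b>"));
```

An empty array in this sketch corresponds to an empty file in the batch version, because the output redirection creates the file even when find matches nothing.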