How to extract data from website?
Posted: 30 Apr 2017 08:01
by PaperTronics
Hey Guys!
I want to extract all the links beginning with "http://www.mediafire.com" from this file:
http://www.mediafire.com/file/4yks2b0u18auy69/Doc.txt
I tried using the findstr and find commands, but they won't do the trick.
Help plz!
Thanks,
PaperTronics
Re: How to extract data from website?
Posted: 30 Apr 2017 11:13
by aGerman
You need a utility with better regular expression support than FINDSTR. Either use a third-party tool, or I'm virtually certain dbenham's JREPL hybrid batch script will work, too.
viewtopic.php?f=3&t=6044
Steffen
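To illustrate the point (a minimal JavaScript sketch, not one of the solutions posted in this thread): FIND and FINDSTR are line filters, so they print every whole line that contains the text, while the task here needs only the matched part of each line. The sample `html` string and all variable names below are made up for the example.

```javascript
// Hypothetical sample input: the first line holds two links, only one of them wanted.
const html = [
  '<a href="http://www.mediafire.com/file/abc/One.zip">One</a> <a href="http://example.com/x">X</a>',
  '<p>No links here</p>',
  '<a href="http://www.mediafire.com/file/def/Two.zip">Two</a>',
].join('\n');

// Line filter (roughly what FIND/FINDSTR return): whole lines containing the text.
const matchingLines = html
  .split('\n')
  .filter(line => line.includes('http://www.mediafire.com'));

// Match extraction: only the URL itself, up to (not including) the closing double quote.
const urls = html.match(/http:\/\/www\.mediafire\.com[^"]*/g) || [];

console.log(matchingLines.length); // number of lines that contain a wanted link
console.log(urls.join('\n'));      // the links alone, one per line
```

The `[^"]*` part is what stops each match at the end of the URL, which assumes the links sit inside double-quoted attributes.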
Re: How to extract data from website?
Posted: 30 Apr 2017 15:15
by igor_andreev
Code: Select all
grep -P -o "http\:\/\/www\.mediafire\.com[^\x22]*" Doc.txt
or
Code: Select all
type Doc.txt | geturls | find "mediafire"
geturls.zip (~32 KB) is available here:
http://ss64.net/westlake/nt/index.html
Re: How to extract data from website?
Posted: 01 May 2017 06:14
by aGerman
Using JREPL
Code: Select all
@echo off &setlocal
cmd /c ""jrepl.bat" "\bhttp://www\.mediafire\.com[^^\x22]*" "" /F "Doc.txt" /I /MATCH"
pause
also possible
Code: Select all
@jrepl.bat "\bhttp://www\.mediafire\.com[^\x22]*" "" /F "Doc.txt" /O "mediafire.txt" /I /MATCH
Steffen
Re: How to extract data from website?
Posted: 01 May 2017 10:29
by Thor
Code: Select all
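@echo off
setlocal enableDelayedExpansion
rem Read url.txt line by line and split each line on spaces.
for /f "tokens=*" %%i in (url.txt) do (
    set "line=%%i"
    rem Peel off up to 20 space-delimited tokens per line.
    for /l %%k in (1 1 20) do (
        for /F "tokens=1* delims= " %%A in ("!line!") do (
            set "nextToken=%%A"
            rem Skip the 7 characters of "http://" and compare the next 17 with the host name.
            if "!nextToken:~7,17!" == "www.mediafire.com" echo %%A
            set "line=%%B"
        )
    )
)
endlocal
exit /b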
"url.txt" file:
Code: Select all
This is line 1 ab: http://www.mediafire.com/file/1yks2b0u18auy01/Doc.htm This is line 1 end.
This is line 2: http://www.abc.com/file/4yks2b0u18auy69/Doc.txt
This is line 3 ab cd: http://www.mediafire.com/file/2yks2b0u18auy02/Doc.bmp This is line 2 end.
This is line 4: http://www.def.com/file/4yks2b0u18auy69/Doc.txt
This is line 5 ab cd ef: http://www.mediafire.com/file/3yks2b0u18auy03/Doc.gif This is line 3 end.
This is line 6: http://www.ghi.com/file/4yks2b0u18auy69/Doc.txt
This is line 7 ab cd ef gh: http://www.mediafire.com/file/4yks2b0u18auy04/Doc.jpg This is line 4 end.
This is line 8: http://www.jkl.com/file/4yks2b0u18auy69/Doc.txt
This is line 9 ab cd ef gh ij: http://www.mediafire.com/file/5yks2b0u18auy05/Doc.png This is line 5 end.
This is line 10: http://www.mno.com/file/4yks2b0u18auy69/Doc.txt
This is line 11 ab cd ef gh ij kl: http://www.mediafire.com/file/6yks2b0u18auy06/Doc.tif This is line 6 end.
This is line 12: http://www.pqr.com/file/4yks2b0u18auy69/Doc.txt
This is line 13 ab cd ef gh ij kl mn: http://www.mediafire.com/file/7yks2b0u18auy07/Doc.docx This is line 7 end.
This is line 14: http://www.stu.com/file/4yks2b0u18auy69/Doc.txt
This is line 15 ab cd ef gh ij kl mn op: http://www.mediafire.com/file/8yks2b0u18auy08/Doc.xlsx This is line 8 end.
This is line 16: http://www.wxy.com/file/4yks2b0u18auy69/Doc.txt
This is line 17 ab cd ef gh ij kl mn op qr: http://www.mediafire.com/file/9yks2b0u18auy09/Doc.ptsx This is line 9 end.
This is line 18: http://www.zab.com/file/4yks2b0u18auy69/Doc.txt
This is line 19 ab cd ef gh ij kl mn op qr st: http://www.mediafire.com/file/10yks2b0u18auy10/Doc.txt This is line 10 end.
This is line 20: http://www.zde.com/file/4yks2b0u18auy69/Doc.txt
Re: How to extract data from website?
Posted: 01 May 2017 11:43
by Aacini
The 3-line Batch file below (save it with a .BAT extension) takes less than 1 second to generate the output file with the 56 result lines from your data:
Code: Select all
@set @a=0 // & cscript //nologo //E:JScript "%~F0" < Doc.txt > output.txt & goto :EOF
var search = /http:\/\/www\.mediafire\.com[^"]*/g, file = WScript.StdIn.ReadAll(), match;
while ( match = search.exec(file) ) WScript.Stdout.WriteLine(match[0]);
Output:
Code: Select all
http://www.mediafire.com/file/dbu0pgraknjfma3/Snaper_1.0_By_Lego_Stoppro.zip
http://www.mediafire.com/file/wo15pswxydfkaa5/Hover_Test.zip
http://www.mediafire.com/download/a9yyp9vnmlmhxal/Example_1.zip
. . . . .
http://www.mediafire.com/download/dpm0yti5f8q29fh/swap_Mouse_Buttons.zip
http://www.mediafire.com/download/d1vu3csnlh6i2yi/Rights_Modifier_by_Kvc.zip
http://www.mediafire.com/view/c0cge2ks8i676n2/Hiding_data.bat
Antonio
Re: How to extract data from website?
Posted: 03 May 2017 10:15
by PaperTronics
@Thor: Nice coding but it's kind of slow.
@Aacini: Your example isn't working. I've put it in the same folder as Doc.txt. Am I doing something wrong?
Re: How to extract data from website?
Posted: 03 May 2017 11:53
by Thor
PaperTronics wrote: @Thor: Nice coding but it's kind of slow.
Try my code again; it should run pretty decently now.
Re: How to extract data from website?
Posted: 03 May 2017 11:54
by Aacini
PaperTronics wrote: @Aacini: Your example isn't working. I've put it in the same folder as Doc.txt. Am I doing something wrong?
Did you save the code with a .BAT extension? Did you check whether the output.txt file was created? You may also test it by removing the "> output.txt" part. If it still doesn't work, please copy the output from the command-line window and paste it here...
Antonio
Re: How to extract data from website?
Posted: 04 May 2017 10:26
by PaperTronics
Aacini wrote:
Did you save the code with a .BAT extension? Did you check whether the output.txt file was created? You may also test it by removing the "> output.txt" part. If it still doesn't work, please copy the output from the command-line window and paste it here...
Antonio
I wasn't able to read it clearly since CMD was closing every time because of the error. I saved it with a .BAT extension, and output.txt was just a blank file. CMD says something like "Conditional Compiling is turned off".
PaperTronics
Re: How to extract data from website?
Posted: 04 May 2017 10:31
by PaperTronics
Thor wrote: Try my code again; it should run pretty decently now.
It did get slightly faster.
Re: How to extract data from website?
Posted: 04 May 2017 11:37
by Aacini
PaperTronics wrote: I wasn't able to read it clearly since CMD was closing every time because of the error. I saved it with a .BAT extension, and output.txt was just a blank file. CMD says something like "Conditional Compiling is turned off".
PaperTronics
A couple of points here:
First of all, you should run any problematic Batch file from a cmd.exe window (the way to open one varies by Windows version): execute a CD command to change to the directory where the Batch file is, then run it by entering its name. This way any message remains on the screen, so you can copy it: right-click -> Mark, select the desired text with the Shift key or the left mouse button, and press Enter to finish. After that, you can paste the text here. Do NOT run the Batch file from Explorer by double-clicking it.
According to the documentation, this error should not occur:
the documentation wrote: Conditional compilation is activated by using the @cc_on statement, or using an @if or @set statement.
Please try this version of the code:
Code: Select all
@if (@CodeSection == @Batch) @then
@echo off
cscript //nologo //E:JScript "%~F0" < Doc.txt > output.txt
goto :EOF
@end
var search = /http:\/\/www\.mediafire\.com[^"]*/g, file = WScript.StdIn.ReadAll(), match;
while ( match = search.exec(file) ) WScript.Stdout.WriteLine(match[0]);
If it still doesn't work, post the output from the command-line window...
Antonio
Re: How to extract data from website?
Posted: 09 May 2017 07:08
by PaperTronics
The error states
Code: Select all
C:\Users\pratik\Desktop\BatchStore\DummyBase.bat(1, 6) Microsoft JScript compilation error: Conditional compilation is turned off
Re: How to extract data from website?
Posted: 10 May 2017 12:35
by Hackoo
Hi,
Just give this batch file a try:
Code: Select all
@echo off
Title Extract Mediafire href links by Hackoo 2017
mode con cols=70 lines=3 & color 9E
Set "vbsfile=%tmp%\%~n0.vbs"
Set "InputFile=Doc.txt"
Set "OutPutFile=All_Links.txt"
set "MediaFireLinks=MediaFireLinks.txt"
echo(
echo Please wait a while ... Extracting is in progress ...
Call :ExtractLinks "%InputFile%" "%OutPutFile%"
Type "%OutPutFile%" | find /i "mediafire" > "%MediaFireLinks%"
start "" "%MediaFireLinks%"
exit
::****************************************************
:ExtractLinks <InputData> <OutPutData>
(
echo InputFile = wscript.Arguments(0^)
echo OutPutFile = wscript.Arguments(1^)
echo Call ExtractLinks(InputFile,OutPutFile^)
echo Function ExtractLinks(InputFile,OutPutFile^)
echo Set fso = CreateObject("Scripting.FileSystemObject"^)
echo Set Link = fso.OpenTextFile(OutPutFile,2,True,-1^)
echo Set f = Fso.OpenTextFile(InputFile,1^)
echo Data = f.ReadAll
echo Set reLink = New RegExp
echo reLink.Global = True
echo reLink.IgnoreCase = True
echo reLink.Pattern = "<a\b[^>]*\bhref=(?:([""'])([\s\S]+?)\1|([^\s>]*))[^>]*>([\s\S]+?)</a>"
echo Set reText = New RegExp
echo reText.GLobal = True
echo reText.Pattern = "<[^>]*>"
echo For Each Match in reLink.Execute(Data^)
echo HREF = Match.SubMatches(1^) ^& Match.SubMatches(2^)
echo 'InnerText = reText.Replace(Match.SubMatches(3^), ""^)
echo Link.WriteLine HREF
echo Next
echo End Function
)>"%vbsfile%"
cscript /nologo "%vbsfile%" "%~1" "%~2"
exit /b
::**********************************************************************************
Re: How to extract data from website?
Posted: 11 May 2017 06:38
by Hackoo
Hi,
This is another tweaked version that extracts all links from the source code of a website. The links can also be filtered by search strings such as ("Mediafire" "Aacini" "Thebateam"). If you want to extract the InnerText as well, just uncomment the part of this line after HREF (remove the single quote), so that
Link.WriteLine HREF '^& " ========> " ^& InnerText
becomes
Code: Select all
Link.WriteLine HREF ^& " ========> " ^& InnerText
or simply write like that :
So the whole code of ExtractLinks.bat:
Code: Select all
@echo off
Title Extracting HREF links from website source code by Hackoo 2017
REM Extract all links from source code of a website, and also, can be filtered by string to be searched
mode con cols=75 lines=3 & color 9E
set "vbsfile=%tmp%\%~n0.vbs"
set "InputFile=Doc.txt"
If Not exist "%InputFile%" (
Color 0C
echo(
echo The "%InputFile%" does not exist,please check it and re-run this batch again
pause>nul
exit
)
Set "OutPutFile=All_Links.txt"
set Filter_Strings="Mediafire" "Aacini" "Thebateam"
echo(
echo Please Wait a While ... Extracting Links is in Progress ....
Call :ExtractLinks "%InputFile%" "%OutPutFile%"
For %%a in (%Filter_Strings%) Do (
Type "%OutPutFile%" | find /I %%a > "%~dp0%%~a_Links.txt"
If exist "%~dp0%%~a_Links.txt" Start "" "%~dp0%%~a_Links.txt"
)
start "" "%OutPutFile%" & Exit
::*************************************************************************************************
:ExtractLinks <InputFile> <OutPutFile>
(
echo InputFile = Wscript.Arguments(0^)
echo OutPutFile = Wscript.Arguments(1^)
echo Call ExtractLinks(InputFile,OutPutFile^)
echo '-------------------------------------------------------------------------------------------
echo Function ExtractLinks(InputFile,OutPutFile^)
echo Set fso = CreateObject("Scripting.FileSystemObject"^)
echo Set f = Fso.OpenTextFile(InputFile,1^)
echo Set Link = fso.OpenTextFile(OutPutfile,2,True,-1^)
echo Data = f.ReadAll
echo Set reLink = New RegExp
echo reLink.Global = True
echo reLink.IgnoreCase = True
echo reLink.Pattern = "<a\b[^>]*\bhref=(?:([""'])([\s\S]+?)\1|([^\s>]*))[^>]*>([\s\S]+?)</a>"
echo Set reText = New RegExp
echo reText.GLobal = True
echo reText.Pattern = "<[^>]*>"
echo For Each Match in reLink.Execute(Data^)
echo HREF = Match.SubMatches(1^) ^& Match.SubMatches(2^)
echo InnerText = reText.Replace(Match.SubMatches(3^), ""^)
echo 'If you want to extract the InnerText just uncomment this line after HREF (get rid from quote^)
echo Link.WriteLine HREF '^& " ========> " ^& InnerText
echo Next
echo End Function
echo '-------------------------------------------------------------------------------------------
)>"%vbsfile%"
Cscript /nologo "%vbsfile%" "%~1" "%~2"
exit /b
::*************************************************************************************************
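For readers who want to see what the href pattern in the script above captures, here is a minimal JavaScript sketch of the same regular expression. It is only an illustration under assumptions: the sample markup in `data` and the variable names are hypothetical, and the group numbers are shifted by one because VBScript's SubMatches collection is zero-based while JavaScript's capture groups start at 1.

```javascript
// The same pattern as in the VBScript, written as a JavaScript regex literal.
// Group 1: opening quote, group 2: quoted href, group 3: unquoted href, group 4: inner HTML.
// VBScript SubMatches(1), (2) and (3) correspond to groups 2, 3 and 4 here.
const reLink = /<a\b[^>]*\bhref=(?:(["'])([\s\S]+?)\1|([^\s>]*))[^>]*>([\s\S]+?)<\/a>/gi;
const reTag = /<[^>]*>/g;

// Hypothetical sample markup: double-quoted, single-quoted and unquoted href values.
const data = `<a class="x" href="http://www.mediafire.com/file/abc/One.zip"><b>One</b></a>
<a href='http://example.com/two'>Two</a>
<a href=http://example.com/three>Three</a>`;

const links = [];
let m;
while ((m = reLink.exec(data)) !== null) {
  // Only one of groups 2 and 3 takes part in a match; the other is undefined.
  const href = (m[2] || '') + (m[3] || '');
  // Strip nested tags to get the visible link text.
  const innerText = m[4].replace(reTag, '');
  links.push(href + ' ========> ' + innerText);
}

console.log(links.join('\n'));
```

Concatenating the quoted and unquoted groups mirrors the `Match.SubMatches(1) & Match.SubMatches(2)` line in the batch file: whichever alternative matched supplies the URL.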