Discussion forum for all Windows batch related topics.
Moderator: DosItHelp
-
PaperTronics
- Posts: 118
- Joined: 02 Apr 2017 06:11
#16
Post
by PaperTronics » 12 May 2017 07:41
Hackoo wrote:Hi
This is another tweaked version that extracts all links from the source code of a website. The results can also be filtered by search strings such as ("Mediafire" "Aacini" "Thebateam"). If you want to extract the InnerText as well, uncomment the rest of this line after HREF (remove the single quote):
Link.WriteLine HREF '^& " ========> " ^& InnerText
so that it becomes
Code: Select all
Link.WriteLine HREF ^& " ========> " ^& InnerText
So the whole code of ExtractLinks.bat is:
Code: Select all
@echo off
Title Extracting HREF links from website source code by Hackoo 2017
REM Extract all links from source code of a website, and also, can be filtered by string to be searched
mode con cols=75 lines=3 & color 9E
set "vbsfile=%tmp%\%~n0.vbs"
set "InputFile=Doc.txt"
If Not exist "%InputFile%" (
Color 0C
echo(
echo The "%InputFile%" does not exist,please check it and re-run this batch again
pause>nul
exit
)
Set "OutPutFile=All_Links.txt"
set Filter_Strings="Mediafire" "Aacini" "Thebateam"
echo(
echo Please Wait a While ... Extracting Links is in Progress ....
Call :ExtractLinks "%InputFile%" "%OutPutFile%"
For %%a in (%Filter_Strings%) Do (
Type "%OutPutFile%" | find /I "%%~a" > "%~dp0%%~a_Links.txt"
If exist "%~dp0%%~a_Links.txt" Start "" "%~dp0%%~a_Links.txt"
)
start "" "%OutPutFile%" & Exit
::*************************************************************************************************
:ExtractLinks <InputFile> <OutPutFile>
(
echo InputFile = Wscript.Arguments(0^)
echo OutPutFile = Wscript.Arguments(1^)
echo Call ExtractLinks(InputFile,OutPutFile^)
echo '-------------------------------------------------------------------------------------------
echo Function ExtractLinks(InputFile,OutPutFile^)
echo Set fso = CreateObject("Scripting.FileSystemObject"^)
echo Set f = Fso.OpenTextFile(InputFile,1^)
echo Set Link = fso.OpenTextFile(OutPutfile,2,True,-1^)
echo Data = f.ReadAll
echo Set reLink = New RegExp
echo reLink.Global = True
echo reLink.IgnoreCase = True
echo reLink.Pattern = "<a\b[^>]*\bhref=(?:([""'])([\s\S]+?)\1|([^\s>]*))[^>]*>([\s\S]+?)</a>"
echo Set reText = New RegExp
echo reText.GLobal = True
echo reText.Pattern = "<[^>]*>"
echo For Each Match in reLink.Execute(Data^)
echo HREF = Match.SubMatches(1^) ^& Match.SubMatches(2^)
echo InnerText = reText.Replace(Match.SubMatches(3^), ""^)
echo 'If you want to extract the InnerText just uncomment this line after HREF (get rid from quote^)
echo Link.WriteLine HREF '^& " ========> " ^& InnerText
echo Next
echo End Function
echo '-------------------------------------------------------------------------------------------
)>"%vbsfile%"
Cscript /nologo "%vbsfile%" "%~1" "%~2"
exit /b
::*************************************************************************************************
I appreciate the time and effort you've put into this code. Can you:
1. Make it extract the "TheBATeam" Links only? I was too lazy to dig in the source code.
2. Make it not echo the "Please Wait... Extracting Links in Progress" thingy?
3. Make it not open the files when the extracting is finished.
I have to say, your method is pretty fast!
Thanks,
PaperTronics
-
Hackoo
- Posts: 103
- Joined: 15 Apr 2014 17:59
#17
Post
by Hackoo » 12 May 2017 08:39
PaperTronics wrote:I appreciate the time and effort you've put into this code. Can you:
1. Make it extract the "TheBATeam" Links only? I was too lazy to dig in the source code.
2. Make it not echo the "Please Wait... Extracting Links in Progress" thingy?
3. Make it not open the files when the extracting is finished.
I have to say, your method is pretty fast!
Thanks,
PaperTronics
I have one question: which tool or script did you use to get the source code of the website?
Here is the modification you requested:
Code: Select all
@echo off
Title Extracting HREF links from website source code by Hackoo 2017
REM Extract all links from source code of a website, and also, can be filtered by string to be searched
mode con cols=75 lines=3 & color 9E
set "vbsfile=%tmp%\%~n0.vbs"
set "InputFile=Doc.txt"
If Not exist "%InputFile%" (
Color 0C
echo(
echo The "%InputFile%" does not exist,please check it and re-run this batch again
pause>nul
exit
)
Set "OutPutFile=All_Links.txt"
set Filter_Strings="Thebateam"
Call :ExtractLinks "%InputFile%" "%OutPutFile%"
For %%a in (%Filter_Strings%) Do (
Type "%OutPutFile%" | find /I "%%~a" > "%~dp0%%~a_Links.txt"
)
Exit
::*************************************************************************************************
:ExtractLinks <InputFile> <OutPutFile>
(
echo InputFile = Wscript.Arguments(0^)
echo OutPutFile = Wscript.Arguments(1^)
echo Call ExtractLinks(InputFile,OutPutFile^)
echo '-------------------------------------------------------------------------------------------
echo Function ExtractLinks(InputFile,OutPutFile^)
echo Set fso = CreateObject("Scripting.FileSystemObject"^)
echo Set f = Fso.OpenTextFile(InputFile,1^)
echo Set Link = fso.OpenTextFile(OutPutfile,2,True,-1^)
echo Data = f.ReadAll
echo Set reLink = New RegExp
echo reLink.Global = True
echo reLink.IgnoreCase = True
echo reLink.Pattern = "<a\b[^>]*\bhref=(?:([""'])([\s\S]+?)\1|([^\s>]*))[^>]*>([\s\S]+?)</a>"
echo Set reText = New RegExp
echo reText.GLobal = True
echo reText.Pattern = "<[^>]*>"
echo For Each Match in reLink.Execute(Data^)
echo HREF = Match.SubMatches(1^) ^& Match.SubMatches(2^)
echo InnerText = reText.Replace(Match.SubMatches(3^), ""^)
echo Link.WriteLine HREF
echo Next
echo End Function
echo '-------------------------------------------------------------------------------------------
)>"%vbsfile%"
Cscript /nologo "%vbsfile%" "%~1" "%~2"
exit /b
::*************************************************************************************************
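The link pattern used by the VBScript part above can be tried in isolation. The following is only a sketch in plain JavaScript, not part of Hackoo's script: the function names extractLinks and filterLinks are made up for illustration, and the filter mimics what find /I does on the list of HREFs.

```javascript
// Sketch only: extractLinks and filterLinks are hypothetical helper names.
// The pattern is the one from the VBScript above:
//   group 2 = quoted href value, group 3 = unquoted href value,
//   group 4 = everything between <a ...> and </a>.
function extractLinks(html) {
  var reLink = /<a\b[^>]*\bhref=(?:(["'])([\s\S]+?)\1|([^\s>]*))[^>]*>([\s\S]+?)<\/a>/gi;
  var reTag = /<[^>]*>/g; // used to strip nested tags from the inner text
  var links = [];
  var m;
  while ((m = reLink.exec(html)) !== null) {
    links.push({
      // Only one of the two href groups takes part in a match.
      href: (m[2] || '') + (m[3] || ''),
      text: m[4].replace(reTag, '')
    });
  }
  return links;
}

// Keep only the links whose href contains the search string, ignoring case.
function filterLinks(links, needle) {
  var wanted = needle.toLowerCase();
  var out = [];
  for (var i = 0; i < links.length; i++) {
    if (links[i].href.toLowerCase().indexOf(wanted) !== -1) {
      out.push(links[i]);
    }
  }
  return out;
}
```

Calling filterLinks(extractLinks(pageSource), 'thebateam') would then correspond roughly to the Filter_Strings="Thebateam" setting, with pageSource standing for the contents of Doc.txt.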
-
thefeduke
- Posts: 211
- Joined: 05 Apr 2015 13:06
- Location: MA South Shore, USA
#18
Post
by thefeduke » 12 May 2017 16:19
Hackoo wrote:Here is the modification that you request for it
Good work. This reply is directed more to you than to @PaperTronics. I don't know what editor you use, but escaping those special characters in the ECHO lines looks tedious. I favor a more WYSIWYG technique, which I use for most of my inline test files. Here is your code, slightly modified so that the .vbs code is entered as plain text.
Code: Select all
@echo off
Title Extracting HREF links from website source code by Hackoo 2017
::
:: Posted: Fri May 12, 2017 10:39 am by Hackoo
:: http://www.dostips.com/forum/viewtopic.php?p=52303#p52303
:: Post subject: Re: How to extract data from website?
:: thefeduke altered :ExtractLinks to eliminate those hard to work with ECHOes
::
REM Extract all links from source code of a website, and also, can be filtered by string to be searched
mode con cols=75 lines=3 & color 9E
set "vbsfile=%tmp%\%~n0.vbs"
set "InputFile=Doc.txt"
If Not exist "%InputFile%" (
Color 0C
echo(
echo The "%InputFile%" does not exist,please check it and re-run this batch again
pause>nul
exit
)
Set "OutPutFile=All_Links.txt"
set Filter_Strings="Thebateam"
Call :ExtractLinks "%InputFile%" "%OutPutFile%"
For %%a in (%Filter_Strings%) Do (
Type "%OutPutFile%" | find /I "%%~a" > "%~dp0%%~a_Links.txt"
)
Exit
::*************************************************************************************************
:ExtractLinks <InputFile> <OutPutFile>
(
echo InputFile = Wscript.Arguments(0^)
echo OutPutFile = Wscript.Arguments(1^)
echo Call ExtractLinks(InputFile,OutPutFile^)
echo '-------------------------------------------------------------------------------------------
echo Function ExtractLinks(InputFile,OutPutFile^)
echo Set fso = CreateObject("Scripting.FileSystemObject"^)
echo Set f = Fso.OpenTextFile(InputFile,1^)
echo Set Link = fso.OpenTextFile(OutPutfile,2,True,-1^)
echo Data = f.ReadAll
echo Set reLink = New RegExp
echo reLink.Global = True
echo reLink.IgnoreCase = True
echo reLink.Pattern = "<a\b[^>]*\bhref=(?:([""'])([\s\S]+?)\1|([^\s>]*))[^>]*>([\s\S]+?)</a>"
echo Set reText = New RegExp
echo reText.GLobal = True
echo reText.Pattern = "<[^>]*>"
echo For Each Match in reLink.Execute(Data^)
echo HREF = Match.SubMatches(1^) ^& Match.SubMatches(2^)
echo InnerText = reText.Replace(Match.SubMatches(3^), ""^)
echo Link.WriteLine HREF
echo Next
echo End Function
echo '-------------------------------------------------------------------------------------------
)>"%vbsfile%"
Rem.Cscript /nologo "%vbsfile%" "%~1" "%~2"
Call :TempFile "WebSite.vbs" "file"
Call Cscript /nologo "%Temp%\~Scripts~\%~n0_WebSite.vbs" "%~1" "%~2"
exit /b
::*************************************************************************************************
GoTo :EndOfWebSite.vbsFile
InputFile = Wscript.Arguments(0)
OutPutFile = Wscript.Arguments(1)
Call ExtractLinks(InputFile,OutPutFile)
'-------------------------------------------------------------------------------------------
Function ExtractLinks(InputFile,OutPutFile)
Set fso = CreateObject("Scripting.FileSystemObject")
Set f = Fso.OpenTextFile(InputFile,1)
Set Link = fso.OpenTextFile(OutPutfile,2,True,-1)
Data = f.ReadAll
Set reLink = New RegExp
reLink.Global = True
reLink.IgnoreCase = True
reLink.Pattern = "<a\b[^>]*\bhref=(?:([""'])([\s\S]+?)\1|([^\s>]*))[^>]*>([\s\S]+?)</a>"
Set reText = New RegExp
reText.GLobal = True
reText.Pattern = "<[^>]*>"
For Each Match in reLink.Execute(Data)
HREF = Match.SubMatches(1) & Match.SubMatches(2)
InnerText = reText.Replace(Match.SubMatches(3), "")
Link.WriteLine HREF
Next
End Function
'-------------------------------------------------------------------------------------------
:EndOfWebSite.vbsFile
:TempFile Name.Ext_Val[In] OutFormatVal[In]
@echo Off & SetLocal EnableDelayedExpansion
If %~1==. First-argument-is-mandatory-but-an-empty-string
If %~2==. Second-argument-is-mandatory-but-an-empty-string
For %%E In ("%~1") DO Set "fName=%%~nxE"
If Not Exist "%Temp%\~Scripts~" MkDir "%Temp%\~Scripts~"
If Exist "%Temp%\~Scripts~\%~n0_%fName%" DEL "%Temp%\~Scripts~\%~n0_%fName%"
For /f "delims=:" %%i in (
'findstr /nir /c:"^goto[ ]*\:EndOf%fName%" /c:"^\:EndOf%fName%" "%~fs0"'
) Do Set "DataRange=!DataRange! %%i"
For /f "tokens=1,2" %%i in ("%DataRange%") Do (Set /A "BeginData=%%i+1" & Set /A "EndData=%%j-1")
(For /L %%i In (2 1 %BeginData%) Do Set /P "="
For /L %%i In (!BeginData! 1 %EndData%) Do (
Set "line=" &Set /P "line="
If /I "%~2" EQU "File" Echo(!line!
set "whole=!Whole!!line!"
)
If /I "%~2" EQU "Line" Echo(!whole!
) < "%~f0" >"%Temp%\~Scripts~\%~n0_%fName%"
EndLocal
Exit /B
John A.
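The idea behind :TempFile above is to copy the lines that sit between a "GoTo :EndOf<name>" line and the matching ":EndOf<name>" label into a temporary file. As a rough illustration only, here is the same marker search as a plain JavaScript sketch; extractEmbedded and startsWithIgnoreCase are hypothetical names, and the batch routine itself works with findstr line numbers rather than like this.

```javascript
// Sketch only: both function names are made up for this illustration.
function startsWithIgnoreCase(s, prefix) {
  return s.substr(0, prefix.length).toLowerCase() === prefix.toLowerCase();
}

// Return the lines between "GoTo :EndOf<name>..." and the ":EndOf<name>..."
// label, or null when either marker is missing.
function extractEmbedded(scriptText, name) {
  var lines = scriptText.split(/\r?\n/);
  var label = ':EndOf' + name;
  var begin = -1;
  var end = -1;
  for (var i = 0; i < lines.length; i++) {
    var gotoMatch = /^goto *(:EndOf.*)$/i.exec(lines[i]);
    if (begin === -1) {
      // The opening marker is a GoTo that jumps over the embedded text.
      if (gotoMatch && startsWithIgnoreCase(gotoMatch[1], label)) {
        begin = i + 1;
      }
    } else if (startsWithIgnoreCase(lines[i], label)) {
      // The closing marker is the label the GoTo points at.
      end = i;
      break;
    }
  }
  if (begin === -1 || end === -1) {
    return null;
  }
  return lines.slice(begin, end);
}
```

With the batch file's own text and the name 'WebSite.vbs', this would return the unescaped VBScript lines that the routine writes to the temporary .vbs file.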
-
PaperTronics
- Posts: 118
- Joined: 02 Apr 2017 06:11
#19
Post
by PaperTronics » 13 May 2017 06:36
Thnx for the help @Hackoo and @thefeduke
I'm having another problem in my program. The links are being extracted in the wrong order, not because of your code but because of the website's source code, so I want to sort the links.
I found a way to do this using the year and month of publication in the middle of the links, e.g.:
http://www.thebateam.org/2017/02/how-to-customize-cmd-completely-by.html
But the code I tried to apply wasn't working. I need some pro help here.
Answer to Hackoo's question :
I didn't use anything to get the source code of the site. thebateam.org is my own website so that's why I easily got the source code of it. Though if you wanna do the same to any other website I prefer :
1. Go to the website from which you want to extract the code
2. Press Ctrl+U. The source code should open immediately in a new tab
Thanks,
PaperTronics
-
Aacini
- Expert
- Posts: 1914
- Joined: 06 Dec 2011 22:15
- Location: México City, México
#20
Post
by Aacini » 13 May 2017 14:39
Aacini wrote:Please, try this version of the code:
Code: Select all
@if (@CodeSection == @Batch) @then
@echo off
cscript //nologo //E:JScript "%~F0" < Doc.txt > output.txt
goto :EOF
@end
var search = /http:\/\/www\.mediafire\.com[^"]*/g, file = WScript.StdIn.ReadAll(), match;
while ( match = search.exec(file) ) WScript.Stdout.WriteLine(match[0]);
If it still doesn't work, post the output from the command-line window...
Antonio
PaperTronics wrote:The error states
Code: Select all
C:\Users\pratik\Desktop\BatchStore\DummyBase.bat(1, 6) Microsoft JScript compilation error: Conditional compilation is turned off
OK. Such an error is unusual. However, the next version should run correctly on your computer:
Code: Select all
@echo off
> extract.js echo var search = /http:\/\/www\.mediafire\.com[^^"]*/g, file = WScript.StdIn.ReadAll(), match;
>> extract.js echo while ( match = search.exec(file) ) WScript.Stdout.WriteLine(match[0]);
cscript //nologo extract.js < Doc.txt
PaperTronics wrote:I'm having another problem in my program. The links are being extracted in the wrong order, not because of your code but because of the website's source code, so I want to sort the links.
I found a way to do this using the year and month of publication in the middle of the links, e.g.:
http://www.thebateam.org/2017/02/how-to-customize-cmd-completely-by.html
But the code I tried to apply wasn't working. I need some pro help here.
This works here:
Code: Select all
@echo off
> extract.js echo var search = /http:\/\/www\.thebateam\.org[^^"]*/g, file = WScript.StdIn.ReadAll(), match;
>> extract.js echo while ( match = search.exec(file) ) WScript.Stdout.WriteLine(match[0]);
cscript //nologo extract.js < Doc.txt | sort /+26 > output.txt
Antonio
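The sort /+26 step works because "http://www.thebateam.org/" is 25 characters long, so the comparison starts at the year. Here is a small JavaScript sketch of the same idea; sortFromColumn is a hypothetical name, and it uses a plain string comparison, which is an assumption because the real sort command applies its own collation rules.

```javascript
// Sketch only: sortFromColumn is a made-up helper name.
// Orders lines by the text that starts at a 1-based column,
// similar in spirit to "sort /+N".
function sortFromColumn(lines, column) {
  var copy = lines.slice(0); // leave the caller's array untouched
  copy.sort(function (a, b) {
    var keyA = a.substring(column - 1);
    var keyB = b.substring(column - 1);
    if (keyA < keyB) { return -1; }
    if (keyA > keyB) { return 1; }
    return 0;
  });
  return copy;
}
```

For links of the form http://www.thebateam.org/2017/02/..., calling sortFromColumn(links, 26) orders them by year and then month, since both are zero-padded in the URL.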
-
PaperTronics
- Posts: 118
- Joined: 02 Apr 2017 06:11
#21
Post
by PaperTronics » 19 May 2017 06:14
Thnx Aacini, the sorting and extracting method is working perfectly now. All I need now is a download program that can download any file from the internet. I had one, but it wasn't very good and was a lil' bit buggy.
-
Hackoo
- Posts: 103
- Joined: 15 Apr 2014 17:59
#22
Post
by Hackoo » 19 May 2017 11:09
PaperTronics wrote:Now all I need is a downloading program which downloads any file from the internet. Although I had one, but it wasn't that much good and was a lil' bit buggy.
Hi
Can you provide us a sample direct link to test the downloading ?
-
Hackoo
- Posts: 103
- Joined: 15 Apr 2014 17:59
#24
Post
by Hackoo » 20 May 2017 09:11
Hi
NB: For this script to work, you must give it a direct link to the file. Just give this batch a try; it downloads your file to your desktop.
I tested it before posting it here and it worked for me 5/5, and I hope it will work on your side too!
Code: Select all
@echo off
Title Batch script to download a file from a direct link by Hackoo
Color 9E & Mode con cols=90 lines=3
Set "URL=http://download1334.mediafire.com/i77fj7bj37xg/4tovbku6kcercc7/Speecher.rar"
REM To extract the name of the file to be downloaded from the URL.
For %%F in (%URL%) Do (
Set "MyProgram=%%~nxF"
Set "MyProgram_Name=%%~nF"
)
REM We set the Location of MyProgram where to be downloaded
Set "Location=%userprofile%\Desktop\%MyProgram%"
REM If there is any previous version of MyProgram we delete it.
If Exist "%Location%" Del "%Location%"
REM We download the last version of MyProgram from its original web site.
If Not Exist "%Location%" (
echo(
echo Please wait a while ... Downloading the last version of "%MyProgram_Name%" is in progress ...
Call:Download "%URL%" "%Location%"
)
Explorer.exe /select,"%Location%"
Exit
::*********************************************************************************
:Download <url> <File>
Powershell.exe -command "(New-Object System.Net.WebClient).DownloadFile('%1','%2')"
exit /b
::*********************************************************************************
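The For %%F loop above relies on %%~nxF and %%~nF to pull the file name out of the URL. The same step, written as a plain JavaScript sketch with the hypothetical name fileNameFromUrl, looks like this. Dropping a query string or fragment is an extra assumption that the batch version does not make.

```javascript
// Sketch only: fileNameFromUrl is a made-up helper name.
// "full" corresponds to %%~nxF (name plus extension),
// "base" corresponds to %%~nF (name without extension).
function fileNameFromUrl(url) {
  // Assumption: anything after ? or # is not part of the file name.
  var path = url.split('?')[0].split('#')[0];
  var full = path.substring(path.lastIndexOf('/') + 1);
  var dot = full.lastIndexOf('.');
  var base = dot > 0 ? full.substring(0, dot) : full;
  return { full: full, base: base };
}
```

For the URL in the script, the result would carry Speecher.rar as the full name and Speecher as the base name.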
-
aGerman
- Expert
- Posts: 4678
- Joined: 22 Jan 2010 18:01
- Location: Germany
#25
Post
by aGerman » 20 May 2017 10:00
Mediafire doesn't want you to download the file directly because they offer their service for free and earn their money via advertising. That's why the ID in the link changes: "i77fj7bj37xg" was "2xcjc69j2npg" when I browsed the site. I don't say it's impossible, but you would need to put in a lot of effort to get the current direct link to download the file.
Steffen
-
PaperTronics
- Posts: 118
- Joined: 02 Apr 2017 06:11
#26
Post
by PaperTronics » 21 May 2017 01:55
As @aGerman said, it takes a lot of effort to get the current direct download link to the file, so I think I should use the alternative method, which is to:
Download the source code of all the Mediafire links, use Aacini's algorithm to find the names of the programs, and then use download.exe (the program I had previously selected for downloading the files) to download those programs.
Actually, the only problem with download.exe is that it requires the name of the program it is downloading, which is why I asked y'all to suggest another download program.
-
igor_andreev
- Posts: 16
- Joined: 25 Feb 2017 12:55
- Location: Russia
#27
Post
by igor_andreev » 21 May 2017 02:36
Approximate order of actions:
1. Download the mediafire URL with wget to anyname.tmp
2. In anyname.tmp (it's just an HTML page), find the line containing the words "DownloadButtonAd-startDownload gbtnSecondary"
3. Extract the direct link
I did steps 2 and 3 in one line with sed and grep:
type anyname.tmp | sed s/\x27/\n/g | grep -o "^http:\/\/.*$"
or by sed only:
type anyname.tmp | sed s/\x27/\n/g | sed "/^http:\/\/.*$/!d"
4. wget direct-link
5. Profit
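Steps 2 and 3 above amount to breaking the page at every single quote and keeping the lines that start with http://. A plain JavaScript sketch of that one-liner follows; directLinks is a hypothetical name, and like the sed and grep version it returns every single-quoted http:// URL on the page, not only the one on the download button.

```javascript
// Sketch only: directLinks is a made-up helper name.
function directLinks(html) {
  // Same effect as sed s/\x27/\n/g: every single quote becomes a line break.
  var parts = html.split("'");
  var out = [];
  for (var i = 0; i < parts.length; i++) {
    // A part can still span several real lines, so test each one.
    var lines = parts[i].split(/\r?\n/);
    for (var j = 0; j < lines.length; j++) {
      // Same test as grep "^http:\/\/.*$".
      if (/^http:\/\/.*$/.test(lines[j])) {
        out.push(lines[j]);
      }
    }
  }
  return out;
}
```

If the page wraps the direct link in double quotes instead, or uses https, this sketch finds nothing, which is the same limitation the sed and grep line has.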
-
aGerman
- Expert
- Posts: 4678
- Joined: 22 Jan 2010 18:01
- Location: Germany
#28
Post
by aGerman » 21 May 2017 04:23
A few weeks ago we had a similar topic.
viewtopic.php?f=3&t=7797
Adapted to meet your requirements:
Code: Select all
@if (@a)==(@b) @end /* Batch part:
@echo off &setlocal
:: 1st argument: mediafire site where to find the direct link; 2nd argument: directory where to save the file
cscript //nologo //e:jscript "%~fs0" "http://www.mediafire.com/file/4tovbku6kcercc7/Speecher.rar" "%userprofile%\Desktop"
pause
exit /b
JScript Part : */
var objIE = null;
try {
WScript.Echo('Searching link ...');
objIE = new ActiveXObject('InternetExplorer.Application');
// objIE.Visible = true;
objIE.Navigate(WScript.Arguments(0));
while (objIE.Busy) { WScript.Sleep(100); }
WScript.Sleep(3000);
var link = objIE.document.getElementsByClassName('DownloadButtonAd-startDownload gbtnSecondary')[0].getAttribute('href');
WScript.Echo('Found: ' + link);
WScript.Echo('Downloading ...');
var objXMLHTTP = new ActiveXObject('MSXML2.ServerXMLHTTP');
objXMLHTTP.open('GET', link, false);
objXMLHTTP.send();
var objADOStream = new ActiveXObject('ADODB.Stream');
objADOStream.Type = 1;
objADOStream.Mode = 3;
objADOStream.Open();
objADOStream.Write(objXMLHTTP.responseBody);
objADOStream.Position = 0;
objIE.Quit();
objIE = null;
WScript.Echo('Saving ...');
var objFSO = new ActiveXObject('Scripting.FileSystemObject');
objADOStream.SaveToFile(objFSO.BuildPath(WScript.Arguments(1), objFSO.GetFileName(link)), 2);
objADOStream.Close();
WScript.Quit(0);
}
catch(e) {
if (objIE != null) { objIE.Quit(); }
WScript.Echo('Error!');
WScript.Quit(1);
}
Even if loading the site has been completed the link isn't available immediately. It will be updated after a while. In the meantime you'll see "Preparing Download" if you browse the site manually. I can't predict how long it takes. That's the reason why I added a 3 seconds delay (WScript.Sleep(3000);). It might or might not be too long.
You should be aware that as soon as Mediafire decides to change the site (e.g. they change the class name of the style) the script won't work anymore.
Steffen
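One possible alternative to the fixed 3 second delay described above is to poll until the link is ready, up to a time limit. The following is only a sketch: waitFor is a hypothetical helper, and how to recognise a "ready" link on the Mediafire page is not known from this thread, so that check is left to the caller.

```javascript
// Sketch only: waitFor is a made-up helper name.
//   getValue  - caller-supplied function; returns the link once it is ready,
//               or a falsy value while it is not
//   sleep     - caller-supplied function that pauses for the given milliseconds
//               (under Windows Script Host this could wrap WScript.Sleep)
//   timeoutMs - give up after roughly this long
//   stepMs    - pause between two checks
function waitFor(getValue, sleep, timeoutMs, stepMs) {
  var waited = 0;
  while (true) {
    var value = getValue();
    if (value) {
      return value;
    }
    if (waited >= timeoutMs) {
      return null; // timed out, the caller decides what to do
    }
    sleep(stepMs);
    waited += stepMs;
  }
}
```

The trade-off is that a short step returns as soon as the page is ready, while the timeout still bounds the wait if the page never updates.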
-
PaperTronics
- Posts: 118
- Joined: 02 Apr 2017 06:11
#29
Post
by PaperTronics » 26 May 2017 05:37
@aGerman
The script was working when I placed it on my desktop. So I copy/pasted the script to my program's code and it gave the same error as Aacini's code :
Code: Select all
C:\Users\pratik\Desktop\BATCHS~1\DUMMYB~1.BAT(1, 6) Microsoft JScript compilation error: Conditional compilation is turned off
I did some research and found that whenever I place some batch code in the same file then it gives this error so I made a separate file in which I placed your code and in my original file I wrote the command
but still the same error. I don't know if something is wrong with my computer or what.
-
thefeduke
- Posts: 211
- Joined: 05 Apr 2015 13:06
- Location: MA South Shore, USA
#30
Post
by thefeduke » 26 May 2017 09:09
PaperTronics wrote:so I made a separate file in which I placed your code and in my original file I wrote the command
but still the same error. I don't know if something is wrong with my computer or what.
The first operand of the start command is not the program name but the title of the started window. You can use "" as a default, as in
John A.