Extract information from a website to txt
Moderator: DosItHelp
Extract information from a website to txt
Hello,
I have a big request, and I'm not good at batch scripting at all...
So, here's my problem:
The script should start and ask for a link (Set /P variable=Paste Link:)
I want to extract the title, the episodes, etc., basically everything from the page behind the link. (The site is https://www.anisearch.de/anime/9357,tokyo-ghoul )
How do I do that? I know that it is possible with Python, but I don't understand Python at all either...
Thanks!
Re: Extract information from a website to txt
Is there an API you can call instead? I had to do something like this once with TheTVDB, and calling the API was easier than grabbing the entire page with curl and parsing it.
Re: Extract information from a website to txt
The website doesn't seem to have an API. To avoid installing any third-party utilities, I'd suggest automating Internet Explorer using VBScript or JScript.
http://stackoverflow.com/questions/16629228/extract-text-between-html-tags
Steffen
Re: Extract information from a website to txt
Could you tell me how to do that or do you have a tutorial anywhere? :/
Re: Extract information from a website to txt
That's quite a lot you have to learn. If you want to use it in Batch code, you have to learn Batch and how to write hybrid scripts. For the hybrid scripts you may use JScript, and thus you have to learn JScript. If you want to automate Internet Explorer, you have to learn how to create an InternetExplorer object and how to use its properties and methods. Last but not least, if you want to access elements of an HTML source text, you have to learn how to work with the HTML document object model using JScript.
That said, you can imagine that there isn't a single tutorial that teaches you everything at once.
You want to see an example? Here you are:
Code:
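@if (@a)==(@b) @end /* Batch part:
@echo off &setlocal
rem Run this same file as JScript and capture its output in the variable "name".
for /f "delims=" %%i in ('cscript //nologo //e:jscript "%~fs0"') do set "name=%%i"
echo %name%
pause
exit /b
JScript Part : */
var ie = null;
try {
  // Start a hidden Internet Explorer instance and load the page.
  ie = new ActiveXObject('InternetExplorer.Application');
  ie.Navigate('https://www.anisearch.de/anime/9357,tokyo-ghoul');
  // Wait until the page has finished loading (ReadyState 4 = complete).
  while (ie.Busy || ie.ReadyState != 4) { WScript.Sleep(100); }
  // Walk the DOM down to the span that holds the title.
  var name = ie.document.getElementById('content').getElementsByTagName('header')[0].getElementsByTagName('div')[0].getElementsByTagName('h1')[0].getElementsByTagName('a')[1].getElementsByTagName('span')[0].innerText;
  ie.Quit();
  ie = null;
  WScript.Echo(name);
}
catch(e) {
  // Make sure the browser process does not stay around after a failure.
  if (ie != null) { ie.Quit(); }
  WScript.Echo('Error!');
}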
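The long getElementsByTagName chain throws as soon as one element on the way is missing, and the script then only prints 'Error!'. Here is a minimal sketch of a helper that walks the same path step by step and returns null instead. The names walkPath and TITLE_PATH and the [tag, index] format are made up for illustration and are not part of the script above. It sticks to old-style syntax (var, plain loops) so that JScript should accept it too, but treat that as an assumption.

```javascript
// Hypothetical helper (not from the original post): follow a list of
// [tagName, index] steps below a start node. Returns null as soon as
// one step cannot be resolved instead of throwing.
function walkPath(node, steps) {
  for (var i = 0; i < steps.length; i++) {
    if (!node) { return null; }
    var found = node.getElementsByTagName(steps[i][0]);
    node = (found && found.length > steps[i][1]) ? found[steps[i][1]] : null;
  }
  return node || null;
}

// The same path the script uses, starting at the element with id 'content'.
var TITLE_PATH = [['header', 0], ['div', 0], ['h1', 0], ['a', 1], ['span', 0]];

// Possible use inside the JScript part (needs the ie object from the script):
//   var span = walkPath(ie.document.getElementById('content'), TITLE_PATH);
//   var name = span ? span.innerText : 'Title not found';
```

If the site changes its layout, only TITLE_PATH has to be adjusted, and a missing element can be reported with its own message instead of the generic error.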
Steffen
Re: Extract information from a website to txt
Trivial job for external tools.
curl(wget) host | grep regex | sed regex
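For anyone without grep and sed at hand, the same filter-then-substitute idea can be sketched in plain JavaScript. This is only an illustration: the function names grepLines and sedLines and the sample HTML are invented here, and the real markup of the page will differ.

```javascript
// grep-like step: keep only the lines that match the pattern.
function grepLines(text, pattern) {
  var lines = text.split(/\r?\n/);
  var kept = [];
  for (var i = 0; i < lines.length; i++) {
    if (pattern.test(lines[i])) { kept.push(lines[i]); }
  }
  return kept;
}

// sed-like step: apply one substitution to every remaining line.
function sedLines(lines, pattern, replacement) {
  var out = [];
  for (var i = 0; i < lines.length; i++) {
    out.push(lines[i].replace(pattern, replacement));
  }
  return out;
}

// Made-up sample input standing in for the output of curl or wget.
var sampleHtml = '<html>\n<head>\n<title>Tokyo Ghoul (Anime)</title>\n</head>\n</html>';
var titles = sedLines(grepLines(sampleHtml, /<title>/i),
                      /^.*<title>(.*)<\/title>.*$/i, '$1');
// titles[0] is 'Tokyo Ghoul (Anime)'
```

Like the pipeline it imitates, this works line by line, so it only finds values whose opening and closing tags sit on the same line.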
Re: Extract information from a website to txt
The simple solution is: when the user gives that link, you add "view-source:" to the beginning of the link and then download that file using download.exe, which can be found here: http://www.f2ko.de/en/cmd.php
After downloading the HTML code of the website, you can extract the episodes, titles, etc., whatever you want, from that code.
If you don't have HTML knowledge, tell me and I'll extract the titles etc. for you.
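To give an idea of what "extract from that code" can look like once the HTML is available as a string, here is a rough sketch that pulls the text between a pair of tags. The function name textBetweenTags and the sample markup are assumptions for illustration only; the real page is structured differently, so the tag names would have to be adapted.

```javascript
// Return the text of every <tagName>...</tagName> pair found in html.
// Regex-based, so it tolerates markup that is not well-formed XML, but it
// does not cope with nested tags of the same name.
function textBetweenTags(html, tagName) {
  var re = new RegExp('<' + tagName + '\\b[^>]*>([\\s\\S]*?)<\\/' + tagName + '\\s*>', 'gi');
  var results = [];
  var match;
  while ((match = re.exec(html)) !== null) {
    var text = match[1].replace(/<[^>]*>/g, '');      // drop inner tags
    text = text.replace(/\s+/g, ' ');                 // collapse whitespace
    text = text.replace(/^ /, '').replace(/ $/, '');  // trim both ends
    results.push(text);
  }
  return results;
}

// Made-up sample; the real page uses different markup.
var page = '<h1>\n  <a href="#">Anime</a> <a href="#"><span>Tokyo Ghoul</span></a>\n</h1>\n' +
           '<td>Episodes: 12</td>';
var heading = textBetweenTags(page, 'h1')[0];
var cells = textBetweenTags(page, 'td');
```

Unlike a line-by-line filter, this also finds values that span several lines, at the cost of being easy to fool with unusual markup.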
Re: Extract information from a website to txt
I asked them for an API and they will help me^^ If I get the API, would you help me make the tool?
Re: Extract information from a website to txt
You can try with winhttpjs.bat:
Code:
call winhttpjs.bat "https://www.anisearch.de/anime/9357,tokyo-ghoul" -saveto tokyo-ghoul.txt
It will not execute the page's JavaScript, though. Also, the downloaded file probably won't be well-formed XML, so you won't be able to process it with an XML tool.
aGerman's option is probably the best you have.
You can also take a look at PhantomJS if the pages are not compatible with IE.
Re: Extract information from a website to txt
Tami wrote: I asked them for an API and they will help me^^ If i get the API, would you help me making the tool?
Honestly, the API will make it so easy that you'll likely be able to figure it out yourself, but sure.