| httrack [23/10/2016, 02:26] – [Mirroring websites with httrack] fouadessahlaoui | httrack [19/01/2025, 20:39] (Current version) – old revision (27/01/2024, 10:13) restored Amiralgaby |
|---|
| {{tag>Lucid Precise Quantal internet développement}} | {{tag>Bionic internet programmation BROUILLON}} |
| |
| ---- | ---- |
| |
| httrack --mirror http://website.com | |
| |
| ====== Mirroring websites with httrack ====== | ====== Mirroring websites with httrack ====== |
| **HTTrack** is a well-known website copier. | **HTTrack** is a well-known website copier. |
| |
| === Warning === | <note warning> |
| //Large sites (including the Ubuntu-fr forum and documentation) **must not** be mirrored automatically, or the site may block your IP address. Site mirroring should follow basic ethics and be used only when there is a real need to access content offline. Mirroring demands far more hardware resources from the target site than simply displaying a web page. Ask the webmaster for permission before proceeding! Intellectual property concerns also apply.// | Large sites (including the Ubuntu-fr forum and documentation) **must not** be mirrored automatically, or the site may block your IP address. Site mirroring should follow basic ethics and be used only when there is a real need to access content offline. Mirroring demands far more hardware resources from the target site than simply displaying a web page. Ask the webmaster for permission before acting! Intellectual property concerns also apply.</note> |
| |
| |
| ===== Installation ===== | ===== Installation ===== |
| There are two versions of httrack: | There are two versions of httrack: |
| * The base version: [[:tutoriel:comment_installer_un_paquet|install the package]] **[[apt://httrack|httrack]]** (Universe repository). | * The base version: [[:tutoriel:comment_installer_un_paquet|install the package]] **[[apt>httrack]]** |
| * The graphical version, which runs in your preferred browser: [[:tutoriel:comment_installer_un_paquet|install the package]] **[[apt://webhttrack|webhttrack]]** (Universe repository). | * The graphical version, which runs in your preferred browser: [[:tutoriel:comment_installer_un_paquet|install the package]] **[[apt>webhttrack]]**. |
| |
| |
| | =====Usage===== |
| | httrack --mirror http://website.com |
| |
| httrack(1) General Commands Manual httrack(1) | httrack(1) General Commands Manual httrack(1) |
| -%s update hacks: various hacks to limit re-transfers when updating (identical size, bogus response..) (--updatehack) | -%s update hacks: various hacks to limit re-transfers when updating (identical size, bogus response..) (--updatehack) |
| |
| -%u url hacks: various hacks to limit duplicate URLs (strip //, www.foo.com==foo.com..) (--urlhack) | -%u url hacks: various hacks to limit duplicate URLs (strip //, www.foo.com==foo.com..) (--urlhack) |
| |
| -%A assume that a type (cgi,asp..) is always linked with a mime type (-%A php3,cgi=text/html;dat,bin=application/x-zip) (--assume <param>) | -%A assume that a type (cgi,asp..) is always linked with a mime type (-%A php3,cgi=text/html;dat,bin=application/x-zip) (--assume <param>) |
| |
| |
| Details: Option K | |
| -K0 foo.cgi?q=45 -> foo4B54.html?q=45 (relative URI, default) | |
| |
| -K -> http://www.foobar.com/folder/foo.cgi?q=45 (absolute URL) (--keep-links[=N]) | |
| |
| -K3 -> /folder/foo.cgi?q=45 (absolute URI) | |
| |
| -K4 -> foo.cgi?q=45 (original URL) | |
| |
| -K5 -> http://www.foobar.com/folder/foo4B54.html?q=45 (transparent proxy URL) | |
| |
| |
| Shortcuts: | |
| --mirror | |
| <URLs> *make a mirror of site(s) (default) | |
| |
| --get | |
| <URLs> get the files indicated, do not seek other URLs (-qg) | |
| |
| --list | |
| <text file> add all URL located in this text file (-%L) | |
| |
| --mirrorlinks | |
| <URLs> mirror all links in 1st level pages (-Y) | |
| |
| --testlinks | |
| <URLs> test links in pages (-r1p0C0I0t) | |
| |
| --spider | |
| <URLs> spider site(s), to test links: reports Errors & Warnings (-p0C0I0t) | |
| |
| --testsite | |
| <URLs> identical to --spider | |
| |
| --skeleton | |
| <URLs> make a mirror, but gets only html files (-p1) | |
| |
| --update | |
| update a mirror, without confirmation (-iC2) | |
| |
| --continue | |
| continue a mirror, without confirmation (-iC1) | |
| |
| |
| --catchurl | |
| create a temporary proxy to capture an URL or a form post URL | |
| |
| --clean | |
| erase cache & log files | |
| |
| |
| --http10 | |
| force http/1.0 requests (-%h) | |
| |
| |
| Details: Option %W: External callbacks prototypes | |
| see htsdefines.h | |
| FILES | |
| /etc/httrack.conf | |
| The system wide configuration file. | |
| |
| ENVIRONMENT | |
| HOME Is being used if you defined in /etc/httrack.conf the line path ~/websites/# | |
| |
| DIAGNOSTICS | |
| Errors/Warnings are reported to hts-log.txt by default, or to stderr if the -v option was specified. | |
| |
| LIMITS | |
| These are the principal limits of HTTrack at the moment. Note that we have not heard of any other utility that has solved them. | |
| |
| |
| - Several scripts generating complex filenames may not find them (ex: img.src='image'+a+Mobj.dst+'.gif') | |
| |
| - Some java classes may not find some files on them (class included) | |
| |
| - Cgi-bin links may not work properly in some cases (parameters needed). To avoid them: use filters like -*cgi-bin* | |
| |
| BUGS | |
| Please report bugs to <bugs@httrack.com>. Include a complete, self-contained example that will allow the bug to be reproduced, and say which version of | |
| httrack you are using. Do not forget to detail options used, OS version, and any other information you deem necessary. | |
| |
| COPYRIGHT | |
| Copyright (C) 1998-2014 Xavier Roche and other contributors | |
| |
| This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software | |
| Foundation, either version 3 of the License, or (at your option) any later version. | |
| |
| This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS | |
| FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. | |
| |
| You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>. | |
| |
| |
| AVAILABILITY | |
| The most recent released version of httrack can be found at: http://www.httrack.com | |
| |
| AUTHOR | |
| Xavier Roche <roche@httrack.com> | |
| |
| SEE ALSO | |
| The HTML documentation (available online at http://www.httrack.com/html/ ) contains more detailed information. Please also refer to the httrack FAQ | |
| (available online at http://www.httrack.com/html/faq.html ) | |
| |
| |
| |
| httrack website copier 28 July 2014 httrack(1) | |
| |
| ===== Command-line usage ===== | ===== Command-line usage ===== |
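
As a starting point, the `--mirror` shortcut and the `-*cgi-bin*` filter mentioned in the man page excerpt can be combined roughly as in the sketch below. It is a dry run: the hypothetical `run` helper only prints each command unless `RUN=1` is set. The `DEST` directory is a placeholder, and `-O` (httrack's output-path option) is not shown in the excerpt above, so treat both as assumptions to adapt. Make sure you have permission to copy the site before running it for real.

```shell
#!/bin/sh
# Dry-run sketch: prints each httrack command instead of executing it.
# Set RUN=1 to really run the commands (requires httrack to be installed).
# SITE and DEST are placeholder values chosen for illustration only.
SITE="http://website.com"
DEST="${DEST:-/tmp/website-mirror}"

# Hypothetical helper: execute the command when RUN=1, otherwise print it.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Initial copy: --mirror is the default action; -O sets the output directory.
run httrack --mirror "$SITE" -O "$DEST"

# Same mirror, but skipping cgi-bin links with a filter (see LIMITS above).
run httrack --mirror "$SITE" -O "$DEST" "-*cgi-bin*"
```

Printing the commands first makes it easy to check the target URL and output directory before any request reaches the remote site.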