summaryrefslogtreecommitdiffstats
path: root/modules/snoopy
diff options
context:
space:
mode:
Diffstat (limited to 'modules/snoopy')
-rw-r--r--modules/snoopy/AUTHORS11
-rw-r--r--modules/snoopy/COPYING.lib458
-rw-r--r--modules/snoopy/ChangeLog105
-rw-r--r--modules/snoopy/FAQ14
-rw-r--r--modules/snoopy/NEWS61
-rw-r--r--modules/snoopy/Snoopy.class.php1250
-rw-r--r--modules/snoopy/TODO9
7 files changed, 1908 insertions, 0 deletions
diff --git a/modules/snoopy/AUTHORS b/modules/snoopy/AUTHORS
new file mode 100644
index 00000000..dbbe3f4f
--- /dev/null
+++ b/modules/snoopy/AUTHORS
@@ -0,0 +1,11 @@
+Monte Ohrt <monte@ispi.net>
+ - main Snoopy work
+
+Andrei Zmievski <andrei@ispi.net>
+ - miscellaneous fixes
+ - read timeout support
+ - file submission capability
+
+Gene Wood <gene_wood@users.sourceforge.net>
+ - bug fixes
+ - security fixes
diff --git a/modules/snoopy/COPYING.lib b/modules/snoopy/COPYING.lib
new file mode 100644
index 00000000..3b204400
--- /dev/null
+++ b/modules/snoopy/COPYING.lib
@@ -0,0 +1,458 @@
+ GNU LESSER GENERAL PUBLIC LICENSE
+ Version 2.1, February 1999
+
+ Copyright (C) 1991, 1999 Free Software Foundation, Inc.
+ 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+[This is the first released version of the Lesser GPL. It also counts
+ as the successor of the GNU Library Public License, version 2, hence
+ the version number 2.1.]
+
+ Preamble
+
+ The licenses for most software are designed to take away your
+freedom to share and change it. By contrast, the GNU General Public
+Licenses are intended to guarantee your freedom to share and change
+free software--to make sure the software is free for all its users.
+
+ This license, the Lesser General Public License, applies to some
+specially designated software packages--typically libraries--of the
+Free Software Foundation and other authors who decide to use it. You
+can use it too, but we suggest you first think carefully about whether
+this license or the ordinary General Public License is the better
+strategy to use in any particular case, based on the explanations below.
+
+ When we speak of free software, we are referring to freedom of use,
+not price. Our General Public Licenses are designed to make sure that
+you have the freedom to distribute copies of free software (and charge
+for this service if you wish); that you receive source code or can get
+it if you want it; that you can change the software and use pieces of
+it in new free programs; and that you are informed that you can do
+these things.
+
+ To protect your rights, we need to make restrictions that forbid
+distributors to deny you these rights or to ask you to surrender these
+rights. These restrictions translate to certain responsibilities for
+you if you distribute copies of the library or if you modify it.
+
+ For example, if you distribute copies of the library, whether gratis
+or for a fee, you must give the recipients all the rights that we gave
+you. You must make sure that they, too, receive or can get the source
+code. If you link other code with the library, you must provide
+complete object files to the recipients, so that they can relink them
+with the library after making changes to the library and recompiling
+it. And you must show them these terms so they know their rights.
+
+ We protect your rights with a two-step method: (1) we copyright the
+library, and (2) we offer you this license, which gives you legal
+permission to copy, distribute and/or modify the library.
+
+ To protect each distributor, we want to make it very clear that
+there is no warranty for the free library. Also, if the library is
+modified by someone else and passed on, the recipients should know
+that what they have is not the original version, so that the original
+author's reputation will not be affected by problems that might be
+introduced by others.
+
+ Finally, software patents pose a constant threat to the existence of
+any free program. We wish to make sure that a company cannot
+effectively restrict the users of a free program by obtaining a
+restrictive license from a patent holder. Therefore, we insist that
+any patent license obtained for a version of the library must be
+consistent with the full freedom of use specified in this license.
+
+ Most GNU software, including some libraries, is covered by the
+ordinary GNU General Public License. This license, the GNU Lesser
+General Public License, applies to certain designated libraries, and
+is quite different from the ordinary General Public License. We use
+this license for certain libraries in order to permit linking those
+libraries into non-free programs.
+
+ When a program is linked with a library, whether statically or using
+a shared library, the combination of the two is legally speaking a
+combined work, a derivative of the original library. The ordinary
+General Public License therefore permits such linking only if the
+entire combination fits its criteria of freedom. The Lesser General
+Public License permits more lax criteria for linking other code with
+the library.
+
+ We call this license the "Lesser" General Public License because it
+does Less to protect the user's freedom than the ordinary General
+Public License. It also provides other free software developers Less
+of an advantage over competing non-free programs. These disadvantages
+are the reason we use the ordinary General Public License for many
+libraries. However, the Lesser license provides advantages in certain
+special circumstances.
+
+ For example, on rare occasions, there may be a special need to
+encourage the widest possible use of a certain library, so that it becomes
+a de-facto standard. To achieve this, non-free programs must be
+allowed to use the library. A more frequent case is that a free
+library does the same job as widely used non-free libraries. In this
+case, there is little to gain by limiting the free library to free
+software only, so we use the Lesser General Public License.
+
+ In other cases, permission to use a particular library in non-free
+programs enables a greater number of people to use a large body of
+free software. For example, permission to use the GNU C Library in
+non-free programs enables many more people to use the whole GNU
+operating system, as well as its variant, the GNU/Linux operating
+system.
+
+ Although the Lesser General Public License is Less protective of the
+users' freedom, it does ensure that the user of a program that is
+linked with the Library has the freedom and the wherewithal to run
+that program using a modified version of the Library.
+
+ The precise terms and conditions for copying, distribution and
+modification follow. Pay close attention to the difference between a
+"work based on the library" and a "work that uses the library". The
+former contains code derived from the library, whereas the latter must
+be combined with the library in order to run.
+
+ GNU LESSER GENERAL PUBLIC LICENSE
+ TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
+
+ 0. This License Agreement applies to any software library or other
+program which contains a notice placed by the copyright holder or
+other authorized party saying it may be distributed under the terms of
+this Lesser General Public License (also called "this License").
+Each licensee is addressed as "you".
+
+ A "library" means a collection of software functions and/or data
+prepared so as to be conveniently linked with application programs
+(which use some of those functions and data) to form executables.
+
+ The "Library", below, refers to any such software library or work
+which has been distributed under these terms. A "work based on the
+Library" means either the Library or any derivative work under
+copyright law: that is to say, a work containing the Library or a
+portion of it, either verbatim or with modifications and/or translated
+straightforwardly into another language. (Hereinafter, translation is
+included without limitation in the term "modification".)
+
+ "Source code" for a work means the preferred form of the work for
+making modifications to it. For a library, complete source code means
+all the source code for all modules it contains, plus any associated
+interface definition files, plus the scripts used to control compilation
+and installation of the library.
+
+ Activities other than copying, distribution and modification are not
+covered by this License; they are outside its scope. The act of
+running a program using the Library is not restricted, and output from
+such a program is covered only if its contents constitute a work based
+on the Library (independent of the use of the Library in a tool for
+writing it). Whether that is true depends on what the Library does
+and what the program that uses the Library does.
+
+ 1. You may copy and distribute verbatim copies of the Library's
+complete source code as you receive it, in any medium, provided that
+you conspicuously and appropriately publish on each copy an
+appropriate copyright notice and disclaimer of warranty; keep intact
+all the notices that refer to this License and to the absence of any
+warranty; and distribute a copy of this License along with the
+Library.
+
+ You may charge a fee for the physical act of transferring a copy,
+and you may at your option offer warranty protection in exchange for a
+fee.
+
+ 2. You may modify your copy or copies of the Library or any portion
+of it, thus forming a work based on the Library, and copy and
+distribute such modifications or work under the terms of Section 1
+above, provided that you also meet all of these conditions:
+
+ a) The modified work must itself be a software library.
+
+ b) You must cause the files modified to carry prominent notices
+ stating that you changed the files and the date of any change.
+
+ c) You must cause the whole of the work to be licensed at no
+ charge to all third parties under the terms of this License.
+
+ d) If a facility in the modified Library refers to a function or a
+ table of data to be supplied by an application program that uses
+ the facility, other than as an argument passed when the facility
+ is invoked, then you must make a good faith effort to ensure that,
+ in the event an application does not supply such function or
+ table, the facility still operates, and performs whatever part of
+ its purpose remains meaningful.
+
+ (For example, a function in a library to compute square roots has
+ a purpose that is entirely well-defined independent of the
+ application. Therefore, Subsection 2d requires that any
+ application-supplied function or table used by this function must
+ be optional: if the application does not supply it, the square
+ root function must still compute square roots.)
+
+These requirements apply to the modified work as a whole. If
+identifiable sections of that work are not derived from the Library,
+and can be reasonably considered independent and separate works in
+themselves, then this License, and its terms, do not apply to those
+sections when you distribute them as separate works. But when you
+distribute the same sections as part of a whole which is a work based
+on the Library, the distribution of the whole must be on the terms of
+this License, whose permissions for other licensees extend to the
+entire whole, and thus to each and every part regardless of who wrote
+it.
+
+Thus, it is not the intent of this section to claim rights or contest
+your rights to work written entirely by you; rather, the intent is to
+exercise the right to control the distribution of derivative or
+collective works based on the Library.
+
+In addition, mere aggregation of another work not based on the Library
+with the Library (or with a work based on the Library) on a volume of
+a storage or distribution medium does not bring the other work under
+the scope of this License.
+
+ 3. You may opt to apply the terms of the ordinary GNU General Public
+License instead of this License to a given copy of the Library. To do
+this, you must alter all the notices that refer to this License, so
+that they refer to the ordinary GNU General Public License, version 2,
+instead of to this License. (If a newer version than version 2 of the
+ordinary GNU General Public License has appeared, then you can specify
+that version instead if you wish.) Do not make any other change in
+these notices.
+
+ Once this change is made in a given copy, it is irreversible for
+that copy, so the ordinary GNU General Public License applies to all
+subsequent copies and derivative works made from that copy.
+
+ This option is useful when you wish to copy part of the code of
+the Library into a program that is not a library.
+
+ 4. You may copy and distribute the Library (or a portion or
+derivative of it, under Section 2) in object code or executable form
+under the terms of Sections 1 and 2 above provided that you accompany
+it with the complete corresponding machine-readable source code, which
+must be distributed under the terms of Sections 1 and 2 above on a
+medium customarily used for software interchange.
+
+ If distribution of object code is made by offering access to copy
+from a designated place, then offering equivalent access to copy the
+source code from the same place satisfies the requirement to
+distribute the source code, even though third parties are not
+compelled to copy the source along with the object code.
+
+ 5. A program that contains no derivative of any portion of the
+Library, but is designed to work with the Library by being compiled or
+linked with it, is called a "work that uses the Library". Such a
+work, in isolation, is not a derivative work of the Library, and
+therefore falls outside the scope of this License.
+
+ However, linking a "work that uses the Library" with the Library
+creates an executable that is a derivative of the Library (because it
+contains portions of the Library), rather than a "work that uses the
+library". The executable is therefore covered by this License.
+Section 6 states terms for distribution of such executables.
+
+ When a "work that uses the Library" uses material from a header file
+that is part of the Library, the object code for the work may be a
+derivative work of the Library even though the source code is not.
+Whether this is true is especially significant if the work can be
+linked without the Library, or if the work is itself a library. The
+threshold for this to be true is not precisely defined by law.
+
+ If such an object file uses only numerical parameters, data
+structure layouts and accessors, and small macros and small inline
+functions (ten lines or less in length), then the use of the object
+file is unrestricted, regardless of whether it is legally a derivative
+work. (Executables containing this object code plus portions of the
+Library will still fall under Section 6.)
+
+ Otherwise, if the work is a derivative of the Library, you may
+distribute the object code for the work under the terms of Section 6.
+Any executables containing that work also fall under Section 6,
+whether or not they are linked directly with the Library itself.
+
+ 6. As an exception to the Sections above, you may also combine or
+link a "work that uses the Library" with the Library to produce a
+work containing portions of the Library, and distribute that work
+under terms of your choice, provided that the terms permit
+modification of the work for the customer's own use and reverse
+engineering for debugging such modifications.
+
+ You must give prominent notice with each copy of the work that the
+Library is used in it and that the Library and its use are covered by
+this License. You must supply a copy of this License. If the work
+during execution displays copyright notices, you must include the
+copyright notice for the Library among them, as well as a reference
+directing the user to the copy of this License. Also, you must do one
+of these things:
+
+ a) Accompany the work with the complete corresponding
+ machine-readable source code for the Library including whatever
+ changes were used in the work (which must be distributed under
+ Sections 1 and 2 above); and, if the work is an executable linked
+ with the Library, with the complete machine-readable "work that
+ uses the Library", as object code and/or source code, so that the
+ user can modify the Library and then relink to produce a modified
+ executable containing the modified Library. (It is understood
+ that the user who changes the contents of definitions files in the
+ Library will not necessarily be able to recompile the application
+ to use the modified definitions.)
+
+ b) Use a suitable shared library mechanism for linking with the
+ Library. A suitable mechanism is one that (1) uses at run time a
+ copy of the library already present on the user's computer system,
+ rather than copying library functions into the executable, and (2)
+ will operate properly with a modified version of the library, if
+ the user installs one, as long as the modified version is
+ interface-compatible with the version that the work was made with.
+
+ c) Accompany the work with a written offer, valid for at
+ least three years, to give the same user the materials
+ specified in Subsection 6a, above, for a charge no more
+ than the cost of performing this distribution.
+
+ d) If distribution of the work is made by offering access to copy
+ from a designated place, offer equivalent access to copy the above
+ specified materials from the same place.
+
+ e) Verify that the user has already received a copy of these
+ materials or that you have already sent this user a copy.
+
+ For an executable, the required form of the "work that uses the
+Library" must include any data and utility programs needed for
+reproducing the executable from it. However, as a special exception,
+the materials to be distributed need not include anything that is
+normally distributed (in either source or binary form) with the major
+components (compiler, kernel, and so on) of the operating system on
+which the executable runs, unless that component itself accompanies
+the executable.
+
+ It may happen that this requirement contradicts the license
+restrictions of other proprietary libraries that do not normally
+accompany the operating system. Such a contradiction means you cannot
+use both them and the Library together in an executable that you
+distribute.
+
+ 7. You may place library facilities that are a work based on the
+Library side-by-side in a single library together with other library
+facilities not covered by this License, and distribute such a combined
+library, provided that the separate distribution of the work based on
+the Library and of the other library facilities is otherwise
+permitted, and provided that you do these two things:
+
+ a) Accompany the combined library with a copy of the same work
+ based on the Library, uncombined with any other library
+ facilities. This must be distributed under the terms of the
+ Sections above.
+
+ b) Give prominent notice with the combined library of the fact
+ that part of it is a work based on the Library, and explaining
+ where to find the accompanying uncombined form of the same work.
+
+ 8. You may not copy, modify, sublicense, link with, or distribute
+the Library except as expressly provided under this License. Any
+attempt otherwise to copy, modify, sublicense, link with, or
+distribute the Library is void, and will automatically terminate your
+rights under this License. However, parties who have received copies,
+or rights, from you under this License will not have their licenses
+terminated so long as such parties remain in full compliance.
+
+ 9. You are not required to accept this License, since you have not
+signed it. However, nothing else grants you permission to modify or
+distribute the Library or its derivative works. These actions are
+prohibited by law if you do not accept this License. Therefore, by
+modifying or distributing the Library (or any work based on the
+Library), you indicate your acceptance of this License to do so, and
+all its terms and conditions for copying, distributing or modifying
+the Library or works based on it.
+
+ 10. Each time you redistribute the Library (or any work based on the
+Library), the recipient automatically receives a license from the
+original licensor to copy, distribute, link with or modify the Library
+subject to these terms and conditions. You may not impose any further
+restrictions on the recipients' exercise of the rights granted herein.
+You are not responsible for enforcing compliance by third parties with
+this License.
+
+ 11. If, as a consequence of a court judgment or allegation of patent
+infringement or for any other reason (not limited to patent issues),
+conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License. If you cannot
+distribute so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you
+may not distribute the Library at all. For example, if a patent
+license would not permit royalty-free redistribution of the Library by
+all those who receive copies directly or indirectly through you, then
+the only way you could satisfy both it and this License would be to
+refrain entirely from distribution of the Library.
+
+If any portion of this section is held invalid or unenforceable under any
+particular circumstance, the balance of the section is intended to apply,
+and the section as a whole is intended to apply in other circumstances.
+
+It is not the purpose of this section to induce you to infringe any
+patents or other property right claims or to contest validity of any
+such claims; this section has the sole purpose of protecting the
+integrity of the free software distribution system which is
+implemented by public license practices. Many people have made
+generous contributions to the wide range of software distributed
+through that system in reliance on consistent application of that
+system; it is up to the author/donor to decide if he or she is willing
+to distribute software through any other system and a licensee cannot
+impose that choice.
+
+This section is intended to make thoroughly clear what is believed to
+be a consequence of the rest of this License.
+
+ 12. If the distribution and/or use of the Library is restricted in
+certain countries either by patents or by copyrighted interfaces, the
+original copyright holder who places the Library under this License may add
+an explicit geographical distribution limitation excluding those countries,
+so that distribution is permitted only in or among countries not thus
+excluded. In such case, this License incorporates the limitation as if
+written in the body of this License.
+
+ 13. The Free Software Foundation may publish revised and/or new
+versions of the Lesser General Public License from time to time.
+Such new versions will be similar in spirit to the present version,
+but may differ in detail to address new problems or concerns.
+
+Each version is given a distinguishing version number. If the Library
+specifies a version number of this License which applies to it and
+"any later version", you have the option of following the terms and
+conditions either of that version or of any later version published by
+the Free Software Foundation. If the Library does not specify a
+license version number, you may choose any version ever published by
+the Free Software Foundation.
+
+ 14. If you wish to incorporate parts of the Library into other free
+programs whose distribution conditions are incompatible with these,
+write to the author to ask for permission. For software which is
+copyrighted by the Free Software Foundation, write to the Free
+Software Foundation; we sometimes make exceptions for this. Our
+decision will be guided by the two goals of preserving the free status
+of all derivatives of our free software and of promoting the sharing
+and reuse of software generally.
+
+ NO WARRANTY
+
+ 15. BECAUSE THE LIBRARY IS LICENSED FREE OF CHARGE, THERE IS NO
+WARRANTY FOR THE LIBRARY, TO THE EXTENT PERMITTED BY APPLICABLE LAW.
+EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR
+OTHER PARTIES PROVIDE THE LIBRARY "AS IS" WITHOUT WARRANTY OF ANY
+KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE
+IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE
+LIBRARY IS WITH YOU. SHOULD THE LIBRARY PROVE DEFECTIVE, YOU ASSUME
+THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
+
+ 16. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN
+WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY
+AND/OR REDISTRIBUTE THE LIBRARY AS PERMITTED ABOVE, BE LIABLE TO YOU
+FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR
+CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE
+LIBRARY (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING
+RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A
+FAILURE OF THE LIBRARY TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF
+SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH
+DAMAGES.
+
+ END OF TERMS AND CONDITIONS
diff --git a/modules/snoopy/ChangeLog b/modules/snoopy/ChangeLog
new file mode 100644
index 00000000..f4557613
--- /dev/null
+++ b/modules/snoopy/ChangeLog
@@ -0,0 +1,105 @@
+Version 1.2.4
+-------------
+
+ - fix command line escapement vulnerability with execution of curl binary on https fetches (mohrt)
+
+Version 1.2.3
+-----------
+ - updated the version variable in the code to reflect the new version number
+ - fixed a typo that I introduced in 1.2.2 (the first character of the file is a "z" (gene_wood, Marc Desrousseaux, Jan Pedersen)
+ - fixed BUG # 1328793 : fetch is case sensetive when it comes to the scheme (http / https) (gene_wood)
+
+Version 1.2.2
+-----------
+ - incorporated PATCH # 985470 : pass port information in http 1.1 Host header ( http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.23 ) (gene_wood)
+ - fixed BUG # 1110049 : redirect is case sensitive
+ - fixed bug in security bugfix from 1.2.1 (gene_wood, kellan, zaruba)
+
+Version 1.2.1
+-----------
+ - fixed potential security issue with unchecked variables being passed to exec (for https with curl) (gene_wood)
+ - fixed BUG # 1086830 : submitlinks,fetchlinks and submittext expandlinks with the URI of the original page not the refreshed page (gene_wood)
+ - fixed BUG # 1077870 : Snoopy can't deal with multiple spaces in a refresh tag (gene_wood)
+ - fixed BUG # 864047 : Root relative links are treated as relative (gene_wood)
+ - fixed BUG # 1097134 : Undefined URI_PARTS["path"] generates Notice (gene_wood)
+
+Version 1.2
+-----------
+ - fixed BUG # 1014823 : Meta redirect regex inaccurate (gene_wood)
+ - fixed BUG # 999079 : Trailing slashes not removed in uri passed to fetchlinks (gene_wood)
+ - fixed BUG # 642958 and 912060 : $URI_PARTS["query"] causing undefined variable notices (gene_wood)
+ - fixed BUG # 626849 : cURL security risk (Tajh Leitso, gene_wood)
+ - fixed BUG # 626849 : Corrects the redirect function under the submit functions (Tajh Leitso, gene_wood)
+ - fixed BUG # 912060 : Undefined variable: postdata (gene_wood)
+ - fixed BUG # 858526 : win32 tmp/$headerfile create error (gene_wood)
+ - fixed BUG # 929682 : Called undefined function is_executable() on line 194. (gene_wood)
+ - fixed BUG # 859711 : typo: http://snoopy.sourceforge.com (gene_wood)
+ - fixed BUG # 852993 : double urlencoding breaks redirect (gene_wood)
+ - added proxy user/pass support (Robert Zwink, Monte)
+ - fixed post data array problem (stefan, Monte)
+
+Version 1.01
+-----------
+ - fixed problem with PHP 4.3.2 and fread() (Monte)
+
+Version 1.0
+-----------
+ - added textarea to stripform functionality (Monte)
+ - fixed multiple cookie setting problem (Monte)
+ - fixed problem where extra text inside <frame src (Monte)
+ - fixed problem where extra text inside <a href (Monte)
+ - removed http request header from curl fetched
+ documents, not needed (Monte)
+ - added carriage return to newlines on headers (Monte)
+ - fixed bug with curl, removed single quotes
+ - fixed bug with curl and "&" in the URL
+ - added ability to post files. (Andrei)
+
+Version 0.94
+------------
+ - Added fetchform() function
+ - Fixed misc issues with frames
+ - Added SSL support via cURL
+ - fixed bug with posting arrays of data
+ - added status variable for http status
+
+Version 0.93
+------------
+ - fixed bug with hostname match in a redirect location header
+ - added $lastredirectaddr variable
+
+Version 0.92
+------------
+ - fixed redirect bug with MS web server
+ - added ability to pass set cookies through redirects
+ - added ability to traverse html frames
+
+Version 0.91
+------------
+ - fixed bug with return headers being overwritten.
+ Please read the NEWS file for important notes. (Monte)
+
+Version 0.9
+-----------
+ - added support for read timeouts (Andrei)
+ - standardized distribution (Andrei)
+
+Version 0.1e
+------------
+ - fixed bug in fetchlinks logic (Monte)
+
+Version 0.1d
+------------
+ - fixed redirect bug without fully qualified url (Monte)
+
+Version 0.1c
+------------
+ - fixed bug on submitting formvars after a redirect (Monte)
+
+Version 0.1b
+------------
+ - fixed bug to allow empty post vars on a submit (Monte)
+
+Version 0.1
+------------
+ - initial release (Monte)
diff --git a/modules/snoopy/FAQ b/modules/snoopy/FAQ
new file mode 100644
index 00000000..89c251ae
--- /dev/null
+++ b/modules/snoopy/FAQ
@@ -0,0 +1,14 @@
+Q: Why can't I fetch https pages?
+A: Using Snoopy to fetch an https page requires curl. Check if curl is installed on your host. If curl is installed, it may be located in a different place than the default. By default Snoopy looks for curl in /usr/local/bin/curl. Run 'which curl' and find out your location. If it differs from the default, then you'll need to set the $snoopy->curl_path variable to the location of your curl installation. Here's an example of the code :
+ include "Snoopy.class.php";
+ $snoopy = new Snoopy;
+ $snoopy->curl_path="/usr/bin/curl";
+
+Q: where does the function preg_match_all come from?
+A: PCRE functions in PHP 3.0.9 and later
+
+Q: I get the error: Warning: Wrong parameter count for fsockopen()
+A: Upgrade your verion of PHP to 3.0.9 or later
+
+Q: Snoopy cuts of my results every time. What's wrong?
+A: Upgrade your verion of PHP to 3.0.9 or later
diff --git a/modules/snoopy/NEWS b/modules/snoopy/NEWS
new file mode 100644
index 00000000..a2ae3d9b
--- /dev/null
+++ b/modules/snoopy/NEWS
@@ -0,0 +1,61 @@
+RELEASE NOTE: v1.2.4
+October 22, 2008
+
+https fetches were not properly escaping shell args for curl binary execution. This is fixed.
+
+RELEASE NOTE: v1.2.3
+November 7, 2005
+
+A typo was introduced in 1.2.2 which broke the whole release. This has been fixed.
+A couple small fixes have been implemented also.
+
+RELEASE NOTE: v1.2.2
+October 30, 2005
+
+Fixed a bug with the bugfix for the security hole.
+
+RELEASE NOTE: v1.2.1
+October 24, 2005
+
+Fixed a few outstanding bugs and a potential security hole.
+
+RELEASE NOTE: v1.2
+November 17, 2004
+
+Fixed a number of outstanding bugs.
+
+RELEASE NOTE: v1.01
+
+PHP fixed a bug with fread() which consequently broke the way Snoopy called it. This has been fixed.
+Renamed Snoopy.class.inc to Snoopy.class.php for proper file extention.
+
+RELEASE NOTE: v1.0
+
+Added fetchform() function for fetching form elements from an html page.
+For SSL support, you must have cURL installed. see http://curl.haxx.se
+for details. Snoopy does not use the cURL library fuctions within PHP,
+as these are not stable as of this Snoopy release.
+Fixed bug with posting arrays of data.
+Added status variable to track http status.
+Several other bug fixes, see Changelog.
+RELEASE NOTE: v0.93
+
+A bug was fixed with redirection headers not containing the hostname, doubling up the redirection location URL.
+
+There is also a new variable, $lastredirectaddr that contains the last redirection URL.
+
+RELEASE NOTE: v0.92
+March 9, 2000
+
+A bug was fixed with redirection on MS web servers. Also, cookies are now passed through redirects.
+
+This release also adds the ability to traverse html framed pages. Just set $maxframes to the recursion depth you want to allow, and results are returned in $this->results as an array. See the README for an example.
+
+-Monte
+
+RELEASE NOTE: v0.91
+February 22, 2000
+
+In previous versions of Snoopy, $this->header was an array containing key/value pairs of headers returned from fetched content, not including HTTP and GET headers. If a key value was the same, the old value was overwritten (Two Set-Cookie: headers for example). This was overcome by making $this->header a simple array containing every header returned. Therefore, it will now be up to the programmer to split these headers into key/value pairs if so desired.
+
+-Monte
diff --git a/modules/snoopy/Snoopy.class.php b/modules/snoopy/Snoopy.class.php
new file mode 100644
index 00000000..53116105
--- /dev/null
+++ b/modules/snoopy/Snoopy.class.php
@@ -0,0 +1,1250 @@
+<?php
+
+/*************************************************
+
+Snoopy - the PHP net client
+Author: Monte Ohrt <monte@ispi.net>
+Copyright (c): 1999-2008 New Digital Group, all rights reserved
+Version: 1.2.4
+
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+
+You may contact the author of Snoopy by e-mail at:
+monte@ohrt.com
+
+The latest version of Snoopy can be obtained from:
+http://snoopy.sourceforge.net/
+
+*************************************************/
+
+class Snoopy
+{
+ /**** Public variables ****/
+
+ /* user definable vars */
+
+ var $host = "www.php.net"; // host name we are connecting to
+ var $port = 80; // port we are connecting to
+ var $proxy_host = ""; // proxy host to use
+ var $proxy_port = ""; // proxy port to use
+ var $proxy_user = ""; // proxy user to use
+ var $proxy_pass = ""; // proxy password to use
+
+ var $agent = "Snoopy v1.2.4"; // agent we masquerade as
+ var $referer = ""; // referer info to pass
+ var $cookies = array(); // array of cookies to pass
+ // $cookies["username"]="joe";
+ var $rawheaders = array(); // array of raw headers to send
+ // $rawheaders["Content-type"]="text/html";
+
+ var $maxredirs = 5; // http redirection depth maximum. 0 = disallow
+ var $lastredirectaddr = ""; // contains address of last redirected address
+ var $offsiteok = true; // allows redirection off-site
+ var $maxframes = 0; // frame content depth maximum. 0 = disallow
+ var $expandlinks = true; // expand links to fully qualified URLs.
+ // this only applies to fetchlinks()
+ // submitlinks(), and submittext()
+ var $passcookies = true; // pass set cookies back through redirects
+ // NOTE: this currently does not respect
+ // dates, domains or paths.
+
+ var $user = ""; // user for http authentication
+ var $pass = ""; // password for http authentication
+
+ // http accept types
+ var $accept = "image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*";
+
+ var $results = ""; // where the content is put
+
+ var $error = ""; // error messages sent here
+ var $response_code = ""; // response code returned from server
+ var $headers = array(); // headers returned from server sent here
+ var $maxlength = 500000; // max return data length (body)
+ var $read_timeout = 0; // timeout on read operations, in seconds
+ // supported only since PHP 4 Beta 4
+ // set to 0 to disallow timeouts
+ var $timed_out = false; // if a read operation timed out
+ var $status = 0; // http request status
+
+ var $temp_dir = "/tmp"; // temporary directory that the webserver
+ // has permission to write to.
+ // under Windows, this should be C:\temp
+
+ var $curl_path = "/usr/local/bin/curl";
+ // Snoopy will use cURL for fetching
+ // SSL content if a full system path to
+ // the cURL binary is supplied here.
+ // set to false if you do not have
+ // cURL installed. See http://curl.haxx.se
+ // for details on installing cURL.
+ // Snoopy does *not* use the cURL
+ // library functions built into php,
+ // as these functions are not stable
+ // as of this Snoopy release.
+
+ /**** Private variables ****/
+
+ var $_maxlinelen = 4096; // max line length (headers)
+
+ var $_httpmethod = "GET"; // default http request method
+ var $_httpversion = "HTTP/1.0"; // default http request version
+ var $_submit_method = "POST"; // default submit method
+ var $_submit_type = "application/x-www-form-urlencoded"; // default submit type
+ var $_mime_boundary = ""; // MIME boundary for multipart/form-data submit type
+ var $_redirectaddr = false; // will be set if page fetched is a redirect
+ var $_redirectdepth = 0; // increments on an http redirect
+ var $_frameurls = array(); // frame src urls
+ var $_framedepth = 0; // increments on frame depth
+
+ var $_isproxy = false; // set if using a proxy server
+ var $_fp_timeout = 30; // timeout for socket connection
+
+/*======================================================================*\
+ Function: fetch
+ Purpose: fetch the contents of a web page
+ (and possibly other protocols in the
+ future like ftp, nntp, gopher, etc.)
+ Input: $URI the location of the page to fetch
+ Output: $this->results the output text from the fetch
+\*======================================================================*/
+
+ function fetch($URI)
+ {
+
+ //preg_match("|^([^:]+)://([^:/]+)(:[\d]+)*(.*)|",$URI,$URI_PARTS);
+ $URI_PARTS = parse_url($URI);
+ if (!empty($URI_PARTS["user"]))
+ $this->user = $URI_PARTS["user"];
+ if (!empty($URI_PARTS["pass"]))
+ $this->pass = $URI_PARTS["pass"];
+ if (empty($URI_PARTS["query"]))
+ $URI_PARTS["query"] = '';
+ if (empty($URI_PARTS["path"]))
+ $URI_PARTS["path"] = '';
+
+ switch(strtolower($URI_PARTS["scheme"]))
+ {
+ case "http":
+ $this->host = $URI_PARTS["host"];
+ if(!empty($URI_PARTS["port"]))
+ $this->port = $URI_PARTS["port"];
+ if($this->_connect($fp))
+ {
+ if($this->_isproxy)
+ {
+ // using proxy, send entire URI
+ $this->_httprequest($URI,$fp,$URI,$this->_httpmethod);
+ }
+ else
+ {
+ $path = $URI_PARTS["path"].($URI_PARTS["query"] ? "?".$URI_PARTS["query"] : "");
+ // no proxy, send only the path
+ $this->_httprequest($path, $fp, $URI, $this->_httpmethod);
+ }
+
+ $this->_disconnect($fp);
+
+ if($this->_redirectaddr)
+ {
+ /* url was redirected, check if we've hit the max depth */
+ if($this->maxredirs > $this->_redirectdepth)
+ {
+ // only follow redirect if it's on this site, or offsiteok is true
+ if(preg_match("|^http://".preg_quote($this->host)."|i",$this->_redirectaddr) || $this->offsiteok)
+ {
+ /* follow the redirect */
+ $this->_redirectdepth++;
+ $this->lastredirectaddr=$this->_redirectaddr;
+ $this->fetch($this->_redirectaddr);
+ }
+ }
+ }
+
+ if($this->_framedepth < $this->maxframes && count($this->_frameurls) > 0)
+ {
+ $frameurls = $this->_frameurls;
+ $this->_frameurls = array();
+
+ while(list(,$frameurl) = each($frameurls))
+ {
+ if($this->_framedepth < $this->maxframes)
+ {
+ $this->fetch($frameurl);
+ $this->_framedepth++;
+ }
+ else
+ break;
+ }
+ }
+ }
+ else
+ {
+ return false;
+ }
+ return true;
+ break;
+ case "https":
+ if(!$this->curl_path)
+ return false;
+ if(function_exists("is_executable"))
+ if (!is_executable($this->curl_path))
+ return false;
+ $this->host = $URI_PARTS["host"];
+ if(!empty($URI_PARTS["port"]))
+ $this->port = $URI_PARTS["port"];
+ if($this->_isproxy)
+ {
+ // using proxy, send entire URI
+ $this->_httpsrequest($URI,$URI,$this->_httpmethod);
+ }
+ else
+ {
+ $path = $URI_PARTS["path"].($URI_PARTS["query"] ? "?".$URI_PARTS["query"] : "");
+ // no proxy, send only the path
+ $this->_httpsrequest($path, $URI, $this->_httpmethod);
+ }
+
+ if($this->_redirectaddr)
+ {
+ /* url was redirected, check if we've hit the max depth */
+ if($this->maxredirs > $this->_redirectdepth)
+ {
+ // only follow redirect if it's on this site, or offsiteok is true
+ if(preg_match("|^http://".preg_quote($this->host)."|i",$this->_redirectaddr) || $this->offsiteok)
+ {
+ /* follow the redirect */
+ $this->_redirectdepth++;
+ $this->lastredirectaddr=$this->_redirectaddr;
+ $this->fetch($this->_redirectaddr);
+ }
+ }
+ }
+
+ if($this->_framedepth < $this->maxframes && count($this->_frameurls) > 0)
+ {
+ $frameurls = $this->_frameurls;
+ $this->_frameurls = array();
+
+ while(list(,$frameurl) = each($frameurls))
+ {
+ if($this->_framedepth < $this->maxframes)
+ {
+ $this->fetch($frameurl);
+ $this->_framedepth++;
+ }
+ else
+ break;
+ }
+ }
+ return true;
+ break;
+ default:
+ // not a valid protocol
+ $this->error = 'Invalid protocol "'.$URI_PARTS["scheme"].'"\n';
+ return false;
+ break;
+ }
+ return true;
+ }
+
+/*======================================================================*\
+ Function: submit
+ Purpose: submit an http form
+ Input: $URI the location to post the data
+ $formvars the formvars to use.
+ format: $formvars["var"] = "val";
+ $formfiles an array of files to submit
+ format: $formfiles["var"] = "/dir/filename.ext";
+ Output: $this->results the text output from the post
+\*======================================================================*/
+
+ function submit($URI, $formvars="", $formfiles="")
+ {
+ unset($postdata);
+
+ $postdata = $this->_prepare_post_body($formvars, $formfiles);
+
+ $URI_PARTS = parse_url($URI);
+ if (!empty($URI_PARTS["user"]))
+ $this->user = $URI_PARTS["user"];
+ if (!empty($URI_PARTS["pass"]))
+ $this->pass = $URI_PARTS["pass"];
+ if (empty($URI_PARTS["query"]))
+ $URI_PARTS["query"] = '';
+ if (empty($URI_PARTS["path"]))
+ $URI_PARTS["path"] = '';
+
+ switch(strtolower($URI_PARTS["scheme"]))
+ {
+ case "http":
+ $this->host = $URI_PARTS["host"];
+ if(!empty($URI_PARTS["port"]))
+ $this->port = $URI_PARTS["port"];
+ if($this->_connect($fp))
+ {
+ if($this->_isproxy)
+ {
+ // using proxy, send entire URI
+ $this->_httprequest($URI,$fp,$URI,$this->_submit_method,$this->_submit_type,$postdata);
+ }
+ else
+ {
+ $path = $URI_PARTS["path"].($URI_PARTS["query"] ? "?".$URI_PARTS["query"] : "");
+ // no proxy, send only the path
+ $this->_httprequest($path, $fp, $URI, $this->_submit_method, $this->_submit_type, $postdata);
+ }
+
+ $this->_disconnect($fp);
+
+ if($this->_redirectaddr)
+ {
+ /* url was redirected, check if we've hit the max depth */
+ if($this->maxredirs > $this->_redirectdepth)
+ {
+ if(!preg_match("|^".$URI_PARTS["scheme"]."://|", $this->_redirectaddr))
+ $this->_redirectaddr = $this->_expandlinks($this->_redirectaddr,$URI_PARTS["scheme"]."://".$URI_PARTS["host"]);
+
+ // only follow redirect if it's on this site, or offsiteok is true
+ if(preg_match("|^http://".preg_quote($this->host)."|i",$this->_redirectaddr) || $this->offsiteok)
+ {
+ /* follow the redirect */
+ $this->_redirectdepth++;
+ $this->lastredirectaddr=$this->_redirectaddr;
+ if( strpos( $this->_redirectaddr, "?" ) > 0 )
+ $this->fetch($this->_redirectaddr); // the redirect has changed the request method from post to get
+ else
+ $this->submit($this->_redirectaddr,$formvars, $formfiles);
+ }
+ }
+ }
+
+ if($this->_framedepth < $this->maxframes && count($this->_frameurls) > 0)
+ {
+ $frameurls = $this->_frameurls;
+ $this->_frameurls = array();
+
+ while(list(,$frameurl) = each($frameurls))
+ {
+ if($this->_framedepth < $this->maxframes)
+ {
+ $this->fetch($frameurl);
+ $this->_framedepth++;
+ }
+ else
+ break;
+ }
+ }
+
+ }
+ else
+ {
+ return false;
+ }
+ return true;
+ break;
+ case "https":
+ if(!$this->curl_path)
+ return false;
+ if(function_exists("is_executable"))
+ if (!is_executable($this->curl_path))
+ return false;
+ $this->host = $URI_PARTS["host"];
+ if(!empty($URI_PARTS["port"]))
+ $this->port = $URI_PARTS["port"];
+ if($this->_isproxy)
+ {
+ // using proxy, send entire URI
+ $this->_httpsrequest($URI, $URI, $this->_submit_method, $this->_submit_type, $postdata);
+ }
+ else
+ {
+ $path = $URI_PARTS["path"].($URI_PARTS["query"] ? "?".$URI_PARTS["query"] : "");
+ // no proxy, send only the path
+ $this->_httpsrequest($path, $URI, $this->_submit_method, $this->_submit_type, $postdata);
+ }
+
+ if($this->_redirectaddr)
+ {
+ /* url was redirected, check if we've hit the max depth */
+ if($this->maxredirs > $this->_redirectdepth)
+ {
+ if(!preg_match("|^".$URI_PARTS["scheme"]."://|", $this->_redirectaddr))
+ $this->_redirectaddr = $this->_expandlinks($this->_redirectaddr,$URI_PARTS["scheme"]."://".$URI_PARTS["host"]);
+
+ // only follow redirect if it's on this site, or offsiteok is true
+ if(preg_match("|^http://".preg_quote($this->host)."|i",$this->_redirectaddr) || $this->offsiteok)
+ {
+ /* follow the redirect */
+ $this->_redirectdepth++;
+ $this->lastredirectaddr=$this->_redirectaddr;
+ if( strpos( $this->_redirectaddr, "?" ) > 0 )
+ $this->fetch($this->_redirectaddr); // the redirect has changed the request method from post to get
+ else
+ $this->submit($this->_redirectaddr,$formvars, $formfiles);
+ }
+ }
+ }
+
+ if($this->_framedepth < $this->maxframes && count($this->_frameurls) > 0)
+ {
+ $frameurls = $this->_frameurls;
+ $this->_frameurls = array();
+
+ while(list(,$frameurl) = each($frameurls))
+ {
+ if($this->_framedepth < $this->maxframes)
+ {
+ $this->fetch($frameurl);
+ $this->_framedepth++;
+ }
+ else
+ break;
+ }
+ }
+ return true;
+ break;
+
+ default:
+ // not a valid protocol
+ $this->error = 'Invalid protocol "'.$URI_PARTS["scheme"].'"\n';
+ return false;
+ break;
+ }
+ return true;
+ }
+
+/*======================================================================*\
+ Function: fetchlinks
+ Purpose: fetch the links from a web page
+ Input: $URI where you are fetching from
+ Output: $this->results an array of the URLs
+\*======================================================================*/
+
+ function fetchlinks($URI)
+ {
+ if ($this->fetch($URI))
+ {
+ if($this->lastredirectaddr)
+ $URI = $this->lastredirectaddr;
+ if(is_array($this->results))
+ {
+ for($x=0;$x<count($this->results);$x++)
+ $this->results[$x] = $this->_striplinks($this->results[$x]);
+ }
+ else
+ $this->results = $this->_striplinks($this->results);
+
+ if($this->expandlinks)
+ $this->results = $this->_expandlinks($this->results, $URI);
+ return true;
+ }
+ else
+ return false;
+ }
+
+/*======================================================================*\
+ Function: fetchform
+ Purpose: fetch the form elements from a web page
+ Input: $URI where you are fetching from
+ Output: $this->results the resulting html form
+\*======================================================================*/
+
+ function fetchform($URI)
+ {
+
+ if ($this->fetch($URI))
+ {
+
+ if(is_array($this->results))
+ {
+ for($x=0;$x<count($this->results);$x++)
+ $this->results[$x] = $this->_stripform($this->results[$x]);
+ }
+ else
+ $this->results = $this->_stripform($this->results);
+
+ return true;
+ }
+ else
+ return false;
+ }
+
+
+/*======================================================================*\
+ Function: fetchtext
+ Purpose: fetch the text from a web page, stripping the links
+ Input: $URI where you are fetching from
+ Output: $this->results the text from the web page
+\*======================================================================*/
+
+ function fetchtext($URI)
+ {
+ if($this->fetch($URI))
+ {
+ if(is_array($this->results))
+ {
+ for($x=0;$x<count($this->results);$x++)
+ $this->results[$x] = $this->_striptext($this->results[$x]);
+ }
+ else
+ $this->results = $this->_striptext($this->results);
+ return true;
+ }
+ else
+ return false;
+ }
+
+/*======================================================================*\
+ Function: submitlinks
+ Purpose: grab links from a form submission
+ Input: $URI where you are submitting from
+ Output: $this->results an array of the links from the post
+\*======================================================================*/
+
+ function submitlinks($URI, $formvars="", $formfiles="")
+ {
+ if($this->submit($URI,$formvars, $formfiles))
+ {
+ if($this->lastredirectaddr)
+ $URI = $this->lastredirectaddr;
+ if(is_array($this->results))
+ {
+ for($x=0;$x<count($this->results);$x++)
+ {
+ $this->results[$x] = $this->_striplinks($this->results[$x]);
+ if($this->expandlinks)
+ $this->results[$x] = $this->_expandlinks($this->results[$x],$URI);
+ }
+ }
+ else
+ {
+ $this->results = $this->_striplinks($this->results);
+ if($this->expandlinks)
+ $this->results = $this->_expandlinks($this->results,$URI);
+ }
+ return true;
+ }
+ else
+ return false;
+ }
+
+/*======================================================================*\
+ Function: submittext
+ Purpose: grab text from a form submission
+ Input: $URI where you are submitting from
+ Output: $this->results the text from the web page
+\*======================================================================*/
+
+ function submittext($URI, $formvars = "", $formfiles = "")
+ {
+ if($this->submit($URI,$formvars, $formfiles))
+ {
+ if($this->lastredirectaddr)
+ $URI = $this->lastredirectaddr;
+ if(is_array($this->results))
+ {
+ for($x=0;$x<count($this->results);$x++)
+ {
+ $this->results[$x] = $this->_striptext($this->results[$x]);
+ if($this->expandlinks)
+ $this->results[$x] = $this->_expandlinks($this->results[$x],$URI);
+ }
+ }
+ else
+ {
+ $this->results = $this->_striptext($this->results);
+ if($this->expandlinks)
+ $this->results = $this->_expandlinks($this->results,$URI);
+ }
+ return true;
+ }
+ else
+ return false;
+ }
+
+
+
+/*======================================================================*\
+ Function: set_submit_multipart
+ Purpose: Set the form submission content type to
+ multipart/form-data
+\*======================================================================*/
+ function set_submit_multipart()
+ {
+ $this->_submit_type = "multipart/form-data";
+ }
+
+
+/*======================================================================*\
+ Function: set_submit_normal
+ Purpose: Set the form submission content type to
+ application/x-www-form-urlencoded
+\*======================================================================*/
+ function set_submit_normal()
+ {
+ $this->_submit_type = "application/x-www-form-urlencoded";
+ }
+
+
+
+
+/*======================================================================*\
+ Private functions
+\*======================================================================*/
+
+
+/*======================================================================*\
+ Function: _striplinks
+ Purpose: strip the hyperlinks from an html document
+ Input: $document document to strip.
+ Output: $match an array of the links
+\*======================================================================*/
+
+ function _striplinks($document)
+ {
+ preg_match_all("'<\s*a\s.*?href\s*=\s* # find <a href=
+ ([\"\'])? # find single or double quote
+ (?(1) (.*?)\\1 | ([^\s\>]+)) # if quote found, match up to next matching
+ # quote, otherwise match up to next space
+ 'isx",$document,$links);
+
+
+ // catenate the non-empty matches from the conditional subpattern
+
+ while(list($key,$val) = each($links[2]))
+ {
+ if(!empty($val))
+ $match[] = $val;
+ }
+
+ while(list($key,$val) = each($links[3]))
+ {
+ if(!empty($val))
+ $match[] = $val;
+ }
+
+ // return the links
+ return $match;
+ }
+
+/*======================================================================*\
+ Function: _stripform
+ Purpose: strip the form elements from an html document
+ Input: $document document to strip.
+ Output: $match an array of the links
+\*======================================================================*/
+
+ function _stripform($document)
+ {
+ preg_match_all("'<\/?(FORM|INPUT|SELECT|TEXTAREA|(OPTION))[^<>]*>(?(2)(.*(?=<\/?(option|select)[^<>]*>[\r\n]*)|(?=[\r\n]*))|(?=[\r\n]*))'Usi",$document,$elements);
+
+ // catenate the matches
+ $match = implode("\r\n",$elements[0]);
+
+ // return the links
+ return $match;
+ }
+
+
+
+/*======================================================================*\
+ Function: _striptext
+ Purpose: strip the text from an html document
+ Input: $document document to strip.
+ Output: $text the resulting text
+\*======================================================================*/
+
+ function _striptext($document)
+ {
+
+ // I didn't use preg eval (//e) since that is only available in PHP 4.0.
+ // so, list your entities one by one here. I included some of the
+ // more common ones.
+
+ $search = array("'<script[^>]*?>.*?</script>'si", // strip out javascript
+ "'<[\/\!]*?[^<>]*?>'si", // strip out html tags
+ "'([\r\n])[\s]+'", // strip out white space
+ "'&(quot|#34|#034|#x22);'i", // replace html entities
+ "'&(amp|#38|#038|#x26);'i", // added hexadecimal values
+ "'&(lt|#60|#060|#x3c);'i",
+ "'&(gt|#62|#062|#x3e);'i",
+ "'&(nbsp|#160|#xa0);'i",
+ "'&(iexcl|#161);'i",
+ "'&(cent|#162);'i",
+ "'&(pound|#163);'i",
+ "'&(copy|#169);'i",
+ "'&(reg|#174);'i",
+ "'&(deg|#176);'i",
+ "'&(#39|#039|#x27);'",
+ "'&(euro|#8364);'i", // europe
+ "'&a(uml|UML);'", // german
+ "'&o(uml|UML);'",
+ "'&u(uml|UML);'",
+ "'&A(uml|UML);'",
+ "'&O(uml|UML);'",
+ "'&U(uml|UML);'",
+ "'&szlig;'i",
+ );
+ $replace = array( "",
+ "",
+ "\\1",
+ "\"",
+ "&",
+ "<",
+ ">",
+ " ",
+ chr(161),
+ chr(162),
+ chr(163),
+ chr(169),
+ chr(174),
+ chr(176),
+ chr(39),
+ chr(128),
+ "ä",
+ "ö",
+ "ü",
+ "Ä",
+ "Ö",
+ "Ü",
+ "ß",
+ );
+
+ $text = preg_replace($search,$replace,$document);
+
+ return $text;
+ }
+
+/*======================================================================*\
+ Function: _expandlinks
+ Purpose: expand each link into a fully qualified URL
+ Input: $links the links to qualify
+ $URI the full URI to get the base from
+ Output: $expandedLinks the expanded links
+\*======================================================================*/
+
+ function _expandlinks($links,$URI)
+ {
+
+ preg_match("/^[^\?]+/",$URI,$match);
+
+ $match = preg_replace("|/[^\/\.]+\.[^\/\.]+$|","",$match[0]);
+ $match = preg_replace("|/$|","",$match);
+ $match_part = parse_url($match);
+ $match_root =
+ $match_part["scheme"]."://".$match_part["host"];
+
+ $search = array( "|^http://".preg_quote($this->host)."|i",
+ "|^(\/)|i",
+ "|^(?!http://)(?!mailto:)|i",
+ "|/\./|",
+ "|/[^\/]+/\.\./|"
+ );
+
+ $replace = array( "",
+ $match_root."/",
+ $match."/",
+ "/",
+ "/"
+ );
+
+ $expandedLinks = preg_replace($search,$replace,$links);
+
+ return $expandedLinks;
+ }
+
+/*======================================================================*\
+ Function: _httprequest
+ Purpose: go get the http data from the server
+ Input: $url the url to fetch
+ $fp the current open file pointer
+ $URI the full URI
+ $body body contents to send if any (POST)
+ Output:
+\*======================================================================*/
+
+ function _httprequest($url,$fp,$URI,$http_method,$content_type="",$body="")
+ {
+ $cookie_headers = '';
+ if($this->passcookies && $this->_redirectaddr)
+ $this->setcookies();
+
+ $URI_PARTS = parse_url($URI);
+ if(empty($url))
+ $url = "/";
+ $headers = $http_method." ".$url." ".$this->_httpversion."\r\n";
+ if(!empty($this->agent))
+ $headers .= "User-Agent: ".$this->agent."\r\n";
+ if(!empty($this->host) && !isset($this->rawheaders['Host'])) {
+ $headers .= "Host: ".$this->host;
+ if(!empty($this->port))
+ $headers .= ":".$this->port;
+ $headers .= "\r\n";
+ }
+ if(!empty($this->accept))
+ $headers .= "Accept: ".$this->accept."\r\n";
+ if(!empty($this->referer))
+ $headers .= "Referer: ".$this->referer."\r\n";
+ if(!empty($this->cookies))
+ {
+ if(!is_array($this->cookies))
+ $this->cookies = (array)$this->cookies;
+
+ reset($this->cookies);
+ if ( count($this->cookies) > 0 ) {
+ $cookie_headers .= 'Cookie: ';
+ foreach ( $this->cookies as $cookieKey => $cookieVal ) {
+ $cookie_headers .= $cookieKey."=".urlencode($cookieVal)."; ";
+ }
+ $headers .= substr($cookie_headers,0,-2) . "\r\n";
+ }
+ }
+ if(!empty($this->rawheaders))
+ {
+ if(!is_array($this->rawheaders))
+ $this->rawheaders = (array)$this->rawheaders;
+ while(list($headerKey,$headerVal) = each($this->rawheaders))
+ $headers .= $headerKey.": ".$headerVal."\r\n";
+ }
+ if(!empty($content_type)) {
+ $headers .= "Content-type: $content_type";
+ if ($content_type == "multipart/form-data")
+ $headers .= "; boundary=".$this->_mime_boundary;
+ $headers .= "\r\n";
+ }
+ if(!empty($body))
+ $headers .= "Content-length: ".strlen($body)."\r\n";
+ if(!empty($this->user) || !empty($this->pass))
+ $headers .= "Authorization: Basic ".base64_encode($this->user.":".$this->pass)."\r\n";
+
+ //add proxy auth headers
+ if(!empty($this->proxy_user))
+ $headers .= 'Proxy-Authorization: ' . 'Basic ' . base64_encode($this->proxy_user . ':' . $this->proxy_pass)."\r\n";
+
+
+ $headers .= "\r\n";
+
+ // set the read timeout if needed
+ if ($this->read_timeout > 0)
+ socket_set_timeout($fp, $this->read_timeout);
+ $this->timed_out = false;
+
+ fwrite($fp,$headers.$body,strlen($headers.$body));
+
+ $this->_redirectaddr = false;
+ unset($this->headers);
+
+ while($currentHeader = fgets($fp,$this->_maxlinelen))
+ {
+ if ($this->read_timeout > 0 && $this->_check_timeout($fp))
+ {
+ $this->status=-100;
+ return false;
+ }
+
+ if($currentHeader == "\r\n")
+ break;
+
+ // if a header begins with Location: or URI:, set the redirect
+ if(preg_match("/^(Location:|URI:)/i",$currentHeader))
+ {
+ // get URL portion of the redirect
+ preg_match("/^(Location:|URI:)[ ]+(.*)/i",chop($currentHeader),$matches);
+ // look for :// in the Location header to see if hostname is included
+ if(!preg_match("|\:\/\/|",$matches[2]))
+ {
+ // no host in the path, so prepend
+ $this->_redirectaddr = $URI_PARTS["scheme"]."://".$this->host.":".$this->port;
+ // eliminate double slash
+ if(!preg_match("|^/|",$matches[2]))
+ $this->_redirectaddr .= "/".$matches[2];
+ else
+ $this->_redirectaddr .= $matches[2];
+ }
+ else
+ $this->_redirectaddr = $matches[2];
+ }
+
+ if(preg_match("|^HTTP/|",$currentHeader))
+ {
+ if(preg_match("|^HTTP/[^\s]*\s(.*?)\s|",$currentHeader, $status))
+ {
+ $this->status= $status[1];
+ }
+ $this->response_code = $currentHeader;
+ }
+
+ $this->headers[] = $currentHeader;
+ }
+
+ $results = '';
+ do {
+ $_data = fread($fp, $this->maxlength);
+ if (strlen($_data) == 0) {
+ break;
+ }
+ $results .= $_data;
+ } while(true);
+
+ if ($this->read_timeout > 0 && $this->_check_timeout($fp))
+ {
+ $this->status=-100;
+ return false;
+ }
+
+ // check if there is a a redirect meta tag
+
+ if(preg_match("'<meta[\s]*http-equiv[^>]*?content[\s]*=[\s]*[\"\']?\d+;[\s]*URL[\s]*=[\s]*([^\"\']*?)[\"\']?>'i",$results,$match))
+
+ {
+ $this->_redirectaddr = $this->_expandlinks($match[1],$URI);
+ }
+
+ // have we hit our frame depth and is there frame src to fetch?
+ if(($this->_framedepth < $this->maxframes) && preg_match_all("'<frame\s+.*src[\s]*=[\'\"]?([^\'\"\>]+)'i",$results,$match))
+ {
+ $this->results[] = $results;
+ for($x=0; $x<count($match[1]); $x++)
+ $this->_frameurls[] = $this->_expandlinks($match[1][$x],$URI_PARTS["scheme"]."://".$this->host);
+ }
+ // have we already fetched framed content?
+ elseif(is_array($this->results))
+ $this->results[] = $results;
+ // no framed content
+ else
+ $this->results = $results;
+
+ return true;
+ }
+
+/*======================================================================*\
+ Function: _httpsrequest
+ Purpose: go get the https data from the server using curl
+ Input: $url the url to fetch
+ $URI the full URI
+ $body body contents to send if any (POST)
+ Output:
+\*======================================================================*/
+
+ function _httpsrequest($url,$URI,$http_method,$content_type="",$body="")
+ {
+ if($this->passcookies && $this->_redirectaddr)
+ $this->setcookies();
+
+ $headers = array();
+
+ $URI_PARTS = parse_url($URI);
+ if(empty($url))
+ $url = "/";
+ // GET ... header not needed for curl
+ //$headers[] = $http_method." ".$url." ".$this->_httpversion;
+ if(!empty($this->agent))
+ $headers[] = "User-Agent: ".$this->agent;
+ if(!empty($this->host))
+ if(!empty($this->port))
+ $headers[] = "Host: ".$this->host.":".$this->port;
+ else
+ $headers[] = "Host: ".$this->host;
+ if(!empty($this->accept))
+ $headers[] = "Accept: ".$this->accept;
+ if(!empty($this->referer))
+ $headers[] = "Referer: ".$this->referer;
+ if(!empty($this->cookies))
+ {
+ if(!is_array($this->cookies))
+ $this->cookies = (array)$this->cookies;
+
+ reset($this->cookies);
+ if ( count($this->cookies) > 0 ) {
+ $cookie_str = 'Cookie: ';
+ foreach ( $this->cookies as $cookieKey => $cookieVal ) {
+ $cookie_str .= $cookieKey."=".urlencode($cookieVal)."; ";
+ }
+ $headers[] = substr($cookie_str,0,-2);
+ }
+ }
+ if(!empty($this->rawheaders))
+ {
+ if(!is_array($this->rawheaders))
+ $this->rawheaders = (array)$this->rawheaders;
+ while(list($headerKey,$headerVal) = each($this->rawheaders))
+ $headers[] = $headerKey.": ".$headerVal;
+ }
+ if(!empty($content_type)) {
+ if ($content_type == "multipart/form-data")
+ $headers[] = "Content-type: $content_type; boundary=".$this->_mime_boundary;
+ else
+ $headers[] = "Content-type: $content_type";
+ }
+ if(!empty($body))
+ $headers[] = "Content-length: ".strlen($body);
+ if(!empty($this->user) || !empty($this->pass))
+ $headers[] = "Authorization: BASIC ".base64_encode($this->user.":".$this->pass);
+
+ for($curr_header = 0; $curr_header < count($headers); $curr_header++) {
+ $safer_header = strtr( $headers[$curr_header], "\"", " " );
+ $cmdline_params .= " -H \"".$safer_header."\"";
+ }
+
+ if(!empty($body))
+ $cmdline_params .= " -d \"$body\"";
+
+ if($this->read_timeout > 0)
+ $cmdline_params .= " -m ".$this->read_timeout;
+
+ $headerfile = tempnam($temp_dir, "sno");
+
+ exec($this->curl_path." -k -D \"$headerfile\"".$cmdline_params." \"".escapeshellcmd($URI)."\"",$results,$return);
+
+ if($return)
+ {
+ $this->error = "Error: cURL could not retrieve the document, error $return.";
+ return false;
+ }
+
+
+ $results = implode("\r\n",$results);
+
+ $result_headers = file("$headerfile");
+
+ $this->_redirectaddr = false;
+ unset($this->headers);
+
+ for($currentHeader = 0; $currentHeader < count($result_headers); $currentHeader++)
+ {
+
+ // if a header begins with Location: or URI:, set the redirect
+ if(preg_match("/^(Location: |URI: )/i",$result_headers[$currentHeader]))
+ {
+ // get URL portion of the redirect
+ preg_match("/^(Location: |URI:)\s+(.*)/",chop($result_headers[$currentHeader]),$matches);
+ // look for :// in the Location header to see if hostname is included
+ if(!preg_match("|\:\/\/|",$matches[2]))
+ {
+ // no host in the path, so prepend
+ $this->_redirectaddr = $URI_PARTS["scheme"]."://".$this->host.":".$this->port;
+ // eliminate double slash
+ if(!preg_match("|^/|",$matches[2]))
+ $this->_redirectaddr .= "/".$matches[2];
+ else
+ $this->_redirectaddr .= $matches[2];
+ }
+ else
+ $this->_redirectaddr = $matches[2];
+ }
+
+ if(preg_match("|^HTTP/|",$result_headers[$currentHeader]))
+ $this->response_code = $result_headers[$currentHeader];
+
+ $this->headers[] = $result_headers[$currentHeader];
+ }
+
+ // check if there is a a redirect meta tag
+
+ if(preg_match("'<meta[\s]*http-equiv[^>]*?content[\s]*=[\s]*[\"\']?\d+;[\s]*URL[\s]*=[\s]*([^\"\']*?)[\"\']?>'i",$results,$match))
+ {
+ $this->_redirectaddr = $this->_expandlinks($match[1],$URI);
+ }
+
+ // have we hit our frame depth and is there frame src to fetch?
+ if(($this->_framedepth < $this->maxframes) && preg_match_all("'<frame\s+.*src[\s]*=[\'\"]?([^\'\"\>]+)'i",$results,$match))
+ {
+ $this->results[] = $results;
+ for($x=0; $x<count($match[1]); $x++)
+ $this->_frameurls[] = $this->_expandlinks($match[1][$x],$URI_PARTS["scheme"]."://".$this->host);
+ }
+ // have we already fetched framed content?
+ elseif(is_array($this->results))
+ $this->results[] = $results;
+ // no framed content
+ else
+ $this->results = $results;
+
+ unlink("$headerfile");
+
+ return true;
+ }
+
+/*======================================================================*\
+ Function: setcookies()
+ Purpose: set cookies for a redirection
+\*======================================================================*/
+
+ function setcookies()
+ {
+ for($x=0; $x<count($this->headers); $x++)
+ {
+ if(preg_match('/^set-cookie:[\s]+([^=]+)=([^;]+)/i', $this->headers[$x],$match))
+ $this->cookies[$match[1]] = urldecode($match[2]);
+ }
+ }
+
+
+/*======================================================================*\
+ Function: _check_timeout
+ Purpose: checks whether timeout has occurred
+ Input: $fp file pointer
+\*======================================================================*/
+
+ function _check_timeout($fp)
+ {
+ if ($this->read_timeout > 0) {
+ $fp_status = socket_get_status($fp);
+ if ($fp_status["timed_out"]) {
+ $this->timed_out = true;
+ return true;
+ }
+ }
+ return false;
+ }
+
+/*======================================================================*\
+ Function: _connect
+ Purpose: make a socket connection
+ Input: $fp file pointer
+\*======================================================================*/
+
+ function _connect(&$fp)
+ {
+ if(!empty($this->proxy_host) && !empty($this->proxy_port))
+ {
+ $this->_isproxy = true;
+
+ $host = $this->proxy_host;
+ $port = $this->proxy_port;
+ }
+ else
+ {
+ $host = $this->host;
+ $port = $this->port;
+ }
+
+ $this->status = 0;
+
+ if($fp = fsockopen(
+ $host,
+ $port,
+ $errno,
+ $errstr,
+ $this->_fp_timeout
+ ))
+ {
+ // socket connection succeeded
+
+ return true;
+ }
+ else
+ {
+ // socket connection failed
+ $this->status = $errno;
+ switch($errno)
+ {
+ case -3:
+ $this->error="socket creation failed (-3)";
+ case -4:
+ $this->error="dns lookup failure (-4)";
+ case -5:
+ $this->error="connection refused or timed out (-5)";
+ default:
+ $this->error="connection failed (".$errno.")";
+ }
+ return false;
+ }
+ }
+/*======================================================================*\
+ Function: _disconnect
+ Purpose: disconnect a socket connection
+ Input: $fp file pointer
+\*======================================================================*/
+
+ function _disconnect($fp)
+ {
+ return(fclose($fp));
+ }
+
+
+/*======================================================================*\
+ Function: _prepare_post_body
+ Purpose: Prepare post body according to encoding type
+ Input: $formvars - form variables
+ $formfiles - form upload files
+ Output: post body
+\*======================================================================*/
+
+ function _prepare_post_body($formvars, $formfiles)
+ {
+ settype($formvars, "array");
+ settype($formfiles, "array");
+ $postdata = '';
+
+ if (count($formvars) == 0 && count($formfiles) == 0)
+ return;
+
+ switch ($this->_submit_type) {
+ case "application/x-www-form-urlencoded":
+ reset($formvars);
+ while(list($key,$val) = each($formvars)) {
+ if (is_array($val) || is_object($val)) {
+ while (list($cur_key, $cur_val) = each($val)) {
+ $postdata .= urlencode($key)."[]=".urlencode($cur_val)."&";
+ }
+ } else
+ $postdata .= urlencode($key)."=".urlencode($val)."&";
+ }
+ break;
+
+ case "multipart/form-data":
+ $this->_mime_boundary = "Snoopy".md5(uniqid(microtime()));
+
+ reset($formvars);
+ while(list($key,$val) = each($formvars)) {
+ if (is_array($val) || is_object($val)) {
+ while (list($cur_key, $cur_val) = each($val)) {
+ $postdata .= "--".$this->_mime_boundary."\r\n";
+ $postdata .= "Content-Disposition: form-data; name=\"$key\[\]\"\r\n\r\n";
+ $postdata .= "$cur_val\r\n";
+ }
+ } else {
+ $postdata .= "--".$this->_mime_boundary."\r\n";
+ $postdata .= "Content-Disposition: form-data; name=\"$key\"\r\n\r\n";
+ $postdata .= "$val\r\n";
+ }
+ }
+
+ reset($formfiles);
+ while (list($field_name, $file_names) = each($formfiles)) {
+ settype($file_names, "array");
+ while (list(, $file_name) = each($file_names)) {
+ if (!is_readable($file_name)) continue;
+
+ $fp = fopen($file_name, "r");
+ $file_content = fread($fp, filesize($file_name));
+ fclose($fp);
+ $base_name = basename($file_name);
+
+ $postdata .= "--".$this->_mime_boundary."\r\n";
+ $postdata .= "Content-Disposition: form-data; name=\"$field_name\"; filename=\"$base_name\"\r\n\r\n";
+ $postdata .= "$file_content\r\n";
+ }
+ }
+ $postdata .= "--".$this->_mime_boundary."--\r\n";
+ break;
+ }
+
+ return $postdata;
+ }
+}
+
+?>
diff --git a/modules/snoopy/TODO b/modules/snoopy/TODO
new file mode 100644
index 00000000..d46738c1
--- /dev/null
+++ b/modules/snoopy/TODO
@@ -0,0 +1,9 @@
+* fetch other types of protocols such as ftp, nntp, gopher, etc.
+* post forms with http file upload (I didn't have this need,
+ but it should be fairly straightforward)
+* expand links, image tags, and form actions to fully
+ qualified URLs
+
+Bugs
+----
+* none known