You can download them zipped or unzipped:
And if you are going to use the Browser, you will need the following files:
These last three files are an implementation of the MD5 algorithm by Ian Lynagh, and can be found at http://web.comlab.ox.ac.uk/oucl/work/ian.lynagh/.
The modules are arranged into a low-level HTTP module, which provides connection-oriented functions and the basic Request, Response, and Header data types. Sitting on top of this is the Browser module, which further insulates the user from the rigours of the HTTP/1.1 specification.
This module provides the data structures:
Connection - Opaque connection object, used to maintain persistent connections through IORefs.
Request - The request object, the structure of which you will probably need to know. Has accessor functions named rqXXX, such as rqMethod, rqURI, rqHeaders and rqBody.
Response - The response object, with accessors rspCode, rspReason, rspHeaders and rspBody. I want to change this so that "rspBody" is some sort of reference; used with concurrency primitives, that could give a really nice asynchronous interface. Alas, GHC on Windows blocks on I/O. A promising way around this is to use the "would block" feature of the underlying socket interface... we'll see what happens.
Header - A simple data structure that pairs items of type HeaderName and String, i.e. the key-value pairs found in MIME headers.
HeaderName - A big list of HTTP/1.1 headers, all of the form HdrXXX, where XXX is the standard capitalisation of the header name, minus hyphens. For example, there are constructors named HdrWWWAuthenticate, HdrHost, and HdrContentLength. For non-standard header names use HdrCustom String.
RequestMethod - Has constructors HEAD, GET, POST, PUT, DELETE, OPTIONS, TRACE.
ConnError - An error.
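To make the shape of these types concrete, here is a minimal sketch of how they fit together. It is not the module's source: the definitions are cut-down stand-ins based only on the descriptions above, the URI is kept as a plain String to avoid dependencies, and lookupHeaderSketch is a hypothetical helper invented for the illustration.

```haskell
-- Cut-down stand-ins for the types described above (assumptions, not the
-- module's real definitions).
data HeaderName
  = HdrHost
  | HdrContentLength
  | HdrWWWAuthenticate
  | HdrCustom String          -- for non-standard header names
  deriving (Eq, Show)

-- A header pairs a HeaderName with its String value.
data Header = Header HeaderName String
  deriving (Eq, Show)

data RequestMethod = HEAD | GET | POST | PUT | DELETE | OPTIONS | TRACE
  deriving (Eq, Show)

-- Record syntax gives the rqXXX accessor functions for free.
data Request = Request
  { rqURI     :: String       -- assumption: String instead of a URI type
  , rqMethod  :: RequestMethod
  , rqHeaders :: [Header]
  , rqBody    :: String
  } deriving (Eq, Show)

-- Hypothetical helper: value of the first header with the given name.
lookupHeaderSketch :: HeaderName -> [Header] -> Maybe String
lookupHeaderSketch name hdrs =
  case [value | Header n value <- hdrs, n == name] of
    (value : _) -> Just value
    []          -> Nothing

exampleRequest :: Request
exampleRequest = Request
  { rqURI     = "http://example.org/"
  , rqMethod  = GET
  , rqHeaders = [Header HdrHost "example.org", Header (HdrCustom "X-Demo") "1"]
  , rqBody    = ""
  }

main :: IO ()
main = do
  print (rqMethod exampleRequest)
  print (lookupHeaderSketch HdrHost (rqHeaders exampleRequest))
  print (lookupHeaderSketch HdrContentLength (rqHeaders exampleRequest))
```

Deriving Eq on the header name is what lets HdrCustom "X-Demo" be compared like any other constructor.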
The module principally exports header manipulation functions and connection-oriented functions. The header
manipulators are all fairly self-explanatory. The connection-oriented functions provide two ways of managing an HTTP
connection. The first is to ignore the underlying connection by using simpleHTTP:
main = simpleHTTP myrequest >>= \rsp -> putStrLn (show rsp)
The second method is to take responsibility for your own connections. Use the sequence:
openTCP :: String -> IO Connection
sendHTTP :: Stream s => s -> Request -> IO (Either ConnError Response)
close :: Stream s => s -> IO ()
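The open, send, close sequence can be sketched as follows. To keep the example runnable without the library or a network, every type and function here is a stub invented for the illustration (note the Stub suffixes); the stubs mirror the signatures listed above with the Stream class dropped, and they never touch a socket. The point is the shape of the calls: one connection serves several requests, and bracket from Control.Exception guarantees the close even if a send throws.

```haskell
import Control.Exception (bracket)

-- Stand-in types (assumptions; the real ones live in the HTTP module).
newtype ConnectionStub = ConnectionStub String
newtype RequestStub    = RequestStub String
data ResponseStub      = ResponseStub Int String deriving (Eq, Show)
data ConnErrorStub     = ConnErrorStub String deriving (Eq, Show)

-- Stub with the same shape as openTCP :: String -> IO Connection.
openTCPStub :: String -> IO ConnectionStub
openTCPStub host = return (ConnectionStub host)

-- Stub with the same shape as sendHTTP; always "succeeds" with a canned reply.
sendHTTPStub :: ConnectionStub -> RequestStub -> IO (Either ConnErrorStub ResponseStub)
sendHTTPStub (ConnectionStub host) (RequestStub path) =
  return (Right (ResponseStub 200 ("stub reply from " ++ host ++ " for " ++ path)))

-- Stub with the same shape as close.
closeStub :: ConnectionStub -> IO ()
closeStub _ = return ()

-- Open once, send every request over the same connection, always close.
fetchAllSketch :: String -> [RequestStub] -> IO [Either ConnErrorStub ResponseStub]
fetchAllSketch host reqs =
  bracket (openTCPStub host) closeStub (\conn -> mapM (sendHTTPStub conn) reqs)

main :: IO ()
main = do
  results <- fetchAllSketch "example.org" [RequestStub "/", RequestStub "/about"]
  mapM_ print results
```

With the real module, the three stubs would be replaced by openTCP, sendHTTP and close, and each Left ConnError would need handling rather than printing.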
You may also find urlEncode, urlDecode and urlEncodeVars
useful. These provide URL escaping.
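For readers unfamiliar with URL escaping, the following is a minimal sketch of the idea. It is not the module's implementation: encodeSketch, decodeSketch and encodeVarsSketch are hypothetical names, the set of characters left unescaped is an assumption (ASCII letters, digits and "-_.~"), and input is assumed to contain only code points below 256.

```haskell
import Data.Char (chr, digitToInt, isAlphaNum, isAscii, isHexDigit, ord, toUpper)
import Data.List (intercalate)
import Numeric (showHex)

-- Replace every character outside the unreserved set with %XX.
encodeSketch :: String -> String
encodeSketch = concatMap encodeChar
  where
    encodeChar c
      | isUnreserved c = [c]
      | otherwise      = '%' : twoHexDigits (ord c)
    isUnreserved c = (isAscii c && isAlphaNum c) || c `elem` "-_.~"
    -- Upper-case hex, left-padded to two digits (assumes code point < 256).
    twoHexDigits n = map toUpper (pad (showHex n ""))
    pad s = replicate (2 - length s) '0' ++ s

-- Turn each well-formed %XX back into a character; leave anything else alone.
decodeSketch :: String -> String
decodeSketch ('%' : a : b : rest)
  | isHexDigit a && isHexDigit b =
      chr (16 * digitToInt a + digitToInt b) : decodeSketch rest
decodeSketch (c : rest) = c : decodeSketch rest
decodeSketch []         = []

-- Encode key-value pairs as key=value joined with '&'.
encodeVarsSketch :: [(String, String)] -> String
encodeVarsSketch vars =
  intercalate "&" [encodeSketch k ++ "=" ++ encodeSketch v | (k, v) <- vars]

main :: IO ()
main = do
  putStrLn (encodeSketch "a b&c=d")
  putStrLn (decodeSketch "a%20b%26c%3Dd")
  putStrLn (encodeVarsSketch [("q", "a b"), ("lang", "en")])
```

Encoding each key and value separately before joining is what keeps a literal '&' or '=' inside a value from being mistaken for a separator.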
After some small deliberation I have decided that a browser monad isn't such a bad idea. The implementation is a
simple DIY state monad, which I bet will make some of you cringe. The monad name is BrowserAction, and
you will find the following functions very useful for mixing the IO monad with BrowserAction.
browse :: BrowserAction x -> IO x
ioAction :: IO x -> BrowserAction x
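A "DIY state monad" layered over IO can be sketched as below. This is only an illustration of the technique, not the Browser module's code: SketchAction, SketchState and every function ending in Sketch are hypothetical, and the state here holds nothing but a list of log lines, whereas the real browser state carries cookies, handlers and so on.

```haskell
-- Hypothetical state: just the log lines recorded so far, newest first.
newtype SketchState = SketchState { logLines :: [String] }

-- An action takes the current state and returns a result plus the new state.
newtype SketchAction a =
  SketchAction { runSketch :: SketchState -> IO (a, SketchState) }

instance Functor SketchAction where
  fmap f (SketchAction g) = SketchAction $ \s -> do
    (a, s') <- g s
    return (f a, s')

instance Applicative SketchAction where
  pure a = SketchAction $ \s -> return (a, s)
  SketchAction f <*> SketchAction g = SketchAction $ \s -> do
    (h, s1) <- f s
    (a, s2) <- g s1
    return (h a, s2)

instance Monad SketchAction where
  SketchAction g >>= k = SketchAction $ \s -> do
    (a, s') <- g s
    runSketch (k a) s'       -- thread the updated state into the next step

-- Counterpart of browse: run an action from an initial state, drop the state.
browseSketch :: SketchAction a -> IO a
browseSketch act = fst <$> runSketch act (SketchState [])

-- Counterpart of ioAction: lift plain IO, leaving the state untouched.
ioActionSketch :: IO a -> SketchAction a
ioActionSketch io = SketchAction $ \s -> do
  a <- io
  return (a, s)

-- Record a log line in the state.
logSketch :: String -> SketchAction ()
logSketch msg = SketchAction $ \s -> return ((), SketchState (msg : logLines s))

-- Read back the number of recorded log lines.
logCountSketch :: SketchAction Int
logCountSketch = SketchAction $ \s -> return (length (logLines s), s)

main :: IO ()
main = do
  n <- browseSketch $ do
    logSketch "starting"
    ioActionSketch (putStrLn "plain IO inside the action")
    logSketch "done"
    logCountSketch
  print n
```

The two lifting functions are the whole bridge between the monads: browseSketch goes from the action type to IO, and ioActionSketch goes the other way.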
By using the Browser module you will be gaining a whole bunch of useful features. You will get:
There is no need to directly manipulate the browser state, since the Browser module provides specific functions for
doing just this: get/setAllowRedirects, setCookieFilter, get/setCookies, addCookie, setErrHandler, setOutHandler,
setProxy. The functions out and err help make logging consistent across a
BrowserAction; they are used within the most interesting function, request :: Request -> BrowserAction
Response.
The digest authentication scheme requires the MD5 algorithm. I've used this implementation, and the files for this implementation are included in the zip above.
The most important one here is RFC 2616; this is the one that I have attempted to follow. RFC 2617 is also important, especially in the Browser module. Where I have strayed from RFC 2616 it is to make the implementation more robust. RFC 2617, however, is supported only partially, since the more interesting features of nonce-nce have not been implemented, and I have performed precisely no tests.
RFC 1521 - MIME stuff.
RFC 1867 - Form based file upload.
RFC 2045 - MIME stuff.
RFC 2068 - Old HTTP/1.1 spec.
RFC 2109 - Cookies.
RFC 2246 - TLS spec (Transport Layer Security, aka SSL).
RFC 2396 - URI format.
RFC 2616 - HTTP/1.1 spec.
RFC 2617 - HTTP/1.1 Authentication spec.
RFC 2817 - TLS upgrading in HTTP.
In the interests of portability you should wrap any IO action containing a call to browse with the
function withSocketsDo from the Socket module. Strictly speaking this is only necessary on Windows, where
winsock initialisation is mandatory, but I think you should do it anyway.
It is quite safe to call withSocketsDo multiple times, but that technique has earnt a place on the
Winsock Programmers FAQ Lame List, since winsock
initialisation has performance overhead. I doubt you can safely nest these calls, so don't do it.
Oh, and you should strive to catch errors within withSocketsDo and then pass them out in a synchronous
fashion, since throwing exceptions from within this function will otherwise prevent the safe cleanup of resources (and
on my computer eventually kills all network connectivity).
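One way to follow this advice is to catch the exception inside the wrapped action and hand it back as an ordinary Either value. The sketch below uses try from Control.Exception; runGuardedSketch is a hypothetical helper, and the wrapper is passed in as a parameter so the example runs with the base library alone. In a real program the wrapper argument would be withSocketsDo; here id stands in for it.

```haskell
import Control.Exception (SomeException, try)

-- Run the action under the given wrapper, catching any exception inside the
-- wrapper so that nothing is thrown through it. The result is returned as a
-- plain value once the wrapper has finished.
runGuardedSketch
  :: (IO (Either SomeException a) -> IO (Either SomeException a))  -- e.g. withSocketsDo
  -> IO a
  -> IO (Either SomeException a)
runGuardedSketch wrapper action = wrapper (try action)

main :: IO ()
main = do
  -- 'id' is a stand-in for withSocketsDo (assumption for this sketch).
  ok  <- runGuardedSketch id (return (42 :: Int))
  bad <- runGuardedSketch id (ioError (userError "boom") :: IO Int)
  putStrLn (either (const "first action failed") show ok)
  putStrLn (either (\e -> "second action failed: " ++ show e) show bad)
```

The caller decides what to do with a Left after the wrapper has returned, which is the synchronous hand-off described above.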
The library was originally developed under ghc-5.02.2. With the old package hierarchy, I used:
ghc -o main -package net -package util --make Main.hs
With the new package structure the following should do it:
ghc -o main -package network -package data --make Main.hs
Bugs, comments, requests, and suggestions are all welcome. Send to warrick dot gray at hotmail dot com.
import Network.URI
import System.Environment (getArgs)
import Data.Maybe (fromMaybe)
import System.IO
import HTTP
import Browser
import Network (withSocketsDo)

main :: IO ()
main = withSocketsDo $ catch
    (do { h <- openFile "mylog.log" WriteMode
        ; args <- getArgs
        ; if null args
            then putStrLn "Web page argument required!"
            else catch (browse $ fn h (head args))
                       (\e -> hClose h)
        })
    (\e -> putStrLn ("Exception!: " ++ show e))
  where
    myRequest u =
        let uri = fromMaybe (error "Nothing from url parse") (parseURI u)
        in defaultGETRequest uri
    fn h arg =
        do setOutHandler (hPutStrLn h)
           setCookieFilter (\_ _ -> return True)
           let rq = myRequest arg
           rsp <- request rq
           ioAction $ hFlush h
           out (rspBody $ snd rsp)
Neither the name of the ORGANIZATION nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
Is replaced with:
The names of contributors may not be used to endorse or promote products derived from this software without specific prior written permission.
Simply because there is no relevant organisation name to substitute.
You can find the licence here.