Here are excerpts:
I am currently working on a project that involves large-scale analysis of various countries’ Hansards (that is, transcripts of parliamentary debate). […]
The UK Parliament has such a digitised archive, here.
Frustratingly, although these zipped XML files are available, there is no bulk download option or simple FTP archive of them. […]
So, to save anyone else the pain, here is a link to a file I built that contains links to every file in this archive. I used the handy FormRequest feature of Scrapy, my favourite, heavily used, scraping tool.
For more details, please see the complete post.