Rehm, Georg. “Language-Independent Text Parsing of Arbitrary HTML-Documents. Towards A Foundation For Web Genre Identification”. Journal for Language Technology and Computational Linguistics, vol. 20, no. 2, July 2005, pp. 53-74, doi:10.21248/jlcl.20.2005.75.