What Is The Best Way To Get Information Off Of A Wiki For Natural Language Processing?

So far I’m using two Python libraries:

https://pypi.org/project/wikitextparser/
https://mwclient.readthedocs.io/en/latest/

to fetch pages by category from a MediaWiki-based site (https://nethackwiki.com). However, the parser I’m using does not offer a way to expand (interpolate) the templates.
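For reference, here is roughly how I’m enumerating pages. This sketch talks to the standard MediaWiki API directly with the stdlib instead of mwclient; the api.php path and the category name "Monsters" are assumptions, not something specific to nethackwiki.com.

```python
"""Sketch: list the pages in a category via the MediaWiki API (stdlib only).

Assumptions: api.php lives at /w/api.php and "Monsters" is a real category;
adjust both for the actual wiki.
"""
import json
import urllib.parse
import urllib.request

API_URL = "https://nethackwiki.com/w/api.php"  # assumed api.php location


def category_members_params(category, limit=50):
    """Build the query parameters for the categorymembers list module."""
    return {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmlimit": str(limit),
        "format": "json",
    }


def fetch_category_members(category):
    """Fetch one batch of page titles from the category (network call)."""
    query = urllib.parse.urlencode(category_members_params(category))
    with urllib.request.urlopen(f"{API_URL}?{query}") as resp:
        data = json.load(resp)
    return [m["title"] for m in data["query"]["categorymembers"]]


if __name__ == "__main__":
    for title in fetch_category_members("Monsters"):
        print(title)
```

(For more than `cmlimit` results you would follow the `continue` token the API returns; mwclient’s category iteration handles that paging for you.)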

So I’m stuck with either plain text that strips out all the templates, losing valuable data, or raw content that still contains all of the templating syntax.

I have no desire to write a template-expansion engine; is my only option to go in and strip the syntax manually?
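One alternative I’m considering, rather than expanding templates client-side: MediaWiki itself exposes `action=expandtemplates`, which returns the wikitext with all templates already substituted, so the parser only has to flatten plain markup afterwards. A minimal sketch, again assuming api.php is at /w/api.php:

```python
"""Sketch: let the wiki expand templates server-side (action=expandtemplates)."""
import json
import urllib.parse
import urllib.request

API_URL = "https://nethackwiki.com/w/api.php"  # assumed api.php location


def expandtemplates_params(wikitext, title=None):
    """Build parameters asking the wiki to expand all templates in wikitext."""
    params = {
        "action": "expandtemplates",
        "text": wikitext,
        "prop": "wikitext",  # return the expanded wikitext
        "format": "json",
    }
    if title:
        # Resolves page-dependent magic words like {{PAGENAME}}.
        params["title"] = title
    return params


def expand(wikitext, title=None):
    """POST the raw wikitext to the wiki and return the expanded version."""
    body = urllib.parse.urlencode(expandtemplates_params(wikitext, title)).encode()
    with urllib.request.urlopen(API_URL, data=body) as resp:
        return json.load(resp)["expandtemplates"]["wikitext"]
```

The expanded wikitext could then be fed to wikitextparser for plain-text extraction without losing the template-derived data. I believe mwclient wraps this same endpoint, so it may also be reachable without raw HTTP.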

submitted by /u/ArthurFischel
