Why isn't DownThemAll able to recognize my reddit URL regular expression?
So I'm trying to download all my old reddit posts using a combination of AutoPagerize and DownThemAll.
Here are two sample URLs I want to distinguish between:
- http://www.reddit.com/r/China/comments/kqjr1/what_is_the_name_of_this_weird_chinese_medicine/c2med97
- http://www.reddit.com/r/China/comments/kqjr1/what_is_the_name_of_this_weird_chinese_medicine/c2meana?context=3
The regexp I'm trying to use is this: (\b)http://www.reddit.com/([^?\s]*)?
I want all my reddit posts downloaded, but I don't want any redundancy, so I want to match all of my reddit post URLs except those containing a question mark (which is followed by a "context=3" query parameter).
I've used RegEx Buddy to show that the regexp fits the first URL but not the second one. However, DownThemAll does not recognize this. Is DownThemAll's ability to parse regexp limited, or am I doing something wrong?
For now, I've just decided to download them all, but to use a renaming mask of *subdirs*.*text*.*html*
so that I can later mass remove anything containing the word "context" in its filename.
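One possible explanation for the regexp in the question, sketched in Python rather than in DownThemAll itself: the pattern has no end anchor, so a "find anywhere" search can still succeed on the second URL by matching everything up to the question mark. How DownThemAll applies its filters is an assumption here, and the names ORIGINAL, ANCHORED and matching are made up for this illustration. Anchoring the pattern with ^ and $ rejects any URL that contains a query string.

```python
import re

# The two sample URLs from the question.
URLS = [
    "http://www.reddit.com/r/China/comments/kqjr1/what_is_the_name_of_this_weird_chinese_medicine/c2med97",
    "http://www.reddit.com/r/China/comments/kqjr1/what_is_the_name_of_this_weird_chinese_medicine/c2meana?context=3",
]

# The regexp exactly as written in the question (no end anchor).
ORIGINAL = re.compile(r"(\b)http://www.reddit.com/([^?\s]*)?")

# Hypothetical anchored variant: the whole URL must be free of "?" and whitespace.
ANCHORED = re.compile(r"^http://www\.reddit\.com/[^?\s]*$")


def matching(pattern, urls):
    """Return the URLs in which the pattern is found anywhere."""
    return [u for u in urls if pattern.search(u)]


if __name__ == "__main__":
    print("original:", matching(ORIGINAL, URLS))
    print("anchored:", matching(ANCHORED, URLS))
```

With the unanchored pattern both URLs are reported as matches, because the match simply stops before "?context=3". With the anchored one only the first URL remains. Whether the same anchors behave identically inside DownThemAll's filter box would need to be checked there.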
Reddit does have an API. You might want to take a look at that instead, as it might be easier.
https://github.com/reddit/reddit/wiki/API
EDIT: Looks like http://www.reddit.com/user/USERNAME/.json might be what you want.
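A rough sketch of how that JSON endpoint could be used from Python, in case it helps. It assumes the response is a standard reddit listing (a "data" object holding a "children" list, each child carrying its own "data"), and that each item exposes a "permalink" field; items without one are skipped. The function names, the User-Agent string and the "after" paging parameter usage are assumptions for illustration, not something confirmed above.

```python
import json
import urllib.request


def fetch_listing(username, after=None):
    """Download one page of a user's listing as a parsed dict (performs a network call)."""
    url = "http://www.reddit.com/user/" + username + "/.json"
    if after:
        # Assumed paging parameter: the "after" token from the previous page.
        url += "?after=" + after
    # Placeholder User-Agent; pick something that identifies your own script.
    request = urllib.request.Request(url, headers={"User-Agent": "post-archiver-sketch/0.1"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def extract_permalinks(listing):
    """Collect the permalink of every item in an already-parsed listing."""
    children = listing.get("data", {}).get("children", [])
    return [
        child["data"]["permalink"]
        for child in children
        if "permalink" in child.get("data", {})
    ]


if __name__ == "__main__":
    page = fetch_listing("USERNAME")  # replace with a real username
    for link in extract_permalinks(page):
        print(link)
```

Since each item appears once in the listing, this route would avoid the duplicate "?context=3" links altogether.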