Best, most recent e621 downloader?

In category: e621 Tools and Applications

I'm not from the US, but whatever this FCC group is doing with Net Neutrality right now is scaring me quite a bit. I'm not sure what's giving me this feeling, but I think e6 won't survive because of it, as much as I hope it won't happen :(

That's why I'm thinking of downloading all of e6's posts, in case this site has to shut down. I've seen people talking about the same idea, just as concerned as I am. Call me crazy, but I'd rather be safe than sorry!

However, most downloaders I've tried so far seem to be really outdated or buggy, so I thought I'd ask here for a properly working one.

Judging from the official stats, downloading everything would require around 1.2TB of disk space. I'd have to shave a few images off of that, so I'm looking for a downloader that ideally has blacklist support. I have a lot of tags in my account's blacklist, so I can easily see myself saving ~0.4TB.

I mean, even if it doesn't have something fully fledged like that, I wouldn't mind a downloader that at least grabs images in their full original quality. That's the most important part, and the downloaders I've tried in the past have failed at it.

Anything you guys can recommend? Thank you!


Sorry for the creepy five-minutes-after-posting response; I just happened to be lurking when you opened this thread, but here is my biased answer: my downloader. https://github.com/Wulfre/e621dl

Has blacklist support and was last updated today. If you need any extra features just let me know and I'll see what I can do.


Wulfre said:
Sorry for the creepy five-minutes-after-posting response; I just happened to be lurking when you opened this thread, but here is my biased answer: my downloader. https://github.com/Wulfre/e621dl

Has blacklist support and was last updated today. If you need any extra features just let me know and I'll see what I can do.

Woah, thank you so much for the quick reply, you're a lifesaver! I'll definitely give this a shot later today. I'll come back and tell you how it goes. You're the best!! (ノ´ヮ´)ノ*:・゚✧


I just gave it a quick shot with the Windows executable (v4.2.2) and I don't think it's working for me, sadly.

It created a config.ini file for itself, so I put in "tags = matotoma" as a quick test.
When I start the .exe again, it says there's a new version and asks me if I want to run it anyway. But when I confirm with yes (y), nothing else happens.

I also tried the source code version (v4.2.3) with the latest version of Python, but starting it up doesn't give me anything at all. Just a cmd box for a split second.
Running it with cmd says "No module named 'requests'", and I can't figure out how to get the requests module.

I've read your instructions on the GitHub page, but nothing seems to help...


tacklebox said:
I just gave it a quick shot with the Windows executable (v4.2.2) and I don't think it's working for me, sadly.

It created a config.ini file for itself, so I put in "tags = matotoma" as a quick test.
When I start the .exe again, it says there's a new version and asks me if I want to run it anyway. But when I confirm with yes (y), nothing else happens.

I also tried the source code version (v4.2.3) with the latest version of Python, but starting it up doesn't give me anything at all. Just a cmd box for a split second.
Running it with cmd says "No module named 'requests'", and I can't figure out how to get the requests module.

I've read your instructions on the GitHub page, but nothing seems to help...

About 10 minutes ago I released a new binary version (4.3.0) that I was working on today. The last version (4.2.3) had a bug when notifying the user about a new version, so that might have been the issue.

The source version doesn't work unless you have all of the dependencies installed, which I guess I should explain better in the documentation.
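
For now, running something like "python -m pip install requests" from a command prompt should pull in the missing requests module, assuming pip came along with your Python install.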

If you still have an issue, post your config here. I didn't have any testers during development so I don't have any reference to know if anything is particularly hard to understand.


What about the other websites like wildcritters.ws or veebooru.com or exhentai?


Wulfre said:
About 10 minutes ago I released a new binary version (4.3.0) that I was working on today. The last version (4.2.3) had a bug when notifying the user about a new version, so that might have been the issue.

The source version doesn't work unless you have all of the dependencies installed, which I guess I should explain better in the documentation.

If you still have an issue, post your config here. I didn't have any testers during development so I don't have any reference to know if anything is particularly hard to understand.

Thank you for your assistance, but I still don't think it's working out for me... Here's everything I did, in order:

Started the 4.3.0 executable for the first time.
A cmd box came up for a split second, then it created the config.ini file.

I edited the [defaults] section in the config.ini file with "ratings = q, e" (it was only 's' before).
Then I added a new line with "tags = hidoritoyama", just as a quick test.

When I start the .exe again, the cmd window just shows this:

e621dl  INFO  Running e621dl version 4.3.0
e621dl  INFO  Parsing config.

for a millisecond, then closes itself again. It took me a while to screencap it.

I uploaded my config.ini to a pastebin if you want to check it out. Am I doing something wrong? Thank you again!


I still feel really creepy replying so fast, but we always seem to be on the forum at the same time.

Anyway, the tags go in their own sections below the defaults and blacklist; you can call the sections anything you want. I modified your config.ini for you.

The default section just fills in the days, ratings, and score if you leave them out of the searches you define below. Let me know if you just overlooked something in the docs or how I can explain the format better. Like I said, I didn't have anyone test the program while I was writing it, so everything makes sense to me because I already know what to expect.

https://pastebin.com/6H1tPgtp
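
For reference, the overall shape is roughly this, with the blacklist section sitting between the defaults and the searches (the section name, values, and tags here are just placeholders, not what I put in your file):

[defaults]
days = 1
ratings = s, q, e
score = 0

[test]
tags = matotoma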


Wulfre said:
I still feel really creepy replying so fast, but we always seem to be on the forum at the same time.

Anyway, the tags go in their own section below the defaults and blacklist. I modified your config.ini for you. Let me know if you just overlooked something in the docs or how I can explain the format better.

https://pastebin.com/6H1tPgtp

Haha don't worry, you're not creepy! I'm just very thankful for your help. Will test this out now, many thanks!


Alright, we're one step closer! \o/

I took your config.ini from the pastebin and replaced mine with it entirely.
When I run the .exe this time, I get a completely new screen: imgur link

Just like before, it only shows up for a split second and then closes itself again.
It also made a directory under "e6dl\downloads\test" though, which sounds like a good sign to me! But there are no images inside.

I hope I'm not annoying you in any way with this, but I'm a bit new to programs like these.


Okay, I'm getting the same result; let me look into it. It should work just fine with general tags for the time being. Also, you can run the exe through a command shell to keep the text on screen after the program is done.
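
For example, open cmd, cd into the folder the exe lives in, and start it from there (swap in whatever the exe is actually named on your end):

cd C:\path\to\e621dl
e621dl.exe

That way the window stays open after the program finishes and you can read the whole log.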

It's not annoying at all. I'm actually pretty happy to have someone who doesn't know what they're doing (I hope that doesn't sound bad), because they're more likely to break the program and I can make changes to make it friendlier.

EDIT: I already found the issue. I didn't look at the config hard enough. If you want to download every post ever put on the site, you need to change days to something huge, like 999999999. The original intent of the program was to run it daily and only download posts that were posted that day.


Ah! I actually wanted to do both of those things, lucky!

So if I want to download everything, I'd have to set days to something huge and just let it do its thing.

But if I only want to fetch today's posts, I'd have to set days to 1 and run it daily.

Seems simple enough! I guess I'd have to run it at a specific time, like on a schedule, so I won't miss out on anything. This is great!!

Any way I can make a donation to you? If you accept them of course.


That's exactly how it works. If you wanna leave yourself some room for error, you can always set days to something like 2 or 3 and it will check back that many days.

I thought about leaving a donation link at the bottom of the documentation, since the original author I forked from had one, but I didn't feel it was appropriate for a program that not many people will use and that I wrote for myself anyway. I'm just happy to have people using it and giving me feedback.


Awwh, I would've donated without any hesitation. Either way, definitely keep up the amazing hard work! Can't wait until I try this out for real!

And thank you so much for the troubleshooting help! I'll definitely keep an eye on your GitHub page too.


No problem. Here is my current config if you need more examples. I didn't put it on GitHub because even though the name implies explicit content, I don't need anyone who is just browsing code seeing my lists. ( ͡° ͜ʖ ͡°)

https://pastebin.com/raw/SpSj8ZJ2


Sweet, seems like both our searches and blacklists are pretty similar... hehehe

But I think this gave me an even better idea of how to use the config file! I hope I won't mess anything up if I do multiple searches like that.

One way or another it's really useful, big thanks for the extra tip! <3


There's also RipMe, which works for more websites as well, like DeviantArt and Imgur, though it does require Java to run...


tacklebox said:
Ah! I actually wanted to do both of those things, lucky!

So if I want to download everything, I'd have to set days to something huge and just let it do its thing.

But if I only want to fetch today's posts, I'd have to set days to 1 and run it daily.

Seems simple enough! I guess I'd have to run it at a specific time, like on a schedule, so I won't miss out on anything. This is great!!

Any way I can make a donation to you? If you accept them of course.

This tool will NOT work to download the full site archive. It uses page numbers, and you can only go to page 750 before you start getting 403 errors. If you continue to generate 403 errors at a high rate, you might get yourself IP banned by Cloudflare. Until the tool is fixed to use before_id and abort on errors, it's not something I'd suggest.

I also generally suggest that people don't make full site rips. I guarantee that you're not interested in a lot of the content on here, and you're wasting your time, the site's bandwidth, and about 1.2TB of storage space. You'll end up deleting most of it anyway. Pick some tags that interest you and go from there; you'll be MUCH happier.


KiraNoot said:
This tool will NOT work to download the full site archive. It uses page numbers, and you can only go to page 750 before you start getting 403 errors. If you continue to generate 403 errors at a high rate, you might get yourself IP banned by Cloudflare. Until the tool is fixed to use before_id and abort on errors, it's not something I'd suggest.

I also generally suggest that people don't make full site rips. I guarantee that you're not interested in a lot of the content on here, and you're wasting your time, the site's bandwidth, and about 1.2TB of storage space. You'll end up deleting most of it anyway. Pick some tags that interest you and go from there; you'll be MUCH happier.

Thanks for pointing that out, actually. I based this program on one I used to use, before I wanted to add my own (admittedly sloppy) features, so I kept the same format and only glanced at the API documentation for things I did not already have the structure for. I'll switch to before_id right away; it actually seems much nicer to use.


Wulfre said:
Thanks for pointing that out, actually. I based this program on one I used to use, before I wanted to add my own (admittedly sloppy) features, so I kept the same format and only glanced at the API documentation for things I did not already have the structure for. I'll switch to before_id right away; it actually seems much nicer to use.

The only caveat is that sort order no longer works under before_id; before_id forces the sort order to post id descending. That generally doesn't matter for a downloader, since you want all of the posts for the query and the order you get them in doesn't matter. However, you should be aware of the caveat in case it matters for something you plan to do with the tool later on.

Also make sure that your tool doesn't continually retry over and over if it's getting non-200 HTTP response codes. An easy way to test that is to see if it aborts when requesting a page above 750, as that will immediately give you a 403 error.
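
For example, something like this should come straight back with a 403 instead of a post list (rough sketch):

import requests

# anything past page 750 gets rejected, so it makes an easy abort-path test
r = requests.get('https://e621.net/post/index.json', params={'page': 751, 'limit': 320})
print(r.status_code)  # expect 403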


KiraNoot said:
The only caveat is that sort order no longer works under before_id; before_id forces the sort order to post id descending. That generally doesn't matter for a downloader, since you want all of the posts for the query and the order you get them in doesn't matter. However, you should be aware of the caveat in case it matters for something you plan to do with the tool later on.

Also make sure that your tool doesn't continually retry over and over if it's getting non-200 HTTP response codes. An easy way to test that is to see if it aborts when requesting a page above 750, as that will immediately give you a 403 error.

I don't think that I'll be using the sort order for anything in the future. I also have a really simple way to check the response codes and I feel dumb for not just using them in the first place.

I'm pretty much already set to use before_id; it was only a few lines that I needed to change. A quick question though: is there any way to get the highest post id other than just checking https://e926.net/post/index.json?limit=1

EDIT: I take back that last question about finding the highest post id. I figured out a better way to get the result I wanted. I was just throwing code together and trying to get my questions out while I knew you might be around to see them.


Wulfre said:
I don't think that I'll be using the sort order for anything in the future. I also have a really simple way to check the response codes and I feel dumb for not just using them in the first place.

I'm pretty much already set to use before_id; it was only a few lines that I needed to change. A quick question though: is there any way to get the highest post id other than just checking https://e926.net/post/index.json?limit=1

EDIT: I take back that last question about finding the highest post id. I figured out a better way to get the result I wanted. I was just throwing code together and trying to get my questions out while I knew you might be around to see them.

At the current time it is safe to provide no before_id for the first query, or, if the code isn't flexible enough to do that, you can provide the maximum signed 32-bit integer, which is about 2.1 billion.

My architectural suggestion is to create a dictionary, fill it with fields only if they should be present, and submit it using requests' post and data= functionality. There is no requirement that fields appear in the URL query section, and POST requests are just as valid for the read APIs. For example:

import requests

# input_tags, score, and before_id come from whatever search is being processed
tags = input_tags
if score is not None:
  tags += ' score:>%d' % score
payload = {'limit': '320', 'tags': tags}
if before_id is not None:
  payload['before_id'] = before_id
requests.post("https://e621.net/post/index.json", data=payload)

You could also build the tags up as a list and then ' '.join(tags) to make sure you have everything that way.

This will save you from having to define defaults, and will actually make your searches faster.

P.S. When using before_id, it's a good idea to make sure that your loop exits if before_id doesn't change between iterations, or you can easily end up in an infinite loop over the result set.
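
If it helps, the overall shape of the loop I have in mind is roughly this (untested sketch, same endpoint and payload style as above, assuming the response is the usual JSON list of posts with an 'id' field):

import requests

def fetch_posts(tags):
    # Walk the results in descending id order using before_id, stopping on any
    # non-200 response, an empty page, or a cursor that stops moving.
    before_id = None
    while True:
        payload = {'limit': '320', 'tags': tags}
        if before_id is not None:
            payload['before_id'] = before_id
        response = requests.post('https://e621.net/post/index.json', data=payload)
        if response.status_code != 200:
            break  # abort instead of hammering a failing request
        posts = response.json()
        if not posts:
            break  # ran out of results
        lowest_id = min(post['id'] for post in posts)
        if before_id is not None and lowest_id >= before_id:
            break  # before_id didn't move, so bail out rather than loop forever
        before_id = lowest_id
        yield from posts

The download side can then just iterate over that generator and save whatever it needs from each post.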


KiraNoot said:
This tool will NOT work to download the full site archive. It uses page numbers, and you can only go to page 750 before you start getting 403 errors. If you continue to generate 403 errors at a high rate, you might get yourself IP banned by Cloudflare. Until the tool is fixed to use before_id and abort on errors, it's not something I'd suggest.

I also generally suggest that people don't make full site rips. I guarantee that you're not interested in a lot of the content on here, and you're wasting your time, the site's bandwidth, and about 1.2TB of storage space. You'll end up deleting most of it anyway. Pick some tags that interest you and go from there; you'll be MUCH happier.

Oh damn, I was just about to prepare my config when I read this.
I wanted to blacklist a lot of tags anyway, but now I'm a bit worried about the potential errors and IP bans. So should I just not use it at all, to stay safe?


tacklebox said:
Oh damn, I was just about to prepare my config when I read this.
I wanted to blacklist a lot of tags anyway, but now I'm a bit worried about the potential errors and IP bans. So should I just not use it at all, to stay safe?

I'd wait for Kira to respond if you are worried, since they know way more about how the site works than I do, but I did fix everything that they mentioned in the last 3 posts and it will be in the next release.


Wulfre said:
I'd wait for Kira to respond if you are worried, since they know way more about how the site works than I do, but I did fix everything that they mentioned in the last 3 posts and it will be in the next release.

Alright I'll just wait then, thanks!


tacklebox said:
Alright I'll just wait then, thanks!

Other than the missing sanity check that before_id is changing on each request, it looks like it should be acceptable. The added checks for the HTTP status code should be enough that it will stop if it encounters problems, and that will keep you out of trouble if it goes too fast or starts to do something the site doesn't like.

I haven't looked through the whole thing from top to bottom, but overall it seems like it should get the job done.

I still suggest setting a list of tags you're interested in instead of just a blacklist, but you're not going to get blocked for something like that. :P


tacklebox said:
Alright I'll just wait then, thanks!

New release is published now. I'll keep fixing things as I find them or am told about them.

KiraNoot said:
Other than the missing sanity check that before_id is changing on each request, it looks like it should be acceptable. The added checks for the HTTP status code should be enough that it will stop if it encounters problems, and that will keep you out of trouble if it goes too fast or starts to do something the site doesn't like.

I haven't looked through the whole thing from top to bottom, but overall it seems like it should get the job done.

I still suggest setting a list of tags you're interested in instead of just a blacklist, but you're not going to get blocked for something like that. :P

Of course, I wasn't expecting a full code review for my hacky program. Just making sure that no one gets in trouble for using it while I keep tweaking it.


Wulfre said:
New release is published now. I'll keep fixing things as I find them or am told about them.

Great! I'm still completely formatting my hard drive, which can take a while... I should've done it while I was sleeping last night.

Either way, I'll finally get everything started once it's done. Thank you very much!


And to think, I started panic-downloading my porn the hard way.


111111111 said:
And to think, I started panic-downloading my porn the hard way.

I've got you covered. 😉


Just wanted to give a quick update and say everything went wonderfully!
No issues, silky smooth and super fast, plus it barely even filled up my drive!

I definitely feel like I should've made multiple directories, like [muscular] and [pokemon], but I was sort of in a rush because I was panicking back then.
I might do it another day, but for now I definitely feel more relieved! I'll run it daily to keep my collection updated too~

Are you really sure you don't accept any donations? You pretty much just saved my life over here and I couldn't be more thankful.