[Feature] Allow wildcards in `md5:...` search.

In category: Site Bug Reports & Feature Requests

Requested feature overview description.
Allow the use of wildcards (particularly the * wildcard) when searching with md5:....
So for example, searching for md5:61c93e4bdfe0f880fd7e0113c3c10f9* should find all posts whose md5 starts with 61c93e4bdfe0f880fd7e0113c3c10f9 (only one in this case). Right now it doesn't find any posts at all.

Why would it be useful?
When I browse through e621 I download images I like by dragging them from the browser to a folder. (I use Windows.)
The following things happen, in order:

  • 1. I open ~30 to 50 tabs of posts where the thumbnails are promising.
  • 2. As tabs finish loading I go through them and close the ones I don't like, or...
  • 3. ... when I see one I like I drag the image from the browser to some folder.
  • 4. Windows opens a little box saying Verschieben von "61c93e4bdfe0f880fd7e0113c3c10f9 (German for Moving "61c93e4bdfe0f880fd7e0113c3c10f9). See how the text is cut off at the end? It also creates an empty dummy file with the md5 as its filename.
  • 5. I close the tab the image came from because I want to keep browsing.
  • 6. Go back to step 1 while the downloads complete in the background and the little boxes close one after another.

But! Sometimes downloads time out. Windows then opens another box above the first one saying Unbekannter Fehler, which means "Unknown error". I can keep the boxes open for as long as I like, but Windows immediately deletes the dummy file anyway, so I can't see the whole filename. I can't tell which post the file came from either, because I closed that tab half an hour ago. So I am stuck with the first ~30 of the hash's 32 characters.

My current workaround is to use the API to automatically search through every possible hash that starts with the known characters. But that takes almost no time for 1 missing character (16 possible hashes), about a minute for 2 (256), and something like half an hour for 3 (4096). Being able to search for hashes with wildcards would be extremely helpful.
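The brute-force workaround above boils down to enumerating every 32-character hex string that starts with the known prefix and checking each one. A minimal Python sketch, assuming a hypothetical `post_exists` callback that performs one exact `md5:` lookup (the actual API request is deliberately not shown, and all helper names are made up for illustration):

```python
from itertools import product
from typing import Callable, Iterator, Optional

HEX_DIGITS = "0123456789abcdef"
MD5_HEX_LENGTH = 32  # an md5 digest is 16 bytes = 32 hex characters


def candidate_hashes(prefix: str) -> Iterator[str]:
    """Yield every 32-character hex string that starts with `prefix`."""
    prefix = prefix.lower()
    missing = MD5_HEX_LENGTH - len(prefix)
    if missing < 0 or any(c not in HEX_DIGITS for c in prefix):
        raise ValueError(f"not a valid md5 prefix: {prefix!r}")
    # 16 choices per missing character -> 16 ** missing candidates in total.
    for tail in product(HEX_DIGITS, repeat=missing):
        yield prefix + "".join(tail)


def find_full_hash(prefix: str, post_exists: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate for which `post_exists` reports a post.

    `post_exists` is a hypothetical stand-in for one exact md5:<hash> search
    against the site; every call costs one request.
    """
    for candidate in candidate_hashes(prefix):
        if post_exists(candidate):
            return candidate
    return None


for missing in (1, 2, 3):
    print(missing, "missing ->", 16 ** missing, "lookups in the worst case")
```

Each extra missing character multiplies the worst-case number of lookups by 16, which matches the 16 / 256 / 4096 figures above and explains why the runtime grows so quickly.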

What part(s) of the site page(s) are affected?
The search pages: /post/index, /post/index.xml, etc.


There are workarounds for that problem: Most browsers cache any given file, and since the file name is part of the URL, any tool that lets you view your browser's cache will likely find the image from your partial file name.
For example, I just found ImageCacheViewer, which seems to work well. It doesn't need to be installed (just drop the executable somewhere), and it works with Chrome, IE, Edge, and Firefox.

You may also be able to search manually inside the browser. For example, on Firefox you'd go to about:cache, Disk: List Cache Entries, then Ctrl+F for the file name, click the hyperlink, click the next hyperlink, and you're done.
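The cache search can also be automated with a brute-force byte search over the cache folder. This is a sketch under the assumption that each cache entry file contains the entry's URL as plain text somewhere inside it; `find_cache_entries` is a hypothetical helper, and you would have to supply the cache folder location yourself (for Firefox, about:cache should show where the disk cache lives):

```python
from pathlib import Path
from typing import List


def find_cache_entries(cache_dir: str, partial_name: str) -> List[Path]:
    """Return files under `cache_dir` whose raw bytes contain `partial_name`.

    Assumes the browser stores each entry's URL as plain text inside the
    entry file. This is a plain byte search, not a real cache-format parser.
    """
    needle = partial_name.encode("ascii")
    hits = []
    for path in Path(cache_dir).rglob("*"):
        if not path.is_file():
            continue
        try:
            data = path.read_bytes()
        except OSError:
            continue  # skip files that are locked or vanished mid-scan
        if needle in data:
            hits.append(path)
    return hits
```

Calling it with the cache folder and a partial name such as 61c93e4bdfe0f880fd7e0113c3c10f9 would list the entry files worth inspecting. It reads every file completely, so it is slow on a large cache, but that is acceptable for a one-off search.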


I'd suggest an MD5-summing program. If you didn't edit the image (e.g. by adding EXIF tags) after downloading it, this will calculate the full, correct md5sum, which you can then use to fix the filename or simply enter into the search.

(If you did edit the image, that would of course change the md5sum to something completely different.)

One example of an MD5-summing program is 'WinMD5'. As I am not a Windows user, I can't comment on its quality; personally I use the md5sum command on Linux. These programs are pretty simple to use, though, and there seem to be several options for Windows (WinMD5, Checksums Calculator, winMD5Sum).
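For reference, the same calculation in Python needs only the standard `hashlib` module. A minimal sketch; `md5_of_file` and `name_matches_content` are illustrative names, and the second helper assumes the file is named `<md5>.<extension>`, as the downloaded posts in this thread are:

```python
import hashlib
from pathlib import Path


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute a file's md5 as 32 lowercase hex characters."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large images don't have to fit in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def name_matches_content(path: str) -> bool:
    """True if the file name (without extension) equals the file's own md5."""
    return Path(path).stem.lower() == md5_of_file(path)
```

This only helps once the file has actually been downloaded: a truncated file will almost certainly hash to something completely different, which also means `name_matches_content` can double as a quick completeness check.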


NotMeNotYou said:
There are workarounds for that problem: Most browsers cache any given file
[...]
on Firefox you'd go to about:cache, Disk: List Cache Entries, then Ctrl+F for the file name

That is a good workaround. I'll have to experiment with it, though. Right now I have the disk cache disabled (I'm using Firefox), so neither ImageCacheViewer nor about:cache?storage=disk shows any files. (about:cache?storage=memory does contain some entries, but only for small files, so the big images won't be there.)
If you're wondering why I would do such a thing as disabling the cache: I don't know the exact reason either. But I do remember being very upset about an issue relating to the cache, so I'm hesitant to enable it again.

On a side-note: Is there a specific reason why wildcards don't work with hashes as of yet?

savageorange said:
I'd suggest an MD5-summing program.

How can I md5sum the file when the download has failed? ;)
And if the download succeeds, the filename is the md5 hash of that file so when I do have the whole file I also automatically know the full hash.


Calimero000 said:
That is a good workaround. I'll have to experiment with it, though. Right now I have the disk cache disabled (I'm using Firefox), so neither ImageCacheViewer nor about:cache?storage=disk shows any files. (about:cache?storage=memory does contain some entries, but only for small files, so the big images won't be there.)
If you're wondering why I would do such a thing as disabling the cache: I don't know the exact reason either. But I do remember being very upset about an issue relating to the cache, so I'm hesitant to enable it again.

I think I faintly remember that cache issue as well, but I believe it was fixed ages ago, so I'd try enabling it. Just be sure to clear it manually from your history every so often (clearing only the cache leaves all other history items intact) so that new JavaScript and the like gets refreshed properly.
As a rule of thumb the best times to clear the cache are after large browser updates or if a page you frequent had an update. In the case of the latter a quick Ctrl+F5 will take care of the outdated cache for that particular page as well.

Calimero000 said:

On a side-note: Is there a specific reason why wildcards don't work with hashes as of yet?

My guess is it's an issue with the current implementation of the metatag. From what I've been told, they're, uh, "basic" at best. Maybe Kira will be able to add this functionality; I guess nobody cared enough to change it before now.


NotMeNotYou said:
I think I faintly remember that cache issue as well, but I believe it was fixed ages ago,
[...]
My guess is it's an issue with the current implementation of the metatag.

Thank you very much for the information.
I'm really glad to hear that my memory didn't lie to me about the cache thing.


Calimero000 said:
On a side-note: Is there a specific reason why wildcards don't work with hashes as of yet?

I would bet that's down to the SQL used and/or performance considerations.

A query that uses exact equality testing, like select [..] where md5 = 'somemd5', is generally cheaper to compute than a wildcard match: select [..] where md5 LIKE 'somemd5%'.

(% is the SQL equivalent of * -- 'match 0 or more arbitrary characters')

And of course, as you may have inferred from the above, a) user-inputted *s must be translated into % in order to do a LIKE comparison, and b) % (or *) has no special meaning outside of LIKE comparisons.
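Points a) and b) can be illustrated with Python's built-in `sqlite3` module standing in for the site's real database. This is not e621's actual code: the `posts` table, the `md5_search` function, and the validation rule (hex digits plus `*` only) are assumptions made for the example, and the two sample rows are simply the md5 digests of "abc" and of the empty string:

```python
import re
import sqlite3
from typing import List

# Hypothetical validation rule: hex digits plus the user-facing '*' wildcard.
_MD5_PATTERN = re.compile(r"[0-9a-f*]{1,32}")


def md5_search(conn: sqlite3.Connection, user_pattern: str) -> List[int]:
    """Return ids of posts whose md5 matches `user_pattern` (e.g. '9001*')."""
    pattern = user_pattern.lower()
    if not _MD5_PATTERN.fullmatch(pattern):
        raise ValueError(f"not a valid md5 pattern: {user_pattern!r}")
    if "*" not in pattern:
        # No wildcard: exact equality, the cheap case.
        sql, arg = "SELECT id FROM posts WHERE md5 = ? ORDER BY id", pattern
    else:
        # a) translate the user's '*' into SQL's '%'. The validation above
        # already rules out '%', '_' and '\', so nothing else needs escaping.
        sql = "SELECT id FROM posts WHERE md5 LIKE ? ORDER BY id"
        arg = pattern.replace("*", "%")
    return [row[0] for row in conn.execute(sql, (arg,))]


# Demo data: md5 digests of b"abc" (id 1) and b"" (id 2).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, md5 TEXT)")
conn.executemany(
    "INSERT INTO posts (id, md5) VALUES (?, ?)",
    [(1, "900150983cd24fb0d6963f7d28e17f72"), (2, "d41d8cd98f00b204e9800998ecf8427e")],
)
print(md5_search(conn, "9001*"))  # [1]
```

Whether a prefix LIKE like this can use an index on the md5 column depends on the database and on how the index and collation are configured, which may be part of why only the exact-match case was implemented.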

How can I md5sum the file when the download has failed? ;)

Ooh.

Sorry, should have read more carefully.

Hmm. Not sure what browser you are using, but installing a download manager add-on like DownThemAll might address your issues best. It would let you:

a) track downloads, so you can always retry failed ones
b) keep the full URL around, so it's easy to look up by md5

Of course, it would require a change to exactly how you download images.


Ok, that worked out nicely. I didn't notice any cache problems, so that's great. Firefox now doesn't download files again when I drag them to the folder; it just reuses the cached data. So the initial problem of downloads not finishing properly worked itself out anyway.
Thanks.

Edit:
Nobody actually looked at the post I was referring to :(