Topic: [Feature] Search: Additional Sort Option - Artist Grouping

Posted under Site Bug Reports & Feature Requests

Requested feature overview description.
Add order:artgroup and order:artgroup_asc as options for sorting art, so as to group art in search results by artist.

Why would it be useful?
If someone wants to search for art and wishes to order the results by artist tags, so that all art by each artist will be grouped together, this will be very helpful (admittedly, while writing this out, I'm realizing this is quite a bit more complicated a suggestion than I initially assumed it to be).
The idea is that search will determine how many artworks each artist that appears in search have that meet the initial search criterion, and sort the search results in a manner to keep each artist's works in a group.
If a post has multiple artist tags, it will be sorted into the artist group of whichever artist has a greater quantity of posts matching the search query.
In theory, it could be compatible with other sorting options in the Sorting Cheat Sheet, but will be the final sort, meaning that it will use the other sorts first, followed by the artist group sort

  • Examples:
    • If one searched "female order:artgroup" then all art for whichever artist has the greatest quantity of art tagged with female will appear first, followed by whichever artist has the second greatest quantity, and so on.
    • If, instead, one searched for "female order:artgroup_asc" then it will sort by artists who have the least number of artworks (no less than 1 artwork) tagged with female first, followed by whichever artist has the second least quantity, and so on.
    • If two or more artists have matching quantities of results for the search query, I can think of two resolutions as sorting tiebreakers
      • artist groups with matching quantities would be sorted by name (simpler)
      • artist group's ID numbers, are averaged, with newer being pushed to the front of the results (more complex, but sorts newer art closer to the front of results, more inline with the default method of search result order)
    • If one searched for "female order:score order:artgroup" then it would sort each artist grouping by their score, rather than the default post ID numbers).
    • Regarding posts with multiple artist tags, if one searched "female order:artgroup", then the post:

When it comes to programming languages, if it isn't HTML or CSS, I am the epitome of the term n00b. I prompted a few AIs with this problem, asking for a potential solution that could, in theory, be appended to the source code without breaking the pre-existing source code.
Another thing I don't know is if this proposed feature would, if/when used, slow search for the user, or, worst case scenario, everyone, to a comparative crawl.
Regardless, here is a collapsed [section] containing the code that made the most sense to me out of what the different AIs I prompted suggested to me.

AI suggested code
class PostQueryBuilder
  # Existing instance variables and methods assumed: @tags, @order, etc.
  
  # Add this method to handle the new order:artgroup functionality
  def build_artist_group_query
    # Check if the order parameter includes 'artgroup' or 'artgroup_asc'
    return nil unless order&.include?('artgroup')
    
    # Determine sort direction: DESC for artgroup, ASC for artgroup_asc
    direction = order.include?('artgroup_asc') ? 'ASC' : 'DESC'
    
    # Extract additional sorting criteria (e.g., "score" from "order:score order:artgroup")
    additional_order = order.split(' ').reject { |o| o =~ /artgroup/ }.join(' ')
    additional_order_clause = build_standard_order_clause(additional_order) || 'posts.id DESC'

    # Step 1: Define base query for posts matching search criteria
    base_query = Post.joins(:tags).where(search_conditions)
                     .where(tags: { category: Tag.categories.artist })
                     .select('posts.id, tags.name AS artist_name')

    # Step 2: Calculate post counts per artist
    artist_counts_subquery = Arel.sql(
      <<-SQL.squish
        SELECT artist_name, COUNT(*) AS post_count,
               AVG(posts.id) AS avg_post_id
        FROM (#{base_query.to_sql}) AS matching_posts
        GROUP BY artist_name
      SQL
    )

    # Step 3: Determine primary artist for each post (artist with highest post count)
    #         Uses a subquery to pick the top artist per post
    primary_artist_subquery = Arel.sql(
      <<-SQL.squish
        WITH artist_counts AS (#{artist_counts_subquery})
        SELECT mp.id,
               (SELECT ac.artist_name
                FROM artist_counts ac
                WHERE ac.artist_name IN (
                  SELECT artist_name FROM (#{base_query.to_sql}) sub
                  WHERE sub.id = mp.id
                )
                ORDER BY ac.post_count DESC, ac.artist_name ASC
                LIMIT 1) AS primary_artist
        FROM (#{base_query.to_sql}) mp
        GROUP BY mp.id
      SQL
    )

    # Step 4: Construct the final query
    # - Join posts with their primary artist and artist counts
    # - Order by artist post count, then additional criteria
    Post.joins(
      <<-SQL.squish
        INNER JOIN (#{primary_artist_subquery}) ppa
          ON posts.id = ppa.id
        INNER JOIN (#{artist_counts_subquery}) ac
          ON ppa.primary_artist = ac.artist_name
      SQL
    )
    .select('posts.*', 'ac.post_count', 'ac.avg_post_id')
    .order(Arel.sql("ac.post_count #{direction}, #{additional_order_clause}"))
  end

  # Helper method to adapt existing order parsing for additional sorts
  def build_standard_order_clause(order_str)
    return nil if order_str.blank?
    # Map common order options to SQL (simplified; expand based on existing logic)
    case order_str
    when 'score'
      'posts.score DESC'
    when 'date'
      'posts.created_at DESC'
    else
      'posts.id DESC' # Default fallback
    end
  end

  # Override or extend the existing build method to include artist grouping
  def build
    # If order:artgroup is specified, use the new query
    if order&.include?('artgroup')
      query = build_artist_group_query
    else
      # Existing build logic for standard queries
      query = Post.where(search_conditions)
                  .order(build_standard_order_clause(order))
      # ... (rest of existing build logic)
    end
    query
  end

  private

  # Placeholder for existing search conditions method
  # Assumes this constructs the WHERE clause based on @tags and other params
  def search_conditions
    # Simplified example; adapt to actual implementation
    TagQueryScanner.new(@tags).to_conditions
  end
end

What part(s) of the site page(s) are affected?
Posts, search, e621:cheatsheet#Sorting

Updated

Donovan DMC

Former Staff

I am pretty certain this is flat out not possible, both in opensearch and direct sql queries
If it is by some magic means possible, it sounds like a performance nightmare

SCTH

Member

Definitely doable from the DB export, so I could make a tool that does it. Would be way harder server side.

scth said:
Would be way harder server side.

Hence my realization while I was typing it up that it's more complicated than I initially assumed. I'm actually running the query source code through an AI to see if it can see or come up with a simple set of code that may form a partial solution.

Donovan DMC

Former Staff

scth said:
Definitely doable from the DB export, so I could make a tool that does it. Would be way harder server side.

doing it would essentially require pulling down all matching records for a search (which is practically what you'd be doing with the db export) because sorting like this simply isn't possible due to everything just being in a string or array without any care as to what category the tag is or how many times it appears in the search, so at bare minimum you'd also need to join to the tags table and lookup every single tag
Unless maybe you pick all the artist tags beforehand and do manual array ordering in a later query, but I doubt it would be stoked for thousands of entries to order by

Mind you the global timeout is 3000ms, so all of this has to be completed within that window

fk2169 said:
Hence my realization while I was typing it up that it's more complicated than I initially assumed. I'm actually running the query source code through an AI to see if it can see or come up with a simple set of code that may form a partial solution.

Y'know pulling out an ai like this to actual programmers is nearly just as bad as bringing ai art to an actual artist? It's a bit of a slap to the face
I know the codebase like the back of my hand and cannot fathom any conceivable way for this to happen in opensearch (which is where it actually needs to happen), or even in the database
I'd eat my foot if an ai could manage to come up with an actual solution to this that doesn't involve some convoluted method of a whole new index or just doing it in code rather than the index or database

donovan_dmc said:
Y'know pulling out an ai like this to actual programmers is nearly just as bad as bringing ai art to an actual artist? It's a bit of a slap to the face

No offense was intended. Obviously an actual programmer who knows the codebase would be required, something of which I am neither.
I didn't know what you do or don't know about it (for all I knew, you could have been a moderator (though, I should have been able to guess from your mentions of opensearch and sql)).

I've had enough problems with AI's proposed solutions to know that they rarely, if ever perfectly nail down the solution to complicated problems, which is why when I do use one, I'm using it predominantly for inspiration (hence my statement about a "partial solution").

Ultimately, I know very little about the backend of things, so the most I can do is make a suggestion, hope it sounds even remotely plausible, and, if I can come up with anything that seems from what I know to be even remotely plausible, offer it up.

Donovan DMC

Former Staff

fk2169 said:
No offense was intended. Obviously an actual programmer who knows the codebase would be required, something of which I am neither.
I didn't know what you do or don't know about it (for all I knew, you could have been a moderator (though, I should have been able to guess from your mentions of opensearch and sql)).

I've had enough problems with AI's proposed solutions to know that they rarely, if ever perfectly nail down the solution to complicated problems, which is why when I do use one, I'm using it predominantly for inspiration (hence my statement about a "partial solution").

Ultimately, I know very little about the backend of things, so the most I can do is make a suggestion, hope it sounds even remotely plausible, and, if I can come up with anything that seems from what I know to be even remotely plausible, offer it up.

For the record I was just a Moderator, but I helped out extensively on the development side as well (I have over 100 PRs on github and now run my own fork which has given me very in depth knowledge)

I misunderstood your partial solution bit (or might have missed it entirely), using it for inspiration is fine in my mind (I'd be a hypocrite if I were to bash that), I was under the hasty assumption that you were taking an ai and running with it entirely, like those "vibe coders" ๐Ÿ˜…

donovan_dmc said:
I misunderstood your partial solution bit (or might have missed it entirely), using it for inspiration is fine in my mind (I'd be a hypocrite if I were to bash that), I was under the hasty assumption that you were taking an ai and running with it entirely, like those "vibe coders" ๐Ÿ˜…

I completely understand the misunderstanding. I've misunderstood my fair share of conversations.
It would be nice if it was reasonably possible to accomplish my suggestion, but, as I initially remarked, and as you confirmed, it's much more complicated than you'd think looking at it from a surface level perspective.