skip to Main Content

So, I have a large set of numbers, basically 1001 .. 150000 for a database using MySQL

There are a ton of gaps in the IDs on the database, so not all IDs exist. It can go from 100000 – 10500, then the next number will be 10675, and so fourth.

I want to make the IDs shortened, such as 1001..3000, x, x, x, 55000..101000 etc

I’m sure it’s simple. Thanks in advanced!!

SELECT id FROM table_name WHERE data = x 

give me above info.

I used

select group_concat(id) from items where id>1000

to get all ids in a comma separated list. How do I shrink this to be more clean? Basically to add ".." to a series of sequential numbers

I am using perl – just not sure of the syntax to make it work

3

Answers


  1. To shorten the sequence of IDs into ranges like 1001..3000, 55000..101000, etc., you can use a combination of SQL and Perl. MySQL itself doesn’t have built-in functionality to directly compress ranges with ".." notation, but you can achieve this with a bit of SQL and some post-processing in Perl.

    Here’s a step-by-step approach to accomplish this:

    1. Extract Sequential IDs in SQL

    First, get the IDs from the database:

    SELECT id
    FROM items
    WHERE id > 1000
    ORDER BY id;
    

    2. Process IDs in Perl

    Use Perl to process the list of IDs and convert them into ranges. Here’s a basic Perl script to accomplish this:

    #!/usr/bin/perl
    use strict;
    use warnings;
    
    # Sample input list of IDs
    my @ids = (1001, 1002, 1003, 1004, 3000, 3001, 3002, 55000, 55001, 55002, 101000);
    
    # Initialize variables
    my @ranges;
    my $start = $ids[0];
    my $end = $start;
    
    # Iterate through the list
    for my $i (1 .. $#ids) {
        if ($ids[$i] == $end + 1) {
            $end = $ids[$i];
        } else {
            push @ranges, ($start == $end) ? $start : "$start..$end";
            $start = $ids[$i];
            $end = $start;
        }
    }
    
    # Add the last range
    push @ranges, ($start == $end) ? $start : "$start..$end";
    
    # Print ranges
    print join(", ", @ranges), "n";
    

    Explanation of the Perl Script

    1. Initialize Variables: Starts with the first ID and sets it as the beginning of the range.
    2. Iterate Through IDs: Checks if each ID is sequential; if it is, it updates the end of the current range. If not, it saves the current range and starts a new one.
    3. Output the Ranges: Formats the ranges and prints them.

    3. Integration

    • Run the SQL Query: Export the list of IDs to a file or directly feed it into the Perl script.
    • Run the Perl Script: Process the ID list with the Perl script to get the formatted ranges.

    This approach separates the concerns of querying the database and processing the data, making it easier to manage and adapt to different scenarios.

    Login or Signup to reply.
  2. This is a variation on the "gaps and islands" problem.

    First, match up each ID with its previous ID.

    select 
      id, 
      lag(id) over(order by id) as prev_id 
    from test
    

    If we have 1, 2, 3, 5, 7, 8, 9, 10, 12 this produces…

    id prev_id
    1 null
    2 1
    3 2
    5 3
    7 5
    8 7
    9 8
    10 9
    12 10

    Now we can find the start of each group by finding the rows where id – prev_id <> 1, or if prev_id is null. That’s 1, 5, 7, and 12 above. The end is the prev_id of the next matching row. If there is no next row, the end is the highest ID.

    with lag_ids as (
      select 
        id, 
        lag(id) over(order by id) as prev_id
      from test
    )
    select 
      id as start, 
      coalesce(
        lead(prev_id) over(order by id),
        (select max(id) from test)
      ) as end
    from lag_ids 
    where prev_id is null 
       or id - prev_id <> 1
    
    start end
    1 3
    5 5
    7 10
    12 12

    We can then format this using a case statement to identify groups of one ID.

    with lag_ids as (
      select 
        id, 
        lag(id) over(order by id) as prev_id 
      from test
    ),
    start_end as (
      select 
        id as start, 
        coalesce(
          lead(prev_id) over(order by id),
          (select max(id) from test)
        ) as end
      from lag_ids 
      where prev_id is null 
         or id - prev_id <> 1
    )
    select
      case
      when start = end then start
      when start < end then concat(start, '..', end)
      else 'error'
      end as id_range
    from start_end
    order by start
    
    id_range
    1..3
    5
    7..10
    12

    You can then group concat the case to get 1..3,5,7..10,12.

    Demonstration.

    Login or Signup to reply.
  3. here is a short query for you

    SELECT GROUP_CONCAT(
      CONCAT( IF(l=h, l, CONCAT(l,"..",h)))
    ) GAP
    FROM ( 
      SELECT MIN(t1.id) l,MAX(t1.id) h, MAX(IF(t2.id IS NULL,@grp:=@grp+1,@grp)) AS grp
      FROM id_table t1
      LEFT JOIN id_table t2 ON t2.id = t1.id +1
      CROSS JOIN ( SELECT @grp := 0 ) AS INIT
      GROUP BY @grp
    ) as r;
    

    result

    100..103,110,120..121,200..203,400,500
    

    sample

    dbfiddle

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search