
I am looking to generate a slug in the following string format:

'/job/hello-this-is-my-job-posting-for-a-daycare-im-looking-for-in-91770-rosemead-california-12345'

The string always starts with /job/ and ends with in-zipcode-city-state-job_id. However, the slug has a 150-character limit, and I want to drop the words before the zip code one at a time until the slug fits within that limit. I think I need to use a regex or explode(). How can I do this? I tried the following, but my matches array seems to have too many elements:

$pattern = '/-in-\d{5}-(.*)-(.*)-(\d*)/';
$string = '/job/hello-this-is-my-job-posting-for-a-daycare-im-looking-for-in-91770-rosemead-california-12345';

preg_match($pattern, $string, $matches);
print_r($matches);

// Array
// (
//     [0] => -in-91770-rosemead-california-12345
//     [1] => rosemead
//     [2] => california
//     [3] => 12345
// )

Why are rosemead, california and 12345 included in the matches? Shouldn't there be only the first element?

How do I ensure that the complete slug is at most 150 characters long, with the trailing part (location) included in its entirety and the leading part (job name) truncated if necessary?
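On the first question: in preg_match(), element 0 of $matches is the text matched by the whole pattern, and every parenthesised group adds one more element, numbered left to right. The three (...) groups in the pattern above therefore produce elements 1, 2 and 3. Dropping the parentheses, or writing them as non-capturing groups (?:...), leaves only element 0. A small comparison (the variable names are illustrative):

```php
<?php
$string = '/job/hello-this-is-my-job-posting-for-a-daycare-im-looking-for-in-91770-rosemead-california-12345';

// Capturing groups: $capturing[1..3] hold what each (...) matched.
preg_match('/-in-\d{5}-(.*)-(.*)-(\d*)/', $string, $capturing);

// Non-capturing groups (?:...) still group the pattern but are not stored.
preg_match('/-in-\d{5}-(?:.*)-(?:.*)-(?:\d*)/', $string, $nonCapturing);

echo count($capturing), "\n";    // 4
echo count($nonCapturing), "\n"; // 1
echo $nonCapturing[0], "\n";     // -in-91770-rosemead-california-12345
```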


Answers


  1. You can do this without using explode() and iterating, just with some standard string manipulation:

    $pattern = '/-in-\d{5}-.*-.*-\d*$/';
    $string = '/job/hello-this-is-my-job-posting-for-a-daycare-im-looking-for-in-91770-rosemead-california-12345';
    $matches = [];
    
    if (!preg_match($pattern, $string, $matches)) {
        // mismatched string - error handling here
    }
    
    $totalLength = 150;
    $slug = $string;
    if (strlen($string) > $totalLength) {
        $location = $matches[0];
        // everything before the location: "/job/" plus the job-name words
        $prefix = substr($string, 0, -strlen($location));
        $maxPrefixLength = $totalLength - strlen($location);
        if ($maxPrefixLength < strlen('/job/')) {
            // no prefix words possible at all - error handling here
        }
        // cut at the last hyphen within the allowed length, never into "/job/"
        $prefixLength = max(strlen('/job/'), strrpos(substr($prefix, 0, $maxPrefixLength), '-'));
        $slug = substr($prefix, 0, $prefixLength) . $location;
    }
    
  2. Trimming the leading part of the slug to a specified length can be done in a number of ways. I've written my answer as a function with comments. As the starting point, a regex extracts the leading part (job name) and the trailing part (location). The maximum allowed length for the job name is then the total allowed length minus the length of the location part. See the comments for more detail.

    function trim_slug(string $slug, int $maxlen = 150): string
    {
        // check if trimming is required:
        if(strlen($slug) <= $maxlen) {
            return $slug; 
        }
        
        $pattern = '/^(?<job>.+)(?<loc>-in-\d{5}-.*-.*-\d*)$/';
        // $match will have 'job' and 'loc' named keys with the matched values
        if (!preg_match($pattern, $slug, $match)) {
            return $slug; // not in the expected format; leave it unchanged
        }
        
        // raw cut of job name to maximum length:
        $max_job_chars = $maxlen - strlen($match['loc']);
        $job_name = substr($match['job'], 0, $max_job_chars);
        
        // tidy up to last delimiter, if exists, instead of mincing words:
        if($last_delim = strrpos($job_name, '-')) {
            $job_name = substr($match['job'], 0, $last_delim);      
        }
        
        return $job_name . $match['loc'];
    }
    
    $string = '/job/hello-this-is-my-job-posting-for-a-daycare-im-looking-for-in-91770-rosemead-california-12345';
    
    echo trim_slug($string, 80);
    // result: /job/hello-this-is-my-job-posting-for-a-in-91770-rosemead-california-12345
    

    The usage sample uses a maximum length of 80 because your sample string is only 97 characters long, so with the default 150-character limit the function would return it unchanged. Demo at 3v4l.

    N.B. This answer uses PHP's standard string functions, which are not multibyte-aware. If you expect multibyte content, use the corresponding multibyte string functions to avoid mangled data. (Whether you want multibyte characters in a slug in the first place, and what the best ways of handling them are, is a topic for another question.)
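As one possible multibyte-aware variant of the function above, here is a sketch that assumes UTF-8 input and counts lengths in code points rather than bytes. Instead of the mbstring functions it uses PCRE's u modifier, and it swaps the substr()/strrpos() pair for a single regex that takes the longest prefix ending right before a hyphen. The name trim_slug_utf8 is hypothetical.

```php
<?php
// Sketch assuming UTF-8 input: lengths are counted in code points, not bytes.
// Relies on PCRE's "u" modifier rather than the mbstring extension.
function trim_slug_utf8(string $slug, int $maxlen = 150): string
{
    // Number of code points: under /u each "." matches one whole character.
    $len = fn(string $s): int => (int) preg_match_all('/./su', $s);

    if ($len($slug) <= $maxlen) {
        return $slug;
    }
    if (!preg_match('/^(?<job>.+)(?<loc>-in-\d{5}-.*-.*-\d*)$/u', $slug, $match)) {
        return $slug; // unexpected format: leave it unchanged
    }

    // Room left for the job name; if the location alone is too long, keep none.
    $max_job_chars = max(0, $maxlen - $len($match['loc']));
    $q = '{0,' . $max_job_chars . '}';

    // Prefer the longest cut that ends right before a hyphen (or at the end of
    // the job name); otherwise fall back to a raw cut of $max_job_chars characters.
    preg_match('/^(?:.' . $q . '(?=-|$)|.' . $q . ')/su', $match['job'], $cut);

    return $cut[0] . $match['loc'];
}

echo trim_slug_utf8('/job/café-crèche-in-91770-rosemead-california-12345', 50), "\n";
// /job/café-in-91770-rosemead-california-12345
```

The sample string is 51 characters but 53 bytes, so a byte-based strlen() would already consider it over a 51-character limit, while this version would not.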

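For comparison with the explode()-free approach in the first answer, here is a sketch of the explode()-and-iterate approach the question originally had in mind: split off the location with a regex, explode the job-name part on "-", and drop words from the end one at a time until the whole slug fits. The function name trim_slug_by_words and its fallback (returning "/job/" plus the location when no words fit) are illustrative assumptions; arrow functions require PHP 7.4 or later.

```php
<?php
// Hypothetical helper (name and fallback behaviour are illustrative only):
// drops job-name words from the end, one at a time, until the slug fits.
function trim_slug_by_words(string $slug, int $maxlen = 150): string
{
    // Split "/job/<words>-in-<zip>-<city>-<state>-<id>" into its two parts.
    if (!preg_match('~^/job/(?<job>.*)(?<loc>-in-\d{5}-.*-.*-\d*)$~', $slug, $m)) {
        return $slug; // unexpected format: leave it unchanged
    }

    $words = $m['job'] === '' ? [] : explode('-', $m['job']);

    // Rebuilds the slug from the remaining words plus the untouched location.
    $build = fn(array $w): string => '/job/' . implode('-', $w) . $m['loc'];

    // Pop the last word while the slug is too long and words remain; if even
    // "/job/" plus the location exceeds $maxlen, the result stays over the limit.
    while (count($words) > 0 && strlen($build($words)) > $maxlen) {
        array_pop($words);
    }

    return $build($words);
}

$string = '/job/hello-this-is-my-job-posting-for-a-daycare-im-looking-for-in-91770-rosemead-california-12345';
echo trim_slug_by_words($string, 80), "\n";
// /job/hello-this-is-my-job-posting-for-a-in-91770-rosemead-california-12345
```

Each iteration rebuilds the string, which is simple but does more work than the single substr()/strrpos() cut in the answers above; for slugs of this size the difference is negligible.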