
My string may look like this:

@ *lorem.jpg,,, ip sum.jpg,dolor ..jpg,-/ ?

In fact, it is a dirty CSV string containing the names of JPG images.

First, I need to remove any non-alphanumeric characters from both ends of the string.
Then, inside the resulting string, remove the same characters, except commas and dots.
Finally, replace any runs of duplicate commas or dots with single ones.

So the final result should be:
lorem.jpg,ipsum.jpg,dolor.jpg

First I tried to remove all whitespace:

$str = str_replace(" ", "", $str);  

Then I used various forms of the trim functions, but that is tedious and takes a lot of code.

An additional problem is that the duplicate commas and dots can repeat any number of times, for example .. or ,,,,

Is there a way to solve this using regex, please?


Answers


  1. Can you try this:

    $string = ' @ *lorem.jpg,,,,  ip sum.jpg,dolor .jpg,-/ ?';
    // keep only letters, digits, commas and dots
    $result = preg_replace('/[^A-Za-z0-9,.]/', '', $string);
    
    // collapse repeated commas and dots into single ones
    // (the dot must be escaped: an unescaped "." matches any character)
    $result = preg_replace('/,+/', ',', $result);
    $result = preg_replace('/\.+/', '.', $result);
    
    // remove leftover commas and dots from both ends
    $result = preg_replace('/^[,.]+|[,.]+$/', '', $result);
    
    echo $result; // lorem.jpg,ipsum.jpg,dolor.jpg
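A quick check of why the dot has to be escaped in the duplicate-collapsing step, using a made-up sample string: an unescaped "." matches any character, so ".+" consumes the whole string, while "\.+" matches only runs of literal dots.

```php
<?php
// An unescaped "." in a pattern matches ANY character, so ".+" swallows
// the whole string; "\." matches only a literal dot.
$sample = 'dolor..jpg';

$unescaped = preg_replace('/.+/', '.', $sample);  // whole string becomes "."
$escaped   = preg_replace('/\.+/', '.', $sample); // only the run of dots is collapsed

echo $unescaped, PHP_EOL; // .
echo $escaped, PHP_EOL;   // dolor.jpg
```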
    
  2. Look at

    https://www.php.net/manual/en/function.preg-replace.php

    preg_replace() replaces anything inside a string based on a pattern. \s represents any whitespace character, but take care with NBSP (non-breaking space); \h matches it.

    Example #4 on that page strips whitespace. Adapted to remove every whitespace character, it looks like this:

    $str = preg_replace('/\s+/', '', $str);
    

    Something like that should work.
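A small sketch of the NBSP point above, assuming the input is UTF-8 (the sample string is made up for illustration, and the "\u{...}" escape needs PHP 7 or later). The /u modifier makes PCRE treat the subject as UTF-8, and \h matches horizontal whitespace including U+00A0.

```php
<?php
// Remove every whitespace character, including the non-breaking space
// (U+00A0). /u treats the subject as UTF-8; \h covers horizontal
// whitespace such as U+00A0, \s covers ordinary spaces, tabs and newlines.
$str = "ip\u{00A0}sum  .jpg";

$clean = preg_replace('/[\s\h]+/u', '', $str);

echo $clean, PHP_EOL; // ipsum.jpg
```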

  3. Here are the steps, modeled on your own wording:

    Step 1

    • "remove any non-alphanum chars from both sides of the string"

    • translated: remove leading and trailing runs of [^a-zA-Z0-9] characters

    • regex: replace ^[^a-zA-Z0-9]*(.*?)[^a-zA-Z0-9]*$ with $1

    Step 2

    • "inside the resulting string – remove the same – except commas and dots"
    • translated: remove any [^a-zA-Z0-9.,]
    • regex: replace [^a-zA-Z0-9.,] with empty string

    Step 3

    • "remove duplicates commas and dots – if any – replace them with single ones"
    • translated: replace each run of consecutive dots or commas with a single instance
    • regex: replace (\.{2,}) with .
    • regex: replace (,{2,}) with ,
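As an alternative for step 3, a backreference can collapse both kinds of run with one pattern. This is a sketch, not part of the original answer; the input is the string as it looks after step 2.

```php
<?php
// Collapse any run of the same punctuation character into one instance.
// ([.,]) captures a dot or a comma; \1+ matches one or more further copies
// of that same character, so ".." becomes "." and ",,," becomes ",".
$input = 'lorem.jpg,,,ipsum.jpg,dolor..jpg';

$collapsed = preg_replace('/([.,])\1+/', '$1', $input);

echo $collapsed, PHP_EOL; // lorem.jpg,ipsum.jpg,dolor.jpg
```

Mixed sequences such as ".," are left untouched, just as with the two separate patterns.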

    PHP Demo:

    https://onlinephp.io/c/512e1

    <?php
    
    $subject = " @ *lorem.jpg,,, ip sum.jpg,dolor ..jpg,-/ ?";
    
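    // Step 1: strip non-alphanumerics from both ends
    $firstStep = preg_replace('/^[^a-zA-Z0-9]*(.*?)[^a-zA-Z0-9]*$/', '$1', $subject);
    // Step 2: drop everything except letters, digits, dots and commas
    $secondStep = preg_replace('/[^a-zA-Z0-9.,]/', '', $firstStep);
    // Step 3: collapse runs of dots, then runs of commas
    $thirdStepA = preg_replace('/\.{2,}/', '.', $secondStep);
    $thirdStepB = preg_replace('/,{2,}/', ',', $thirdStepA);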
    
    echo $thirdStepB; //lorem.jpg,ipsum.jpg,dolor.jpg
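Since preg_replace() also accepts arrays of patterns and replacements and applies them in order, the same steps could run in a single call. A sketch, wrapped in a helper whose name (cleanJpgList) is hypothetical:

```php
<?php
// preg_replace() applies an array of patterns in order, each with the
// replacement at the same index, so the steps can run in one call.
// cleanJpgList is a hypothetical helper name used for illustration.
function cleanJpgList(string $subject): string
{
    return preg_replace(
        [
            '/^[^a-zA-Z0-9]*(.*?)[^a-zA-Z0-9]*$/', // step 1: strip non-alphanumerics at both ends
            '/[^a-zA-Z0-9.,]/',                    // step 2: keep letters, digits, dots, commas
            '/\.{2,}/',                            // step 3a: collapse runs of dots
            '/,{2,}/',                             // step 3b: collapse runs of commas
        ],
        ['$1', '', '.', ','],
        $subject
    );
}

echo cleanJpgList(' @ *lorem.jpg,,, ip sum.jpg,dolor ..jpg,-/ ?'), PHP_EOL;
// lorem.jpg,ipsum.jpg,dolor.jpg
```

The order matters: collapsing runs has to come after step 2, because removing characters such as spaces can create new runs (for example ", ," only becomes ",," after step 2).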
    