Can someone please add some code and help me complete the assignment?
I need help selecting the first sentence from each group of five sentences, combining those first sentences into one paragraph, and then displaying them together.

I have already completed the major part of this complex task: using PHP, I open the file, read its contents into a string, split the text into an array of sentences, and group the sentences into groups of five.

The code is here:

//Reading file contents
$text = file_get_contents( 'majmunskikompjuter.txt' ); 
echo $text;


 //Divide the text into sentences
 
$result = preg_split('/(?<=[.?!;:])\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
print_r($result);


//Divide the text into equal parts containing 5 sentences each
$input_array = $result;
print_r(array_chunk($input_array, 5, true)); 

//Select first sentence from each part of the divided text and display together ???? Need help

2

Answers


  1. You could, for example, omit the last parameter of array_chunk so that the keys are not preserved, and then use array_map to return the first entry of each chunk that array_chunk produces.

    Then use implode with a space.

    Example using an arrow function:

    $result = preg_split('/(?<=[.?!;:])\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    $result = implode(" ", array_map(
        fn($arr) => $arr[0],
        array_chunk($result, 5))
    );
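
    To check the approach end to end, here is a minimal sketch that runs on a short inline string instead of the file. The sample text and the variable names $sentences, $chunks and $summary are made up for illustration; the arrow function needs PHP 7.4 or later.

    ```php
    <?php
    // Hypothetical sample text standing in for the contents of the file.
    $text = 'One. Two! Three? Four. Five. Six. Seven. Eight. Nine. Ten. Eleven.';

    // Split after sentence-ending punctuation that is followed by whitespace.
    $sentences = preg_split('/(?<=[.?!;:])\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    // Group into chunks of five. Keys are reindexed per chunk,
    // so index 0 is always the first sentence of that chunk.
    $chunks = array_chunk($sentences, 5);

    // Take the first sentence of each chunk and join them with spaces.
    $summary = implode(' ', array_map(fn($chunk) => $chunk[0], $chunks));

    echo $summary, PHP_EOL; // prints "One. Six. Eleven."
    ```

    The last chunk may hold fewer than five sentences (here only "Eleven."), but it always has at least one entry, so $chunk[0] is safe.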
    
  2. I would not call this task complex. You only need to decide what marks the end of a sentence. Typically it is a punctuation mark, as you are already assuming with preg_split. I am not a native English speaker, so I am not sure whether ; or : counts as a sentence separator in English, but let us say it does. What you can do is select (with a regex) everything from the start of the paragraph up to one of those symbols.

    This way you only need one regex to match the first sentence:

    $text = '...';
    $firstSentences = [];
    
    preg_match_all('/^.*?[.!?:;](?:\s|$)/m', $text, $firstSentences);
    

    Now $firstSentences[0] should contain your first sentences (preg_match_all stores the full matches at index 0). A bit of explanation:

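    1. /pattern/m: the m (multiline) modifier makes ^ and $ match at the start and end of every line, not only of the whole input. Typically each paragraph sits on its own line. There is one downside: if someone breaks a paragraph across several lines (which should never be done), the pattern is applied to each line separately. Empty lines are skipped, though, so they are not a problem.

    2. ^.*? where ^ anchors the match at the start of a line, . matches any character except a newline, and *? repeats it as few times as possible (lazy), so the match stops at the first terminator that satisfies the rest of the pattern.

    3. [.!?:;] is a character class: a single character that may be ., !, ?, : or ;.

    4. (?:\s|$) is a non-capturing group (?: ) that matches either a whitespace character \s or (|) the end of the line $.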
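
    As a rough illustration of the points above, this sketch runs the pattern on a short made-up input with two one-line paragraphs separated by an empty line. The sample text and the $matches variable are assumptions for the example. Note that this picks the first sentence of each line, which is a different grouping from chunks of five sentences.

    ```php
    <?php
    // Hypothetical input: two paragraphs, one per line, with an empty line between them.
    $text = "First one. Second one.\n\nThird one! Fourth one?";

    // With the m modifier, ^ matches at the start of every line.
    preg_match_all('/^.*?[.!?:;](?:\s|$)/m', $text, $matches);

    // Index 0 holds the full matches. The pattern also consumes the
    // whitespace after the terminator, so trim each match.
    $firstSentences = array_map('trim', $matches[0]);

    echo implode(' ', $firstSentences), PHP_EOL; // prints "First one. Third one!"
    ```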

    I hope this helps you with the assignment. I recommend keeping a regular expression cheat sheet at hand if you do not work with regexes every day.

    https://www.rexegg.com/regex-quickstart.html
