
I receive text from a source I can’t change, and I need to convert the leading paragraph markers, such as (a), from their line-by-line form into a consolidated hierarchical list. The main issue is that the same marker, e.g. (i), can be valid at both level 1 and level 3 depending on its value. The same goes for levels 2 and 5. I thought about comparing the current level with the prior level, but that doesn’t seem to work.

Raw Text Received:

(a) Each manual accessed in paper format must display the date of last revision on each page. Each manual accessed in electronic format must display the date of last revision in a manner in which a person can immediately ascertain it. Each manual required by § 121.133 must:
(1) Include instructions and information necessary to allow the personnel concerned to perform their duties and responsibilities with a high degree of safety; 
(2) Be in a form that is easy to revise and; 
(3) Not be contrary to any applicable Federal regulation and, in the case of a flag or supplemental operation, any applicable foreign regulation, or the certificate holder's operations specifications or operating certificate. 

(b) The manual may be in two or more separate parts, containing together all of the following information, but each part must contain that part of the information that is appropriate for each group of personnel: 
(1) General policies. 
(2) Duties and responsibilities of each crewmember, appropriate members of the ground organization, and management personnel. 
(3) Reference to appropriate Federal Aviation Regulations. 
(4) Flight dispatching and operational control, including procedures for coordinated dispatch or flight control or flight following procedures, as applicable. 
(5) En route flight, navigation, and communication procedures, including procedures for the dispatch or release or continuance of flight if any item of equipment required for the particular type of operation becomes inoperative or unserviceable en route. 
(6) For domestic or flag operations, appropriate information from the en route operations specifications, including for each approved route the types of airplanes authorized, the type of operation such as VFR, IFR, day, night, etc., and any other pertinent information. 
(7) For supplemental operations, appropriate information from the operations specifications, including the area of operations authorized, the types of airplanes authorized, the type of operation such as VFR, IFR, day, night, etc., and any other pertinent information. 
(8) Appropriate information from the airport operations specifications, including for each airport—
(i) Its location (domestic and flag operations only); 
(ii) Its designation (regular, alternate, provisional, etc.) (domestic and flag operations only); 
(iii) The types of airplanes authorized (domestic and flag operations only); 
(iv) Instrument approach procedures; 
(v) Landing and takeoff minimums; and 
(vi) Any other pertinent information. 
(9) Takeoff, en route, and landing weight limitations. 
(10) For ETOPS, airplane performance data to support all phases of these operations. 
(11) Procedures for familiarizing passengers with the use of emergency equipment, during flight. 
(12) Emergency equipment and procedures. 
(13) The method of designating succession of command of flight crewmembers. 
(14) Procedures for determining the usability of landing and takeoff areas, and for disseminating pertinent information thereon to operations personnel. 
(15) Procedures for operating in periods of ice, hail, thunderstorms, turbulence, or any potentially hazardous meteorological condition. 
(16) Each training program curriculum required by § 121.403.
(17) Instructions and procedures for maintenance, preventive maintenance, and servicing. 
(18) Time limitations, or standards for determining time limitations, for overhauls, inspections, and checks of airframes, engines, propellers, appliances and emergency equipment. 
(19) Procedures for refueling aircraft, eliminating fuel contamination, protection from fire (including electrostatic protection), and supervising and protecting passengers during refueling. 
(20) Airworthiness inspections, including instructions covering procedures, standards, responsibilities, and authority of inspection personnel. 
(21) Methods and procedures for maintaining the aircraft weight and center of gravity within approved limits. 
(22) Where applicable, pilot and dispatcher route and airport qualification procedures. 
(23) Accident notification procedures. 
(24) After February 15, 2008, for passenger flag operations and for those supplemental operations that are not all-cargo operations outside the 48 contiguous States and Alaska,
(i) For ETOPS greater than 180 minutes a specific passenger recovery plan for each ETOPS Alternate Airport used in those operations, and
(ii) For operations in the North Polar Area and South Polar Area a specific passenger recovery plan for each diversion airport used in those operations. 
(25)(i) Procedures and information, as described in paragraph (b)(25)(ii) of this section, to assist each crewmember and person performing or directly supervising the following job functions involving items for transport on an aircraft:
(A) Acceptance;
(B) Rejection;
(C) Handling;
(D) Storage incidental to transport;
(E) Packaging of company material; or
(F) Loading.
(ii) Ensure that the procedures and information described in this paragraph are sufficient to assist the person in identifying packages that are marked or labeled as containing hazardous materials or that show signs of containing undeclared hazardous materials. The procedures and information must include:
(A) Procedures for rejecting packages that do not conform to the Hazardous Materials Regulations in 49 CFR parts 171 through 180 or that appear to contain undeclared hazardous materials;
(B) Procedures for complying with the hazardous materials incident reporting requirements of 49 CFR 171.15 and 171.16 and discrepancy reporting requirements of 49 CFR 175.31
(C) The certificate holder's hazmat policies and whether the certificate holder is authorized to carry, or is prohibited from carrying, hazardous materials; and
(D) If the certificate holder's operations specifications permit the transport of hazardous materials, procedures and information to ensure the following:
(1) That packages containing hazardous materials are properly offered and accepted in compliance with 49 CFR parts 171 through 180;
(2) That packages containing hazardous materials are properly handled, stored, packaged, loaded, and carried on board an aircraft in compliance with 49 CFR parts 171 through 180;
(3) That the requirements for Notice to the Pilot in Command (49 CFR 175.33) are complied with; and
(4) That aircraft replacement parts, consumable materials or other items regulated by 49 CFR parts 171 through 180 are properly handled, packaged, and transported.
(26) Other information or instructions relating to safety. 
(c) Each certificate holder shall maintain at least one complete copy of the manual at its principal base of operations.

I convert this text to an array using explode on new lines. Here is the output of the explode:

Converted Array

Array
(
    [0] => (a) Each manual accessed in paper format must display the date of last revision on each page. Each manual accessed in electronic format must display the date of last revision in a manner in which a person can immediately ascertain it. Each manual required by § 121.133 must:
    [1] => (1) Include instructions and information necessary to allow the personnel concerned to perform their duties and responsibilities with a high degree of safety; 
    [2] => (2) Be in a form that is easy to revise and; 
    [3] => 
    [4] =>  
    [5] => (3) Not be contrary to any applicable Federal regulation and, in the case of a flag or supplemental operation, any applicable foreign regulation, or the certificate holder's operations specifications or operating certificate. 
    [6] => (b) The manual may be in two or more separate parts, containing together all of the following information, but each part must contain that part of the information that is appropriate for each group of personnel: 
    [7] => (1) General policies. 
    [8] => (2) Duties and responsibilities of each crewmember, appropriate members of the ground organization, and management personnel. 
    [9] => (3) Reference to appropriate Federal Aviation Regulations. 
    [10] => (4) Flight dispatching and operational control, including procedures for coordinated dispatch or flight control or flight following procedures, as applicable. 
    [11] => (5) En route flight, navigation, and communication procedures, including procedures for the dispatch or release or continuance of flight if any item of equipment required for the particular type of operation becomes inoperative or unserviceable en route. 
    [12] => (6) For domestic or flag operations, appropriate information from the en route operations specifications, including for each approved route the types of airplanes authorized, the type of operation such as VFR, IFR, day, night, etc., and any other pertinent information. 
    [13] => (7) For supplemental operations, appropriate information from the operations specifications, including the area of operations authorized, the types of airplanes authorized, the type of operation such as VFR, IFR, day, night, etc., and any other pertinent information. 
    [14] => (8) Appropriate information from the airport operations specifications, including for each airport—
    [15] => (i) Its location (domestic and flag operations only); 
    [16] => (ii) Its designation (regular, alternate, provisional, etc.) (domestic and flag operations only); 
    [17] => (iii) The types of airplanes authorized (domestic and flag operations only); 
    [18] => (iv) Instrument approach procedures; 
    [19] => (v) Landing and takeoff minimums; and 
    [20] => (vi) Any other pertinent information. 
    [21] => (9) Takeoff, en route, and landing weight limitations. 
    [22] => (10) For ETOPS, airplane performance data to support all phases of these operations. 
    [23] => (11) Procedures for familiarizing passengers with the use of emergency equipment, during flight. 
    [24] => (12) Emergency equipment and procedures. 
    [25] => (13) The method of designating succession of command of flight crewmembers. 
    [26] => (14) Procedures for determining the usability of landing and takeoff areas, and for disseminating pertinent information thereon to operations personnel. 
    [27] => (15) Procedures for operating in periods of ice, hail, thunderstorms, turbulence, or any potentially hazardous meteorological condition. 
    [28] => (16) Each training program curriculum required by § 121.403.
    [29] => (17) Instructions and procedures for maintenance, preventive maintenance, and servicing. 
    [30] => (18) Time limitations, or standards for determining time limitations, for overhauls, inspections, and checks of airframes, engines, propellers, appliances and emergency equipment. 
    [31] => (19) Procedures for refueling aircraft, eliminating fuel contamination, protection from fire (including electrostatic protection), and supervising and protecting passengers during refueling. 
    [32] => (20) Airworthiness inspections, including instructions covering procedures, standards, responsibilities, and authority of inspection personnel. 
    [33] => (21) Methods and procedures for maintaining the aircraft weight and center of gravity within approved limits. 
    [34] => (22) Where applicable, pilot and dispatcher route and airport qualification procedures. 
    [35] => (23) Accident notification procedures. 
    [36] => (24) After February 15, 2008, for passenger flag operations and for those supplemental operations that are not all-cargo operations outside the 48 contiguous States and Alaska,
    [37] => (i) For ETOPS greater than 180 minutes a specific passenger recovery plan for each ETOPS Alternate Airport used in those operations, and
    [38] => (ii) For operations in the North Polar Area and South Polar Area a specific passenger recovery plan for each diversion airport used in those operations. 
    [39] => (25)(i) Procedures and information, as described in paragraph (b)(25)(ii) of this section, to assist each crewmember and person performing or directly supervising the following job functions involving items for transport on an aircraft:
    [40] => (A) Acceptance;
    [41] => (B) Rejection;
    [42] => (C) Handling;
    [43] => (D) Storage incidental to transport;
    [44] => (E) Packaging of company material; or
    [45] => (F) Loading.
    [46] => (ii) Ensure that the procedures and information described in this paragraph are sufficient to assist the person in identifying packages that are marked or labeled as containing hazardous materials or that show signs of containing undeclared hazardous materials. The procedures and information must include:
    [47] => (A) Procedures for rejecting packages that do not conform to the Hazardous Materials Regulations in 49 CFR parts 171 through 180 or that appear to contain undeclared hazardous materials;
    [48] => (B) Procedures for complying with the hazardous materials incident reporting requirements of 49 CFR 171.15 and 171.16 and discrepancy reporting requirements of 49 CFR 175.31
    [49] => (C) The certificate holder's hazmat policies and whether the certificate holder is authorized to carry, or is prohibited from carrying, hazardous materials; and
    [50] => (D) If the certificate holder's operations specifications permit the transport of hazardous materials, procedures and information to ensure the following:
    [51] => (1) That packages containing hazardous materials are properly offered and accepted in compliance with 49 CFR parts 171 through 180;
    [52] => (2) That packages containing hazardous materials are properly handled, stored, packaged, loaded, and carried on board an aircraft in compliance with 49 CFR parts 171 through 180;
    [53] => (3) That the requirements for Notice to the Pilot in Command (49 CFR 175.33) are complied with; and
    [54] => (4) That aircraft replacement parts, consumable materials or other items regulated by 49 CFR parts 171 through 180 are properly handled, packaged, and transported.
    [55] => (26) Other information or instructions relating to safety. 
    [56] => (c) Each certificate holder shall maintain at least one complete copy of the manual at its principal base of operations.
)
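
For reference, the split step could be sketched roughly as follows. This is only an illustration, not the code that produced the array above: `extract_prefixes` is a hypothetical helper name, and it assumes every marker consists of letters or digits inside parentheses at the very start of a line. It skips blank entries like [3] and [4] and splits compound markers such as (25)(i) into separate parts.

```php
<?php
// Hypothetical helper: split raw text on newlines, drop blank lines, and
// return the leading markers of each remaining line without parentheses.
function extract_prefixes(string $text): array
{
    $rows = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue; // skip empty entries
        }
        // Match one or more "(...)" groups at the start, e.g. "(25)(i)".
        if (preg_match('/^((?:\([A-Za-z0-9]+\))+)/', $line, $m)) {
            // Break the run into individual markers: "(25)(i)" gives ['25', 'i'].
            preg_match_all('/\(([A-Za-z0-9]+)\)/', $m[1], $parts);
            $rows[] = $parts[1];
        }
    }
    return $rows;
}

$sample = "(a) Each manual ...\n(1) Include ...\n\n(25)(i) Procedures ...\n(A) Acceptance;";
print_r(extract_prefixes($sample));
```

Each returned row is a list of marker strings, so a compound marker can be fed to a level resolver one part at a time.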

The goal is to convert the first part of the text, such as (a) to the structured format below.

(a)
(a)(1)
(a)(2)
(a)(3)
(a)(3)(i)
(a)(3)(ii)
(a)(3)(ii)(A)
(a)(3)(ii)(B)
(a)(3)(ii)(B)(1)
(a)(3)(ii)(B)(2)
(a)(4)
(a)(5)

The five levels are as follows:

(a) = first level
(1) = second level
(i) = third level
(A) = fourth level
(1) = fifth level

This is the code I have created so far, but I keep getting stuck because the same marker can be valid for both the first and third levels. The same goes for levels two and five.

Example Code

$level = 1;
$previous_level = 1;
$first_text = "";
$second_text = "";
$third_text = "";
$fourth_text = "";
$fifth_text = "";
$previous_first_text = "";
$previous_second_text = "";
$previous_third_text = "";
$previous_fourth_text = "";
$previous_fifth_text = "";

if (is_array($content_array) && count($content_array) > 0) {
    for ($c=0;$c<count($content_array);$c++) {
        
        $check = explode(" ",$content_array[$c]);
        
        if (isset($check[0]) && $check[0] !== "") {

            $value = trim($check[0]);

            if (str_starts_with($value, '(')) {

                if ($value === "(25)(i)") {
                    preg_match_all('#\((.*?)\)#', $value, $match);
                    if (isset($match[0]) && count($match[0]) === 2) {
                        $second_text = $match[0][0];
                        $third_text = $match[0][1];
                    }
                } else {
                    $first = $this->first_level_check($value);
                    if ($first && $first_text !== $value) {
                        $previous_first_text = $first_text;
                        $first_text = $value;
                        $level = 1;
                    }

                    $second = $this->second_level_check($value);
                    if ($second && $second_text !== $value) {
                        $previous_second_text = $second_text;
                        $second_text = $value;
                        $level = 2;
                    }

                    $third = $this->third_level_check($value);
                    if ($third && $third_text !== $value) {
                        $previous_third_text = $third_text;
                        $third_text = $value;
                        $level = 3;
                    }

                    $fourth = $this->fourth_level_check($value);
                    if ($fourth && $fourth_text !== $value) {
                        $previous_fourth_text = $fourth_text;
                        $fourth_text = $value;
                        $level = 4;
                    }

                    $fifth = $this->fifth_level_check($value);
                    if ($fifth && $fifth_text !== $value) {
                        $previous_fifth_text = $fifth_text;
                        $fifth_text = $value;
                        $level = 5;
                    }

                }

                if ($level === 1) {
                    echo $first_text."\n";
                } else if ($level === 2) {
                    echo $first_text.$second_text."\n";
                } else if ($level === 3) {
                    echo $first_text.$second_text.$third_text."\n";
                } else if ($level === 4) {
                    echo $first_text.$second_text.$third_text.$fourth_text."\n";
                } else if ($level === 5) {
                    echo $first_text.$second_text.$third_text.$fourth_text.$fifth_text."\n";
                }

                $previous_level = $level;

            }
        }
    }
}

public function first_level_check($value): int
    {
        $return = 0;
        $data = array(
            '(a)','(b)','(c)','(d)','(e)','(f)','(g)','(h)','(i)','(j)','(k)','(l)','(m)','(n)','(o)','(p)','(q)','(r)','(s)','(t)','(u)','(v)','(w)','(x)','(y)','(z)'
        );
        if (in_array($value,$data)) {
            $return = 1;
        }
        return $return;
    }

    public function second_level_check($value): int
    {
        $return = 0;
        preg_match('#\((.*?)\)#', $value, $match);
        if (is_array($match) && count($match) === 2) {
            if (is_numeric($match[1])) {
                $return = 1;
            }
        }
        return $return;
    }

    public function third_level_check($value): int
    {
        $return = 0;
        $re = "~L?(?:X{0,3}(?:IX|IV|V|V?I{1,3})|IX|X{1,3})|XL|L~m";
        preg_match($re, strtoupper($value), $matches);
        if (is_array($matches) && count($matches) === 1) {
            $return = 1;
        }
        return $return;
    }

    public function fourth_level_check($value): int
    {
        $return = 0;
        $data = array(
            '(A)','(B)','(C)','(D)','(E)','(F)','(G)','(H)','(I)','(J)','(K)','(L)','(M)','(N)','(O)','(P)','(Q)','(R)','(S)','(T)','(U)','(V)','(W)','(X)','(Y)','(Z)'
        );
        if (in_array($value,$data)) {
            $return = 1;
        }
        return $return;
    }

    public function fifth_level_check($value): int
    {
        $return = 0;
        preg_match('#\((.*?)\)#', $value, $match);
        if (is_array($match) && count($match) === 2) {
            if (is_numeric($match[1])) {
                $return = 1;
            }
        }
        return $return;
    }

I’m starting to think this is impossible given the data values and how the text is originally formatted. This was also part of a prior post, but I have been able to flesh out more details to describe the issue. Let me know your thoughts.

2 Answers


  1. the first and third levels may be valid simultaneously

    Yes. In the absence of additional grammar rules, this is unsolvable.

  2. You can reduce (but not eliminate) the ambiguity by tracking both the current level and a stack with the next expected value at each higher level. Then make use of two assumptions:

    • Levels are never skipped on the way down (levels are numbered from 0 here), e.g. a level 2 prefix can never be followed by a level 4 prefix. It can be followed by another level 2 prefix (continuing the current list), a level 3 prefix (starting a sub-list), or a level 1 or 0 prefix (ending the current sub-lists and continuing at a higher level).
    • Levels always follow an exact sequence, e.g. 'i' is valid for level 0 only when it follows 'h'. Rather than matching each prefix style, you need a function to generate the next value in each sequence.
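
A next-value generator for the second assumption might look like the sketch below. The function names (`to_roman`, `from_roman`, `next_value`) and the style labels are hypothetical, chosen for illustration. It assumes single-letter markers only, so doubled letters such as (aa) are not handled.

```php
<?php
// Convert a positive integer to a lowercase Roman numeral.
function to_roman(int $n): string
{
    $map = ['m' => 1000, 'cm' => 900, 'd' => 500, 'cd' => 400, 'c' => 100, 'xc' => 90,
            'l' => 50, 'xl' => 40, 'x' => 10, 'ix' => 9, 'v' => 5, 'iv' => 4, 'i' => 1];
    $out = '';
    foreach ($map as $sym => $val) {
        while ($n >= $val) {
            $out .= $sym;
            $n -= $val;
        }
    }
    return $out;
}

// Convert a well-formed lowercase Roman numeral back to an integer.
function from_roman(string $s): int
{
    $vals = ['i' => 1, 'v' => 5, 'x' => 10, 'l' => 50, 'c' => 100, 'd' => 500, 'm' => 1000];
    $total = 0;
    for ($i = 0; $i < strlen($s); $i++) {
        $cur = $vals[$s[$i]];
        $nxt = $i + 1 < strlen($s) ? $vals[$s[$i + 1]] : 0;
        // A smaller symbol before a larger one is subtracted (iv, ix, ...).
        $total += $cur < $nxt ? -$cur : $cur;
    }
    return $total;
}

// Return the marker that follows $current in a sequence of the given style.
// $style is one of 'lower', 'upper', 'number', 'roman' (labels assumed here).
function next_value(string $style, string $current): string
{
    switch ($style) {
        case 'number':
            return (string) ((int) $current + 1);
        case 'roman':
            return to_roman(from_roman($current) + 1);
        default: // 'lower' and 'upper': single letters only
            return chr(ord($current) + 1);
    }
}
```

With this, 'i' in the letter sequence is only ever produced as the successor of 'h', while 'i' in the Roman sequence is only ever the first value of a new sub-list.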

    For instance, the starting state might look like this:

    $level = 0;
    $first = ['a', 1, 'i', 'A', 1];
    $next = ['a'];
    

    And a later state might look like this:

    $level = 4;
    $first = ['a', 1, 'i', 'A', 1];
    $next = ['b', 4, 'iii', 'C', 3];
    

    Then each time you encounter a prefix, check the following in order:

    1. If the prefix matches $next[$level], then it is on the current level. Set $next[$level] to the next value according to that level’s sequence.
    2. Else, if the prefix matches $first[$level + 1], descend one level: increment $level, and append to the $next stack the next value at that level. For instance, if $level is 1, and you encounter 'i', set $level=2; $next[2]='ii';
    3. Else, go up one level and test again: unset $next[$level]; and decrement $level; then if the prefix matches $next[$level], use that level and update $next as with step 1.
    4. If no match, repeat step 3. If you reach level -1, abort with an error: the input is malformed or ambiguous.
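
The four steps above can be sketched in PHP roughly as follows. This is a minimal illustration under some assumptions rather than a definitive implementation: `label` and `resolve` are hypothetical names, the expected values are stored as 1-based positions instead of strings, markers are passed as strings without parentheses, letters are single characters, and the Roman formatter only covers values up to 39.

```php
<?php
// Marker style at each zero-indexed level: (a) (1) (i) (A) (1).
const STYLES = ['lower', 'number', 'roman', 'upper', 'number'];

// Format the n-th marker (1-based) of a level, e.g. label(2, 3) is "iii".
function label(int $level, int $n): string
{
    switch (STYLES[$level]) {
        case 'lower':
            return chr(ord('a') + $n - 1);
        case 'upper':
            return chr(ord('A') + $n - 1);
        case 'roman': // sufficient for 1..39 only
            $map = ['x' => 10, 'ix' => 9, 'v' => 5, 'iv' => 4, 'i' => 1];
            $out = '';
            foreach ($map as $sym => $val) {
                while ($n >= $val) {
                    $out .= $sym;
                    $n -= $val;
                }
            }
            return $out;
        default:
            return (string) $n;
    }
}

// Resolve a flat list of markers into full paths such as "(a)(1)(i)".
// $next[$l] holds the position expected next at each open level $l.
function resolve(array $prefixes): array
{
    $level = 0;
    $next = [1];   // expecting 'a' at level 0
    $current = []; // marker last seen at each open level
    $paths = [];
    foreach ($prefixes as $p) {
        $found = false;
        if ($p === label($level, $next[$level])) {
            // Step 1: the marker continues the current level.
            $found = true;
        } elseif ($level + 1 < count(STYLES) && $p === label($level + 1, 1)) {
            // Step 2: the marker starts a sub-list one level down.
            $level++;
            $next[$level] = 1;
            $found = true;
        } else {
            // Steps 3 and 4: pop levels until one expects this marker.
            while ($level > 0 && !$found) {
                unset($next[$level], $current[$level]);
                $level--;
                $found = ($p === label($level, $next[$level]));
            }
        }
        if (!$found) {
            throw new RuntimeException("Malformed or ambiguous marker: ($p)");
        }
        $current[$level] = $p;
        $next[$level]++;
        $paths[] = '(' . implode(')(', $current) . ')';
    }
    return $paths;
}

echo implode("\n", resolve(['a', '1', 'i', 'A', '1', '2', 'B', '2', 'b'])), "\n";
```

For that sample input, the (2) after (B) falls back to (a)(2) because level 1 is the nearest open level expecting '2', and the final (b) pops back to level 0. A compound marker such as (25)(i) would be fed in as two consecutive entries, '25' then 'i'.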

    Note that in the example in the question, the (26) at array index 55 is no longer ambiguous: $level will be 4, and $next[4] will be '5', which doesn’t match; however, $next[1] will have been set to '26' at array index 39, so we will pop the stack until we find that match.

    Further ambiguities could be eliminated by some form of lookahead or backtracking, e.g. if (a)(1)(iii)(A)(1) is followed by (2) then (i), the presented algorithm will analyse it as (a)(1)(iii)(A)(1) -> (a)(1)(iii)(A)(2) -> ERROR, when the valid interpretation would be (a)(1)(iii)(A)(1) -> (a)(2) -> (a)(2)(i).

    However, some inputs are genuinely ambiguous, e.g. if (a)(1)(i)(A)(1) is followed by (2) then (b), it could represent (a)(1)(i)(A)(1) -> (a)(2) -> (b) or (a)(1)(i)(A)(1) -> (a)(1)(i)(A)(2) -> (b). The algorithm proposed here will choose (a)(1)(i)(A)(2), but there is no way to know if that is the author’s intent for a particular input. You could detect such cases by checking all possible levels and aborting if there were multiple matches, but it requires a more complex algorithm to manage the state of the stack.
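
The detection idea could be sketched as a check that collects every level at which a marker would be accepted. This is an assumption-laden illustration: `candidate_levels` is a hypothetical name, the state is passed in as the stack of next expected markers (as strings), and it only flags a local ambiguity without attempting lookahead or backtracking.

```php
<?php
// First marker of each zero-indexed level: (a) (1) (i) (A) (1).
const FIRST = ['a', '1', 'i', 'A', '1'];

// Return every level at which $prefix would be accepted, given the current
// depth and the stack of next expected markers (one entry per open level).
function candidate_levels(string $prefix, int $level, array $next): array
{
    $matches = [];
    // Descend: the prefix starts a new sub-list below the current level.
    if ($level + 1 < count(FIRST) && $prefix === FIRST[$level + 1]) {
        $matches[] = $level + 1;
    }
    // Stay or ascend: the prefix continues the current or an enclosing list.
    for ($l = $level; $l >= 0; $l--) {
        if ($prefix === (string) $next[$l]) {
            $matches[] = $l;
        }
    }
    return $matches;
}

// State after (a)(1)(i)(A)(1): every open level has a pending next marker.
$next = ['b', '2', 'ii', 'B', '2'];
$levels = candidate_levels('2', 4, $next);
if (count($levels) > 1) {
    echo "Ambiguous marker (2): fits levels " . implode(', ', $levels) . "\n";
}
```

Here (2) fits both level 4 and level 1, so a strict parser could abort or ask for review instead of silently picking the deeper level.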
