
I have a CSV file in an unfamiliar format, with lines like this:

"31 lip 2021,""Inna opłata"",""--"",""--"",""--"",""--"",""--"",""--"",""--"",""--"",""-1,29"",""EUR"",""2 sie 2021"",""111"",""mBank *7981"",""Środki zostały wysłane"",""--"",""111"",""--"",--,""--"",""--"",""--"",""--"",""--"",""--"",""0%"",""--"",""--"",""--"",""--"",""--"",""-5,7"",""PLN"",""4,43151"",""FEE-111"",""Opłata za nazwę pomocniczą przedmiotu """

I've used GenericParserAdapter, but the result is not what I need.
Result (ItemArray):

        [0] "31 lip 2021"   object {string}
        [1] ""Inna opłata""   object {string}
        [2] ""--""    object {string}
        [3] ""--""    object {string}
        [4] ""--""    object {string}
        [5] ""--""    object {string}
        [6] ""--""    object {string}
        [7] ""--""    object {string}
        [8] ""--""    object {string}
        [9] ""--""    object {string}
        [10]    """-1"    object {string}
        [11]    "29"""    object {string}
        [12]    ""EUR""   object {string}
        [13]    ""2 sie 2021""    object {string}
        [14]    ""111""   object {string}
        [15]    ""mBank *7981""   object {string}
        [16]    ""Środki zostały wysłane""    object {string}
        [17]    ""--""    object {string}
        [18]    ""111""   object {string}
        [19]    ""--""    object {string}
        [20]    "--"    object {string}
        [21]    ""--""    object {string}
        [22]    ""--""    object {string}
        [23]    ""--""    object {string}
        [24]    ""--""    object {string}
        [25]    ""--""    object {string}
        [26]    ""--""    object {string}
        [27]    ""0%""    object {string}
        [28]    ""--""    object {string}
        [29]    ""--""    object {string}
        [30]    ""--""    object {string}
        [31]    ""--""    object {string}
        [32]    ""--""    object {string}
        [33]    """-5"    object {string}
        [34]    "7""" object {string}
        [35]    ""PLN""   object {string}
        [36]    """4" object {string}
        [37]    "43151""" object {string}
        [38]    ""FEE-111""   object {string}
        [39]    """Opłata za nazwę pomocniczą przedmiotu "    object {string}

Columns 10 and 11 are split (as are 33 and 34, and 36 and 37), but each pair is a single value that uses a decimal comma and must not be split.
How can I configure the parser properly (or split the line some other way) to resolve this?

2 Answers


  1. Chosen as BEST ANSWER

    In the end I resolved the problem like this:

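     // Requires System.Data, System.IO, System.Text and the GenericParser library.
     // GetEncoding() is presumably a custom extension method on string that
     // detects the file's encoding; it is not part of .NET.
     var kodowanie = sciezkaPliku.GetEncoding();
     var plik = new StringBuilder();
     var linie = File.ReadAllLines(sciezkaPliku, kodowanie);
     // Read the file once and loop over the lines already in memory.
     for (int i = 0; i < linie.Length; i++)
     {
         // Assumes no field contains ';', which becomes the new delimiter.
         plik.AppendLine(linie[i]
             .Trim('"')               // quotes wrapping the whole row
             .Replace(",\"\"", ";")   // ,"" opens a quoted field
             .Replace("\"\",", ";")   // "", closes one before an unquoted field
             .Replace("\"\"", ""));   // any doubled quotes that remain
     }
     sciezkaPliku = $"{sciezkaPliku}_parsed";
     // WriteAllText overwrites the file if it already exists.
     File.WriteAllText(sciezkaPliku, plik.ToString(), kodowanie);
     using (var parser = new GenericParserAdapter(sciezkaPliku, kodowanie))
     {
         parser.FirstRowHasHeader = true;
         parser.ColumnDelimiter = ';';
         var pozycje = parser.GetDataTable();

         foreach (DataRow item in pozycje.Rows)
         {
             //ToDo
         }
     }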
    

  2. "31 lip 2021,""Inna opłata"",""--"",""--"",""--"",""--"",""--"",""--"",""--"",""--"",""-1,29"",""EUR"",""2 sie 2021"",""111"",""mBank *7981"",""Środki zostały wysłane"",""--"",""111"",""--"",--,""--"",""--"",""--"",""--"",""--"",""--"",""0%"",""--"",""--"",""--"",""--"",""--"",""-5,7"",""PLN"",""4,43151"",""FEE-111"",""Opłata za nazwę pomocniczą przedmiotu """
    

    Somehow the full row has been converted to a single quoted field, so every double quote inside it is escaped with another double quote.

    The row should look like this instead (which parses fine):

    31 lip 2021,"Inna oplata","--","--","--","--","--","--","--","--","-1,29","EUR","2 sie 2021","111","mBank *7981","Srodki zostaly wyslane","--","111","--",--,"--","--","--","--","--","--","0%","--","--","--","--","--","-5,7","PLN","4,43151","FEE-111","Oplata za nazwe pomocnicza przedmiotu "
    

    One solution might be to parse the data twice: first to recover the original row from the single quoted field, then to parse that row into its columns.
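
The two-pass idea can be sketched roughly as follows. This is only an illustration, not GenericParser's API: the class `DoubleEncodedCsv` and its methods `SplitCsvLine` and `ParseDoubleEncodedLine` are hypothetical names, and the hand-written splitter assumes each record fits on one line, uses `,` as the delimiter and `""` as the escaped quote.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration; not part of GenericParser.
public static class DoubleEncodedCsv
{
    // Splits one CSV record into fields: commas separate fields, a field may
    // be wrapped in double quotes, and "" inside quotes is a literal quote.
    public static List<string> SplitCsvLine(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // escaped quote inside a quoted field
                    i++;                 // skip the second quote of the pair
                }
                else if (c == '"')
                {
                    inQuotes = false;    // closing quote
                }
                else
                {
                    current.Append(c);   // commas are kept while inside quotes
                }
            }
            else if (c == '"')
            {
                inQuotes = true;         // opening quote
            }
            else if (c == ',')
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString());  // last field has no trailing comma
        return fields;
    }

    // Pass 1 unwraps the row that was stored as a single quoted field;
    // pass 2 splits the recovered row into its real columns.
    public static List<string> ParseDoubleEncodedLine(string rawLine)
    {
        List<string> outer = SplitCsvLine(rawLine);
        // If the line was not wrapped, the first pass already gave the columns.
        return outer.Count == 1 ? SplitCsvLine(outer[0]) : outer;
    }
}
```

Calling `DoubleEncodedCsv.ParseDoubleEncodedLine(line)` on each line of the file should keep values such as `-1,29` and `4,43151` in one column each, because the comma is inside quotes by the time the second pass runs.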
