skip to Main Content

@MarkPflug I have a requirement to read 12 columns out of 45 – 85 total columns. This is from multiple csv files (in the hundreds). But here is the problem, a lot of the times a column or two will be missing from some csv data files. How do I check in C# for a missing column in a csv file given I use the nuget package sylvan csv reader. Here is some code:

// Create a reader
CsvDataReader reader = CsvDataReader.Create(file, new CsvDataReaderOptions { ResultSetMode = ResultSetMode.MultiResult });
// Get column by name from csv. This is where the error occurs only in the files that have missing columns. I store these and then use them in a GetString(Ordinal).
reader.GetOrdinal("HomeTeam");
reader.GetOrdinal("AwayTeam");
reader.GetOrdinal("Referee");
reader.GetOrdinal("FTHG");
reader.GetOrdinal("FTAG");
reader.GetOrdinal("Division");
// There is more data here, but anyway you get the point.
// Here I run the reader and for each piece of data I run my database write method.
    while (await reader.ReadAsync())
    {
        await AddEntry(idCounter.ToString(), idCounter.ToString(), attendance, referee, division, date, home_team, away_team, fthg, ftag, hthg, htag, ftr, htr);
    }

I tried the following:

// This still causes it to go out of bounds.
if(reader.GetOrdinal("Division") < reader.FieldCount)
    // only if the ordinal exists then assign it in a temp variable
else
    // skip this column (set the data in add entry method to "")

2

Answers


  1. Looking at the source, it appears that GetOrdinal throws if the column name isn’t found or is ambiguous. As such I expect you could do:

    int blah1Ord = -1;
    try{ blah1Ord = reader.GetOrdinal("blah1"); } catch { }
    
    int blah2Ord = -1;
    try{ blah2Ord = reader.GetOrdinal("blah2"); } catch { }
    
    while (await reader.ReadAsync())
    {
        var x = new Whatever();
        if(blah1Ord > -1) x.Blah1 = reader.GetString(blah1Ord);
        if(blah2Ord > -1) x.Blah2 = reader.GetString(blah2Ord);
    }
    

    And so on, so you effectively sound out whether a column exists – the ordinal remains -1 if it doesn’t – and then use that to decide whether to read the column or not

    Incidentally, I’ve been dealing with CSVs with poor/misspelled/partial header names, and I’ve found myself getting the column schema and searching it for partials, like:

    using var cdr = CsvDataReader.Create(sr);
    
    var cs = await cdr.GetColumnSchemaAsync();
    var sc = StringComparison.OrdinalIgnoreCase;
    var blah1Ord = cs.FirstOrDefault(c => c.ColumnName.Contains("blah1", sc))?.ColumnOrdinal ?? -1;
    
    Login or Signup to reply.
  2. I started using the Sylvan library and it is really powerful.
    Not sure if this could help you but if you use the DataBinder.Create<T> generic method from an entity, you can do the following to get columns in your CSV file that do not map to any of the entity properties:

    var dataBinderOptions = new DataBinderOptions()
    {
        // AllColumns is required to throw UnboundMemberException
        BindingMode = DataBindingMode.AllColumns,
    };
    
    IDataBinder<TEntity> binder;
    
    try
    {
        binder = DataBinder.Create<TEntity>(dataReader, dataBinderOptions);
    }
    catch (UnboundMemberException ex)
    {
        // Use ex.UnboundColumns to get unmapped columnns
        readResult.ValidationProblems.Add($"Unmapped columns: {String.Join(", ", ex.UnboundColumns)}");
        return;
    }
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search