skip to Main Content

I need help in working with big Excel Tables.

Description


I have an export of Data from our ERP System that has 400K Rows at least.
In this report the format is quite messed up and I want to write a script that will clean up all the data.

I started to write little sub just to delete empty rows and such that have a special behavior.
please see below:

Sub Main()
OptimizeVBA (True)
DeleteLastRows
OptimizeVBA (False)
End Sub

Sub DeleteLastRows()
'Achtung, diese Funktion dauert sehr lange
Dim total
total = ActiveSheet.UsedRange.Rows.Count
Dim Tim1 As Single
Tim1 = Timer
For i = total To total - 100 Step -1
    If ThatSpecialLine("0", i, 1, 9) Then
        'DeleteRow (i)
        Rows(i).EntireRow.Delete
        ElseIf EmptyRow(i, 1, 13) Then
        'DeleteRow (i)
        Rows(i).EntireRow.Delete
    End If
Next
Tim1 = Timer - Tim1
MsgBox ("Anzahl der Zeilen nach der Bearbeitung: " & ActiveSheet.UsedRange.Rows.Count & vbNewLine & "Dafür wurde: " & Tim1 & " gebraucht")

End Sub

Function EmptyRow(ByVal Row As Long, ByVal startc As Integer, ByVal EndC As Integer) As Boolean
EmptyRow = True
Dim temp As String
    For i = startc To EndC
     temp = Cells(Row, i).Value
     temp = Trim(temp)
        If temp <> "" Then
            EmptyRow = False
            Exit Function
        End If
    Next
End Function

Function ThatSpecialLine(ByVal val As String, ByVal Row As Long, ByVal startc As Integer, ByVal EndC As Integer) As Boolean
ThatSpecialLine = False
If EmptyRow(Row, startc, EndC) Then
    If Cells(Row, EndC + 1).Value = val Then
        ThatSpecialLine = True
    End If
End If
End Function

Sub OptimizeVBA(isOn As Boolean)
    Application.Calculation = IIf(isOn, xlCalculationManual, xlCalculationAutomatic)
    Application.EnableEvents = Not (isOn)
    Application.ScreenUpdating = Not (isOn)
    ActiveSheet.DisplayPageBreaks = Not (isOn)
End Sub

This code needs about 14 seconds to execute for just 100 lines.
I am wondering why the performance is so bad.
I have no experience in making an application performance optimizing so please be kind if my question is very stupid :).

Questions


  1. Would it be better / faster to export this .xlsx file to .txt file and process with a programm i write in Visual studio with vb.net or C#? this would be my next idea.
  2. How to improve my vba code?

Would it be better / faster to export this .xlsx file to .txt file and process with a programm i write in Visual studio with vb.net or C#? this would be my next idea.

Thanks in advance

2

Answers


  1. Chosen as BEST ANSWER

    The Solution was to make a quick program in VB.Net with just reading the needed Lines. Also i made some improvements to the code.

    The following code needs just 1 sec to read the File in the List(of string()) and to write it back again to .csv

    I don't think i will use vba again for big data. Feel free to change my mind.

    Imports System.IO
    Imports System.IO.File
    Imports System.Text
    
    Public Class Form1
    
        Public Datas As New List(Of String())
    
        Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
            FileToList()
            DataToFile()
        End Sub
    
    
        Sub FileToList()
            Using sr As StreamReader = New StreamReader("Bestand 31.10.2022.CSV", Encoding.Default)
                Dim Time As DateTime = Now
                Dim span As TimeSpan
                Do Until sr.Peek() = -1
                    Dim s As String = sr.ReadLine()
                    Dim a() As String = s.Split(";")
                    If Not EmptyRow(a) Then
                        Datas.Add(a)
                    End If
                Loop
                span = Now - Time
    
                Dim i As Long = Datas.Count
                MessageBox.Show(String.Format("Es sind: {0} Zeilen vorhanden in der Liste" & vbCrLf &
                                              "Dies benötigte: {1}s", i, span.TotalSeconds))
            End Using
    
        End Sub
    
        Sub DataToFile()
            Dim Time As DateTime = Now
            Dim span As TimeSpan
            Using fs As FileStream = New FileStream("Test_" & DateTime.Now.ToShortDateString & ".csv", FileMode.Create)
                Using sw As StreamWriter = New StreamWriter(fs, Encoding.Default)
                    For i = 0 To Datas.Count - 1
                        sw.WriteLine(Join(Datas(i), ";"))
                    Next
                End Using
            End Using
    
            span = Now - Time
            MessageBox.Show(String.Format("Das Erstellen der neuen Datei hat: {0}s gedauert", span.TotalSeconds))
        End Sub
    
        Function EmptyRow(ByVal Array As String()) As Boolean
    
            For i = 0 To Array.Count - 1
                If Array(i) <> "" Then
                    Return False
                End If
                Return True
            Next
    
        End Function
    
    
    End Class
    

  2. There are 2 things in your code that makes the execution slow.

    The first thing has something to do with Excel vs VBA. Every time your VBA code needs something from Excel, it has to call an internal interface and that is rather slow. You can’t measure this when you have a sheet with a few rows/columns, but in a sheet with 400k rows and (at least) 13 columns of data, you have 5 millions cells, and your code reads most of them 2 times. This can be speed up by reading large chunks of data into an array. This is only one read and for that 5 million cells it’s a matter of maybe a second.

    The second thing is pure Excel: Deleting a row of data from a worksheet is painfully slow, even if you switch off recalculation and screen update. That means, you should decrease the number of deletes by "collecting" rows to be deleted into a Range variable and then delete them all at once. However, the number of rows collected shouldn’t bee too high. I experimented a little bit and 1000 seemed to be reasonable.

    Sub DeleteLastRows()
        Const DeleteChunkSize = 1000
        
        Dim lastRow As Long
        With ThisWorkbook.Sheets(1)
            lastRow = .Cells(.Rows.Count, "A").End(xlUp).row
        
            ' Read all Data into Memory
            Dim AllData As Variant
            AllData = .Range(.Cells(1, 1), .Cells(lastRow, 13))
            Debug.Print "data read"
            Dim row As Long
            For row = lastRow To 2 Step -1
                If row Mod 100 = 0 Then DoEvents
                Dim deleteRange As Range, deleteCount As Long
                
                Dim toBeDeleted As Boolean
                toBeDeleted = ThatSpecialLine(AllData, "0", row, 1, 9) Or EmptyRow(AllData, row, 1, 13)
                If toBeDeleted Then
                    deleteCount = deleteCount + 1
                    If deleteRange Is Nothing Then
                        Set deleteRange = .Cells(row, 1).EntireRow
                    Else
                        Set deleteRange = Union(deleteRange, .Cells(row, 1).EntireRow)
                    End If
    
                    ' Delete only if a certain number of rows to be deleted is reached to speed up runtime
                    If deleteCount >= DeleteChunkSize Then
                        DoEvents
                        deleteRange.Delete xlUp
                        Set deleteRange = Nothing
                        deleteCount = 0
                    End If
                End If
            
            Next row
        End With
        ' delete the last chunk of data if any
        If Not deleteRange Is Nothing Then
            deleteRange.Delete xlUp
        End If
    
    End Sub
    

    I adapted your helper routine so that they work on the array of data which is passed as argument:

    Function EmptyRow(data As Variant, row As Long, startc As Long, EndC As Long) As Boolean
        EmptyRow = True
        Dim temp As String
        
        Dim i As Long
        For i = startc To EndC
            temp = Trim(data(row, i))
            If temp <> "" Then
                EmptyRow = False
                Exit Function
            End If
        Next
    End Function
    
    Function ThatSpecialLine(data As Variant, val As String, row As Long, startc As Long, EndC As Long) As Boolean
        If Not EmptyRow(data, row, startc, EndC) Then Exit Function
        ThatSpecialLine = (data(row, EndC + 1) = val)
    End Function
    

    That code took more or less 1s for 1000 rows that where to be deleted – my example sheet had approx 30% of such rows. That would lead to a runtime in the range of few minutes.

    But there is a much faster attempt, assuming that you are only interested in the data, not in formatting. Instead of deleting rows in the Excel sheet, copy the data you want to keep in a second array. When done, delete all data of your sheet and write the copied data back to Excel. This took maybe 2 or 3 seconds in my example sheet with > 800k rows:

    Sub CopyRelevantData()
        Dim lastRow As Long
        With ThisWorkbook.Sheets(1)
            lastRow = .Cells(.Rows.Count, "A").End(xlUp).row
        
            ' Read all Data into Memory
            Dim AllData As Variant, newData As Variant
            AllData = .Range(.Cells(1, 1), .Cells(lastRow, 13))
            ' Create a second array where you copy the data you want to keep
            ReDim newData(LBound(AllData, 1) To UBound(AllData, 1), LBound(AllData, 2) To UBound(AllData, 2))
            Debug.Print "data read"
            Dim row As Long, newRow As Long
            For row = 1 To lastRow
                Dim toBeDeleted As Boolean
                toBeDeleted = ThatSpecialLine(AllData, "0", row, 1, 9) Or EmptyRow(AllData, row, 1, 13)
                If Not toBeDeleted Then
                    ' Copy this row of data
                    newRow = newRow + 1
                    Dim col As Long
                    For col = LBound(AllData, 2) To UBound(AllData, 2)
                        newData(newRow, col) = AllData(row, col)
                    Next col
                End If
                If row Mod 100 = 0 Then DoEvents
            Next row
            
            .UsedRange.Clear
            .Cells(1, 1).Resize(UBound(AllData, 1), UBound(AllData, 2)) = newData
        End With
    End Sub
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search