I have a directory containing tens of thousands of csv files (that used to be HTML files but I wrote a different vba script that changed the extension to csv), that I want placed into one Main xls file (called Main Import File VBA). Given the huge number of csv files that I need to loop over, I want to do it as efficiently as possible (lowest amount of time necessary. If I can get my code a few percentages faster, that would already make a huge difference.
The csv files are all similar. Basically HTML files that now have a csv extension. All the data is in one column (although length of rows can differ). So I need to perform a similar task in that I only need to copy column A from every csv file.
I have written a code that opens each csv file and copies the content of column A and then (transpose) pastes it into the main import file (myMasterFile). It then closes the csv file and loops to the next file.
Sub PutInMasterFile()
Dim wb As Workbook
Dim myMasterFile As String
Dim masterWB As Workbook
Dim rowNum As Integer
Dim copyRange As Range
Dim pasteRange As Range
Dim myPath As String
Dim myFile As String
Dim FirstAddress As String
Dim x As Variant
Dim C As Variant
Application.ScreenUpdating = FALSE
Application.EnableEvents = FALSE
Application.Calculation = xlCalculationManual
Application.DisplayAlerts = FALSE
x = 2
myMasterFile = "C:S_AVBA TestMain Import File VBA.xlsm"
'Workbooks("Main Import File VBA").Activate
Set pasteRange = ActiveWorkbook.Sheets(1).Range("A" & x)
myPath = "C:S_AVBA TestOutput"
myFile = Dir(myPath & "*.csv")
Start = Timer
Do While myFile <> vbNullString
'csv file opened here
Workbooks.Open Filename:=myPath & myFile
'selection made for copying the data
With Workbooks(myFile).Sheets(1)
Range("A1").Select
Range(Selection, Selection.End(xlDown)).Select
Selection.Copy
pasteRange.PasteSpecial Paste:=xlPasteAll, Operation:=xlNone, SkipBlanks:= _
False, Transpose:=True
'Close the csv file
Workbooks(myFile).Close
myFile = Dir()
x = x + 1
Set pasteRange = ActiveWorkbook.Sheets(1).Range("A" & x)
End With
Loop
Finish = Timer
TotalTime = Finish - Start
MsgBox "Total time is " & TotalTime & " seconds!"
End Sub
How can I optimize this script? I think using Select and Copy (for the csv files) is inefficient, but I’m not sure I can use the optimized manner since I want to transpose paste it into myMasterFile.
Also I wonder whether it’s really necessary to open (and close) each csv file. Is there a way to avoid this and get the content using a better manner?
EDIT:
Response to the remarks/questions from Taller and Tim Williams. I have tested the code for 25 csv files. It takes about 9.8 seconds to finish. When I leave out the copy/transpose paste part it takes about 6.5 seconds. So I guess considerably faster. I should make a change there right? (avoiding Select and Selection Copy).
As for how the csv files looks: hope this helps (it’s about 1600 rows long (but it can differ per file).
I have no experience with Powerquery/Parsing HTML, or the suggestion that you posted @Tim Williams (array flipping)
enter image description here
2
Answers
I’d be interested if VBA is faster then just plain powerquery
Parsing the HTML is faster than opening the files in Excel.
Approx 12sec for 100 files.