Let’s say I have some text:
Lorem ipsum dolor sit amet, consectetur adipiscing elit,\n
sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.\n
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris\n
nisi ut aliquip ex ea commodo consequat.\n
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore\n
eu fugiat nulla pariatur.\n
Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia\n
deserunt mollit anim id est laborum.\n
What is the most efficient way to cut it into chunks of x bytes, where the cut can only happen at the carriage return?
Two methods come to mind:
- split the text into lines, add lines to a buffer until the buffer is full, roll back the last line that caused the overflow, and repeat.
- find the offset in the text at the buffer length and walk back to the previous carriage return, with proper handling of the beginning and ending of the text
I couldn’t find a solution online, but I can’t believe that this problem hasn’t already been solved many times, and there may be a common implementation of this.
Edit:
more information about my use case:
The code is for a Telegram bot which is used as a communication tool with an internal system.
Telegram allows up to 4kb per message and throttles the number of calls.
Right now I collect all messages, put them in a concurrent queue, and then a task flushes the queue every second.
Messages can be a single line, can be a collection of lines and can sometimes be larger than 4kb.
I take all the messages (some being multiple lines in one block), aggregate them into a single string, then split the string by carriage return and then I can compose blocks of up to 4kb.
One additional problem I haven’t tackled yet, but that’s for later, is that Telegram will reject incomplete markup, so I will also need to cut the text based on that at some point.
2 Answers
Not very efficient, and also laboring under the assumptions
to a single newline;
then, an implementation along the lines of your first approach is both functional and straightforward. Just split into lines and combine them unless their combined length exceeds the threshold.
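A minimal sketch of that split-and-combine idea, assuming the text is already in memory as a string, line breaks are a single '\n', and the limit is counted in UTF-8 bytes. The class and method names (TextChunker, SplitAtLineBreaks) are made up for illustration. A single line that is longer than the limit is returned as its own oversized chunk here, so that case would still need separate handling.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class TextChunker
{
    // Hypothetical helper: groups whole lines into chunks of at most
    // maxBytes UTF-8 bytes. Cuts only ever happen at '\n'.
    public static IEnumerable<string> SplitAtLineBreaks(string text, int maxBytes)
    {
        var current = new List<string>();
        int currentBytes = 0;

        foreach (string line in text.Split('\n'))
        {
            int lineBytes = Encoding.UTF8.GetByteCount(line);
            // Joining a line to a non-empty chunk costs one byte for '\n'.
            int separator = current.Count > 0 ? 1 : 0;

            // The line would overflow the chunk: emit what we have and start over.
            if (current.Count > 0 && currentBytes + separator + lineBytes > maxBytes)
            {
                yield return string.Join("\n", current);
                current.Clear();
                currentBytes = 0;
                separator = 0;
            }

            // Assumption: a line longer than maxBytes is kept whole (oversized chunk).
            current.Add(line);
            currentBytes += separator + lineBytes;
        }

        if (current.Count > 0)
        {
            yield return string.Join("\n", current);
        }
    }
}
```

Calling TextChunker.SplitAtLineBreaks(text, 4096) would then give one string per outgoing message. Counting bytes per line up front avoids re-measuring the whole buffer on every iteration.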
Most common stream related tasks are already implemented very efficiently in the BCL.
It’s probably a good idea to stick with tried-and-tested Stream classes. You can just flush the queue, writing to the same MemoryStream. And call readBlock to keep getting new blocks of at most your specified size.
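As a rough illustration of the stream-based route, here is one way such a block reader could look. This is only a sketch under assumptions: StreamChunker and ReadBlocks are hypothetical names and not part of the BCL, the stream is assumed to hold UTF-8 text, and the limit is counted in UTF-8 bytes. StreamReader.ReadLine accepts "\n", "\r" and "\r\n" as line endings, and the blocks are rebuilt with '\n'.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class StreamChunker
{
    // Hypothetical helper: reads lines from the stream and yields blocks
    // of at most maxBytes UTF-8 bytes, each line ending with '\n'.
    // Note: disposing the StreamReader also closes the source stream.
    public static IEnumerable<string> ReadBlocks(Stream source, int maxBytes)
    {
        using (var reader = new StreamReader(source, Encoding.UTF8))
        {
            var block = new StringBuilder();
            int blockBytes = 0;
            string line;

            while ((line = reader.ReadLine()) != null)
            {
                // Line content plus one byte for the trailing '\n'.
                int lineBytes = Encoding.UTF8.GetByteCount(line) + 1;

                // The line would overflow the block: emit it and start a new one.
                if (block.Length > 0 && blockBytes + lineBytes > maxBytes)
                {
                    yield return block.ToString();
                    block.Clear();
                    blockBytes = 0;
                }

                // Assumption: a line longer than maxBytes becomes its own block.
                block.Append(line).Append('\n');
                blockBytes += lineBytes;
            }

            if (block.Length > 0)
            {
                yield return block.ToString();
            }
        }
    }
}
```

To use it with a MemoryStream that the queue was flushed into, set stream.Position = 0 before calling StreamChunker.ReadBlocks(stream, 4096), since reading starts at the current position.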