How to remove all lines and borders in an image while keeping text programmatically? - Photoshop

wind
November 27, 2015
151 views
7 votes
7 Answers

I’m trying to extract text from an image using Tesseract OCR.
Currently, with the original input image below, the output quality is very poor (about 50%). When I remove all lines and borders in Photoshop, the output improves a lot (~90%). Is there a way to remove all lines and borders from an image (keeping the text) programmatically using OpenCV, ImageMagick, or some other technique?

Original Image:

Expected Image:

Answers

- Kanat
- November 27, 2015 at 9:07 am
- 0 votes
I have an idea, but it will only work if the lines are perfectly horizontal and vertical. First binarize the image, if it is not binary already. Then iterate over each row of the image and look for runs of consecutive black pixels longer than some threshold. For example, if a row contains a continuous run of black pixels from the 100th to the 150th pixel, make those pixels white. After removing all horizontal lines this way, do the same column by column to remove the vertical lines.

In this example I assume the run starts exactly at the 100th pixel and ends at the 150th; if the 151st pixel were also black, it would have to be included too. In other words, find each line in full before erasing it.
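A minimal NumPy sketch of this run-scanning idea, for illustration only. It assumes a binarized 8-bit array where black is 0 and white is 255; the function names and the `min_len` threshold are made up here. Both directions are detected on the untouched input, so that erasing a horizontal line does not cut the vertical lines that cross it.

```python
import numpy as np

def long_run_mask(binary, min_len):
    """Mark every horizontal run of black (0) pixels that is at least min_len long."""
    mask = np.zeros(binary.shape, dtype=bool)
    height, width = binary.shape
    for y in range(height):
        x = 0
        while x < width:
            if binary[y, x] == 0:
                start = x
                # Walk to the end of the run so the line is found in full
                while x < width and binary[y, x] == 0:
                    x += 1
                if x - start >= min_len:
                    mask[y, start:x] = True
            else:
                x += 1
    return mask

def erase_straight_lines(binary, min_len=50):
    """Whiten long horizontal and vertical black runs, leaving short runs (text) alone."""
    horizontal = long_run_mask(binary, min_len)
    # Scanning the transposed image row by row is a column scan of the original
    vertical = long_run_mask(binary.T, min_len).T
    out = binary.copy()
    out[horizontal | vertical] = 255
    return out
```

`min_len` has to be longer than the widest and tallest stroke in the text, otherwise parts of characters are erased as well.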

If you solve this, please let me know.


- MarkSetchell
- November 28, 2015 at 2:05 pm
- 0 votes
Not using OpenCV, but just a one-liner of ImageMagick in the Terminal, but it may give you an idea how to do it in OpenCV. ImageMagick is installed on most Linux distros and is available for OSX and Windows.

The crux of the concept is to create a new image where each pixel is set to the median of the 100 neighbouring pixels to its left and the 100 neighbouring pixels to its right. That way, pixels that have lots of horizontal neighbours that are black (i.e. horizontal black lines) will be white in the output image. Then the same processing is applied in the vertical direction to remove vertical lines.

The command that you type into the Terminal will be:
```
convert input.png                                                 \
   \( -clone 0 -threshold 50% -negate -statistic median 200x1 \)  \
   -compose lighten -composite                                    \
   \( -clone 0 -threshold 50% -negate -statistic median 1x200 \)  \
   -composite result.png
```
The first line says to load your original image.

The second line starts some “aside-processing” that copies the original image, thresholds it and inverts it, then the median of all neighbouring pixels 100 either side is calculated.

The third line then takes the result of the second line and composites it over the original image, choosing the lighter of the pixels at each location – i.e. the ones that my horizontal line mask has whitened out.

The next two lines do the same thing again but oriented vertically for vertical lines.

Result is like this:

If I difference that with your original image, like this, I can see what it did:
```
convert input.png result.png -compose difference -composite diff.png
```
I guess, if you wanted to remove a bit more of the lines, you could actually blur the difference image a little and apply that to the original. Of course, you can play with the filter lengths and the thresholds and stuff too.
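A rough NumPy sketch of the same idea, in case it helps with porting to Python. It is not a translation of the ImageMagick command: it assumes an 8-bit grayscale array, uses a fixed threshold of 128 in place of `-threshold 50%`, pads the borders with background instead of reproducing ImageMagick's edge handling, and relies on the fact that the median of a binary window is simply a majority vote. The function names are made up for illustration.

```python
import numpy as np

def majority_1d(mask, length, axis):
    """True where more than half of the window centred on a pixel is True.

    The window runs along `axis` and spans length // 2 pixels on each side.
    """
    m = np.moveaxis(mask.astype(np.int32), axis, -1)
    half = length // 2
    window = 2 * half + 1
    # Pad with zeros (background) so border pixels also get a full window
    padded = np.pad(m, [(0, 0)] * (m.ndim - 1) + [(half, half)])
    csum = np.cumsum(padded, axis=-1)
    csum = np.concatenate([np.zeros_like(csum[..., :1]), csum], axis=-1)
    # Window sums via differences of the cumulative sum
    sums = csum[..., window:] - csum[..., :-window]
    return np.moveaxis(sums * 2 > window, -1, axis)

def remove_lines_median(gray, length=200, thresh=128):
    """Whiten pixels whose horizontal or vertical neighbourhood is mostly dark."""
    dark = gray < thresh
    line_mask = majority_1d(dark, length, axis=1) | majority_1d(dark, length, axis=0)
    out = gray.copy()
    out[line_mask] = 255  # same effect as compositing the mask with "lighten"
    return out
```

As with the ImageMagick version, `length` and `thresh` are the knobs to play with.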

- delkant
- March 31, 2016 at 3:53 am
- 0 votes
What you need is Leptonica and Lept4j.

There is an example of how to accomplish this in the project's source code, in the tests: LineRemovalTest.java

Input:

output:


You can use an edge detector (Sobel, Laplacian, or Canny), then the Hough transform to identify the lines in OpenCV, and paint them white to remove them:

import cv2
import numpy as np

img = cv2.imread('input.png')  # placeholder file name
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Pick one edge detector; only the Canny result is used further down
laplacian = cv2.Laplacian(gray, cv2.CV_8U)  # Laplacian, OR
edges = cv2.Canny(gray, 80, 10, apertureSize=3)  # Canny edges, OR
# Sobel with output dtype cv2.CV_8U
sobelx8u = cv2.Sobel(gray, cv2.CV_8U, 1, 0, ksize=5)
# Sobel with output dtype cv2.CV_64F: take the absolute value, then convert to 8-bit
sobelx64f = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=5)
abs_sobel64f = np.absolute(sobelx64f)
sobel_8u = np.uint8(abs_sobel64f)

# Probabilistic Hough line transform
minLineLength = 900
maxLineGap = 100
# Pass these two by keyword: the fifth positional argument of HoughLinesP is `lines`
lines = cv2.HoughLinesP(edges, 1, np.pi / 180, 100,
                        minLineLength=minLineLength, maxLineGap=maxLineGap)
if lines is not None:  # None is returned when no line is found
    for line in lines:
        for x1, y1, x2, y2 in line:
            cv2.line(img, (x1, y1), (x2, y2), (255, 255, 255), 2)

cv2.imwrite('houghlines.jpg', img)

- luizv
- October 2, 2017 at 11:08 pm
- 0 votes
There is a better way to do this with ImageMagick.

Identifying the line shape and removing it

ImageMagick has a neat feature, called Morphology of Shapes. You can use it to identify shapes like table lines and remove them.

One Liner
```
convert in.png                              \
-type Grayscale                             \
-negate                                     \
-define morphology:compose=darken           \
-morphology Thinning 'Rectangle:1x80+0+0<'  \
-negate                                     \
out.png
```
Explanation
- convert in.png : load the picture.
- -type Grayscale: make sure ImageMagick knows it’s a grayscale image.
- -negate: invert image color layers (already properly adjusted by setting up grayscale). Lines and characters will be white and background black.
- -define morphology:compose=darken: define that areas identified by morphology will be darkened.
- -morphology Thinning 'Rectangle:1x80+0+0<': define a 1px by 80px rectangle kernel that will be used to identify the line shapes. An area is darkened only if this kernel fits inside a white shape (remember we negated the colors) that is this big or bigger. The < flag allows the kernel to rotate.
- -negate: Invert colors a second time. Now characters will be black again, and background will be white.
- out.png: The output file to be generated.
Resulting Image

After applying
```
convert in.png -type Grayscale -negate -define morphology:compose=darken -morphology Thinning 'Rectangle:1x80+0+0<' -negate out.png
```
this was the output image:

Observations
- Choose a rectangle kernel longer than your largest character, so that the rectangle cannot fit inside a character.
- Some short dotted lines and small table cell dividers remain, because they are shorter than 80 pixels.
- The merit of this technique is that it preserves the characters better than the median-based approach proposed in another answer here, and despite the little clutter that remains, it removes the table lines much better.

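I faced the same problem, and I feel a more logical solution could be the following (reference: Extract Table Borders):

import cv2
import numpy as np

# Assuming b_w is the binary image (black text and lines on a white background)
inv = 255 - b_w  # invert so that lines and text become white
# The original snippet used an undefined `new_img` here; `inv` is assumed to be what was meant
horizontal_img = inv.copy()
vertical_img = inv.copy()

# Erode, then dilate with a wide kernel: only long horizontal runs survive
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (100, 1))
horizontal_img = cv2.erode(horizontal_img, kernel, iterations=1)
horizontal_img = cv2.dilate(horizontal_img, kernel, iterations=1)

# The same with a tall kernel: only long vertical runs survive
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 100))
vertical_img = cv2.erode(vertical_img, kernel, iterations=1)
vertical_img = cv2.dilate(vertical_img, kernel, iterations=1)

# Combine the two masks with OR; adding uint8 arrays would wrap around where lines cross
mask_img = np.bitwise_or(horizontal_img, vertical_img)
# White mask pixels overwrite the lines in the original image
no_border = np.bitwise_or(b_w, mask_img)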

- nathancy
- August 15, 2019 at 3:52 am
- 0 votes
Since no one has posted a complete OpenCV solution, here’s a simple approach:
1. Obtain binary image. Load the image, convert to grayscale, and apply Otsu’s threshold.
2. Remove horizontal lines. Create a horizontal kernel with cv2.getStructuringElement(), then find contours and remove the lines with cv2.drawContours().
3. Remove vertical lines. Do the same operation with a vertical kernel.

Load the image, convert to grayscale, then apply Otsu’s threshold to obtain a binary image:
```
image = cv2.imread('1.png')
result = image.copy()
gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
```
Now we create a horizontal kernel with cv2.getStructuringElement() to detect horizontal lines, and find contours with cv2.findContours(). To remove the horizontal lines, we use cv2.drawContours() and draw over each horizontal contour in white. This effectively "erases" the horizontal line. Here are the detected horizontal lines in green:
```
# Remove horizontal lines
horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (40,1))
remove_horizontal = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=2)
cnts = cv2.findContours(remove_horizontal, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    cv2.drawContours(result, [c], -1, (255,255,255), 5)
```
Similarly, we create a vertical kernel to detect the vertical lines, find contours, and draw over each vertical contour in white. Here are the detected vertical lines highlighted in green:
```
# Remove vertical lines
vertical_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1,40))
remove_vertical = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, vertical_kernel, iterations=2)
cnts = cv2.findContours(remove_vertical, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    cv2.drawContours(result, [c], -1, (255,255,255), 5)
```
After filling in both horizontal and vertical lines with white, here’s our result

Note: Depending on the image, you may have to modify the kernel size. For instance, to capture longer horizontal lines, it may be necessary to increase the horizontal kernel from (40, 1) to, say, (80, 1). If you wanted to detect thicker horizontal lines, you could increase the kernel height to, say, (80, 2). In addition, you could increase the number of iterations when performing cv2.morphologyEx(). Similarly, you could modify the vertical kernel to detect more or fewer vertical lines. There is a trade-off when increasing or decreasing the kernel size, as you may capture more or less of the lines. Again, it all depends on the input image.

Full code for completeness
```
import cv2

image = cv2.imread('1.png')
result = image.copy()
gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# Remove horizontal lines
horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (40,1))
remove_horizontal = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=2)
cnts = cv2.findContours(remove_horizontal, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    cv2.drawContours(result, [c], -1, (255,255,255), 5)

# Remove vertical lines
vertical_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1,40))
remove_vertical = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, vertical_kernel, iterations=2)
cnts = cv2.findContours(remove_vertical, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    cv2.drawContours(result, [c], -1, (255,255,255), 5)

cv2.imshow('thresh', thresh)
cv2.imshow('result', result)
cv2.imwrite('result.png', result)
cv2.waitKey()
```