
Image

Hello friends,
I am having a hard time running OCR on the image above because of the gaps left behind by line removal. Could anyone kindly guide me on how to fill the gaps in the Chinese characters using ImageMagick?

2 Answers


  1. If I understand this correctly, you want to find a way of removing the white lines and still get the image through OCR?

    The best way would be to do it by eye and connect the dots, so to speak, so that the last pixels of the characters line up.

    A programmatic way would be to remove the white line and then duplicate the line above (or below) and shift it into place.
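
    The duplicate-and-shift idea could be sketched in ImageMagick roughly as follows. This is an untested sketch rather than a definitive recipe: `build_fill_args` is a hypothetical helper name, `chinese.gif` and the row numbers 44 and 74 are borrowed from the other answer, and it assumes each gap is a single pixel row with a usable row directly above it. It also relies on my understanding that a width of 0 in `-crop` means the full image width.

    ```shell
    #!/bin/bash

    # Hypothetical helper: for each gap row y, print the ImageMagick arguments
    # that copy row y-1 on top of row y.
    build_fill_args() {
        local y
        for y in "$@"; do
            # Clone the current image, crop out the row just above the gap,
            # then composite that one-pixel strip at the gap's own offset.
            printf '( +clone -crop 0x1+0+%d +repage ) -geometry +0+%d -composite ' "$((y - 1))" "$y"
        done
    }

    # Show the arguments that would be generated for two gap rows.
    build_fill_args 44 74; echo

    # Possible usage (assumes ImageMagick is installed and chinese.gif exists):
    #   read -r -a fill <<< "$(build_fill_args 44 74)"
    #   convert chinese.gif "${fill[@]}" filled.png
    ```

    Copying the row below instead would only need `y + 1` in place of `y - 1`; which neighbour gives better OCR results is something to experiment with.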

    ocr image with gaps filled by hand

    康 家 月 而 视 , 喝 道
    " 你 想 做 什 么 !"
    秦 微 微 一 笑 , 轻 声 道
    不 知 道 看 着 些 亲 死 眼 前 ,
    前 辈 会 不 会 有 痛 的 感 觉 。"
    说 , 伸 手 一 指 , 一 位 少 妇
    身 形 一 顿 , 小 出 现 了 一 个 血 洞
    倒 地 身 广 。
    康 家 相 又 惊 又 , 痛 声 道
    

    I don’t read Chinese, but this is what the machine translation gave:

    Kang Jia month and watch, drink
    "What do you want to do !"
    Qin Weiwei smiled, softly
    I don't know. look at some dead eyes. ,
    Predecessors will not feel pain ."
    And said, stretch out a finger , a young woman.
    In The Shape of a meal, a small blood hole appeared
    Down to the ground wide.
    The Kang family was shocked and sore
    
  2. Cool question! There are many ways of approaching this but unfortunately I can’t tell which ones work! So I’ll give you some code and you can experiment by changing it around.

    For the moment, I tried simply removing any lines that have white pixels in them, but you could look at the lines above and below, or do something else.

    #!/bin/bash -xv
    
    # Get lines containing white pixels
    convert chinese.gif -colorspace gray -threshold 80% DEBUG-white-lines.png
    
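    # Develop that idea and get the line numbers in an array:
    # "-resize 1x!" squashes every row to a single pixel (roughly the row average),
    # the second threshold keeps rows that are more than about 20% white,
    # and awk pulls the y coordinate out of the "txt:" listing for those rows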
    wl=( $(convert chinese.gif -colorspace gray -threshold 80% -resize 1x! -threshold 20% txt: | awk -F '[,:]' '/FFFFFF/{print $2}') )
    
    # White lines are:
    echo "${wl[@]}"
    
    # Build all the "-chop" operations up front and apply them in one go,
    # rather than chopping one row at a time and saving/re-loading the image.
    # Each chop moves the remaining rows up by one, so subtract the number of rows already removed.
    chop=()
    correction=0
    for line in "${wl[@]}" ; do
       ((y=line-correction))
       chop+=( -chop "0x1+0+$y" )
       ((correction=correction+1))
    done
    echo "${chop[@]}"
    
    convert chinese.gif "${chop[@]}" result.png
    

    Here’s the image DEBUG-white-lines.png:

    [image: DEBUG-white-lines.png]

    The white lines are identified as:

    44 74 134 164 194 254 284 314 374 404
    

    The final command run is:

    convert chinese.gif -chop 0x1+0+44 -chop 0x1+0+73 -chop 0x1+0+132 -chop 0x1+0+161 -chop 0x1+0+190 -chop 0x1+0+249 -chop 0x1+0+278 -chop 0x1+0+307 -chop 0x1+0+366 -chop 0x1+0+395 result.png
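
    The offset correction can be checked on its own, without ImageMagick. A minimal sketch, assuming the white rows are listed in ascending order; `corrected_offsets` is a hypothetical helper name and is not part of the script above.

    ```shell
    #!/bin/bash

    # Hypothetical helper: given white-row numbers in ascending order, print the
    # offset each -chop has to use once the earlier rows have been removed.
    corrected_offsets() {
        local line correction=0 out=()
        for line in "$@"; do
            out+=( "$((line - correction))" )   # every earlier chop moved this row up by one
            correction=$((correction + 1))
        done
        echo "${out[*]}"
    }

    # The ten white rows identified above
    corrected_offsets 44 74 134 164 194 254 284 314 374 404
    ```

    The printed offsets should match the ones in the final command. Chopping from the bottom up (404 first, 44 last) would avoid the correction altogether, because removing a row never shifts the rows above it.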
    

    [image: result.png]
