
Recognizing Text in Images with Python: A Simple Image Text Recognition and Translation (OCR) Implementation

Date: 2024-02-18 16:40:10

Scenario

Image before recognition and translation

Image after recognition and translation

Step 1: Import the required libraries

from PIL import ImageFont
from PIL import Image
from PIL import ImageDraw
import hashlib
from urllib import parse
from urllib import request
import random
import base64
import json

Step 2: Recognize the text in the image with OCR (and translate it)

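Optical Character Recognition (OCR) is, put simply, extracting the text contained in an image. It is a deep field (and my own practice in it is still lacking); if you are interested, have a look at the third-party framework OpenCV. Here I simply use a third-party API, Youdao Zhiyun, to do the job. Other providers such as Baidu also offer free APIs. I chose Youdao mainly because most services only support recognizing a single foreign language for translation into Chinese, whereas Youdao Zhiyun can recognize several languages together. It also returns the Chinese translation directly. The approach is simple, so here is the code.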

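# Replace with your application ID
appKey = "29df4hs2342"
# Replace with your application secret
appSecret = "9bPJj8Lh7933hlJHGOLJDSocTRh"

# Request parameters
# Open the image in binary mode and convert its contents to a base64 string
with open(r'd_4.png', 'rb') as f:
    q = base64.b64encode(f.read()).decode('utf-8')

# Source language
fromLan = "en"
# Target language
to = "zh-CHS"
# Upload type (named uploadType so it does not shadow the built-in type)
uploadType = "1"
# Random salt; generate it yourself (a timestamp is recommended)
salt = random.randint(1, 65536)

# Signature: MD5 of appKey + q + salt + appSecret
sign = appKey + q + str(salt) + appSecret
m1 = hashlib.md5()
m1.update(sign.encode("utf8"))
sign = m1.hexdigest()

data = {'appKey': appKey, 'q': q, 'from': fromLan, 'to': to, 'type': uploadType, 'salt': str(salt), 'sign': sign}
data = parse.urlencode(data).encode(encoding='UTF8')

# Only the path of the URL survives in the source; the API host must be prepended
req = request.Request('/ocrtransapi', data)
response = request.urlopen(req)
res = response.read()
# json.loads() no longer accepts an encoding argument (removed in Python 3.9), so decode the bytes first
res = json.loads(res.decode('utf-8'))
resRegions = res['resRegions']

# Print the recognized content
for i in resRegions:
    print(i)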
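The signing and form-encoding steps above can also be pulled into small functions, which makes them easier to check without sending a request. The following is only a sketch of that idea: `build_sign` and `build_payload` are hypothetical helper names, not part of the Youdao API, and the field names and concatenation order are copied from the code above rather than from the official documentation.

```python
import hashlib
from urllib import parse


def build_sign(app_key, q, salt, app_secret):
    """MD5 hex digest of app_key + q + salt + app_secret, as in the code above."""
    raw = app_key + q + str(salt) + app_secret
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


def build_payload(app_key, app_secret, q, salt,
                  from_lang="en", to_lang="zh-CHS", upload_type="1"):
    """Return the URL-encoded POST body (bytes) with the same fields as above."""
    fields = {
        "appKey": app_key,
        "q": q,  # base64-encoded image content
        "from": from_lang,
        "to": to_lang,
        "type": upload_type,
        "salt": str(salt),
        "sign": build_sign(app_key, q, salt, app_secret),
    }
    return parse.urlencode(fields).encode("utf-8")
```

With these helpers, the `data` passed to `request.Request` would be `build_payload(appKey, appSecret, q, salt)`.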

Step 3: Replace the text in the image based on its position

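This step relies mainly on Python's PIL library, a powerful library for all kinds of image processing. Install it for your Python version (on Python 3 the maintained distribution is Pillow, installed with pip install Pillow); Python 2.x and 3.x differ slightly.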

# Draw the translated text onto the image (uses the module-level draw created below)
def dw(boundingBox, linesCount, lineheight, tranContent):
    # Text box: starting x, y, width, height
    x, y, w, h = boundingBox.split(',')
    x = int(x)
    y = int(y)
    w = int(w)
    h = int(h)

    # Font and font size
    word_size = int(lineheight)
    word_css = "msyh.ttf"
    font = ImageFont.truetype(word_css, word_size)

    # Total width of the text; font.getsize() was removed in Pillow 10, so use getbbox()
    left, top, right, bottom = font.getbbox(tranContent)
    W = right - left
    if W > w and int(linesCount) > 1:
        # Insert a line break every `limit` characters so the text fits the box width
        limit = max(1, int(w / word_size))
        i = limit
        tranContent = list(tranContent)
        # Re-check the length each time, since every inserted '\n' makes the list longer
        while i < len(tranContent):
            tranContent.insert(i, '\n')
            i += limit + 1
        tranContent = ''.join(tranContent)

    X = x + w
    Y = y + h
    # Draw the rectangle (fill, outline), then the text on top of it
    draw.rectangle((x, y, X, Y), 'yellowgreen', 'wheat')
    draw.text((x, y), tranContent, 'DimGrey', font=font)


if __name__ == "__main__":
    im = Image.open('d_4.png')
    textAngle = res['textAngle']
    # Rotate so the text is level, draw, then rotate back
    imNew = im.rotate(float(textAngle))
    draw = ImageDraw.Draw(imNew)
    for resRegion in resRegions:
        boundingBox = resRegion['boundingBox']
        linesCount = resRegion['linesCount']
        lineheight = resRegion['lineheight']
        tranContent = resRegion['tranContent']
        dw(boundingBox, linesCount, lineheight, tranContent)
    imNew = imNew.rotate(-float(textAngle))
    del draw
    # im.save('test.png')
    imNew.show()
    imNew.close()
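The line-breaking logic in dw estimates how many characters fit by dividing the box width by the font size, which is only approximate when character widths vary. As a rough alternative, the sketch below wraps text by measured width instead. `wrap_to_width` is a hypothetical helper, not part of PIL, and the `measure` argument is assumed to be any function that returns the width of a string.

```python
def wrap_to_width(text, max_width, measure):
    """Break text into lines so that measure(line) <= max_width where possible."""
    lines = []
    current = ""
    for ch in text:
        candidate = current + ch
        # Start a new line once adding this character would overflow,
        # but never emit an empty line (a single wide character stays alone).
        if current and measure(candidate) > max_width:
            lines.append(current)
            current = ch
        else:
            current = candidate
    if current:
        lines.append(current)
    return "\n".join(lines)


# Character-count demo: at most 3 characters per line
print(wrap_to_width("abcdefghij", 3, len))
```

Inside dw, something like `wrap_to_width(tranContent, w, font.getlength)` should give pixel-accurate wrapping, assuming a Pillow version that provides `getlength` on TrueType fonts (8.0 or later).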

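Summary

Put together, the code above runs end to end. It is fairly rough and only implements this one simple scenario of recognizing and translating the text in an image; if you are interested, you can explore how OCR itself is implemented on your own.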
