Decoding CAPTCHA's
extract captcha image
OCR (Optical Character Recognition) is pretty accurate these days and can easily read printed text.
rails ocr
ruby ocr
break google captcha
http://stackoverflow.com/search?q=rails+ocr
http://www.wausita.com/captcha/
-----------------------------------------------------------
1. tesseract-x.xx.tar.gz contains all the source code.
2. tesseract-2.xx.<lang>.tar.gz contains the Tesseract 2 language data files for <lang>. You need at least one of these or Tesseract 2 will not work.
3. <lang>.traineddata.gz contains the Tesseract 3 language data file for <lang>. You need at least one of these or Tesseract 3 will not work.
4. Note that tesseract-2.04.tar.gz unpacks to the tesseract-2.04 directory. tesseract-2.01.<lang>.tar.gz unpacks to the tessdata directory, which belongs inside your tesseract-2.04 directory. It is therefore best to download them into your tesseract-2.04 directory, so you can use "unpack here" or equivalent. You can unpack as many of the language packs as you care to, as they all contain different files. Note that if you are using make install, you should unpack your language data to your source tree before you run make install. If you unpack them as root to the destination directory of make install, then the user ids and access permissions might be messed up.
If they are not already installed, you need the following libraries (Ubuntu):
sudo apt-get install libpng12-dev
sudo apt-get install libjpeg62-dev
sudo apt-get install libtiff4-dev
sudo apt-get install zlibg-dev
E: Unable to locate package zlibg-dev => download the source, or use the correct package name, zlib1g-dev:
sudo apt-get install zlib1g-dev
You also need to install Leptonica. There is an apt-get package (name unknown), or the sources are at http://www.leptonica.org/.
Download Leptonica from http://www.leptonica.org/source/leptonlib-1.67.tar.gz and unpack it:
tar zxvf leptonlib-1.67.tar.gz
The instructions in the Leptonica README are clear, but basically, inside the unpacked directory, it is the usual
./configure
make
sudo make install
sudo ldconfig
Now back to Tesseract. Download the source from svn:
svn checkout http://tesseract-ocr.googlecode.com/svn/trunk/ tesseract-ocr-read-only
or get the tesseract-3.00.tar.gz package from the download page:
http://code.google.com/p/tesseract-ocr/downloads/list
The same build process as usual applies:
./runautoconf
./configure
make
sudo make install
sudo vi /etc/profile
vi ~/.bashrc
gunzip FileName.gz
1. Download the language data file (e.g. 'wget http://tesseract-ocr.googlecode.com/files/eng.traineddata.gz')
2. Decompress it ('gzip -d eng.traineddata.gz')
3. Move it to the installation's tessdata directory (e.g. 'mv eng.traineddata $TESSDATA_PREFIX' if TESSDATA_PREFIX is defined)
You may still get an error when trying to run tesseract:
$ tesseract foo.png bar
tesseract: error while loading shared libraries: libtesseract_api.so.3: cannot open shared object file: No such file or directory
You need to update the cache for the runtime linker. The following should get you up and running:
$ sudo ldconfig
--------------------------------------------------
copy eng.traineddata to /usr/local/share/tessdata
pwd
/usr/local/share/tessdata
ls
configs eng.traineddata tessconfigs
-------------------------------------------------
tesseract digit only
improve tesseract digits accuracy
Use tesseract to get plain ASCII text out of the bitmap.
curl 'http://www.stc.gov.cn/search/image_code.asp?rnd=0.7641146600113322' > /home/simon/Desktop/weizh/ca.jpg
tesseract ca.bmp outputbase -l eng
more outputbase.txt
tesseract ca.bmp outputbase nobatch digits
more outputbase.txt
Only jpg is supported:
curl 'http://www.stc.gov.cn/search/image_code.asp?rnd=0.7641146600111234' > ca.jpg
tesseract ca.jpg outputbase nobatch digits
cat outputbase.txt
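Even with the digits config, the text file may still contain spaces, line breaks or stray characters. A minimal sketch of a cleanup step follows; the function names `keep_digits` and `ocr_digits` are made up for illustration and are not part of tesseract.

```shell
# keep_digits: read OCR text on stdin and print only the digits 0-9.
# (Hypothetical helper; it deletes every other character, newlines included.)
keep_digits() {
    tr -cd '0-9'
    echo
}

# ocr_digits IMAGE: run tesseract with the nobatch and digits configs,
# as in the commands above, then print the cleaned-up code.
ocr_digits() {
    tesseract "$1" outputbase nobatch digits >/dev/null 2>&1 &&
        keep_digits < outputbase.txt
}

# Usage (assumes ca.jpg exists in the current directory):
#   code=$(ocr_digits ca.jpg)
```

Filtering after the fact does not make recognition itself more accurate; it only removes characters that can never be part of a numeric code.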
Reloading /etc/profile
source ~/.profile
$ source /etc/profile
Settings in ~/.profile override those in /etc/profile. You can also use .bash_profile in your home directory to customize your bash shell's profile.
Basically, if you need to load shell variables from any file, just run the .
(dot) command, followed by a space and the path to the file (give an absolute
path, or at least one containing a slash, because a bare filename is looked up
in PATH). Be careful which file you load variables from, because
you might overwrite some important environment variables and your system
could become unstable.
$ tesseract wenzhou.jpeg outputbase -l eng
Error openning data file /usr/local/sharetessdata/eng.traineddata
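=> workaround: cp eng.traineddata to /usr/local/sharetessdata. The missing slash between share and tessdata suggests that TESSDATA_PREFIX was set to /usr/local/share without a trailing slash; setting it to /usr/local/share/ should make tesseract look in /usr/local/share/tessdata instead.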
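The path in the error message looks like a prefix and "tessdata" glued together without a separator. A small sketch of a guard for that, assuming the prefix is taken as-is by tesseract; `with_trailing_slash` is a hypothetical helper, not something tesseract provides.

```shell
# with_trailing_slash DIR: print DIR so that it ends in a slash.
# (Hypothetical helper; a DIR that already ends in "/" is printed unchanged.)
with_trailing_slash() {
    case "$1" in
        */) printf '%s\n' "$1" ;;
        *)  printf '%s/\n' "$1" ;;
    esac
}

# Usage, e.g. in ~/.bashrc or /etc/profile:
#   export TESSDATA_PREFIX="$(with_trailing_slash /usr/local/share)"
```

With the prefix ending in a slash, the data file should be looked up under /usr/local/share/tessdata, so the extra copy in /usr/local/sharetessdata would no longer be needed.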
cd /home/simon/Desktop/weizh
curl 'http://117.36.53.122:9081/wfcx/servlet/ValidateCodeServlet?t=1304472587796' > xian.png
tesseract xian.png out /usr/local/share/tessdata/tessconfigs/nobatch /usr/local/share/tessdata/configs/digits
Response from the query (the alert text means "Incorrect verification code!"):
<html>
<head>
<title></title>
<meta http-equiv="Content-Type" content="text/html; charset=gb2312">
<script>
alert("验证码错误!");
window.close();
</script>
</head>
</html>
curl --cookie-jar newcookies.txt 'http://117.36.53.122:9081/wfcx/servlet/ValidateCodeServlet?t=1304494360513' > xian.png
curl --cookie newcookies.txt 'http://117.36.53.122:9081/wfcx/query.do?actiontype=vioSurveil&vcode=2148&hpzl=02&hphm=AUL695&tj=CLSBDH&tj_val=LFV2A11GX93178557'
tesseract xian.png out /usr/local/share/tessdata/tessconfigs/nobatch /usr/local/share/tessdata/configs/digits
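The two curl commands above share one cookie file, so the server can tie the submitted code to the captcha image it generated for that session. The flow can be sketched as three small steps: fetch the image and save the cookie, run OCR, then submit the query with the same cookie. The function names and the `JAR` variable below are illustrative assumptions; only the curl options `--cookie-jar` and `--cookie` come from the commands above.

```shell
# Cookie file shared by the captcha download and the query.
JAR=newcookies.txt

# query_url BASE CODE: append the recognized code as the vcode parameter.
# (Hypothetical helper; BASE must already contain a "?" query part.)
query_url() {
    printf '%s&vcode=%s\n' "$1" "$2"
}

# fetch_captcha URL FILE: download the captcha and store the session cookie.
fetch_captcha() {
    curl --silent --cookie-jar "$JAR" "$1" > "$2"
}

# submit_query URL: send the query with the stored session cookie.
submit_query() {
    curl --silent --cookie "$JAR" "$1"
}

# Usage (CAPTCHA_URL and QUERY_URL are placeholders to fill in):
#   fetch_captcha "$CAPTCHA_URL" xian.png
#   tesseract xian.png out nobatch digits
#   code=$(tr -cd '0-9' < out.txt)
#   submit_query "$(query_url "$QUERY_URL" "$code")"
```

The OCR step has to run between the download and the query; a code read from an image fetched in a different session will be rejected.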
-----------------------------------
/usr/local/sharetessdata:
eng.traineddata
/usr/local/share/tessdata:
chi_sim.traineddata
configs
eng.traineddata
tessconfigs
-----------------------------------
$ sudo apt-get install imagemagick
$ dpkg -l |grep imagemagick
imagemagick
imagemagick-doc
$ convert
$ whereis convert
$ which convert
$ convert -compress none -depth 8 -alpha off zhejiang.gif zhejiang.tif
Enlarging the image can improve OCR accuracy.
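A minimal sketch of enlarging an image with ImageMagick before OCR. The function name `enlarge` and the default factor of 300% are assumptions chosen for illustration; the other options are the ones from the convert command above.

```shell
# enlarge IN OUT [PERCENT]: scale IN by PERCENT (default 300) and write an
# uncompressed 8-bit image without alpha channel to OUT.
# (Hypothetical helper around ImageMagick's convert.)
enlarge() {
    convert "$1" -resize "${3:-300}%" -alpha off -compress none -depth 8 "$2"
}

# Usage:
#   enlarge zhejiang.gif zhejiang.tif
#   tesseract zhejiang.tif out nobatch digits
```

The best factor depends on the captcha; it is worth trying a few values and comparing the output.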
I believe the real challenge in applying OCR to plate recognition is
that plate images are "too dirty" compared to paper documents.
There are frames, skews, uneven shadows, etc. You have to do your own
work to split the plate into separate chars and feed them to the OCR engine. I
don't think tesseract itself can handle this automatically given the
raw image. But I believe it will do pretty well once you have the
binarized, separated chars. Basically, plate recognition is more an image
processing problem than an OCR problem.
You can use the grammar as a post-processing step to make corrections.
To convert the PDF I used the ImageMagick convert application. Below is the command that I use:
convert -density 288 src.pdf -colorspace Gray -depth 8 -alpha off tmp.tif
tesseract tmp.tif out.txt
how to eliminate noise