OCR Text Recognition Assistant

【Deep Learning OCR Series·6】In-Depth Analysis of the CRNN Architecture

A detailed analysis of the CRNN architecture, covering CNN feature extraction, RNN sequence modeling, and the CTC loss function, and of how CNNs and RNNs are combined into one end-to-end model.

## Introduction

CRNN (Convolutional Recurrent Neural Network) is one of the most important architectures in deep-learning OCR. It was proposed in 2015 by Baoguang Shi, Xiang Bai, and Cong Yao. CRNN combines the feature-extraction strength of convolutional neural networks (CNNs) with the sequence-modeling strength of recurrent neural networks (RNNs) to perform end-to-end text recognition. This article analyzes CRNN in depth: how the architecture is built, how it works, how it is trained, and how it is applied in OCR.

## CRNN Architecture Overview

### Design Motivation

Before CRNN, OCR systems usually followed a multi-stage pipeline: characters were first detected and segmented, and then each character was recognized on its own. This approach has several problems:

**Limitations of traditional methods**:
- Error propagation: segmentation mistakes carry straight through to the recognition result
- Complexity: elaborate character-segmentation algorithms have to be designed
- Poor robustness: sensitive to character spacing and font variations
- Connected strokes: in handwriting, characters joined by continuous strokes are hard to separate

**CRNN's innovations**:
- End-to-end learning: maps directly from the image to the text sequence
- No segmentation: sidesteps the difficulty of character segmentation
- Sequence modeling: uses RNNs to model dependencies between characters
- CTC alignment: resolves the length mismatch between the input and output sequences

### Overall Architecture

CRNN consists of three main components:

**1. Convolutional layers**:
- Function: extract a feature sequence from the input image
- Input: a text-line image (fixed height, variable width)
- Output: a feature map that is read column by column as a sequence

**2. Recurrent layers**:
- Function: model contextual dependencies within the feature sequence
- Input: the feature sequence extracted by the CNN
- Output: a feature sequence enriched with context

**3. Transcription layer**:
- Function: convert the feature sequence into a text sequence
- Method: CTC (Connectionist Temporal Classification)
- Output: the final recognized text

## Convolutional Layers in Detail

### Feature Extraction Strategy

CRNN's convolutional layers are designed specifically for text recognition:

**Network structure**:
- Relatively shallow: typically 7 convolutional layers
- Small kernels: mostly 3×3
- Pooling strategy: less pooling along the width dimension

**Layer configuration** (window sizes given as height × width):
- Input: 32×W×1 (height 32, width W, one channel)
- Conv1: 64 kernels of 3×3, stride 1, padding 1
- MaxPool1: 2×2 window, stride 2
- Conv2: 128 kernels of 3×3, stride 1, padding 1
- MaxPool2: 2×2 window, stride 2
- Conv3: 256 kernels of 3×3, stride 1, padding 1
- Conv4: 256 kernels of 3×3, stride 1, padding 1
- MaxPool3: 2×1 window, stride (2, 1)
- Conv5: 512 kernels of 3×3, stride 1, padding 1, followed by BatchNorm + ReLU
- Conv6: 512 kernels of 3×3, stride 1, padding 1, followed by BatchNorm + ReLU
- MaxPool4: 2×1 window, stride (2, 1)
- Conv7: 512 kernels of 2×2, stride 1, padding 0
- Output: 512×1×(W/4 - 1) (channels × height × width) for W divisible by 4, i.e. roughly W/4 columns; the unpadded 2×2 Conv7 removes one column

### Key Design Considerations
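
To make the configuration above concrete, here is a small pure-Python sketch that traces the (C, H, W) shape through each layer with the standard output-size formula, and then turns a height-1 feature map into one feature vector per width position (the map-to-sequence step discussed under feature map conversion). The layer table is transcribed from the list above; the names `CNN_CONFIG`, `trace_shapes`, and `map_to_sequence` are illustrative only and do not come from any library. It does shape arithmetic only, so it runs without a deep-learning framework.

```python
# Shape trace for the 7-layer CRNN convolutional stack listed above.
# Each entry: (name, kind, out_channels, kernel (h, w), stride (h, w), padding (h, w)).
# Transcribed from the text for illustration; not a reference implementation.
CNN_CONFIG = [
    ("conv1", "conv", 64, (3, 3), (1, 1), (1, 1)),
    ("pool1", "pool", None, (2, 2), (2, 2), (0, 0)),
    ("conv2", "conv", 128, (3, 3), (1, 1), (1, 1)),
    ("pool2", "pool", None, (2, 2), (2, 2), (0, 0)),
    ("conv3", "conv", 256, (3, 3), (1, 1), (1, 1)),
    ("conv4", "conv", 256, (3, 3), (1, 1), (1, 1)),
    ("pool3", "pool", None, (2, 1), (2, 1), (0, 0)),  # halves height, keeps width
    ("conv5", "conv", 512, (3, 3), (1, 1), (1, 1)),  # + BatchNorm + ReLU (no shape change)
    ("conv6", "conv", 512, (3, 3), (1, 1), (1, 1)),  # + BatchNorm + ReLU (no shape change)
    ("pool4", "pool", None, (2, 1), (2, 1), (0, 0)),  # halves height, keeps width
    ("conv7", "conv", 512, (2, 2), (1, 1), (0, 0)),  # unpadded: height 2 -> 1, width shrinks by 1
]


def out_size(size, kernel, stride, padding):
    """Output length along one axis for convolution or pooling (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(width, height=32, channels=1):
    """Return a list of (layer_name, (C, H, W)) for a height x width input image."""
    shape = (channels, height, width)
    trace = [("input", shape)]
    for name, kind, out_ch, (kh, kw), (sh, sw), (ph, pw) in CNN_CONFIG:
        c, h, w = shape
        h = out_size(h, kh, sh, ph)
        w = out_size(w, kw, sw, pw)
        if kind == "conv":
            c = out_ch  # pooling keeps the channel count
        shape = (c, h, w)
        trace.append((name, shape))
    return trace


def map_to_sequence(feature_map):
    """Convert a C x 1 x W feature map (nested lists) into W vectors of length C.

    Each width position becomes one time step for the recurrent layers.
    """
    if any(len(channel) != 1 for channel in feature_map):
        raise ValueError("feature map height must be 1")
    num_channels = len(feature_map)
    width = len(feature_map[0][0])
    return [[feature_map[c][0][t] for c in range(num_channels)] for t in range(width)]


if __name__ == "__main__":
    for name, shape in trace_shapes(width=100):
        print(f"{name:>6}: C={shape[0]:>3} H={shape[1]:>2} W={shape[2]:>3}")
```

For a 32×100 input the trace ends at `('conv7', (512, 1, 24))`: the height has been reduced to 1 and the width to 100/4 - 1 = 24, so the recurrent layers would see 24 time steps of 512-dimensional features. Using stride (2, 1) in the last two pooling layers is what keeps the width at W/4 while the height keeps shrinking.
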
**Height compression strategy**:
- Goal: compress the feature map to a height of 1 pixel
- Method: reduce the height step by step through the pooling layers (32 → 16 → 8 → 4 → 2, then Conv7 brings it to 1)
- Rationale: the vertical extent of a text line carries little information about character order

**Width preservation strategy**:
- Goal: keep as much horizontal resolution as possible
- Method: limit pooling along the width (the last two pooling layers use stride 1 horizontally)
- Rationale: the sequential information in text runs mainly along the width

**Feature map conversion**:
The convolutional output must be converted into the input format the RNN expects:
- Raw output: C×H×W (channels × height × width)
- Converted: W×C (sequence length × feature dimension)
- Method: since H = 1, take the feature vector at each width position as one time step

## Recurrent Layers in Detail

### Choice of RNN

CRNN usually uses bidirectional LSTMs for its recurrent layers:

**Advantages of bidirectional LSTMs**:
- Context: uses both the preceding and the following context
- Long-range dependencies: LSTMs can capture dependencies over long distances
- Gradient stability: mitigates the vanishing-gradient problem

**Network configuration**:
- Input: W×512 (sequence length × feature dimension)
- BiLSTM1: 256 hidden units (128 forward + 128 backward)
- BiLSTM2: 256 hidden units (128 forward + 128 backward)
- Output: W×256 (sequence length × hidden dimension)

Before transcription, a fully connected layer typically maps each of these W vectors to scores over the character set plus the CTC blank.

### Sequence Modeling Mechanism

**Temporal dependency modeling**:
The RNN layers capture dependencies between characters:
- Information about preceding characters helps recognize the current one
- Information about following characters provides useful context
- Information about the whole word or phrase helps resolve ambiguity

**Feature enhancement**:
Features processed by the RNN have these properties:
- Context awareness: the feature at each position incorporates contextual information
- Temporal consistency: features at neighboring positions vary smoothly
- Semantic richness: visual features are combined with sequential information

## Transcription Layer in Detail

### The CTC Mechanism

CTC (Connectionist Temporal Classification) is a key component of CRNN:

**Role of CTC**:
- Alignment: the input sequence length (number of frames) does not match the output sequence length (number of characters)
- End-to-end training: no character-level alignment annotations are required
- Repeated characters: a blank between two identical labels lets CTC produce genuinely doubled characters, such as the "ll" in "hello"

**How CTC works**:
1. Extend the label set: add a blank label to the original character set
2. Enumerate paths: consider every possible alignment path, i.e. every frame-level label sequence
3. Path probability: compute the probability of each path
4. Marginalize: sum the probabilities of all paths that map to the target to obtain the sequence probability

A path π of length T is mapped to text by a collapsing function $\mathcal{B}$ that first merges consecutive repeated labels and then removes blanks. Writing the blank as "-", $\mathcal{B}(\text{hh-e-ll-llo}) = \text{hello}$.

### CTC Loss Function

**Mathematical formulation**:
Given an input sequence X and a target sequence Y, the CTC loss is defined as:

$$L_{\text{CTC}} = -\log P(Y \mid X)$$

where $P(Y \mid X)$ is obtained by summing the probabilities of all valid alignment paths:

$$P(Y \mid X) = \sum_{\pi \in \mathcal{B}^{-1}(Y)} P(\pi \mid X)$$

Here $\mathcal{B}^{-1}(Y)$ is the set of all paths that collapse to the target sequence Y. Because CTC treats the outputs at different frames as conditionally independent given X, each path probability is a product of per-frame softmax outputs, $P(\pi \mid X) = \prod_{t=1}^{T} y^{t}_{\pi_t}$, where $y^{t}_{k}$ is the probability of label k at frame t.
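
The sum over $\mathcal{B}^{-1}(Y)$ has exponentially many terms, so in practice it is computed with a forward recursion over the blank-extended target. The sketch below is a minimal pure-Python illustration under a few stated assumptions: per-frame probabilities are plain lists `probs[t][k]` that are already softmax-normalized, the blank is label 0, and probabilities are multiplied directly rather than in log space (fine for tiny toy inputs; real implementations work in log space to avoid underflow). It also includes the collapsing function $\mathcal{B}$ and a brute-force sum over every path, so the two routes to $P(Y \mid X)$ can be compared. All function names are hypothetical.

```python
import itertools
import math

BLANK = 0  # assumption: the CTC blank is label index 0


def collapse(path, blank=BLANK):
    """The CTC mapping B: merge consecutive repeated labels, then drop blanks."""
    result = []
    previous = None
    for label in path:
        if label != previous and label != blank:
            result.append(label)
        previous = label
    return result


def ctc_forward_prob(probs, target, blank=BLANK):
    """P(Y|X) via the CTC forward recursion.

    probs[t][k] is the (softmax) probability of label k at frame t.
    target is the label sequence Y, without blanks.
    """
    # Blank-extended target: blank, y1, blank, y2, ..., yN, blank
    ext = [blank]
    for label in target:
        ext += [label, blank]
    num_frames, num_states = len(probs), len(ext)

    # alpha[s]: total probability of path prefixes ending in state s at frame t
    alpha = [0.0] * num_states
    alpha[0] = probs[0][ext[0]]
    if num_states > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, num_frames):
        new_alpha = [0.0] * num_states
        for s in range(num_states):
            total = alpha[s]  # stay in the same state
            if s >= 1:
                total += alpha[s - 1]  # advance by one state
            # skip over a blank, allowed only between two different labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new_alpha[s] = total * probs[t][ext[s]]
        alpha = new_alpha

    # valid paths end on the last label or on the trailing blank
    return alpha[-1] + (alpha[-2] if num_states > 1 else 0.0)


def ctc_brute_force_prob(probs, target, blank=BLANK):
    """Same quantity by summing P(pi|X) over every pi with B(pi) == Y (exponential cost)."""
    num_frames, num_labels = len(probs), len(probs[0])
    total = 0.0
    for path in itertools.product(range(num_labels), repeat=num_frames):
        if collapse(path, blank) == list(target):
            p = 1.0
            for t, k in enumerate(path):
                p *= probs[t][k]  # frames are conditionally independent
            total += p
    return total


def ctc_loss(probs, target, blank=BLANK):
    """L_CTC = -log P(Y|X)."""
    return -math.log(ctc_forward_prob(probs, target, blank))


if __name__ == "__main__":
    # Toy input: 4 frames, 3 labels (0 = blank, 1 and 2 are characters)
    probs = [
        [0.5, 0.3, 0.2],
        [0.4, 0.4, 0.2],
        [0.1, 0.6, 0.3],
        [0.3, 0.3, 0.4],
    ]
    target = [1, 2]
    print("forward:", ctc_forward_prob(probs, target))
    print("brute force:", ctc_brute_force_prob(probs, target))
    print("loss:", ctc_loss(probs, target))
```

Running it prints the same value for the forward recursion and the brute-force sum. The brute force enumerates K^T paths and is there only as a check; library implementations such as PyTorch's `torch.nn.CTCLoss`, which takes log-probabilities as input, avoid both the enumeration and the underflow problem.
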

**Forward-backward algorithm**:
To compute the CTC loss efficiently, a dynamic-programming forward-backward algorithm is run over the blank-extended target sequence (the target with a blank inserted before, between, and after its labels):
- Forward algorithm: computes the probability of reaching each state (position in the extended target) at each frame
- Backward algorithm: computes the probability of going from each state to the end of the sequence
- Gradient computation: gradients are obtained by combining the forward and backward probabilities

This brings the cost down from exponential in T to O(T·|Y|).

## CRNN Training Strategy

### Data Preprocessing

**Image processing**:
- Size normalization: resize every image to a height of 32 pixels
- Aspect ratio preservation: scale the width proportionally so the original aspect ratio is kept
- Grayscale conversion: convert to a single-channel grayscale image
- Value normalization: scale pixel values to [0, 1] or [-1, 1]

**Data augmentation**:
- Geometric transforms: rotation, shear, perspective transforms
- Lighting changes: brightness and contrast adjustment
- Noise: Gaussian noise, salt-and-pepper noise
- Blur: motion blur, Gaussian blur

### Training Techniques

**Learning-rate schedule**:
- Initial learning rate: commonly set to 0.001
- Decay: exponential decay or step decay
- Warmup: use a smaller learning rate for the first few epochs

**Regularization**:
- Dropout: add dropout after the RNN layers
- Weight decay: L2 regularization to reduce overfitting
- Batch normalization: apply batch normalization in the CNN layers

**Optimizer choice**:
- Adam: adaptive learning rates, fast convergence
- RMSprop: well suited to RNN training
- SGD + momentum: the traditional but stable option

## Improvements and Optimizations of CRNN

### Architectural Improvements

**CNN improvements**:
- ResNet connections: residual connections improve training stability
- DenseNet connections: dense connections improve feature reuse
- Attention mechanism: introduce spatial attention into the CNN

**RNN improvements**:
- GRU replacement: use GRUs to reduce the number of parameters
- Transformer: replace RNNs with self-attention
- Multi-scale features: fuse features from different scales

### Efficiency Optimization

**Inference acceleration**:
- Model quantization: INT8 quantization reduces computational cost
- Model pruning: remove unimportant connections
- Knowledge distillation: train a small model to learn from a large one

**Memory optimization**:
- Gradient checkpointing: reduces memory usage during training
- Mixed precision: train with a mix of FP16 and FP32
- Dynamic graph optimization: optimize the structure of the computation graph

## Real-World Applications

### Handwritten Text Recognition

**Application scenarios**:
- Digitizing handwritten notes
- Automatic form filling
- Recognizing historical documents

**Technical characteristics**:
- Large variation in handwriting: requires strong feature extraction
- Connected strokes: the advantages of CTC are most evident here
- Importance of context: RNN sequence modeling is essential

### Printed Text Recognition

**Application scenarios**:
- Document digitization
- Ticket and receipt recognition
- Sign recognition

**Technical characteristics**:
- Standardized fonts: CNN feature extraction is relatively straightforward
- Typographic rules: layout structure can be exploited
- High accuracy requirements: careful model tuning is needed

### Scene Text Recognition

**Application scenarios**:
- Street View text recognition
- Product label recognition
- Road sign recognition

**Technical characteristics**:
- Complex backgrounds: robust feature extraction is required
- Large distortions: a robust architecture design is required
- Real-time requirements: efficient inference is needed

## Summary

As a classic deep-learning OCR architecture, CRNN effectively solves many problems of traditional OCR methods. Its end-to-end training, its segmentation-free design, and its adoption of CTC have all been important inspirations for later OCR technology.

**Key contributions**:
- End-to-end learning: simplifies the design of OCR systems
- Sequence modeling: effectively exploits the sequential structure of text
- CTC alignment: handles mismatched sequence lengths
- Simple architecture: easy to understand and implement

**Future directions**:
- Attention mechanisms: introduce attention to improve performance
- Transformers: replace RNNs with self-attention
- Multimodal fusion: integrate additional information such as language models
- Lightweight design: model compression for mobile devices

The success of CRNN demonstrates the potential of deep learning in OCR and offers valuable lessons on how to design effective end-to-end learning systems. In the next article, we will examine the mathematical details and implementation of the CTC loss function.