OCR Text Recognition Assistant

【Deep Learning OCR Series · 7】 CTC Loss Function and Training Techniques

The principles, implementation, and training techniques of the CTC loss function, together with its core strategy for solving sequence alignment problems. Covers the forward-backward algorithm, decoding strategies, and optimization methods.

## Introduction

Connectionist Temporal Classification (CTC) is an important breakthrough in sequence learning, particularly in OCR. CTC solves the fundamental problem of mismatched lengths between the input sequence and the output sequence, which makes end-to-end sequence learning possible. This article covers the mathematical principles of CTC, the implementation of its algorithms, and techniques for CTC training.

## Core Concepts of CTC

### The Sequence Alignment Problem

OCR tasks face the following challenges:

- **Length mismatch**: The length of the input image feature sequence differs from the length of the output text sequence. For example, a word of 3 characters may correspond to a feature sequence of 100 time steps.
- **Unknown positions**: The exact position of each character in the image is not known. Traditional methods require precise character segmentation, which is difficult in practice.
- **Hard character segmentation**: Cursive text, handwriting, and artistic fonts are difficult to segment accurately into individual characters.

### The CTC Solution

CTC addresses the sequence alignment problem with the following ideas:

- **Blank symbol**: A special blank label is used to handle alignment. The blank does not correspond to any output character and is used to separate consecutive repeated characters in the output.
- **Path probabilities**: The probabilities of all possible alignment paths are taken into account. Each path represents one possible step-by-step alignment.
- **Dynamic programming**: Path probabilities are computed efficiently with the forward-backward algorithm, which avoids enumerating every possible path.
## Mathematical Principles of CTC

### Basic Definitions

Given an input sequence $X = (x_1, x_2, \ldots, x_T)$ and a target sequence $Y = (y_1, y_2, \ldots, y_U)$, where $T \ge U$:

- **Label set**: $L = \{1, 2, \ldots, K\}$, containing $K$ character classes.
- **Extended label set**: $L_{ext} = L \cup \{\text{blank}\}$, which adds the blank label.
- **Alignment path**: a sequence of length $T$, $\pi = (\pi_1, \pi_2, \ldots, \pi_T)$, where $\pi_t \in L_{ext}$.

### Mapping Paths to Labels

CTC defines a mapping $B$ that converts an alignment path into an output label sequence:

1. Merge consecutive repeated labels
2. Remove all blank symbols

The order matters: merging first is what allows a blank to keep two identical characters apart.

**Mapping examples**:

- $\pi = (a, a, \text{blank}, b, \text{blank}, b) \rightarrow B(\pi) = (a, b, b)$
- $\pi = (\text{blank}, c, c, a, \text{blank}, t) \rightarrow B(\pi) = (c, a, t)$

### The CTC Loss Function

The CTC loss is defined as the negative logarithm of the sum of the probabilities of all paths that map to the target sequence $Y$:

$$\mathcal{L}_{CTC} = -\log P(Y \mid X) = -\log \sum_{\pi \in B^{-1}(Y)} P(\pi \mid X)$$

where $B^{-1}(Y)$ is the set of all paths that map to $Y$.

**Path probability**: Assuming the predictions at different time steps are conditionally independent, the probability of a path is

$$P(\pi \mid X) = \prod_{t=1}^{T} y_t^{\pi_t}$$

where $y_t^{k}$ is the probability that the network predicts label $k$ at time step $t$. The superscripted $y_t^{k}$ is a network output and should not be confused with the target labels $y_u$.

## The Forward-Backward Algorithm

### Forward Algorithm

The forward algorithm computes path probabilities from the start of the sequence up to the current position.

**Extended label sequence**: To simplify the computation, the target sequence $Y$ is extended to $Y_{ext}$ by inserting a blank before and after every character, so $|Y_{ext}| = 2U + 1$.
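The definitions above can be checked on tiny inputs with a brute-force sketch. It assumes the blank has index 0 and that `probs[t][k]` holds the network output $y_t^k$; the function names are illustrative and not taken from any library. Enumerating every path is exponential in $T$, so this is only a reference for toy cases, not a training implementation.

```python
import itertools
import math

BLANK = 0  # assumption: the blank label has index 0


def collapse(path, blank=BLANK):
    """Mapping B: merge consecutive repeats, then drop blanks."""
    out = []
    prev = None
    for label in path:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return tuple(out)


def extend_with_blanks(target, blank=BLANK):
    """Build Y_ext by placing a blank before and after every label."""
    ext = [blank]
    for label in target:
        ext += [label, blank]
    return ext


def brute_force_ctc_loss(probs, target, blank=BLANK):
    """-log P(Y|X), summing P(pi|X) over every path pi with B(pi) = Y."""
    num_steps, num_labels = len(probs), len(probs[0])
    total = 0.0
    for path in itertools.product(range(num_labels), repeat=num_steps):
        if collapse(path, blank) == tuple(target):
            # Path probability is the product of per-step probabilities.
            total += math.prod(probs[t][k] for t, k in enumerate(path))
    return -math.log(total)


# Two time steps, labels {blank, 1}, uniform outputs, target (1,).
# The matching paths are (1, 1), (1, blank) and (blank, 1).
uniform = [[0.5, 0.5], [0.5, 0.5]]
loss = brute_force_ctc_loss(uniform, [1])  # equals -log(0.75)
```

Each of the three matching paths has probability 0.25, so the total is 0.75, which is the kind of sum the forward algorithm computes without enumeration.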
**Initialization**:

- $\alpha_1(1) = y_1^{\text{blank}}$ (the path starts with a blank)
- $\alpha_1(2) = y_1^{Y_{ext}[2]}$ (the path starts with the first character)
- $\alpha_1(s) = 0$ for all other positions

**Recursion**: For $t > 1$ and position $s$ (terms with an out-of-range index are zero):

- If $Y_{ext}[s]$ is blank or equals the previous character, $Y_{ext}[s] = Y_{ext}[s-2]$:
  $\alpha_t(s) = (\alpha_{t-1}(s) + \alpha_{t-1}(s-1)) \times y_t^{Y_{ext}[s]}$
- Otherwise:
  $\alpha_t(s) = (\alpha_{t-1}(s) + \alpha_{t-1}(s-1) + \alpha_{t-1}(s-2)) \times y_t^{Y_{ext}[s]}$

### Backward Algorithm

The backward algorithm computes path probabilities from the current position to the end of the sequence. Here $\beta_t(s)$ covers time steps $t+1, \ldots, T$ and excludes the emission at step $t$.

**Initialization**:

- $\beta_T(|Y_{ext}|) = 1$ (the path ends on the final blank)
- $\beta_T(|Y_{ext}|-1) = 1$ (the path ends on the last character)
- $\beta_T(s) = 0$ for all other positions

**Recursion**: For $t < T$ and position $s$ (terms with an out-of-range index are zero):

- If $Y_{ext}[s]$ is blank or equals the next character, $Y_{ext}[s+2] = Y_{ext}[s]$:
  $\beta_t(s) = \beta_{t+1}(s) \times y_{t+1}^{Y_{ext}[s]} + \beta_{t+1}(s+1) \times y_{t+1}^{Y_{ext}[s+1]}$
- Otherwise:
  $\beta_t(s) = \beta_{t+1}(s) \times y_{t+1}^{Y_{ext}[s]} + \beta_{t+1}(s+1) \times y_{t+1}^{Y_{ext}[s+1]} + \beta_{t+1}(s+2) \times y_{t+1}^{Y_{ext}[s+2]}$

### Gradient Computation

**Total probability**:

$$P(Y \mid X) = \alpha_T(|Y_{ext}|) + \alpha_T(|Y_{ext}|-1)$$

**Gradient with respect to the label probabilities**:

$$\frac{\partial(-\log P(Y \mid X))}{\partial y_t^k} = -\frac{1}{P(Y \mid X)} \sum_{s:\, Y_{ext}[s]=k} \frac{\alpha_t(s)\,\beta_t(s)}{y_t^k}$$

## CTC Decoding Strategies

### Greedy Decoding

Greedy decoding picks the label with the highest probability at each time step:

$$\pi_t = \arg\max_k y_t^k$$

The mapping $B$ is then applied to obtain the final sequence.

- **Advantages**: Simple to compute and fast
- **Disadvantages**: The globally optimal solution may not be found

### Beam Search Decoding

Beam search keeps several candidate paths and extends the most promising ones at each step.

**Algorithm steps**:

1. Initialize: the candidate set contains the empty path
2. At each time step:
   - Extend all candidate paths
   - Keep the $K$ paths with the highest probability
3. Return the complete path with the highest probability

**Parameter tuning**:

- Beam width $K$: trades computational cost against decoding quality
- Length penalty: avoids a bias toward short sequences

### Prefix Beam Search

Prefix beam search tracks the probability of each path prefix, which avoids scoring paths with the same prefix more than once.

**Core idea**: Merge paths that share the same prefix, combining their probabilities, and keep only the most probable extensions.

## Training Techniques and Optimization

### Data Preprocessing

**Sequence length handling**:

- Dynamic batching: group sequences of similar length into the same batch
- Padding strategy: pad shorter sequences with a special symbol
- Truncation strategy: truncate overly long sequences where appropriate

**Label processing**:

- Character set standardization: unify character encoding and capitalization
- Special character handling: handle punctuation and whitespace
- Vocabulary construction: build a complete character vocabulary

### Training Strategy

**Curriculum learning**: Start training on easy samples and increase the difficulty gradually:

- Short sequences to long sequences
- Clear images to blurred images
- Standard fonts to handwritten fonts

**Data augmentation**:

- Geometric transforms: rotation, scaling, cropping
- Added noise: Gaussian noise, salt-and-pepper noise
- Lighting changes: brightness and contrast adjustment

**Regularization techniques**:

- Dropout: prevents overfitting
- Weight decay: L2 regularization
- Label smoothing: reduces overconfidence

### Hyperparameter Tuning

**Learning rate scheduling**:

- Warmup: use a small learning rate for the first few epochs
- Cosine annealing: the learning rate decays following a cosine function
- Adaptive tuning: adjust based on validation set performance

**Batch size selection**:

- Memory limits: take the GPU memory capacity into account
- Gradient stability: larger batches give more stable gradients
- Convergence speed: balance training speed against stability

## Practical Application Considerations

### Computational Optimization

**Memory optimization**:

- Gradient checkpointing: reduces the memory used to store forward-pass activations
- Mixed-precision training: reduces memory requirements with FP16
- Dynamic graph optimization: optimizes memory allocation for the computation graph

**Speed optimization**:

- Parallel computing: uses the parallel processing power of the GPU
- Algorithm optimization: uses an efficient implementation of the forward-backward algorithm
- Batch optimization: set the batch size appropriately

### Numerical Stability

**Probability computation**:

- Log-space computation: avoids the numerical underflow caused by multiplying many probabilities
- Numerical clipping: limits the range of probability values
- Normalization: ensures that the probability distribution is valid

**Gradient stability**:

- Gradient clipping: prevents exploding gradients
- Weight initialization: use an appropriate initialization strategy
- Batch normalization: stabilizes the training process

## Performance Evaluation

### Evaluation Metrics

**Character-level accuracy**:

$$\text{Accuracy}_{char} = \frac{\text{number of correctly recognized characters}}{\text{total number of characters}}$$

**Sequence-level accuracy**:

$$\text{Accuracy}_{seq} = \frac{\text{number of fully correct sequences}}{\text{total number of sequences}}$$

**Edit distance**: Measures the difference between the predicted sequence and the ground-truth sequence as the minimum number of insertions, deletions, and substitutions needed to turn one into the other.
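The metrics above can be sketched in a few lines of plain Python. One assumption is worth flagging: counting "correctly recognized characters" needs an alignment between prediction and reference, so this sketch approximates character-level accuracy as one minus the total edit distance divided by the total number of reference characters. The function names are illustrative only.

```python
def edit_distance(pred, truth):
    """Levenshtein distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(truth) + 1))
    for i, p in enumerate(pred, start=1):
        curr = [i]
        for j, t in enumerate(truth, start=1):
            cost = 0 if p == t else 1
            curr.append(min(prev[j] + 1,          # delete p from the prediction
                            curr[j - 1] + 1,      # insert t into the prediction
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def sequence_accuracy(preds, truths):
    """Fraction of predictions that match their reference exactly."""
    correct = sum(p == t for p, t in zip(preds, truths))
    return correct / len(truths)


def char_accuracy(preds, truths):
    """Assumed approximation: 1 - total edit distance / total reference length."""
    errors = sum(edit_distance(p, t) for p, t in zip(preds, truths))
    total_chars = sum(len(t) for t in truths)
    return max(0.0, 1.0 - errors / total_chars)


preds = ["cat", "dog"]
truths = ["cat", "dot"]
seq_acc = sequence_accuracy(preds, truths)  # one of two sequences is exact
chr_acc = char_accuracy(preds, truths)      # one substitution in six characters
```

Sequence-level accuracy is much stricter than the character-level figure, so reporting both gives a clearer picture of how close near-miss predictions are.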
### Error Analysis

**Common error types**:

- Character confusion: misrecognition of visually similar characters
- Repetition errors: CTC models tend to produce duplicated characters
- Length errors: the predicted sequence length is wrong

**Improvement strategies**:

- Hard example mining: focus on training samples with high error rates
- Post-processing correction: fix errors with a language model
- Ensembling: combine the predictions of multiple models

## Summary

The CTC loss function provides a powerful tool for sequence modeling, especially when alignment is a problem. By introducing a blank label and a dynamic programming algorithm, CTC enables end-to-end sequence learning and avoids complex preprocessing steps.

**Key points**:

- CTC solves the mismatch between input and output sequence lengths
- The forward-backward algorithm provides an efficient way to compute the probabilities
- A suitable decoding strategy is essential for final performance
- Training techniques and optimization strategies have a large effect on model performance

**Application recommendations**:

- Choose a decoding strategy that suits the specific task
- Pay attention to data preprocessing and augmentation strategies
- Focus on numerical stability and computational efficiency
- Optimize using domain knowledge

The successful application of CTC has laid an important foundation for deep learning in sequence modeling and has provided key support for progress in OCR technology.
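As a closing illustration of the points on efficient probability computation and numerical stability, here is a minimal sketch of the forward recursion in log space, followed by greedy decoding. It assumes the blank has index 0 and that `log_probs[t][k]` holds $\log y_t^k$; the names `ctc_loss_forward` and `greedy_decode` are illustrative, and a production system would more likely rely on a framework's built-in CTC loss.

```python
import math

NEG_INF = float("-inf")


def log_add(*xs):
    """Stable log(sum(exp(x))); -inf stands for probability zero."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_loss_forward(log_probs, target, blank=0):
    """-log P(Y|X) computed with the forward recursion in log space."""
    ext = [blank]
    for label in target:
        ext += [label, blank]
    num_pos, num_steps = len(ext), len(log_probs)
    alpha = [NEG_INF] * num_pos
    alpha[0] = log_probs[0][blank]
    if num_pos > 1:
        alpha[1] = log_probs[0][ext[1]]
    for t in range(1, num_steps):
        new = [NEG_INF] * num_pos
        for s in range(num_pos):
            terms = [alpha[s]]
            if s >= 1:
                terms.append(alpha[s - 1])
            # Skipping a blank is allowed only between two different characters.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[s - 2])
            new[s] = log_add(*terms) + log_probs[t][ext[s]]
        alpha = new
    tail = [alpha[-1]]  # path ends on the final blank
    if num_pos > 1:
        tail.append(alpha[-2])  # path ends on the last character
    return -log_add(*tail)


def greedy_decode(log_probs, blank=0):
    """Take the best label per step, merge repeats, then drop blanks."""
    out, prev = [], None
    for row in log_probs:
        best = max(range(len(row)), key=row.__getitem__)
        if best != prev and best != blank:
            out.append(best)
        prev = best
    return out


# Two time steps, labels {blank, 1}, uniform outputs, target [1].
uniform = [[math.log(0.5)] * 2 for _ in range(2)]
loss = ctc_loss_forward(uniform, [1])  # equals -log(0.75)
```

Working with sums of log-probabilities keeps long sequences from underflowing to zero, which a direct product of probabilities would do after enough time steps.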