OCR Text Recognition Assistant

【Deep Learning OCR Series · 7】CTC Loss Function and Training Techniques

The principles, implementation, and training techniques of the CTC loss function, the core technique for solving the sequence alignment problem. A deep dive into the forward-backward algorithm, decoding strategies, and optimization methods.

## Introduction

Connectionist Temporal Classification (CTC) is a major advance in deep-learning sequence modeling, particularly in OCR. CTC handles the mismatch between the length of the input sequence and the length of the output sequence, which makes end-to-end sequence learning possible. This article covers the mathematical principles of CTC, the implementation of its algorithms, and techniques for optimizing training.

## CTC Basic Concepts

### The Sequence Alignment Problem

OCR tasks pose the following challenges:

- **Length mismatch**: The length of the input image feature sequence differs from the length of the output text sequence. For example, a word of 3 characters may correspond to a feature sequence of 100 time steps.
- **Uncertain position**: The exact position of each character in the image is unknown. Traditional methods require precise character segmentation, which is difficult in practical applications.
- **Difficult character segmentation**: Handwritten text, cursive script, and artistic fonts are hard to split cleanly into individual characters.

### CTC's Solution

CTC addresses the alignment problem in the following ways:

- **Blank label**: A special blank label handles alignment. The blank corresponds to no output character; it separates repeated characters and fills the time steps that emit nothing.
- **Path probability**: CTC sums over the probability of every possible alignment path. Each path assigns one label to each time step.
- **Dynamic programming**: The forward-backward algorithm computes these path probabilities efficiently, without enumerating every possible path.

## Mathematical Principles of CTC

### Basic Definitions

Given an input sequence X = (x₁, x₂, ..., x_T) and a target sequence Y = (y₁, y₂, ..., y_U), where T ≥ U:

- **Label set**: L = {1, 2, ..., K}, containing K character classes.
- **Extended label set**: L_ext = L ∪ {blank}, which adds the blank label.
- **Alignment path**: a sequence of length T, π = (π₁, π₂, ..., π_T), where π_t ∈ L_ext.

### Mapping Paths to Labels

CTC defines a mapping B that converts an alignment path into an output label sequence:

1. Merge consecutive repeated labels
2. Remove all blank labels

The order matters: the blanks must still be present during merging, so that a genuinely repeated character separated by a blank survives as two characters.

**Mapping examples**:

- π = (a, a, blank, b, blank, b, b) → B(π) = (a, b, b)
- π = (blank, c, c, a, blank, t) → B(π) = (c, a, t)

### CTC Loss Function

The CTC loss is the negative logarithm of the summed probability of all paths that map to the target sequence Y:

L_CTC = -log P(Y|X) = -log Σ_{π∈B⁻¹(Y)} P(π|X)

where B⁻¹(Y) is the set of all paths that map to Y.

**Path probability**: Assuming the predictions at different time steps are conditionally independent, the probability of a path is:

P(π|X) = ∏_{t=1}^{T} y_t^{π_t}

where y_t^{π_t} is the probability that time step t predicts label π_t. In the formulas below, y with a superscript always denotes a network output probability, while Y_ext[s] denotes a label of the (extended) target.

## Forward-Backward Algorithm

### Forward Algorithm

The forward algorithm computes the probability of paths from the start of the sequence up to the current position. Here α_t(s) is the total probability of all partial paths that reach position s of the extended label sequence at time step t.

**Extended label sequence**: To simplify the computation, extend the target Y to Y_ext by inserting a blank before and after every character, so that |Y_ext| = 2U + 1.

**Initialization**:

- α₁(1) = y₁^{blank} (the path starts with a blank)
- α₁(2) = y₁^{Y_ext[2]} (the path starts with the first character)
- α₁(s) = 0 for all other positions

**Recursion**: For t > 1 and position s:

- If Y_ext[s] is blank or Y_ext[s] = Y_ext[s-2]: α_t(s) = (α_{t-1}(s) + α_{t-1}(s-1)) × y_t^{Y_ext[s]}
- Otherwise: α_t(s) = (α_{t-1}(s) + α_{t-1}(s-1) + α_{t-1}(s-2)) × y_t^{Y_ext[s]}

### Backward Algorithm

The backward algorithm computes the probability of paths from the current position to the end of the sequence.
**Initialization**:

- β_T(|Y_ext|) = 1 (the path ends on the final blank)
- β_T(|Y_ext|-1) = 1 (the path ends on the last character)
- β_T(s) = 0 for all other positions

**Recursion**: For t < T and position s, where β_t(s) covers the emissions at steps t+1, ..., T only:

- If Y_ext[s] is blank or Y_ext[s+2] = Y_ext[s]: β_t(s) = β_{t+1}(s) × y_{t+1}^{Y_ext[s]} + β_{t+1}(s+1) × y_{t+1}^{Y_ext[s+1]}
- Otherwise: β_t(s) = β_{t+1}(s) × y_{t+1}^{Y_ext[s]} + β_{t+1}(s+1) × y_{t+1}^{Y_ext[s+1]} + β_{t+1}(s+2) × y_{t+1}^{Y_ext[s+2]}

Each term pairs a successor position with its own emission probability at step t+1. With this convention, Σ_s α_t(s) × β_t(s) = P(Y|X) at every time step t.

### Gradient Computation

**Total probability**: P(Y|X) = α_T(|Y_ext|) + α_T(|Y_ext|-1)

**Gradient with respect to the label probabilities**:

∂(-ln P(Y|X))/∂y_t^k = -1/P(Y|X) × Σ_{s:Y_ext[s]=k} (α_t(s) × β_t(s))/y_t^k

## CTC Decoding Strategies

### Greedy Decoding

Greedy decoding picks the label with the highest probability at each time step:

π_t = argmax_k y_t^k

It then applies the mapping B to obtain the final sequence.

**Disadvantage**: It is not guaranteed to find the globally optimal sequence.

### Beam Search

Beam search maintains several candidate paths and, at each step, extends the most promising ones.

**Algorithm steps**:

1. Initialization: the candidate set contains only the empty path
2. For each time step:
   - Extend all candidate paths
   - Keep the K paths with the highest probability
3. Return the complete path with the highest probability

**Parameter tuning**:

- Beam width K (the number of candidates kept, distinct from the K character classes above): balances computational cost against decoding quality
- Length penalty: avoids favoring short sequences

### Prefix Beam Search

Prefix beam search tracks the probability of output prefixes rather than of individual paths, so that multiple paths sharing the same prefix are not counted as separate candidates.

**Core idea**: Merge the paths that share the same prefix, and keep only the most probable prefixes for extension.
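The mapping B and greedy decoding described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `collapse` and `greedy_decode`, the use of integer labels with the blank at index 0, and the nested-list layout `probs[t][k]` are all assumptions made for the example.

```python
from itertools import groupby

BLANK = 0  # assumed index of the blank label


def collapse(path, blank=BLANK):
    """Mapping B: merge consecutive repeated labels, then drop blanks."""
    merged = [label for label, _ in groupby(path)]
    return [label for label in merged if label != blank]


def greedy_decode(probs, blank=BLANK):
    """Pick the most probable label at every time step, then apply B.

    probs[t][k] is the probability of label k at time step t.
    """
    best_path = [max(range(len(row)), key=row.__getitem__) for row in probs]
    return collapse(best_path, blank)


# The first mapping example from the text, with a=1 and b=2:
collapse([1, 1, 0, 2, 0, 2, 2])  # → [1, 2, 2]
```

A two-step example shows why greedy decoding can miss the best sequence. With `probs = [[0.6, 0.4], [0.6, 0.4]]`, the greedy path is (blank, blank), so `greedy_decode(probs)` returns the empty sequence, whose probability is 0.36. The sequence (1) is reached by three paths, (1, blank), (blank, 1), and (1, 1), with a total probability of 0.24 + 0.24 + 0.16 = 0.64, which is what a search over prefixes would prefer.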
## Training Techniques and Optimization

### Data Preprocessing

**Sequence length handling**:

- Dynamic batching: group sequences of similar length into the same batch
- Padding strategy: pad short sequences with a special symbol
- Truncation strategy: truncate overly long sequences sensibly

**Label preprocessing**:

- Character set standardization: unify character encoding and capitalization
- Special character handling: handle punctuation marks and spaces
- Vocabulary construction: build a complete character vocabulary

### Training Strategies

**Curriculum learning**: Start training with easy samples and increase the difficulty gradually:

- Short sequences to long sequences
- Clear images to blurry images
- Regular fonts to handwritten fonts

**Data augmentation**:

- Geometric transformations: rotation, scaling, shearing
- Added noise: Gaussian noise, salt-and-pepper noise
- Lighting changes: brightness and contrast adjustment

**Regularization techniques**:

- Dropout: prevents overfitting
- Weight decay: L2 regularization
- Label smoothing: reduces overconfidence

### Hyperparameter Tuning

**Learning rate scheduling**:

- Warm-up: use a small learning rate for the first few epochs
- Cosine annealing: decay the learning rate along a cosine curve
- Adaptive adjustment: adjust the learning rate based on validation-set performance

**Batch size selection**:

- Memory limits: take the GPU memory capacity into account
- Gradient stability: larger batches give more stable gradients
- Convergence speed: balance training speed against stability

## Practical Implementation Considerations

### Computational Optimization

**Memory optimization**:

- Gradient checkpointing: reduces the memory needed to store forward-pass activations
- Mixed-precision training: reduces memory requirements with FP16
- Dynamic graph optimization: optimizes memory allocation for the computation graph

**Speed optimization**:

- Parallel computing: uses the parallel processing capabilities of the GPU
- Algorithm optimization: uses an efficient implementation of the forward-backward algorithm
- Batch optimization: sets the batch size appropriately

### Numerical Stability

**Probability computation**:

- Log-space computation: avoids the numerical underflow caused by multiplying many probabilities
- Numerical clipping: limits the range of the probability values
- Normalization: ensures that the outputs form a valid probability distribution

**Gradient stability**:

- Gradient clipping: prevents exploding gradients
- Weight initialization: use a suitable initialization scheme
- Batch normalization: stabilizes the training process

## Performance Evaluation

### Evaluation Metrics

**Character-level accuracy**:

Accuracy_char = number of correctly recognized characters / total number of characters

**Sequence-level accuracy**:

Accuracy_seq = number of fully correct sequences / total number of sequences

**Edit distance**: Measures the difference between the predicted sequence and the ground-truth sequence as the minimum number of insertion, deletion, and substitution operations needed to turn one into the other.

### Error Analysis

**Common error types**:

- Character confusion: similar-looking characters are misidentified
- Duplicate errors: CTC tends to produce duplicated characters
- Length errors: the predicted sequence length is inaccurate

**Improvement strategies**:

- Hard example mining: focus training on samples with high error rates
- Post-processing: correct errors with a language model
- Ensemble methods: combine the predictions of several models

## Summary

The CTC loss function is a powerful tool for sequence modeling, especially when alignment is the central difficulty. By introducing the blank label and a dynamic programming algorithm, CTC achieves end-to-end sequence learning and avoids complex preprocessing steps.
**Key Takeaways**:

- CTC solves the mismatch between input and output sequence lengths
- The forward-backward algorithm provides an efficient way to compute the sequence probability
- A suitable decoding strategy is important for final performance
- Training and optimization techniques have a large effect on model performance

**Practical Recommendations**:

- Choose the decoding strategy to fit the specific task
- Pay attention to data preprocessing and augmentation
- Focus on numerical stability and computational efficiency
- Tune post-processing using domain knowledge

The success of CTC laid an important foundation for deep learning in sequence modeling and has provided key support for the progress of OCR technology.
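As a closing illustration of the forward recursion and the log-space advice from this article, here is a minimal sketch in plain Python. It assumes integer labels with the blank at index 0 and an input `log_probs[t][k]` holding the log-probability of label k at time step t; the names `ctc_loss` and `logsumexp` are chosen for this example and do not come from any particular library. It computes only the loss value, not the gradient.

```python
import math

NEG_INF = float("-inf")


def logsumexp(*xs):
    """log(sum(exp(x))) computed stably; returns -inf if every x is -inf."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_loss(log_probs, target, blank=0):
    """Return -log P(Y|X) using the forward recursion in log space."""
    # Build Y_ext: a blank before and after every target label.
    ext = [blank]
    for label in target:
        ext += [label, blank]
    S, T = len(ext), len(log_probs)

    # Initialization: a path may start on the first blank or the first label.
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][blank]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, T):
        prev, alpha = alpha, [NEG_INF] * S
        for s in range(S):
            terms = [prev[s]]
            if s >= 1:
                terms.append(prev[s - 1])
            # Skipping over a blank is allowed only between different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(prev[s - 2])
            alpha[s] = logsumexp(*terms) + log_probs[t][ext[s]]

    # A valid path ends on the last label or on the final blank.
    total = logsumexp(alpha[-1], alpha[-2]) if S > 1 else alpha[-1]
    return -total


probs = [[0.6, 0.4], [0.6, 0.4]]
log_probs = [[math.log(p) for p in row] for row in probs]
loss = ctc_loss(log_probs, [1])  # -log(0.64), about 0.446
```

In this example the target (1) is reached by the paths (1, blank), (blank, 1), and (1, 1), whose probabilities sum to 0.64, which matches the value produced by the recursion. Working with sums of logarithms rather than products of probabilities keeps the computation stable for the long sequences typical of OCR.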