OCR Text Recognition Assistant

【Deep Learning OCR Series · 6】 In-depth analysis of CRNN architecture

A detailed analysis of the CRNN architecture, covering CNN feature extraction, RNN sequence modeling, and the CTC loss function. Understand how CNNs and RNNs combine into a single recognizer.

## Introduction

CRNN (Convolutional Recurrent Neural Network) is one of the most important architectures in deep learning OCR, proposed in 2015 by Xiang Bai's group (Shi, Bai, and Yao). CRNN combines the feature extraction ability of convolutional neural networks (CNNs) with the sequence modeling ability of recurrent neural networks (RNNs) to achieve end-to-end text recognition. This article analyzes the design of the CRNN architecture, its working principles, its training methods, and its specific applications in OCR, giving readers a thorough technical understanding.

## Overview of CRNN Architecture

### Design Motivation

Before CRNN, OCR systems typically worked step by step: characters were detected and segmented first, and then each character was recognized individually. This approach has the following problems:

**Limitations of Traditional Methods**:
- Error propagation: errors in character segmentation directly affect the recognition result
- Complexity: requires designing complex character segmentation algorithms
- Poor robustness: sensitive to character spacing and font changes
- Inability to handle connected strokes: connected strokes in handwriting are hard to segment

**CRNN's Ideas**:
- End-to-end learning: maps directly from images to text sequences
- No segmentation: avoids the complexity of character segmentation
- Sequence modeling: uses RNNs to model dependencies between characters
- CTC alignment: handles the length mismatch between input and output sequences

### Overall Architecture

The CRNN architecture consists of three main components:

**1. Convolutional Layers**:
- Function: extract a feature sequence from the input image
- Input: a text line image (fixed height, variable width)
- Output: a sequence of feature maps

**2. Recurrent Layers**:
- Function: model contextual dependencies within the feature sequence
- Input: the feature sequence extracted by the CNN
- Output: a feature sequence carrying contextual information

**3. Transcription Layer**:
- Function: convert the feature sequence into a text sequence
- Method: CTC (Connectionist Temporal Classification)
- Output: the final text recognition result

## Detailed Explanation of the Convolutional Layers

### Feature Extraction Strategy

The convolutional layers of CRNN are designed specifically for text recognition:

**Network Structure Characteristics**:
- Depth: seven convolutional layers are used
- Small convolution kernels: 3×3 kernels are used for the most part
- Pooling strategy: pooling is used sparingly in the width direction

**Specific Network Configuration**:
- Input: 32×W×1 (height 32, width W, one channel)
- Conv1: 64 kernels of size 3×3, stride 1, padding 1
- MaxPool1: 2×2 pooling, stride 2
- Conv2: 128 kernels of size 3×3, stride 1, padding 1
- MaxPool2: 2×2 pooling, stride 2
- Conv3: 256 kernels of size 3×3, stride 1, padding 1
- Conv4: 256 kernels of size 3×3, stride 1, padding 1
- MaxPool3: 2×1 pooling, stride (2,1)
- Conv5: 512 kernels of size 3×3, stride 1, padding 1, followed by BatchNorm + ReLU
- Conv6: 512 kernels of size 3×3, stride 1, padding 1, followed by BatchNorm + ReLU
- MaxPool4: 2×1 pooling, stride (2,1)
- Conv7: 512 kernels of size 2×2, stride 1, padding 0
- Output: 512×1×(W/4 − 1), that is, roughly W/4 columns (the unpadded 2×2 kernel of Conv7 trims one column)

### Key Design Considerations

**Height Compression Strategy**:
- Goal: compress the image to a height of 1 pixel
- Method: reduce the height gradually through several pooling layers
- Rationale: the height of a text line carries comparatively little information

**Width Preservation Strategy**:
- Goal: preserve the image width as far as possible
- Method: limit pooling in the width direction
- Rationale: the sequence information of text lies mainly along the width direction

**Feature Map Conversion**:

The output of the convolutional layers must be converted into a sequence before it reaches the recurrent layers:
- Raw output: C×H×W (channels × height × width)
- After conversion: W×C (sequence length × feature dimension)
- Method: take the feature vector at each width position as one time step

## Detailed Explanation of the Recurrent Layers

### RNN Selection

CRNN typically uses bidirectional LSTMs as its recurrent layers:

**Advantages of Bidirectional LSTM**:
- Contextual information: uses both forward and backward context
- Long-range dependencies: LSTMs can handle long-distance dependencies
- Gradient stability: mitigates the vanishing gradient problem

**Network Configuration**:
- Input: W×512 (sequence length × feature dimension)
- BiLSTM1: 256 hidden units (128 forward + 128 backward)
- BiLSTM2: 256 hidden units (128 forward + 128 backward)
- Output: W×256 (sequence length × hidden dimension)

### Sequence Modeling Mechanism

**Temporal Dependency Modeling**:

The RNN layers capture temporal dependencies between characters:
- Information about preceding characters helps recognize the current character
- Information about following characters can also provide useful context
- Information about the whole word or phrase helps resolve ambiguity

**Feature Enhancement**:

Features processed by the RNN have the following properties:
- Context-sensitive: the features at each position carry contextual information
- Temporal consistency: features at adjacent positions show a degree of continuity
- Semantic richness: they combine visual features with sequence features

## Detailed Explanation of the Transcription Layer

### CTC Mechanism

CTC (Connectionist Temporal Classification) is a key component of CRNN:

**Role of CTC**:
- Solves the alignment problem: the input sequence length does not match the output sequence length
- End-to-end training: no character-level alignment annotation is needed
- Handles duplicates: treats repeated characters correctly

**How CTC Works**:
1. Extend the label set: add a blank label to the original character set
2. Path enumeration: enumerate all possible alignment paths
3. Path probability: compute the probability of each path
4. Marginalization: sum the probabilities of all paths to obtain the probability of the sequence

### CTC Loss Function

**Mathematical Formulation**:

Given an input sequence X and a target sequence Y, the CTC loss is defined as:

$$L_{\text{CTC}} = -\log P(Y \mid X)$$

where $P(Y \mid X)$ is obtained by summing over all possible alignment paths:

$$P(Y \mid X) = \sum_{\pi \in B^{-1}(Y)} P(\pi \mid X)$$

Here $B^{-1}(Y)$ denotes the set of all paths that map to the target sequence Y.

**Forward-Backward Algorithm**:

To compute the CTC loss efficiently, a forward-backward algorithm based on dynamic programming is used:
- Forward algorithm: computes the probability of reaching each state
- Backward algorithm: computes the probability of going from each state to the end
- Gradient computation: gradients are derived from the forward and backward probabilities

## CRNN Training Strategy

### Data Preprocessing

**Image Preprocessing**:
- Size normalization: scale the image height to 32 pixels
- Aspect ratio preservation: keep the aspect ratio of the original image
- Grayscale conversion: convert to a single-channel grayscale image
- Value normalization: normalize pixel values to [0,1] or [-1,1]

**Data Augmentation**:
- Geometric transformations: rotation, skew, perspective transformation
- Illumination changes: brightness and contrast adjustment
- Added noise: Gaussian noise, salt-and-pepper noise
- Blur: motion blur, Gaussian blur

### Training Techniques

**Learning Rate Scheduling**:
- Initial learning rate: typically set to 0.001
- Decay strategy: exponential decay or step decay
- Warm-up: use a smaller learning rate for the first few epochs

**Regularization Techniques**:
- Dropout: add dropout after the RNN layers
- Weight decay: L2 regularization to prevent overfitting
- Batch normalization: use batch normalization in the CNN layers

**Optimizer Selection**:
- Adam: adaptive learning rate, fast convergence
- RMSprop: well suited to RNN training
- SGD + Momentum: the traditional but stable choice

## Variants and Improvements of CRNN

### Architecture Improvements

**CNN Improvements**:
- ResNet connections: add residual connections to improve training stability
- DenseNet structure: dense connections improve feature reuse
- Attention mechanism: introduce spatial attention into the CNN

**RNN Improvements**:
- GRU replacement: use GRUs to reduce the number of parameters
- Transformer: replace the RNN with self-attention
- Multi-scale features: fuse features from different scales

### Performance Optimization

**Inference Acceleration**:
- Model quantization: INT8 quantization reduces computation
- Model pruning: remove unimportant connections
- Knowledge distillation: train a small model to learn from a large one

**Memory Optimization**:
- Gradient checkpointing: reduce the memory footprint during training
- Mixed precision: train with FP16
- Dynamic graph optimization: optimize the structure of the computation graph

## Practical Applications

### Handwritten Text Recognition

**Application Scenarios**:
- Digitizing handwritten notes
- Automatic form filling
- Recognition of historical documents

**Technical Characteristics**:
- Large character variation: requires strong feature extraction
- Connected stroke handling: the benefit of the CTC mechanism is clear
- Context matters: the RNN's sequence modeling ability is essential

### Printed Text Recognition

**Application Scenarios**:
- Document digitization
- Ticket information extraction
- Signage recognition

**Technical Characteristics**:
- Font regularity: CNN feature extraction is comparatively easy
- Typographic rules: layout information can be exploited
- High accuracy requirements: requires careful fine-tuning

### Scene Text Recognition

**Application Scenarios**:
- Street view text recognition
- Product label recognition
- Traffic sign recognition

**Technical Characteristics**:
- Complex backgrounds: require robust feature extraction
- Severe distortion: requires a robust architecture design
- Real-time requirements: require efficient inference

## Summary

As a classic deep learning OCR architecture, CRNN successfully solved many problems of traditional OCR methods. Its end-to-end training, its segmentation-free design, and its use of the CTC mechanism all provided important inspiration for the later development of OCR technology.

**Key Contributions**:
- End-to-end learning: simplified the design of OCR systems
- Sequence modeling: makes effective use of the sequential nature of text
- CTC alignment: solved the sequence length mismatch problem
- Simple architecture: easy to understand and implement

**Directions for Development**:
- Attention mechanisms: introduce attention to improve performance
- Transformer: replace RNNs with self-attention
- Multimodal fusion: incorporate other information such as language models
- Lightweight design: model compression for mobile devices

The success of CRNN demonstrates the great potential of deep learning in OCR and offers valuable experience in designing effective end-to-end learning systems. In the next article, we will discuss the mathematical principles and implementation details of the CTC loss function.
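
As a quick check of the convolutional configuration listed earlier, the following sketch traces the feature map shape through each layer using the standard output-size formula for convolution and pooling. It assumes the 2×1 pooling windows are 2 high and 1 wide with no padding; `out_size`, `LAYERS`, and `trace_shapes` are names made up for this illustration. Under those assumptions the height collapses to 1 and the width comes out at W/4 − 1, close to the W/4 usually quoted.

```python
def out_size(n, kernel, stride, pad):
    """Output length of one spatial dimension after a conv or pooling layer."""
    return (n + 2 * pad - kernel) // stride + 1

# (name, out_channels or None for pooling, kernel (h, w), stride (h, w), padding (h, w))
LAYERS = [
    ("Conv1", 64, (3, 3), (1, 1), (1, 1)),
    ("MaxPool1", None, (2, 2), (2, 2), (0, 0)),
    ("Conv2", 128, (3, 3), (1, 1), (1, 1)),
    ("MaxPool2", None, (2, 2), (2, 2), (0, 0)),
    ("Conv3", 256, (3, 3), (1, 1), (1, 1)),
    ("Conv4", 256, (3, 3), (1, 1), (1, 1)),
    ("MaxPool3", None, (2, 1), (2, 1), (0, 0)),  # assumed: window 2 high, 1 wide
    ("Conv5", 512, (3, 3), (1, 1), (1, 1)),
    ("Conv6", 512, (3, 3), (1, 1), (1, 1)),
    ("MaxPool4", None, (2, 1), (2, 1), (0, 0)),  # assumed: window 2 high, 1 wide
    ("Conv7", 512, (2, 2), (1, 1), (0, 0)),
]

def trace_shapes(height, width, channels=1):
    """Return (layer name, channels, height, width) after every layer."""
    shapes = [("Input", channels, height, width)]
    for name, out_ch, (kh, kw), (sh, sw), (ph, pw) in LAYERS:
        if out_ch is not None:  # pooling layers keep the channel count
            channels = out_ch
        height = out_size(height, kh, sh, ph)
        width = out_size(width, kw, sw, pw)
        shapes.append((name, channels, height, width))
    return shapes

if __name__ == "__main__":
    for name, c, h, w in trace_shapes(32, 100):
        print(f"{name:9s} {c:4d} x {h:2d} x {w:3d}")
```

For a 32×100 input the last row is 512 × 1 × 24, so the recurrent layers would see 24 time steps.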
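
The feature map conversion described in the convolutional section (C×H×W to W×C) can be illustrated with plain nested lists. This is only a sketch: `map_to_sequence` is a hypothetical helper, and a real implementation would do the same thing with a tensor squeeze and transpose.

```python
def map_to_sequence(feature_map):
    """Convert a C x 1 x W feature map (nested lists) into W vectors of length C."""
    channels = len(feature_map)
    height = len(feature_map[0])
    if height != 1:
        raise ValueError("expected the CNN to have collapsed the height to 1")
    width = len(feature_map[0][0])
    # One time step per width position; each step holds all C channel values.
    return [[feature_map[c][0][w] for c in range(channels)] for w in range(width)]

fmap = [[[1, 2, 3]], [[4, 5, 6]]]  # toy example: C=2, H=1, W=3
sequence = map_to_sequence(fmap)
print(sequence)  # [[1, 4], [2, 5], [3, 6]]
```

Each inner list is the feature vector that one time step of the recurrent layer would receive.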
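
To make the CTC path-to-label mapping B concrete, here is a small sketch of the collapse rule (merge consecutive repeats, then drop blanks). It also includes greedy best-path decoding, which is one common way to read out a result and is an assumption of this example rather than something the article specifies. The blank symbol `-` and the function names are illustrative.

```python
BLANK = "-"  # assumed blank symbol for this illustration

def collapse(path, blank=BLANK):
    """The mapping B: merge consecutive repeats, then drop blanks."""
    out = []
    prev = None
    for symbol in path:
        if symbol != prev and symbol != blank:
            out.append(symbol)
        prev = symbol
    return "".join(out)

def greedy_decode(probs, alphabet, blank=BLANK):
    """Pick the most likely symbol at each time step, then collapse the path."""
    path = [alphabet[max(range(len(row)), key=row.__getitem__)] for row in probs]
    return collapse(path, blank)

print(collapse("-hh-el-loo"))  # hello
print(collapse("aa-a"))        # aa

alphabet = ["-", "a", "b"]
probs = [
    [0.1, 0.8, 0.1],    # a
    [0.2, 0.7, 0.1],    # a
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.2, 0.7],    # b
]
print(greedy_decode(probs, alphabet))  # ab
```

The blank between the two `l` symbols is what allows a doubled letter to survive the merge step, which is how CTC handles repeated characters.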
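
The forward half of the forward-backward algorithm is enough to evaluate P(Y | X) and therefore the CTC loss. The following is a minimal, unoptimized sketch that works on plain probabilities; production implementations work in log space for numerical stability and also compute the backward pass for gradients. The function names and the choice of index 0 for the blank are assumptions of this example.

```python
import math

def ctc_probability(probs, target, blank=0):
    """P(Y | X) via the CTC forward recursion.

    probs[t][k] is the probability of label k at time step t;
    target is a list of label indices without blanks.
    """
    # Extended target: blank, y1, blank, y2, ..., blank
    ext = [blank]
    for label in target:
        ext += [label, blank]
    T, S = len(probs), len(ext)

    # A path may start on the leading blank or on the first label.
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]                 # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]        # advance by one symbol
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # Valid paths end on the last label or on the trailing blank.
    return alpha[-1] + (alpha[-2] if S > 1 else 0.0)

def ctc_loss(probs, target, blank=0):
    """L_CTC = -log P(Y | X)."""
    return -math.log(ctc_probability(probs, target, blank))

# Two time steps, labels {0: blank, 1: 'a'}, uniform probabilities.
# Paths that collapse to "a": (a,a), (a,-), (-,a), each with probability 0.25.
uniform = [[0.5, 0.5], [0.5, 0.5]]
print(ctc_probability(uniform, [1]))  # 0.75
```

The recursion visits T × (2|Y| + 1) states instead of enumerating every alignment path, which is what makes the marginalization tractable.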
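
The image preprocessing steps from the training section (fixed height of 32, preserved aspect ratio, normalized pixel values) reduce to a little arithmetic. The sketch below assumes 8-bit grayscale pixels and leaves the actual resampling to an image library; `target_size` and `normalize` are hypothetical helper names.

```python
def target_size(width, height, target_height=32):
    """New (width, height) that fixes the height and keeps the aspect ratio."""
    scale = target_height / height
    return max(1, round(width * scale)), target_height

def normalize(pixel, signed=True):
    """Map an 8-bit pixel value to [-1, 1] (signed) or [0, 1]."""
    x = pixel / 255.0
    return 2.0 * x - 1.0 if signed else x

print(target_size(200, 64))  # (100, 32)
print(normalize(0), normalize(255))  # -1.0 1.0
```

Because the width varies per image, batches are usually padded to a common width, which is an implementation detail the article does not cover.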
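
The learning rate schedule from the training section (initial rate 0.001, warm-up, then decay) might look like the sketch below, which combines a linear warm-up with step decay. Only the base rate of 0.001 comes from the article; `warmup_epochs`, `step_size`, and `gamma` are illustrative values chosen for this example.

```python
def learning_rate(epoch, base_lr=1e-3, warmup_epochs=3, step_size=20, gamma=0.1):
    """Linear warm-up followed by step decay.

    warmup_epochs, step_size and gamma are illustrative, not taken from the article.
    """
    if epoch < warmup_epochs:
        # Ramp up linearly and reach base_lr on the last warm-up epoch.
        return base_lr * (epoch + 1) / warmup_epochs
    # After warm-up, multiply by gamma once every step_size epochs.
    return base_lr * gamma ** ((epoch - warmup_epochs) // step_size)

if __name__ == "__main__":
    for epoch in (0, 1, 2, 3, 22, 23, 43):
        print(epoch, learning_rate(epoch))
```

Exponential decay, the other option the article mentions, would replace the integer division with a per-epoch multiplication by a factor slightly below 1.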