DISTRIBUTED FEATURE EXTRACTION WITH AUTOENCODERS FOR BREAST CANCER DETECTION
Volume 7, Article e2025.03, 2025, Pages 1-10
Elshan Naghizade
Azerbaijan State Oil and Industry University, Baku, Azerbaijan
Abstract
In this study, a distributed feature extraction pipeline was developed for breast cancer detection using autoencoders. Mammogram images were first derived from the RSNA dataset by converting DICOM files into PNG format. Twelve convolutional autoencoders were trained, where all layers were convolutional except for the bottleneck layer, which was defined as a fully connected layer. This layer served as the compressed feature vector. Variations across models were introduced by modifying the number of convolutional layers in encoders/decoders (3, 4, or 5) and the dimensionality of the feature vector (128, 256, 512, or 1024). Training was conducted using mean squared error loss in a synchronous multi-worker setup on a four-node CPU cluster utilizing the Keras framework. After training, the extracted features were evaluated using three classification models: logistic regression, XGBoost, and CatBoost. The performance of each feature vector configuration was assessed based on accuracy, precision, recall, and F1-score. Through comparative analysis, the effectiveness of different vector sizes and model complexities in representing diagnostic features was determined. This approach demonstrated the feasibility of scalable, distributed feature extraction for high-resolution medical imaging tasks, offering a practical framework for future breast cancer detection systems.
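The experimental grid described in the abstract can be illustrated with a short, standard-library-only Python sketch. It is not the author's code: it enumerates the twelve autoencoder configurations (three depths times four feature-vector sizes) and gives the textbook definitions of the four reported metrics. The names `autoencoder_grid`, `classification_metrics`, and the `ae_d{depth}_f{dim}` labels are hypothetical, chosen only for this illustration. The abstract does not specify the input resolution, filter counts, or optimizer, so these are left out.

```python
from itertools import product

# Values taken from the abstract.
CONV_DEPTHS = (3, 4, 5)               # conv layers in each encoder/decoder
FEATURE_DIMS = (128, 256, 512, 1024)  # size of the dense bottleneck vector
CLASSIFIERS = ("logistic_regression", "xgboost", "catboost")


def autoencoder_grid():
    """Return one configuration dict per (depth, feature size) pair.

    The 'name' label is a hypothetical naming scheme for this sketch.
    """
    return [
        {"name": f"ae_d{depth}_f{dim}", "conv_layers": depth, "feature_dim": dim}
        for depth, dim in product(CONV_DEPTHS, FEATURE_DIMS)
    ]


def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1 from binary confusion counts.

    A ratio with a zero denominator is reported as 0.0.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    grid = autoencoder_grid()
    print(len(grid), "autoencoder configurations")
    # Every feature extractor is paired with every classifier.
    print(len(grid) * len(CLASSIFIERS), "extractor/classifier pairs to evaluate")
    for cfg in grid:
        print(cfg["name"], cfg["conv_layers"], cfg["feature_dim"])
```

Pairing each configuration with each of the three classifiers gives 36 extractor/classifier combinations to compare. Whether the study reports all of them is not stated in the abstract.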
Keywords:
Autoencoder, Feature Extraction, Breast Cancer, XGBoost, CatBoost.
DOI: https://doi.org/10.32010/26166127.2025.03
