{"id":1169,"date":"2024-02-23T09:13:30","date_gmt":"2024-02-23T09:13:30","guid":{"rendered":"https:\/\/imalogic.com\/blog\/?p=1169"},"modified":"2024-03-24T20:32:28","modified_gmt":"2024-03-24T20:32:28","slug":"artificial-intelligence-ai","status":"publish","type":"post","link":"https:\/\/imalogic.com\/blog\/2024\/02\/23\/artificial-intelligence-ai\/","title":{"rendered":"&#8216;Artificial Intelligence&#8217; (AI)"},"content":{"rendered":"<body>\n<p>There are numerous algorithms grouped under the general term <strong>\u2018Artificial Intelligence\u2019 (AI)<\/strong>, as <strong>AI<\/strong> encompasses a wide variety of fields and techniques. Here is a non-exhaustive list of the types of algorithms associated with <strong>AI<\/strong>:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"810\" height=\"454\" data-attachment-id=\"1170\" data-permalink=\"https:\/\/imalogic.com\/blog\/2024\/02\/23\/artificial-intelligence-ai\/420030172_332328309778937_2100661786999633092_n\/\" data-orig-file=\"https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2024\/02\/420030172_332328309778937_2100661786999633092_n.jpg?fit=1456%2C816&amp;ssl=1\" data-orig-size=\"1456,816\" data-comments-opened=\"0\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"420030172_332328309778937_2100661786999633092_n\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2024\/02\/420030172_332328309778937_2100661786999633092_n.jpg?fit=810%2C454&amp;ssl=1\" 
src=\"https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2024\/02\/420030172_332328309778937_2100661786999633092_n.jpg?resize=810%2C454&#038;ssl=1\" alt=\"\" class=\"wp-image-1170\" loading=\"lazy\" srcset=\"https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2024\/02\/420030172_332328309778937_2100661786999633092_n.jpg?w=1456&amp;ssl=1 1456w, https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2024\/02\/420030172_332328309778937_2100661786999633092_n.jpg?resize=300%2C168&amp;ssl=1 300w, https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2024\/02\/420030172_332328309778937_2100661786999633092_n.jpg?resize=1024%2C574&amp;ssl=1 1024w, https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2024\/02\/420030172_332328309778937_2100661786999633092_n.jpg?resize=768%2C430&amp;ssl=1 768w\" sizes=\"auto, (max-width: 810px) 100vw, 810px\" \/><\/figure>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Supervised Learning:<\/strong>\n<ul class=\"wp-block-list\">\n<li>Linear Regression<\/li>\n\n\n\n<li>Logistic Regression<\/li>\n\n\n\n<li>Support Vector Machines (SVM)<\/li>\n\n\n\n<li>Decision Trees<\/li>\n\n\n\n<li>Random Forests<\/li>\n\n\n\n<li>Artificial Neural Networks (ANN)<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Unsupervised Learning:<\/strong>\n<ul class=\"wp-block-list\">\n<li>K-Means \u2013 e.g., <a href=\"https:\/\/imalogic.com\/blog\/2017\/05\/15\/distance-squared-optimization-using-sse-instruction-set\/\" data-type=\"post\" data-id=\"402\">Distance Squared optimization using SSE instruction set<\/a><\/li>\n\n\n\n<li>Hierarchical Agglomerative Clustering (HAC)<\/li>\n\n\n\n<li>Principal Component Analysis (PCA)<\/li>\n\n\n\n<li>Autoencoders<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Reinforcement Learning:<\/strong>\n<ul class=\"wp-block-list\">\n<li>Q-Learning<\/li>\n\n\n\n<li>SARSA<\/li>\n\n\n\n<li>Deep Q-Network (DQN)<\/li>\n\n\n\n<li>Policy Gradient Methods<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Natural Language 
Processing (NLP):<\/strong>\n<ul class=\"wp-block-list\">\n<li>Pre-trained Language Models (such as BERT and GPT)<\/li>\n\n\n\n<li>Support Vector Machines for NLP<\/li>\n\n\n\n<li>Recurrent Neural Networks (RNN)<\/li>\n\n\n\n<li>Transformers<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Optimization Algorithms:<\/strong>\n<ul class=\"wp-block-list\">\n<li>Gradient Descent<\/li>\n\n\n\n<li>Stochastic Gradient Descent (SGD)<\/li>\n\n\n\n<li>Bayesian Optimization Methods<\/li>\n\n\n\n<li>Evolutionary Algorithms \u2013 e.g., <a href=\"https:\/\/imalogic.com\/blog\/2017\/05\/15\/labeling-in-clpfd-with-evolutionary-programming\/\" data-type=\"post\" data-id=\"385\">Labeling in CLP(FD) with evolutionary programming<\/a><\/li>\n\n\n\n<li>Markov Chains \u2013 e.g., <a href=\"https:\/\/imalogic.com\/blog\/2016\/11\/08\/reconnaissance-vocale-etude-dun-engine-monolocuteur-approche-globale\/\" data-type=\"post\" data-id=\"178\">Speech Recognition: Study of a Single-Speaker Engine \u2013 Global Approach<\/a><\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Pattern Recognition Algorithms:<\/strong>\n<ul class=\"wp-block-list\">\n<li>Convolutional Neural Networks (CNN)<\/li>\n\n\n\n<li>Feature-based Algorithms (e.g., Local Binary Patterns)<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Ensemble Algorithms:<\/strong>\n<ul class=\"wp-block-list\">\n<li>Bagging (Bootstrap Aggregating)<\/li>\n\n\n\n<li>Boosting (e.g., AdaBoost, Gradient Boosting)<\/li>\n\n\n\n<li>Random Forests<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Network Analysis Algorithms:<\/strong>\n<ul class=\"wp-block-list\">\n<li>PageRank<\/li>\n\n\n\n<li>Random Walk Clustering Algorithm<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Fuzzy Logic Algorithms:<\/strong>\n<ul class=\"wp-block-list\">\n<li>Fuzzy Systems for Decision Making<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Probabilistic Inference Algorithms:<\/strong>\n<ul class=\"wp-block-list\">\n<li>Bayesian Networks<\/li>\n\n\n\n<li>Probabilistic Graphical 
Models<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Genetic Algorithms:<\/strong>\n<ul class=\"wp-block-list\">\n<li>Evolutionary Algorithms<\/li>\n\n\n\n<li>Genetic Programming<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Expert Systems:<\/strong>\n<ul class=\"wp-block-list\">\n<li>Rule-Based Systems<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Image Processing Algorithms:<\/strong>\n<ul class=\"wp-block-list\">\n<li>Feature Extraction \u2013 e.g., <a href=\"https:\/\/imalogic.com\/blog\/2016\/11\/08\/reconnaissance-vocale-etude-dun-engine-monolocuteur-approche-globale\/\" data-type=\"post\" data-id=\"178\">Speech Recognition: Study of a Single-Speaker Engine \u2013 Global Approach<\/a><\/li>\n\n\n\n<li>Image Filtering<\/li>\n\n\n\n<li>Image Transformation<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n\n\n\n<p>This list gives an overview of the diversity of algorithms used in artificial intelligence. Note that <strong>AI<\/strong> is a rapidly evolving field, and new approaches and algorithms continue to emerge.<\/p>\n\n\n\n<p class=\"has-large-font-size\">Deep Learning<\/p>\n\n\n\n<p>Deep Learning is a <strong>subfield of machine learning<\/strong> (itself a branch of artificial intelligence) that uses deep artificial neural networks to model and solve complex problems. 
Deep neural networks can learn hierarchical representations from data, which makes them particularly powerful for tasks such as computer vision, speech recognition, and natural language understanding.<\/p>\n\n\n\n<figure class=\"wp-block-video\"><video height=\"768\" style=\"aspect-ratio: 1408 \/ 768;\" width=\"1408\" controls src=\"https:\/\/imalogic.com\/blog\/wp-content\/uploads\/2024\/02\/Gen-2-1129614985-A-walking-man-in-a-c-cam_H-1-cam_V-05-cam_YW-05.mp4\"><\/video><figcaption class=\"wp-element-caption\"><em>\u201cMy first attempt to generate a video with AI, using a text prompt!\u201d<\/em><\/figcaption><\/figure>\n\n\n\n<p>Within the list of AI algorithm types above, Deep Learning fits into several categories, including:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Artificial Neural Networks (ANN):<\/strong>\n<ul class=\"wp-block-list\">\n<li>Deep Learning relies primarily on deep neural networks, which extend traditional neural networks by stacking many hidden layers.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Supervised Learning:<\/strong>\n<ul class=\"wp-block-list\">\n<li>Deep neural networks are often used in supervised learning tasks such as classification and regression.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Natural Language Processing (NLP):<\/strong>\n<ul class=\"wp-block-list\">\n<li>Pre-trained language models based on Deep Learning, such as BERT and GPT, have become essential in NLP.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Computer Vision:<\/strong>\n<ul class=\"wp-block-list\">\n<li>Convolutional Neural Networks (CNN), a deep neural network architecture, are widely used in computer vision for tasks such as image classification and object detection.<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n\n\n\n<p>Deep Learning is often regarded as a distinct and significant approach within AI because of its extensive use of deep neural networks. 
However, Deep Learning techniques can be applied across many sub-disciplines of AI, as the categories above show.<\/p>\n\n\n\n<p class=\"has-large-font-size\">Generative AI<\/p>\n\n\n\n<p>Generative AI uses Deep Learning to create new content by modeling complex patterns in its training data.<\/p>\n\n\n\n<p>Generative Artificial Intelligence (Generative AI) is a branch of artificial intelligence focused on creating new content, such as images, text, music, or video, based on existing data. Whereas many other AI approaches concentrate on solving specific problems or analyzing data (for example, classifying an image or predicting a value), Generative AI aims to produce new data that resembles the data it has learned from.<\/p>\n\n\n\n<p>To do so, Generative AI typically relies on machine learning models, most often neural networks. These models are trained on large amounts of data to learn its patterns and features, and then use what they have learned to generate new samples with similar characteristics.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><a href=\"https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2024\/03\/IMG_2943.jpg?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"810\" height=\"730\" data-attachment-id=\"1191\" data-permalink=\"https:\/\/imalogic.com\/blog\/2024\/02\/23\/artificial-intelligence-ai\/img_2943\/\" data-orig-file=\"https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2024\/03\/IMG_2943.jpg?fit=836%2C753&amp;ssl=1\" data-orig-size=\"836,753\" data-comments-opened=\"0\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;1&quot;}\" data-image-title=\"IMG_2943\" 
data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2024\/03\/IMG_2943.jpg?fit=810%2C730&amp;ssl=1\" src=\"https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2024\/03\/IMG_2943.jpg?resize=810%2C730&#038;ssl=1\" alt=\"\" class=\"wp-image-1191\" style=\"width:879px;height:auto\" loading=\"lazy\" srcset=\"https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2024\/03\/IMG_2943.jpg?w=836&amp;ssl=1 836w, https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2024\/03\/IMG_2943.jpg?resize=300%2C270&amp;ssl=1 300w, https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2024\/03\/IMG_2943.jpg?resize=768%2C692&amp;ssl=1 768w\" sizes=\"auto, (max-width: 810px) 100vw, 810px\" \/><\/a><\/figure>\n\n\n\n<p>Popular applications of Generative AI include generative art, voice and music synthesis, natural language generation, the creation of realistic faces, and much more. These technologies are used in areas such as artistic creation, video game design, and multimedia content production, as well as in more practical applications such as generating design prototypes or creating synthetic data to train other AI models.<\/p>\n\n\n\n<p class=\"has-large-font-size\">Machine Learning vs. Deep Learning<\/p>\n\n\n\n<p>Machine learning and deep learning both belong to artificial intelligence (AI), and deep learning is itself a subset of machine learning. They differ in how they learn from data:<\/p>\n\n\n\n<p class=\"has-medium-font-size\"><em><strong>Machine Learning:<\/strong><\/em><\/p>\n\n\n\n<p class=\"has-medium-font-size\"><strong>Definition<\/strong>: Machine learning is a method of teaching computers to learn from data without being explicitly programmed.<br><strong>Approach<\/strong>: It involves algorithms that improve their performance on a task through experience (i.e., exposure to data).<br><strong>Feature Engineering<\/strong>: In traditional machine learning, humans often need to manually 
engineer features from raw data to make it suitable for learning.<br><strong>Algorithm Complexity<\/strong>: Traditional machine learning algorithms tend to be simpler and rely on relatively shallow models to make predictions.<br><strong>Applications<\/strong>: Common applications include spam detection, recommendation systems, and predictive analytics.<\/p>\n\n\n\n<p class=\"has-medium-font-size\"><br><strong><em>Deep Learning:<\/em><\/strong><\/p>\n\n\n\n<p class=\"has-medium-font-size\"><strong>Definition<\/strong>: Deep learning is a subset of machine learning that uses artificial neural networks with multiple layers (deep architectures) to learn from large amounts of data.<br><strong>Approach<\/strong>: Deep learning algorithms automatically learn hierarchical representations of data through successive layers of abstraction.<br><strong>Feature Learning<\/strong>: Deep learning algorithms learn features directly from raw data, greatly reducing the need for manual feature engineering.<br><strong>Algorithm Complexity<\/strong>: Deep learning models are more complex and can capture intricate patterns in data, but they often require significant computational resources.<br><strong>Applications<\/strong>: Deep learning has seen remarkable success in tasks such as image recognition, natural language processing, speech recognition, and autonomous driving.<\/p>\n\n\n\n<p class=\"has-medium-font-size\"><br>In summary, both machine learning and deep learning train algorithms to learn from data. Deep learning distinguishes itself by using deep neural networks with many layers, which automatically learn hierarchical representations of data. This gives it superior performance on certain tasks, particularly those involving unstructured data such as images, audio, and text.<\/p>\n\n\n\n<p><strong><em>Below are some use cases in which AI algorithms are applied to specific 
problems.<\/em><\/strong><\/p>\n\n\n\n<p class=\"has-large-font-size\">Use Case: Automatic Labeling with GroundingDino<\/p>\n\n\n\n<p>This chapter is derived from \u2018A Practical Guide to Tag Object Detection Datasets with the GroundingDino Algorithm\u2019 and presents a specific use case of AI.<\/p>\n\n\n\n<h1 class=\"wp-block-heading has-medium-font-size\" id=\"ed76\">GroundingDino<\/h1>\n\n\n\n<p id=\"356b\"><strong><em>Background<\/em><\/strong><\/p>\n\n\n\n<p id=\"eaee\">GroundingDino is a state-of-the-art (SOTA) algorithm developed by IDEA-Research in 2023 [1]. It detects objects in images using text prompts. The name \u201cGroundingDino\u201d combines \u201cgrounding\u201d (a process that links vision and language understanding in AI systems) with the name of the transformer-based detector \u201cDINO\u201d [2]. The algorithm is a zero-shot object detector, which means it can identify objects from categories it was not specifically trained on, without needing to see any examples (shots).<\/p>\n\n\n\n<p id=\"1f36\"><strong><em>Architecture<\/em><\/strong><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>The model takes an image and a text description as a paired input.<\/li>\n\n\n\n<li>Image features are extracted with an <strong>image backbone<\/strong> such as Swin Transformer, and text features with a <strong>text backbone<\/strong> such as BERT.<\/li>\n\n\n\n<li>To fuse the image and text modalities into a single representation, both types of features are fed into the <strong>Feature Enhancer<\/strong> module.<\/li>\n\n\n\n<li>Next, the \u2018<strong>Language-guided Query Selection<\/strong>\u2019 module selects the image features most relevant to the input text and uses them as decoder queries.<\/li>\n\n\n\n<li>These queries are then fed into a <strong>decoder<\/strong>, which refines the predicted bounding boxes so that they best align with the text information.<\/li>\n\n\n\n<li>The model outputs 900 object bounding boxes 
and their similarity scores to the input words. Boxes whose similarity scores exceed the <code>box_threshold<\/code> are kept, and the words whose similarities exceed the <code>text_threshold<\/code> are used as the predicted labels.<\/li>\n<\/ol>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><a href=\"https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2024\/02\/image-5.png?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"810\" height=\"252\" data-attachment-id=\"1174\" data-permalink=\"https:\/\/imalogic.com\/blog\/2024\/02\/23\/artificial-intelligence-ai\/image-5\/\" data-orig-file=\"https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2024\/02\/image-5.png?fit=1712%2C533&amp;ssl=1\" data-orig-size=\"1712,533\" data-comments-opened=\"0\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"image-5\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2024\/02\/image-5.png?fit=810%2C252&amp;ssl=1\" src=\"https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2024\/02\/image-5.png?resize=810%2C252&#038;ssl=1\" alt=\"\" class=\"wp-image-1174\" style=\"width:880px;height:auto\" loading=\"lazy\" srcset=\"https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2024\/02\/image-5.png?resize=1024%2C319&amp;ssl=1 1024w, https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2024\/02\/image-5.png?resize=300%2C93&amp;ssl=1 300w, https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2024\/02\/image-5.png?resize=768%2C239&amp;ssl=1 
768w, https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2024\/02\/image-5.png?resize=1536%2C478&amp;ssl=1 1536w, https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2024\/02\/image-5.png?w=1712&amp;ssl=1 1712w, https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2024\/02\/image-5.png?w=1620&amp;ssl=1 1620w\" sizes=\"auto, (max-width: 810px) 100vw, 810px\" \/><\/a><\/figure>\n\n\n\n<p id=\"caef\"><strong><em>Prompt Engineering<\/em><\/strong><\/p>\n\n\n\n<p id=\"7b08\">The GroundingDino model encodes text prompts into a learned latent space. Different prompts therefore produce different text features, which can change the detector\u2019s performance. To improve predictions, it\u2019s advisable to experiment with several prompts and keep the one that delivers the best results. While writing this article, I had to try several prompts before finding the right one, and some produced unexpected results.<\/p>\n\n\n\n<h1 class=\"wp-block-heading has-medium-font-size\" id=\"37b3\">Concluding remarks<\/h1>\n\n\n\n<p id=\"f880\">By using text prompts, GroundingDino offers a significant leap in producing object detection annotations: it can be used to automatically label a single image or an entire dataset. It\u2019s crucial, however, to manually review and verify these annotations before using them to train subsequent models.<\/p>\n\n\n\n<p><strong>Original paper for this use case:<\/strong> https:\/\/arxiv.org\/pdf\/2401.17270.pdf \u2013 <a href=\"https:\/\/medium.com\/@lihigurarie?source=post_page-----b66c486656fe--------------------------------\">Lihi Gur Arie, PhD<\/a><\/p>\n\n\n\n<p class=\"has-large-font-size\">Use Case: Weakly Supervised Instance Segmentation (WSIS)<\/p>\n\n\n\n<p>This chapter addresses a computer vision problem known as \u201cWeakly Supervised Instance Segmentation\u201d (WSIS). 
Instead of detailed annotations for each object in an image, WSIS relies only on coarse labels that describe the image as a whole (image-level labels).<\/p>\n\n\n\n<p>Relying on such weak labels is challenging because AI models may produce redundant results, in which a single object is represented by multiple proposals. For instance, given an image of a dog, one would expect the model to output a single proposal covering the dog; instead, it may produce several overlapping proposals.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2024\/02\/image-7.png?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"810\" height=\"401\" data-attachment-id=\"1178\" data-permalink=\"https:\/\/imalogic.com\/blog\/2024\/02\/23\/artificial-intelligence-ai\/image-7\/\" data-orig-file=\"https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2024\/02\/image-7.png?fit=983%2C487&amp;ssl=1\" data-orig-size=\"983,487\" data-comments-opened=\"0\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"image-7\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2024\/02\/image-7.png?fit=810%2C401&amp;ssl=1\" src=\"https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2024\/02\/image-7.png?resize=810%2C401&#038;ssl=1\" alt=\"\" class=\"wp-image-1178\" loading=\"lazy\" srcset=\"https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2024\/02\/image-7.png?w=983&amp;ssl=1 983w, 
https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2024\/02\/image-7.png?resize=300%2C149&amp;ssl=1 300w, https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2024\/02\/image-7.png?resize=768%2C380&amp;ssl=1 768w\" sizes=\"auto, (max-width: 810px) 100vw, 810px\" \/><\/a><figcaption class=\"wp-element-caption\">Redundant segmentation: each instance corresponds to multiple proposals. <strong>Yellow boxes:<\/strong> expected segmentations. <strong>Red boxes:<\/strong> redundant segmentations.<\/figcaption><\/figure>\n\n\n\n<p>To overcome this issue, the approach presented in this use case uses \u201cMaskIoU heads\u201d to score how complete each proposal is, a \u201cComplete Instances Mining\u201d (CIM) strategy to address redundancy, and an \u201cAnti-noise\u201d strategy to limit the impact of errors in the model\u2019s predictions.<\/p>\n\n\n\n<p>The method was evaluated on popular benchmark datasets, where it achieved strong performance. In essence, it offers a more effective way to train AI models to understand images when only limited information, such as image-level annotations, is available.<\/p>\n\n\n\n<p>Put simply, this work tackles a common problem that arises when using AI to interpret images. Show a computer an image of a dog, and instead of a single, straightforward answer such as \u201cit\u2019s a dog,\u201d it may produce several overlapping answers for the same object.<\/p>\n\n\n\n<p>This phenomenon is called \u201credundant segmentation\u201d: the model marks several overlapping regions of the same object as separate detections, which reduces precision.<\/p>\n\n\n\n<p>The approach aims to remove this redundancy, producing clearer and more accurate results. 
It relies on the MaskIoU heads and the CIM strategy to improve the model\u2019s understanding of images without resorting to overly complex methods.<\/p>\n\n\n\n<p>For instance, given a picture of a dog in a park, a model trained with weak labels might return one mask covering only the dog\u2019s head, another covering its body, and a third covering the dog together with a patch of grass, instead of a single mask that covers the whole dog.<\/p>\n\n\n\n<p>In addition, the approach uses an \u201cAnti-noise\u201d strategy to correct errors that may occur in the model\u2019s outputs. The method has shown strong results across a variety of images, and its goal is to make image understanding more robust without adding unnecessary complexity.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2024\/02\/image-6.png?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"810\" height=\"385\" data-attachment-id=\"1177\" data-permalink=\"https:\/\/imalogic.com\/blog\/2024\/02\/23\/artificial-intelligence-ai\/image-6\/\" data-orig-file=\"https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2024\/02\/image-6.png?fit=1421%2C676&amp;ssl=1\" data-orig-size=\"1421,676\" data-comments-opened=\"0\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"image-6\" data-image-description=\"\" data-image-caption=\"\" 
data-large-file=\"https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2024\/02\/image-6.png?fit=810%2C385&amp;ssl=1\" src=\"https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2024\/02\/image-6.png?resize=810%2C385&#038;ssl=1\" alt=\"\" class=\"wp-image-1177\" loading=\"lazy\" srcset=\"https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2024\/02\/image-6.png?resize=1024%2C487&amp;ssl=1 1024w, https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2024\/02\/image-6.png?resize=300%2C143&amp;ssl=1 300w, https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2024\/02\/image-6.png?resize=768%2C365&amp;ssl=1 768w, https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2024\/02\/image-6.png?w=1421&amp;ssl=1 1421w\" sizes=\"auto, (max-width: 810px) 100vw, 810px\" \/><\/a><figcaption class=\"wp-element-caption\">The proposed method consists of three key components: an Anti-noise branch, K Refinement branches, and a Complete Instances Mining (CIM) strategy. Proposal features are generated using MaskFuse and fed into multiple branches. Both the Anti-noise and Refinement branches produce classification and integrity scores. The CIM strategy uses the output of each branch to generate refined pseudo labels that supervise the next branch, while the Anti-noise branch is supervised with pre-computed pseudo labels. In the right column, purple and red elements represent seeds and pseudo ground truth, respectively: the seeds spread spatially to identify complete proposals as pseudo ground truth, based on spatial relationships and integrity scores.<\/figcaption><\/figure>\n\n\n\n<p>In summary, this chapter examines a major challenge in artificial intelligence: redundant segmentation when identifying objects in images. 
The difficulty lies in learning from incomplete supervision, without detailed annotations for each object, which can lead to redundant and less precise results. The presented approach addresses this issue with strategies such as the Anti-noise branch and the CIM strategy, which aim to improve image understanding even in the absence of detailed annotations.<\/p>\n\n\n\n<p><strong>Original paper for this use case:<\/strong> <a href=\"https:\/\/arxiv.org\/pdf\/2402.07633.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/arxiv.org\/pdf\/2402.07633.pdf<\/a><\/p>\n\n\n\n<p><em>Zecheng Li<sup>1<\/sup>, Zening Zeng<sup>1<\/sup>, Yuqi Liang<sup>1<\/sup> and Jin-Gang Yu<sup>1,2*<\/sup> \/ <sup>1<\/sup> South China University of Technology, <sup>2<\/sup> Pazhou Laboratory<\/em><\/p>\n<\/body>","protected":false},"excerpt":{"rendered":"<p>There are numerous algorithms grouped under the general term \u2018Artificial Intelligence\u2019 (AI), as AI encompasses a variety of fields and<\/p>\n","protected":false},"author":1,"featured_media":1173,"comment_status":"closed","ping_status":"closed","sticky":true,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[134,65,133,7,66,6,116],"tags":[113,135],"class_list":["post-1169","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-a-i","category-analyse","category-artificial-intelligence","category-coding","category-computer-graphics","category-signal-processing","catego
ry-software-engineering","tag-a-i","tag-artificial-intelligence"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2024\/02\/image-4.png?fit=950%2C497&ssl=1","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p8J21V-iR","jetpack-related-posts":[],"_links":{"self":[{"href":"https:\/\/imalogic.com\/blog\/wp-json\/wp\/v2\/posts\/1169","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/imalogic.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/imalogic.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/imalogic.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/imalogic.com\/blog\/wp-json\/wp\/v2\/comments?post=1169"}],"version-history":[{"count":1,"href":"https:\/\/imalogic.com\/blog\/wp-json\/wp\/v2\/posts\/1169\/revisions"}],"predecessor-version":[{"id":1192,"href":"https:\/\/imalogic.com\/blog\/wp-json\/wp\/v2\/posts\/1169\/revisions\/1192"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/imalogic.com\/blog\/wp-json\/wp\/v2\/media\/1173"}],"wp:attachment":[{"href":"https:\/\/imalogic.com\/blog\/wp-json\/wp\/v2\/media?parent=1169"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/imalogic.com\/blog\/wp-json\/wp\/v2\/categories?post=1169"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/imalogic.com\/blog\/wp-json\/wp\/v2\/tags?post=1169"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}