‘Artificial Intelligence’ (AI)

“There are numerous algorithms grouped under the general term ‘Artificial Intelligence’ (AI), as AI encompasses a variety of fields and techniques. Here is a non-exhaustive list of types of algorithms associated with AI:”

  1. Supervised Learning:
    • Linear Regression
    • Logistic Regression
    • Support Vector Machines (SVM)
    • Decision Trees
    • Random Forests
    • Artificial Neural Networks (ANN)
  2. Unsupervised Learning:
  3. Reinforcement Learning:
    • Q-Learning
    • SARSA
    • Deep Q Network (DQN)
    • Policy Gradient Methods
  4. Natural Language Processing (NLP):
    • Pre-trained Language Models (such as BERT, GPT)
    • Support Vector Machines for NLP
    • Recurrent Neural Networks (RNN)
    • Transformers
  5. Optimization Algorithms:
  6. Pattern Recognition Algorithms:
    • Convolutional Neural Networks (CNN)
    • Feature-based Algorithms (e.g., Local Binary Patterns)
  7. Ensemble Algorithms:
    • Bagging (Bootstrap Aggregating)
    • Boosting (e.g., AdaBoost, Gradient Boosting)
    • Random Forests
  8. Network Analysis Algorithms:
    • PageRank
    • Random Walk Clustering Algorithm
  9. Fuzzy Logic Algorithms:
    • Fuzzy Systems for Decision Making
  10. Probabilistic Inference Algorithms:
    • Bayesian Networks
    • Probabilistic Graphical Models
  11. Genetic Algorithms:
    • Evolutionary Algorithms
    • Genetic Programming
  12. Expert Systems:
    • Rule-Based Systems
  13. Image Processing Algorithms:

This list provides an overview of the diversity of algorithms used in artificial intelligence. It’s important to note that AI is a rapidly evolving field, and new approaches and algorithms continue to emerge.

Deep Learning

“Deep Learning is a subfield of artificial intelligence that uses deep artificial neural networks to model and solve complex problems. Deep neural networks can learn hierarchical representations from data, making them particularly powerful for tasks such as computer vision, speech recognition, natural language understanding, etc.

[Video: “My first attempt to generate a video through AI, using prompt text!”]

In the list of types of algorithms associated with AI, Deep Learning can be included in several categories, including:

  1. Artificial Neural Networks (ANN):
    • Deep Learning primarily relies on deep neural networks, which are an evolution of traditional neural networks.
  2. Supervised Learning:
    • Deep neural networks are often used in supervised learning tasks like classification and regression.
  3. Natural Language Processing (NLP):
    • Pre-trained language models based on Deep Learning, such as BERT and GPT, have become essential in the field of NLP.
  4. Computer Vision:
    • Convolutional Neural Networks (CNN), a deep neural network architecture, are widely used in computer vision for image classification, object detection, etc.

It’s important to note that Deep Learning is often considered a distinct and significant approach within AI because of its extensive use of deep neural networks. However, Deep Learning techniques can be applied across various sub-disciplines of AI, as mentioned above.”

Generative AI

Generative AI utilizes Deep Learning to create new content by mimicking complex patterns in the data.

Generative Artificial Intelligence (Generative AI) is a branch of artificial intelligence focused on creating new content, such as images, text, music, videos, etc., based on existing data. Unlike other approaches to artificial intelligence that concentrate on solving specific problems or analyzing data, Generative AI aims to produce new data that resembles what it has learned.

Generative AI typically uses machine learning models, including neural networks, to generate this new content. These models can be trained on large amounts of data to learn to recognize specific patterns and features in the data, and then reproduce them to generate new data.
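To make the "learn patterns, then reproduce them" idea concrete, here is a deliberately tiny sketch. It is not a neural network: it swaps in a word-level bigram (Markov chain) model, which is far simpler than anything used in real Generative AI, but it follows the same two steps of learning from data and then sampling new data. The corpus, the function names, and the seed are all made up for illustration.

```python
import random
from collections import defaultdict


def train_bigram_model(text):
    """Learn which words follow which: the 'patterns' in this toy model."""
    words = text.split()
    model = defaultdict(list)
    for current, following in zip(words, words[1:]):
        model[current].append(following)
    return model


def generate(model, start, length, seed=0):
    """Sample a new word sequence by repeatedly picking a learned follower."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    output = [start]
    for _ in range(length - 1):
        followers = model.get(output[-1])
        if not followers:  # dead end: this word was never followed by anything
            break
        output.append(rng.choice(followers))
    return " ".join(output)


# Hypothetical training data, chosen only for the example.
corpus = "the dog runs in the park and the dog sleeps in the sun"
model = train_bigram_model(corpus)
sample = generate(model, "the", 8)
print(sample)
```

The generated sentence may never appear in the corpus, yet every pair of adjacent words in it does. That is the toy version of producing new data that resembles the training data.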

Popular applications of Generative AI include generative art creation, voice and music synthesis, natural language generation, creating realistic faces, and much more. These technologies can be used in areas such as artistic creation, video game design, multimedia content production, and even in more serious applications such as generating design prototypes or creating synthetic data for training other AI models.

Machine Learning VS Deep Learning

Machine learning is a subset of artificial intelligence (AI), and deep learning is in turn a subset of machine learning. They differ in how they learn from data:

Machine Learning:

Definition: Machine learning is a method of teaching computers to learn from data without being explicitly programmed.
Approach: It involves algorithms that can improve their performance on a task through experience (i.e., exposure to data).
Feature Engineering: In traditional machine learning, humans often need to manually engineer features from raw data to make it suitable for learning.
Algorithm Complexity: Traditional machine learning algorithms tend to be simpler and rely on relatively shallow models to make predictions.
Applications: Common applications include spam detection, recommendation systems, and predictive analytics.


Deep Learning:

Definition: Deep learning is a subset of machine learning that uses artificial neural networks with multiple layers (deep architectures) to learn from large amounts of data.
Approach: Deep learning algorithms automatically learn hierarchical representations of data through multiple layers of abstraction, without the need for manual feature engineering.
Feature Learning: Deep learning algorithms can learn features directly from raw data, eliminating the need for handcrafted feature extraction.
Algorithm Complexity: Deep learning models are more complex and capable of capturing intricate patterns in data, often requiring significant computational resources.
Applications: Deep learning has seen remarkable success in tasks such as image recognition, natural language processing, speech recognition, and autonomous driving.


In summary, while both machine learning and deep learning involve training algorithms to learn from data, deep learning distinguishes itself by its use of deep neural networks with multiple layers, enabling it to automatically learn hierarchical representations of data, leading to superior performance in certain tasks, particularly those involving unstructured data like images, audio, and text.
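The feature engineering point can be illustrated with a toy sketch. Below, a perceptron (a shallow, linear model) is trained on the XOR problem twice: once on the raw inputs, and once with a hand-crafted extra feature (the product of the two inputs). The data, the function names, and the choice of feature are assumptions made for this example only. A deep network would be expected to learn an equivalent intermediate feature in its hidden layers instead of having a human supply it.

```python
def predict(weights, bias, x):
    """Linear decision rule: 1 if the weighted sum is positive, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


def train_perceptron(samples, labels, epochs=100):
    """Classic perceptron learning rule on a single linear layer."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = y - predict(weights, bias, x)
            weights = [w + error * xi for w, xi in zip(weights, x)]
            bias += error
    return weights, bias


def accuracy(samples, labels, weights, bias):
    """Fraction of samples classified correctly."""
    hits = sum(predict(weights, bias, x) == y for x, y in zip(samples, labels))
    return hits / len(samples)


# XOR: not linearly separable on the raw inputs.
raw = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]

# Manual feature engineering: append the product a * b as a third feature.
engineered = [(a, b, a * b) for a, b in raw]

raw_acc = accuracy(raw, labels, *train_perceptron(raw, labels))
eng_acc = accuracy(engineered, labels, *train_perceptron(engineered, labels))
print("raw features:", raw_acc, "| engineered features:", eng_acc)
```

The linear model cannot classify all four points from the raw inputs, but it can once the engineered feature is added. This is the kind of manual step that deep learning aims to make unnecessary.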

Below are some use cases implementing AI algorithms used to solve specific problems.

Use Case: Automatic Labeling With GroundingDino

“This chapter is derived from ‘A Practical Guide to Tag Object Detection Datasets with the GroundingDino Algorithm’ and represents a specific use case of AI.”

GroundingDino

Background

GroundingDino is a state-of-the-art (SOTA) algorithm developed by IDEA-Research in 2023 [1]. It detects objects from images using text prompts. The name “GroundingDino” is a combination of “grounding” (a process that links vision and language understanding in AI systems) and the transformer-based detector “DINO” [2]. This algorithm is a zero-shot object detector, which means it can identify objects from categories it was not specifically trained on, without needing to see any examples (shots).

Architecture

  1. The model takes image and text-description pairs as inputs.
  2. Image features are extracted with an image backbone such as Swin Transformer, and text features with a text backbone like BERT.
  3. To fuse image and text modalities into a single representation, both types of features are fed into the Feature Enhancer module.
  4. Next, the ‘Language-guided Query Selection’ module selects the features most relevant to the input text to use as decoder queries.
  5. These queries are then fed into a decoder to refine the prediction of object detection boxes that best align with the text information.
  6. The model outputs 900 object bounding boxes and their similarity scores to the input words. Boxes whose similarity scores exceed the box_threshold are kept, and words whose similarities exceed the text_threshold are used as the predicted labels.
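The final filtering step can be sketched in a few lines of plain Python. This is not the library's code: the function name, the data layout (one row of per-word similarities per box), and the default threshold values are assumptions for illustration, and the toy input uses three boxes rather than 900.

```python
def select_detections(boxes, words, similarity,
                      box_threshold=0.35, text_threshold=0.25):
    """Keep confident boxes and label them with sufficiently similar words.

    boxes:      list of (x1, y1, x2, y2) tuples
    words:      list of prompt words
    similarity: similarity[i][j] = score between box i and word j
    Threshold defaults are assumed values, not taken from the chapter.
    """
    detections = []
    for box, scores in zip(boxes, similarity):
        best = max(scores)
        if best <= box_threshold:
            continue  # box is not similar enough to any word: drop it
        label = " ".join(w for w, s in zip(words, scores) if s > text_threshold)
        detections.append({"box": box, "score": best, "label": label})
    return detections


# Hypothetical model output for the prompt words "dog" and "ball".
words = ["dog", "ball"]
boxes = [(10, 10, 50, 50), (0, 0, 5, 5), (60, 60, 90, 90)]
similarity = [
    [0.80, 0.10],  # clearly a dog
    [0.20, 0.10],  # below box_threshold: discarded
    [0.05, 0.60],  # clearly a ball
]

detections = select_detections(boxes, words, similarity)
for d in detections:
    print(d)
```

Raising box_threshold trades recall for precision, while text_threshold controls how many words end up in each label. Both are worth tuning together with the prompt.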

Prompt Engineering

The GroundingDino model encodes text prompts into a learned latent space. Altering the prompts can lead to different text features, which can affect the performance of the detector. To enhance prediction performance, it’s advisable to experiment with multiple prompts, choosing the one that delivers the best results. It’s important to note that while writing this article I had to try several prompts before finding the ideal one, sometimes encountering unexpected results.

Concluding remarks

GroundingDino offers a significant leap in object detection annotations by using text prompts. In this chapter, we have explored how to use the model for automated labeling of an image or a whole dataset. It’s crucial, however, to manually review and verify these annotations before they are utilized in training subsequent models.

Original paper for this use case : https://arxiv.org/pdf/2401.17270.pdf – Lihi Gur Arie, PhD

Use Case: Weakly Supervised Instance Segmentation (WSIS)

This chapter addresses a challenge in the field of artificial intelligence known as “Weakly Supervised Instance Segmentation” (WSIS). Instead of requiring detailed annotations for each object in an image, WSIS relies on general indications about the entire image.

Utilizing such indications presents a challenge as AI models may generate redundant results, where a single object is represented by multiple proposals. For instance, when presenting the model with an image of a dog, one would expect it to output a single proposal featuring a dog. However, it may produce multiple proposals.

Figure caption: Redundant segmentation. Each instance always corresponds to multiple proposals. Yellow boxes: expected segmentations. Red boxes: redundant segmentations.

To overcome this issue, this use case introduces a novel approach. It employs “MaskIoU heads” to evaluate proposal quality, a strategy known as “Complete Instances Mining” (CIM) to address redundancy, and an “Anti-noise” strategy to filter errors in the model’s predictions.
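As background for the "MaskIoU" idea, the sketch below computes the underlying metric, mask intersection-over-union, directly on toy masks. It does not show how a MaskIoU head estimates proposal quality inside the network, which this chapter does not describe. The rectangular masks and their names are invented for the example.

```python
def rect_mask(top, left, bottom, right):
    """Build a binary mask as a set of (row, col) pixels for a rectangle."""
    return {(r, c) for r in range(top, bottom) for c in range(left, right)}


def mask_iou(mask_a, mask_b):
    """Intersection-over-union of two pixel sets (1.0 means identical masks)."""
    union = mask_a | mask_b
    return len(mask_a & mask_b) / len(union) if union else 0.0


# Hypothetical 10x10 "dog" region and three proposals for the same dog.
dog = rect_mask(0, 0, 10, 10)
proposals = {
    "whole dog": rect_mask(0, 0, 10, 10),
    "left half": rect_mask(0, 0, 10, 5),
    "head only": rect_mask(0, 0, 5, 4),
}

scores = {name: mask_iou(mask, dog) for name, mask in proposals.items()}
for name, score in scores.items():
    print(f"{name}: IoU = {score:.2f}")
```

All three proposals overlap the same instance, but only one covers it completely. A quality score of this kind is what allows the complete proposal to be preferred over the redundant partial ones.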

The method was tested on popular datasets, showcasing outstanding performance. In essence, it proposes a more effective way to train AI models to comprehend images when supervision is limited, as is the case with image-level annotations.

In this scenario, we tackled a common problem that arises when utilizing intelligent technologies to interpret images. Consider showing a computer an image of a dog; instead of providing a straightforward response like “it’s a dog,” the computer might generate multiple similar responses, complicating understanding.

This phenomenon is termed “redundant segmentation.” It means the computer may return several overlapping proposals for parts of the same object instead of a single proposal for the whole object, leading to confusion and reduced precision.

The approach aims to address this issue in real-time, ensuring clearer and more accurate results. It incorporates techniques such as “MaskIoU head” and “CIM strategy” to enhance the computer’s image comprehension without resorting to complex methods.

For instance, if you show the computer a picture of a dog in a park, rather than returning one region that covers the whole dog, it might return several overlapping regions, such as one for the head, one for the body, and one for the entire animal, which adds little beyond the single complete region.

Moreover, the approach employs an “Anti-noise” strategy to rectify errors that may occur in the information provided by the computer. This method has demonstrated excellent results across various images, and ongoing efforts aim to further enhance the system’s robustness in image comprehension without unnecessary complexity.

The proposed method consists of three key components: an Anti-noise branch, K Refinement branches, and a Complete Instances Mining (CIM) strategy. Proposal features are generated using MaskFuse and divided into multiple branches. Both the Anti-noise and Refinement branches produce classification and integrity scores. The CIM strategy utilizes the output of the preceding branch to generate refined pseudo labels, guiding the supervision of the next branch. Simultaneously, the Anti-noise branch is supervised using pre-computed pseudo labels. In the accompanying figure, the right column shows seeds in purple and pseudo ground truth in red. The seeds spread spatially to identify complete proposals as pseudo ground truth, considering spatial relationships and integrity scores.
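The seed-spreading idea can be sketched as follows. This is a heavily simplified illustration under my own assumptions, not the algorithm from the paper: the seed is taken to be the proposal with the highest classification score, "spatial relationship" is reduced to box overlap with the seed, and the pseudo ground truth is the overlapping proposal with the highest integrity score. All names, boxes, scores, and the overlap threshold are hypothetical.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


def mine_pseudo_label(proposals, overlap_threshold=0.2):
    """Pick a seed, then promote the most complete proposal near it."""
    # Seed: the most confidently classified proposal (often only a part).
    seed = max(proposals, key=lambda p: p["class_score"])
    # Spread: consider proposals that spatially overlap the seed.
    neighbours = [p for p in proposals
                  if box_iou(p["box"], seed["box"]) >= overlap_threshold]
    # Pseudo ground truth: the neighbour with the highest integrity score.
    return seed, max(neighbours, key=lambda p: p["integrity"])


# Hypothetical proposals for an image containing one dog and a tree.
proposals = [
    {"name": "head", "box": (40, 10, 70, 40), "class_score": 0.95, "integrity": 0.30},
    {"name": "whole dog", "box": (10, 10, 80, 60), "class_score": 0.80, "integrity": 0.90},
    {"name": "torso", "box": (10, 25, 60, 60), "class_score": 0.70, "integrity": 0.50},
    {"name": "tree", "box": (100, 0, 140, 80), "class_score": 0.20, "integrity": 0.85},
]

seed, pseudo_gt = mine_pseudo_label(proposals)
print("seed:", seed["name"], "| pseudo ground truth:", pseudo_gt["name"])
```

In this toy case the seed is a partial proposal (the head), and the spatial constraint keeps the unrelated tree from being selected even though its integrity score is high.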

In summary, the chapter delves into a major challenge in the field of artificial intelligence related to redundant segmentation when identifying objects in images. The difficulty lies in managing incomplete data without detailed annotations for each object, which can lead to redundant and less precise results. The presented approach aims to address this issue by proposing innovative strategies such as the Anti-noise branch and CIM strategy. These techniques seek to enhance the understanding of images, even in the absence of detailed information.

Original paper for this use case : https://arxiv.org/pdf/2402.07633.pdf

Zecheng Li 1, Zening Zeng 1, Yuqi Liang 1 and Jin-Gang Yu 1, 2* / 1 – South China University of Technology, 2 – Pazhou Laboratory