{"id":1331,"date":"2025-09-16T18:21:34","date_gmt":"2025-09-16T18:21:34","guid":{"rendered":"https:\/\/imalogic.com\/blog\/?p=1331"},"modified":"2025-09-29T08:40:37","modified_gmt":"2025-09-29T08:40:37","slug":"trading-temps-reel-latence-minimale-et-throughput-maximal","status":"publish","type":"post","link":"https:\/\/imalogic.com\/blog\/2025\/09\/16\/trading-temps-reel-latence-minimale-et-throughput-maximal\/","title":{"rendered":"Real-time trading with minimal latency and maximum throughput"},"content":{"rendered":"<body>\n<p style=\"font-style:italic;font-weight:400\"><\/p>\n\n\n\n<p style=\"font-style:italic;font-weight:400\">In the world of high-frequency trading (HFT), every microsecond counts. The ability to process massive data streams, make a decision, and execute an order ahead of the competition is the key to success. This imposes extreme requirements on software, hardware, and overall system architecture. <\/p>\n\n\n\n<p style=\"font-style:italic;font-weight:400\">The rest of the article is in French\u2026<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">La programmation concurrente<\/h1>\n\n\n\n<p>La <strong>programmation concurrente<\/strong> (ou <strong>concurrency programming<\/strong> en anglais) est une mani\u00e8re d\u2019\u00e9crire des programmes qui peuvent <strong>faire plusieurs choses \u00e0 la fois<\/strong> ou <strong>g\u00e9rer plusieurs t\u00e2ches simultan\u00e9ment<\/strong>, sans n\u00e9cessairement les ex\u00e9cuter exactement en parall\u00e8le.<\/p>\n\n\n\n<p><strong>Explication simple :<\/strong><\/p>\n\n\n\n<p>Imaginons que tu cuisines :<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Tu mets de l\u2019eau \u00e0 bouillir,<\/li>\n\n\n\n<li>Pendant que \u00e7a chauffe, tu coupes des l\u00e9gumes,<\/li>\n\n\n\n<li>Puis tu reviens \u00e0 la casserole, etc.<\/li>\n<\/ul>\n\n\n\n<p>Tu ne fais pas tout en m\u00eame temps, mais tu g\u00e8res <strong>plusieurs t\u00e2ches en parall\u00e8le<\/strong> en optimisant le temps 
d\u2019attente.<br>C\u2019est \u00e7a, la <strong>concurrence<\/strong> : g\u00e9rer plusieurs op\u00e9rations qui se chevauchent dans le temps.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<p><strong>En programmation, \u00e7a sert \u00e0 quoi ?<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>R\u00e9agir \u00e0 plusieurs \u00e9v\u00e9nements (clics, messages, requ\u00eates r\u00e9seau\u2026)<\/li>\n\n\n\n<li>Optimiser les performances (pendant qu\u2019une t\u00e2che attend, une autre avance)<\/li>\n\n\n\n<li>Am\u00e9liorer l\u2019exp\u00e9rience utilisateur (interface fluide, chargement en arri\u00e8re-plan\u2026)<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<p><strong>\u00c0 ne pas confondre avec :<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Le parall\u00e9lisme<\/strong> : ex\u00e9cution de plusieurs t\u00e2ches <strong>exactement en m\u00eame temps<\/strong> (souvent avec plusieurs c\u0153urs de processeur).<\/li>\n\n\n\n<li><strong>La programmation asynchrone<\/strong> : un style particulier de programmation concurrente, souvent utilis\u00e9 avec des promesses, des callbacks ou async\/await.<\/li>\n<\/ul>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><a href=\"https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2025\/09\/Concurrent-Programming-1600x944.jp_.webp?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"810\" height=\"478\" data-attachment-id=\"1334\" data-permalink=\"https:\/\/imalogic.com\/blog\/2025\/09\/16\/trading-temps-reel-latence-minimale-et-throughput-maximal\/concurrent-programming-1600x944-jp\/\" data-orig-file=\"https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2025\/09\/Concurrent-Programming-1600x944.jp_.webp?fit=1600%2C944&amp;ssl=1\" data-orig-size=\"1600,944\" data-comments-opened=\"0\" 
data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Concurrent-Programming-1600&amp;#215;944.jp\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2025\/09\/Concurrent-Programming-1600x944.jp_.webp?fit=810%2C478&amp;ssl=1\" src=\"https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2025\/09\/Concurrent-Programming-1600x944.jp_.webp?resize=810%2C478&#038;ssl=1\" alt=\"\" class=\"wp-image-1334\" style=\"width:646px;height:auto\" loading=\"lazy\" srcset=\"https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2025\/09\/Concurrent-Programming-1600x944.jp_.webp?resize=1024%2C604&amp;ssl=1 1024w, https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2025\/09\/Concurrent-Programming-1600x944.jp_.webp?resize=300%2C177&amp;ssl=1 300w, https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2025\/09\/Concurrent-Programming-1600x944.jp_.webp?resize=768%2C453&amp;ssl=1 768w, https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2025\/09\/Concurrent-Programming-1600x944.jp_.webp?resize=1536%2C906&amp;ssl=1 1536w, https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2025\/09\/Concurrent-Programming-1600x944.jp_.webp?w=1600&amp;ssl=1 1600w\" sizes=\"auto, (max-width: 810px) 100vw, 810px\" \/><\/a><\/figure>\n<\/div>\n\n\n<p><\/p>\n\n\n\n<p><strong>Exemples de langages\/concepts :<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Java<\/strong> : Thread, ExecutorService<\/li>\n\n\n\n<li><strong>Python<\/strong> : threading, 
asyncio<\/li>\n\n\n\n<li><strong>JavaScript<\/strong> : Promise, async\/await<\/li>\n\n\n\n<li><strong>Go<\/strong> : goroutines<\/li>\n\n\n\n<li><strong>Rust<\/strong> : tokio, async<\/li>\n<\/ul>\n\n\n\n<p>En <strong>C\/C++<\/strong>, la programmation concurrente peut se faire de plusieurs mani\u00e8res, notamment avec :<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Les threads (pthreads en C)<\/strong><\/li>\n<\/ol>\n\n\n\n<p>Voici un exemple simple en <strong>C<\/strong> avec <strong>POSIX threads<\/strong> (pthreads) :<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#include &lt;stdio.h&gt;\n#include &lt;pthread.h&gt;\n\nvoid* print_message(void* arg) {\n\u00a0\u00a0\u00a0 char* message = (char*) arg;\n\u00a0\u00a0\u00a0 printf(\"%s\\n\", message);\n\u00a0\u00a0\u00a0 return NULL;\n}\n\nint main() {\n\u00a0\u00a0\u00a0 pthread_t thread1, thread2;\n\u00a0\u00a0\u00a0 pthread_create(&amp;thread1, NULL, print_message, \"Bonjour depuis le thread 1\");\n\u00a0\u00a0\u00a0 pthread_create(&amp;thread2, NULL, print_message, \"Bonjour depuis le thread 2\");\n\u00a0\u00a0\u00a0 pthread_join(thread1, NULL);\n\u00a0\u00a0\u00a0 pthread_join(thread2, NULL);\n\u00a0\u00a0\u00a0 return 0;\n}<\/code><\/pre>\n\n\n\n<p><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>pthread_create lance une nouvelle fonction dans un <strong>thread<\/strong> s\u00e9par\u00e9.<\/li>\n\n\n\n<li>pthread_join attend que le thread se termine.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<p><strong>2. 
C++11 et plus : std::thread<\/strong><\/p>\n\n\n\n<p>Depuis <strong>C++11<\/strong>, on peut faire plus simple avec la biblioth\u00e8que standard :<\/p>\n\n\n\n<p><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#include &lt;iostream&gt;\n#include &lt;thread&gt;\n\nvoid say_hello(const std::string&amp; name) {\n\u00a0\u00a0\u00a0 std::cout &lt;&lt; \"Bonjour \" &lt;&lt; name &lt;&lt; \" depuis un thread !\" &lt;&lt; std::endl;\n}\n\nint main() {\n\u00a0\u00a0\u00a0 std::thread t1(say_hello, \"Alice\");\n\u00a0\u00a0\u00a0 std::thread t2(say_hello, \"Bob\");\n\u00a0\u00a0\u00a0 t1.join();\n\u00a0\u00a0\u00a0 t2.join();\n\u00a0\u00a0\u00a0 return 0;\n}<\/code><\/pre>\n\n\n\n<p><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>std::thread cr\u00e9e un nouveau thread qui ex\u00e9cute une fonction.<\/li>\n\n\n\n<li>.join() bloque le thread principal jusqu\u2019\u00e0 ce que le thread secondaire soit termin\u00e9.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<p><strong>3. 
Ex\u00e9cution parall\u00e8le vs concurrence<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Les deux exemples ci-dessus montrent de la <strong>concurrence<\/strong>.<\/li>\n\n\n\n<li>Si ton ordinateur a plusieurs c\u0153urs, il <strong>pourrait<\/strong> ex\u00e9cuter les threads <strong>en parall\u00e8le<\/strong>, sinon, ils seront <strong>intercal\u00e9s<\/strong> (le syst\u00e8me d\u2019exploitation change de thread r\u00e9guli\u00e8rement).<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<p><strong>Points d\u2019attention :<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Les <strong>probl\u00e8mes de synchronisation<\/strong> (par exemple : deux threads \u00e9crivant sur la m\u00eame variable en m\u00eame temps)<\/li>\n\n\n\n<li>Utiliser des <strong>mutex<\/strong> (std::mutex, pthread_mutex_t) pour prot\u00e9ger les ressources partag\u00e9es<\/li>\n<\/ul>\n\n\n\n<h1 class=\"wp-block-heading\">Le throughput<\/h1>\n\n\n\n<p>Le <strong>throughput<\/strong> (ou <strong>d\u00e9bit<\/strong> en fran\u00e7ais) est un <strong>terme technique<\/strong> qui d\u00e9signe la <strong>quantit\u00e9 de travail accomplie dans un certain laps de temps<\/strong>. En informatique, on l\u2019utilise souvent pour mesurer la <strong>performance<\/strong> d\u2019un syst\u00e8me, comme un programme, un r\u00e9seau ou un processeur.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<p><strong>Exemples selon le contexte :<\/strong><\/p>\n\n\n\n<p><strong>1. En r\u00e9seau :<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Le throughput est la quantit\u00e9 de <strong>donn\u00e9es<\/strong> transmises <strong>par seconde<\/strong>.<\/li>\n\n\n\n<li>Exemple : 100 Mbps = 100 m\u00e9gabits par seconde.<\/li>\n<\/ul>\n\n\n\n<p><strong>2. 
En programmation concurrente :<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>C\u2019est le <strong>nombre de t\u00e2ches trait\u00e9es par seconde<\/strong>.<\/li>\n\n\n\n<li>Exemple : un serveur web peut avoir un throughput de <strong>1000 requ\u00eates HTTP par seconde<\/strong>.<\/li>\n<\/ul>\n\n\n\n<p><strong>3. En base de donn\u00e9es :<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Nombre de <strong>transactions<\/strong> ou de <strong>requ\u00eates<\/strong> trait\u00e9es par seconde.<\/li>\n<\/ul>\n\n\n\n<p><strong>4. En syst\u00e8me (OS) :<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Combien de processus ou de threads peuvent \u00eatre ex\u00e9cut\u00e9s dans un temps donn\u00e9.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<p><strong>Pourquoi c\u2019est important ?<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Plus le throughput est \u00e9lev\u00e9, <strong>plus le syst\u00e8me est performant<\/strong>.<\/li>\n\n\n\n<li>Il est souvent en <strong>\u00e9quilibre avec la latence<\/strong> :\n<ul class=\"wp-block-list\">\n<li><strong>Latence<\/strong> = temps de r\u00e9ponse pour une requ\u00eate<\/li>\n\n\n\n<li><strong>Throughput<\/strong> = combien de requ\u00eates on peut g\u00e9rer par seconde<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<p><strong>On peut avoir :<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Une faible latence mais un faible throughput (rapide mais peu de volume)<\/li>\n\n\n\n<li>Un haut throughput avec une latence plus grande (beaucoup de volume, mais plus lent par requ\u00eate)<\/li>\n<\/ul>\n\n\n\n<p><strong>1. 
Exemple : mesurer le throughput<\/strong><\/p>\n\n\n\n<p>Imaginons que tu veux lancer 100 000 t\u00e2ches, et mesurer combien sont trait\u00e9es par seconde avec des threads.<\/p>\n\n\n\n<p><strong>Code C++ avec std::thread et std::chrono :<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#include &lt;iostream&gt;\n#include &lt;thread&gt;\n#include &lt;vector&gt;\n#include &lt;chrono&gt;\n#include &lt;atomic&gt;\n\nvoid task(std::atomic&lt;int&gt;&amp; counter) {\n\u00a0\u00a0\u00a0 \/\/ Simule une petite t\u00e2che\n\u00a0\u00a0\u00a0 counter++;\n}\n\nint main() {\n\n\u00a0\u00a0\u00a0 const int num_tasks = 100000;\n\u00a0\u00a0\u00a0 std::atomic&lt;int&gt; counter{0};\n\u00a0\u00a0\u00a0 auto start = std::chrono::high_resolution_clock::now();\n\u00a0\u00a0\u00a0 std::vector&lt;std::thread&gt; threads;\n\u00a0\u00a0\u00a0 for (int i = 0; i &lt; num_tasks; ++i) {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 threads.emplace_back(task, std::ref(counter));\n\u00a0\u00a0\u00a0 }\n\u00a0\u00a0\u00a0 for (auto&amp; t : threads) {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 t.join();\n\u00a0\u00a0\u00a0 }\n\u00a0\u00a0\u00a0 auto end = std::chrono::high_resolution_clock::now();\n\u00a0\u00a0\u00a0 std::chrono::duration&lt;double&gt; duration = end - start;\n\u00a0\u00a0\u00a0 double throughput = num_tasks \/ duration.count();\n\u00a0\u00a0\u00a0 std::cout &lt;&lt; \"Throughput: \" &lt;&lt; throughput &lt;&lt; \" tasks\/sec\" &lt;&lt; std::endl;\n\u00a0\u00a0\u00a0 return 0;\n}<\/code><\/pre>\n\n\n\n<p><\/p>\n\n\n\n<p><strong>Ce que fait ce code :<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Lance 100 000 threads qui incr\u00e9mentent un compteur.<\/li>\n\n\n\n<li>Mesure le temps total.<\/li>\n\n\n\n<li>Calcule le throughput = nombre de t\u00e2ches \/ temps en secondes.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<p><strong>2. 
Comment am\u00e9liorer le throughput ?<\/strong><\/p>\n\n\n\n<p>Lancer 100 000 threads n\u2019est <strong>pas efficace<\/strong>. Voici quelques techniques d\u2019<strong>optimisation<\/strong> :<\/p>\n\n\n\n<p><strong>a) Utiliser un thread pool<\/strong><\/p>\n\n\n\n<p>Au lieu de cr\u00e9er un thread par t\u00e2che, on cr\u00e9e un <strong>petit nombre de threads<\/strong> qui travaillent en boucle.<\/p>\n\n\n\n<p><strong>Pourquoi ?<\/strong><br>Cr\u00e9er\/d\u00e9truire un thread est <strong>co\u00fbteux<\/strong>. Les thread pools r\u00e9utilisent les threads.<\/p>\n\n\n\n<p><strong>b) Limiter les acc\u00e8s partag\u00e9s<\/strong><\/p>\n\n\n\n<p>Les acc\u00e8s concurrents \u00e0 des ressources partag\u00e9es (comme counter) n\u00e9cessitent des <strong>verrous ou des op\u00e9rations atomiques<\/strong>, ce qui <strong>ralentit<\/strong>.<\/p>\n\n\n\n<p><strong>c) Batching (traitement par lots)<\/strong><\/p>\n\n\n\n<p>Si possible, regrouper plusieurs petites t\u00e2ches en une plus grosse r\u00e9duit les appels aux threads.<\/p>\n\n\n\n<p><strong>d) Asynchrone \/ futures \/ coroutines<\/strong><\/p>\n\n\n\n<p>Utiliser des techniques non bloquantes (ex: std::async, std::future, coroutines en C++20) peut am\u00e9liorer le d\u00e9bit.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<p><strong>En r\u00e9sum\u00e9 :<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Throughput = nombre de t\u00e2ches \/ temps<\/strong><\/li>\n\n\n\n<li>Pour le mesurer : chronom\u00e8tre + compteur<\/li>\n\n\n\n<li>Pour l\u2019am\u00e9liorer : \u00e9viter la surcharge (cr\u00e9ation de threads, contention), utiliser des <strong>pools<\/strong>, faire du <strong>batch<\/strong>, etc.<\/li>\n<\/ul>\n\n\n\n<p><strong>Exemple : Thread Pool + mesure du throughput<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#include &lt;iostream&gt;\n#include &lt;thread&gt;\n#include &lt;vector&gt;\n#include &lt;queue&gt;\n#include 
&lt;functional&gt;\n#include &lt;mutex&gt;\n#include &lt;condition_variable&gt;\n#include &lt;atomic&gt;\n#include &lt;chrono&gt;\n\nclass ThreadPool {\npublic:\n\u00a0\u00a0\u00a0 ThreadPool(size_t num_threads) {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 for (size_t i = 0; i &lt; num_threads; ++i) {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 workers.emplace_back([this]() {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 while (true) {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 std::function&lt;void()&gt; task;\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 {\u00a0\u00a0 \/\/ zone prot\u00e9g\u00e9e\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 std::unique_lock&lt;std::mutex&gt; lock(queue_mutex);\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 condition.wait(lock, [this]() {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 return stop || !tasks.empty();\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 });\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 if (stop &amp;&amp; tasks.empty()) return;\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 task = 
std::move(tasks.front());\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 tasks.pop();\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 }\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 task();\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 }\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 });\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 }\n\u00a0\u00a0\u00a0 }\n\u00a0\u00a0\u00a0 void enqueue(std::function&lt;void()&gt; task) {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 std::lock_guard&lt;std::mutex&gt; lock(queue_mutex);\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 tasks.push(task);\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 }\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 condition.notify_one();\n\u00a0\u00a0\u00a0 }\n\u00a0\u00a0\u00a0 void shutdown() {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 std::lock_guard&lt;std::mutex&gt; lock(queue_mutex);\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 stop = true;\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 }\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 condition.notify_all();\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 for (auto&amp; worker : workers) worker.join();\n\u00a0\u00a0\u00a0 }\n\u00a0\u00a0\u00a0 ~ThreadPool() {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 if (!stop) shutdown();\n\u00a0\u00a0\u00a0 }\nprivate:\n\u00a0\u00a0\u00a0 std::vector&lt;std::thread&gt; workers;\n\u00a0\u00a0\u00a0 std::queue&lt;std::function&lt;void()&gt;&gt; tasks;\n\u00a0\u00a0\u00a0 std::mutex queue_mutex;\n\u00a0\u00a0\u00a0 std::condition_variable 
condition;\n\u00a0\u00a0\u00a0 bool stop = false;\n};\n\n\/\/ Mesure du throughput\nint main() {\n\u00a0\u00a0\u00a0 const int num_tasks = 100000;\n\u00a0\u00a0\u00a0 const int num_threads = std::thread::hardware_concurrency(); \/\/ ex: 4, 8, etc.\n\u00a0\u00a0\u00a0 ThreadPool pool(num_threads);\n\u00a0\u00a0\u00a0 std::atomic&lt;int&gt; counter{0};\n\u00a0\u00a0\u00a0 auto start = std::chrono::high_resolution_clock::now();\n\u00a0\u00a0\u00a0 for (int i = 0; i &lt; num_tasks; ++i) {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 pool.enqueue([&amp;counter]() {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 counter++;\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 });\n\u00a0\u00a0\u00a0 }\n\u00a0\u00a0\u00a0 \/\/ Attendre que tout soit termin\u00e9 (simple mais pas parfait)\n\u00a0\u00a0\u00a0 while (counter.load() &lt; num_tasks) {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 std::this_thread::sleep_for(std::chrono::milliseconds(1));\n\u00a0\u00a0\u00a0 }\n\u00a0\u00a0\u00a0 auto end = std::chrono::high_resolution_clock::now();\n\u00a0\u00a0\u00a0 std::chrono::duration&lt;double&gt; duration = end - start;\n\u00a0\u00a0\u00a0 double throughput = num_tasks \/ duration.count();\n\u00a0\u00a0\u00a0 std::cout &lt;&lt; \"Throughput avec thread pool (\" &lt;&lt; num_threads\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 &lt;&lt; \" threads) : \" &lt;&lt; throughput &lt;&lt; \" tasks\/sec\" &lt;&lt; std::endl;\n\u00a0\u00a0\u00a0 pool.shutdown();\n\u00a0\u00a0\u00a0 return 0;\n}<\/code><\/pre>\n\n\n\n<p><\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<p><strong>Ce que fait ce programme :<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Cr\u00e9e un <strong>thread pool<\/strong> avec n threads (selon ton CPU).<\/li>\n\n\n\n<li>Lance <strong>100 000 t\u00e2ches<\/strong> dans la file du pool.<\/li>\n\n\n\n<li>Attend que toutes soient termin\u00e9es.<\/li>\n\n\n\n<li>Affiche le 
<strong>throughput en t\u00e2ches\/sec<\/strong>.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<p><strong>Comparaison :<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><td><strong>M\u00e9thode<\/strong><\/td><td><strong>Threads cr\u00e9\u00e9s<\/strong><\/td><td><strong>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Performance<\/strong><\/td><\/tr><\/thead><tbody><tr><td>1 thread par t\u00e2che<\/td><td>100 000<\/td><td>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Tr\u00e8s lent, surcharge<\/td><\/tr><tr><td>Thread pool<\/td><td>4-16<\/td><td>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Rapide, scalable<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p><strong>Exemple : Thread Pool + mesure du throughput<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#include &lt;iostream&gt;\n#include &lt;thread&gt;\n#include &lt;vector&gt;\n#include &lt;queue&gt;\n#include &lt;functional&gt;\n#include &lt;mutex&gt;\n#include &lt;condition_variable&gt;\n#include &lt;atomic&gt;\n#include &lt;chrono&gt;\nclass ThreadPool {\npublic:\n\u00a0\u00a0\u00a0 ThreadPool(size_t num_threads) {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 for (size_t i = 0; i &lt; num_threads; ++i) {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 workers.emplace_back([this]() {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 while (true) {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 std::function&lt;void()&gt; task;\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 {\u00a0\u00a0 \/\/ zone prot\u00e9g\u00e9e\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 std::unique_lock&lt;std::mutex&gt; 
lock(queue_mutex);\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 condition.wait(lock, [this]() {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 return stop || !tasks.empty();\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 });\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 if (stop &amp;&amp; tasks.empty()) return;\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 task = std::move(tasks.front());\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 tasks.pop();\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 }\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 task();\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 }\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 });\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 }\n\u00a0\u00a0\u00a0 }\n\n\u00a0\u00a0\u00a0 void enqueue(std::function&lt;void()&gt; task) {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 std::lock_guard&lt;std::mutex&gt; lock(queue_mutex);\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 tasks.push(task);\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 }\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 condition.notify_one();\n\u00a0\u00a0\u00a0 }\n\u00a0\u00a0\u00a0 void 
shutdown() {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 std::lock_guard&lt;std::mutex&gt; lock(queue_mutex);\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 stop = true;\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 }\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 condition.notify_all();\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 for (auto&amp; worker : workers) worker.join();\n\u00a0\u00a0\u00a0 }\n\n\u00a0\u00a0\u00a0 ~ThreadPool() {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 if (!stop)\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 shutdown();\n\u00a0\u00a0\u00a0 }\n\nprivate:\n\u00a0\u00a0\u00a0 std::vector&lt;std::thread&gt; workers;\n\u00a0\u00a0\u00a0 std::queue&lt;std::function&lt;void()&gt;&gt; tasks;\n\u00a0\u00a0\u00a0 std::mutex queue_mutex;\n\u00a0\u00a0\u00a0 std::condition_variable condition;\n\u00a0\u00a0\u00a0 bool stop = false;\n};\n\n\/\/ Mesure du throughput\nint main() {\n\u00a0\u00a0\u00a0 const int num_tasks = 100000;\n\u00a0\u00a0\u00a0 const int num_threads = std::thread::hardware_concurrency(); \/\/ ex: 4, 8, etc.\n\u00a0\u00a0\u00a0 ThreadPool pool(num_threads);\n\u00a0\u00a0\u00a0 std::atomic&lt;int&gt; counter{0};\n\u00a0\u00a0\u00a0 auto start = std::chrono::high_resolution_clock::now();\n\u00a0\u00a0\u00a0 for (int i = 0; i &lt; num_tasks; ++i) {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 pool.enqueue([&amp;counter]() {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 counter++;\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 });\n\u00a0\u00a0\u00a0 }\n\u00a0\u00a0\u00a0 \/\/ Attendre que tout soit termin\u00e9 (simple mais pas parfait)\n\u00a0\u00a0\u00a0 while (counter.load() &lt; num_tasks) {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 std::this_thread::sleep_for(std::chrono::milliseconds(1));\n\u00a0\u00a0\u00a0 }\n\u00a0\u00a0\u00a0 auto end = std::chrono::high_resolution_clock::now();\n\u00a0\u00a0\u00a0 
std::chrono::duration&lt;double&gt; duration = end - start;\n\u00a0\u00a0\u00a0 double throughput = num_tasks \/ duration.count();\n \u00a0\u00a0 std::cout &lt;&lt; \"Throughput avec thread pool (\" &lt;&lt; num_threads\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 &lt;&lt; \" threads) : \" &lt;&lt; throughput &lt;&lt; \" tasks\/sec\" &lt;&lt; std::endl;\n\u00a0\u00a0\u00a0 pool.shutdown();\n\u00a0\u00a0\u00a0 return 0;\n}<\/code><\/pre>\n\n\n\n<p>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/p>\n\n\n\n<p>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<strong>Ce que fait ce programme :<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Cr\u00e9e un <strong>thread pool<\/strong> avec n threads (selon ton CPU).<\/li>\n\n\n\n<li>Lance <strong>100 000 t\u00e2ches<\/strong> dans la file du pool.<\/li>\n\n\n\n<li>Attend que toutes soient termin\u00e9es.<\/li>\n\n\n\n<li>Affiche le <strong>throughput en t\u00e2ches\/sec<\/strong>.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<p><strong>Comparaison :<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><td><strong>M\u00e9thode<\/strong><\/td><td><strong>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Threads cr\u00e9\u00e9s<\/strong><\/td><td><strong>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Performance<\/strong><\/td><\/tr><\/thead><tbody><tr><td>1 thread par t\u00e2che<\/td><td>\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0100 000<\/td><td>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Tr\u00e8s lent, surcharge<\/td><\/tr><tr><td>Thread pool<\/td><td>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 
4-16<\/td><td>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Rapide, scalable<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p><strong>Exemple complet : throughput avec coroutines asynchrones<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#include &lt;iostream&gt;\n#include &lt;coroutine&gt;\n#include &lt;chrono&gt;\n#include &lt;thread&gt;\n#include &lt;atomic&gt;\n#include &lt;vector&gt;\n\n\/\/ Awaiter pour simuler un sleep non bloquant\nstruct SleepAwaiter {\n\u00a0\u00a0\u00a0 std::chrono::milliseconds duration;\n\u00a0\u00a0\u00a0 bool await_ready() const noexcept { return false; }\n\u00a0\u00a0\u00a0 void await_suspend(std::coroutine_handle&lt;&gt; handle) {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 std::jthread([handle, d = duration]() {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 std::this_thread::sleep_for(d);\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 handle.resume();\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 });\n\u00a0\u00a0\u00a0 }\n\u00a0\u00a0\u00a0 void await_resume() const noexcept {}\n};\n\n\/\/ T\u00e2che de base retourn\u00e9e par la coroutine\nstruct Task {\n\u00a0\u00a0\u00a0 struct promise_type {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Task get_return_object() { return {}; }\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 std::suspend_never initial_suspend() noexcept { return {}; }\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 std::suspend_never final_suspend() noexcept { return {}; }\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 void return_void() {}\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 void unhandled_exception() { std::terminate(); }\n\u00a0\u00a0\u00a0 };\n};\n\n\/\/ Coroutine asynchrone qui incr\u00e9mente un compteur apr\u00e8s un d\u00e9lai\nTask delayed_task(std::atomic&lt;int&gt;&amp; counter, int delay_ms) {\n\u00a0\u00a0\u00a0 co_await SleepAwaiter{std::chrono::milliseconds(delay_ms)};\n\u00a0\u00a0\u00a0 counter++;\n}\n\nint main() {\n\u00a0\u00a0\u00a0 
const int num_tasks = 10000; \/\/ exemple raisonnable\n\u00a0\u00a0\u00a0 std::atomic&lt;int&gt; counter{0};\n\u00a0\u00a0\u00a0 auto start = std::chrono::high_resolution_clock::now();\n\u00a0\u00a0\u00a0 \/\/ Lancer toutes les coroutines\n\u00a0\u00a0\u00a0 for (int i = 0; i &lt; num_tasks; ++i) {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 delayed_task(counter, 1);\u00a0 \/\/ d\u00e9lai minimal pour simuler non blocage\n\u00a0\u00a0\u00a0 }\n\u00a0\u00a0\u00a0 \/\/ Attendre que toutes les t\u00e2ches soient termin\u00e9es\n\u00a0\u00a0\u00a0 while (counter.load() &lt; num_tasks) {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 std::this_thread::sleep_for(std::chrono::milliseconds(1));\n\u00a0\u00a0\u00a0 }\n\u00a0\u00a0\u00a0 auto end = std::chrono::high_resolution_clock::now();\n\u00a0\u00a0\u00a0 std::chrono::duration&lt;double&gt; duration = end - start;\n\u00a0\u00a0\u00a0 double throughput = num_tasks \/ duration.count();\n\u00a0\u00a0\u00a0 std::cout &lt;&lt; \"Throughput avec coroutines asynchrones : \"\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 &lt;&lt; throughput &lt;&lt; \" tasks\/sec\" &lt;&lt; std::endl;\n\u00a0\u00a0\u00a0 return 0;\n}<\/code><\/pre>\n\n\n\n<p><strong>Ce que tu vas voir :<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Le programme lance 10 000 coroutines asynchrones.<\/li>\n\n\n\n<li>Chacune attend 1 ms (sans bloquer un thread).<\/li>\n\n\n\n<li>Le compteur s\u2019incr\u00e9mente quand la t\u00e2che est termin\u00e9e.<\/li>\n\n\n\n<li>Le throughput (t\u00e2ches par seconde) est affich\u00e9.<\/li>\n<\/ul>\n\n\n\n<p><strong>Notes importantes :<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Cette simulation utilise std::jthread pour g\u00e9rer le timer dans SleepAwaiter.<\/li>\n\n\n\n<li>En vrai, les coroutines s\u2019int\u00e8grent avec des boucles d\u2019\u00e9v\u00e9nements (event loops) plus efficaces.<\/li>\n\n\n\n<li>Ici, on a un tr\u00e8s l\u00e9ger d\u00e9lai pour mieux 
simuler un travail \u00ab r\u00e9el \u00bb.<\/li>\n<\/ul>\n\n\n\n<h1 class=\"wp-block-heading\">OpenMP<\/h1>\n\n\n\n<p><strong>OpenMP<\/strong> peut aider \u00e0 optimiser la performance, surtout quand tu veux parall\u00e9liser des t\u00e2ches <strong>CPU-bound<\/strong> (qui utilisent beaucoup le processeur) de fa\u00e7on simple et efficace.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<p><strong>Qu\u2019est-ce que OpenMP ?<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>C\u2019est une <strong>API<\/strong> pour faire de la programmation parall\u00e8le sur CPU.<\/li>\n\n\n\n<li>Tr\u00e8s utilis\u00e9e en C\/C++ (et Fortran).<\/li>\n\n\n\n<li>Elle te permet de parall\u00e9liser des boucles ou des sections de code avec des <strong>directives<\/strong> (#pragma omp), sans g\u00e9rer explicitement les threads.<\/li>\n\n\n\n<li>OpenMP g\u00e8re la cr\u00e9ation, la synchronisation et la r\u00e9partition des threads automatiquement.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<p><strong>Exemple simple en OpenMP pour parall\u00e9liser une boucle de t\u00e2ches :<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#include &lt;iostream&gt;\n#include &lt;atomic&gt;\n<strong>#include &lt;omp.h&gt;<\/strong>\n\nint main() {\n\u00a0\u00a0\u00a0 const int num_tasks = 100000;\n\u00a0\u00a0\u00a0 std::atomic&lt;int&gt; counter{0};\n<strong>\u00a0\u00a0\u00a0 #pragma omp parallel for<\/strong>\n\u00a0\u00a0\u00a0 for (int i = 0; i &lt; num_tasks; ++i) {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \/\/ T\u00e2che CPU-bound simul\u00e9e\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 counter++;\n\u00a0\u00a0\u00a0 }\n\u00a0\u00a0\u00a0 std::cout &lt;&lt; \"Counter = \" &lt;&lt; counter &lt;&lt; std::endl;\n\u00a0\u00a0\u00a0 return 0;\n}<\/code><\/pre>\n\n\n\n<p><strong>Comment OpenMP peut aider dans ton contexte ?<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Si tes t\u00e2ches sont 
<strong>simples et CPU-bound<\/strong>, OpenMP peut parall\u00e9liser la boucle rapidement et efficacement.<\/li>\n\n\n\n<li>OpenMP peut <strong>optimiser l\u2019utilisation des c\u0153urs CPU<\/strong> sans surcharge importante.<\/li>\n\n\n\n<li>Pour le <strong>throughput<\/strong>, \u00e7a peut augmenter le nombre de t\u00e2ches trait\u00e9es par seconde.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<p><strong>Limitations \/ points \u00e0 garder en t\u00eate :<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>OpenMP est surtout <strong>adapt\u00e9 \u00e0 des t\u00e2ches synchrones<\/strong> et r\u00e9p\u00e9titives (boucles).<\/li>\n\n\n\n<li>Pour des t\u00e2ches <strong>asynchrones<\/strong>, d\u2019attente ou d\u2019I\/O, OpenMP est moins adapt\u00e9.<\/li>\n\n\n\n<li>OpenMP ne g\u00e8re pas nativement les <strong>coroutines<\/strong> ou les m\u00e9canismes asynchrones.<\/li>\n\n\n\n<li>Si tu utilises des coroutines pour de l\u2019async I\/O, OpenMP n\u2019aidera pas directement \u00e0 optimiser \u00e7a.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<p><strong>En r\u00e9sum\u00e9 :<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><td><strong>Cas d\u2019utilisation<\/strong><\/td><td><strong>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 OpenMP adapt\u00e9 ?<\/strong><\/td><\/tr><\/thead><tbody><tr><td>Parall\u00e9liser une boucle CPU<\/td><td>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Oui, ideal<\/td><\/tr><tr><td>Optimiser un thread 
pool<\/td><td>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Possible, mais moins flexible<\/td><\/tr><tr><td>Gestion d\u2019async I\/O\/coroutines<\/td><td>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Non, pas adapt\u00e9<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Combiner architecture, mat\u00e9riel et code efficacement<\/h2>\n\n\n\n<p>Les syst\u00e8mes de trading temps r\u00e9el ont des exigences extr\u00eames en termes de <strong>latence minimale<\/strong> et <strong>throughput maximal<\/strong>. Quelques points \u00e0 mettre en place, en combinant architecture, hardware, et code.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<p><strong>1. 
Architecture logicielle<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Pipeline ultra-optimis\u00e9<\/strong> en C++ avec un focus sur la faible latence et haute performance.<\/li>\n\n\n\n<li>Utilisation de techniques <strong>lock-free<\/strong> et <strong>wait-free<\/strong> pour \u00e9viter les blocages.<\/li>\n\n\n\n<li><strong>Threads affin\u00e9s (CPU pinning)<\/strong> : chaque thread d\u00e9di\u00e9 \u00e0 une t\u00e2che sp\u00e9cifique, li\u00e9 \u00e0 un c\u0153ur pr\u00e9cis.<\/li>\n\n\n\n<li><strong>Batching minimal<\/strong> : traiter les donn\u00e9es au plus vite, en petites quantit\u00e9s.<\/li>\n\n\n\n<li><strong>Communication inter-thread par queues lock-free<\/strong> (ex: boost::lockfree::queue).<\/li>\n\n\n\n<li>Utilisation de <strong>m\u00e9moires pr\u00e9allou\u00e9es<\/strong> (pas d\u2019allocation dynamique en temps r\u00e9el).<\/li>\n\n\n\n<li>Exploitation des <strong>SIMD<\/strong> (instructions vectorielles) pour acc\u00e9l\u00e9rer les calculs.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<p><strong>2. Code et parall\u00e9lisme<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>C++ moderne<\/strong> avec optimisations manuelles (inline, pragma vectorization).<\/li>\n\n\n\n<li><strong>Thread pools ou pipelines multi-threads<\/strong> sp\u00e9cialis\u00e9s.<\/li>\n\n\n\n<li>Exploiter les <strong>coroutines C++20<\/strong> pour g\u00e9rer les I\/O sans bloquer.<\/li>\n\n\n\n<li>Minimiser les appels syst\u00e8me, \u00e9viter la contention sur les verrous.<\/li>\n\n\n\n<li>Profiler pour optimiser les \u201chot paths\u201d.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<p><strong>3. 
Hardware CPU &amp; M\u00e9moire<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>CPU : <strong>Intel Xeon<\/strong> (gamme haute fr\u00e9quence, faible latence) ou <strong>AMD EPYC<\/strong> selon budget.<\/li>\n\n\n\n<li>Prioriser la fr\u00e9quence (ex: 3.8-4.0 GHz) plut\u00f4t que le nombre de c\u0153urs pour la latence.<\/li>\n\n\n\n<li><strong>Hyper-threading d\u00e9sactiv\u00e9<\/strong> pour \u00e9viter la contention.<\/li>\n\n\n\n<li><strong>Affinit\u00e9 CPU<\/strong> pour contr\u00f4ler o\u00f9 s\u2019ex\u00e9cutent les threads.<\/li>\n\n\n\n<li>M\u00e9moire <strong>DDR4\/DDR5 tr\u00e8s rapide<\/strong>, souvent avec ECC activ\u00e9.<\/li>\n\n\n\n<li>Pr\u00e9f\u00e9rer des <strong>caches L1\/L2 larges<\/strong>, adapt\u00e9s au CPU cibl\u00e9.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<p><strong>4. Le GPU ?<\/strong> <\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>En trading haute fr\u00e9quence classique (HFT), le GPU est <strong>peu utilis\u00e9<\/strong> car la latence d\u2019envoi\/retour vers le GPU est trop \u00e9lev\u00e9e.<\/li>\n\n\n\n<li>En revanche, pour certains calculs massivement parall\u00e8les (ex: pricing d\u2019options, machine learning), le GPU peut aider.<\/li>\n\n\n\n<li>Le GPU est efficace pour des t\u00e2ches <strong>batch\u00e9es, massivement parall\u00e8les<\/strong>, mais pas pour la prise de d\u00e9cision ultra rapide \u00e0 l\u2019\u00e9chelle de la microseconde.<\/li>\n\n\n\n<li>Certains syst\u00e8mes hybrides utilisent GPU pour la recherche et CPU pour l\u2019ex\u00e9cution.<\/li>\n\n\n\n<li>Dans le chapitre suivant, on verra les avanc\u00e9es en termes d\u2019acc\u00e8s m\u00e9moire qui pourraient changer la donne.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<p><strong>5. 
Autres points cl\u00e9s<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>R\u00e9seau ultra basse latence<\/strong> : carte r\u00e9seau \u00e0 faible latence (ex: Mellanox), kernel bypass (DPDK).<\/li>\n\n\n\n<li>OS minimaliste, souvent Linux custom avec <strong>real-time kernel patches<\/strong>.<\/li>\n\n\n\n<li>Surveillance hardware (temp\u00e9rature, fr\u00e9quence, etc.) pour \u00e9viter throttling.<\/li>\n\n\n\n<li><strong>Co-localisation g\u00e9ographique<\/strong> proche des serveurs d\u2019\u00e9changes.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<p><strong>R\u00e9sum\u00e9 rapide<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><td><strong>Aspect<\/strong><\/td><td><strong>Recommandation<\/strong><\/td><\/tr><\/thead><tbody><tr><td>Langage<\/td><td>C++ moderne, optimis\u00e9, lock-free<\/td><\/tr><tr><td>CPU<\/td><td>Xeon haute fr\u00e9quence, affinit\u00e9 CPU<\/td><\/tr><tr><td>M\u00e9moire<\/td><td>DDR4\/DDR5 rapide, pr\u00e9-allocation<\/td><\/tr><tr><td>Parall\u00e9lisme<\/td><td>Thread pinning, queues lock-free<\/td><\/tr><tr><td>GPU<\/td><td>Usage limit\u00e9, surtout pour calcul batch<\/td><\/tr><tr><td>R\u00e9seau<\/td><td>Carte low latency + kernel bypass<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Architecture pipeline simplifi\u00e9e<\/strong><\/h2>\n\n\n\n<p class=\"has-text-align-center\" style=\"font-style:normal;font-weight:700;line-height:0.1\">[Network I\/O]<\/p>\n\n\n\n<p class=\"has-text-align-center\" style=\"font-style:normal;font-weight:700;line-height:0.1\">\u2193<\/p>\n\n\n\n<p class=\"has-text-align-center\" style=\"font-style:normal;font-weight:700;line-height:0.1\">[Message Parser \/ Validation]\u00a0 (thread(s) d\u00e9di\u00e9s)<\/p>\n\n\n\n<p class=\"has-text-align-center\" style=\"font-style:normal;font-weight:700;line-height:0.1\">\u2193<\/p>\n\n\n\n<p 
class=\"has-text-align-center\" style=\"font-style:normal;font-weight:700;line-height:0.1\">[Market Data Handler \/ Strategy Logic] (thread(s) d\u00e9di\u00e9s)<\/p>\n\n\n\n<p class=\"has-text-align-center\" style=\"font-style:normal;font-weight:700;line-height:0.1\">\u2193<\/p>\n\n\n\n<p class=\"has-text-align-center\" style=\"font-style:normal;font-weight:700;line-height:0.1\">[Order Execution \/ Risk Check] (thread(s) d\u00e9di\u00e9s)<\/p>\n\n\n\n<p class=\"has-text-align-center\" style=\"font-style:normal;font-weight:700;line-height:0.1\">\u2193<\/p>\n\n\n\n<p class=\"has-text-align-center\" style=\"font-style:normal;font-weight:700;line-height:0.1\">[Network Send]<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<p><strong>Points cl\u00e9s \u00e0 impl\u00e9menter :<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Queues lock-free<\/strong> entre chaque \u00e9tape, pour minimiser la latence.<\/li>\n\n\n\n<li><strong>Thread pinning<\/strong> pour fixer les threads sur des c\u0153urs CPU pr\u00e9cis.<\/li>\n\n\n\n<li><strong>Pr\u00e9-allocation m\u00e9moire<\/strong> des messages.<\/li>\n\n\n\n<li><strong>Traitement sans allocations dynamiques<\/strong> pendant la phase critique.<\/li>\n\n\n\n<li><strong>Profiling et instrumentation<\/strong> pour mesurer les latences.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<p><strong>Exemple minimal en C++ : pipeline avec lock-free queue et pinning<\/strong><\/p>\n\n\n\n<p>Je vais utiliser une queue lock-free simple (un buffer circulaire basique) et pthread pour l\u2019affinit\u00e9 CPU. 
Ce n\u2019est pas complet mais illustre le principe.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#include &lt;iostream&gt;\n#include &lt;thread&gt;\n#include &lt;atomic&gt;\n#include &lt;vector&gt;\n#include &lt;chrono&gt;\n#include &lt;cstring&gt;\n#include &lt;sched.h&gt;\n#include &lt;unistd.h&gt;\n#include &lt;pthread.h&gt; \/\/ pthread_setaffinity_np\n\nconstexpr int BUFFER_SIZE = 1024;\n\nstruct Message {\n\u00a0\u00a0\u00a0 int id;\n\u00a0\u00a0\u00a0 char data[64];\n};\n\/\/ File circulaire lock-free SPSC (un seul producteur et un seul consommateur par file)\nclass LockFreeQueue {\n\u00a0\u00a0\u00a0 Message buffer[BUFFER_SIZE];\n\u00a0\u00a0\u00a0 std::atomic&lt;int&gt; head{0};\n\u00a0\u00a0\u00a0 std::atomic&lt;int&gt; tail{0};\npublic:\n\u00a0\u00a0\u00a0 bool push(const Message&amp; msg) {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 int current_tail = tail.load(std::memory_order_relaxed);\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 int next_tail = (current_tail + 1) % BUFFER_SIZE;\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 if (next_tail == head.load(std::memory_order_acquire)) return false; \/\/ queue full\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 buffer[current_tail] = msg;\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 tail.store(next_tail, std::memory_order_release);\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 return true;\n\u00a0\u00a0\u00a0 }\n\u00a0\u00a0\u00a0 bool pop(Message&amp; msg) {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 int current_head = head.load(std::memory_order_relaxed);\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 if (current_head == tail.load(std::memory_order_acquire)) return false; \/\/ queue empty\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 msg = buffer[current_head];\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 head.store((current_head + 1) % BUFFER_SIZE, std::memory_order_release);\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 return true;\n\u00a0\u00a0\u00a0 }\n};\nvoid pinThreadToCore(int core_id) {\n\u00a0\u00a0\u00a0 cpu_set_t cpuset;\n\u00a0\u00a0\u00a0 CPU_ZERO(&amp;cpuset);\n\u00a0\u00a0\u00a0 CPU_SET(core_id, &amp;cpuset);\n\u00a0\u00a0\u00a0 pthread_t 
current_thread = pthread_self();\n\u00a0\u00a0\u00a0 if (pthread_setaffinity_np(current_thread, sizeof(cpu_set_t), &amp;cpuset) != 0) {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 std::cerr &lt;&lt; \"Failed to set thread affinity\\n\";\n\u00a0\u00a0\u00a0 }\n}\nLockFreeQueue net_to_parser_queue;\nLockFreeQueue parser_to_strategy_queue;\nvoid networkThread() {\n\u00a0\u00a0\u00a0 pinThreadToCore(0);\n\u00a0\u00a0\u00a0 int msg_id = 0;\n\u00a0\u00a0\u00a0 while (msg_id &lt; 10000) {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Message msg;\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 msg.id = msg_id++;\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 strcpy(msg.data, \"MarketData\");\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 while (!net_to_parser_queue.push(msg)) {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \/\/ queue full, spin or sleep briefly\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 std::this_thread::yield();\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 }\n\u00a0\u00a0\u00a0 }\n}\nvoid parserThread() {\n\u00a0\u00a0\u00a0 pinThreadToCore(1);\n\u00a0\u00a0\u00a0 Message msg;\n\u00a0\u00a0\u00a0 while (true) {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 if (net_to_parser_queue.pop(msg)) {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \/\/ Simulate parsing &amp; validation\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \/\/ Forward to strategy\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 while (!parser_to_strategy_queue.push(msg)) {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 std::this_thread::yield();\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 }\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 if (msg.id &gt;= 9999) break;\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 } else {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 
std::this_thread::yield();\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 }\n\u00a0\u00a0\u00a0 }\n}\nvoid strategyThread() {\n\u00a0\u00a0\u00a0 pinThreadToCore(2);\n\u00a0\u00a0\u00a0 Message msg;\n\u00a0\u00a0\u00a0 int processed = 0;\n\u00a0\u00a0\u00a0 auto start = std::chrono::high_resolution_clock::now();\n\u00a0\u00a0\u00a0 while (true) {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 if (parser_to_strategy_queue.pop(msg)) {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \/\/ Simulate strategy logic\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 processed++;\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 if (msg.id &gt;= 9999) break;\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 } else {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 std::this_thread::yield();\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 }\n\u00a0\u00a0\u00a0 }\n\u00a0\u00a0\u00a0 auto end = std::chrono::high_resolution_clock::now();\n\u00a0\u00a0\u00a0 std::chrono::duration&lt;double&gt; dur = end - start;\n\u00a0\u00a0\u00a0 std::cout &lt;&lt; \"Processed \" &lt;&lt; processed &lt;&lt; \" messages in \" &lt;&lt; dur.count() &lt;&lt; \" seconds\\n\";\n\u00a0\u00a0\u00a0 std::cout &lt;&lt; \"Throughput: \" &lt;&lt; processed \/ dur.count() &lt;&lt; \" msgs\/sec\\n\";\n}\nint main() {\n\u00a0\u00a0\u00a0 std::thread net_thread(networkThread);\n\u00a0\u00a0\u00a0 std::thread parse_thread(parserThread);\n\u00a0\u00a0\u00a0 std::thread strat_thread(strategyThread);\n\u00a0\u00a0\u00a0 net_thread.join();\n\u00a0\u00a0\u00a0 parse_thread.join();\n\u00a0\u00a0\u00a0 strat_thread.join();\n\u00a0\u00a0\u00a0 return 0;\n}<\/code><\/pre>\n\n\n\n<p><strong>Ce que montre cet exemple :<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Chaque \u00e9tape est un thread d\u00e9di\u00e9 avec <strong>affinit\u00e9 CPU<\/strong>.<\/li>\n\n\n\n<li>Communication via des <strong>queues lock-free<\/strong> entre 
threads.<\/li>\n\n\n\n<li>Pas d\u2019allocation dynamique dans la boucle critique.<\/li>\n\n\n\n<li>Mesure du throughput final.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Pour aller plus loin, comment int\u00e9grer :<\/strong><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Du <strong>profiling pr\u00e9cis (ex: perf, VTune)<\/strong>,<\/li>\n\n\n\n<li>Des optimisations avec SIMD ou instructions sp\u00e9cifiques CPU,<\/li>\n\n\n\n<li>Une architecture r\u00e9seau ultra basse latence avec DPDK ou kernel bypass.<\/li>\n<\/ul>\n\n\n\n<p><strong>1. Profiling pr\u00e9cis<\/strong><\/p>\n\n\n\n<p>Pour bien optimiser un syst\u00e8me HFT, tu dois mesurer pr\u00e9cis\u00e9ment o\u00f9 passe le temps CPU.<\/p>\n\n\n\n<p><strong>Outils recommand\u00e9s :<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Linux perf<\/strong> : pour analyser les CPU cycles, cache misses, branch mispredictions, etc.<\/li>\n\n\n\n<li><strong>Intel VTune Profiler<\/strong> : analyse d\u00e9taill\u00e9e du CPU, hot spots, contention, etc.<\/li>\n\n\n\n<li><strong>Google Benchmark<\/strong> ou des timers haute r\u00e9solution dans le code (ex: std::chrono::high_resolution_clock).<\/li>\n<\/ul>\n\n\n\n<p><strong>Exemple rapide d\u2019utilisation perf :<\/strong><\/p>\n\n\n\n<p><code>perf record -g .\/ton_programme<\/code><\/p>\n\n\n\n<p><code>perf report<\/code><\/p>\n\n\n\n<p>Cela te montrera quelles fonctions prennent le plus de temps et o\u00f9 optimiser.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<p><strong>2. 
Optimisations SIMD<\/strong><\/p>\n\n\n\n<p>Utiliser les instructions vectorielles (AVX, SSE) peut acc\u00e9l\u00e9rer beaucoup les calculs math\u00e9matiques.<\/p>\n\n\n\n<p><strong>Exemple simple avec AVX2 (intrinsics) :<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#include &lt;immintrin.h&gt;\nvoid vector_add(const float* a, const float* b, float* result, size_t n) {\n\u00a0\u00a0\u00a0 size_t i = 0;\n\u00a0\u00a0\u00a0 for (; i + 8 &lt;= n; i += 8) {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 __m256 va = _mm256_loadu_ps(a + i);\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 __m256 vb = _mm256_loadu_ps(b + i);\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 __m256 vr = _mm256_add_ps(va, vb);\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 _mm256_storeu_ps(result + i, vr);\n\u00a0\u00a0\u00a0 }\n\u00a0\u00a0\u00a0 for (; i &lt; n; ++i) {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 result[i] = a[i] + b[i];\n\u00a0\u00a0\u00a0 }\n}<\/code><\/pre>\n\n\n\n<p><strong>3. R\u00e9seau ultra basse latence<\/strong><\/p>\n\n\n\n<p>Pour \u00e9viter la latence du kernel Linux, on peut utiliser :<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>DPDK (Data Plane Development Kit)<\/strong> : acc\u00e8s direct \u00e0 la carte r\u00e9seau depuis l\u2019espace utilisateur.<\/li>\n\n\n\n<li><strong>PF_RING<\/strong>, <strong>netmap<\/strong>, ou <strong>XDP\/eBPF<\/strong> comme alternatives.<\/li>\n<\/ul>\n\n\n\n<p><strong>Pourquoi ?<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Kernel bypass : tu \u00e9vites les appels syst\u00e8me et la copie de paquets.<\/li>\n\n\n\n<li>Traitement direct, en poll-mode, tr\u00e8s rapide.<\/li>\n\n\n\n<li>Exige du code sp\u00e9cifique et parfois des cartes r\u00e9seaux compatibles.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<p><strong>4. 
Exemple basique de poll avec DPDK (concept)<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\/\/ Pseudo-code simplifi\u00e9, DPDK est tr\u00e8s volumineux \u00e0 installer\nint main() {\n\u00a0\u00a0\u00a0 \/\/ Initialiser DPDK et les ports r\u00e9seau\n\u00a0\u00a0\u00a0 \/\/ Configurer les RX queues en poll-mode\n\u00a0\u00a0\u00a0 while (true) {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \/\/ Poll packets depuis la carte r\u00e9seau (sans interruption)\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 struct rte_mbuf *pkts_burst[32];\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 uint16_t nb_rx = rte_eth_rx_burst(port_id, queue_id, pkts_burst, 32);\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 for (int i = 0; i &lt; nb_rx; ++i) {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \/\/ Traitement ultra rapide\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \/\/ Lib\u00e9rer le buffer apr\u00e8s traitement\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 rte_pktmbuf_free(pkts_burst[i]);\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 }\n\u00a0\u00a0\u00a0 }\n}<\/code><\/pre>\n\n\n\n<p><strong>5. 
Autres bonnes pratiques hardware\/OS<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>CPU Turbo Boost d\u00e9sactiv\u00e9<\/strong> pour fr\u00e9quence stable.<\/li>\n\n\n\n<li>R\u00e9duction du bruit sur la machine (processus non essentiels stopp\u00e9s).<\/li>\n\n\n\n<li><strong>Huge pages<\/strong> pour la m\u00e9moire (r\u00e9duit les TLB misses).<\/li>\n\n\n\n<li>Temps r\u00e9el Linux patch\u00e9 (PREEMPT_RT).<\/li>\n\n\n\n<li><strong>Isolation CPU<\/strong> via cset ou isolcpus dans le bootloader.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Conclusion<\/strong><\/h2>\n\n\n\n<p>Le vrai d\u00e9fi, c\u2019est de combiner toutes ces couches :<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Code ultra optimis\u00e9,<\/li>\n\n\n\n<li>R\u00e9seau ultra rapide,<\/li>\n\n\n\n<li>Hardware adapt\u00e9,<\/li>\n\n\n\n<li>OS minimaliste configur\u00e9 pour la latence.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Autres points, les gpu\u2026<\/h2>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"alignleft size-full is-resized wp-duotone-grayscale\"><a href=\"https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2025\/09\/images.jpeg?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"225\" height=\"225\" data-attachment-id=\"1332\" data-permalink=\"https:\/\/imalogic.com\/blog\/2025\/09\/16\/trading-temps-reel-latence-minimale-et-throughput-maximal\/images-5\/\" data-orig-file=\"https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2025\/09\/images.jpeg?fit=225%2C225&amp;ssl=1\" data-orig-size=\"225,225\" data-comments-opened=\"0\" 
data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"images\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2025\/09\/images.jpeg?fit=225%2C225&amp;ssl=1\" src=\"https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2025\/09\/images.jpeg?resize=225%2C225&#038;ssl=1\" alt=\"\" class=\"wp-image-1332\" style=\"width:337px;height:auto\" loading=\"lazy\" srcset=\"https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2025\/09\/images.jpeg?w=225&amp;ssl=1 225w, https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2025\/09\/images.jpeg?resize=150%2C150&amp;ssl=1 150w\" sizes=\"auto, (max-width: 225px) 100vw, 225px\" \/><\/a><\/figure>\n<\/div>\n\n\n<p><strong>Les derni\u00e8res g\u00e9n\u00e9rations de GPU NVIDIA<\/strong>, notamment avec CUDA, ont consid\u00e9rablement am\u00e9lior\u00e9 les capacit\u00e9s de transfert de donn\u00e9es, en particulier via :<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>GPUDirect RDMA<\/strong> : permet au GPU d\u2019acc\u00e9der directement \u00e0 la m\u00e9moire des cartes r\u00e9seau compatibles sans passer par le CPU.<\/li>\n\n\n\n<li><strong>GPUDirect Async DMA<\/strong> : transfert asynchrone des donn\u00e9es entre GPU et p\u00e9riph\u00e9riques externes (r\u00e9seau, stockage) avec un minimum de latence.<\/li>\n\n\n\n<li><strong>NVLink<\/strong> et autres interconnexions rapides r\u00e9duisent aussi la latence entre CPU et GPU.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<p><strong>Alors, pourquoi le GPU 
reste-t-il rarement utilis\u00e9 en trading ultra basse latence ?<\/strong><\/p>\n\n\n\n<ol start=\"1\" class=\"wp-block-list\">\n<li><strong>Latence absolue vs d\u00e9bit<\/strong> :<br>M\u00eame avec GPUDirect RDMA, la latence totale aller-retour (network \u2192 GPU \u2192 CPU ou ordre) est souvent plus \u00e9lev\u00e9e que ce que peut tol\u00e9rer un syst\u00e8me HFT ultra low latency (quelques microsecondes). Le GPU est plus adapt\u00e9 pour du <em>throughput<\/em> massif que de la latence minimale extr\u00eame.<\/li>\n\n\n\n<li><strong>Complexit\u00e9 logicielle<\/strong> :<br>Int\u00e9grer GPU dans une cha\u00eene ultra optimis\u00e9e demande une architecture plus complexe, avec synchronisation et gestion des buffers, ce qui peut introduire des retards.<\/li>\n\n\n\n<li><strong>Nature des calculs<\/strong> :<br>Le GPU excelle dans les calculs massivement parall\u00e8les (pricing, machine learning, simulation) mais pas dans la prise de d\u00e9cision ultra-rapide ou la gestion d\u2019\u00e9v\u00e9nements r\u00e9seau.<\/li>\n<\/ol>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<p><strong>Cas o\u00f9 le GPU devient int\u00e9ressant en finance :<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Calculs batch\u00e9s et lourds<\/strong> (Monte-Carlo, pricing d\u2019options, risk analytics)<\/li>\n\n\n\n<li><strong>Apprentissage automatique<\/strong> pour la d\u00e9tection de patterns, strat\u00e9gies hors ligne<\/li>\n\n\n\n<li><strong>Pr\u00e9-traitement massif<\/strong> de donn\u00e9es avant de passer \u00e0 la couche trading rapide<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<p><strong>En r\u00e9sum\u00e9<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>GPUDirect RDMA et autres avanc\u00e9es r\u00e9duisent grandement les transferts<\/strong>, mais le GPU reste limit\u00e9 par la latence globale (notamment pour les d\u00e9cisions en temps 
r\u00e9el).<\/li>\n\n\n\n<li>Pour les <em>tasks ultra low latency<\/em>, le CPU avec optimisation r\u00e9seau reste la norme.<\/li>\n\n\n\n<li>Le GPU est un excellent compl\u00e9ment pour le calcul intensif, mais pas pour l\u2019ex\u00e9cution des ordres en millisecondes.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<h2 class=\"wp-block-heading\">Choix du hardware\u2026<\/h2>\n\n\n\n<p>Pour un syst\u00e8me <strong>trading ultra low latency &amp; high throughput<\/strong>, le choix hardware est crucial. <\/p>\n\n\n\n<p>Comparons rapidement certaines options :<\/p>\n\n\n\n<p><strong>1. Serveurs Xeon \/ AMD EPYC haute fr\u00e9quence<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Avantages :<\/strong>\n<ul class=\"wp-block-list\">\n<li>CPU tr\u00e8s performants, cores rapides avec gros caches L1\/L2.<\/li>\n\n\n\n<li>Fonctionnalit\u00e9s avanc\u00e9es (Intel TSX, AVX512, etc).<\/li>\n\n\n\n<li>Support mat\u00e9riel solide pour r\u00e9seau low-latency (PCIe Gen4, NICs haut de gamme).<\/li>\n\n\n\n<li>Facilit\u00e9 d\u2019optimisation (affinit\u00e9 CPU, gestion m\u00e9moire, etc).<\/li>\n\n\n\n<li>OS et \u00e9cosyst\u00e8me mature pour software HFT.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Inconv\u00e9nients :<\/strong>\n<ul class=\"wp-block-list\">\n<li>Co\u00fbt \u00e9lev\u00e9.<\/li>\n\n\n\n<li>Consommation \u00e9nerg\u00e9tique importante.<\/li>\n\n\n\n<li>Moins facilement scalable horizontalement \u00e0 tr\u00e8s grande \u00e9chelle.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<p><strong>2. 
Farms de microserveurs \/ Raspberry Pi \/ ARM<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Avantages :<\/strong>\n<ul class=\"wp-block-list\">\n<li>Tr\u00e8s faible co\u00fbt par unit\u00e9.<\/li>\n\n\n\n<li>Faible consommation \u00e9lectrique.<\/li>\n\n\n\n<li>Scalabilit\u00e9 horizontale importante.<\/li>\n\n\n\n<li>Parfait pour certains calculs parall\u00e8les ou syst\u00e8mes de backtesting.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Inconv\u00e9nients :<\/strong>\n<ul class=\"wp-block-list\">\n<li>Fr\u00e9quence CPU basse (~1.5-2 GHz), donc latence plus \u00e9lev\u00e9e.<\/li>\n\n\n\n<li>Pas adapt\u00e9s au temps r\u00e9el ultra bas (latences r\u00e9seau + CPU trop grandes).<\/li>\n\n\n\n<li>Architecture ARM, parfois moins d\u2019outils d\u2019optimisation.<\/li>\n\n\n\n<li>Complexit\u00e9 de synchronisation entre n\u0153uds (r\u00e9seau plus lent et variable).<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<p><strong>3. 
Rig mining d\u00e9tourn\u00e9<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Des rigs mining GPU\/ASIC sont pens\u00e9s pour calculs massivement parall\u00e8les (hashing) mais ne sont <strong>pas con\u00e7us pour faible latence ou I\/O rapide<\/strong>.<\/li>\n\n\n\n<li>Peu adapt\u00e9s au trading HFT temps r\u00e9el.<\/li>\n\n\n\n<li>Sont tr\u00e8s efficaces pour des workloads batch\u00e9s (cryptomining, ML offline).<\/li>\n<\/ul>\n\n\n\n<p><strong>Le choix optimal pour HFT \/ trading ultra rapide :<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Serveurs <strong>Xeon ou EPYC<\/strong> \u00e0 haute fr\u00e9quence, optimis\u00e9s pour latence, avec :\n<ul class=\"wp-block-list\">\n<li><strong>CPU core pinning<\/strong> et hyper-threading d\u00e9sactiv\u00e9,<\/li>\n\n\n\n<li>RAM rapide et pr\u00e9-allou\u00e9e,<\/li>\n\n\n\n<li>R\u00e9seau ultra basse latence (Mellanox + DPDK),<\/li>\n\n\n\n<li>OS Linux real-time minimaliste,<\/li>\n\n\n\n<li>Logiciel C++ optimis\u00e9 lock-free, vectoris\u00e9, et pipeline multi-thread.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Pour le <strong>scale out<\/strong> : clusters r\u00e9duits \u00e0 quelques serveurs ultra optimis\u00e9s. 
No massive farms of small nodes.<\/li>\n\n\n\n<li>GPUs possibly in the back office for batched computations (pricing, ML).<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<p><strong>Why not ARM \/ microserver farms?<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The cost savings are often wiped out by network latency and CPU clock speeds that are too low.<\/li>\n\n\n\n<li>Trading latency is measured in microseconds, which calls for one fast CPU rather than a large number of slow little ones.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<p><strong>Quick conclusion<\/strong><\/p>\n\n\n\n<figure style=\"font-size:17px\" class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><td><strong>Solution<\/strong><\/td><td><strong>Latency<\/strong><\/td><td><strong>Throughput<\/strong><\/td><td><strong>Cost<\/strong><\/td><td><strong>Scalability<\/strong><\/td><td><strong>Recommended use<\/strong><\/td><\/tr><\/thead><tbody><tr><td>Xeon\/EPYC servers<\/td><td>Very low<\/td><td>Very high<\/td><td>High<\/td><td>Medium<\/td><td>Ultra-low-latency HFT trading<\/td><\/tr><tr><td>ARM \/ Raspberry Pi farms<\/td><td>Medium\/high<\/td><td>Medium<\/td><td>Low<\/td><td>Very high<\/td><td>Batch computing, backtesting<\/td><\/tr><tr><td>GPU\/ASIC mining rigs<\/td><td>High<\/td><td>Very high<\/td><td>Variable<\/td><td>Medium<\/td><td>Batch computing, offline ML<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<h1 class=\"wp-block-heading\">Dedicated hardware<\/h1>\n\n\n\n<p>Build dedicated hardware to speed things up? 
A dedicated network card,\u2026 or something else?<\/p>\n\n\n\n<p>Designing dedicated hardware to accelerate an ultra-low-latency trading system is the road to <strong>perfection<\/strong>.<\/p>\n\n\n\n<p><strong>1. Dedicated ultra-low-latency network card<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>A <strong>programmable FPGA<\/strong> on the network card (e.g. Xilinx, Intel Altera) to process packets on the fly, without going through the CPU.<\/li>\n\n\n\n<li>Capable of <strong>protocol pre-processing (FIX\/FAST parsing)<\/strong> in hardware.<\/li>\n\n\n\n<li>Implements ultra-fast filtering, enrichment, and validation rules.<\/li>\n\n\n\n<li><strong>Hardware timestamping<\/strong> to measure latency precisely.<\/li>\n\n\n\n<li>Direct communication with the CPU over PCIe Gen4\/Gen5.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<p><strong>2. Custom ASIC\/FPGA accelerators<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>A dedicated FPGA or ASIC to run the most latency-critical trading logic:\n<ul class=\"wp-block-list\">\n<li>Ultra-fast decisions (e.g. simple market-making algorithms).<\/li>\n\n\n\n<li>Fixed computations, no complex branching.<\/li>\n\n\n\n<li>Real-time streaming data processing.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<p><strong>3. 
Ultra-fast &amp; shared memory<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Integrate <strong>HBM (High Bandwidth Memory)<\/strong> directly on the card to store market data, without going through the server's RAM.<\/li>\n\n\n\n<li>Ultra-fast (sub-microsecond) memory access from the FPGA\/ASIC.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<p><strong>4. CPU-hardware interconnect<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>PCIe Gen5<\/strong> for fast transfers between the CPU and the dedicated hardware.<\/li>\n\n\n\n<li>Option to bypass the CPU entirely for certain decisions (trading on the FPGA alone).<\/li>\n\n\n\n<li>Synchronization via <strong>RDMA<\/strong> to avoid unnecessary memory copies.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<p><strong>5. Overall architecture<\/strong><\/p>\n\n\n\n<p>[Market] \u2192 [FPGA network card] \u2192 [FPGA logic accelerator] \u2194 [HBM memory] \u2192 [Server CPU] \u2192 [Order execution]<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<p><strong>6. Example use case<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The network card receives an order from the market \u2192 the FPGA parses and filters it and applies the ultra-fast strategy \u2192 if a decision is made, it is sent straight to the execution engine, avoiding CPU latency.<\/li>\n\n\n\n<li>The CPU handles the complex strategy, risk management, and the user interface.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<p><strong>7. 
Advantages<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Latency cut to a few hundred nanoseconds.<\/li>\n\n\n\n<li>Massive offloading of the CPU.<\/li>\n\n\n\n<li>Extreme reliability and repeatability (dedicated hardware).<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<p><strong>8. Drawbacks<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Very high development cost.<\/li>\n\n\n\n<li>Design and maintenance complexity.<\/li>\n\n\n\n<li>Hard to change the strategy quickly (an FPGA is less flexible than a CPU).<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<p><strong>Conclusion<\/strong><\/p>\n\n\n\n<p>For <strong>extreme high-frequency trading<\/strong>, this kind of FPGA + CPU + dedicated NIC architecture is what the biggest players use (e.g. Jump Trading, Jane Street).<\/p>\n\n\n\n<p>Dedicated ultra-low-latency network cards already exist, designed specifically for demanding environments such as high-frequency trading. 
Here are a few examples and their key characteristics:<\/p>\n\n\n\n<p><strong>Existing ultra-low-latency network cards<\/strong><\/p>\n\n\n\n<ol start=\"1\" class=\"wp-block-list\">\n<li><strong>Mellanox (NVIDIA) ConnectX Series<\/strong>\n<ul class=\"wp-block-list\">\n<li>Supports <strong>RDMA<\/strong>, <strong>GPUDirect<\/strong>, and <strong>kernel bypass (DPDK, RDMA)<\/strong><\/li>\n\n\n\n<li>Very low latency (on the order of 1-2 microseconds)<\/li>\n\n\n\n<li>Compatible with PCIe Gen3\/Gen4\/Gen5<\/li>\n\n\n\n<li>Includes advanced features such as hardware timestamping and packet filtering and classification.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Solarflare (now part of Xilinx\/AMD)<\/strong>\n<ul class=\"wp-block-list\">\n<li>Specializes in ultra-low-latency 10\/25\/40\/100 GbE cards<\/li>\n\n\n\n<li>Supports DPDK, kernel bypass, and precise timestamping<\/li>\n\n\n\n<li>Widely used in high-frequency trading.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Intel Ethernet 800 Series<\/strong>\n<ul class=\"wp-block-list\">\n<li>High-performance network cards with advanced support for virtualization and low latency.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Netronome Agilio<\/strong>\n<ul class=\"wp-block-list\">\n<li>SmartNIC with embedded processors for programmable offload of network and application functions.<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n\n\n\n<p><strong>Dedicated FPGA network cards<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>NetFPGA<\/strong>: an open-source FPGA platform for developing network acceleration (education and R&amp;D).<\/li>\n\n\n\n<li><strong>Xilinx Alveo<\/strong>: FPGA accelerator cards that can be programmed for custom network processing (e.g. ultra-fast parsing and filtering).<\/li>\n\n\n\n<li>Some commercial FPGA + NIC solutions offer a complete pipeline programmable in hardware.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<p><strong>Summary<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Dedicated ultra-low-latency network cards exist and are widely used in finance and telecom<\/strong>.<\/li>\n\n\n\n<li>They often combine programmable hardware (FPGA\/SmartNIC) with advanced software support (DPDK, RDMA).<\/li>\n\n\n\n<li>These cards drastically cut network I\/O latency, which is crucial in high-frequency trading.<\/li>\n<\/ul>\n<\/body>","protected":false},"excerpt":{"rendered":"<p>In the world of high-frequency trading (HFT), every microsecond counts. The ability to process massive data streams, make a decision,<\/p>\n","protected":false},"author":1,"featured_media":1332,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[7,143,116,142],"tags":[149,144,58,147,146,145,148],"class_list":["post-1331","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-coding","category-hft","category-software-engineering","category-trading","tag-concurrency-programming","tag-hft","tag-optimization","tag-progrmmation-concurrente","tag-thread","tag
-trading","tag-trading-haute-frequence"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2025\/09\/images.jpeg?fit=225%2C225&ssl=1","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p8J21V-lt","jetpack-related-posts":[],"_links":{"self":[{"href":"https:\/\/imalogic.com\/blog\/wp-json\/wp\/v2\/posts\/1331","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/imalogic.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/imalogic.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/imalogic.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/imalogic.com\/blog\/wp-json\/wp\/v2\/comments?post=1331"}],"version-history":[{"count":5,"href":"https:\/\/imalogic.com\/blog\/wp-json\/wp\/v2\/posts\/1331\/revisions"}],"predecessor-version":[{"id":1351,"href":"https:\/\/imalogic.com\/blog\/wp-json\/wp\/v2\/posts\/1331\/revisions\/1351"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/imalogic.com\/blog\/wp-json\/wp\/v2\/media\/1332"}],"wp:attachment":[{"href":"https:\/\/imalogic.com\/blog\/wp-json\/wp\/v2\/media?parent=1331"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/imalogic.com\/blog\/wp-json\/wp\/v2\/categories?post=1331"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/imalogic.com\/blog\/wp-json\/wp\/v2\/tags?post=1331"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}