Dec 15, 2022
3 min read

Parallel Dijkstra Algorithm - MPI & CUDA Implementation

Comparative study of serial vs parallel implementations of Dijkstra's shortest path algorithm using MPI and CUDA in Python.

Timeline and Details

Start date: October 2022
End date: December 2022
Associated with: Faculty of Informatics and Digital Technologies, University of Rijeka
Resources: MPI Implementation · CUDA Implementation

Overview

Comparative performance analysis of Dijkstra’s shortest path algorithm implemented in serial, parallel (MPI), and GPU-accelerated (CUDA) versions using Python. The study demonstrates a significant speedup from parallelization on a test graph with 2000 nodes and 8000 edges.

Note: Developed in collaboration with a fellow student. Collaborator details available on the GitHub repository.

Key Results

Performance comparison on a graph with 2000 nodes and 8000 edges:

  • Serial execution: 0.284 seconds
  • Parallel MPI execution: 2.30e-05 seconds
  • Speedup: ~12,350x with the parallel implementation (the ratio of the two timings above)

Implementation Details

Serial Implementation (see the sketch after this list):

  • Classic Dijkstra algorithm with adjacency list representation
  • Iterative approach visiting nodes sequentially
  • Random graph generation for testing
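A minimal sketch of the serial version, assuming a heap-based Dijkstra over an adjacency list; the helper names (make_graph, dijkstra) are illustrative, not taken from the repository:

```python
# Minimal serial sketch (illustrative, not the repository's exact code):
# heap-based Dijkstra over a randomly generated adjacency list.
import heapq
import random

def make_graph(n, m, max_w=10):
    """Random directed graph as an adjacency list: node -> [(neighbor, weight)]."""
    adj = {v: [] for v in range(n)}
    for _ in range(m):
        u, v = random.randrange(n), random.randrange(n)
        adj[u].append((v, random.randint(1, max_w)))
    return adj

def dijkstra(adj, source):
    dist = {v: float("inf") for v in adj}
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry; u was already finalized
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist

if __name__ == "__main__":
    adj = make_graph(2000, 8000)
    dist = dijkstra(adj, 0)
    print(sorted(dist.items())[:5])
```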

MPI Parallel Implementation (see the sketch after this list):

  • Distributed-memory parallelization using mpi4py
  • Weight matrix partitioned across multiple processes
  • Process 0 generates the graph and distributes it to the workers
  • Each process computes local shortest distances
  • Results are gathered and combined at the root process
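A minimal mpi4py sketch of this scheme, assuming a dense weight matrix split into column blocks, one per process; the graph generation and variable names are illustrative, not the repository's exact code:

```python
# Column-partitioned parallel Dijkstra sketch (assumptions, not the repo's code).
import numpy as np
from mpi4py import MPI

N = 2000            # vertices (matches the benchmark graph)
INF = np.inf
SOURCE = 0

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
assert N % size == 0, "sketch assumes N divides evenly across processes"
chunk = N // size

if rank == 0:
    # Root generates a random dense weight matrix; INF marks missing edges.
    rng = np.random.default_rng(42)
    W = rng.integers(1, 10, size=(N, N)).astype(float)
    W[rng.random((N, N)) > 8000 / (N * N)] = INF   # keep roughly 8000 edges
    np.fill_diagonal(W, 0.0)
    blocks = np.hsplit(W, size)                    # one column block per process
else:
    blocks = None

local_W = comm.scatter(blocks, root=0)             # shape (N, chunk)
lo = rank * chunk                                  # first global vertex owned here

dist = np.full(chunk, INF)
visited = np.zeros(chunk, dtype=bool)
if lo <= SOURCE < lo + chunk:
    dist[SOURCE - lo] = 0.0

for _ in range(N):
    # Closest unvisited vertex owned by this process ...
    masked = np.where(visited, INF, dist)
    li = int(np.argmin(masked))
    # ... then the closest vertex globally, across all processes.
    best, u = min(comm.allgather((float(masked[li]), lo + li)))
    if best == INF:
        break                                      # remaining vertices unreachable
    if lo <= u < lo + chunk:
        visited[u - lo] = True
    # Relax edges u -> v for the vertices v in this process's column block.
    dist = np.minimum(dist, best + local_W[u])

result = comm.gather(dist, root=0)
if rank == 0:
    print(np.concatenate(result)[:10])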

CUDA GPU Implementation (see the sketch after this list):

  • Custom CUDA kernel for parallel distance computation
  • Thread-based parallelization (1024 threads per block)
  • GPU memory allocation for distance arrays and adjacency matrix
  • Iterative kernel execution for graph relaxation
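A minimal PyCUDA sketch of the relaxation step; the kernel and host loop below are assumptions based on the description above, not the repository's exact code. Each iteration picks the closest unvisited vertex on the host, then launches the kernel so that one thread relaxes each outgoing edge:

```python
# GPU relaxation sketch (illustrative assumptions, not the repo's code).
import numpy as np
import pycuda.autoinit                    # initializes the CUDA context
import pycuda.driver as cuda
from pycuda.compiler import SourceModule

N = 2000
INF = np.float32(1e9)                     # sentinel for "no edge"

mod = SourceModule("""
__global__ void relax(const float *W, float *dist, int u, float du, int n)
{
    // One thread per vertex v: relax the edge u -> v.
    int v = blockIdx.x * blockDim.x + threadIdx.x;
    if (v < n) {
        float cand = du + W[u * n + v];
        if (cand < dist[v]) dist[v] = cand;
    }
}
""")
relax = mod.get_function("relax")

# Random dense weight matrix on the host.
rng = np.random.default_rng(42)
W = rng.integers(1, 10, size=(N, N)).astype(np.float32)
W[rng.random((N, N)) > 8000 / (N * N)] = INF
np.fill_diagonal(W, 0)

dist = np.full(N, INF, dtype=np.float32)
dist[0] = 0.0
visited = np.zeros(N, dtype=bool)

# Allocate GPU memory for the adjacency matrix and distance array.
W_gpu = cuda.mem_alloc(W.nbytes)
dist_gpu = cuda.mem_alloc(dist.nbytes)
cuda.memcpy_htod(W_gpu, W)
cuda.memcpy_htod(dist_gpu, dist)

threads = 1024                            # threads per block, as described above
blocks = (N + threads - 1) // threads

for _ in range(N):
    # Select the closest unvisited vertex on the host, relax on the GPU.
    cuda.memcpy_dtoh(dist, dist_gpu)
    masked = np.where(visited, INF, dist)
    u = int(np.argmin(masked))
    if masked[u] >= INF:
        break
    visited[u] = True
    relax(W_gpu, dist_gpu, np.int32(u), np.float32(dist[u]), np.int32(N),
          block=(threads, 1, 1), grid=(blocks, 1))

cuda.memcpy_dtoh(dist, dist_gpu)
print(dist[:10])
```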

Technical Stack

Language: Python 3
Parallelization: mpi4py (Message Passing Interface), PyCUDA
Libraries: NumPy, timeit, random
Execution: Multi-processor systems, NVIDIA GPU
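For reference, timings like the ones under Key Results could be collected with timeit along these lines, reusing the illustrative make_graph and dijkstra helpers from the serial sketch:

```python
# Timing sketch; depends on the make_graph/dijkstra helpers defined in the
# serial example above. Absolute numbers vary with hardware and random seed.
import timeit

adj = make_graph(2000, 8000)
runs = 10
elapsed = timeit.timeit(lambda: dijkstra(adj, 0), number=runs) / runs
print(f"Serial execution: {elapsed:.3f} seconds")
```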

Key Findings

Advantages of Parallelization:

  • Significant speed improvement for large datasets
  • Better resource utilization across multiple processors
  • Scalability for complex graph problems

Limitations:

  • Overhead from process/thread creation, communication, and synchronization
  • Not beneficial for small graphs, where the serial version is faster because of this overhead
  • Increased implementation complexity and harder debugging

Conclusion: Parallelization is most beneficial for processing large amounts of data where computational gains outweigh communication overhead.

Skills Demonstrated

Python programming, parallel algorithm design, MPI message passing, CUDA GPU programming, performance benchmarking, distributed systems, algorithm optimization, heterogeneous computing