PROJECTS

Case-study style overview

Search, filter, and open any project for outcomes, responsibilities, and technical detail.

2024
PDF

Fluid Solver Optimization (CPU, OpenMP, CUDA)

Performance / Systems Engineer

Cut CPU runtime from 30.85 s to 3.69 s (8.36×); the best OpenMP speedup was ~10.77× at ~16–24 threads; CUDA kernels reached ~90.16 GiB/s.

Optimized a 3D Stable Fluids solver end to end in three stages: (1) CPU locality and instruction-level parallelism (loop reordering, tiling, replacing divisions with multiplications); (2) shared-memory parallelism with OpenMP (collapse, reductions, static scheduling, Red/Black Gauss-Seidel); (3) GPU acceleration with CUDA (persistent device memory, kernelized solver steps, Nsight-guided tuning). Profiled bottlenecks and scalability with perf, gprof, and Nsight.
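
To illustrate why Red/Black Gauss-Seidel suits shared-memory parallelism, here is a minimal Python sketch (not the project's C/C++ code). It assumes the usual Stable Fluids relaxation form `x = (x0 + a * sum(neighbours)) / c` on an `(n+2)^3` grid with a fixed boundary layer; the names `idx`, `red_black_sweeps`, and `max_residual` are hypothetical. Cells are coloured by the parity of `i + j + k`, so every cell of one colour reads only cells of the other colour and a whole colour pass could be updated in parallel. The hoisted `inv_c` shows the division-to-multiplication idea.

```python
def idx(i, j, k, n):
    """Flat index into an (n+2)^3 grid stored with k contiguous."""
    return (i * (n + 2) + j) * (n + 2) + k


def red_black_sweeps(x, x0, a, c, n, sweeps=20):
    """Relax c*x[p] - a*sum(6 neighbours of p) = x0[p] on interior cells, in place."""
    inv_c = 1.0 / c          # one division up front, multiplications in the loop
    sj = n + 2               # index stride between j-neighbours
    si = (n + 2) * (n + 2)   # index stride between i-neighbours
    for _ in range(sweeps):
        for colour in (0, 1):  # 0 = "red" cells (i+j+k even), 1 = "black" cells
            for i in range(1, n + 1):
                for j in range(1, n + 1):
                    # First interior k whose parity gives (i+j+k) % 2 == colour.
                    k_start = 1 + (i + j + 1 + colour) % 2
                    for k in range(k_start, n + 1, 2):
                        p = idx(i, j, k, n)
                        neighbours = (x[p - 1] + x[p + 1] + x[p - sj]
                                      + x[p + sj] + x[p - si] + x[p + si])
                        x[p] = (x0[p] + a * neighbours) * inv_c
    return x


def max_residual(x, x0, a, c, n):
    """Largest |c*x - a*sum(neighbours) - x0| over interior cells."""
    sj, si = n + 2, (n + 2) * (n + 2)
    worst = 0.0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            for k in range(1, n + 1):
                p = idx(i, j, k, n)
                neighbours = (x[p - 1] + x[p + 1] + x[p - sj]
                              + x[p + sj] + x[p - si] + x[p + si])
                worst = max(worst, abs(c * x[p] - a * neighbours - x0[p]))
    return worst
```

In an OpenMP version, the two colour passes would typically stay sequential while the `i`/`j` loops inside each pass are distributed across threads.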

C/C++, OpenMP, CUDA, Linux, perf, gprof
HPC, Profiling, OpenMP, CUDA, Optimization, Scalability
2025
PDF

MPI Allgather+Merge on a Cluster (Bruck vs Circulant)

HPC / Parallel Software Engineer

Implemented and benchmarked three Allgather+merge variants on up to 640 ranks; parallel variants delivered ~1.5×–2× average speedup, with Circulant typically 10–30% faster than Bruck.

Built a sequential baseline (MPI_Allgather + p-way merge) and two allgather-merge algorithms (generalized Bruck and Circulant). Optimized merge and memory traffic (loser-tree merge, SoA layout, ping-pong buffers), implemented non-blocking and pipelined communication, and analyzed scaling limits (eager threshold/rendezvous handshakes, synchronization walls) on the Hydra cluster at TUW over multiple node/process configurations and message sizes.
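
The communication pattern behind a Bruck-style allgather can be sketched as a single-process Python simulation. This is a toy model of the textbook algorithm, not the project's MPI code: `bruck_allgather` and `allgather_merge` are hypothetical names, ranks are list indices, and `heapq.merge` stands in for the loser-tree merge. In round `k`, rank `r` receives the first `min(2^k, p - 2^k)` blocks held by rank `(r + 2^k) mod p`, so all `p` blocks arrive in `ceil(log2 p)` rounds even when `p` is not a power of two.

```python
import heapq


def bruck_allgather(blocks):
    """Simulate Bruck allgather: blocks[r] is the block initially owned by rank r.

    Returns one list per rank, each holding all blocks in rank order.
    """
    p = len(blocks)
    buffers = [[blocks[r]] for r in range(p)]
    dist = 1
    while dist < p:
        count = min(dist, p - dist)  # the last round may be partial
        # Snapshot outgoing data first so every rank "sends" its pre-round state.
        outgoing = [buf[:count] for buf in buffers]
        for r in range(p):
            src = (r + dist) % p     # rank r receives from rank r + dist
            buffers[r].extend(outgoing[src])
        dist *= 2
    # Rank r now holds blocks r, r+1, ..., r+p-1 (mod p); rotate into rank order.
    return [buf[p - r:] + buf[:p - r] for r, buf in enumerate(buffers)]


def allgather_merge(sorted_runs):
    """Gather every rank's sorted run, then p-way merge the runs on each rank."""
    gathered = bruck_allgather(sorted_runs)
    return [list(heapq.merge(*runs)) for runs in gathered]
```

For example, `allgather_merge([[1, 9], [3, 4], [0, 7]])` leaves every simulated rank with the same fully merged sequence.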

C/C++, MPI (OpenMPI), Linux, Cluster Benchmarking
MPI, HPC, Collectives, Benchmarking, Performance Analysis, C/C++
2023
Media

Graphics Engine / OpenGL Rendering

C++ Developer

Lean pipeline with VBO rendering and expressive geometry primitives.

Built an XML-driven engine featuring VBO-based rendering, Catmull-Rom splines, Bézier patches, and common lighting/mapping techniques.
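
As an illustration of the spline primitive, here is a minimal Python sketch of evaluating one uniform Catmull-Rom segment. The function name `catmull_rom` and the tuple-based point type are assumptions for this example; the engine itself is written in C++.

```python
def catmull_rom(p0, p1, p2, p3, t):
    """Point on the uniform Catmull-Rom segment between p1 and p2, for t in [0, 1].

    p0 and p3 are the neighbouring control points that shape the tangents.
    Points are equal-length tuples of numbers (for example x, y, z).
    """
    t2 = t * t
    t3 = t2 * t
    return tuple(
        0.5 * (2.0 * b
               + (c - a) * t
               + (2.0 * a - 5.0 * b + 4.0 * c - d) * t2
               + (-a + 3.0 * b - 3.0 * c + d) * t3)
        for a, b, c, d in zip(p0, p1, p2, p3)
    )
```

The curve passes through `p1` at `t = 0` and `p2` at `t = 1`, which is why a camera or object path built from such segments interpolates its control points.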

C++, OpenGL, GLSL, XML
C/C++, OpenGL, Engine, Geometry
2025
Media

Autonomous Racing (F1TENTH, ROS2)

Robotics / Software Engineer, Team project

Placed 1st and 3rd in autonomous racing challenges; delivered real-time navigation with SLAM+LiDAR and robust control in ROS2.

Integrated SLAM, obstacle avoidance, and path tracking (Pure Pursuit, Disparity Extender) with ROS2 tooling, telemetry, and reproducible test runs for rapid iteration.
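
The Pure Pursuit step can be sketched with the standard bicycle-model formula. This is a minimal illustration, not the team's ROS2 node: the function name `pure_pursuit_steering` and the vehicle-frame convention (x forward, y left, origin at the rear axle) are assumptions, and the wheelbase passed in is whatever the car measures.

```python
import math


def pure_pursuit_steering(goal_x, goal_y, wheelbase):
    """Steering angle in radians that arcs the rear axle through a lookahead point.

    goal_x, goal_y: lookahead point in the vehicle frame (x forward, y left), metres.
    wheelbase: distance between front and rear axles, metres.
    """
    lookahead_sq = goal_x * goal_x + goal_y * goal_y
    if lookahead_sq == 0.0:
        return 0.0  # already at the goal: keep the wheels straight
    # Curvature of the circle through the origin (tangent to x) and the goal.
    curvature = 2.0 * goal_y / lookahead_sq
    return math.atan(wheelbase * curvature)
```

A point straight ahead yields zero steering, and points to the left (positive `goal_y`) yield a positive angle; a real controller would also clamp the result to the servo limits.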

ROS2, C++/Python, LiDAR, RViz, Foxglove, Navigation
ROS2, SLAM, LiDAR, Control, Real-time, C/C++
2021

Custom Anti-Cheat System

Developer / Operator, Tournament ops, Independent

Tournament ops at scale with automated integrity checks.

Hosted a 180-player online event and integrated third-party API signals into an anti-cheat workflow and reporting pipeline.

C#, SMTP, Automation, Reporting
API Integration, Automation, Ops
2023

DNS System (Resolver + Primary/Secondary + Zone Transfer)

Software Engineer, Computer Communications

Implemented a DNS-like distributed naming service with recursive/iterative resolution, caching, and TCP-based zone transfer between primary and secondary servers.

Built a multi-component DNS system in Python: client queries over UDP; resolver performs iterative/recursive resolution via root/top-level/authoritative servers; primary and secondary servers support replication via a custom TCP zone-transfer protocol. Implemented structured config/data parsing, logging, and a cache layer; organized components using an MVC-style module split.
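
A resolver cache layer of this kind can be sketched as a small TTL-based store. The class below is a hypothetical illustration, not the project's code: the name `DnsCache`, its methods, and the injectable `clock` parameter (which makes expiry testable) are all assumptions.

```python
import time


class DnsCache:
    """Minimal TTL cache keyed by (name, record type)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # (name, rtype) -> (value, expiry timestamp)

    @staticmethod
    def _key(name, rtype):
        # Domain names are case-insensitive; ignore a trailing root dot.
        return (name.lower().rstrip("."), rtype.upper())

    def put(self, name, rtype, value, ttl):
        """Store an answer that stays valid for ttl seconds."""
        self._entries[self._key(name, rtype)] = (value, self._clock() + ttl)

    def get(self, name, rtype):
        """Return the cached value, or None if it is missing or expired."""
        key = self._key(name, rtype)
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # evict lazily on lookup
            return None
        return value
```

The resolver would consult `get` before starting an iterative or recursive lookup and call `put` with the TTL carried by each answer.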

Python 3.10, Sockets (UDP/TCP), Linux, Logging, Config Parsing
Networking, Client/Server, UDP, TCP, Protocols, Distributed Systems
2025

OTT Streaming over an Application-Layer Overlay (CORE)

Software / Networking Engineer, Network Services Engineering

Built a prototype CDN-style OTT service: overlay nodes + PoPs distribute RTP video streams to multiple clients with dynamic route costs and basic failure recovery.

Implemented an application-layer overlay in Python on the CORE emulator. A bootstrapper provisions neighbor sets from a JSON topology; nodes exchange periodic HELLOs and routing info, compute path costs from latency and packet loss, and forward streams along best routes. Clients select the best PoP using RTT + path cost, then control playback via RTSP and receive video via RTP over UDP. Added teardown propagation and recovery behaviors on node/PoP failure.
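
The route and PoP selection described above can be sketched as follows. The exact cost metric is not given here, so `link_cost` is an assumed formula (latency inflated by a loss penalty, with a hypothetical `loss_weight`), and `cheapest_costs` and `best_pop` are illustrative names. The sketch uses Dijkstra's algorithm over measured links; the prototype may compute routes differently, for example from exchanged routing messages.

```python
import heapq


def link_cost(latency_ms, loss, loss_weight=10.0):
    """Assumed metric: latency scaled up by packet loss (loss is a 0..1 fraction)."""
    return latency_ms * (1.0 + loss_weight * loss)


def cheapest_costs(graph, source):
    """Dijkstra over graph[node][neighbour] = (latency_ms, loss); returns cost per node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, (latency_ms, loss) in graph.get(node, {}).items():
            nd = d + link_cost(latency_ms, loss)
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def best_pop(rtt_ms, path_cost):
    """Pick the PoP minimising client RTT plus overlay path cost; None if none reachable."""
    candidates = [pop for pop in rtt_ms if pop in path_cost]
    if not candidates:
        return None
    return min(candidates, key=lambda pop: rtt_ms[pop] + path_cost[pop])
```

Summing RTT and path cost treats both as comparable milliseconds-like quantities, which is a simplification; a weighted sum would be a natural refinement.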

Python, Sockets (UDP/TCP), Threads, CORE emulator, RTP/RTSP
Networking, Overlay Networks, RTP, RTSP, UDP, Monitoring
2024

PictuRAS (Image Processing Web App, Microservices)

Full-Stack / Software Engineer

Built and deployed an image-processing web app using a microservices + API Gateway architecture; supports projects, tool-based image transforms, and payments.

Implemented the PictuRAS platform end-to-end: React frontend, Node.js/Express services backed by MongoDB, API Gateway entry point, Stripe payments, and containerized deployment. Tools run as independent services, enabling an extensible pipeline for image operations. Designed the system around scalability/elasticity and maintainability, and documented architecture views (arc42).

React, Node.js, Express, MongoDB, Stripe, Docker
Full-Stack, Microservices, API Gateway, Web, Cloud

© 2026 Eduardo Silva