Write-up 01 / 07

Active Ransomware Response, Full Server Rebuild & Hardening on Alibaba ECS (Mainland China)

Walked into a live multi-payload compromise on a mainland-China Alibaba ECS instance hosting two .cn brand sites. Contained the intrusion, preserved evidence, salvaged the customer databases the ransomware missed, and rebuilt the production stack from scratch with cPanel, hardened Apache, and Let's Encrypt. Engagement still in progress.

May 2026 / Canada (mainland China hosting) / Ongoing, 9+ days in progress

Stack: Alibaba Cloud ECS (cn-shanghai) · AlmaLinux 10.1, kernel 6.12.x (post-rebuild) · cPanel & WHM 11.134.0.25 · Apache 2.4.x via EasyApache 4 · MariaDB 10.11 · Let's Encrypt (SAN cert across 4 hostnames) · WordPress with Impreza theme + Gravity Forms / Visual Composer / RevSlider / CleanTalk · PHP 8.2 (CGI handler) · .sorry ransomware (Chinese-speaking operators) · gsocket reverse shell (THC tools) · XMRig Monero miner · Tiny File Manager + Bujang.online webshell drops

The situation

An Alibaba Cloud Security Center alert pointed to a live intrusion on a mainland-China ECS instance hosting two .cn brand sites for an agency client. It wasn't a simple infection. A gsocket reverse shell was active, an XMRig Monero miner was running quietly out of /root/.rsyslogd, three webshells were live under the cPanel docroot, a backdoor cPanel account had been added that morning, and root SSH was being intercepted by a defaced banner so console-driven password resets couldn't get me back in. The .sorry ransomware family had already swept through about 114,000 files on May 1 and encrypted the static content of both sites. The public IP had to be preserved because of ICP filing rules in mainland China, which ruled out the easy path of rebuilding on a fresh instance. Customer email accounts pointed at the same server, and the agency had no recent off-site backup of the WordPress data.

What I found

The compromise was deep enough that root SSH was unrecoverable through normal channels. Console password resets reported success but live login attempts hit an Indonesian-language defaced SSH banner ('ALLOW BANG HENGKER' etc.) and were silently rejected. SSH had been moved to port 2499 in November 2022, which dated the initial breach to a much earlier event than the May 2026 ransomware.

Read-only inventory of the forensic snapshot turned up an attacker SSH key in /root/.ssh/authorized_keys, a backdoor cPanel user provisioned that morning with a /bin/bash shell, the XMRig Monero miner masquerading as /root/.rsyslogd, three live webshells under the cPanel account's www/ directory (Tiny File Manager plus two PHP file managers from the Indonesian Bujang.online drop), and ransom notes in every directory the .sorry encryptor had touched.

The bash_history reconstructed a multi-stage attack: a Feb 2025 pre-staging phase that established persistence, the May 1 ransomware sweep that encrypted about 114,000 static files, and the May 9 multi-payload deployment that added the gsocket reverse shell, the miner, and the webshells. The PAM modules on disk were clean, which ruled out a backdoored authentication stack and pointed to the attacker relying on credential theft plus the modified SSH banner for re-entry rather than a deeper implant.

The customer MySQL databases (three of them, totalling 1.8 GB raw on disk) had not been encrypted. MySQL was holding file locks while the encryption tool ran, and the .sorry encryptor skipped them as a result. This was the single biggest piece of luck in the engagement: the WordPress content, post history, comments, user accounts, and plugin settings were all recoverable.

DNS for the two .cn domains resolved to three separate public IPs on the same instance, each on its own ENI. cPanel was binding per-account vhosts to the primary private IP only, so after the rebuild, traffic arriving on the other two IPs matched no customer vhost and silently fell through to a default cPanel placeholder rather than the customer docroot. I had to patch this routing bug separately after the rebuild.

On the operational side, the Security Group (SG) layer carried three groups, only one of which was hardened. The other two were 2021-vintage 'allow everything' groups still attached to the secondary ENIs, which meant every admin port (SSH 22, SSH 2499, cPanel 2083, WHM 2087, webmail 2096, plus the entire FTP passive range) was reachable from the public internet even though the hardened SG was also attached.
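The exposure above follows from allow rules being additive across attached groups: one permissive group is enough to undo a hardened one. A minimal sketch of that check, assuming ingress rules have already been exported into plain dictionaries (the `cidr`, `port_min`, and `port_max` keys and the function name are hypothetical, not an Alibaba API shape):

```python
from ipaddress import ip_network

# Admin ports named in the write-up. The FTP passive range is left out
# because its exact bounds are not stated in the source.
ADMIN_PORTS = {22, 2499, 2083, 2087, 2096}


def exposed_admin_ports(rules):
    """Return the admin ports that any world-open allow rule covers.

    `rules` is a hypothetical merged list of ingress allow rules from
    every security group attached to the instance's ENIs.
    """
    exposed = set()
    for rule in rules:
        # A /0 prefix (0.0.0.0/0 or ::/0) means "any source address".
        if ip_network(rule["cidr"]).prefixlen != 0:
            continue
        for port in ADMIN_PORTS:
            if rule["port_min"] <= port <= rule["port_max"]:
                exposed.add(port)
    return sorted(exposed)
```

Feeding it one restricted SSH rule plus one legacy 'allow everything' rule reports all five admin ports as exposed, which matches the situation described here.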

What I did

First pass was containment and evidence preservation. I generated a fresh ed25519 SSH keypair locally, registered it in the Alibaba console, bound it to the instance, and triggered a clean restart so I had a known-good way back in once the rebuild started. I built a locked-down Security Group with admin ports gated to my residential IP range plus the Alibaba Workbench CIDR, and detached the original permissive group. Before touching anything else I took a manual forensic snapshot of the system disk with 'Until Deleted' retention so the live state was frozen for analysis.

Provisioned a clean Ubuntu 24.04 rescue instance in the same VPC, attached a copy of the snapshot as a data disk, and mounted it read-only with noload/noexec/nosuid/nodev so the forensic data stayed pristine. From that mount I did a full read-only inventory: PAM, sshd, sudoers, cron, systemd, attacker artefacts, cloud assistant logs, and bash history. The customer databases were rsync'd over the slow China-to-India link with a loop wrapper using --partial --append-verify, which took most of a night to complete the 1.8 GB transfer but survived the link drops.
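The loop wrapper is simple enough to sketch. This is an illustration of the retry idea rather than the script actually used: the function name, attempt limit, and pause are assumptions, and only the `--partial --append-verify` flags come from the engagement.

```python
import subprocess
import time


def rsync_until_done(src, dest, max_attempts=200, pause_s=30,
                     run=subprocess.run, sleep=time.sleep):
    """Re-run rsync until it exits 0 and return the attempt count.

    --partial keeps interrupted files on the receiver, and
    --append-verify resumes them and checksums the result, so each
    retry continues where the previous one stopped.
    """
    cmd = ["rsync", "-a", "--partial", "--append-verify", src, dest]
    for attempt in range(1, max_attempts + 1):
        if run(cmd).returncode == 0:
            return attempt
        sleep(pause_s)  # give a flapping link time to recover
    raise RuntimeError(f"rsync still failing after {max_attempts} attempts")
```

The `run` and `sleep` parameters exist only so the loop can be exercised without a network; in real use the defaults apply.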

On my workstation I spun up a MySQL 8.0 Docker container against the rsynced data directory, ran mysqldump for each customer database, and compressed the output with zstd. I grepped each dump for eval/base64_decode/gzinflate backdoor patterns and checked wp_users for unfamiliar admin emails. All three databases came back clean.
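The pattern check amounts to a line-by-line regex pass over the uncompressed dump. A minimal sketch, assuming the dump is read as bytes; the function name is hypothetical and the pattern list covers only the three tokens named above:

```python
import re

# PHP calls commonly seen in injected payloads. Hits are leads to
# review by hand, not proof: legitimate plugin data can contain them.
SUSPICIOUS = re.compile(
    rb"\b(?:eval|base64_decode|gzinflate)\s*\(", re.IGNORECASE
)


def scan_dump(lines):
    """Yield (line_number, matched_token) for each suspicious call."""
    for number, line in enumerate(lines, start=1):
        for match in SUSPICIOUS.finditer(line):
            yield number, match.group(0).decode("ascii")
```

Because a binary file object iterates line by line, `list(scan_dump(open("dump.sql", "rb")))` would scan a whole dump without loading it into memory (the filename is a placeholder).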

Replaced the compromised CentOS 7 system disk with a fresh AlmaLinux 10.1 image using Alibaba's 'Replace System Disk' workflow, which preserved the instance ID and the ICP-bound public IP. Installed cPanel & WHM 11.134.0.25 (slow because the cPanel CDN downloads route through the GFW), then installed an existing Let's Encrypt SAN cert (covering all four hostname variants) into cPanel's bundled Apache so browsers weren't served the cPanel-generated self-signed cert.

Wrote a Pre Main Include for Apache (cPanel-preserved across /scripts/rebuildhttpdconf runs) using a single <Directory /> block with <If> rules to deny IP-direct HTTPS access entirely, redirect IP-direct HTTP to the canonical HTTPS URL, force HTTP-to-HTTPS on the legitimate hostnames, and add Strict-Transport-Security: max-age=63072000; includeSubDomains; preload on every HTTPS response. ACME /.well-known/acme-challenge/ paths are exempted from all rules so AutoSSL renewals keep working.
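The order of those rules matters, so it helps to see them as a decision function. This is a model of one reading of the precedence, not the Apache config itself: the canonical URL is a placeholder, the function name is hypothetical, and `host` is assumed to arrive without a port suffix.

```python
from ipaddress import ip_address

CANONICAL_URL = "https://www.example.cn"  # placeholder, not the client's
HSTS = "max-age=63072000; includeSubDomains; preload"
ACME_PREFIX = "/.well-known/acme-challenge/"


def is_ip_literal(host):
    """True when the Host header is a bare IP rather than a hostname."""
    try:
        ip_address(host)
        return True
    except ValueError:
        return False


def decide(scheme, host, path):
    """Return (action, redirect_target, extra_headers) for a request."""
    if path.startswith(ACME_PREFIX):
        return ("pass", None, {})  # exempt so certificate renewals work
    if is_ip_literal(host):
        if scheme == "https":
            return ("deny", None, {})
        return ("redirect", CANONICAL_URL + path, {})
    if scheme == "http":
        return ("redirect", f"https://{host}{path}", {})
    return ("pass", None, {"Strict-Transport-Security": HSTS})
```

Checking the ACME exemption first is the key design choice: if any redirect or deny rule ran ahead of it, HTTP-01 validation against the challenge path could fail.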

Patched the routing bug by writing a Post VirtualHost Include that mirrors the per-account vhost onto the two secondary IPs, so traffic landing on any of the three public IPs reaches the correct docroot.

Upgraded the kernel from the post-rebuild default through to 6.12.0-124.55.3, clearing five pending ALSAs including the Copy Fail KEV-listed local privilege escalation and the Dirty Frag IPsec/rxrpc pair. Cycled the box with about 46 seconds of downtime, fixed an nginx auto-start race that briefly stole the 80/443 sockets from cPanel's Apache, and disabled nginx so it won't fight cPanel's httpd on future boots.

Imported the salvaged customer databases into fresh cPanel-managed schemas after sanitising MySQL 8 collation values that MariaDB 10.11 doesn't recognise (utf8mb4_0900_ai_ci to utf8mb4_unicode_ci) and stripping the dump's CREATE DATABASE/USE headers that would have routed the import to the wrong DB name. Layered a fresh WordPress install from wordpress.org on top, generated wp-config.php with the new DB credentials and a fresh salt block from the WordPress secret-key API rather than reusing any salt that had been on the compromised host.
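Both dump fixes are line-level text transforms. A minimal sketch, assuming a standard mysqldump layout where the CREATE DATABASE and USE statements each sit on their own line; the function name is hypothetical:

```python
import re

# Statements that would steer the import into the dump's original
# database name instead of the new cPanel-managed schema.
_DB_HEADER = re.compile(r"^\s*(?:CREATE\s+DATABASE\b|USE\s+`)", re.IGNORECASE)

OLD_COLLATION = "utf8mb4_0900_ai_ci"  # MySQL 8 default, unknown to MariaDB 10.11
NEW_COLLATION = "utf8mb4_unicode_ci"


def sanitise_dump(lines):
    """Yield dump lines with DB headers dropped and the collation swapped."""
    for line in lines:
        if _DB_HEADER.match(line):
            continue
        yield line.replace(OLD_COLLATION, NEW_COLLATION)
```

The replacement is a blind string swap, so it would also rewrite the collation name if it appeared inside row data. That is unlikely for WordPress content but worth a grep before relying on it.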

Disabled the WPEngine-flavoured plugin and drop-in set bundled in the client's site backup (wpengine-common, wpe-cache-plugin, wpe-update-source-selector, wpe-wp-sign-on-plugin, object-cache.php, advanced-cache.php). These call out to WPEngine-only APIs and Memcached endpoints that don't exist outside that infrastructure and were responsible for connection resets and timeouts on static-asset requests before the disable.

Wrote and deployed a per-account .htaccess that returns a fast Apache 404 directly for missing static asset extensions (jpg/jpeg/png/gif/webp/svg/ico/css/js/woff and friends), skipping WordPress's normal rewrite-into-index.php fallback. With about 130 image attachments in the database pointing at files lost to the ransomware, the fast 404 saved roughly that many PHP-CGI fork events per page render.

Identified the page-render bottleneck via a controlled plugin-isolation test: snapshot active_plugins, disable a candidate set, time the render, restore. CleanTalk was the dominant cost. It makes a synchronous outbound API call to cleantalk.org per page load, and from Shanghai through the GFW that call adds about 1.8 seconds of latency. Render time dropped from ~2.4s to ~0.55s with CleanTalk off, and went back to ~2.4s when it was restored. CleanTalk was left active by client decision for the spam protection.
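The snapshot, disable, time, restore loop can be sketched as a small harness. All four callables are hypothetical stand-ins: in practice `get_active` and `set_active` would read and write the active_plugins option, and `render` would fetch a page.

```python
import time


def isolation_test(get_active, set_active, render, candidates,
                   clock=time.perf_counter):
    """Time a render with and without `candidates`, then restore state."""
    snapshot = list(get_active())

    def timed_render():
        start = clock()
        render()
        return clock() - start

    baseline = timed_render()
    try:
        set_active([p for p in snapshot if p not in candidates])
        isolated = timed_render()
    finally:
        # Restore even if the render raises, so a failed test never
        # leaves the live site with plugins switched off.
        set_active(snapshot)
    restored = timed_render()
    return {"baseline": baseline, "isolated": isolated, "restored": restored}
```

Timing the restored state as well as the baseline confirms the effect is reproducible rather than a cache artefact, which is what the ~2.4s, ~0.55s, ~2.4s sequence shows.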

Resolved a daily 'site down' incident caused by cPanel's nightly permission enforcement, which pushes public_html to mode 750. With PHP running as nobody under the CGI handler, mode 750 blocks Apache from reading .htaccess and the site returns 'Server unable to read htaccess file, denying access to be safe' on every request. Wrote /etc/cron.hourly/fix-docroot-perms that re-applies mode 755 on the docroots and 777 on wp-content/uploads every hour, so the worst-case downtime window collapsed from 'until I noticed' to one hour.
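The hourly fix boils down to an idempotent chmod pass. A sketch of the logic in Python, assuming the docroot paths are passed in by the caller; the deployed script's language and layout are not shown in this write-up, and the function name is hypothetical:

```python
import os
import stat


def fix_docroot_perms(docroots):
    """Re-apply 0755 to each docroot and 0777 to its wp-content/uploads.

    Returns the paths whose mode was changed, so a cron wrapper can
    log when the nightly enforcement has fired.
    """
    changed = []
    for docroot in docroots:
        targets = [
            (docroot, 0o755),
            (os.path.join(docroot, "wp-content", "uploads"), 0o777),
        ]
        for path, mode in targets:
            if not os.path.isdir(path):
                continue  # skip accounts without a WordPress install
            if stat.S_IMODE(os.stat(path).st_mode) != mode:
                os.chmod(path, mode)
                changed.append(path)
    return changed
```

Mode 777 on uploads is a stopgap tied to the CGI handler running as nobody; it is world-writable and should go away with the handler change.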

Set up a system crontab for the cPanel user to fire wp-cron.php via PHP CLI every 15 minutes. The HTTP-layer wp-cron.php is intentionally blocked by the .htaccess hardening to keep abuse off it, so the visitor-triggered scheduler is dead by design. The system cron replaces it cleanly: Gravity Forms entry cleanup queue, Action Scheduler queue, scheduled-post publishing, and plugin auto-tasks all run on schedule whether or not anyone is visiting.

Bulk-restored the salvaged production database after testing a swap to the client's older WPEngine staging snapshot and confirming the production data was both more current (2026-04-06 vs 2022-06-26 latest post) and more complete (967 vs 712 posts). User records were preserved across the swap so the agency's editor session stayed valid through the DB churn. Modern WordPress (6.8+) no longer accepts a freshly-set legacy MD5 password hash; resets used wp_hash_password via PHP CLI to write the proper $wp$2y$10$ format directly into wp_users.
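After a database swap like this, it is useful to audit which hash formats are present in wp_users. A minimal classifier sketch, assuming the prefixes below: `$wp$2y$` is the format quoted above, `$P$` is the older phpass portable prefix, and a bare 32-character hex string is treated as legacy MD5. The function name and labels are hypothetical.

```python
import re

_MD5_HEX = re.compile(r"[0-9a-fA-F]{32}")


def classify_wp_hash(stored):
    """Label a wp_users.user_pass value by its apparent hash format."""
    if stored.startswith("$wp$2y$"):
        return "wp-bcrypt"  # prefixed bcrypt, as written by wp_hash_password
    if stored.startswith("$P$"):
        return "phpass"     # portable phpass hash from older installs
    if stored.startswith("$2y$"):
        return "bcrypt"     # plain bcrypt, e.g. set by a plugin
    if _MD5_HEX.fullmatch(stored):
        return "md5"        # legacy unsalted MD5
    return "unknown"
```

Running it over the user_pass column shows at a glance which accounts still carry hashes from the old host and which were reset after the rebuild.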

How it landed

Engagement still in progress. The compromised stack is fully retired, the new AlmaLinux + cPanel + WordPress stack is live on the same IP set behind a Let's Encrypt SAN cert with HSTS preload-ready headers, and the agency's editor can log into wp-admin and update content. The three customer databases came through the ransomware unencrypted. Page render is currently bound by CleanTalk's synchronous API call (a deliberate trade-off the client accepted for spam protection). Outstanding work tracked in the engagement log includes a migration of PHP from CGI to suEXEC (which would let the hourly workaround crons retire), a search-replace pass to swap remaining staging-environment URLs in serialised plugin options, recovery of the lost media files from any source the client still has (the production uploads were ransomwared and the staging environment never carried them), HSTS preload list submission, and a Phase 2 Security Group hardening pass to collapse the admin attack surface to a single IP. Full timeline, command-level runbook, and a hardening-state snapshot are maintained in the client's project documentation.