Skip to content

Conversation

@airween
Copy link
Member

@airween airween commented Oct 31, 2025

what

This PR fixes the last broken regression test and adds the whole regression test workflow to GH CI.

why

The regression tests were almost finished, now I found another broken test. After fixed that I prepared a new workflow for GH CI, where we run these regression tests.

I switched the operation system to Ubuntu 24.04 (from 22.04). In Ubuntu 24.04 the old PCRE (libpcre3) is not installed by default, I had to add it explicitly.

I tried to follow the build matrix as it's in other workflows, but a complete regression test takes about 29 minutes - I think that's very slow. On my build machine (which is not a strong one) it takes about 6-7 minutes.

Unfortunately we can't fix this, the reason is that the operation of a test: each tests creates a config, start an Apache server with that config, send a request and check the output (files, response, etc).

I don't know if it's worth it.

references

#3425

@airween airween requested a review from theseion October 31, 2025 20:05
@theseion
Copy link
Collaborator

theseion commented Nov 1, 2025

I looked at run-regression-tests.pl.in quickly and it seems to me that match_log might be the issue. Log flushing can differ significantly between systems, so it's possible that searching the log in each test takes a long time. I suggest you add some debug print statements to the Perl script so that you can see in the CI output which operations use a lot of time.

@airween
Copy link
Member Author

airween commented Nov 7, 2025

I tried to review the script again, and fixed two small things in bbe5239:

  • switched to select() from sleep()
  • clear the buffer which hold the read log

Also I removed the test regression/misc/10-tfn-cache.t, because it took many time (on my machine the whole test file was 1m 6secs). The reason is that it seems SecCacheTransformations directive is deprecated.

Until I write this comment, the tests has started, but it seems that no performance improvement. (And beside of that there are two failed tests...)

@airween
Copy link
Member Author

airween commented Nov 7, 2025

Okay, I think the reason is visible now but unknown: see the output of the last test. The Apache stop time is terrible slow.

And I don't know why...

@theseion, @dune73 do you have any idea?

@dune73
Copy link
Contributor

dune73 commented Nov 7, 2025

If I remember correctly, Apache waits for connections to be closed or whatever. I do not think it sends a RST or something, but it really waits. So if it's slow to stop, I would look at the connection table during the stopping.

Stopping Apache in production is always slow. In a test situation that's not ideal. Maybe you want to kill it instead.

Take a look at my script https://www.netnea.com/files/apachex that I use for control of Apache in my lab. It's a very rough tool, more a mallet than a screwdriver, but it never fails me to kill and restore apache instantly.

Not sure this is the right path, but that's what I have in mind.

@theseion
Copy link
Collaborator

theseion commented Nov 8, 2025

There are probably two things you can do:

  1. Test request should be sent with Connection: close
  2. Kill httpd

The first is just a nice-to-have of course. I'd absolutely simply kill httpd. There's no reason for handling open connections gracefully in this scenario.

@airween
Copy link
Member Author

airween commented Nov 8, 2025

Thanks for ideas,

1. Test request should be sent with `Connection: close`

All requests send Connection: close header (I checked with tcpdump), eg:

10:55:19.058098 IP localhost.omniorb > localhost.33008: Flags [P.], seq 1:225, ack 144, win 512, options [nop,nop,TS val 2093795869 ecr 2093795869], length 224
E...~d@.@..}............J..................
|...|...HTTP/1.1 200 OK
Date: Sat, 08 Nov 2025 09:55:19 GMT
Server: Apache/2.4.58 (Ubuntu)
Last-Modified: Sat, 08 Nov 2025 09:46:10 GMT
ETag: "5-643122b0cc4e7"
Accept-Ranges: bytes
Content-Length: 5
Connection: close

TEST

2. Kill httpd

I tried, but then many other weird issues came out. Eg. when httpd_start() function is called, eg:

Loaded 2 tests from /home/airween/src/modsecurity-apache/tests/regression/config/00-load-modsec.t
  1) config - module loaded: passed
Httpd start failed with error messages:
httpd (pid 10748) already running

The first is just a nice-to-have of course. I'd absolutely simply kill httpd. There's no reason for handling open connections gracefully in this scenario.

I'm afraid I have to reorganize the complete start/stop mechanism.

@sonarqubecloud
Copy link

sonarqubecloud bot commented Nov 8, 2025

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants