The ideal solution for health checks is to set both a maximum timeout duration and a maximum number of retries. Typically you want to fail after X retries, or once Y time has elapsed (to account for network weirdness), whichever comes first. But you definitely want to fail early, and not just wait for a long-ass time to pass before you finally fail.
That's for a standard service health check anyway. That service and health check shouldn't be started until the container it depends on has started and is healthy. In Kubernetes that's an Init Container in a Pod, in AWS ECS that's a dependsOn stanza in your Task Container Definition, and for Docker Compose it's the depends_on stanza in a Services entry.
    set -eu

    # Fail after $maxloop attempts or $maxwait seconds overall,
    # whichever comes first.
    nowtime="$(date +%s)"
    maxwait=300
    maxloop=5

    c=0
    while [ "$c" -lt "$maxloop" ]; do
        # Each attempt is also capped at $maxwait seconds so a hung
        # connection can't block the loop.
        if timeout "$maxwait" curl --silent --fail-with-body 10.0.0.1:8080/health; then
            exit 0
        else
            sleep 1
        fi
        if [ "$(date +%s)" -gt "$((nowtime + maxwait))" ]; then
            echo "$0: Error: max wait time $maxwait exceeded"
            exit 1
        fi
        c=$((c + 1))
    done
However, curl already supports this natively, so there's no need to write a script:

    curl --silent --fail-with-body --connect-timeout 5 --retry-all-errors --retry-delay 1 --retry-max-time 300 --retry 300 10.0.0.1:8080/health
I’ve been playing around with trying to make a timeout using just bash builtins, motivated by the fact that my Mac doesn’t have the timeout command.
I haven’t quite been able to do it using _only_ builtins, but if you allow the sleep command (which has been standardised since the first version of POSIX, so it should be available pretty much anywhere that makes any sort of attempt to be POSIX compliant), then this seems ok:
    # TIMEOUT SYSTEM
    #
    # Defines a timeout function:
    #
    #   Usage: timeout <num_seconds> <command>
    #
    # which runs <command> after <num_seconds> have elapsed, if the script
    # has not exited by then.

    _alarm() {
        local timeout=$1
        # Spawn a subshell that sleeps for $timeout seconds
        # and then sends us SIGALRM
        (
            sleep "$timeout"
            kill -ALRM $$
        ) &
        # If this shell exits before the timeout has fired,
        # clean up by killing the subshell
        subshell_pid=$!
        trap _cleanup EXIT
    }

    _cleanup() {
        if [ -n "$subshell_pid" ]; then
            kill "$subshell_pid"
        fi
    }

    timeout() {
        local timeout=$1
        local command=$2
        trap "$command" ALRM
        _alarm "$timeout"
    }

    # MAIN PROGRAM

    times_up() {
        echo 'TIME OUT!'
        subshell_pid=   # the alarm subshell has already exited; skip cleanup
        exit 1
    }

    timeout 10 times_up

    for i in {1..20}; do
        sleep 1
        echo "$i"
    done
FYI curl actually helpfully has a `--retry-connrefused` flag to avoid doing this loop in the shell entirely
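Presumably something like this, pointed at the same hypothetical endpoint as above:

    # --retry alone doesn't treat "connection refused" as transient;
    # --retry-connrefused makes curl retry on that error too.
    curl --silent --fail-with-body --retry 300 --retry-delay 1 --retry-connrefused 10.0.0.1:8080/health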
What I usually do when I need retry logic is

    for i in {0..60}; do
        true -- "$i" # shellcheck suppression
        if eventually_succeeds; then break; fi
        sleep 1s
    done
Not super elegant, but relatively correct; the next level is exponential backoff. Generally leaves a bit of composability around.

Note that if you need to pass variables into the bash -c invocation, the best way to do it is to append them, e.g.

    bash -c 'some command "$1" "$2"' -- "$var1" "$var2"

I use "--" because I like the way it looks, but the first parameter goes in argv[0], which doesn't expand in "$@", so IMO something other than an argument should go there for clarity.

Note that bash specifically has printf %q, which could alternatively be used, but I prefer to use Bourne-compatible things when the bash version isn't significantly cleaner.
I used to use

    timeout 1800 mplayer show.mp4 ; sudo pm-suspend

as my poor man's parental control to let my kids watch a show for 30 minutes without manual supervision when they were younger. Useful command.

I'm generally not a huge fan of inlining the command or cluttering up my local directory with little scripts to get around the fact that it must be a subprocess you can send a signal to. I use a wrapper like this, which exports a function containing whatever complex logic I want to time out. The funky quoting in the timeout bash -c argument is a generalized version of what aidenn0 mentioned in another comment here (passing args safely to a subprocess).
    #!/usr/bin/env bash

    long_fn () { # this can contain anything, like OP's until-curl loop
        sleep "$1"
    }

    # to TIMEOUT_DURATION BASH_FN_NAME BASH_FN_ARGS...
    to () {
        local duration="$1"; shift
        local fn_name="$1"; shift
        export -f "$fn_name"
        timeout "$duration" bash -c "$fn_name"' "$@"' _ "$@"
    }

    time to 1s long_fn 5 # will report it ran 1 second
Literally just added some command timeouts in a new Kubernetes setup. This POSIX shell script implementation of await-cmd.sh / await-http.sh / await-tcp.sh is mature and quite handy in some scenarios:
Apparently timeout(1) is part of GNU Coreutils. I wasn't sure after reading whether it was part of Bash itself.
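For reference, it is indeed an external binary rather than a builtin, and it exits with status 124 when it kills the command:

    type timeout # e.g. "timeout is /usr/bin/timeout" -- not a shell builtin
    timeout 2 sleep 10
    echo $? # 124: sleep was killed at the 2-second mark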
I tend to do something like this. Normally I wouldn't include the extra jobs calls and extra echo calls; these are just to show what is happening.
    #!/usr/bin/env bash

    runUntilDoneOrTimeout () {
        local -i timeout=0
        OPTIND=1
        while getopts "t:" opt; do
            case $opt in
                t) timeout=$OPTARG;;
            esac
        done
        shift $((OPTIND - 1))
        runCommand="$*"
        $runCommand &
        runPID=$!
        echo checking jobs
        jobs # just to prove there are some
        echo job check complete
        while jobs %- >& /dev/null && ((timeout > 0)); do
            echo "waiting for $runCommand for $timeout seconds"
            sleep 1
            ((timeout--))
        done
        if (( timeout == 0 )); then
            echo "$runCommand timed out"
            kill -9 $runPID
            wait $runPID
        else
            echo "$runCommand completed"
        fi
        echo checking jobs
        jobs # just to prove there are none
        echo job check complete
    }

    declare -i timeopt=10
    declare -i sleepopt=100
    OPTIND=1
    while getopts "t:s:" opt; do
        case $opt in
            t) timeopt=$OPTARG;;
            s) sleepopt=$OPTARG;;
        esac
    done
    shift $((OPTIND - 1))

    runUntilDoneOrTimeout -t $timeopt sleep $sleepopt
Another fun way to test connectivity in pure bash (needs a version from the past 15 years) is

    timeout 5 bash -c 'cat < /dev/null > /dev/tcp/google.com/80'

Replace google.com and port 80 with your web or TCP server (ssh too!). The command will error/time out if there isn't a server listening or you have some firewall/proxy in the way.

In _this_ particular case, you could just tell curl to internally time out the request (via `-m`) instead of trying to manage the timeout on the process level.
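For reference, `-m` is short for `--max-time`; a sketch against the same hypothetical endpoint used above:

    # Give up if the whole request takes longer than 5 seconds
    curl --silent --fail --max-time 5 10.0.0.1:8080/health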
A friend recently showed me https://google.github.io/zx/api and it's actually quite enjoyable to use. Very close to a shell and LLMs know it quite well.
Could you instead just add a count of how many times the sleep was invoked, and then add that check into the `until` condition to quit after X sleeps?
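A minimal sketch of that idea, keeping the article's until-curl shape (the endpoint is hypothetical):

    tries=0
    until [ "$tries" -ge 60 ] || curl --silent --fail 10.0.0.1:8080/health; do
        sleep 1
        tries=$((tries + 1))
    done
    # Distinguish "server came up" from "ran out of patience"
    [ "$tries" -lt 60 ] || { echo "gave up after $tries sleeps"; exit 1; }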
You don't need timeout here, and you won't need to subshell another bash just to get the timeout to work.
From personal experience I would always recommend an output of how many retries were necessary, if one expects zero. Otherwise the retry loop can hide problems like an unreliable service or network until it's too late.
This is my attempt to reinvent the wheel from several years ago: https://github.com/zbigg/bashfoo/blob/master/timeout.sh
It's very complex, because if you write lots of functions that call functions, you really just want to run something that inherits the whole env from your process; that's why there is a control process and a sleep process and a naive race to decide which finished first...
That's probably the reason I ignored the built-in timeout...
That reminds me of a blog article I wrote some time ago, where “timeout” gets mentioned: https://gaultier.github.io/blog/way_too_many_ways_to_wait_fo...
It’s more useful if you are implementing this in a general programming language, not in the shell, or if you want to know how it works under the hood.
Is there a language with a less standardized standard library than bash?
Is there an attempt anywhere to build a slightly modern standard library for bash scripts?
You know, besides Stack Overflow?
Anyone know why shell scripts can't set alarm(2)? I assume it's because the shell is already using it for its own needs.
Neat idea. I’ve definitely been burned by silent timeouts in production before. Curious how this handles more complex cases like nested async calls or third-party dependencies that don't expose good hooks. Would be cool if this could somehow integrate with logging tools directly for more visibility.
Why didn't he opt for `timeout --signal=SIGKILL`, instead of wrapping everything in extra bash to make it more killable?
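For reference, GNU timeout can handle the signal escalation itself (some_command is a stand-in):

    # Send SIGKILL straight away instead of the default SIGTERM:
    timeout --signal=KILL 10 some_command

    # Or be polite first: SIGTERM at 10s, then SIGKILL 5s later if it's still alive:
    timeout --kill-after=5 10 some_command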
Retry is also a nice little utility that makes the retry loop easier:
TIL Bash has `until` as well as `while`!
> We were using the Bash built-in until to check if the web server was up:
That saves you a whole character of typing:

    until command ; do ... done
    -->
    while ! command; do ... done
curl has a timeout setting, --connect-timeout <seconds>, and a retry option, --retry <num>, so you could do:

    curl --retry 5 --connect-timeout 10
I recently used timeout + tcpdump to band-aid over a race condition where sometimes a video streaming service started before the camera was ready and got stuck in a loop. So I just captured the video stream's port with tcpdump, then used timeout and tcpdump's exit code to tell whether it was working or not.
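A sketch of that kind of check; the interface and port are hypothetical:

    # tcpdump exits 0 as soon as it captures one packet on the stream port;
    # if nothing arrives within 10s, timeout kills it and returns 124.
    if timeout 10 tcpdump -c 1 -i any port 8554 >/dev/null 2>&1; then
        echo "stream is flowing"
    else
        echo "no traffic; restart the service"
    fi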
Have you heard of exponential backoff? tl;dr: make the sleep time dependent on the number of retries.
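A minimal sketch, reusing the eventually_succeeds placeholder from the loop above:

    delay=1
    for i in {1..8}; do
        if eventually_succeeds; then break; fi
        sleep "$delay"
        delay=$((delay * 2)) # 1, 2, 4, 8, ... seconds between attempts
    done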
I've got, since forever, an advanced Bash prompt. But I also don't want my Bash prompt to have any visible delay. So back in the days I came up with time outs working with milliseconds (which, AFAIR, isn't the case for the timeout command whose granularity is seconds at best?). It involved processes and killing etc. but it got me what I wanted: either an instant prompt with all the infos I want of an instant prompt which may miss one or two infos. I much prefer that to the "my prompt contains no information because that's quicker".
Been working flawlessly since 20 years: so flawlessly that I don't remember how it works.
My fav little-known trick is to test how code copes when various syscalls fail, using strace fault injection.
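For example, a sketch where ./my-program is a stand-in for the binary under test:

    # Make every openat(2) in the target (and its children) fail with ENOENT,
    # to see how it handles missing files:
    strace -f -e inject=openat:error=ENOENT ./my-program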
random link: https://medium.com/@manav503/using-strace-to-perform-fault-i...