Once a Survey has been launched by the Controller, an Agent, or the HUNT Server, a common error to see is "Failed: At least 300s since last heartbeat". This occurs when the Survey falls out of sync with the Cloud instance: if the Infocyte server does not hear back from the Survey for over 5 minutes (300 seconds), a timeout occurs. This can happen for a variety of reasons, but they generally fall into one of two buckets:
- The Survey is no longer running.
- The Survey is not able to "talk" back to the SaaS instance.
This article is meant to tackle the most common scenarios which cause the issue. It is by no means comprehensive, but should give you the "next steps" to get your problem solved.
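To make the timeout rule concrete, here is a minimal sketch of a heartbeat watchdog. It is an illustration only, not Infocyte's actual server code: the class name `SurveyWatchdog` and its methods are hypothetical, and the only details taken from this article are the 300-second threshold and the wording of the error.

```python
HEARTBEAT_TIMEOUT_S = 300  # 5 minutes, the threshold named in the error message


class SurveyWatchdog:
    """Tracks the last heartbeat from one Survey (hypothetical, illustrative only)."""

    def __init__(self, started_at: float) -> None:
        # Until the first heartbeat arrives, the clock runs from the start time.
        self.last_heartbeat = started_at

    def record_heartbeat(self, now: float) -> None:
        """Call whenever the Survey reports back."""
        self.last_heartbeat = now

    def status(self, now: float) -> str:
        """Return 'active', or the failure text once the Survey has been silent too long."""
        silent_for = now - self.last_heartbeat
        if silent_for >= HEARTBEAT_TIMEOUT_S:
            return f"Failed: At least {HEARTBEAT_TIMEOUT_S}s since last heartbeat"
        return "active"
```

Either bucket above produces the same result in this model: whether the Survey has stopped or simply cannot reach the server, `record_heartbeat` is never called and the status flips to failed after 300 seconds.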
Survey Crash or Communication Problem?
Narrowing down the root cause:
- At which step did the 300s timeout occur? Most of these occur right after the "Status changed to active" step.
In that case the Survey was started successfully, but the server never received the first "heartbeat".
- Is the target host in question able to browse to your Infocyte SaaS Instance? (see below)
- Is there an antivirus or endpoint protection application on the target host? If so, the Survey hash and/or the Infocyte files need to be whitelisted.
In the common case where the issue happens right after "Status changed to active", the most likely culprits are:
- The Survey can't talk to the server due to a networking or internet problem.
- Antivirus killed the survey.
How do you tell which one?
Testing/Troubleshooting Network Problems
One test you can try is to browse to your Cloud instance from the target host on HTTPS:
If you cannot get at least as far as the login prompt, this is most likely the issue. Troubleshoot the browsing issue first:
- Is the machine able to browse to other sites?
- Is a web proxy or SSL decryption getting in the way?
- Is dl.infocyte.com whitelisted in your environment?
- Are the IP Addresses for your instance allowed through the firewall on port 443?
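If the target host has no browser, or you want to separate a blocked connection from a TLS problem, the checks above can be approximated with a short script. This is a sketch using only the Python standard library; the function name `check_https` and the `HttpsCheck` result type are hypothetical, and it tests only the TCP connection and TLS handshake on port 443, not the login page itself.

```python
import socket
import ssl
from typing import NamedTuple


class HttpsCheck(NamedTuple):
    ok: bool
    stage: str   # "tcp" (DNS or connect failed), "tls" (handshake failed), or "done"
    detail: str


def check_https(host: str, port: int = 443, timeout: float = 5.0) -> HttpsCheck:
    """Try a TCP connect, then a TLS handshake, and report where it stopped."""
    try:
        # Fails here on DNS errors, firewall blocks, or refused connections.
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        return HttpsCheck(False, "tcp", f"cannot connect: {exc}")
    try:
        context = ssl.create_default_context()
        # Fails here if a proxy or SSL decryption device breaks the handshake
        # or presents a certificate this host does not trust.
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return HttpsCheck(True, "done", f"negotiated {tls.version()}")
    except OSError as exc:  # ssl.SSLError is a subclass of OSError
        return HttpsCheck(False, "tls", f"handshake failed: {exc}")
    finally:
        sock.close()
```

For example, `print(check_https("myinfocyte.infocyte.com"))`, substituting your own instance hostname. A failure at the "tcp" stage points toward firewall or routing; a failure at the "tls" stage points toward a proxy or SSL decryption device.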
Infocyte IP Addresses to allow:
Even if you can browse to the Cloud instance successfully, you may still have a more nuanced networking issue. If you suspect as much, there are a couple of tools you can use to learn more about the problem:
- The Survey log. This log should be either in the Agent install directory under "\logs\" or in "C:\Windows\Temp\logs\" ("/tmp/logs" on Linux).
Once the "Failed: At least 300s..." message occurs, collect the log from the target host. You will find it at C:\Windows\Temp\agent.log on Windows and /tmp/agent.log on Linux. Look for errors related to connectivity. For example:
[2020-03-10 09:00:42][hunt_survey::mothership::heartbeat_response] - Error communicating with the server: https://myinfocyte.infocyte.com/api/survey/reply: error trying to connect: peer misbehaved: downgrade to TLS1.2 when TLS1.3 is supported
[2020-03-10 09:04:42][hunt_survey] - Server appears to be down, stopped responding to heartbeats
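On a long log it can help to filter for lines like these automatically. The sketch below assumes every entry follows the `[timestamp][module] - message` layout of the two sample lines above; the keyword list is drawn only from those samples and is not an exhaustive list of the messages the Survey can emit. The names `find_connectivity_errors` and `LogEntry` are hypothetical.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Layout assumed from the sample lines: [timestamp][module] - message
LINE_RE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\[(?P<module>[^\]]+)\] - (?P<message>.*)$"
)

# Phrases taken from the sample log lines; extend as needed.
KEYWORDS = (
    "error communicating with the server",
    "error trying to connect",
    "stopped responding to heartbeats",
)


class LogEntry(NamedTuple):
    timestamp: str
    module: str
    message: str


def find_connectivity_errors(lines: Iterable[str]) -> Iterator[LogEntry]:
    """Yield parsed entries whose message contains a connectivity keyword."""
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that do not follow the assumed layout
        message = match.group("message")
        if any(keyword in message.lower() for keyword in KEYWORDS):
            yield LogEntry(match.group("timestamp"), match.group("module"), message)
```

To run it against a collected log, open the file and pass the handle in, for example `with open("/tmp/agent.log", errors="replace") as fh: hits = list(find_connectivity_errors(fh))`.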
- Wireshark. This is a more advanced network diagnostic tool; if you are not already familiar with it, you may need assistance from our support staff to collect and interpret a capture. If you are familiar with it, use the following Capture Filter to collect only the packets specific to Cloud traffic:
host 220.127.116.11 or host 18.104.22.168 or host 22.214.171.124 or 126.96.36.199 or host 188.8.131.52 or host 184.108.40.206
If you are using the on-premise (HUNT Server) product, use a capture filter for your HUNT Server IP instead.
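If you need to build a filter for a different set of addresses (for example, one or more HUNT Server IPs), a small helper avoids typos in a long `host ... or host ...` expression. This is a sketch; `build_capture_filter` is a hypothetical name, and the addresses in the usage note are placeholders from the documentation range, not Infocyte addresses.

```python
import ipaddress
from typing import Iterable


def build_capture_filter(addresses: Iterable[str]) -> str:
    """Join IP addresses into a 'host A or host B' capture filter string."""
    hosts = []
    for raw in addresses:
        # ip_address() raises ValueError on anything that is not a valid IP.
        ip = ipaddress.ip_address(raw.strip())
        hosts.append(f"host {ip}")
    if not hosts:
        raise ValueError("at least one IP address is required")
    return " or ".join(hosts)
```

For example, `build_capture_filter(["192.0.2.10", "192.0.2.11"])` returns `host 192.0.2.10 or host 192.0.2.11`, which can be pasted into the Capture Filter field.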
If the Survey was never heard from after going "Active" and you have ruled out network connectivity problems, then the most likely cause is a third-party program stopping the Survey from running.
We recommend whitelisting the Survey hash in your endpoint security products. To find the Survey hashes, log in to your Cloud instance, click your account icon at the top right, click "Admin", and select "Downloads".
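Before whitelisting, you may want to confirm that the Survey binary on the target host matches a hash listed on the Downloads page. The sketch below computes several common digests because this article does not say which algorithm the page uses; that choice of MD5, SHA-1, and SHA-256 is an assumption, and `file_hashes` is a hypothetical helper name.

```python
import hashlib
from typing import Dict, Sequence


def file_hashes(
    path: str,
    algorithms: Sequence[str] = ("md5", "sha1", "sha256"),
    chunk_size: int = 1 << 20,
) -> Dict[str, str]:
    """Return hex digests of the file at `path`, read in chunks to limit memory use."""
    digests = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes (end of file).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for digest in digests.values():
                digest.update(chunk)
    return {name: digest.hexdigest() for name, digest in digests.items()}
```

Run it against the downloaded file, for example `file_hashes("agent.win64.exe")`, and compare the output with the value shown in your instance.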
If you are not sure whether this is the issue, test it: download the appropriate Survey Agent for the target host, then on the host itself open an Administrative prompt (Windows) or sudo terminal (Linux) and type:
Linux 64 bit:
Linux 32 bit:
Check whether the Survey finishes. If it does, go back to network troubleshooting. If it crashes, check for a security popup, or check your endpoint security logs for a block. Then whitelist the hash (recommended) or the filename and try again.
Debugging Other Issues
Occasionally, there are issues that do not fit into the above buckets. You may see the "Failed: At least 300s since last heartbeat" error later in the process, such as failing on the "Autostarts" step, for example:
In these situations, grab a debug log and send it to our support team. If the failure is specific to one object (like Autostarts), it is helpful to run the Offline Survey using the Manual Scan option's verbose mode, collecting "only autostarts":
Note: run the following command from an Administrative command prompt.
agent.win64.exe --verbose survey --only-autostarts
Once the run completes, or once the crash occurs, collect the agent-<DATE>.log and review it (or send it to Infocyte Support).
Again, this guide is not comprehensive. If you have tried the above and are still unable to reach a resolution, please contact us at email@example.com and we'll help you diagnose the problem.