FAXing over VoIP networks doesn't work. You can sometimes arrange things so a fairly high percentage of FAXes get through OK. You can occasionally create setups that work 100% of the time, but these are rare and unrepeatable. You need to use a proper FAX over IP protocol, such as T.38, to achieve consistent, reliable FAXing across IP networks.
Sending FAXes over VoIP networks usually fails. It is human nature to look for simple reasons for that, and simple cures. In reality, there are a number of reasons, and no certain universal cures. VoIP networks are designed to do a good job with speech. Carrying any sound other than a single voice speaking is not generally a system requirement. It shouldn't be too surprising if it works rather poorly.
The commonest problem with sending a FAX over VoIP networks is the easiest to deal with. A low bit rate voice codec is unable to carry a fast modem signal without severe distortion. Would you really expect an 8kbps G.729 codec to convey a 9.6kbps FAX modem signal correctly? The only common codecs capable of adequately preserving FAX modem signals up to 14,400bps (V.17) are the G.711 u-law and A-law codecs. Up to 9600bps (V.29), a fully implemented G.726 codec will also work; G.726 was specifically designed to be able to carry medium speed modems, such as the V.29 modem used for FAX. However, not all codecs claiming to be G.726 fully implement the spec. A few shortcuts can save considerable compute power, and only a few people need the spec. to be fully implemented. Your mileage may vary.
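The rules of thumb above can be written down as a small lookup table. This is a minimal sketch that only encodes the claims made in this article; the codec labels are descriptive names chosen for illustration (not SDP payload names), and a "G.726" entry only holds for a codec that fully implements the spec.

```python
from typing import List

# Fastest FAX modem each codec can carry, per the rules of thumb in the text.
# Keys are illustrative labels, not SDP payload names.
MAX_FAX_RATE_BPS = {
    "G.711u": 14400,  # u-law: up to V.17
    "G.711A": 14400,  # A-law: up to V.17
    "G.726": 9600,    # only if the codec fully implements G.726: up to V.29
    "G.729": 300,     # low bit rate: at best the V.21 control messages
}

# Top rate of each FAX modem, slowest first.
FAX_MODEM_BPS = {
    "V.21": 300,
    "V.27ter": 4800,
    "V.29": 9600,
    "V.17": 14400,
    "V.34": 33600,  # excluded even for G.711; the next paragraph explains why
}


def usable_modems(codec: str) -> List[str]:
    """Return the FAX modems a codec should carry, per the table above."""
    limit = MAX_FAX_RATE_BPS.get(codec, 0)  # unknown codec: assume nothing works
    return [name for name, bps in FAX_MODEM_BPS.items() if bps <= limit]


print(usable_modems("G.711u"))  # ['V.21', 'V.27ter', 'V.29', 'V.17']
print(usable_modems("G.729"))   # ['V.21']
```

Note that the table says nothing about packet loss or jitter; it only captures whether the codec can preserve the modem signal at all.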
Recently, FAX machines supporting 33,600bps (V.34bis) have become popular. This rate is unlikely to work reliably across any VoIP connection, even when an A-law or u-law codec is used. The codecs will maintain the required signal quality, but the delay across the VoIP channel, even if it is a stable delay, will prevent the echo cancellers in most modems from training well enough. The slower FAX modems (V.27ter, V.29 and V.17) do not use echo cancellation, so the problem does not exist there.
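To get a feel for the scale of the problem, it helps to convert delay into samples. Telephone-band audio is sampled at 8 kHz, so every millisecond of round-trip delay pushes the far-end echo 8 samples further back in time. The delay figures below are hypothetical examples, and real modem echo cancellers are structured in various ways, so this only illustrates why added delay makes the canceller's job harder, not where any particular modem gives up.

```python
SAMPLE_RATE_HZ = 8000  # telephone-band sampling rate


def echo_delay_samples(round_trip_ms: float) -> int:
    """How many 8 kHz samples after transmission a far-end echo returns."""
    return round(round_trip_ms * SAMPLE_RATE_HZ / 1000)


# Hypothetical round-trip delays, for illustration only.
for label, rtt_ms in [("short PSTN path", 10), ("VoIP path", 150)]:
    print(label, echo_delay_samples(rtt_ms))
# short PSTN path 80
# VoIP path 1200
```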
Lower bit rate codecs have zero chance of working for any standard FAX image modem. Many will convey the 300bps (V.21) FAX control messages OK, but they will not convey the fast modem signals used for the actual image data.
Only one standard has been developed for real-time FAX over IP: T.38. Before discussing what T.38 is like, it is important to note a few things about its current status in the real world. A lot of ATA boxes, and other gateway equipment, still do not support T.38. A lot which say they support it actually just have it in the pipeline (e.g. Sipura 2100). Very few T.38 implementations currently support 33,600bps (V.34bis) FAX, although low-cost all-in-one printer/scanner/FAX machines supporting V.34 FAX have recently become fairly common. Many of the implementations that do exist are very buggy.
So, what is T.38?
T.38 is the real-time FAX over IP (FoIP) protocol. This means it is designed to work like traditional FAXing: you call another FAX machine, and send the FAX while you wait. Either FAX machine could be a traditional FAX machine connected via the PSTN, an ATA box, or a similar gateway; it could be a FAX machine with an RJ-45 connector plugged straight into an IP network; or it could be a computer pretending to be a FAX machine.
There are some issues in trying to do FoIP well with traditional FAX machines. Recent versions of the core FAX protocol, T.30, have introduced flags and features that allow newer FAX machines to be Internet aware FAX devices. These tie in to the T.38 spec. A few makers now say their FAX machines are "Internet Aware" or "Internet Capable". This might mean the machines can connect directly to an IP network, but it usually just means the machines are aware of the existence and qualities of T.38.
What does T.38 look like?
The original version of the T.38 spec. defined two methods for transmission across an IP network - one based on UDP and one based on TCP. At that time RTP was the emerging protocol for streaming media across IP networks. Instead of using that, T.38 defined its own method of packaging data within UDP packets, called UDPTL. This has now been accepted as a mistake, and an RTP based form of the protocol has been defined. Currently, this just makes more work for implementors. The only method in widespread use is the non-RTP method, so that has to be implemented. There is no choice. For the future, the RTP form has to be implemented too. AHHHH!
The T.38 spec. says some odd things about when the UDP form is more suitable and when the TCP form is more suitable. I would say the TCP form should be used between two IP devices. When one of the machines is connected to an analogue phone line, the UDP form probably has to be used for its nearer real-time streaming qualities. UDP is, however, an unreliable protocol, and that compromises the benefits of T.38 over trying to use FAX over VoIP.
T.38 is a very loose specification. Most good modern specifications try to really tie down what should happen. T.38 allows a huge spread of implementation decisions.
In what ways does T.38 outperform FAX over VoIP?
If the TCP form of T.38 is used, it is very robust. Used between Internet aware FAX machines, it basically solves all the problems of using VoIP for FAX.
If one of the UDP forms of T.38 is used, it is common for each packet to contain a copy of the main data in the previous packet. This redundancy is optional, but most implementations seem to support it. This forward error correction scheme makes T.38 far more tolerant of dropped packets than VoIP. It takes two successive lost packets to actually lose any data. The overheads in T.38 are so big that the extra data sent in each packet is hardly noticeable. If two successive packets are lost, T.38 will still have trouble. However, if that is a common occurrence, the network is probably quite bad, and VoIP performance will be poor too.
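The redundancy scheme is easy to model. This is a minimal sketch, assuming a redundancy depth of one (each packet carries just the previous packet's primary data, as described above); the tuple layout is a simplification for illustration and is not the actual UDPTL wire encoding. It shows that a single lost packet is fully recovered, while two successive losses cost exactly one IFP (Internet FAX protocol) packet.

```python
from typing import List, Optional, Tuple

# Simplified packet: (sequence number, primary IFP, copy of previous primary IFP).
# This models the redundancy idea only; it is NOT the UDPTL wire encoding.
Packet = Tuple[int, bytes, Optional[bytes]]


def make_packets(ifps: List[bytes]) -> List[Packet]:
    """Attach a copy of the previous packet's primary data to each packet."""
    return [(seq, ifp, ifps[seq - 1] if seq > 0 else None) for seq, ifp in enumerate(ifps)]


def recover(received: List[Packet], count: int) -> List[Optional[bytes]]:
    """Rebuild the IFP stream; None marks data that was irretrievably lost."""
    data: List[Optional[bytes]] = [None] * count
    for seq, primary, previous in received:
        data[seq] = primary
        if previous is not None and data[seq - 1] is None:
            data[seq - 1] = previous  # fill a gap from the redundant copy
    return data


ifps = [b"a", b"b", b"c", b"d", b"e"]
packets = make_packets(ifps)
one_lost = [p for p in packets if p[0] != 2]
two_lost = [p for p in packets if p[0] not in (2, 3)]
print(recover(one_lost, len(ifps)))  # [b'a', b'b', b'c', b'd', b'e']
print(recover(two_lost, len(ifps)))  # [b'a', b'b', None, b'd', b'e']
```

A deeper redundancy setting would simply carry more previous packets in each packet, trading bandwidth for tolerance of longer loss bursts.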
Losing a packet in a T.38 stream does not cause the modems to lose sync. This means two successive lost packets should only corrupt a section of an image. If the optional FAX error correction mode (ECM) is used, there is a good chance that, with a retry or two, a perfect image will be transferred. Not ideal, but functional.
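Why "a retry or two" is usually enough: in ECM the page is sent as numbered frames, and the receiver reports only the damaged ones back (in T.30 this is the PPR, partial page request), so each retry resends just those frames. The sketch below is a simplified, hypothetical model of that loop; the `is_corrupted` callback and the retry limit of 4 are assumptions for illustration, not values taken from the T.30 spec.

```python
from typing import Callable, Tuple


def ecm_transfer(num_frames: int,
                 is_corrupted: Callable[[int, int], bool],
                 max_attempts: int = 4) -> Tuple[bool, int]:
    """Resend only the frames reported bad until all arrive or attempts run out.

    is_corrupted(attempt, frame) says whether a frame is damaged on a given
    attempt. Returns (page_perfect, attempts_used).
    """
    pending = set(range(num_frames))
    attempts = 0
    while pending and attempts < max_attempts:
        # Receiver keeps the good frames and reports the bad ones (like a PPR).
        pending = {f for f in pending if is_corrupted(attempts, f)}
        attempts += 1
    return not pending, attempts


# A burst of lost packets damages frames 10-12 on the first pass only.
print(ecm_transfer(64, lambda attempt, f: attempt == 0 and 10 <= f <= 12))  # (True, 2)
```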
Much of the robustness of T.38 comes not from what the spec. says, but from the potential it offers for smart implementation. The trick is to work out the smartest implementation that will not cause trouble with the many buggy implementations of T.30 which exist in commercial FAX products.
The T.30 spec. allows transmission of a page to be paused just before the end of any row of pixels. This is used as a method of flow control, by FAX machines with slow paper handling. It can also be used by a T.38 implementation, to wait for more data when a packet is delayed or lost. This means a T.38 gateway can start sending a page as soon as it gets some data, without performing any jitter buffering. When there is little jitter, transmission delay is minimised. When jitter is bad, things will be delayed only as much as necessary. If packets are lost, and FEC is in use, the outgoing gateway can simply wait a while, to try to reconstruct the stream from the redundant information available when further packets arrive. If the required data is irretrievably lost, due to a burst of lost packets, transmission can continue with only the minimum possible page corruption.
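The pacing strategy can be sketched as a small discrete-time simulation. This is a simplified model under stated assumptions: one output slot carries one image row, a "FILL" slot stands for pausing at the end of a row (in T.4 terms, sending fill before the end-of-line code), and `max_wait` is a hypothetical limit on how long the gateway waits for a missing row before giving up on it. There is no jitter buffer: rows go out as soon as they are available.

```python
from typing import Dict, List, Union

Slot = Union[int, str]  # a row number, "FILL" (pause at a row end) or "LOST"


def pace_rows(arrivals: Dict[int, List[int]], total_rows: int, max_wait: int) -> List[Slot]:
    """Send image rows as soon as they are available, with no jitter buffer.

    arrivals maps an output time slot to the row numbers that reached the
    gateway during that slot. While the next row is missing, the gateway
    pauses at the end of the current row ("FILL"). After max_wait pauses it
    gives up on that row ("LOST") and carries on with the next one.
    """
    available = set()
    out: List[Slot] = []
    next_row, waited, slot = 0, 0, 0
    while next_row < total_rows:
        available.update(arrivals.get(slot, []))
        if next_row in available:
            out.append(next_row)
            next_row, waited = next_row + 1, 0
        elif waited >= max_wait:
            out.append("LOST")  # hypothetical: substitute e.g. a blank row
            next_row, waited = next_row + 1, 0
        else:
            out.append("FILL")
            waited += 1
        slot += 1
    return out


print(pace_rows({0: [0], 1: [1], 3: [2]}, 3, max_wait=5))  # [0, 1, 'FILL', 2]
print(pace_rows({0: [0], 1: [2]}, 3, max_wait=2))  # [0, 'FILL', 'FILL', 'LOST', 2]
```

The first call shows a delayed row costing only one short pause; the second shows a row that never arrives being skipped, so the damage is limited to that one row.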
HDLC transmission, used for the FAX control messages, offers no similar way to precisely control flow. However, it is possible to achieve pretty good results. The HDLC protocol only supports flow control between HDLC frames. The full HDLC protocol allows frames to be aborted midway and restarted, but the protocol as defined in the T.30 spec. doesn't include the abort feature. If we wait until we have received the whole of a long frame before starting to pass it on, we could introduce substantial delay. However, this is not a big problem for T.30 FAX transmission. Most of the HDLC frames used in the T.30 spec. are quite short, especially the ones which occur between pages, so delaying until we receive all the data for one of these messages will not significantly extend the call. To avoid long delays for very long frames we can apply rules like: if a frame is no more than 30 bytes (roughly 1 second of 300bps V.21 data) long, we wait for the whole frame to be received before passing it on; if the frame is longer, we start passing it on with a 1 second delay.
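One convenient way to apply that rule is to notice that the gateway does not need to know a frame's length in advance: it can start sending when either the whole frame has arrived or 30 bytes are buffered, whichever comes first. This minimal sketch assumes the data arrives at the V.21 line rate (300bps) and uses the 30 byte threshold as a stand-in for the 1 second delay (30 bytes is 240 bits, 0.8 seconds at 300bps). The function names are hypothetical. The `underruns` check confirms that, with this head start, a long frame never runs dry while data keeps arriving at line rate.

```python
from typing import List

V21_BPS = 300                 # V.21 carries the T.30 control messages
SHORT_FRAME_LIMIT_BYTES = 30  # 240 bits, 0.8 s at 300 bps: "roughly 1 second"


def forward_start_time(byte_arrivals: List[float]) -> float:
    """When the outgoing gateway starts sending one HDLC frame.

    Start once the whole frame has arrived, or once SHORT_FRAME_LIMIT_BYTES
    bytes are buffered, whichever comes first. byte_arrivals must be sorted.
    """
    if not byte_arrivals:
        raise ValueError("empty frame")
    return byte_arrivals[min(len(byte_arrivals), SHORT_FRAME_LIMIT_BYTES) - 1]


def v21_arrivals(n_bytes: int, t0: float = 0.0) -> List[float]:
    """Arrival time of each byte if data comes in at the V.21 line rate."""
    return [t0 + (i + 1) * 8 / V21_BPS for i in range(n_bytes)]


def underruns(byte_arrivals: List[float], start: float) -> int:
    """Count bytes that would be needed for sending before they had arrived."""
    byte_time = 8 / V21_BPS
    return sum(1 for i, t in enumerate(byte_arrivals) if t > start + i * byte_time)


long_frame = v21_arrivals(100)
start = forward_start_time(long_frame)
print(round(start, 3), underruns(long_frame, start))  # 0.8 0
print(round(forward_start_time(v21_arrivals(5)), 3))  # 0.133
```

Short frames therefore go out as soon as they are complete, and long frames are held back by only the fixed threshold delay. Network jitter larger than that head start would still cause an underrun, which this sketch does not attempt to handle.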