AMR (Adaptive Multi-Rate) Codec

In a multi-channel session, the codec mode request SHOULD be interpreted by the receiver of the payload as the desired encoding mode for all the channels in the session. That may include adjusting the codec mode, but also includes adjusting the level of redundancy or number of frames per packet. The codec mode selection MAY be restricted by a session parameter to a subset of the available modes.

This is to avoid the loss of data synchronization in the depacketization process, which can result in a huge degradation in speech quality. The extra comfort noise frame types specified in table 1a in [ 2 ] i.

Q (1 bit): Frame quality indicator. The frame quality indicator enables damaged frames to be forwarded to the speech decoder for error concealment. This can improve the speech quality more than dropping the damaged frames. See Section 4. For multi-channel sessions, the ToC entries of all frames from a frame-block are placed in the ToC in consecutive order as defined in Section 4.

When multiple frame-blocks are present in a packet in bandwidth-efficient mode, they will be placed in the packet in order of their creation time. The following figure shows an example of a ToC of three entries in a single-channel session using bandwidth-efficient mode.

Speech Data

Speech data of a payload contains zero or more speech frames or comfort noise frames, as described in the ToC of the payload. The length of the speech frame is implicitly defined by the mode indicated in the FT field.

As specified there, the bits of speech frames have been rearranged in order of decreasing sensitivity, while the bits of comfort noise frames are in the order produced by the encoder.

The resulting bit sequence for a frame of length K bits is denoted d(0), d(1), ..., d(K-1).

Algorithm for Forming the Payload

The complete RTP payload in bandwidth-efficient mode is formed by packing bits from the payload header, the table of contents, and the speech frames (in the order defined by their corresponding ToC entries in the ToC list), followed by 0 to 7 padding bits to bring the payload to octet alignment.

They are packed contiguously into octets beginning with the most significant bits of the fields and the octets. To be precise, the four-bit payload header is packed into the first octet of the payload with bit 0 of the payload header in the most significant bit of the octet.

The four most significant bits (numbered 0-3) of the first ToC entry are packed into the least significant bits of the octet, ending with bit 3 in the least significant bit.

Packing continues in the second octet with bit 4 of the first ToC entry in the most significant bit of the octet. If more than one frame is contained in the payload, then packing continues with the second and successive ToC entries. Bit 0 of the first data frame follows immediately after the last ToC bit, proceeding through all the bits of the frame in numerical order. Bits from any successive frames follow contiguously in numerical order for each frame and in consecutive order of the frames.
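The packing steps above can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes a 6-bit ToC entry laid out as F (1 bit, set when another entry follows), FT (4 bits), and Q (1 bit), which is not spelled out in the text above, and the function name `pack_bandwidth_efficient` and the `(ft, q, speech_bits)` tuple shape are hypothetical.

```python
def pack_bandwidth_efficient(cmr, frames):
    """Pack a bandwidth-efficient payload.

    cmr    -- 4-bit codec mode request
    frames -- list of (ft, q, speech_bits); speech_bits is a list of 0/1
              values already in the codec's transmission order

    Assumption: each ToC entry is F (1 bit), FT (4 bits), Q (1 bit).
    """
    # Payload header: 4 bits, bit 0 first (most significant).
    bits = [(cmr >> (3 - i)) & 1 for i in range(4)]

    # Table of contents: one entry per frame, F=1 on all but the last entry.
    for index, (ft, q, _) in enumerate(frames):
        bits.append(1 if index < len(frames) - 1 else 0)
        bits.extend((ft >> (3 - i)) & 1 for i in range(4))
        bits.append(q & 1)

    # Speech data: frames in ToC order, bits in numerical order.
    for _, _, speech_bits in frames:
        bits.extend(speech_bits)

    # 0 to 7 zero padding bits to reach octet alignment.
    bits.extend([0] * (-len(bits) % 8))

    # Fill each octet starting from its most significant bit.
    payload = bytearray()
    for start in range(0, len(bits), 8):
        octet = 0
        for bit in bits[start:start + 8]:
            octet = (octet << 1) | bit
        payload.append(octet)
    return bytes(payload)
```

For a single 148-bit frame, the header (4 bits), one ToC entry (6 bits), and the speech bits total 158 bits, so two padding bits are added and the payload is 20 octets long.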

Payload Examples

Single-Channel Payload Carrying a Single Frame

The following diagram shows a bandwidth-efficient AMR payload from a single-channel session carrying a single speech frame-block.

The encoded speech bits, d 0 to d , are arranged in descending sensitivity order according to [ 2 ]. Finally, two padding bits P are added to the end as padding to make the payload octet aligned. The first frame is a speech frame at 6. The fourth frame in the payload is a speech frame at 8. As shown below, the payload carries a mode request for the encoder on the receiver's side to change its future coding mode to AMR-WB 8.

The encoded speech and SID bits, d 0 to d , g 0 to g 39 , and h 0 to h , are arranged in the payload in descending sensitivity order according to [ 4 ]. Note, no speech bits are present for the third frame.

Finally, seven zero bits are padded to the end to make the payload octet aligned.

Multi-Channel Payload Carrying Multiple Frames

The following diagram shows a two-channel payload carrying 3 frame-blocks, i. In the payload, all speech frames contain the same mode 7. The CMR is set to 15, i. The two channels are defined as left (L) and right (R), in that order.

The encoded speech bits are designated dXY Exemplifying this, for frame-block 1 of the left channel, the encoded bits are designated as d1L 0 to d1L

Octet-Aligned Mode

R: a reserved bit that MUST be set to zero. Interleaving MUST be performed on a frame-block basis i. The following example illustrates the arrangement of speech frame-blocks in an interleaving group during an interleaving session.

We also assume that the first payload packet of the interleaving group is s, and the number of speech frame-blocks carried in each payload is N.

There will be no interleaving effect unless the number of frame-blocks per packet (N) is at least 2. The sender of the payload MUST only apply interleaving if the receiver has signalled its use through out-of-band means. Instead, the presence and order of the frame-blocks in a packet will follow the pattern described in 4.

The following example shows the ToC of three consecutive packets, each carrying three frame-blocks, in an interleaved two-channel session. Here, the two channels are left L and right R with L coming before R, and the interleaving length is 3 i. This results in the interleaving group size of 9 frame-blocks.
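The grouping in this example can be sketched in a few lines. The assignment rule used here is an assumption, since the listing that defined it did not survive in the text above: packet number p of the group (counting from 0) is taken to carry frame-blocks n+p, n+p+L, n+p+2L, and so on, where n is the first frame-block of the group, L is the interleaving length, and N is the number of frame-blocks per packet. The function name `interleave_group` is hypothetical.

```python
def interleave_group(first_block, blocks_per_packet, interleave_length):
    """Return the frame-block numbers carried by each packet of one group.

    Assumed rule: packet p (0-based) carries blocks
    first_block + p + k * interleave_length for k = 0 .. blocks_per_packet - 1.
    """
    return [
        [first_block + p + k * interleave_length for k in range(blocks_per_packet)]
        for p in range(interleave_length)
    ]


# Three packets of three frame-blocks each, interleaving length 3.
for packet in interleave_group(0, 3, 3):
    print(packet)
```

Under that assumption the three packets carry frame-blocks 0, 3, 6, then 1, 4, 7, then 2, 5, 8, so a single lost packet removes three non-adjacent frame-blocks from a group of 9 instead of three consecutive ones.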

FT (4 bits, unsigned integer): see definition in Section 4. Q (1 bit): see definition in Section 4. It only exists if the use of CRC is signalled out-of-band for the session. When present, each CRC in the list is 8 bits long and corresponds to a speech frame (NOT a frame-block) carried in the payload. Calculation and use of the CRC is specified in the next section.

This section provides more details on how to use the frame CRC in the octet-aligned payload header together with a partial transport layer checksum to achieve UED.

Note, the number of class A bits for various coding modes in AMR codec is specified as informative in [ 2 ] and is therefore copied into Table 1 in Section 3. The receiver of the payload SHOULD examine the data integrity of the received class A bits by re-calculating the CRC over the received class A bits and comparing the result to the value found in the received payload header.
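The receiver-side check can be sketched as a bit-serial CRC-8. The polynomial and register convention are assumptions here, because the exact values are not legible in this text: the sketch uses x^8 + x^4 + x^3 + x^2 + 1 with a right-shifting register, which gives the feedback constant 0xB8. Verify both against the specification before relying on them. The function names are hypothetical.

```python
def frame_crc8(class_a_bits, feedback=0xB8):
    """Bit-serial CRC-8 over a sequence of 0/1 values, first bit d(0).

    Assumption: right-shifting register, feedback 0xB8 (the bit-reversed
    form of x^8 + x^4 + x^3 + x^2 + 1), register initialised to zero.
    """
    crc = 0
    for bit in class_a_bits:
        mix = (crc & 1) ^ (bit & 1)  # XOR the input bit with the register LSB
        crc >>= 1                    # shift right, feeding 0 into the MSB
        if mix:
            crc ^= feedback
    return crc


def class_a_intact(class_a_bits, received_crc):
    """Recompute the CRC over the received class A bits and compare."""
    return frame_crc8(class_a_bits) == received_crc
```

A frame whose recomputed CRC differs from the value in the payload header has damaged class A bits and should be treated as a bad frame by the decoder.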

See [ 6 ] and [ 7 ] for more details. In binary form, the polynomial appears as follows: MSB.. This operation is repeated for each bit that the CRC should cover. In this case, the first bit would be d 0 of the speech frame that the CRC should cover. When the last bit e.

Speech Data

In octet-aligned mode, speech data is carried in a similar way to that in the bandwidth-efficient mode as discussed in Section 4.

The padding bits MUST be ignored on reception. In other words, each speech frame MUST be octet-aligned. Since the bits within each frame are ordered with the most error-sensitive bits first, interleaving the octets collects those sensitive bits from all frames to be nearer the beginning of the packet.

The details of assembling the payload are given in the next section. The use of robust sorting order for a payload type MUST be agreed via out-of-band means. Section 8 specifies a media type parameter for this purpose. Note, robust sorting order MUST only be performed on the frame level and thus is independent of interleaving, which is at the frame-block level, as described in Section 4.

In other words, robust sorting can be applied to either non-interleaved or interleaved payload types. Methods for Forming the Payload Two different packetization methods, namely, normal order and robust sorting order, exist for forming a payload in octet-aligned mode.

In both cases, the payload header and table of contents are packed into the payload the same way; the difference is in the packing of the speech frames.

The payload begins with the payload header of one octet, or two octets if frame interleaving is selected. The payload header is followed by the table of contents consisting of a list of one-octet ToC entries. The speech data follows the table of contents, or the CRCs if present. For packetization in the normal order, all of the octets comprising a speech frame are appended to the payload as a unit.

For packetization in robust sorting order, the octets of all speech frames are interleaved together at the octet level.

That is, the data portion of the payload begins with the first octet of the first frame, followed by the first octet of the second frame, then the first octet of the third frame, and so on. After the first octet of the last frame has been appended, the cycle repeats with the second octet of each frame.

The process continues for as many octets as are present in the longest frame. If the frames are not all the same octet length, a shorter frame is skipped once all octets in it have been appended. The order of the frames in the cycle will be sequential if frame interleaving is not in use, or according to the interleave pattern specified in the payload header if frame interleaving is in use. Exactly how many octets need to be covered depends on the network and application.
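The octet-level interleaving described above can be sketched as a pair of functions. This is an illustrative sketch only: it takes the frames already in cycle order (sequential, or following the interleave pattern), and the names `robust_sort` and `robust_unsort` are hypothetical.

```python
def robust_sort(frames):
    """Interleave the octets of all frames: first octet of each frame,
    then second octet of each frame, and so on. Shorter frames are
    skipped once all their octets have been appended."""
    longest = max((len(frame) for frame in frames), default=0)
    out = bytearray()
    for i in range(longest):
        for frame in frames:
            if i < len(frame):
                out.append(frame[i])
    return bytes(out)


def robust_unsort(data, lengths):
    """Inverse operation; lengths are the per-frame octet counts that the
    receiver derives from the FT field of each ToC entry."""
    frames = [bytearray() for _ in lengths]
    pos = 0
    for i in range(max(lengths, default=0)):
        for index, length in enumerate(lengths):
            if i < length:
                frames[index].append(data[pos])
                pos += 1
    return [bytes(frame) for frame in frames]


print(robust_sort([b"abc", b"AB", b"123"]))  # b'aA1bB2c3'
```

Because each frame starts with its most error-sensitive bits, the first octets of the sorted data hold the sensitive bits of every frame, which is what lets a partial checksum cover them with a short prefix.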

No frame CRC, interleaving, or robust sorting is in use. Two frame-blocks, each containing two speech frames of 7. The two channels are left L and right R with L coming before R. Moreover, frame CRC, robust sorting, and frame-block interleaving are all enabled for the payload type.

The next two frames are the L and R channel frames of frame-block 3, consisting of bits f3L Finally, the payload is robust sorted.

Implementation Considerations

An application implementing this payload format MUST understand all the payload parameters in the out-of-band signaling used.

This requirement ensures that an implementation can always decide whether or not it is capable of communicating. No operating mode of the payload format is mandatory to implement. The requirements of the application using the payload format should be used to determine what to implement. To achieve basic interoperability, an implementation SHOULD at least implement both bandwidth-efficient and octet-aligned modes for a single audio channel. The mode-change-period, mode-change-capability, and mode-change-neighbor parameters are intended for signaling with GSM endpoints.

The encoder may arbitrarily select the initial phase (odd or even frame-block) where codec mode changes are performed, but then SHOULD stick to that phase as far as possible. However, in rare cases, handovers or other events e. The decoder SHALL therefore be prepared to accept changes also in the other phase and to other modes.

Section 8 specifies the usage of the parameters mode-change-period and mode-change-capability to indicate the desired behavior in applications. In gateway scenarios, encoders can be requested through the "mode-set" parameter to use a limited mode-set that is supported by the link beyond the gateway. Further, to avoid congestion on that link, the encoder SHOULD limit the initial codec mode for a session to a lower mode, until at least one frame-block is received with rate control information.

Decoding Validation

When processing a received payload packet, if the receiver finds that the calculated payload length, based on the information for the payload type and the values found in the payload header fields, does not match the size of the received packet, the receiver SHOULD discard the packet. This is because decoding a packet that has errors in its length field could severely degrade the speech quality.
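A length check of this kind might look like the sketch below for a single-channel AMR session in bandwidth-efficient mode. Several things are assumptions not stated in the text above: the 6-bit ToC entry layout (F, FT, Q), the per-mode speech frame sizes in bits (95 to 244 for the eight modes, 39 for SID), and the choice to reject any other frame type. The function name is hypothetical.

```python
# Assumed speech bits per AMR frame type: FT 0-7 are the eight codec
# modes, FT 8 is the SID (comfort noise) frame, FT 15 carries no data.
AMR_FRAME_BITS = {0: 95, 1: 103, 2: 118, 3: 134, 4: 148,
                  5: 159, 6: 204, 7: 244, 8: 39, 15: 0}


def bandwidth_efficient_length_ok(payload):
    """Return True if the payload size matches the length implied by
    its table of contents; False means the packet should be discarded."""
    total_bits = len(payload) * 8

    def bit(i):
        return (payload[i // 8] >> (7 - i % 8)) & 1

    pos = 4            # skip the 4-bit payload header (CMR)
    speech_bits = 0
    while True:
        if pos + 6 > total_bits:
            return False                  # ToC runs past the packet end
        more = bit(pos)                   # F bit: another entry follows
        ft = 0
        for i in range(1, 5):
            ft = (ft << 1) | bit(pos + i)
        if ft not in AMR_FRAME_BITS:
            return False                  # frame type not accepted here
        speech_bits += AMR_FRAME_BITS[ft]
        pos += 6
        if not more:
            break
    expected_octets = (pos + speech_bits + 7) // 8
    return expected_octets == len(payload)
```

The check needs only the header and ToC, so it can run before any speech bits are handed to the decoder.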

Multiple channel content is supported. There also exists another storage format for AMR and AMR-WB that is suitable for applications with more advanced demands on the storage format, like random access or synchronization with video. Its media type is specified by RFC [ 32 ]. The version number in the magic numbers refers to the version of the file format. CHAN (4 bits, unsigned integer): Indicates the number of audio channels contained in this storage file. The valid values and the order of the channels within a frame-block are specified in Section 4.

Speech Frames

After the file header, speech frame-blocks consecutive in time are stored in the file. Each frame-block contains a number of octet-aligned speech frames equal to the number of channels, stored in increasing order, starting with channel 1.
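Reading such a file can be sketched as follows for the single-channel AMR case. The details are assumptions that are not legible in this text: the magic string `#!AMR\n`, a one-octet frame header laid out as one padding bit, FT (4 bits), Q (1 bit), and two padding bits, and the per-mode frame sizes in octets. The function name `read_amr_frames` is hypothetical.

```python
AMR_MAGIC = b"#!AMR\n"  # assumed single-channel AMR magic number

# Assumed speech octets per frame type (speech bits rounded up to octets).
AMR_FRAME_OCTETS = {0: 12, 1: 13, 2: 15, 3: 17, 4: 19,
                    5: 20, 6: 26, 7: 31, 8: 5, 15: 0}


def read_amr_frames(data):
    """Split the contents of a single-channel AMR file into
    (ft, q, speech_octets) tuples."""
    if not data.startswith(AMR_MAGIC):
        raise ValueError("not a single-channel AMR file")
    pos = len(AMR_MAGIC)
    frames = []
    while pos < len(data):
        header = data[pos]
        ft = (header >> 3) & 0x0F   # assumed layout: P | FT(4) | Q | P | P
        q = (header >> 2) & 0x01
        if ft not in AMR_FRAME_OCTETS:
            raise ValueError(f"unsupported frame type {ft}")
        size = AMR_FRAME_OCTETS[ft]
        body = data[pos + 1:pos + 1 + size]
        if len(body) != size:
            raise ValueError("truncated frame")
        frames.append((ft, q, body))
        pos += 1 + size
    return frames
```

Since every frame states its own type, a reader can skip through the file frame by frame without decoding any speech data.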

Following this one octet header come the speech bits as defined in 4. The last octet of each frame is padded with zeroes, if needed, to achieve octet alignment. The following example shows an AMR frame in 5.

However, the multi-rate capability of AMR and AMR-WB speech coding may provide an advantage over other payload formats for controlling congestion since the bandwidth demand can be adjusted by selecting a different coding mode.

If forward error correction (FEC) is used to combat packet loss, the amount of redundancy added by FEC will need to be regulated so that the use of FEC itself does not cause a congestion problem. The actual mechanism for congestion control is not specified but should be suitable for real-time flows, possibly "TCP Friendly Rate Control" [ 21 ].

Security Considerations

RTP packets using the payload format defined in this specification are subject to the general security considerations discussed in [ 8 ] and in any used profile, like AVP [ 12 ] or SAVP [ 26 ]. As this format transports encoded speech, the main security issues include confidentiality, authentication, and integrity of the speech itself. The payload format itself does not have any built-in security mechanisms. External mechanisms, such as SRTP [ 26 ], need to be used for this functionality.

Note that the appropriate mechanism to provide security to RTP and the payloads following this memo may vary. It is dependent on the application, the transport, and the signaling protocol employed. This payload format does not exhibit any significant non-uniformity in the receiver side computational complexity for packet processing, and thus is unlikely to pose a denial-of-service threat due to the receipt of pathological data.

There is less of a need to encrypt the payload header or the table of contents because (a) they only carry information about the requested speech mode, frame type, and frame quality, and (b) this information could be useful to some third party, e. Therefore encryption should be performed after packet encapsulation, and decryption should be performed before packet decapsulation. Encryption may affect interleaving.

Specifically, a change of keys should occur at the boundary between interleaving groups. If it is not done at that boundary on both endpoints, the speech quality will be degraded during the complete interleaving group for any receiver.

The encryption mechanism may impact the robustness of the error correcting mechanism. This is discussed in Section 9.

Authentication and Integrity

To authenticate the sender and to protect the integrity of the RTP packets in transit, an external mechanism has to be used. Tampering with the CMR field may result in a different speech quality than desired.

The registrations are done following RFC [ 15 ] and the media registration rules [ 14 ]. Equivalent parameters could be defined elsewhere for use with control protocols that do not use media types or SDP.

Two separate media type registrations are made, one for AMR and one for AMR-WB, because they are distinct encodings that must be distinguished by their own media type. Data formats are specified for both real-time transport in RTP and for storage type applications such as email attachments. This media type registration covers both real-time transfer via RTP and non-real-time transfers via stored files.

Note, any unspecified parameter MUST be ignored by the receiver. If 0 or if not present, bandwidth-efficient operation is employed. Possible values are a comma separated list of modes from the set: 0, If not present, all codec modes are allowed for the payload type. The initial phase of the interval is arbitrary, but changes must be separated by a period of N frame-blocks, i. If this parameter is not present, mode changes are allowed at any time during the session, i.

The parameter may take value of 1 or 2. A value of 1 indicates that the client is not capable of restricting the mode change period to 2, and that the codec mode may be changed at any point. If this parameter is not present, the mode- change restriction capability is not supported, i.

Neighboring modes are the ones closest in bit rate to the current mode, either the next higher or next lower rate. If 0 or if not present, change between any two modes in the active codec mode set is allowed.

The time is calculated as the sum of the time that the media present in the packet represents. The presence of this parameter also implies automatically that octet-aligned operation SHALL be used.

The possible values and their respective channel order are specified in Section 4. If omitted, it has the default value of 1. This parameter allows a receiver to have a bounded delay when redundancy is used. Allowed values are between 0 (no redundancy will be used) and If the parameter is omitted, no limitation on the use of redundancy is present. Encoding considerations: The audio data is binary data, and must be encoded for non-binary transport; the Base64 encoding is suitable for email.

When used in an RTP context, the data is framed as defined in [ 14 ]. Some examples include: Voice over IP, streaming media, voice messaging, and voice recording on digital cameras. Additional information: The following applies to stored-file transfer methods: Magic numbers: single-channel: ASCII character string "! This media type is widely used in streaming, VoIP, and messaging applications on many types of devices.

This media type registration covers both real-time transfer via RTP and non-real-time transfers via stored files. Possible values are a comma-separated list of modes from the set: 0, The initial phase of the interval is arbitrary, but changes must be separated by multiples of N frame-blocks, i.

If this parameter is not present, mode changes are allowed at any time during the session, i. The values of N that are allowed are specified in Section 4. If multiple configurations are of interest to the application, they may all be offered; however, care should be taken not to offer too many payload types.

If a mode set was supplied in the offer, the answerer SHALL return the mode-set unmodified or reject the payload type. However, the answerer is free to choose a mode-set in the answer only if no mode-set was supplied in the offer for a unicast two-peer session. The mode-set in the answer is binding both for offerer and answerer. For multicast sessions, the answerer SHALL only participate in the session if it supports the offered mode-set.
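The answerer's side of these rules can be sketched for the unicast two-peer case. This is an illustrative sketch, not a complete offer/answer implementation: the function and exception names are hypothetical, `None` stands for an absent mode-set parameter, and the choice to answer with the full supported set when nothing was offered is only one of the options the rule permits.

```python
class PayloadTypeRejected(Exception):
    """Raised when the offered mode-set cannot be supported."""


def answer_mode_set(offered, supported):
    """Choose the mode-set for the answer in a unicast two-peer session.

    offered   -- list of mode numbers from the offer, or None if the
                 offer carried no mode-set parameter
    supported -- iterable of mode numbers the answerer can handle
    """
    supported = set(supported)
    if offered is None:
        # No mode-set in the offer: the answerer is free to choose one.
        return sorted(supported)
    if set(offered) <= supported:
        # A mode-set was offered: return it unmodified.
        return list(offered)
    # Otherwise the only option left is to reject the payload type.
    raise PayloadTypeRejected("offered mode-set not supported")


print(answer_mode_set([0, 2, 4, 7], range(8)))  # [0, 2, 4, 7]
```

Whatever the answer contains is then binding for both sides, so the offerer must also restrict its encoder to that set.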

Both parameters are declarative and are combined to allow a session participant to determine if the payload type can be supported. The mode-change-period will indicate what the offerer or answerer requires of data it receives, while the mode-change- capability indicates its transmission capabilities. It is then indicating the answerer's capability to transmit with that mode-change-period for the provided payload format configuration.

The information is useful in future re-negotiation of the payload formats. It is intended to be used in gateway scenarios (for example, to GSM networks). By including the parameter, the offerer or answerer indicates that it desires to receive streams with "mode-change-neighbor" restrictions. The "maxptime" parameter MUST be handled in the same way.

For send-only or send-recv unicast media streams, the parameter declares the limitation on redundancy that the stream sender will use.

For recvonly streams, it indicates the desired value for the stream sent to the receiver. This information is likely to simplify the media stream handling in the receiver. This is especially true if no redundancy will be used, in which case "max-red" is set to 0.

Low speech bandwidth schemes provide low speech quality and high error protection. This codec generates the speech bandwidth of — Hz. AMR has 8 source codec bit rates. AMR is a hybrid speech coder that transmits both speech parameters and waveform signals.

The complexity of the AMR algorithm is rated 5 using a relative scale. AMR is also used in network analysis and simulation tools. It is also found in consumer electronics, content creation tools, test and measurement equipment, and toys. The main uses of AMR are in multimedia: audio and video conferencing, digital radio broadcasting, audiobooks, ringtones, etc.

During network congestion, AMR uses lower bit rates in order to preserve audio quality. AMR adapts dynamically to changing network conditions. AMR provides good quality speech at low cost and with great robustness. AMR maximizes the likelihood of receiving the signal by trading off speech bit rate against channel coding.

AMR supports 8 bit rates and is one of the most widely used speech codecs in the world.


