The new API should provide at least the functionality of the existing one. It is also an opportunity to address outstanding issues with the current API. Here are some issues that we should address:
Seeking
The current API handles seeking poorly, and we need to come up with a good way to approach it. Some containers include an index table that allows fast seeking.
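As a rough illustration of index-based seeking, the sketch below assumes that a container index is simply a list of (time, byte offset) entries sorted by time. The names `IndexEntry` and `find_seek_point` are hypothetical and are not part of any existing API.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional


class IndexEntry(NamedTuple):
    """Hypothetical index entry: a time and where its data begins."""
    time_us: int      # time of the entry, in microseconds
    byte_offset: int  # position of the entry's data in the file


def find_seek_point(index: List[IndexEntry], target_us: int) -> Optional[IndexEntry]:
    """Return the last entry at or before target_us, or None for an empty index.

    Assumes `index` is sorted by time_us.
    """
    if not index:
        return None
    times = [entry.time_us for entry in index]
    pos = bisect_right(times, target_us)
    # Clamp to the first entry when the target precedes the whole index.
    return index[max(pos - 1, 0)]
```

With an index in hand, the extractor would jump to the returned byte offset and start decoding from there, so the cost of a seek is a binary search rather than a scan of the file.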
Alternatively, a container may have no index table at all. In that case, would the codecs seek within their own streams? If so, the extractor would simply pass the seek time through to the codec.
Or perhaps the extractor calls a function on the codec to obtain a seek token of some sort, which is then passed back into the codec to determine where to start playing. For example, the seek time could be passed to the audio codec, which returns a token describing how many samples into the file that time falls. The extractor then calls the codec with that sample number. Similarly, the extractor may be able to convert that sample number into the matching sample number for another (video?) stream, so that the seek keeps the streams in sync.
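The token idea can be sketched as follows. Everything here is an assumption made for illustration: `SeekToken`, `AudioCodec.time_to_token`, and `convert_sample` are hypothetical names, and both streams are assumed to run at a constant rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeekToken:
    """Hypothetical token: how many samples into the stream the seek lands."""
    sample: int


class AudioCodec:
    """Stand-in for a codec with a fixed sample rate (an assumption)."""

    def __init__(self, sample_rate: int) -> None:
        self.sample_rate = sample_rate

    def time_to_token(self, time_us: int) -> SeekToken:
        # Integer arithmetic keeps the result exact; round down to the
        # sample that contains the requested time.
        return SeekToken(time_us * self.sample_rate // 1_000_000)


def convert_sample(sample: int, from_rate: int, to_rate: int) -> int:
    """Map a sample number in one stream to the matching one in another stream."""
    return sample * to_rate // from_rate


audio = AudioCodec(sample_rate=44_100)
token = audio.time_to_token(2_000_000)                       # seek to 2 seconds
video_frame = convert_sample(token.sample, 44_100, 25)       # 25 frames per second
```

Real streams with variable rates or keyframe constraints would need more than this arithmetic; the sketch only shows the direction of the calls between extractor and codec.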
Streams
As far as I can tell, one problem with the current implementation is that it is not very friendly towards streaming media. For example, it is impossible to read to the end of a stream before playing it. Some file formats have streamable and non-streamable versions that differ from each other. Also, streamable file formats (such as MPEG level 1) can lack the indices needed for efficient seeking.
Generic Container Information
There is a set of information that every container provides in one form or another, such as the number of tracks. We should expose this information through a common interface.
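One possible shape for such a common interface is sketched below. The class and method names (`Extractor`, `track_count`, `track_kind`) are hypothetical, and `track_kind` is an assumed example of generic information beyond the track count mentioned above.

```python
from abc import ABC, abstractmethod
from typing import Iterable


class Extractor(ABC):
    """Hypothetical interface that every container extractor would implement."""

    @abstractmethod
    def track_count(self) -> int:
        """Number of tracks in the container."""

    @abstractmethod
    def track_kind(self, index: int) -> str:
        """Kind of the given track, for example "audio" or "video"."""


class FakeExtractor(Extractor):
    """Toy extractor used only to exercise the interface."""

    def __init__(self, kinds: Iterable[str]) -> None:
        self._kinds = list(kinds)

    def track_count(self) -> int:
        return len(self._kinds)

    def track_kind(self, index: int) -> str:
        return self._kinds[index]


def summarize(extractor: Extractor) -> str:
    """Works for any container, since it relies only on the common interface."""
    kinds = [extractor.track_kind(i) for i in range(extractor.track_count())]
    return f"{extractor.track_count()} track(s): {', '.join(kinds)}"
```

Code such as `summarize` never needs to know which container format it is dealing with, which is the point of keeping this information generic.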
Container Specific Information
There is also information that is specific to certain types of containers. For example, the mpeg1.extractor might allow you to extract the ID3 tag information.
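One way to expose container-specific information without widening the common interface is a keyed lookup that returns nothing for unknown keys. This is only a sketch: `Mpeg1Extractor` here is a stand-in class, and `specific_info` and the `id3.*` key names are hypothetical.

```python
from typing import Dict, Optional


class Mpeg1Extractor:
    """Stand-in for a format-specific extractor; not a real implementation."""

    def __init__(self, tags: Dict[str, str]) -> None:
        # In a real extractor these values would be parsed from the file.
        self._tags = dict(tags)

    def specific_info(self, key: str) -> Optional[str]:
        """Return a container-specific value, or None when the key is unknown."""
        return self._tags.get(key)


extractor = Mpeg1Extractor({"id3.title": "Example Title"})
title = extractor.specific_info("id3.title")
artist = extractor.specific_info("id3.artist")  # not present in this file
```

Callers that do not care about a particular container can ignore these keys entirely, while callers that do can probe for them and handle a missing value.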
File Layout
Some container formats support interleaved tracks, others do not, and some support both layouts. For some applications it may be preferable to have the data in one layout or the other.
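To make the two layouts concrete, the sketch below converts between per-track chunk lists and a single time-ordered, interleaved list. The `Chunk` representation and the function names are assumptions made for illustration, not a description of any real container format.

```python
import heapq
from typing import Dict, List, Tuple

Chunk = Tuple[int, str, bytes]          # (time_us, track name, payload)
TrackChunks = List[Tuple[int, bytes]]   # (time_us, payload), sorted by time


def interleave(tracks: Dict[str, TrackChunks]) -> List[Chunk]:
    """Merge per-track chunk lists into one list ordered by time."""
    streams = (
        [(time_us, name, data) for time_us, data in chunks]
        for name, chunks in tracks.items()
    )
    # heapq.merge assumes each input is already sorted, which holds here.
    return list(heapq.merge(*streams))


def deinterleave(chunks: List[Chunk]) -> Dict[str, TrackChunks]:
    """Split an interleaved chunk list back into per-track lists."""
    tracks: Dict[str, TrackChunks] = {}
    for time_us, name, data in chunks:
        tracks.setdefault(name, []).append((time_us, data))
    return tracks
```

An interleaved layout suits sequential playback, because chunks that are needed at the same time sit next to each other, whereas separate tracks are easier to handle when only one track is wanted.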