Package VCF

Class VCF


  • public class VCF
    extends java.lang.Object
    Represents the data from a VCF file
    • Constructor Summary

      Constructors 
      Constructor Description
      VCF​(java.io.File f)
      Constructor from a VCF file
      VCF​(java.io.File f, java.util.List<PositionFilter> filters)
      Constructor from a file, filtering positions at read time.
      VCF​(java.io.File f, java.util.List<PositionFilter> preFilters, java.util.List<PositionChanger> positionChangers, java.util.List<GenotypeChanger> genotypeChangers, java.util.List<PositionFilter> filters, java.util.List<java.lang.String> requiredFormats)
      Constructor from a file, filtering positions at read time and changing genotypes read in.
      VCF​(Meta meta, java.util.List<Position> positions)
      Create a VCF object from data rather than a file
    • Constructor Detail

      • VCF

        public VCF​(java.io.File f)
            throws VCFException
        Constructor from a VCF file
        Parameters:
        f - The VCF file
        Throws:
        VCFException - If there is a problem with the VCF file or the data in it.
      • VCF

        public VCF​(java.io.File f,
                   java.util.List<PositionFilter> filters)
            throws VCFException
        Constructor from a file, filtering positions at read time. Filtered positions are never stored in memory, which reduces memory usage when reading a large VCF file in which many positions will be filtered out.
        Parameters:
        f - The file
        filters - The position filters to apply
        Throws:
        VCFException - If there is a problem with the VCF file or the data in it.
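        The memory saving from read-time filtering can be illustrated with a minimal, library-independent sketch. It does not use the PositionFilter type from this package: the class ReadTimeFilterSketch, its methods, and the use of plain text lines with java.util.function.Predicate are all assumptions made for illustration. The point is only that a rejected record is dropped as it streams past and never reaches the stored list.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for read-time filtering; not part of the VCF package.
class ReadTimeFilterSketch {

    // Example filter (assumed): keep records whose first column equals the given chromosome.
    static Predicate<String> onChromosome(String chrom) {
        return line -> line.startsWith(chrom + "\t");
    }

    // Skips header lines (starting with '#') and keeps a data line only if
    // every filter accepts it. Rejected lines are never collected.
    static List<String> readFiltered(Stream<String> lines, List<Predicate<String>> filters) {
        return lines
                .filter(line -> !line.startsWith("#"))
                .filter(line -> filters.stream().allMatch(f -> f.test(line)))
                .collect(Collectors.toList());
    }
}
```

        In practice the line stream might come from java.nio.file.Files.lines, so the whole file is never held in memory at once.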
      • VCF

        public VCF​(java.io.File f,
                   java.util.List<PositionFilter> preFilters,
                   java.util.List<PositionChanger> positionChangers,
                   java.util.List<GenotypeChanger> genotypeChangers,
                   java.util.List<PositionFilter> filters,
                   java.util.List<java.lang.String> requiredFormats)
            throws VCFException
        Constructor from a file, filtering positions at read time and changing genotypes as they are read in. Because genotypes are changed as they are read, any information in the genotype field that will not be used can be discarded, which saves memory.
        Parameters:
        f - The file
        preFilters - A list of filters to be applied before any changes are applied (e.g. to filter out SNPs without the required data)
        positionChangers - List of changers to apply to the positions
        genotypeChangers - List of changers to apply to the genotypes
        filters - The position filters to apply (after the changers)
        requiredFormats - A list of formats required to be in the VCF
        Throws:
        VCFException - If there is a problem with the VCF file or the data in it
      • VCF

        public VCF​(Meta meta,
                   java.util.List<Position> positions)
            throws VCFDataException
        Create a VCF object from data rather than a file
        Parameters:
        meta - The meta data for the VCF
        positions - The positions for the VCF (which includes information on samples and genotypes).
        Throws:
        VCFDataException - If there is a problem with the passed in list of positions
    • Method Detail

      • genotypeStream

        public java.util.stream.Stream<Genotype> genotypeStream()
        Gets a stream of genotypes. Genotypes are returned by position, i.e. the genotypes for one position are returned before moving on to the next position.
        Returns:
        The stream
      • genotypesByPositionStream

        public java.util.stream.Stream<Genotype> genotypesByPositionStream()
        Gets a stream of genotypes by position. That is, the genotypes for one position are returned before moving on to the next position.
        Returns:
        The stream
      • genotypesBySampleStream

        public java.util.stream.Stream<Genotype> genotypesBySampleStream()
        Gets a stream of genotypes by sample. That is, the genotypes for one sample are returned before moving on to the next sample.
        Returns:
        The stream
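        The difference between the by-position and by-sample orderings can be sketched on a plain two-dimensional array. This sketch does not use the Genotype type from this package. The class GenotypeOrderSketch and the layout calls[position][sample] are assumptions chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical illustration of the two traversal orders; not part of the VCF package.
class GenotypeOrderSketch {

    // calls[p][s] is the genotype call at position p for sample s (assumed layout).
    // All calls for one position are emitted before moving on to the next position.
    static List<String> byPosition(String[][] calls) {
        return Arrays.stream(calls)
                .flatMap(Arrays::stream)
                .collect(Collectors.toList());
    }

    // All calls for one sample are emitted before moving on to the next sample.
    static List<String> bySample(String[][] calls) {
        int samples = calls.length == 0 ? 0 : calls[0].length;
        return IntStream.range(0, samples).boxed()
                .flatMap(s -> Arrays.stream(calls).map(row -> row[s]))
                .collect(Collectors.toList());
    }
}
```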
      • singlePosition

        public Position singlePosition​(PositionMeta position)
        Returns the data for a single position
        Parameters:
        position - The position meta data to return the data for
        Returns:
        The position data
      • singleSample

        public Sample singleSample​(java.lang.String sample)
        Returns the data for a single sample
        Parameters:
        sample - The string representing the sample to return the data for
        Returns:
        The sample data
      • positionStream

        public java.util.stream.Stream<Position> positionStream()
        Returns a stream of positions in the VCF
        Returns:
        The stream
      • sampleStream

        public java.util.stream.Stream<Sample> sampleStream()
        Returns a stream of samples in the VCF
        Returns:
        The stream
      • getMeta

        public Meta getMeta()
        Get the meta information for this VCF
        Returns:
        The meta information
      • filterSamples

        public void filterSamples​(SampleFilter filter)
                           throws VCFDataException
        Filter the samples based on the given filter. Samples are hidden rather than deleted, so they can be unhidden later (see resetVisible); this makes applying different filters to the same VCF much easier.
        Parameters:
        filter - The sample filter to be applied.
        Throws:
        VCFDataException - If there is a problem with the data in the VCF
      • filterPositions

        public void filterPositions​(PositionFilter filter)
                             throws VCFDataException
        Filter the positions based on the given filter. Positions are hidden rather than deleted, so they can be unhidden later (see resetVisible); this makes applying different filters to the same VCF much easier.
        Parameters:
        filter - The position filter to be applied.
        Throws:
        VCFDataException - If there is a problem with the data in the VCF
      • limitToPositions

        public void limitToPositions​(java.util.List<PositionMeta> keep)
        Limits the VCF to the given positions. Again, positions are hidden, not deleted.
        Parameters:
        keep - The positions to keep.
      • limitToSamples

        public void limitToSamples​(java.util.List<java.lang.String> keep)
        Limits the VCF to the given samples. Again, samples are hidden, not deleted.
        Parameters:
        keep - The samples to keep.
      • numberPositions

        public int numberPositions()
        Returns the number of (visible) positions in the VCF
        Returns:
        The number of positions
      • numberSamples

        public int numberSamples()
        Returns the number of (visible) samples in the VCF
        Returns:
        The number of samples
      • writeFile

        public void writeFile​(java.io.File f)
                       throws java.io.IOException
        Writes the VCF to a file. Only visible samples / positions are written.
        Parameters:
        f - The file to write to
        Throws:
        java.io.IOException - If there is an IO problem
      • asArray

        public <V> V[][] asArray​(java.lang.String format,
                                 Mapper<V> mapper)
                          throws VCFDataException
        Gets data from the genotypes in the VCF as an array
        Type Parameters:
        V - The type of data returned
        Parameters:
        format - The format in the genotype data to get the data from
        mapper - A mapper mapping from the string (in the VCF) to the required type
        Returns:
        The array
        Throws:
        VCFDataException - If there is a problem with the data in the VCF
      • asArrayTransposed

        public <V> V[][] asArrayTransposed​(java.lang.String format,
                                           Mapper<V> mapper)
                                    throws VCFDataException
        Gets data from the genotypes in the VCF as a transposed array
        Type Parameters:
        V - The type of data returned
        Parameters:
        format - The format in the genotype data to get the data from
        mapper - A mapper mapping from the string (in the VCF) to the required type
        Returns:
        The array
        Throws:
        VCFDataException - If there is a problem with the data in the VCF
      • asIntegerArray

        public int[][] asIntegerArray​(java.lang.String format,
                                      IntegerMapper mapper)
                               throws VCFDataException
        Gets data from the genotypes in the VCF as an integer array
        Parameters:
        format - The format in the genotype data to get the data from
        mapper - A mapper mapping from the string (in the VCF) to an integer
        Returns:
        The array
        Throws:
        VCFDataException - If there is a problem with the data in the VCF
      • asIntegerArrayTransposed

        public int[][] asIntegerArrayTransposed​(java.lang.String format,
                                                IntegerMapper mapper)
                                         throws VCFDataException
        Gets data from the genotypes in the VCF as a transposed integer array
        Parameters:
        format - The format in the genotype data to get the data from
        mapper - A mapper mapping from the string (in the VCF) to an integer
        Returns:
        The array
        Throws:
        VCFDataException - If there is a problem with the data in the VCF
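        The mapper-plus-array pattern can be sketched without the library. The sketch below does not use IntegerMapper or any other type from this package. The class IntegerArraySketch, the altAlleleCount mapper, and the choice of rows as positions in the untransposed array are all assumptions, since the orientation is not stated here.

```java
import java.util.function.ToIntFunction;

// Hypothetical illustration of mapping genotype strings to ints; not part of the VCF package.
class IntegerArraySketch {

    // Example mapper (assumed): counts non-reference alleles in a GT string
    // such as "0/1" or "1|1", and returns -1 if any allele is missing (".").
    static int altAlleleCount(String gt) {
        int count = 0;
        for (String allele : gt.split("[/|]")) {
            if (allele.equals(".")) {
                return -1;
            }
            if (!allele.equals("0")) {
                count++;
            }
        }
        return count;
    }

    // Applies the mapper to every cell, keeping the input layout.
    static int[][] toIntArray(String[][] values, ToIntFunction<String> mapper) {
        int[][] out = new int[values.length][];
        for (int i = 0; i < values.length; i++) {
            out[i] = new int[values[i].length];
            for (int j = 0; j < values[i].length; j++) {
                out[i][j] = mapper.applyAsInt(values[i][j]);
            }
        }
        return out;
    }

    // Swaps rows and columns of a rectangular array.
    static int[][] transpose(int[][] a) {
        int rows = a.length;
        int cols = rows == 0 ? 0 : a[0].length;
        int[][] t = new int[cols][rows];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                t[j][i] = a[i][j];
            }
        }
        return t;
    }
}
```

        Returning a primitive int[][] rather than Integer[][] avoids boxing every value, which is presumably why primitive variants exist alongside the generic asArray.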
      • asDoubleArray

        public double[][] asDoubleArray​(java.lang.String format,
                                        DoubleMapper mapper)
                                 throws VCFDataException
        Gets data from the genotypes in the VCF as a double array
        Parameters:
        format - The format in the genotype data to get the data from
        mapper - A mapper mapping from the string (in the VCF) to a double
        Returns:
        The array
        Throws:
        VCFDataException - If there is a problem with the data in the VCF
      • asDoubleArrayTransposed

        public double[][] asDoubleArrayTransposed​(java.lang.String format,
                                                  DoubleMapper mapper)
                                           throws VCFDataException
        Gets data from the genotypes in the VCF as a transposed double array
        Parameters:
        format - The format in the genotype data to get the data from
        mapper - A mapper mapping from the string (in the VCF) to a double
        Returns:
        The array
        Throws:
        VCFDataException - If there is a problem with the data in the VCF
      • asByteArray

        public byte[][] asByteArray​(java.lang.String format,
                                    ByteMapper mapper)
                             throws VCFDataException
        Gets data from the genotypes in the VCF as a byte array
        Parameters:
        format - The format in the genotype data to get the data from
        mapper - A mapper mapping from the string (in the VCF) to a byte
        Returns:
        The array
        Throws:
        VCFDataException - If there is a problem with the data in the VCF
      • asByteArrayTransposed

        public byte[][] asByteArrayTransposed​(java.lang.String format,
                                              ByteMapper mapper)
                                       throws VCFDataException
        Gets data from the genotypes in the VCF as a transposed byte array
        Parameters:
        format - The format in the genotype data to get the data from
        mapper - A mapper mapping from the string (in the VCF) to a byte
        Returns:
        The array
        Throws:
        VCFDataException - If there is a problem with the data in the VCF
      • getSamples

        public java.lang.String[] getSamples()
        Get an array of sample names
        Returns:
        The array of sample names
      • getPositions

        public PositionMeta[] getPositions()
        Get an array of position meta data
        Returns:
        The array of position meta data
      • resetVisible

        public void resetVisible()
        Resets all samples and positions to be visible, that is, restores the state immediately after the VCF was constructed.
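        The hide-rather-than-delete behaviour shared by filterSamples, filterPositions, limitToSamples, limitToPositions and resetVisible can be sketched with a visibility mask. This is not the library's implementation: the class VisibilitySketch and its methods are hypothetical, and the assumption that successive filters accumulate (an item hidden by one filter stays hidden under the next) is not confirmed by this documentation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration of hiding items with a mask; not part of the VCF package.
class VisibilitySketch {
    private final List<String> samples;
    private final boolean[] visible;

    VisibilitySketch(List<String> samples) {
        this.samples = new ArrayList<>(samples);
        this.visible = new boolean[samples.size()];
        Arrays.fill(visible, true); // everything is visible after construction
    }

    // Hides every sample the predicate rejects; the data itself is untouched.
    // Assumption: filters accumulate until resetVisible() is called.
    void filter(Predicate<String> keep) {
        for (int i = 0; i < visible.length; i++) {
            visible[i] = visible[i] && keep.test(samples.get(i));
        }
    }

    // Makes every sample visible again.
    void resetVisible() {
        Arrays.fill(visible, true);
    }

    // Counts only the visible samples.
    int numberVisible() {
        int n = 0;
        for (boolean v : visible) {
            if (v) {
                n++;
            }
        }
        return n;
    }
}
```

        Because nothing is deleted, a different filter can be tried on the same data after a reset without reading the file again.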
      • numberPositionsFromFile

        public static int numberPositionsFromFile​(java.io.File f)
                                           throws VCFInputException
        Utility function that returns the number of positions in a file without reading in any data
        Parameters:
        f - The VCF file
        Returns:
        The number of positions
        Throws:
        VCFInputException - If there is a problem with reading the VCF
      • numberSamplesFromFile

        public static int numberSamplesFromFile​(java.io.File f)
                                         throws java.io.IOException
        Utility function that returns the number of samples in a file without reading in any data
        Parameters:
        f - The VCF file
        Returns:
        The number of samples
        Throws:
        java.io.IOException - If there is an IO problem
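        Counting positions and samples without parsing any genotype data can be sketched from the VCF text layout alone. The sketch below is not the library's implementation. The class VcfCountSketch and its methods are hypothetical. It relies on two properties of the VCF format: data lines do not start with '#', and the #CHROM header line has nine fixed tab-separated columns (CHROM through FORMAT) followed by one column per sample.

```java
import java.util.List;

// Hypothetical illustration of counting from the raw text; not part of the VCF package.
class VcfCountSketch {

    // CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT
    static final int FIXED_COLUMNS = 9;

    // Every non-empty line that is not a header or meta line is one position.
    static int countPositions(List<String> lines) {
        return (int) lines.stream()
                .filter(line -> !line.isEmpty() && !line.startsWith("#"))
                .count();
    }

    // Sample names are the columns after the fixed ones on the #CHROM line.
    // Returns 0 if there is no such line or no sample columns.
    static int countSamples(List<String> lines) {
        for (String line : lines) {
            if (line.startsWith("#CHROM")) {
                return Math.max(0, line.split("\t").length - FIXED_COLUMNS);
            }
        }
        return 0;
    }
}
```

        The lines could be supplied by java.nio.file.Files.readAllLines for small files; a streaming reader would be preferable for large ones.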