Package VCF
Class VCF
java.lang.Object
  VCF.VCF
public class VCF extends java.lang.Object
Represents the data from a VCF file
-
-
Constructor Summary
VCF(java.io.File f)
Constructor from a VCF file.
VCF(java.io.File f, java.util.List<PositionFilter> filters)
Constructor from a file, filtering positions at read time.
VCF(java.io.File f, java.util.List<PositionFilter> preFilters, java.util.List<PositionChanger> positionChangers, java.util.List<GenotypeChanger> genotypeChangers, java.util.List<PositionFilter> filters, java.util.List<java.lang.String> requiredFormats)
Constructor from a file, filtering positions at read time and changing genotypes read in.
VCF(Meta meta, java.util.List<Position> positions)
Create a VCF object from data rather than a file.
-
Method Summary
<V> V[][] asArray(java.lang.String format, Mapper<V> mapper)
Gets data from the genotypes in the VCF as an array.
<V> V[][] asArrayTransposed(java.lang.String format, Mapper<V> mapper)
Gets data from the genotypes in the VCF as a transposed array.
byte[][] asByteArray(java.lang.String format, ByteMapper mapper)
Gets data from the genotypes in the VCF as a byte array.
byte[][] asByteArrayTransposed(java.lang.String format, ByteMapper mapper)
Gets data from the genotypes in the VCF as a transposed byte array.
double[][] asDoubleArray(java.lang.String format, DoubleMapper mapper)
Gets data from the genotypes in the VCF as a double array.
double[][] asDoubleArrayTransposed(java.lang.String format, DoubleMapper mapper)
Gets data from the genotypes in the VCF as a transposed double array.
int[][] asIntegerArray(java.lang.String format, IntegerMapper mapper)
Gets data from the genotypes in the VCF as an integer array.
int[][] asIntegerArrayTransposed(java.lang.String format, IntegerMapper mapper)
Gets data from the genotypes in the VCF as a transposed integer array.
void filterPositions(PositionFilter filter)
Filter the positions based on the given filter.
void filterSamples(SampleFilter filter)
Filter the samples based on the given filter.
java.util.stream.Stream<Genotype> genotypesByPositionStream()
Gets a stream of genotypes by position.
java.util.stream.Stream<Genotype> genotypesBySampleStream()
Gets a stream of genotypes by sample.
java.util.stream.Stream<Genotype> genotypeStream()
Gets a stream of genotypes.
Meta getMeta()
Get the meta information for this VCF.
PositionMeta[] getPositions()
Get an array of position meta data.
java.lang.String[] getSamples()
Get an array of sample names.
void limitToPositions(java.util.List<PositionMeta> keep)
Limits the VCF to the given positions.
void limitToSamples(java.util.List<java.lang.String> keep)
Limits the VCF to the given samples.
int numberPositions()
Returns the number of (visible) positions in the VCF.
static int numberPositionsFromFile(java.io.File f)
Utility function that returns the number of positions in a file without reading in any data.
int numberSamples()
Returns the number of (visible) samples in the VCF.
static int numberSamplesFromFile(java.io.File f)
Utility function that returns the number of samples in a file without reading in any data.
java.util.stream.Stream<Position> positionStream()
Returns a stream of positions in the VCF.
void resetVisible()
Resets all samples and positions to be visible, that is, to the state immediately after the VCF was constructed.
java.util.stream.Stream<Sample> sampleStream()
Returns a stream of samples in the VCF.
Position singlePosition(PositionMeta position)
Returns the data for a single position.
Sample singleSample(java.lang.String sample)
Returns the data for a single sample.
void writeFile(java.io.File f)
Writes the VCF to a file.
-
-
-
Constructor Detail
-
VCF
public VCF(java.io.File f) throws VCFException
Constructor from a VCF file.
Parameters:
f - The VCF file
Throws:
VCFException - If there is a problem with the VCF file or the data in it.
-
VCF
public VCF(java.io.File f, java.util.List<PositionFilter> filters) throws VCFException
Constructor from a file, filtering positions at read time. Filtered positions are never stored in memory, which reduces memory usage when reading a large VCF file in which many positions will be filtered out.
Parameters:
f - The file
filters - The position filters to apply
Throws:
VCFException - If there is a problem with the VCF file or the data in it.
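The memory saving from filtering at read time can be illustrated with a minimal, self-contained sketch. It does not use this package: `ReadTimeFilterSketch`, `readRecords` and the use of `java.util.function.Predicate<String>` in place of `PositionFilter` are hypothetical stand-ins, and the library's actual reading code may work differently. The point is only that a rejected record is dropped before it is ever added to the in-memory list.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for illustration; not part of the VCF package.
final class ReadTimeFilterSketch {

    // Reads VCF text line by line and stores only records that pass every filter.
    static List<String> readRecords(String vcfText, List<Predicate<String>> filters) {
        List<String> kept = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new StringReader(vcfText))) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.isEmpty() || line.startsWith("#")) {
                    continue; // skip meta and header lines
                }
                boolean pass = true;
                for (Predicate<String> filter : filters) {
                    if (!filter.test(line)) {
                        pass = false; // rejected: never stored
                        break;
                    }
                }
                if (pass) {
                    kept.add(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return kept;
    }

    public static void main(String[] args) {
        String vcf = String.join("\n",
            "##fileformat=VCFv4.2",
            "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
            "1\t100\t.\tA\tG\t50\tPASS\t.",
            "1\t200\t.\tC\tT\t10\tq10\t.",
            "1\t300\t.\tG\tA\t60\tPASS\t.");
        // Keep only records whose FILTER column (index 6) is PASS.
        Predicate<String> passOnly = line -> line.split("\t")[6].equals("PASS");
        System.out.println(readRecords(vcf, List.of(passOnly)).size());
    }
}
```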
-
VCF
public VCF(java.io.File f, java.util.List<PositionFilter> preFilters, java.util.List<PositionChanger> positionChangers, java.util.List<GenotypeChanger> genotypeChangers, java.util.List<PositionFilter> filters, java.util.List<java.lang.String> requiredFormats) throws VCFException
Constructor from a file, filtering positions at read time and changing genotypes read in. By changing genotypes as they are read in, any information in the genotype field that will not be used can be discarded, which saves memory.
Parameters:
f - The file
preFilters - A list of filters to be applied before any changes are applied (e.g. to filter out SNPs without the required data)
positionChangers - List of changers to apply to the positions
genotypeChangers - List of changers to apply to the genotypes
filters - The position filters to apply (after the changers)
requiredFormats - A list of formats required to be in the VCF
Throws:
VCFException - If there is a problem with the VCF file or the data in it.
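The ordering described by the parameters (pre-filter, then change, then filter) can be sketched as follows. This is an illustration only: `ReadPipelineSketch` and `run` are hypothetical names, plain strings stand in for positions and genotypes, and a single `UnaryOperator<String>` stands in for both kinds of changer. How the library orders position changers relative to genotype changers is not stated here, so the sketch makes no claim about it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for illustration; not part of the VCF package.
final class ReadPipelineSketch {

    static List<String> run(List<String> records,
                            Predicate<String> preFilter,
                            UnaryOperator<String> changer,
                            Predicate<String> filter) {
        List<String> out = new ArrayList<>();
        for (String record : records) {
            if (!preFilter.test(record)) {
                continue;                            // 1. dropped before any change is applied
            }
            String changed = changer.apply(record);  // 2. discard what will not be used
            if (filter.test(changed)) {              // 3. filter sees the changed value
                out.add(changed);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> genotypes = List.of("0/1:35:99", "0/0:20:50", "1/1");
        List<String> result = run(
            genotypes,
            g -> g.contains(":"),                // pre-filter: needs the extra fields
            g -> g.substring(0, g.indexOf(':')), // changer: keep only the first field
            g -> !g.equals("0/0"));              // filter: drop homozygous reference
        System.out.println(result); // [0/1]
    }
}
```

The pre-filter protects the changer from records it cannot handle (here, `"1/1"` has no `:` to cut at), which is the role the parameter description gives to `preFilters`.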
-
VCF
public VCF(Meta meta, java.util.List<Position> positions) throws VCFDataException
Create a VCF object from data rather than a file.
Parameters:
meta - The meta data for the VCF
positions - The positions for the VCF (which includes information on samples and genotypes).
Throws:
VCFDataException - If there is a problem with the passed in list of positions
-
-
Method Detail
-
genotypeStream
public java.util.stream.Stream<Genotype> genotypeStream()
Gets a stream of genotypes. Genotypes are returned by position, i.e. the genotypes for one position are returned before moving on to the next position.
Returns:
The stream
-
genotypesByPositionStream
public java.util.stream.Stream<Genotype> genotypesByPositionStream()
Gets a stream of genotypes by position. That is, the genotypes for one position are returned before moving on to the next position.
Returns:
The stream
-
genotypesBySampleStream
public java.util.stream.Stream<Genotype> genotypesBySampleStream()
Gets a stream of genotypes by sample. That is, the genotypes for one sample are returned before moving on to the next sample.
Returns:
The stream
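The difference between the two orderings can be shown with a small, self-contained sketch. It does not use this package: `GenotypeOrderSketch` is a hypothetical stand-in, genotypes are plain strings, and the `calls[position][sample]` layout is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration; not part of the VCF package.
final class GenotypeOrderSketch {

    // Assumed layout: calls[p][s] is the call at position p for sample s.
    static List<String> byPosition(String[][] calls) {
        List<String> out = new ArrayList<>();
        for (String[] position : calls) {      // finish one position...
            for (String call : position) {     // ...before starting the next
                out.add(call);
            }
        }
        return out;
    }

    static List<String> bySample(String[][] calls) {
        List<String> out = new ArrayList<>();
        int samples = calls.length == 0 ? 0 : calls[0].length;
        for (int s = 0; s < samples; s++) {    // finish one sample...
            for (String[] position : calls) {  // ...before starting the next
                out.add(position[s]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] calls = {
            {"0/0", "0/1"}, // position 1: sample A, sample B
            {"1/1", "0/0"}, // position 2: sample A, sample B
        };
        System.out.println(byPosition(calls)); // [0/0, 0/1, 1/1, 0/0]
        System.out.println(bySample(calls));   // [0/0, 1/1, 0/1, 0/0]
    }
}
```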
-
singlePosition
public Position singlePosition(PositionMeta position)
Returns the data for a single position.
Parameters:
position - The position meta data to return the data for
Returns:
The position data
-
singleSample
public Sample singleSample(java.lang.String sample)
Returns the data for a single sample.
Parameters:
sample - The string representing the sample to return the data for
Returns:
The sample data
-
positionStream
public java.util.stream.Stream<Position> positionStream()
Returns a stream of positions in the VCF.
Returns:
The stream
-
sampleStream
public java.util.stream.Stream<Sample> sampleStream()
Returns a stream of samples in the VCF.
Returns:
The stream
-
getMeta
public Meta getMeta()
Get the meta information for this VCF.
Returns:
The meta information
-
filterSamples
public void filterSamples(SampleFilter filter) throws VCFDataException
Filter the samples based on the given filter. Samples are hidden rather than deleted, so they can be made visible again (see resetVisible). This makes it much easier to apply different filters to the same VCF.
Parameters:
filter - The sample filter to be applied.
Throws:
VCFDataException - If there is a problem with the data in the VCF
-
filterPositions
public void filterPositions(PositionFilter filter) throws VCFDataException
Filter the positions based on the given filter. Positions are hidden rather than deleted, so they can be made visible again (see resetVisible). This makes it much easier to apply different filters to the same VCF.
Parameters:
filter - The position filter to be applied.
Throws:
VCFDataException - If there is a problem with the data in the VCF
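The hide-rather-than-delete behaviour described for filterSamples and filterPositions can be pictured as a visibility flag kept alongside each item. The sketch below is a hypothetical stand-in (`VisibilitySketch`, with `Predicate<String>` in place of `SampleFilter`), not the library's implementation, which may store visibility differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for illustration; not part of the VCF package.
final class VisibilitySketch {
    private final String[] samples;
    private final boolean[] visible; // one flag per sample; the data itself is never removed

    VisibilitySketch(String... samples) {
        this.samples = samples.clone();
        this.visible = new boolean[samples.length];
        Arrays.fill(this.visible, true);
    }

    // Hide every currently visible sample that the predicate rejects.
    void filterSamples(Predicate<String> keep) {
        for (int i = 0; i < samples.length; i++) {
            if (visible[i] && !keep.test(samples[i])) {
                visible[i] = false;
            }
        }
    }

    // Make everything visible again, as immediately after construction.
    void resetVisible() {
        Arrays.fill(visible, true);
    }

    List<String> visibleSamples() {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < samples.length; i++) {
            if (visible[i]) {
                out.add(samples[i]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        VisibilitySketch vcf = new VisibilitySketch("NA001", "NA002", "HG003");
        vcf.filterSamples(name -> name.startsWith("NA"));
        System.out.println(vcf.visibleSamples()); // [NA001, NA002]
        vcf.resetVisible();
        System.out.println(vcf.visibleSamples()); // [NA001, NA002, HG003]
    }
}
```

Because nothing is deleted, a second, unrelated filter can be tried after a reset without re-reading the file.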
-
limitToPositions
public void limitToPositions(java.util.List<PositionMeta> keep)
Limits the VCF to the given positions. As with filterPositions, positions are hidden, not deleted.
Parameters:
keep - The positions to keep.
-
limitToSamples
public void limitToSamples(java.util.List<java.lang.String> keep)
Limits the VCF to the given samples. As with filterSamples, samples are hidden, not deleted.
Parameters:
keep - The samples to keep.
-
numberPositions
public int numberPositions()
Returns the number of (visible) positions in the VCF.
Returns:
The number of positions
-
numberSamples
public int numberSamples()
Returns the number of (visible) samples in the VCF.
Returns:
The number of samples
-
writeFile
public void writeFile(java.io.File f) throws java.io.IOException
Writes the VCF to a file. Only visible samples and positions are written.
Parameters:
f - The file to write to
Throws:
java.io.IOException - If there is an IO problem
-
asArray
public <V> V[][] asArray(java.lang.String format, Mapper<V> mapper) throws VCFDataException
Gets data from the genotypes in the VCF as an array.
Type Parameters:
V - The type of data returned
Parameters:
format - The format in the genotype data to get the data from
mapper - A mapper mapping from the string (in the VCF) to the required type
Returns:
The array
Throws:
VCFDataException - If there is a problem with the data in the VCF
-
asArrayTransposed
public <V> V[][] asArrayTransposed(java.lang.String format, Mapper<V> mapper) throws VCFDataException
Gets data from the genotypes in the VCF as a transposed array.
Type Parameters:
V - The type of data returned
Parameters:
format - The format in the genotype data to get the data from
mapper - A mapper mapping from the string (in the VCF) to the required type
Returns:
The array
Throws:
VCFDataException - If there is a problem with the data in the VCF
-
asIntegerArray
public int[][] asIntegerArray(java.lang.String format, IntegerMapper mapper) throws VCFDataException
Gets data from the genotypes in the VCF as an integer array.
Parameters:
format - The format in the genotype data to get the data from
mapper - A mapper mapping from the string (in the VCF) to an integer
Returns:
The array
Throws:
VCFDataException - If there is a problem with the data in the VCF
-
asIntegerArrayTransposed
public int[][] asIntegerArrayTransposed(java.lang.String format, IntegerMapper mapper) throws VCFDataException
Gets data from the genotypes in the VCF as a transposed integer array.
Parameters:
format - The format in the genotype data to get the data from
mapper - A mapper mapping from the string (in the VCF) to an integer
Returns:
The array
Throws:
VCFDataException - If there is a problem with the data in the VCF
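What a mapper from genotype string to integer might do, and how the transposed array relates to the plain one, can be sketched without the library. Everything here is an assumption for illustration: `GenotypeArraySketch` is hypothetical, `java.util.function.ToIntFunction<String>` stands in for `IntegerMapper` (whose actual method is not shown on this page), and which axis holds positions and which holds samples is not stated in this documentation.

```java
import java.util.Arrays;
import java.util.function.ToIntFunction;

// Hypothetical stand-in for illustration; not part of the VCF package.
final class GenotypeArraySketch {

    // Counts non-reference alleles in a GT string such as "0/1" or "1|1"; -1 if any allele is missing.
    static int altAlleleCount(String gt) {
        int count = 0;
        for (String allele : gt.split("[/|]")) {
            if (allele.equals(".")) {
                return -1;
            }
            if (!allele.equals("0")) {
                count++;
            }
        }
        return count;
    }

    // Applies the mapper to every cell, preserving the array's shape.
    static int[][] map(String[][] raw, ToIntFunction<String> mapper) {
        int[][] out = new int[raw.length][];
        for (int i = 0; i < raw.length; i++) {
            out[i] = new int[raw[i].length];
            for (int j = 0; j < raw[i].length; j++) {
                out[i][j] = mapper.applyAsInt(raw[i][j]);
            }
        }
        return out;
    }

    // Swaps rows and columns of a rectangular array.
    static int[][] transpose(int[][] in) {
        int rows = in.length;
        int cols = rows == 0 ? 0 : in[0].length;
        int[][] out = new int[cols][rows];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                out[j][i] = in[i][j];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] raw = {
            {"0/0", "0/1", "1/1"},
            {"0/1", "./.", "1|1"},
        };
        int[][] mapped = map(raw, GenotypeArraySketch::altAlleleCount);
        System.out.println(Arrays.deepToString(mapped));            // [[0, 1, 2], [1, -1, 2]]
        System.out.println(Arrays.deepToString(transpose(mapped))); // [[0, 1], [1, -1], [2, 2]]
    }
}
```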
-
asDoubleArray
public double[][] asDoubleArray(java.lang.String format, DoubleMapper mapper) throws VCFDataException
Gets data from the genotypes in the VCF as a double array.
Parameters:
format - The format in the genotype data to get the data from
mapper - A mapper mapping from the string (in the VCF) to a double
Returns:
The array
Throws:
VCFDataException - If there is a problem with the data in the VCF
-
asDoubleArrayTransposed
public double[][] asDoubleArrayTransposed(java.lang.String format, DoubleMapper mapper) throws VCFDataException
Gets data from the genotypes in the VCF as a transposed double array.
Parameters:
format - The format in the genotype data to get the data from
mapper - A mapper mapping from the string (in the VCF) to a double
Returns:
The array
Throws:
VCFDataException - If there is a problem with the data in the VCF
-
asByteArray
public byte[][] asByteArray(java.lang.String format, ByteMapper mapper) throws VCFDataException
Gets data from the genotypes in the VCF as a byte array.
Parameters:
format - The format in the genotype data to get the data from
mapper - A mapper mapping from the string (in the VCF) to a byte
Returns:
The array
Throws:
VCFDataException - If there is a problem with the data in the VCF
-
asByteArrayTransposed
public byte[][] asByteArrayTransposed(java.lang.String format, ByteMapper mapper) throws VCFDataException
Gets data from the genotypes in the VCF as a transposed byte array.
Parameters:
format - The format in the genotype data to get the data from
mapper - A mapper mapping from the string (in the VCF) to a byte
Returns:
The array
Throws:
VCFDataException - If there is a problem with the data in the VCF
-
getSamples
public java.lang.String[] getSamples()
Get an array of sample names.
Returns:
The array of sample names
-
getPositions
public PositionMeta[] getPositions()
Get an array of position meta data.
Returns:
The array of position meta data
-
resetVisible
public void resetVisible()
Resets all samples and positions to be visible, that is, to the state immediately after the VCF was constructed.
-
numberPositionsFromFile
public static int numberPositionsFromFile(java.io.File f) throws VCFInputException
Utility function that returns the number of positions in a file without reading in any data.
Parameters:
f - The VCF file
Returns:
The number of positions
Throws:
VCFInputException - If there is a problem with reading the VCF
-
numberSamplesFromFile
public static int numberSamplesFromFile(java.io.File f) throws java.io.IOException
Utility function that returns the number of samples in a file without reading in any data.
Parameters:
f - The VCF file
Returns:
The number of samples
Throws:
java.io.IOException - If there is an IO problem
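Counts like these can be derived from the layout of a VCF file alone, without parsing any genotype. The sketch below is a hypothetical illustration (`VcfCountSketch` is not part of this package, and it takes text rather than a `java.io.File` to stay self-contained), not the library's implementation. It relies on the VCF convention that the `#CHROM` header line has eight fixed columns, followed by `FORMAT` and one column per sample when genotype data is present, and that every line not starting with `#` is one position.

```java
// Hypothetical stand-in for illustration; not part of the VCF package.
final class VcfCountSketch {
    // CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, plus FORMAT.
    private static final int FIXED_COLUMNS = 9;

    // Number of sample columns on the #CHROM header line.
    static int numberSamples(String vcfText) {
        for (String line : vcfText.split("\\R")) {
            if (line.startsWith("#CHROM")) {
                int columns = line.split("\t").length;
                return Math.max(0, columns - FIXED_COLUMNS);
            }
        }
        return 0; // no header line found
    }

    // Number of record lines, i.e. non-empty lines that are not meta or header lines.
    static int numberPositions(String vcfText) {
        int count = 0;
        for (String line : vcfText.split("\\R")) {
            if (!line.isEmpty() && !line.startsWith("#")) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String vcf = String.join("\n",
            "##fileformat=VCFv4.2",
            "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
            "1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t1/1",
            "1\t200\t.\tC\tT\t60\tPASS\t.\tGT\t0/0\t0/1");
        System.out.println(numberSamples(vcf));   // 2
        System.out.println(numberPositions(vcf)); // 2
    }
}
```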
-
-