La meilleure façon de trouver une position dans le Ruisseau où donné de la séquence d'octets commence

Comment pensez-vous quelle est la meilleure façon de trouver une position dans le Système.Ruisseau où donné de la séquence d'octets de chantier (première occurence):

public static long FindPosition(Stream stream, byte[] byteSequence)
{
    long position = -1;

    ///???
    return position;
}

P. S. le principe Le plus simple encore la solution la plus rapide est préférée. 🙂

votre question est source de confusion...ce que vous recherchez? cette séquence spécifique d'octets dans le flux?
Je pense que la question du titre devrait être mis à jour. Flux est mal orthographié sous la forme de Vapeur, qui donne l'impression d'une question qui doit être marqué de la Valve.
En fait, je suis venu à cette question juste pour résoudre ce problème.
En fait, je suis à la recherche pour le guid dans le ruisseau.
la mémoire est-elle un problème? ou pouvez-vous lire l'ensemble du flux de données en un tableau d'octets?

OriginalL'auteur sh0gged | 2009-09-24

algorithm bytearray c#find stream

J'en suis arrivé à cette solution.

J'ai fait quelques cas-tests avec un fichier ASCII qui a été 3.050 KB et 38803 lines.
Avec une recherche byte array de 22 bytes dans la dernière ligne du fichier, j'ai obtenu le résultat dans environ 2.28 secondes (dans un lent/vieille machine).

public static long FindPosition(Stream stream, byte[] byteSequence)
{
    if (byteSequence.Length > stream.Length)
        return -1;

    byte[] buffer = new byte[byteSequence.Length];

    using (BufferedStream bufStream = new BufferedStream(stream, byteSequence.Length))
    {
        int i;
        while ((i = bufStream.Read(buffer, 0, byteSequence.Length)) == byteSequence.Length)
        {
            if (byteSequence.SequenceEqual(buffer))
                return bufStream.Position - byteSequence.Length;
            else
                bufStream.Position -= byteSequence.Length - PadLeftSequence(buffer, byteSequence);
        }
    }

    return -1;
}

private static int PadLeftSequence(byte[] bytes, byte[] seqBytes)
{
    int i = 1;
    while (i < bytes.Length)
    {
        int n = bytes.Length - i;
        byte[] aux1 = new byte[n];
        byte[] aux2 = new byte[n];
        Array.Copy(bytes, i, aux1, 0, n);
        Array.Copy(seqBytes, aux2, n);
        if (aux1.SequenceEqual(aux2))
            return i;
        i++;
    }
    return i;
}

Pour référence future, PadLeftSequence est à la recherche pour le premier non-appariement de l'octet qui a causé SequenceEqual de fausse déclaration. Il semble comme un micro-optimisation pour moi, puisqu'on pourrait s'attendre à SequenceEqual début retour sur un non-match de toute façon. Avertissement: j'ai pas fait de mesures, ce n'est qu'opinion.
ne marche pas ne fonctionne que si la séquence est à l'index d'une longueur multipication? Je veux dire, 6 octets suivants à l'indice 4 ne sera pas trouvé ?

OriginalL'auteur bruno conde

Si vous traiter le flux comme une autre séquence d'octets, il vous suffit de le rechercher comme vous le faisiez une chaîne de recherche. Wikipedia a un excellent article sur le sujet. Boyer-Moore est une bonne et simple de l'algorithme pour cela.

Voici un petit hack j'ai mis en place en Java. Il fonctionne et qu'il est assez proche si ce n'Boyer-Moore. J'espère que ça aide 😉

public static final int BUFFER_SIZE = 32;
public static int [] buildShiftArray(byte [] byteSequence){
int [] shifts = new int[byteSequence.length];
int [] ret;
int shiftCount = 0;
byte end = byteSequence[byteSequence.length-1];
int index = byteSequence.length-1;
int shift = 1;
while(--index >= 0){
if(byteSequence[index] == end){
shifts[shiftCount++] = shift;
shift = 1;
} else {
shift++;
}
}
ret = new int[shiftCount];
for(int i = 0;i < shiftCount;i++){
ret[i] = shifts[i];
}
return ret;
}
public static byte [] flushBuffer(byte [] buffer, int keepSize){
byte [] newBuffer = new byte[buffer.length];
for(int i = 0;i < keepSize;i++){
newBuffer[i] = buffer[buffer.length - keepSize + i];
}
return newBuffer;
}
public static int findBytes(byte [] haystack, int haystackSize, byte [] needle, int [] shiftArray){
int index = needle.length;
int searchIndex, needleIndex, currentShiftIndex = 0, shift;
boolean shiftFlag = false;
index = needle.length;
while(true){
needleIndex = needle.length-1;
while(true){
if(index >= haystackSize)
return -1;
if(haystack[index] == needle[needleIndex])
break;
index++;
}
searchIndex = index;
needleIndex = needle.length-1;
while(needleIndex >= 0 && haystack[searchIndex] == needle[needleIndex]){
searchIndex--;
needleIndex--;
}
if(needleIndex < 0)
return index-needle.length+1;
if(shiftFlag){
shiftFlag = false;
index += shiftArray[0];
currentShiftIndex = 1;
} else if(currentShiftIndex >= shiftArray.length){
shiftFlag = true;
index++;
} else{
index += shiftArray[currentShiftIndex++];
}           
}
}
public static int findBytes(InputStream stream, byte [] needle){
byte [] buffer = new byte[BUFFER_SIZE];
int [] shiftArray = buildShiftArray(needle);
int bufferSize, initBufferSize;
int offset = 0, init = needle.length;
int val;
try{
while(true){
bufferSize = stream.read(buffer, needle.length-init, buffer.length-needle.length+init);
if(bufferSize == -1)
return -1;
if((val = findBytes(buffer, bufferSize+needle.length-init, needle, shiftArray)) != -1)
return val+offset;
buffer = flushBuffer(buffer, needle.length);
offset += bufferSize-init;
init = 0;
}
} catch (IOException e){
e.printStackTrace();
}
return -1;
}

il ne pourrait pas être plus simple, mais c'est assez rapide. il pense que, compte tenu des contraintes de la lecture à partir d'un flux ne permet pas de simples si vous voulez de la vitesse. mais j'espère que mon code peut soulager certains de vos problèmes, ou d'aider quelqu'un dans le futur.
Il semble initBufferSize variable dans findBytes n'est pas utilisé.

OriginalL'auteur dharga

3

Vous aurez essentiellement besoin de garder un tampon de la même taille que byteSequence qu'une fois que vous avez trouvé que le "octet suivant" dans le flux des matchs, vous pouvez vérifier le reste, mais encore, puis revenir à la "prochaine, mais un" octet si c'est pas un match.

Il est susceptible d'être un peu délicate, quoi que vous fassiez, pour être honnête 🙁

OriginalL'auteur

J'avais besoin de faire moi-même, avait déjà commencé, et n'aime pas les solutions ci-dessus. J'ai particulièrement besoin de trouver où la recherche-octet-fin de la séquence. Dans ma situation, j'ai besoin d'avancer rapidement dans le flux jusqu'au terme de cette séquence d'octets. Mais vous pouvez utiliser ma solution pour cette question aussi:

var afterSequence = stream.ScanUntilFound(byteSequence);
var beforeSequence = afterSequence - byteSequence.Length;

Ici est StreamExtensions.cs

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace System
{
static class StreamExtensions
{
///<summary>
///Advances the supplied stream until the given searchBytes are found, without advancing too far (consuming any bytes from the stream after the searchBytes are found).
///Regarding efficiency, if the stream is network or file, then MEMORY/CPU optimisations will be of little consequence here.
///</summary>
///<param name="stream">The stream to search in</param>
///<param name="searchBytes">The byte sequence to search for</param>
///<returns></returns>
public static int ScanUntilFound(this Stream stream, byte[] searchBytes)
{
//For this class code comments, a common example is assumed:
//searchBytes are {1,2,3,4} or 1234 for short
//# means value that is outside of search byte sequence
byte[] streamBuffer = new byte[searchBytes.Length];
int nextRead = searchBytes.Length;
int totalScannedBytes = 0;
while (true)
{
FillBuffer(stream, streamBuffer, nextRead);
totalScannedBytes += nextRead; //this is only used for final reporting of where it was found in the stream
if (ArraysMatch(searchBytes, streamBuffer, 0))
return totalScannedBytes; //found it
nextRead = FindPartialMatch(searchBytes, streamBuffer);
}
}
///<summary>
///Check all offsets, for partial match. 
///</summary>
///<param name="searchBytes"></param>
///<param name="streamBuffer"></param>
///<returns>The amount of bytes which need to be read in, next round</returns>
static int FindPartialMatch(byte[] searchBytes, byte[] streamBuffer)
{
//1234 = 0 - found it. this special case is already catered directly in ScanUntilFound            
//#123 = 1 - partially matched, only missing 1 value
//##12 = 2 - partially matched, only missing 2 values
//###1 = 3 - partially matched, only missing 3 values
//#### = 4 - not matched at all
for (int i = 1; i < searchBytes.Length; i++)
{
if (ArraysMatch(searchBytes, streamBuffer, i))
{
//EG. Searching for 1234, have #123 in the streamBuffer, and [i] is 1
//Output: 123#, where # will be read using FillBuffer next. 
Array.Copy(streamBuffer, i, streamBuffer, 0, searchBytes.Length - i);
return i; //if an offset of [i], makes a match then only [i] bytes need to be read from the stream to check if there's a match
}
}
return 4;
}
///<summary>
///Reads bytes from the stream, making sure the requested amount of bytes are read (streams don't always fulfill the full request first time)
///</summary>
///<param name="stream">The stream to read from</param>
///<param name="streamBuffer">The buffer to read into</param>
///<param name="bytesNeeded">How many bytes are needed. If less than the full size of the buffer, it fills the tail end of the streamBuffer</param>
static void FillBuffer(Stream stream, byte[] streamBuffer, int bytesNeeded)
{
//EG1. [123#] - bytesNeeded is 1, when the streamBuffer contains first three matching values, but now we need to read in the next value at the end 
//EG2. [####] - bytesNeeded is 4
var bytesAlreadyRead = streamBuffer.Length - bytesNeeded; //invert
while (bytesAlreadyRead < streamBuffer.Length)
{
bytesAlreadyRead += stream.Read(streamBuffer, bytesAlreadyRead, streamBuffer.Length - bytesAlreadyRead);
}
}
///<summary>
///Checks if arrays match exactly, or with offset. 
///</summary>
///<param name="searchBytes">Bytes to search for. Eg. [1234]</param>
///<param name="streamBuffer">Buffer to match in. Eg. [#123] </param>
///<param name="startAt">When this is zero, all bytes are checked. Eg. If this value 1, and it matches, this means the next byte in the stream to read may mean a match</param>
///<returns></returns>
static bool ArraysMatch(byte[] searchBytes, byte[] streamBuffer, int startAt)
{
for (int i = 0; i < searchBytes.Length - startAt; i++)
{
if (searchBytes[i] != streamBuffer[i + startAt])
return false;
}
return true;
}
}
}

OriginalL'auteur Todd

1

Peu vieille question, mais voici ma réponse. J'ai trouvé que la lecture de blocs et de la recherche qui est extrêmement inefficace par rapport à lecture seulement un à la fois et va à partir de là.

Aussi, autant que je me souvienne, la accepté de répondre serait un échec si une partie de la séquence a été dans un bloc de lire et l'autre moitié dans l'autre - ex, étant donné 12345, à la recherche de 23, il serait lire 12, pas de match, alors lisez 34, pas de match, etc... n'ai pas essayé, bien que, voyant qu'elle exige net 4.0. En tout cas, c'est plus simple, et probablement beaucoup plus rapide.
```
static long ReadOneSrch(Stream haystack, byte[] needle)
{
int b;
long i = 0;
while ((b = haystack.ReadByte()) != -1)
{
if (b == needle[i++])
{
if (i == needle.Length)
return haystack.Position - needle.Length;
}
else
i = b == needle[0] ? 1 : 0;
}
return -1;
}
```
votre code est incorrect. envisager botte de foin = [ 2,1,2,1,1 ], l'aiguille = [ 2,1,1 ]. Votre code renvoie -1, mais la bonne réponse est 2

OriginalL'auteur ZigZagJoe

Vous devez vous connecter pour publier un commentaire.